We propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints.
To imbue multi-view feature extraction with 3D perception, we employ a self-supervised Vision Transformer (ViT) pre-trained with cross-view completion on large-scale datasets. We further introduce an Iterative Cross-view Gaussians Alignment method to ensure consistent depth scales across different views. Unlike existing purely geometry-free methods, eFreeSplat achieves epipolar-free feature matching and encoding by injecting 3D priors obtained from cross-view pre-training.
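As a rough illustration of what an epipolar-free pipeline of this shape can look like, the sketch below concatenates tokens from all input views, lets a plain Transformer layer attend across them without any epipolar sampling, and then iteratively rescales each view's predicted depth toward a shared scale. This is a minimal, hypothetical simplification: the module names (CrossViewViT, align_depth_scales), shapes, and the alignment rule are assumptions for illustration, not the eFreeSplat implementation.

```python
# Minimal sketch (assumptions, not the authors' code): a cross-view ViT backbone
# followed by a toy depth-scale alignment loop. CrossViewViT, align_depth_scales,
# and the alignment rule are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossViewViT(nn.Module):
    """Stand-in for a ViT pre-trained with cross-view completion."""
    def __init__(self, dim=64):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=8, stride=8)              # patch embedding
        self.attend = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

    def forward(self, views):                                                    # views: (B, V, 3, H, W)
        B, V = views.shape[:2]
        tokens = self.patchify(views.flatten(0, 1)).flatten(2).transpose(1, 2)   # (B*V, N, dim)
        tokens = tokens.reshape(B, -1, tokens.shape[-1])                         # concatenate tokens of all views
        return self.attend(tokens).reshape(B, V, -1, tokens.shape[-1])           # cross-view attention, no epipolar sampling

def align_depth_scales(depth, num_iters=3):
    """Toy alignment: rescale each view's depth toward a shared mean scale."""
    for _ in range(num_iters):
        per_view = depth.mean(dim=(2, 3), keepdim=True)                          # crude per-view scale, (B, V, 1, 1)
        shared = per_view.mean(dim=1, keepdim=True)                              # common target scale, (B, 1, 1, 1)
        depth = depth * (shared / per_view)
    return depth

if __name__ == "__main__":
    views = torch.randn(1, 2, 3, 64, 64)                                         # two 64x64 input views
    feats = CrossViewViT(dim=64)(views)                                          # (B, V, N, 64)
    depth = F.softplus(nn.Linear(64, 1)(feats))                                  # positive per-token depth
    print(align_depth_scales(depth).shape)                                       # (1, 2, 64, 1)
```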
We evaluate eFreeSplat on wide-baseline novel view synthesis tasks using the RealEstate10K and ACID datasets. Extensive experiments demonstrate that eFreeSplat surpasses state-of-the-art baselines that rely on epipolar priors, achieving superior geometry reconstruction and novel view synthesis quality.
Qualitative comparisons with state-of-the-art generalizable 3DGS-based models: pixelSplat and MVSplat.
Comparison of 3D Gaussians (top) and predicted depth maps for the reference viewpoints (bottom). Compared to SOTA 3DGS-based methods, our method produces higher-quality 3D Gaussians and smoother depth maps.
Our method reconstructs more reliable results than MVSplat when the overlap between reference views is low. In the histogram, the blue bars show how often our method surpasses MVSplat in rendering quality at a given overlap level, while the orange bars indicate the opposite.
@article{min2024epipolar,
title={Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis},
author={Min, Zhiyuan and Luo, Yawei and Sun, Jianwen and Yang, Yi},
journal={arXiv preprint arXiv:2410.22817},
year={2024}
}