We propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints.
To imbue multi-view feature extraction with 3D perception, we employ a self-supervised Vision Transformer (ViT) pre-trained with cross-view completion on large-scale datasets. We further introduce an Iterative Cross-view Gaussians Alignment method to ensure consistent depth scales across different views. Unlike existing purely geometry-free methods, eFreeSplat achieves epipolar-free feature matching and encoding by injecting 3D priors obtained from cross-view pre-training.
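As a rough illustration of what an epipolar-free pipeline of this shape can look like, the sketch below concatenates tokens from all input views, lets a plain Transformer layer attend across them without any epipolar sampling, and then iteratively rescales each view's predicted depth toward a shared scale. This is a minimal, hypothetical simplification: the module names (CrossViewViT, align_depth_scales), shapes, and the alignment rule are assumptions for illustration, not the eFreeSplat implementation.

```python
# Minimal sketch (assumptions, not the authors' code): a cross-view ViT backbone
# followed by a toy depth-scale alignment loop. CrossViewViT, align_depth_scales,
# and the alignment rule are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossViewViT(nn.Module):
    """Stand-in for a ViT pre-trained with cross-view completion."""
    def __init__(self, dim=64):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=8, stride=8)              # patch embedding
        self.attend = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

    def forward(self, views):                                                    # views: (B, V, 3, H, W)
        B, V = views.shape[:2]
        tokens = self.patchify(views.flatten(0, 1)).flatten(2).transpose(1, 2)   # (B*V, N, dim)
        tokens = tokens.reshape(B, -1, tokens.shape[-1])                         # concatenate tokens of all views
        return self.attend(tokens).reshape(B, V, -1, tokens.shape[-1])           # cross-view attention, no epipolar sampling

def align_depth_scales(depth, num_iters=3):
    """Toy alignment: rescale each view's depth toward a shared mean scale."""
    for _ in range(num_iters):
        per_view = depth.mean(dim=(2, 3), keepdim=True)                          # crude per-view scale, (B, V, 1, 1)
        shared = per_view.mean(dim=1, keepdim=True)                              # common target scale, (B, 1, 1, 1)
        depth = depth * (shared / per_view)
    return depth

if __name__ == "__main__":
    views = torch.randn(1, 2, 3, 64, 64)                                         # two 64x64 input views
    feats = CrossViewViT(dim=64)(views)                                          # (B, V, N, 64)
    depth = F.softplus(nn.Linear(64, 1)(feats))                                  # positive per-token depth
    print(align_depth_scales(depth).shape)                                       # (1, 2, 64, 1)
```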
We evaluate eFreeSplat on wide-baseline novel view synthesis tasks using the RealEstate10K and ACID datasets. Extensive experiments demonstrate that eFreeSplat surpasses state-of-the-art baselines that rely on epipolar priors, achieving superior geometry reconstruction and novel view synthesis quality.
Qualitative comparisons with state-of-the-art generalizable 3DGS-based models: pixelSplat and MVSplat.
Comparison of 3D Gaussians (top) and predicted depth maps for the reference viewpoints (bottom). Compared to SOTA 3DGS-based methods, our method produces higher-quality 3D Gaussians and smoother depth maps.
Our method reconstructs more reliable results than MVSplat when the overlap between reference views is low. In the histogram, the blue bars show how often our method surpasses MVSplat in rendering quality at a given overlap level, while the orange bars indicate the opposite.
@article{min2024epipolar,
title={Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis},
author={Min, Zhiyuan and Luo, Yawei and Sun, Jianwen and Yang, Yi},
journal={arXiv preprint arXiv:2410.22817},
year={2024}
}