💠XScale-NVS: Cross-Scale Novel View Synthesis
with Hash Featurized Manifold
CVPR 2024

¹Tsinghua University, ²Alibaba Group

🎥 Neural Rendering the Tengwang Pavilion

It is recommended to view the demo in 4K resolution.

🎥 Relighting the Tengwang Pavilion


We propose XScale-NVS for high-fidelity cross-scale novel view synthesis of real-world large-scale scenes. Existing scene representations are limited in their ability to capture cross-scale details. Representations based on explicit surfaces are constrained by discretization resolution or UV distortion, while implicit volumetric representations lack scalability for large scenes due to the dispersed weight distribution and surface ambiguity. In light of these challenges, we introduce the hash featurized manifold, a novel hash-based featurization coupled with a deferred neural rendering framework. This approach fully unlocks the expressivity of the representation by explicitly concentrating the hash entries on the 2D manifold, thus effectively representing highly detailed content independent of the discretization resolution. We also introduce a novel dataset, namely GigaNVS, to benchmark cross-scale, high-resolution novel view synthesis of real-world large-scale scenes. Our method significantly outperforms competing baselines on various real-world scenes, yielding an average LPIPS that is 40% lower than the prior state of the art on the challenging GigaNVS benchmark.
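The idea of concentrating hash entries on the surface can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simplified multi-resolution spatial hash in the style commonly used for volumetric hash encodings (XOR of prime-multiplied grid coordinates, trilinear interpolation), and every name and hyperparameter below (`NUM_LEVELS`, `TABLE_SIZE`, `encode`, and so on) is hypothetical. The key point is that the encoding is queried only at 3D points lying on the surface, such as those produced by rasterizing a mesh, rather than at samples spread along each ray. The neural shader that would consume these features in a deferred rendering pipeline is omitted.

```python
import numpy as np

# Hypothetical hyperparameters, chosen only for illustration.
NUM_LEVELS = 4       # number of resolution levels
TABLE_SIZE = 2 ** 14 # hash table entries per level
FEATURE_DIM = 2      # feature channels per entry
BASE_RES = 16        # grid resolution of the coarsest level
GROWTH = 2.0         # resolution multiplier between levels

# Large primes for the spatial hash (an assumed, commonly used choice).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)


def make_tables(seed=0):
    """Create small randomly initialised feature tables, one per level."""
    rng = np.random.default_rng(seed)
    shape = (NUM_LEVELS, TABLE_SIZE, FEATURE_DIM)
    return rng.uniform(-1e-4, 1e-4, size=shape).astype(np.float32)


def hash_index(corners):
    """Map integer grid corners (..., 3) to table indices via XOR hashing."""
    c = corners.astype(np.uint64)
    h = (c[..., 0] * PRIMES[0]) ^ (c[..., 1] * PRIMES[1]) ^ (c[..., 2] * PRIMES[2])
    return (h % np.uint64(TABLE_SIZE)).astype(np.int64)


def encode(points, tables):
    """Encode surface points (N, 3) in [0, 1)^3 into concatenated features."""
    feats = []
    for level in range(NUM_LEVELS):
        res = int(BASE_RES * GROWTH ** level)
        scaled = points * res
        base = np.floor(scaled).astype(np.int64)
        frac = scaled - base
        level_feat = np.zeros((points.shape[0], FEATURE_DIM), dtype=np.float32)
        # Trilinear interpolation over the 8 corners of the enclosing cell.
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    offset = np.array([dx, dy, dz])
                    w = np.prod(np.where(offset == 1, frac, 1.0 - frac), axis=1)
                    idx = hash_index(base + offset)
                    level_feat += w[:, None] * tables[level][idx]
        feats.append(level_feat)
    return np.concatenate(feats, axis=1)


# Usage: random stand-ins for surface points obtained by rasterization.
surface_points = np.random.default_rng(1).uniform(0.0, 1.0, size=(5, 3))
features = encode(surface_points, make_tables())
print(features.shape)  # (5, NUM_LEVELS * FEATURE_DIM)
```

Because the lookup depends on the continuous 3D position of each surface point rather than on mesh vertices or texels, the finest feature resolution in this sketch is set by the hash grid, not by how finely the surface is discretized.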

Surface manifold unleashes the power of volumetric hash encoding

(a) UV-based featurizations tend to disorganize the feature distribution due to distortions in surface parametrization. (b) Existing 3D-surface-based featurizations fail to express intricate sub-primitive-scale details given the limited discretization resolution. (c) Volumetric featurizations inevitably yield a dispersed weight distribution during volume rendering, where many multi-view inconsistent yet highly weighted samples make the surface colour ambiguous and degrade surface features with inconsistent colour gradients. (d) Our method leverages hash encoding to decouple featuremetric resolution from discretization resolution, and utilizes rasterization to fully unleash the expressivity of volumetric hash encoding by propagating clean and multi-view consistent signals to surface features.
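The dispersed weight distribution mentioned in (c) can be made concrete with the standard volume rendering weights, w_i = T_i * alpha_i, where alpha_i = 1 - exp(-sigma_i * delta_i) and T_i is the transmittance accumulated before sample i. The sketch below is only an illustration under stated assumptions: the density profile is synthetic (a soft bump around a surface at depth 1.0), and the function and variable names are hypothetical rather than taken from the paper.

```python
import numpy as np


def volume_rendering_weights(sigma, delta):
    """Standard volume rendering weights w_i = T_i * alpha_i along one ray."""
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance before sample i: product of (1 - alpha_j) for j < i.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return transmittance * alpha


# Sample depths along a single ray.
t = np.linspace(0.0, 2.0, 200)
delta = np.full_like(t, t[1] - t[0])
surface_depth = 1.0

# Synthetic density: a soft bump around the surface (an assumed profile).
sigma_soft = 20.0 * np.exp(-0.5 * ((t - surface_depth) / 0.15) ** 2)
w_soft = volume_rendering_weights(sigma_soft, delta)

# How many samples carry a non-negligible share of the colour gradient?
num_significant = int(np.sum(w_soft > 0.01 * w_soft.max()))
mean_depth = np.sum(w_soft * t) / np.sum(w_soft)
spread = np.sqrt(np.sum(w_soft * (t - mean_depth) ** 2) / np.sum(w_soft))

print("samples with significant weight:", num_significant)
print("weighted depth spread:", spread)
```

In this toy setting many samples in front of and around the surface receive significant weight, so each of them receives part of the colour gradient. Rasterization, by contrast, yields a single surface point per pixel, which corresponds to all of the weight sitting on one sample.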

Cross-scale neural rendering empowered by hash featurized manifold

We address the challenging task of NVS for large-scale in-the-wild scenes. The hash featurized manifold better exploits imagery with unstructured capture viewpoints and scale variations, representing structural and textural details at any scale, and could potentially replace traditional UV textures and benefit various applications. Please zoom in to see our high-quality details.

The GigaNVS dataset (Coming soon)

Characterized by real-shot, cross-scale, high-resolution imagery, GigaNVS captures millimeter-level details from scenes with square-kilometer-level areas.


author = {Wang, Guangyu and Zhang, Jinzhi and Wang, Fan and Huang, Ruqi and Fang, Lu},
title = {XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {21029-21039} }