Compact3D: Compressing Gaussian Splat Radiance Field Models with Vector Quantization

University of California, Davis

(*) Denotes equal contribution

Abstract

3D Gaussian Splatting is a new method for modeling and rendering 3D radiance fields that achieves much faster training and rendering than SOTA NeRF methods. However, it has a much larger storage demand than NeRF methods, since it must store the parameters of millions of 3D Gaussians.

We observe that many Gaussians may share similar parameters, so we introduce a simple vector quantization method based on the K-means algorithm to quantize the Gaussian parameters. We then store a small codebook along with the index of the code for each Gaussian. We compress the indices further by sorting them and using a method similar to run-length encoding.
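To make the index compression concrete, here is a minimal NumPy sketch of run-length encoding over sorted code indices. This is our own illustrative sketch with hypothetical function names, not the paper's implementation; note that the sorting permutation must also be applied to the per-Gaussian data, so it need not be stored separately.

import numpy as np

def compress_indices(indices):
    # Sort so that identical code indices form long runs; apply the same
    # permutation to the remaining per-Gaussian parameters.
    order = np.argsort(indices)
    sorted_idx = indices[order]
    # Start position of each run of identical values.
    starts = np.concatenate(([0], np.flatnonzero(np.diff(sorted_idx)) + 1))
    values = sorted_idx[starts]                            # one code per run
    lengths = np.diff(np.append(starts, len(sorted_idx)))  # run lengths
    return order, values, lengths

def decompress_indices(values, lengths):
    # Expand the (value, run-length) pairs back into a flat index array.
    return np.repeat(values, lengths)

With millions of Gaussians but only a few thousand codes, the sorted index array contains at most one run per code, so the (value, length) pairs are far smaller than the raw index list.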

We run extensive experiments on standard benchmarks as well as a new benchmark that is an order of magnitude larger than the standard ones. We show that our simple yet effective method reduces the storage cost of the original 3D Gaussian Splatting method by a factor of almost 20× with a very small drop in the quality of rendered images.

How It Works

We compress the 3D Gaussian Splatting (3DGS) model using vector quantization of the Gaussian parameters. Quantization is performed jointly with training of the Gaussian parameters. Treating each Gaussian as a vector, we run K-means clustering on the covariance and color parameters of all Gaussians to represent the N Gaussians in the model with k cluster centers (codes). Each Gaussian is then replaced by its corresponding code for rendering and loss calculation. The gradient with respect to each center is copied to all elements of the corresponding cluster, and the non-quantized versions of the parameters are updated. Only the codebook and the code assignment for each Gaussian are stored and used for inference. Our method, CompGS, maintains the real-time rendering of 3DGS while compressing it by an order of magnitude.
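The quantization step described above can be sketched as follows, assuming PyTorch; kmeans_assign and quantize_st are hypothetical names, and details such as how often the codebook itself is re-fit with K-means iterations are omitted.

import torch

def kmeans_assign(params, codebook):
    # params: (N, D), one vector per Gaussian; codebook: (k, D) centers.
    # In practice N is in the millions, so this would be chunked.
    return torch.cdist(params, codebook).argmin(dim=1)   # (N,) code indices

def quantize_st(params, codebook):
    # Forward pass uses the nearest code; the backward pass passes
    # gradients straight through to the non-quantized parameters, i.e.
    # the gradient of a center is copied to every member of its cluster.
    codes = codebook[kmeans_assign(params, codebook)]
    return params + (codes - params).detach()

The straight-through trick in the last line makes the rendering loss differentiable despite the hard cluster assignment, which is what allows quantization to run alongside training.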


Comparison with SOTA Methods

3DGS performs comparably to, or outperforms, the best NeRF-based approaches while maintaining a high rendering speed during inference. Trained NeRF models are significantly smaller than 3DGS models, since NeRFs are parameterized by neural networks while 3DGS requires storing the parameters of millions of 3D Gaussians. Our method, CompGS, is a vector-quantized version of 3DGS that retains the speed and performance advantages of 3DGS while being an order of magnitude smaller. We report FPS and memory averaged over all datasets. CompGS is identical to 3DGS during inference and thus has the same FPS. ∗ Reproduced using the official code. † Reported from 3DGS.


Comparison of Compression Methods

We evaluate several baseline approaches for compressing 3DGS. All memory compression values are normalized by our smallest model (CompGS 4k, Int16), so that its compression value is 1. CompGS performs favorably compared to all methods, both in novel view synthesis performance and in compression. We find that K-means quantization of a pretrained model is not effective; it is crucial to perform the quantization during training of the Gaussian parameters. Bit-quantization approaches closely match the original method when the number of bits is high, but performance degrades greatly when it is reduced to just 4 bits per value. Not quantizing the position (Int-x no-pos) is crucial, especially at higher degrees of quantization. Since spherical harmonics constitute 76% of each Gaussian's parameters, 3DGS-No-SH achieves a high level of compression, but CompGS with only the harmonics quantized achieves similar compression with nearly no loss in performance compared to 3DGS.
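As a rough sanity check of the 76% figure, assuming the standard 3DGS parameterization with degree-3 spherical harmonics (the per-parameter counts below are our assumption, not taken from this page):

# Per-Gaussian parameter counts, assuming the standard 3DGS
# parameterization with degree-3 spherical harmonics (our assumption).
position, scale, rotation, opacity = 3, 3, 4, 1
sh_dc = 3       # degree-0 (base) color, one value per RGB channel
sh_rest = 45    # degrees 1-3: 15 coefficients x 3 channels
total = position + scale + rotation + opacity + sh_dc + sh_rest   # 59
print(sh_rest / total)   # ~0.76, matching the ~76% figure above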


Results on the Large-scale ARKit Dataset

We introduce ARKit, a large-scale benchmark for novel view synthesis with 200 scenes, an order of magnitude more than standard benchmarks. The benchmark is created from a subset of multi-view images in the ARKit indoor scene understanding dataset. All memory compression values are normalized by our smallest model (CompGS 4k, Int16). CompGS achieves a high level of compression with nearly identical view synthesis metrics. We additionally report PSNR-AM, the PSNR calculated from the arithmetic mean of the MSE over all scenes in the dataset, which prevents high-PSNR scenes from dominating the metric. Compressing such large-scale indoor scenes can be particularly helpful for VR applications.
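A minimal sketch of PSNR-AM as described above; the function name and the assumption that images are normalized to [0, 1] are ours.

import numpy as np

def psnr_am(mse_per_scene, max_val=1.0):
    # Average MSE across scenes first, then convert to decibels, so a few
    # low-MSE (high-PSNR) scenes cannot dominate the dataset-level score.
    # max_val=1.0 assumes images normalized to [0, 1] (our assumption).
    return 10.0 * np.log10(max_val**2 / np.mean(np.asarray(mse_per_scene)))

Because the log is applied after averaging, a single poorly reconstructed scene pulls PSNR-AM down much more than it would pull down a plain average of per-scene PSNRs.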


We visualize the results of CompGS along with the uncompressed 3DGS and its variant 3DGS-No-SH on the ARKit dataset. Large noisy blobs are a common error mode for 3DGS-No-SH on this dataset; it also fails to faithfully reproduce the colors and lighting in several scenes. The visual quality of the synthesized images is lower for all methods on this dataset than on scenes from standard benchmarks such as Mip-NeRF360, indicating its utility as a novel benchmark.


Conclusion

3D Gaussian Splatting efficiently models 3D radiance fields, outperforming NeRFs in training and rendering efficiency at the cost of increased storage. To reduce the storage demand, we apply K-means-based vector quantization, storing a compact codebook along with compressed code indices. Our method cuts the storage cost of 3D Gaussian Splatting by almost 20× while maintaining image quality across benchmarks.


BibTeX

@article{navaneet2023compact3d,
  title={Compact3D: Compressing Gaussian Splat Radiance Field Models with Vector Quantization},
  author={Navaneet, KL and Meibodi, Kossar Pourahmadi and Koohpayegani, Soroush Abbasi and Pirsiavash, Hamed},
  journal={arXiv preprint arXiv:2311.18159},
  year={2023}
}