research-article

Tanks and temples: benchmarking large-scale scene reconstruction

Published: 20 July 2017

Abstract

We present a benchmark for image-based 3D reconstruction. The benchmark sequences were acquired outside the lab, in realistic conditions. Ground-truth data was captured using an industrial laser scanner. The benchmark includes both outdoor scenes and indoor environments. High-resolution video sequences are provided as input, supporting the development of novel pipelines that take advantage of video input to increase reconstruction fidelity. We report the performance of many image-based 3D reconstruction pipelines on the new benchmark. The results point to exciting challenges and opportunities for future work.

Supplementary Material

MP4 File (papers-0083.mp4)



    Published In

    ACM Transactions on Graphics  Volume 36, Issue 4
    August 2017
    2155 pages
    ISSN:0730-0301
    EISSN:1557-7368
    DOI:10.1145/3072959
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. image-based reconstruction
    2. large-scale scene reconstruction
    3. multi-view stereo
    4. structure from motion
