research-article

Make Full Use of Priors: Cross-View Optimized Filter for Multi-View Depth Enhancement

Published: 17 December 2020

Abstract

Multi-view video plus depth (MVD) is a promising and widely adopted data representation for future 3D visual applications and interactive media. However, compression distortions in depth videos impede the development of such applications, so filters are urgently needed to enhance quality at the terminal side. Cross-view priors can intuitively be incorporated into filter design, but these priors are also distorted by compression, and thus their contribution has hardly been considered in previous research. In this article, we propose a cross-view optimized filter for depth map quality enhancement that makes full use of inner- and cross-view priors. We evaluate the contributions of distorted cross-view priors to filtering the current view of depth, so that both inner- and cross-view priors can be involved in the filter design. As a result, distortions of cross-view priors are no longer the barrier they were before. To this end, a mutual-information-guided cross-view consistency is designed to evaluate the contributions of cross-view priors under the compression distortions of MVD. Then, within a global optimization framework, both inner- and cross-view priors are modeled in a designed energy function that accounts for both data accuracy and spatial smoothness, and this function is minimized. The experimental results show that the proposed model outperforms state-of-the-art methods, with average gains of 3.289 dB in peak signal-to-noise ratio (PSNR) and 0.0407 in structural similarity (SSIM). In subjective evaluations, object details and structural information are recovered in the compressed depth video. We also verify our method in several practical applications, including virtual view synthesis for smooth interaction and point-cloud-based 3D modeling for accuracy evaluation.
In these verifications, ringing and malposition artifacts on object contours are properly handled for interactive video, and discontinuous object surfaces are restored for 3D modeling. All of these results suggest that compression distortions in MVD can be properly filtered by the proposed model, which provides a promising solution for future bandwidth-constrained 3D and interactive visual applications.
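The abstract names two ingredients: a mutual-information-guided cross-view consistency that rates how far a compression-distorted neighbor view can be trusted, and an energy with data-accuracy and spatial-smoothness terms that is minimized globally. The article's exact formulation is not given on this page, so the following Python sketch is only a toy, one-dimensional illustration under stated assumptions. It estimates mutual information from 8-bin histograms, uses a hypothetical reliability weight w = I(A;B)/H(A) (which lies in [0, 1] because I(A;B) ≤ H(A)), and swaps in a quadratic energy with an edge-truncated smoothness term solved by Gauss-Seidel sweeps in place of the article's global optimizer. The names `consistency_weight`, `filter_depth`, `lam_c`, `lam_s`, and `tau` are illustrative and do not come from the article; `psnr` is the standard 10·log10(peak²/MSE) metric.

```python
import math
from collections import Counter


def _quantize(values, bins, max_val):
    """Map depth values in [0, max_val) to histogram bin indices."""
    return [min(v * bins // max_val, bins - 1) for v in values]


def entropy(values, bins=8, max_val=256):
    """Shannon entropy (bits) of a quantized depth signal."""
    q = _quantize(values, bins, max_val)
    n = len(q)
    return -sum((c / n) * math.log2(c / n) for c in Counter(q).values())


def mutual_information(a, b, bins=8, max_val=256):
    """Histogram estimate of I(A;B) in bits for two equal-length depth signals."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    qa, qb = _quantize(a, bins, max_val), _quantize(b, bins, max_val)
    n = len(qa)
    pa, pb, pab = Counter(qa), Counter(qb), Counter(zip(qa, qb))
    return sum(
        (c / n) * math.log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def consistency_weight(current, warped):
    """Hypothetical reliability of the warped neighbor view: I(A;B) / H(A)."""
    h = entropy(current)
    return 0.0 if h == 0 else min(1.0, mutual_information(current, warped) / h)


def filter_depth(z, c, w, lam_c=1.0, lam_s=1.0, tau=50, iters=200):
    """Gauss-Seidel minimization of an assumed quadratic energy:

    E(d) = sum_i (d_i - z_i)^2                       data accuracy (current view)
         + lam_c * w * sum_i (d_i - c_i)^2           cross-view prior, scaled by w
         + lam_s * sum_{linked i} (d_i - d_{i+1})^2  smoothness, cut at depth edges
    """
    n = len(z)
    # Neighbors are linked only if the compressed view shows no edge between them.
    link = [abs(z[i] - z[i + 1]) < tau for i in range(n - 1)]
    d = [float(v) for v in z]
    a = lam_c * w
    for _ in range(iters):
        for i in range(n):
            # Zero of dE/dd_i: d_i * (1 + a + lam_s * k) = z_i + a*c_i + lam_s * sum(d_j)
            num, den = z[i] + a * c[i], 1.0 + a
            if i > 0 and link[i - 1]:
                num, den = num + lam_s * d[i - 1], den + lam_s
            if i < n - 1 and link[i]:
                num, den = num + lam_s * d[i + 1], den + lam_s
            d[i] = num / den
    return d


def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    gt = [50] * 8 + [200] * 8  # toy ground-truth depth scanline
    z = [v + (10 if i % 2 == 0 else -10) for i, v in enumerate(gt)]  # "compressed" view
    c = list(gt)  # warped neighbor view (toy assumption: already aligned)
    w = consistency_weight(z, c)
    d = filter_depth(z, c, w)
    print(f"w={w:.3f}  PSNR before={psnr(gt, z):.2f} dB  after={psnr(gt, d):.2f} dB")
```

Because smoothness links are cut wherever the compressed view jumps by at least `tau`, the step edge in the toy scanline is not blurred, while the cross-view term (scaled by `w`) pulls each pixel toward the warped neighbor. A practical version would presumably compute `w` locally rather than over the whole signal and would warp the neighbor view using camera parameters; both steps are omitted in this sketch.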


Cited By

  • (2024) MSEConv: A Unified Warping Framework for Video Frame Interpolation. ACM Transactions on Asian and Low-Resource Language Information Processing. DOI: 10.1145/3648364. Online publication date: 14-Feb-2024.
  • (2023) Research progress of six degree of freedom (6DoF) video technology. Journal of Image and Graphics 28:6 (1863-1890). DOI: 10.11834/jig.230025. Online publication date: 2023.
  • (2023) Local Bidirection Recurrent Network for Efficient Video Deblurring with the Fused Temporal Merge Module. ACM Transactions on Multimedia Computing, Communications, and Applications 19:5s (1-18). DOI: 10.1145/3587468. Online publication date: 7-Jun-2023.
  • (2021) An Algorithm for Motion Estimation Based on the Interframe Difference Detection Function Model. Complexity 2021. DOI: 10.1155/2021/6638792. Online publication date: 1-Jan-2021.


Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 4
November 2020
372 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3444749

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 December 2020
Accepted: 01 June 2020
Revised: 01 May 2020
Received: 01 October 2019
Published in TOMM Volume 16, Issue 4


Author Tags

  1. Multi-view video plus depth
  2. global optimization
  3. view consistency

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • National Key R&D Program


Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 1
Reflects downloads up to 24 Oct 2024.
