This page presents a technology for efficient coding of 3D video in the Multi-view plus Depth (MVD) representation.
New 3D video systems will include glasses-free displays and will provide a realistic impression of depth as well as a controllable stereoscopic baseline distance. In such systems, the description of a 3D scene must be richer than a single stereo pair. Some applications will need many views to be available at the receiver. For example, future autostereoscopic displays are expected to present as many as 50 different views simultaneously, corresponding to cameras with parallel optical axes equally spaced within an interval on the order of the human inter-ocular distance (about 64 millimeters). Such dense spacing yields strong similarity between neighboring views, which can be exploited for compression. Moreover, many virtual views can be synthesized efficiently at the receiver using Depth-Image-Based Rendering (DIBR) [3,6,19], so the transmitted MVD data can often be limited to only 2-3 views accompanied by the corresponding depth maps [1]. In a realistic example of a system with an autostereoscopic display, only 3 views with 3 depth maps are transmitted (Fig. 1).
Fig. 1. An example of a 3D video system where 3 views with 3 depth maps are transmitted and used for synthesis of many virtual views.
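The view spacing and the DIBR step described above can be illustrated with a minimal Python sketch. It is not the synthesis method used by the codec: it assumes rectified cameras with parallel optical axes, a focal length given in pixels, depth given in millimeters, and a virtual camera shifted to the right so that pixels move left by their disparity. All function names and parameters are hypothetical.

```python
def view_positions(num_views, span_mm):
    """Camera x-positions (mm) for num_views views equally spaced over span_mm."""
    if num_views < 2:
        return [0.0]
    step = span_mm / (num_views - 1)
    return [i * step for i in range(num_views)]


def disparity_px(focal_px, baseline_mm, depth_mm):
    """Horizontal disparity in pixels for parallel (rectified) cameras."""
    return focal_px * baseline_mm / depth_mm


def warp_row(colors, depths_mm, focal_px, baseline_mm):
    """Forward-warp one image row to a virtual camera shifted by baseline_mm.

    Assumed sign convention: the virtual camera moves right, pixels move left.
    None marks holes (disocclusions) that a real renderer would have to fill.
    """
    width = len(colors)
    out = [None] * width
    zbuf = [float("inf")] * width  # nearest depth seen so far per target pixel
    for x, (color, z) in enumerate(zip(colors, depths_mm)):
        xt = x - round(disparity_px(focal_px, baseline_mm, z))
        if 0 <= xt < width and z < zbuf[xt]:
            out[xt] = color  # closer samples overwrite farther ones
            zbuf[xt] = z
    return out
```

With 50 views spread over 64 mm, `view_positions(50, 64.0)` gives a spacing of 64/49 mm, roughly 1.3 mm between neighboring cameras, which is why neighboring views are so similar.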
The proposed technology is backward compatible with the HEVC standard, so one of the views, called the base view, can be decoded by a legacy HEVC decoder (video only). The remaining data (video and depth) can only be decoded by the 3D decoder, because additional syntax structures are used in the bitstream (Fig. 2). For both videos and depth maps, a hierarchical view coding structure similar to MVC is used: the already coded views are used as references for prediction of the subsequent views. Three main inter-view prediction mechanisms are used.
Fig. 2. Proposed codec structure for 3-view MVD.
The detailed technical description can be found in [1].
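The rule that already coded views serve as references for subsequent views can be pictured as a dependency graph. The sketch below is only an illustration: the component names and the specific reference choices are assumptions, not the codec's actual structure, and video and depth are kept independent here purely for simplicity. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical reference structure for 3-view MVD (names are illustrative).
# Each component lists the already coded components it may be predicted from.
REFERENCES = {
    "video_0": [],                      # base view: no inter-view references
    "video_1": ["video_0"],
    "video_2": ["video_0", "video_1"],
    "depth_0": [],
    "depth_1": ["depth_0"],
    "depth_2": ["depth_0", "depth_1"],
}


def coding_order(references):
    """Return an order in which every component follows all of its references."""
    return list(TopologicalSorter(references).static_order())


def is_valid_order(order, references):
    """Check that each component appears after every component it references."""
    position = {name: i for i, name in enumerate(order)}
    return all(position[ref] < position[name]
               for name, refs in references.items() for ref in refs)
```

In this layout only `video_0` has no references among the video components, which mirrors the idea that the base view alone can be handled by a legacy decoder.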
In October 2011, the codec was submitted in response to the Call for Proposals (CfP) [2] issued by the Moving Picture Experts Group (MPEG) on behalf of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). After the submission, all proposals were extensively tested both subjectively and objectively, and the proposal from Poznan University of Technology was rated as one of the best performing.
Even after the CfP was resolved, we have continued to develop and assess our codec in finer detail. The objective and subjective experiments that we have performed [3,4,5] confirmed the results obtained by MPEG and also provided an in-depth look into the performance of our technology.
The evaluation methodology can be found in [2,6].
Table 1. Input view positions and synthesized output view positions for the 2-view and 3-view configurations.
Sequence | 2-view input | 3-view input | Views to Synthesize from 2-view test scenario (and stereo pair) | Views to Synthesize from 3-view test scenario (and stereo pair) |
---|---|---|---|---|
Poznan_Hall2 | 7-6 | 7-6-5 | 6.5 (6.5-6) | 6.125-5.875 (6.125-5.875) |
Poznan_Street | 4-3 | 5-4-3 | 3.5 (3.5-3) | 4.125-3.875 (4.125-3.875) |
Undo_Dancer | 2-5 | 1-5-9 | 3 (3-5) | 4.5-5.5 (4.5-5.5) |
GT_Fly | 5-2 | 9-5-1 | 4 (4-2) | 5.5-4.5 (5.5-4.5) |
Kendo | 3-5 | 1-3-5 | 4 (4-5) | 2.75-3.25 (2.75-3.25) |
Balloons | 3-5 | 1-3-5 | 4 (4-5) | 2.75-3.25 (2.75-3.25) |
Lovebird1 | 6-8 | 4-6-8 | 7 (7-8) | 5.75-6.25 (5.75-6.25) |
Newspaper | 4-6 | 2-4-6 | 5 (5-6) | 3.75-4.25 (3.75-4.25) |
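The fractional positions in Table 1 are expressed in units of the camera index, so a position such as 6.125 lies between cameras 6 and 7. As a hedged illustration, the Python sketch below finds the two input views that bracket a virtual position and derives linear weights from the distances. The function name and the linear weighting are assumptions made for this example and do not describe the view synthesis software used in the tests.

```python
def neighbours_and_weights(position, input_views):
    """Find the two input views bracketing a virtual view position.

    Returns ((lower_view, weight), (upper_view, weight)); the weights sum to 1
    and the closer view gets the larger weight (linear in camera-index units).
    """
    views = sorted(input_views)
    if len(views) < 2:
        raise ValueError("at least two input views are required")
    if not views[0] <= position <= views[-1]:
        raise ValueError("position lies outside the range of input views")
    for lower, upper in zip(views, views[1:]):
        if lower <= position <= upper:
            w_upper = (position - lower) / (upper - lower)
            return (lower, 1.0 - w_upper), (upper, w_upper)
```

For the Poznan_Hall2 3-view case, `neighbours_and_weights(6.125, [7, 6, 5])` selects cameras 6 and 7 with weights 0.875 and 0.125.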
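Below are the results of coding with our technology in the 2-view and 3-view cases for all 8 sequences defined in the CfP. The results include AVI files with stereo pairs synthesized from the decoded video and depth maps, and the corresponding bitstreams.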
Fig. 3. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 1950 kbps.
Fig. 4. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 1180 kbps.
Fig. 5. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 710 kbps.
Fig. 6. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 410 kbps.
Table 2. AVI files with stereo pairs synthesized from decoded video and depth maps, bitstreams, and executable decoder for the 3-view case.
Sequence | avi | bitstream |
---|---|---|
Poznan Hall 2 | 770 kbps 480 kbps 310 kbps 210 kbps | 770 kbps 480 kbps 310 kbps 210 kbps |
Poznan Street | 1950 kbps 1180 kbps 710 kbps 410 kbps | 1950 kbps 1180 kbps 710 kbps 410 kbps |
Undo Dancer | 2010 kbps 1200 kbps 780 kbps 430 kbps | 2010 kbps 1200 kbps 780 kbps 430 kbps |
GT Fly | 1600 kbps 1080 kbps 600 kbps 340 kbps | 1600 kbps 1080 kbps 600 kbps 340 kbps |
Kendo | 1040 kbps 670 kbps 430 kbps 280 kbps | 1040 kbps 670 kbps 430 kbps 280 kbps |
Balloons | 1200 kbps 770 kbps 480 kbps 300 kbps | 1200 kbps 770 kbps 480 kbps 300 kbps |
Newspaper | 900 kbps 680 kbps 450 kbps 340 kbps | 900 kbps 680 kbps 450 kbps 340 kbps |
Lovebird 1 | 1270 kbps 730 kbps 420 kbps 260 kbps | 1270 kbps 730 kbps 420 kbps 260 kbps |
Table 3. AVI files with stereo pairs synthesized from decoded video and depth maps, bitstreams, and executable decoder for the 2-view case.
Sequence | avi | bitstream |
---|---|---|
Poznan Hall 2 | 520 kbps 320 kbps 210 kbps 140 kbps | 520 kbps 320 kbps 210 kbps 140 kbps |
Poznan Street | 1310 kbps 800 kbps 480 kbps 280 kbps | 1310 kbps 800 kbps 480 kbps 280 kbps |
Undo Dancer | 1000 kbps 710 kbps 430 kbps 290 kbps | 1000 kbps 710 kbps 430 kbps 290 kbps |
GT Fly | 1100 kbps 730 kbps 400 kbps 230 kbps | 1100 kbps 730 kbps 400 kbps 230 kbps |
Kendo | 690 kbps 480 kbps 360 kbps 230 kbps | 690 kbps 480 kbps 360 kbps 230 kbps |
Balloons | 800 kbps 520 kbps 350 kbps 250 kbps | 800 kbps 520 kbps 350 kbps 250 kbps |
Newspaper | 720 kbps 480 kbps 360 kbps 230 kbps | 720 kbps 480 kbps 360 kbps 230 kbps |
Lovebird 1 | 830 kbps 480 kbps 300 kbps 220 kbps | 830 kbps 480 kbps 300 kbps 220 kbps |
References