Poznan University of Technology
3D Codec

This page presents a technology for efficient coding of 3D video in Multi-view plus Depth (MVD) representation.

New 3D video systems will include glasses-free displays and will provide a realistic impression of depth as well as a controllable stereoscopic baseline distance. In such systems the description of a 3D scene has to be richer than a single stereo pair, and some applications will need many views to be available at the receiver. For example, future autostereoscopic displays are expected to present as many as 50 different views simultaneously, corresponding to cameras with parallel optical axes equally spaced within an interval on the order of the human inter-ocular distance (about 64 millimeters). Such dense spacing yields strong similarity between neighboring views that can be exploited for compression. Moreover, many virtual views can be synthesized efficiently at the receiver using Depth-Image-Based Rendering (DIBR) [3,6,19], so for transmission the MVD format can often be limited to only 2-3 views accompanied by the corresponding depth maps [1]. In a realistic example of a system with an autostereoscopic display, only 3 views with 3 depth maps are transmitted (Fig. 1).

Fig. 1. An example of a 3D video system where 3 views with 3 depth maps are transmitted and used for synthesis of many virtual views.
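
To give a feeling for how close such densely spaced views are, the following minimal sketch computes the camera spacing for 50 views and the resulting disparity between two neighboring views. It assumes that the views span exactly 64 mm and uses the standard relation for rectified parallel cameras, disparity = focal length x baseline / depth. The function names, the focal length of 1000 pixels and the scene depth of 2 m are hypothetical values chosen only for illustration.

```python
def view_positions_mm(num_views, span_mm):
    """Positions of equally spaced parallel cameras, centred on zero (mm)."""
    step = span_mm / (num_views - 1)
    return [i * step - span_mm / 2 for i in range(num_views)]


def disparity_px(focal_px, baseline_mm, depth_mm):
    """Horizontal disparity in pixels for rectified parallel cameras."""
    return focal_px * baseline_mm / depth_mm


# Assumption: 50 views spread over exactly 64 mm.
positions = view_positions_mm(50, 64.0)
step_mm = positions[1] - positions[0]

# Assumption: hypothetical focal length (1000 px) and object depth (2 m).
shift = disparity_px(focal_px=1000.0, baseline_mm=step_mm, depth_mm=2000.0)
print(f"camera spacing: {step_mm:.2f} mm, neighbor disparity: {shift:.2f} px")
```

Under these assumptions the cameras are about 1.3 mm apart and an object 2 m away moves by less than one pixel between neighboring views, which illustrates why neighboring views are so similar.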

The proposed technology is backward compatible with the HEVC standard, so one of the views, called the base view, can be decoded by a legacy HEVC decoder (video only). The remaining data (video and depth) can only be decoded by the 3D decoder, because additional syntax structures are used in the bitstream (Fig. 2). For both video and depth maps, a hierarchical view coding structure similar to MVC is used: the already coded views serve as references for prediction of the subsequent views. Three main inter-view prediction mechanisms are used: view-synthesis prediction, disparity compensation, and DBMP.

The main idea of the proposed coding technology is to exploit view-synthesis prediction as much as possible. The base view (the HEVC-compatible view) and its depth map are coded directly, i.e. without any inter-view prediction. The side views (video and depth) are synthesized from the base view. Then, in the side views, the disoccluded regions (those hidden by occlusion in the base view) are identified, and only these regions are coded. Coding of the side views also takes advantage of the other inter-view prediction modes: disparity compensation and DBMP. The camera parameters are compressed and transmitted together with the video and depth maps in a single bitstream.

Fig. 2. Proposed codec structure for 3-view MVD.
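
The synthesis and disocclusion steps described above can be illustrated with a minimal sketch. It is not the codec's actual algorithm (see [1] for that). It is a generic forward-warping example that assumes rectified parallel cameras, metric depth values, integer-pixel shifts and a side camera placed so that content moves to the left. The function name `synthesize_side_view` and all numeric values are hypothetical.

```python
def synthesize_side_view(base, depth, focal_px, baseline):
    """Forward-warp a base view into a side view and flag disoccluded pixels.

    base  -- rows of sample values of the base view
    depth -- rows of positive depth values, one per base-view sample
    Returns (synthesized view, mask that is True where nothing was projected).
    """
    height, width = len(base), len(base[0])
    synth = [[0] * width for _ in range(height)]
    nearest = [[float("inf")] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            z = depth[y][x]
            # Disparity for parallel cameras, rounded to whole pixels.
            shift = round(focal_px * baseline / z)
            xt = x - shift  # assumed sign convention: content moves left
            # Keep the sample closest to the camera (simple z-buffer).
            if 0 <= xt < width and z < nearest[y][xt]:
                synth[y][xt] = base[y][x]
                nearest[y][xt] = z
    # Pixels that received no sample were hidden in the base view.
    holes = [[n == float("inf") for n in row] for row in nearest]
    return synth, holes


# One image row: a near object (depth 10) in front of a far background.
base = [[0, 10, 20, 30, 40, 50, 60, 70]]
depth = [[100.0, 100.0, 100.0, 100.0, 10.0, 10.0, 100.0, 100.0]]
synth, holes = synthesize_side_view(base, depth, focal_px=20.0, baseline=1.0)
print(synth[0])
print(holes[0])
```

In this toy row the near object moves two pixels to the left and uncovers positions 4 and 5, which the base view cannot fill. In a scheme like the one described above, only such flagged regions of a side view would need to be coded.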

The detailed technical description can be found in [1].

In October 2011, the codec was submitted as a proposal in response to the Call for Proposals [2] issued by the Moving Picture Experts Group (MPEG) on behalf of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). After the submission, all proposals were extensively tested both subjectively and objectively, and the proposal from Poznan University of Technology was rated among the best performing.

Even after the CfP was resolved, we have continued to develop and assess our codec in finer detail. The objective and subjective experiments that we performed [3,4,5] confirmed the results obtained by MPEG and also provided an in-depth look into the performance of our technology.

The evaluation methodology can be found in [2,6].

Table 1. Input view positions and synthesized output view positions for the 2-view and 3-view configurations.

Sequence | 2-view input | 3-view input | Views to synthesize from 2-view test scenario (and stereo pair) | Views to synthesize from 3-view test scenario (and stereo pair)
Poznan_Hall2 | 7-6 | 7-6-5 | 6.5 (6.5-6) | 6.125-5.875 (6.125-5.875)
Poznan_Street | 4-3 | 5-4-3 | 3.5 (3.5-3) | 4.125-3.875 (4.125-3.875)
Undo_Dancer | 2-5 | 1-5-9 | 3 (3-5) | 4.5-5.5 (4.5-5.5)
GT_Fly | 5-2 | 9-5-1 | 4 (4-2) | 5.5-4.5 (5.5-4.5)
Kendo | 3-5 | 1-3-5 | 4 (4-5) | 2.75-3.25 (2.75-3.25)
Balloons | 3-5 | 1-3-5 | 4 (4-5) | 2.75-3.25 (2.75-3.25)
Lovebird1 | 6-8 | 4-6-7 | 7 (7-8) | 5.75-6.25 (5.75-6.25)
Newspaper | 4-6 | 2-4-6 | 5 (5-6) | 3.75-4.25 (3.75-4.25)

Below you can find the results of coding with our technology for the 2-view and 3-view cases for all 8 sequences defined in the CfP. The results include AVI files with the synthesized stereo pair, the bitstreams and an executable decoder.

The provided stereoscopic pair of views was synthesized from the decoded video and depth maps of the provided bitstreams. The exact positions of the input and output views can be found in Table 1. The views provided in the AVI files were evaluated during the formal subjective evaluation of the proposals. A downloaded AVI file can be viewed with dedicated 3D software such as Stereoscopic Player. Along with the bitstreams and the executable decoder, a batch file is provided that can be used to decode the bitstreams. The decoder executable is prepared for a 64-bit Windows environment (executables for other platforms can be obtained upon request). The decoder outputs the reconstructed video and depth maps, along with the camera parameters that can be used for view synthesis.

Fig. 3. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 1950 kbps.

Fig. 4. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 1180 kbps.

Fig. 5. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 710 kbps.

Fig. 6. Synthesized stereo pair of the Poznan Street sequence in side-by-side format, created from 3 videos and 3 depth maps encoded at 410 kbps.

Table 2. AVI files with the stereo pair synthesized from decoded video and depth maps, bitstreams and executable decoder for the 3-view case.

Sequence | AVI | Bitstream
Poznan Hall 2 | 770 kbps, 480 kbps, 310 kbps, 210 kbps | 770 kbps, 480 kbps, 310 kbps, 210 kbps
Poznan Street | 1950 kbps, 1180 kbps, 710 kbps, 410 kbps | 1950 kbps, 1180 kbps, 710 kbps, 410 kbps
Undo Dancer | 2010 kbps, 1200 kbps, 780 kbps, 430 kbps | 2010 kbps, 1200 kbps, 780 kbps, 430 kbps
GT Fly | 1600 kbps, 1080 kbps, 600 kbps, 340 kbps | 1600 kbps, 100 kbps, 600 kbps, 340 kbps
Kendo | 1040 kbps, 670 kbps, 430 kbps, 280 kbps | 1040 kbps, 670 kbps, 430 kbps, 280 kbps
Balloons | 1200 kbps, 770 kbps, 480 kbps, 300 kbps | 1200 kbps, 770 kbps, 480 kbps, 300 kbps
Newspaper | 900 kbps, 680 kbps, 450 kbps, 340 kbps | 900 kbps, 680 kbps, 450 kbps, 340 kbps
Lovebird 1 | 1270 kbps, 730 kbps, 420 kbps, 260 kbps | 1270 kbps, 730 kbps, 420 kbps, 260 kbps

Table 3. AVI files with the stereo pair synthesized from decoded video and depth maps, bitstreams and executable decoder for the 2-view case.

Sequence | AVI | Bitstream
Poznan Hall 2 | 520 kbps, 320 kbps, 210 kbps, 140 kbps | 520 kbps, 320 kbps, 210 kbps, 140 kbps
Poznan Street | 1310 kbps, 800 kbps, 480 kbps, 280 kbps | 1310 kbps, 800 kbps, 480 kbps, 280 kbps
Undo Dancer | 1000 kbps, 710 kbps, 430 kbps, 290 kbps | 1000 kbps, 710 kbps, 430 kbps, 290 kbps
GT Fly | 1100 kbps, 730 kbps, 400 kbps, 230 kbps | 1100 kbps, 730 kbps, 400 kbps, 230 kbps
Kendo | 690 kbps, 480 kbps, 360 kbps, 230 kbps | 690 kbps, 480 kbps, 360 kbps, 230 kbps
Balloons | 800 kbps, 520 kbps, 350 kbps, 250 kbps | 800 kbps, 520 kbps, 350 kbps, 250 kbps
Newspaper | 720 kbps, 480 kbps, 360 kbps, 230 kbps | 720 kbps, 480 kbps, 360 kbps, 230 kbps
Lovebird 1 | 830 kbps, 480 kbps, 300 kbps, 220 kbps | 830 kbps, 480 kbps, 300 kbps, 220 kbps

References