Video Transcoding Architectures for Multimedia Real Time Services
by Maurizio A. Bonuccelli, Francesca Lonetti and Francesca Martelli
The video transcoding project at PisaTel, a laboratory located at ISTI-CNR in a joint collaboration between ISTI, Ericsson Lab Italy, the Scuola S. Anna and Pisa University, aims at developing efficient solutions for real-time video coding and transcoding.
Digital video coding is a key technology for many applications, such as digital TV broadcasting, distance learning, video on demand, video telephony and video conferencing. In third generation telecommunication systems, communication technologies are extremely heterogeneous. Adapting the media content to the characteristics of different networks (communication links and access terminals) in order to obtain video delivery with acceptable service quality is thus an important issue. Video transcoding converts one compressed video bitstream into another with a different format, size (spatial transcoding), bit rate (quality transcoding), or frame rate (temporal transcoding). The goal of transcoding is to enable interoperability between heterogeneous multimedia networks, while reducing complexity and run time by avoiding the complete decoding and re-encoding of the video stream.
We are interested in temporal video transcoding. This process skips some frames in order to change the frame rate of a video sequence without decreasing the video quality of the non-skipped frames. In third generation mobile telecommunication systems (UMTS), the bandwidth of a coded video stream must be drastically reduced in order to cope with the constrained transmission channel. Frame skipping is a promising approach for transcoding one video sequence into another with a lower bit rate, while maintaining good video quality. Many multimedia services (such as video conferencing and video telephony) have real-time requirements, so transcoding must guarantee a fixed communication delay. We have concentrated on this aspect, and have developed and evaluated two temporal transcoding architectures.
Temporal Transcoding Architectures
In a video sequence, many frames are coded with reference to previous frames, using motion vectors and prediction errors. In temporal transcoding, when a frame is skipped, the references of the next frame are no longer valid. Motion Vector Composition (MVC) is a procedure that computes new motion vectors for the non-skipped frames. Once the new motion vectors have been computed, new prediction errors are also needed for the transcoded frames. Another important issue in temporal transcoding is the choice of frames to be skipped. A first frame rate control architecture, Dynamic Frame Skipping (DFS), dynamically adjusts the number of skipped frames according to motion activity, a measure of the amount of motion in a frame: frames with a lot of motion are not skipped. Another temporal transcoding architecture, Frame Skipping Control (FSC), also takes into account the re-encoding errors introduced when the prediction errors are recomputed: frames are skipped on the basis of both the effect of re-encoding errors and motion activity. The goal of this strategy is to minimize the re-encoding errors and to preserve the motion smoothness of the transcoded frames.
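As an illustration of motion vector composition and of a motion activity measure, the following Python sketch may help. It is not the transcoder described in this article. It assumes that each frame carries one motion vector per macroblock, that composition follows the commonly described telescopic scheme (adding the vector of the co-located macroblock in the skipped frame), and that motion activity is the sum of the absolute vector components. The names `compose_telescopic`, `motion_activity` and `may_skip`, and the threshold, are hypothetical.

```python
from typing import List, Tuple

Vector = Tuple[int, int]
Field = List[List[Vector]]  # one motion vector per macroblock, indexed [row][column]


def compose_telescopic(current: Field, skipped: Field) -> Field:
    """Re-reference the vectors of `current` past a skipped frame.

    Assumption: each vector of the current frame is extended by the vector
    of the co-located macroblock in the skipped frame.
    """
    return [
        [(cx + sx, cy + sy) for (cx, cy), (sx, sy) in zip(cur_row, skip_row)]
        for cur_row, skip_row in zip(current, skipped)
    ]


def motion_activity(field: Field) -> int:
    """Assumed measure: sum of absolute vector components over all macroblocks."""
    return sum(abs(x) + abs(y) for row in field for (x, y) in row)


def may_skip(field: Field, threshold: int) -> bool:
    """A frame is a skipping candidate only if its motion activity is low."""
    return motion_activity(field) <= threshold


# Frame n-1 is skipped, so the vectors of frame n must point to frame n-2.
skipped_frame = [[(3, 1), (0, 0)]]
current_frame = [[(1, 0), (2, -1)]]
composed = compose_telescopic(current_frame, skipped_frame)
```

In this toy example the first macroblock ends up with the vector (4, 1), while the second keeps (2, -1) because its co-located macroblock in the skipped frame did not move. Selection-based schemes such as FDVS choose the vector to add differently, but follow the same overall pattern.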
The real-time requirements of many advanced multimedia applications are not taken into account by the above architectures. In order to meet the needs of such applications, we have modified both architectures so that the output bit rate is constant and the maximum communication delay is fixed. We achieved this by introducing a transcoder output buffer, and by skipping frames according to the buffer occupancy. The maximum communication delay depends on the buffer size.
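The buffer-based policy can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the rule used in our implementation: the buffer is drained at a constant rate once per frame interval, a frame is skipped whenever it would overflow the buffer, and the size of each frame in bits is known in advance. The names `select_frames` and `max_delay_seconds` are hypothetical.

```python
from typing import List


def select_frames(frame_bits: List[int], buffer_size: int,
                  bit_rate: float, frame_rate: float) -> List[int]:
    """Return the indices of the frames that are transmitted.

    Assumption: a frame is skipped if adding it would exceed the buffer size.
    """
    drain_per_frame = bit_rate / frame_rate  # bits leaving the buffer per frame interval
    occupancy = 0.0
    kept = []
    for index, bits in enumerate(frame_bits):
        if occupancy + bits <= buffer_size:
            occupancy += bits
            kept.append(index)
        # The channel drains the buffer at a constant rate, whether or not
        # the frame was kept.
        occupancy = max(0.0, occupancy - drain_per_frame)
    return kept


def max_delay_seconds(buffer_size: int, bit_rate: float) -> float:
    """Worst-case time needed to drain a full buffer at the constant output rate."""
    return buffer_size / bit_rate


kept = select_frames([4000, 4000, 4000, 4000], buffer_size=6000,
                     bit_rate=20000, frame_rate=10)
delay = max_delay_seconds(6000, 20000)
```

With these illustrative numbers the third frame is skipped because the buffer is still too full, and a full buffer takes 0.3 seconds to drain, which bounds the delay introduced by the buffer. A real transcoder would also have to account for the fact that skipping a frame changes the size of the next coded frame.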
We implemented an MPEG-4-based temporal transcoder and evaluated the performance of both our architectures over several benchmark videos. The results, in terms of peak signal-to-noise ratio (PSNR), a measure of the quality of the transcoded sequence, are compared with those of a quality transcoder (QT). The comparison shows that better performance is achieved by quality transcoding for videos with a lot of motion and by temporal transcoding (DFS and FSC) for videos with little motion (see the figure). Moreover, we observed that the DFS architecture performs better than FSC, since in FSC many frames are skipped because of re-encoding errors. We obtained similar results using different MVC algorithms: Bilinear Interpolation (BI), Telescopic Vector Composition (TVC), Forward Dominant Vector Selection (FDVS) and Activity Dominant Vector Selection (ADVS).
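For readers unfamiliar with PSNR, the standard definition for 8-bit samples is 10 log10(255^2 / MSE), where MSE is the mean squared error between the original and the transcoded frame. The Python sketch below illustrates this definition only. It assumes frames are given as flat lists of luminance samples, and it does not reproduce our evaluation setup. The function name `psnr` is hypothetical.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], transcoded: Sequence[int], peak: int = 255) -> float:
    """PSNR in dB between two frames given as equal-length sample sequences."""
    if len(original) != len(transcoded) or len(original) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(original, transcoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


# Every sample differs by exactly one grey level, so the MSE is 1.
value = psnr([100, 100, 100, 100], [101, 99, 101, 99])
```

Higher values indicate a transcoded frame closer to the original. In this example the mean squared error is 1, which gives a PSNR of about 48.13 dB.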
Francesca Martelli, ISTI-CNR, Italy
Tel: +39 050 315 3468