ERCIM News No.48, January 2002
MASCOT: Adaptive and Morphological Wavelets for Scalable Video Coding
by Henk Heijmans
The European MASCOT project seeks to improve the quality and efficiency of video coding systems by exploiting metadata information, and to design a scalable video coding scheme by exploiting novel morphological and adaptive wavelet decomposition methods. The project is funded by the EC 5th Framework IST Programme and is coordinated by CWI.
The explosion of multimedia applications is leading to a great expansion of video transmission over heterogeneous channels such as the Internet, mobile nets and in-home digital networks. The requirements for bandwidth availability and quick and easy access to large multimedia databases are becoming increasingly stringent. However, the improvement in compression efficiency between MPEG-2 and MPEG-4 is not significant, and new techniques are needed to meet these requirements. The MASCOT project explores two such techniques, namely non-linear (morphological) and adaptive wavelet decompositions, and the utilisation of metadata.
In the future, audio-visual documents will as a rule be indexed and presented in a database together with metadata describing their content. Image and video sequence encoders may use this metadata information to improve their efficiency or to optimise their strategy. MASCOT seeks to validate this approach and to develop an efficient compression scheme exploiting such information, for example metadata representing the structure of a video program, governing shot boundary descriptions (cuts, dissolves, fades), and metadata for face recognition.
Compressed data representations with good scalability properties allow efficient transmission of video over heterogeneous networks with limited channel capacity. Here scalable means that the bit stream is organised such that only partial decoding is necessary, its degree depending on the conditions (bit rate, errors, and resources). Of course, the quality level depends on the percentage of the bit stream used by the decoder. Scalability can be spatial, temporal or with respect to quality (SNR).
A new tool for building a fully scalable video codec is the 3D wavelet (or 3D subband) decomposition, providing both spatial and temporal scalability. A single bit stream is encoded at very high bit rate, full frame rate and original display size. Scalability markers enable truncation of this bit stream, at the encoder or the decoder side, by jumping from one spatial/temporal resolution level to another. Bit budget management can be used at both sides to stop decoding at the targeted bit rate, thus enabling any desired combination of spatial resolution and frame rate. SNR scalability is achieved by embedded coding algorithms such as the 3D SPIHT algorithm. The 3D wavelet codec developed in MASCOT is based on an approach developed by MASCOT partner Philips Research France (see Figure).
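The idea behind SNR scalability can be illustrated with a toy example. The sketch below is not the 3D SPIHT algorithm. It assumes a much simpler embedded code in which non-negative integer coefficients are sent bit plane by bit plane, most significant plane first. The function names `bitplanes` and `reconstruct` are hypothetical and chosen for illustration only.

```python
import numpy as np

def bitplanes(coeffs, n_planes):
    """Split non-negative integer coefficients into bit planes,
    ordered from most significant to least significant."""
    return [(coeffs >> b) & 1 for b in range(n_planes - 1, -1, -1)]

def reconstruct(planes, n_planes):
    """Rebuild coefficients from the planes received so far.
    Planes that were truncated away are treated as zero."""
    out = np.zeros_like(planes[0])
    for i, plane in enumerate(planes):
        out |= plane << (n_planes - 1 - i)
    return out

coeffs = np.array([200, 13, 97, 255])
planes = bitplanes(coeffs, 8)

coarse = reconstruct(planes[:3], 8)   # decoder stops after 3 of 8 planes
full = reconstruct(planes, 8)         # decoder uses the whole stream
```

Stopping after any plane still gives a decodable approximation, and each further plane reduces the error. This is the property that lets a decoder halt at a targeted bit rate.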
3D wavelet analysis, consisting of consecutive temporal and spatial filtering, leads to a spatio-temporal multiresolution decomposition of the input group-of-frames (GOF), enabling smaller display sizes and/or lower frame rates. Temporal filtering is performed along the motion trajectory, requiring motion compensation for each pair of frames. Temporal Haar filters are used for GOFs consisting of 16 frames. (The example in the Figure starts with eight frames for reasons of convenience.) In the first step this gives rise to eight high-pass frames (shown in pink) and eight low-pass frames (shown in blue). In the second step, the group of low-pass frames is again temporally filtered, leading to four low-pass frames (LL) and four high-pass frames (LH), etc. Eventually, one ends up with 15 high-pass frames and one low-pass frame. These frames are decomposed spatially using 2D wavelets. Roughly speaking, low-pass frames contain the low-frequency part of the temporal signal in the GOF corresponding to an average, and the high-pass frames contain the high-frequency part corresponding to a difference or detail signal.
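The temporal part of this decomposition can be sketched in a few lines. The code below is a simplified illustration, not the MASCOT codec: it omits motion compensation, so each pixel is filtered along the time axis at a fixed position, and it assumes an unnormalised Haar pair (average and difference). The function names are hypothetical.

```python
import numpy as np

def temporal_haar(gof):
    """Dyadic temporal Haar analysis of a group of frames.

    gof: array of shape (n_frames, height, width), n_frames a power of two.
    Returns the single remaining low-pass frame and a list of high-pass
    frame groups, one group per decomposition level.
    """
    low = np.asarray(gof, dtype=float)
    highs = []
    while low.shape[0] > 1:
        a, b = low[0::2], low[1::2]   # consecutive frame pairs
        highs.append(b - a)           # high-pass: difference (detail)
        low = (a + b) / 2.0           # low-pass: average
    return low[0], highs

def inverse_temporal_haar(low, highs):
    """Undo temporal_haar, from the coarsest level back to the finest."""
    cur = low[np.newaxis]
    for h in reversed(highs):
        nxt = np.empty((2 * cur.shape[0],) + cur.shape[1:])
        nxt[0::2] = cur - h / 2.0
        nxt[1::2] = cur + h / 2.0
        cur = nxt
    return cur

gof = np.arange(16 * 2 * 2, dtype=float).reshape(16, 2, 2)
low, highs = temporal_haar(gof)
counts = [h.shape[0] for h in highs]   # high-pass frames per level
```

For a GOF of 16 frames the levels hold 8, 4, 2 and 1 high-pass frames, 15 in total, plus one low-pass frame equal to the average of the whole group. Dropping the finest levels before the inverse transform gives a sequence at a lower frame rate.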
The wavelet coefficients are encoded using an algorithm, called fully scalable zerotrees, which preserves the initial subband structure of the 3D wavelet transform. The hierarchy of temporal and spatial levels can be transposed to the motion vector coding. In the Figure the motion vectors are denoted by MVk, where k denotes the decomposition level.
In their original form, wavelet decompositions are linear. This may lead to various artefacts when coding image or video sequences containing sharp edges. One of the aims of this project is to propose new wavelets able to preserve significant structures inside scenes such as edges, textures, etc. A general and flexible framework for the wavelet construction is provided by the lifting scheme that enables one to modify existing wavelet decompositions, and to include nonlinearities and/or data-dependencies. Thus families of wavelets are developed based on mathematical morphology (morphological operations such as taking the maximum or the median are non-linear), as well as adaptive transforms based on the lifting scheme. The structure of such adaptive transforms may vary according to the nature of the input signal. Both linear and nonlinear filters may be used in the lifting steps and nonlinear criteria may be employed in order to select the best structure. This should lead to higher compression ratios for video sequences as well as to a superior subjective quality of their reconstruction. Recently we have succeeded in developing adaptive update lifting schemes that do not require any bookkeeping for perfect reconstruction. In these schemes, the choice of the update lifting filter is triggered by a binary threshold criterion based on a generalised gradient that can be chosen in such a way that it only smoothes homogeneous regions.
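A minimal sketch of such an adaptive update step is given below. It is an illustration under stated assumptions, not the filters used in the project: the signal is split into even samples `x` and odd samples `y`, the gradient is taken to be |2x(n) - y(n-1) - y(n)|, the update filter in smooth regions has weights (1/4, 1/2, 1/4), no update is applied elsewhere, and the boundary is extended periodically. All names are hypothetical. Only the update step is shown, and a prediction step would normally follow.

```python
import numpy as np

def adaptive_update(x, y, T):
    """Smooth x(n) with its neighbours y(n-1), y(n) only where the
    gradient is at most T; leave it untouched near edges."""
    s = np.roll(y, 1) + y               # y(n-1) + y(n), periodic boundary
    grad = np.abs(2.0 * x - s)          # assumed gradient measure
    smooth = grad <= T                  # binary threshold decision
    return np.where(smooth, 0.5 * x + 0.25 * s, x)

def inverse_adaptive_update(x_upd, y, T):
    """Recover x without side information. Smoothing halves the gradient,
    so the decision can be read off the updated signal with threshold T/2."""
    s = np.roll(y, 1) + y
    grad = np.abs(2.0 * x_upd - s)
    smooth = grad <= 0.5 * T
    return np.where(smooth, 2.0 * x_upd - 0.5 * s, x_upd)

signal = np.array([10, 12, 10, 10, 50, 50, 50, 52], dtype=float)
x, y = signal[0::2], signal[1::2]
x_upd = adaptive_update(x, y, T=8.0)
x_rec = inverse_adaptive_update(x_upd, y, T=8.0)
```

The decoder never receives the decision map. Where the encoder smoothed, the gradient computed from the updated signal is at most T/2. Where it did not, the gradient is unchanged and exceeds T. The two cases cannot be confused, which is why no bookkeeping is needed in this sketch.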
CWI's partners in the MASCOT project are: ENS des Telecommunications Paris, Heinrich Hertz Inst. Berlin, ENS des Mines Paris, Poznan University of Technology, UPC Barcelona, Vrije Universiteit Brussels and Philips Research France. The project comes under the Commission's FET (Future and Emerging Technologies) initiative that enables research of a bold nature involving high risks. MASCOT started last May and runs for two years. A public demonstration of the MASCOT codec is planned during a major international exhibition at the end of the project.