Characteristic Structures from Videos for Indexing

by Tamás Szirányi

Finding the main actor in a video, detecting the exact outline of objects in surveillance footage, or registering cameras in arbitrary configurations are important tasks in the analysis and indexing of video events. These tasks must be completed without any human intervention, in difficult outdoor and indoor situations. At SZTAKI, a project is devoted to developing such new features: segmenting the focused target, outlining foreground objects, handling shadows and mirroring surfaces, and registering cameras.

An automatic focus-map extraction method has been developed (Tamás Szirányi, Levente Kovács) using a modification of blind deconvolution for the estimation of localized blurring functions. We use these local blurring functions (so-called point spread functions, or PSFs) to extract the focused areas of ordinary images. In this inverse task our goal is not image reconstruction but the estimation of the localized PSFs and of the relative focus map; the method is therefore less sensitive to noise and to the ill-posedness of the deconvolution problem. The focused areas can be estimated without any knowledge of the shooting conditions or of the optical system used. The technique is suitable for main-object selection and extraction, for tracking in video and surveillance applications, and for indexing image databases.
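As a minimal sketch of the idea (not the authors' actual PSF-estimation algorithm), the following Python fragment builds a relative focus map by measuring local high-frequency energy per block: a wide local blur (large PSF) suppresses high frequencies, so in-focus blocks score higher. All function and parameter names here are illustrative assumptions.

```python
import numpy as np

def focus_map(image, block=8):
    """Relative focus map from local high-frequency energy.

    Illustrative stand-in for localized PSF estimation: in-focus
    regions retain more Laplacian energy than defocused ones.
    """
    # Discrete Laplacian as a crude high-pass filter (wrap-around borders).
    lap = (-4.0 * image
           + np.roll(image, 1, axis=0) + np.roll(image, -1, axis=0)
           + np.roll(image, 1, axis=1) + np.roll(image, -1, axis=1))
    h, w = image.shape
    hb, wb = h // block, w // block
    # Mean squared high-frequency response per block.
    energy = (lap[:hb * block, :wb * block] ** 2) \
        .reshape(hb, block, wb, block).mean(axis=(1, 3))
    return energy / (energy.max() + 1e-12)  # relative focus in [0, 1]
```

A sharp, textured region then receives values near 1, while smooth or defocused regions stay near 0, giving the kind of relative focus map shown in Figure 1.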

Figure 1: Examples of focus extraction on images with various textures (top row: input images; bottom row: the respective focus maps).

A new model for foreground and shadow detection in video sequences has been developed (Tamás Szirányi and Csaba Benedek, PhD student at the Pázmány Péter Catholic University, Budapest). The model works without detailed a priori object-shape information and is also appropriate for video sources with low or unstable frame rates. We have introduced three novel features in comparison to previous approaches. First, we use a more accurate, adaptive shadow model, and show improvements in scenes with difficult lighting, colouring effects and motley backgrounds. Second, we give a novel description of the foreground based on the spatial statistics of neighbouring pixel values, which enhances the detection of background-like or shadowed object parts. Third, we integrate pixel intensities with different colour and texture features in a general probabilistic framework, and compare the performance of different feature selections. Finally, a Markov Random Field model is used to enhance the accuracy of the separation. We validated our method on outdoor and indoor video sequences captured by the surveillance system at the university campus, and also tested it on well-known benchmark video shots.
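The pipeline above can be sketched in a heavily simplified form: classify each pixel as background, shadow, or foreground against a per-pixel background model, then regularize the label image. The thresholds, the shadow ratio test, and the neighbour-voting smoothing below are illustrative assumptions standing in for the paper's learned probabilistic densities and MRF optimization.

```python
import numpy as np

BG, SHADOW, FG = 0, 1, 2

def label_pixels(frame, bg_mean, bg_std, shadow_ratio=(0.4, 0.9), k=2.5):
    """Initial labels: BG if within k sigma of the background model,
    SHADOW if the intensity is a darkened version of the background
    (ratio test), FG otherwise. (Illustrative thresholds only.)"""
    ratio = frame / np.maximum(bg_mean, 1e-6)
    labels = np.full(frame.shape, FG, dtype=int)
    labels[(ratio >= shadow_ratio[0]) & (ratio <= shadow_ratio[1])] = SHADOW
    labels[np.abs(frame - bg_mean) <= k * bg_std] = BG
    return labels

def mrf_smooth(labels, iters=2):
    """Crude stand-in for MRF regularization: each pixel repeatedly
    takes the majority label among itself and its 4-neighbours."""
    lab = labels.copy()
    for _ in range(iters):
        counts = np.zeros((int(lab.max()) + 1,) + lab.shape)
        for l in range(counts.shape[0]):
            same = (lab == l).astype(float)
            counts[l] = (same
                         + np.roll(same, 1, axis=0) + np.roll(same, -1, axis=0)
                         + np.roll(same, 1, axis=1) + np.roll(same, -1, axis=1))
        lab = counts.argmax(axis=0)
    return lab
```

The smoothing step plays the role of the spatial prior: isolated mislabelled pixels are absorbed by their neighbourhood, while coherent object and shadow regions survive.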

Figure 2: Sequences at the entrance of Pázmány Péter Catholic University at different times of day, with segmentation results. Top left: morning ('am'); top right: noon; bottom left: afternoon ('pm'); bottom right: wet weather.

Figure 3: Top: images from the "main hall" and "entrance" cameras (Pázmány Péter Catholic University), with control lines on the ground (marked with two long paper tapes) for verification.
Bottom left: a schematic map of the experiment showing the placement of the cameras and their fields of view. Bottom right: the result of aligning the non-overlapping views, with the control lines highlighted.

We have developed several methods for registering cameras from arbitrary motions or from biometrics (Tamás Szirányi, László Havasi and Zoltán Szlávik). We demonstrate here an application of our new robust walk-detection algorithm, based on our symmetry approach, which can be used to extract biometric characteristics from video image sequences. To obtain a useful descriptor of a walking person, we temporally track the symmetries of the person's legs. In a further processing stage, these patterns are filtered, re-sampled and transformed into a much lower-dimensional subspace, an 'eigenwalk space'. Our method is suitable for both indoor and outdoor surveillance scenes. We also present image-registration methods applicable to multi-camera systems viewing human subjects in motion. Determining the leading leg of the walking subject is important, and the presented method can identify it from two successive walk steps (one walk cycle). Using this approach, we can detect sufficient numbers of corresponding points to estimate the correspondence between two camera views, both in overlapping configurations and in a special case of non-overlapping camera configurations.
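The 'eigenwalk space' step can be illustrated with plain PCA: each re-sampled leg-symmetry pattern is flattened into a vector, and the principal directions of a training set span the low-dimensional subspace into which new walk patterns are projected. Plain PCA and all names below are assumptions for illustration, not the authors' exact construction.

```python
import numpy as np

def eigenwalk_subspace(patterns, dim=3):
    """Low-dimensional 'eigenwalk' basis from flattened symmetry
    patterns (one pattern per row), via PCA on the centred data."""
    X = np.asarray(patterns, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centred data: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim]

def project(pattern, mean, basis):
    """Coordinates of one walk pattern in the eigenwalk subspace."""
    return basis @ (np.asarray(pattern, dtype=float) - mean)
```

The low-dimensional coordinates serve as a compact descriptor of the walk, on which classification or matching between camera views can operate.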

Our project is partly supported by the Hungarian National Research and Development Program. With this activity, we contribute to the Network of Excellence project run by ERCIM: MUSCLE (Multimedia Understanding through Semantics, Computation and Learning).

Please contact:
Tamás Szirányi, SZTAKI, Hungary
Tel: +36 1 279 6106