SPIEGLE: A Multimedia Search Engine Generator
by Arjen de Vries
When the efficiency of multimedia search system engineering is improved, multimedia search effectiveness will follow. The Semantic Multimedia Access project in the Netherlands has developed Spiegle, a parameterized search-engine generator. It generates a search system specialized for a particular set of circumstances, including the user, the type of data collection and the background knowledge available. The Netherlands Institute for Sound and Vision - the national audio-visual archive - and Van Dale publishers will test the system.
Multimedia searching is never a goal on its own, but in reality is embedded in user tasks. These vary in complexity, in the collections accessed, and in their contextual parameters. For example, a film maker may seek a suitable shot to include in a documentary, a teacher might look for an animation to illustrate a lecture, a group of friends may search for photos to accompany their stories told at a wedding party, while a DJ spinning his records in a popular night club might try to find that perfect blues lick to sample over his beats.
It is to be expected that a search strategy that works well in one scenario will not necessarily be the best choice for another. Ideally, each user task would be matched by a specialized retrieval strategy. In other words, a retrieval engine should be context-aware, or at least be adaptable to the context in which it will operate. For example, while the film maker could find relevant shots in a national archive, those looking for wedding photos would be better served on the Web, or in their friends' collective folders of digital photos. In addition, while the teacher may be satisfied with a familiar example, the best answers to the DJ's query would be unusual musical samples.
Researchers from CWI, the University of Amsterdam and the University of Twente are collaborating on MultimediaN's Semantic Multimedia Access project (also known as MN-N5) to create new technology that eases this adaptation of the search system to the user task. The project's main goal is to develop Spiegle, a parameterized search engine generator. Spiegle takes two inputs: firstly, a collection schema and secondly, a declarative specification of a retrieval strategy suitable for the user search task. It then generates a search system specialized for the particular context at hand, including the combination of this user, this collection, and the specific background knowledge available.
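The idea of a generator that takes a schema and a strategy and returns a specialized search system can be illustrated with a small Python sketch. This is not Spiegle's actual interface: the function names, the schema layout (a dictionary with a 'searchable_fields' key) and the toy collection are all assumptions made for illustration, and a plain scoring function stands in for the declarative strategy specification.

```python
def generate_search_engine(schema, strategy):
    """Return a search function specialized to one schema and one strategy.

    Hypothetical sketch: 'schema' names the fields to search, and
    'strategy' is any function mapping (query, text) to a numeric score.
    """
    fields = schema["searchable_fields"]

    def search(collection, query):
        scored = []
        for item in collection:
            # Concatenate the fields that the schema declares searchable.
            text = " ".join(str(item.get(f, "")) for f in fields)
            scored.append((strategy(query, text), item["id"]))
        # Highest score first; items that score zero are dropped.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item_id for s, item_id in scored if s > 0]

    return search


def term_overlap(query, text):
    """Toy strategy: count occurrences of the query terms in the text."""
    words = text.lower().split()
    return sum(words.count(t) for t in query.lower().split())


# Hypothetical collection and schema for a film maker looking for shots.
SHOTS = [
    {"id": "shot-1", "title": "Canal at dawn", "description": "boat on canal"},
    {"id": "shot-2", "title": "Market", "description": "crowd at market"},
]
SHOT_SCHEMA = {"searchable_fields": ["title", "description"]}

find_shots = generate_search_engine(SHOT_SCHEMA, term_overlap)
```

Calling find_shots(SHOTS, "canal") returns only the identifier of the first shot. Swapping in another schema or another strategy function yields a different specialized engine without touching the generator itself, which is the property the project aims for on a much larger scale.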
Spiegle combines the results of two existing research lines, both based upon probabilistic methods for information retrieval. The first building block is the TIJAH structured document retrieval system, developed in the Cirquid project. TIJAH retrieves document components from XML documents, matching on both content and structure. We are currently extending its text retrieval models with the probabilistic models developed for image and video retrieval, and will use the resulting system for multimedia search in both TRECVID and INEX 2005.
The second building block is the RAM database front-end for processing queries over arrays. RAM (Relational Array Mapping) was originally developed to express retrieval models involving multimedia content. A recent article by Roelleke and others has demonstrated how to express many well-known retrieval models in a general matrix framework. The corresponding matrix expressions are easily written in RAM, so it seems the ideal starting point for specifying search strategies declaratively.
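To give a feel for what expressing a retrieval model as a matrix operation means, the following minimal Python sketch scores documents by multiplying a document-term matrix, weighted by inverse document frequency, with a query vector. It is written in plain Python rather than RAM syntax, the tf-idf weighting is chosen only as a familiar example, and the terms, counts and function names are hypothetical.

```python
import math

# Toy document-term count matrix (rows: documents, columns: terms).
# All data and names here are hypothetical, for illustration only.
TERMS = ["blues", "lick", "wedding", "photo"]
DOC_TERM = [
    [2, 1, 0, 0],  # document 0
    [0, 0, 1, 2],  # document 1
    [1, 0, 0, 1],  # document 2
]


def idf_vector(doc_term):
    """Inverse document frequency per term: log(N / df), or 0 if df is 0."""
    n_docs = len(doc_term)
    idf = []
    for t in range(len(doc_term[0])):
        df = sum(1 for row in doc_term if row[t] > 0)
        idf.append(math.log(n_docs / df) if df else 0.0)
    return idf


def score(doc_term, query_vec):
    """Matrix-vector product of the idf-weighted matrix with the query."""
    idf = idf_vector(doc_term)
    return [
        sum(tf * w * q for tf, w, q in zip(row, idf, query_vec))
        for row in doc_term
    ]


def rank(doc_term, query_vec):
    """Document indices ordered by descending score."""
    scores = score(doc_term, query_vec)
    return sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)


# Query vector selecting the terms "blues" and "lick".
QUERY = [1, 1, 0, 0]
```

Here rank(DOC_TERM, QUERY) places document 0 first, document 2 second and document 1 last. Changing the retrieval model amounts to changing the weighting applied to the matrix, which suggests why an array-oriented language is a natural fit for specifying such models.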
Consequently, the Spiegle parameterized search engine can be realized by integrating TIJAH and RAM. TIJAH provides the techniques to adapt the search system to different collections and background knowledge, and RAM provides the declarative language for specifying the retrieval model. Since both are implemented as front-ends on the same database back-end (the open source database system MonetDB), this should be feasible without too many complications.
Search systems generated with Spiegle will be put to the test with a variety of search tasks in scientific evaluations such as TREC, TRECVID, INEX, and CLEF. Perhaps more interestingly though, the project also involves various end-user organisations, including the Netherlands Institute for Sound and Vision, and Van Dale publishers, who offer great case studies for further validation of our research results.
Sound and Vision is not only the business archive of the national broadcasting corporations, but also a cultural history institute and a unique media experience for its visitors. The institute intends to open its archive to program makers and researchers, as well as for educational purposes. Using our technology, it should be easier to support each of these user groups with search systems specialized to their search tasks. We also hope to reduce the burden of annotation by integrating search functionality into the annotation process.
Our work with Van Dale, a prominent publisher of dictionaries, demonstrates that project results are not limited to the 'multimedia search engine'. Both detection and tracking are important forms of searching, and we have applied our technology to track the development of language in terms of word usage and the shifting of meaning over time. We think the insights resulting from this project are also applicable in searching collections that span decades of text data.
In summary, the main innovation of this project is its goal of working toward a system architecture that accommodates different types of searching, using different sets of a priori knowledge and exhibiting a varying degree of heterogeneity (or homogeneity). This will considerably simplify the comparison of different retrieval model instantiations on a series of search problems.
Arjen de Vries, CWI, The Netherlands
Tel: +31 20 592 4306