ERCIM News No.27 - October 1996
Caching and Mirroring Techniques in WWW and Digital Library Architectures
by László Kovács
The World Wide Web is a networked hypermedia architecture connecting
millions of documents via hypertext links. The documents are stored on server
machines, and client software running on practically any networked computer
is used to retrieve them over the Internet. In Internet jargon, mirroring
means creating a remote copy of some data or of complete hypermedia
documents. The technique is used for information that is very popular or
served over low-speed connections, and it can help reduce network
traffic on the Internet backbone. Various mirroring techniques work
well for other Internet services, such as FTP or USENET News, and
are especially significant for the World Wide Web, which generates
more traffic than any other Internet service.
Although a few public-domain scripts for WWW mirroring exist, the
topic is still immature relative to the evolving needs of
the Internet community. At SZTAKI, different mirroring and caching techniques
are now being developed in the context of WWW services as well as in Digital
Library architectures.
A mature algorithm for mirroring and a standardized portable hypermedia
(PHM) format can ease the distribution of hypermedia documents through the
World Wide Web. Recently, a two-phase mirroring algorithm has been developed.
The algorithm can create a remote copy of a complex HTML document stored
in another WWW server. The algorithm provides the mirrored document in a
PHM format defined in a paper presented at the 7th Joint European Networking
Conference. Hypermedia documents in PHM format can be transferred without
any need for further semantic transformation. A software environment based
on this algorithm has been built for mirroring hypermedia documents. It provides
various high-level, intelligent, automatic mirroring services via ordinary
WWW interfaces (a set of forms). Used properly, this environment can decrease
the network load during peak periods and increase the accessibility
of the selected hypermedia documents.
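
As a rough illustration only: the article does not spell out the two phases, so the sketch below assumes that the first phase collects the resources an HTML page refers to and the second rewrites those references to point at local copies. The names `LinkCollector`, `local_name` and `mirror_page`, the flat file-naming scheme and the example.org URLs are all hypothetical. This is neither the PHM format nor the SZTAKI algorithm, and it does not fetch anything from the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Assumed phase 1: record every href/src value and its absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = {}  # attribute value as written -> absolute URL

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.found[value] = urljoin(self.base_url, value)


def local_name(url):
    """Hypothetical naming scheme: host and path flattened into one file name."""
    parts = urlparse(url)
    path = parts.path.strip("/") or "index.html"
    return (parts.netloc + "/" + path).replace("/", "_")


def mirror_page(html, base_url):
    """Return the collected references and the page with rewritten references."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()

    # Assumed phase 2: naive textual rewrite; only double-quoted
    # attribute values are handled in this sketch.
    rewritten = html
    for raw, absolute in collector.found.items():
        rewritten = rewritten.replace('"%s"' % raw, '"%s"' % local_name(absolute))
    return collector.found, rewritten
```

A real mirroring tool would also have to download each collected resource, recurse into linked pages, and decide where the mirrored document ends; none of that is shown here.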
The mirroring technique developed at SZTAKI could be a first step towards
introducing a separate protocol and/or a protocol extension for
mirroring purposes. The application of new mirroring techniques also affects
the searching methods to be applied.
Uniform searching techniques
In heterogeneous distributed systems, searches are carried out through heterogeneous
search engines with different schemas, different transaction models, and
different search protocols. Hiding the details of such complex searches
is an open issue. Transparency in distributed systems (such as location
and access transparency) is considered key to the usability
of the system. Users would like to be completely unaware of the internal
details of system mechanisms, although this requirement sometimes
conflicts with the performance of the system. SZTAKI plans to develop
a new uniform searching engine that makes homogeneous search possible even
in the presence of mirrored WWW documents.
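
One way to picture a uniform search over heterogeneous engines is a thin layer that queries each engine through a common interface and reports a mirrored document only once. The sketch below is an assumption-laden illustration, not the planned SZTAKI engine: `uniform_search`, the `mirror_of` table (mirror URL to origin URL) and the idea that every engine can be wrapped as a function returning URLs are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# In this sketch an "engine" is any callable that takes a query string
# and returns document URLs; real engines would need an adapter each.
Engine = Callable[[str], Iterable[str]]


def uniform_search(query: str, engines: List[Engine],
                   mirror_of: Dict[str, str]) -> List[str]:
    """Query every engine and list each document once, under its origin URL.

    mirror_of maps a mirror URL to the URL of the original document.
    How such a table would be maintained is not covered by the article.
    """
    seen = set()
    results = []
    for engine in engines:
        for url in engine(query):
            origin = mirror_of.get(url, url)  # unknown URLs count as originals
            if origin not in seen:
                seen.add(origin)
                results.append(origin)
    return results
```

The caller sees one result list and never learns which engine or which copy produced a hit, which is the kind of location transparency described above.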
Caching and replication of services
The performance and reliability of distributed systems can be improved by
using caching and replication techniques. Different digital library architectures
require sophisticated caching and/or replication architectures, e.g. hierarchical
distributed caching. Safely propagating content modifications through
complex caching and/or mirroring architectures remains an open question.
Copyright and intellectual property issues can affect the techniques used
for improving the performance of the system or even prevent the use of some
of these techniques completely. A tradeoff between improving performance
and taking intellectual property issues into consideration implies the redefinition
of the very idea of copyright. As a recent activity in this field, new replicated
Dienst index services have been installed in three different European regions
(INRIA, SZTAKI and FORTH).
This new system installation has improved the European facilities by providing
faster and more reliable access to the distributed computer science technical
report digital library (NCSTRL).
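
The hierarchical distributed caching mentioned above can be pictured with a minimal sketch in which each cache forwards a miss to its parent and only the top level contacts the origin server. The class `CacheLevel` and the `origin` callable are hypothetical names, and the sketch deliberately ignores expiry and the propagation of modifications, which the article identifies as an open question.

```python
class CacheLevel:
    """One level of a cache hierarchy; a miss is forwarded to the parent."""

    def __init__(self, name, parent=None, origin=None):
        self.name = name
        self.parent = parent  # next cache up the hierarchy, if any
        self.origin = origin  # callable url -> body, used by the top level only
        self.store = {}

    def get(self, url):
        if url in self.store:
            return self.store[url]          # local hit
        if self.parent is not None:
            body = self.parent.get(url)     # ask the level above
        else:
            body = self.origin(url)         # top level fetches from the origin
        self.store[url] = body              # keep a copy on the way back down
        return body
```

With one regional cache shared by two local caches, a document requested at both sites crosses the link to the origin server only once.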
László Kovács - SZTAKI
Tel: +36 1 269 8286