ERCIM News No.25 - April 1996 - CLRC

Computing and Information Systems Department Web Demographics

by Victoria Marshall and Martin Prime

At RAL we are interested in the twin issues of Web demographics and Web user interfaces.

As part of our ongoing research work, we took one month's logs (October 1995) of CISD's web as an example, analysed these files, and began to draw some conclusions about reader demographics and user interfaces. The analysis was by necessity highly localised and idiosyncratic, as this was the first time we had done any serious analysis of the web's usage. It did, however, give some constructive pointers as to reader behaviour.

In October 1995 the CISD web had 352 UK readers (28% of all readers), of whom 89% were from the academic domain and 11% from the commercial domain. This reflects the very large number of academics who are interested in an academic site such as RAL. However, of the 9929 pages accessed by these readers (81% of the total), 98% were accessed by academic readers and only 2% by commercial readers. This suggests that commercial readers are coming into the CISD web but then reading very little, perhaps because the web has little of interest to them, or alternatively because the sheer number of icons is too great for a modem user.

In the notionally American domain, the situation was reversed: we had 289 readers (18% of all readers), of whom 38% were in the academic domain and 42% in the commercial domain. Of the 1155 pages accessed (9% of the total), 40% were accessed by academic readers and 47% by commercial readers.
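The breakdown above can be sketched in a few lines of Python. This is only an illustration of the kind of tally involved, not the analysis we actually ran: the suffix table, the function names and the idea of treating each distinct hostname as one "reader" are all assumptions made for the example. It also shows why the American domain is only "notionally" American, since a suffix such as .com says nothing definite about where a machine is.

```python
from collections import Counter

# Hypothetical mapping from hostname suffix to (region, sector).
# These suffixes are assumptions chosen for illustration only.
SUFFIX_CLASSES = [
    (".ac.uk", ("UK", "academic")),
    (".co.uk", ("UK", "commercial")),
    (".edu", ("US", "academic")),
    (".com", ("US", "commercial")),
]


def classify_host(hostname):
    """Return a (region, sector) pair for a hostname, or ("other", "other")."""
    host = hostname.lower().rstrip(".")
    for suffix, label in SUFFIX_CLASSES:
        if host.endswith(suffix):
            return label
    return ("other", "other")


def tally(accesses):
    """Count distinct readers and page accesses per (region, sector).

    `accesses` is an iterable of (hostname, path) pairs, one per page access.
    A "reader" is approximated here by a distinct hostname.
    """
    pages = Counter()
    hosts = {}
    for hostname, _path in accesses:
        label = classify_host(hostname)
        pages[label] += 1
        hosts.setdefault(label, set()).add(hostname.lower())
    readers = {label: len(names) for label, names in hosts.items()}
    return readers, pages
```

Dividing each count by the overall totals gives percentages of the kind quoted above. Counting hostnames rather than people will undercount readers who share a machine and overcount those who use several.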

Readership throughout the rest of the world was fairly widespread. Germany and France supplied the most readers, followed by the Netherlands, Italy, Sweden (all of which are our ERCIM partners) and Canada.

The People section is the most popular part of the CISD web. Most accesses seemed to be by people simply trying to contact someone, to be for the pages of staff involved in some Web development, or to come from various types of search engine. Some deliberate actions can be discerned, however. Quite a number of accesses were for specific groups of people; others were to the pages of internationally recognised names, or of those involved with some high-profile activity such as ERCIM. A small minority of people accessed pages for what would seem to be infantile reasons: one person (possibly a German student) accessed only the pages of female staff; others accessed only the pages of people whose names were perceived to have some quirk, or "foreign" names spelled with an umlaut.

A third (33%) of outside referrals came via search engines. Because so many readers were coming into the Web in this way, it seemed sensible to analyse the queries they were using to get here.

Infoseek was the most popular web search engine (19% of queries came from its site); Lycos and AltaVista came next (with 5% and 4% of queries respectively).

Of the 3838 incoming queries, 1555 were unique. Not surprisingly, perhaps, the most common search was for "Rutherford", which occurred either on its own or in combination with "Appleton" and/or "Laboratory". Other common search terms were "heathrow", "gatwick" and "transputer".
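Counting total and unique queries from referrer URLs can be sketched as follows. This is a minimal illustration under stated assumptions rather than a record of our own tooling: the parameter names in QUERY_PARAMS are hypothetical (each search engine uses its own), and the choice to lower-case queries and collapse whitespace before comparing them is ours for the example.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Hypothetical query-parameter names; real engines each use their own.
QUERY_PARAMS = ("q", "query", "qt")


def extract_query(referrer):
    """Return the normalised search string from a referrer URL, or None."""
    params = parse_qs(urlsplit(referrer).query)
    for name in QUERY_PARAMS:
        if name in params:
            # parse_qs decodes "+" to a space; lower-case and collapse whitespace.
            return " ".join(params[name][0].lower().split())
    return None


def summarise(referrers):
    """Return (total queries, unique queries, Counter of queries)."""
    counts = Counter()
    for ref in referrers:
        query = extract_query(ref)
        if query:
            counts[query] += 1
    return sum(counts.values()), len(counts), counts
```

Calling `counts.most_common(10)` on the returned counter lists the most frequent searches. How many queries count as "unique" depends on the normalisation chosen: treating "Rutherford" and "rutherford" as the same query, as here, gives a lower figure than comparing the raw strings.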

A few slightly dubious terms were also used: "chippendales" (a male performance troupe) and "stevie+nicks" (a singer with Fleetwood Mac) for example. These were mis-indexed as variants on the names of various members of staff within the Department.

Some confusion was inevitable and evident. Of the 142 searches for "everest" (a project within the Department), 57 were clearly intended to apply to Mount Everest and included terms such as "everest+explorers", "sir+edmund+hillary+everest" and "skidoo+everest". It also seems likely that some users had completely misunderstood the capabilities of the search engine, submitting queries such as "indirect+flight+from+heathrow+to+tokyo" and "what+is+web+site+for+calton+university".

We also looked at readers' sessions once they were in the CISD web. 120 users made a total of 2424 hits, an average of 20 each, with a maximum of 149 hits. 38% of 1250 users hit just one page of the web, 18% made just 2 hits, 12% made 3 hits and 8% made 4 hits.
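Session figures of this kind are simple to derive once hits have been grouped by user. The sketch below assumes that grouping has already been done and that the input is just a list of per-user hit counts; the function name and return shape are illustrative only.

```python
from collections import Counter


def session_summary(hits_per_user):
    """Summarise a list of per-user hit counts.

    Returns the mean, the maximum, and the share of users (as a
    percentage) who made exactly 1, 2, 3 and 4 hits.
    """
    if not hits_per_user:
        raise ValueError("no users to summarise")
    n = len(hits_per_user)
    freq = Counter(hits_per_user)
    shares = {k: 100.0 * freq[k] / n for k in (1, 2, 3, 4)}
    return sum(hits_per_user) / n, max(hits_per_user), shares
```

As a check on the arithmetic, 2424 hits from 120 users is a mean of 20.2, which rounds to the 20 quoted above.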

It is clear that much work could be done on the analysis of web demographics, and this is an area of research that we are actively pursuing. At the moment it is very much an exploratory exercise; however, some suggestions and observations can already be made.

Firstly, not all browsers comply with the HTTP standard! Further information about users (if only the machine name) would be invaluable. Secondly, as the web grows, the use of search engines is bound to increase, which calls for more efficient use of search terms and the careful design of web page content. Thirdly, (in the academic domain at least) people are interested in people. We have already subtly restructured our Web to exploit this fact and "lead" readers from people pages to our project pages.

Please contact:
Victoria Marshall or Martin Prime - CLRC
Tel: +44 1235 82 1900

