The ADS software keeps extensive logs of the use of the search and access software. In this section, usage statistics for the search software and for access patterns to the Article Service are reported. Unless otherwise indicated, the statistics in this section are for the one-year period from 1 April 1998 through 31 March 1999.
The ADS is accessed by users from many different countries. In the one-year period of this section the ADS was accessed by 127 000 different users, using 100 000 different hosts from 112 different countries. An individual user is defined as having a unique cookie (see Sect. 3). Users without cookies are distinguished by the hostnames from which the requests came. This may overestimate the number of users, since some users may have more than one cookie, for instance when accessing the ADS from home. The number of different hosts is a lower limit of the number of users. Many hosts are used by multiple users, so the real number is certainly considerably higher than that. The development of the number of users and queries over the life of the ADS is described in OVERVIEW. This section describes some more detailed investigations of the access statistics.
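The counting rule described above (a user is a unique cookie, or a hostname when no cookie is present) can be sketched as follows. This is a minimal illustration, not the ADS log-processing code: the record layout `(cookie, hostname)` and the function name `count_users_and_hosts` are assumptions made for the example, and the sample hostnames are invented.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical log record: (cookie or None, hostname of the request).
LogRecord = Tuple[Optional[str], str]


def count_users_and_hosts(records: Iterable[LogRecord]) -> Tuple[int, int]:
    """Return (distinct users, distinct hosts) for a sequence of log records."""
    users = set()
    hosts = set()
    for cookie, hostname in records:
        hosts.add(hostname)
        # A user is identified by cookie when one is present,
        # otherwise by the hostname the request came from.
        if cookie is not None:
            users.add(("cookie", cookie))
        else:
            users.add(("host", hostname))
    return len(users), len(hosts)


sample = [
    ("c1", "obs.example.edu"),
    ("c2", "obs.example.edu"),   # second user on a shared host
    ("c3", "home.example.net"),  # could be the person behind c1, at home
    (None, "lab.example.org"),   # no cookie: counted by hostname
    (None, "lab.example.org"),
]
n_users, n_hosts = count_users_and_hosts(sample)
print(n_users, n_hosts)
```

In the sample, the shared host shows why the host count undercounts users, and the home cookie shows why the cookie count can overcount them.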
The total number of users at first comes as quite a surprise. The number of working astronomers in the world is probably between 10 000 and 20 000. The number of ADS users is much larger than that. This is probably due to several factors. First, there are certainly many accidental users. They somehow find our search page through some link, execute a query to see what they get back, and then never come back because it is of no interest to them.
Other possible users are media people. There are certainly many reporters who occasionally look something up in astronomy. I have spoken with several who use the ADS occasionally for that purpose.
Another group of users are amateur astronomers. The ADS was described in Sky & Telescope by [Eichhorn (1996a)] a few years ago. This has certainly made amateur astronomers aware of this resource. The number of amateur astronomers world-wide is certainly in the millions, so they comprise a potentially large number of users.
Another large group of users visits the ADS through links from other web sites. One particularly popular one is NASA's Image of the Day, which frequently includes links to abstracts or articles in the ADS. Since this NASA page is visited by millions of people, a large number of them will access the ADS through these links.
The use of the ADS in different countries depends on several factors. One of these is certainly the population of the country. Figure 9 shows the number of queries per capita as a function of the population of the country ([CIA 1999]). There seems to be an upper limit of about 0.1 to 0.2 queries per person per year. The one exception is the Vatican with almost 3 queries per person per year. This is understandable since the Vatican has an active astronomy program, which generates a large number of queries for a small population.
Figure 9: Number of queries per person per year in each country as a function of the population for the country
Another factor in the use of the ADS is the funding available for astronomy in a country, and the available infrastructure to do astronomical research. Figure 10 shows the number of references retrieved per capita as a function of the Gross Domestic Product (GDP, [CIA 1999]) per capita of the country. The symbols are the Internet codes for each country.
Figure 10: Number of references retrieved per person as a function of the Gross Domestic Product (GDP) per person for each country
The highly industrialized countries cluster in the upper right part of the plot (area 1). A closeup of this region is shown in Fig. 11. Other clusters are the countries of the former Soviet Union (area 2), and Central and South American countries (area 3). The high number of references retrieved per capita combined with the lower GDP per capita of the former Soviet Union is probably due to a recent decline in GDP alongside a still-existing infrastructure for astronomical research.
Figure 11: Number of references retrieved per person as a function of the Gross Domestic Product (GDP) per person. This figure is a closeup of area 1 in Fig. 10
Figure 12: Number of queries per hour as a function of the time of day for the main SAO site and the mirror sites in France and Japan
There are three features in this figure that deserve special notice. The first is that the shape of the accesses to the ADS mirror in France is the same as the shape of the non-US access to the SAO site. This indicates that the large majority of the non-US use on the SAO site is from European users.
This non-US usage is about 50% higher than the total usage of the ADS mirror site at the CDS in France. The reason for this is most probably the fact that the connectivity within Europe is not yet very good. We know for instance that our users in England and Sweden have better access to the main ADS site in the U.S.A. than to our mirror site in France. The same is true for other parts of Europe.
Another reason for the use of the U.S.A. site by European users is the fact that our European mirror sites do not yet have the complete set of scanned articles on-line. This forces some users to access the main ADS site in order to retrieve scanned articles.
Second, there is a slight peak in the distribution of queries to the NAO mirror in Japan around 21:00 UTC (Coordinated Universal Time, formerly Greenwich Mean Time). This is probably due to US west coast users using the Japanese mirror site instead of the US site. The access to Japan is frequently very fast and response times from Japan may be better than from SAO during peak traffic times.
Third, there is a distinct peak in the SAO-US usage at 9:00 UTC. This feature was so unusual that we tracked down the reason for it. It turns out that one of our users has set up web pages that include about 200 links to ADS abstracts. He had set up a link verifier that every night at 9:00 UTC checked all the links on his pages. This meant that the link verifier executed 200 queries every night at the same time, which showed up in this evaluation of our access statistics.
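A time-of-day distribution of the kind discussed above can be obtained by binning query timestamps by UTC hour. The sketch below is only an illustration under stated assumptions: the function name `queries_per_hour` is hypothetical, the timestamps are assumed to be timezone-aware, and the sample data (a few background queries plus a link verifier issuing 200 queries at 9:00 UTC on three nights) is invented to mimic the effect described.

```python
from datetime import datetime, timezone
from typing import Iterable, List


def queries_per_hour(timestamps: Iterable[datetime]) -> List[int]:
    """Count queries in each of the 24 UTC hours of the day."""
    counts = [0] * 24
    for ts in timestamps:
        # Convert to UTC so that logs from different sites are comparable.
        counts[ts.astimezone(timezone.utc).hour] += 1
    return counts


# Invented sample: four background queries in the afternoon (UTC) ...
background = [
    datetime(1999, 3, 1, hour, 30, tzinfo=timezone.utc)
    for hour in (14, 15, 15, 16)
]
# ... and a link verifier running 200 queries at 9:00 UTC on three nights.
verifier = [
    datetime(1999, 3, day, 9, 0, tzinfo=timezone.utc)
    for day in (1, 2, 3)
    for _ in range(200)
]

hist = queries_per_hour(background + verifier)
peak_hour = max(range(24), key=lambda hour: hist[hour])
print(peak_hour, hist[peak_hour])
```

A single automated client that runs at a fixed time accumulates in one bin, which is why it stands out against the smooth distribution of interactive use.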
The following section shows statistics of how our users use the different capabilities of the ADS query system. Figure 13 shows a histogram of the relative usage of the different search fields (authors, objects, title, text). It shows clearly that the majority of queries are queries by author name (66%). Object names are used in fewer than 5% of the queries. The title field is used in about 21% of the queries, and the text field in 26%. Queries that use more than one field make up about 18% of the total. This usage pattern justifies, for instance, including in the database tables of contents (ToCs) for which no abstracts are available to search. Since a large part of the usage is through author and title queries, such ToC entries will still be found.
Figure 13: A histogram of the relative usage of the different search fields (authors, objects, title, text) and the use of multiple fields
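The per-field percentages quoted above add up to more than 100% (roughly 66 + 5 + 21 + 26 = 118) because a query that uses several fields is counted once for each field. The following sketch shows one way such figures could be tallied. It is an assumption-laden illustration: each query is represented as the set of fields it used, the function name `field_usage` is hypothetical, and the ten sample queries are invented.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple

FIELDS = ("authors", "objects", "title", "text")


def field_usage(queries: Iterable[FrozenSet[str]]) -> Tuple[Dict[str, float], float]:
    """Return (percentage of queries using each field, percentage using >1 field)."""
    counts: Counter = Counter()
    multi = 0
    total = 0
    for fields in queries:
        total += 1
        counts.update(fields)      # one count per field used in this query
        if len(fields) > 1:
            multi += 1
    if total == 0:
        raise ValueError("no queries to evaluate")
    usage = {name: 100.0 * counts[name] / total for name in FIELDS}
    return usage, 100.0 * multi / total


# Invented sample of ten queries.
sample = (
    [frozenset({"authors"})] * 5
    + [frozenset({"authors", "title"})]
    + [frozenset({"authors", "text"})]
    + [frozenset({"title"})]
    + [frozenset({"text"})] * 2
)
usage, multi_field = field_usage(sample)
print(usage, multi_field)
```

In the sample, the per-field percentages sum to 120% while 20% of the queries use two fields. When every multi-field query uses exactly two fields, the excess over 100% equals the multi-field share, which is consistent with the 118% and 18% figures in the text.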
Figure 14 shows the number of queries as a function of the number of query items in each input field. The query frequency generally decreases exponentially with increasing number of search terms. For title and text queries, the frequency is approximately constant up to 3 query words, before the frequency starts to decrease. For abstract queries there is a significant increase in the frequency of queries with more than 20 query words; for title queries there is a similar increase for queries with more than about 8 query words. This is due to queries generated through the query feedback mechanism, which allows the user to use a given abstract and its title as a new query.
Figure 14: The number of queries in the period of 1 April 1998 to 31 March 1999 as a function of the number of query items in each input field
Figures 15 and 16 show the usage of non-default query settings (see Sect. 4.2). The default settings were chosen to suffice for most queries. Figure 15 shows the percentage of non-default settings for the different settings and query fields available. It shows that 29% of author queries, 78% of title queries, and 85% of text queries use non-default settings. This was at first disappointing, because it suggested that the default settings might not be a reasonable selection of settings. The two main settings that were non-default were combining words with "AND" (see Sect. 2.1.1.b.ii), and disabled weighted scoring (see Sect. 2.1.1.c). On closer examination of the statistics it turns out that the straight weighting settings come mainly from two systems, the NASA Techreports and the International Society for Optical Engineering (SPIE).
Figure 16: Percentage of non-default settings for the different available settings and query fields. This plot excludes the queries from NASA Techreports and SPIE
Both of these systems use our Perl scripts (see Sect. 2.4) to access the ADS database. They do not set our normal default settings during these queries. Figure 16 shows the non-default settings for all queries that did not come from either of these two servers. There is still a small percentage of queries that use straight weighting, probably mostly due to other systems that use our Perl script interface routines.
The one remaining non-default setting that is used frequently is the combination of words with "AND". We believe that the "OR" combination as default is more useful since it returns more information. The beginning of the list of returned references is the same, regardless of whether "AND" or "OR" combination is selected, since references that match all words are sorted to the beginning of the list. When "OR" combination is selected, partial matches will be returned after the ones with perfect matches. This is desirable since there may be relevant references that for some reason do not match all query words.
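The relationship between the two combination modes can be illustrated with a toy ranking. This is a deliberately simplified sketch and not the ADS search engine: references are modeled as plain sets of words, the score is just the number of query words matched (the real system uses weighted scoring), and the function name `search` and the sample references are hypothetical.

```python
from typing import Dict, List, Set


def search(docs: Dict[str, Set[str]], words: List[str], combine: str = "OR") -> List[str]:
    """Rank references by the number of query words they contain."""
    terms = set(words)
    scored = []
    for doc_id, doc_words in docs.items():
        matched = len(terms & doc_words)
        if matched == 0:
            continue                      # no query word present at all
        if combine == "AND" and matched < len(terms):
            continue                      # "AND" keeps only complete matches
        scored.append((matched, doc_id))
    # Most matched words first; ties broken by identifier for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored]


docs = {
    "ref1": {"galaxy", "cluster", "redshift"},
    "ref2": {"galaxy", "redshift"},
    "ref3": {"cluster", "dynamics"},
    "ref4": {"stellar", "winds"},
}
query = ["galaxy", "cluster", "redshift"]
and_hits = search(docs, query, "AND")
or_hits = search(docs, query, "OR")
print(and_hits)
print(or_hits)
```

In this toy model the "OR" list starts with exactly the references that "AND" returns, followed by the partial matches, which is the behaviour the paragraph above describes.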
The other selecting mechanism that is available is the filtering of references according to what other information is available for a reference. The usage of the filtering is shown in Table 8. About 10% of the total queries use the filter option. Almost all of these filter by journal or select refereed journals only. The sum of the numbers for required data types adds up to more than the number for "Required data", since more than one data type can be selected.
Filter Type | Required Data Type | Queries |
Total queries | | 2754405 |
Non-standard queries | | 286341 |
Selected journal | | 158581 |
Refereed journals | | 96270 |
Non-refereed journals | | 1616 |
Data available | | 6381 |
Required data | Printable Articles | 2921 |
Required data | Scanned Articles | 1951 |
Required data | Electronic Articles | 1690 |
Required data | Abstracts | 1382 |
Required data | Planetary Data System | 834 |
Required data | Planetary Nebulae | 667 |
Required data | Citations | 615 |
Required data | Table of Contents | 506 |
Required data | References | 459 |
Required data | Author Comments | 393 |
Required data | On-line Data | 360 |
Required data | SIMBAD Objects | 269 |
Required data | NED Objects | 212 |
Required data | Library Entries | 204 |
Required data | Mail Order | 201 |
Required data | Associated Articles | 83 |
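The statements about filter usage can be checked directly against the counts in Table 8. The short calculation below uses only numbers from the table; the variable names are chosen for this example. The journal share is an upper bound on the fraction of filtered queries involved, under the assumption that a single query may use both a journal selection and the refereed-journal filter.

```python
# Counts taken from Table 8.
total_queries = 2_754_405
non_standard_queries = 286_341
selected_journal = 158_581
refereed_journals = 96_270

# Share of all queries that use the filter option.
filter_share = 100.0 * non_standard_queries / total_queries

# Share of filtered queries that select a journal or refereed journals only
# (an upper bound if one query can use both filters).
journal_share = 100.0 * (selected_journal + refereed_journals) / non_standard_queries

print(f"{filter_share:.1f}% of queries use a filter")
print(f"{journal_share:.1f}% of filtered queries filter by journal or refereed status")
```

The first figure, 10.4%, matches the "about 10%" quoted in the text, and the second, about 89%, supports the statement that almost all filtered queries filter by journal or by refereed status.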
Table 9 shows the number of links available and the usage pattern of the data links that the ADS provides. The highest usage is access to the abstracts, followed by the links to full-text articles (scanned and printable), links to on-line electronic articles, and links to citations. Reference links and links to SIMBAD objects are next.
Links | Nr. Links | Nr. Accesses |
Abstracts | 941,293 | 1,608,726 |
Scanned Articles | 138,785 | 526,872 |
Printable Articles (Postscript and PDF) | 40,928 | 254,881 |
Electronic Articles | 125,933 | 186,067 |
Citations | 195,192 | 77,316 |
References | 135,474 | 36,969 |
SIMBAD Objects | 110,308 | 23,505 |
On-line Data | 5,728 | 9,799 |
NED Objects | 31,801 | 6,144 |
Mail Order | 247,282 | 3,520 |
Library Entries | 18,746 | 1,645 |
Tables of Contents | 5,792 | 1,233 |
Author Comments | 203 | 313 |
Associated Articles | 2,765 | 169 |
Planetary Nebulae Data | 281 | 143 |
The ADS Article Service provides access to full journal articles. The usage statistics should show how astronomy researchers read and use journal articles. In this section we describe a few of the statistics of the article server. More statistics on the usage of the scanned articles are described in OVERVIEW.
Figure 17 shows the number of pages of scanned articles retrieved over the life of the ADS, and Fig. 18 shows the number of articles retrieved. The number of articles represents the sum of the selected links to on-line electronic articles, PDF and Postscript articles at the journals, and scanned articles at the ADS.
Figure 17: Number of pages of scanned articles retrieved through the life of the ADS Article Service
Both the number of pages and the number of articles retrieved are steadily increasing. This is due to both the increased coverage in the ADS of scanned journals and the increase in the number of users that use the system.
Table 10 shows the number of retrievals in the various formats. Postscript is a printer control language developed by Adobe (see [Adobe Postscript 1990]). Postscript Level 1 is the first version of the Postscript language. It generates much larger files than Level 2 Postscript. Some older printers can process only Level 1 Postscript files. PDF (Portable Document Format) is a newer page description format, also developed by Adobe. PCL (Printer Control Language) is a printer control language developed by Hewlett Packard. It is used in low-end PC printers. Low resolution is 200 dpi for Postscript and PDF, and 150 dpi for PCL. High resolution is 600 dpi for Postscript and PDF, and 300 dpi for PCL.
Article Type | Retrievals March 99 | Retrievals March 98 | Retrievals March 97 |
Postscript Level 1 (Low Resolution) | 476 | 557 | 644 |
Postscript Level 2 (Low Resolution) | 25 664 | 13 031 | 11 189 |
Postscript Level 2 (High Resolution) | 10 472 | 8 291 | 6 435 |
PDF (Low Resolution) | 3 266 | 620 | n/a |
PDF (High Resolution) | 7 049 | 1 008 | n/a |
PCL (Low Resolution) | 14 | 73 | 72 |
PCL (High Resolution) | 53 | 111 | 132 |
GIF Thumbnails | 13 777 | 7 378 | n/a |
The majority of retrievals are of low-resolution Level 2 Postscript files. This is the default setting in the ADS Article Service. The number of Postscript Level 1 articles (compatible with older printers, but much larger file sizes) retrieved is low compared with Level 2 Postscript articles, and slowly declining. The number of PCL articles retrieved is even smaller and also declining. The number of PDF articles retrieved was slowly increasing throughout 1998. It has increased much more rapidly in 1999. In early 1998 fewer than 15% of the high resolution articles were retrieved as PDF files. This fraction increased to 40% by March 1999.
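The PDF fractions quoted above can be reproduced from the high-resolution counts in Table 10. The calculation below is a sketch that takes the 7 049 and 1 008 entries to be the high-resolution PDF retrievals for March 1999 and March 1998; the dictionary layout and the function name `pdf_share` are choices made for this example.

```python
# High-resolution retrievals from Table 10. The PDF counts are assumed to be
# the 7 049 (March 99) and 1 008 (March 98) entries of that table.
high_resolution = {
    "March 99": {"Postscript Level 2": 10_472, "PDF": 7_049, "PCL": 53},
    "March 98": {"Postscript Level 2": 8_291, "PDF": 1_008, "PCL": 111},
}


def pdf_share(counts: dict) -> float:
    """Percentage of high-resolution retrievals delivered as PDF."""
    return 100.0 * counts["PDF"] / sum(counts.values())


share_99 = pdf_share(high_resolution["March 99"])
share_98 = pdf_share(high_resolution["March 98"])
print(f"March 98: {share_98:.1f}%  March 99: {share_99:.1f}%")
```

This gives about 10.7% for March 1998 and about 40.1% for March 1999, in line with the "fewer than 15%" and "40%" figures in the text.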
Copyright The European Southern Observatory (ESO)