6 Access statistics

The ADS software keeps extensive logs about the use of the search and access software. In this section, usage statistics for the search software and for access patterns to the Article Service are reported. If not otherwise indicated, the statistics in this section are for the one-year period from 1 April 1998 through 31 Mar. 1999.

6.1 Abstract service

The ADS is accessed by users from many different countries. In the one-year period of this section the ADS was accessed by 127 000 different users, using 100 000 different hosts from 112 different countries. An individual user is defined as having a unique cookie (see Sect. 3). Users without cookies are distinguished by the hostnames from which the requests came. This may overestimate the number of users, since some users may have more than one cookie, for instance when accessing the ADS from home. The number of different hosts is a lower limit of the number of users. Many hosts are used by multiple users, so the real number is certainly considerably higher than that. The development of the number of users and queries over the life of the ADS is described in OVERVIEW. This section describes some more detailed investigations of the access statistics.
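The counting rule described above (a unique cookie identifies a user; requests without a cookie are attributed to their hostname) can be sketched as follows. This is only an illustration: the record layout of `(cookie, hostname)` pairs, the function name, and the sample hostnames are assumptions, not the format of the actual ADS logs.

```python
from typing import Iterable, Optional, Tuple


def count_users_and_hosts(
    requests: Iterable[Tuple[Optional[str], str]]
) -> Tuple[int, int]:
    """Count distinct users and distinct hosts from (cookie, hostname) pairs.

    Hypothetical record layout: cookie is None when the request carried none.
    """
    users = set()
    hosts = set()
    for cookie, hostname in requests:
        hosts.add(hostname)
        if cookie is not None:
            # A cookie identifies the user, whatever host the request came from.
            users.add(("cookie", cookie))
        else:
            # Without a cookie, the hostname stands in for the user.
            users.add(("host", hostname))
    return len(users), len(hosts)


# Invented sample records, for illustration only.
sample = [
    ("c1", "a.example.edu"),
    ("c1", "b.example.edu"),  # same user seen from two hosts
    ("c2", "a.example.edu"),  # second user on a shared host
    (None, "c.example.org"),  # no cookie: counted by hostname
    (None, "c.example.org"),
]
n_users, n_hosts = count_users_and_hosts(sample)
print(n_users, n_hosts)
```

The sample shows both effects mentioned in the text: one user can appear on several hosts, and one host can serve several users.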

The total number of users at first comes as quite a surprise. The number of working astronomers in the world is probably between 10 000 and 20 000. The number of ADS users is much larger than that. This is probably due to several factors. First, there are certainly many accidental users. They somehow find our search page through some link, execute a query to see what they get back, and then never come back because it is of no interest to them.

Other possible users are media people. There are certainly many reporters who occasionally look something up in astronomy. I have spoken with several who use the ADS occasionally for that purpose.

Another group of users are amateur astronomers. The ADS was described in Sky & Telescope by [Eichhorn (1996a)] a few years ago. This has certainly made amateur astronomers aware of this resource. The number of amateur astronomers world-wide is certainly in the millions, so they comprise a potentially large number of users.

Another large group of users visits the ADS through links from other web sites. One particularly popular one is NASA's Image of the Day, which frequently includes links to abstracts or articles in the ADS. Since this NASA page is visited by millions of people, a large number of them will access the ADS through these links.

The use of the ADS in different countries depends on several factors. One of these is certainly the population of the country. Figure 9 shows the number of queries per capita as a function of the population of the country ([CIA 1999]). There seems to be an upper limit of about 0.1 to 0.2 queries per person per year. The one exception is the Vatican, with almost 3 queries per person per year. This is understandable since the Vatican has an active astronomy program, which generates a large number of queries for a small population.


Figure 9: Number of queries per person per year in each country as a function of the population for the country

Another factor for querying the ADS is the funding available for Astronomy in a country, and the available infrastructure to do astronomical research. Figure 10 shows the number of references retrieved per capita as a function of the Gross Domestic Product (GDP, [CIA 1999]) of the country. The symbols are the Internet codes for each country.


Figure 10: Number of references retrieved per person as a function of the Gross Domestic Product (GDP) per person for each country

The highly industrialized countries cluster in the upper right part of the plot (area 1). A closeup of this region is shown in Fig. 11. Other clusters are the countries of the former Soviet Union (area 2), and Central and South American countries (area 3). The high number of references retrieved per capita combined with the lower GDP per capita of the former Soviet Union is probably due to a recent decline in GDP, but a still existing infrastructure for astronomical research.


Figure 11: Number of references retrieved per person as a function of the Gross Domestic Product (GDP) per person. This figure is a closeup of area 1 in Fig. 10

The ADS is used 24 hours per day. The distribution of queries throughout the day is shown in Fig. 12. This figure shows the number of queries at the two largest mirror sites, as well as the queries at the main ADS site. The usage distribution data are for the time period from 1 November 1998 to 31 March 1999, not the full year, to avoid complications from the different periods during which daylight saving time is in effect. The queries at the main site are separated into US users and non-US users on the basis of their Internet hostnames. All the individual curves show a distinct two-peaked basic shape, with additional smaller peaks in some cases. This distribution of queries over the day reflects the usage throughout a workday, with a small minimum during the lunch hour. The SAO-US distribution does not show a real minimum between the two peaks, presumably because US researchers are distributed over three time zones.


Figure 12: Number of queries per hour as a function of the time of day for the main SAO site and the mirror sites in France and Japan
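A distribution of this kind can be obtained by binning query timestamps by hour of day in UTC. The following is a minimal sketch, assuming that each query is available as a timezone-aware timestamp; the function name and the sample timestamps are invented for illustration and do not come from the ADS logs.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def queries_per_hour(timestamps):
    """Return a 24-element list: number of queries in each UTC hour of the day."""
    # Convert every timestamp to UTC first so that all sites share one time axis.
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


# Invented sample timestamps (timezone-aware), for illustration only.
jst = timezone(timedelta(hours=9))
sample = [
    datetime(1999, 1, 15, 9, 0, tzinfo=timezone.utc),
    datetime(1999, 1, 15, 9, 0, tzinfo=timezone.utc),
    datetime(1999, 1, 16, 9, 0, tzinfo=timezone.utc),
    datetime(1999, 1, 15, 21, 30, tzinfo=timezone.utc),
    datetime(1999, 1, 16, 6, 10, tzinfo=jst),  # 21:10 UTC on the previous day
]
histogram = queries_per_hour(sample)
print(histogram[9], histogram[21])
```

Restricting the input to a period without daylight saving time changes, as done for Fig. 12, keeps the local working hours of each site aligned with fixed UTC hours.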

There are three features in this figure that deserve special notice. The first is that the shape of the accesses to the ADS mirror in France is the same as the shape of the non-US access to the SAO site. This indicates that the large majority of the non-US use on the SAO site is from European users.

This non-US usage is about 50% higher than the total usage of the ADS mirror site at the CDS in France. The reason for this is most probably the fact that the connectivity within Europe is not yet very good. We know for instance that our users in England and Sweden have better access to the main ADS site in the U.S.A. than to our mirror site in France. The same is true for other parts of Europe.

Another reason for the use of the U.S.A. site by European users is the fact that our European mirror sites do not yet have the complete set of scanned articles on-line. This forces some users to access the main ADS site in order to retrieve scanned articles.

Second, there is a slight peak in the distribution of queries to the NAO mirror in Japan around 21:00 UTC (Coordinated Universal Time, the successor to Greenwich Mean Time). This is probably due to US west coast users using the Japanese mirror site instead of the US site. The access to Japan is frequently very fast and response times from Japan may be better than from SAO during peak traffic times.

Third, there is a distinct peak in the SAO-US usage at 9:00 UTC. This feature was so unusual that we tracked down the reason for it. It turns out that one of our users has set up web pages that include about 200 links to ADS abstracts. He had set up a link verifier that every night at 9:00 UTC checked all the links on his pages. This meant that the link verifier executed 200 queries every night at the same time, which showed up in this evaluation of our access statistics.

The following section shows statistics of how our users use the different capabilities of the ADS query system. Figure 13 shows a histogram of the relative usage of the different search fields (authors, objects, title, text). It shows clearly that the majority of queries are queries by author name (66%). Object names are used in fewer than 5% of the queries. The title field is used in about 21% of the queries, and the text field in 26%. Queries that use more than one field make up about 18% of the total. This usage pattern justifies for instance including tables of contents (ToCs) in the database that do not have abstracts for searching. Since a large part of the usage is through author and title queries, such ToC entries will still be found.


Figure 13: A histogram of the relative usage of the different search fields (authors, objects, title, text) and the use of multiple fields

Figure 14 shows the number of queries as a function of the number of query items in each input field. The query frequency generally decreases exponentially with increasing number of search terms. For title and text queries, the frequency is approximately constant up to 3 query words before it starts to decrease. For text (abstract) queries there is a significant increase in the frequency of queries with more than 20 query words. For title queries there is a similar increase for queries with more than about 8 query words. Both increases are due to queries generated through the query feedback mechanism, which allows the user to use a given abstract and its title as a new query.


Figure 14: The number of queries in the period of 1 April 1998 to 31 March 1999 as a function of the number of query items in each input field
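The counts behind such a plot amount to a histogram of the number of items per query. A minimal sketch for a word-based field such as title or text is given below; it assumes that query items are separated by whitespace, which may differ from how the ADS actually splits its input fields, and the sample queries are invented.

```python
from collections import Counter


def items_per_query_histogram(queries):
    """Map number of whitespace-separated items -> number of queries."""
    # Empty or blank field contents are skipped: the field was not used.
    return Counter(len(query.split()) for query in queries if query.strip())


# Invented sample title queries, for illustration only.
sample_queries = [
    "dark matter",
    "galaxy",
    "accretion disk models",
    "quasar",
    "star formation",
    "   ",
]
word_histogram = items_per_query_histogram(sample_queries)
for n_items in sorted(word_histogram):
    print(n_items, word_histogram[n_items])
```

Queries produced by the feedback mechanism would appear in such a histogram as an excess at large item counts, since a whole title or abstract is used as the query.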

Figures 15 and 16 show the usage of non-default query settings (see Sect. 4.2). The default settings were chosen to suffice for most queries. Figure 15 shows the percentage of non-default settings for the different settings and query fields available. It shows that 29% of author queries, 78% of title queries, and 85% of text queries use non-default settings. This was at first disappointing, because it suggested that the default settings might not be a reasonable selection of settings. The two main non-default settings were combining words with "AND" (see Sect. 2.1.1.b.ii) and disabling weighted scoring (see Sect. 2.1.1.c). On closer examination of the statistics it turns out that the straight weighting settings come mainly from two systems, the NASA Techreports and the International Society for Optical Engineering (SPIE).


Figure 15: Percentage of non-default settings for the different available settings and query fields


Figure 16: Percentage of non-default settings for the different available settings and query fields. This plot excludes the queries from NASA Techreports and SPIE

Both of these systems use our Perl scripts (see Sect. 2.4) to access the ADS database. They do not set our normal default settings during these queries. Figure 16 shows the non-default settings for all queries that did not come from either of these two servers. There is still a small percentage of queries that use straight weighting, probably mostly due to other systems that use our Perl script interface routines.

The one remaining non-default setting that is used frequently is the combination of words with "AND". We believe that the "OR" combination as default is more useful since it returns more information. The beginning of the list of returned references is the same, regardless of whether "AND" or "OR" combination is selected, since references that match all words are sorted to the beginning of the list. When "OR" combination is selected, partial matches will be returned after the ones with perfect matches. This is desirable since there may be relevant references that for some reason do not match all query words.
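The relation between the two combination modes can be illustrated with a small sketch. It assumes a deliberately simplified score, the number of query words a reference matches, rather than the weighted scoring the ADS actually uses; the function name, the toy index, and the reference identifiers are hypothetical.

```python
def search(index, words, combine="OR"):
    """Return reference ids sorted by number of matched query words.

    index: dict mapping a reference id to the set of words it contains.
    combine: "AND" keeps only references matching every word,
             "OR" keeps references matching at least one word.
    """
    scored = []
    for ref_id, ref_words in index.items():
        n_matched = sum(1 for word in words if word in ref_words)
        if n_matched == 0:
            continue
        if combine == "AND" and n_matched < len(words):
            continue
        scored.append((n_matched, ref_id))
    # Best matches first; ties are broken by reference id for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [ref_id for _, ref_id in scored]


# Toy index, for illustration only.
toy_index = {
    "A": {"black", "hole", "accretion"},
    "B": {"black", "hole"},
    "C": {"accretion", "disk"},
    "D": {"solar", "wind"},
}
query = ["black", "hole", "accretion"]
and_hits = search(toy_index, query, combine="AND")
or_hits = search(toy_index, query, combine="OR")
print(and_hits)
print(or_hits)
```

In this simplified model the "OR" result starts with exactly the "AND" result and then continues with the partial matches, which is the behavior described above.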

The other selection mechanism that is available is the filtering of references according to what other information is available for a reference. The usage of the filtering is shown in Table 8. About 10% of the total queries use the filter option. Almost all of these filter by journal or select refereed journals only. The sum of the numbers for the required data types adds up to more than the number for "Data available", since more than one data type can be selected.

Table 8: Filter requests during the period of 1 April 1998 to 31 March 1999

Filter Type              Required Data Type         Queries
Total queries                                       2754405
Non-standard queries                                 286341
Selected journal                                     158581
Refereed journals                                     96270
Non-refereed journals                                  1616
Data available                                         6381
Required data
                         Printable Articles            2921
                         Scanned Articles              1951
                         Electronic Articles           1690
                         Abstracts                     1382
                         Planetary Data System          834
                         Planetary Nebulae              667
                         Citations                      615
                         Table of Contents              506
                         References                     459
                         Author Comments                393
                         On-line Data                   360
                         SIMBAD Objects                 269
                         NED Objects                    212
                         Library Entries                204
                         Mail Order                     201
                         Associated Articles             83
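The fractions quoted for the filter usage can be checked directly against the numbers in Table 8. The short calculation below is only a verification sketch; the variable names are ours, and it assumes that the "Data available" row gives the number of queries that required at least one data type.

```python
# Numbers copied from Table 8 (1 April 1998 to 31 March 1999).
total_queries = 2754405
non_standard_queries = 286341
selected_journal = 158581
refereed_journals = 96270
data_available = 6381  # assumed: queries that required at least one data type

required_data_types = {
    "Printable Articles": 2921,
    "Scanned Articles": 1951,
    "Electronic Articles": 1690,
    "Abstracts": 1382,
    "Planetary Data System": 834,
    "Planetary Nebulae": 667,
    "Citations": 615,
    "Table of Contents": 506,
    "References": 459,
    "Author Comments": 393,
    "On-line Data": 360,
    "SIMBAD Objects": 269,
    "NED Objects": 212,
    "Library Entries": 204,
    "Mail Order": 201,
    "Associated Articles": 83,
}

# Share of all queries that use a filter.
filter_fraction = non_standard_queries / total_queries
# Share of filtered queries that select a journal or refereed journals only.
journal_fraction = (selected_journal + refereed_journals) / non_standard_queries
# Sum over data types; exceeds data_available when queries select several types.
required_sum = sum(required_data_types.values())

print(f"{filter_fraction:.1%}")   # → 10.4%
print(f"{journal_fraction:.1%}")  # → 89.0%
print(required_sum)               # → 12747
```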


Table 9 shows the number of links available and the usage pattern of the data links that the ADS provides. The highest usage is access to the abstracts, followed by the links to full text articles (scanned and printable), links to on-line electronic articles, and links to citations. Reference links and links to SIMBAD objects are next.

Table 9: Link types and their accesses during the period of 1 April 1998 to 31 March 1999. Numbers of links available are as of July 1999

Links                                      Nr. Links    Nr. Accesses
Abstracts                                    941,293       1,608,726
Scanned Articles                             138,785         526,872
Printable Articles (Postscript and PDF)       40,928         254,881
Electronic Articles                          125,933         186,067
Citations                                    195,192          77,316
References                                   135,474          36,969
SIMBAD Objects                               110,308          23,505
On-line Data                                   5,728           9,799
NED Objects                                   31,801           6,144
Mail Order                                   247,282           3,520
Library Entries                               18,746           1,645
Tables of Contents                             5,792           1,233
Author Comments                                  203             313
Associated Articles                            2,765             169
Planetary Nebulae Data                           281             143


6.2 Article access statistics

The ADS Article Service provides access to full journal articles. The usage statistics should show how astronomy researchers read and use journal articles. In this section we describe a few of the statistics of the article server. More statistics on the usage of the scanned articles are described in OVERVIEW.

Figure 17 shows the number of pages of scanned articles retrieved over the life of the ADS, Fig. 18 shows the number of articles retrieved. The number of articles represents the sum of the selected links to on-line electronic articles, PDF and Postscript articles at the journals, and scanned articles at the ADS.


Figure 17: Number of pages of scanned articles retrieved through the life of the ADS Article Service


Figure 18: Number of full text articles retrieved by ADS users. These numbers include the scanned articles at the ADS, as well as articles at the sites of the different journals that were requested through ADS links

Both the number of pages and the number of articles retrieved are steadily increasing. This is due to both the increased coverage of scanned journals in the ADS and the increase in the number of users of the system.

Table 10 shows the number of retrievals in the various formats. Postscript is a printer control language developed by Adobe (see [Adobe Postscript 1990]). Postscript Level 1 is the first version of the Postscript language. It generates much larger files than Level 2 Postscript. Some older printers can process only Level 1 Postscript files. PDF (Portable Document Format) is a newer page description format, also developed by Adobe. PCL (Printer Control Language) is a printer control language developed by Hewlett Packard. It is used in low-end PC printers. Low resolution is 200 dpi for Postscript and PDF, and 150 dpi for PCL. High resolution is 600 dpi for Postscript and PDF, and 300 dpi for PCL.

Table 10: Article retrieval by format type for March 1999, March 1998, and March 1997. PDF format and GIF Thumbnails were not yet available in March 1997

Article Type                            Number of Retrievals
                                        March 99    March 98    March 97
Postscript Level 1 (Low Resolution)          476         557         644
Postscript Level 2 (Low Resolution)       25 664      13 031      11 189
Postscript Level 2 (High Resolution)      10 472       8 291       6 435
PDF (Low Resolution)                       3 266         620         n/a
PDF (High Resolution)                      7 049       1 008         n/a
PCL (Low Resolution)                          14          73          72
PCL (High Resolution)                         53         111         132
GIF Thumbnails                            13 777       7 378         n/a


The majority of retrievals are of low resolution Level 2 Postscript files (see Table 10). This is the default setting in the ADS Article Service. The number of Postscript Level 1 articles retrieved (compatible with older printers, but with much larger file sizes) is low compared with Level 2 Postscript articles, and slowly declining. The number of PCL articles retrieved is even smaller and also declining. The number of PDF articles retrieved was slowly increasing throughout 1998. It has increased much more rapidly in 1999. In early 1998 less than 15% of the high resolution articles were retrieved as PDF files. This fraction increased to 40% by March 1999.
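The PDF share of high resolution retrievals can be recomputed from the numbers in Table 10. The sketch below is a simple check with variable names of our choosing; it treats Postscript Level 2, PDF, and PCL as the only high resolution formats, as listed in the table.

```python
# High resolution retrievals copied from Table 10.
high_resolution = {
    "March 99": {"Postscript Level 2": 10472, "PDF": 7049, "PCL": 53},
    "March 98": {"Postscript Level 2": 8291, "PDF": 1008, "PCL": 111},
}

# Fraction of high resolution retrievals delivered as PDF in each month.
pdf_fraction = {
    month: counts["PDF"] / sum(counts.values())
    for month, counts in high_resolution.items()
}

for month, fraction in pdf_fraction.items():
    print(month, f"{fraction:.1%}")
# → March 99 40.1%
# → March 98 10.7%
```

Both values agree with the statements in the text: below 15% in March 1998 and about 40% in March 1999.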



Copyright The European Southern Observatory (ESO)