6 How the astronomical literature is used

Electronic libraries, because they provide access to the literature on an article basis, can provide direct measures of the use of individual articles. Direct bibliometric studies of article use are rare and tend to be based on small samples (e.g. Tsay 1998); most bibliometric studies use indirect measures, particularly citation histories (e.g. Garfield 1979; White & McCain 1989; Line 1993), as proxies for use.

Astronomy is perhaps unique in that it already has an integrated electronic information resource (ADS/Urania) which includes electronic access to nearly all the modern journal literature, and which is used by a large fraction of practitioners in the field, worldwide. The combined Urania logs, including the electronic journals and the ADS, probably represent a fair sample of total readership in the field, and perhaps even a majority of that readership.

In this section we investigate the use of the astronomy literature as shown by the ADS logs; for articles more than a few months past the publication date these logs probably represent the use of the astronomy literature accurately. For articles immediately after publication the logs of the electronic journals are the definitive source, and that usage pattern is substantially different from the pattern shown in the ADS logs. For example, the half-life for article reads for the electronic Astrophysical Journal is measured in days (E. Owens 1997, personal communication).

6.1 Readership as a function of age

The ADS logs provide a direct measure of the readership of individual articles. There are several different ADS logs; here we use the "data" log. Entries in the data log correspond to individual data items selected from a list which is returned following a query, such as shown in Fig. 2. Each entry is the result of a user, who can see the authors and title of a paper, choosing to get more information. 61% of these requests are for the abstract, 34% are for the whole text, and 2% are for the citation histories; the remainder are for several other options. SEARCH lists all the options and their use. In what follows we will refer to any request for data as a "read." By "age" we refer to the time since publication of an article, NOT the time since birth of the astronomer reading the article!

In this subsection we restrict the study to the January 1999 log, and to requests for information about articles published in the eight largest journals in terms of ADS use (ApJ, ApJL, ApJS, A&A, A&AS, MNRAS, AJ, PASP; hereafter the Big8). The Big8 represent 62% of the 270 000 entries in the January data log.

Figure 10 shows the number of ADS reads (solid line, left ordinate) during January 1999 for articles published in the Big8 from 1976 to 1998, and the number of Big8 articles for which at least one data item was requested (dotted line, right ordinate), on a log-linear plot, binned yearly. The ADS database is 100% complete in titles, and in links to the full text of articles (either to the ADS scans, or directly to the electronic journals), and is 99% complete in article abstracts for the Big8 journal articles published during this 22 year period.

$\begin{figure} \resizebox{\hsize}{!}{\includegraphics{DS1780F10.eps}}\end{figure}$

Figure 10: The use of journal articles via the ADS as a function of age. The abscissa is the publication year. The solid line (left ordinate) shows the total number of reads, the dotted line (right ordinate) shows the total number of different articles for which data was requested

The number of papers published in the Big8 has been increasing at about 4% per year during this 22 year period (Schulman et al. 1997; Abt 1998; Fig. 11). Figure 12 shows the information in Fig. 10 divided by the number of papers published. The top line shows the mean number of reads per paper, and the bottom line shows the fraction (maximum 1) of papers published for which information was requested.

$\begin{figure} \resizebox{\hsize}{!}{\includegraphics{DS1780F11.eps}}\end{figure}$

Figure 11: The number of Big8 journal articles published per year. The dotted line represents a 3.7% yearly increase

From 1976 to about 1994 the two lines are nearly parallel; this demonstrates that the change in readership with age is caused mainly by a change in the fraction of papers which are considered interesting enough to be read, not by a change in the number of times an interesting paper is read. Extrapolating the relation seen in the earliest 16 years of Fig. 12, we find that the fraction of articles interesting enough to be read is $I = I_0{\rm e}^{-0.075T}$, where T is the age of the article in years and I₀ is about 0.7. Similarly, readership declines as $\sim {\rm e}^{-0.09T}$, so the mean number of reads per relevant article is $M = M_0{\rm e}^{-0.015T}$, with M₀ equal to 2.5 reads per month. For articles between 4 and 22 years old the readership pattern is well fit by R = IM.
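As a check on the arithmetic, the two exponentials can be multiplied out explicitly. The block below only restates the numbers quoted in the text and is not an additional fit; the half-life symbol $T_{1/2}$ is introduced here purely for illustration.

```latex
\begin{eqnarray*}
R = IM &=& I_0 M_0\,{\rm e}^{-(0.075+0.015)T}
        = 0.7 \times 2.5\,{\rm e}^{-0.09T}
        = 1.75\,{\rm e}^{-0.09T}, \\
T_{1/2} &=& \frac{\ln 2}{0.09\ {\rm yr}^{-1}} \approx 7.7\ {\rm yr}.
\end{eqnarray*}
```

Here R is in reads per month per article published, and the 7.7 year half-life of the long-term component equals the citation half-life quoted in Sect. 6.2, where cites per article decline at the same rate.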

For articles younger than 4 years the extrapolation of the R = IM model substantially underestimates readership. While the fraction of papers read is only about 20% higher than the extrapolation (it could not be more than 30% higher, at which point all papers would be read), the mean number of reads per paper is 350% higher.

We postulate that there is another mode of readership, which dominates for articles between one month and four years old; we will call this "papers current enough to be read." If we subtract the R = IM model from the data we get the residual of papers current enough to be read. This can be well represented by $C = C_0{\rm e}^{-0.85T}$, where C₀ is equal to 5 reads per month. We now have a two component model for readership (per article published), valid for papers between one month and 22 years old: R = IM + C.

Figure 13 shows how well the model fits the actual readership data for January 1999. The solid line shows the difference between the log of the reads per paper published and the log of the model; the dotted lines show the $1 \sigma$ errors, estimated using $\sqrt {N}$. Clearly the model fits the data well.
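The residual and error band plotted in Fig. 13 can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state: that N is the number of reads in a yearly bin, and that the $\sqrt{N}$ counting error is propagated to the base-10 logarithm to first order. The function name and the sample counts are hypothetical.

```python
import math


def log_residual_with_error(reads, papers, model_reads_per_paper):
    """Return (residual, sigma) for one publication-year bin.

    residual: log10(observed reads per paper) - log10(model reads per paper)
    sigma:    first-order 1-sigma error on log10(reads), assuming the error
              on the read count is sqrt(reads) (an assumption of this sketch).
    """
    observed = reads / papers
    residual = math.log10(observed) - math.log10(model_reads_per_paper)
    # d(log10 N) = dN / (N ln 10), with dN = sqrt(N)
    sigma = 1.0 / (math.log(10) * math.sqrt(reads))
    return residual, sigma


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not taken from the ADS logs).
    residual, sigma = log_residual_with_error(
        reads=10000, papers=4000, model_reads_per_paper=2.5
    )
    print(f"residual = {residual:+.4f} dex, 1-sigma = {sigma:.4f} dex")
```

With this propagation the error band narrows as the number of reads in a bin grows, since the fractional counting error falls as one over the square root of the count.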

$\begin{figure} \resizebox{\hsize}{!}{\includegraphics{DS1780F12.eps}}\end{figure}$

Figure 12: The use of journal articles via ADS as a function of age. The abscissa is the publication year. The upper line shows the mean number of reads per paper, the lower line shows the fraction of different articles for which data was requested

While the R = IM + C model accounts for the vast majority of ADS use, there are at least two other modes of readership, which we will call "historical" and "new". The historical mode describes the use of very old articles, and the new mode describes the readership of the current issue of a journal.

$\begin{figure} \resizebox{\hsize}{!}{\includegraphics{DS1780F13.eps}}\vspace*{-2mm} \end{figure}$

Figure 13: Accuracy of the R = IM + C model, versus publication date. The ordinate is the difference between the log of the number of reads per article published using ADS during January 1999 and the log of the readership model described in the text. The dotted lines show $1 \sigma$ errors using $\sqrt {N}$

The ADS in January 1999 had only one journal which is complete to an early enough time to measure the historical mode, the Astronomical Journal, which is complete from volume 1 in 1849. The data currently available (shown in Fig. 14) suggest a constant low level of use, independent of time, H = H₀, where H₀ is 0.025 reads per month. With the database now being extended to include much of the literature of the past two centuries, this parameterization should improve greatly in the next couple of years.

The new mode represents the readership of the latest issue of a journal. As soon as a journal is issued, either received in the mail or posted electronically, a large number of astronomers scan the table of contents and read the articles of interest. Although ADS has a feature in the Table of Contents page which supports this type of readership, it does not represent a substantial fraction of ADS use. We believe most users do this either with the paper copy, or through the electronic journals directly. We can crudely estimate this mode in the ADS use by examining the daily usage logs following the release of new issues of the Astrophysical Journal. After subtracting the other modes already described we find $N = N_0{\rm e}^{-16T}$, where N₀ is about 3.5 reads per month. For an accurate description of this mode one would need to analyze the logs of the electronic journals.

Finally we have a four component model for how the astronomical literature is read, as a function of the age of an article, R = N + C + IM + H, where the first three terms are exponentials with very different time constants, and the fourth is a low level constant. ADS use certainly underestimates the amplitude of the N term, and may underestimate the amplitude of the C term, as there are alternative electronic routes to some of these data.
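The four component model can be evaluated numerically with the parameters quoted in this subsection. The following is a minimal sketch, assuming that T is in years for every term and that all four amplitudes are in reads per month per article published; the function and constant names are illustrative and do not come from the ADS software.

```python
import math

# Amplitudes (reads per month) and decay rates (per year) quoted in the text.
I0, I_RATE = 0.7, 0.075    # fraction of articles interesting enough to be read
M0, M_RATE = 2.5, 0.015    # mean reads per interesting article
C0, C_RATE = 5.0, 0.85     # "current" mode
N0, N_RATE = 3.5, 16.0     # "new" mode
H0 = 0.025                 # "historical" mode, constant with age


def readership_components(age_years):
    """Return the four model terms at a given article age, keyed by name."""
    return {
        "N": N0 * math.exp(-N_RATE * age_years),
        "C": C0 * math.exp(-C_RATE * age_years),
        "IM": I0 * M0 * math.exp(-(I_RATE + M_RATE) * age_years),
        "H": H0,
    }


def readership(age_years):
    """Total model readership R = N + C + IM + H."""
    return sum(readership_components(age_years).values())


if __name__ == "__main__":
    for age in (0.1, 1.0, 5.0, 20.0, 100.0):
        parts = readership_components(age)
        terms = ", ".join(f"{name}={value:.4f}" for name, value in parts.items())
        print(f"T = {age:6.1f} yr: R = {readership(age):.4f} ({terms})")
```

Since the text notes that ADS use underestimates the N term, and possibly the C term, the output for very young articles is best read as a lower bound on total readership.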

6.2 Comparison of readership with citation history

Citation histories have long been used to study the long-term readership of scientific papers (e.g. Burton & Kebler 1960), with the basic result that the number of citations that a paper receives declines exponentially with the age of the article. While it is often assumed that the pattern of use is similar to the pattern of citation, this has not been conclusively demonstrated. Recently the mean use half-life for a set of medical journals was found to be 3.4 years, while the mean citation half-life for the same journals was 6.3 years.

$\begin{figure} \resizebox{\hsize}{!}{\includegraphics{DS1780F22.eps}}\vspace*{-2mm} \end{figure}$

Figure 14: Readership of the Astronomical Journal. Total number of reads of AJ articles using ADS during January and February 1999, as a function of publication year

We will compare the use of some of the Big8 journals with their citation histories using two datasets: the ADS data logs for the period from 1 May 1998 to 31 July 1998, and the citation information provided to ADS by the Institute for Scientific Information, covering references in articles published during the first nine months of 1998, and only covering references from 1981 to date. ISI does not provide us with the full citation histories; rather they provide us with pairs of citing and cited journal articles where both are in the ADS database. The results will therefore systematically underrepresent the citation histories of articles with substantial influence in areas outside astronomy, or where the primary references come from conference proceedings.

Figure 15 compares the citation histories of the Big8 journals with their readership; the ordinate scale refers to the citation information (dotted lines), and the readership data (solid lines) have been arbitrarily shifted for comparison. The lower dotted line represents the fraction of Big8 journal articles which were cited during the first nine months of 1998; the upper dotted line represents the mean number of cites per article. The lower solid line shows the mean number of reads per article during the three month period May-July 1998, shifted by a factor of 19; the upper solid line shows the fraction of Big8 articles read, times 1.8.

$\begin{figure} \resizebox{\hsize}{!}{\includegraphics{DS1780F14.eps}}\vspace*{-3mm} \end{figure}$

Figure 15: The Big8 citation rates as a function of publication date compared with the readership rates. The dotted lines and the ordinate scale refer to the citation information. The top dotted line represents the number of citations per article for citations in papers published during the first nine months of 1998. The bottom dotted line represents the fraction of articles which were cited during this period. The bottom solid line shows the number of reads per article for the three month period May-July 1998, and the top solid line shows the fraction of articles read. Both solid lines are arbitrarily shifted to show the similarity of the functional shapes

The number of cites has the same functional form as the fraction of reads, and the fraction of cites has the same form as the number of reads. This result is perhaps surprising.

Except for the most recent year (1997), where the number of cites declined from the year before, the number of cites per article declines with age as $\sim {\rm e}^{-0.09T}$, that is, proportional to IM, the long term declining readership. The citation half-life for these articles, 7.7 years, is longer than the 4.9 years found for the Physical Review, but is consistent with the results of Abt (1981, 1996) of 20-30 year half-lives with no normalization, once one takes the increase in the number of astronomy papers/cites into account (Abt 1981, 1995).

The fraction of articles cited, on the other hand, appears to follow the same two component form as readership, R = IM + C. We postulate the following explanation for this behavior. We define the degree of citability, D, as the degree to which a paper would be cited, were it possible, and we postulate that this is directly proportional to readership: D = D₀R. The large increase in the fraction of recent papers cited is thus due to the large increase in readership. We define the ability of a paper to be cited, A, to be a steeply increasing function of age, simply because for one paper to cite another it must appear before the second paper is written, refereed, and published: $A = 1 - {\rm e}^{-1.5T}$. Our model for the mean number of citations a paper receives, Z, as a function of age is Z = Z₀AD, or Z = Z₀AD₀R.

Figure 16 shows the number of citations per paper as a function of age (thick solid line), the Z = Z₀AD₀R model using the actual number of reads per paper for R (thin solid line), and the Z = Z₀AD₀R model using the R = IM + C model for R (dotted line). The product of the constants Z₀D₀ is the number of citations per read; currently this is about 0.08.
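The citation model of Fig. 16 can be written as a short function. This is a sketch under stated assumptions: it reuses the R = IM + C parameters of Sect. 6.1 with T in years, and it does not reproduce the normalization between the three month read period and the nine month citation period, so the shape of Z(T) is more meaningful than its absolute scale. All function and constant names are illustrative.

```python
import math

CITES_PER_READ = 0.08  # product Z0 * D0 quoted in the text


def model_reads(age_years):
    """Two-component readership model R = IM + C from Sect. 6.1."""
    long_term = 0.7 * 2.5 * math.exp(-0.09 * age_years)  # IM
    current = 5.0 * math.exp(-0.85 * age_years)          # C
    return long_term + current


def ability_to_be_cited(age_years):
    """A = 1 - exp(-1.5 T): rises from 0 towards 1 as the paper ages."""
    return 1.0 - math.exp(-1.5 * age_years)


def model_citations(age_years, reads=None):
    """Z = 0.08 * R * A; pass measured reads to use data instead of the model."""
    r = model_reads(age_years) if reads is None else reads
    return CITES_PER_READ * r * ability_to_be_cited(age_years)


if __name__ == "__main__":
    for age in (0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 16.0):
        print(f"T = {age:5.2f} yr: A = {ability_to_be_cited(age):.3f}, "
              f"Z = {model_citations(age):.4f}")
```

The optional `reads` argument mirrors the two curves of Fig. 16: one driven by measured reads per paper, the other by the R = IM + C model.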

$\begin{figure}\resizebox{\hsize}{!}{\includegraphics{DS1780F15.eps}}\end{figure}$

Figure 16: The number of citations per article versus the $Z = 0.08R(1 - {\rm e}^{-1.5T})$ model. The thick solid line represents the number of citations per article from papers published in the first nine months of 1998, as a function of publication date. The thin solid line represents the model, where R is the actual readership data; the dotted line represents the model, where R is the R = IM + C model

The papers which are frequently cited tend also to be frequently read, although the correlation is not very strong. We rank the papers by number of cites/reads during the 1998 periods, and perform a Spearman rank correlation between the 26988 different Big8 papers cited and the 53755 papers read (57340 total); we obtain $r_{\rm Spearman} = 0.35$. This underestimates the correlation because it excludes papers which were neither cited nor read.
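For readers unfamiliar with the statistic, a Spearman rank correlation is the Pearson correlation of the ranks of the two quantities. The sketch below is a generic pure-Python version with tied values given their mean rank; it is not the code used for the result above, and the per-paper counts in the usage example are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n (ascending), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the tie-averaged ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical per-paper counts, for illustration only.
    cites = [0, 1, 1, 3, 7, 12]
    reads = [2, 1, 4, 3, 9, 30]
    print(f"r_Spearman = {spearman(cites, reads):.3f}")
```

Tie handling matters for this kind of data, because many papers share the same small number of cites or reads.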

Of the 66392 Big8 papers published between 1982 and 1997, 81% were read in the 3 month period using ADS, while 41% were cited during the 9 month period. The probability that a paper was not read declined sharply with the number of times it was cited. Figure 17 shows this; one paper each of the (324, 224, 126) papers which were cited (7, 8, 9) times went unread during the period; none of the 430 papers which were cited 10 or more times went unread.

$\begin{figure}\resizebox{\hsize}{!}{\includegraphics{DS1780F16.eps}}\end{figure}$

Figure 17: Fraction of Big8 papers unread during a 3 month period in 1998, as a function of the number of times the papers were cited during a nine month period in 1998

The relations between the number of cites or reads of a paper and the rank that paper has when ranked by number of cites/reads are identical. If one takes papers published in a single year, both cites and reads follow a power law $n \sim r^{-\alpha}$ (n is the number of reads or cites, and r is the rank of the paper with that many reads/cites), where $\alpha = {1 \over 2}$; this is the same result found for citation histories for the physics literature. If papers from all years are taken together and ranked, the power law index flattens identically for both cites and reads to $\alpha = {1 \over 3}$.
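One way to estimate such a power law index from a list of per-paper counts is a straight-line fit in log-log space. The text does not say how the index was measured, so the least-squares approach below is an assumption of this sketch, and the data in the usage example are synthetic.

```python
import math


def fit_power_law_index(counts):
    """Estimate alpha in n ~ r**(-alpha) by least squares in log-log space.

    counts: reads or cites per paper; zero counts are dropped and the
    remaining papers are ranked so that rank 1 has the largest count.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(n) for n in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


if __name__ == "__main__":
    # Synthetic counts following n = 100 r^(-1/2) exactly, for illustration.
    synthetic = [100.0 * rank ** -0.5 for rank in range(1, 201)]
    print(f"alpha = {fit_power_law_index(synthetic):.3f}")
```

With real integer counts the low-count tail is heavily tied, so a fit of this kind is usually restricted to the higher-ranked papers; that restriction is left out here for brevity.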

6.3 How the journals are used

6.3.1 The main journals

Figure 18 shows the fraction of articles published in the Big8 by each of the five main journals, leaving out the letters and supplements. We show the data only for articles published from 1983 to 1995. Before 1983 the data from ISI are less complete, and after 1995 the presence of the electronic journals, and the differing rules for the distribution of the ADS bitmaps, make the meaning of a "read" differ from journal to journal. The reads and cites data for Figs. 18, 19, and 20 come from the same 1998 reporting periods described above.

$\begin{figure}\resizebox{\hsize}{!}{\includegraphics{DS1780F17.eps}}\end{figure}$

Figure 18: Fraction of Big8 papers published by five selected journals. The top line (thick, solid) is ApJ, below that (dotted) is A&A, in the middle (dashed) is MNRAS, second from the bottom (thin, solid) is AJ, and the lowest line (thick, dotted) represents PASP

$\begin{figure}\resizebox{\hsize}{!}{\includegraphics{DS1780F18.eps}}\end{figure}$

Figure 19: Readership rates for five journals. Linetypes are as in Fig. 18. The lines represent the ratio of the fraction of reads of articles in a given journal to the fraction of articles that journal published. Note that the large spike for PASP in 1987 is due to a single very well read paper, Stetson (1987), combined with fluctuations in the number of conference proceeding abstracts published in the journal

$\begin{figure}\resizebox{\hsize}{!}{\includegraphics{DS1780F19.eps}}\end{figure}$

Figure 20: Citation rates for five journals. Linetypes are as in Fig. 18. The lines represent the ratio of the fraction of cites of articles in a given journal to the fraction of reads of articles in that journal. Note that the large spike for PASP in 1987 is again due to a single very well cited paper, Stetson (1987)

Figure 19 shows the relative readership of papers as a function of journal and publication year. The ordinate is the ratio of the fraction of Big8 papers read to the fraction of Big8 papers published. Were all papers read equally frequently, independent of the journal in which they were published, Fig. 19 would show five straight lines at one; it does not. The papers from the AJ are read more on a per article basis than those from the other journals; the papers from A&A are read less. Recent PASP papers are read substantially more frequently than older ones, when compared with the readership patterns of the other journals.

Figure 20 shows the ratio of the fraction of citations an article received to the fraction of reads, as a function of journal and year. Were all articles cited in the same proportion to the number of times they were read (this is the constant Z₀D₀ in Sect. 6.2) then the figure would be five straight lines at one. The three bi- and tri-monthly journals do not show much deviation from straight lines at one, while the AJ appears to be systematically less cited than it is read. The PASP again shows an increase during the beginning of this decade.
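Both ratios plotted in Figs. 19 and 20 have the same structure: one Big8 fraction divided by another, for a given journal and publication year. A minimal sketch follows, with a hypothetical function name and hypothetical counts; it is not taken from the ADS analysis code.

```python
def relative_rate(journal_numerator, total_numerator,
                  journal_denominator, total_denominator):
    """Ratio of two Big8 fractions for one journal and one publication year.

    Fig. 19: numerator = reads, denominator = papers published.
    Fig. 20: numerator = cites, denominator = reads.
    A value of 1 means the journal behaves like the Big8 average.
    """
    numerator_fraction = journal_numerator / total_numerator
    denominator_fraction = journal_denominator / total_denominator
    return numerator_fraction / denominator_fraction


if __name__ == "__main__":
    # Hypothetical counts for one journal and one year, for illustration only.
    reads_ratio = relative_rate(journal_numerator=1200, total_numerator=10000,
                                journal_denominator=400, total_denominator=4000)
    print(f"reads per paper relative to the Big8: {reads_ratio:.2f}")
```

Multiplying the Fig. 19 ratio by the Fig. 20 ratio cancels the reads fraction and leaves the cites fraction divided by the papers fraction, that is, the relative number of cites per article.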

Recall that the readership and citation information are from hundreds of thousands of individual decisions made by more than 10 000 astronomers during 1998. Taken together, Figs. 18, 19, and 20 show the current opinion of astronomers as to the usefulness of articles as a function of journal and publication date. The growth of the AJ, for example, from 6.5% of Big8 articles to 9.5%, has not greatly affected the relative readership or citation rates for the journal.

The recent history of the PASP is perhaps the most interesting feature in Figs. 18, 19, and 20. From 1983 to 1995 the fraction of Big8 papers published by PASP declined from 6% to 3%. This decline is overstated, as PASP published some conference proceeding abstracts during the late 1980s, a practice which ended in 1991; the decline is nevertheless real: PASP published the same number of papers in 1995 as 19 years before, during which time the number of Big8 journal articles doubled.

Figure 19 shows two main features: fluctuations and a slow rise. The large fluctuations during the late 80s and early 90s are due to two factors: fluctuations in the number of conference proceeding papers and abstracts, and the influence of Stetson (1987), which was read at twice the rate of the next most read paper from 1987, and four times the rate of the next most read PASP paper from that year. The rise in the readership measure during the 1990s is not caused by any known systematic effect; we believe it represents a real increase in the perceived usefulness of the journal.

Figure 20 also shows the influence of Stetson (1987), currently the third most cited article in the ADS database, although now without the addition of the fluctuations in article counts. It also shows the rise in the perceived usefulness per article (this time in the measure of cites per read). Noting that the number of cites per article is the product of Figs. 19 and 20, the rise in the number of cites per article, compared with the Big8, over the period 1989 to 1995 is a factor of three, so that the journal is now at full parity with the Big8. This demonstrates that the policy during this period was one of quality rather than quantity, a policy we dub "shaken, not stirred".

6.3.2 Loss of relative currency

All Big8 astronomical journals lose currency, the current usefulness of an article, at a rate described by the readership and citation models of Sects. 6.1 and 6.2. Any changes in the loss of currency of one journal with respect to the rest of the Big8 should be seen in Fig. 19 in the form of a relative decrease in readership, as a function of age. Indeed the changes in the PASP which we have attributed to changes in editorial policy could simply be a substantial loss of relative currency.

One of the Big8 journals, the Astrophysical Journal Letters, is intended to lose currency more rapidly than the other journals. Figure 21 shows the relative fraction of articles published (thin solid), articles read (thick solid), and articles cited (dotted) for the ApJL from 1981 to 1997. Except for the period from 1994 to 1997 the curves track each other reasonably well; older ApJL papers are not cited or read any more or less than the Big8 average. For the more recent papers the cites and reads increase above the fraction published, implying that the journal is in some sense more current than average.

$\begin{figure}\resizebox{\hsize}{!}{\includegraphics{DS1780F20.eps}}\end{figure}$

Figure 21: Use of the Astrophysical Journal Letters from 1981 to 1997. The thin solid line shows the fraction of Big8 papers published in ApJL, the thick solid line the fraction of reads, and the dotted line the fraction of cites

In terms of readership this result is strongly affected by a systematic effect. During the 3 month period in 1998, most of the 1996 and all of the 1997 issues of MNRAS were not available electronically due to copyright constraints. This dramatically lowered the relative readership of that journal, pushing all the others up. Also, all five journals which were fully electronic during 1997 show increases compared with AJ and PASP, which were only available as bitmaps. Thus the increase in readership of the ApJL, the pioneer electronic journal (Boyce 1995), could be due to its superior delivery system, rather than its content.

6.3.3 Local differences in readership rates

Astronomers in different parts of the world read different journals at different rates than the average. Figure 22 shows three typical differences. The three curves show the ratio of readership fractions for a particular subset when compared with the rest of the world; a value of 1 means that there is no difference in relative readership. The thin solid line shows the MNRAS readership ratio for users who access the US site and have IP addresses ending in .uk; it shows that the British read Monthly Notices about 60% more than the world average.

$\begin{figure}\resizebox{\hsize}{!}{\includegraphics{DS1780F21.eps}}\end{figure}$

Figure 22: Local differences in readership rates for three journals. The thin solid line shows the increased use of MNRAS in the UK compared with the rest of the world; the dotted line shows this for A&A in Europe, and the thick solid line for AJ in the US

The dotted line shows the A&A readership ratio for users of the Strasbourg mirror, and the thick solid line shows the AJ readership ratio for US users with an IP address ending in .edu. They show that Europeans read A&A, and Americans read the AJ, about 20% more than the rest of the world. The ApJ also shows about a 20% increase in the US; the PASJ shows a 300% increase in Japan.

6.3.4 Use of historical literature

The ADS is in the process of putting a large fraction of the astronomical literature of the past two centuries on-line via bitmapped scans. The first nineteenth century journal to be fully on-line is the Astronomical Journal, which was first fully on-line on 1 January 1999. Figure 14 shows the raw readership figures for the first two months of 1999 (US logs only); this is the current readership of 150 years of the journal.

Clearly the back issues are being read; the only year where the journal was published, but no paper was read in the two months, was 1909, where only 12 papers were published. Also there is a break in the exponential falloff with age for articles published between 1950 and 1960, where approximately twice the expected readership occurred. During this period 94 different users read 283 articles; the biggest user made 13 reads. We have no explanation for this increased use. The only other period where the use is not predicted by the C + IM + H model of Sect. 6.1 is the first decade of the journal's existence, perhaps due to curiosity.