Version 1.0 of the XML DTD describing text files in the ADS Abstract Service.
Document Type Definition for the ADS bibliographic records

Syntax policy
=============
- Element names are in uppercase to make them easier to read.
- Attribute names are preferably in lowercase.
- Attribute values are allowed to be of type CDATA to allow more
  flexibility for additional values; however, attributes typically
  may only assume one of a well-defined set of values.
- Cross-referencing among elements such as AU, AF, and EM is
  accomplished through the use of attributes of type IDREFS (for AU)
  and ID (for AF and EM).

<!-- BIBRECORD is the root element of the XML document.
     Attributes are:
       origin   mnemonic indicating the individual(s) or institution(s)
                who submitted the record to ADS
       lang     language in which the contents of this record are
                expressed; the possible values are language tags as
                defined in RFC 1766.
                Examples: lang="fr", lang="en"
-->
<!ELEMENT BIBRECORD ( METADATA?, TITLE?, AUTHORS?, AFFILIATIONS?, EMAILS?,
                      FOOTNOTES?, BIBCODE, MSTRING, MONOGRAPH?, SERIES?,
                      PAGE?, LPAGE?, COPYRIGHT?, PUBDATE, CATEGORIES*,
                      COMMENTS*, ANOTE?, BIBTYPE?, IDENTIFIERS?, ORIGINS,
                      OBJECTS*, KEYWORDS*, ABSTRACT* ) >
<!ATTLIST BIBRECORD
          origin  CDATA  #REQUIRED
          lang    CDATA  #IMPLIED >

<!-- Generic metadata about the ADS record (rather than the publication) -->
<!ELEMENT METADATA ( VERSION, CREATOR, CDATE, EDATE ) >

<!-- Versioning is introduced to allow parsers to detect and reject any
     documents not complying with the supported DTD -->
<!ELEMENT VERSION ( #PCDATA ) >

<!-- CREATOR is purely informative -->
<!ELEMENT CREATOR ( #PCDATA ) >

<!-- Creation date for the record, written as YYYY-MM-DD -->
<!ELEMENT CDATE ( #PCDATA ) >

<!-- Last modified date, written as YYYY-MM-DD -->
<!ELEMENT EDATE ( #PCDATA ) >

<!-- Title of the publication -->
<!ELEMENT TITLE ( #PCDATA ) >
<!ATTLIST TITLE
          lang  CDATA  #IMPLIED >

<!-- AUTHORS contains only AU subelements, each one of them corresponding
     to a single author name -->
<!ELEMENT AUTHORS ( AU+ ) >

<!-- AU contains at least the person's last name (LNAME), and possibly the first
     and middle name(s) (or just the initials), which are stored in element
     FNAME. PREF and SUFF represent the salutation and suffix for the name.
     SUFF typically is one of: Jr., Sr., II, III, IV. PREF is rarely used
     but is here for completeness. Typically we would store salutations such
     as "Rev." (for "Reverend") or "Prof." (for "Professor") in this element.
-->
<!ELEMENT AU ( PREF?, FNAME?, LNAME, SUFF? ) >

<!-- The attributes AF and EM are used to cross-reference author
     affiliations and email addresses with the individual author records
     (FN plays the same role for footnotes). These attribute names are an
     exception to the policy of lowercase attribute names.
     The typical use of this is:
       <AU AF="AF_1 AF_2" EM="EM_3">...</AU>
-->
<!ATTLIST AU
          AF  IDREFS  #IMPLIED
          EM  IDREFS  #IMPLIED
          FN  IDREFS  #IMPLIED >

<!-- AU subelements -->
<!ELEMENT PREF  ( #PCDATA ) >
<!ELEMENT FNAME ( #PCDATA ) >
<!ELEMENT LNAME ( #PCDATA ) >
<!ELEMENT SUFF  ( #PCDATA ) >

<!-- AFFILIATIONS is the wrapper element for the individual affiliation
     records, each represented as an AF element -->
<!ELEMENT AFFILIATIONS ( AF+ ) >
<!ELEMENT AF ( #PCDATA ) >
<!-- The value of the ident attribute should match one of the values
     assumed by the AF attribute in an AU element -->
<!ATTLIST AF
          ident  ID  #REQUIRED >

<!ELEMENT EMAILS ( EM+ ) >
<!ELEMENT EM ( #PCDATA ) >
<!-- The value of the ident attribute should match one of the values
     assumed by the EM attribute in an AU element -->
<!ATTLIST EM
          ident  ID  #REQUIRED >

<!-- FOOTNOTES and FN subelements are here for future use -->
<!ELEMENT FOOTNOTES ( FN+ ) >
<!ELEMENT FN ( #PCDATA ) >
<!ATTLIST FN
          ident  ID  #REQUIRED >

<!-- BIBCODE; for a definition, see:
       http://adsdoc.harvard.edu/abs_doc/bib_help.html
       http://adsabs.harvard.edu/cgi-bin/nph-bib_query?1995ioda.book..259S
       http://adsabs.harvard.edu/cgi-bin/nph-bib_query?1995VA.....39R.272S
     This identifier logically belongs to the IDENTIFIERS element, but
     since it is the identifier used internally in the system, it is
     important to have it in a prominent and easy to reach place.
-->
<!ELEMENT BIBCODE ( #PCDATA ) >

<!-- MSTRING is the unformatted string for the monograph
     (article, book, whatever).
     Example:
       <MSTRING>The Astrophysical Journal, Vol. 526, n. 2, pp. L89-L92</MSTRING>
-->
<!ELEMENT MSTRING ( #PCDATA ) >

<!-- MONOGRAPH is a structured record containing the fielded information
     about the monograph where the bibliographic entry appeared.
     Typically this is created by parsing the text in the MSTRING element.
     Example:
       <MTITLE>The Astrophysical Journal</MTITLE>
       <VOLUME>526</VOLUME>
       <ISSUE>2</ISSUE>
       <PUBLISHER>University of Chicago Press</PUBLISHER>
-->
<!ELEMENT MONOGRAPH ( MTITLE, VOLUME?, ISSUE?, MNOTE?, EDITORS?, EDITION?,
                      PUBLISHER?, LOCATION?, MID* ) >

<!-- Monograph title (e.g. "Astrophysical Journal") -->
<!ELEMENT MTITLE ( #PCDATA ) >
<!ELEMENT VOLUME ( #PCDATA ) >
<!ATTLIST VOLUME
          type  NMTOKEN  #IMPLIED >
<!ELEMENT ISSUE ( #PCDATA ) >

<!-- A note about the monograph as supplied by the publisher or editor -->
<!ELEMENT MNOTE ( #PCDATA ) >

<!-- List of editor names as extracted from MSTRING.
     Formatting is as for AUTHORS and AU elements -->
<!ELEMENT EDITORS ( ED+ ) >
<!ELEMENT ED ( PREF?, FNAME?, LNAME, SUFF? ) >

<!-- Edition of publication -->
<!ELEMENT EDITION ( #PCDATA ) >

<!-- Name of publisher -->
<!ELEMENT PUBLISHER ( #PCDATA ) >

<!-- Place of publication -->
<!ELEMENT LOCATION ( #PCDATA ) >

<!-- MID represents the monograph identification as supplied by the
     publisher. This may be useful in correlating our record with the
     publisher's online offerings. The "type" attribute characterizes
     the system used to express the identifier -->
<!ELEMENT MID ( #PCDATA ) >
<!ATTLIST MID
          type  NMTOKEN  #IMPLIED >

<!-- If the bibliographic entry appeared in a series, then the element
     SERIES contains information about the series itself. Typically this
     consists of data about a conference series (e.g. ASP Conference
     Series). Note that there may be several SERIES elements, since some
     publications belong to "subseries" within a series.
-->
<!ELEMENT SERIES ( SERTITLE, SERVOL?, SEREDITORS?, SERBIBCODE? ) >

<!-- Title, volume, and editors of conference series -->
<!ELEMENT SERTITLE   ( #PCDATA ) >
<!ELEMENT SERVOL     ( #PCDATA ) >
<!ELEMENT SEREDITORS ( ED+ ) >

<!-- Serial bibcode for publication (may coincide with main bibcode) -->
<!ELEMENT SERBIBCODE ( #PCDATA ) >

<!-- PAGE may have the attribute type set to "s" (for "sequential") when
     the value associated with it does not represent a printed page
     number -->
<!ELEMENT PAGE ( #PCDATA ) >
<!ATTLIST PAGE
          type  NMTOKEN  #IMPLIED >

<!-- LPAGE gives the last page number (if known).
     It does not make sense if PAGE has type="s" -->
<!ELEMENT LPAGE ( #PCDATA ) >

<!-- COPYRIGHT is just an unformatted string containing copyright
     information from the publisher -->
<!ELEMENT COPYRIGHT ( #PCDATA ) >

<!ELEMENT PUBDATE ( YEAR, MONTH? ) >
<!ELEMENT MONTH ( #PCDATA ) >
<!ELEMENT YEAR  ( #PCDATA ) >

<!-- CATEGORIES contains subelements indicating the subject categories
     to which the publication was assigned. STI/RECON has always assigned
     a category for each entry in their system, but otherwise there is
     little else in our database. The attributes origin and system are
     used to keep track of the different classifications used. -->
<!ELEMENT CATEGORIES ( CA+ ) >
<!ATTLIST CATEGORIES
          origin  NMTOKEN  #IMPLIED
          system  NMTOKEN  #IMPLIED >
<!ELEMENT CA ( #PCDATA ) >

<!-- Typically private fields supplied by the data source. For instance,
     SIMBAD and LOC provide comments about bibliographic entries -->
<!ELEMENT COMMENTS ( CO+ ) >
<!ATTLIST COMMENTS
          lang    CDATA    #IMPLIED
          origin  NMTOKEN  #IMPLIED >
<!ELEMENT CO ( #PCDATA ) >

<!-- Author note -->
<!ELEMENT ANOTE ( #PCDATA ) >

<!-- BIBTYPE describes what type of publication this entry corresponds to.
     This is currently limited to the following tokens (taken straight
     from the BibTeX classification):
       article
       book
       booklet
       inbook
       incollection
       inproceedings
       manual
       masterthesis
       misc
       phdthesis
       proceedings
       techreport
       unpublished
-->
<!ELEMENT BIBTYPE ( #PCDATA ) >

<!-- List of all known identifiers for this publication -->
<!ELEMENT IDENTIFIERS ( ID+ ) >

<!-- The content of an ID element is the identifier used by a particular
     publisher or institution.
     Examples:
       <ID origin="UCP" type="PUBID">38426</ID>
       <ID origin="STI" type="ACCNO">A90-12345</ID>
-->
<!ELEMENT ID ( #PCDATA ) >
<!ATTLIST ID
          origin  NMTOKEN  #IMPLIED
          type    NMTOKEN  #REQUIRED >

<!-- The collective list of institutions that have given us a record
     about this entry. -->
<!ELEMENT ORIGINS ( OR+ ) >
<!ELEMENT OR ( #PCDATA ) >

<!-- The list of objects associated with the publication -->
<!ELEMENT OBJECTS ( OB+ ) >
<!ELEMENT OB ( #PCDATA ) >

<!-- Keywords assigned to the publication -->
<!ELEMENT KEYWORDS ( KW+ ) >
<!ATTLIST KEYWORDS
          lang    CDATA    #IMPLIED
          origin  NMTOKEN  #IMPLIED
          system  NMTOKEN  #REQUIRED >
<!ELEMENT KW ( #PCDATA ) >

<!-- An abstract of the publication. This is typically provided to us by
     the publisher, but may in some cases come from other sources (e.g.
     STI, which keyed abstracts in most cases). Therefore we allow several
     ABSTRACT elements within each record, each with a separate origin or
     language. The attribute type is used to keep track of how the
     abstract data was generated. For instance, abstract text generated
     by our OCR software will have:
       origin="ADS" type="OCR" lang="en"
-->
<!ELEMENT ABSTRACT ( P+ ) >
<!ATTLIST ABSTRACT
          origin  NMTOKEN  #IMPLIED
          type    NMTOKEN  #IMPLIED
          lang    CDATA    #IMPLIED >

<!-- Abstracts are composed of separate paragraphs which have mixed
     contents as listed below.
     All the subelements listed below have the familiar HTML meaning and
     are used to render the abstract text in a decent way -->
<!ELEMENT P ( #PCDATA | A | BR | PRE | SUP | SUB )* >

<!-- Line breaks (BR) and preformatted text (PRE) make it possible to
     display tables and other preformatted text. -->
<!ELEMENT BR EMPTY >
<!ELEMENT PRE ( #PCDATA | A | BR | SUP | SUB )* >

<!-- A is the familiar anchor element. -->
<!ELEMENT A ( #PCDATA | BR | SUP | SUB )* >
<!ATTLIST A
          HREF  CDATA  #REQUIRED >

<!-- SUP and SUB are superscripts and subscripts. In our content model,
     they are allowed to contain additional SUP and SUB elements,
     although we may decide to restrict them to PCDATA at some point -->
<!ELEMENT SUP ( #PCDATA | A | BR | SUP | SUB )* >
<!ELEMENT SUB ( #PCDATA | A | BR | SUP | SUB )* >
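
As an illustration of how the declarations fit together, the following is a minimal sketch of a record instance. Every value in it is made up for the example: the origin mnemonic "EXAMPLE", the names, the affiliations, the email address, and the bibcode are hypothetical and do not describe a real ADS entry. The ident values follow the AF_n / EM_n pattern shown in the AU comment. Child elements appear in the order required by the BIBRECORD content model, and only the required elements plus a few optional ones are included.

```xml
<?xml version="1.0"?>
<!-- Hypothetical record; all values are invented for illustration -->
<BIBRECORD origin="EXAMPLE" lang="en">
  <TITLE>A hypothetical example title</TITLE>
  <AUTHORS>
    <!-- AF and EM hold IDREFS pointing at the ident values below -->
    <AU AF="AF_1" EM="EM_1">
      <FNAME>Jane Q.</FNAME>
      <LNAME>Doe</LNAME>
    </AU>
    <AU AF="AF_1 AF_2">
      <FNAME>J.</FNAME>
      <LNAME>Roe</LNAME>
      <SUFF>Jr.</SUFF>
    </AU>
  </AUTHORS>
  <AFFILIATIONS>
    <AF ident="AF_1">Example Observatory</AF>
    <AF ident="AF_2">Example University</AF>
  </AFFILIATIONS>
  <EMAILS>
    <EM ident="EM_1">jane.doe@example.org</EM>
  </EMAILS>
  <BIBCODE>2000Examp...1....1D</BIBCODE>
  <MSTRING>Example Journal, Vol. 1, n. 1, pp. 1-4</MSTRING>
  <MONOGRAPH>
    <MTITLE>Example Journal</MTITLE>
    <VOLUME>1</VOLUME>
    <ISSUE>1</ISSUE>
  </MONOGRAPH>
  <PAGE>1</PAGE>
  <LPAGE>4</LPAGE>
  <PUBDATE>
    <YEAR>2000</YEAR>
  </PUBDATE>
  <BIBTYPE>article</BIBTYPE>
  <ORIGINS>
    <OR>EXAMPLE</OR>
  </ORIGINS>
  <KEYWORDS system="EXAMPLE">
    <KW>example keyword</KW>
  </KEYWORDS>
  <ABSTRACT origin="EXAMPLE">
    <P>Mixed content with a superscript: 10<SUP>3</SUP>.</P>
  </ABSTRACT>
</BIBRECORD>
```

Because AF, EM, and FN on AU are of type IDREFS, a validating parser should reject the record if an AU refers to an ident value that no AF, EM, or FN element in the same document declares.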
Copyright The European Southern Observatory (ESO)