American Journal of Primatology 29:125-143 (1993)

TECHNICAL NOTE

Basic Data Standards for Primate Colonies

BENNETT DYKE
Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio, Texas

A set of simple standards has been developed to ease the task of transferring primate colony data between institutions and computers, and to assure that colony vital statistics are appropriate for demographic and genetic analyses. The standards have been designed to be easily and inexpensively implemented from existing databases, and also may serve as a guide to the setup of new colony record-keeping systems. © 1993 Wiley-Liss, Inc.

Key words: vital statistics, record-keeping, computers

INTRODUCTION

The recommendations for data standardization reported here arose initially from discussions of problems encountered in collecting vital statistics data for demographic studies of several different primate colonies. Most colonies keep similar information, but differences in format, coding, and accessibility can present serious obstacles to data sharing. These problems are not so apparent at the individual colony level, although the lack of standardization tends to be disconcerting to managers who are starting or computerizing their records for the first time. Data standardization does become a significant issue, however, when it is necessary to combine vital statistics and pedigree data across colonies. As ever greater proportions of the nation’s research animals are supplied by breeding colonies, studies which attempt to compare colonies at the population level, or which require data aggregated across colonies for species-specific demographic and genetic analyses, are becoming increasingly important.

Objectives in developing data standards were:
1. To specify the minimum amount of data required to construct pedigrees and perform detailed demographic analysis,
2. To make it as easy and cost-effective as possible for colony data managers to assemble information already on hand into a consistent format without programming,
3. To give software developers reasonable assurances that the data required for their analyses would always be present, and would be in a form that requires little or no preparation before use, and
4. To facilitate data transfer, and also to provide a set of guidelines for those who wish to set up new colony record-keeping systems that will conform in a general way to established databases.

Received for publication January 12, 1992; revision accepted September 26, 1992.
Address reprint requests to Bennett Dyke, Dept. of Genetics, Southwest Foundation for Biomedical Research, P.O. Box 28147, San Antonio, TX 78228.

The intent was not to persuade managers of successful database systems to abandon their own practices, but rather to organize a set of simple conventions that would provide common grounds for data exchange and analysis. In fact, the work reported here is the result of the active collaborative effort of a number of individuals (listed in the “Acknowledgments”) who have had years of experience with established colony databases, and of the willingness on the part of a number of colony managers to share their vital statistics data and to experiment with several versions of the standard as it has evolved.

Standards work effectively only when they are agreed upon and shared by a substantial majority of users. Nonetheless, it should be understood that adoption of the data standards presented here is intended to be entirely voluntary. We hope, however, that they will be recognized as useful, simple, and convenient enough to warrant use by those who need to transfer data between databases or computers, and we hope that they can serve as a guide in the planning of new databases.
Centralization of data (for example, a national registry of captive primates) requires adoption of a common format. However, the reverse is not true: A common format does not require centralization. As is the case with voluntary adoption of standards, the responsibility for ownership and protection of data should remain in the hands of the individual colonies, and any arrangements for sharing data must be worked out on an individual basis. In other words, adoption of data standards need not change the way that data are managed, or that collaborations are established, other than to increase the convenience of interactions.

BACKGROUND

Data management problems fall into two related classes, one of which is largely a computer science question, involving the choice of suitable hardware and database software. The other, which is (or should be) strictly a management issue, concerns the choice of data to be stored, and the form it should take.

Computers and Database Software

Paper records are still used by a few small colonies, but record-keeping for the most part has been computerized over the past 10 to 15 years. Computerized databases were first developed by the larger institutions that could afford the costly investment in early technology. Typically, data management software was developed in-house, and often was tied to the architecture and programming conventions characteristic of a particular brand of computer. File storage was usually limited and expensive, which led to the use of complex data coding schemes that emphasized “packing” of information into the smallest possible space. As a result, data management had to be left in the hands of highly specialized technical personnel.
Data retrieval frequently required knowledge of programming techniques; decisions about additions and changes to databases and modifications of data codes were often major considerations; and a changeover to a new computer (even a new model of the same brand) was likely to be seriously disruptive.

Today the situation has changed so that with proper choice of equipment and software, colony management personnel who do not have technical computer training may have easy access and control over their data, as well as considerable freedom to adjust the form and content of their databases quickly and easily. These improvements are ultimately the result of advances in performance of computer hardware. Desktop computers are now sufficiently powerful to manage large data sets using inexpensive database software that incorporates the same sophisticated data manipulation techniques found on large mainframe applications. Further, because these packages have been implemented for a broad, non-technical clientele, they have been made considerably easier to use than much of the older software. The dramatic decline in the cost of data storage has also eliminated the need for cryptic data coding, meaning that information can be stored in a form that is interpretable to the eye.

The net result of these improvements is that with many of the technical barriers removed, data management may now be largely in the hands of those who know the data best, and data requirements per se can be the sole determinant of issues in colony record-keeping. These advances make it possible for even the smallest colonies to enjoy the advantages of computer management of their data.

Animal Colony Data

A considerable amount of valuable discussion and work on primate record-keeping has been done in the past.
The single most influential source of information on data required for effective primate colony management is the 1979 Committee on Laboratory Animal Records (CLAR) publication Laboratory Animal Records [National Research Council, 1979]. The committee set out to define comprehensive specifications for a record-keeping system that would 1) identify basic information needed for local colony management, 2) provide basic data for evaluating breeding programs, 3) provide a uniform compilation of information that can accompany individual animals that are transferred between colonies, and 4) define a set of uniform record items so that information can be shared among institutions and used for national planning. Although the committee attempted to specify data entry forms, as well as some of the more technical aspects of computerized record-keeping, these details have not been as useful as the overall strategy of data organization proposed, or the extensive inventory of data items that should be considered for any well-managed colony.

Animal colony data fall into two classes, each of which requires a somewhat different approach to record-keeping. Single-entry data (termed “unchanging data” by the CLAR) apply to events or measures that arise only once during the lifetime of the individual animal. These include IDs of the individual and its two parents, dates of birth and death, etc. Multiple-entry data (“continually changing data”), on the other hand, are those which derive from events or measures that may be repeated throughout the lifetime of the individual. These include the information that makes up clinical, reproductive, developmental, and experimental histories, etc.
Although single-entry data would appear to be fundamental to colony management, and certainly are simpler to manipulate, much of the initial impetus for developing colony record-keeping systems has come from colony managers who have had a clear idea of their need for keeping track of an ever-increasing store of information about the clinical and reproductive histories of their animals, best stored as multiple-entry data. In many respects the relative importance of multiple-entry data persists today, primarily because its utility is immediately evident at the local colony level, whereas single-entry data tend to be most valuable when used in conjunction with comparable information aggregated over time or over multiple colonies. As a consequence of the past complexity of computer and software technology, most managers have quite understandably concentrated on solving their day-to-day database problems, rather than dealing with more far-reaching questions that rely heavily on external organization and cooperation. In fact, it has not been until recently that the last three objectives of the CLAR (evaluation of breeding programs, uniform individual data compilation, and sharing of information) could be implemented on the scale originally envisaged. As these objectives become more significant, issues of comparability and standardization arise that go beyond the scope of the CLAR report.

DATA STANDARDIZATION

Recent efforts to come to grips with issues of standardization have had some success. We report here an attempt initiated by an ad hoc committee formed at a workshop on data standardization held as part of the program of the 1987 meeting of the American Society of Primatologists (ASP) to define a set of standards that would permit inter-institutional sharing and aggregation of basic population registry (or census) data.
Standards for a Single-Entry Registry Record Structure

The initial motivation of the ad hoc committee for developing a standard format for basic colony registry data came from the recognition that 1) few summaries of colony demography have been published, and 2) those published statistics that exist are seldom truly comparable. The principal reasons for this state of affairs are a lack of agreement on which demographic measurements and analyses are most appropriate for primate colonies, and the fact that programs that perform demographic analysis of small populations are surprisingly complex. The situation is further complicated by dissimilarities in record-keeping conventions that make it difficult to adapt any one analysis program to a variety of databases. Some of these dissimilarities are:

1. Differences in file and record structure. No two database programs seem to use the same scheme to organize their data. Some require that the initial records in a data file contain information about the format of data records that follow; others keep this information in a separate file. Sometimes records are fixed in length; sometimes their length is variable, depending on the amount of information they contain.

2. Differences in data format. Even when the same variables are kept, it is uncommon for them to have the same length, or to be located in the same position on the record from one database to the next.

3. Differences in coding. For example, some databases use a Julian calendar form, in which dates are expressed as number of days since a fixed date in the past (often January 1, 1900), while others use one of the more common Gregorian forms. The latter form presents its own difficulties in that months may be expressed numerically or alphabetically (with full month names or their abbreviations), years may be represented by 2, 3, or 4 digits, and the order in which years, months, and days occur is variable. Likewise, some databases provide separate date fields for birth and entry into the colony, as well as death and exit dates, while others combine an entry date with an age at time of entry and an exit date with a special code defining the type of exit (natural death, experimental sacrifice, sale, etc.).

4. Differences in category. Most codes used to designate gender are limited to some simple three-character combination (M, F, U or 1, 2, 3, etc. for males, females, and unknown gender, respectively). However, some colonies find it useful to add codes that distinguish hermaphrodites, experimentally induced intersexes, castrated individuals of either sex, etc. Codes designating variables such as class of acquisition or cause of death are seldom the same in different databases.

5. Differences in variables. The same basic information may be kept in quite different ways. For example, some colonies maintain fertility histories as part of the information stored for individual animals, while others must reconstruct reproductive chronologies from offspring census records.

No one of these discrepancies is particularly difficult to correct, but writing analysis software that can accommodate the seemingly infinite variety of data structures is not practicable. Consequently, the ad hoc committee decided that the simplest course would be to develop standards for an intermediary file with a simple record format that contains a set of variables likely to be common to all record-keeping systems. This structure could be generated without the need to write programs by using the ASCII export/import capabilities common to all modern database software. Variables to be included in the standard were limited to the minimum required for basic demographic analysis and for the construction of extended pedigrees for population genetic analysis.
For the most part, these requirements are met by the CLAR report Basic Census File entry and disposition record definitions, and are thus widely available. Detailed fertility analysis may require multiple-entry information, but the committee felt that the initial attempt at standardization should be limited to a single-entry file. Practical decisions about specific format and coding were based on the conventions of the International Species Inventory System (ISIS) Animal Records Keeping System (ARKS) database [Scobie, 1987; Seal et al., 1977]. Further minor adjustments were made after checking these specifications against the formats of a number of colony databases.

Fourteen variables were selected and a set of format and coding rules was established. It was initially thought that most of these rules should be invariant, but we underestimated the difficulties that colony data managers would have in generating a standard registry file from existing data. To simplify this task, we modified our initial practice of requiring that the length of individual variables be fixed, by substituting a range of permissible widths for most fields. This has reduced the burden (and the error rate) at the local level, at the relatively minor cost of requiring analysis software to accommodate some variability in input formats. The structure of this record is shown in Table AI in the Appendix.

Currently, the standard defines several rules for generating and using the registry file. These are listed in increasing order of flexibility:

1. The standard registry file must be accompanied by a full description of its record structure. That is, each variable must be defined, its length in characters and its data type (character, numeric integer, date, etc.) must be specified, and the exact meaning of coded information must be given. The importance of this requirement cannot be overstated. Software developers and other users of the data can accommodate any amount of variability in format, but only if its extent is known precisely.

2. Data must be represented only by printable text (ASCII) characters.

3. Although record length is not fixed, all records in the file must be the same length.

4. No matter how many exit codes are used, it must be possible to group them into four general classes (alive at time of census, natural death, lost to follow-up, and unknown cause of exit).

5. Nine variables (IDs of Ego, Sire, and Dam; Sex; Birth, Entry, and Exit Dates; Acquisition and Exit Codes) are required for demographic and pedigree analysis, and must always appear in the standard registry file.

6. Four of the remaining five variables (Taxonomic code, Institution, Local subgroup, Current location, and End-of-record character) are highly desirable, but their absence usually does not preclude useful demographic and pedigree analysis.

7. Suggested upper and lower limits of field lengths, and fixed formats of gender, taxonomic and institution codes may be overridden, although experience shows that this is seldom necessary.

8. The order of variables in the record is not fixed (although the sequence shown in Table AI is convenient).

These rules should serve as guidelines for the construction of a standard registry file. It is encouraging to find in practice that it is relatively easy for colony data managers (using standard database software with an existing database) to generate a standard registry file that meets these requirements.

Adherence to the specifications alone, however, will not guarantee that data managers will be able to perform demographic and pedigree analysis on their colonies. Analysis software also must be written so that the several levels of variability described above can be accommodated. We have found it is relatively easy to write software that accepts input data having any format and coding scheme that meets these specifications.
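As an illustration only, the following Python sketch shows one way analysis software might apply Rules 1 through 5 when reading a registry file. It is not the ACMP or PEDSYS implementation. The function name, the field widths, the class labels, and the exit and acquisition codes in the example are all assumptions made for this sketch; a real file would take them from the record description that accompanies the data.

```python
# Hypothetical sketch; names, widths, and codes are illustrative assumptions.
REQUIRED = ("ID", "SIRE", "DAM", "SEX", "BIRTH", "ENTRY", "AQCODE", "EXIT", "EXCODE")
EXIT_CLASSES = {"ALIVE", "NATURAL_DEATH", "LOST", "UNKNOWN"}


def read_registry(lines, fields, exit_map):
    """Parse fixed-width registry records into dictionaries.

    fields   -- list of (mnemonic, width) pairs in record order (Rule 1)
    exit_map -- dictionary from local exit codes to one of EXIT_CLASSES (Rule 4)
    """
    # EGO is accepted as an alternative mnemonic for ID.
    names = ["ID" if name == "EGO" else name for name, _ in fields]
    missing = [name for name in REQUIRED if name not in names]
    if missing:  # Rule 5: the nine required variables must be present
        raise ValueError(f"required variables missing: {missing}")
    if not set(exit_map.values()) <= EXIT_CLASSES:
        raise ValueError("exit codes must map onto the four general classes")

    record_length = sum(width for _, width in fields)
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not (line.isascii() and line.isprintable()):  # Rule 2
            raise ValueError(f"record {number}: non-printable or non-ASCII data")
        if len(line) != record_length:  # Rule 3: every record has the same length
            raise ValueError(f"record {number}: length {len(line)}, expected {record_length}")
        record, start = {}, 0
        for name, (_, width) in zip(names, fields):
            record[name] = line[start:start + width].strip()
            start += width
        if record["EXCODE"] not in exit_map:
            raise ValueError(f"record {number}: undefined exit code {record['EXCODE']!r}")
        record["EXIT_CLASS"] = exit_map[record["EXCODE"]]
        records.append(record)
    return records


# Example with made-up animals and codes.
fields = [("ID", 4), ("SIRE", 4), ("DAM", 4), ("SEX", 1), ("BIRTH", 8),
          ("ENTRY", 8), ("AQCODE", 2), ("EXIT", 8), ("EXCODE", 2), ("EOR", 1)]
exit_map = {"AL": "ALIVE", "ND": "NATURAL_DEATH", "TR": "LOST", "UK": "UNKNOWN"}
sample = [
    "".join(("A001", "S010", "D020", "M", "19850312", "19850312", "BR", "00000000", "AL", "#")),
    "".join(("A002", "S010", "D021", "F", "19860701", "19860701", "BR", "19900115", "ND", "#")),
]
records = read_registry(sample, fields, exit_map)
```

Because the record description and the code dictionary are passed in as data, the same reader can serve files with different field widths or local exit codes, which is the division of labor between data managers and software that the rules are meant to allow.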
Our programs [Dyke, 1989a,b] read an input data file, and also a separate text file that defines the record structure and provides a “dictionary” that specifies how codes (such as exit codes) will be transformed into a standardized set. The latter file can be created with an ordinary text or word processor, and incorporates information about record format that must accompany the data file (see Rule 1 above).

Standards for Multiple-Entry Record Structures

As explained, initial efforts at standardization began with single-entry data partly because most colonies, following CLAR recommendations, already have these data in relatively accessible single-entry format, and partly because these are the data most crucial for basic demographic and pedigree analyses. Some important analyses require data that are better kept in multiple-entry files, however.

Storage of multiple-entry data is not as straightforward as the single-entry case. In particular, the convenience of keeping all data for a single individual on a single record is complicated because 1) records will have different lengths if individuals in a population have differing numbers of entries. Many database systems in fact are based on variable length records for reasons of file storage efficiency. Unfortunately, there are a number of formats used for storage of variable length records, each of which typically requires its own software techniques for manipulation. On the other hand, 2) if records are fixed at the same length, they will all have to be the same size as the record required for the individual with the most data. If the quantity of data varies between individuals, this practice can be extremely wasteful of file storage, since space on each record must be reserved whether or not there are entries to fill it.

An alternative to keeping multiple entries on a single record for each individual is to keep a file of multiple records for each individual, with each record containing a fixed number of entries.
A new record is created each time information is taken during the lifetime of the individual. The usual strategy is to keep only one kind of data on each record, and to create separate files for different kinds of data. We have chosen this strategy because 1) it makes it simple to keep all records in a data file the same fixed length, 2) for many purposes it does not waste file storage space, and 3) special software is not required for file management.

A strategy for recording multiple events. The actual format of the records in such a file depends to a large extent on the nature of the data they contain. The simplest form is one consisting of only the identification of the individual, a sequence indicator (usually a calendar date), and a single measurement or coded observation. We call a multiple-entry file containing records of this sort an event file. Event files typically contain information such as birth dates of offspring, dates of routine clinical treatment and inoculations, weights and morphometric growth measures, etc. Coded behavioral observations may also be stored in this way, although the need to keep the individual's ID on each record may lead to excessive use of file storage if the number of observations is large. Records are usually grouped by ID and sorted by date or other sequence indicator. Details of this minimal structure can be found in Table AII in the Appendix.

A strategy for recording multiple changes of state. Event files are suitable for storage of repeated measurements and observations that occur only once. Other measurements or observations, however, describe an ongoing condition or state of the animal, implying that intervals, rather than instants of time, must be accounted for on each record. Two dates, rather than one, must be associated with each measurement so that duration of state can be recorded.
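The event file described above (one ID, one sequence indicator, one observation per record) can be sketched as follows. This is a minimal illustration, not the layout of Table AII: the field widths, the function name, and the weight values are assumptions, and dates are taken to be in YYYYMMDD form so that sorting them as text gives chronological order.

```python
from collections import defaultdict


def load_events(lines, id_width=6, date_width=8):
    """Group fixed-width event records by animal ID and sort each group by date.

    Assumes (for this sketch only) that each record holds an ID, a YYYYMMDD
    date, and a single value, in that order.
    """
    history = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        animal = line[:id_width].strip()
        date = line[id_width:id_width + date_width]
        value = line[id_width + date_width:].strip()
        history[animal].append((date, value))
    for entries in history.values():
        entries.sort()  # YYYYMMDD strings sort in chronological order
    return dict(history)


# Made-up body weight records, deliberately out of order.
weights = [
    f"{'A001':<6}{'19900301'}{'4.20':>6}",
    f"{'A002':<6}{'19900120'}{'3.75':>6}",
    f"{'A001':<6}{'19900115'}{'4.05':>6}",
]
events = load_events(weights)
```

Every record has the same length regardless of how many observations an animal has, so no space is reserved for entries that do not exist.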
The record format for data of this sort has been more difficult to standardize than single-entry formats, because of differences both in database software and in data. Nonetheless, for purposes of exchanging data we have found a pair of alternative record structures that appear to be relatively easily generated by most database systems.

Following our decision to confine standardization to the minimum data required for basic demographic and population genetic analysis, and noting that mortality analysis can be done with single-entry data, we have developed a multiple-entry file for the data required to analyze fertility. Calculation of fertility rates must take into account the fact that individuals are not exposed to the risk of reproducing at all times during their lifetimes (in contrast to the risk of dying). Captive animals are particularly subject to interruptions during the normal reproductive span: single-sex caging, experimental procedures, hospitalization, etc., result in periods of varying length during which risk of reproducing goes to zero. To get an accurate assessment of individual reproductive performance, these interruptions must be accounted for. In most colony databases this information is kept as a breeding or location file, but because the notion of exposure to risk has applications beyond the study of fertility, we designate the standardized form in more general terms as an exposure file. The two standard formats are:

1. Single Date Format. The same Animal ID that appears in the registry file must be present in this file, along with a date marking the beginning of a change in reproductive status, and a code that specifies whether the status is one of exposure (Code E) or non-exposure (Code N) to risk of fertility. Optionally, Ego's sex, ID of a mate (or a list of potential mates), and cage location may be included. Records grouped by Animal ID and sorted by date of status change represent (in conjunction with Entry and Exit dates) the exposure history of each individual. The structure of this record and interpretation of its variables are shown in Table AIII in the Appendix.

2. Two Date Format. This format is similar to the single date format, except that it includes two dates representing the beginning and end, respectively, of a period of exposure to risk of fertility, which obviates the need for an exposure status code. Tabulation of risk is done only between two dates found on the same record. The structure of this record and interpretation of its variables are shown in Table AIV in the Appendix.

The exposure file is a special case of data representing continual chronological change of status. Other multiple-entry data involve repeated measurements, or the recording of historical events that cannot be tabulated by linking records in the basic registry file. For example, conventional fertility analysis is based on live births, which can be tabulated by linking parental and offspring records in the registry file. Since census and registry files rarely contain records for stillbirths or abortuses, analyses of reproductive loss require special recording of such events, typically in a reproductive history file. To date, standardization of these files has not been tried, not because of differences in record structures (which tend to be relatively simple), but because of a diversity of the data per se. This variety results from differences in species-specific reproductive patterns, colony-specific differences in definitions of variables used to represent reproductive events, and lack of agreement on appropriate demographic measures to be used. The problem is exacerbated by well-known difficulties in assuring the quality of reproductive loss data.
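The relationship between the two exposure formats can be illustrated with a short Python sketch that turns one animal's single date records into two date intervals and tabulates days at risk. The function names and the example dates are assumptions made here. The sketch also assumes that a status applies from its own date up to, but not including, the next status change or the exit date; for an animal still alive, the census date would stand in for the exit date.

```python
from datetime import date


def exposure_intervals(status_changes, exit_date):
    """Convert single date records into (start, end) exposure intervals.

    status_changes -- list of (date, code) pairs for one animal, where code
                      is 'E' (exposed) or 'N' (not exposed)
    exit_date      -- the animal's exit date from the registry record
    """
    changes = sorted(status_changes)
    intervals = []
    for i, (start, code) in enumerate(changes):
        # A status lasts until the next change, or until exit for the last one.
        end = changes[i + 1][0] if i + 1 < len(changes) else exit_date
        if code.upper() == "E" and end > start:
            intervals.append((start, end))
    return intervals


def days_at_risk(intervals):
    """Count days between the two dates of each interval; the end day is excluded."""
    return sum((end - start).days for start, end in intervals)


# Made-up history: exposed, removed to single-sex caging, then exposed again.
changes = [
    (date(1990, 1, 1), "E"),
    (date(1990, 3, 1), "N"),
    (date(1990, 6, 1), "E"),
]
intervals = exposure_intervals(changes, exit_date=date(1990, 7, 1))
total = days_at_risk(intervals)  # 59 days (Jan 1 to Mar 1) + 30 days (Jun 1 to Jul 1) = 89
```

In the two date format the intervals are already explicit, so only the tabulation step would be needed.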
Nonetheless, it should be relatively easy to modify one of the two formats given here to accommodate more complex data structures.

SUMMARY

With the development of powerful and inexpensive computer technology, animal colony record-keeping has reached the stage where common data standards are practicable. Initially, these standards began as a rather strict set of specifications that left the responsibility for matching data format with program input requirements solely with colony data managers. Gradually they have evolved into a more satisfactory compromise in which the task is shared with the programs that use the data. The result is a system that combines great flexibility for data managers with reasonable assurance that their data can be analyzed without the need for customized software.

ACKNOWLEDGMENTS

The original ad hoc committee on data standards consisted of Arthur Davis of the Washington Regional Primate Center; Drs. Bennett Dyke, of the Southwest Foundation for Biomedical Research; Andrew J. Petto, of the New England Regional Primate Research Center; Samuel Sholl of the Wisconsin Regional Primate Research Center; Brent Swenson, of the Yerkes Regional Primate Research Center; and Lawrence Williams, of the University of South Alabama Primate Research Laboratory. Paul Dubois, of the Wisconsin Center, and Paul Mamelka, of the Southwest Foundation, have also made important contributions. Other individuals who have provided helpful suggestions and criticisms are Drs. Fred Bercovitch, John Capitanio, Carolyn Erhardt, Charles Howard, and Doris Zumpe. Funding for organizing some of the Committee's work, and for preparation of this manuscript was provided by NIH grant RR02229.

REFERENCES

Dyke, B. ACMP, ANIMAL COLONY MANAGEMENT PACKAGE USER GUIDE. PGL TECHNICAL REPORT NO. 3. San Antonio, TX, Population Genetics Laboratory, Southwest Foundation for Biomedical Research, 1989a.
Dyke, B. PEDSYS, A PEDIGREE DATA MANAGEMENT SYSTEM USER'S MANUAL. PGL TECHNICAL REPORT NO. 2. San Antonio, TX, Population Genetics Laboratory, Southwest Foundation for Biomedical Research, 1989b.
National Research Council. LABORATORY ANIMAL RECORDS. Committee on Laboratory Animal Records. Washington, DC, Institute of Laboratory Animal Resources, National Academy of Sciences (DHEW publ. no. (NIH) 80-2064), 1979.
Scobie, P. USER MANUAL FOR ARKS, AN ANIMAL RECORD KEEPING SYSTEM. Apple Valley, MN, ISIS, 1987.
Seal, U.S.; Makey, D.G.; Bridgewater, D.; Simmons, L.; Murtfeldt, L.E. ISIS: A computerized record system for the management of wild animals in captivity. INTERNATIONAL ZOO YEARBOOK 17:68-70, 1977.

APPENDIX I. The Standard Single-Entry Registry Record Structure (Table AI)

Variables in this record were chosen to conform minimally to the ILAR Laboratory Animal Records Basic Census Record and the ISIS Universal Single Entry Data category. This information is sufficient for detailed demographic analysis, and for construction and analysis of extended pedigrees. All data must be in text (printable ASCII) format. Three classes of data variables are recognized:

1. REQUIRED ITEMS (1-9) that must be present for demographic and pedigree analysis.
2. OPTIONAL ITEMS (10-13) that are highly desirable as population or subgroup identifiers. Item 14 is an important aid in assuring that all records are the same length. The ‘5” character is often used for this purpose.
3. NON-STANDARD ITEMS of information may be included at the end of the record, just prior to the end-of-record field. These variables are usually ignored in most demographic and pedigree analysis programs, but may be important for other reasons.

TABLE AI. Standard Registry Record Structure

Variable   Mnemonic (A)   Length (B)   Type (C)   Description
1          ID (or EGO)    3-10         C          Animal ID
2          SIRE           3-10         C          Sire ID
3          DAM            3-10         C          Dam ID
4          SEX            1            C          Sex
5          BIRTH          6-15         C          Birth date
6          ENTRY          6-15         C          Entry date
7          AQCODE         2-6          C          Acquisition code
8          EXIT           6-15         C          Exit date
9          EXCODE         2-16         C          Exit code
10         TAXON          3            C          Taxonomic code
11         INSTT          6-9          C          Institution code
12         SUBGRP         3-10         C          Local subgroup code
13         LOC            3-10         C          Current location code
14         EOR            1            C          End-of-record character

(See notes (A) through (M) below for explanations.)

It is essential that an explicit description of the record structure accompany data files transferred between institutions. This information should include:

1. A population title and calendar date of census (normally, the date of the most recent vital event recorded in the file).
2. Name, address, and telephone number of the person to be contacted for information about the data. An electronic mail address (Bitnet or Internet, for example) is highly desirable, as well.
3. Information about the type and format of disk or tape on which data are stored.
4. Field lengths, data types, and field descriptions for all items corresponding to the table above.
5. Codes and their definitions for standard items 4, 7, and 9 (SEX, AQCODE, and EXCODE) that match those listed in notes (F), (H), and (I) below, respectively. For additional flexibility that may alleviate potential problems in conforming to the standard, the code lists are left open, and new codes may be added to designate categories not defined by the standard. If this is done, alternate codes and descriptions should be defined explicitly.
6. Explicitly defined codes and descriptions for optional items 10, 11, 12, 13, and 14 (TAXON, INSTT, SUBGRP, LOC, and EOR).
7. A field length, data type, field description, and codes for each additional non-standard item of information added.
8. Except where noted, alphabetic information is assumed to be case-sensitive. See notes (E) and (F).

Notes:

(A) Mnemonics are used in some software packages to select variables for analysis and as headings for tabular output. When supplied, mnemonics should be upper case and a maximum of six characters in length.
(B) Recommended ranges of field lengths given here are based on those found in several colony databases. Other field lengths may be used, so long as the exact length is specified.
(C) Normally, all data are interpreted as text characters and should be designated “C.” Provision is made for numeric types, however:
    C = character (left-justified)
    F = floating point (right-justified)
    I = integer (right-justified)
Designators should be upper case only.
(D) Animal ID must be present. The mnemonic used should be either ID or EGO, a term used to indicate the individual of reference in a pedigree.
(E) Unknown parents may be designated by blanks, zeros, 9’s, or the string UNK or UNKNOWN. Alphabetic coding in this instance is case-insensitive; that is, the string unknown may be substituted for UNKNOWN, etc.
(F) Gender is usually coded as 1, 2, 3 or M, F, U for male, female, and unknown, respectively. Alphabetic coding for gender is case-insensitive. That is, m, f, or u may be substituted for M, F, or U, respectively. Codes for other gender categories are permitted, but for purposes of analysis it must be possible to subsume them under one of the three categories above.
(G) Preferred order of date is Year-Month-Day (YYYYMMDD, where YYYY, MM, and DD are integer representations of year, month, and day, respectively). However, nearly any systematic representation is acceptable: alphabetic representation of month (three-letter abbreviation or full name) is permitted, as are three- and two-digit representations of year. Order of year, month, and day is not fixed, and elements may be separated by slashes, periods, or dashes. Alphabetic month may be separated from integer day or year by a single blank; day may be separated from year by a comma and blank when occurring with alphabetic month.
Examples are as follows:
    29/01/1990       Jan291990       29JAN1990    900129
    1990January29    Jan 29, 1990    01-29-990    1990.01.29
Important: Different styles of date representation may not be mixed in any one file.
An estimated date element is signified by entry of year or year and month only (YYYY0000 or YYYYbbbb, where b = blank). Dates of events that have not occurred (e.g., death date of living animals) are designated by a full field of blanks or 0’s. Examples are as follows:
    00Jan1990        900100          291 I990     Jan 1990
    1990January00    Jan , 1990      01-00-990    1990.01.
Unrecorded dates of events known to have occurred are often designated by a full field of 9’s. However, with the approach of the year 1999, this practice is best avoided unless all four digits of the year are used; the use of asterisks in this case is suggested: 9999/99/99 is correct, but eventually 99 or 999 alone as designators of year will be ambiguous; **-**-1975 should not be substituted for 00-00-1975.
Many demographic analyses assume that an event occurs on the 15th day of the month if only month and year are known, and on June 30 if only year is known. By convention, individuals are counted as alive on the day of their birth or entry, but not on the day of exit.
Julian calendar representation should be avoided, since most people find it difficult to interpret by eye, and the fixed date in the past from which days are counted may vary from one database system to the next. Dates in Julian form need not be absolutely prohibited, however, since software is available that will convert between Julian and Gregorian calendars.
(H) Acquisition Codes. It is often useful to compare characteristics of animals that have different origins. The following list represents classes common to several large U.S. colonies and represents categories generally thought to include enough animals for meaningful analysis.
Other categories may be added as needed:
     1 = Wild-born (known provenance)
     2 = Wild-born (unknown provenance)
     3 = Colony-born (local)
     4 = Colony-born (other colony)
     5 = Purchase or trade (unknown provenance)
     6 = Caesarian section (local)
     7 = Caesarian section (other colony)
    99 = Unknown origin
(I) Exit Codes. Many analyses require that animals be classified according to their exposure to risk of events such as reproduction, death, illness, etc. The reason for an animal leaving the population is an important determinant of this classification. The following list represents exit classes common to several large U.S. colonies and represents categories generally thought to include enough animals for meaningful analysis. Other categories may be added as needed:
     0 = Alive at time of census
     1 = Natural death
     2 = Euthanasia for humane reasons (not research related)
     3 = Trauma
     4 = Stillbirth
     5 = Abortion
     6 = Sale or trade
     7 = Escape or theft
     8 = Sacrifice
     9 = Research-related death (other than sacrifice)
    99 = Unknown cause of exit
Most demographic analyses require that animals be classified according to only five exit categories. These categories, and their corresponding Exit Codes, are given below:
    Alive at time of census                     Corresponding to Code 0 above
    Natural death                               Corresponding to Codes 1-5
    Lost to follow-up (non-research related)    Corresponding to Codes 6, 7
    Lost to follow-up (research related)        Corresponding to Codes 8, 9
    Unknown cause of exit                       Corresponding to Code 99
This breakdown should serve as a guide for translating or creating exit categories that do not match the Exit Codes as given above.
(J) Since species (or subspecies) may differ with respect to vital statistics, taxonomic identification may be important census information. When combined with an institution code, this item also serves as an independent means of identifying individual records with the population to which they apply.
It is usually not a good idea to mix widely divergent taxa in the same census file. A good rule of thumb is to make a separate file for each species, and to identify animals in the file by subspecies, if that is appropriate. Some (arbitrary) taxonomic codes that have been used so far are:
    MM   = Macaca mulatta
    MN   = Macaca nemestrina
    PHA  = Papio hamadryas anubis
    PHC  = Papio hamadryas cynocephalus
    SBB  = Saimiri boliviensis boliviensis
    SOO  = Saguinus oedipus oedipus
    Etc.
For the time being, additional codes are left to be defined by the individual colonies.
(K) Institution codes. Institutional designation is sometimes a useful component of identification of records. Maximum field width is sufficient for the ISIS Institution name abbreviation. The following (arbitrary) list of institution codes has been used for some of the major U.S. primate colonies:
    CPRC    = California Primate Research Center
    DRPRC   = Delta Regional Primate Research Center
    NERPRC  = New England Regional Primate Research Center
    ORPRC   = Oregon Regional Primate Research Center
    ORAUMC  = Oak Ridge Associated Universities Marmoset Colony
    SFBR    = Southwest Foundation for Biomedical Research
    UTMDA   = University of Texas M.D. Anderson Cancer Center
    USAPRL  = Univ. S. Alabama Primate Research Laboratory
    WaRPRC  = Washington Regional Primate Research Center
    WiRPRC  = Wisconsin Regional Primate Research Center
    YRPRC   = Yerkes Regional Primate Research Center
    Etc.
(L) “Local subgroup” and “Current location” categories are sometimes useful in defining population subunits that may be analyzed separately. These categories are, however, optional.
(M) The default practice for most communications software is to discard blank characters at the end of records. This can lead to problems such as differences in length of records if the last field of some, but not all, records contains trailing blanks.
In these cases a single non-blank character can be appended to each record before transmission to assure that all records retain their full length. Analysis software will add as many blanks as necessary to the left of this character to assure that it appears in the same position at the end of each record in the file. This is an optional feature; choice of end-of-record character is also optional, although “>” is suggested.

Example: The following example shows two records from a Standard Single-Entry Registry File. The line below the second record serves as a ruler with stroke characters (|) marking the beginning of each field. Numbers below the strokes correspond to variables in Table AI. The first record indicates that Ego Rh053 has sire Rh012 and dam Rh022 and is a female born on May 21, 1987; her birth date was also the date of entry into the colony, and she is thus classified as colony-born; no date or code of exit is given, meaning that she is still alive; she is a rhesus macaque belonging to the Southwest Foundation for Biomedical Research local subgroup “Gen,” and is housed in Cage A. The second record indicates that Ego Rh102 has unknown parents, is a male born sometime in 1985, and entered the colony on April 10, 1991 from another colony where he was born; he died on March 13, 1992 of natural causes; he too was a rhesus macaque belonging to the SFBR “Gen” subgroup, and was housed in Cage B.

    Rh053Rh012Rh022F87052187052103        MM SFBR  GenCageA>
    Rh102          M8500009104100492031301MM SFBR  GenCageB>
    |....|....|....||.....|.....|.|.....|.|..|.....|..|....|
    1    2    3    45     6     7 8     9 10 11    12 13   14

II. Multiple-Entry Data Files

1. Event files. Event files are used for storage of repeated measurements and observations that occur only once. In their simplest form they consist of the identification of the individual, a time or calendar date, a single measurement or coded observation, and an optional end-of-record field, as shown in Table AII.

TABLE AII.
Multiple-Entry Data Record Structure (Event File)

    Variable  Mnemonic  Length  Type      Description
    1         EGO       3-10    C         Animal ID
    2         DATE      6-15    C         Date of event
    3         <EVENT>   1-<n>   <varies>  Observation/measurement
    4         EOR       1       C         End-of-record character

Form and content of Variable 3 in the table will depend on the observation or measurement recorded. Its mnemonic <EVENT> is enclosed in angle brackets to indicate that although its presence is required, its content is optional. Length and Data Type of this field also depend on the nature of the event. Animal ID, Date, and EOR fields should conform to specifications described above.
A record is created for each event. The event file need not contain the same number of records for each individual, nor is it necessary for all individuals represented in the registry file to be included in the event file. Records in the event file are usually grouped by Animal ID and sorted by date.

Example: The following example shows three records from an Event File. The line below the third record serves as a ruler with stroke characters (|) marking the beginning of each field. Numbers below the strokes correspond to variables in Table AII. The records show the blood sampling history of Ego Rh053, with each record containing a separate sampling date and “event” (in this case the size and type of sample tube).

    Rh053880629 5mlRedTop>
    Rh05389062220mlRedTop>
    Rh05390060920mlRedTop>
    |....|.....|.........|
    1    2     3         4

2. Fertility exposure files. When measurements or observations refer to an ongoing condition or state, intervals, rather than instants of time, must be accounted for on each record. That is, two dates, rather than one, must be associated with each measurement. Two extended formats, corresponding to alternate conventions for defining dates of exposure to risk to fertility, are given here.

1. Single date format.
In this configuration (Table AIII), it is always assumed that Ego begins life in a non-exposure status, and that the beginning of risk of fertility is defined by a date specified in the first record bearing status code E. The period of risk lasts until the date found on the first subsequent record bearing status code N, which in turn lasts until the date associated with the next record bearing status E, etc. Risk of exposure to fertility finally terminates at the Exit date, end of reproductive age, or end of the time period being analyzed.

TABLE AIII. Multiple-Entry Data Record Structure (Single Date Exposure File)

    Variable  Mnemonic  Length  Type  Description
    1         EGO       3-10    C     Animal ID
    2         CHANGE    6-15    C     Status change date
    3         STATUS    1-6     C     Exposure status code
    4         SEX       1       C     Ego’s sex
    5         MATE      3-10    C     Mate ID
    6         LOC       3-10    C     Cage location
    7         EOR       1       C     End-of-record character

Interpretation: The table below lists the three combinations of IDs, dates, and exposure status codes that may be present in Exposure File records:

    EGO  CHANGE  STATUS  SEX  MATE  NOTE
     +     -       -      +    -    (a)
     +     +       +      +    -    (b)
     +     +       +      +    +    (c)

where + represents the presence, - the absence of an ID, date, or code. Other configurations are not permitted and will generate error messages.

Notes:
(a) Reproduction of EGO is measured without regard to identity of mate. Because neither a Status Change Date nor Status Code is present on the record, it is assumed that EGO is continuously exposed to risk of fertility over a single interval. The beginning of this interval is defined by EGO’s Birth Date, Entry Date, or by the beginning of the Registry Period (whichever is later); the end is defined by EGO’s Exit Date or the end of the Registry Period (whichever is earlier). Only one such record for each individual is permitted.
(b) Reproduction of EGO is measured without regard to identity of mate.
Here, we allow for exposure to risk of fertility that may be discontinuous, and whose range need not be defined by EGO’s Birth and Exit dates. Multiple records for each individual define the duration of intervals, and the exposure status code associated with each record determines whether the interval is one of exposure (code E) or non-exposure (code N). In this configuration, it is always assumed that EGO begins life in a non-exposure status, and that the beginning of risk of fertility is defined by a date specified in the first Exposure File record bearing status code E. End of an interval may be determined in one of two ways:
    • If record (i) is immediately followed by another (i + 1) that also specifies EGO, end of interval (i) is taken to be CHANGE(i + 1), which is also the beginning of interval (i + 1), etc.
    • If no subsequent record for EGO is found, the end of the interval is assumed to be EGO’s Exit Date or end of the Registry Period (whichever is earlier).
Records for each individual should be contiguous and ordered by Status Change Date such that Birth Date ≤ ... ≤ CHANGE(i) ≤ CHANGE(i + 1) ≤ ... ≤ Exit Date. Contiguous records with the same status code are treated as a single interval that is terminated as described above. Configuration (b) requires that an Exposure Change Date and an Exposure Status Code be present on each record of the individual whose exposure is being defined. The Exposure File may contain records having both configurations (a) and (b), although mixed configuration may not be applied to the same individual.
(c) Only offspring produced jointly by EGO and MATE are tallied. This configuration is treated in a more restrictive fashion than configurations (a) and (b) above in that it is always assumed that the beginning of risk of fertility is defined by a date specified in the first Exposure File record having an Exposure Status Code E (that is, that the initial status of both individuals included in the record is one of non-risk).
End of an interval may be determined in one of three ways:
    • If record (i) is immediately followed by another (i + 1) that also specifies EGO and MATE, end of interval (i) is taken to be CHANGE(i + 1), which is also the beginning of interval (i + 1), etc.
    • If record (i) is immediately followed by another (i + 1) that specifies EGO and a new MATE, the joint fertility of EGO and the previous MATE is assumed to have ended at CHANGE(i + 1), which is also taken to be the beginning of the first interval of a new series defining the joint fertility of EGO and the new MATE.
    • If no subsequent record for EGO is found, the end of the interval is assumed to be EGO’s Exit Date, current mate’s Exit Date, or the end of the Registry Period (whichever is earliest).
Configuration (c) requires that an Exposure Change Date, Exposure Status Code, and a MATE ID be present on each record representing a couple whose exposure is being defined. Records for EGO must be contiguous. EGO may have more than one mate (and in fact more than one mate at one time) but records for each mate must be contiguous and ordered by Exposure Change Date as described above. An Exposure File containing records in configuration (c) may not contain records in configurations (a) or (b).
Normally, configuration (c) applies to human populations, where marital fertility is a standard measure of interest. Only categories (a) and (b) would be used with non-human primates, where monogamy and identity of specific mates tend to be less important than simple co-residence with any member of the opposite sex.
By convention, a day of exposure is counted on the beginning, but not the end date of the interval.

Example: The following example shows four records from a Single Date Exposure File. The line below the fourth record serves as a ruler with stroke characters (|) marking the beginning of each field. Numbers below the strokes correspond to variables in Table AIII.
The series indicates that female Ego is first exposed to fertility on April 10, 1991, when she is housed with mate Rh102 in location CageB (Record 1). On November 4 of the same year her exposure is interrupted when she is separated from this male as the result of her removal to CageA (Record 2), until February 28, 1992, when she is re-housed with him in CageB (Record 3). The two are separated again two weeks later on March 13 when Rh102 is removed from CageB (Record 4).

    Rh053910410EFRh102CageB>
    Rh053911104NF     CageA>
    Rh053920228EFRh102CageB>
    Rh053920313NF     CageB>
    |....|.....|||....|....|
    1    2     345    6    7

2. Two date format. In this configuration (Table AIV), tabulation of risk is done only between a start and stop date found on the same record.

TABLE AIV. Multiple-Entry Data Record Structure (Two Date Exposure File)

    Variable  Mnemonic  Length  Type  Description
    1         EGO       3-10    C     Animal ID
    2         START     6-15    C     Exposure start date
    3         STOP      6-15    C     Exposure stop date
    4         SEX       1       C     Ego’s sex
    5         MATE      3-10    C     Mate ID
    6         LOC       3-10    C     Cage location
    7         EOR       1       C     End-of-record character

Interpretation: The table below lists the five combinations of IDs and dates that may be present in Exposure File records:

    EGO  START  STOP  SEX  MATE  NOTE
     +     -     -     +    -    (d)
     +     +     -     +    -    (e)
     +     -     +     +    -    (f)
     +     +     +     +    -    (g)
     +     +     +     +    +    (h)

where + represents the presence, - the absence of an ID, date, or code. Other configurations are not permitted and will generate error messages.

Notes:
(d) Reproduction of EGO is measured without regard to identity of mate. Because neither a START nor STOP date is present on the record, it is assumed that EGO is continuously exposed to risk of fertility over a single interval. The beginning of this interval is defined by EGO’s Birth Date, Entry Date, or by the beginning of the Registry Period (whichever is later); the end is defined by EGO’s Exit Date or the end of the Registry Period (whichever is earlier). Only one such record for each individual is permitted.
(e) Reproduction of EGO is measured without regard to identity of mate. In this configuration, it is always assumed that EGO begins life in a non-exposure status, and that the beginning of risk of fertility is defined by the START date specified in the Exposure File record. End of interval is specified by EGO’s Exit Date or the end of the Registry Period (whichever is earlier).
(f) Reproduction of EGO is measured without regard to identity of mate. In this configuration, it is always assumed that EGO begins life in a non-exposure status, and that the beginning of risk of fertility is defined by EGO’s Birth Date, Entry Date, or by the beginning of the Registry Period (whichever is later). End of interval is specified by the STOP date specified in the Exposure File record.
(g) Reproduction of EGO is measured without regard to identity of mate. In this configuration, it is always assumed that EGO begins life in a non-exposure status, and that the beginning of risk of fertility is defined by the START date specified in the first Exposure File record. End of interval is always specified by the STOP date.
(h) Only offspring produced jointly by EGO and MATE are tallied. It is always assumed that the beginning of risk of fertility is defined by START (that is, that the initial status of both individuals included in the record is one of non-risk).
With configurations (e) through (h) we allow for exposure to risk of fertility that may be discontinuous, and whose range need not be defined by EGO’s Birth or Exit dates. Multiple Exposure records for each individual define the duration of intervals of exposure. Records for each individual should be contiguous and ordered by date, such that Birth Date ≤ ... ≤ START(i) ≤ STOP(i) ≤ START(i + 1) ≤ STOP(i + 1) ≤ ... ≤ Exit Date. Contiguous records in which the STOP date of record (i) is the same as the START date of record (i + 1) are treated as a single interval that is terminated by the STOP date of record (i + 1).
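The merging rule for contiguous Two Date records, together with the convention stated for Single Date files that a day of exposure is counted on the beginning but not the end date of an interval, can be sketched as follows. This is a minimal illustration and not part of the standard: the function names merge_contiguous and exposure_days are hypothetical, dates are assumed to have been converted to calendar dates already, and the split of the first interval at July 1, 1991 is invented solely to show the merge.

```python
from datetime import date


def merge_contiguous(intervals):
    """Merge records whose STOP date equals the next record's START date.

    `intervals` is a list of (start, stop) pairs for one individual,
    assumed to be ordered by date as the standard requires.
    """
    merged = []
    for start, stop in intervals:
        if merged and merged[-1][1] == start:
            # STOP(i) == START(i + 1): extend the previous interval.
            merged[-1] = (merged[-1][0], stop)
        else:
            merged.append((start, stop))
    return merged


def exposure_days(intervals):
    """Count days of exposure; the start date counts, the stop date does not."""
    return sum((stop - start).days for start, stop in intervals)


# Hypothetical records: the first two are contiguous and should collapse.
records = [
    (date(1991, 4, 10), date(1991, 7, 1)),
    (date(1991, 7, 1), date(1991, 11, 4)),
    (date(1992, 2, 28), date(1992, 3, 13)),
]

merged = merge_contiguous(records)
print(merged)                 # two intervals remain
print(exposure_days(merged))  # 208 + 14 = 222
```

Subtracting the start date from the stop date gives the half-open count directly, so no adjustment of one day is needed in either direction.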
The Exposure File may contain records having configurations (d) through (g), and a mixture of configurations (e) through (g) may be applied to a single individual. Configuration (d) may not be mixed with configurations (e) through (g) for the same individual.
Records for EGO must be contiguous. EGO may have more than one mate (and in fact more than one mate at one time) but records for each mate must be contiguous and ordered by date as described in (g) above. An Exposure File containing records in configuration (h) may not contain records in configurations (d) through (g).
Normally, configuration (h) applies to human populations, where marital fertility is a standard measure of interest. Only categories (d) through (g) would be used with non-human primates, where monogamy and identity of specific mates tend to be less important than simple co-residence with any member of the opposite sex.
By convention, a day of exposure is counted on the beginning, but not the end date of the interval.

Example: The following example shows two records from a Two Date Exposure File that express the same timing as the Single Date Exposure example given above. The line below the second record serves as a ruler with stroke characters (|) marking the beginning of each field. Numbers below the strokes correspond to variables in Table AIV. The series indicates the same sequence as the example given above for the Single Date Exposure File: female Ego is first exposed to fertility for a period lasting from April 10, 1991 to (but not including) November 4 of the same year, during which time she is housed with mate Rh102 (Record 1) in location CageB. After a period of non-exposure lasting from November 4, 1991 through February 27, 1992, she is again housed with the same male from February 28, 1992 to March 13, 1992 in the same location, at which time her exposure ends (Record 2).

    Rh053910410911104FRh102CageB>
    Rh053920228920313FRh102CageB>
    |....|.....|.....||....|....|
    1    2     3     45    6    7

Note that this format is somewhat more concise than Single Date format, but does not express explicit information about intervals of non-exposure.
Important note: All records in an Exposure File must have the same format.
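The relationship between the two exposure formats can be illustrated by converting the four Single Date records of the earlier example into Two Date intervals. The sketch below is a hedged illustration, not software referred to by the standard. Its assumptions are: field widths of 5, 6, 1, 1, 5, 5, and 1 characters as in that example, six-digit YYMMDD dates falling in the 1900s, records for one EGO already ordered by Status Change Date, and hypothetical function names (parse_record, parse_yymmdd, single_to_two_date).

```python
from datetime import date

# Assumed layout of the Single Date Exposure example (Table AIII order).
WIDTHS = [("EGO", 5), ("CHANGE", 6), ("STATUS", 1), ("SEX", 1),
          ("MATE", 5), ("LOC", 5), ("EOR", 1)]


def parse_record(line):
    """Slice one fixed-width record into a dict of stripped text fields."""
    fields, pos = {}, 0
    for name, width in WIDTHS:
        fields[name] = line[pos:pos + width].strip()
        pos += width
    return fields


def parse_yymmdd(text, century=1900):
    """Parse a YYMMDD date; the century is an assumption of this sketch."""
    return date(century + int(text[0:2]), int(text[2:4]), int(text[4:6]))


def single_to_two_date(records, end_of_followup):
    """Turn ordered (change_date, status) pairs into (start, stop) intervals.

    EGO starts in non-exposure status; repeated records with the same
    status code are treated as one interval. An exposure interval still
    open after the last record is closed at `end_of_followup` (Exit Date
    or end of the Registry Period, whichever is earlier).
    """
    intervals, start = [], None   # start is None while EGO is not exposed
    for change, status in records:
        if status == "E" and start is None:
            start = change
        elif status == "N" and start is not None:
            intervals.append((start, change))
            start = None
    if start is not None:
        intervals.append((start, end_of_followup))
    return intervals


lines = [
    "Rh053910410EFRh102CageB>",
    "Rh053911104NF     CageA>",
    "Rh053920228EFRh102CageB>",
    "Rh053920313NF     CageB>",
]
parsed = [parse_record(line) for line in lines]
changes = [(parse_yymmdd(r["CHANGE"]), r["STATUS"]) for r in parsed]
intervals = single_to_two_date(changes, end_of_followup=date(1992, 12, 31))
print(intervals)  # April 10 to November 4, 1991; February 28 to March 13, 1992
```

The resulting two intervals carry the same START and STOP dates as the Two Date example, which is why that format needs only two records where the Single Date format needs four.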