American Journal of Primatology 29:125-143 (1993)
TECHNICAL NOTE
Basic Data Standards for Primate Colonies
BENNETT DYKE
Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio, Texas
A set of simple standards has been developed to ease the task of transferring
primate colony data between institutions and computers, and to assure that
colony vital statistics are appropriate for demographic and genetic analyses. The standards have been designed to be easily and inexpensively
implemented from existing databases, and also may serve as a guide to the
setup of new colony record-keeping systems. © 1993 Wiley-Liss, Inc.
Key words: vital statistics, record-keeping, computers
INTRODUCTION
The recommendations for data standardization reported here arose initially
from discussions of problems encountered in collecting vital statistics data for
demographic studies of several different primate colonies. Most colonies keep similar information, but differences in format, coding, and accessibility can present
serious obstacles to data sharing. These problems are not so apparent at the individual colony level, although the lack of standardization tends to be disconcerting
to managers who are starting or computerizing their records for the first time.
Data standardization does become a significant issue, however, when it is necessary to combine vital statistics and pedigree data across colonies. As ever greater
proportions of the nation’s research animals are supplied by breeding colonies,
studies which attempt to compare colonies at the population level, or which require
data aggregated across colonies for species-specific demographic and genetic analyses, are becoming increasingly important.
Objectives in developing data standards were:
1. To specify the minimum amount of data required to construct pedigrees and
perform detailed demographic analysis,
2. To make it as easy and cost-effective as possible for colony data managers
to assemble information already on hand into a consistent format without programming,
3. To give software developers reasonable assurances that the data required
for their analyses would always be present, and would be in a form that requires
little or no preparation before use, and
4. To facilitate data transfer, and also to provide a set of guidelines for those
who wish to set up new colony record-keeping systems that will conform in a
general way to established databases.
Received for publication January 12, 1992; revision accepted September 26, 1992.
Address reprint requests to Bennett Dyke, Dept. of Genetics, Southwest Foundation for Biomedical
Research, P.O. Box 28147, San Antonio, TX 78228.
The intent was not to persuade managers of successful database systems to
abandon their own practices, but rather to organize a set of simple conventions
that would provide common grounds for data exchange and analysis. In fact, the
work reported here is the result of the active collaborative effort of a number of
individuals (listed in the “Acknowledgments”) who have had years of experience
with established colony databases, and of the willingness on the part of a number
of colony managers to share their vital statistics data and to experiment with
several versions of the standard as it has evolved.
Standards work effectively only when they are agreed upon and shared by a
substantial majority of users. Nonetheless, it should be understood that adoption of
the data standards presented here is intended to be entirely voluntary. We hope,
however, that they will be recognized as useful, simple, and convenient enough to
warrant use by those who need to transfer data between databases or computers,
and we hope that they can serve as a guide in the planning of new databases.
Centralization of data (for example, a national registry of captive primates)
requires adoption of a common format. However, the reverse is not true: A common
format does not require centralization. As is the case with voluntary adoption of
standards, the responsibility for ownership and protection of data should remain in
the hands of the individual colonies, and any arrangements for sharing data must
be worked out on an individual basis. In other words, adoption of data standards
need not change the way that data are managed, or that collaborations are established, other than to increase the convenience of interactions.
BACKGROUND
Data management problems fall into two related classes, one of which is
largely a computer science question, involving the choice of suitable hardware and
database software. The other, which is (or should be) strictly a management issue,
concerns the choice of data to be stored, and the form it should take.
Computers and Database Software
Paper records are still used by a few small colonies, but record-keeping for the
most part has been computerized over the past 10 to 15 years. Computerized databases were first developed by the larger institutions that could afford the costly
investment in early technology. Typically, data management software was developed in-house, and often was tied to the architecture and programming conventions characteristic of a particular brand of computer. File storage was usually
limited and expensive, which led to the use of complex data coding schemes that
emphasized “packing” of information into the smallest possible space. As a result,
data management had to be left in the hands of highly specialized technical personnel. Data retrieval frequently required knowledge of programming techniques,
decisions about additions and changes to databases and modifications of data codes
were often major considerations, and a changeover to a new computer (even a new
model of the same brand) was likely to be seriously disruptive.
Today the situation has changed so that with proper choice of equipment and
software, colony management personnel who do not have technical computer training may have easy access and control over their data, as well as considerable
freedom to adjust the form and content of their databases quickly and easily. These
improvements are ultimately the result of advances in performance of computer
hardware. Desktop computers are now sufficiently powerful to manage large data
sets using inexpensive database software that incorporates the same sophisticated
data manipulation techniques found on large mainframe applications. Further,
because these packages have been implemented for a broad, non-technical clientele, they have been made considerably easier to use than much of the older software. The dramatic decline in the cost of data storage has also eliminated the need
for cryptic data coding, meaning that information can be stored in a form that is
interpretable to the eye. The net result of these improvements is that with many
of the technical barriers removed, data management may now be largely in the
hands of those who know the data best, and data requirements per se can be the
sole determinant of issues in colony record-keeping. These advances make it possible for even the smallest colonies to enjoy the advantages of computer management of their data.
Animal Colony Data
A considerable amount of valuable discussion and work on primate record-keeping has been done in the past. The single most influential source of information on data required for effective primate colony management is the
1979 Committee on Laboratory Animal Records (CLAR) publication Laboratory
Animal Records [National Research Council, 1979]. The committee set out to define comprehensive specifications for a record-keeping system that would 1) identify basic information needed for local colony management, 2) provide basic data
for evaluating breeding programs, 3) provide a uniform compilation of information
that can accompany individual animals that are transferred between colonies, and
4) define a set of uniform record items so that information can be shared among
institutions and used for national planning. Although the committee attempted to
specify data entry forms, as well as some of the more technical aspects of computerized record-keeping, these details have not been as useful as the overall strategy
of data organization proposed, or the extensive inventory of data items that should
be considered for any well-managed colony.
Animal colony data fall into two classes, each of which requires a somewhat
different approach to record-keeping. Single-entry data (termed “unchanging data”
by the CLAR) apply to events or measures that arise only once during the lifetime
of the individual animal. These include IDs of the individual and its two parents,
dates of birth and death, etc. Multiple-entry data (“continually changing data”), on
the other hand, are those which derive from events or measures that may be
repeated throughout the lifetime of the individual. These include information that
comprises clinical, reproductive, developmental, experimental histories, etc.
Although single-entry data would appear to be fundamental to colony management, and certainly are simpler to manipulate, much of the initial impetus for
developing colony record-keeping systems has come from colony managers who
have had a clear idea of their need for keeping track of an ever-increasing store of
information about the clinical and reproductive histories of their animals, best
stored as multiple-entry data. In many respects the relative importance of multiple-entry data persists today, primarily because its utility is immediately evident
at the local colony level, whereas single-entry data tend to be most valuable when
used in conjunction with comparable information aggregated over time or over
multiple colonies. As a consequence of the past complexity of computer and software technology, most managers have quite understandably concentrated on solving their day-to-day database problems, rather than dealing with more far-reaching questions that rely heavily on external organization and cooperation. In
fact, it has not been until recently that the last three objectives of the CLAR
(evaluation of breeding programs, uniform individual data compilation, and sharing of information) could be implemented on the scale originally envisaged. As
these objectives become more significant, issues of comparability and standardization arise that go beyond the scope of the CLAR report.
DATA STANDARDIZATION
Recent efforts to come to grips with issues of standardization have had some
success. We report here an attempt initiated by an ad hoc committee formed at a
workshop on data standardization held as part of the program of the 1987 meeting
of the American Society of Primatologists (ASP) to define a set of standards that
would permit inter-institutional sharing and aggregation of basic population registry (or census) data.
Standards for a Single-Entry Registry Record Structure
The initial motivation of the ad hoc committee for developing a standard format for basic colony registry data came from the recognition that 1) few summaries
of colony demography have been published, and 2) those published statistics that
exist are seldom truly comparable. The principal reasons for this state of affairs are
a lack of agreement on which demographic measurements and analyses are most
appropriate for primate colonies, and the fact that programs that perform demographic analysis of small populations are surprisingly complex. The situation is
further complicated by dissimilarities in record-keeping conventions that make it
difficult to adapt any one analysis program to a variety of databases. Some of these
dissimilarities are:
1. Differences in file and record structure. No two database programs seem to
use the same scheme to organize their data. Some require that the initial records
in a data file contain information about the format of data records that follow;
others keep this information in a separate file. Sometimes records are fixed in
length; sometimes their length is variable, depending on the amount of information they contain.
2. Differences in data format. Even when the same variables are kept, it is
uncommon for them to have the same length, or to be located in the same position
on the record from one database to the next.
3. Differences in coding. For example, some databases use a Julian calendar
form, in which dates are expressed as number of days since a fixed date in the past
(often January 1, 1900), while others use one of the more common Gregorian forms.
The latter form presents its own difficulties in that months may be expressed
numerically or alphabetically (with full month names or their abbreviations),
years may be represented by 2, 3, or 4 digits, and the order in which years, months,
and days occur is variable. Likewise, some databases provide separate date fields
for birth and entry into the colony, as well as death and exit dates, while others
combine an entry date with an age at time of entry and an exit date with a special
code defining the type of exit (natural death, experimental sacrifice, sale, etc.).
4. Differences in category. Most codes used to designate gender are limited to
some simple three-character combination (M, F, U or 1, 2, 3, etc. for males, females,
and unknown gender, respectively). However, some colonies find it useful to add
codes that distinguish hermaphrodites, experimentally induced intersexes, castrated individuals of either sex, etc. Codes designating variables such as class of
acquisition or cause of death are seldom the same in different databases.
5. Differences in variables. The same basic information may be kept in quite
different ways. For example, some colonies maintain fertility histories as part of
the information stored for individual animals, while others must reconstruct reproductive chronologies from offspring census records.
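The date-coding differences in item 3 can be illustrated with a minimal sketch that converts several representations to a single YYYYMMDD form. The function name, the list of accepted layouts, and the treatment of day 0 as January 1, 1900 are assumptions made for illustration. Two- and three-digit years are deliberately left out because the century would have to be supplied by the data manager.

```python
from datetime import date, datetime, timedelta

# Assumed epoch for day-count dates; the text says "often January 1, 1900".
# Treating a count of 0 as the epoch itself is also an assumption.
EPOCH = date(1900, 1, 1)

# A few Gregorian layouts of the kind described above (illustrative, not exhaustive).
GREGORIAN_FORMATS = ["%Y%m%d", "%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%B %d, %Y"]


def to_yyyymmdd(raw, day_count=False):
    """Return the date as a YYYYMMDD string, or raise ValueError."""
    text = raw.strip()
    if day_count:
        # Day-count form: number of days elapsed since EPOCH.
        return (EPOCH + timedelta(days=int(text))).strftime("%Y%m%d")
    for fmt in GREGORIAN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y%m%d")
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognized date: {raw!r}")
```

A colony would normally need only the one or two layouts its own database exports, so the list of formats would be trimmed accordingly.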
No one of these discrepancies is particularly difficult to correct, but writing
analysis software that can accommodate the seemingly infinite variety of data
structures is not practicable. Consequently, the ad hoc committee decided that the
simplest course would be to develop standards for an intermediary file with a
simple record format that contains a set of variables likely to be common to all
record-keeping systems. This structure could be generated without the need to
write programs by using the ASCII export/import capabilities common to all modern database software.
Variables to be included in the standard were limited to the minimum
required for basic demographic analysis and for the construction of extended
pedigrees for population genetic analysis. For the most part, these requirements
are met by the CLAR report Basic Census File entry and disposition record
definitions, and are thus widely available. Detailed fertility analysis may require
multiple-entry information, but the committee felt that the initial attempt at
standardization should be limited to a single-entry file. Practical decisions about
specific format and coding were based on the conventions of the International
Species Inventory System (ISIS) Animal Records Keeping System (ARKS)
database [Scobie, 1987; Seal et al., 1977]. Further minor adjustments were made
after checking these specifications against the formats of a number of colony
databases.
Fourteen variables were selected and a set of format and coding rules were
established. It was initially thought that most of these rules should be invariant,
but we underestimated the difficulties that colony data managers would have in
generating a standard registry file from existing data. To simplify this task, we
modified our initial practice of requiring that the length of individual variables be
fixed, by substituting a range of permissible widths for most fields. This has reduced the burden (and the error rate) at the local level, at the relatively minor cost
of requiring analysis software to accommodate some variability in input formats.
The structure of this record is shown in Table AI in the Appendix.
Currently, the standard defines several rules for generating and using the
registry file. These are listed in increasing order of flexibility:
1. The standard registry file must be accompanied by a full description of its
record structure. That is, each variable must be defined, its length in characters
and its data type (character, numeric integer, date, etc.) must be specified, and the
exact meaning of coded information must be given. The importance of this requirement cannot be overstated. Software developers and other users of the data can
accommodate any amount of variability in format, but only if its extent is known
precisely.
2. Data must be represented only by printable text (ASCII) characters.
3. Although record length is not fixed, all records in the file must be the same
length.
4. No matter how many exit codes are used, it must be possible to group them
into four general classes (alive at time of census, natural death, lost to follow-up,
and unknown cause of exit).
5. Nine variables (IDs of Ego, Sire, and Dam; Sex; Birth, Entry, and Exit
Dates; Acquisition and Exit Codes) are required for demographic and pedigree
analysis, and must always appear in the standard registry file.
6. Four of the remaining five variables (Taxonomic code, Institution, Local
subgroup, Current location, and End-of-record character) are highly desirable, but
their absence usually does not preclude useful demographic and pedigree analysis.
7. Suggested upper and lower limits of field lengths, and fixed formats of
gender, taxonomic and institution codes may be overridden, although experience
shows that this is seldom necessary.
8. The order of variables in the record is not fixed (although the sequence
shown in Table AI is convenient).
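Several of these rules can be checked mechanically before a file is sent. The following is a minimal sketch, not part of the standard, that tests Rules 2, 3, and 5. The function name and the way records and mnemonics are passed in are assumptions.

```python
# The nine required variables of Rule 5, by mnemonic.
REQUIRED = ["ID", "SIRE", "DAM", "SEX", "BIRTH", "ENTRY", "AQCODE", "EXIT", "EXCODE"]


def check_registry(records, mnemonics):
    """Return a list of problems found; an empty list means the checks passed.

    records   -- list of record strings (one per animal, line ending removed)
    mnemonics -- variable names taken from the accompanying structure description
    """
    problems = []
    # Rule 5: the nine required variables must be described (EGO is a synonym for ID).
    names = {"ID" if m == "EGO" else m for m in mnemonics}
    for name in REQUIRED:
        if name not in names:
            problems.append(f"missing required variable {name}")
    # Rule 3: every record in the file has the same length.
    lengths = {len(r) for r in records}
    if len(lengths) > 1:
        problems.append(f"records differ in length: {sorted(lengths)}")
    # Rule 2: printable ASCII only (character codes 32 through 126).
    for number, record in enumerate(records, start=1):
        if any(not 32 <= ord(ch) <= 126 for ch in record):
            problems.append(f"record {number} contains non-printable or non-ASCII characters")
    return problems
```

Rule 1 (a full written description of the record structure) cannot be verified this way, which is one reason the text stresses it so heavily.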
These rules should serve as guidelines for the construction of a standard registry file. It is encouraging to find in practice that it is relatively easy for colony
data managers (using standard database software with an existing database) to
generate a standard registry file that meets these requirements. Adherence to the
specifications alone, however, will not guarantee that data managers will be able
to perform demographic and pedigree analysis on their colonies. Analysis software
also must be written so that the several levels of variability described above can be
accommodated. We have found it is relatively easy to write software that accepts
input data having any format and coding scheme that meets these specifications.
Our programs [Dyke, 1989a,b] read an input data file, and also a separate text file
that defines the record structure and provides a “dictionary” that specifies how
codes (such as exit codes) will be transformed into a standardized set. The latter file
can be created with an ordinary text or word processor, and incorporates information about record format that must accompany the data file (see Rule 1 above).
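The idea of pairing a data file with a structure description and a code dictionary can be sketched as follows. This is not the format used by the programs cited above, which is not given here. The field widths, the local exit codes (AL, ND, SO, UN), and the derived EXCLASS field are all hypothetical.

```python
# Hypothetical structure description: (mnemonic, length) in record order.
STRUCTURE = [("ID", 4), ("SIRE", 4), ("DAM", 4), ("SEX", 1),
             ("BIRTH", 8), ("ENTRY", 8), ("AQCODE", 2),
             ("EXIT", 8), ("EXCODE", 2)]

# Hypothetical local exit codes mapped onto the four general classes of Rule 4.
EXIT_DICTIONARY = {"AL": "alive", "ND": "natural death",
                   "SO": "lost to follow-up", "UN": "unknown"}


def parse_record(line, structure=STRUCTURE, exit_dictionary=EXIT_DICTIONARY):
    """Slice one fixed-width record into a dict and standardize its exit code."""
    expected = sum(length for _, length in structure)
    if len(line) != expected:
        raise ValueError(f"expected {expected} characters, got {len(line)}")
    fields, start = {}, 0
    for name, length in structure:
        fields[name] = line[start:start + length].strip()
        start += length
    # Codes missing from the dictionary fall into the "unknown" class (an assumption).
    fields["EXCLASS"] = exit_dictionary.get(fields["EXCODE"], "unknown")
    return fields
```

Because the widths and codes live in data rather than in the program, a different colony's file can be read by changing only the two tables.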
Standards for Multiple-Entry Record Structures
As explained, initial efforts at standardization began with single-entry data
partly because most colonies, following CLAR recommendations, already have
these data in relatively accessible single-entry format, and partly because these
are the data most crucial for basic demographic and pedigree analyses. Some
important analyses require data that are better kept in multiple-entry files, however.
Storage of multiple-entry data is not as straightforward as the single-entry
case. In particular, the convenience of keeping all data for a single individual on a
single record is complicated because 1) records will have different lengths if individuals in a population have differing numbers of entries. Many database systems
in fact are based on variable length records for reasons of file storage efficiency.
Unfortunately, there are a number of formats used for storage of variable length
records, each of which typically requires its own software techniques for manipulation. On the other hand, 2) if records are fixed at the same length, they will all
have to be the same size as the record required for the individual with the most
data. If the quantity of data varies between individuals, this practice can be extremely wasteful of file storage, since space on each record must be reserved
whether or not there are entries to fill it.
An alternative to keeping multiple entries on a single record for each individual is to keep a file of multiple records for each individual, with each record
containing a fixed number of entries. A new record is created each time information is taken during the lifetime of the individual. The usual strategy is to keep
only one kind of data on each record, and to create separate files for different kinds
of data. We have chosen this strategy because 1) it makes it simple to keep all
records in a data file the same fixed length, 2) for many purposes it does not waste
file storage space, and 3) special software is not required for file management.
A strategy for recording multiple events. The actual format of the records
in such a file depends to a large extent on the nature of the data they contain. The
simplest form is one consisting of only the identification of the individual, a sequence indicator (usually a calendar date), and a single measurement or coded
observation. We call a multiple-entry file containing records of this sort an event
file. Event files typically contain information such as birth dates of offspring, dates
of routine clinical treatment and inoculations, weights and morphometric growth
measures, etc. Coded behavioral observations may also be stored in this way,
although the need to keep the individual's ID on each record may lead to excessive
use of file storage if the number of observations is large. Records are usually
grouped by ID and sorted by date or other sequence indicator. Details of this
minimal structure can be found in Table AII in the Appendix.
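As a minimal sketch of the grouping and sorting just described, the function below arranges event records by ID and then by date. It assumes each record has already been split into an (ID, date, value) tuple with the date in YYYYMMDD form. The function name is hypothetical.

```python
from collections import defaultdict


def group_events(records):
    """Group (animal_id, yyyymmdd, value) tuples by ID, each group sorted by date."""
    history = defaultdict(list)
    for animal_id, when, value in records:
        history[animal_id].append((when, value))
    # YYYYMMDD strings sort chronologically when compared as text.
    return {animal_id: sorted(events) for animal_id, events in sorted(history.items())}
```

For example, three weight records entered out of order for animals A001 and A002 come back as one chronological list per animal.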
A strategy for recording multiple changes of state. Event files are suitable for storage of repeated measurements and observations that occur only once.
Other measurements or observations, however, describe an ongoing condition or
state of the animal, implying that intervals, rather than instants of time must be
accounted for on each record. Two dates, rather than one, must be associated with
each measurement so that duration of state can be recorded.
The record format for data of this sort has been more difficult to standardize
than single-entry formats, because of differences both in database software and in
data. Nonetheless, for purposes of exchanging data we have found a pair of alternative record structures that appear to be relatively easily generated by most
database systems.
Following our decision to confine standardization to the minimum data required for basic demographic and population genetic analysis, and noting that
mortality analysis can be done with single-entry data, we have developed a multiple-entry file for the data required to analyze fertility. Calculation of fertility
rates must take into account the fact that individuals are not exposed to the risk
of reproducing at all times during their lifetimes (in contrast to the risk of dying).
Captive animals are particularly subject to interruptions during the normal reproductive span: single-sex caging, experimental procedures, hospitalization, etc.,
result in periods of varying length during which risk of reproducing goes to zero.
To get an accurate assessment of individual reproductive performance, these interruptions must be accounted for. In most colony databases this information is
kept as a breeding or location file, but because the notion of exposure to risk has
applications beyond the study of fertility, we designate the standardized form in
more general terms as an exposure file.
The two standard formats are:
1. Single Date Format. The same Animal ID that appears in the registry file
must be present in this file, along with a date marking the beginning of a change
in reproductive status, and a code that specifies whether the status is one of exposure (Code E) or non-exposure (Code N) to risk of fertility. Optionally, Ego's sex,
ID of a mate (or a list of potential mates), and cage location may be included.
Records grouped by Animal ID and sorted by date of status change represent (in
conjunction with Entry and Exit dates) the exposure history of each individual.
The structure of this record and interpretation of its variables are shown in Table
AIII in the Appendix.
2. Two Date Format. This format is similar to the single date format, except
that it includes two dates representing the beginning and end, respectively, of a
period of exposure to risk of fertility, which obviates the need for an exposure
status code. Tabulation of risk is done only between two dates found on the same
record. The structure of this record and interpretation of its variables are shown in
Table AIV in the Appendix.
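The relationship between the two formats can be sketched in a few lines: single-date records for one animal are converted to two-date intervals, and the days at risk are then summed. This is an illustration under stated assumptions, not the tabulation used by any particular program. It assumes the records are already sorted by date, that an exposure still open at the end is closed at the Exit date, and that the closing day itself is not counted.

```python
from datetime import date


def to_intervals(changes, exit_date):
    """Convert sorted (date, code) status changes for one animal into
    (start, end) exposure intervals. Code "E" opens exposure, "N" closes it."""
    intervals, start = [], None
    for when, code in changes:
        if code == "E" and start is None:
            start = when
        elif code == "N" and start is not None:
            intervals.append((start, when))
            start = None
    if start is not None:
        # Still exposed at exit (or census): close the interval at the exit date.
        intervals.append((start, exit_date))
    return intervals


def days_at_risk(intervals):
    """Total number of days covered by the exposure intervals."""
    return sum((end - start).days for start, end in intervals)
```

An animal exposed from January 1 to March 1, 1990, and again from June 1 until its exit on July 1, 1990, yields two intervals totalling 89 days.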
The exposure file is a special case of data representing continual chronological
change of status. Other multiple-entry data involve repeated measurements, or the
recording of historical events that cannot be tabulated by linking records in the
basic registry file. For example, conventional fertility analysis is based on live
births, which can be tabulated by linking parental and offspring records in the
registry file. Since census and registry files rarely contain records for stillbirths or
abortuses, analyses of reproductive loss require special recording of such events,
typically in a reproductive history file. To date, standardization of these files has
not been tried, not because of differences in record structures (which tend to be
relatively simple), but because of a diversity of the data per se. This variety results
from differences in species-specific reproductive patterns, colony-specific differences in definitions of variables used to represent reproductive events, and lack of
agreement on appropriate demographic measures to be used. The problem is exacerbated by well-known difficulties in assuring the quality of reproductive loss
data. Nonetheless, it should be relatively easy to modify one of the two formats
given here to accommodate more complex data structures.
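The linking of parental and offspring records mentioned above for tabulating live births can be sketched as follows. The function names are hypothetical, and the sketch recognizes only blank, all-zero, and UNK or UNKNOWN parent IDs as unknown. A colony that uses other conventions would need to extend it.

```python
from collections import Counter

UNKNOWN_MARKERS = {"", "UNK", "UNKNOWN"}


def is_unknown(parent_id):
    """True for blank, all-zero, or UNK/UNKNOWN parent IDs (case-insensitive)."""
    text = parent_id.strip().upper()
    return text in UNKNOWN_MARKERS or set(text) == {"0"}


def offspring_counts(registry):
    """Count recorded offspring per dam and per sire from (id, sire, dam) tuples."""
    by_dam, by_sire = Counter(), Counter()
    for _, sire, dam in registry:
        if not is_unknown(dam):
            by_dam[dam.strip()] += 1
        if not is_unknown(sire):
            by_sire[sire.strip()] += 1
    return by_dam, by_sire
```

Because stillbirths and abortuses rarely have registry records, counts produced this way reflect live births only, as the text notes.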
SUMMARY
With the development of powerful and inexpensive computer technology, animal colony record-keeping has reached the stage where common data standards
are practicable. Initially, these standards began as a rather strict set of specifications that left the responsibility for matching data format with program input
requirements solely with colony data managers. Gradually they have evolved into
a more satisfactory compromise in which the task is shared with the programs that
use the data. The result is a system that combines great flexibility for data managers with reasonable assurance that their data can be analyzed without the need
for customized software.
ACKNOWLEDGMENTS
The original ad hoc committee on data standards consisted of Arthur Davis of
the Washington Regional Primate Center; Drs. Bennett Dyke, of the Southwest
Foundation for Biomedical Research; Andrew J. Petto, of the New England Regional Primate Research Center; Samuel Sholl of the Wisconsin Regional Primate
Research Center; Brent Swenson, of the Yerkes Regional Primate Research Center; and Lawrence Williams, of the University of South Alabama Primate Research
Laboratory. Paul Dubois, of the Wisconsin Center, and Paul Mamelka, of the
Southwest Foundation, have also made important contributions.
Other individuals who have provided helpful suggestions and criticisms are
Drs. Fred Bercovitch, John Capitanio, Carolyn Erhardt, Charles Howard, and
Doris Zumpe.
Funding for organizing some of the Committee's work, and for preparation of
this manuscript was provided by NIH grant RR02229.
REFERENCES
Dyke, B. ACMP, ANIMAL COLONY MANAGEMENT PACKAGE USER GUIDE.
PGL TECHNICAL REPORT NO. 3. San
Antonio, TX, Population Genetics Laboratory, Southwest Foundation for Biomedical
Research, 1989a.
Dyke, B. PEDSYS, A PEDIGREE DATA
MANAGEMENT SYSTEM USER'S MANUAL. PGL TECHNICAL REPORT NO. 2.
San Antonio, TX, Population Genetics Laboratory, Southwest Foundation for Biomedical Research, 1989b.
National Research Council. LABORATORY
ANIMAL RECORDS. Committee on Laboratory Animal Records. Washington, DC,
Institute of Laboratory Animal Resources,
National Academy of Sciences (DHEW
publ. no. (NIH) 80-2064), 1979.
Scobie, P. USER MANUAL FOR ARKS, AN
ANIMAL RECORD KEEPING SYSTEM.
Apple Valley, MN, ISIS, 1987.
Seal, U.S.; Makey, D.G.; Bridgewater, D.;
Simmons, L.; Murtfeldt, L.E. ISIS: A computerized record system for the management of wild animals in captivity. INTERNATIONAL ZOO YEARBOOK 17:68-70,
1977.
APPENDIX
I. The Standard Single-Entry Registry Record Structure (Table AI)
Variables in this record were chosen to conform minimally to the ILAR Laboratory Animal Records Basic Census Record and the ISIS Universal Single Entry
Data category. This information is sufficient for detailed demographic analysis,
and for construction and analysis of extended pedigrees. All data must be in text
(printable ASCII) format.
Three classes of data variables are recognized:
1. REQUIRED ITEMS (1-9) that must be present for demographic and pedigree analysis.
2. OPTIONAL ITEMS (10-13) that are highly desirable as population or subgroup identifiers. Item 14 is an important aid in assuring that all records are the
same length. The ‘5”
character is often used for this purpose.
3. NON-STANDARD ITEMS of information may be included at the end of the
record, just prior to the end-of-record field. These variables are usually ignored in
most demographic and pedigree analysis programs, but may be important for other
reasons.
It is essential that an explicit description of the record structure accompany
data files transferred between institutions. This information should include
1. A population title and calendar date of census (normally, the date of the
most recent vital event recorded in the file).
2. Name, address, and telephone number of the person to be contacted for
information about the data. An electronic mail address (Bitnet or Internet, for
example) is highly desirable, as well.
3. Information about the type and format of disk or tape on which data are
stored.
4. Field lengths, data types, and field descriptions for all items listed in
Table AI.
5. Codes and their definitions for standard items 4, 7, and 9 (SEX, AQCODE,
and EXCODE) that match those listed in notes (F), (H), and (I) below, respectively.
For additional flexibility that may alleviate potential problems in conforming to
the standard, the code lists are left open, and new codes may be added to designate
categories not defined by the standard. If this is done, alternate codes and descriptions should be defined explicitly.
6. Explicitly defined codes and descriptions for optional items 10, 11, 12, 13,
and 14 (TAXON, INSTT, SUBGRP, LOC, and EOR).
7. A field length, data type, field description, and codes for each additional
non-standard item of information added.
8. Except where noted, alphabetic information is assumed to be case-sensitive.
See notes (E) and (F).
TABLE AI. Standard Registry Record Structure
Variable  Mnemonic (A)  Length (B)  Type (C)  Description
1         ID (or EGO)   3-10        C         Animal ID
2         SIRE          3-10        C         Sire ID
3         DAM           3-10        C         Dam ID
4         SEX           1           C         Sex
5         BIRTH         6-15        C         Birth date
6         ENTRY         6-15        C         Entry date
7         AQCODE        2-6         C         Acquisition code
8         EXIT          6-15        C         Exit date
9         EXCODE        2-16        C         Exit code
10        TAXON         3           C         Taxonomic code
11        INSTT         6-9         C         Institution code
12        SUBGRP        3-10        C         Local subgroup code
13        LOC           3-10        C         Current location code
14        EOR           1           C         End-of-record character
(See notes (A) through (M) below for explanations.)
Notes:
(A) Mnemonics are used in some software packages to select variables for analysis
and as headings for tabular output. When supplied, mnemonics should be upper
case and a maximum of six characters in length.
(B) Recommended ranges of field lengths given here are based on those found in
several colony databases. Other field lengths may be used, so long as the exact
length is specified.
(C) Normally, all data are interpreted as text characters and should be designated
“C.” Provision is made for numeric types, however:
C = character (left-justified)
F = floating point (right-justified)
I = integer (right-justified)
Designators should be upper case only.
(D) Animal ID must be present. Mnemonic used should be either ID or EGO, a
term used to indicate the individual of reference in a pedigree.
(E) Unknown parents may be designated by blanks, zeros, 9’s, or the string UNK
or UNKNOWN. Alphabetic coding in this instance is case-insensitive; that is, the
string unknown may be substituted for UNKNOWN, etc.
(F) Gender is usually coded as 1, 2, 3, or M, F, U for male, female, and unknown,
respectively. Alphabetic coding for gender is case-insensitive. That is, m, f, or u
may be substituted for M, F, or U, respectively. Codes for other gender categories
are permitted, but for purposes of analysis it must be possible to subsume them
under one of the three categories above.
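The coding rules in notes (E) and (F) can be sketched as a pair of small normalizing functions. This is only an illustration, not part of the standard: the function names are hypothetical, and treating a field made up entirely of 9's as an unknown parent is an assumption about the intended code.

```python
# Hypothetical helpers illustrating notes (E) and (F); names are not part of the standard.
UNKNOWN_PARENT_STRINGS = {"UNK", "UNKNOWN"}

# Numeric and alphabetic gender codes mapped onto the three analysis categories.
SEX_CODES = {"1": "M", "2": "F", "3": "U", "M": "M", "F": "F", "U": "U"}


def normalize_parent(raw):
    """Return the parent ID, or None when the field marks an unknown parent."""
    value = raw.strip()
    if value == "":
        return None                      # blanks
    if set(value) == {"0"} or set(value) == {"9"}:
        return None                      # zeros, or 9's (assumed)
    if value.upper() in UNKNOWN_PARENT_STRINGS:
        return None                      # UNK / UNKNOWN, case-insensitive
    return value                         # IDs themselves stay case-sensitive


def normalize_sex(raw):
    """Map a gender code onto 'M', 'F', or 'U' (case-insensitive)."""
    code = raw.strip().upper()
    if code not in SEX_CODES:
        raise ValueError(f"gender code {raw!r} must be mapped to M, F, or U first")
    return SEX_CODES[code]
```

A colony that uses additional gender codes would extend SEX_CODES so that each local code is subsumed under one of the three categories.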
(G) Preferred order of date is Year-Month-Day (YYYYMMDD, where YYYY, MM,
and DD are integer representations of year, month, and day, respectively). However, nearly any systematic representation is acceptable: alphabetic representation of month (three-letter abbreviation or full name) is permitted, as are three- and two-digit representations of year. Order of year, month, and day is not fixed,
and elements may be separated by slashes, periods, or dashes. Alphabetic month
may be separated from integer day or year by a single blank; day may be separated
from year by a comma and blank when occurring with alphabetic month. Examples
are as follows:
29/01/1990
Jan291990
29JAN1990
900129
1990January29
Jan 29, 1990
01-29-990
1990.01.29
Important: Different styles of date representation may not be mixed in any one file.
An estimated date element is signified by entry of year or year and month only
(YYYY0000 or YYYYbbbb, where b = blank).
Dates of events that have not occurred (e.g., death date of living animals) are designated by a full field of blanks or 0’s. Examples of estimated dates, here with the day unknown, are as follows:
00Jan1990
900100
291 I990
Jan 1990
1990January00
Jan ,1990
01-00-990
1990.01.
Unrecorded dates of events known to have occurred are often designated by a full
field of 9’s. However, with the approach of the year 1999, this practice is best
avoided unless all four digits of the year are used; the use of asterisks in this case
is suggested: 9999/99/99 is correct, but eventually 99 or 999 alone as designators of
year will be ambiguous; **-**-1975 should not be substituted for 00-00-1975.
Many demographic analyses assume that an event occurs on the 15th day of the
month if only month and year are known, and on June 30 if only year is known.
By convention, individuals are counted as alive on the day of their birth or entry,
but not on the day of exit.
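As an illustration of the date conventions in this note, the following sketch reads a field in the preferred YYYYMMDD form only. The function name is hypothetical, and the substitution of the 15th of the month or June 30 for missing elements simply follows the convention described above for many demographic analyses. Fields of 9's or asterisks (unrecorded dates) and the other permitted layouts are not handled here.

```python
from datetime import date


def parse_standard_date(field):
    """Parse a YYYYMMDD field into (date, is_estimated).

    Returns (None, False) for a full field of blanks or zeros, that is,
    for an event that has not occurred.
    """
    text = field.strip()
    if text == "" or set(text) == {"0"}:
        return None, False
    # YYYYbbbb and YYYYMMbb are padded to YYYY0000 and YYYYMM00.
    digits = text.ljust(8, "0")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"not a YYYYMMDD date: {field!r}")
    year, month, day = int(digits[:4]), int(digits[4:6]), int(digits[6:])
    if month == 0:
        return date(year, 6, 30), True      # only the year is known
    if day == 0:
        return date(year, month, 15), True  # only year and month are known
    return date(year, month, day), False
```

The second element of the result lets an analysis keep track of which dates were estimated rather than recorded.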
Julian calendar representation should be avoided, since most people find it difficult
to interpret by eye, and the fixed date in the past from which days are counted may
vary from one database system to the next. Dates in Julian form need not be
absolutely prohibited, however, since software is available that will convert between it and Gregorian calendars.
(H) Acquisition Codes. It is often useful to compare characteristics of animals that
have different origins. The following list represents classes common to several
large U.S. colonies and represents categories generally thought to include enough
animals for meaningful analysis. Other categories may be added as needed:
1 = Wild-born (known provenance)
2 = Wild-born (unknown provenance)
3 = Colony-born (local)
4 = Colony-born (other colony)
5 = Purchase or trade (unknown provenance)
6 = Caesarian section (local)
7 = Caesarian section (other colony)
99 = Unknown origin
(I) Exit Codes. Many analyses require that animals be classified according to their
exposure to risk of events such as reproduction, death, illness, etc. The reason for
an animal leaving the population is an important determinant of this classification. The following list represents exit classes common to several large U.S. colonies and represents categories generally thought to include enough animals for
meaningful analysis. Other categories may be added as needed:
0 = Alive at time of census
1 = Natural death
2 = Euthanasia for humane reasons (not research related)
3 = Trauma
4 = Stillbirth
5 = Abortion
6 = Sale or trade
7 = Escape or theft
8 = Sacrifice
9 = Research-related death (other than sacrifice)
99 = Unknown cause of exit
Most demographic analyses require that animals be classified according to only
five exit categories. These categories, and their corresponding Exit Codes, are
given below:
Alive at time of census                    Corresponding to Code 0 above
Natural death                              Corresponding to Codes 1-5
Lost to follow-up (non-research related)   Corresponding to Codes 6, 7
Lost to follow-up (research related)       Corresponding to Codes 8, 9
Unknown cause of exit                      Corresponding to Code 99
This breakdown should serve as a guide for translating or creating exit categories
that do not match the Exit Codes as given above.
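The collapse of Exit Codes into the five analysis categories can be written as a simple lookup. The sketch below is illustrative only; the names are hypothetical, and reading a blank exit code as "alive at time of census" is an assumption taken from the registry example, in which a living animal carries no code of exit.

```python
# Five-category breakdown of the standard Exit Codes (note (I)).
EXIT_CATEGORY = {0: "Alive at time of census", 99: "Unknown cause of exit"}
EXIT_CATEGORY.update({code: "Natural death" for code in range(1, 6)})
EXIT_CATEGORY.update(
    {code: "Lost to follow-up (non-research related)" for code in (6, 7)}
)
EXIT_CATEGORY.update(
    {code: "Lost to follow-up (research related)" for code in (8, 9)}
)


def exit_category(excode):
    """Return the analysis category for an EXCODE field or integer code."""
    text = str(excode).strip()
    code = int(text) if text else 0  # assumption: blank means still alive
    try:
        return EXIT_CATEGORY[code]
    except KeyError:
        raise ValueError(f"exit code {excode!r} has no category defined") from None
```

Locally defined exit codes would be added to EXIT_CATEGORY under whichever of the five categories they fall.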
(J) Since species (or subspecies) may differ with respect to vital statistics, taxonomic identification may be important census information. When combined with
an institution code, this item also serves as an independent means of identifying
individual records with the population to which they apply. It is usually not a good
idea to mix widely divergent taxa in the same census file. A good rule of thumb is
to make a separate file for each species, and to identify animals in the file by
subspecies, if that is appropriate. Some (arbitrary) taxonomic codes that have been
used so far are:
MM  = Macaca mulatta
MN  = Macaca nemestrina
PHA = Papio hamadryas anubis
PHC = Papio hamadryas cynocephalus
SBB = Saimiri boliviensis boliviensis
SOO = Saguinus oedipus oedipus
Etc.
For the time being, additional codes are left to be defined by the individual colonies.
(K) Institution codes. Institutional designation is sometimes a useful component
of identification of records. Maximum field width is sufficient for the ISIS Institution name abbreviation. The following (arbitrary) list of institution codes has been
used for some of the major U.S. primate colonies:
CPRC   = California Primate Research Center
DRPRC  = Delta Regional Primate Research Center
NERPRC = New England Regional Primate Research Center
ORPRC  = Oregon Regional Primate Research Center
ORAUMC = Oak Ridge Associated Universities Marmoset Colony
SFBR   = Southwest Foundation for Biomedical Research
UTMDA  = University of Texas M.D. Anderson Cancer Center
USAPRL = Univ. S. Alabama Primate Research Laboratory
WaRPRC = Washington Regional Primate Research Center
WiRPRC = Wisconsin Regional Primate Research Center
YRPRC  = Yerkes Regional Primate Research Center
Etc.
(L) “Local subgroup” and “Current location” categories are sometimes useful in
defining population subunits that may be analyzed separately. These categories
are, however, optional.
(M) The default practice for most communications software is to discard blank
characters at the end of records. This can lead to problems such as differences in
length of records if the last field of some, but not all, records contains trailing
blanks. In these cases a single non-blank character can be appended to each record
before transmission to assure that all records retain their full length. Analysis
software will add as many blanks as necessary to the left of this character to assure
that it appears in the same position at the end of each record in the file. This is an
optional feature; choice of end-of-record character is also optional, although “>” is
suggested.
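The end-of-record mechanism can be sketched in two steps: the sender pads each record to its full length and appends the marker, and the receiver re-inserts blanks to the left of the marker if any were lost. The function names below are hypothetical and the code is only an illustration of the idea, not the behavior of any particular analysis package.

```python
def append_eor(record, length, marker=">"):
    """Pad a record with blanks to its full length and append the EOR marker."""
    if len(record) > length:
        raise ValueError("record is longer than the declared record length")
    return record.ljust(length) + marker


def restore_length(records, marker=">"):
    """Re-align records so the EOR marker sits in the same column in each."""
    width = max(len(r) for r in records)
    restored = []
    for r in records:
        if not r.endswith(marker):
            raise ValueError(f"record lacks end-of-record marker: {r!r}")
        # Add blanks to the left of the marker until the record is full width.
        restored.append(r[:-1].ljust(width - 1) + marker)
    return restored
```

Because the marker is non-blank, communications software that strips trailing blanks leaves the padded fields intact.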
Example:
The following example shows two records from a Standard Single-entry Registry File. The line below the second record serves as a ruler with stroke characters
(|) marking the beginning of each field. Numbers below the strokes correspond to
variables in Table AI. The first record indicates that Ego Rh053 has sire Rh012 and
dam Rh022, and is a female born on May 21, 1987; her birth date was also the date of
entry into the colony, and she is thus classified as colony-born; no date or code of
exit is given, meaning that she is still alive; she is a rhesus macaque belonging to
the Southwest Foundation for Biomedical Research local subgroup “Gen,” and is
housed in Cage A. The second record indicates that Ego Rh102 has unknown
parents, is a male born sometime in 1985, and entered the colony on April 10, 1991
from another colony where he was born; he died on March 13, 1992 of natural
causes; he too was a rhesus belonging to the SFBR “Gen” subgroup, and was
housed in Cage B.
Rh053Rh012Rh022F87052187052103        MM SFBR  GenCageA>
Rh102          M8500009104100492031301MM SFBR  GenCageB>
|....|....|....||.....|.....|.|.....|.|..|.....|..|....|
1    2    3    45     6     7 8     9 10 11    12 13   14
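Reading a registry record amounts to slicing the line at the field boundaries given in the accompanying record description. The sketch below assumes a hypothetical layout (field widths of 5, 5, 5, 1, 6, 6, 2, 6, 2, 3, 6, 3, 5, and 1 characters) chosen to fit the second record described in the example; actual widths must come from the description supplied with each file.

```python
# Hypothetical field widths; a real file states these in its record description.
REGISTRY_LAYOUT = [
    ("ID", 5), ("SIRE", 5), ("DAM", 5), ("SEX", 1), ("BIRTH", 6),
    ("ENTRY", 6), ("AQCODE", 2), ("EXIT", 6), ("EXCODE", 2), ("TAXON", 3),
    ("INSTT", 6), ("SUBGRP", 3), ("LOC", 5), ("EOR", 1),
]


def parse_fixed_width(line, layout):
    """Split one fixed-width record into a dict of stripped field values."""
    expected = sum(width for _, width in layout)
    if len(line) != expected:
        raise ValueError(f"expected {expected} characters, found {len(line)}")
    record, position = {}, 0
    for name, width in layout:
        record[name] = line[position:position + width].strip()
        position += width
    return record


# Second record of the example, assembled field by field to keep the padding explicit.
line = (
    "Rh102" + " " * 5 + " " * 5 + "M" + "850000" + "910410" + "04"
    + "920313" + "01" + "MM".ljust(3) + "SFBR".ljust(6) + "Gen" + "CageB" + ">"
)
record = parse_fixed_width(line, REGISTRY_LAYOUT)
```

Here record["SIRE"] and record["DAM"] come back empty (unknown parents), and record["EXCODE"] is "01" (natural death).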
II. Multiple-Entry Data Files
1. Event files. Event files are used for storage of repeated measurements and
observations that occur only once. In their simplest form they consist of the identification of the individual, a time or calendar date, a single measurement or coded
observation, and an optional end-of-record field, as shown in Table AII.
TABLE AII. Multiple-Entry Data Record Structure (Event File)

Variable  Mnemonic  Length  Type      Description
1         EGO       3-10    C         Animal ID
2         DATE      6-15    C         Date of event
3         <EVENT>   1-<n>   <varies>  Observation/measurement
4         EOR       1       C         End-of-record character
Form and content of Variable 3 in the table will depend on the observation or
measurement recorded. Its mnemonic <EVENT> is enclosed in angle brackets to
indicate that although its presence is required, its content is optional. Length and
Data Type of this field also depend on the nature of the event. Animal ID, Date,
and EOR fields should conform to specifications described above. A record is created for each event. The event file need not contain the same number of records for
each individual, nor is it necessary for all individuals represented in the registry
file to be included in the event file. Records in the event file are usually grouped
by Animal ID and sorted by date.
Example:
The following example shows three records from an Event File. The line below
the third record serves as a ruler with stroke characters (|) marking the beginning
of each field. Numbers below the strokes correspond to variables in Table AII. The
records show the blood sampling history of Ego Rh053 with each record containing
a separate sampling date and “event” (in this case the size and type of sample
tube).
Rh053880629 5mlRedTop>
Rh05389062220mlRedTop>
Rh05390060920mlRedTop>
|....|.....|.........|
1    2     3         4
2. Fertility exposure files. When measurements or observations refer to an
ongoing condition or state, intervals, rather than instants of time must be accounted for on each record. That is, two dates, rather than one, must be associated
with each measurement. Two extended formats, corresponding to alternate conventions for defining dates of exposure to risk to fertility, are given here.
1. Single date format. In this configuration (Table AIII), it is always assumed that Ego begins life in a non-exposure status, and that the beginning of risk
of fertility is defined by a date specified in the first record bearing status code E.
The period of risk lasts until the date found on the first subsequent record bearing
status code N, which in turn lasts until the date associated with the next record
bearing status E, etc. Risk of exposure to fertility finally terminates at the Exit
date, end of reproductive age, or end of the time period being analyzed.
Interpretation:
The table below lists the three combinations of IDs, dates, and exposure status
codes that may be present in Exposure File records:

EGO  CHANGE  STATUS  SEX  MATE  NOTE
+    -       -       +    -     (a)
+    +       +       +    -     (b)
+    +       +       +    +     (c)

where + represents the presence, - the absence of an ID, date, or code. Other
configurations are not permitted and will generate error messages.
TABLE AIII. Multiple-Entry Data Record Structure (Single Date Exposure File)

Variable  Mnemonic  Length  Type  Description
1         EGO       3-10    C     Animal ID
2         CHANGE    6-15    C     Status change date
3         STATUS    1-6     C     Exposure status code
4         SEX       1       C     Ego’s sex
5         MATE      3-10    C     Mate ID
6         LOC       3-10    C     Cage location
7         EOR       1       C     End-of-record character
Notes:
(a) Reproduction of EGO is measured without regard to identity of mate. Because
neither a Status Change Date nor Status Code is present on the record, it is
assumed that EGO is continuously exposed to risk of fertility over a single interval. The beginning of this interval is defined by EGO’s Birth Date, Entry Date, or
by the beginning of the Registry Period (whichever is later); the end is defined by
EGO’s Exit Date or the end of the Registry Period (whichever is earlier). Only one
such record for each individual is permitted.
(b) Reproduction of EGO is measured without regard to identity of mate. Here, we
allow for exposure to risk of fertility that may be discontinuous, and whose range
need not be defined by EGO’s Birth and Exit dates. Multiple records for each
individual define the duration of intervals, and the exposure status code associated
with each record determines whether the interval is one of exposure (code E) or
non-exposure (code N). In this configuration, it is always assumed that EGO begins
life in a non-exposure status, and that the beginning of risk of fertility is defined
by a date specified in the first Exposure File record bearing status code E. End of
an interval may be determined in one of two ways:
• If record (i) is immediately followed by another (i + 1) that also specifies EGO,
end of interval (i) is taken to be CHANGE(i + 1), which is also the beginning of
interval (i + 1), etc.
• If no subsequent record for EGO is found, the end of the interval is assumed to
be EGO’s Exit Date or end of the Registry Period (whichever is earlier).
Records for each individual should be contiguous and ordered by Status Change
Date such that
Birth Date ≤ ... ≤ CHANGE(i) ≤ CHANGE(i + 1) ≤ ... ≤ Exit Date.
Contiguous records with the same status code are treated as a single interval that
is terminated as described above. Configuration (b) requires that an Exposure
Change Date and an Exposure Status Code be present on each record of the individual whose exposure is being defined. The Exposure File may contain records
having both configurations (a) and (b), although mixed configuration may not be
applied to the same individual.
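The interval rules for configuration (b) can be sketched as follows. The function names and the (date, status) input form are hypothetical; the code assumes the records for one EGO have already been read, are ordered by Status Change Date, and that end_limit is the earlier of EGO's Exit Date and the end of the Registry Period. Days are counted on the beginning but not the end date of each interval.

```python
from datetime import date


def exposure_intervals(changes, end_limit):
    """Turn ordered (change_date, status) records for one EGO into intervals.

    Returns a list of (start, end) pairs of exposure, with the end date
    excluded. EGO is assumed to start in a non-exposure status.
    """
    intervals = []
    start = None
    for when, status in changes:
        if status not in ("E", "N"):
            raise ValueError(f"unknown exposure status code: {status!r}")
        if status == "E" and start is None:
            start = when                     # exposure begins
        elif status == "N" and start is not None:
            intervals.append((start, when))  # exposure ends
            start = None
        # A repeated status code extends the current interval unchanged.
    if start is not None:
        intervals.append((start, end_limit))  # no later record for EGO
    return intervals


def days_exposed(intervals):
    """Count days of exposure: start date included, end date excluded."""
    return sum((end - start).days for start, end in intervals)
```

For a female with status changes E on April 10, 1991, N on November 4, 1991, E on February 28, 1992, and N on March 13, 1992, this yields two intervals of 208 and 14 days.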
(c) Only offspring produced jointly by EGO and MATE are tallied. This configuration is treated in a more restrictive fashion than configurations (a) and (b) above
in that it is always assumed that the beginning of risk of fertility is defined by a
date specified in the first Exposure File record having an Exposure Status Code E
(that is, that the initial status of both individuals included in the record is one of
non-risk). End of an interval may be determined in one of three ways:
• If record (i) is immediately followed by another (i + 1) that also specifies EGO
and MATE, end of interval (i) is taken to be CHANGE(i + 1), which is also the
beginning of interval (i + 1), etc.
• If record (i) is immediately followed by another (i + 1) that specifies EGO and
a new MATE, the fertility of EGO and the previous MATE is assumed to have ended at
CHANGE(i + 1), which is also taken to be the beginning of the first interval of a new series
defining the joint fertility of EGO and the new MATE.
• If no subsequent record for EGO is found, the end of the interval is assumed to
be EGO’s Exit Date, current mate’s Exit Date, or the end of the Registry Period
(whichever is earliest).
Configuration (c) requires that an Exposure Change Date, Exposure Status Code,
and a MATE ID be present on each record representing a couple whose exposure is
being defined. Records for EGO must be contiguous. EGO may have more than one
mate (and in fact more than one mate at one time) but records for each mate must
be contiguous and ordered by Exposure Change Date as described above. An Exposure File containing records in configuration (c) may not contain records in
configurations (a) or (b). Normally, configuration (c) applies to human populations,
where marital fertility is a standard measure of interest. Only categories (a) and
(b) would be used with non-human primates, where monogamy and identity of
specific mates tend to be less important than simple co-residence with any member of the opposite sex.
By convention, a day of exposure is counted on the beginning, but not the end date
of the interval.
Example:
The following example shows four records from a Single Date Exposure File.
The line below the fourth record serves as a ruler with stroke characters (|)
marking the beginning of each field. Numbers below the strokes correspond to
variables in Table AIII. The series indicates that female Ego is first exposed to
fertility on April 10, 1991, when she is housed with mate Rh102 in location CageB
(Record 1). On November 4 of the same year her exposure is interrupted when she
is separated from this male as the result of her removal to CageA (Record 2), until
February 28, 1992, when she is re-housed with him in CageB (Record 3). The two
are separated again two weeks later on March 13 when Rh102 is removed from
CageB (Record 4).
Rh053910410EFRh102CageB>
Rh053911104NF     CageA>
Rh053920228EFRh102CageB>
Rh053920313NF     CageB>
|....|.....|||....|....|
1    2     345    6    7
2. Two date format. In this configuration (Table AIV), tabulation of risk is
done only between a start and stop date found on the same record.
TABLE AIV. Multiple-Entry Data Record Structure (Two Date Exposure File)

Variable  Mnemonic  Length  Type  Description
1         EGO       3-10    C     Animal ID
2         START     6-15    C     Exposure start date
3         STOP      6-15    C     Exposure stop date
4         SEX       1       C     Ego’s sex
5         MATE      3-10    C     Mate ID
6         LOC       3-10    C     Cage location
7         EOR       1       C     End-of-record character
Interpretation:
The table below lists the five combinations of IDs and dates that may be
present in Exposure File records:

EGO  START  STOP  SEX  MATE  NOTE
+    -      -     +    -     (d)
+    +      -     +    -     (e)
+    -      +     +    -     (f)
+    +      +     +    -     (g)
+    +      +     +    +     (h)

where + represents the presence, - the absence of an ID, date, or code. Other
configurations are not permitted and will generate error messages.
Notes:
(d) Reproduction of EGO is measured without regard to identity of mate. Because
neither a START nor STOP date is present on the record, it is assumed that EGO
is continuously exposed to risk of fertility over a single interval. The beginning of
this interval is defined by EGO’s Birth Date, Entry Date, or by the beginning of the
Registry Period (whichever is later); the end is defined by EGO’s Exit Date or the
end of the Registry Period (whichever is earlier). Only one such record for each
individual is permitted.
(e) Reproduction of EGO is measured without regard to identity of mate. In this
configuration, it is always assumed that EGO begins life in a non-exposure status,
and that the beginning of risk of fertility is defined by the START date specified in
the Exposure File record. End of interval is specified by EGO’s Exit Date or the end
of the Registry Period (whichever is earlier).
(f) Reproduction of EGO is measured without regard to identity of mate. In this
configuration, it is always assumed that EGO begins life in a non-exposure status,
and that the beginning of risk of fertility is defined by EGO’s Birth Date, Entry
Date, or by the beginning of the Registry Period (whichever is later). End of
interval is specified by the STOP date specified in the Exposure File record.
(g) Reproduction of EGO is measured without regard to identity of mate. In this
configuration, it is always assumed that EGO begins life in a non-exposure status,
and that the beginning of risk of fertility is defined by the START date specified in
the first Exposure File record. End of interval is always specified by the STOP
date.
(h) Only offspring produced jointly by EGO and MATE are tallied. It is always
assumed that the beginning of risk of fertility is defined by START (that is, that
the initial status of both individuals included in the record is one of non-risk).
With configurations (e) through (h) we allow for exposure to risk of fertility that
may be discontinuous, and whose range need not be defined by EGO’s Birth or Exit
dates. Multiple Exposure records for each individual define the duration of intervals of exposure. Records for each individual should be contiguous and ordered by
date, such that
Birth Date ≤ ... ≤ START(i) ≤ STOP(i) ≤ START(i + 1) ≤ STOP(i + 1) ≤ ... ≤ Exit Date.
Contiguous records in which the STOP date of record (i) is the same as the START
date of record (i + 1) are treated as a single interval that is terminated by the STOP
date of record (i + 1). The Exposure File may contain records having configurations (d) through (g), and a mixture of configurations (e) through (g) may be applied
to a single individual. Configuration (d) may not be mixed with configurations (e)
through (g) for the same individual.
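The ordering constraint and the rule for joining contiguous records can be sketched as a single pass over one individual's intervals. The function name is hypothetical, and the input is assumed to be a list of (START, STOP) date pairs already read from configuration (g) records and listed in file order.

```python
from datetime import date


def merge_contiguous(intervals):
    """Check ordering of (start, stop) pairs and join those that touch.

    Two records are joined when the STOP date of one equals the START date
    of the next; the joined interval ends at the later STOP date.
    """
    merged = []
    for start, stop in intervals:
        if stop < start:
            raise ValueError(f"STOP {stop} precedes START {start}")
        if merged and start < merged[-1][1]:
            raise ValueError("records are not ordered by date")
        if merged and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], stop)   # extend previous interval
        else:
            merged.append((start, stop))
    return merged
```

Records that fail either check correspond to the cases in which analysis software would be expected to report an error.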
Records for EGO must be contiguous. EGO may have more than one mate (and in
fact more than one mate at one time) but records for each mate must be contiguous
and ordered by date as described in (g) above. An Exposure File containing records
in configuration (h) may not contain records in configurations (d) through (g).
Normally, configuration (h) applies to human populations, where marital fertility
is a standard measure of interest. Only categories (d) through (g) would be used
with non-human primates, where monogamy and identity of specific mates tends
to be less important than simple co-residence with any member of the opposite sex.
By convention, a day of exposure is counted on the beginning, but not the end date
of the interval.
Example:
The following example shows two records from a Two Date Exposure File that
express the same timing as the Single Date Exposure example given above. The
line below the second record serves as a ruler with stroke characters (|)
marking the beginning of each field. Numbers below the strokes correspond to
variables in Table AIV. The series indicates the same sequence as the example
given above for the Single Date Exposure File: female Ego is first exposed to
fertility for a period lasting from April 10, 1991 to (but not including) November 4
of the same year, during which time she is housed with mate Rh102 (Record 1) in
location CageB. After a period of non-exposure lasting from November 4, 1991
through February 27, 1992, she is again housed with the same male from February
28, 1992 to March 13, 1992 in the same location, at which time her exposure ends
(Record 2).
Rh053910410911104FRh102CageB>
Rh053920228920313FRh102CageB>
|....|.....|.....||....|....|
1    2     3     45    6    7
Note that this format is somewhat more concise than Single Date format, but
does not express explicit information about intervals of non-exposure.
Important note: All records in an Exposure File must have the same
format.