CHAPTER 4
DATA STANDARDS AND CATEGORIES
4.1 Being aware of standards
As mentioned in Chapter 1, if you wish to exchange data at some point then you need to be aware of standards -- in other words, accepted systems of naming and ordering things. Even if you have no intention of exchanging data it may be useful to know that certain standards exist or are being worked on. Where international standards are absent, categories may have been devised to cover certain topics. A selection of these categories is included as a guide for choosing fields; perhaps some of them will eventually evolve into standards. For some topics, however, universally used categories are difficult to achieve (habitat categories are one example). This is understandable, because the objectives, the type of study area and the geographical scale of investigation can vary considerably from one project to another.
For botanists the most important standards are concerned with the taxonomic framework -- the names and relationships of plants. This of course has a long history (going back to Linnaeus in the 1730s) but the advent of computerised databases has meant that very exact standards are now required. Because of this and the need to reach internationally accepted standards a group of botanists formed the Taxonomic Databases Working Group (TDWG) in 1985. The group is now called the International Working Group on Taxonomic Databases for Plant Sciences although its short name is still TDWG. Regular meetings are held, and a number of publications on standards have been produced, some of which will be referred to below.
How do these standards work in practice? A prime example is the International Transfer Format for Botanic Garden Plant Records -- or ITF for short. Several versions of the ITF have been published, the first in 1987, and the most recent in 1997 (Wyse Jackson, 1997). It has taken this period of time to iron out difficulties and to reach international agreement. As it stands at present the ITF allows for 73 standardised fields, although it is not necessary to fill in all of these, and a short list of 23 fields is suggested for setting up a simple database (see below). Each field is described, and rules are given as to how data items should be written and entered into the computer. If all the rules are followed it should be possible to exchange data freely with other botanic gardens which can then share a common pool of information.
Although the ITF will be directly relevant to relatively few people, it provides an example of how standards can be defined for computerised data exchange. In addition, some of the standardised fields, particularly for names and places, can be used for a variety of other purposes.
The following sections are intended as a guide to existing botanical standards and categories, starting with the most important -- plant names.
4.2 Plant names (scientific)
Once plants have been identified and named it is important to atomise the data and to be aware of the basic fields -- for family, genus, species and subspecies names -- as illustrated in Figure 4.1.
(Figure 4.1 about here)
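As a minimal sketch of what atomising a name means in practice, the basic fields can be held as separate attributes of one record. The class name `PlantName`, its method names and the sample values are assumptions made for illustration; they are not part of any standard mentioned in this chapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlantName:
    """One atomised scientific name; each rank is stored in its own field."""
    family: str
    genus: str
    species: str
    subspecies: Optional[str] = None  # left empty when no subspecies is recorded

    def binomial(self) -> str:
        # Genus and species alone are sufficient for many purposes.
        return f"{self.genus} {self.species}"

    def full_name(self) -> str:
        # Append the subspecies only when one has been entered.
        if self.subspecies:
            return f"{self.binomial()} subsp. {self.subspecies}"
        return self.binomial()


# Placeholder values, not real taxa.
record = PlantName(family="Exampleaceae", genus="Examplea", species="minor")
print(record.full_name())
```

Because each rank sits in its own field, records can be sorted or searched by family or genus without having to split a combined name string.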
For many purposes these basic fields, or perhaps just genus and species names, will be sufficient. However, if you are involved in detailed taxonomic work then a more comprehensive standard may be needed. A TDWG-approved standard is available (Bisby, 1994) that does cater to some extent for the needs of different users, being divided into four levels of complexity (ranging from a Limited Standard to a Full Standard). This TDWG standard on plant names has been used in the ITF mentioned above.
For the sake of illustration, the short list of 23 fields recommended by the ITF is shown in Table 4.1, although it must be remembered that this is a special list of fields (including more than just plant names) designed for the needs of botanic gardens. If you are not interested in accession records for botanic gardens then the list can either be noted in passing or used very selectively.
(Table 4.1 about here)
As mentioned in Chapter 1, a world list of families and genera for vascular plants is available as a computer file, the information being the same as in Brummitt (1992). These can be regarded as standard names.
4.3 Plant names (vernacular)
Recording of local or vernacular names is especially important in ethnobotanical databases. A single field may be sufficient to record the local name of a plant -- as in Table 4.1 -- but for linguistic analysis (see Martin, 1995) several fields might be needed. For instance, Cunningham et al. (1995) noted different (but similar) local names for the same variety of banana in Uganda and Kenya and recommended that these names be stored in two separate fields, one for the root word and another for the prefix (see Table 4.2). In other parts of the world local names may be characterised by a root word followed by a suffix (again, see Martin, 1995).
(Table 4.2 about here)
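The benefit of storing prefix and root word separately can be sketched as follows. The strings used are invented placeholders, not real local names, and `LocalName` and `group_by_root` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalName:
    prefix: str  # may be empty when the name has no prefix
    root: str    # the root word shared between related names

    def display(self) -> str:
        # Rebuild the name as it would be written in full.
        return f"{self.prefix}{self.root}"


def group_by_root(names):
    """Collect the full spellings that share the same root word."""
    groups = defaultdict(list)
    for name in names:
        groups[name.root].append(name.display())
    return dict(groups)


# Invented strings for illustration only.
names = [LocalName("ka", "root"), LocalName("mu", "root"), LocalName("", "other")]
grouped = group_by_root(names)
print(grouped)
```

With the root in its own field, similar names recorded in different places fall into the same group, which would not happen if only the full spelling were stored.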
It may also be necessary to record trade names of a plant and an extra field (or fields) may be needed for this purpose. An example would be the name "meranti" which refers to a range of tropical hardwood species.
4.4 Life form
It may be useful to refer to a standard way of recording the physical form of plants, be they, for example, large trees, shrubs, herbaceous species, epiphytes or climbers. No TDWG standard on life form exists at this point, although a publication is planned on this topic. However, Cunningham (1988) compiled a list of life form categories based mainly on those of Raunkiaer (1934), but with extra categories obtained from Mueller-Dombois and Ellenberg (1974) and from Box (1981). These life form categories are listed and described in Table 4.3 and have been used in Chapters 6 and 7.
(Table 4.3 about here)
4.5 Geographical location of a plant
A group of fields can be considered under this heading, ranging from a map reference for a specific site to a broad geographical region. Both altitude and depth (for aquatic plants) can also be included here. It should be noted that the following fields are specified in detail in the ITF. The ITF also specifies several extra fields not listed below.
(a) Latitude and longitude
There are six main fields that can be listed here, namely:
Latitude, Degrees
Latitude, Minutes
Latitude, Seconds
Longitude, Degrees
Longitude, Minutes
Longitude, Seconds
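For mapping or distance calculations the three fields for each coordinate are often combined into decimal degrees, using degrees + minutes/60 + seconds/3600. The sketch below assumes an additional hemisphere letter (N, S, E or W), which is not one of the six fields listed above; the function name `dms_to_decimal` is likewise hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes and seconds to signed decimal degrees.

    The hemisphere letter is an assumed extra input: south and west
    are returned as negative values.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be N, S, E or W")
    if degrees < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError("degrees must be >= 0; minutes and seconds must be >= 0 and < 60")

    value = degrees + minutes / 60 + seconds / 3600

    # Latitude cannot exceed 90 degrees, longitude cannot exceed 180.
    limit = 90 if hemisphere in ("N", "S") else 180
    if value > limit:
        raise ValueError(f"value exceeds {limit} degrees")

    sign = -1 if hemisphere in ("S", "W") else 1
    return sign * value


print(dms_to_decimal(10, 30, 0, "S"))  # -10.5
```

Keeping the original degrees, minutes and seconds in the database and deriving the decimal value when needed avoids losing the precision with which the position was first recorded.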
(b) Altitude and depth
Both these fields should be recorded in metres, although feet may be preferable in some situations (e.g. when these are the units used on maps or are the country standard -- as in the USA for instance).
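Where readings are taken from maps marked in feet, one possible approach is to record the unit alongside the value and convert to metres for storage. This is only a sketch: the function name and the unit abbreviations "m" and "ft" are assumptions, while the factor of 0.3048 metres per foot is the exact international definition.

```python
METRES_PER_FOOT = 0.3048  # exact, by the international definition of the foot


def to_metres(value, unit):
    """Return an altitude or depth in metres, given the unit it was read in."""
    if unit == "m":
        return float(value)
    if unit == "ft":
        return value * METRES_PER_FOOT
    raise ValueError(f"unknown unit: {unit!r}")


altitude_m = to_metres(1000, "ft")  # about 304.8 metres
```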
(c) Locality
This can refer to any place within the country or Basic Recording Unit (see below).
(d) The TDWG standard for countries and regions
This standard, prepared by Hollis and Brummitt (1992), outlines a world scheme for recording plant distributions. In essence, the scheme provides a five-letter alphabetical code to denote a Basic Recording Unit -- or BRU. The BRUs are either equivalent to countries or, in the case of large countries, to regions, states or provinces within the country. The complete list of BRUs is given by Hollis and Brummitt and is also available as a database file (see Appendix).
The BRUs are defined so that they may be grouped together in two different ways: either into politically based units recognised by the International Organization for Standardization (ISO) or into a continental and regional scheme (as denoted by a two-digit code).
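The two ways of grouping BRUs can be sketched with a small lookup table. Everything in the table below is a placeholder: the codes, unit names and regional codes are invented, and the real values must be taken from Hollis and Brummitt (1992) or the accompanying database file. The shape check tests only that a code consists of five letters, as described above; it does not confirm that the code appears in the published list.

```python
def looks_like_bru_code(code):
    """True if the text has the shape of a five-letter alphabetical code."""
    return len(code) == 5 and code.isascii() and code.isalpha()


# Hypothetical lookup: BRU code -> (political unit, two-digit regional code).
BRU_GROUPS = {
    "AAAAA": ("Country A", "01"),
    "AAABB": ("Country A", "01"),
    "CCCCC": ("Country C", "02"),
}


def brus_in_political_unit(unit):
    """List the BRU codes that belong to one politically based unit."""
    return sorted(code for code, (name, _) in BRU_GROUPS.items() if name == unit)


def brus_in_region(region_code):
    """List the BRU codes that share one two-digit regional code."""
    return sorted(code for code, (_, region) in BRU_GROUPS.items() if region == region_code)


print(brus_in_political_unit("Country A"))  # ['AAAAA', 'AAABB']
print(brus_in_region("02"))                 # ['CCCCC']
```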
4.6 Plant distribution: commonness and rarity
In databases concerned with the use and conservation of plants it may be important to indicate the nature of a species' distribution within a region. For instance, some species will be restricted and rare while others will be widespread and common, and this will affect any management that is carried out. Two scales can be mentioned here, although neither is an international standard. Both depend on knowledge about three factors that determine how common or rare a species is, namely:
(1) Geographical range: whether a species is found over a wide area or whether it is narrowly restricted or endemic to a small area.
(2) Habitat range: whether the species is found in a wide range of habitats or restricted to very few.
(3) Local population size: whether a species is found in large populations in some places or whether it is only found in small populations.
Based on these factors, Rabinowitz, Cairns and Dillon (1986) have proposed a 7-point scale (Table 4.4) while Moll (1981) employed a 5-point scale (Table 4.5). Moll's scale has been used in Chapters 6 and 7.
(Tables 4.4 and 4.5 about here)
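One way to see how three two-state factors can give rise to seven kinds of rarity is to enumerate the combinations: there are eight in all, and only one of them describes a species that is common in every respect. The sketch below is an illustration of that arithmetic under the assumption that each factor is recorded as one of two states; the published tables should be consulted for the actual scale definitions.

```python
from itertools import product

# Each factor is reduced to two states; the first state is the "common" one.
FACTORS = {
    "geographical range": ("wide", "narrow"),
    "habitat range": ("wide", "restricted"),
    "local population size": ("large", "small"),
}

combinations = list(product(*FACTORS.values()))

# The single combination that is common on all three factors.
common = tuple(states[0] for states in FACTORS.values())

# Every other combination is rare in at least one respect.
rare_forms = [combo for combo in combinations if combo != common]

print(len(combinations), len(rare_forms))  # 8 7
```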
4.7 Habitat type
No internationally accepted standard on habitat type appears to be available, although the TDWG is seeking a simple system that can be used to categorise the habitat, soil types and landscape in which a plant occurs. However, as mentioned above, it may not be possible to arrive at a universal standard.
It can be noted in passing that the Tree Conservation Information Service of the World Conservation Monitoring Centre (see Appendix for contact address) has compiled a data collection form that includes a section on habitat type, as shown in Table 4.6. These categories allow a broad description of many habitat types to be made, but as they were designed with trees in mind not all habitat types are included.
(Table 4.6 about here)
4.8 Conservation status
The chief set of categories to be aware of under this heading is that published by the International Union for the Conservation of Nature (IUCN) in 1994 (entitled "IUCN Red List Categories"). If any species is considered to be of conservation concern (i.e. threatened) then it may be assigned to a category based on a set of criteria dealing with population size, trends and distribution. The details of the criteria are given in the publication, which should be referred to if you wish to use these IUCN categories. The categories themselves are described briefly in Box 4.1.
------------------
Box 4.1 IUCN Red List Categories
EXTINCT (EX): Species that have disappeared completely.
EXTINCT IN THE WILD (EW): Species (plants) that may exist in cultivation or botanic gardens but have disappeared from their natural range.
CRITICALLY ENDANGERED (CR): Species that face a very high risk of imminent extinction in the wild.
ENDANGERED (EN): Species that are not in the CR category but still face a very high risk of extinction.
VULNERABLE (VU): Species that face a high risk of extinction in the wild in the medium-term future.
LOWER RISK (LR): This category is divided into three:
Conservation Dependent (cd): Species that depend on conservation programmes for their survival in the wild.
Near Threatened (nt): Species that are not in the cd category but which are close to being placed in the VU category.
Least Concern (lc): Species that do not fall into the cd or nt categories.
Two extra categories are recognised, namely:
DATA DEFICIENT (DD): Species for which insufficient information is available for making a Red List assessment.
NOT EVALUATED (NE): Species that have not been assessed using the IUCN criteria.
---------------------(end of Box 4.1)
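In a database, the categories in Box 4.1 lend themselves to a fixed list of permitted values. The sketch below stores them as an enumeration; the stored form "LR/cd" for the Lower Risk subcategories and the names `RedListCategory`, `parse_category` and `is_threatened` are assumptions made for this example. Treating CR, EN and VU as the threatened categories follows IUCN usage, but the 1994 publication remains the authority on how the categories are applied.

```python
from enum import Enum


class RedListCategory(Enum):
    EXTINCT = "EX"
    EXTINCT_IN_THE_WILD = "EW"
    CRITICALLY_ENDANGERED = "CR"
    ENDANGERED = "EN"
    VULNERABLE = "VU"
    # Lower Risk is stored with its subcategory (assumed notation).
    LOWER_RISK_CONSERVATION_DEPENDENT = "LR/cd"
    LOWER_RISK_NEAR_THREATENED = "LR/nt"
    LOWER_RISK_LEAST_CONCERN = "LR/lc"
    DATA_DEFICIENT = "DD"
    NOT_EVALUATED = "NE"


THREATENED = frozenset({
    RedListCategory.CRITICALLY_ENDANGERED,
    RedListCategory.ENDANGERED,
    RedListCategory.VULNERABLE,
})


def parse_category(text):
    """Turn a stored abbreviation into a category; unknown text raises ValueError."""
    return RedListCategory(text.strip())


def is_threatened(category):
    return category in THREATENED


print(is_threatened(parse_category("EN")))  # True
```

Rejecting any abbreviation that is not in the list catches typing errors at data entry rather than at analysis.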
If a species is threatened then it may be useful to specify the nature of the threat. Although no internationally-accepted standard exists, a list of threats is included in the data form of the Tree Conservation Information Service of the WCMC mentioned above. These threats are given in Box 4.2.
---------------------------
Box 4.2 List of threats used by the Tree Conservation Information Service
* Felling
* Grazing
* Exploitation of plant parts
* Fire
* Natural disaster
* Pollution
* Pests & Diseases
* Invasive species
* Lack of dispersal or pollinating agents
* Seed Predation
* Poor regeneration for unknown reasons
* Mining
* Tourism
* Industrial development
* Agriculture
* Forestry
* Expansion of human population
* Decline in soil water content
* Other major threat
In a situation where several threats are present the most serious can be indicated with the number 1.
------------------ (end of Box 4.2)
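The threats in Box 4.2 could be recorded against a species as shown in the following sketch, in which each threat present is stored with an optional rank and the number 1 marks the most serious. The data layout (a dictionary of threat to rank) and the function name are assumptions for illustration and are not taken from the Tree Conservation Information Service form.

```python
from typing import Dict, Optional

# The threat names are those listed in Box 4.2.
THREATS = frozenset({
    "Felling", "Grazing", "Exploitation of plant parts", "Fire",
    "Natural disaster", "Pollution", "Pests & Diseases", "Invasive species",
    "Lack of dispersal or pollinating agents", "Seed Predation",
    "Poor regeneration for unknown reasons", "Mining", "Tourism",
    "Industrial development", "Agriculture", "Forestry",
    "Expansion of human population", "Decline in soil water content",
    "Other major threat",
})


def most_serious_threat(recorded: Dict[str, Optional[int]]) -> Optional[str]:
    """Return the threat marked with the number 1, or None if none is marked."""
    unknown = set(recorded) - THREATS
    if unknown:
        raise ValueError(f"not in the threat list: {sorted(unknown)}")
    top = [threat for threat, rank in recorded.items() if rank == 1]
    if len(top) > 1:
        raise ValueError("only one threat should be marked with the number 1")
    return top[0] if top else None


# Three threats are present; felling is marked as the most serious.
recorded = {"Felling": 1, "Fire": None, "Grazing": None}
print(most_serious_threat(recorded))  # Felling
```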
4.9 Economic use
A publication called the "Economic Botany Data Collection Standard" has been prepared for the TDWG by Cook (1995). This provides a means of describing the uses of plants (or plant parts) by employing standardised descriptors and terms. There are three levels to the standard, ranging from Level 1 (the broadest) to Level 3 (the most specific). The standard offers comprehensive coverage and allows very specific plant uses to be described. Table 4.7 shows the broad headings provided at Level 1 in order to indicate the range of topics included in the standard.
(Table 4.7 about here)
This TDWG standard gives a full list of plant parts, as shown in Table 4.8. Some of these terms are used in Chapter 6.
(Table 4.8 about here)
4.10 Names and addresses
Names and addresses may be required, for example, for those who have given information or for those who have collected and identified plant material. There does not appear to be a suitable international standard, but a checklist of relevant fields to choose from may be useful (see Table 4.9). In a relational database, names and addresses would normally be stored in a separate, or associated, table (see Chapter 5).
(Table 4.9 about here)
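The idea of holding names and addresses in a separate, associated table can be sketched with a small in-memory database. The table names, field names and sample values below are hypothetical and far shorter than a real checklist of fields; the point is only that each person is entered once and linked to plant records by an identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    address   TEXT
);
CREATE TABLE specimen (
    specimen_id  INTEGER PRIMARY KEY,
    genus        TEXT NOT NULL,
    species      TEXT NOT NULL,
    collector_id INTEGER REFERENCES person(person_id)
);
""")

# The collector's details are stored once (placeholder values).
conn.execute("INSERT INTO person VALUES (1, 'A. Collector', 'Example address')")

# Two plant records refer to the same person by identifier.
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?, ?)",
    [(1, "Examplea", "minor", 1), (2, "Examplea", "major", 1)],
)

# Join the tables to list each record with its collector's name.
rows = conn.execute("""
    SELECT s.genus, s.species, p.name
    FROM specimen AS s
    JOIN person AS p ON p.person_id = s.collector_id
    ORDER BY s.specimen_id
""").fetchall()
print(rows)
```

If the collector's address changes, only the single row in the person table needs to be updated.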