| CHAPTER 6
USING A DATABASE: AN ETHNOBOTANICAL EXAMPLE
6.1 The example
In order to illustrate how to set up and use a database it is helpful to take an example. The example chosen is fictitious, but is based on a survey of the trade in medicinal plants in KwaZulu Natal, South Africa (Cunningham, 1988). This chapter is intended as a prelude to the next chapter (Using the Demonstration Diskette) in which the same example is used.
It will be assumed that a pilot project has been carried out in order to clarify the aims and methods of data collection, the main aim in this example being to identify those species of wild medicinal plants which are at risk through over-exploitation. This can be achieved by assessing both supply (plant distribution and approximate abundance) and demand (sales of plant parts at herbalist shops) and also by taking into account how susceptible individual species are to exploitation (as indicated by their life form and the plant parts taken by gatherers).
It will also be assumed that a decision has been made to store the relevant information in a database. Accordingly a conceptual model of the database can be drawn up, as shown in Figure 6.1. As can be seen, this is a simple flatfile design containing eight fields; however a relational design will be introduced later in order to illustrate how two databases can be joined together.
(Figure 6.1 about here)
The fields in Figure 6.1 have been chosen to serve the main aim of the project. Hence the field "Distribution Category" gives an index of supply; the field "Bags Sold Per Year" gives an index of demand; and the fields "Life Form Category" and "Part Used" can be used to assess susceptibility to over-exploitation (for example, a tree species used for its bark will probably be more susceptible than, say, a herb species used for its leaves). The field called "Threat Category" is used to record the conservation status of individual species -- as judged from the four previous fields.
It will now be assumed that suitable hardware and software have been obtained, and that the next stage can be carried out: setting up the database on the computer.
6.2 Setting up the database
The main steps in setting up the database can be shown in a flow diagram (Figure 6.2). The first step is to select the option "New Database" -- or the equivalent depending on the software being use -- and to give a name to the database. There may be restrictions on the number of letters that can be used for a database name (e.g. in the Demonstration Diskette no more than eight letters can be used) and for this example the name "CONSERV" (short for Conservation) will be used. As will become apparent when using the Demonstration Diskette, it is important to distinguish clearly between the names of databases and fields. For this reason it is sensible to make sure that the name of a database is never the same as one of the field names and that the names themselves are written differently. In the examples in this book database names are given in upper case letters and field names are given in both upper and lower case letters (e.g. "Life Form").
(Figure 6.2 about here)
The next step is to name and define each field in turn. In this example the field names have been shortened somewhat for the sake of convenience and are called "Genus", "Species", "Local Name", "Life Form", "Distribution", "Part", "Bags" and "Threat". Each field has to be defined in terms of its length and whether it is alphabetic, numeric, or a combination of the two (this procedure is explained in the next chapter).
At this point the form (i.e. the arrangement of fields and data entry boxes) on the computer screen can be designed. This will depend on the software you obtain: some software will allow you to make a sophisticated design (with colour and other options) while other software will only permit a very basic design. Paper data entry forms can also be designed at this point, and it may be convenient if these match the computer screen design. Of course if you enter the data directly into the computer such forms will not be needed, although as discussed in Chapter 5 they are useful to have as a final backup. An example of a data entry form is shown in Figure 6.3. It should be noted that this example is purely for demonstration purposes and is not based on any particular software design.
(Figure 6.3 about here)
Two of the fields shown in Figure 6.3 are used for recording codes -- that is for the fields "Life Form" and "Distribution". The meaning of these codes is given in Tables 4.5 and 4.7 of Chapter 4. Codes and abbreviations are convenient when the number of options is limited (e.g. for life form categories) or when the meaning is clear (e.g. for distribution categories) but are best avoided for lists of names that are too long to remember. This is particularly important with regard to plant names; these need to be entered in full, as in Figure 6.3, in order to avoid any confusion (Synge and Heywood, 1991).
Of the other fields shown in Figure 6.3, the field "Bags" is for recording the total number of bags of plant parts for particular species sold in herbalist shops in a year; while the field "Threat" is for recording degree of endangerment according to six conservation categories (i.e. Extinct in the Wild, Endangered, Vulnerable and Declining, Declining, Rare and Vulnerable, and Indeterminate) based on Hall et al.(1980).
6.3 Data entry and checking
Figure 6.2 shows the steps to be followed once the framework of the database has been established: the process of entering the data record by record.
Whether typing in the data directly or from completed data entry forms, it is important to be consistent and to follow, for instance, any conventions on the use of upper and lower case letters. Nonetheless, typing errors are bound to occur and so it is essential to get someone else to check print-outs of the database records. This should be done in manageable batches. Any errors can then be corrected using the editing operations of the database package.
There are, however, ways of avoiding typing and consistency errors. For example, in the Demonstration Diskette, separate databases are used to enter data into most of the fields (note that in some cases a separate database can be formed from an imported file, such as an ASCII file; see next chapter). The separate (or associated) databases contain all the information required in the correct form and it is simply a matter of selecting which item to enter. Hence there are separate databases for plant names, life form, distribution, threat and part used which can be used to make data entry convenient, consistent and error-free. In some database packages so-called "List Boxes" perform a similar function. It is also possible to set limits on numerical entries so that figures outside a prescribed range cannot be entered, thus preventing gross errors from being included in the database.
Once a set of records has been entered it is then possible to manipulate the database in various ways. The following chapter will explain in detail how many of these operations are performed (as contained in the Demonstration Diskette), but a general introduction to data manipulation can be given here -- using the database CONSERV as an example.
6.4 Manipulating the database
Some of the more important operations in manipulating a database are described below. When considering these operations it is helpful to remember that a database is essentially a table -- with horizontal rows representing the records, and vertical columns representing the fields. In order to describe the operations, clauses are used to show how commands are sent to the database. These command clauses do not represent any specific database manipulation language but do indicate the general principles involved. The terminology is partly based on that given by Date (1981).
(a) Sorting the records
Any field in the database can be used to sort the database into a specified order (e.g. alphabetically or according to particular categories). But what happens if two fields (rather than one field) need to be sorted, as for genus and species names? This is possible by sorting (alphabetically in this case) first according to genus name and, second, by species name within the sorted genus names. In database language this operation may be described as follows:
SORT <CONSERV> ON Genus
ON Species WITHIN Genus
In this example the sorting would be alphabetical, but in other situations it might be numerical (for instance if species numbers were used).
(b) Selecting particular records and removing unwanted fields
In the example above the sorting is done record by record until they are arranged in the right order. But in other situations the database works by handling whole sets of records that have something in common. Two such operations are SELECT and PROJECT (a third called JOIN will be described later). These operations are frequently used and their meaning needs to made clear.
Suppose, to begin with, that you want to select all the records in the database where bark is the plant part used. The appropriate command clause would then be:
SELECT <CONSERV> WHERE Part=bark
The SELECT operation makes a new table by taking a horizontal subset of the existing table; in this case taking only the records that contain the word "bark" in the field named
"Part". This can be shown in a diagram (Figure 6.4).
(Figure 6.4 about here)
More complicated selections can be made, involving the use of several criteria. In the example above WHERE was used, but a number of other commands can be given: these can be represented by words and abbreviations such as AND, OR, LE (Less than) and GE (Greater than or equal to). For example, a more complex selection could be written as follows:
SELECT <CONSERV> WHERE Part=bark AND Life Form=P2 AND Bags GE 20
Further examples of selections are given in the next two chapters.
When new tables are made in this way it is often useful to remove unwanted or redundant fields. This is where the PROJECT operation comes in (the word PROJECT is in verb form here by the way). In contrast to the SELECT operation, the PROJECT operation makes a vertical subset of an existing table and only keeps the fields that are specified. For example if the result of the last selection example is named TREE, the following command could be given:
PROJECT <TREE> OVER Genus, Species, Local Name, Threat
This would remove all fields except "Genus", "Species", "Local Name" and "Threat" from the table called TREE and simply give the names of the plants that have been selected, together with their threat category. The new table formed can be called VUL_TREE (short for Vulnerable Trees). A diagram of the PROJECT operation is given in Figure 6.5.
(Figure 6.5 about here)
Much of the manipulation of a flatfile database can be achieved by the SELECT and PROJECT commands (or their equivalent operations).
However, when more there is more than one table present, as in a relational database, the JOIN operation is also needed.
(c) Joining tables together
Suppose that the names and addresses of helpers also need to be added to the database CONSERV. The helpers in this instance are the people who have provided the local name of the plants. It might then be convenient to store the names and addresses in a separate table (called ADDRESS), linked to the main table by a number. This design is illustrated in Figure 6.6. By storing the information in this way the names and addresses need only be recorded once; this results in a more compact and efficient database compared to the one that would have been formed if the names and addresses had simply been added to the main table.
(Figure 6.6 about here)
Given this database design it would then be possible, for example, to add the names and addresses of helpers to the table called VUL_TREE described above. In order to do this it would of course be necessary to include the field "Helper Number" in VUL_TREE. The following command would then join the two tables together:
JOIN <VUL_TREE> <ADDRESS> OVER Helper Number
Figure 6.7 gives a diagram of this joining operation, showing the tables joined by a common column, in this example the field "Helper Number".
(Figure 6.7 about here)
By using the operations SELECT, PROJECT and JOIN, numerous permutations are possible. When new tables are formed in this way they are stored as temporary files in the computer; if they need to be kept they should be copied and named otherwise they will be lost once another selection takes place. This arrangement helps prevent huge numbers of unwanted files from piling up in the computer. The details of these operations are described in the next chapter.
6.5 Outputs from the database
Some of the files produced as a result of data manipulation may be needed as outputs. The main forms of output are:
(a) to a printer;
(b) directly to another computer or package (if compatible);
(c) indirectly to another computer or package via another type of file (e.g. an ASCII file);
(d) to another computer via the telephone line (e.g. by means of the Internet).
These outputs have already been mentioned in Chapter 1, while operations (a) and (c) are described in more detail in the next chapter.
|