Step 1: Manual validation consisted of reviewing each selected page and verifying whether there is an association between the names reported in the columns “kws” and “fam” of the automatic validation file, or whether the page contains other associations not identified by the search algorithm. To do so, we generated a second file (manual validation) with the item and page codes, the butterfly and plant names, and the plant family.
Step 2: Each butterfly-host plant record was associated with three (3) additional columns with information about the origin of the data (“primary” when the data come from direct field observation, and “secondary” when they come from reviewed literature or indirect sources), the record source (“literature” when the record comes from published literature, and “field observation” when it comes from a direct observation in the field), and the evidence reported for the association (“larvae feed on”, “eggs laid on”, etc.).
Column name | Description | Example of content |
item.id | URL code that identifies the original monograph or journal | |
page.id | URL code that identifies the page | |
butterfly | Butterfly species scientific name | Ascia monuste eubotea |
plant | Host plant species scientific name | Cleome spinosa |
plant.family | Name of the host plant family | Capparaceae |
record type | Origin of the data | Primary :: secondary |
source | Source of the record | Field observation :: literature |
evidence | Description of the observation that supports the association | Ovipositing on the leaves :: Larval feeding |
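As an illustration only, one row of the manual validation file could be represented as follows. This is a minimal sketch in Python, not the procedure actually used: the class name `ManualRecord`, the helper `to_csv`, the underscore field names, the lower-casing of category values, and the placeholder item and page codes are all assumptions introduced for the example. The allowed values for record type and source are the ones listed in the table.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

# Allowed values taken from the table; storing them in lower case is an assumption.
RECORD_TYPES = {"primary", "secondary"}
SOURCES = {"field observation", "literature"}


@dataclass
class ManualRecord:
    """One butterfly-host plant association (hypothetical helper class)."""
    item_id: str
    page_id: str
    butterfly: str
    plant: str
    plant_family: str
    record_type: str
    source: str
    evidence: str

    def __post_init__(self):
        # Normalize case so that "Primary" and "primary" are treated alike.
        self.record_type = self.record_type.strip().lower()
        self.source = self.source.strip().lower()
        if self.record_type not in RECORD_TYPES:
            raise ValueError(f"unknown record type: {self.record_type!r}")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")


def to_csv(records):
    """Serialize records to CSV text, one association per row."""
    column_names = [f.name for f in fields(ManualRecord)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=column_names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


example = ManualRecord(
    item_id="ITEM-0001",  # placeholder, not a real item code
    page_id="PAGE-0001",  # placeholder, not a real page code
    butterfly="Ascia monuste eubotea",
    plant="Cleome spinosa",
    plant_family="Capparaceae",
    record_type="Primary",
    source="Field observation",
    evidence="Ovipositing on the leaves",
)
print(to_csv([example]))
```

Rejecting unknown category values when a row is created would catch transcription slips in the “record type” and “source” columns before they reach the file. The free-text “evidence” column is deliberately left unconstrained.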
Each pair of butterfly and host plant names was copied under the corresponding column (“butterfly”, “plant”), in a separate row with the corresponding item and page codes. All common and scientific names at any taxonomic level (species, genus, family or order) were recorded. Common and generic plant names were copied as shown in the original text, without translation and without substituting them with their corresponding scientific names. In contrast, the transcriber corrected any typographical errors in scientific names introduced by the algorithm.
Ongoing work: We ran our first search for 12 butterfly genera from all five families. We found 469 pages from 256 items. The majority of these pages were classified as “text” (276); these potentially contain information about associations and were subsequently validated manually. The remaining pages were not analyzed because they contained announcements (84), indexes (85), or bibliographic references (24).
The scientific name search algorithm identified 11,021 association records, but after manual validation only 254 associations were recorded, corresponding to 165 butterfly species and 168 host plant names (including both scientific and common names).
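The counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in the text. The variable names, and the label “retained” for the ratio of validated associations to algorithm records, are our own shorthand for this example rather than metrics defined in the study.

```python
# Page counts per class, as reported in the text.
page_classes = {
    "text": 276,
    "announcements": 84,
    "indexes": 85,
    "bibliographic references": 24,
}

# The four classes should account for every page found.
total_pages = sum(page_classes.values())
assert total_pages == 469

algorithm_records = 11_021      # records flagged by the search algorithm
validated_associations = 254    # associations kept after manual validation

# Share of pages that went on to manual validation.
text_share = page_classes["text"] / total_pages
# Share of algorithm records that survived manual validation.
retained = validated_associations / algorithm_records

print(f"pages manually validated: {text_share:.1%}")
print(f"algorithm records retained: {retained:.1%}")
```

Under these figures, a little under three fifths of the pages were validated manually, and only about 2% of the records flagged by the algorithm were confirmed as associations.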