BHL data validation

Step 1: Manual validation consisted of reviewing each selected page and verifying whether there was an association between the names reported in the columns “kws” and “fam” of the automatic validation file, or whether the page contained other associations not identified by the search algorithm. To do so, a second file (manual validation) was generated with the item and page codes, the butterfly and plant names, and the plant family.


Step 2: Each butterfly-host plant record was given three (3) additional columns: the origin of the data (“primary” when the data come from direct field observation, and “secondary” when they come from reviewed literature or indirect sources), the record source (“literature” when the record comes from published literature, and “field observation” when it comes from a direct observation in the field), and the evidence reported for the association (“larvae feed on”, “eggs laid on”, etc.).


| Column name | Description | Example of content |
| | URL code that identifies the original monograph or journal | |
| | URL code that identifies the page | |
| butterfly | Butterfly species scientific name | Ascia monuste eubotea |
| plant | Host plant species scientific name | Cleome spinosa |
| plant. family | Name of the host plant family | |
| record type | Origin of the data | Primary :: secondary |
| | Source of the record | Field observation :: literature |
| | Description of the observation that supports the association | Ovipositing on the leaves :: Larval feeding |
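The controlled values listed above can be checked mechanically before a row is accepted. The following is a minimal sketch in Python, not the procedure used in this project. The field names `item`, `page`, `record_source` and `evidence` are hypothetical (the text names only “butterfly”, “plant”, “plant. family” and “record type”), and case-insensitive matching of the allowed values is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Allowed values taken from the column descriptions above.
# Comparing them case-insensitively is an assumption of this sketch.
RECORD_TYPES = {"primary", "secondary"}
RECORD_SOURCES = {"field observation", "literature"}


@dataclass
class AssociationRecord:
    item: str           # hypothetical name: code of the monograph or journal
    page: str           # hypothetical name: code of the page
    butterfly: str
    plant: str
    plant_family: str
    record_type: str    # "primary" or "secondary"
    record_source: str  # hypothetical name: "field observation" or "literature"
    evidence: str       # hypothetical name: free text, e.g. "Larval feeding"


def validate(record: AssociationRecord) -> List[str]:
    """Return the problems found in one row; an empty list means it passes."""
    problems = []
    if record.record_type.strip().lower() not in RECORD_TYPES:
        problems.append(f"unknown record type: {record.record_type!r}")
    if record.record_source.strip().lower() not in RECORD_SOURCES:
        problems.append(f"unknown record source: {record.record_source!r}")
    if not record.butterfly.strip() or not record.plant.strip():
        problems.append("butterfly and plant names are both required")
    return problems
```

The evidence column is left unchecked here because the examples suggest it holds free text rather than a fixed vocabulary.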


Each pair of butterfly and host plant names was copied under the corresponding columns (“butterfly”, “plant”), in a separate row with the corresponding item and page codes. All common and scientific names at any taxonomic level (species, genus, family or order) were recorded. Common and generic plant names were copied as shown in the original text, without translating them or substituting the corresponding scientific name. In contrast, the transcriber corrected any typographical errors in scientific names made by the algorithm.
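The one-row-per-pair layout described above can be sketched as follows. This is an illustration only: the function names `rows_for_page` and `to_csv`, the column names `item` and `page`, and the placeholder codes in the usage lines are assumptions, not part of the project files. Plant names are passed through unchanged, mirroring the rule that they are copied as shown in the original text.

```python
import csv
import io

# "item" and "page" are hypothetical column names; "butterfly" and "plant"
# are the names given in the text.
FIELDNAMES = ["item", "page", "butterfly", "plant"]


def rows_for_page(item, page, pairs):
    """Build one row per (butterfly, plant) pair found on a single page."""
    return [
        {"item": item, "page": page, "butterfly": butterfly, "plant": plant}
        for butterfly, plant in pairs
    ]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Usage with placeholder codes (not real BHL identifiers).
rows = rows_for_page(
    "ITEM_CODE", "PAGE_CODE", [("Ascia monuste eubotea", "Cleome spinosa")]
)
print(to_csv(rows))
```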


Ongoing work: We ran our first search for 12 butterfly genera from all five families. We found 469 pages from 256 items. The majority of these pages (276) were classified as “text”; these potentially contain information about associations and were subsequently validated manually. The remaining pages were not analyzed because they contained announcements (84), indexes (85), or bibliographic references (24).

The scientific name search algorithm identified 11,021 association records, but after manual validation only 254 associations were recorded, corresponding to 165 butterfly species and 168 host plant names (including both scientific and common names).
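As a quick consistency check on the figures reported above, the page categories can be summed and the share of records kept after manual validation computed. The variable names are illustrative; the numbers are those given in the text.

```python
# Page counts by category, as reported in the text.
pages_by_category = {
    "text": 276,
    "announcements": 84,
    "indexes": 85,
    "bibliographic references": 24,
}
total_pages = sum(pages_by_category.values())

automatic_records = 11_021      # records found by the search algorithm
validated_associations = 254    # associations kept after manual validation

text_share = pages_by_category["text"] / total_pages
retained_share = validated_associations / automatic_records

print(f"total pages: {total_pages}")
print(f"pages classified as text: {text_share:.1%}")
print(f"records retained after manual validation: {retained_share:.1%}")
```

The categories add up to the 469 pages reported, and roughly 2.3% of the automatically identified records survived manual validation.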
