Revision of BHL data validation from Thu, 2013-08-15 15:39

Step 1: Manual validation consisted of reviewing each selected page and verifying whether there is an association between the names reported in the “kws” and “fam” columns of the automatic validation file, or whether the page contains other associations not identified by the search algorithm. To do so, we generated a second file (manual validation) containing the item and page codes, the butterfly and plant names, and the plant family.

 

Step 2: Each butterfly-host plant record was given three (3) additional columns: the origin of the data (“primary” when the data come from direct field observation, “secondary” when they come from reviewed literature or indirect sources), the record source (“literature” when the record comes from published literature, “field observation” when it comes from a direct observation in the field), and the evidence reported for the association (“larvae feed on”, “eggs laid on”, etc.).

 

Column name | Description | Example of content
item.id | URL that identifies the original monograph or journal | http://www.biodiversitylibrary.org/item/699
page.id | URL that identifies the page | http://www.biodiversitylibrary.org/page/13297
butterfly | Scientific name of the butterfly species | Ascia monuste eubotea
plant | Scientific name of the host plant species | Cleome spinosa
plant.family | Name of the host plant family | Capparaceae
record type | Origin of the data | Primary :: secondary
source | Source of the record | Field observation :: literature
evidence | Description of the observation that supports the association | Ovipositing on the leaves :: Larval feeding
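The column layout above can be sketched as a small record type. This is only a minimal illustration, not the project's actual tooling: the class name `AssociationRecord`, the helper `to_csv`, and the underscore field names (Python identifiers cannot contain dots or spaces) are assumptions, and the example row combines the table's sample values in an arbitrary way.

```python
import csv
import io
from dataclasses import dataclass, asdict

# Controlled vocabularies taken from the "record type" and "source" columns.
RECORD_TYPES = {"primary", "secondary"}
SOURCES = {"field observation", "literature"}


@dataclass
class AssociationRecord:
    """One butterfly-host plant pair, one row of the manual validation file."""
    item_id: str
    page_id: str
    butterfly: str
    plant: str
    plant_family: str
    record_type: str
    source: str
    evidence: str

    def __post_init__(self):
        # Reject values outside the two controlled vocabularies.
        if self.record_type.lower() not in RECORD_TYPES:
            raise ValueError(f"unknown record type: {self.record_type!r}")
        if self.source.lower() not in SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")


def to_csv(records):
    """Serialize records to CSV text, one row per butterfly-plant pair."""
    buffer = io.StringIO()
    fieldnames = list(AssociationRecord.__dataclass_fields__)
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Illustrative row: the values come from the example column of the table,
# but this particular combination is hypothetical.
example = AssociationRecord(
    item_id="http://www.biodiversitylibrary.org/item/699",
    page_id="http://www.biodiversitylibrary.org/page/13297",
    butterfly="Ascia monuste eubotea",
    plant="Cleome spinosa",
    plant_family="Capparaceae",
    record_type="secondary",
    source="literature",
    evidence="Larval feeding",
)

print(to_csv([example]))
```

Checking the two vocabulary columns at construction time is one way to catch transcription slips early; the free-text "evidence" column is deliberately left unconstrained.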

 

Each pair of butterfly and host plant names was copied under the corresponding columns (“butterfly”, “plant”), in a separate row with the corresponding item and page codes. All common and scientific names at any taxonomic level (species, genus, family or order) were recorded. Common and generic plant names were copied as shown in the original text, without translating them or replacing them with the corresponding scientific names. In contrast, the transcriber corrected any typographical errors in scientific names made by the algorithm.

 

Ongoing work: Our first search covered 12 butterfly genera from all five families. We found 469 pages from 256 items. The majority of these pages were classified as “text” (276); these potentially contain information about associations and were subsequently validated manually. The remaining pages were not analyzed because they contained announcements (84), indexes (85), or bibliographic references (24).
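The page counts reported above can be cross-checked with a short tally. This is only a sketch using the figures from the paragraph; the dictionary keys are shorthand labels chosen here, not names from the project's files.

```python
# Page counts per category, as reported in the text.
page_counts = {
    "text": 276,
    "announcements": 84,
    "indexes": 85,
    "bibliographic references": 24,
}

total_pages = sum(page_counts.values())

# Share of each category among all retrieved pages.
for category, count in page_counts.items():
    share = 100 * count / total_pages
    print(f"{category}: {count} pages ({share:.1f}%)")

print(f"total: {total_pages} pages")
```

The four categories add up to the 469 pages found, so every retrieved page falls into exactly one class.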

 

The scientific name search algorithm identified 11,021 association records, but after manual validation only 254 associations were recorded, corresponding to 165 butterfly species and 168 host plant names (including both scientific and common names).
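To put these two figures in proportion, the fraction of automatic records that survived manual validation can be computed directly. The variable names are illustrative only.

```python
# Figures reported in the text.
automatic_records = 11021       # associations flagged by the search algorithm
validated_associations = 254    # associations kept after manual validation

# Fraction of automatic records confirmed by manual review.
retention = validated_associations / automatic_records
print(f"retained after manual validation: {retention:.1%}")
```

Roughly one automatic record in forty was confirmed, which shows how much of the algorithm's output is name co-occurrence rather than a documented host plant association.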
