Revision of BHL data search from Sun, 2013-06-30 15:08

The Biodiversity Heritage Library (BHL) is a consortium of natural history and botanical libraries that cooperate to digitize and make accessible the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.” Read more about BHL here.

BHL provides metadata at four levels: titles (records for original monographs or journals), items (volume-level records), parts (records for articles, chapters, treatments, etc.), and individual pages. We focus our search on items and pages.

Step 1: Automatic search

For each butterfly species in the checklist, we ran a preliminary automatic search in BHL for pages on which the species is mentioned. We used the BHL API function NameGetDetail, which matches a given name string to a NameBank ID and returns basic title, item and page metadata for each page on which that name appears. Please note that users must obtain an API key in order to use the BHL API.

We summarized the results of the API call for each species and built a list of page IDs. We then used the BHL API function GetPageOcrText to download the OCR text of each of these pages.
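
As an illustration (not the original routine), the two calls in this step can be sketched in Python as follows. The endpoint URL, the parameter names (`op`, `apikey`, `format`, `name`, `pageid`) and the Titles -> Items -> Pages nesting of the response are assumptions based on version 2 of the BHL API and should be checked against the current API documentation. The helper names and the sample response are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Assumed base URL of the BHL API v2 query endpoint; verify before use.
BHL_ENDPOINT = "https://www.biodiversitylibrary.org/api2/httpquery.ashx"


def build_query(op, api_key, **params):
    """Return the request URL for one BHL API operation (hypothetical helper)."""
    query = {"op": op, "apikey": api_key, "format": "json"}
    query.update(params)
    return BHL_ENDPOINT + "?" + urlencode(query)


def collect_page_ids(result):
    """Walk the assumed Titles -> Items -> Pages nesting; return unique page IDs in order."""
    page_ids = []
    seen = set()
    for title in result.get("Titles") or []:
        for item in title.get("Items") or []:
            for page in item.get("Pages") or []:
                page_id = page.get("PageID")
                if page_id is not None and page_id not in seen:
                    seen.add(page_id)
                    page_ids.append(page_id)
    return page_ids


# Hand-written response fragment for illustration only (not real BHL output).
sample_result = {
    "Titles": [
        {"TitleID": 1, "Items": [
            {"ItemID": 10, "Pages": [{"PageID": 100}, {"PageID": 101}]},
            {"ItemID": 11, "Pages": [{"PageID": 101}, {"PageID": 102}]},
        ]},
    ]
}

# URL for the name lookup, then one OCR-text URL per page ID found.
name_url = build_query("NameGetDetail", "YOUR_API_KEY", name="Philotiella leona")
page_ids = collect_page_ids(sample_result)
ocr_urls = [build_query("GetPageOcrText", "YOUR_API_KEY", pageid=pid) for pid in page_ids]
```

Removing duplicate page IDs before requesting the OCR text avoids downloading the same page twice when a name is reported for it more than once.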

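Step 2: Keyword matching

We then searched the OCR text of each page for the terms in our keyword list and selected the pages that matched, since these are the pages likely to contain information about biotic associations.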
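
A minimal sketch of this selection step is given below. It assumes case-insensitive, whole-word matching and uses only the four example keywords shown in the table further down. The full keyword list and the exact matching rules are not given here, so both are assumptions, and the function names and sample pages are hypothetical.

```python
import re

# Example keywords only; the full keyword list is not reproduced here.
KEYWORDS = ["host", "host plant", "feed on", "larvae"]


def match_keywords(text, keywords=KEYWORDS):
    """Return the keywords found in text as whole words, ignoring case."""
    # Collapse whitespace so phrases split across OCR lines still match.
    normalized = re.sub(r"\s+", " ", text).lower()
    found = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        if re.search(pattern, normalized):
            found.append(kw)
    return found


def select_pages(ocr_by_page):
    """Keep pages with at least one match; map page ID to the matched keywords."""
    selected = {}
    for page_id, text in ocr_by_page.items():
        hits = match_keywords(text)
        if hits:
            selected[page_id] = " :: ".join(hits)
    return selected


# Invented OCR snippets for illustration.
sample_pages = {
    100: "The larvae feed on\nLupinus; the host plant is common.",
    101: "Wings blue above, with a dark border.",
}
selected = select_pages(sample_pages)
```

Joining the matched keywords with " :: " mirrors the format of the kws column in the output file.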

Step 3: Automatic download

Next, we implemented a routine to automatically download the selected pages in PDF format, and built a file with the following information:

Column name | Description                        | Example of content
arch        | Archive name                       | O reilly1813CotSoNH.pdf
code        | Bibliography code in our database  | Kaye1913
nps         | Page number                        | 230
kws         | Keywords matched                   | host :: host plant :: feed on :: larvae
val         | Scientific name from the checklist | Philotiella leona
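
A file with these five columns could be written as in the sketch below. The CSV format and the function name are assumptions, since the actual file format is not stated, and the PDF download itself is not shown. The single row reuses the example values from the table.

```python
import csv
import io

# Column names as listed in the table above.
FIELDS = ["arch", "code", "nps", "kws", "val"]


def write_index(rows, handle):
    """Write one line per selected page to an open text handle (hypothetical helper)."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


rows = [
    {
        "arch": "O reilly1813CotSoNH.pdf",
        "code": "Kaye1913",
        "nps": 230,
        "kws": "host :: host plant :: feed on :: larvae",
        "val": "Philotiella leona",
    }
]

# An in-memory buffer stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_index(rows, buffer)
index_text = buffer.getvalue()
```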

