Our butterfly and moth data takes flight! | Digital Collections Programme

Our previous blog post looked at preparing the Lepidoptera for digitisation. In this post, we look at the second part of the digitisation process: the imaging and transcription that allow data to be set free and accessed by the global science audience on the Museum’s Data Portal.

Photo showing a DSLR camera on a mount, with a tray containing a pinned butterfly specimen beneath the lens. The butterfly and accompanying scale bar and labels are visible on a computer screen to the right.
The imaging equipment set up to digitise the Lepidoptera collection

Let’s find out what’s involved and why it’s leading to new ways of accessing and using the information in our collections.


To image the Lepidoptera, specimens are placed under a DSLR camera, inside a light box that has been specifically designed for this purpose. The camera is connected to a computer monitor. We adjust the height and focus of the camera to produce the clearest, most accurate image on the computer screen before taking the photograph.

When the specimen has been imaged, the reverse of the preparation process needs to happen. The labels from the raised side of the unit tray are put back on the pin, observing the chronological order. These labels now include the new unique specimen number label with data matrix at the bottom.

Photo showing a pinned moth close up with its QR code digital label in view underneath. The pin is being held in pincers.
A specimen from the Lepidoptera collection of the bordered straw moth (Heliothis peltigera) with labels

As we digitise the collection, we also take the opportunity to re-house the Lepidoptera in new plastazote-lined drawers. Plastazote is a neutral, archival material that is preferable for specimens to the cork lining of the old drawers. When a new drawer is filled, it is returned to the collection.

Photo showing a near full drawer from above of several columns of specimens of the same species of butterfly with text and digital labels throughout
New drawer of Heliothis peltigera specimens following digitisation.

For species that have only a few specimens, we can put more than one species into a drawer. Species with many specimens can fill more than one drawer; in that case, we leave the remainder of the last drawer empty to allow for further specimens coming into the Museum in the future.

Drawer photographed from above, segmented by boxes of differing sizes, each containing one or a few representatives of a Lepidoptera species, with labels
When a species has few representatives in the collection, more than one species is added to the drawer

Transcription – digitising the specimens!

We have prepared the specimens, added unique specimen numbers, imaged them, re-housed them and returned them to the collection. However, the Lepidoptera are not yet digitised. We have an image that contains the majority of the data for the specimen. We also know the taxonomy and the location of the specimen within the collection from the drawer it was originally placed within.

Using the drawer number and the taxonomy, we organise these images into folders. These images are ingested into a bespoke database for the project which allows categorisation and transcription to occur. During ingestion, the computer script re-labels all images with the unique specimen number by reading the data matrix from the image.

Screen shots of windows on a PC showing how the Lepidoptera images are arranged within the folder hierarchy.
The hierarchy of our Lepidoptera folders on the PC following imaging

After re-labelling, the script sends the images to the transcription interface of the database. These records are then allocated back to digitisers to enable transcription, the final stage of digitisation.
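The re-labelling step during ingestion can be pictured with a minimal sketch. This is not the Museum's actual script: the function name `relabel_images` is hypothetical, and the `decode` argument stands in for whatever data-matrix reader the real pipeline uses (it is assumed to return the specimen number as text, or `None` when the code cannot be read).

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def relabel_images(
    folder: Path, decode: Callable[[Path], Optional[str]]
) -> Dict[str, str]:
    """Rename each .jpg in `folder` to '<specimen number>.jpg'.

    `decode` is a stand-in (an assumption) for a data-matrix reader.
    Returns a mapping of old file name -> new file name.
    """
    renamed: Dict[str, str] = {}
    # sorted() takes a snapshot of the folder before any file is renamed
    for image in sorted(folder.glob("*.jpg")):
        specimen_number = decode(image)
        if not specimen_number:
            continue  # unreadable code: leave the file for a person to check
        target = image.with_name(f"{specimen_number}{image.suffix}")
        if target.exists():
            continue  # never overwrite an image that already has this name
        image.rename(target)
        renamed[image.name] = target.name
    return renamed
```

Keeping the decoder as a parameter means the renaming logic can be checked with a stand-in function, without needing any camera images.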

Screenshot of the transcription interface window with various text entry fields and drop down boxes, and the digital image of the specimen to the right.
The interface for transcription showing the fields of data that can be completed for each image

The transcription interface contains fields into which we type additional data from the labels. Because of the folders created in the previous step, the location, taxonomy and specimen number are already in the database.
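To illustrate how a folder hierarchy can pre-fill these fields, here is a small sketch. The layout `drawer/genus/species/specimen-number.jpg`, the function name `fields_from_path` and the example identifiers are all assumptions made for illustration; the project's real folder structure may differ.

```python
from pathlib import PurePosixPath
from typing import Dict


def fields_from_path(image_path: str) -> Dict[str, str]:
    """Derive location, taxonomy and specimen number from an image path.

    Assumes the hypothetical layout: <drawer>/<genus>/<species>/<number>.jpg
    """
    path = PurePosixPath(image_path)
    if len(path.parts) < 4:
        raise ValueError(f"expected drawer/genus/species/file, got: {image_path}")
    # The three folders directly above the file carry the metadata
    drawer, genus, species = path.parts[-4:-1]
    return {
        "location": drawer,
        "taxonomy": f"{genus} {species}",
        "specimen_number": path.stem,  # file name without the extension
    }


# Hypothetical drawer and specimen identifiers
record = fields_from_path("Drawer_0001/Heliothis/peltigera/SPEC0001.jpg")
```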

The data that need to be added manually are: the collection site; the collector; the date of the collection event or, if the specimen was bred, the date of breeding; when the specimen came into the Museum; the number for the specimen; and, where applicable, the type status and whether any slides or preparations have been made from it.
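One way to picture a single record is as a small data structure in which the pre-filled fields are required and the manually transcribed fields are optional, since a label may not carry every item. The class and field names below are hypothetical and are not taken from the project's database.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class TranscriptionRecord:
    # Pre-filled during ingestion
    specimen_number: str
    taxonomy: str
    location: str
    # Typed in by a digitiser; None means "not on the label"
    collection_site: Optional[str] = None
    collector: Optional[str] = None
    collection_or_breeding_date: Optional[str] = None
    date_entered_museum: Optional[str] = None
    number_for_specimen: Optional[str] = None
    type_status: Optional[str] = None
    has_slides_or_preparations: Optional[bool] = None

    def untranscribed(self) -> List[str]:
        """Names of the fields that are still empty."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical identifiers, used only to show the shape of a record
record = TranscriptionRecord(
    specimen_number="SPEC0001",
    taxonomy="Heliothis peltigera",
    location="Drawer_0001",
    collector="A. Collector",
)
```

Calling `record.untranscribed()` lists what remains empty, which is a simple way to see how complete a record is.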

Not every specimen will have all of this data, but we fill in what we have. Unlocking this data and including it on the Museum’s Data Portal gives scientists access to large datasets with unrivalled historical and geographical data. This enables scientists to conduct phenological research on natural history data in a way that has not been possible before.

The future

Promoting access to specimens digitally helps to preserve and protect the specimens for the future. Providing access to this unrivalled historical and geographical data also provides new ways to conduct science.

For example, Museum scientists Steve Brooks et al. have been comparing Lepidoptera data to historical temperature records. They have found that 92% of the 51 species emerged earlier in years with higher spring temperatures.

‘The warming climate is already causing butterflies to emerge earlier – and unless their food plants adapt at the same rate, the insects could emerge too early to survive.’ (S.Brooks et al., 2016)
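As a rough illustration of this kind of comparison (not the method used in the study), the sketch below fits a straight line to collection day-of-year against spring temperature. It assumes the transcribed collection date serves as a proxy for emergence, and the numbers are invented purely for illustration; they are not Museum data.

```python
from typing import Sequence


def least_squares_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line of ys against xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented values for illustration only (NOT Museum data)
spring_temp_c = [8.0, 9.0, 10.0, 11.0]
collection_day_of_year = [150, 146, 142, 138]

# A negative slope means earlier collection dates in warmer springs
days_per_degree = least_squares_slope(spring_temp_c, collection_day_of_year)
```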

The Museum’s collection has unique records spanning more than a century, so by releasing this data we can equip more scientists to conduct new research in new ways.

What isn’t transcribed?

When digitising the Lepidoptera, we sometimes come across information on a specimen that doesn’t correspond to transcription fields. On those occasions we have a ‘more info on label’ tick box that can be checked for the database.

However, these stories provide excellent narrative and insight into the specimens’ past and their collectors. We are sharing some of our favourites on Twitter using #MothMonday.

We would love to hear your Lepidoptera stories or what you want to hear about, so please tag @NHM_Digitise with your photos or comments. You can also watch the Museum’s film below to see a summary of the whole process and how Steve Brooks made use of the data for the study quoted above:

4 Replies to “Our butterfly and moth data takes flight! | Digital Collections Programme”

  1. Great job Jennifer, and very good explanation. I wish you good luck in your endeavour.
    You do not mention colour space management for the images. I take it your images are referenced to an RGB colour space such as ‘sRGB’ or ‘Adobe RGB 98’ (https://en.wikipedia.org/wiki/RGB_color_space)?
    Mike Hardman, Cyprus

    1. Hi Mike,

      Thanks for your comment, I’m glad you enjoyed the post. We use an RGB colour space in Adobe and we use the Pantone of the grey plastazote as a reference point. Hope that helps!


      1. Hi Jennifer // Thanks. That puts my mind at ease. That sort of thing needs to be in the system design at the start, as fixing it later can be a pain. //Mike
