Our previous blog post looked at preparing the Lepidoptera for digitisation. In this post, we look at the second part of the digitisation process: the imaging and transcription that set the data free so the global scientific community can access it on the Museum’s Data Portal.

Let’s find out what’s involved and why it’s leading to new ways of accessing and using the information in our collections.
Imaging
To image the Lepidoptera, we place specimens under a DSLR camera, inside a light box designed specifically for this purpose. The camera is connected to a computer, and we adjust its height and focus until the image on the monitor is as clear and accurate as possible before taking the photograph.
Once a specimen has been imaged, the preparation process is reversed. The labels are taken from the raised side of the unit tray and put back on the pin, observing their chronological order. The labels now include the new label carrying the unique specimen number, with its data matrix, at the bottom.

As we digitise the collection, we also take the opportunity to re-house the Lepidoptera in new plastazote-lined drawers. Plastazote is a neutral, archival material that is better for the specimens than the cork lining of the old drawers. Once a new drawer is full, it is returned to the collection.

Species with only a few specimens can share a drawer, while species with many specimens can fill more than one. For species with a lot of specimens, we leave the rest of their last drawer empty, so there is room for specimens that come into the Museum in the future.
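The drawer layout rule above can be sketched as a small allocation routine. This is a hypothetical illustration, not the Museum’s procedure: the post gives no drawer capacity or cut-off for a “large” species, so `capacity` and `large_threshold` are assumed parameters, and capacity is simplified to a specimen count even though real drawer space depends on specimen size.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Drawer:
    species: list[str] = field(default_factory=list)  # species names, in taxonomic order
    filled: int = 0                                   # specimens placed in this drawer
    dedicated: bool = False                           # True if reserved for one large species


def allocate_drawers(counts: dict[str, int], capacity: int, large_threshold: int) -> list[Drawer]:
    """Lay out species (given in taxonomic order) across new drawers.

    Hypothetical rule: species with fewer than `large_threshold` specimens
    share drawers; larger species get their own drawer(s), and any space left
    in the last of those drawers stays empty for future accessions.
    """
    if large_threshold > capacity:
        raise ValueError("large_threshold must not exceed drawer capacity")
    drawers: list[Drawer] = []
    shared: Optional[Drawer] = None  # the shared drawer currently being filled
    for name, n in counts.items():  # dicts preserve insertion (taxonomic) order
        if n >= large_threshold:
            # Dedicated drawers: ceil(n / capacity) of them; the last one keeps spare room.
            for i in range(math.ceil(n / capacity)):
                drawers.append(Drawer([name], min(capacity, n - i * capacity), dedicated=True))
            shared = None  # close the shared drawer so taxonomic order is kept
        else:
            # Small species go into the open shared drawer, or a fresh one if it is full.
            if shared is None or shared.filled + n > capacity:
                shared = Drawer()
                drawers.append(shared)
            shared.species.append(name)
            shared.filled += n
    return drawers


layout = allocate_drawers({"species A": 10, "species B": 15, "species C": 250, "species D": 5},
                          capacity=100, large_threshold=60)
```

With these made-up numbers, species A and B share the first drawer, species C takes three dedicated drawers (the third only half full), and species D starts a new shared drawer. Closing the shared drawer after a large species is a design choice that keeps drawers in taxonomic sequence.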

Transcription – digitising the specimens!
We have prepared the specimens, added unique specimen numbers, imaged them, re-housed them and returned them to the collection. However, the Lepidoptera are not yet digitised. What we have is an image that captures most of the data for each specimen. We also know each specimen’s taxonomy and its location in the collection from the drawer it was originally housed in.
Using the drawer number and the taxonomy, we organise the images into folders. The images are then ingested into a bespoke database built for the project, which supports categorisation and transcription. During ingestion, a script reads the data matrix in each image and re-labels the image with its unique specimen number.
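The re-labelling step can be sketched as follows. This is a hypothetical illustration rather than the project’s script: the folder layout `root/<drawer>/<taxon>/*.jpg`, the function names and the specimen number format are all assumptions. The data matrix decoder is passed in as `read_data_matrix`, because the post does not say which tool is used; in practice a barcode library such as pylibdmtx could fill that role.

```python
from pathlib import Path
from typing import Callable


def relabel_images(root: Path, read_data_matrix: Callable[[Path], str]) -> dict[str, dict[str, str]]:
    """Rename every image under root/<drawer>/<taxon>/ to its specimen number.

    Assumed layout and hypothetical names. Returns a mapping from specimen
    number to the drawer and taxon inferred from the image's folders.
    """
    records: dict[str, dict[str, str]] = {}
    # sorted() materialises the file list first, so renaming inside the loop is safe.
    for image in sorted(root.glob("*/*/*.jpg")):
        taxon_dir = image.parent
        drawer_dir = taxon_dir.parent
        specimen_number = read_data_matrix(image)  # decoded from the data matrix in the photo
        if specimen_number in records:
            raise ValueError(f"duplicate specimen number {specimen_number} at {image}")
        image.rename(image.with_name(f"{specimen_number}{image.suffix}"))
        records[specimen_number] = {"drawer": drawer_dir.name, "taxon": taxon_dir.name}
    return records
```

Because drawer and taxon come from the folder names, the same pass that renames each file also yields the location and taxonomy that will later pre-fill its transcription record.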

After re-labelling, the script sends the images to the transcription interface of the database. The resulting records are then allocated back to digitisers for transcription, the final stage of digitisation.

The transcription interface contains fields into which we type the additional data from the labels. Thanks to the folders created in the previous step and the data matrix read during ingestion, the location, taxonomy and specimen number are already databased.
The data we add manually are: the collection site; the collector; the date of the collection event or, if the specimen was bred, the date of breeding; when the specimen came into the Museum; the number for the specimen; and, where applicable, its type status and whether any slides or preparations have been made from it.
Not every specimen has all of this data, but we fill in what we have. Unlocking this data and publishing it on the Museum’s Data Portal gives scientists access to large datasets with unrivalled historical and geographical coverage. This enables scientists to conduct phenological research on natural history data in ways that were not possible before.
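One way to picture a transcription record is as a data structure with pre-filled and optional fields. This is a hypothetical sketch, not the project database’s schema: the class and field names are illustrative assumptions. Dates are kept as strings because label dates are often partial, and missing label data is represented as `None` rather than forced into a value.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TranscriptionRecord:
    # Pre-filled during ingestion (folder structure and data matrix).
    specimen_number: str
    drawer: str
    taxon: str
    # Typed in by a digitiser from the label image; None when the label lacks it.
    collection_site: Optional[str] = None
    collector: Optional[str] = None
    collection_date: Optional[str] = None   # the date of breeding instead, if bred
    bred: bool = False
    acquisition_date: Optional[str] = None  # when the specimen came into the Museum
    label_number: Optional[str] = None      # the number for the specimen on its labels
    type_status: Optional[str] = None
    has_slides_or_preparations: Optional[bool] = None
    more_info_on_label: bool = False        # the 'more info on label' tick box

    def missing_fields(self) -> list[str]:
        """Names of optional label fields that are still empty."""
        return [f.name for f in fields(self)
                if f.default is None and getattr(self, f.name) is None]
```

Keeping absent values as `None`, instead of blanks or guesses, lets anyone using the data distinguish information the label never recorded from information that has been transcribed.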
The future
Providing digital access to specimens helps preserve and protect them for the future, because they need to be handled less often. Giving access to this unrivalled historical and geographical data also opens up new ways to conduct science.
For example, Museum scientist Steve Brooks and colleagues have been comparing Lepidoptera data with historical temperature records. They found that 92% of the 51 species studied emerged earlier in years with higher spring temperatures.
‘The warming climate is already causing butterflies to emerge earlier – and unless their food plants adapt at the same rate, the insects could emerge too early to survive.’ (S. Brooks et al., 2016)
The Museum’s collection holds unique records spanning more than a century, so by releasing this data we can equip more scientists to carry out new kinds of research.
What isn’t transcribed?
When digitising the Lepidoptera, we sometimes come across information on a specimen that doesn’t fit any of the transcription fields. In those cases, we tick a ‘more info on label’ box in the database.
However, these stories provide excellent narrative and insight into the history of the specimens and their collectors. We are sharing some of our favourites on Twitter using #MothMonday.
We would love to hear your Lepidoptera stories or what you want to hear about, so please tag @NHM_Digitise with your photos or comments. You can also watch the Museum’s film below to see a summary of the whole process and how Steve Brooks made use of the data for the study quoted above:
Great job Jennifer, and very good explanation. I wish you good luck in your endeavour.
You do not mention colour space management for the images. I take it your images are referenced to an RGB colour space such as ‘sRGB’ or ‘Adobe RGB 98’ (https://en.wikipedia.org/wiki/RGB_color_space)?
Mike Hardman, Cyprus
Hi Mike,
Thanks for your comment, I’m glad you enjoyed the post. We use an RGB colour space in Adobe, and we use the Pantone colour of the grey plastazote as a reference point. Hope that helps!
Jennifer
Hi Jennifer // Thanks. That puts my mind at ease. That sort of thing needs to be in the system design at the start, as fixing it later can be a pain. //Mike
Great article, I think it can be very useful for an interested person.