The Digital Collections Programme in a year like no other

It’s been a year since we first had to close the doors of the Museum due to the pandemic, and like the rest of our colleagues, the Digital Collections Programme (DCP) team have adjusted to the world of video calls, furlough and working from home. Despite these challenges, in 2020 the team imaged 72,000 specimens, transcribed data from 85,000 specimens and georeferenced 17,000 specimens, giving us plenty of progress to reflect on. Over 25 billion data records have now been downloaded from the Data Portal and GBIF in over 360,000 download events, and remote working has only further highlighted the importance of digitising collections and making them accessible to the world.

Digitising from home – lice, beetles, flies and dragonflies

With access to the Museum limited over the last year, we have prioritised the work that can be done remotely – namely transcription (capturing data from specimen labels using images of the specimens) and georeferencing (determining geographical coordinates for the locality where a specimen was collected from).  

First up was the bee type collection. The Museum has around 4,500 bee (Apoidea) type specimens (a reference specimen or group of specimens from which the species was described and named) that date back as far as the early 1800s, making this an incredibly useful resource for researchers needing to identify what species they have observed. A significant amount of imaging of this collection had been completed prior to the Museum closing, and so we were able to use these images to transcribe and georeference the data from the specimen labels. We then compared the information captured from the specimen labels to the information stated in the original descriptions of these species to check for any discrepancies that needed to be resolved. You can read more about the process in this blog from digitiser Pete Wing.
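For readers curious what the comparison step could look like in practice, here is a minimal sketch in Python. It is an illustration only, not the team's actual tooling: the function name, field names and sample values are all hypothetical, and it assumes each record is a simple dictionary of text fields.

```python
def find_discrepancies(label_record, description_record, fields):
    """Return {field: (label_value, description_value)} for fields that differ."""

    def normalise(value):
        # Compare case-insensitively and ignore leading, trailing and repeated spaces.
        return " ".join(str(value or "").split()).lower()

    return {
        field: (label_record.get(field), description_record.get(field))
        for field in fields
        if normalise(label_record.get(field)) != normalise(description_record.get(field))
    }


# Hypothetical example values, not real specimen data.
label = {"locality": "Cape Town ", "collector": "J. Smith", "year": "1895"}
description = {"locality": "cape town", "collector": "J. Smith", "year": "1896"}

discrepancies = find_discrepancies(label, description, ["locality", "collector", "year"])
print(discrepancies)
```

In this sketch only the year is flagged, because the two locality values differ only in capitalisation and spacing. Anything flagged would still need a person to decide which source is correct.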

From bees, we moved on to the Museum’s Odonata type collection, which was imaged in March 2018 and consists of around 1,600 type specimens of dragonflies and damselflies. Using these images, the team transcribed, georeferenced and verified the data associated with these specimens. 

The transcription of the parasitic louse (Phthiraptera) collection was up next. This collection had been imaged in 2017 as the pioneer project to trial our automated digitisation workflow. It consists of more than 70,000 microscope slides, and we completed the transcription in record time. 

Finally, we were able to transcribe and georeference 10,600 Agromyzidae (also known as leaf-miner flies) in the British and Irish collection. The data from these specimens can now be fed into the National Recording Scheme and contribute to our understanding of species distribution and trends over time.  

Restarting projects on-site

At times during the year, when restrictions were eased, we were able to get back into the Museum and restart work on projects that required physical access to the collections. 

One of these was the digitisation of Echinodermata fossil specimens – marine invertebrates such as starfish. This project focused on blastoids (an extinct type of stemmed echinoderm, often referred to as sea buds), cystoids (an extinct echinoderm that lived attached to the sea floor by stalks) and Asterozoa (a superclass which includes starfish and brittle stars). Our work consisted of an audit workflow, which involved matching specimens in the Echinodermata collection to records in the Collections Management System (CMS). This was done in real-time, so that any issues could be flagged and dealt with while we were in the collection. Although no mass imaging of specimens was carried out, we took plenty of photos for social media and these particularly beautiful specimens (no offence to the many lice we have digitised) have been going down a treat! 
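The audit described above is essentially a two-way match between what is physically in the drawers and what is recorded in the CMS. As a rough sketch only, assuming each specimen carries a unique registration number (the function name and the numbers below are hypothetical, not taken from our systems), the matching could be expressed in Python like this:

```python
def audit(shelf_ids, cms_ids):
    """Compare registration numbers seen in the collection with those in the CMS."""
    shelf, cms = set(shelf_ids), set(cms_ids)
    return {
        "matched": sorted(shelf & cms),             # found in both places
        "missing_from_cms": sorted(shelf - cms),    # specimen present, no record
        "not_found_on_shelf": sorted(cms - shelf),  # record present, no specimen seen
    }


# Hypothetical registration numbers for illustration.
seen_in_drawer = ["E 1001", "E 1002", "E 1004"]
cms_records = ["E 1001", "E 1002", "E 1003"]

report = audit(seen_in_drawer, cms_records)
print(report)
```

The two "missing" lists are the issues that would be flagged and followed up while still in the collection.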

We restarted the imaging of herbarium sheets of Malvales (an order of flowering plants that includes okra, cotton and cacao), and completed the digitisation of the Bennett collection – a collection of 2,000 chalcid parasitic wasp specimens and 2,500 bee specimens donated by Dr Frederick Bennett in 2018. Before incorporating donated material into the main collection, we need to register the specimens and create records for them in the CMS – known as entry point digitisation. By using our high-throughput workflow and ALICE imaging system, we can accomplish this through automated processes with the added benefit of capturing specimen and label images for future use.

In addition to restarting paused projects on-site we were also able to start new projects, including the imaging and transcription of a tribe of ground beetles known as Lebiini. The data captured from this project will feed into the PhD research project of Beulah Garner (Senior Curator, Coleoptera), which aims to investigate the historical biogeography of the Lebiini tribe. As this is a historic collection, we can learn a lot about how species distributions have changed over time. 

Digitisation on demand

2020 saw many things go virtual – including access to natural history collections. Travel restrictions and closure of the Museum’s collections for both researchers and many Museum staff meant that our curators received many requests for digital access to specimens. While on-site, we were able to assist with these requests by carrying out the specialised imaging. 

The Museum is also involved in two projects funded through the SYNTHESYS+ Virtual Access programme. SYNTHESYS+ provides researchers with access to the vast natural history collections held in Europe. This has traditionally been physical access; however, in the spring of 2020 the programme put out a call for researchers to propose ‘digitisation on demand’ projects. SYNTHESYS+ Virtual Access allows researchers to make a case for a collection to be digitised by the holding institutions to meet a specific need of the research and collections communities.

One of these projects is the EPT Project. This project focuses on the insect orders Ephemeroptera (mayflies), Plecoptera (stoneflies) and Trichoptera (caddisflies), known collectively as EPT. Their presence in freshwater can be used to assess water quality and habitat health. Although the distribution of EPT is relatively well known in Europe, there are gaps in our global knowledge, and we – along with two other institutions – have been supporting the project by imaging the 75,000 pinned and slide-mounted specimens in our collection and transcribing their label data. The contribution of our data to this project will enable many more species to receive assessments of conservation importance through the IUCN Red List. Digitiser Robyn Crowther wrote a blog about the background to the project and its progress.

Contributing to knowledge on coronaviruses

The second Virtual Access project will see us digitise 6,000 bat specimens to contribute to a ‘COVID-19 Chiropteran knowledge base’. The most similar virus to the one causing the COVID-19 pandemic was found in a common horseshoe bat species, and the data held in museum collections can help scientific understanding of the origins of the pandemic. We – along with eight other institutions – will be digitising dry and spirit-preserved bat specimens, focusing on horseshoe bats (Rhinolophidae) and their closely related families – Old World leaf-nosed bats (Hipposideridae) and trident bats (Rhinonycteridae). The work entails label transcription and georeferencing of specimens, and the resulting data will contribute to a knowledge base that will aid our understanding of the distribution and ecological demands of these key species. The data will be a useful tool in understanding and predicting spillover events (when a virus moves from one species to another). As this will be our first project in the Zoology mammal collection, we’ll be establishing new workflows to find the most efficient ways of working with these types of specimens.

Looking ahead

While still mainly working from home, the team are continuing to focus on transcription, as well as writing up workflows and research for publication. We’re continuing to publish blogs and post about our projects on Twitter and Instagram, and we have recently taken part in our first Reddit Ask Me Anything (AMA), which was a great way to reach and engage with new audiences. We’ve also started having ‘Tricky Transcription’ meetings, where we meet over Teams to work on particularly hard-to-decipher labels and have a bit of a chat!

As the situation hopefully starts to improve, the team are looking forward to getting back into the Museum more frequently and being surrounded by the collections once again. The past year has been a challenging one, but each day we move a little closer to our goal of freeing the data in the collections for use around the world. 

Be sure to follow us on Twitter and Instagram to stay up to date on the work of the DCP, and check out some of our blog posts from the last year: 
