A guest blog by Pete Wing
January 2023 marks ten years since I first started digitising the Museum’s collections. A lot has changed in that time, but the main principle of digitisation has remained the same: to transform access to, and use of, the Museum’s collections by unlocking natural history data and sharing it with the world.
Over 5.3 million specimens have been released on the Museum’s Data Portal. Over 34 billion records have been downloaded in more than 620,000 download events, and over 2,350 scientific papers cite the digital collection. But how did we get to this point?
The original, and longest-running, digitisation project I worked on was called iCollections. It aimed to image, transcribe and georeference every single Lepidoptera (moth and butterfly) specimen from Britain and Ireland. I think it’s fair to say that few, if any, of us involved with that project foresaw just how much digitisation would change the way scientists work with natural history collections. Digitisation itself has also changed through the years, from bespoke projects carried out solely to inform specific research to an ongoing, business-as-usual activity in museums around the world. It has also become a lasting career in its own right, one that has evolved and expanded and now offers various opportunities for progression.
Beginning with Butterflies
Taking it back to 2013, I had wall-to-wall butterflies to digitise. Even though there were only about 60 species, the final number of specimens turned out to be 181,545. This took a team of six about 18 months to complete. Because we digitise specimens one by one, we are often the first people in decades to examine the collection as a whole. As a result, we often come across surprising or unexpected finds. Some that jumped out at me were: a Monarch butterfly (Danaus plexippus) pinned with a sewing needle; a Bath White (Pontia daplidice) collected about 500m from the house I grew up in; the extensive collection of Robert ‘Porker’ Watson (and his money-saving habit of ‘correcting’ his wife’s initials on his labels, replacing them as necessary with those of his current spouse); and, of course, the three months of my life spent digitising the Chalk Hill Blue (Lysandra coridon). There were more than 21,000 of these, such was the zeal of early 20th-century collectors for any specimen that looked ever so slightly different and might be considered an ‘aberration’. For this species alone, Bright & Leeds published ‘A Monograph of the British Aberrations of the Chalk-hill Blue Butterfly’ in 1938, detailing around 400 aberrations (not all of which had been observed in specimens at that point).
Beyond the butterflies lay the moths, which accounted for the vast majority of the rest of the iCollections project. These collections are so vast that, to date, we have yet to finish the entirety of the British and Irish micro moths, though the current digitisation team continues this work alongside other projects. By January 2023, a total of 519,421 British and Irish Lepidoptera specimens had been digitised.
Alongside iCollections, the Digital Collections Programme was established in 2014 to coordinate the mass digitisation of the Museum’s collections. Following an interview to work on the Atlas of Living Malaysia project, I joined this team in December 2017 and have been part of it ever since.
While it was a change for me to move away from working almost solely on Lepidoptera, the biggest change was in how we thought about digitising. Previously, we associated metadata with images through folder structures. As we digitise a specimen, we give it a new barcode, or unique identification number. The Digital Collections Programme took this further, building on our use of unique identifiers to embed more data (such as location and taxonomy information) into barcodes. We still take a single photo of each specimen, but additional barcodes now appear in the same image. Software reads these barcodes and assigns the taxonomic and location data automatically, saving time and removing a source of human error from the process. The best example of this is the microscope slide workflow, in which we can use five data matrices (barcodes) to encode the specimen’s unique identifier, its taxonomy, its location within the collection, its country of collection and whether or not it is a type specimen. This does require a fair chunk of pre-digitisation preparation from curators and digitisers alike but, once set up, it ends up being quite a bit faster than any alternative we have encountered so far.
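To illustrate the idea of several barcodes in one image building up a single record, here is a minimal Python sketch. The post does not describe the Museum’s software or what each data matrix actually contains, so everything here is an assumption: the barcodes are taken to be already decoded into text, and the field prefixes (`ID`, `TAX`, `LOC`, `CTRY`, `TYPE`), the `SlideRecord` class and the `build_record` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical payload prefixes; the Museum's real barcode encoding is not described.
FIELD_PREFIXES = {
    "ID": "specimen_id",
    "TAX": "taxon",
    "LOC": "collection_location",
    "CTRY": "country",
    "TYPE": "type_status",
}


@dataclass
class SlideRecord:
    """One specimen record assembled from the barcodes visible in a single image."""
    specimen_id: Optional[str] = None
    taxon: Optional[str] = None
    collection_location: Optional[str] = None
    country: Optional[str] = None
    type_status: Optional[str] = None


def build_record(payloads: Iterable[str]) -> SlideRecord:
    """Merge already-decoded barcode strings (e.g. 'TAX:...') into one record."""
    record = SlideRecord()
    for payload in payloads:
        # Split on the first colon only, so values may themselves contain colons.
        prefix, sep, value = payload.partition(":")
        if not sep or prefix not in FIELD_PREFIXES:
            raise ValueError(f"Unrecognised barcode payload: {payload!r}")
        field = FIELD_PREFIXES[prefix]
        if getattr(record, field) is not None:
            raise ValueError(f"Duplicate {prefix} barcode in one image")
        setattr(record, field, value)
    # Every image must carry the specimen's own unique identifier.
    if record.specimen_id is None:
        raise ValueError("Image has no specimen identifier barcode")
    return record


if __name__ == "__main__":
    # Illustrative values only, not real Museum data.
    example = build_record([
        "ID:EXAMPLE-0001",
        "TAX:Lepidoptera",
        "LOC:Slide cabinet 3",
        "CTRY:GB",
        "TYPE:non-type",
    ])
    print(example)
```

The sketch rejects duplicate or unrecognised barcodes and images without an identifier, on the assumption that an automated step like this still benefits from simple sanity checks before data reach a public portal.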
We’re now well established in the Museum’s Herbaria, as well as continuing the slide work mentioned above and expanding workflows into Zoology and Palaeontology. That said, I do have a soft spot for insects, and I’ve been lucky enough to digitise Alfred Russel Wallace’s personal collection and, later, the Museum’s Birdwing Butterfly collection. Beyond that, I have also been able to establish more in-depth workflows for diagnostic imaging to fulfil digital surrogate loans for external scientists. This has been quite the learning curve, not only in getting good images but also in becoming accustomed to the myriad of equipment available to do so.
It is worth remembering that digitisation isn’t just taking pictures. During the various COVID-19-induced lockdowns, we needed to find alternative digitisation work to do at home. Primarily, this gave us the opportunity to transcribe and georeference data for a plethora of outstanding projects, but we also worked on the bee, dragonfly and damselfly type specimens. Type specimens are the reference specimens on which a species name is based, so they are a sought-after resource for scientists around the world. During lockdown, after transcribing and georeferencing the locality data for these specimens, we were also able to research the original descriptions of these types and compare them to any labels present. This initial type check allowed us to flag any potential issues or inconsistencies to the responsible curators, so that any further investigation could be a concerted and targeted effort. This frees up valuable curatorial time and makes our data more immediately usable by researchers.
The last couple of years have also seen me take on more responsibility within the digitisation team as a Senior Digitiser. This has largely meant I now take a bit of a back seat when it comes to digitising specimens (though I do like to keep my hand in when I can), but I’m much more involved in organising our upcoming projects, testing new workflows and giving support and training to our new digitisers.
I’ve also been overseeing the Museum’s involvement in external projects such as the European-funded SYNTHESYS+ Virtual Access programme (https://www.synthesys.info/access/virtual-access.html), which offers digitisation on demand. In some ways this returns digitisation within the Museum to its roots in fulfilling specific research requests, but now with improved diagnostic imaging as well as transcribed and georeferenced specimen data. This is an increasingly important development in how anyone can access natural history collections. Digitising in this way can take more time, for example capturing more images or more of the label data than our standard workflows do, but we learn from these differences, expanding the variety of specimens we can work with, both in the kinds of preparation and taxonomically.
Over the years I’ve handled more than 390,000 specimens and learned a lot about the collections, the stories they hold and the passion of the people who work with them, as well as a huge amount about digitisation, data handling and a variety of imaging techniques. If you’d asked me ten years ago, I never would have thought I’d end up being filmed for TV and recorded for radio, or be asked to supply the music for a video some colleagues were putting together for a Lego insect manipulator. Things have changed, stayed the same and gradually evolved as we encounter new challenges and perspectives and keep learning and, I’m happy to say, for the most part it’s been really rather fun.
After a decade of digitisation, we would love to hear from you – what types of stories do you love to hear from us? What would you like to hear more about? Let us know over on Instagram and Twitter and check out what we have digitised already over on the Museum’s Data Portal.