Why georeferencing is the most important thing for the Museum since sliced bread | Digital Collections Programme

The ‘spatial wealth’ of the Museum’s collections is often ignored or, at best, under-appreciated. Most if not all specimens have a spatial locality associated with them, written on a label, in a notebook, or on the specimen itself.

Close crop of a photo of a drawer of pinned clouded yellow butterflies with QR code and hand written labels visible
Digitising the Museum’s collections will let us unlock and share a treasure trove of information about our 80 million specimens

These localities range from very precise (e.g. a GPS-based latitude/longitude) to very imprecise (e.g. ‘South America’), with most falling somewhere in between. Most specimens within the Museum do not have a latitude and longitude, but do have detailed locality information on the accompanying label, which can be used to define coordinates for that specimen. So what is georeferencing, why do we need it, and how do we use it?

Georeferencing is the process used to give the locality of a specimen geographical coordinates, thus enabling it to be plotted on a map. So far so simple, because georeferencing is not a complicated process to understand.

Why we do this is equally simple: it allows for mapping and modelling to research anything from species distributions and relationships to environmental change and the targeting of conservation practices. Digitising and georeferencing museum collections gives specimens a ‘second life’, allowing distributions and collection information previously hidden away behind the scenes to be used in new research projects. However, how we do this at the Museum is slightly more complicated.

To do this we use a custom georeferencing interface (based on the Google Maps API), developed to clean up the data and remove duplicate site localities. To distinguish between the site variants and the site data we create, all the individual site variants are listed on the left of the interface and the Master sites (the cleaned site localities we create) on the right (see below). This way we can link multiple site variants to a Master record that has the correct latitude, longitude, and extent.

Screen shot showing the user interface for iCollections, with two columns for the two different lists in view
The iCollections georeferencing interface, showing the Site Variants and Site Masters list divisions

A core principle of the georeferencing interface is to reduce and rationalise site variants. This process of reducing duplication and error to produce Site Masters reduces the final tally of sites georeferenced by approximately 45%.

The site variants found for a specific location, for example Reading, reading, or Reeding, are checked against the label information to see if they refer to the same location, then linked and merged into a single Site Master (in this case Reading, Berkshire, England, UK), effectively reducing the amount of georeferencing needed. Finally, and importantly, the interface uses Google Maps, GeoNames and many other web resources, thus ensuring repeatability and the use of worldwide standards.
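The linking of site variants to a Site Master can be sketched in a few lines of Python. This is only an illustration, not the iCollections interface: the `SiteMaster` fields, the `suggest_master` helper, the 0.8 similarity threshold, the extent values, and the approximate coordinates are all assumptions made for the example. In practice each suggestion would still be checked against the label by a person.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional, Set


@dataclass
class SiteMaster:
    """A cleaned locality with coordinates and an extent (hypothetical schema)."""
    name: str
    latitude: float
    longitude: float
    extent_km: float
    variants: Set[str] = field(default_factory=set)


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two locality strings (0 to 1)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def suggest_master(variant: str, masters: List[SiteMaster],
                   threshold: float = 0.8) -> Optional[SiteMaster]:
    """Return the most similar master at or above the threshold, else None."""
    best, best_score = None, threshold
    for master in masters:
        # Compare against the first part of the master name, e.g. "Reading".
        score = similarity(variant, master.name.split(",")[0])
        if score >= best_score:
            best, best_score = master, score
    return best


# Approximate coordinates and made-up extents, for illustration only.
masters = [
    SiteMaster("Reading, Berkshire, England, UK", 51.45, -0.97, 5.0),
    SiteMaster("New Forest, Hampshire, England, UK", 50.87, -1.60, 15.0),
]

for label in ["Reading", "reading", "Reeding"]:
    match = suggest_master(label, masters)
    if match is not None:
        match.variants.add(label)  # link the variant to its Site Master
```

After this runs, all three spellings are attached to the single Reading master, so only one locality needs coordinates rather than three.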

Georeferencing the Museum’s UK Lepidoptera collection

The UK Lepidoptera iCollections initiative involved georeferencing 183,000 UK butterfly specimens. Following the protocols described above, we could locate and map these insects’ distributions across the UK, and began to identify key collecting localities that enabled the rationalisation of large numbers of individual records into single Master Sites. For example, the New Forest Master Site contained 5,500 individual specimens!

This large number of specimens equated to 12,000 Master Sites. As the red dots and specimen densities in the figures below show, there is a strong southerly locality bias.

Screen shot of an image depicting the UK and Ireland land mass with red dots indicating locations of collections of butterfly specimens
Georeferenced Master Sites for Museum UK butterfly specimens, equating to 183,000 individuals
Graphic depicting the UK and Ireland land mass with a heat map showing the location density of collected specimens
Museum UK butterfly specimen density map, indicating the main hotspots are located in the south of England

Many of the butterflies are migratory, and of course climate conditions and habitat suitability help to explain this distribution. However, talking with curators about prolific butterfly collectors also led us to believe that a slight bias towards holiday destinations and the areas around collectors’ homes was also a factor.

As you can see from our final figure, we can also use this information to map the collecting sites of individual collectors. This data has a wealth of possible uses, not limited to identifying historic collecting sites.

Graphic showing the UK and Ireland land mass with the hotspots of collections by Cockayne depicted through density of colouration
Hotspot map for collecting localities of an individual butterfly collector, Cockayne
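Density and per-collector maps like those above come down to counting specimens per area. A minimal sketch, assuming each georeferenced record is a `(latitude, longitude, collector, specimen count)` tuple and that a simple half-degree grid is good enough; the record layout, the grid size, the counts, and "Collector B" are invented for illustration and are not how the Museum's figures were produced.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.5):
    """Integer (row, column) index of the grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def specimen_density(records, cell_deg=0.5, collector=None):
    """Sum specimen counts per grid cell, optionally for a single collector."""
    counts = Counter()
    for lat, lon, who, n_specimens in records:
        if collector is None or who == collector:
            counts[grid_cell(lat, lon, cell_deg)] += n_specimens
    return counts


# Made-up records: (latitude, longitude, collector, specimen count).
records = [
    (50.87, -1.60, "Cockayne", 40),
    (50.90, -1.55, "Collector B", 25),
    (57.15, -2.10, "Cockayne", 3),
]

overall = specimen_density(records)
cockayne = specimen_density(records, collector="Cockayne")
hotspot_cell, hotspot_count = overall.most_common(1)[0]
```

The cell totals can then be coloured by count to give a heat map, and filtering by collector gives the single-collector view.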

Main Georeferencing lessons learned (barriers and constraints)

  1. Quality of locality data: The quality of the locality data collected from labels varied significantly across different groups. Certain groups, such as the butterflies, have been georeferenced rapidly because the information is of a good standard. Other groups, notably the beetles (Coleoptera), have been very difficult because of historically sparse locality information on the label and imprecise site data. Standards for site locality information vary significantly across the collections.
  2. Curatorial and Research group interaction: The groups georeferenced thus far have highlighted the importance of this interaction. Usually the label contains all the necessary site information, but sometimes this needs to be supplemented by staff with specific expert knowledge of the collection. This has proved invaluable when dealing with very specific and sparse label locality information.
  3. GeoRef Standards guidelines: The Museum georeferencing guidelines have proved invaluable in enabling clear instructions to be passed on to the team and in ensuring that geographical standards and consistency are maintained. The guidelines are being updated as we tackle other groups across the Museum so that they provide a clear set of instructions on how we georeference specific localities and their extents, thereby providing a geographical standard based on our Museum data and best practice from other similar organisations.
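To make the idea of a locality's extent concrete, here is a small sketch that treats the extent as a radius in kilometres around the site's coordinates and tests whether another point falls inside it. The post does not say how the Museum guidelines define extent, so the radius interpretation, the function names, and the example values are assumptions; the distance itself uses the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_extent(site_lat, site_lon, extent_km, lat, lon):
    """True if (lat, lon) lies within extent_km of the site's coordinates."""
    return haversine_km(site_lat, site_lon, lat, lon) <= extent_km


# Approximate coordinates for Reading with a made-up 5 km extent.
nearby = within_extent(51.45, -0.97, 5.0, 51.46, -0.96)
central_london = within_extent(51.45, -0.97, 5.0, 51.5074, -0.1278)
```

A check like this is one way to see how precise a record is: a small extent pins a specimen to a village, while a large one only places it in a region.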

The 80 million specimens in the Museum’s collections equate to approximately 3.1 million individual sites: a treasure trove of spatial wealth that can be used to answer a multitude of questions about species distributions and all that can follow from them. Visit the Museum’s data portal to see what’s already available online.

Caitlin McLaughlin
Digital Collections Programme

2 Replies to “Why georeferencing is the most important thing for the Museum since sliced bread | Digital Collections Programme”

  1. I found the article very interesting and, as a lay person, easy to understand. When you visit museums you do not realise how little of the collections is on display with the rest hidden away. Digitisation and geo referencing must be the way forward to ensure these valuable collections are not forgotten or their value wasted in being stored and unaccessible.

  2. An interesting article and if funding continues to pursue this over a long period we should be able to assess scientifically the impact of climate change on the many species in this country under threat of extinction.
