The ‘spatial wealth’ of the Museum’s collections is often ignored or, at best, under-appreciated. Most specimens, if not all, have a locality associated with them, written on a label, in a notebook, or on the specimen itself.
These localities range from very precise (e.g. a GPS-based latitude/longitude) to very imprecise (e.g. ‘South America’) or, most likely, somewhere in between. Most specimens within the Museum do not have a latitude and longitude, but do have detailed locality information on the accompanying label, which can be used to define coordinates for that specimen. So what is georeferencing, why do we need it, and how do we use it?
Georeferencing is the process used to give the locality of a specimen geographical coordinates, thus enabling it to be plotted on a map. So far so simple, because georeferencing is not a complicated process to understand.
Why we do this is equally simple: it allows mapping and modelling in support of research on anything from species distributions and relationships to environmental change or the targeting of conservation practices. Digitising and georeferencing museum collections gives specimens a ‘second life’, allowing distributions and collection information previously hidden away behind the scenes to be used in new research projects. However, how we do this at the Museum is slightly more complicated.
To do this we use a custom georeferencing interface (based on the Google Maps API), developed to clean up the data and remove duplicate site localities. To distinguish between site variants and the site data we create, the interface lists all the individual site variants on the left and the Master sites (the cleaned site localities we create) on the right (see below). This way we can link multiple site variants to a Master record that has the correct latitude, longitude, and extent.
A core principle of the georeferencing interface is to reduce and rationalise site variants. This process of reducing duplication and error to produce a Site Master reduces the final tally of sites georeferenced by approx. 45%.
The site variants found for a specific location, for example Reading, reading, or Reeding, are checked against the label information to see if they refer to the same location, then linked and merged into a single Site Master, in this case Reading, Berkshire, England, UK, effectively reducing the amount of georeferencing needed. Finally, and importantly, the interface uses Google Maps, GeoNames and many other web resources, which helps to ensure repeatability and the use of worldwide standards.
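The linking step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Museum's interface: the names `MasterSite`, `SiteIndex`, `link` and `lookup` are hypothetical, the coordinates for Reading are approximate, and the extent value (and its unit of metres) is an assumption. The decision that two variants refer to the same place is still made by a person reading the label; the code only records that decision.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MasterSite:
    """A cleaned site locality (hypothetical structure for illustration)."""
    name: str
    latitude: float
    longitude: float
    extent_m: float  # assumed: radius of uncertainty in metres
    variants: set[str] = field(default_factory=set)


def normalise(label: str) -> str:
    """Lower-case a label and collapse whitespace so trivial variants match."""
    return " ".join(label.lower().split())


class SiteIndex:
    """Maps each site variant to the Master Site it has been linked to."""

    def __init__(self) -> None:
        self._by_variant: dict[str, MasterSite] = {}

    def link(self, variant: str, master: MasterSite) -> None:
        # Called once a georeferencer has confirmed the variant is this site.
        master.variants.add(variant)
        self._by_variant[normalise(variant)] = master

    def lookup(self, variant: str) -> MasterSite | None:
        return self._by_variant.get(normalise(variant))


# Illustrative values only: approximate coordinates, made-up extent.
reading = MasterSite("Reading, Berkshire, England, UK", 51.45, -0.97, 5000.0)
index = SiteIndex()
for label_text in ["Reading", "reading", "Reeding"]:
    index.link(label_text, reading)

match = index.lookup("  READING ")
print(match.name if match else "not linked yet")
```

Because every variant points at the same Master record, the coordinates and extent only need to be determined once, however many spellings appear on the labels.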
Georeferencing the Museum’s UK Lepidoptera collection
The UK Lepidoptera iCollections initiative involved georeferencing 183,000 UK butterfly specimens. Following the protocols described above, we could locate and map these insects’ distributions across the UK, and began to identify key collecting localities that enabled the rationalisation of large numbers of individual records into single Master Sites. For example, the New Forest Master Site contained 5,500 individual specimens!
This large number of specimens equated to 12,000 Master Sites. The red dots and specimen densities in the figures below show a strong southerly bias in collecting localities.
Many of the butterflies are migratory, and of course climate conditions and habitat suitability help to explain this distribution. However, talking with curators about prolific butterfly collectors also led us to believe that a slight bias towards holiday destinations and the areas around collectors’ homes was also a factor.
As you can see from our final figure, we can also use this information to map the collecting sites of individual collectors. This data has a wealth of possible uses, which are not limited to identifying historic collecting sites.
Main Georeferencing lessons learned (barriers and constraints)
- Quality of locality data: The quality of the locality data collected from labels varied significantly across different groups. Certain groups, such as the butterflies, have been georeferenced rapidly because the information is of a good standard. Other groups, notably the beetles (Coleoptera), have been very difficult because of historically sparse locality information on the label and imprecise site data. Standards for site locality information vary significantly across the collections.
- Curatorial and Research group interaction: The groups georeferenced thus far have highlighted the importance of this interaction. Usually the label contains all the necessary site information, but sometimes this needs to be supplemented by staff with specific expert knowledge of the collection. This has proved invaluable when dealing with very specific and sparse label locality information.
- GeoRef Standards guidelines: The Museum georeferencing guidelines have proved invaluable in passing clear instructions on to the team and in maintaining geographical standards and consistency. The guidelines are being updated as we tackle other groups across the Museum, so that they provide a clear set of instructions on how we georeference specific localities and their extents. They therefore provide a geographical standard based on our Museum data and best practice from similar organisations.
The 80 million specimens in the Museum’s collections equate to approximately 3.1 million individual sites: a treasure trove of spatial wealth that can be used to answer a multitude of questions about species distributions and everything that follows from them. Visit the Museum’s data portal to see what’s already available online.
Digital Collections Programme