Who uses collection data? | Digital Collections Programme

The Global Biodiversity Information Facility provides open access to the world’s biodiversity data

There is an incredible amount of specimen data providing critical information on the natural world, but with fewer than 10% of the estimated 1.5 billion specimens digitally accessible, who is using this information?

This blog provides a brief overview of how the Museum’s digital collection sits within the wider landscape of efforts to digitise our collections and some insight into how these data are making an impact on the wider scientific community.

The Museum’s Data Portal currently holds over 4.3 million specimen records from the Museum collection. One of the major destinations for this data is the Global Biodiversity Information Facility (GBIF), an international network and research infrastructure that provides free and open access to the world’s biodiversity data. Currently, the GBIF network of nearly 100 formal members and more than 1,400 data-sharing institutions provides access to 1.3 billion species occurrence records, which are evidence of where and when organisms have been observed or collected.
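For readers who want to look at occurrence records themselves, GBIF offers a public web API. The sketch below only builds a search URL for specimen-based records and does not send a request. The helper name `build_occurrence_search_url` and the chosen filter values are illustrative assumptions, not part of the Museum's own tooling.

```python
from urllib.parse import urlencode

# Public GBIF occurrence search endpoint (API version 1).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_search_url(basis_of_record="PRESERVED_SPECIMEN", limit=5):
    """Return a search URL for occurrence records of one record type.

    Hypothetical helper for illustration: the default filter asks for
    records backed by preserved specimens, the kind of record that
    museum collections typically contribute.
    """
    params = {"basisOfRecord": basis_of_record, "limit": limit}
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Paste the printed URL into a browser to see the JSON response.
    print(build_occurrence_search_url())
```

Opening the printed URL in a browser returns a page of matching records as JSON, along with a count of how many records match the filter.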

Historical Data provides a baseline

Natural history collections make up only a fraction (around 10%) of the 1.3 billion occurrences on GBIF, but what they do provide is a unique historical perspective on the distribution of bio- and geo-diversity. Much of the data on GBIF comes from observations of an organism at a particular time and location. These observations provide key information on recent changes in global species distribution patterns. The vast majority of them are comparatively recent, providing critical insight into where biodiversity is presently distributed.

To look at biodiversity over longer timescales, however, we need historical records. Museum collections contain specimens dating back over 200 years, including details of where and when they were collected. This is a critical time period, during which humans have had a major impact on the distribution of biodiversity, radically affecting landscapes through industrialisation, increased consumption of natural resources, pollution and climate change. This impact is so stark that some scientists recommend that this time period be given its own name, the Anthropocene, to mark out the extraordinary impact humans have had on the history of our planet.

Data from this period provide invaluable baseline measurements of biodiversity and climate before the Anthropocene. They are an essential contribution to ongoing, cutting-edge research: Dr Natalie Cooper is using data on some of the oldest vertebrates in the Museum collection to define the baseline of vertebrate diversity before the beginning of the Anthropocene. These baselines can then be used to investigate how humans have affected vertebrate diversity since this time, and to provide recommendations for restoring environments to pre-Anthropocene conditions.

A selection of vertebrates from the Museum’s spirit collection

“It’s really important to work out what the true baseline diversity was before humans started rapidly changing the environment. Otherwise we tend to compare the environment today to what it was like just 30 years or so ago. If lots of extinction and human impacts occurred earlier than this then we might not be restoring environments correctly, and we might miss major issues that mean our conservation efforts won’t be as effective.” Dr Natalie Cooper

Organising our data

Historical collections are invaluable, but they come with a distinct set of data management challenges. Often the information we have about a specimen is incomplete, out-of-date or hard to interpret, making it difficult to organise, access, visualise and analyse. At the Museum, the Digital Collections Programme works closely with the Informatics group to develop standards and policies for specimen data that enable the Museum’s work to feed into the wider global natural science community.

The Natural History Museum’s collection of specimens paints one part of a global picture of the distribution and diversity of the natural world across space and time. Through organisations like GBIF we can bring this data together to paint a far more complete picture than is possible via any one institution. GBIF helps participants do this in a standardised form, providing a critical resource for researchers mapping and modelling the natural world. This combined effort brings vast quantities of data together, enabling the natural sciences community to build a future where both people and the planet can thrive.

Since 2015, 16 billion of the Museum’s specimen and research records have been downloaded and used by researchers, curators and natural science enthusiasts worldwide as part of more than 200,000 downloaded datasets. A significant proportion (~60%) of this data is accessed through the GBIF portal, which harvests life science collections data from collections and observation databases across the world and standardises it so that researchers can search easily and accurately across multiple collections.

GBIF’s processes dramatically magnify the impact of the Museum’s collections, and help us to track the 267 scientific publications that have used Museum data over the last four years. This provides fascinating insight into the diverse uses and impact of our collections.

Overview of research papers citing Museum data separated by topic.

How GBIF data is being used by the UK Scientific Community 

For the past decade, the UK scientific community has been among the most active users of data from the GBIF network. Since GBIF began tracking scientific literature in 2008, researchers based at institutions in the UK (and their co-authors abroad) have made substantive use of GBIF-mediated data in 445 peer-reviewed journal articles. This bibliography is freely available and searchable across many facets and dimensions, along with the complete suite of 923 references covering all document types (e.g. books and book chapters, academic theses, conference proceedings) and citation types (discussions, acknowledgements and others) during the same period.

These studies cover a broad range of topics including ecology and evolution (34%), conservation (19%), invasive species (15%), impacts of climate change (13%), human health and agriculture (5% each), with specialised disciplines like phylogenetics, taxonomy and other biodiversity and biogeographic topics accounting for the rest. Funders of this research in recent years have been equally diverse, with authors acknowledging the support of 35 different British agencies, universities, societies, trusts and foundations since 2014. The Natural Environment Research Council leads the pack, with 23 acknowledgements, followed by the Scottish Government’s Rural and Environment Science and Analytical Services Division (6), DEFRA (5) and the University of Oxford (5).
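As a quick check on the topic breakdown, the named categories can be added up to see how much is left for "the rest". This is only a sketch, and it assumes the quoted percentages are rounded and that each paper is counted under a single topic, which the figures above do not confirm.

```python
# Topic shares as quoted in the text, in percent.
topic_shares = {
    "ecology and evolution": 34,
    "conservation": 19,
    "invasive species": 15,
    "impacts of climate change": 13,
    "human health": 5,
    "agriculture": 5,
}

# Share covered by the named topics.
named_total = sum(topic_shares.values())

# Share left for phylogenetics, taxonomy and other specialised topics,
# under the assumption that the categories do not overlap.
remaining_share = 100 - named_total

if __name__ == "__main__":
    print(f"Named topics: {named_total}%")
    print(f"Specialised disciplines (the rest): {remaining_share}%")
```

On these assumptions the named topics cover 91% of the studies, leaving roughly 9% for the specialised disciplines.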

Shifting the paradigm of science

UK-based researchers Warren et al., at the Tyndall Centre at the University of East Anglia, used more than 385 million records from GBIF (among them 73,789 records from the Natural History Museum, London) to perform a global-scale analysis of the effects of climate change. The open data used in their paper, “The projected effect on insects, vertebrates, and plants of limiting global warming to 1.5°C rather than 2°C”, covers more than 115,000 terrestrial species, including more than 34,000 insects and other invertebrates.


Interactive map showing the countries that contributed data to the Warren et al. study


Proportion of records contributed to the Warren et al. study by each of the 5.5k institutions

The Warren et al. paper forms the primary basis of the latest biodiversity findings from the Intergovernmental Panel on Climate Change (IPCC) and can be seen in chapter 3 of their latest report on Global Warming of 1.5°C. Without collections sharing their data openly using commonly accepted standards, this research would not be possible. Members of the natural science sector may be spread all over the world, but we face the same challenges when it comes to unlocking and sharing the information in our collections. It makes sense to pool our technologies, resources and tools as well as our data.

Museum scientist Professor Andy Purvis is the Principal Investigator of PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) and of the project’s second phase, PREDICTS2. This is a collaborative project combining many data sources to investigate how local biodiversity responds to human pressures, in order to improve our ability to predict future biodiversity changes. Because of his experience of global analysis and modelling, Andy has been selected to join the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES). This intergovernmental body brings together 145 world experts to assess the state of biodiversity, and has estimated that 1 million species are at risk of extinction. Understanding how human pressures influence global biodiversity can help experts recommend changes we can make to prevent further decline of biodiversity and reverse the downward trajectory that is expected if we carry on using the Earth’s resources at the current rate.

Providing free and open access to global biodiversity data is shifting the way in which scientists study the natural world. Analytical tools enable us to understand how our data is used so that we can demonstrate the impact of our work. By ensuring that our data abides by international data standards and contributes to larger data aggregators such as GBIF, the Museum contributes a valuable resource to science and forms part of wider solutions to the challenges facing humans today.

Explore our data online at data.nhm.ac.uk. If you would like to stay in touch with the Digital Collections Programme, you can follow us on Twitter or Instagram, or find out more about the programme on the website.
