Date: 2023-12-12

Jason Hale, Product Manager Data Ecosystem, takes a look at how the Museum is using a novel algorithm to overcome the challenges of cataloguing environmental DNA data.

The UK is one of the most nature-depleted countries in the world. In terms of the amount of biodiversity remaining, it sits in the bottom 10% of countries worldwide, primarily because about 70% of our land is used for agriculture.

As well as being a world-leading visitor attraction, the Natural History Museum has over 300 scientists tackling some of the biggest challenges facing the world today, including climate change, biodiversity loss and food security.

The Museum’s Urban Nature Project brings together a team of people looking at the challenges the UK faces with increasing urbanisation.

The first step for our science team has been to develop and test a range of practical ways we can help create an accurate picture of what species are living where, so we can more quickly arrive at science-based solutions that protect UK nature and support recovery in urban areas.

One of these practical solutions has been to create a taxonomic equivalence engine.

What is taxonomy and why is this important?

Taxonomy is the science of naming, describing and classifying the living world around us. But it’s not just about naming organisms, it’s about ordering and grouping them together based on shared characteristics. Taxonomy helps us in many important ways, from disease prevention, conservation and ecology to food security and pest management.

To be able to protect any environment, we need to know what is there to begin with.

A big challenge in wildlife conservation is getting an accurate representation of what species live where. Traditional survey methods focus only on the comparatively small number of species we can see, but with environmental DNA (eDNA), we’re able to zoom into the microscopic world and detect thousands more species than previously possible. eDNA helps to bridge the gap between the seen and the unseen, enabling researchers to learn even more about the impacts we’re having on biodiversity (and what to do about it).

However, eDNA technology brings its own challenges, like discrepancies between the species names used in different taxonomies.

For example, as part of the Urban Nature Project, soil samples were collected and processed in our in-house labs. The DNA detected from this process used species names from a global database, but since the Urban Nature Project is studying UK biodiversity, the results needed to be mapped back to UK versions. We call this taxonomic equivalence (or species equivalence).

Introducing the taxonomic equivalence engine: a faster and more accurate way to match species names

The process of manually mapping species can take our scientists a number of days depending on the volume of data, so we set about looking for a way to speed up the process and give our scientists more time to spend on analysing the results.

Our solution is the taxonomic equivalence engine: an algorithm which does this name matching between global and UK taxonomies automatically. This seemingly simple idea uses innovative technology to solve a complex problem.

It uses graph database technology, which stores data as nodes and the relationships between them rather than as the tables found in conventional databases or spreadsheets (like Excel), allowing us to perform the name matching much faster than before. It’s an innovative solution which sits within the wider Data Ecosystem – our new biodiversity monitoring platform for researchers, which aggregates various types of biodiversity and environmental data together.
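To give a feel for the idea, here is a minimal sketch of name matching over a graph, written in plain Python rather than a real graph database. It is not the Museum's implementation: the function names, the synonym links and the species names (all invented placeholders, not real taxa) are assumptions made purely for illustration. Each name is a node, each known synonym link is a relationship, and a breadth-first search walks from a global name to the nearest name on the UK list.

```python
from collections import deque


def build_graph(synonym_pairs):
    """Build an undirected adjacency map from (name_a, name_b) synonym pairs."""
    graph = {}
    for a, b in synonym_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def find_uk_equivalent(graph, uk_names, global_name):
    """Breadth-first search from global_name to the nearest name in uk_names.

    Returns the matching UK name, or None if no linked name is on the UK list.
    """
    if global_name in uk_names:
        return global_name
    seen = {global_name}
    queue = deque([global_name])
    while queue:
        current = queue.popleft()
        # Sort neighbours so the result is deterministic.
        for neighbour in sorted(graph.get(current, ())):
            if neighbour in seen:
                continue
            if neighbour in uk_names:
                return neighbour
            seen.add(neighbour)
            queue.append(neighbour)
    return None


# Hypothetical data: invented placeholder names, not real taxa.
SYNONYMS = [
    ("Exampleus globalis", "Exampleus intermedius"),
    ("Exampleus intermedius", "Exampleus britannicus"),
]
UK_NAMES = {"Exampleus britannicus", "Otherus localis"}

graph = build_graph(SYNONYMS)
result = find_uk_equivalent(graph, UK_NAMES, "Exampleus globalis")
print(result)
```

In this toy example the global name is two links away from a name on the UK list, so the search follows the chain of relationships to reach it. A name with no path to the UK list returns None, which a real workflow might flag for a scientist to review by hand.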

Built in partnership with Amazon Web Services (AWS), both the Data Ecosystem and taxonomic equivalence engine are helping accelerate the Museum’s UK biodiversity research.

For example, performing taxonomic equivalence manually on twenty samples used to take researchers around a day, and human errors would naturally creep in. Now it takes less than five minutes, removes the risk of human error, and provides a much more consistent and accurate approach to taxonomic equivalence.

For now, the taxonomic equivalence engine is only being used on eDNA data from the Urban Nature Project, but there is huge potential for wider impact in the future. Through the Data Ecosystem, we will centralise UK biodiversity data across the Museum, and by combining large volumes of eDNA data with audio and traditional biological records, we hope to develop brand new methodologies for tracking and monitoring biodiversity. This will ultimately help us to create and share practical, science-based conservation solutions to support UK nature recovery.

For more about the Museum’s Urban Nature Project plans, visit: https://www.nhm.ac.uk/about-us/urban-nature-project.html