The Data Portal grows up | Digital Collections Programme

How the latest upgrades mean that the Portal finally comes out of beta

We have just completed a major upgrade that not only gives the Museum’s Data Portal a new look but also promises faster response times, optimised search capabilities, new methods to track digital collection use and a better ability to meet the needs of users in the future.


“It has been almost four years since we launched the Data Portal back in 2015 and in that time we have transformed access to the collections, creating an audience that is at least ten times greater than the number of scientists able to visit our physical collections.” Vince Smith, Head of Informatics.

Usually, software is kept in beta for a short period of time, as a transitional phase between “alpha” (when in-house testers or focus groups try out the software) and the official release. Neither of these qualities accurately describes the Museum’s Data Portal, which is why the tag has now been removed. More than 18 billion records have been downloaded from the Portal in over 250k download events since 2015. It contains over 4.48 million records from the specimen collection and a further 6 million records from other research datasets including 3D scans, images, video and audio recordings as well as other structured data in tables. More than 393 scientific publications have cited data from the Data Portal, either directly or through aggregators such as the Global Biodiversity Information Facility (GBIF), though there are many more citations that it is currently not possible to track.

Why use collections data?

Museum collections contain specimens collected over the last 200 years, including details of where and when they were collected. This is a critical time period, during which humans have had a major impact on the distribution of biodiversity, radically affecting landscapes through increased consumption of natural resources, pollution and climate change. The scientific research that our data has been used for reflects this significance, covering topics including agriculture, biodiversity, evolution, ecology, species distributions and human health.

Museum Researcher Dr Natalie Cooper led a recent study using collections data to look into sex biases across natural history collections. Over 250k Museum specimens from the Data Portal formed part of a larger dataset of 2 million birds and mammals across five major natural history collections. 

“The Museum collections we used date from 1850 onwards. These early records allowed us to investigate changes in sex biases through time that would not be possible without the detailed metadata available on the Portal.”  Dr Natalie Cooper

Bird specimens from the Museum’s collection.

On the surface, a bias towards males in museum collections might not seem like much of an issue. However, many aspects of a species’ biology and behaviour are affected by an animal’s biological sex.

“One of the most striking biases is seen when looking at type specimens. These are the individual specimens on which a species description is made, and so seen as a typical example of that particular species. Of these, only 25% of bird and 39% of mammal type specimens were female.” Dr Natalie Cooper

There are significant differences between males and females in their morphology, and any bias towards males could result in it being difficult to identify females down to species level because we don’t have a good record of their morphology. These biases can also affect other, sometimes surprising, things. For example, males can be more susceptible to parasites as testosterone inhibits the immune system. Where collections are sex-biased, research looking into infection and immunity within a species could be skewed.

“Many of the projects I work on use large amounts of data across a broad range of taxa. Being able to quickly and easily query the portal and get these large datasets has been indispensable for my work. The Museum’s collection is amazing but it has gaps like every other collection, so being able to combine our data easily with data from other institutions allows us to get a much better picture of what is going on for all kinds of questions and projects.”  Dr Natalie Cooper

New features transform the study of natural history

We’re looking to make it easier for users to use the research and data produced at the Museum. We have converted the site to use a more responsive framework which improves the user interface so it looks great on all platforms from phones to desktops. This upgrade gives us the opportunity to more rapidly develop and release new features as well as giving users a better experience.

We have also added an expanded search capability, with options to search across Specimens, Collections and Everything. Users can now search all of the Museum’s collections and research data at once; previously, only a single data resource could be queried at a time. Not only does this support simpler navigation and discovery of data, but it also increases the connectivity between datasets. The new search functionality can be found by using the main search bar on the home page.

The main search bar on the Data Portal home page

In addition to enabling searches across all resources simultaneously, we have improved the search functionality available on the site. A new, more intuitive user interface has been created to facilitate the cross-resource search. It provides new search possibilities, including more complex searches using “and” and “or” clauses, numerical searches and the ability to filter using ranges of values.

New user interface facilitates more complex searches
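
To give a feel for what combining “and” and “or” clauses with a numeric range means in practice, here is a minimal sketch in Python. It does not use the Portal’s real query format: the nested dictionary layout, the `matches` function and the sample records are all hypothetical and exist only to illustrate the logic.

```python
from typing import Any, Mapping

Record = Mapping[str, Any]


def matches(record: Record, query: Mapping[str, Any]) -> bool:
    """Return True if the record satisfies a nested and/or query.

    The query layout is a made-up illustration, not the Portal's format.
    """
    if "and" in query:
        # Every sub-clause must match.
        return all(matches(record, sub) for sub in query["and"])
    if "or" in query:
        # At least one sub-clause must match.
        return any(matches(record, sub) for sub in query["or"])
    if "equals" in query:
        field, value = query["equals"]
        return record.get(field) == value
    if "range" in query:
        # Inclusive numeric range; records missing the field never match.
        field, low, high = query["range"]
        value = record.get(field)
        return value is not None and low <= value <= high
    raise ValueError(f"Unknown clause: {query!r}")


# Hypothetical sample records, not real Portal data.
records = [
    {"genus": "Turdus", "year": 1872, "sex": "female"},
    {"genus": "Turdus", "year": 1931, "sex": "male"},
    {"genus": "Parus", "year": 1890, "sex": "male"},
]

# Collected 1850 to 1900 AND (genus is Turdus OR sex is male).
query = {
    "and": [
        {"range": ("year", 1850, 1900)},
        {"or": [{"equals": ("genus", "Turdus")}, {"equals": ("sex", "male")}]},
    ]
}

hits = [record for record in records if matches(record, query)]
```

With these sample records, the 1931 specimen is excluded by the year range, while the other two pass: one because of its genus, the other because of its sex.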

To better support our more tech-minded users and provide additional open access, we have created a Python library to allow easier programmatic interfacing with the Data Portal. Documentation of the application programming interface (API) is available on the Portal and has been updated with examples. This library is still in active development, but a pre-release version is available on PyPI.
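
For readers who would rather try the API directly, a request might look something like the sketch below. It uses only the Python standard library and not the Museum’s own library. It assumes the Portal exposes a CKAN-style `datastore_search` action at the URL shown, and the resource ID is a placeholder; both should be checked against the API documentation on the Portal. The helper names `build_search_url` and `fetch_records` are our own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a CKAN-style action endpoint. Check the Portal's API docs.
BASE_URL = "https://data.nhm.ac.uk/api/3/action/datastore_search"


def build_search_url(resource_id: str, text: str, limit: int = 10) -> str:
    """Build a full-text search URL for a single data resource."""
    params = {"resource_id": resource_id, "q": text, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_records(url: str) -> list:
    """Fetch the URL and return the records from a CKAN-style response."""
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # Assumed response shape: {"result": {"records": [...]}}.
    return payload["result"]["records"]


# "RESOURCE-ID-PLACEHOLDER" is not a real ID; substitute one from the Portal.
url = build_search_url("RESOURCE-ID-PLACEHOLDER", "Turdus merula", limit=5)
```

Calling `fetch_records(url)` with a real resource ID would then return a list of record dictionaries, provided the endpoint and response shape match the assumptions above.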

In April this year, we released an update that allows fully persistent, dynamic and repeatable data retrieval for search results using Digital Object Identifiers (DOIs). In the seven months since its release, we have minted over 1100 DOIs using the system, which together reference over 200 million records. This is important because it allows scientists to check and re-use others’ data in a more repeatable way. DOIs also make it easier to track both the usage and impact of research and collections data.

A word cloud of the query and filter terms used in the 1100+ DOIs created since April 2019

Looking forward, we have a public roadmap full of features which we will be working on in the coming months. Highlights include:

  • Better tracking of citations and new integrations with data on the Data Portal so that users can easily navigate to literature related to the specimens they are viewing
  • Integration with the ORCID system for researcher identification
  • Improved API and usage documentation with more programmatic integration libraries (an R library!)
  • Support for the IIIF (International Image Interoperability Framework) standards and associated image viewers, which would allow enhanced access to the images of our specimens
  • Increased use of links from specimens to other resources such as genetic analysis data and 3D resources so that users can, again, find related datasets to the specimens they are viewing

The changes that the Data Portal team have made pave the way for future evolution of the Data Portal and unlock further improvements. Find out more about the Museum’s data on the website or explore the Data Portal at data.nhm.ac.uk. If you would like to get in touch, email data@nhm.ac.uk.
