Data management is a broad term. Here, Samantha Luciano, a second-year student of the MSc Biobanks & Complex Data Management of the Côte d’Azur University in France, talks about what it means in the context of a biodiversity collection. What is data? What is it used for? In what form(s) is it found? What do we do with it?
What are we talking about when we say data management?
The term data management refers to the whole process of ingesting, storing, organising, cleaning, tracking, tracing and maintaining the data that is created or collected. The concept encompasses a broad combination of functions aimed at making data accurate, available, discoverable and accessible.
Good data management also eliminates duplication of data and standardises its format. This matters because data may come from different sources, and therefore be of different types, and may not be collected in the same way by each system or user.
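As a rough illustration of what standardising a format and removing duplicates can look like in practice, here is a minimal Python sketch. The field names (`sample_id`, `species`, `collected`), the two date formats and the example records are assumptions made up for this illustration; they are not taken from any actual Museum dataset or system.

```python
from datetime import datetime

# Date formats we assume the different sources might use (hypothetical).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalise_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalise_record(record):
    """Bring one record into a single agreed format."""
    return {
        "sample_id": record["sample_id"].strip().upper(),
        # Collapse repeated spaces and capitalise the genus only.
        "species": " ".join(record["species"].split()).capitalize(),
        "collected": normalise_date(record["collected"]),
    }


def deduplicate(records):
    """Normalise every record, then keep the first copy of each one."""
    seen = set()
    unique = []
    for record in map(normalise_record, records):
        key = (record["sample_id"], record["species"], record["collected"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Illustrative records only: the same animal entered twice in different styles.
raw_records = [
    {"sample_id": "m5346 ", "species": "sciurus  vulgaris", "collected": "03/02/2001"},
    {"sample_id": "M5346", "species": "Sciurus vulgaris", "collected": "2001-02-03"},
    {"sample_id": "M5347", "species": "Lutra lutra", "collected": "2002-03-15"},
]
clean_records = deduplicate(raw_records)
```

The point of normalising before comparing is that the first two records only turn out to be duplicates once their identifiers, names and dates are written the same way.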
In a collection of biodiversity samples such as the Wildlife Veterinary Investigation Centre (WVIC) collection, managed by the Museum’s Molecular Collection Facility, the value of a physical sample is considerably increased when accompanied by data. Indeed, what value would a sample have if we knew nothing about its origin, nature, age or quality? It would be totally unusable or would require time and various experiments to learn more.
But would scientists want to risk wasting time and money on laboratory work if the sample is not worth it? Wouldn’t they prefer to get a good quality, trusted sample and associated data directly for their research?
At the heart of the Wildlife Veterinary Investigation Centre (WVIC) collection
The WVIC collection is the result of more than 20 years of work and a lot of data was gathered in that time! The naturalist behind the collection, Vic Simpson, was not only very rigorous with his practical experiments, but also collected masses of data in large Excel spreadsheets. This data included taxonomy, geographical origin, sex and age of the animals, as well as measurements (weight of organs, length of bodies or parts, etc.), observations and health status.
Now that the collection has been bequeathed to the NHM, this data considerably increases the value of the samples. However, adapting it to the Museum’s databases has required meticulous work to reformat, sort and organise the information.
The various samples (called ‘preparations’) are associated with an individual (called ‘parent/voucher specimen’) and its data. All samples are counted and inventoried, as several preparations can come from the same parent (e.g. one piece of liver and two of kidney from the same red squirrel M5346, i.e. three preparations from one parent). All manual data was checked (geographic information, measurements, comments etc.), cross-referenced using different functions such as VLOOKUP, formatted, and reorganised (sometimes using macros) to best suit the organisation of the Museum’s current collection management system, EMu. A total of 717 preparations, 367 parents and 37 taxonomy records will be created to accurately represent the tubes already stored in the Molecular Collection Facility (MCF). The rest of the samples to process (tissues in bags and other tube formats) will follow suit in due course.
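The cross-referencing step described above was done in Excel with VLOOKUP. For readers who prefer code, the same idea can be sketched in Python as a simple lookup of each preparation's parent, followed by a count of preparations per parent. This is only an illustration under stated assumptions: the field names, the preparation identifiers (`P1` to `P4`) and the unmatched parent `M9999` are hypothetical, and nothing here reflects the actual structure of the spreadsheets or of EMu.

```python
from collections import Counter

# Parent (voucher specimen) records keyed by identifier.
# M5346 is the red squirrel from the example above; the layout is hypothetical.
parents = {
    "M5346": {"taxon": "Sciurus vulgaris"},
}

# Preparation records, each pointing at its parent (illustrative values only).
preparations = [
    {"prep_id": "P1", "parent_id": "M5346", "tissue": "liver"},
    {"prep_id": "P2", "parent_id": "M5346", "tissue": "kidney"},
    {"prep_id": "P3", "parent_id": "M5346", "tissue": "kidney"},
    {"prep_id": "P4", "parent_id": "M9999", "tissue": "liver"},
]


def link_preparations(preparations, parents):
    """Attach parent data to each preparation, VLOOKUP-style.

    Preparations whose parent cannot be found are returned separately
    so that they can be checked by hand rather than silently dropped.
    """
    linked, unmatched = [], []
    for prep in preparations:
        parent = parents.get(prep["parent_id"])
        if parent is None:
            unmatched.append(prep)
        else:
            linked.append({**prep, **parent})
    return linked, unmatched


linked, unmatched = link_preparations(preparations, parents)

# How many preparations come from each parent?
preps_per_parent = Counter(prep["parent_id"] for prep in linked)
```

Keeping the unmatched preparations in their own list mirrors the manual checking described above: a failed lookup usually signals a typing error or a missing parent record that a person needs to resolve.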
The Museum also currently uses a web-based sample inventory management system called FreezerPro®, which helps with the tracking and tracing of samples in and out of freezers. The two systems are complementary, with EMu storing vast quantities of metadata for samples (e.g. taxonomy and prerequisite information for legal compliance) and FreezerPro® storing the corresponding locations.
Long-term issues with data management
The associated data represents an indispensable source of information for current and future research. It is essential to manage data in a sustainable way to ensure its future. As technology advances, servers, databases and other software are becoming more powerful and capable of storing ever-increasing amounts of complex data. That being said, people are still needed for the collection and curation (i.e. organising and reformatting) of this data. Training in data management is therefore crucial.
In addition to having an impact on research and sample quality, data also influences the finances of an organisation. Indeed, non-discoverable, inaccessible ‘orphan’ samples, i.e. samples without associated data, take up space in energy-intensive facilities (refrigerators, freezers and/or liquid nitrogen tanks). Keeping them for many years when ‘nobody wants them’ puts the financial sustainability of institutes’ biobanks at risk.
Biodiversity biobank managers (although this principle also applies to managers of human biobanks) need to manage the collections and decide which samples are financially justified to keep, anticipating and prioritising higher value samples which are more likely to be used and shared in the future. As funding in the field is limited, all resources are strictly controlled, and spending that brings no benefit in terms of sample usage must be avoided.