Instant Ideas and Collaboration
Thank you to Jez Cope for this month’s feature article. Join us for the chat on Research Data Management on Tuesday 7 October, 18:30-20:30 BST.
The Research Data Management agenda is now available, so you can add your questions for the chat.
Research Data Management
Research data management (RDM) has recently become a hot topic in higher education, especially with the approach of the April 2015 deadline for compliance with EPSRC’s expectations for RDM in HEIs. But what is RDM, and how does it affect libraries?
Why manage research data?
First off, it saves time and reduces stress. Well-managed data is difficult to lose: it’s stored on resilient systems, so it can’t be destroyed by a simple hardware failure or other accident. It’s well organised, and curated so that collections don’t fill up with garbage. It’s well described so that it can be found and understood when it’s needed: research papers are often written up long after the initial experiments took place. All of this takes some time up front, but it saves time later and can massively reduce stress.
Well-managed data also supports research integrity, by enabling researchers to defend themselves against accusations of misconduct. “Climate-gate” is one example where better data management practices could have defused a situation early; instead, it hit the media and spiralled out of control.
Well-managed data is reusable. This means that time and money that might have been spent repeating experiments in the lab to arrive at the same data can instead be spent obtaining new insights from that data or generating new data; funders are very keen on this!
Reusable data is in turn shareable, which brings its own benefits. You can even combine multiple datasets in new ways to gain insights that the original data creators never considered possible.
Data management vs data sharing
Just a quick aside: I’m a big fan of Open Science, but it’s important to distinguish between data management and data sharing. There are always going to be times when data just cannot be shared: it may be sensitive patient data, or relate to the confidential intellectual property of a commercial partner.
Managing research data well is essential to support sharing, but it’s still important even when the data won’t be shared, and can provide benefits specific to non-shared data. Better guarantees about confidentiality reduce risk and reassure participants. Managing data well can also aid communication and collaboration, leading to much better outcomes for partners who are keen to get their money’s worth.
Even in cases where data could (and maybe should) be shared, the researchers involved may not be comfortable with the idea, so I generally prefer to keep the concepts distinct and convince people to take care of their data first, then introduce sharing as and when appropriate.
In addition to all the above benefits, all of the Research Councils UK (RCUK – the British government funding agencies) and several charities have policies mandating research data management. These vary widely in focus and level of detail, but are all broadly aligned around the RCUK Common Principles on Data Policy. The Digital Curation Centre has a good summary of funder research data policies if you want to learn more. Increasing numbers of publishers (such as PLOS) are also requiring that data underlying publications be shared in some form.
What does RDM look like?
I’ve given some hints already, but let’s take a quick look at some of what’s involved in research data management.
Information security is often thought of as dealing only with privacy, perhaps because this is the focus of legal obligations such as the Data Protection Act and non-disclosure agreements. In fact, there are three interrelated aspects:
Protection from unauthorised access
Protection from corruption or alteration, whether malicious or accidental
Protection from loss of access, whether temporary (e.g. a power failure) or permanent
Most research institutions already have a variety of services provided by their central IT services which meet the information security needs of researchers, but these are not always well communicated and can be difficult to bring together in the right combination for a particular project.
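As a rough illustration of the second aspect above (protection from corruption or alteration), one common technique is to record a checksum for every file and re-check it later. The sketch below is only an example under stated assumptions: the function names and the idea of holding the manifest as a simple dictionary are hypothetical choices made for illustration, not a description of any particular institutional service.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file under data_dir (by relative path) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are now missing or have different contents."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A researcher might call build_manifest once when the data is collected, keep the result somewhere safe, and run changed_files before analysis or archiving. An empty list suggests nothing recorded in the manifest has been altered or lost.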
Organisation, description and metadata
Keeping data safe is all very well, but it’s still no use if you can’t find it, or can’t understand it once you’ve found it. Many researchers have complex and carefully thought out structures that simply need documenting, while some just have an impenetrable mess of confusingly named files.
Data must also be described with appropriate metadata. This includes basic information to aid discovery, but also more detailed information that is needed to interpret the data properly, including links to methodologies or experimental protocols.
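To make this a little more concrete, here is a minimal sketch of a metadata record and a check for missing fields. The field names and placeholder values are hypothetical, chosen only to mirror the two kinds of information mentioned above (discovery and interpretation); a real project would more likely follow a metadata standard recommended by its discipline or archive.

```python
import json

# Hypothetical field names for illustration only; not a formal metadata standard.
REQUIRED_FIELDS = (
    "title",             # basic information to aid discovery
    "creator",
    "date_collected",
    "description",
    "methodology_link",  # link to the methodology or experimental protocol
)


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in the record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Placeholder values, not real data.
record = {
    "title": "Example dataset title",
    "creator": "A. Researcher",
    "date_collected": "2014-06-01",
    "description": "Placeholder description of what the files contain.",
    "methodology_link": "",
}

print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
```

Even a simple check like this can prompt researchers to fill in the details that a future reader will need, while the project is still fresh in their minds.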
Once a project is complete it’s time to archive the important data for future reference and possible sharing. The project data must be appraised and selected for archival, which could be with an external service, such as the UK Data Archive or figshare, or with an institutional archive. A metadata record describing the data can then be made publicly available, indicating how the data itself can be obtained.
In some cases, archival will go together with publishing the data. Many researchers are uncomfortable about giving up control over who has access (and as I’ve mentioned, public access may not be suitable for everything) but if the dataset turns out to be popular it may be more convenient in the long run to allow reusers to obtain it directly from the archive.
Data management planning
Most research funders now require a data management plan (DMP) to be submitted with every grant application. The expected content of a DMP varies, but it will generally set out the researchers’ approach to the areas described above.
Who is responsible for RDM?
This is still up for debate: clearly there are aspects that sit naturally with IT services, research support offices and the library, while the researchers themselves have some responsibility for their own data. But what do you think?
About the author:
Jez Cope is the Research Data Support Manager at Imperial College London Library. Jez is responsible for delivering the Library’s pilot research data management service. He works with research staff and students to help them look after and do more with their data while fulfilling the expectations of their funders, their publishers and the College.