Researchers generate ever-increasing amounts of data in the course of their work, and they need to find new ways of managing this data properly. The need is made more pressing by the research councils and other funders in the UK, who increasingly require bids to state how the data produced by the funded research will be managed. Last week I attended the second day of the ‘roadshow’ organised by the Digital Curation Centre (DCC) in London, where Sarah Jones gave a presentation on funders’ data policies. The DCC has an excellent concise overview of funders’ requirements, which shows that nearly all funders now set policy stipulations on data management planning and sharing. Sarah showed that the aspects nearly all funders consider important are timely release of data, open data sharing wherever possible, and provision for longer-term preservation of the data.
Much of this burden will fall on institutions, which need a clear data management policy as well as the tools to support every aspect of data management. The DCC is helping institutions with this through the excellent resources on its website, tools such as DMP Online that help them create data management plans, and events such as the roadshow, where experiences can be shared and best practice disseminated.
On a more technical level, a number of open source tools are available that allow departments and institutions to manage research data. HEFCE and JISC have funded a number of projects that release their software as open source. A few examples are:
- DataStage is a secure, personalised ‘local’ file management environment for use at the research group level.
- DataBank is a scalable data repository designed for institutional deployment.
- VIDaaS (Virtual Infrastructure with Database as a Service) is a project of two halves. The ‘DaaS’ part will develop software that enables people to build, edit, search, and share databases online; the ‘VI’ part involves the development of an infrastructure enabling the DaaS to function within a cloud computing environment.
- BRISSkit will design a national shared service brokered by JANET to host, implement and deploy biomedical research database applications that support the management and integration of tissue samples with clinical data and electronic patient records.
This gives institutions and research departments an excellent opportunity to start trialling these tools without making large investments. And if a tool fits your use case, it is easy to get involved with its community and benefit from the opportunities that the open development approach offers. OSS Watch is here to help!