Publishing research data
Philippa Frame and Stephanie Jacobs
Getting started
Why publish research data?
Publishing data and citing its location in published works allows others to replicate and validate results and check their accuracy. Sharing data improves the scientific record and increases scientific integrity. The Australian Code for the Responsible Conduct of Research (2018) advises that researchers should share their data wherever possible and appropriate.
To support best practice in sharing data, QUT has adopted the F.A.I.R data principles to make research data findable, accessible, interoperable and reusable. The benefits of sharing data publicly include:
- increase scientific integrity (replicate, validate and correct results)
- satisfy funding body and publisher requirements
- preserve data for future use (reducing duplication of effort)
- increase impact and engagement (associated with citation increases of up to 69%)
- defend the validity of results, including those produced with new research methodologies
- support applications for promotion, tenure and grants
- facilitate collaboration and networking with industry partners.
Care should be exercised when sharing data that is sensitive, confidential, or subject to privacy legislation. It may be possible to share such data through mediated access arrangements; however, relevant legislation, the conditions of any commercial agreements and ethics approvals must be adhered to.
Exercise: Watch a video on publishing research data.
‘Rethinking Research Data’ | Kristin Briney | TEDxUWMilwaukee
‘Have you published your research data?’ (MediaHub video, 8min) (QUT login required).
Registries vs repositories
When it comes to publishing data, researchers may use repositories or registries, among other methods. A data repository stores data along with its metadata, whereas a data registry acts as a catalogue or index of research data and stores only metadata.
An example of a research data registry is the Australian Research Data Commons’ Research Data Australia, a national registry of research datasets that harvests only metadata from institutional data repositories, including QUT’s.
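The difference can be sketched as a toy Python model, purely for illustration: a repository keeps each dataset's files together with its metadata, while a registry harvests only the metadata and records where the data actually lives. The class names, fields and the placeholder identifier `10.5072/example-1` are all hypothetical; real registries such as Research Data Australia harvest records through standard protocols and metadata formats rather than reading another system's objects directly.

```python
from dataclasses import dataclass, field

# Hypothetical toy model: not how any real repository or registry is implemented.


@dataclass
class Record:
    """Descriptive metadata for one dataset (illustrative minimal fields)."""
    identifier: str
    title: str
    creator: str


@dataclass
class Repository:
    """Stores each dataset's files together with its metadata."""
    name: str
    holdings: dict = field(default_factory=dict)  # identifier -> (Record, data bytes)

    def deposit(self, record: Record, data: bytes) -> None:
        self.holdings[record.identifier] = (record, data)


@dataclass
class Registry:
    """Catalogue that keeps metadata only, plus a pointer to the source repository."""
    name: str
    catalogue: dict = field(default_factory=dict)  # identifier -> (Record, repository name)

    def harvest(self, repo: Repository) -> None:
        # Copy the metadata and its origin; the data itself stays in the repository.
        for identifier, (record, _data) in repo.holdings.items():
            self.catalogue[identifier] = (record, repo.name)


repo = Repository("Institutional repository")
repo.deposit(Record("10.5072/example-1", "Reef temperature logs", "Smith, J."), b"time,temp\n")
registry = Registry("National registry")
registry.harvest(repo)
print(registry.catalogue["10.5072/example-1"][1])  # Institutional repository
```

A researcher who finds the record in the registry is pointed back to the repository to obtain the files.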
Institutional and discipline-specific repositories
QUT’s institutional data repository
QUT’s Research Data Finder is a digital repository for research data created or collected by QUT researchers. Researchers can publish data, code, software, spatial images and more.
Features include:
- DOIs and data citations generated for datasets
- Links to related research projects and people
- Creative Commons licence selection
- Machine- and human-readable records
- Meets funder and publisher requirements for publishing open access datasets.
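Because each dataset is given a DOI, a data citation can be assembled from a few core metadata fields. The Python function below is a minimal sketch loosely modelled on the common creator, year, title, publisher and identifier pattern; the function name, the field order and the placeholder DOI are assumptions, and the citation Research Data Finder generates may be formatted differently.

```python
def format_data_citation(creators, year, title, publisher, doi, version=None):
    """Hypothetical helper: Creators (Year). Title. [Version X.] Publisher. DOI URL."""
    parts = [f"{'; '.join(creators)} ({year}). {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    # Expressing the DOI as a resolver URL makes the citation clickable.
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


citation = format_data_citation(
    creators=["Smith, Jane", "Nguyen, Lee"],
    year=2021,
    title="Reef temperature logs 2019-2020",
    publisher="Queensland University of Technology",
    doi="10.5072/example-1",  # placeholder DOI, not a real dataset
)
print(citation)
# Smith, Jane; Nguyen, Lee (2021). Reef temperature logs 2019-2020. Queensland University of Technology. https://doi.org/10.5072/example-1
```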
Researchers can specify the level of access to their datasets – open (publicly available for access, use, re-use and redistribution) or mediated (access by others requires approval by the data owner).
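The two access levels can be expressed as a simple rule. This is a hypothetical sketch of the decision only, with invented names (`AccessLevel`, `can_download`) and an invented approval list; it does not describe how Research Data Finder implements mediated access.

```python
from enum import Enum


class AccessLevel(Enum):
    OPEN = "open"          # publicly available for access, use, re-use and redistribution
    MEDIATED = "mediated"  # access by others requires approval by the data owner


def can_download(level: AccessLevel, requester: str, approved_by_owner: set) -> bool:
    """Hypothetical check: may this requester access the dataset files?"""
    if level is AccessLevel.OPEN:
        return True
    # Mediated: only requesters the data owner has explicitly approved.
    return requester in approved_by_owner


approved = {"collaborator@example.edu"}
print(can_download(AccessLevel.OPEN, "anyone@example.org", approved))            # True
print(can_download(AccessLevel.MEDIATED, "anyone@example.org", approved))        # False
print(can_download(AccessLevel.MEDIATED, "collaborator@example.edu", approved))  # True
```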
Metadata and data stored in Research Data Finder are exposed to a wider research audience through selected external harvesters, including the Australian Research Data Commons (ARDC) via its national portal, Research Data Australia. Records are also well indexed by the major search engines.
Discipline-specific and multidisciplinary repositories
Some repositories are specifically designed for the publishing of datasets to accompany journals that have data policies (such as the PLOS data policy). These repositories make the data underlying scholarly publications F.A.I.R, and include:
- figshare – A repository where users can make all of their research outputs available in a citeable, shareable and discoverable manner.
- Dryad – An international disciplinary repository of open data underlying scientific and medical publications.
There are other types of data repositories suited to different purposes, including:
Discipline-specific:
- DataONE
- GenBank
- PANGAEA
- Scientific Data (Nature’s data journal)
- Australian Data Archive (social sciences)
- TERN (terrestrial ecosystem data)
- IEEE DataPort (electrical and electronic engineering)
Source code:
To find a discipline-specific research data repository, search the Directory of Open Access Repositories (OpenDOAR) and the Registry of Research Data Repositories.
Exercise: Find a repository or registry. What are some of the features that make it an appropriate tool for publishing research data?
Learn more
Documentation and metadata
Data documentation should provide contextual information for the data so that it can be understood in the future. Documentation requirements will vary depending on the discipline and type of research being conducted.
Documentation should include:
- project aims and objectives (to provide context)
- catalogue of data collected
- description of lifecycle of key data elements (procedures for collection/creation, validation, transformation, processing, analysis, publication, archiving/destruction)
- description of instruments, calibrations etc.
- description of how data is structured (data model, coding schemes, controlled vocabularies etc.)
- details of any quality control processes
- confidentiality agreements and consent forms
- manuals, code books, procedure documents.
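Code books and coding schemes are most useful when they are also kept in a machine-readable form, so the data can be checked against them before it is published. A minimal Python sketch, assuming a hypothetical code book for two invented variables:

```python
# Hypothetical code book: variable name -> (description, allowed codes and their meanings)
CODEBOOK = {
    "site": ("Sampling site", {"A": "Inshore reef", "B": "Offshore reef"}),
    "method": ("Collection method", {1: "Logger", 2: "Manual probe"}),
}


def check_against_codebook(rows, codebook):
    """Return (row_index, variable, value) for values missing or not defined in the code book."""
    problems = []
    for i, row in enumerate(rows):
        for variable, (_description, codes) in codebook.items():
            value = row.get(variable)  # None if the variable is absent from the row
            if value not in codes:
                problems.append((i, variable, value))
    return problems


rows = [
    {"site": "A", "method": 1},
    {"site": "C", "method": 2},  # "C" is not a documented site code
]
print(check_against_codebook(rows, CODEBOOK))  # [(1, 'site', 'C')]
```

Keeping the code book in one place means the same definitions serve both as documentation for human readers and as a check on the data.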
Metadata (data about data) is standardised information about a resource, presented in a structured format that is machine-readable and human-readable.
Metadata can describe individual items or groups of items (individual files, images or datasets etc.). The items described by the metadata may be physical or digital. For example, a library catalogue includes metadata about books held in the library plus the electronic journals to which the library subscribes. The metadata helps the library to manage its resources and assists users in the discovery and use of those resources. Likewise, metadata helps researchers to manage and re-use data after its creation.
Ideally, as much metadata as possible should be gathered at the beginning of a research project, and processes should be put in place to keep collecting metadata (automatically if possible) throughout the life of the project.
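Some technical metadata can be captured automatically rather than typed in by hand. The Python sketch below walks a project folder and records each file's name, size, last-modified time and SHA-256 checksum; the choice of fields and the function names are illustrative assumptions, and descriptive metadata (titles, methods, context) still has to come from the researcher.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_metadata(path: Path) -> dict:
    """Collect technical metadata that can be captured automatically for one file."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large data files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    stat = path.stat()
    return {
        "filename": path.name,
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "sha256": digest.hexdigest(),  # lets later users confirm the file is unchanged
    }


def inventory(folder: Path) -> list:
    """Build a metadata entry for every file under a project folder."""
    return [file_metadata(p) for p in sorted(folder.rglob("*")) if p.is_file()]


# Usage: a throwaway folder with one small (invented) data file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "temperatures.csv").write_bytes(b"time,temp\n0,24.1\n")
    entries = inventory(Path(tmp))

for entry in entries:
    print(entry["filename"], entry["size_bytes"], entry["sha256"][:12])
```

Run regularly (for example at the end of each collection session), an inventory like this builds up part of the catalogue of data collected without extra manual effort.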
A ‘metadata schema’ defines a set of terms used to describe a resource, together with rules for how those terms are applied and the syntax or language in which they are expressed (e.g. XML). Wherever possible, metadata should be created using an existing schema, which improves interoperability and makes data easier to share.
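As an illustration of using an existing schema, the sketch below writes a minimal record with elements from Dublin Core, a widely used general-purpose metadata schema, serialised as XML with Python's standard library. The `record` wrapper element, the function name and the field values are assumptions for illustration; Research Data Finder and discipline-specific standards may use different or richer schemas.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace; the "dc:" prefix is the conventional one.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Serialise a dict of Dublin Core element names -> value (str or list of str) as XML."""
    root = ET.Element("record")  # hypothetical wrapper element, not part of Dublin Core
    for element, values in fields.items():
        if isinstance(values, str):
            values = [values]
        for value in values:  # repeatable elements (e.g. several creators) become siblings
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_record({
    "title": "Reef temperature logs 2019-2020",
    "creator": ["Smith, Jane", "Nguyen, Lee"],
    "date": "2021",
    "type": "Dataset",
    "identifier": "https://doi.org/10.5072/example-1",  # placeholder DOI
    "rights": "CC BY 4.0",
})
print(xml_text)
```

Because every element comes from a published schema, another system that understands Dublin Core can interpret the record without knowing anything about the project that produced it, while the element names remain readable to people.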
In general, the types of metadata collected will consist of:
In many disciplines, there are commonly used standards for describing and sharing data within the discipline.
Challenge me
Exercise: Use the Research Data Finder Quick Guide to explore how to deposit data into Research Data Finder.
Attribution
Content in this chapter has been developed by QUT Library.
All information correct at time of publication, 11 October 2021.
Image credits
Royalty-free images used on this page were sourced from unsplash.com and pixabay.com.
Icons created by priyanka, Dinosoft Lab and Wichai Wi from Noun Project.