Cranfield Libraries: Research data management: Guidance on planning your data storage

Guidance on planning your data storage

When planning a research project, it is important that you consider how you will store and curate your data. In some cases, this will be dictated by the software and equipment you are using or the conventions of your discipline. In other cases, you may have to make a choice between several options.

These are likely to be some of the key factors in your decision-making:

What software and formats you or your colleagues have used in past projects,
Any discipline-specific norms (and any peer support that comes with them),
What software is compatible with hardware you already have,
Whether you have funding for new software,
How you plan to analyse, sort, or store your data,
Volume and size of data (consider timescale, number of data points or observations, and sampling frequency).

You can use a multi-disciplinary metadata standards directory to help you: Metadata Standards Catalog (bath.ac.uk)

But you should also consider:

What formats will enable sharing and reuse with colleagues and other researchers for future projects,
What formats are at risk of obsolescence, because of new versions or their dependence on particular software,
What formats will be the easiest to annotate with metadata so that you and others can interpret them days, months, or years in the future.

In some cases, it might be best to use one format for data collection and analysis and convert or export your data to another format for archiving once your project is complete.

Best formats for preservation

If you are not aware of any disciplinary standards, these are some good file formats for the preservation of the most common data types:

Textual data: XML, TXT, HTML, PDF/A (Archival PDF), ODT, RTF,
Tabular data (including spreadsheets): CSV,
Databases: XML, CSV,
Images: TIFF, PNG, JPEG (note: JPEG is a 'lossy' format that loses information each time a file is re-saved, so only use it if you are not concerned about image quality),
Audio: FLAC, WAV, MP3,
Video: MP4, OGG, MJ2,
Transcripts of interviews: deposit anonymised transcripts rather than video or audio files, both to comply with GDPR and research ethics and to protect your participants.

The following sites provide further information on recommended formats for data sharing, reuse, and preservation:

UK Data Service recommended file formats

Library of Congress recommended formats statement (digital and non-digital formats)

Best practice is to deposit both the original file and an open format.
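As an illustration of exporting tabular data to an open format, the sketch below writes a small table to CSV using Python's standard `csv` module. The function name, column names, and values are hypothetical and only stand in for your own data; exporting from a spreadsheet or statistics package will usually be done through that software's own "Save as" or export option.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Return a table as CSV text, with one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)   # column names first
    writer.writerows(rows)    # then one line per observation
    return buffer.getvalue()


# Hypothetical example data, for illustration only
header = ["sample_id", "date", "temperature_c"]
rows = [["S001", "2024-05-09", 21.4], ["S002", "2024-05-10", 19.8]]

csv_text = rows_to_csv(header, rows)
print(csv_text)
```

The resulting text can be saved with a `.csv` extension and deposited alongside the original file.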

How much data are you going to create?

This may seem impossible to estimate at the beginning of the research cycle, but the ready reckoner below can help:

NB: 1GB is taken to equal 1024MB, and all of these examples are estimates.

1GB = 1 relational database (assuming average database file size of 800MB)
1GB = 56 SPSS files (assuming average SPSS file size of 18MB)
1GB = 60 Word files + 50 PDF files + 80 Excel spreadsheets (assuming average Word file size of 350KB, PDF file size of 4MB and spreadsheet size of 10MB)
1GB = 340 digital images (assuming average image file size of 3MB)
1GB = 252 MP3 audio files + 252 plain text files + 1 Excel spreadsheet (assuming average MP3 file size of 4MB, plain text file size of 40KB and spreadsheet size of 2MB)
1GB = 3,000 XML-encoded digital texts (assuming average XML file size of 330KB)
Video files: The size of these files is dependent on a number of factors including resolution and frame rate.
For instance, 1 minute of a 1080p 8-bit RGB uncompressed video is approx. 8GB.
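The video figure can be checked with a short calculation. The sketch below is an approximation that assumes 1920 x 1080 pixels, 3 bytes per pixel (8-bit RGB), and 24 frames per second; the frame rate is our assumption, as the example above does not state one, and a higher frame rate gives a proportionally larger file.

```python
def uncompressed_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Size of uncompressed video: every pixel of every frame is stored."""
    return width * height * bytes_per_pixel * fps * seconds


# 1 minute of 1080p 8-bit RGB video at an assumed 24 frames per second
size_bytes = uncompressed_video_bytes(1920, 1080, 3, 24, 60)
size_gb = size_bytes / 1024**3  # using 1GB = 1024MB, as in the reckoner

print(f"{size_gb:.2f} GB")
```

Under these assumptions the result is a little over 8GB, in line with the estimate above.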

Also think about the volume and types of data created by similar projects, factoring in changes to timescales, to the number and type of data points or observations, and to equipment or software.
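The ready reckoner can also be turned into a rough estimator for a whole project. The sketch below reuses the average file sizes quoted above; the function name and the planned file counts are hypothetical, and spreadsheets are left out because the reckoner quotes two different average sizes for them. Replace the averages with figures from your own or similar projects where you have them.

```python
KB_PER_MB = 1024  # assumed, to convert the KB-sized averages
MB_PER_GB = 1024  # the reckoner takes 1GB as 1024MB

# Average file sizes in MB, taken from the ready reckoner
AVERAGE_SIZE_MB = {
    "spss": 18,
    "word": 350 / KB_PER_MB,
    "pdf": 4,
    "image": 3,
    "mp3": 4,
    "plain_text": 40 / KB_PER_MB,
    "xml": 330 / KB_PER_MB,
}


def estimate_gb(file_counts, average_size_mb=AVERAGE_SIZE_MB):
    """Estimate storage in GB from the number of files of each type."""
    total_mb = sum(
        average_size_mb[kind] * count for kind, count in file_counts.items()
    )
    return total_mb / MB_PER_GB


# Hypothetical project: 2,000 images, 100 PDFs and 10 SPSS files
planned = {"image": 2000, "pdf": 100, "spss": 10}
estimate = estimate_gb(planned)
print(f"Estimated storage: {estimate:.1f} GB")
```

An estimate like this is only a starting point; it is worth adding a margin for intermediate files and multiple versions.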

We would advise against using data compression as this may impact the integrity of your data.

Creating your data files and folders

When you save your data files, make sure that your filenames are consistent and meaningful to futureproof your research and prevent potential data loss. Future users of your data also need to be able to identify its contents easily, with the help of a top-level README file that explains them. Please see our guidance on the contents and structure of a README file.

We suggest that file and folder names use the following convention:
ISO 8601 formatted date–2 additional pieces of metadata–specific sample/document ID–version

For example, for our documents we would use:
20240509 – CU-RST-FFG-v1 (Date, Cranfield University, Research Support Team, File Format Guidance, version 1)

Things to note:

Your metadata should be abbreviated to 2 or 3 letters if they are consistently used. Abbreviations need to be explained in your README file.
You need to decide which label is more important to order the files and folders by (date, author, sample ID).
The parts of file and folder names can be separated by long hyphens or underscores, or by capitalising each part (20240509CuRstFfgv1).
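If you create many files, the convention can be applied consistently with a small helper. The sketch below is one possible approach, not a required tool: the function name and its parameters are hypothetical, and it uses a plain hyphen as the default separator throughout.

```python
from datetime import date


def build_name(day, metadata, item_id, version, sep="-"):
    """Build a name as: date, metadata parts, sample/document ID, version."""
    parts = [
        day.strftime("%Y%m%d"),  # ISO 8601 basic format, e.g. 20240509
        *metadata,               # abbreviated metadata, e.g. "CU", "RST"
        item_id,                 # sample or document ID, e.g. "FFG"
        f"v{version}",           # version label, e.g. "v1"
    ]
    return sep.join(parts)


# Reproduces the example above with hyphens, then with underscores
name = build_name(date(2024, 5, 9), ["CU", "RST"], "FFG", 1)
name_underscore = build_name(date(2024, 5, 9), ["CU", "RST"], "FFG", 1, sep="_")
print(name)
print(name_underscore)
```

Putting the date first in this format means that an alphabetical sort also orders the files chronologically.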

Use our file naming convention worksheet to help you think through your naming conventions.

Preparing your data for deposit

Before you deposit your research data into CORD in CERES, you need to select, prepare, organise, and document your data, check any legal and/or ethical issues, and determine which level of access you will give to your data.

1. Select data to preserve

Not all the data that you have created and/or collected will need to be preserved. The principal investigator (PI) should perform a data appraisal to determine the data that needs to be archived. During this appraisal, think about:

Data that underpin a journal article. This should be preserved to ensure the integrity and transparency of the research, enabling others to verify or challenge research results,
Data that are unique or contain scientific value and will likely be used for future investigations,
Whether there are legal requirements to preserve the data,
Whether the raw data or the processed data should be preserved,
Whether the code or software that you used to process the data is publicly available.

2. Organise your data

Gather all the data selected for preservation. Determine if all the data will be deposited together in a single folder, or if some of the data need to be deposited separately as they require separate persistent identifiers.

3. Prepare your data

Ensure that the data are in an open format. Open formats enable re-use and help to future-proof the data, because they do not depend on proprietary software that may no longer be supported in the future.

The UK Data Service also provide a data management checklist to work through.

4. Document and describe your data

Ensure that the data are structured and labelled consistently. Use meaningful file naming conventions and include documentation describing all the files and how to use them. The documentation should enable anyone to use and understand the data.
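A file listing is a useful starting point for this documentation. The sketch below, which uses only Python's standard library, collects the relative path and size of every file in a folder so that the listing can be pasted into a README and annotated by hand. The function names are hypothetical, and the output layout is only a suggestion.

```python
from pathlib import Path


def list_files(root):
    """Return (relative path, size in bytes) for every file under root."""
    root = Path(root)
    return sorted(
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )


def format_listing(entries):
    """Format the entries as one tab-separated line per file."""
    return "\n".join(f"{path}\t{size} bytes" for path, size in entries)
```

For example, `print(format_listing(list_files("my_dataset")))` would print the listing for a folder called `my_dataset` (a placeholder name). Each line still needs a short description of what the file contains and how to use it.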

5. Legal, ethical, and commercial considerations

Check that you have legal permission to share your data. If your data contain any third-party copyrighted material, you must obtain permission to use and share this material before depositing into CORD. Additionally, if your data contain any personal or sensitive information, ensure that you have explicit consent to share these data.

6. Access considerations

What level of access will your data require? Will the data be publicly available or will access only be available upon request? How will others be able to use and share your data? Determine how you would like to license your data by selecting the most appropriate data licence.

File sizes: Datasets smaller than 1GB

Datasets up to 1GB in size should be deposited into CORD in CERES via the submission form below.

If the data are associated with a publication (or publications), you should cite your data DOI in the data availability statement in the associated publication(s) and provide the full data citation in the reference list.

Only submit those files that are the final versions for publication as you will not be able to delete files. If you have mistakenly uploaded the wrong file(s) then contact us to let us know. You are responsible for consulting the guidance on file formats before submitting your data to the data repository.

File sizes: Datasets bigger than 1GB

If you want to deposit a single file greater than 1GB, please contact the Library Services Research Support Team to discuss your requirements.
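To find out which route applies before you start, you can measure your dataset folder. The sketch below is a minimal check that assumes 1GB means 1024MB (as in the ready reckoner); how the repository itself measures the limit is not stated here, so treat results close to the threshold with caution. The function names are hypothetical.

```python
from pathlib import Path

ONE_GB = 1024**3  # assumption: 1GB = 1024MB = 1024**3 bytes


def dataset_size_bytes(root):
    """Total size in bytes of all files under the dataset folder."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def files_over_limit(root, limit=ONE_GB):
    """Relative paths of individual files larger than the limit."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size > limit
    )


def exceeds_limit(size_bytes, limit=ONE_GB):
    """True if the size is above the limit (exactly 1GB still counts as within it)."""
    return size_bytes > limit
```

For example, `exceeds_limit(dataset_size_bytes("my_dataset"))` returns `True` when the folder `my_dataset` (a placeholder name) is larger than 1GB.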

When to deposit your data into CORD?

Datasets associated with journal articles should be deposited into CORD once the corresponding article has been accepted for publication. Upon approval into CORD, the dataset will be allocated a DOI, which should be included in a data availability statement that should be added to the article prior to publication.

Applying an embargo

We understand that not all data can or should be made open access immediately. There may be times when you want to delay the point at which people can access the data you upload. It is possible to apply a temporary embargo to your dataset.

Benefits of embargoes

There are several reasons you may want to apply an embargo. These could include:

You want a period of exclusive access to your dataset but also want to be able to cite the dataset in publications, on project websites or at conferences. By uploading with an embargo you will still be issued with a DOI which you can cite in your publication.
You have a data sharing agreement in place with a data provider, collaborators or another institution which requires a delay in the sharing of data.

Useful guides on data collaboration issues

Research data management — UK Data Service

Sharing your data with collaborators | Research Data Management (cam.ac.uk)

Next steps

When you are ready to deposit data, click the button.

Deposit your data

Questions?

Please email the Research Support Team in Library Services.