Research Data Management

File Formats for Preservation and Reuse

File formats affect long-term preservation and reuse. Researchers may use proprietary file formats for analysis, but converting data to open and/or standard formats helps ensure the data can be rendered and accessed in the future. Researchers can also choose to make data available in both preservation-friendly formats and the original file formats.

Best practice is to select formats that are open or documented standards, non-proprietary, unencrypted, uncompressed, and commonly used by your research community. For example, save spreadsheet-based (tabular) data as comma-separated values (.csv) instead of Excel (.xls, .xlsx), and save text files as plain text (.txt) or PDF/A (.pdf) instead of Microsoft Word (.doc, .docx).
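
As a rough illustration of moving tabular data into CSV, the sketch below rewrites a tab- or pipe-delimited text export as a comma-separated file using only Python's standard library. The function name `delimited_to_csv` and its parameters are hypothetical, chosen for this example. It assumes the source file is UTF-8 text with one record per line. Reading Excel workbooks directly requires a third-party library and is not covered here.

```python
import csv


def delimited_to_csv(src, dest, delimiter="\t", encoding="utf-8"):
    """Rewrite a delimited text file as comma-separated values.

    Hypothetical helper for illustration. Returns the number of rows
    written, counting the header row if there is one.
    """
    rows_written = 0
    # newline="" lets the csv module manage line endings itself.
    with open(src, newline="", encoding=encoding) as infile, \
            open(dest, "w", newline="", encoding="utf-8") as outfile:
        reader = csv.reader(infile, delimiter=delimiter)
        writer = csv.writer(outfile)  # default dialect: commas, minimal quoting
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Calling `delimited_to_csv("samples.tsv", "samples.csv")` would convert a tab-delimited file, and passing `delimiter="|"` handles pipe-delimited input. Values that contain commas are quoted automatically by the writer, so they survive the conversion intact.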

Repositories may provide a list of preferred file formats (see Dryad’s File Types Guidance). The Library of Congress also provides information on recommended file formats.

Common Data Formats

The definition of "data" varies by discipline. In some fields, a published article or report could be considered data; in others, the data are the "bones" of that article: the material behind its figures, tables, graphics, and conclusions. If a research funding agency requires a formal Data Management Plan, it will often provide some guidance on what it considers data.

| Data Type | Original Data Format | Preservation Friendly Formats (Open Standard, Uncompressed) |
|---|---|---|
| Text | Hand-written, docx, wpd, odt, rtf, txt, html, xml, pdf | xml, PDF/A, txt |
| Tabular Simple (minimal metadata) | csv, tsv, pipe-delimited, xls(x), ods, dif, xps | csv |
| Tabular Extensive (variable and value labels, missing data defined) | sav (SPSS), sas7bdat or xpt (SAS), dta (STATA) | csv, txt with setup file or associated script (r or m) |
| Database | mdb, dbf, sql, sqlite, db, db3, xml | xml, sqlite |
| Visual | static: pdf, jpeg, tiff, png, gif, bmp; moving: mpeg, mov, avi, mxf | static: PDF/A, tiff, JPEG2000; moving: MPEG-4 |
| Audio | wav(e), mp3, mp2, aiff, wma, aac, dct, flac, ogg | wave, aiff |
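
For the "Tabular Extensive" case, one way to keep variable labels, value labels, and missing-data codes alongside a plain CSV is to write them to a companion text file. The sketch below is only an illustration of that idea: the function `write_csv_with_codebook`, the layout of the `variables` dictionary, and the codebook text format are all assumptions made for this example, not a standard setup-file format used by SPSS, SAS, or Stata.

```python
import csv


def write_csv_with_codebook(records, variables, csv_path, codebook_path):
    """Write records to CSV and describe each variable in a plain-text codebook.

    Hypothetical helper for illustration. `variables` maps a column name to a
    dict with a "label", optional "values" (code -> meaning), and optional
    "missing" (list of codes that mean the value is missing).
    """
    columns = list(variables)

    # The data file: one header row, then one row per record.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(records)

    # The companion file: labels and codes that CSV cannot carry itself.
    with open(codebook_path, "w", encoding="utf-8") as f:
        for name in columns:
            info = variables[name]
            f.write(f"{name}: {info['label']}\n")
            for code, meaning in info.get("values", {}).items():
                f.write(f"  {code} = {meaning}\n")
            if info.get("missing"):
                codes = ", ".join(str(c) for c in info["missing"])
                f.write(f"  missing codes: {codes}\n")
```

A call might pass `variables = {"smoker": {"label": "Current smoker", "values": {0: "No", 1: "Yes"}, "missing": [-9]}}` together with a list of record dictionaries. Depositing the CSV and the text file together keeps the meaning of each code readable without the original statistical software.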

For more, see the UK Data Service Recommended Formats or the Recommended Formats Statement of the Library of Congress.