Contact us at datamanagement @ duke (dot) edu or attend one of our data management workshops.
File formats can affect long-term preservation and reuse. While researchers may use proprietary file formats for analysis, converting data to open and/or standard formats will help ensure the data can be rendered and accessed in the future. Researchers can also choose to make data available in both preservation-friendly formats and original file formats.
Best practice is to select formats that are open, documented standards and that are non-proprietary, unencrypted, uncompressed, and commonly used by your research community. For example, save spreadsheet-based (tabular) data as comma-separated values (.csv) instead of Excel (.xls, .xlsx), and save text files as plain text (.txt) or PDF/A (.pdf) instead of Microsoft Word (.doc, .docx).
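As an illustration of moving tabular data into CSV, here is a minimal sketch that re-encodes pipe-delimited text as comma-separated values using only Python's standard library. The function name `delimited_to_csv` and the sample data are hypothetical and chosen for this example; Excel files would need a separate tool or an export from the spreadsheet application itself.

```python
import csv
import io


def delimited_to_csv(text: str, delimiter: str = "|") -> str:
    """Re-encode delimiter-separated text as comma-separated values.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    # The csv writer quotes any field that itself contains a comma,
    # so values survive the change of delimiter intact.
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data: the second field of the data row contains a comma.
sample = "site|species|count\nA|Quercus alba, white oak|12\n"
print(delimited_to_csv(sample))
```

Going through the `csv` module rather than a plain find-and-replace matters because a field may already contain commas; the writer quotes such fields so the column structure is preserved.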
Repositories may provide a list of preferred file formats (see Dryad’s File Types Guidance). The Library of Congress also provides information on recommended file formats.
The definition of what "data" are varies by discipline. In some fields, a published article or report could be considered data, but in others, the "bones" of that article - the data behind figures, tables, graphics, and other conclusions - are what could be considered data. If a research funding agency requires a formal Data Management Plan, they will often provide some guidance as to what they would consider data.
Data Type | Original Data Format | Preservation Friendly Formats (Open Standard, Uncompressed) |
Text | Hand-written, docx, wpd, odt, rtf, txt, html, xml, pdf | xml, PDF/A, txt |
Tabular Simple (minimal metadata) | csv, tsv, pipe-delimited, xls(x), ods, dif, xps | csv |
Tabular Extensive | sav (SPSS), sas7bdat or xpt (SAS), dta (STATA) | csv, txt with setup file or associated script (r or m) |
Database | mdb, dbf, sql, sqlite, db, db3, xml | xml, sqlite |
Visual | static: pdf, jpeg, tiff, png, gif, bmp; moving: mpeg, mov, avi, mxf | PDF/A, tiff, JPEG2000, MPEG-4 |
Audio | wav(e), mp3, mp2, aiff, wma, aac, dct, flac, ogg | wave, aiff |
For more, see the UK Data Service Recommended Formats or the Recommended Formats Statement of the Library of Congress.
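The format table can also be used programmatically, for example to flag files in a project folder that may need conversion before deposit. The sketch below is a partial, hand-transcribed lookup covering only a few unambiguous extensions from the table; the names `PRESERVATION_FORMATS` and `suggest_formats` are hypothetical, and a repository's own guidance should take precedence over this mapping.

```python
from pathlib import Path

# Partial mapping from original file extension to the preservation-friendly
# formats listed in the table. Hand-transcribed for illustration; not exhaustive.
PRESERVATION_FORMATS = {
    # Text
    ".docx": ["xml", "PDF/A", "txt"],
    ".odt": ["xml", "PDF/A", "txt"],
    ".rtf": ["xml", "PDF/A", "txt"],
    # Tabular, simple
    ".xls": ["csv"],
    ".xlsx": ["csv"],
    ".ods": ["csv"],
    ".tsv": ["csv"],
    # Tabular, extensive (keep a setup file or script alongside)
    ".sav": ["csv", "txt with setup file or script"],
    ".sas7bdat": ["csv", "txt with setup file or script"],
    ".dta": ["csv", "txt with setup file or script"],
    # Database
    ".mdb": ["xml", "sqlite"],
    ".dbf": ["xml", "sqlite"],
    # Audio
    ".mp3": ["wave", "aiff"],
    ".wma": ["wave", "aiff"],
    ".flac": ["wave", "aiff"],
}


def suggest_formats(filename: str) -> list[str]:
    """Return suggested preservation formats, or an empty list if unknown."""
    ext = Path(filename).suffix.lower()  # case-insensitive match on extension
    return PRESERVATION_FORMATS.get(ext, [])


for name in ["survey.XLSX", "interview.mp3", "notes.unknown"]:
    print(name, "->", suggest_formats(name) or "no suggestion")
```

An empty result means only that the extension is missing from this partial mapping, not that the format is already suitable for preservation.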