File formats can affect long-term preservation and reuse. While researchers may use proprietary file formats for analysis, converting data to open and/or standard formats will help ensure the data can be rendered and accessed in the future. Researchers can also choose to make data available in both preservation-friendly formats and original file formats.
Best practice is to select formats that are open, documented standards: non-proprietary, unencrypted, uncompressed, and commonly used by your research community. For example, save spreadsheet-based (tabular) data as comma-separated values (.csv) instead of Excel (.xls, .xlsx), and save text documents as plain text (.txt) or PDF/A (.pdf) instead of Microsoft Word (.doc, .docx).
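As a minimal sketch of such a conversion, the Python example below rewrites a tab-delimited or pipe-delimited text file as UTF-8 CSV using only the standard library. The function name `delimited_to_csv` and the file names are hypothetical, chosen for illustration. Reading Excel workbooks directly would need a third-party library and is not shown here.

```python
import csv


def delimited_to_csv(src, dest, delimiter="\t", encoding="utf-8"):
    """Rewrite a delimited text file (e.g. .tsv or pipe-delimited) as UTF-8 CSV.

    Returns the number of rows written, which can be compared against
    the source file as a basic check that nothing was dropped.
    """
    rows_written = 0
    # newline="" lets the csv module manage line endings itself.
    with open(src, newline="", encoding=encoding) as fin, \
         open(dest, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin, delimiter=delimiter)
        writer = csv.writer(fout)  # default dialect: comma-separated, quotes only when needed
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

For a pipe-delimited source, a call such as `delimited_to_csv("results.txt", "results.csv", delimiter="|")` would apply. Keeping the original file alongside the converted copy matches the option of publishing both formats.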
What counts as "data" varies by discipline. In some fields, a published article or report could be considered data, but in others, the "bones" of that article (the data behind figures, tables, graphics, and other conclusions) are what could be considered data. If a research funding agency requires a formal Data Management Plan, it will often provide some guidance as to what it considers data.
|Data Type||Original Data Format||Preservation Friendly Formats (Open Standard, Uncompressed)|
|Text||Hand-written, docx, wpd, odt, rtf, txt, html, xml, pdf||xml, PDF/A, txt|
|Tabular||csv, tsv, pipe-delimited, xls(x), ods, dif, xps||csv|
|Statistical||sav (SPSS), sas7bdat or xpt (SAS), dta (Stata)||csv, txt with setup file or associated script (r or m)|
|Database||mdb, dbf, sql, sqlite, db, db3, xml||xml, sqlite|
|Visual||static: pdf, jpeg, tiff, png, gif, bmp; moving: mpeg, mov, avi, mxf||PDF/A, tiff, JPEG2000|
|Audio||wav(e), mp3, mp2, aiff, wma, aac, dct, flac, ogg||wave, aiff|
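The table can also be applied mechanically. The sketch below flags files whose extensions are not among the preservation-friendly formats listed above. The extension spellings (for example `.jp2` for JPEG2000 and `.tif` for tiff) and the name `needs_conversion` are assumptions made for illustration. An extension is only a hint: a `.pdf` file is not necessarily PDF/A, so flagged and unflagged files may both need a closer look.

```python
from pathlib import Path

# Preservation-friendly extensions, transcribed from the table above.
# Spellings such as ".jp2" (JPEG2000) and ".tif" are assumed here.
PRESERVATION_FRIENDLY = {
    ".xml", ".pdf", ".txt", ".csv", ".sqlite",
    ".tif", ".tiff", ".jp2", ".wav", ".aif", ".aiff",
}


def needs_conversion(filenames):
    """Return the names whose extension is not in the preservation-friendly set."""
    return [
        name for name in filenames
        if Path(name).suffix.lower() not in PRESERVATION_FRIENDLY
    ]
```

For example, `needs_conversion(["survey.xlsx", "survey.csv"])` would report only `survey.xlsx` as a candidate for conversion.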