Skip to main content

Research Data Management

Best Practices for File Versioning

Complex research projects will inevitably produce multiple versions of files. Implementing a file versioning strategy at the beginning of your research project will help to avoid confusion amongst collaborators and avoid lost time and effort trying to recover the "right" version of a file. It is also best practice to document all changes made between versions.

The Simple Way

File versioning can be as simple as integrating version numbers into your file naming convention (i.e., v1, v1_2, v2) or by using dates. Avoid using ambiguous terms such as ‘final’ or ‘revision’. In the example below, the type of analysis, publisher, and deadline of the submission is listed, followed by a version number:

MultivariateAnalysis_PLOSSubmission_20171130_v3.csv

Using Tools

There are specific software tools that can be used to strictly maintain version control. The most popular is perhaps Git, which is the foundation of both GitHub, GitLab and Bitbucket. See this Wikipedia List of version control software for more.

Certain storage and file sharing platforms, such as the Open Science FrameworkGoogle Drive, DropBox and Box, also have built-in version tracking with the ability to get back to earlier file versions.

Keep Original Files Read-Only

Always keep a "read only" version of your raw, unprocessed dataset to protect against unintentional changes. For important files that are the original stem of a project, keeping a read-only file or password protected zipped package helps prevent accidents.