A guide to using Excel for data cleaning, analysis, and visualization

Excel is a program for creating, processing, analyzing, and visualizing tables of data. Spreadsheet programs like Excel make it easy to enter your data and see all of it at once, which can be more difficult in other data analysis programs. On the other hand, other data analysis programs may offer more powerful options and may be more reliable for advanced statistical analysis.

Here are some basic concepts that are important for getting started with Excel.

You may see these terms in various training documents.

Formulas are calculations that Excel can perform on your data. This includes everything from basic arithmetic to more complicated, built-in functions that operate over a large number of cells. After you click inside a cell to make it the active cell, you can type a formula into the formula bar. Formulas begin with an equals sign ("=").

After you enter a formula into the formula bar, the result of the calculation (or the "value") will appear in the active cell. The formula, however, will still appear in the formula bar. This is great for keeping track of how the value was calculated, but it can be a bit risky, too. Sometimes, you'll do something like delete or move a column, or re-sort the data, and all of a sudden the formula will break and you will lose the value.

To keep just the values, you can copy the cells that contain the formulas, and then "Paste Values," either over top of the formulas or into another set of cells.

There are plenty of ways to copy formulas and values in Excel. One especially useful way is the "fill handle." This is a black "plus sign" that appears when you hover your mouse over the bottom-right corner of the active cell.

A formula can either contain just numbers and arithmetic operators (+, -, etc.), or it can include built-in Excel functions and references to other cells. References can be a bit tricky, so you might want to look through this article on references. This information is especially helpful if you're going to be copying formulas from one column to another.

Sometimes you'll be open files that are already in Excel format (.xls or .xlsx). Sometimes, though, you may be opening a data file that is in an alternative format, like .txt or .csv. These formats can easily be read by Excel, but sometimes Excel doesn't load everything correctly using the defaults.

In a nutshell, Excel will try to guess what kind of variable you have in each column - text or number. If something looks like a number, Excel will apply rules that it applies to all numbers. One of these rules is that numbers can't start with "0." If you have data that look like numbers but actually have zeroes at the beginning - for example, zip codes - you may accidentally lose those zeroes when you open the data file.

To specify the data type of some of your columns when you open the data file, you want to use the "Get External Data from Text" option. This Text Import Wizard will allow you to specify other things, like where the data should be imported and whether you want to skip some of the columns.

Excel is extremely useful, but it is very easy to access and change the data in individual cells, which means it is very easy to accidentally erase data or introduce errors. Here are a few tips for data management in Excel:

- Never work from the original copy of your data.

*(Always make a copy first, then start working.)* - Save under a new file name after a major cleaning or processing step is done.

*(Use descriptive file names to retrace your steps.)* - Create a “README” file to keep track of everything you do to your data.

*(This includes the sources of your data, field properties, data cleaning steps, Excel functions, etc.)*

- Last Updated: Jul 19, 2017 4:16 PM
- URL: http://guides.library.duke.edu/excel
- Print Page

411 Chapel Drive

Durham, NC 27708

Durham, NC 27708

(919) 660-5870

Perkins Library Service Desk

Perkins Library Service Desk