Skip to main content

Excel

A guide to using Excel for data cleaning, analysis, and visualization

PivotTables

PivotTables make it easier to rearrange and summarize the data in your Excel worksheets.  PivotTables work best when you start with a table of data that is very clean and specific - all columns have been labeled, the data are at the lowest level of detail, and there are no extra analysis or description rows in the data block.  See the data cleaning section if you have questions about what a clean data table looks like.

To create a PivotTable, select the entire data table and go to Insert --> PivotTable.  You can keep all of the defaults and click OK.  This will create a PivotTable on a new worksheet.

A PivotTable has two components: a Field List (the PivotTable Builder) and the table itself.  At the top of the Field List, there is a list of all of the column names from the original data set.  (This is why it is important to have good names for your columns!)  At the bottom of the Field List, there are four boxes: Filters, Columns, Rows, Values.  Each of these are areas where you can drag columns (or "fields") from the list at the top.

Filters

The Filters box allows you to apply a global filter to the PivotTable -- that is, you can use a particular field to remove rows from the data so that they won't be represented in the PivotTable, without having to add that particular field to the PivotTable itself.  

If you drag a field into the Filters box, it will create a filter very similar to what you see when you create a filter elsewhere in Excel.  Every unique value in that field will be added to a list, and there will be a checkbox in front of each value.  To remove rows with a particular value from the PivotTable, uncheck the box next to the value.  You can also search for values to make it easier to turn them on and off.

Columns/Rows

The Columns and Rows boxes are for categorical (text) variables - something like the names of states or the names of month.  When you add a field to either the Columns or Rows box, Excel will find every unique value in that field and create a separate row or column for each value.  You can drag multiple fields to both Columns and Rows; the second field will be listed as subdivisions of the first field, and so on.

Values

The Values box is where you drag numerical variables. Every category you've added to the PivotTable already will "contain" one or more of the rows of the original data table.  When you drag a numerical variable to "Values," Excel looks at that variable for all of the rows in that category and will summarize the values in those rows into a single number.  

Excel can summarize a numerical field in a variety of ways: sum, count, average, max, min, product, etc.  Depending on your version of Excel, the default summary may be "count" or "sum."  You can change this by clicking the "i" or "?" button and changing the Value Field Settings.

You can drag multiple fields to Values; the second field will show up as an additional column in the table.