
ECON 490: Economics of Health Care Markets: GSS & CPS Surveys

Prof. Cuddy

General Social Survey (GSS) & Current Population Survey (CPS)

These are broad, recurrent surveys that ask many questions, including some on health topics and on subjects that may correlate with health.

Current Population Survey

Current Population Survey (CPS)
 A monthly survey of about 50,000 households conducted by the Census Bureau for the Bureau of Labor Statistics. The survey has been conducted for more than 50 years and is the primary source of information on the labor force characteristics of the U.S. population. [Edited excerpt from "Overview" on the Current Population Survey web site.]

  • CPS at Census / CPS at BLS
  • IPUMS CPS  Long time series of Census and CPS data. IPUMS is an interface to microdata, to provide custom extractions; NHGIS is an interface to aggregated data and GIS layers. Extract raw ASCII data and codebook, with loading files for Stata, SAS, and SPSS.
  • CPS Data at the NBER  The National Bureau of Economic Research archives microdata from several of the CPS surveys in several popular data formats (e.g., Stata, SPSS, SAS) along with some technical documentation (sample weights, etc.).
  • ceprDATA.org Public microdata from several key surveys (e.g., ACS, CPS, SIPP) in Stata format.

GSS Introduction

The General Social Survey (GSS) contains both a standard core of demographic, behavioral, and attitudinal questions and a second group of questions on special topics that are not asked every year.  The survey has been conducted periodically in the United States since 1972, and the cumulative dataset includes respondents from all survey years.

A subset of the GSS is available here as a ZIP archive.  This subset consists mostly of questions that were asked every year or almost every year.  The ZIP archive contains the dataset in both text file and R Workspace formats, a codebook written for this subset in markdown, and an HTML version of the codebook.  All missing values are denoted by NA in the dataset.
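
For readers working outside R, the following is a minimal Python sketch of reading such a text file while treating NA as missing. The file layout is an assumption: the sample rows are made up, the tab delimiter and the column names (year, educ, age) are hypothetical, and every non-missing value is assumed to be a whole number. Check the actual file and codebook in the ZIP archive before relying on any of these.

```python
import csv
import io

# Made-up rows standing in for the subset's text file. The real file's name,
# delimiter, and column names must be taken from the ZIP archive and codebook.
SAMPLE_TEXT = "year\teduc\tage\n1972\t12\t23\n1972\tNA\t70\n2012\t16\tNA\n"


def read_gss_text(handle, delimiter="\t"):
    """Read delimited rows, mapping the string 'NA' to None and other values to int."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    rows = []
    for raw in reader:
        rows.append({name: (None if value == "NA" else int(value))
                     for name, value in raw.items()})
    return rows


def count_missing(rows):
    """Count missing (None) values in each column."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (value is None)
    return counts


rows = read_gss_text(io.StringIO(SAMPLE_TEXT))
missing = count_missing(rows)
print(missing)
```

To read a real file instead of the sample, pass an open file handle (for example, one returned by open() with newline="") to read_gss_text and adjust the delimiter to match the file.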

You may also choose and download other variables through the Survey Documentation and Analysis interface; instructions are provided at right.

GSS Survey Abstract

[Excerpted from the GSS website]

The General Social Survey (GSS) began in 1972, designed to be "a high-quality, unbiased, and easily accessible public opinion survey that cataloged America’s thoughts, feelings, and opinions over time. That instrument, the General Social Survey, is now among NORC’s signature projects. Over five decades, the GSS has compiled data from thousands of Americans about their evolving characteristics and attitudes."

"Previously administered every year and now more recently conducted every two years, NORC interviews a representative sample of Americans about a range of topics. The questions address belief in God, confidence in government institutions, race relations, abortion, spending patterns, gun rights, social isolation—even pet ownership."

Since 1972, the General Social Survey (GSS) has been monitoring societal change and studying the growing co

GSS Cumulative File Subset (1972-2012) [be sure to download the entire zip archive]

The GSS through the "Survey Documentation and Analysis" website

If you wish to download your own variables, follow the steps below.


  • In the next window, click the "Sequential Variable List" to the left side of the window.  The main portion of the window will fill with categories of variables, which you can see in the next picture to the right.  Note that when you see "19xx Module," you are looking at a series of questions only asked in a particular year.  This will sharply reduce the number of individuals for whom you will have data and may present problems. 


    Clicking any variable category will display all of the questions and variable names present for that category.  Clicking on any of the linked variable names will provide you with the full question, the possible responses to the question, and the number of observations answering the question by possible response.

  • When you have identified a variable of interest, write down the variable name, which is highlighted in pink when looking at the full record. 

    Remember, you will need a continuous variable.  There are a few of these, but most of the variables that could potentially be treated as continuous are, in fact, ordinal.  An ordinal variable is a categorical variable with a distinct, logical order to its responses.  However, unlike the values of a continuous variable, the categories are not necessarily evenly spaced.  An example of this type of variable is the respondent's income in 2006 dollars.  Be certain to check before treating these as continuous variables.

    Finally, as you look for variables, note the sample sizes in the details for each variable.  It's safest to have thousands of observations that answer the question (answers that are not "IAP," "DK," "NA," and similar non-responses).  This is to ensure that you have enough observations that answer ALL of the questions in which you have an interest.  Last, be certain to write down those values you wish to keep, which exclude those non-responses.  For example, for the variable EDUC, I would write down 0-20, which are the valid responses for years of education.
  • Once you've identified the variables of interest, hit your browser's back button until you've returned to the initial screen before the codebook.  Now, mouse over the "Download" button at the top and click "Customized Subset."  On the next screen you must do two things.  First, at the top, change the "Select FILE(S) to construct" option to "CSV file."  Second, type in the variable names to download, each separated from the next by a space.  My example can be seen to the right, and here

    Also, there is a field called "Select CASES to include."  This is where you specify which answers you want in the dataset and which values you will exclude.  For each variable, type in the name, a left parenthesis (no space), the range of values you wish to keep, and a right parenthesis.  Separate variable ranges by a single comma.

    For example, I can select specific education and income responses in order to exclude all non-responses...
    educ(0-20),rincom06(1-25)

    When done, click the continue button at the very bottom.  On the next page, click the "Create the Files" button.
     
  • When done, a new page will appear with links to two files: the data itself and a codebook for the variables you've selected.  You want BOTH.  Right click on the "Data file," click "Save Target As," and provide a name while saving the file as a text file.  Do the same for the codebook.
 
  • The data are ready for import into R or Stata.  You will notice that a new variable, CASEID, is in your data.  Keep this; it is a unique number for each observation, like a name.

    Keep the codebook, because it will tell you what each numbered response means for each of your questions.
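
As a check after downloading, the valid-range filter from the "Select CASES to include" field can be reapplied locally. The Python sketch below is only an illustration under stated assumptions: the sample rows are made up, the column headers in the real CSV may be spelled or capitalized differently (the code lowercases them), and the helper names parse_case_filter and keep_valid are hypothetical, not part of the Survey Documentation and Analysis site.

```python
import csv
import io
import re

# Made-up rows standing in for a downloaded extract; values are illustrative only.
SAMPLE_CSV = "CASEID,EDUC,RINCOM06\n1,12,14\n2,98,10\n3,16,0\n4,20,25\n"


def parse_case_filter(text):
    """Turn 'educ(0-20),rincom06(1-25)' into {'educ': (0, 20), 'rincom06': (1, 25)}."""
    ranges = {}
    for name, low, high in re.findall(r"(\w+)\((\d+)-(\d+)\)", text):
        ranges[name.lower()] = (int(low), int(high))
    return ranges


def keep_valid(rows, ranges):
    """Keep rows whose value lies inside the range for every listed variable."""
    kept = []
    for row in rows:
        if all(low <= int(row[name]) <= high
               for name, (low, high) in ranges.items()):
            kept.append(row)
    return kept


# Lowercase the headers so the comparison does not depend on capitalization.
rows = [{key.lower(): value for key, value in row.items()}
        for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]

ranges = parse_case_filter("educ(0-20),rincom06(1-25)")
valid = keep_valid(rows, ranges)

# CASEID should identify each observation uniquely.
case_ids = [row["caseid"] for row in rows]
assert len(case_ids) == len(set(case_ids))

print([row["caseid"] for row in valid])
```

In the sample, the row with an education code outside 0-20 and the row with an income code outside 1-25 are dropped, which mirrors what the range string does on the download screen. Use the codebook to confirm which codes are non-responses for each variable.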