
Statistical Science

This guide highlights key information and resources for Statistical Science research.

Science Librarian

Brittany Wofford


Top data sources

Data sources by discipline

Figshare

  • Figshare is a repository where users can make all of their research outputs available in a citable, shareable and discoverable manner.

Google Dataset Search

  • Dataset Search is a search engine for datasets. Using a simple keyword search, users can discover datasets hosted in thousands of repositories across the Web.

Harvard Dataverse

  • The Harvard Dataverse Repository is a free data repository open to all researchers from any discipline, both inside and outside of the Harvard community, where you can share, archive, cite, access, and explore research data. Each individual Dataverse collection is a customizable collection of datasets (or a virtual repository) for organizing, managing, and showcasing datasets.

Kaggle

  • Discover open datasets on a variety of topics, and take advantage of Kaggle's online community by sharing your analysis methods with other users.

Mendeley Data

  • Mendeley Data is a free and secure cloud-based communal repository where you can store your data, ensuring it is easy to share, access and cite, wherever you are.

Registry of Research Data Repositories (re3data)

  • The Registry of Research Data Repositories (re3data) is a global registry of research data repositories across academic disciplines. It covers repositories that give researchers, funding bodies, publishers, and scholarly institutions permanent storage of and access to datasets. re3data promotes a culture of sharing, increased access, and better visibility of research data.

OpenDOAR

  • OpenDOAR is the quality-assured, global Directory of Open Access Repositories. You can search and browse through thousands of registered repositories based on a range of features, such as location, software or type of material held.

Our World in Data

  • Our World in Data is a scientific online publication that focuses on large global problems such as poverty, disease, hunger, climate change, war, existential risks, and inequality.

For more information, check out the Data Archives and Repositories page of the Research Data Management Guide.

American Community Survey (ACS)

  • A yearly survey from the U.S. Census Bureau that provides data on occupation, educational attainment, home ownership status, and more.

Data.gov

  • Data.gov is the United States government's open data website. It provides access to datasets published by agencies across the federal government. Data.gov is intended to provide access to government open data to the public, achieve agency missions, drive innovation, fuel economic activity, and uphold the ideals of an open and transparent government.

Digest of Education Statistics

  • The primary purpose of the Digest of Education Statistics is to provide a compilation of statistical information covering the broad field of American education from prekindergarten through graduate school. The Digest includes a selection of data from many sources, both government and private, and draws especially on the results of surveys and activities carried out by the National Center for Education Statistics (NCES).

Figshare

  • Figshare is a repository where users can make all of their research outputs available in a citable, shareable and discoverable manner.

Healthdata.gov

  • The data are collected from and supplied by agencies of the U.S. Department of Health and Human Services as well as state partners. These include the Centers for Medicare and Medicaid Services, the Centers for Disease Control and Prevention, the Food and Drug Administration, and the Agency for Healthcare Research and Quality, among others.

International Social Survey Programme (ISSP)

  • The ISSP is a cross-national collaboration programme conducting annual surveys on diverse topics relevant to social sciences.

Inter-university Consortium for Political and Social Research (ICPSR)

  • ICPSR, the Inter-university Consortium for Political and Social Research, is a membership-based organization which collects data from individual researchers, polling agencies, and governmental and international agencies. Its purpose is to maximize the availability and utilization of data resources by obtaining data and the technical documentation needed to access them for redistribution to its membership. Duke University's membership in ICPSR allows all Duke researchers to use ICPSR resources free of charge.

IPUMS

  • IPUMS provides census and survey data from around the world integrated across time and space. IPUMS integration and documentation makes it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community contexts.

OECD Statistics

  • OECD.Stat includes data and metadata for OECD countries and selected non-member economies.

Open Data Network

  • Socrata's initiative designed to foster data-centered collaboration between governments and the private sector.

Pew Research Center

  • Pew Research Center makes the case-level microdata for much of its research available to the public for secondary analysis after a period of time. See this post for more information on how to use the datasets and contact info@pewresearch.org with any questions.

Qualitative Data Repository (QDR)

  • The Qualitative Data Repository (QDR) is a dedicated archive for storing and sharing digital data (and accompanying documentation) generated or collected through qualitative and multi-method research in the social sciences and related disciplines.

Uniform Crime Reporting (UCR) Program

  • The Uniform Crime Reporting (UCR) Program generates reliable statistics for use in law enforcement. It also provides information for students of criminal justice, researchers, the media, and the public. The program has been providing crime statistics since 1930.

U.S. Bureau of Labor Statistics

  • The Bureau of Labor Statistics of the U.S. Department of Labor is the principal Federal agency responsible for measuring labor market activity, working conditions, and price changes in the economy.

U.S. Census Data

  • Explore the U.S. Census Bureau's new content platform to access data collected through its decennial census.

U.S.D.A. Food and Nutrition Service

  • The Program Data site provides selected statistical information on activity in all major Food and Nutrition Service (FNS) programs. These include the Supplemental Nutrition Assistance Program (SNAP); the Special Supplemental Nutrition Program for Women, Infants and Children (WIC); child nutrition programs (National School Lunch, School Breakfast, Child and Adult Care, Summer Food Service and Special Milk); and food distribution programs (Schools, Emergency Food Assistance, Indian Reservations, Commodity Supplemental, Nutrition for the Elderly, and Charitable Institutions).

UN Comtrade

  • The United Nations Comtrade database aggregates detailed global annual and monthly trade statistics by product and trading partner for use by governments, academia, research institutes, and enterprises. Note: Duke only has access to freely available resources from this provider.


EnviroFacts

  • This website provides access to several EPA databases containing information about environmental activities that may affect air, water, and land anywhere in the United States.

International Energy Agency (IEA)

  • The IEA collects, assesses and disseminates energy statistics on supply and demand, compiled into energy balances in addition to a number of other key energy-related indicators, including energy prices, public RD&D and measures of energy efficiency, with other measures in development.

National Center for Science and Engineering Statistics (NCSES)

  • Intuitive tools to analyze NCSES data on R&D and the education and employment of the STEM workforce.

U.S. Energy Information Administration (EIA)

  • The U.S. Energy Information Administration (EIA) collects, analyzes, and disseminates independent and impartial energy information to promote sound policymaking, efficient markets, and public understanding of energy and its interaction with the economy and the environment.

World Environment Situation Room (WESR) from United Nations Environment Programme (UNEP)

  • World Environment Situation Room (WESR) is UNEP's online data, information and knowledge platform. It enables users to visualize, interrogate, access, link and download data, information and knowledge products about the global environment.


National Library of Medicine (NLM) Dataset Catalog

  • This catalog lets you search, discover, and retrieve over 62,000 biomedical datasets from various repositories.

Healthdata.gov

  • The data are collected from and supplied by agencies of the U.S. Department of Health and Human Services as well as state partners. These include the Centers for Medicare and Medicaid Services, the Centers for Disease Control and Prevention, the Food and Drug Administration, and the Agency for Healthcare Research and Quality, among others.

National Center for Health Statistics (CDC)

  • Explore data from various collection instruments such as the National Survey of Family Growth, National Health and Nutrition Examination Survey, National Hospital Ambulatory Medical Care Survey, and more.

Substance Abuse & Mental Health Data Archive (SAMHDA)

  • SAMHDA is a one-stop shop for SAMHSA public use data with online analysis tools. Learn more about the different types of data files, which data are available as free downloadable public use files (PUFs), and which online analysis systems are available on the site.


Finding papers with data sets

Are you looking for an article with a dataset, or for a dataset associated with an article you're reading?

There is no one source that lets you search for just this type of information, but librarians have some tricks to help you.

  1. Find an article and check the full text for any mention of available raw data or statistical analyses. Look for a link to Supplemental or Supporting Information.

screenshot showing example of available supporting info

 

  2. Browse the journal Nature Scientific Data, which publishes only open-access, peer-reviewed articles. Look for the link to the data citations or data record.

screenshot of the link to the data citation for an article

 

  3. Use the Advanced Search feature in ProQuest Science Database. Enter your topic in the search box, then, in the box titled "Document Feature", select tables, graphs, etc.

screenshot of document feature drop down menu

 

  4. Search the Data Citation Index by topic, browse titles, and follow the links to the data repository.

screenshot of Data citation index main search screen

 

  5. Use the advanced search in Web of Science to search by topic, then limit the document type to Data Paper.

screenshot of Advanced search screen in Web of Science

 

  6. In PubMed, search "Datasets as Topic"[Mesh] together with your keywords. MeSH (Medical Subject Headings) is the controlled-vocabulary thesaurus used to index PubMed records. You can also search "Dataset"[Publication Type] with your keywords.

screenshot of dataset publication type in PubMed

 

  7. Search data repositories for interesting datasets, then find the associated article.
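If you prefer to script the PubMed search above, the same query can be assembled programmatically. The sketch below is an illustration under stated assumptions, not a method described by this guide: it combines the MeSH or publication-type filter with your keywords and builds a request URL for NCBI's E-utilities `esearch` service, a public PubMed search API not otherwise covered here. The helper names `dataset_query` and `esearch_url` are hypothetical, and the code only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (assumption: using the public PubMed search API)
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def dataset_query(keywords: str, use_publication_type: bool = False) -> str:
    """Hypothetical helper: combine a dataset filter with keywords using PubMed field tags."""
    if use_publication_type:
        dataset_filter = '"Dataset"[Publication Type]'
    else:
        dataset_filter = '"Datasets as Topic"[Mesh]'
    # Parenthesize the keywords so the AND applies to the whole keyword phrase
    return f"{dataset_filter} AND ({keywords})"


def esearch_url(query: str, retmax: int = 20) -> str:
    """Hypothetical helper: build (but do not send) an esearch request URL."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    # urlencode percent-encodes quotes and brackets and turns spaces into '+'
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    q = dataset_query("air pollution asthma")
    print(q)
    print(esearch_url(q))
```

Pasting the printed query string into the PubMed search box should give the same search as typing it by hand; the URL form is only useful if you plan to retrieve results with a script.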