Statistical Science

This guide highlights key information and resources for Statistical Science research.

Science Librarian

Brittany Wofford

Duke Sports Analytics Club

Duke Sports Analytics Club is primarily a platform where members can pursue projects on a sport or problem of their choice and then publish the results as articles on our public-facing website. We partner with many Duke Athletics Division I teams, which gives dedicated members the chance to work with the athletic department and help our school. We also provide opportunities for members to participate in sports analytics competitions, attend conferences, and conduct research in the field. Our mission is to foster a community in which anyone can get involved and apply their interest in sports analytics in any capacity. Access the club via DukeGroups.

Resources on the Web

data.world

  • data.world is an enterprise data catalog and governance platform. It currently lists 9 sports analytics datasets.

Equity in Athletics

  • Provides custom data searches. Select the EADA statistical data you are interested in for one or more years and download data for a customized group of schools.

Kaggle

  • Kaggle hosts datasets and competitions covering a variety of sports. A search for "sports analytics" returns 371 datasets.

NCAA Statistics

  • Provides data on collegiate athletic championships.

Sports Reference

Sports Statistics

  • Provides an array of sports data sets (football, tennis, soccer, basketball, racing, baseball, and hockey) for data modeling, data visualization, predictions, and machine learning.

Equity in Athletics (U.S. Department of Education)

  • The Equity in Athletics Data Analysis Cutting Tool is brought to you by the Office of Postsecondary Education of the U.S. Department of Education. This analysis cutting tool was designed to provide rapid customized reports for public inquiries relating to equity in athletics data. The data are drawn from the OPE Equity in Athletics Disclosure Website database. This database consists of athletics data that are submitted annually as required by the Equity in Athletics Disclosure Act (EADA), via a Web-based data collection, by all co-educational postsecondary institutions that receive Title IV funding (i.e., those that participate in federal student aid programs) and that have an intercollegiate athletics program.

baseballR

  • GitHub repository for the R package baseballR, which assists in analyzing MLB data

cfbscrapR

  • GitHub repository for the R package cfbscrapR, which helps analyze college football data

KenpomR

  • R package to scrape Kenpom (team-specific data requires a subscription)

nbastatR

  • Website for the R package nbastatR, which assists in scraping data and generating analyses for the NBA

nflfastR

  • Website for the R package nflfastR, which provides functions and datasets for analyzing the NFL

PySport OpenSource

  • Comprehensive compilation of R packages and scrapers/APIs related to sports analytics.

10 Steps to Get Started in Sports Analytics in 2023

  • Resource by Bill Kapatsoulias that provides an overview of getting started with sports analytics coding, data sources, and analytical techniques.

A Guide to Getting Started with R (Duke Sports Analytics Club)

  • Compilation of resources for getting started with sports analytics and R. Contact Sean Li (sean.li571@duke.edu) with questions.

CRAN Task View: Sports Analytics

  • This CRAN Task View contains a list of packages useful for sports analytics. Most of the packages are sport-specific and are grouped as such. However, we also include a General section for packages that provide ancillary functionality relevant to sports analytics (e.g., team-themed color palettes), and a Modeling section for packages useful for statistical modeling. Throughout the task view, and collected in the Related links section at the end, we have included a list of selected books and articles that use some of these packages in substantive ways. Our goal in compiling this list is to help researchers find the tools they need to complete their work in R.

How to Use R for Sports Stats

  • A 2015 blog post by Brice Russ that provides a brief overview of using R for sports statistical analyses.

Syracuse Analytics Blitz

  • GitHub repository for a project completed by Jack Lichtenstein, Ben Thorpe, and Bryce Grove for the Syracuse Analytics Blitz in February 2021, with the goal of estimating optimal run/pass ratios by field position in the NFL. An excellent example of data visualization and of working a project through to completion.

Resources @ Duke

General


Baseball


Basketball


Football

Conferences

Undergraduate thesis highlights

  • Walker (2018), Incentives to Quit in Men’s Professional Tennis: An Empirical Test of Tournament Theory
  • Model (2020), Hitting around the shift: Evaluating batted-ball trends across Major League Baseball
  • Froelich (2013), Is the Blind Side Tackle Worth It?: An Analysis of the Salary Allocation of the NFL Offensive Line
  • Yao (2019), For Love of the Game: A Study of Tournament Theory and Intrinsic Motivation in Dota 2
  • Silverman & Seidel (2011), Incentives in Professional Tennis: Tournament Theory and Intangible Factors
  • Goldstein (2017), Long-Term Contracts and Predicting Performance in MLB
  • Shorin (2017), Team Payroll Versus Performance in Professional Sports: Is Increased Spending Associated with Greater Success?
  • Elliott (2009), The Effect of Exchange Rates on the Performance of Professional Sports Franchises in International Competition
  • Battle-McDonald (2019), The Impact of Collegiate Athletic Success and Scandals on Admissions Applications
  • Pollack (2017), What Gets Paid? Analyzing the Major League Baseball Contract Market