Vignette Introduction

Martin Monkman



These vignettes serve a dual purpose:


Vignettes completed to date:

  1. Relationship Between Strikeouts and Home Runs – This vignette looks at the relationship between the rates of strikeouts and home runs from 1950 onward. The question was inspired by Marchi and Albert (2014), Analyzing Baseball Data with R.

  2. Run Scoring Trends – This vignette tracks Major League Baseball's average per-game run scoring for each season since 1901.

  3. Team Payroll and the World Series – This vignette examines whether there is a relationship between total team salaries (payroll) and World Series success.

Further reading

A number of books and on-line resources use the Lahman package as material for the examples. These include:


Michael Friendly and David Meyer (2016) Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data (CRC Press). DDAR Web Site

Max Marchi and Jim Albert (2014) Analyzing Baseball Data with R (CRC Press)

David Robinson (2017) Introduction to Empirical Bayes

Hadley Wickham and Garrett Grolemund (2017) R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (O’Reilly)

Articles, blog entries, and course materials

Steven Buechler (2014-2015) Analysis of career performance in top home run hitters

Kris Eberwein (2015-09-30) “Hacking The New Lahman Package 4.0-1 with R-Studio”

Michael Lopez (2016) Lab materials for Skidmore College MA 276, “Sports and Statistics”

Bill Petti (2015-09-21) A Short(-ish) Introduction to Using R Packages for Baseball Research

Exploring Baseball Data with R blog

Jim Albert (2018-12-24) The Vanishing 300 Batting Average

Jim Albert (2015-01-05) A Graph of a Batting Average

Brian Mills (2014-09-30) Using ggmap and Lahman to Find the Hometown College Rosters