MLB Steal Probability Engine
Stolen Bases Model:
I built a generalized linear regression model to predict stolen base success probability. The model uses the following variables to account for the specific players in the interaction:
Baserunner
Smoothed stolen base percentages since 2024 are used to account for the baserunner. A smoothed percentage is pulled closer to the league average the fewer attempts a player has. This methodology takes past success into account without overreacting to small sample sizes.
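One common way to get this kind of smoothing is to blend each player's record with a fixed number of "phantom" attempts at the league-average rate. The sketch below illustrates that idea only; the league average of 0.78 and the prior weight of 20 attempts are placeholder assumptions, not the values used in the actual model.

```python
def smoothed_steal_pct(successes: int, attempts: int,
                       league_avg: float = 0.78,
                       prior_attempts: float = 20.0) -> float:
    """Shrink a raw steal percentage toward the league average.

    league_avg and prior_attempts are illustrative placeholders.
    A larger prior_attempts pulls small samples harder toward league_avg.
    """
    if successes < 0 or attempts < 0 or successes > attempts:
        raise ValueError("need 0 <= successes <= attempts")
    # Add `prior_attempts` imaginary attempts that succeed at the league rate.
    return (successes + prior_attempts * league_avg) / (attempts + prior_attempts)


# A perfect 3-for-3 runner stays near league average,
# while a perfect 100-for-100 runner moves most of the way to 1.0.
small_sample = smoothed_steal_pct(3, 3)
large_sample = smoothed_steal_pct(100, 100)
```

With no attempts at all, the function returns the league average itself, so players without any history still get a usable value.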
Catcher
For the opposing catcher, the model uses pop time to second base (the amount of time it takes the catcher to throw the ball to second base). This measure was more predictive for catchers than smoothed steal percentages.
Pitcher
Smoothed past steal percentages are also used for pitchers (the success rate of baserunners against the pitcher).
Model Summary
The model uses these three variables to predict the success probability of stolen base attempts. Some key attributes of the model are:
- The baserunner and catcher have more of an impact than the pitcher.
- The model weighted runner data approximately 1.5 times more heavily than pitcher data.
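To make the structure concrete, here is a minimal sketch of how three such inputs could be combined, assuming the generalized linear model uses a logit link (that is, logistic regression). Every number in it is hypothetical: the coefficients are not the fitted values, they are chosen only so that the runner coefficient is 1.5 times the pitcher coefficient, and the league averages are placeholders.

```python
import math

# Placeholder league averages (assumptions, not values from the model).
LEAGUE_STEAL_PCT = 0.78
LEAGUE_POP_TIME = 1.95  # seconds

# Hypothetical coefficients for illustration only.
B_RUNNER = 3.0    # per unit of smoothed runner steal percentage
B_PITCHER = 2.0   # per unit of smoothed steal percentage allowed
B_CATCHER = 2.5   # per second of pop time (slower throw -> easier steal)


def steal_probability(runner_pct: float, pitcher_pct: float, pop_time: float) -> float:
    """Logistic model on inputs centered at the league averages."""
    # Intercept is the log-odds of the league rate, so an all-average
    # matchup predicts the league-average success probability.
    intercept = math.log(LEAGUE_STEAL_PCT / (1.0 - LEAGUE_STEAL_PCT))
    z = (intercept
         + B_RUNNER * (runner_pct - LEAGUE_STEAL_PCT)
         + B_PITCHER * (pitcher_pct - LEAGUE_STEAL_PCT)
         + B_CATCHER * (pop_time - LEAGUE_POP_TIME))
    return 1.0 / (1.0 + math.exp(-z))


average_matchup = steal_probability(0.78, 0.78, 1.95)
fast_runner = steal_probability(0.88, 0.78, 1.95)
easy_pitcher = steal_probability(0.78, 0.88, 1.95)
```

In this sketch, the same ten-point improvement moves the prediction more when it comes from the runner than from the pitcher, which is one way to read the 1.5x weighting above.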
Live Deployment:
The live engine is built in Python. My script uses the MLB API to collect the result and the names of the players involved in each steal attempt as it happens during a live game. The player names are then used to look up smoothed percentages and pop time data, which are fed to the model to produce a prediction.
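The lookup step might look roughly like the sketch below. It makes no API calls: the event dictionary and its field names are hypothetical and do not reflect the actual MLB API response schema, the player names and values are made up, and falling back to a league average for an unknown player is an assumption rather than the engine's confirmed behavior.

```python
# Placeholder league averages used as fallbacks (assumptions).
LEAGUE_STEAL_PCT = 0.78
LEAGUE_POP_TIME = 1.95  # seconds


def build_model_inputs(event: dict, runner_pcts: dict,
                       pitcher_pcts: dict, pop_times: dict) -> dict:
    """Turn the player names in a steal event into model inputs.

    `event` uses a hypothetical schema with "runner", "pitcher"
    and "catcher" keys holding player names.
    """
    return {
        "runner_pct": runner_pcts.get(event["runner"], LEAGUE_STEAL_PCT),
        "pitcher_pct": pitcher_pcts.get(event["pitcher"], LEAGUE_STEAL_PCT),
        "pop_time": pop_times.get(event["catcher"], LEAGUE_POP_TIME),
    }


# Made-up lookup tables and event, for illustration only.
runner_pcts = {"Runner A": 0.84}
pitcher_pcts = {"Pitcher B": 0.75}
pop_times = {"Catcher C": 1.90}

event = {"runner": "Runner A", "pitcher": "Pitcher B",
         "catcher": "Unknown Catcher", "result": "Stolen Base 2B"}

inputs = build_model_inputs(event, runner_pcts, pitcher_pcts, pop_times)
```

The resulting dictionary holds the three values the model needs; the unknown catcher falls back to the placeholder league pop time.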
Tweet Output:
George Springer Stole 2B ✅
🟢 Good Matchup
84% Successful Steal Chance
Player Steal Grades:
🏃 George Springer: 61st/302
⚾ Randy Vásquez: 106th/401
🧤 Elias Díaz: 12th/61
— Steal Decisions Live (@StealSignal) May 22, 2025
Live tweets produced by this engine contain the following data:
- Result (successful steal or caught stealing)
- Model grade and predicted steal probability
- Player steal grades for all three players
Model grades sort each attempt into a good, okay, or bad matchup based on the predicted probability. Player steal grades show where each player ranks within their position, based on smoothed percentages (baserunners and pitchers) or pop time (catchers).
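Both kinds of grade can be sketched in a few lines. The cutoffs of 0.80 and 0.65 are assumptions chosen for illustration (the engine's real thresholds are not stated here), and the ranking direction for each position (higher is better for runners, lower is better for pop time) is likewise an assumption.

```python
def matchup_grade(prob: float, good: float = 0.80, bad: float = 0.65) -> str:
    """Bucket a predicted probability; the cutoffs are illustrative."""
    if prob >= good:
        return "Good Matchup"
    if prob >= bad:
        return "Okay Matchup"
    return "Bad Matchup"


def ordinal(n: int) -> str:
    """Format 1 -> '1st', 2 -> '2nd', 12 -> '12th', 61 -> '61st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def rank_label(name: str, values: dict, higher_is_better: bool) -> str:
    """Rank a player among all players at the position, e.g. '2nd/3'."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return f"{ordinal(ordered.index(name) + 1)}/{len(ordered)}"


# Made-up values for illustration only.
runner_pcts = {"Runner A": 0.90, "Runner B": 0.82, "Runner C": 0.71}
pop_times = {"Catcher X": 1.88, "Catcher Y": 2.02}

runner_rank = rank_label("Runner B", runner_pcts, higher_is_better=True)
catcher_rank = rank_label("Catcher Y", pop_times, higher_is_better=False)
grade = matchup_grade(0.84)
```

Under these assumed cutoffs, a predicted probability of 0.84 falls in the good bucket, which matches the grade shown for the 84% prediction in the example tweet.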
Want to see the model in action live during MLB games? Follow @StealSignal on X.