Segregation-adjusted WAR leaderboard
Posted: Tue May 10, 2022 1:14 pm
Got a little project to share that isn't directly related to Strat but this community might find it interesting.
We now know enough about the Negro Leagues that people smarter than me can project how those players would have performed had they been allowed to compete in organized ball. Going a step further, it's possible to add those career projections (called Major League Equivalencies, or MLEs) into the AL and NL, subtract the white players they would have replaced, and adjust everyone's replacement level accordingly, thereby modeling a historical integrated MLB. And that's what I've done.
Here is a PDF of the top 500 WAR totals adjusted for segregation, era, and length of schedule. Catchers also get a lower replacement level so that they get fair representation among the top 278 HOF-eligible players* (278 is the number of players currently in the HOF; I will call this set of players the T278 going forward). Each score listed is the player's adjusted WAR total as a percentage of the 278th-best HOF-eligible player's total. So Jake Peavy's score is 82, meaning his adjusted WAR total is 82% of that of the player at the HOF borderline, Buzz Clarkson (whose score is 100).
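For anyone who wants to see the scoring spelled out, here is a minimal sketch of how a score like Peavy's 82 could be computed. The function name, the rounding to whole numbers, and the toy WAR totals are all my own assumptions for illustration (they are not the real adjusted totals), and the toy example uses a borderline rank of 3 instead of 278 to keep it short.

```python
def hof_scores(adjusted_war, borderline_rank=278):
    """Scale adjusted WAR totals so the player at borderline_rank scores 100.

    adjusted_war: dict mapping player name -> adjusted career WAR.
    Rounding to whole numbers is an assumption based on the scores quoted above.
    """
    ranked = sorted(adjusted_war.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < borderline_rank:
        raise ValueError("need at least borderline_rank players")
    borderline_total = ranked[borderline_rank - 1][1]
    return {name: round(100 * war / borderline_total) for name, war in ranked}


# Made-up totals, with the borderline set at the 3rd-best player.
toy_totals = {"Player A": 80.0, "Player B": 50.0, "Player C": 40.0, "Player D": 32.8}
toy_scores = hof_scores(toy_totals, borderline_rank=3)
print(toy_scores)
```

In the toy data, Player C sits at the borderline and scores 100, while Player D's total is 82% of Player C's, which is the same relationship as Peavy to Clarkson.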
Into the weeds...
The schedule adjustment is simple: every player season gets the average of its actual WAR and its WAR per 162 games. The only players who don't receive it are pre-1893 pitchers.
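As a rough sketch of that arithmetic: the version below assumes "WAR per 162 games" means scaling a season's WAR by 162 divided by the length of that season's schedule. That reading, along with the function and parameter names, is my assumption for illustration rather than a spec.

```python
def schedule_adjusted_war(war, schedule_games, is_pre_1893_pitcher=False):
    """Average a season's actual WAR with its WAR prorated to 162 games.

    schedule_games: length of that season's schedule (assumed meaning of
    "per 162 games"). Pre-1893 pitchers are returned unadjusted.
    """
    if is_pre_1893_pitcher:
        return war
    war_per_162 = war * 162 / schedule_games
    return (war + war_per_162) / 2


# A 5.0 WAR season in a 154-game schedule gets bumped up by roughly 0.13 WAR.
print(round(schedule_adjusted_war(5.0, 154), 2))
```

Averaging the actual and prorated figures gives short-schedule seasons half credit for the games they never got to play, instead of full credit or none.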
The era adjustment scheme is based on slicing baseball history into 6 eras:
1 1871-1892 Early
2 1893-1919 Deadball
3 1920-1946 Liveball
4 1947-1968 Integration
5 1969-1992 Expansion
6 1993-2022 Modern
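If you want to bucket player seasons by these eras yourself, a small lookup like the following would do it. The boundaries come straight from the list above; the names `ERA_STARTS`, `ERA_NAMES`, and `era_of` are just illustrative.

```python
from bisect import bisect_right

# First season of each era, in order, taken from the list above.
ERA_STARTS = [1871, 1893, 1920, 1947, 1969, 1993]
ERA_NAMES = ["Early", "Deadball", "Liveball", "Integration", "Expansion", "Modern"]


def era_of(year):
    """Return the era name for a season between 1871 and 2022 inclusive."""
    if not 1871 <= year <= 2022:
        raise ValueError(f"{year} is outside the 1871-2022 range covered here")
    # bisect_right counts how many era start years are <= year.
    return ERA_NAMES[bisect_right(ERA_STARTS, year) - 1]


print(era_of(1892), era_of(1893), era_of(1946), era_of(2022))
```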
The overall goal is to give each post-deadball era equal representation within the T278 by adjusting its replacement level. Here are the figures for each era in terms of % of total HOF-eligible PA taken by T278 players.
era          PA%
1993-2022    0.136
1969-1992    0.138
1947-1968    0.141
1920-1946    0.140
  whites     0.077
1893-1919    0.089
  whites     0.073
1871-1892    0.079
  whites     0.079
As you can see, the liveball era is the hinge that I use to determine what % to target in earlier eras. When I add Negro League MLEs to the liveball era and target the same overall PA% as in later eras, white T278 hitters end up accounting for 7.7% of total PA. Therefore, in earlier eras I attempt to limit white T278 hitters to ~7.7% of PA as a way to adjust for the unearned advantage they received from playing against only other white players. If anything, this adjustment is generous to white players, judging by the dominance of dark-skinned players in the integration era.
Pitching is trickier because you want to reward the heavier individual workloads of earlier eras, so equalizing T278 IP% across eras doesn't work the way equalizing PA% does. Here are the results of the method I landed on:
era          A      B    C     D
1993-2022    0.076  175  0.42  0.181
1969-1992    0.094  205  0.50  0.189
1947-1968    0.095  211  0.51  0.186
1920-1946    0.099  210  0.51  0.196
  whites     0.057  -    -     0.113
1893-1919    0.083  296  0.72  0.116
  whites     0.071  -    -     0.099
1871-1892    0.099  413  1.00  0.099
  whites     0.099  -    -     0.099
A = % of HOF-eligible innings thrown by T278 pitchers
B = average IP/season thrown by T278 pitchers
C = B/413 (average T278 yearly workload as a fraction of the pre-1893 T278 workload)
D = A/C
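To make the column definitions concrete, here is a small sketch that recomputes C and D from the published A and B values. The function and variable names are mine. Because A and B in the table are already rounded, the recomputed D lands within about 0.002 of the published column instead of matching it exactly.

```python
BASELINE_IP = 413  # column B for 1871-1892, from the table above

# (era, A, B, published D), copied from the full-era rows of the table above.
ROWS = [
    ("1993-2022", 0.076, 175, 0.181),
    ("1969-1992", 0.094, 205, 0.189),
    ("1947-1968", 0.095, 211, 0.186),
    ("1920-1946", 0.099, 210, 0.196),
    ("1893-1919", 0.083, 296, 0.116),
    ("1871-1892", 0.099, 413, 0.099),
]


def workload_adjusted_share(a, b, baseline_ip=BASELINE_IP):
    """Return D = A / C, where C = B / baseline_ip."""
    c = b / baseline_ip
    return a / c


for era, a, b, published_d in ROWS:
    d = workload_adjusted_share(a, b)
    print(f"{era}  C={b / BASELINE_IP:.2f}  D={d:.3f}  (table: {published_d:.3f})")
```

Dividing by C is what lets a modern T278 pitcher's smaller share of innings count for as much as an early pitcher's larger one.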
So D is the figure I attempt to equalize, essentially a workload-adjusted IP%. And once again the liveball era is the hinge: I find that after adding in Negro League MLEs, white T278 pitchers get a figure of 11.3% in column D, so that is the figure I target for whites in earlier eras.
Overall, T278 hitters account for 12.7% of total HOF-eligible PA in organized baseball, while T278 pitchers account for 9.0% of total HOF-eligible IP.
If this all sounds complicated, it's actually dead simple. Apart from the schedule adjustment, literally the only adjustments being made are fiddling with replacement levels until the eras (and catchers) are equally represented.
Obviously none of this is meant to be extraordinarily precise. In particular, I think the MLEs for Negro Leaguers are very likely conservative. Rather, the purpose is to give an idea of the shape of what the history might have looked like, especially with regard to the effects of a higher replacement level on pre-integration players. The next step will be to add some kind of bonus for peak performance to get a better idea of HOF-worthiness. But as a simple baseline, I still find the current results highly interesting, and I hope you do as well. Here again is the PDF of the Top 500.
*edit 6/10/22: I revised my catcher adjustment after finding an error in my original calculations. The new method is to simply increase catchers' positional adjustment until T278 catchers have more appearances than one of the infield or outfield positions. So, second base is the next most scarce position in the T278 in terms of appearances, and catchers get adjusted up until games caught exceed 2B appearances.