More on SOM statistical inaccuracy

Posted:
Mon Jun 02, 2008 2:08 pm
by JOHNEIGENAUER
Since my last post, in which I asserted that ballpark home runs produced inaccurate statistics, I have done a little research into lefty-righty splits, which I said I suspected were also a source of error.
Purely at random, I selected Mike Fiore.
In 1969, Mike Fiore hit .276 vs. RHP and .264 vs. LHP. Obviously, these numbers are close enough to suggest that Fiore’s card averages vs. LHP and vs. RHP would be similar. Diamond Dope (or simple calculation) reveals that Fiore’s card average is .338 vs. RHP and .166 vs. LHP.
The league average of LH batters vs. LHP was .224.
The league average of LH batters vs. RHP was .261.
Given the way SOM calculates card averages, these two league averages require that Fiore’s card average vs. LHP actually be higher than his card average vs. RHP.
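The step from league averages to card averages can be sketched under a deliberately simplified model. This is an assumption, not SOM's disclosed method: a batter's real average is treated as the plain mean of his card average and the pitcher side, with the league average of LH batters vs. that pitcher hand standing in for the pitcher side. Solving for the card gives card ≈ 2 × actual − league. The function name `implied_card_avg` is hypothetical.

```python
def implied_card_avg(actual_avg: float, league_avg: float) -> float:
    """Card average implied by a simple 50/50 blend model (an assumption).

    Assumes actual ~= (card + league) / 2, where league_avg is a stand-in for
    the pitcher-card side. Ignores walks, park effects, and plate-appearance
    weighting, so treat the result as a rough estimate only.
    """
    return 2 * actual_avg - league_avg


# Mike Fiore, 1969, using the figures quoted in the post
vs_lhp = implied_card_avg(0.264, 0.224)  # league avg of LH batters vs. LHP
vs_rhp = implied_card_avg(0.276, 0.261)  # league avg of LH batters vs. RHP

print(f"implied card avg vs. LHP: {vs_lhp:.3f}")
print(f"implied card avg vs. RHP: {vs_rhp:.3f}")
print("vs. LHP higher than vs. RHP:", vs_lhp > vs_rhp)
```

Under this simplified model the implied card averages come out near .304 vs. LHP and .291 vs. RHP, the direction the post describes, whereas the card figures quoted are .166 vs. LHP and .338 vs. RHP.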
Moreover, Fiore has 17 GBA points (17/216 ≈ 7.9%). Ignoring GBAs on the fielding chart, this means that Fiore has AT LEAST a 7.9% chance of grounding into a double play vs. LHP when there is a runner to be forced. Mike Fiore grounded into ZERO double plays vs. LHP in real life.
To pick another similar player: Ken Boswell hit .290 vs. RHP and .221 vs. LHP. Boswell’s card average vs. RHP is .339; vs. LHP it is .167.
Boswell grounded into one DP vs. LHP in 62 ABs; he has 31 GBA points on his vs. LHP card (14.4%).
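The GBA percentages for Fiore and Boswell treat card points as a share of 216, the denominator used in the post. A minimal sketch of that arithmetic follows, plus a hypothetical extension: if each force-play opportunity independently carries that chance of a GBA, the probability of getting through a whole split without one is (1 − p)^n. The opportunity count of 20 is invented for illustration; the post does not give either player's actual number of DP situations vs. LHP, and the helper names are hypothetical.

```python
DENOMINATOR = 216  # denominator used in the post for card points


def card_share(points: int, denominator: int = DENOMINATOR) -> float:
    """Share of all outcomes represented by a given number of card points."""
    return points / denominator


def prob_no_event(p: float, opportunities: int) -> float:
    """Chance of zero occurrences in independent trials, each with probability p."""
    return (1 - p) ** opportunities


fiore_gba = card_share(17)    # Fiore vs. LHP
boswell_gba = card_share(31)  # Boswell vs. LHP
print(f"Fiore GBA share:   {fiore_gba:.2%}")
print(f"Boswell GBA share: {boswell_gba:.2%}")

# Hypothetical: 20 force-play opportunities vs. LHP (not a real figure);
# counts only the batter-card GBA chances, ignoring the fielding chart
print(f"P(no GBA in 20 chances): {prob_no_event(fiore_gba, 20):.2f}")
```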
Boswell struck out 10 times in 62 ABs vs. LHP and 37 times in 300 ABs vs. RHP: 16.1% vs. 12.3%. In SOM, Boswell has 41 strikeout points vs. LHP and 5 vs. RHP. In real life, Boswell struck out about 31% more often against lefties than against righties; on his card, he has more than eight times as many strikeout points vs. LHP as vs. RHP.
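The strikeout comparison is plain arithmetic; here is a quick sketch using the figures from the post (the helper name `rate` is hypothetical). The card-point ratio describes only the batter-card side of each matchup.

```python
def rate(events: int, at_bats: int) -> float:
    """Events per at-bat."""
    return events / at_bats


# Ken Boswell, 1969, real-life strikeouts quoted in the post
k_vs_lhp = rate(10, 62)
k_vs_rhp = rate(37, 300)
real_ratio = k_vs_lhp / k_vs_rhp

# Strikeout points on his card, as quoted in the post
card_ratio = 41 / 5

print(f"real K rate vs. LHP: {k_vs_lhp:.1%}, vs. RHP: {k_vs_rhp:.1%}")
print(f"real-life ratio (LHP/RHP): {real_ratio:.2f}")
print(f"card-point ratio (LHP/RHP): {card_ratio:.1f}")
```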
The DP points are actually easy to verify: I picked a dozen or so players and SOM’s card info was wrong in every case.
I can explain the incorrect strikeout and DP values by assuming that SOM did not have these data available when making the cards; in other words, they guessed (or made estimates, however you want to say it).
However, I cannot explain the errors in BA: those data have been available for decades and decades.
Here is another example: Tommie Agee in real life hit .270 vs. LHP and .273 vs. RHP. His card average vs. LHP is .257 and vs. RHP is .320.
Any guesses?
John E.
Re: More on SOM statistical inaccuracy

Posted:
Mon Jun 02, 2008 2:19 pm
by LMBombers
jeigenauer wrote: "Here is another example: Tommie Agee in real life hit .270 vs. LHP and .273 vs. RHP. His card average vs. LHP is .257 and vs. RHP is .320."
I see some of your points; however, I don't see anything wrong with Agee's BA above. Remember that only half of a player's stats come from his own card. Agee bats RH. His card BA vs. LHP is slightly lower than his actual BA because a LH pitcher's card will generally be slightly worse vs. RH batters, which pushes his combined result back up. The opposite is true vs. RHP: RHP will generally be harder on RH batters, which brings his BA back down to what he actually hit vs. RHP.
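The "only half comes from his own card" point can be illustrated with a small simulation, assuming the standard SOM mechanic in which a six-sided die sends the roll to the batter's card on 1 to 3 and to the pitcher's card on 4 to 6. Each card is collapsed to a single hit probability (a simplification that ignores walks and splits), and the .320/.230 figures are hypothetical, not Agee's actual cards; `simulate_hit_rate` is an illustrative name.

```python
import random


def simulate_hit_rate(batter_card_p: float, pitcher_card_p: float,
                      trials: int, seed: int = 0) -> float:
    """Monte Carlo hit rate when a d6 picks which card is read.

    Assumes 1-3 reads the batter's card and 4-6 the pitcher's card, and that
    each card reduces to a single hit probability (a simplification).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        column_die = rng.randint(1, 6)  # which card gets read
        p = batter_card_p if column_die <= 3 else pitcher_card_p
        if rng.random() < p:
            hits += 1
    return hits / trials


# Hypothetical figures: a .320 batter card facing pitcher cards that allow .230
estimate = simulate_hit_rate(0.320, 0.230, trials=200_000, seed=42)
print(f"simulated: {estimate:.3f}, simple average: {(0.320 + 0.230) / 2:.3f}")
```

With an even split, the simulated rate lands near the simple average of the two cards, which is why a card average can sit well above or below a real-life figure and still reproduce it once the pitcher side is included.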
A hitter's card should not exactly reflect what he did in real life unless you believe that all the pitchers he might face will also give him the same results as his card does.

Posted:
Mon Jun 02, 2008 4:33 pm
by ugrant
Taking LMBombers' reply a bit further, I don't think SOM has ever disclosed their formulation for recreating a player's stats. A lot of inferences can be made from mathematically calculating a player's card, but that's about it.
It's possible SOM attempts to recreate a player's stats by basing their formulation upon a player facing the exact pitchers in the same stadiums. More likely SOM uses an "average" of pitchers from a player's division or perhaps league. Without knowing what basis for the opposing pitching SOM is using to "balance" a player's statistics against his own card, no real conclusion can be drawn by using only 1/2 the outcome criteria.
Along those lines, I took a few minutes to check jeigenauer's BA calculations on Fiore's card. I found his results "ballpark," meaning the BA results will vary quite a bit depending on which ballpark Fiore is assumed to be playing in.
One other item pertinent to the discussion here is that the "cards" being used in the 69 set here on TSN do not match the cards published by SOM for the 69 season, in either the head-to-head dice (originally one-sided) or CD (two-sided) versions. I think this variance is unique to the TSN universe of SOM (emphasis on "think"; anyone else know?).
Point by point

Posted:
Mon Jun 02, 2008 5:51 pm
by JOHNEIGENAUER
First, a reply to the statement, “I don't think SOM has ever disclosed their formulation for recreating a player's stats.” That’s true. But I broke their code more than 30 years ago when I was in high school. I was even able to predict what player cards would look like with precision.
Next, regarding the speculation about what SOM takes into account: they do not base player statistics “upon a player facing the exact pitchers in the same stadiums.” Nor do they use “an average of pitchers from a player's division or… league”. This was true right up to the start of interleague play; I do not know how their formulation has changed since then.
Next, it is true that “Without knowing what basis for the opposing pitching SOM is using to ‘balance’ a player’s statistics against his own card, no real conclusion can be drawn by using only 1/2 the outcome criteria.” This is why I did not base my conclusions on only half of the outcome criteria. You will notice, for example, that I mentioned league batting averages.
Next, regarding checking my findings and concluding that they were only “ballpark”: that is entirely untrue. I used the “ignore ballpark effects” selection in Diamond Dope. This was not an estimate; it was a conscious decision to minimize variables to get an initial understanding.
Regarding the statement that the TSN cards do not match the cards SOM published: that is correct. I have the 1969 cards here in my hand, and the lefty/righty cards do not match the TSN versions; in fact, they do not have ballpark homers. I cannot see what this has to do with my study.
Regarding the reply about Agee’s BA: first, it should be beyond obvious that I understand that only half of a player’s offensive opportunities come from his card. Second, your statements about “generally” etc. have no significance: what we are discussing is a formulaic expression of probabilities. Evidently, you skipped over the sections on Boswell and Fiore, which explained that league batting averages must be taken into account.
I will get back to Agee when I have more time.
DP

Posted:
Mon Jun 02, 2008 5:52 pm
by JOHNEIGENAUER
In your criticism, why did you skip the fact that Fiore grounded into ZERO DPs and yet his card is loaded with DPs? Why did you skip commenting on the Boswell strikeout observation?

Posted:
Mon Jun 02, 2008 8:00 pm
by ugrant
My reply wasn't intended as a criticism; it was merely an extraction from the stuff you posted. Your methodology wasn't apparent, so I posted what I did and how what I did could be changed by ballpark effects. That was what I meant by "ball park," although in retrospect I guess it could be read as an offhand criticism of your work (which was not intended; my apologies).
If you've figured out how SOM makes their cards, please share. A simpleton like myself has no idea how to predict a future SOM card since I have no idea how to figure out pitcher statistical effects. Since you "broke the code" you evidently do.
Lastly, take it easy, jeigenauer. No offense or criticism was intended.
Re: More on SOM statistical inaccuracy

Posted:
Mon Jun 02, 2008 8:33 pm
by LMBombers
jeigenauer wrote: "Any guesses?"
Take it easy, jeigenauer. You posed this question, and we are giving you our best guesses.

Posted:
Wed Jun 04, 2008 5:59 pm
by pwootten
I used to create my own card - I was Brooks Robinson's replacement at 3B. Man, I kicked butt. Forget those 40/40 guys. I was a 60/60 guy. I'd cost $19M in SOM today. :D
Paul

Posted:
Mon Jun 09, 2008 6:28 pm
by Rob55
Cruising through here, I saw this thread, which is interesting. Since I am coming in the middle of the discussion and don't know what I might have missed, I almost hate to jump in, but I'm not shy, so I will. If I mess up, call me an idiot or something; I won't be offended. And for the guy getting upset so easily: please take my posts with a grain of salt and don't think I am woofin' on your work.
The 69 set as it was originally made is different from the cards here, as you have noted. The old set isn't super advanced and so doesn't have all the good stuff on the newer cards, so SOM changes them for TSN to use here. They are STILL not super advanced cards per se, but modified cards that try to take into account ballpark HRs and singles and all the "extras" on the pitchers' cards, etc. They are NOT accurate, but they are "slightly" improved from the old set (well, maybe). However, some of the things pointed out above are inexcusable, and I agree with some of what was said above.
Now we hit my "sore" spot: Diamond Dope. I HATE seeing it used as gospel. The BIG thing I saw was that one guy said he USED THE IGNORE BALLPARK EFFECTS option. Sigh. OK, I assume that since you know how to read the cards, you can count a card on your own, so count a couple of cards and compare them to Diamond Dope; you will find DD's numbers are wrong. When you check "ignore ballpark effects," it will STILL count ballpark singles and HRs. Check for yourself; don't take my word for it. That's why I hate to see DD cited as a reference sometimes: it CAN be misleading. Adrian does do great work, though, and what he has there is accurate "IF" you know what he did. The "ignore ballpark effects" option does not ignore the > or # signs.
Now, the original 69 set is set up so you can replay the season and get roughly the same results. When the extra stuff was added for the TSN version, the accuracy kind of went out the window, and YOUR results will not be as good because of the way the cards are used here at TSN.
Anyhow, you ARE correct: they are NOT statistically accurate here. It would take a study, and the whole season would have to be reworked to get the left/right splits right (and other things, of course). One thing to keep in mind also: if a guy only has, say, 150 ABs and 1 double play in those ABs, there may have to be several on his card to ensure he actually hits into a DP in 150 ABs. This is how some of those unbalanced, awesome cards happen: a guy hits 15 doubles in 100 PAs, and his card is gonna have a MESS of doubles on it.
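The small-sample point can be sketched with the same half-and-half idea. Assuming a 108-chance batter card (half of the 216 total used earlier in the thread) read on half of all plate appearances, and a hypothetical pitcher-side rate for the event, the batter card must carry roughly 108 × (2r − pitcher_rate) chances to reproduce a real rate r. The 15-doubles-in-100-PA figure comes from the post; the 3% pitcher-side doubles rate is an invented placeholder, and `chances_needed` is an illustrative name.

```python
CARD_CHANCES = 108  # assumed chances on one batter card (3 columns x 36 two-dice rolls)


def chances_needed(real_rate: float, pitcher_rate: float,
                   card_chances: int = CARD_CHANCES) -> float:
    """Batter-card chances needed to reproduce real_rate under a 50/50 blend.

    Solves real_rate = 0.5 * chances / card_chances + 0.5 * pitcher_rate.
    A negative result means the pitcher side alone already exceeds the target.
    """
    return card_chances * (2 * real_rate - pitcher_rate)


# 15 doubles in 100 PA (from the post) vs. a hypothetical 3% pitcher-side rate
needed = chances_needed(15 / 100, 0.03)
print(f"doubles chances needed: {needed:.1f} of {CARD_CHANCES}")
```

Whatever rate goes into the card applies to every plate appearance in which that card is read, so a small-sample spike carries over at full strength.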
Now, I KNOW Fiore doesn't fall into this category, and once again let me say I agree: it's NOT right.
Hope some of that made sense..... :D

Posted:
Mon Jun 09, 2008 6:44 pm
by Rob55
A PRIME example of old, modified, and correct cards: look at Hank Aaron's original 1957 SOM card,
then look at the card SOM made for TSN to use here in ATG1 and 2
http://fantasygames.sportingnews.com/baseball/stratomatic/atg2/league/player.html?player_id=30000
then look at the deluxe season SOM card that we use in ATG3
http://fantasygames.sportingnews.com/stratomatic/league/player.html?player_id=30000&year=1957
They don't look anything alike; instead of E he is 3L, etc.