This Has My Attention: New Questions

Thu Feb 09, 2023 4:43 pm

Very interesting stuff, J-Pav, even though it does seem to be describing the opposite of what you and I are experiencing. In the link, the positive run diff team is consistently OVER-performing pythag. Still, it's a good example of how pythag may not work as well for sims as real life, and maybe it will even give someone smarter than me an idea why the results in Strat (for you and me at least) would be flipped from what the author found.

Thu Feb 09, 2023 4:56 pm

From the link you attached:

Nope. No matter which exponent we choose, we find the same result: the Pythagorean formula underpredicts the number of wins by the higher-scoring team; for a fixed difference of 1 RPG, the formula does better when both teams score relatively few runs.

Which is the same thing that was posted in the original analysis of MLB expected runs, which EGRICH said is not true.

But read it carefully: the "higher-scoring team" is the winning team, that is, the team that wins more games than it loses on average because it scores more runs per game than its opponents do. Our game is going to throw a wrinkle into the formula, in that the highest-scoring team can often be awful. As noted, the run distributions didn't match the distribution the formula needs to be accurate, but it still gave a result "in the ballpark".

Thu Feb 09, 2023 5:17 pm

Using this team as an example:

10th in the league in scoring:
https://365.strat-o-matic.com/league/stats/teams/463291

7 games below their pythag projection:
https://365.strat-o-matic.com/league/expanded/463291

So this is an example of a LOW SCORING Team failing to meet their pythag projection.

Thu Feb 09, 2023 6:20 pm

In the discussion in the paper if you read it, the higher scoring team is not the highest scoring team, it is the higher by means of the run differential. In your team you scored 4.51 runs per game, your opponents scored 4.03 runs per game. Your team is the higher scoring team on average versus your opponents by .48 runs per game. The higher this number gets the more likely to be under the expected runs you will get. As runs scored goes up the run differential frequently will go up but nowhere is the formula based on just winning teams runs scored.
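For reference, the arithmetic in this post can be sketched in a few lines of Python. This is only a sketch: the thread does not say which exponent the site uses, so the classic exponent of 2 and the commonly used 1.83 are both assumptions here, as is the 162-game schedule.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Per-game figures quoted in the post above.
RS_PER_GAME = 4.51
RA_PER_GAME = 4.03
GAMES = 162  # assumption: a full 162-game season

# "Higher scoring" here means higher relative to your own opponents.
run_diff_per_game = RS_PER_GAME - RA_PER_GAME

for exponent in (2.0, 1.83):  # assumed exponents, not confirmed by the site
    pct = pythag_win_pct(RS_PER_GAME, RA_PER_GAME, exponent)
    print(f"exponent {exponent}: expected win pct {pct:.3f}, "
          f"about {pct * GAMES:.1f} wins")
```

Note that only the team's own runs scored and runs allowed go in. League scoring rank never appears in the formula.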

Thu Feb 09, 2023 6:37 pm

gkhd11a wrote:In the discussion in the paper if you read it, the higher scoring team is not the highest scoring team, it is the higher by means of the run differential. In your team you scored 4.51 runs per game, your opponents scored 4.03 runs per game. Your team is the higher scoring team on average versus your opponents by .48 runs per game. The higher this number gets the more likely to be under the expected runs you will get. As runs scored goes up the run differential frequently will go up but nowhere is the formula based on just winning teams runs scored.

AHHHHHHHH ... Gotcha

Thu Feb 09, 2023 7:49 pm

bkeat23 wrote:
FrankieT wrote:I couldn't resist this. Anyone ever an Isaac Asimov fan? Well, this reminded me of Hari S

“Scientific method, hell! No wonder the Galaxy was going to pot.” - Seldon.
...as it pertains to the hard core math part of this thread.

nice!

Thu Feb 09, 2023 8:07 pm

By the way--this phenomenon regarding the applicability of the formula for different environments/factors is quite expected.
Why?

Think of it this way. If I am calculating the time it will take a baseball to get from my hand to yours when I throw it to you at 25 feet away, it is a simple equation involving the distance and the speed of the ball in the direction of motion. It will almost exactly be time = distance/speed.

But what if we threw that ball 2000 miles? Well, the "exact" (time = distance/speed) model doesn't work. It is not valid in that larger domain. We have to account for gravity, curvature of the earth, 3-d velocity, friction, the ball shape's imperfections and aerodynamics, the atmospheric conditions, angle and height of insertion...on and on. The model doesn't apply in that domain. And we came up with distance/speed by ignoring most of the terms in the real correct equation, because we can't solve it.

Because models are approximations. You take a closed-form solution that is meant to apply to infinite cases, and you can't solve the infinite case problem. So you put boundaries and parameterize it. As Charlie said--you make it "ballpark" for your application. Whether it is the momentum equations or bridge building or the pythag theorem. We cannot solve real-world problems with our mathematics for all domains. We must choose the domain and ignore the terms we say are small (but not zero). There is NO exception to this for real world problems.

So, we specify a domain where it works well enough, coefficients, etc and leave out the difficult but small effects. But when we do that we also limit where that model provides ballpark answers.
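The throwing example can be put in numbers with a rough sketch. It compares time = distance/speed against a model that adds back just one of the ignored terms, a linear air-drag term. The throw speed and drag constant are made-up illustration values, and the drag model itself still ignores gravity and everything else on the list.

```python
import math


def simple_time(distance_ft, speed_fps):
    """The short-range model: time = distance / speed."""
    return distance_ft / speed_fps


def drag_time(distance_ft, speed_fps, k):
    """Travel time with linear drag, dv/dt = -k * v.

    Position is x(t) = (v0 / k) * (1 - exp(-k * t)), so the ball can never
    travel farther than v0 / k. Beyond that the time is infinite.
    """
    reach = speed_fps / k
    if distance_ft >= reach:
        return math.inf
    return -math.log(1.0 - distance_ft / reach) / k


SPEED = 100.0  # ft/s, hypothetical throw speed
K = 0.1        # 1/s, hypothetical drag constant

for distance in (25, 250, 900):
    t_simple = simple_time(distance, SPEED)
    t_drag = drag_time(distance, SPEED, K)
    error_pct = 100.0 * (t_drag - t_simple) / t_drag
    print(f"{distance:>4} ft: simple {t_simple:.2f} s, with drag {t_drag:.2f} s, "
          f"simple model is off by {error_pct:.0f}%")
```

At 25 feet the two models agree to within a couple of percent. As the distance grows, the term we threw away stops being small and the simple model drifts further off.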

Thu Feb 09, 2023 8:22 pm

Another thing. In SOM, why would we expect pythag to be as predictive as it is in MLB when we have highly disparate team strengths?

If you are in a division with at least one really crappy team, why would someone think that the +120 runs they gained from beating the snot out of some guy with a 10% lower salary cap or terrible construction, should equate to a crapload of wins against the rest of the league? And there is almost ALWAYS a crappy team.

Pythag is ballpark in MLB because the teams are generally very close competitively. The average margin of talent is much less than a typical SOM league almost every time.

Domain problem.
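A toy season shows the effect being described. Every number below is invented for illustration: a team that beats up on one weak opponent and plays exactly .500 in one-run games against everyone else. The exponent of 2 is also an assumption.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical season as (games, runs_for, runs_against) per game.
season = [
    (18, 10, 3),  # blowout wins against the one weak team
    (72, 4, 3),   # one-run wins against the rest of the league
    (72, 3, 4),   # one-run losses against the rest of the league
]

games = sum(g for g, _, _ in season)
runs_scored = sum(g * rf for g, rf, _ in season)
runs_allowed = sum(g * ra for g, _, ra in season)
actual_wins = sum(g for g, rf, ra in season if rf > ra)
expected_wins = pythag_win_pct(runs_scored, runs_allowed) * games

print(f"run diff: {runs_scored - runs_allowed:+d}")
print(f"pythag wins: {expected_wins:.1f}, actual wins: {actual_wins}")
```

In this made-up case the +126 run differential comes entirely from the 18 blowouts, so pythag projects roughly 97 wins while the team actually wins 90.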

Thu Feb 09, 2023 8:25 pm

I would challenge someone here to explain in plain language, what is the underlying theory (and it is a theory) for why it is claimed to be a predictive model? (and is it? or is it a descriptive one?--there is a difference)

From first principles. Then pull out the terms that we are ignoring. Identify the domain and the reasons for its selections.

Go.

And if we can't, then why would we pull it out as some kind of rain stick?

Thu Feb 09, 2023 9:04 pm

Max:

I think I can explain it, but it is so counterintuitive because it seems to be saying run diff doesn’t even matter, when everything about team building is maximizing run diff.

If I understand this correctly, what the author is saying is that the higher run scoring team will always have the mathematical edge in a head to head matchup.

So if we look at this league:

https://365.strat-o-matic.com/league/462980

When I played Confluence, the worst run SCORING team, my predicted edge was to win 68% of the time. But because they are the best runs allowed team, I won only 58% of the time in head to head. Against the second best pitching team, I was predicted to win 62% of the time based on runs scored, but I won only 22% of our nine games. So there’s five of the ten under-pythag wins right there.

What I think is happening is that all these games we’re mathematically predicted to win based on RUN SCORING, we actually underperform the predictions because run scoring alone overestimates your wins to begin with.

I think.
