The importance of ballpark: more to it than BPs

Dooo-Doo question and formula stuff (like a baby eating)

Postby cummings2 » Mon Oct 10, 2005 12:21 am

Where does the 1.83 come from in the formula? I'll be trying it with some of my teams now to find the P-Record... fun stuff! Thanks.

As far as the formula for calculating OAVG, yep... it has proven to be quite a challenge, and there are always numbers that throw the accuracy off, especially numbers that come from starts in visiting ballparks. However, over the course of an entire season these numbers are starting to show some consistency, and the differences can be attributed to home BP ratings.

Now, I am quite the newbie here: I have only played 10 teams, and only 4 have finished their seasons. I started toying with this formula for my 3rd team, and since then my pitching has been fairly O.K. Most importantly, I am gathering data to help me decipher this formula. I understand that even with the most accurate and complex formula there will always be an aleatory factor to consider (such is the nature of the game), but my reasoning is that if I can get to a point where I can generate some numbers vs. L/R hitters and have an idea of how much the BP rating will affect them, then it will be easier for me to find the best bargains and the best fits given home stadium, divisional ballparks and lineups, and to know when a pitcher is struggling and why, whether it is due to visiting BPR, opposing lineups, or the roll of the dice. Obviously, once I gather the OAVG I still have to refer to the card to see the nature of the hits...

So far I am finding numbers within a small and acceptable margin of error. Some odd number jumps out of nowhere and throws everything off every now and then, but the differences can be easily tracked to BP ratings. HOWEVER, I am and have been playing with fairly similar, good, solid defenses to get some statistical consistency, and here is where I need help: I am not as familiar with Strat as some of you guys are, so I want to run these numbers by someone who I am sure will just glance at them and point out something I am missing.

Anyway... maybe I should just continue running numbers and, once I feel more comfy and find even more statistical consistency, write it up in a post.

Thanks for the Pythagorean explanation Luckyman. As always, all best of luck.
cummings2
 
Posts: 55
Joined: Tue Jul 03, 2012 2:34 pm

Double Checking Pythagoras

Postby cummings2 » Tue Oct 11, 2005 3:06 am

Just to make sure I understand the pythagorean record:

A team after 108 games has a record of 56-52, having scored 526 runs and allowed 493 runs.

therefore:

526^1.83 / (526^1.83 + 493^1.83)

or

95368.2716 / (95368.2716 + 84705.1590) = 0.5296076

So essentially this means that I should have a .529 winning percentage, which over 108 games equates to a record of 57-51.

Right?
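
For anyone who wants to check the arithmetic above without a calculator, here is a minimal sketch in Python. The function names are made up for illustration, and rounding the expected wins to the nearest whole game is an assumption about how to turn the percentage into a record.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=1.83):
    """Expected winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_record(runs_scored, runs_allowed, games, exponent=1.83):
    """Expected (wins, losses), rounding wins to the nearest game (assumption)."""
    wins = round(pythagorean_pct(runs_scored, runs_allowed, exponent) * games)
    return wins, games - wins


pct = pythagorean_pct(526, 493)
wins, losses = expected_record(526, 493, 108)
print(f"{pct:.3f} -> {wins}-{losses}")
```

With the numbers from this post it gives the same 57-51 expected record, one win better than the actual 56-52.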
cummings2
 

Postby childsmwc » Tue Oct 11, 2005 3:54 pm

Luckyman,

Interesting points as always, and while I agree in principle with the argument, the real question is whether the difference between two parks is significant enough to change the overall linear values of each event.

The typical example to highlight what you discussed would be to apply sabermetrics to a slow pitch softball game. In a softball game, if you score from first, say, 75% of the time, then the value of a walk or a single is not significantly different from that of a HR.

In baseball, scoring from first is much more difficult and thus you get a range of incremental RC values for each event from the BB to the HR.

In your example Coors is the proverbial Softball environment while Petco would be the hard to score baseball environment.

I agree that the analogy works, but I would question any "arbitrary" adjustment you might make for ballparks to counter this in an RC model. It would suggest that the all-power, no-OBP guys that are "bargains" in Coors might not be quite as valuable as people believe, because their low OBP costs them more Runs Created in a hitter's park (and conversely it would explain why Bonds becomes even more valuable in a pitcher's park).

I am just not certain that the ballpark makes a significant enough difference to the overall expected value of outcomes to justify modifying my runs created formula. But now I may have to go see if I can figure it out mathematically :P .

Bbrool
childsmwc
 

Postby MARCPELLETIER » Wed Oct 12, 2005 12:04 am

Bbrool,

All legitimate points, and to be honest, I don't know how the walk/hit ratio mathematically works out with regard to stadium, but consider this:

When I set walks to be worth 75% of the value of hits, limiting myself to neutral stadiums, Wilkerson appears as a superbargain while Crawford appears as a neutral value. When I lower the value of walks to 55% of hits, Crawford's absolute value doesn't change (I am only modifying the weight for walks, so Crawford, who has no walks vs. RHP and so few vs. LHP, keeps the same absolute value, but relative to the crowd he obviously becomes much better), while Wilkerson becomes a bust (his value becomes lower than his price tag). At 65%, both are relatively similar in terms of value. This is to point out that the walk/hit ratio doesn't need to change that much in order to have solid effects on whom you should be interested in when it comes to drafting.

Also, don't forget that the line-up position also has a huge impact on the walk/hit weight ratio. It will always be true that you need on-base at the top of the line-up regardless of the stadium, and hence I would have no hesitation in recommending a strong "walker" leading off for a Petco team (in fact, I think that the most effective teams (at least with DH) should have a lead-off who draws a lot of walks... having a "hitter" like Suzuki at the top of the line-up costs too much for the relative value of hits with regard to walks for the lead-off spot. That is why I have Jimenez leading off on my team, a player who draws a lot of walks). I would say that the most critical spots for hitters are 2 through 6.

Thus, to come back to your concern about how this information could apply to our RC formulas, I will answer you with a deep philosophical view :P :

The biggest shortcoming of sabermetricians, by far, has been to jump from overall, average analyses to specific conclusions without analyzing the particular situations to which these conclusions apply. When analysts condemned bunting, they based their arguments on overall analysis and applied their recommendations to all situations. But in real-life baseball, every situation is different from every other. It might be a bad idea to have Jeter bunt against T. Lilly in the first inning, but it is perhaps a good idea to have Rey Sanchez bunt against M. Rivera when trailing the Yankees by one run. Had the analysis been done situation by situation, I believe we would have found that there is not so much difference between traditional baseball and sabermetric analysis (at least, the differences would appear, I believe, smaller). Thus, the next step for sabermetricians is to develop analyses that will tell people the best strategies in particular situations, to make the step from overall analysis to real-life situations. In the best of worlds, to pitch-by-pitch situations.

We have to do the same with our formulas. We (or at least, I) have used the same RC formula for all players regardless of the line-up position and the stadium (in my case, with walks being equal to 73% of the value of singles), but this is clearly wrong. We KNOW that, for lead-offs, the percentage is higher than this, while for the #3 spot, the percentage is lower, because the number of times they face empty bases differs radically. By how much? I still can't tell, but if one thinks that lead-offs face empty bases in 62% of their at-bats, while hitters at the #3 spot face empty bases in 50% of their at-bats, it shouldn't be too hard to figure this out. If we find out that, for lead-offs, the walk-single weight ratio is closer to 80%, and, for the #3 spot, closer to 60%, then I believe this will provide new insights about who are the bargains for the lead-off spot and for the #3 spot.
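
To show the kind of calculation this would involve, here is a rough sketch in Python. It rests on one solid fact and several assumptions. The fact: with the bases empty, a walk and a single leave the same situation, so their ratio there is 1.0. The assumptions: the ratio with runners on is a single constant, the 73% overall figure corresponds to a league-wide empty-bases rate of 55% (a made-up number, not measured), and blending the two ratios linearly is good enough (strictly, one should blend run values, not ratios). The function names are hypothetical.

```python
def implied_ratio_with_runners_on(overall_ratio, overall_p_empty):
    """Back out the walk/single ratio with runners on from the overall ratio.

    Assumes overall = p_empty * 1.0 + (1 - p_empty) * ratio_on.
    """
    return (overall_ratio - overall_p_empty) / (1.0 - overall_p_empty)


def walk_to_single_ratio(p_empty, ratio_on):
    """Blend: the ratio is 1.0 with bases empty, ratio_on otherwise."""
    return p_empty * 1.0 + (1.0 - p_empty) * ratio_on


# 0.73 is the overall weight quoted above; 0.55 is an ASSUMED league-wide
# share of plate appearances with the bases empty.
ratio_on = implied_ratio_with_runners_on(0.73, 0.55)

leadoff = walk_to_single_ratio(0.62, ratio_on)     # 62% empty bases
third_spot = walk_to_single_ratio(0.50, ratio_on)  # 50% empty bases
print(f"lead-off: {leadoff:.3f}, #3 spot: {third_spot:.3f}")
```

Under these assumptions the gap between the two slots is smaller than the 80% vs. 60% guessed at above, which is exactly why logged data would be more trustworthy than a back-of-the-envelope model like this one.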
MARCPELLETIER
 

Postby childsmwc » Wed Oct 12, 2005 11:43 am

Luckyman,

I started thinking that we could probably, in a few nights, do the work required to build Strat sabermetrics for a standard $80 mil DH league. I think if each of us documented every PA for, say, two seasons (using an Excel template that I have sitting in my mind), that would be a large enough database to extrapolate RC values for various events and slots in the order. It could also be sorted by ballpark to see how much events change between a Coors and, say, a Petco.

For that matter we could each do just one league if we could get another 4 or 5 owners to help us log data.

If we did this over a couple of 200X years, we would be able to tell whether these numbers remain constant over the years (and thus we don't have to do the work each season) or whether they move significantly enough to require reviewing the data each season.

For those who are unfamiliar with the process I am describing for documenting PAs, let me explain.

We are trying to determine the number of expected runs that occur as a result of each event in baseball. In baseball there are actually a very limited number of states that can occur. A batter can come up with only 0, 1, or 2 outs. In each of those three cases the batter can only come up with the bases empty or with base runners at 1B, 2B, 3B, 1B & 2B, 1B & 3B, 2B & 3B, or the bases loaded. That makes 3 x 8 = 24 possible states.
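
As a quick illustration of the state count, here is a small Python sketch that enumerates every out/baserunner combination described above. The labels and variable names are just one possible choice.

```python
from itertools import product

# The three possible out counts when a batter comes to the plate.
OUTS = (0, 1, 2)

# The eight possible baserunner configurations.
BASES = ("empty", "1B", "2B", "3B", "1B & 2B", "1B & 3B", "2B & 3B", "loaded")

# Every (outs, bases) pair a batter can face.
STATES = list(product(OUTS, BASES))

print(len(STATES))
for outs, bases in STATES[:3]:
    print(outs, "out(s),", bases)
```

Every logged PA falls into exactly one of these states, which is what makes the data easy to tabulate.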

What we would do is examine the condition that existed at the time the batter came to the plate (i.e. outs and baserunners), log what he did that PA (single, out, dp, hr, etc.), log changes in the baserunners, log runs that scored directly from his PA, and lastly log the total runs that scored that inning after the batter's PA. You would also log the place in the order for each batter, as well as the ballpark where the game was being played and the handedness of the pitcher and batter.

From all of this data you can determine the expected runs each event in baseball contributes. You would also be able to determine ballpark effects for each event, (ie. if runs are easier to score in Coors, then just getting on base is worth more in that ball park, but how much more). Also you would be able to see based on what batting order slot you were trying to fill how often each slot has runners on etc.
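
Here is a minimal sketch in Python of how such a log could be turned into expected runs per state. The record fields mirror the items listed above, but the field names, the three sample rows, and the choice to count runs on the play plus runs later in the inning are all assumptions made for illustration, not real league data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    outs: int          # outs when the batter came up (0, 1 or 2)
    bases: str         # e.g. "empty", "1B", "1B & 2B", "loaded"
    event: str         # what the batter did: "single", "out", "hr", ...
    runs_on_play: int  # runs that scored directly from this PA
    runs_after: int    # runs scored later in the same inning
    lineup_slot: int   # place in the batting order
    ballpark: str


def run_expectancy(log, ballpark=None):
    """Average runs scored from each (outs, bases) state to the end of the inning."""
    totals = defaultdict(lambda: [0, 0])  # state -> [total runs, PA count]
    for pa in log:
        if ballpark is not None and pa.ballpark != ballpark:
            continue
        cell = totals[(pa.outs, pa.bases)]
        cell[0] += pa.runs_on_play + pa.runs_after
        cell[1] += 1
    return {state: runs / n for state, (runs, n) in totals.items()}


# Made-up sample rows, only to show the shape of the data.
sample_log = [
    PlateAppearance(0, "empty", "single", 0, 1, 1, "Coors"),
    PlateAppearance(0, "empty", "out", 0, 0, 1, "Petco"),
    PlateAppearance(1, "2B", "single", 1, 0, 2, "Coors"),
]

table = run_expectancy(sample_log)
coors_only = run_expectancy(sample_log, ballpark="Coors")
print(table)
print(coors_only)
```

The value of an event would then come from comparing the expectancy of the state before the PA with the state after it, plus any runs that scored on the play. Passing a ballpark filters the log so the Coors and Petco tables can be compared side by side.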

I would definitely be willing to share the data with anyone that participated. Whether or not I would go so far as to publicly share all of the results is left to be seen, but in all likelihood Luckyman would incorporate them into his ratings and effectively be posting the results anyways.

Any takers? Tell you what: I will put the template together tonight and log one day's worth of games (i.e. 18 games played per day) and see how long that takes, to get an idea of how much of a commitment we are talking about here.

Bbrool
childsmwc
 

Postby childsmwc » Wed Oct 12, 2005 12:24 pm

As is typical with most of the statistical ventures I undertake, I significantly underestimated the amount of time this data entry would take. However, I might still try this in my spare time, and if anyone else is interested in logging a league I can send a template on and explain some shortcuts I have already figured out for data entry.

Bbrool
childsmwc
 

Postby childsmwc » Wed Oct 12, 2005 12:28 pm

Actually, I just realized that most of the data I am trying to recover is identified in the play-by-play, so a quick copy and parse gives me most of my data. Stay tuned: if I figure out an easy way to do this I will post asking for help.

Bbrool
childsmwc
 

Postby cummings2 » Wed Oct 12, 2005 12:57 pm

I am currently focusing my stat/Strat analysis on understanding pitching, but this is the kind of stuff that's fun to take on. How much time are we talking about per night to log the info?

I don't know if it will make any statistical difference, or if you have already considered this, but if what Play By The Rules wrote in the Homefield Advantage post is right, you'd also have to consider that the home team's avg. is adjusted by about .010 (10 points). It is not a great number, nor dependent on the situation, but theoretically the exact same situation can come up and the result will be slightly different. For example, if two teams playing in Shea are in the same division with the same lineups all season long, it is likely that the same situation will come up several times: same stadium, same pitchers, top of the order, no one on... yet the result will change depending on which team is playing as HOME and which as AWAY. Now, over the course of 162 games the differences will be quite insignificant, but over the course of a 200+ year simulation they will magnify and it'll be clearer that there is a difference. Here's the PBTR quote:


Yes there is a home field advantage...

--------------------------------------------------------------------------------

...programmed into the CD-Rom. It supposedly adjusts batting average ca. 10 points for the home team. I'm not sure how they do it.

Also, wouldn't you have to consider the defense playing? I mean, same situation, same batter, same pitcher: Luis Castillo playing 2B will be different than Trent Durrington playing 2B. There is a chance that the "hit" (say a 2Bx) recorded in the play-by-play comes from the same reading on the pitcher's card each time, yet once it will be recorded as a clean hit, once as an error, and once as a gb out. The point I am trying to make is that the info taken on each hitter's situational hitting will be greatly dependent on the defense he is playing against (at least up the middle), and there will be times when we won't be able to know if a result came from the "hit" challenging the defense or from the hitter's/pitcher's/ballpark's own "merits".

Say you run a simulation of 1000 PA.
Pitcher Schmidt,
Hitter Pierre,
1 Out,
Runner on 2B: Matheny, Sp. 1-9 (the speed on the bases will also have to be factored in to assess sac flies and moving runners over as part of the ABs)

In this particular simulation there are only two things that can happen: if it falls on Schmidt's card the result is a 2B(x), and if it falls on Pierre's card it is a FB(rf)B?

We have Ichiro playing RF and Castillo playing 2B.

Now run the same situation for another 1000 PAs, but this time around it's Ryan Freel standing on 2B, Ruben Sierra is in RF and Durrington is playing 2B.

My feeling is that results will lead to very opposite and mistaken evaluations of Juan Pierre. Just food for thought.



Either way, I guess I will continue focusing my efforts on understanding pitching and defense. I am well along in what I want to take care of; once I am done with this stuff and have gathered enough info to validate it, I will focus on hitting. If you need help then, I'm all for it. Either way it sounds like fun. Good luck with it, Bbrool.
cummings2
 

Postby JayJelinek » Wed Oct 12, 2005 5:11 pm

Bbrool,

Read your posting above on collecting data on offensive contexts (and the values of individual offensive events within that context). I agree with you that the play by play log has this context information, so it's not a matter of recording it, but collecting the recorded info into a large central spreadsheet/database.

To do this requires (in my mind) two tools:

1) A tool to take an individual play by play log (for a specific game) and strip out the context information and store it into the database. This is what you are thinking of doing, as I understand.

2) A tool to automatically crawl through all of the play by play logs for a league and do 1) automatically for all games in the league.

To illustrate 2), here is a link to a game log for a league I'm in.

http://fantasygames.sportingnews.com/baseball/stratomatic/1969/league/boxscore.html?group_id=361&g_id=273

Please note the link above specifies the league # (group_id=361) and game #(g_id=273). I believe that all games in this league can be accessed by just iterating through the game # - if you copy/paste this link into your browser and manually edit the g_id to a value < 273, you'll get another log.

The tool in 2) would be a program into which you would enter a league ID, and it would automatically pull up all of the game logs through a browser interface... and each of these game logs would be parsed using the tool in 1) above.
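
To make the idea concrete, here is a minimal sketch in Python of the URL-generating part of tool 2). It only builds the links and does not fetch anything. It assumes that game numbers start at 1 and run without gaps up to the latest game, which has not been verified, and the function name is made up for illustration.

```python
from urllib.parse import urlencode

# Base address taken from the link above.
BASE_URL = (
    "http://fantasygames.sportingnews.com/baseball/stratomatic/"
    "1969/league/boxscore.html"
)


def game_log_urls(group_id, last_game_id, first_game_id=1):
    """Yield one box score URL per game number (ASSUMES contiguous g_id values)."""
    for g_id in range(first_game_id, last_game_id + 1):
        query = urlencode({"group_id": group_id, "g_id": g_id})
        yield f"{BASE_URL}?{query}"


urls = list(game_log_urls(group_id=361, last_game_id=273))
print(len(urls))
print(urls[-1])
```

Each URL would then be fetched and handed to the parser from 1). A real crawler should also pause between requests and skip game numbers that do not return a log.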

And... the tool in 2) is similar to what I do for a living. So I know how to do it, and know that it's possible, but it will take enough of my spare time that I need to be careful about volunteering.

However, we are both in an NL69 league that is now drafting. My proposal would be to use this league as the starting point for recording this information. If you would be willing to share your work on 1) above, I would go ahead and try to automate the part of retrieving each individual game log. Once it's done, you would in theory enter a league ID and then sit back and watch it crank all the game logs for that league into your DB... And it would give me the entire season to work on it (don't want to overpromise my spare time!).

Thoughts?

Jay

PS - This whole thread on the value of offensive events in different contexts interests me greatly. I played Strat a lot in the 70s (through college), then life got in the way, and I am just picking it up again over the last few months. It has been hard to rekindle full interest, mainly because it seems like in the 200X era, the predominant way to establish a competitive offense is via high OBP and high HR% (walk, walk, bomb, repeat). When I played in the 70s, this certainly was one way, but other strategies such as high 2B, high 3B, and good baserunning advancement skill were also competitive if done well. (IMO, artificial turf had something to do with it; think of the KC and StL teams of that era, which built offenses around BA, 2B/3B power, and speed.) This is the main reason why I'm reentering here using the 69 cards: I think the low ERAs of those leagues will present opportunities to craft good offenses in more different ways in that context.
JayJelinek
 

Postby childsmwc » Wed Oct 12, 2005 6:54 pm

Jay,

I was thinking about asking Penngray how I might parse out data like you suggested, because I think he has a similar computer background. I think that the data collected will be unique to the game played (i.e. 1969, ATGII, and 200X results will not be compatible).

The 200X product probably has the most relevance (this is only conjecture, however), since I would anticipate that results from year to year in the 200X games would be similar.

When you have an environment with dominant pitchers like 1969 and ATGII your dynamics can significantly change.

If you can come up with a way to parse out the data, I would be more than willing to share the end results on the expected value side of the equation. I should caveat that I am by no means a professional statistician and therefore I may still have some trouble in the end interpreting all the data.

For that matter, I would be glad to share with you some of the formulas I currently use to compute player values, along with proof that my formulas, when applied to team results, consistently produce expected runs that are within an acceptable range of real run totals.

Right now I have not noticed any consistent variances in my projected results that I can tie to specific parks, which is why I have some skepticism that the RC value of each event is measurably different in Coors compared to Petco.

Bbrool
childsmwc
 
