Sunday, September 30, 2007

Clear as mud

So glad the Cubs are already in. This gives me a headache.

NL Playoff Scenarios

Scenario 5 (with variations)

Padres lose, Rockies win, Mets win, Phillies win
All four teams finish 89-73, meaning the NLDS cannot begin until Thursday. Mets visit Phillies on Monday to determine East champion.

The Rockies, Padres and Monday's loser remain tied. The Rockies have the best head-to-head record among the three (regardless of whether the Mets or Phillies lose). The Rockies can either choose to play a home game Tuesday and, if they win, a home game Wednesday, or they can opt to let the Padres and the East runner-up play Tuesday, with the winner hosting the Rockies on Wednesday.

If the Rockies choose to play Wednesday on the road, the Padres would host the Mets or visit the Phillies on Tuesday, with the winner hosting the Rockies on Wednesday for the wild card.

If the Rockies choose to stay at home, the Phillies (if they lost to the Mets on Monday) or Padres (if the Phillies beat the Mets) would choose to play Tuesday at the Rockies, then (if they win) Wednesday at home against the Padres or Mets, OR simply play Wednesday at the winner of Padres-Rockies or Mets-Rockies.

Got all that? Go read the other four.


Friday, September 28, 2007

Champions

It wasn't until a couple of months ago that the word even entered my mind. Now I have some big plans for next Saturday night.


Thursday, September 27, 2007

Peskiest of the Fish Part 2

Part 1

I'm re-presenting each chart from last night, since I forgot to match up the scales on the axes in a couple of instances. The data was correct, but the mismatched scales would have been confusing and could have looked misleading. Let's get those out of the way, followed by the 3rd (2nd PITCHf/x) AB.





Ted went back to the two-seamer down to start the next at-bat. He followed that with three... umm, I dunno.


There's not a whole lot of spin on these last three pitches, so they're not going to move much (and they didn't, as shown just above).


He appears to be dropping down a little bit, so the angle/trajectory would make it "move", relative to his fastballs, away from a lefty and down.



Based on some stuff from an article by Bahill et al. in American Scientist, I'm guessing these last three pitches were two-seam sliders.

BTW, the colors go in the same order (blue, purple, gold, light blue for pitches 1, 2, 3, 4) for each of these four-pitch ABs.


Wednesday, September 26, 2007

Peskiest of the Fish

Jeremy Hermida has five hits and four RBI in two days. Three of the hits are doubles. He's not exactly killing the Cubs, but they have enough Fish issues as it is, without this guy causing problems.

Here's a look at his eight at-bats, before the Cubs try to pick up a win on get-away day and get the magic number down to two. Don't forget to pull for the Padres during the nightcap.

9/25
v Lilly
Double to left center
automatic Double to center
infield Single to third

v Wuertz
strike Out

9/26

v Marquis
line Out to center
Single to left
ground Out to first

v Marmol
Double to left


For Game 1, PITCHf/x is missing all but one pitch from the first at bat, so we're left with the big hit to center and the infield hit against Lilly, plus the K against Wuertz.

The pitches happen to be in order, running clockwise from the bottom right, in the release point chart. Keep the colors in mind when viewing the other charts.

It starts with a couple of two-seamers, busting down and in for this lefty-on-lefty match-up, followed by a four-seamer down. Finally, a curveball. A hanging curveball. An automatic (or ground-rule) double to deep center is a bomb in Dolphin Stadium.






The first pitch is a called strike, although it appears to be outside Hermida's average PITCHf/x strike zone. He swung and missed at the second, fouled off the third, and we already know what happened to the curveball.

I'll move on to the next at bats later.


Tuesday, September 25, 2007

Up Next: Dan Barone

OK, so the Cubs couldn't handle the D-Train, but tomorrow is another day. Florida is sending out Dan Barone, who's worked out of the pen a lot this year, but basically throws 3 pitches.

Here's a slightly different way of looking at pitches: "bubble" plots, using spin rate (rpm) for bubble size.

Two games (not starts) from late August at home.

Release points - notice the slow spin rate pitches towards the top and right of the bunch


Those slow-spinners appear to cross the outside part of the plate (to a RHH).


In this plot, you can see three pitches


The 2nd set of fastballs spin a few hundred RPMs slower (roughly 20%), with less backspin and more sidespin (spin directions of 190-210 v 230-250 degrees). Likely four-seam and two-seam, respectively.

Here's spin rate v. start speed


Saturday, September 22, 2007

Local Kid, probably a Sox fan

Tom Gorzelanny of Evergreen Park is Lefty #3 for the weekend.
He's got plenty of PITCHf/x data; here are the most recent starts:

8/7 @ari
8/12 @sf
8/22 @col
9/2 @mil
9/12 MIL
9/18 @sd

That home game is the first, and only, PITCHf/x game in Pittsburgh.

Let's see how the six starts compare (y0=50)

Looks good enough for today's purposes.

According to ESPN Inside Edge, Gorz throws four pitches:
Fastball, Curve, Slider, Change

Much like Rich Hill, he uses the slider against lefties and the change against righties, but not to the same extreme.

The pfx chart looks interesting, using k-means (k=4)


More later


Hello Soto

Geovany Soto is not the Cubs' catcher of the future. He's their catcher now (since Sept. 15).

Lou's running him out again against the Pirates. How could he do otherwise? Dusty, perhaps, would be sending Blanco to work.

I've got 116 pitches for Geo in the database. He has a pretty typical strike zone (avg sz_top=3.44 ft., sz_bot=1.47), with what looks like good zone judgment.

Pitches Seen: 116
In Strike Zone: 61 (52.6%)
Out of Zone: 55 (47.4%)

Swings: 44 (37.9%)
ISZ: 34 (55.7%)
OOZ: 10 (18.2%)

Whiffs: 6 (13.6%)
ISZ: 3 (8.8%)
OOZ: 3 (30.0%)


That's a pretty slick contact rate of 86.4%.
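For anyone checking the math, here's a minimal Python sketch that turns the raw counts above into the same rates. The function names are made up for illustration; the rate definitions (swing rates per pitch seen in or out of the zone, whiff rates per swing, contact as swings without a whiff) are the ones the Soto numbers imply.

```python
def pct(part, whole):
    """Return part / whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def discipline(pitches, in_zone, swings_in, swings_out, whiffs_in, whiffs_out):
    """Plate-discipline rates from raw counts (hypothetical helper)."""
    out_zone = pitches - in_zone
    swings = swings_in + swings_out
    whiffs = whiffs_in + whiffs_out
    return {
        "in_zone": pct(in_zone, pitches),
        "swing": pct(swings, pitches),             # swings per pitch seen
        "isz_swing": pct(swings_in, in_zone),      # swings per in-zone pitch
        "ooz_swing": pct(swings_out, out_zone),    # swings per out-of-zone pitch
        "whiff": pct(whiffs, swings),              # whiffs per swing
        "isz_whiff": pct(whiffs_in, swings_in),
        "ooz_whiff": pct(whiffs_out, swings_out),
        "contact": pct(swings - whiffs, swings),   # contact per swing
    }


# Soto's counts from the post: 116 pitches, 61 in the zone,
# 34/10 swings in/out of the zone, 3/3 whiffs in/out of the zone.
soto = discipline(116, 61, 34, 10, 3, 3)
print(soto["ooz_swing"], soto["contact"])  # 18.2 86.4
```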

The sample is too small to be meaningful, but if anything, he may tend to go after pitches Up and OOZ or In and OOZ. Doesn't like the ball away at all - 14 pitches outside, 1 swing, and it was also Up and OOZ. He fouled it off.

Update: See how close to the zone most of the OOZ swings are, and the bunch of low ISZ takes.


Friday, September 21, 2007

PITCHf/x in Italian

Cool stuff, with enough English for a non-Italian speaker to enjoy.

http://profpepper.playitusa.com/Mat_Fis/mat_index.html


Data Cleaning and Data Dreaming

I think there's some really great thought and work going on around cleaning up the noise in PITCHf/x. Via Mike Fast, here's a good post and discussion on the topic at ike2100.

While I understand the need to correct measurement error, as a recovering psychophysiologist, I'm more interested in correcting the measurement, not the error.

Obviously, Sportvision will work on that end, while we throw away the junk, try to understand (not eliminate) weather and park effects, and manage them to help discriminate pitches.

I do like that the physicists want to go after the data in very interesting ways, down to the algorithm that is used to create the data. It furthers understanding in multiple dimensions, and I do enjoy thinking about and following the challenging problems.

The problems I want to tackle are more around developing a bag-of-tricks - metrics and visuals - that are intuitive, informative, and fun. Right now, I'm an open-to-the-world brain dump of ideas (some even my own), meandering about a fresh domain.

OK, anyway, GO BRAVES

---------

Cubs 81-73 *8*
Brewers 78-74 -2.0


Thursday, September 20, 2007

Remember the old days?

Like last year
http://baseballanalysts.com/archives/2006/01/a_quantitative_1.php

PITCHf/x changes things quite a bit.


Wednesday, September 19, 2007

Shearn Speed - Griffey Hurt - Soriano's Bomb



I wouldn't expect a difference to pop out between the two back-to-back home starts. But there's an explanation: wind.

8/26: 6 mph, in from LF
(road game)
9/5: 3 mph, in from LF
9/9: 10 mph, out to CF
(road game)


Does it impact spin rates?



Looks like it. Miller Park looks different, eh?

There's been some good stuff posted on weather/altitude effects on spin; I'll link 'em up later.

-------

Doesn't look good for Junior Griffey, that is not what you want to see. Good game, otherwise.

-------
Check out Soriano's homer - red meat.


Image from MLB.com Game Day


Tonight's Opponent: Tom Shearn

Mr. Shearn throws four pitches. K-means clustering struggles with the smaller data set from Milwaukee, but the PFX graph still makes it clear. The full sample is shown, and I think it shows the need to normalize the data.



Using the "All" as reference, here are the average specs on the pitches:

Cluster  Start Speed (mph)  pfx_x (in.)  pfx_z (in.)  Spin Direction (deg.)  Spin Rate (rpm)
1        68.7               3.6          -9.9         200.2                  964.8
2        85.6               -6.3         10.1         211.8                  1386.8
3        80.5               3.3          3.3          139.1                  553.8
4        87.6               -2.0         13.0         188.5                  1558.3
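The spin direction column looks consistent with one common way of deriving direction from break: take the angle of the (pfx_x, pfx_z) movement vector and rotate it by 90 degrees. That's my assumption, not something stated here. For clusters 2 through 4 it lands within a few degrees of the table (about 212, 135 and 189 degrees). Cluster 1 doesn't fit: its average pfx gives about 20 degrees, not 200.2. One possible, unconfirmed, explanation is that averaging raw angles breaks down when a cluster's values straddle 0/360, which a circular mean avoids. Here's a small Python sketch of both pieces.

```python
import math


def spin_direction(pfx_x, pfx_z):
    """Spin direction in degrees from average break (inches).

    Assumed convention: angle of the (pfx_x, pfx_z) movement vector plus
    90 degrees, wrapped into [0, 360). Not confirmed as the formula used above.
    """
    return (math.degrees(math.atan2(pfx_z, pfx_x)) + 90.0) % 360.0


def circular_mean(angles_deg):
    """Average direction of angles in degrees, via unit vectors, in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


# Cluster averages (pfx_x, pfx_z) from the table above
clusters = {1: (3.6, -9.9), 2: (-6.3, 10.1), 3: (3.3, 3.3), 4: (-2.0, 13.0)}
for k, (x, z) in clusters.items():
    print(k, round(spin_direction(x, z), 1))

# Two spins on either side of 0/360 degrees
print(sum([10.0, 350.0]) / 2)        # 180.0, far from both angles
print(circular_mean([10.0, 350.0]))  # at (or within rounding of) 0/360
```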



I'll update later with a look at pitch effectiveness, and maybe a little in-progress stuff.


Minor League Reference

Baseball-Reference has done it again.
16 years of stats from MiLB.

http://minors.baseball-reference.com/

Meanwhile, I'll look forward to the day we have PITCHf/x in every professional park - and beyond.


Monday, September 17, 2007

Arroyo Clusters

Looking at all of Bronson's PITCHf/x home games, I tried a range of cluster counts before finding "the elbow" at k=5.
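For the curious, "the elbow" is the k where adding another cluster stops buying much reduction in within-cluster error (SSE). Below is a minimal Python sketch of one simple way to pick it automatically, the largest second difference of the SSE curve. The SSE numbers are made up for illustration, not Bronson's, and eyeballing the curve is just as legitimate.

```python
def elbow_k(sse_by_k):
    """Pick the k where the drop in within-cluster SSE slows down the most.

    sse_by_k maps consecutive k values to total within-cluster sum of
    squared distances. Uses the largest second difference of the curve.
    """
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        drop_before = sse_by_k[prev] - sse_by_k[k]
        drop_after = sse_by_k[k] - sse_by_k[nxt]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Made-up SSE values for illustration only
example = {2: 900.0, 3: 600.0, 4: 380.0, 5: 200.0, 6: 180.0, 7: 165.0}
print(elbow_k(example))  # 5
```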



Here's how the clusters break out (I'm using median values below, btw):

And here's the PFX and release points - the latter are not included in the cluster analysis, but do provide a good test.



Yes, these images have incorrect labels on the axes....


Updated script, plus some extra SQL

First, a script for crawling, umpires, and database updates in one shot - one day at a time.

You can easily modify it to do full crawls, take the parameters from the command line, whatever.

crawl-and-load.pl

You can use it at your own risk. To make it work, be sure to look carefully - I've even noted the lines of interest. Your data model may vary slightly, so don't run it unless you have a back-up and know what you're doing.

EDIT LINES:
month and day: lines 5 & 8
connection string: 291

REVIEW AND EDIT SQL:
351,360,381,389,403,412,426,435,446,479,489,542,554,562,587,605,648,660,668

Here's how I've gone about creating views (MySQL 5+ required) for hitter strike zones and pitch spin.

create-hitter-zones.sql

create-spin-view.sql
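I'm not reproducing those files here, but the idea behind a hitter-zone view is simple enough to sketch. This Python example uses the standard library's sqlite3 rather than MySQL, and the three-column pitches table is a hypothetical stand-in for the real schema; the view just averages sz_top and sz_bot per batter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical minimal schema, not the real one
    CREATE TABLE pitches (
        batter TEXT,
        sz_top REAL,   -- feet
        sz_bot REAL    -- feet
    );

    -- Average strike zone per batter, ignoring pitches without zone data
    CREATE VIEW hitter_zones AS
    SELECT batter,
           COUNT(*)    AS pitches,
           AVG(sz_top) AS avg_sz_top,
           AVG(sz_bot) AS avg_sz_bot
    FROM pitches
    WHERE sz_top IS NOT NULL AND sz_bot IS NOT NULL
    GROUP BY batter;
    """
)
conn.executemany(
    "INSERT INTO pitches (batter, sz_top, sz_bot) VALUES (?, ?, ?)",
    [
        ("Hitter A", 3.4, 1.5),
        ("Hitter A", 3.5, 1.4),
        ("Hitter A", None, None),  # missing zone data: excluded by the view
        ("Hitter B", 3.8, 1.7),
    ],
)
for row in conn.execute("SELECT * FROM hitter_zones ORDER BY batter"):
    print(row)
```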

Enjoy, share your feedback and improvements.


Sunday, September 16, 2007

Ted Lilly Clusters

Using five variables and four clusters, here's what k-means found.



This includes all pitches at y0=50, nothing normalized, etc. This seems to work, though I suspect normalizing the data will help sort things out. The goal is to automate runs with 2, 3, 4, and 5 clusters, find the best fit, apply it, and go from there.
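On the normalizing point: k-means works on raw distances, so a variable measured in thousands of rpm will swamp one measured in inches unless everything is put on a common scale. A minimal sketch of z-score standardization in Python (the pitch values are hypothetical):

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column of `rows` to mean 0 and SD 1 (population SD).

    A column with zero spread is mapped to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]


# Hypothetical pitches: (start speed in mph, spin rate in rpm)
pitches = [(90.0, 2200.0), (80.0, 1200.0)]
print(zscore_columns(pitches))  # [[1.0, 1.0], [-1.0, -1.0]]
```

Once standardized, a 10 mph gap and a 1000 rpm gap count the same, which is the point.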


Clusters

Early results of k-means cluster analysis aren't too bad. As Dan Fox found, it isn't the best tool for the job, since you have to specify the number of clusters you're looking for. It does come pretty close. Using four clusters and five variables (Spin Direction, Spin Rate, Start Speed, pfx_x, pfx_z) worked best of the various combinations I've played with so far, with the toy Rich Hill sample as a test.
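Here's a bare-bones Python sketch of k-means (Lloyd's algorithm) for anyone who wants to play along. It's not the tool I used; it seeds the centroids with a simple deterministic farthest-point rule (my choice, to make runs repeatable) and expects rows that have already been standardized. The five-column toy points stand in for (spin direction, spin rate, start speed, pfx_x, pfx_z) and are made up.

```python
import math


def farthest_point_init(points, k):
    """Start with the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. Returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical standardized pitches: two well-separated groups
toy = [
    (0.0, 0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.0, 0.0),
    (0.0, 0.1, 0.0, 0.0, 0.0),
    (5.0, 5.0, 5.0, 5.0, 5.0),
    (5.1, 5.0, 5.0, 5.0, 5.0),
    (5.0, 5.1, 5.0, 5.0, 5.0),
]
centroids, labels = kmeans(toy, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```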



Pretty close. Much more work to do.

Here's the same chart, but grouped by velocity, from an earlier post.



----------------
NLC Standings
Cubs -- (13)
Brewers 1.0
Cardinals 7.0


Data from 9/15, or lack thereof

Nothing from game 2 in terms of PITCHf/x, so no side-by-side Marshall v. Lilly as planned.

Learning about cluster analysis instead.

NLC Standings
Cubs -- (14)
Brewers 1.0
Cardinals 6.0


Saturday, September 15, 2007

Marshall's Return

The lefty goes back for what I suspect is one last start. A couple weeks ago, in this post, I took a quick look at one start - compared to Rich Hill. I'm guessing he's more similar to Ted Lilly than Rich Hill, and I'll follow up on that after game 2 tonight with a look at both starters' outings from the double-header. Hopefully the Magic Number will be 12 by then.

Team GB (Magic#)
Cubs -- (14)
Brewers 2
Cardinals 7
Reds 8.5


Swing Low, Sweet Chariot Geoff Jenkins

No, there is not a Brewers theme today, other than the Voodoo Bernie the Brewer that I'm filling with pins.

Most Likely to Swing Low (>250 total pitches)
Geoff Jenkins .569
Alex Cintron .541
Tony Pena .529
I-Rod .524
Alfonso Soriano .521 (Who just homered in St. Louis, as I type this, not included in this data)

Least Likely
Trot Nixon .047
Gabe Gross .063
Jack Cust .064
Moises Alou .067
Chris Snyder .074

Whiffs'a'lot
Chris Snyder 1.000 (good thing he never swings at them)
Jack Cust .867 (ditto)
Jack Hannahan .867 (Money Ball)
Jose Cruz .857
Ramon Vazquez .857
Jason Michaels .857

Rarely a Whiff
Jeff Conine .000
David Eckstein .034
Jason Kendall .077
Eric Bruntlett .083
Kevin Frandsen .105

Homers off low pitches
Alfonso Soriano 3
Richie Sexson 2
28 others with 1

Soriano's whiff rate on low pitches is .402, which is more than half an SD better than average. So, yes, he'll swing at it, but you don't want to go there.

Extreme Swings and Whiffs (SwingRate/WhiffRate)

Suckers:
Tony Clark .462/.722
Miguel Olivo .410/.800
Jason Smith .484/.667
Geoff Jenkins .569/.565
Wily Mo Pena .432/.737

Eagle Eyes:
Jeff Conine .111/.000
Eric Bruntlett .169/.083
Kevin Mench .172/.200
Jeff Keppinger .159/.231
Brian Giles .159/.269


Friday, September 14, 2007

Brew f/x

Some diarists at Brew Crew Ball have also drunk the Kool-Aid and are looking at the Brewers via PITCHf/x.


PITCHf/x and Plate Discipline

As Mike mentioned in some comments last week, there's a good study of zone judgment and plate discipline by Dan Fox @ BP. Here's another post about Dan's study (link to it there), and a link to another from Pizza Cutter, at On Baseball and the Reds.


Tuesday, September 11, 2007

Aiming High

Two questions: Who swings at the high stuff (without regard to px, just pz and hitter's average strike zone) and misses it? Who takes it deep?

Forty hitters fit the bill for the first question. Of those, four have hit homers Up and OOZ. I was able to find 50 hitters who have homered on a pitch Up and OOZ. One is Matt Murton.
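The "Up" test itself is simple. A minimal Python sketch, assuming per-pitch records with a swing flag and a whiff flag, and whiff% measured per swing (as in the Soto numbers); the sample pitches and hitter name are made up:

```python
from collections import defaultdict


def high_pitch_rates(pitches, avg_sz_top):
    """Swing% and whiff% on pitches above each hitter's average sz_top.

    pitches: iterable of (batter, pz, swung, whiffed), pz in feet.
    avg_sz_top: dict of batter -> average sz_top in feet.
    px is ignored, so 'Up' includes Up-and-In and Up-and-Away.
    Returns batter -> (swing %, whiff % per swing).
    """
    seen = defaultdict(int)
    swings = defaultdict(int)
    whiffs = defaultdict(int)
    for batter, pz, swung, whiffed in pitches:
        if pz <= avg_sz_top[batter]:
            continue  # not above the hitter's average zone
        seen[batter] += 1
        if swung:
            swings[batter] += 1
            if whiffed:
                whiffs[batter] += 1
    rates = {}
    for batter, n in seen.items():
        swing_pct = 100.0 * swings[batter] / n
        whiff_pct = 100.0 * whiffs[batter] / swings[batter] if swings[batter] else 0.0
        rates[batter] = (swing_pct, whiff_pct)
    return rates


# Hypothetical pitches for one hitter whose average sz_top is 3.5 ft
sample = [
    ("Hitter A", 3.9, True, True),
    ("Hitter A", 3.8, True, False),
    ("Hitter A", 4.1, False, False),
    ("Hitter A", 4.0, False, False),
    ("Hitter A", 2.5, True, False),  # not Up, so ignored
]
print(high_pitch_rates(sample, {"Hitter A": 3.5}))  # {'Hitter A': (50.0, 50.0)}
```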

Here are the 40 you want to go up out of the zone on, followed by the hitters who have more than one homer Up and OOZ, who you might want to avoid. Ya know, for any Major League pitchers checkin' out my blog.

Pitches Up (including Away/In and Up)
Swing% > 30; Whiff% > 35 (about 1/2 SD above the mean)

Hitter Swing% Whiff% Homers
Bay, Jason 30.5% 50.0%  
Betancourt, Yuniesky 43.3% 35.6%  
Biggio, Craig 38.9% 57.1%  
Braun, Ryan 38.3% 39.1%  
Cabrera, Asdrubal 34.1% 40.0%  
Church, Ryan 32.0% 50.0%  
Clayton, Royce 37.0% 45.0%  
Crawford, Carl 32.9% 50.0%  
Crosby, Bobby 32.7% 38.9%  
Delgado, Carlos 31.4% 63.6%  
Dobbs, Greg 33.3% 44.4%  
Francoeur, Jeff 32.4% 41.7% 1
Giles, Marcus 33.8% 50.0%  
Gomes, Jonny 32.1% 41.2%  
Gonzalez, Alex 43.2% 36.8%  
Hall, Bill 38.2% 50.0%  
Hunter, Torii 30.8% 41.7% 1
Lind, Adam 40.3% 37.9%  
Mackowiak, Rob 30.4% 64.3%  
Milledge, Lastings 45.8% 45.5%  
Monroe, Craig 35.5% 36.4%  
Morneau, Justin 53.3% 37.5%  
Napoli, Mike 31.1% 42.1% 1
Pagan, Angel 40.0% 37.5%  
Peralta, Jhonny 34.6% 48.1%  
Quinlan, Robb 31.3% 50.0%  
Rodriguez, Ivan 41.5% 44.4%  
Ross, David 35.7% 40.0%  
Soriano, Alfonso 31.3% 44.4%  
Terrero, Luis 35.0% 57.1%  
Thames, Marcus 33.3% 46.2%  
Thorman, Scott 37.9% 40.0% 1
Uribe, Juan 38.3% 49.2%  
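The cutoffs above are roughly the mean plus half a standard deviation. A small Python sketch of that kind of filter; the rates are made up, and I've used the population SD, which is my choice rather than something stated in the post:

```python
from statistics import mean, pstdev


def above_cutoff(rates, sds=0.5):
    """Names whose rate exceeds mean + sds * SD (population SD) of the group.

    Returns (sorted names, cutoff).
    """
    values = list(rates.values())
    cutoff = mean(values) + sds * pstdev(values)
    names = sorted(name for name, r in rates.items() if r > cutoff)
    return names, cutoff


# Made-up whiff rates (%) for four hitters, not the league data
whiff = {"Hitter A": 20.0, "Hitter B": 25.0, "Hitter C": 30.0, "Hitter D": 45.0}
names, cutoff = above_cutoff(whiff)
print(names, round(cutoff, 1))  # ['Hitter D'] 34.7
```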




Home runs when pz > hitter's average sz_top

Kemp, Matt 3
Rios, Alex 2
Hamilton, Josh 2
Navarro, Dioner 2
Izturis, Maicer 2
Wells, Vernon 2
Beltre, Adrian 2


Monday, September 10, 2007

A matter of style and judgment

I calculated average strike zones for all hitters with over 250 pitches (using +/- 1 ft. laterally, and PITCHf/x sz_top/sz_bot data vertically), then sliced everyone's pitches into 9 zones.
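Here's a minimal Python sketch of the nine-way split, assuming the nine zones are the hitter's strike zone plus the eight regions around it (Up, Down, In, Away, and the four corners), which is how the "Up and OOZ" labels elsewhere read. It also assumes px is in feet from the catcher's point of view with positive values toward the first-base side, so "In" to a right-handed hitter is negative px; check that against your own data before trusting it.

```python
def zone9(px, pz, sz_top, sz_bot, stand, half_width=1.0):
    """Label a pitch as 'ISZ' or one of the eight regions around the zone.

    px, pz, sz_top, sz_bot in feet. half_width is the +/- 1 ft lateral bound.
    Assumed sign convention: px > 0 is the first-base side (catcher's view),
    so for a right-handed hitter (stand == 'R') inside is px < 0.
    """
    if pz > sz_top:
        vertical = "Up"
    elif pz < sz_bot:
        vertical = "Down"
    else:
        vertical = ""

    inside_sign = -1 if stand == "R" else 1
    if abs(px) <= half_width:
        horizontal = ""
    elif px * inside_sign > 0:
        horizontal = "In"
    else:
        horizontal = "Away"

    parts = [p for p in (horizontal, vertical) if p]
    return " and ".join(parts) if parts else "ISZ"


# Hypothetical hitter with an average zone of 1.5 to 3.5 ft
print(zone9(0.0, 2.5, 3.5, 1.5, "R"))   # ISZ
print(zone9(-1.3, 2.5, 3.5, 1.5, "R"))  # In
print(zone9(-1.3, 3.9, 3.5, 1.5, "L"))  # Away and Up
```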

The first thing I noticed is that you can infer something about a hitter's style (aggressive, conservative) and/or judgment (good, bad) from swing rates in and out of the zone.

I've included lines indicating the mean, and rings at one and two SDs out. A little rough (hacking in Excel), but close enough for starters.
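A rough way to put labels on the four corners of that chart, sketched in Python: compare each hitter's in-zone (ISZ) and out-of-zone (OOZ) swing rates with the league means. The means below are placeholders, not the real ones from the chart, and the quadrant rule is a simplification of reading the SD rings. The hitters' rates are the ones listed in this post.

```python
def hitter_style(isz, ooz, mean_isz, mean_ooz):
    """Quadrant label from in-zone and out-of-zone swing rates."""
    high_isz = isz >= mean_isz
    high_ooz = ooz >= mean_ooz
    if high_isz and high_ooz:
        return "aggressive"
    if not high_isz and not high_ooz:
        return "conservative"
    return "good judgment" if high_isz else "bad judgment"


def zone_gap(isz, ooz):
    """Difference between ISZ and OOZ swing rates (bigger = more selective)."""
    return round(isz - ooz, 3)


MEAN_ISZ, MEAN_OOZ = 0.65, 0.27  # placeholder league means

for name, isz, ooz in [
    ("Delmon Young", 0.817, 0.444),
    ("Andy Gonzalez", 0.454, 0.164),
    ("So Taguchi", 0.719, 0.194),
    ("Travis Metcalf", 0.562, 0.370),
]:
    print(name, hitter_style(isz, ooz, MEAN_ISZ, MEAN_OOZ), zone_gap(isz, ooz))
```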



Let's look at the leaders and the Cubs.

The most aggressive hitters: Hitter (ISZ/OOZ)
Delmon Young (.817/.444)
Alfonso Soriano (.776/.437)
Johnny Estrada (.786/.428)
AJ Pierzynski (.751/.389)
Vladimir Guerrero (.743/.385)

The most conservative hitters:
Andy Gonzalez (.454/.164)
Reggie Willits (.466/.184)
Ramon Martinez (.502/.187)
Luis Castillo (.462/.200)
Jayson Werth (.497/.219)

Worst Judgment:
Travis Metcalf (.562/.370)
Ryan Zimmerman (.522/.366)
Ronnie Belliard (.534/.354)
Michael Cuddyer (.562/.337)
Jamie Burke (.552/.324)

Best Judgment:
So Taguchi (.719/.194)
Ryan Raburn (.718/.212)
Jeff Kent (.694/.181)
Jim Edmonds (.694/.209)
David Ortiz (.682/.207)

Leaders

Biggest difference between ISZ and OOZ:
Morgan Ensberg (.546)
So Taguchi (.525)
Jeff Kent (.514)
Ryan Raburn (.506)
Josh Hamilton (.489)
14th place Mark DeRosa (.470)

Smallest difference between ISZ and OOZ:
Ryan Zimmerman (.155)
Ronnie Belliard (.179)
Travis Metcalf (.192)
Miguel Olivo (.195)
Reed Johnson (.198)
63rd place Jason Kendall (.302)

Most likely to swing at a ball: (OOZ)
Ivan Rodriguez (.487)
Miguel Olivo (.459)
Tony Pena (.446)
Corey Patterson (.445)
Delmon Young (.444)
6th place Alfonso Soriano (.437)

Least likely to swing at a ball: (OOZ)
Morgan Ensberg (.124)
Jack Cust (.131)
Jeff Conine (.135)
Rickie Weeks (.142) [ed. Wow!]
Brian Giles (.144)
47th place Mark DeRosa (.196)

Most likely to swing at a strike: (ISZ)
Delmon Young (.817)
Geoff Jenkins (.810)
Johnny Estrada (.786)
Rick Ankiel (.779)
Jeff Francoeur (.777)
6th place Alfonso Soriano (.776)

Least likely to swing at a strike: (ISZ)
Andy Gonzalez (.454)
Luis Castillo (.462)
Reggie Willits (.466)
J.J. Hardy (.487)
Jayson Werth (.497)
20th place Jason Kendall (.534)

Cub Styles:
Some of these fits are weak - particularly Murton's bad judgment - he's really average, a little on the bad side.

Aggressive:
Alfonso Soriano (.776/.437)
Jacque Jones (.712/.335)
Aramis Ramirez (.695/.326)

Conservative:
Derrek Lee (.565/.219)
Ryan Theriot (.557/.237)
Jason Kendall (.534/.232)

Bad Judgment:
Matt Murton (.634/.277)
Mike Fontenot (.580/.269)

Good Judgment:
Cliff Floyd (.668/.253)
Mark DeRosa (.666/.196)


Sunday, September 9, 2007

Finding the Zone

Between my interest in Fools, OOZ, some feedback and the release point normalization stuff, I wanted to see how strike zones vary within PITCHf/x, in a variety of ways.

First, sz_top and sz_bot, which are set by the PITCHf/x operators.

Batter Pitches | sz_top (ft): Avg Max Min StdDev | sz_bot (ft): Avg Max Min StdDev
Lee, Derrek 1063 3.807 4.940 3.216 0.262 1.735 2.400 1.354 0.138
DeRosa, Mark 872 3.598 4.464 3.107 0.188 1.575 2.038 1.321 0.139
Theriot, Ryan 834 3.306 4.422 2.855 0.190 1.439 2.290 1.060 0.119
Jones, Jacque 732 3.606 4.384 3.250 0.194 1.688 2.200 1.330 0.081
Soriano, Alfonso 716 3.260 4.714 2.639 0.213 1.478 2.674 1.043 0.140
Ramirez, Aramis 651 3.609 4.534 3.291 0.204 1.643 2.276 1.285 0.092
Fontenot, Mike 594 3.181 4.112 2.320 0.181 1.444 1.820 1.000 0.126
Kendall, Jason 541 3.503 4.660 3.201 0.194 1.637 2.218 1.419 0.134
Floyd, Cliff 418 3.794 4.458 3.168 0.216 1.775 2.125 1.179 0.124
Murton, Matt 418 3.140 3.618 2.825 0.159 1.464 1.820 1.279 0.089
Pagan, Angel 293 3.404 4.044 3.120 0.187 1.517 2.040 1.227 0.132
Pie, Felix 271 3.428 4.170 3.081 0.255 1.579 1.870 1.059 0.172



OK, that's not a good sign - there are some out of whack values (this is across any park, any pitch where there's full data).
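To put a number on "out of whack": Derrek Lee's max sz_top of 4.940 ft is (4.940 - 3.807) / 0.262, or about 4.3 standard deviations above his average. A minimal Python sketch for flagging values like that, assuming a plain list of one hitter's sz_top readings (the list here is made up) and a 3 SD cutoff, which is an arbitrary choice:

```python
from statistics import mean, pstdev


def flag_outliers(values, max_z=3.0):
    """Return (index, value, z) for values more than max_z population SDs
    from the mean of `values`."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return []
    flagged = []
    for i, v in enumerate(values):
        z = (v - m) / s
        if abs(z) > max_z:
            flagged.append((i, v, round(z, 2)))
    return flagged


# Made-up sz_top readings (ft) for one hitter, with one suspicious value
readings = [3.4] * 20 + [4.9]
print(flag_outliers(readings))
```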

Breaking out Derrek Lee's pitches:



And Mark DeRosa's



More later....


Wednesday, September 5, 2007

Out of Zone - "strike charts"

Check this out: Josh Kalk is taking a look at what are, in effect, pictures of OOZ tendencies. See this post, and his site (there's a Geoff Jenkins chart that will surprise no one).

He's also doing some very important work on normalizing PITCHf/x data. Starts here, but I'd just hit his home page to see the latest.


Tuesday, September 4, 2007

Dodgers v Cubs

We're in the midst of game 2 of 4, but here's a peek at two graphs on the four starters the Cubs are facing. The sample is home games, except for Loaiza, whose games are from two starts prior to joining LA. Not much to say, since I'm working on other things, but I think the breaks picture is interesting.



Monday, September 3, 2007

Stults - trial balloon

Messing around with pseudo-3D plots of pitches. I was working on some charts of the four Dodger starters the Cubs face this week.

Check out this graph of Stults. Interesting? Comments on it and its usefulness (e.g., strike zone analysis for Fools, at-bat analysis for key match-ups, a big plot of everything a guy throws, or the average pitch by type) would be appreciated.


Sunday, September 2, 2007

Marmol

Carlos Marmol is a two-pitch pitcher, piece of cake to pick out his pitches, but not so much to hit them.



Two fluke pitches show up as outliers in both release point and break. This is after cleaning out intentional balls and pitchouts.




Pretty obvious, fastball/curveball. Lots of strikes for both, which is scary.



Check out the curveball - killer pitch. If you swing, you whiff 40% of the time. If you take, it gets called a strike 45% of the time.
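Those two rates have different denominators: whiffs are counted per swing, called strikes per take. A Python sketch with made-up pitch results chosen to mirror 40% and 45%; the outcome labels are simplified stand-ins, not Gameday's actual descriptions:

```python
def swing_take_rates(outcomes):
    """Whiffs per swing and called strikes per take.

    outcomes: simplified pitch-result labels (hypothetical categories).
    """
    swing_labels = {"swinging strike", "foul", "in play"}
    take_labels = {"called strike", "ball"}
    swings = sum(o in swing_labels for o in outcomes)
    whiffs = sum(o == "swinging strike" for o in outcomes)
    takes = sum(o in take_labels for o in outcomes)
    called = sum(o == "called strike" for o in outcomes)
    whiff_rate = whiffs / swings if swings else 0.0
    called_rate = called / takes if takes else 0.0
    return whiff_rate, called_rate


# Made-up curveball results: 5 swings (2 whiffs), 20 takes (9 called strikes)
curves = (
    ["swinging strike"] * 2 + ["foul"] * 2 + ["in play"]
    + ["called strike"] * 9 + ["ball"] * 11
)
print(swing_take_rates(curves))  # (0.4, 0.45)
```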



Data from all home games from July 13 - August 30 with PITCHf/x data and y0=50 ft. 288 pitches were found, excluding two intentional balls and one pitchout.
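The cleaning step can be sketched as a simple filter. This Python example assumes each pitch is a dict with a y0 value and a des description, and that intentional balls and pitchouts are labeled "Intent Ball" and "Pitchout"; treat those strings as assumptions and match them to your own data.

```python
def clean_pitches(pitches, y0_target=50.0, excluded=("Intent Ball", "Pitchout")):
    """Keep pitches measured at y0 = y0_target ft, dropping excluded descriptions."""
    return [
        p
        for p in pitches
        if p.get("y0") is not None
        and abs(p["y0"] - y0_target) < 1e-6
        and p.get("des") not in excluded
    ]


sample = [
    {"y0": 50.0, "des": "Called Strike"},
    {"y0": 50.0, "des": "Intent Ball"},
    {"y0": 50.0, "des": "Pitchout"},
    {"y0": 40.0, "des": "Ball"},  # measured at a different y0
    {"y0": 50.0, "des": "Swinging Strike"},
]
kept = clean_pitches(sample)
print(len(kept))  # 2
```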


Hill ≠ Marshall

This is something I hear over and over:

The Cubs shouldn't pitch Hill and Marshall back-to-back because they are so similar.

But they ain't. Pretty much a moot point now (more on Steve Trachsel later), but, without additional comment, I present Hill v Marshall, @ Wrigley, against Seattle on consecutive nights with very comparable weather, y0 = 40.





Any questions?


Other Side of Hill

Finally, just in time for today's start, here's some more on Rich Hill.

The first question I pondered was whether pitch location at home plate would show the 4-pitch grouping as clearly as the break/velocity information did.

Well, it does. I've included an approximate strike zone for reference's sake.

From the catcher's perspective, you can see the fastballs (staying up and away a lot of the time), cutters down and in, curveballs and sliders falling along the line you'd expect such pitches to follow across the plate, with more curves lower and out of the zone.



More on Rich later when I get back to the Out of the Zone stuff.


Data Loader

I've combined the umpire stuff into the parser that Mike Fast created based on Baseball Hacks hack #28.
You can get it here
http://harrypav.googlepages.com/xml2mysql-load.pl
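Mike's parser is Perl. For anyone who'd rather work in Python, here's a rough sketch of the same idea using the standard library's xml.etree. The XML below is a hypothetical, trimmed-down stand-in for a Gameday inning file (the real files carry many more attributes), so treat the element and attribute names as assumptions to check against actual data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down inning XML
SAMPLE = """
<inning num="1">
  <top>
    <atbat batter="111" pitcher="222" stand="L">
      <pitch des="Called Strike" start_speed="91.2" px="-0.4" pz="2.6"
             pfx_x="-7.1" pfx_z="9.8" sz_top="3.4" sz_bot="1.6"/>
      <pitch des="Ball" start_speed="76.5" px="1.3" pz="1.1"
             pfx_x="4.2" pfx_z="-6.0" sz_top="3.4" sz_bot="1.6"/>
    </atbat>
  </top>
</inning>
"""

NUMERIC = ("start_speed", "px", "pz", "pfx_x", "pfx_z", "sz_top", "sz_bot")


def parse_pitches(xml_text):
    """Flatten <atbat>/<pitch> elements into dicts ready for a DB insert."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for atbat in root.iter("atbat"):
        for pitch in atbat.iter("pitch"):
            row = {
                "batter": atbat.get("batter"),
                "pitcher": atbat.get("pitcher"),
                "stand": atbat.get("stand"),
                "des": pitch.get("des"),
            }
            for name in NUMERIC:
                value = pitch.get(name)
                row[name] = float(value) if value is not None else None
            rows.append(row)
    return rows


for row in parse_pitches(SAMPLE):
    print(row["des"], row["start_speed"], row["px"], row["pz"])
```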


Saturday, September 1, 2007

Patton v Wandy

BTW, we'll see a different kind of lefty today: more lateral movement, less vertical, and softer. Here's Patton's start against the Pirates v. Wandy's start from the 31st at Wrigley. A couple more details on Patton are in the previous post.

Big differences here:



Here you can see Wandy is not only throwing harder, but is also a little more tightly grouped than Patton.



Release points (y0=50)