Angel Guzman showed good stuff in two outings this week in Houston. Questions have come up about his pitch classifications shown in Gameday. I don't agree on all the pitches (never have, never will) and I actually asked the good people at MLBAM how they are classifying Guzman's pitches. Let's see how they tag them and how I tag them.
First is a spin movement graph based on Gameday's pitch IDs for Guzman. The "FA" all come from 2008, the "SL" sliders are a mix of 2008 and 2009, the "FF" and "SI" are all 2009, the "CU" a mix again.
I have data from 2007 when Guzman was up, but Gameday didn't do its own IDs until last season. I'll still include those 2007 games in the charts based on my classifications.
Now, my IDs. The four-seamer is yellow, the two-seamer dark red, the cutter is light blue and the curveball blue. I'm breaking it out by park, with Wrigley in the top-left in both charts for reference.
You can see some parks, like Petco, have the ball sinking more (or not sinking less, as it were), making those cutters look more like sliders. I trust the data from the other parks.
Ross Paul, who built the neural network used to generate real-time pitch IDs, was able to answer some of my questions about how they are classifying Guzman's pitches. For background on how they do it to begin with, check out my recent look at Gameday 2009 for The Hardball Times.
From the article, the list of pitch types currently used for Gameday:
FF Four-seam fastball
FT Two-seam fastball
FA Fastball (generic, usually when the pitcher is not well known)
SI Sinker, same as two-seam but the label is used for some pitches (Derek Lowe had his two-seam fastball consistently marked SI, for example)
SL Slider (real time pitch classifications often mix curves and sliders)
Without going into the gory details, pitch classifications are weighted for each pitcher based on what they know, or don't know, about that pitcher. In Guzman's case, they were explicitly excluding the cut fastball designation (FC). Essentially, those pitches were either going to be sliders (which isn't a bad "miss") or, now, sinkers. Whoa.
The sinker designation is a piggy-back. If a pitcher is known to throw a pitch he calls a sinker, they'll automatically label anything that's an FT (two-seam fastball) as SI. It works in two steps. Step one: because the system knows the pitcher throws a sinker, it makes sure the two-seamer class is included for that pitcher, so the net can decide whether a pitch fits the bill. Step two: anything the net spits out as FT gets relabeled SI.
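As a rough illustration of that two-step piggy-back, here is a minimal Python sketch. The function names and the per-pitcher class list are hypothetical, made up for this post. This is not MLBAM's actual code; it only mirrors the logic as described above.

```python
# Hypothetical sketch of the sinker "piggy-back"; not MLBAM's code.

def allowed_classes(base_classes, throws_sinker, excluded=()):
    """Step one: build the set of pitch classes the net may choose from."""
    classes = set(base_classes) - set(excluded)
    if throws_sinker:
        # Make sure the two-seamer class is available for this pitcher.
        classes.add("FT")
    return classes


def relabel(net_label, throws_sinker):
    """Step two: anything the net calls FT is reported as SI."""
    if throws_sinker and net_label == "FT":
        return "SI"
    return net_label


# Assumed setup for illustration: cutter (FC) excluded, sinker flagged.
guzman_classes = allowed_classes(
    ["FF", "FC", "SL", "CU"], throws_sinker=True, excluded=["FC"]
)
print(sorted(guzman_classes))
print(relabel("FT", throws_sinker=True))
```

With FC excluded, a hard cutter has nowhere to land except the remaining classes, which is how it ends up tagged SL or, via the FT relabel, SI.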
Various attributes are used to classify each pitch. Speed is one of them. Guzman throws his cutter so hard that it confuses Gameday. Since they're not looking for a cutter, it will sometimes say "hey, that's a slider based on movement and the fact that it is slower than a fastball, and we 'know' he throws one, so it isn't typical, but we'll take it". Other times, the system will say "that's gotta be a fastball, with that spin etc. It moves down compared to his other fastballs, so maybe that's a two-seam sinker." But the "confidence" of the system is dirt low on the sinker labels, 0.2 where 1.2 is a solid ID.
That confidence rating is important. It directly reflects the output of the neural network and should be used to filter pitches from analysis based solely on Gameday IDs.
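Here is a small sketch of what that filtering might look like in Python. The field names (`pitch_type`, `type_confidence`), the sample records, and the 0.8 cutoff are all assumptions for illustration. Pick a threshold that suits your own data rather than taking this one as a recommendation.

```python
# Hypothetical pitch records; names and values are made up for illustration.
pitches = [
    {"pitch_type": "FF", "type_confidence": 1.2},
    {"pitch_type": "SI", "type_confidence": 0.2},
    {"pitch_type": "SL", "type_confidence": 0.9},
]


def filter_by_confidence(pitches, min_confidence):
    """Keep only pitches whose ID confidence meets the cutoff."""
    return [p for p in pitches if p["type_confidence"] >= min_confidence]


MIN_CONFIDENCE = 0.8  # assumed cutoff, not an official value
trusted = filter_by_confidence(pitches, MIN_CONFIDENCE)
print([p["pitch_type"] for p in trusted])
```

A low-confidence sinker label like the 0.2 one drops out, while the solid IDs stay in the sample.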
So, Guzman has good stuff, throws hard and confuses a couple cameras and a computer. If he can keep the hitters just as off balance, we'll be very happy in Chicago.