<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>nerdbaseball.com &#187; R is good</title>
	<atom:link href="http://www.nerdbaseball.com/tag/r-is-good/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.nerdbaseball.com</link>
	<description></description>
	<lastBuildDate>Thu, 08 Sep 2011 20:35:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Walking the walk: Part I, Offense</title>
		<link>http://www.nerdbaseball.com/2009/02/walking-the-walk-part-i-offense/</link>
		<comments>http://www.nerdbaseball.com/2009/02/walking-the-walk-part-i-offense/#comments</comments>
		<pubDate>Wed, 04 Feb 2009 18:32:17 +0000</pubDate>
		<dc:creator>Prof. Nerdtron 3000</dc:creator>
				<category><![CDATA[Nerdtron's computer]]></category>
		<category><![CDATA[amazingly I'm not actually goofing off]]></category>
		<category><![CDATA[fantasy baseball]]></category>
		<category><![CDATA[modern nerdiness]]></category>
		<category><![CDATA[R is good]]></category>
		<category><![CDATA[SPSS is bad]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[willy taveras]]></category>

		<guid isPermaLink="false">http://www.nerdbaseball.com/?p=128</guid>
		<description><![CDATA[No one has ever mistaken fantasy baseball for real-life baseball.  Except for me.  When I’m trouncing the nincompoops in my league, I want to feel like the victory has some greater meaning in life.  My fellow nincompoops feel much the same way, which is why we set up a Sportsline league that allows us to [...]]]></description>
			<content:encoded><![CDATA[<p>No one has ever mistaken fantasy baseball for real-life baseball. Except for me. When I’m trouncing the nincompoops in my league, I want to feel like the victory has some greater meaning in life. My fellow nincompoops feel much the same way, which is why we set up a Sportsline league that allows us to set the stat values to whatever we desire. This way, we don’t have to deal with the absurdity of a Yahoo! league where a stolen base counts the same as a home run.</p>
<p>It also turns out that I’ve been tasked with learning R, an open-source statistical package, for work. Specifically, I need to figure out how to do a multiple regression, a statistical technique that models one outcome as a function of several inputs. So, to learn the software, I decided to model offense in baseball and finally figure out how good our league’s scoring system really is (for offense, at least).</p>
<p>Here’s a warning.  It gets really nerdy from here on out, so if you’re just interested in the answer, here it is.  Our method is pretty good.  We overvalue walks.  If you’re for some reason interested in R vs. SPSS, R is an adequate replacement.</p>
<p><span id="more-128"></span></p>
<p>So what is this scoring system? It’s points-based: each baseball event is worth a set number of points. We’re only talking about offense here, so here we go:</p>
<table border="0">
<tbody>
<tr>
<th>Event</th>
<th>Points</th>
</tr>
<tr>
<td>1B</td>
<td>1</td>
</tr>
<tr>
<td>2B</td>
<td>2</td>
</tr>
<tr>
<td>3B</td>
<td>3</td>
</tr>
<tr>
<td>HR (dong)</td>
<td>4</td>
</tr>
<tr>
<td>BB</td>
<td>1</td>
</tr>
<tr>
<td>R</td>
<td>1</td>
</tr>
<tr>
<td>RBI</td>
<td>1</td>
</tr>
<tr>
<td>SB</td>
<td>1</td>
</tr>
<tr>
<td>CS</td>
<td>-1</td>
</tr>
<tr>
<td>GDP</td>
<td>-1</td>
</tr>
</tbody>
</table>
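<p>To make the table concrete, here is a small Python sketch that scores a single stat line with the weights above. The dictionary and function names are my own invention, not anything from Sportsline, and the example stat line is made up purely for illustration. Events outside the table (strikeouts, for instance) simply score zero.</p>

```python
# League offensive weights, copied from the scoring table above.
LEAGUE_WEIGHTS = {
    "1B": 1, "2B": 2, "3B": 3, "HR": 4,
    "BB": 1, "R": 1, "RBI": 1, "SB": 1,
    "CS": -1, "GDP": -1,
}


def fantasy_points(stat_line):
    """Return total league points for a dict mapping event -> count.

    Events missing from stat_line count as zero; events that are not in
    the scoring table (e.g. strikeouts) are ignored entirely.
    """
    return sum(weight * stat_line.get(event, 0)
               for event, weight in LEAGUE_WEIGHTS.items())


# Hypothetical stat line: two singles, a dong, a walk, 2 R, 3 RBI, caught stealing once.
example = {"1B": 2, "HR": 1, "BB": 1, "R": 2, "RBI": 3, "CS": 1}
print(fantasy_points(example))  # prints 11
```

<p>A plain dictionary keeps the weights in one place, so testing a change like a cheaper walk is a one-number edit.</p>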
<p>To figure out how our weights stack up, I used each team’s 2008 run total as the measure of offense (i.e., the outcome variable). The planned inputs were everything shown above, plus strikeouts. However, some of the measures ended up being dropped. Strikeouts didn’t make the final analysis because their distribution across teams is bimodal (what’s going on there?), and I wanted roughly normal inputs for the regression. Triples are positively skewed, but I kept them in anyway without a transform. I dropped RBI and GDP because they’re context-dependent. RBI and R are obviously closely correlated, and there is nothing Earth-shattering about saying that the trick to scoring runs is driving them in. GDP is equally misleading: a team with a lot of GDPs has a lot of runners on base, which is actually a good thing. SB and CS didn’t turn out to be significant predictors of runs in 2008, so they’re not in the final analysis either. A league that treats steals as anything more than a footnote is interested in Fantasy baseball, not fantasy Baseball. If your real-life MLB team is built around steals, take a moment and go weep.</p>
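<p>The post doesn’t say exactly how the skew in triples was judged. One common yardstick is the Fisher-Pearson sample skewness, g1 = m3 / m2^1.5, sketched below in plain Python. Treat it as an assumption about the kind of check involved, not a record of what the R session actually ran. The toy list in the example is illustrative only, not 2008 team data.</p>

```python
def sample_skewness(values):
    """Fisher-Pearson skewness coefficient g1 = m3 / m2**1.5.

    m2 and m3 are the second and third central moments (divided by n).
    Positive values mean a long right tail, the way a few teams can pile
    up far more triples than the rest; values near zero suggest symmetry.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5


# Toy example: four low values and one big outlier give a right skew.
print(round(sample_skewness([1, 1, 1, 1, 10]), 3))  # prints 1.5
```

<p>Strictly speaking, ordinary least squares doesn’t need normally distributed inputs; the normality assumption behind its p-values is about the residuals. A skewed predictor like triples is therefore more of a warning that a few teams may pull hard on that coefficient than a reason to throw the variable out.</p>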
<p>I’m sorry if I’ve offended the member of the Willy Taveras fan club. The output of the modeling is in the table below.</p>
<p><img class="alignnone size-full wp-image-129" title="offensemodel" src="http://www.nerdbaseball.com/wp-content/uploads/2009/02/offensemodel.png" alt="offensemodel" width="432" height="296" /></p>
<p>What does this mean? This is a linear model. With one input it would be the familiar y = mx + b; with several inputs it becomes y = b + m<sub>1</sub>x<sub>1</sub> + m<sub>2</sub>x<sub>2</sub> + … + m<sub>k</sub>x<sub>k</sub>, with one coefficient for each event. Here y is predicted runs, and b is the intercept, listed under each model. Don’t read too much into the negative intercepts. The intercept is the prediction for a team with zero of every event, and since no real team has zero offense, it mostly just anchors the line. The rest of those numbers are the coefficients for each event: the extra runs one more of that event is worth, holding the other events constant. In model number five, which uses all five events, we predict that a home run will produce 1.5 runs and a double will produce 0.94 runs. Triples are also productive, but we have to take that 1.46 with a grain of salt because the original data was skewed. Finally, a walk is not as good as a single, but getting on base at all is still very valuable. (Don’t worry about something being “more significant” than something else. Significance is a yes/no proposition against a chosen cutoff. If something is reported as more significant, that speaks to the confidence in the coefficient, not the magnitude of the coefficient itself.)</p>
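<p>The post doesn’t show the R session itself, so here is a minimal pure-Python sketch of what a multiple regression like model five does under the hood: ordinary least squares, solved through the normal equations (X<sup>T</sup>X)β = X<sup>T</sup>y. The function name and row layout are my own assumptions. R’s lm() fits the same kind of model via a QR decomposition, which is numerically safer; this version is just for seeing the moving parts.</p>

```python
import math


def fit_ols(rows, y):
    """Fit y = b + m1*x1 + ... + mk*xk by ordinary least squares.

    rows: one list of predictor values per team, e.g. [1B, 2B, 3B, HR, BB].
    y:    the outcome for each team (runs scored).
    Returns [b, m1, ..., mk], intercept first.
    """
    # Prepend a constant 1 to every row so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n, p = len(X), len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y], p rows by p + 1 columns.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if math.isclose(A[pivot][col], 0.0, abs_tol=1e-12):
            raise ValueError("predictors are collinear; coefficients are not unique")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    # The left block is now the identity, so the last column holds the coefficients.
    return [A[i][p] for i in range(p)]
```

<p>With the real data, each row would hold one team’s 2008 singles, doubles, triples, home runs and walks, and y its runs. The collinearity check is also a reminder of why RBI and R couldn’t sensibly sit in the same model as each other’s predictors.</p>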
<p>None of this is actually new, and all it really does is confirm that we’re on the right track with our league weights. We could stand to knock BB down a bit, I guess. It’s also pretty amazing that these five stats explain 90% of the variance in run production (an R<sup>2</sup> of about 0.90). Finally, the Twins scored 60 more runs than the model predicted. They were the only over-productive outlier, so I would expect fewer runs out of them this year. San Diego and St. Louis under-produced by about 40 runs. The SD result shows that leaving out park factors could be a real problem if this were anything more than just fooling around. STL will probably just score more runs this year.</p>
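<p>Both numbers in that paragraph come from comparing actual runs to predicted runs. Here is a short Python sketch of the two calculations: R<sup>2</sup> = 1 − SS<sub>residual</sub> / SS<sub>total</sub>, and per-team residuals (actual minus predicted), so a positive residual marks an over-producer like the 2008 Twins. The function names and the outlier threshold are illustrative assumptions, not something from the original analysis.</p>

```python
def r_squared_and_residuals(actual, predicted):
    """Return (R^2, residuals) for paired lists of actual and predicted runs.

    Residual = actual - predicted, so positive means the team scored more
    than the model expected. R^2 = 1 - SS_residual / SS_total is the share
    of the variance in runs that the model accounts for.
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / len(actual)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot, residuals


def outliers(teams, residuals, threshold):
    """Return (team, residual) pairs with |residual| >= threshold, biggest first."""
    flagged = [(t, r) for t, r in zip(teams, residuals) if abs(r) >= threshold]
    return sorted(flagged, key=lambda tr: abs(tr[1]), reverse=True)
```

<p>Per the post, running real 2008 totals and model five’s predictions through something like this would show an R<sup>2</sup> near 0.90, the Twins at roughly +60, and San Diego and St. Louis at roughly −40.</p>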
<p>Now how often can I get nerdly baseball pursuits to count as work?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nerdbaseball.com/2009/02/walking-the-walk-part-i-offense/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>

