Hack 56. Predict the Outcome of a Baseball Game


During the first couple hours of a baseball game, turn on the radio broadcast of that game. Listen just long enough to identify the team that is at bat. That team has a greater than 50 percent chance of winning that game.



Why It Works

Baseball is a game where the longer you are on offense, the more runs you can score. As more batters come to bat in a single inning, the chances of moving runners along the base paths and across home plate increase. Another way to look at it is to imagine the end of an inning that was huge for one team. If a team scored a lot of runs, they had to have used considerably more than the minimum of three batters in that inning and, consequently, been at bat a proportionately longer length of time than the other team. Over the course of a game, the team that is at bat longest is more likely to score more (or have more productive innings).

Sampling theory [Hack #19] suggests that a sample is most likely to capture the most common elements of a population. Our population here is all the moments during a game that we could listen to. The most common characteristic in the population (in terms of who is at bat) belongs to the team that is at bat the most.

Figure 5-4 suggests a possible distribution of at-bat time for a regulation nine-inning game. In this example, the winning team was on offense for 58 percent of the time. In retrospect, tuning in to the broadcast at a random moment had a 58 percent chance of finding the winning team at bat.
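The sampling argument can be checked with a quick simulation. The following is only a sketch under simple assumptions: the game is treated as one unit of at-bat time, the winner's share is the imagined 58 percent from Figure 5-4, and the function name `estimate_hit_rate` is made up for this illustration. Because the listening moment is chosen uniformly, it does not matter that the two teams' turns at bat are interleaved in a real game.

```python
import random


def estimate_hit_rate(winner_share, trials=100_000, seed=0):
    """Estimate how often a random tune-in finds the eventual winner at bat.

    winner_share is the fraction of total at-bat time that belongs to the
    winning team (0.58 in the imagined data of Figure 5-4).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(trials):
        # Pick a uniformly random moment, expressed as a fraction of the
        # game's total at-bat time.
        moment = rng.random()
        if moment < winner_share:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Estimated chance of hearing the winner at bat: "
          f"{estimate_hit_rate(0.58):.3f}")
```

With enough trials, the estimate settles close to the winner's share of at-bat time, which is the whole point of the hack.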



Figure 5-4. Time at bat for winning and losing teams



The accuracy of prediction should be above 50 percent over the long run of baseball broadcasts, but it won't be really, really accurate. This is because the relationship between time at bat and scoring a victory is not a perfect correlation [Hack #11]. Players can score quickly (by hitting a home run on the first pitch, for example), or they can take their time getting many hits but strand many runners and never score.

Overall, the correlation between the two variables should be positive, however. Even the perhaps unimpressive 58 percent accuracy in my imagined data in Figure 5-4 means that you will be right 16 percent more often than a blind guess (58/50 = 1.16). With such an advantage at the blackjack tables, you would be a millionaire in a week.



Proving It Works

To test the accuracy of my claim, you can use the data that appears in your daily newspaper. While most box scores do not include information about total time-at-bat for each team, there is a variable that provides almost the same information. There will almost certainly be a "total at-bats" reported. While this statistic is not the same as time spent at bat, it should correlate pretty highly. Each day, this information is provided for more than a dozen games, and just a few days' worth of data should be enough to test my theory. Gather the total at-bats for each team, and note which team won the game.



Real-life researchers often don't have access to the variable they would really like to know about, and our use of number of at-bats instead of time at bat is a good example of this. Instead, we must settle for the next best thing available. Scientists call these substitutes proxy variables or surrogate variables.



My hypothesis is that the team with the most at-bats should win the game more than 50 percent of the time. Out of curiosity, I tested this hypothesis myself. I used the Chicago Cubs as an example, because their stats were readily available on the Web. I arbitrarily chose 2003 and the Cubs' first 25 games. An analysis of these games found that the team with the most at-bats won 56 percent of the time. If I had eliminated the three situations where there were ties in at-bats, I could have predicted with 63 percent accuracy.
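To run the same test on your own box scores, a few lines of code are enough. This is a minimal sketch, not the analysis actually used for the Cubs data: the function name `prediction_accuracy` and the five games listed are hypothetical, and each game is recorded simply as the winner's at-bats followed by the loser's at-bats. A tie in at-bats counts as a miss unless ties are dropped. That matches the pattern of the figures above, which are consistent with 14 correct calls out of 25 games, or 14 out of 22 once the three ties are removed (that count of 14 is inferred here, not stated in the text).

```python
def prediction_accuracy(games, drop_ties=False):
    """Share of games won by the team with more at-bats.

    games is a list of (winner_at_bats, loser_at_bats) tuples.
    With drop_ties=True, games with equal at-bats are excluded first.
    """
    if drop_ties:
        games = [(w, l) for w, l in games if w != l]
    if not games:
        raise ValueError("no games left to score")
    # A prediction is correct only when the winner had strictly more at-bats.
    correct = sum(1 for w, l in games if w > l)
    return correct / len(games)


# Hypothetical box scores, invented for illustration only.
sample_games = [(38, 33), (35, 35), (31, 34), (40, 36), (33, 30)]

if __name__ == "__main__":
    print(f"All games:     {prediction_accuracy(sample_games):.2f}")
    print(f"Ties excluded: {prediction_accuracy(sample_games, drop_ties=True):.2f}")
```

Dropping ties can only raise the reported accuracy, because a tie gives the method no team to pick and is otherwise scored as a wrong call.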

While the team with the fewest at-bats sometimes did win the Chicago Cubs games, the larger the discrepancy between at-bats, the more likely the team with the most at-bats was to win the game. When the most-at-bats teams won, they averaged 4.14 more at-bats than the loser. When the least-at-bats teams won, they averaged only 2.88 fewer at-bats than the loser.



Other Places It Works

Some people have suggested that in the case of my team, the Kansas City Royals, if I want to be right more than half the time, I should always predict a loss. Yes, yes, very funny.



Where It Doesn't Work

The accuracy of this method should be low if you turn on the radio in the ninth inning, which is why I suggest you try it during the first couple hours of the game. Under the rules of baseball, if the home team is leading after the top of the ninth inning, they do not come to bat in the ninth. They win. Game over. As home teams win more often than visiting teams, this means that often the winning team never comes to bat at all in the ninth inning.

This presents an interesting variation of this prediction method that applies only to the ninth inning. Turn on the game in the ninth inning; if your team is batting, things don't look so good. The data presented for the Chicago Cubs that found the winning team occasionally having fewer at-bats than their opponent can be partly explained by the fact that the winning team sometimes bats in only eight innings.

This method doesn't work for all sports. In basketball, for example, time of possession wouldn't be expected to positively correlate with points scored and, in the case of high-energy, fast-scoring teams, might even negatively correlate. In football, on the other hand, time of possession is considered a key indicator of quality performance and usually correlates with a win.







Hack 57. Plot Histograms in Excel



Use Microsoft Excel to plot data distributions so that you can have a better understanding of statistics.

There is some truth to the cliché "a picture is worth a thousand words." A picture is often the best way to understand 1,000 numbers. People are visually oriented. We're good at looking at a picture and observing different characteristics; we're bad at looking at a list of 1,000 numbers.

One of the most powerful tools available for understanding data is the histogram, a picture of the distribution of values. Here is the idea of a histogram. Suppose you have a lot of data: say, the batting averages for all 6,032 baseball players between 1955 and 2004 who averaged 3.1 or more plate appearances per game. Let's also assume you want to know how these values are distributed. What are the lowest and highest values? Are there more low values than high values? Were batting averages totally random numbers between 0 and .400, or was there some pattern?

Batting average can take many different values. Between 1955 and 2004, 6,032 players had qualifying batting averages, and there were 1,229 unique values for batting average. You can plot the number of players with each unique batting average (though I can't imagine what this graph would look like). But we don't really care about each unique value; for example, the fact that 13 players had a batting average of .2862 is not that interesting. Instead, we might want to know the number of players with very similar batting averages (say, between .285 and .290).



Let's think of each range as a bucket. Every player-season goes into a bucket. For example, in 1959, Hank Aaron had a .354 average, so we'll put that season in the .350-.355 bucket. So, here's our plan: we'll put each player-season into a bucket, count the number of player-seasons in each bucket, and draw a graph showing (in ascending order) the number of player-seasons in each bucket. This single diagram is a histogram.
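The bucket plan is easy to try outside a spreadsheet. The sketch below is one possible way to do the counting, assuming buckets that are .005 wide and closed on the left (so .285 lands in .285-.290). The names `bucket_start` and `histogram` and the five sample averages are invented for the illustration. Averages are converted to whole ten-thousandths before bucketing so that values sitting exactly on a boundary are not misplaced by floating-point rounding.

```python
from collections import Counter

BUCKET_WIDTH = 50  # bucket width in ten-thousandths, i.e. .005


def bucket_start(avg):
    """Return the lower edge of the bucket, in ten-thousandths (3500 = .350)."""
    # Integer units avoid surprises such as 0.285 / 0.005 landing just
    # below 57 in floating point.
    units = round(avg * 10000)
    return (units // BUCKET_WIDTH) * BUCKET_WIDTH


def histogram(averages):
    """Count how many averages fall in each bucket, in ascending order."""
    counts = Counter(bucket_start(a) for a in averages)
    return dict(sorted(counts.items()))


# Made-up sample values; .354 echoes the Hank Aaron example in the text.
sample = [0.354, 0.3512, 0.285, 0.2862, 0.2899]

if __name__ == "__main__":
    for start, n in histogram(sample).items():
        low, high = start // 10, (start + BUCKET_WIDTH) // 10
        print(f".{low:03d}-.{high:03d}: {'#' * n} ({n})")
```

Printing one row of `#` marks per bucket gives a rough text histogram, which is often enough to see the shape of a distribution before drawing a proper chart.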



The Code

In this example, I wanted to look at the distribution of batting average. I used a table containing the total batting statistics for each player in each year (and the list of all teams for which each player played), and I called the table b_and_t. I selected only batters with enough plate appearances to qualify for a league title, and only those players who played between 1955 and 2004:

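-- Player-seasons from 1955 to 2004 with enough plate appearances to qualify
SELECT b.playerID, M.nameLast, M.nameFirst, b.yearID, b.teamG,
  b.teamIDs, b.AB, b.H,
  b.H / b.AB AS AVG,                    -- batting average
  b.AB + b.BB + b.HBP + b.SF AS PA      -- plate appearances
FROM b_and_t b INNER JOIN Master M
  ON b.playerID = M.playerID
WHERE b.yearID BETWEEN 1955 AND 2004
  AND b.AB + b.BB + b.HBP + b.SF > b.teamG * 3.1;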



After running this query, I saved the results to an Excel file named batting_averages.xls.

One way to draw histograms in Excel is to use the Analysis ToolPak add-in. You can add this by selecting Add-Ins... from the Tools menu, and then selecting Analysis ToolPak. This adds a new menu item to the Tools menu, called Data Analysis,
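The qualifying rule in the query's WHERE clause is simple enough to check by hand for a single player-season. The sketch below mirrors that rule as written in the query (plate appearances counted as AB + BB + HBP + SF, compared against 3.1 per team game). The function names and the numbers passed in are hypothetical, chosen only to show one season on each side of the cutoff.

```python
def plate_appearances(ab, bb, hbp, sf):
    """Plate appearances as the query counts them: AB + BB + HBP + SF."""
    return ab + bb + hbp + sf


def qualifies(ab, bb, hbp, sf, team_games, pa_per_game=3.1):
    """True if the season has more than pa_per_game plate appearances
    per team game, matching the strict '>' in the WHERE clause."""
    return plate_appearances(ab, bb, hbp, sf) > team_games * pa_per_game


if __name__ == "__main__":
    # Hypothetical seasons for a team that played 162 games.
    print(qualifies(ab=450, bb=40, hbp=5, sf=5, team_games=162))
    print(qualifies(ab=460, bb=40, hbp=5, sf=5, team_games=162))
```

For a 162-game team the cutoff works out to just over 502 plate appearances, so the first hypothetical season (500) falls short and the second (510) qualifies.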


