# Hack 57. Plot Histograms in Excel

Let's think of each range as a bucket. Every player-season goes [...] average, so we'll put that season in the .350-.355 bucket. So, here's our plan: we'll put each player-season into a bucket, count the number of player-seasons in each bucket, and draw a graph showing (in ascending order) the number of players in each bucket. This single diagram is a histogram.
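The plan above (assign each value to a bucket, count, then list the buckets in ascending order) can be sketched in a few lines of Python. This is only an illustration: the function names and the sample averages are hypothetical, and it assumes batting averages are already rounded to three decimal places and that each bucket covers its lower bound (so .353 lands in the bucket starting at .350).

```python
from collections import Counter


def bucket_start(avg, width_thousandths=5):
    """Return the lower edge of the bucket that holds a batting average."""
    # Work in whole thousandths to avoid floating-point drift.
    thousandths = round(avg * 1000)
    return (thousandths // width_thousandths) * width_thousandths / 1000


def histogram(averages, width_thousandths=5):
    """Count values per bucket and return (bucket, count) pairs in ascending order."""
    counts = Counter(bucket_start(a, width_thousandths) for a in averages)
    return sorted(counts.items())


# Hypothetical sample data, not taken from the book's table.
sample = [0.353, 0.351, 0.275, 0.301]
for start, count in histogram(sample):
    print(f"{start:.3f}: {'#' * count}")
```

Plotting the counts as columns, one per bucket, gives the histogram.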

## The Code

In this example, I wanted to look at the distribution of batting average. I used a table containing the total batting statistics for each player in each year (and the list of all teams for which each player played), and I called the table b_and_t. I selected only batters with enough plate appearances to qualify for a league title, and only those players who played between 1955 and 2004:

    SELECT b.playerID, M.nameLast, M.nameFirst, b.yearID, b.teamG,
           b.teamIDs, b.AB, b.H,
           b.H / b.AB AS AVG,
           b.AB + b.BB + b.HBP + b.SF AS PA
    FROM b_and_t b INNER JOIN Master M
      ON b.playerID = M.playerID
    WHERE b.yearID > 1954
      AND b.AB + b.BB + b.HBP + b.SF > b.teamG * 3.1;

After running this query, I saved the results to an Excel file named batting_averages.xls.
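For readers who want to check the query's qualification rule outside the database, here is a rough Python equivalent of its WHERE clause and computed columns. The function names and sample numbers are hypothetical; the arithmetic (plate appearances as AB + BB + HBP + SF, and a threshold of 3.1 plate appearances per team game) is taken from the query above.

```python
def plate_appearances(ab, bb, hbp, sf):
    """Plate appearances as computed in the query: AB + BB + HBP + SF."""
    return ab + bb + hbp + sf


def batting_average(h, ab):
    """Hits divided by at bats, matching the AVG column."""
    return h / ab


def qualifies(ab, bb, hbp, sf, team_games, per_game=3.1):
    """True when a player-season clears the query's plate-appearance cutoff."""
    return plate_appearances(ab, bb, hbp, sf) > team_games * per_game


# Hypothetical player-seasons over a 162-game schedule.
print(qualifies(500, 50, 5, 5, 162))   # 560 PA against a cutoff of about 502
print(qualifies(400, 50, 5, 5, 162))   # 460 PA falls short
```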

One way to draw histograms in Excel is to use the Analysis ToolPak add-in, which introduces several new functions, including a Histogram function. But I find this interface confusing and inflexible, so I do something else.

Here is my method for creating a histogram:

1. In the data worksheet, create a new column called Range.

2. In the first cell of this column, use a function to round the value for which you would like to plot the distribution. The simple way to do this is to use the Significant Figures option of the ROUND function. In my worksheet, column I contained the value for which I wanted to calculate the distribution (batting average), so I could use a formula such as ROUND(I2, 2) to round to the nearest .010. Personally, I find a bucket size of .005 to be more descriptive, so I use a trick. You can multiply a value inside the ROUND function and then divide outside the function to get buckets of almost any size. Inside the ROUND function, I multiply by the reciprocal of the bucket size; in this case, 1/.005 = 200. Outside the function, I multiply by the bucket size (which is the same as dividing by 200). In my worksheet, column I contained the average values. So, I used ROUND(I2 * 200, 0) / 200 as my formula. Copy and paste this formula into every row of the worksheet. (You can double-click the bottom-right corner of the cell to do this quickly.)

3. [...] bucket. Select all of the data in the worksheet, including the [...] and PivotChart Report. Select PivotChart Report and click Finish (we'll use all the defaults). We will select two fields for our pivot table. From the PivotTable Field List palette, select Range. Drag-and-drop this onto the Drop Row Fields Here part of the pivot table. Next, drag-and-drop "playerID" onto the Drop Data Items Here part of the pivot table. By default, Excel will count the number of playerIDs in the underlying data that match each range value. The pivot table is now showing the number of items in each bucket. You should see a (very ugly) graph with the number of players in each bucket.

4. Clean up the graph. (I like to erase the background fill and lines and change the width of the columns.) Figure 5-5 shows an example of a cleaned-up graph.

Figure 5-5. Histogram from a pivot chart report

Looking at the histogram, we see that the distribution looks similar to a bell curve; it skews toward the right and is centered at around .275.
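The multiply-inside, divide-outside rounding trick from step 2 is easy to try outside Excel. The sketch below is a hypothetical Python translation, not part of the original hack. It uses the decimal module because Excel's ROUND sends ties away from zero, while Python's built-in round() sends ties to the nearest even digit, so the two can disagree on values that sit exactly on a bucket boundary.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_bucket(value, bucket_size):
    """Mirror ROUND(value * (1 / bucket_size), 0) * bucket_size."""
    v = Decimal(str(value))
    size = Decimal(str(bucket_size))
    # Number of whole buckets, rounding ties away from zero as Excel does.
    steps = (v / size).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    return float(steps * size)


print(to_bucket(0.353, 0.005))  # 0.355, same as ROUND(0.353 * 200, 0) / 200
print(to_bucket(0.352, 0.005))  # 0.35
print(to_bucket(0.353, 0.01))   # 0.35, same as ROUND(0.353, 2)
```

Note that this rounds to the nearest multiple of the bucket size, so each bucket is centered on its label rather than starting at it.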

## Hacking the Hack

[...] that you can easily change the formula for binning. Here are a few suggestions for other formulas:

ROUNDDOWN(number, num_digits) and ROUNDUP(number, num_digits)

The ROUNDDOWN function rounds a number down, toward zero, to the given number of decimal places. For example, ROUNDDOWN(3.59, 0) equals 3, and ROUNDDOWN(3.59, 1) equals 3.5. Similarly, ROUNDUP rounds up, away from zero, to the given number of decimal places. ROUNDUP(3.59, 0) equals 4, and ROUNDUP(3.59, 1) equals 3.6.
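To experiment with these two functions outside a spreadsheet, one possible Python equivalent uses the decimal module, whose ROUND_DOWN and ROUND_UP modes round toward zero and away from zero respectively. The helper names here are made up for illustration.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_UP


def round_down(value, digits):
    """Rough equivalent of Excel's ROUNDDOWN(value, digits)."""
    quantum = Decimal(1).scaleb(-digits)  # digits=1 gives 0.1, digits=0 gives 1
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_DOWN))


def round_up(value, digits):
    """Rough equivalent of Excel's ROUNDUP(value, digits)."""
    quantum = Decimal(1).scaleb(-digits)
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_UP))


print(round_down(3.59, 0), round_down(3.59, 1))  # 3.0 3.5
print(round_up(3.59, 0), round_up(3.59, 1))      # 4.0 3.6
```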

LOG(number, base)

Sometimes it's useful to plot a value on a logarithmic scale, and to use logarithmic-size bins. You can combine LOG functions with ROUND functions to create variable-size bins.
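As one illustration of logarithmic-size bins, the following Python sketch assigns each positive value to the bin that starts at the nearest lower power of ten. It is an assumption about how one might combine the two ideas, not a formula from the hack: it rounds the logarithm down (the ROUNDDOWN flavor of rounding), and the bins_per_decade parameter is a hypothetical knob for finer bins.

```python
import math


def log_bin(value, bins_per_decade=1):
    """Return the lower edge of the logarithmic bin that holds a positive value."""
    # Round the base-10 logarithm down to a multiple of 1 / bins_per_decade.
    exponent = math.floor(math.log10(value) * bins_per_decade) / bins_per_decade
    return 10 ** exponent


print(log_bin(250))  # 100.0: the bin covering 100 up to 1,000
print(log_bin(7))    # 1.0: the bin covering 1 up to 10
```

Each bin is ten times as wide as the one before it, which keeps long-tailed data from piling into a single column.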

CONCATENATE(...)

The CONCATENATE function doesn't compute numbers; it puts text together. If you want to explicitly list ranges (such as 3.500-3.599), you can use the CONCATENATE function to create these; for example, CONCATENATE(ROUNDDOWN(3.59, 1), " to ", ROUNDUP(3.59, 1) - 0.01) returns 3.5 to 3.59.

If you want to take this to the next level, you can replace the bin size with a named value. (For example, name cell A1 bin_size.) This makes it easy to change the bin size dynamically and experiment with different numbers of bins.
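The range labels built with CONCATENATE can be sketched in Python as well. The function name and the digits parameter below are hypothetical; digits plays the role of a named, adjustable bin size. One deliberate difference from the spreadsheet formula: the upper label is computed from the lower edge rather than from ROUNDUP of the value, so a value sitting exactly on a boundary (such as 3.5) still gets the label 3.5 to 3.59.

```python
from decimal import Decimal, ROUND_DOWN


def range_label(value, digits=1):
    """Build a text label such as '3.5 to 3.59' for the bin holding value."""
    step = Decimal(1).scaleb(-digits)              # bin width, e.g. 0.1
    low = Decimal(str(value)).quantize(step, rounding=ROUND_DOWN)
    high = low + step - Decimal(1).scaleb(-(digits + 1))
    return f"{low} to {high}"


print(range_label(3.59))  # 3.5 to 3.59
print(range_label(3.5))   # 3.5 to 3.59
```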

# Hack 58. Go for Two

In football, when is the two-point conversion attempt the right choice? Regardless of which "chart" you're using, the problem gets even more complicated when statisticians enter the debate.

A few years back, I was enjoying watching my local professional football team as they were losing a close game. I wasn't entertained by my team's dismal performance as much as I was delighted by my team's befuddled coach as he attempted to [...]

In football, after a touchdown is scored (the touchdown itself is worth six points), the scoring team has two options for scoring an "extra point" or two. Usually, the team chooses to kick a single extra point through the uprights (like a short-distance field goal), but they might also choose to "go for two" points (known as the two-point conversion), which involves the offense rushing or passing for another trip into the end zone.

At the time, as was later "confirmed" by sportswriters, it was [...] when interpreting the column on the chart that listed how many [...] point-after conversion.

[...] this "chart" and what principles it was based on. Later, as I searched for the "official chart," I found two "official" charts, and they didn't always agree.

More recently, I ran across a chart based on a statistical analysis of the probability of possible outcomes and on the amount of time remaining (as indicated by the number of possessions remaining). This chart didn't agree with either of the earlier charts I discovered.

This hack is for you, Coach. It examines from a statistical perspective when to go for two points and when to settle for one.

When you see a coach on TV holding a plastic laminated card and studying it before deciding whether to go for two, sportscasters like to refer to the card as the chart, though, as mentioned in the previous section, there's more than one chart in use. The slight differences might be due to the fact that one is identified as being used in the NFL and the other is identified as a classic set of standard decisions used in college football. The differences might also be based on the fact that the college [...] more aggressive or confident style. The college chart seems to play for a victory, not a tie. Though college ball now has overtime rules, they are a fairly recent development, whereas [...]

The NFL chart is provided on Norm Hitzges' website (Norm is a [...]): http://www.normhitzges.com/thechart.htm. The college chart (found at http://www.NFL.com/fans/twopointconv.html) is identified as the one used in the 1970s and developed at the University of California, Los Angeles (UCLA). Table 5-14 provides the suggested decisions from both charts and is condensed a bit.

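Table 5-14. Classic decision making for two-point attempts

| Point difference | Ahead (NFL) | Behind (NFL) | Ahead (College) | Behind (College) |
|---|---|---|---|---|
| 0 | 1 | 1 | | |
| 1 | [...] | [...] | [...] | [...] |
| 2 | [...] | [...] | [...] | [...] |
| 3 | 1 | 1 | 1 | 1 |
| 4 | [...] | [...] | [...] | |
| 5 | 2 | 2 | 2 | 2 |
| 6 | 1 | 1 | 1 | 1 |
| 7 | 1 | 1 | 1 | 1 |
| 8 | 1 | 1 | 1 | 1 |
| 9 | 1 | 2 | 1 | 1 |
| 10 | 2 | 1 | 1 | 1 |
| 11 | 1 | 2 | 2 | 1 |
| 12 | 1 | 2 | 2 | 2 |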

The UCLA chart does not provide suggestions for when the score is tied or when your team is behind by four points. The [...]

As discussed, the primary difference seems to be whether you're willing to play for the tie or not. UCLA clearly did not wish to play for the tie, while the NFL chart has no such hesitancy.

## Modern Super-Scientific Chart

In the real world, a set of statistical probabilities controls the [...] go for two or take the extra point should be based on more information than just the score and whether your team is winning or losing. In actual game situations, smart coaches [...]

- The likelihood that their field goal kicker will make the field goal
- The likelihood that their team will score on a given two-point conversion play
- The current health, attitude, and skill of their players

Past statistics show that the average NFL football team makes [...] two-point attempts. Coaches must use their experience and intuition to gauge their players' current ability level, and a chart isn't much help on that score.

As for possessions left, however, this is exactly the type of information that decision systems based on probability need to take into account. Based on a process of working backward from the ending of a hypothetical football game that takes the probability of success on either option (98 percent for one-point plays and 40 percent for two-point plays) into account, statisticians have produced a chart based not only on the current score, but also on the total number of possessions remaining for both teams.

In a 2000 issue of Chance magazine (Vol. 13, No. 3), Harold Sackrowitz presented the results of such an analysis using a process called dynamic programming. Table 5-15 shows a portion of Dr. Sackrowitz's chart.

Table 5-15. Modern decision making for two-point attempts

[Table contents not recoverable. Rows: possessions remaining. Columns: points behind or ahead, 0 through 12. Each entry is 1 or 2.]

This two-point conversion chart is based on the branching possibilities starting at different points in the game and assuming basic probabilities of success for either an extra point or a two-point conversion. An average NFL quarter sees six possessions in total, so think of this chart as being most useful in the fourth quarter. Sackrowitz also assumes a 50 percent chance for overtime victories.
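To give a feel for what working backward from the end of the game means, here is a deliberately simplified Python sketch of that kind of backward induction. It is not Sackrowitz's model: it assumes possessions alternate, that each possession ends either in a touchdown or in nothing (no field goals), and that a touchdown happens with a made-up probability, P_TOUCHDOWN. Only the 98 percent, 40 percent, and 50 percent figures come from the text; every name in the code is hypothetical.

```python
from functools import lru_cache

P_KICK = 0.98        # extra-point success rate quoted in the text
P_TWO = 0.40         # two-point success rate quoted in the text
P_OVERTIME = 0.50    # overtime win chance quoted in the text
P_TOUCHDOWN = 0.25   # hypothetical chance that a possession ends in a touchdown


@lru_cache(maxsize=None)
def win_prob(diff, left, my_ball):
    """Chance that my team wins, given the score difference (mine minus
    theirs), the possessions left, and whether my team has the ball next."""
    if left == 0:
        if diff > 0:
            return 1.0
        if diff < 0:
            return 0.0
        return P_OVERTIME
    no_score = win_prob(diff, left - 1, not my_ball)
    if my_ball:
        # My team picks the conversion that maximizes its chance of winning.
        scored = max(conversion_values(diff + 6, left - 1, True))
    else:
        # The opponent picks whichever conversion hurts my chances most.
        scored = min(conversion_values(diff - 6, left - 1, False))
    return P_TOUCHDOWN * scored + (1 - P_TOUCHDOWN) * no_score


def conversion_values(diff, left, mine):
    """My win chances after a touchdown (six points already counted) if the
    scoring team kicks or goes for two; the other team gets the ball next."""
    sign = 1 if mine else -1
    failed = win_prob(diff, left, not mine)
    kick = P_KICK * win_prob(diff + sign, left, not mine) + (1 - P_KICK) * failed
    two = P_TWO * win_prob(diff + 2 * sign, left, not mine) + (1 - P_TWO) * failed
    return kick, two


def best_conversion(diff, left):
    """Return 1 (kick) or 2 (go for two) for my team after its touchdown."""
    kick, two = conversion_values(diff, left, True)
    return 1 if kick >= two else 2


print(best_conversion(-1, 0))  # 1: kick to tie and take your chances in overtime
print(best_conversion(-2, 0))  # 2: only a two-point play can tie the game
```

Calling best_conversion for each score difference and each number of possessions left fills in a grid of 1s and 2s shaped like the published chart, though the entries depend heavily on the assumed probabilities.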

## How It Works

The calculations for Table 5-15 work something like this simple example:

1. Imagine you are down by one point without much chance of getting the ball again.

2. You have a 98 percent chance of making an extra point kick and a 50 percent chance of winning in overtime. Going for the extra point results in a victory 49 percent of the time (.98 x .50 = .49).

3. You have a 40 percent chance of converting a two-point play, so going for two points results in a victory 40 percent of the time. Failure ends the game, and success wins the game.

4. 49 percent is better than 40 percent, so you should elect to go for the extra point. Notice that if you believe your team's chances of converting the two-point play are better than 49 percent, you should go for it. Calculations like these, but over a longer series of possessions, result in the decision tree reflected in Table 5-15.
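The four steps above reduce to a single comparison, which the following Python sketch spells out. The function names are hypothetical; the default probabilities are the ones used in the example.

```python
def win_prob_kick(p_kick=0.98, p_overtime=0.50):
    """Down by one, last possession: the kick must succeed, then you must win overtime."""
    return p_kick * p_overtime


def win_prob_two(p_two=0.40):
    """Down by one, last possession: a successful two-point play wins outright."""
    return p_two


def best_call(p_kick=0.98, p_two=0.40, p_overtime=0.50):
    """Return 1 to kick the extra point or 2 to go for two."""
    return 1 if win_prob_kick(p_kick, p_overtime) >= win_prob_two(p_two) else 2


print(best_call())            # 1: kicking wins about 49 percent of the time versus 40
print(best_call(p_two=0.55))  # 2: a team that converts 55 percent should go for it
```

The break-even point is simply the kick's win probability: with these numbers, going for two pays off only when the team's conversion rate exceeds about 49 percent.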

Which chart should you use the next time you find yourself coaching in a crucial football game with a key decision to make? That's up to you, but just remember that befuddled football coach I watched on TV a few years ago. Not only was he replaced the next year by Dick Vermeil, considered one of the brighter football coaches around, but it was Vermeil who helped develop the UCLA two-point conversion chart shown in Table 5-14. Now you know the rest of the story!

# Hack 59. Rank with the Best of Them

There are many ways to use data to make judgments [...] compare performance in individual sports have validity concerns, however.

My friends and I are a competitive lot. Our arena of combat, most recently, has been poker. On a regular basis, my friends and I gather at my home and take part in a Texas Hold 'Em poker tournament. It's an informal affair, but we all take it very seriously. The way our poker tournaments work, everyone starts with the same amount of chips, and when they are gone they are gone. There is a first one out, a last one out, and everything in between. So, for example, if seven people play, someone comes in first, second, third, fourth, fifth, sixth, and seventh.

We all think of ourselves as pretty good and, being competitive, we have longed for an objective method of comparing performance across tournaments. As one of the statisticians in the group, I took it upon myself to devise various ways of producing some sort of objective index that would allow all participants to compare their performance with each other to decide once and for all who is the best player and who is only lucky now and again. This is the story of my quest and the statistical solutions I chose. Not to give the ending away, but I learned that there is no single best solution.

## How to Rank Fairly