Hack 2. Describe the World Using Just Two Numbers


Inferential statistics tend to use two values to describe populations: the
mean and the standard deviation.



Mean

Rather than describe a sample of values by showing them all, it is simply
more efficient to report one fair summary of the group of scores. This single
number is meant to fairly represent all the scores and what they have in
common. Consequently, this single number is referred to as the central
tendency of a group of scores.

Typically, the best measure of central tendency, for a variety of reasons, is
the mean [Hack #21]. The mean is the arithmetic average of all the scores and
is calculated by adding together all the values in a group, and then dividing
that total by the number of values. The mean provides more information about
all the scores in a group than other central tendency options (such as
reporting the middle score, the most common score, and so on).

In fact, mathematically, the mean has an interesting property. A side effect
of how it is created (adding up all scores and dividing by the number of
scores) produces a number that is as close as possible to all the other
scores. The mean will be close to some scores and far away from some others,
but if you square those distances and add them up, you get a total that is as
small as possible. No other number, real or imagined, will produce a smaller
total squared distance from all the scores in a group than the mean.
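To see that property in action, here is a minimal Python sketch. It takes "distance" to mean squared distance, which is the sense in which the claim holds exactly. The five scores are made up for illustration, and `total_squared_distance` is a hypothetical helper name, not anything from the text.

```python
from statistics import mean

# Hypothetical scores, invented purely for illustration.
scores = [2, 4, 4, 5, 10]


def total_squared_distance(center, values):
    """Add up the squared distance between `center` and every value."""
    return sum((v - center) ** 2 for v in values)


m = mean(scores)  # (2 + 4 + 4 + 5 + 10) / 5
at_mean = total_squared_distance(m, scores)

# Try every candidate "center" from 0.0 to 12.0 in steps of 0.1
# and keep the one with the smallest total squared distance.
candidates = [i / 10 for i in range(0, 121)]
best = min(candidates, key=lambda c: total_squared_distance(c, scores))

print("mean:", m)
print("total squared distance at the mean:", at_mean)
print("best candidate on the grid:", best)
```

On this grid, no candidate beats the mean. Swap in your own scores and the winner should still be the mean, or the grid point nearest to it.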



Standard deviation

Just knowing the mean of a distribution doesn't quite tell us enough. We also
need to know something about the variability of the scores. Are they mostly
close to the mean or mostly far from the mean? Two wildly different
distributions could have the same mean but differ in their variability. The
most commonly reported measure of variability summarizes the distances
between each score and the mean.

As with the mean, the more informative measure of variability would be one
that uses all the values in a distribution. A measure of variability that
does this is the standard deviation. The standard deviation is, roughly
speaking, the average distance of each score from the mean. A standard
deviation takes all the distances in a distribution and averages them
(squaring them first, then taking a square root at the end). The "distances"
referred to are the distances between each score and the mean.



Another commonly reported value that summarizes the variability in a
distribution is the variance. The variance is simply the standard deviation
squared and is not particularly useful in picturing a distribution, but it is
helpful when comparing different distributions and is frequently used as a
value in statistical calculations, such as with the independent t test
[Hack #17].



The formula for the standard deviation appears to be more complicated than it
needs to be, but there are some mathematical complications with summing
distances (negative distances always cancel out the positive distances when
the mean is used as the dividing point). Consequently, here is the equation:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Σ means to sum up. The x means each score, x̄ means the mean of the scores,
and the n means the number of scores.
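The recipe can be sketched in a few lines of Python. This assumes the usual sample formula, which divides the summed squared distances by n - 1. The five scores are hypothetical, and `sample_sd` is just an illustrative name.

```python
import math
import statistics

# Hypothetical scores, invented purely for illustration.
scores = [4, 8, 6, 5, 7]


def sample_sd(values):
    """Standard deviation: square root of summed squared distances over n - 1."""
    n = len(values)
    m = sum(values) / n
    squared_distances = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_distances) / (n - 1))


m = sum(scores) / len(scores)

# The raw distances cancel out, which is why they get squared first.
raw_total = sum(x - m for x in scores)

sd = sample_sd(scores)
variance = sd ** 2  # the variance is just the standard deviation squared

print("raw distances add up to:", raw_total)
print("standard deviation:", sd)
print("standard library agrees:", statistics.stdev(scores))
print("variance:", variance)
```

The standard library's `statistics.stdev` uses the same n - 1 convention, so it makes a handy cross-check.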



Central Limit Theorem

The Central Limit Theorem is fairly brief, but very powerful.



Behold the truth:

If you randomly select multiple samples from a population, the means of each
of those samples will be normally distributed.

Attached to the theorem are a couple of mathematical rules for accurately
estimating the descriptive values for this imaginary distribution of sample
means:

The mean of these means (that's a mouthful) will be equal to the population
mean. The mean of a single sample is a good estimate for this mean of means.

The standard deviation of these means can be estimated as the sample standard
deviation, s, divided by the square root of the sample size, n:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

These mathematical rules produce more accurate results, and the distribution
is closer to the normal curve, as the sample size within any sample gets
bigger.



Thirty or more in a sample seems to be enough to produce accurate
applications of the Central Limit Theorem.
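Both rules can be checked with a quick simulation. The sketch below invents a population of 20,000 IQ-like scores (mean 100, standard deviation 15, my assumption for illustration), draws 2,000 samples of 30, and compares what the rules predict with what actually happens. All the variable names are hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 20,000 IQ-like scores (mean 100, SD 15).
population = [random.gauss(100, 15) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 30  # size of each sample
samples = [random.sample(population, n) for _ in range(2_000)]
sample_means = [statistics.fmean(s) for s in samples]

# Rule 1: the mean of the sample means lands on the population mean.
mean_of_means = statistics.fmean(sample_means)

# Rule 2: the SD of the sample means is close to (SD / square root of n).
sd_of_means = statistics.stdev(sample_means)
predicted_from_population = pop_sd / n ** 0.5
estimate_from_one_sample = statistics.stdev(samples[0]) / n ** 0.5

print("population mean vs. mean of means:", pop_mean, mean_of_means)
print("observed SD of the means:", sd_of_means)
print("predicted from the population SD:", predicted_from_population)
print("estimated from one sample alone:", estimate_from_one_sample)
```

The one-sample estimate is the noisy one, since it has only 30 scores to go on. It should still land in the same neighborhood as the other two.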



So What?

Okay, so the Central Limit Theorem appears somewhat intellectually
interesting and no doubt makes statisticians all giggly and wriggly, but what
does it all mean? How can anyone use it to do anything cool?



As discussed in "Know the Big Secret" [Hack #1], the secret trick that all
statisticians know is how to solve problems statistically by taking known
information about the distribution of some values and expressing that
information as a statement of probability. The key, of course, is how one
knows the distribution of all these exotic types of values that might
interest a statistician. How can one know the distribution of average
differences or the distribution of the size of a relationship between two
sets of variables? The Central Limit Theorem, that's how.

For example, to estimate the probability that any two groups would differ on
some variable by a certain amount, we need to know the distribution of means
in the population from which those samples were drawn. How could we possibly
know what that distribution is when the population of means is invisible and
might even be only theoretical? The Central Limit Theorem, Bub, that's how!
How can we know the distributions of correlations (an index of the strength
of a relationship between two variables) which could be drawn from a
population of infinite possible correlations? Ever hear of the Central Limit
Theorem, dude?

Because we know the proportion of values that reside all along the normal
curve [Hack #23], and the Central Limit Theorem tells me that these summary
values are normally distributed, I can place probabilities on each
statistical outcome. I can use these probabilities to indicate the level of
statistical significance (the level of certainty) I have in my conclusions
and decisions. Without the Central Limit Theorem, I could hardly ever make
statements about statistical significance. And what a drab, sad life that
would be.



Applying the Central Limit Theorem

To apply the Central Limit Theorem, I need to start with only a sample of
values that I have randomly drawn from a population. Imagine, for example,
that I have a group of eight new Cub Scouts. It's my job to teach them knot
tying. I suspect, let's say, that this isn't the brightest bunch of Scouts
who have ever come to me for knot-tying guidance.

Before I demand extra pay, I want to determine whether they are, in fact, a
few badges short of a bushel. I want to know their IQ. I know that the
population's average IQ is 100, but I notice that no one in my group has an
intelligence test score above 100. I would expect at least some above that
score. Could this group have been selected from that average population?
Maybe my sample is just unusual and doesn't represent all Cubbies. A
statistical approach, using the Central Limit Theorem, would be to ask:

Is it possible that the mean IQ of the population represented by this sample
is 100?

If I want to know something about the population from which my Scouts were
drawn, I can use the Central Limit Theorem to pretty accurately estimate the
population's mean IQ and its standard deviation. I can also figure out how
much difference there is likely to be between the population's mean IQ and
the mean IQ in my sample.

I need some data from my Scouts to figure all this out. Table 1-1 should
provide some good information.

Table 1-1. Scout smarts

Scout    IQ
Jimmy    100
Perry    95
Clark    90
Lex      92
Neil     85
Billy    88
Greg     93
John     91



The descriptive statistics for this sample of eight IQ scores are:

Mean IQ = 91.75
Standard deviation = 4.53
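Those two numbers are easy to reproduce. Here is a short Python sketch using the eight scores from the table. The dictionary layout and variable names are my own choices for illustration, and `statistics.stdev` divides by n - 1.

```python
import statistics

# The eight Scouts and their IQ scores, copied from the table.
iq = {"Jimmy": 100, "Perry": 95, "Clark": 90, "Lex": 92,
      "Neil": 85, "Billy": 88, "Greg": 93, "John": 91}
scores = list(iq.values())

mean_iq = statistics.mean(scores)
sd_iq = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

# How many Scouts fall within one standard deviation of the mean?
within_one_sd = sum(1 for x in scores if abs(x - mean_iq) <= sd_iq)

print(f"Mean IQ = {mean_iq:.2f}")            # Mean IQ = 91.75
print(f"Standard deviation = {sd_iq:.2f}")   # Standard deviation = 4.53
print(f"{within_one_sd} of {len(scores)} scores are within one SD of the mean")
```

Six of the eight scores sit within one standard deviation of the mean.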

So, I know in my sample that most scores are within about 4 1/2 IQ points of
91.75. It is the invisible population they came from, though, that I am most
interested in. The Central Limit Theorem allows me to estimate the
population's mean, standard deviation, and, most importantly, how far sample
means will likely stray from the population mean:



Mean IQ
Our sample mean is our best estimate, so the population mean is likely close
to 91.75.

Standard deviation of IQ scores in the population
The formula we used to calculate our sample standard deviation is designed
especially to estimate the population standard deviation, so we'll guess
4.53.

Standard deviation of the mean
This is the real value of interest. We know our sample mean is less than 100,
but could that be by chance? How far would a mean from a sample of eight tend
to stray from the population mean when chosen randomly from that population?
Here's where we use the equation from earlier in this hack. We enter our
sample values to produce our standard deviation of the mean, which is usually
called the standard error of the mean:

$$\text{standard error of the mean} = \frac{4.53}{\sqrt{8}} \approx 1.60$$

We now know, thanks to the Central Limit Theorem, that most samples of eight
Scouts will produce means that are within 1.6 IQ points of the population
mean. It is unlikely, then, that our sample mean of 91.75 could have been
drawn from a population with a mean of 100. A mean of 93, maybe, or 94, but
not 100.

Because we know these means are normally distributed, we can use our
knowledge of the shape of the normal distribution [Hack #23] to produce an
exact probability that a sample mean as low as our 91.75 could have come from
a population with a mean of 100. It will happen way less than 1 out of
100,000 times. It seems very likely that my knot-tying students are tougher
to teach than normal. I might ask for extra money.
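Here is a sketch of that reasoning in Python, using the values reported above. It follows the text in treating the sample means as normally distributed, and converts the gap between 91.75 and 100 into standard errors. With only eight scores, a small-sample t test would give a more cautious answer, so treat this as an illustration of the logic rather than the last word. The variable names are hypothetical.

```python
import math

n = 8                 # Scouts in the sample
sample_mean = 91.75   # from the descriptive statistics
sample_sd = 4.53      # from the descriptive statistics
claimed_mean = 100    # the population mean being tested

# Standard error of the mean: sample SD divided by the square root of n.
standard_error = sample_sd / math.sqrt(n)

# How many standard errors below 100 is our sample mean?
z = (sample_mean - claimed_mean) / standard_error

# Area of the normal curve at or below z (lower tail).
probability = 0.5 * math.erfc(-z / math.sqrt(2))

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
print(f"chance of a sample mean this low = {probability:.2e}")
```

A sample mean more than five standard errors below 100 is far out in the tail of the normal curve, which is where the "way less than 1 out of 100,000" comes from.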



Where Else It Works

A fuzzy version of the Central Limit Theorem points out that:

Data that are affected by lots of random forces and unrelated events end up
normally distributed.

As this is true of almost everything we measure, we can apply the normal
distribution characteristics to make probability statements about most
visible and invisible concepts.

We haven't even discussed the most powerful implication of the Central Limit
Theorem. Means drawn randomly from a population will be normally distributed,
regardless of the shape of the population. Think about that for a second.
Even if the population from which you draw your sample of values is not
normal, even if it is the opposite of normal (like my Uncle Frank, for
example), the means you draw out will still be normally distributed.

This is a pretty remarkable and handy characteristic of the universe. Whether
I am trying to describe a population that is normal or non-normal, on Earth
or on Mars, the trick still works.
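You don't have to take this on faith. The sketch below draws from a lopsided population of my own choosing: an exponential distribution with mean 1 and standard deviation 1. It then checks one fingerprint of the normal curve, that about 68 percent of values fall within one standard deviation of the mean and about 95 percent within two. The sample size of 50 and the 5,000 repetitions are arbitrary picks for the demo.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

POP_MEAN = 1.0  # an exponential population with rate 1 has mean 1 ...
POP_SD = 1.0    # ... and standard deviation 1, with a long right tail


def share_within(values, center, spread, k):
    """Share of values lying within k spreads of the center."""
    return sum(1 for v in values if abs(v - center) <= k * spread) / len(values)


# Raw scores from the skewed population: nowhere near the 68 percent pattern.
raw = [random.expovariate(1.0) for _ in range(5_000)]
raw_within_1 = share_within(raw, POP_MEAN, POP_SD, 1)

# Means of 5,000 samples of 50 scores each from that same population.
n = 50
means = [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
         for _ in range(5_000)]
standard_error = POP_SD / n ** 0.5
means_within_1 = share_within(means, POP_MEAN, standard_error, 1)
means_within_2 = share_within(means, POP_MEAN, standard_error, 2)

print("raw scores within 1 SD:", raw_within_1)
print("sample means within 1 standard error:", means_within_1)
print("sample means within 2 standard errors:", means_within_2)
```

The raw scores miss the normal pattern by a wide margin, while the sample means come close to the familiar 68 and 95 percent figures.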



Hack 3. Figure the Odds

Will I win the lottery? Will I get struck by lightning and hit by a bus on
the same day? Will my basketball team have to meet our hated rival early in
the NCAA tournament? At its core, statistics is all about determining the
likelihood that something will happen and answering questions like these. The
basic rules for calculating probability allow statisticians to predict the
future.

This book is full of interesting problems that can be solved using cool
statistical tricks. While all the tools presented in these hacks are applied
in different ways in different contexts, many of the procedures used in these
clever solutions work because of a common core set of elements: the rules of
probability.

The rules are a key set of simple, established facts about how probability
works and how probabilities should be calculated. Think of these two basic
rules as a set of tools in a beginner's toolbox that, like a hammer and
screwdriver, are probably enough to solve most problems:

Additive rule
The probability of any one of several mutually exclusive events occurring
(events that cannot happen at the same time) is the sum of each event's
probability.

Multiplicative rule
The probability of a series of independent events all occurring is the
product of each event's probability.

These two tools will be enough to answer most of your everyday "What are the
chances?" questions.
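The whole toolbox fits in a few lines of Python. In this sketch, `any_of` and `all_of` are hypothetical helper names for the two rules. The additive helper assumes the events cannot happen at the same time, and the multiplicative helper assumes the events are independent. Fractions keep the arithmetic exact.

```python
from fractions import Fraction
from math import prod


def any_of(*probabilities):
    """Additive rule: chance that any one of several events occurs.

    Assumes the events cannot happen at the same time.
    """
    return sum(probabilities, Fraction(0))


def all_of(*probabilities):
    """Multiplicative rule: chance that a series of events all occur.

    Assumes the events are independent of one another.
    """
    return prod(probabilities, start=Fraction(1))


one_face = Fraction(1, 6)  # chance of any particular face on a fair die

one_or_two = any_of(one_face, one_face)   # roll a 1 or a 2 on one throw
two_sixes = all_of(one_face, one_face)    # roll a 6, then another 6

print("1 or 2 on a single roll:", one_or_two)  # 1/3
print("two 6s in a row:", two_sixes)           # 1/36
```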



Questions About the Future

When a statistician says something like "a 1 out of 10 chance of happening,"
she has just made a prediction about the future. It might be a hypothetical
statement about a series of events that will never be tested, or it might be
an honest-to-goodness statement about what is about to happen. Either way,
she's making a statistical statement about the likelihood of an outcome,
which is just about all statisticians ever say [Hack #1].

If the following statement makes some intuitive sense to you, then you have
all the ability necessary to act and think like a stat hacker: "If there are
10 things that might happen and all 10 things are equally likely to happen,
then any 1 of those things has a 1 out of 10 chance of happening."

Research is full of questions that are answered using statistics, of course,
and probability rules apply, but there are many problems in the world outside
the laboratory that are more important than any stupid old science problem,
like games with dice, for example! Imagine you are a part-time gambler (baby
needs a new pair of shoes and all that), and the values showing the next time
you throw a pair of dice will determine your future. You might want to know
the likelihood of various outcomes of that dice roll. You might want to know
that likelihood very precisely!



You can answer the three most important types of probability questions that
you are likely to ask using only your two-piece probability toolkit. Your
questions probably fall into one of these three types:

How likely is it that a specific single outcome of interest will occur next?
For example, will a dice roll of 7 come up next?

How likely is it that any of a group of outcomes of interest will occur next?
For example, will either a 7 or 11 come up next?

How likely is it that a series of outcomes will occur? For example, could an
honest pair of dice really be thrown all night and a 7 never (I mean never!)
come up?! I mean, really, could it?! Could it?!
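As a rough sketch, all three dice questions can be answered by listing the 36 equally likely outcomes for a pair of honest dice and applying the two rules. The helper names are hypothetical, and "all night" is taken to mean 100 throws, which is purely my assumption.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome for a pair of honest dice (36 in all).
totals = [a + b for a, b in product(range(1, 7), repeat=2)]


def p_total(total):
    """Chance that the two dice add up to `total` on a single throw."""
    return Fraction(totals.count(total), len(totals))


# Type 1: a specific single outcome (a 7 on the next throw).
p_seven = p_total(7)

# Type 2: any of a group of outcomes (a 7 or an 11). A throw cannot be
# both at once, so the additive rule applies.
p_seven_or_eleven = p_total(7) + p_total(11)


# Type 3: a series of outcomes (no 7 on any of several throws). Throws
# are independent, so the multiplicative rule applies.
def p_no_seven(throws):
    return (1 - p_seven) ** throws


print("7 on the next throw:", p_seven)               # 1/6
print("7 or 11 on the next throw:", p_seven_or_eleven)  # 2/9
print("no 7 in 100 throws:", float(p_no_seven(100)))
```

So yes, an honest pair of dice could go 100 throws without a 7, but at those odds I would want a very close look at the dice.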


