
# Hack 2. Describe the World Using Just Two Numbers


Inferential statistics tend to use two values to describe populations, the mean and the standard deviation.

Mean

Rather than describe a sample of values by listing every single score, it is simply more efficient to report some fair summary of the group. This single number is meant to fairly represent all the scores and what they have in common. Consequently, this single number is referred to as the central tendency of a group of scores.

Typically, the best measure of central tendency, for a variety of reasons, is the mean [Hack #21]. The mean is the arithmetic average of all the scores and is calculated by adding together all the values in a group, and then dividing that total by the number of values. The mean provides more information about all the scores in a group than other central tendency options (such as reporting the middle score, the most common score, and so on).

In fact, mathematically, the mean has an interesting property. A side effect of how it is created (adding up all scores and dividing by the number of scores) produces a number that is as close as possible to all the other scores. The mean will be close to some scores and far away from some others, but if you square those distances and add them up, you get a total that is as small as possible. No other number, real or imagined, will produce a smaller total squared distance from all the scores in a group than the mean.
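A minimal sketch of this property in Python, reading "distance" as squared distance, which is the sense in which the mean is the minimizer. The scores and the helper name `total_squared_distance` are made up for illustration.

```python
from statistics import mean

def total_squared_distance(scores, center):
    """Sum of squared distances between each score and a candidate center."""
    return sum((x - center) ** 2 for x in scores)

# Hypothetical scores, chosen only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
m = mean(scores)

# Try candidate centers from 0.0 to 10.0 in steps of 0.1 and keep the
# one with the smallest total squared distance.
candidates = [i / 10 for i in range(0, 101)]
best = min(candidates, key=lambda c: total_squared_distance(scores, c))

print("mean:", m)
print("best candidate:", best)
print("total squared distance at the mean:", total_squared_distance(scores, m))
```

The search lands on the mean itself. Any other candidate, including the middle score, gives a larger total.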

Standard deviation

Just knowing the mean of a distribution doesn't quite tell us enough. We also need to know something about the variability of the scores. Are they mostly close to the mean or mostly far from the mean? Two wildly different distributions could have the same mean but differ in their variability. The most commonly reported measure of variability summarizes the distances between each score and the mean.

As with the mean, the more informative measure of variability would be one that uses all the values in a distribution. A measure of variability that does this is the standard deviation. The standard deviation is, roughly speaking, the average distance of each score from the mean. It takes all the distances in a distribution and averages them. The "distances" referred to are the distances between each score and the mean.

Another commonly reported value that summarizes the variability in a distribution is the variance. The variance is simply the standard deviation squared and is not particularly useful in picturing a distribution, but it is helpful when comparing different distributions and is frequently used as a value in statistical calculations, such as with the independent t test [Hack #17].

The formula for the standard deviation appears to be more complicated than it needs to be, but there are some mathematical complications with summing distances (negative distances always cancel out the positive distances when the mean is used as the dividing point). Consequently, here is the equation:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Σ means to sum up. The x means each score, $\bar{x}$ is the mean of those scores, and the n means the number of scores.
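The calculation can be sketched in a few lines of Python. This assumes the sample form of the formula, with n - 1 in the denominator; the scores and the function name `sample_sd` are hypothetical.

```python
import math
from statistics import mean, stdev

def sample_sd(scores):
    """Sample standard deviation, dividing by n - 1 (an assumed form)."""
    m = mean(scores)
    squared = [(x - m) ** 2 for x in scores]  # squaring removes the signs
    return math.sqrt(sum(squared) / (len(scores) - 1))

# Hypothetical scores, chosen only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Without squaring, the signed distances cancel each other out.
signed_total = sum(x - mean(scores) for x in scores)

sd = sample_sd(scores)
print("sum of signed distances:", signed_total)
print("standard deviation:", round(sd, 3))
print("statistics.stdev agrees:", math.isclose(sd, stdev(scores)))
```

The signed distances sum to zero, which is the cancellation problem the text mentions. Squaring first and taking the square root at the end works around it.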

Central Limit Theorem

The Central Limit Theorem is fairly brief, but very powerful. Behold the truth:

If you randomly select multiple samples from a population, the means of each of those samples will be normally distributed.

Attached to the theorem are a couple of mathematical rules for accurately estimating the descriptive values for this imaginary distribution of sample means:

The mean of these means (that's a mouthful) will be equal to the population mean. The mean of a single sample is a good estimate for this mean of means.

The standard deviation of these means is equal to the sample standard deviation divided by the square root of the sample size, n:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

These mathematical rules produce more accurate results, and the distribution is closer to the normal curve, as the size of each sample gets bigger.

30 or more in a sample seems to be enough to produce accurate applications of the Central Limit Theorem.
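A small simulation makes both rules easy to check. This is only a sketch: the population (uniform values from 0 to 100), the seed, and the sample counts are all assumptions made for illustration. Because the whole population is known here, the sketch divides the population standard deviation by the square root of n; with a single real sample you would substitute the sample standard deviation as the estimate.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 20,000 values spread evenly from 0 to 100.
population = [rng.uniform(0, 100) for _ in range(20_000)]
pop_mean = mean(population)
pop_sd = pstdev(population)

n = 30               # size of each sample
num_samples = 2_000  # how many samples to draw

# Draw many random samples and record the mean of each one.
sample_means = [mean(rng.sample(population, n)) for _ in range(num_samples)]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)
predicted_se = pop_sd / n ** 0.5

print("population mean:", round(pop_mean, 2))
print("mean of sample means:", round(mean_of_means, 2))
print("SD of sample means:", round(sd_of_means, 2))
print("SD / sqrt(n):", round(predicted_se, 2))
```

The mean of the sample means should sit very near the population mean, and their standard deviation should sit near the predicted value.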

So What?

Okay, so the Central Limit Theorem appears somewhat intellectually interesting and no doubt makes statisticians all giggly and wriggly, but what does it all mean? How can anyone use it to do anything cool?

As discussed in "Know the Big Secret" [Hack #1], the secret trick that all statisticians know is how to solve problems statistically by taking known information about the distribution of some values and expressing that information as a statement of probability. The key, of course, is how one knows the distribution of all these exotic types of values that might interest a statistician. How can one know the distribution of average differences or the distribution of the size of a relationship between two sets of variables? The Central Limit Theorem, that's how.

For example, to estimate the probability that any two groups would differ on some variable by a certain amount, we need to know the distribution of means in the population from which those samples were drawn. How could we possibly know what that distribution is when the population of means is invisible and might even be only theoretical? The Central Limit Theorem, Bub, that's how! How can we know the distributions of correlations (an index of the strength of a relationship between two variables) which could be drawn from a population of infinite possible correlations? Ever hear of the Central Limit Theorem, dude?

Because we know the proportion of values that reside all along the normal curve [Hack #23], and the Central Limit Theorem tells me that these summary values are normally distributed, I can place probabilities on each statistical outcome. I can use these probabilities to indicate the level of statistical significance (the level of certainty) I have in my conclusions and decisions. Without the Central Limit Theorem, I could hardly ever make statements about statistical significance. And what a drab, sad life that would be.

Applying the Central Limit Theorem

To apply the Central Limit Theorem, I need to start with only a sample of values that I have randomly drawn from a population.

Imagine, for example, that I have a group of eight new Cub Scouts. It's my job to teach them knot tying. I suspect, let's say, that this isn't the brightest bunch of Scouts who have ever come to me for knot-tying guidance.

Before I demand extra pay, I want to determine whether they are, in fact, a few badges short of a bushel. I want to know their IQ. I know that the population's average IQ is 100, but I notice that no one in my group has an intelligence test score above 100. I would expect at least some above that score. Could this group have been selected from that average population? Maybe my sample is just unusual and doesn't represent all Cubbies. A statistical approach, using the Central Limit Theorem, would be to ask:

Is it possible that the mean IQ of the population represented by this sample is 100?

If I want to know something about the population from which my Scouts were drawn, I can use the Central Limit Theorem to pretty accurately estimate the population's mean IQ and its standard deviation. I can also figure out how much difference there is likely to be between the population's mean IQ and the mean IQ in my sample.

I need some data from my Scouts to figure all this out. Table 1-1 should provide some good information.

Table 1-1. Scout smarts

| Scout | IQ |
|-------|-----|
| Jimmy | 100 |
| Perry | 95 |
| Clark | 90 |
| Lex | 92 |
| Neil | 85 |
| Billy | 88 |
| Greg | 93 |
| John | 91 |

The descriptive statistics for this sample of eight IQ scores are:

Mean IQ = 91.75

Standard deviation = 4.53
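To double-check those two values, here is a short Python sketch. It assumes the scores pair up with the names in the order they appear in the table, and it uses the standard library's `statistics.stdev`, which divides by n - 1.

```python
from statistics import mean, stdev

# IQ scores from the Scout table (name-to-score pairing is assumed
# from the order in which they are listed).
iq = {
    "Jimmy": 100, "Perry": 95, "Clark": 90, "Lex": 92,
    "Neil": 85, "Billy": 88, "Greg": 93, "John": 91,
}

scores = list(iq.values())
sample_mean = mean(scores)
sample_sd = stdev(scores)  # sample standard deviation (n - 1)

print("Mean IQ =", sample_mean)
print("Standard deviation =", round(sample_sd, 2))
```

Both printed values match the figures reported above.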

So, I know in my sample that most scores are within about 4 1/2 IQ points of 91.75. It is the invisible population they came from, though, that I am most interested in. The Central Limit Theorem allows me to estimate the population's mean, standard deviation, and, most importantly, how far sample means will likely stray from the population mean:

Mean IQ

Our sample mean is our best estimate, so the population mean is likely close to 91.75.

Standard deviation of IQ scores in the population

The formula we used to calculate our sample standard deviation is designed especially to estimate the population standard deviation, so we'll guess 4.53.

Standard deviation of the mean

This is the real value of interest. We know our sample mean is less than 100, but could that be by chance? How far would a mean from a sample of eight tend to stray from the population mean when chosen randomly from that population? Here's where we use the equation from earlier in this hack. We enter our sample values to produce our standard deviation of the mean, which is usually called the standard error of the mean:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4.53}{\sqrt{8}} \approx 1.60$$

We now know, thanks to the Central Limit Theorem, that most samples of eight Scouts will produce means that are within 1.6 IQ points of the population mean. It is unlikely, then, that our sample mean of 91.75 could have been drawn from a population with a mean of 100. A mean of 93, maybe, or 94, but not 100.

Because we know these means are normally distributed, we can use our knowledge of the shape of the normal distribution [Hack #23] to produce an exact probability that our mean of 91.75 could have come from a population with a mean of 100. It will happen way less than 1 out of 100,000 times. It seems very likely that my knot-tying students are tougher to teach than normal. I might ask for extra money.
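Here is one way the arithmetic behind that conclusion might be sketched in Python. It follows the text in treating the sample means as normally distributed; the variable names are my own, and the normal-curve area is computed with `math.erfc` from the standard library.

```python
import math
from statistics import mean, stdev

scores = [100, 95, 90, 92, 85, 88, 93, 91]  # the eight Scout IQ scores
population_mean = 100                       # the value being tested
n = len(scores)

# Standard error of the mean: sample SD divided by the square root of n.
standard_error = stdev(scores) / math.sqrt(n)

# How many standard errors the sample mean sits below 100.
z = (mean(scores) - population_mean) / standard_error

# Area under the standard normal curve to the left of z.
p_lower = 0.5 * math.erfc(-z / math.sqrt(2))

print("standard error:", round(standard_error, 2))
print("z:", round(z, 2))
print("chance of a mean this low or lower:", p_lower)
```

The sample mean falls more than five standard errors below 100, and the resulting normal-curve probability is far smaller than 1 in 100,000. Keep in mind the earlier caution that 30 or more scores give more accurate results, so with only eight Scouts this figure is a rough one.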

Where Else It Works

A fuzzy version of the Central Limit Theorem points out that:

Data that are affected by lots of random forces and unrelated events end up normally distributed.

As this is true of almost everything we measure, we can apply the normal distribution characteristics to make probability statements about most visible and invisible concepts.

We haven't even discussed the most powerful implication of the Central Limit Theorem. Means drawn randomly from a population will be normally distributed, regardless of the shape of the population. Think about that for a second. Even if the population from which you draw your sample of values is not normal, even if it is the opposite of normal (like my Uncle Frank, for example), the means you draw out will still be normally distributed.

This is a pretty remarkable and handy characteristic of the universe. Whether I am trying to describe a population that is normal or non-normal, on Earth or on Mars, the trick still works.
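To see the trick work on a decidedly non-normal population, here is a rough simulation sketch. The lopsided population (exponential, with a mean of about 10), the seed, and the helper `share_within_one_sd` are all assumptions for illustration. The check it uses is a simple one: on a normal curve, roughly 68 percent of values fall within one standard deviation of the mean.

```python
import random
from statistics import mean, pstdev

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical, strongly right-skewed population (mean of about 10).
population = [rng.expovariate(1 / 10) for _ in range(20_000)]

n = 30
sample_means = [mean(rng.sample(population, n)) for _ in range(2_000)]

def share_within_one_sd(values):
    """Fraction of values lying within one SD of their own mean."""
    m, s = mean(values), pstdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)

pop_share = share_within_one_sd(population)
means_share = share_within_one_sd(sample_means)

print("population within one SD:", round(pop_share, 3))
print("sample means within one SD:", round(means_share, 3))
```

The raw population misses the 68 percent mark by a wide margin, while the sample means come close to it.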

# Hack 3. Figure the Odds

Will I win the lottery? Will I get struck by lightning and hit by a bus on the same day? Will my basketball team have to meet our hated rival early in the NCAA tournament? At its core, statistics is all about determining the likelihood that something will happen and answering questions like these. The basic rules for calculating probability allow statisticians to predict the future.

This book is full of interesting problems that can be solved using cool statistical tricks. While all the tools presented in these hacks are applied in different ways in different contexts, many of the procedures used in these clever solutions work because of a common core set of elements: the rules of probability.

The rules are a key set of simple, established facts about how probability works and how probabilities should be calculated. Think of these two basic rules as a set of tools in a beginner's toolbox that, like a hammer and screwdriver, are probably enough to solve most problems:

Additive rule

The probability of any one of several mutually exclusive events occurring is the sum of each event's probability.

Multiplicative rule

The probability of a series of independent events all occurring is the product of each event's probability.

These two tools will be enough to answer most of your everyday "What are the chances?" questions.
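The two rules are short enough to sketch directly in Python. The function names `p_any` and `p_all` are hypothetical, and the sketch assumes the additive rule is being applied to outcomes that cannot happen together and the multiplicative rule to events that do not influence one another. Fractions keep the answers exact.

```python
from fractions import Fraction
from math import prod

def p_any(probabilities):
    """Additive rule: chance that any one of several events occurs.

    Assumes the events cannot happen together.
    """
    return sum(probabilities, Fraction(0))

def p_all(probabilities):
    """Multiplicative rule: chance that every event in a series occurs.

    Assumes the events are independent of one another.
    """
    return prod(probabilities, start=Fraction(1))

# One fair die: rolling a 1 or a 2 on a single throw.
one_face = Fraction(1, 6)
p_one_or_two = p_any([one_face, one_face])

# Two fair coin flips: heads both times.
heads = Fraction(1, 2)
p_two_heads = p_all([heads, heads])

print("1 or 2 on one die:", p_one_or_two)
print("two heads in a row:", p_two_heads)
```

Adding gives a 1 out of 3 chance for the die, and multiplying gives a 1 out of 4 chance for the coins.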

Questions About the Future

When a statistician says something like "a 1 out of 10 chance of happening," she has just made a prediction about the future. It might be a hypothetical statement about a series of events that will never be tested, or it might be an honest-to-goodness statement about what is about to happen. Either way, she's making a statistical statement about the likelihood of an outcome, which is just about all statisticians ever say [Hack #1].

If the following statement makes some intuitive sense to you, then you have all the ability necessary to act and think like a stat hacker: "If there are 10 things that might happen and all 10 things are equally likely to happen, then any 1 of those things has a 1 out of 10 chance of happening."

Research is full of questions that are answered using statistics, of course, and probability rules apply, but there are many problems in the world outside the laboratory that are more important than any stupid old science problem, like games with dice, for example! Imagine you are a part-time gambler (baby needs a new pair of shoes and all that), and the values showing the next time you throw a pair of dice will determine your future. You might want to know the likelihood of various outcomes of that dice roll. You might want to know that likelihood very precisely!

You can answer the three most important types of probability questions that you are likely to ask using only your two-piece probability toolkit. Your questions probably fall into one of these three types:

How likely is it that a specific single outcome of interest will occur next? For example, will a dice roll of 7 come up next?

How likely is it that any of a group of outcomes of interest will occur next? For example, will either a 7 or 11 come up next?

How likely is it that a series of outcomes will occur? For example, could an honest pair of dice really be thrown all night and a 7 never (I mean never!) come up?! I mean, really, could it?! Could it?!
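As a rough sketch of how the two rules handle these three question types, the Python below lists every outcome for a pair of fair six-sided dice and counts. The helper names are hypothetical, and "all night" is stood in for by an assumed 100 throws.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for a pair of fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))

def p_total(total):
    """Chance that the two dice add up to the given total."""
    hits = sum(1 for a, b in rolls if a + b == total)
    return Fraction(hits, len(rolls))

# Type 1: a specific single outcome (a 7 on the next throw).
p_seven = p_total(7)

# Type 2: any of a group of outcomes (a 7 or an 11); additive rule.
p_seven_or_eleven = p_total(7) + p_total(11)

# Type 3: a series of outcomes (no 7 on every throw); multiplicative rule.
def p_no_seven(num_rolls):
    return (1 - p_seven) ** num_rolls

p_no_seven_100 = float(p_no_seven(100))  # 100 throws is an assumption

print("7 next:", p_seven)
print("7 or 11 next:", p_seven_or_eleven)
print("no 7 in 100 throws:", p_no_seven_100)
```

A 7 has a 1 out of 6 chance, a 7 or 11 has a 2 out of 9 chance, and going 100 throws without a single 7 is possible with honest dice but extraordinarily unlikely.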
