Hack 18. Find Out Just How Wrong You Really Are

"Measure Precisely" [Hack #6] discusses how to use standard errors in the case of measurement. Calculating the standard error of measurement allows you to know how close your test score is to your typical level of performance. Just as measurement allows us to produce 95 percent confidence intervals around individual observed scores, statisticians routinely produce 95 percent confidence intervals around a wide variety of sample values.



Fortunately for anyone curious to know how far a statistical finding is from the hidden truth, every popular statistical procedure provides a standard error. After introducing a few basic concepts, this hack explains how to apply the following standard errors:

Standard error of the mean in descriptive statistics
Standard error of the proportion in survey sampling
Standard error of the estimate in regression



The Central Limit Theorem [Hack #2] is a key tool for knowing how wrong we are when we sample, because it provides the formula for calculating standard errors and suggests that all sample summary values are normally distributed.



There are three common ways that standard errors are used to verify the accuracy of results of statistical analyses. The particular tool you use depends on whether you want to know how close you are to correctly estimating:

The mean score of a population on some variable (e.g., average salary of untenured college professors)
The proportion of a population that has some characteristic (e.g., who will vote for my Uncle Frank as Chief Dogcatcher)
Future performance (e.g., probable college GPA for your pet monkey, whom you have trained to take multiple-choice tests)



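Mean Estimates

The precision of a sample mean as an estimate of a population mean is based on sample size. Here's the formula:

$$\text{standard error of the mean} = \frac{\text{standard deviation}}{\sqrt{\text{sample size}}}$$

As the sample size increases, the sample mean tends to fall closer to the true population mean. This makes sense if you think of sample size as the number of independent observations; the more looks you get at something, the more accurate your description will be.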



The standard error of the mean is, roughly, the average distance of sample means from their population mean (strictly, it is the standard deviation of those sample means).
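To make the effect of sample size concrete, here is a minimal Python sketch. It assumes the usual form of the formula (standard deviation divided by the square root of the sample size); the function name `standard_error_of_mean` and the example values are illustrative choices, not something the hack itself defines.

```python
import math


def standard_error_of_mean(standard_deviation: float, sample_size: int) -> float:
    """Standard deviation divided by the square root of the sample size."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    return standard_deviation / math.sqrt(sample_size)


# Same standard deviation (15), increasingly large samples.
for n in (30, 60, 120):
    print(n, round(standard_error_of_mean(15, n), 2))
```

With a standard deviation of 15, this gives about 2.74 for 30 observations and 1.94 for 60. Doubling the sample size shrinks the standard error by a factor of the square root of 2, not by half.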



Proportion Estimates

When a sample of people is surveyed and the results are presented as some percentage or proportion (e.g., "72 percent of all sailors have knee trouble"), that percentage is some distance from the actual percentage you'd find if you surveyed the whole population. If the sample was selected randomly, the standard error of proportion indicates how close the sample percentage is to the population percentage.

The standard error of proportion is based on sample size and the size of the proportion. Here's the formula:

$$\text{standard error of the proportion} = \sqrt{\frac{(\text{proportion})(1 - \text{proportion})}{\text{sample size}}}$$

As with the standard error of the mean, the standard error of the proportion decreases as the sample size increases. If you are mathematically oriented, you might notice that as the proportion moves away from .50, the number in the top part of the formula becomes smaller. When the calculations are made, then, the further the sample proportion is from .50, the smaller the standard error of the proportion. Another point of interest is that the top part of the formula is an indication of the amount of variability in the sample: (proportion)(1 - proportion) is the squared standard deviation (that is, the variance) for proportions.



The standard error of the proportion is the average distance of sample proportions from the true proportion in the population.
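The way the proportion itself drives the standard error is easy to check numerically. The sketch below assumes the usual form of the formula, the square root of (proportion)(1 - proportion) divided by the sample size; the function name `standard_error_of_proportion` is a hypothetical label chosen for this example.

```python
import math


def standard_error_of_proportion(proportion: float, sample_size: int) -> float:
    """Square root of p * (1 - p) / n for a proportion p between 0 and 1."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    return math.sqrt(proportion * (1.0 - proportion) / sample_size)


# Same sample size (30), proportions moving away from .50.
for p in (0.50, 0.72, 0.90):
    print(p, round(standard_error_of_proportion(p, 30), 3))
```

For 30 respondents the standard error is about .091 at a proportion of .50, about .082 at .72, and about .055 at .90, so the error is largest when the sample is split evenly.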



Estimates of Future Performance

In regression analyses, scores on one or more variables are used to estimate scores on another variable [Hack #13]. However, that predicted score is unlikely to be exactly right. Just as we can calculate how far an average sample mean is from a population mean or how far off our survey results are from theoretical population results, we can also say how far off, on average, our regression prediction will be from the actual score a person would get. Here's the formula:

$$\text{standard error of the estimate} = \text{standard deviation} \times \sqrt{1 - \text{correlation}^2}$$

The standard deviation used in the equation is the standard deviation of the criterion variable, which is the one you are predicting. The correlation is the correlation between your predictor(s) and the criterion variable.



In the interest of accuracy (the point of this hack, after all), I should point out that the standard error of the estimate formula given earlier isn't quite correct. However, it does provide almost the same result as this more complex, but correct, equation:

[equation missing in the source]



Notice with this formula that the larger the correlation, the smaller the standard error of the estimate. This makes sense, because if there is a lot of informational overlap between two variables, you can get a good sense of the score on one variable by looking at the other.



The standard error of the estimate is the average distance of each actual score from its predicted score.
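The relationship between the correlation and the standard error of the estimate can be sketched in a few lines of Python. This assumes the simpler approximation takes its usual form: the criterion's standard deviation times the square root of one minus the squared correlation. The function name `standard_error_of_estimate` and the sample correlations are illustrative assumptions, and the more complex equation mentioned above is not implemented here.

```python
import math


def standard_error_of_estimate(criterion_sd: float, correlation: float) -> float:
    """Simple approximation: SD of the criterion times sqrt(1 - r squared)."""
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must be between -1 and 1")
    return criterion_sd * math.sqrt(1.0 - correlation ** 2)


# Criterion standard deviation of 15, increasingly strong correlations.
for r in (0.0, 0.25, 0.5, 0.9):
    print(r, round(standard_error_of_estimate(15, r), 2))
```

With a correlation of .25 the approximation gives about 14.52, barely better than the criterion's own standard deviation of 15. That is a little below the 14.81 and 14.65 listed in the hack's table, which appear to come from the more complex equation rather than from this approximation.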



Using Standard Errors

Here's how to use these tools to state with some confidence the range within which the truth lies. Because sampling errors are normally distributed, the standard error can be used just like a standard deviation to define specific proportions of scores under the normal curve.



For example, if we want to provide a range of values in which the population value falls 95 percent of the time, we can build a 95 percent confidence interval around our sample value. Based on the normal curve [Hack #23], 1.96 standard errors on either side of the sample value should provide a range of values that we can say with 95 percent certainty contains the population value.
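The interval itself is one multiplication and two additions. Here is a minimal sketch; the function name `confidence_interval_95` is a hypothetical helper, and the inputs (a sample value of 100 with a standard error of 2.74) are example numbers.

```python
def confidence_interval_95(sample_value: float, standard_error: float,
                           z: float = 1.96) -> tuple[float, float]:
    """Return (low, high): the sample value plus or minus z standard errors."""
    margin = z * standard_error
    return sample_value - margin, sample_value + margin


low, high = confidence_interval_95(100, 2.74)
print(f"{low:.2f} to {high:.2f}")
```

For these inputs the interval runs from about 94.63 to 105.37. The same function works for any of the three standard errors, since only the standard error passed in changes.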

Table 2-11 shows some examples of various standard errors and the use of sample data to produce these confidence intervals [Hack #6]. Notice how a larger sample size creates a sample estimate closer to the population value, and a larger sample size also points to a confidence interval that is more precise.

Table 2-11. Building 95 percent confidence intervals

Type of standard error             Standard deviation  Sample size  Sample value  Standard error  95 percent confidence interval
Standard error of the mean         15                  30           100           2.74            94.63-105.37
Standard error of the mean         15                  60           100           1.94            96.20-103.80
Standard error of the proportion   .25                 30           .50           .09             .32-.68
Standard error of the proportion   .25                 60           .50           .06             .38-.62
Standard error of the estimate     15                  30           100           14.81           70.97-129.03
Standard error of the estimate     15                  60           100           14.65           71.29-128.71



The "Sample value" column in Table 2-11 for the standard error of the estimate is an example of an estimated or predicted score on some variable. The calculations in the example assume a correlation of .25 between the predictor and the criterion.



Uncle Frank's Campaign for Dogcatcher

As the campaign manager for my Uncle Frank in his recent campaign for dogcatcher, I had an opportunity to use standard errors. Several weeks before the election, I surveyed 30 randomly chosen voters in the town of Tonganoxie, Kansas, where Frank lives. My survey found that 50 percent of respondents said they would vote for him. I warned Uncle Frank that the sample was so small that it was not a very precise reflection of the entire population of voters.

After referring to Table 2-11, I determined that if we had surveyed all the voters in town, the percentage saying they would vote for Frank might reasonably be anywhere between about 32 percent and 68 percent, though the most likely value was 50 percent. Of course, the optimist that is my uncle interpreted this as meaning he might have 68 percent of the vote and a huge lead. He spent the rest of his campaign chest on a giant victory party the night before the election. I, being the realist that I am and knowing my uncle's reputation around town, assumed the true outcome would be in the other direction. It was. That's okay, though. It was a great party.



Why It Works

We can trust the accuracy of standard errors if we accept the following assumptions and apply some common sense:

Sampling errors are normally distributed
This means that the sizes of these errors vary in a way that matches the normal curve. This allows us to produce those persuasively precise confidence intervals.

Sampling errors are nonbiased
This means that sample values are equally likely to be greater or less than the population value. This is convenient because it means that across repeated studies, one can zero in on the true population value.
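Both assumptions can be checked informally by simulation. The sketch below is only an illustration under stated assumptions: it invents a normally distributed population with a mean of 100 and a standard deviation of 15, draws 5,000 samples of 30 with a fixed random seed, and records how far each sample mean lands from the population mean. The function name `simulate_sampling_errors` and all of its parameters are hypothetical.

```python
import math
import random
import statistics


def simulate_sampling_errors(pop_mean, pop_sd, sample_size, n_samples, seed=0):
    """Draw repeated samples and return each sample mean's distance from pop_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    errors = []
    for _ in range(n_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(sample_size)]
        errors.append(statistics.fmean(sample) - pop_mean)
    return errors


errors = simulate_sampling_errors(pop_mean=100, pop_sd=15, sample_size=30, n_samples=5000)
theoretical_se = 15 / math.sqrt(30)

# Nonbiased: positive and negative errors should roughly cancel out.
mean_error = statistics.fmean(errors)
# The spread of the errors should be close to the standard error of the mean.
observed_se = statistics.pstdev(errors)
# Normal curve: about 95 percent of errors should fall within 1.96 standard errors.
share_within = sum(abs(e) <= 1.96 * theoretical_se for e in errors) / len(errors)

print(f"mean error: {mean_error:.3f}")
print(f"observed vs. theoretical standard error: {observed_se:.2f} vs. {theoretical_se:.2f}")
print(f"share of errors within 1.96 standard errors: {share_within:.3f}")
```

If the assumptions hold, the mean error should sit near zero, the observed spread near 2.74, and the share within 1.96 standard errors near .95. A run like this demonstrates the idea for one made-up population; it is not a proof.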

The formulas are constructed in such a way that if you have little or no information about the population, then the size of the error in your sample estimate is about the size of the standard deviation of the population.

Look what happens with the standard error of the mean or the standard error of the proportion when the sample size is 1, or what happens with the standard error of the estimate when the correlation is 0.00. Intuitively, a good formula for figuring the standard error size should produce smaller errors when more is known about the population.
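Worked through, the three limiting cases look like this. The block assumes the usual forms of the three formulas (standard deviation over the square root of the sample size, the square root of (proportion)(1 - proportion) over the sample size, and the standard deviation times the square root of one minus the squared correlation):

```latex
\begin{align*}
\text{sample size} = 1: \quad & \frac{\text{standard deviation}}{\sqrt{1}} = \text{standard deviation} \\
\text{sample size} = 1: \quad & \sqrt{\frac{(\text{proportion})(1 - \text{proportion})}{1}} = \sqrt{(\text{proportion})(1 - \text{proportion})} \\
\text{correlation} = 0: \quad & \text{standard deviation} \times \sqrt{1 - 0^2} = \text{standard deviation}
\end{align*}
```

In each case the standard error collapses to the standard deviation of the variable being estimated, which for a proportion is the square root of (proportion)(1 - proportion).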







Hack 19. Sample Fairly

If you want to find something out about every single customer or employee in your business, you could talk to every single one of them. If you are concerned about the quality of the beer you serve at your bar, you could taste every one before serving. Or, to save time, money, and brain cells, "sample" efficiently instead.

Management thrives on knowing the characteristics of every widget produced, every transaction conducted, and every client helped. Of course, the whole set of all of these widgets, interactions, and people can never be brought together under one microscope and observed and evaluated. No specimen slide is big enough.

The same is true for those of us in social science. Researchers interested in people simply cannot measure everybody. As much as we'd like to probe, shock, inject, hassle, embarrass, and generally bother everyone in the world, we just can't do it. We don't have the time, space, or money, and, frankly, no one really wants to get to know so many people.

The problem is, "How can you know about everything, without being able to look at everything?" As is the case with all hacks in this book, the solution is provided by statistics. There are scientifically sound ways to accurately describe any whole set of things by just looking at a small subset of those things.



Using Samples to Make Inferences

Inferential statistics allows us to generalize to a larger population, based on data from a smaller sample. For these generalizations to be valid, though, the sample has to represent the population fairly.



A population, in the sense we use it here, is rarely the "population" of a country or city or planet in the way the term is used in social studies. In inferential statistics, a population is a description of the type of person or thing you're studying. Populations can be third-grade boys in Nebraska, nurses at Shawnee Mission Medical Center in Merriam, Kansas, South American giant otters, or books in the Library of Congress. The only rule is that a population is bigger than its corresponding sample.



A good sample represents a population. This means that every important characteristic in a population must be distributed, proportionately, in the same way in the sample. Much of this hack is about how to construct a good sample, so let's look at a good sample.

Imagine a population of squares, diamonds, and triangles, as shown in Figure 2-4.



Figure 2-4. A sample within a population



A fair sample taken from a population of squares, diamonds, and triangles would contain those shapes in the same proportion as in the population. In our diagram, the outer oval represents a population, and the different shapes are distributed as 40 percent squares, 20 percent triangles, and 40 percent diamonds. The inner oval is the sample, which contains a subgroup of those elements in the population. The shapes in the sample are distributed in exactly the same proportions as in the population: 40 percent squares, 20 percent triangles, and 40 percent diamonds.
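Checking a sample for this kind of fairness amounts to comparing two sets of proportions. The sketch below is a minimal illustration; the counts (50 shapes in the population, 10 in the sample) are made up to match the 40/20/40 split, and the helper names `proportions` and `is_fair` are hypothetical.

```python
from collections import Counter


def proportions(items):
    """Map each kind of item to its share of the whole collection."""
    counts = Counter(items)
    total = len(items)
    return {kind: count / total for kind, count in counts.items()}


def is_fair(sample, population, tolerance=1e-9):
    """True if every kind's share in the sample matches its share in the population."""
    pop_shares = proportions(population)
    sample_shares = proportions(sample)
    return all(
        abs(sample_shares.get(kind, 0.0) - share) <= tolerance
        for kind, share in pop_shares.items()
    )


# Assumed counts that reproduce the 40/20/40 split described in the text.
population = ["square"] * 20 + ["triangle"] * 10 + ["diamond"] * 20
sample = ["square"] * 4 + ["triangle"] * 2 + ["diamond"] * 4

print(proportions(population))
print(proportions(sample))
print(is_fair(sample, population))
```

A real sample will rarely match exactly, so in practice `tolerance` would be loosened to whatever mismatch you are willing to accept.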

This sample is fair. It represents the population well, at least in terms of the characteristic of shape. When sampling people or things, samples typically represent a variety of traits. People and things are not entirely triangles or squares, so a sample of people is representative when its mean level of traits matches well with the population levels. Each person will have some level of all the characteristics, and won't be entirely one trait, unlike our shape example. (Though my Uncle Frank is pretty much entirely square, according to my Aunt Heloise.)



The person asking the question gets to pick the population he is interested in, but he is then accurate when generalizing to that population only, not any other.



If you knew that the sampling methods used to produce this sample (the elements in the inner oval) were correct, you could infer something about the population by just looking at the sample. The procedure is simple and intuitive:

1. Observe the sample. For example, 20 percent of the sample is triangles.

2. Infer to the population. I bet 20 percent of the population is triangles.
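The two steps above can be sketched in Python. This is an illustration under assumptions: the population of 1,000 shapes is invented, simple random sampling is used as the sampling method, and the function name `estimate_proportion` is hypothetical.

```python
import random


def estimate_proportion(population, trait, sample_size, seed=0):
    """Observe a simple random sample and return the share showing the trait."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return sum(1 for item in sample if item == trait) / sample_size


# Assumed population: 40 percent squares, 20 percent triangles, 40 percent diamonds.
population = ["square"] * 400 + ["triangle"] * 200 + ["diamond"] * 400

# Step 1: observe the sample.
estimate = estimate_proportion(population, "triangle", sample_size=100)

# Step 2: infer to the population.
print(f"I bet about {estimate:.0%} of the population is triangles.")
```

The estimate will usually land near 20 percent without hitting it exactly, which is the gap the standard error of the proportion describes.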

Instead of abstract triangles in a theoretical population, imagine you are interested in checking the quality of the beer you sell in your bar. To get an idea of the beer population, construct a good sample of the beers you sell and taste each of them:

1. Observe the sample. For example, 20 percent of the beers have just a hint of a possum aftertaste.

2. Infer to the population. I bet 20 percent of all the beers you sell have just a hint of a possum aftertaste. You might consider cleaning your beer tap.

Inference is pretty easy to do, but it works well only when the sample is good. Constructing a good sample is the key.



Constructing the Best Random Sample

A good sample represents the population. Representative sampling begins with defining the universe, or, in other words, the population of things from which a researcher wishes to sample. There are a variety of ways to conceptualize these elements and various levels of grouping that are explicitly or implicitly identified when choosing a population and selecting a sample. You have to know about these ways of organizing your population; otherwise, you cannot create a good sample:



General universe
Abstract population to which a researcher hopes to generalize his findings. For example, I might want to say something about all comic book collectors.


