# Hack 18. Find Out Just How Wrong You Really Are

"Measure Precisely" [Hack #6] discusses how to use standard errors in the case of measurement. Calculating the standard error of measurement allows you to know how close your test score is to your typical level of performance. Just as measurement allows us to produce 95 percent confidence intervals around individual observed scores, statisticians routinely produce 95 percent confidence intervals around a wide variety of sample values.

Fortunately for anyone curious to know how far a statistical finding is from the hidden truth, every popular statistical procedure provides a standard error. After introducing some basic concepts, this hack explains how to apply the following standard errors:

Standard error of the mean in descriptive statistics

Standard error of the proportion in survey sampling

Standard error of the estimate in regression

The Central Limit Theorem [Hack #2] is a key tool for knowing how wrong we are when we sample, because it provides the formula for calculating standard errors and suggests that all sample summary values are normally distributed.

There are three common ways that standard errors are used to verify the accuracy of results of statistical analyses. The particular tool you use depends on whether you want to know how close you are to correctly estimating:

The mean score of a population on some variable (e.g., average salary of untenured college professors)

The proportion of a population that have some characteristic (e.g., who will vote for my Uncle Frank as Chief Dogcatcher)

Future performance (e.g., probable college GPA for your pet monkey, whom you have trained to take multiple-choice tests)

Mean Estimates

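The precision of a sample mean as an estimate of a population mean is based on sample size. Here's the formula:

$$\text{standard error of the mean} = \frac{\text{standard deviation}}{\sqrt{\text{sample size}}}$$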

As the sample size increases, the sample mean tends to fall closer to the true population mean. This makes sense if you think of sample size as the number of independent observations; the more looks you get at something, the more accurate your description will be.

The standard error of the mean is the average distance of sample means from their population mean.
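As a minimal sketch of this calculation, assuming the usual form of the formula (standard deviation divided by the square root of the sample size), the following Python function reproduces the two standard-error-of-the-mean values listed in Table 2-11. The function name is an illustrative choice and does not come from the text.

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of the mean, assuming the form sd / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sd / math.sqrt(n)

# Inputs taken from Table 2-11: standard deviation 15, samples of 30 and 60.
print(round(standard_error_of_mean(15, 30), 2))  # 2.74
print(round(standard_error_of_mean(15, 60), 2))  # 1.94
```

Doubling the sample from 30 to 60 shrinks the standard error by a factor of the square root of 2, not by half, which is why extra precision gets more expensive as samples grow.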

Proportion Estimates

When a sample of people is surveyed and the results are presented as some percentage or proportion (e.g., "72 percent of all sailors have knee trouble"), that percentage is some distance from the actual percentage you'd find if you surveyed the whole population. If the sample was selected randomly, the standard error of proportion indicates how close the sample percentage is to the population percentage.

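The standard error of proportion is based on sample size and the size of the proportion. Here's the formula:

$$\text{standard error of the proportion} = \sqrt{\frac{(\text{proportion})(1 - \text{proportion})}{\text{sample size}}}$$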

Like the standard error of the mean, the standard error of the proportion decreases as the sample size increases. If you are mathematically oriented, you might notice that as the proportion moves away from .50, the number in the top part of the formula becomes smaller. So the further the proportion is from .50, the smaller the standard error of the proportion. Another point of interest is that the top part of the formula is an indication of the amount of variability in the sample: (proportion)(1 - proportion) is the squared standard deviation for proportions.

The standard error of the proportion is the average distance of sample proportions from the true proportion in the population.
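A short sketch of the proportion calculation follows, assuming the usual form of the formula: the square root of (proportion)(1 - proportion) divided by the sample size. The function name is hypothetical; the inputs are the proportion rows of Table 2-11.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a proportion, assuming sqrt(p * (1 - p) / n)."""
    if not 0 <= p <= 1:
        raise ValueError("proportion must be between 0 and 1")
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return math.sqrt(p * (1 - p) / n)

# Inputs taken from Table 2-11: proportion .50, samples of 30 and 60.
print(round(standard_error_of_proportion(0.50, 30), 2))  # 0.09
print(round(standard_error_of_proportion(0.50, 60), 2))  # 0.06

# A proportion further from .50 has a smaller standard error at the same sample size.
print(standard_error_of_proportion(0.10, 30) < standard_error_of_proportion(0.50, 30))  # True
```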

Estimates of Future Performance

In regression analyses, scores on one or more variables are used to estimate scores on another variable [Hack #13]. However, that predicted score is unlikely to be exactly right. Just as we can calculate how far an average sample mean is from a population mean or how far off our survey results are from theoretical population results, we can also say how far off, on average, our regression prediction will be from the actual score a person would get. Here's the formula:

$$\text{standard error of the estimate} = \text{standard deviation} \times \sqrt{1 - \text{correlation}^2}$$

The standard deviation used in the equation is the standard deviation of the criterion variable, which is the one you are predicting. The correlation is the correlation between your predictor(s) and the criterion variable.

In the interest of accuracy (the point of this hack, after all), I should point out that the standard error of the estimate formula given earlier isn't quite correct. However, it does provide almost the same result as this more complex, but correct, equation:
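One commonly taught small-sample form of the standard error of the estimate, offered here as an assumption rather than as the exact equation the text refers to, multiplies the simpler formula by a correction that depends on the sample size $n$. The symbols $SD_Y$ (standard deviation of the criterion) and $r$ (correlation) are notation chosen for this sketch.

```latex
SE_{\text{estimate}} = SD_Y \,\sqrt{1 - r^2}\,\sqrt{\frac{n - 1}{n - 2}}
```

With a standard deviation of 15, a correlation of .25, and a sample of 60, this form gives about 14.65, which agrees with the corresponding row of Table 2-11. For a sample of 30 it gives about 14.78, slightly below the 14.81 shown there. The simpler formula gives about 14.52 in both cases.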

Notice with this formula that the larger the correlation, the smaller the standard error of the estimate. This makes sense, because if there is a lot of informational overlap between two variables, you can get a good sense of the score on one variable by looking at the other.

The standard error of the estimate is the average distance of the actual score from each predicted score.
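The simpler version of the calculation can be sketched as follows, assuming it takes the form of the criterion's standard deviation multiplied by the square root of one minus the squared correlation. The function name and the second example correlation of .90 are illustrative assumptions.

```python
import math

def standard_error_of_estimate(sd_criterion, r):
    """Simple standard error of the estimate, assuming sd * sqrt(1 - r**2)."""
    if not -1 <= r <= 1:
        raise ValueError("correlation must be between -1 and 1")
    return sd_criterion * math.sqrt(1 - r ** 2)

# Standard deviation 15 and correlation .25, as in the Table 2-11 example.
print(round(standard_error_of_estimate(15, 0.25), 2))  # 14.52

# A much stronger (hypothetical) correlation gives a much smaller error.
print(round(standard_error_of_estimate(15, 0.90), 2))  # 6.54
```

The first result is a little smaller than the standard errors of the estimate listed in Table 2-11, which appear to include a sample-size adjustment.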

Using Standard Errors

Here's how to use these tools to state with some confidence the range within which the truth lies. Because sampling errors are normally distributed, the standard error can be used just like a standard deviation to define specific proportions of scores under the normal curve.

For example, if we want to provide a range of values in which the population value falls 95 percent of the time, we can build a 95 percent confidence interval around our sample value. Based on the normal curve [Hack #23], 1.96 standard errors on either side of the sample value should provide a range of values that we can say with 95 percent certainty contains the population value.
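The interval described above can be sketched in a few lines of Python. The function name and the `z` parameter are illustrative assumptions; the inputs are rounded values from Table 2-11.

```python
def confidence_interval_95(sample_value, standard_error, z=1.96):
    """Return (low, high): sample value plus or minus z standard errors."""
    margin = z * standard_error
    return sample_value - margin, sample_value + margin

# Sample mean of 100 with a standard error of 2.74.
low, high = confidence_interval_95(100, 2.74)
print(f"{low:.2f}-{high:.2f}")  # 94.63-105.37

# Sample proportion of .50 with a standard error of .09.
p_low, p_high = confidence_interval_95(0.50, 0.09)
print(f"{p_low:.2f}-{p_high:.2f}")  # 0.32-0.68
```

Keeping `z` as a parameter makes it easy to build other intervals, but 1.96 is the only multiplier the text itself uses.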

Table 2-11 shows some examples of various standard errors and the use of sample data to produce these confidence intervals [Hack #6]. Notice how a larger sample size creates a sample estimate closer to the population value, and a larger sample size also points to a confidence interval that is more precise.

Table 2-11. Building 95 percent confidence intervals

| Type of standard error | Standard deviation | Sample size | Sample value | Standard error | 95 percent confidence interval |
|---|---|---|---|---|---|
| Standard error of the mean | 15 | 30 | 100 | 2.74 | 94.63-105.37 |
| Standard error of the mean | 15 | 60 | 100 | 1.94 | 96.20-103.80 |
| Standard error of the proportion | .25 | 30 | .50 | .09 | .32-.68 |
| Standard error of the proportion | .25 | 60 | .50 | .06 | .38-.62 |
| Standard error of the estimate | 15 | 30 | 100 | 14.81 | 70.97-129.03 |
| Standard error of the estimate | 15 | 60 | 100 | 14.65 | 71.29-128.71 |

The "Sample value" column in Table 2-11 for the standard error of the estimate is an example of an estimated or predicted score on some variable. The calculations in the example assume a correlation of .25 between the predictor and the criterion.

Uncle Frank's Campaign for Dogcatcher

As the campaign manager for my Uncle Frank in his recent campaign for Chief Dogcatcher, I made use of standard errors. Several weeks before the election, I surveyed 30 randomly chosen voters in the town of Tonganoxie, Kansas, where Frank lives. My survey found that 50 percent of respondents said they would vote for him. I warned Uncle Frank that the sample was so small that it was not a very precise reflection of the entire population of voters.

Had I surveyed all the voters in town, the percentage saying they would vote for Frank might reasonably be anywhere between 32 and 68 percent, even though my sample result was 50 percent. Of course, the optimist that is my uncle interpreted this as meaning he might have 68 percent of the vote, and he splurged on a giant victory party the night before the election. I, being the realist that I am and knowing my uncle's reputation around town, assumed the true outcome would be in the other direction. It was. That's okay, though. It was a great party.
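The arithmetic behind the survey can be worked through as follows, assuming the usual standard error of the proportion formula and the rounded standard error of .09 shown in Table 2-11 for a sample of 30 and a proportion of .50:

```latex
SE_{\text{proportion}} = \sqrt{\frac{(.50)(1 - .50)}{30}} \approx .09
\qquad
.50 \pm (1.96)(.09) \approx .50 \pm .18 \;\Rightarrow\; .32 \text{ to } .68
```

So a survey result of 50 percent from 30 voters is consistent with true support anywhere from roughly 32 percent to 68 percent.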

Why It Works

We can trust the accuracy of standard errors if we accept the following assumptions and apply some common sense:

Sampling errors are normally distributed

This means that the size of these errors ranges in value in a way that matches the normal curve. This allows us to produce those persuasively precise confidence intervals.

Sampling errors are nonbiased

This means that sample values are equally likely to be greater or less than the population value. This is convenient because it means that across repeated studies, one can zero in on the true population value.

The formulas are constructed in such a way that if you have very little information, the standard error is as large as the standard deviation of the population.

Look what happens with the standard error of the mean or the standard error of the proportion when the sample size is 1, or what happens with the standard error of the estimate when the correlation is 0.00. Intuitively, a good formula for figuring the standard error size should produce smaller errors when more is known.
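These limiting cases can be checked directly. The sketch below assumes the usual forms of the three formulas (standard deviation over the square root of n, the square root of p(1 - p)/n, and the standard deviation times the square root of 1 - r squared); the variable names and example values are illustrative.

```python
import math

sd = 15.0  # example standard deviation
p = 0.50   # example proportion

# With a sample of 1, the standard error is as large as the standard deviation.
se_mean_n1 = sd / math.sqrt(1)
se_prop_n1 = math.sqrt(p * (1 - p) / 1)

# With a correlation of 0.00, the predictor tells us nothing about the criterion.
se_est_r0 = sd * math.sqrt(1 - 0.0 ** 2)

print(se_mean_n1 == sd)                      # True
print(se_prop_n1 == math.sqrt(p * (1 - p)))  # True
print(se_est_r0 == sd)                       # True

# More information (a larger sample) means a smaller standard error.
shrinking = [sd / math.sqrt(n) for n in (1, 4, 25, 100)]
print(shrinking)  # [15.0, 7.5, 3.0, 1.5]
```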

# Hack 19. Sample Fairly

To be sure of the quality of the beer you serve at your bar, you could taste every one before serving. Or, to save time, money, and effort, you could taste just a fair sample.

Management thrives on knowing the characteristics of every widget produced, every transaction conducted, and every client helped. Of course, the whole set of all of these widgets, interactions, and people can never be brought together under one microscope and observed and evaluated. No specimen slide is big enough.

The same is true for those of us in social science: researchers interested in people simply cannot measure everybody. As much as we'd like to probe, shock, inject, hassle, embarrass, and generally bother everyone in the world, we just can't do it. We don't have the time, space, or money, and, frankly, no one really wants to get to know so many people.

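So the question becomes, "How can we describe everything without being able to look at everything?" As is the case with all hacks in this book, the solution is provided by statistics. There are scientifically sound ways to accurately describe any whole set of things by just looking at a small subset of those things.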

Using Samples to Make Inferences

Inferential statistics allows us to generalize to a larger population, based on data from a smaller sample. For these generalizations to be valid, though, the sample has to represent the population fairly.

A population, in the sense we use it here, is rarely the "population" of a country or city or planet in the way the term is used in social studies. It can be any whole set of things, such as people in Kansas, South American giant otters, or books in the Library of Congress. The only rule is that a population is bigger than its corresponding sample.

A good sample represents a population. This means that every important characteristic must be distributed in the sample in the same proportions as in the population. Let's look at a good sample.

Imagine a population of squares, diamonds, and triangles, as shown in Figure 2-4.

Figure 2-4. A sample within a population

A fair sample taken from a population of squares, diamonds, and triangles would contain those shapes in the same proportion as in the population. In our diagram, the outer oval represents a population, and the different shapes are distributed as 40 percent squares, 20 percent triangles, and 40 percent diamonds. The inner oval is the sample, which contains a subgroup of those elements in the population. The shapes in the sample are distributed in the exact proportions as in the population: 40 percent squares, 20 percent triangles, and 40 percent diamonds.
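The idea of matching proportions can be made concrete with a small check. In this sketch the population of 100 shapes and the sample of 10 are made-up sizes chosen to match the 40/20/40 split described above, and the helper name `proportions` is hypothetical.

```python
from collections import Counter

def proportions(items):
    """Return the share of each kind of item, e.g. {'square': 0.4, ...}."""
    if not items:
        raise ValueError("need at least one item")
    counts = Counter(items)
    return {kind: count / len(items) for kind, count in counts.items()}

# Hypothetical sizes: 100 shapes in the population, 10 in the sample.
population = ["square"] * 40 + ["triangle"] * 20 + ["diamond"] * 40
sample = ["square"] * 4 + ["triangle"] * 2 + ["diamond"] * 4

print(proportions(population))  # {'square': 0.4, 'triangle': 0.2, 'diamond': 0.4}
print(proportions(sample) == proportions(population))  # True
```

Real samples rarely match this exactly; the point is that a fair sample's proportions should be close to the population's.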

This sample is fair. It represents the population well, at least in terms of the characteristic of shape. When sampling people or things, samples typically represent a variety of traits. People and things are not entirely triangles or squares, so a sample of people is representative when its mean level of traits matches well with the population levels. Each person will have some level of all the characteristics, and won't be entirely one trait, unlike our shape example. (Though my Uncle Frank is pretty much entirely square, according to my Aunt Heloise.)

A researcher can define whatever population he is interested in, but he is then accurate when generalizing to that population only, not any other.

If you knew that the sampling methods used to produce this sample (the elements in the inner oval) were correct, you could draw conclusions about the population just by looking at the sample. The procedure is simple and intuitive:

1. Observe the sample. For example, 20 percent of the sample is triangles.

2. Infer to the population. I bet 20 percent of the population is triangles.
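The two steps above can be sketched with a simple random sample. Everything specific here is an assumption made for illustration: the population of 10,000 shapes, the sample size of 2,000, the seed, and the function name.

```python
import random

def estimate_proportion(population, kind, sample_size, seed=None):
    """Draw a simple random sample and return the share of `kind` in it."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return sum(1 for item in sample if item == kind) / sample_size

# Hypothetical population with the 40/20/40 split from the shapes example.
population = ["square"] * 4000 + ["triangle"] * 2000 + ["diamond"] * 4000

# Step 1: observe the sample. Step 2: infer that the population looks similar.
estimate = estimate_proportion(population, "triangle", sample_size=2000, seed=18)
print(f"Estimated share of triangles: {estimate:.2f}")
```

Because the sample is random, the estimate will not be exactly .20 every time, but with a sample this large it should land very close.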

Now suppose you are interested in checking the quality of the beer you sell in your bar. To get an idea of the beer population, construct a good sample of the beers you sell and taste each of them:

1. Observe the sample. For example, 20 percent of the beers have just a hint of a possum aftertaste.

2. Infer to the population. I bet 20 percent of all the beers you sell have just a hint of a possum aftertaste. You might consider cleaning your beer tap.

Inference is pretty easy to do, but it works well only when the sample is good. Constructing a good sample is the key.

Constructing the Best Random Sample

A good sample represents the population. Representative sampling begins with defining the universe, or, in other words, the population of things from which a researcher wishes to sample. There are a variety of ways to conceptualize these elements and various levels of grouping that are explicitly or implicitly identified when choosing a population and selecting a sample. You must clearly define your population; otherwise, you cannot create a good sample:

General universe

Abstract population to which a researcher hopes to generalize his findings. For example, I might want to say