Hack 20. Sample with a Touch of Scotch


why he is using statistics that make the assumption of interval measurement.

He refers to his participants as either/or because doing so makes it easier for him to picture the representativeness of his sampling. It's a smart strategy, because thinking of samples as representing big, discrete categories instead of more precise, continuous values sometimes makes questions about sampling easier to answer and justify.



A Sampling Problem

Here's a brainteaser that centers on a sampling question. A drunk, untenured statistician (I've met a few) is mixing drinks at a party. He is making a Scotch and soda for his department chair. The chair demands a drink with some exact proportion of Scotch to water (it doesn't matter what the specific request is; our hero never makes it that far).

The statistician starts with two glasses of the same size. One glass (the first glass) has two ounces of Scotch in it; the other (the second glass) has two ounces of water in it. He starts by pouring an ounce of water from the water glass into the Scotch. He apparently already screwed up, because he changes his mind and pours an ounce of the new mixture (three ounces of Scotch and water mixed up) back into the water glass. Both glasses now have two ounces of liquid in them, but the liquid in each glass is some mix of water and Scotch.

Nervously, the statistician attempts to start all over, but his department chair stops him. She says:

"I have a proposition for you. We can't possibly know the exact proportion of Scotch and water in each glass right now, because we can't know how mixed up everything is. But if you can answer the following question correctly, I'll write a strong letter of support to your tenure committee. If not, well, I'm sure someone with your qualifications should have no trouble finding work in the hotel/motel or food service industry. Here's the question: right now, does the first glass have more water in it, or does the second glass have more Scotch in it?"

Think of the question as a sampling issue. Does the first sample, the liquid in the first glass, have more water in it, or does the second sample, the liquid in the second glass, have more Scotch in it? Because both Scotch and water are made up of really small particles, it is difficult to picture how much of each liquid is represented in each sample. Even proportionately, we can't be sure how many water particles (or sampled scores that equal "water") are mixed into the sample of "Scotch" scores, because who knows how much water drifted down into the bottom of the first glass and would have remained there as the top part of the liquid near the surface was poured back into the second glass. An intuitive answer is called for.

Unfortunately, it is wrong.

The intuitive answer typically generated by smart people is that the first glass, the Scotch glass, has more water in it than the water glass has Scotch in it. This makes sense because pure water was poured into the Scotch, while some mix of water and Scotch was poured back into the water glass. Amazingly, this clever thinking leads us astray. The correct answer is that the proportions are equal! There is the same amount of water in the Scotch glass as there is Scotch in the water glass.



Using Metaphor to Solve the Problem

The solution to the sampling problem is clearer if we imagine that our variables are not tiny particles, but instead are large categories, such as blue and white marbles. Instead of a glass of Scotch, imagine a glass of 100 blue marbles. Instead of a glass of water, imagine a glass of 100 white marbles.



The glasses are big, so the marbles can get mixed together well. Think large glass fishbowls. This is necessary to ensure that random selection is possible, as was likely with the mixed-up liquids. Keep your eye on the marbles through each step of the mixing.

Our hero takes 50 white marbles from the second glass and mixes them into the first glass. The distribution of the two variables is now:

Sample 1
100 blue marbles, 50 white marbles

Sample 2
50 white marbles

Now, he (randomly, remember, to simulate the mixed liquids) takes any 50 marbles from the first glass and mixes them back into the second glass. Let's imagine a variety of possibilities.

If by chance he selects all the white marbles, they go back into the second glass and the distribution is now:

Sample 1
100 blue marbles

Sample 2
100 white marbles



If by chance he selects no white marbles and puts 50 blue marbles into the second glass, the distribution is:

Sample 1
50 blue marbles, 50 white marbles

Sample 2
50 white marbles, 50 blue marbles

Now imagine a more likely scenario: some of the marbles he randomly draws are white and some are blue. For example, he could draw out 10 white marbles and 40 blue marbles and place them in the second glass. In that case, the new distribution is:

Sample 1
60 blue marbles, 40 white marbles

Sample 2
60 white marbles, 40 blue marbles

Try this with any mix of marbles you wish, but remember you have to draw out a total of 50 marbles (to duplicate the one ounce, or half, of the water originally mixed up).

Notice that any mixture you try results in 100 marbles in each glass at the end. Also, most importantly, notice that the ratio of blue to white marbles in the first glass at the end is always equal to the ratio of white to blue marbles in the second glass.



Any blue marble that is not in the second glass must be in the first glass, and any white marble that is not in the first glass must be in the second glass.
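The marble experiment is easy to try by machine. The sketch below is only an illustration of the steps described above, not anything from the original text: the function name `mix_marbles` and its parameters are made up for this example, and the random draw stands in for a well-mixed glass.

```python
import random


def mix_marbles(n_each=100, n_moved=50, seed=None):
    """Simulate the two pours with marbles (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    glass1 = ["blue"] * n_each    # stands in for the Scotch glass
    glass2 = ["white"] * n_each   # stands in for the water glass

    # First pour: move n_moved white marbles from glass 2 into glass 1.
    glass1 += glass2[:n_moved]
    glass2 = glass2[n_moved:]

    # Second pour: shuffle glass 1, then move any n_moved marbles back to glass 2.
    rng.shuffle(glass1)
    glass2 += glass1[:n_moved]
    glass1 = glass1[n_moved:]

    return glass1, glass2


for seed in range(5):
    g1, g2 = mix_marbles(seed=seed)
    # White marbles left in the first glass vs. blue marbles now in the second.
    print(len(g1), len(g2), g1.count("white"), g2.count("blue"))
```

Whatever the seed, each glass ends with 100 marbles, and the last two numbers on each printed line match: the white marbles stranded in the first glass are exactly replaced by blue marbles in the second.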

The same is true for Scotch and water. The correct answer is that the proportions will be equal, no matter how they were originally mixed up.
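For readers who prefer symbols to marbles, the same bookkeeping can be written out for the liquids. This is a sketch under one assumption that is not in the original: let $w$ (a symbol introduced here) be the unknown amount of water, in ounces, contained in the ounce poured back, with $0 \le w \le 1$.

```latex
\begin{align*}
\text{After the first pour:}\quad
  & \text{glass 1} = 2\ \text{oz Scotch} + 1\ \text{oz water},
  & \text{glass 2} &= 1\ \text{oz water} \\
\text{Ounce poured back:}\quad
  & w\ \text{oz water} + (1 - w)\ \text{oz Scotch} \\
\text{Water left in glass 1:}\quad
  & 1 - w \\
\text{Scotch now in glass 2:}\quad
  & 1 - w
\end{align*}
```

The two final quantities are equal for every value of $w$, which is why how well the liquids were mixed never enters the answer.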



Where Else It Works

Real-life polling companies, who make their living and stake their reputations on the accuracy of election predictions, are also primarily concerned with the proportions of their samples that fall into each of several crucial categories. If people have just voted and there are two candidates, anyone who did not vote for candidate A voted for candidate B. Their absence in one category guarantees their presence in the other. Reporting predictions as percentages creates the potential for greater accuracy. It also allows for greater error, as a voter predicted to be in category A who ends up in category B has therefore produced error in both categories.
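A small sketch can make the "error in both categories" point concrete. The function name `two_way_errors` and the percentages used are invented for illustration; the only assumption is a two-candidate race in which the shares sum to 100.

```python
def two_way_errors(predicted_a, actual_a):
    """Return (error_a, error_b) in percentage points for a two-candidate race.

    Hypothetical helper: candidate B's share is taken to be 100 minus A's share.
    """
    predicted_b = 100.0 - predicted_a
    actual_b = 100.0 - actual_a
    return predicted_a - actual_a, predicted_b - actual_b


# Made-up example: the poll predicts 52% for A, but A actually receives 50%.
err_a, err_b = two_way_errors(predicted_a=52.0, actual_a=50.0)
print(err_a, err_b)  # 2.0 -2.0
```

Overestimating one candidate by two points necessarily underestimates the other by two points, so a single misplaced voter shows up as error twice.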

When statistical social science researchers want to be convinced that their sample is representative of its population, their primary concern is always the proportions of characteristics in their sample, not the number of people with those characteristics. What matters most is that the proportions of each score for the key research variables are the same in both samples and their populations.



Hack 21. Choose the Honest Average

Data-driven decisions, such as whether you can afford to buy a house in a new town or who the core market is for your business, often rely on the "average" as the best description for a large set of data. The problem is that there are three completely different values that can be labeled as the "average," and the different averages often result in different decisions. Make your decisions using the correct average.

When most people hear a statement like "the average price for a house in this town is $290,000" (which might sound low, high, or just right, depending on where you call home), they imagine that this figure was determined by adding up all of the sales prices from all of the houses in the town, and then dividing that sum by the number of houses. But statisticians know there is more than one way to determine the "average," and sometimes one kind is better than another.

Whether that $290,000 really represents the typical housing price depends on whether the average is actually the mean, median, or mode. It also depends on the shape of the distribution of all the numbers that are averaged. Wise folks will make sure they are making their decisions using the best summary value. Here's when to trust each type of average.



Measures of Central Tendency

The purpose of determining an average for a set of values, whether those values are house prices, grades from a final exam, or the number of students in a yoga class, is to efficiently communicate the central tendency for those values. It's true that, most of the time, central tendency is determined by adding up all of the values in a distribution, and then dividing the sum by the number of values. Statisticians don't call this the average, though; they call it the mean. So, why not always use the mean to determine central tendency? Because in some situations, the mean doesn't represent any of the actual values!
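To see how a mean can fail to represent any actual value, here is a minimal sketch using Python's standard `statistics` module. The six sale prices are invented for illustration and are not the data behind the text's example.

```python
from statistics import mean, median, mode

# Hypothetical sale prices (made up for this sketch): five modest houses
# and one very expensive one.
prices = [150_000, 200_000, 200_000, 250_000, 300_000, 2_500_000]

print(mean(prices))    # 600000
print(median(prices))  # 225000.0
print(mode(prices))    # 200000
```

In this made-up list, the mean of $600,000 matches no house that sold and is higher than five of the six prices, while the median and mode both sit among the typical values.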

Consider the opening example about the average price of a house. Let's say you collect data for 300 houses in a town and want to determine the average sales price in that sample. Generally speaking, the mean is not a very good indicator of central tendency for house prices. Figure 2-5 illustrates why.



Figure 2-5. Mean as a misleading average



The mean is not a very honest average in this situation, because the distribution of sales prices is skewed by a few outlying values that are very large. Of the 300 houses sampled, 231 of them were sold for prices in between $100,000 and $600,000. The remaining 69 houses sold for prices above $600,000, with 56 of those above a million dollars. The mean is


