Hack 13. Use One Variable to Predict Another


the existence of a relationship between the two makes ACT scores a good candidate as a predictor to guess GPA.

Simple linear regression is the procedure that produces all the values we need to cook up the magic formula that will predict the future. This procedure produces a regression line that we can graph to determine what the future holds [Hack #12], but once we have the formula, we don't actually need to do any graphing to make our guesses.



Cooking Up the Equation

First, examine the recipe for creating the formula (see the "Regression Formula Recipe" sidebar), and then we'll see how to use it with real data. You can clip this recipe out and keep it in the kitchen drawer.



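Regression Formula Recipe

Ingredients

2 samples of data from correlated variables:
    1 criterion variable (the one you want to predict)
    1 predictor variable (the one you will predict with)
1 correlation coefficient of the relationship between the 2 variables
2 sample means
2 sample standard deviations

Container

An empty equation shaped like this:

    Criterion = Constant + (Predictor x Weight)

Directions

Calculate the weight by which you will multiply your predictor variable:

    Weight = Correlation x (Standard deviation of criterion / Standard deviation of predictor)

Calculate the constant:

    Constant = Mean of criterion - (Weight x Mean of predictor)

Fill the regression equation with the weight and constant you just prepared.

Serves

Anyone interested in guessing what would happen if....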
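For readers who would rather cook with code, the recipe can be sketched in Python. This is a minimal illustration, not the book's own implementation: the function names `build_regression` and `predict` are made up here, the sketch assumes the standard simple linear regression formulas (weight = correlation times the criterion's standard deviation divided by the predictor's; constant = criterion mean minus weight times predictor mean), and the numbers in the usage lines are invented purely to exercise the functions.

```python
def build_regression(r, mean_x, sd_x, mean_y, sd_y):
    """Return (constant, weight) for predicting the criterion y from the predictor x.

    r is the correlation between x and y; the means and standard
    deviations are the sample statistics listed under "Ingredients".
    """
    if sd_x <= 0:
        raise ValueError("predictor standard deviation must be positive")
    weight = r * (sd_y / sd_x)            # step 1: the weight
    constant = mean_y - weight * mean_x   # step 2: the constant
    return constant, weight


def predict(x, constant, weight):
    """Step 3: fill the container, Criterion = Constant + (Predictor x Weight)."""
    return constant + x * weight


# Invented ingredients, for illustration only.
constant, weight = build_regression(r=0.5, mean_x=10, sd_x=2, mean_y=50, sd_y=8)
print(constant, weight)               # 30.0 2.0
print(predict(10, constant, weight))  # 50.0
```

A handy sanity check on any equation built this way: plugging in the predictor's mean should always return the criterion's mean, as the last line does.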



The regression recipe calls for two other ingredients, means and standard deviations for both variables. Here are those statistics for our example:

Variable       Mean     Standard deviation
ACT scores     20.10    2.38
GPA            2.98     .68



You can review means and standard deviations in "Describe the World Using Just Two Numbers" [Hack #2].



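The admissions office built a regression equation from this information. Consequently, as each applicant's letter came into the admissions office, an officer could enter the student's ACT score into the regression formula and predict that student's GPA. Let's figure out the parts of the regression equation in this example, using the correlation of .55 between ACT scores and GPA:

    Weight = .55 x (.68 / 2.38) = .16
    Constant = 2.98 - (.16 x 20.10) = -.24

By placing all this information into the regression equation format, we get this formula for predicting freshman GPA using ACT scores:

    Predicted GPA = -.24 + (ACT Score x .16)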



Notice that the constant in this case is a negative number. That's OK.
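The -.24 and .16 used in this hack can be reproduced from the summary statistics in the table. The sketch below assumes that the ACT-GPA correlation is the .55 reported in Hack 14, and that the weight and constant follow the standard simple linear regression formulas; the variable names are illustrative. It also shows a detail that is easy to miss: the constant comes out as -.24 only if the weight is rounded to two decimals before the constant is calculated.

```python
# Summary statistics from the table; r = .55 is taken from Hack 14 (an assumption here).
r = 0.55
mean_act, sd_act = 20.10, 2.38
mean_gpa, sd_gpa = 2.98, 0.68

exact_weight = r * (sd_gpa / sd_act)               # about 0.157
weight = round(exact_weight, 2)                    # rounded as in the text
constant = round(mean_gpa - weight * mean_act, 2)  # uses the rounded weight
unrounded_constant = mean_gpa - exact_weight * mean_act

print(f"weight   = {weight:.2f}")    # weight   = 0.16
print(f"constant = {constant:.2f}")  # constant = -0.24
print(f"constant from the unrounded weight = {unrounded_constant:.2f}")  # -0.18
```

The difference between -.24 and -.18 is a rounding effect, not an error in the method, but it is a reminder to carry extra decimal places when the predictions matter.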



Predicting Scores

In our college admissions example, imagine two letters arrive. One applicant, Melissa, has an ACT score of 26. The other applicant (let's call him Bruce) has an ACT score of 14. Using the regression equation we have built, there would be two different predictions for these folks' eventual grade point averages:



For Melissa

    Predicted GPA = -.24 + (26 x .16)
    Predicted GPA = -.24 + 4.16
    Predicted GPA = 3.92

For Bruce

    Predicted GPA = -.24 + (14 x .16)
    Predicted GPA = -.24 + 2.24
    Predicted GPA = 2.00

I hope, for Bruce's sake, there is more than one spot available.
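The same arithmetic takes only a few lines of Python. The function name `predicted_gpa` is made up for this sketch; the constant and weight are the -.24 and .16 from the equation above. Note that -.24 + 4.16 works out to 3.92 for Melissa.

```python
def predicted_gpa(act_score, constant=-0.24, weight=0.16):
    """Predicted GPA = constant + (ACT score x weight)."""
    return constant + act_score * weight


for name, act in [("Melissa", 26), ("Bruce", 14)]:
    print(f"{name}: ACT {act} -> predicted GPA {predicted_gpa(act):.2f}")
# Melissa: ACT 26 -> predicted GPA 3.92
# Bruce: ACT 14 -> predicted GPA 2.00
```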



The two variables in this example, ACT scores and GPA, are on different scales, with ACT scores typically running between 1 and 36 and GPA ranging from 0 to 4.0. Part of the magic of correlational analyses is that the variables can be on all sorts of different scales and it doesn't matter. The predicted outcome somehow knows to be on the scale of the criterion variable. Kind of spooky, huh?



Why It Works

When two variables correlate with each other, there is overlap in the information they provide. It is as if they share information. Statisticians sometimes use correlational information to talk about variables sharing variance.

If some of the variance in one variable is accounted for by the variance in another variable, it makes sense that smart mathematicians can use one correlated variable to estimate the amount of variance from the mean (or distance from the mean) on another variable. They would have to use numbers that represent the variables' means and variability, and a number that represents the amount of overlap in information. Our regression equation uses all that information by including means, standard deviations, and the correlation coefficient.
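One way to see this, and to see why the prediction lands on the criterion's scale, is to rewrite the regression equation in terms of distances from the mean. The notation is introduced here for illustration and is not the book's: x is a predictor score, \hat{y} the predicted criterion score, \bar{x} and \bar{y} the sample means, s_x and s_y the sample standard deviations, and r the correlation. The derivation assumes the standard simple linear regression formulas for the weight and the constant.

```latex
\begin{aligned}
\hat{y} &= \underbrace{\left(\bar{y} - r\,\frac{s_y}{s_x}\,\bar{x}\right)}_{\text{constant}}
           + \underbrace{r\,\frac{s_y}{s_x}}_{\text{weight}}\,x \\[4pt]
        &= \bar{y} + r\,\frac{s_y}{s_x}\,(x - \bar{x}) \\[4pt]
\frac{\hat{y} - \bar{y}}{s_y} &= r\,\frac{x - \bar{x}}{s_x}
\end{aligned}
```

Read the last line as: the predicted score sits r times as many standard deviations from its mean as the predictor score sits from its own mean. Dividing by s_x strips away the predictor's units, and multiplying by s_y puts the result back into criterion units, which is why a prediction built from ACT scores comes out as a GPA.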



Where Else It Works

Regression is helpful in answering research questions beyond making predictions. Sometimes, scientists just want to understand a variable and how it operates or how it is distributed in a population. They can do this by looking at how that variable is related to another variable that they know more about.



Statisticians call simple linear regression simple not because it is easy, but because it uses only one predictor variable. It is simple as compared to complex. Real-life predictions like those in our example usually use many predictors, not just one. The method of predicting a criterion variable using more than one predictor is called multiple regression [Hack #14].



Where It Doesn't Work

There will be error in predictions under three circumstances. First, if the correlation is less than perfect between two variables, the prediction will not be perfectly accurate. Since there are almost never really large relationships between predictors and criteria, let alone perfect 1.0 correlations, real-world applications of regression make lots of mistakes. In the presence of any correlation at all, though, the prediction is more accurate than blind guessing. You can determine the size of your errors with the standard error of estimate [Hack #18].
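To put a rough number on "more accurate than blind guessing," the sketch below uses one common form of the standard error of estimate: the criterion's standard deviation times the square root of 1 minus the squared correlation. This is an assumption about the formula, not a quotation from Hack #18, which may use a version with a small-sample correction; the function name is made up, and r = .55 is the ACT-GPA correlation reported in Hack 14.

```python
import math


def standard_error_of_estimate(r, sd_criterion):
    """Large-sample form: typical size of a prediction error, in criterion units."""
    return sd_criterion * math.sqrt(1 - r ** 2)


sd_gpa = 0.68  # standard deviation of GPA from the example
print(f"blind guessing (always predict the mean): {sd_gpa:.2f}")             # 0.68
print(f"using ACT with r = .55: {standard_error_of_estimate(0.55, sd_gpa):.2f}")  # 0.57
```

With no correlation the function returns the criterion's own standard deviation, and with a perfect correlation it returns zero, which matches the two extremes described above.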

Second, linear regression assumes that the relationship is linear. This is discussed in "Graph Relationships" [Hack #12] in greater detail, but if the strength of the relationship varies at different points along the range of scores, the regression prediction will make large errors in some cases.

Finally, if the data collected to first establish the values used in the regression equation are not representative of future data, results will be in error. For example, in our college admissions example, if an applicant presents with an ACT score of 36, the predicted GPA is 5.52. This is an impossible value that does not even fit on the GPA scale, which maxes out at 4.0. Because the past data that was used to establish the prediction formula included few or no ACT scores of 36, the equation was not equipped to deal with such a high score.
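A simple guard can catch predictions like this before anyone acts on them. The sketch below is an illustration rather than part of the original hack: the function names are made up, the equation is the -.24 and .16 version used throughout, and the 0 to 4.0 limits are the GPA scale described earlier.

```python
def predicted_gpa(act_score, constant=-0.24, weight=0.16):
    """Predicted GPA = constant + (ACT score x weight)."""
    return constant + act_score * weight


def is_on_gpa_scale(gpa, low=0.0, high=4.0):
    """True if a predicted value is possible on a 0-to-4.0 GPA scale."""
    return low <= gpa <= high


for act in (14, 26, 36):
    gpa = predicted_gpa(act)
    flag = "" if is_on_gpa_scale(gpa) else "  <- off the GPA scale"
    print(f"ACT {act}: predicted GPA {gpa:.2f}{flag}")
# ACT 14: predicted GPA 2.00
# ACT 26: predicted GPA 3.92
# ACT 36: predicted GPA 5.52  <- off the GPA scale
```

Solving 4.0 = -.24 + (ACT x .16) gives an ACT score of 26.5, so with this particular equation every score above 26.5 produces an impossible GPA, not just a perfect 36.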







Hack 14. Use More Than One Variable to Predict Another

The super powers of predicting the future and seeing the invisible are available to any statistics hackers who feel they are worthy. Statisticians often answer questions and use correlational information to solve problems by using one variable to predict another. For more accurate predictions, though, several predictor variables can be combined in a single regression equation by using the methods of multiple regression.

"Graph Relationships" [Hack #12] discusses the useful prophetic qualities of a regression line. Those procedures allow administrators and statistical researchers to predict performance on assessments never taken, understand variables, and build theories about relationships among those variables. They accomplish these tricks using just a single predictor variable.

"Use One Variable to Predict Another" [Hack #13] presents the problem colleges have when deciding which applicants to admit. They want to admit students who will succeed, so they try to predict future performance. The solution in that hack uses one variable (a standardized test score) to estimate performance on a future variable (college grades).

Often, real-life researchers want to make use of the information found in a bunch of variables, not just one variable, to make predictions or estimate scores. When they want greater accuracy, scientists attempt to find several variables that all appear to be related to the criterion variable of interest (the variable you are trying to predict). They use all this information to produce a multiple regression equation.



Choosing Predictor Variables

You probably should read or reread "Use One Variable to Predict Another" [Hack #13] before going further with this hack, just to review the problem at hand and how regression solves it. Here is the equation we built in that hack for using a single predictor, ACT scores, to estimate future college GPA:

    Predicted GPA = -.24 + (ACT Score x .16)

This single predictor produced a regression equation with output that correlated .55 with the criterion. Pretty good, and pretty accurate, but it could be better.

Imagine our administrator decides she's unhappy with the level of precision she could get using the regression line or equation she had built, and wants to do a better job. She could get a more accurate result if she could find more variables that correlate with college grades. Let's imagine that our amateur statistician found two other predictor variables that correlated with college performance:

    An attitude measure
    The quality of a written essay

Perhaps performance on a college attitude survey is collected by the college (scores range between 20 and 100), and is found to have some correlation with future GPA. Additionally, a score of 1 to 5 on a personal essay could correlate with college GPA and might be included in the multiple regression equation.
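To make the idea of combining several predictors concrete, here is a minimal sketch of a least-squares multiple regression fit in plain Python. Everything specific in it is an assumption made for illustration: the function names, the six applicants, and the "true" equation used to manufacture their GPAs are all invented, and none of these numbers come from the hack. The fit solves the normal equations by Gauss-Jordan elimination, which is fine for a toy example but is not how statistical software usually does it.

```python
def fit_multiple_regression(rows, targets):
    """Least-squares fit; returns [constant, weight_1, weight_2, ...]."""
    # Add a leading 1 to each row so the first coefficient is the constant.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot solve")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for row in range(k):
            if row != col:
                factor = A[row][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefficients, predictors):
    """Criterion = constant + sum of (predictor x weight)."""
    constant, *weights = coefficients
    return constant + sum(w * p for w, p in zip(weights, predictors))


# Invented applicants: (ACT score, attitude score 20-100, essay score 1-5).
applicants = [(26, 80, 4), (14, 55, 2), (20, 90, 3),
              (31, 60, 5), (18, 70, 1), (23, 40, 3)]
# Invented equation, used only to manufacture noise-free GPAs for the demo.
true_coefficients = [0.2, 0.08, 0.01, 0.1]
gpas = [predict(true_coefficients, a) for a in applicants]

fitted = fit_multiple_regression(applicants, gpas)
print([round(c, 3) for c in fitted])  # recovers the invented coefficients
```

Because the made-up GPAs contain no noise, the fit simply recovers the equation that generated them. With real admissions data the weights would be estimates, and each one would reflect that predictor's contribution with the other predictors held constant.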


