Hack 52. Use Random Selection as Artificial Intelligence


outcome was negative, the creature is less likely to make that choice again.

Notice that there is no guarantee that a "good" behavior is always repeated or that a bad behavior becomes extinct; it is only a matter of probability. The right decision is more likely to be made and the wrong decision is less likely to be made. To make a machine that mimics the way that animals learn, we must build on this probability angle.

Game playing reflects much of the trial-and-error learning process because outcomes are easily interpreted as positive (a win) or negative (a loss). In games, the feedback is often immediate, and studies show that the closeness in time between the choice and the feedback is a key factor in whether learning has occurred. And learning, remember, is defined here as an increase in the likelihood of correct choices or a decrease in the likelihood of incorrect choices.



Building a Tic-Tac-Toe Machine

Stuck on this island with no friends, you might want to fight boredom by playing games with a smart opponent. Here are instructions for building a contraption that does not use any electricity or silicon, but will play a game and provide decent competition.

This machine learns: the more times you play against it, the better it will be. The game this machine plays is Tic-Tac-Toe, but theoretically, you could build a device for any two-person strategy game using the same principles. Tic-Tac-Toe is simple enough that it demonstrates well the methods of design, construction, and operation.

If the Professor on Gilligan's Island ever did build a computer out of coconuts, he was likely influenced by the pioneering work of biologist Donald Michie and his matchboxes. Michie published an article in the Computer Journal in 1963, a few years before Gilligan and his pals were stranded on their island. Michie describes how he designed and actually built a nonelectric computer with the following complete list of parts:



287 matchboxes

Each matchbox has a little drawer that can be opened. Michie labeled each matchbox with one of 287 different possible Tic-Tac-Toe configurations throughout a game. There are actually many more possible positions, but because the standard Tic-Tac-Toe layout of three rows and three columns is symmetrical, positions that differ only by a rotation or reflection of the board (up to eight of them) can be summarized with just one position. At any point in the game, the current layout of the "board" directs the human operator to the corresponding matchbox.



A large supply of beads of nine different colors

The nine colors represent each of the nine different spaces on the Tic-Tac-Toe board. Each matchbox begins with an equal supply of beads for each of the possible next moves. Only beads representing legal moves are put in each box. Different positions and matchboxes, of course, correspond to only a small set of legal next moves, so each box has a slightly different mixture of beads.
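For readers who would rather prototype the parts list in software before hunting for coconuts, here is a minimal sketch of the two ideas above: folding rotated and mirrored boards into one label, and stocking a box with an equal supply of beads for each legal move. The board encoding (a tuple of nine cells read row by row) and the names `rotate`, `mirror`, `canonical`, and `new_box` are assumptions made for this illustration, not anything from Michie's article. The sketch also does not merge moves that are equivalent inside a single symmetric position.

```python
from typing import Dict, List, Tuple

Board = Tuple[str, ...]  # nine cells, row by row; each cell is "X", "O", or " "


def rotate(board: Board) -> Board:
    """Rotate the 3x3 board a quarter turn clockwise."""
    return tuple(board[6 - 3 * (i % 3) + i // 3] for i in range(9))


def mirror(board: Board) -> Board:
    """Flip the board left to right."""
    return tuple(board[3 * (i // 3) + 2 - i % 3] for i in range(9))


def symmetries(board: Board) -> List[Board]:
    """All eight rotations and reflections of a board (some may coincide)."""
    images = []
    current = board
    for _ in range(4):
        images.append(current)
        images.append(mirror(current))
        current = rotate(current)
    return images


def canonical(board: Board) -> Board:
    """Pick one representative per family of equivalent boards (the label)."""
    return min(symmetries(board))


def new_box(board: Board, beads_per_move: int = 1) -> Dict[int, int]:
    """Equal bead count for every legal move (empty cell), keyed by cell 0..8."""
    return {cell: beads_per_move for cell, mark in enumerate(board) if mark == " "}
```

With this encoding, an opening X in any of the four corners maps to the same label, so one box serves all four boards.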

The Professor would have used coconut shells instead of matchboxes and sand pebbles or seeds (or perhaps Mr. Howell's coin collection, which he never goes anywhere without) instead of beads. Gather these supplies from your tropical surroundings, organize the pebble-filled coconuts in an efficient grouping, and you have your desert island game-playing computer. Yes, you'll need to find 287 coconuts, but do you have anything better to do?



Operating the Computer

To play a game of Tic-Tac-Toe against your pebble-powered PC, follow these instructions:

1. The computer goes first. Find the coconut that is labeled with the current position. (For the first move, it is a blank layout.) Close your eyes and randomly draw out a pebble.

2. Mark an X on your board (drawn in the sand, I'm assuming) in the space indicated by the color of the pebble. Set the pebble aside in a safe place.

3. Make your move, marking an O in your chosen space.

4. There is a new position on the board now. Go to the corresponding coconut and randomly draw out a pebble from it. Return to step 2.

5. Repeat steps 2 through 4 until there is a winner or a draw.

What happens next is the most important part because it results in the computer learning to play better. Behavioral psychologists call this final stage reinforcement.

If the computer loses, "punish" it by taking the pebbles that you drew randomly from the coconuts and throwing them into the ocean.

If the machine wins or draws the game, return the pebbles to the coconuts from which they came and "reward" it by adding an additional pebble of the same color.
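The drawing and reinforcement steps translate almost directly into code. The following is only a sketch under stated assumptions: each coconut is a dictionary from a move (a cell number) to its pebble count, positions are identified by arbitrary string labels, and the names `draw_pebble` and `reinforce` are invented here. The text does not say what to do when a coconut runs out of pebbles, so raising an error in that case is an assumption as well.

```python
import random
from typing import Dict, List, Tuple

Box = Dict[int, int]  # move (cell 0..8) -> number of pebbles of that color


def draw_pebble(box: Box, rng: random.Random) -> int:
    """Draw one pebble with eyes closed: every pebble is equally likely."""
    moves = [move for move, count in box.items() if count > 0]
    if not moves:
        # Assumption: the source does not cover an empty coconut.
        raise ValueError("this coconut is empty")
    weights = [box[move] for move in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


def reinforce(boxes: Dict[str, Box],
              drawn: List[Tuple[str, int]],
              computer_lost: bool) -> None:
    """Apply the end-of-game rule to each (position label, move) drawn."""
    for position, move in drawn:
        if computer_lost:
            boxes[position][move] -= 1  # the pebble goes into the ocean
        else:
            boxes[position][move] += 1  # pebble returned, plus one extra
```

During a game you would record each `(position, move)` pair as it is drawn, which plays the role of setting the pebble aside in a safe place, and then call `reinforce` once the result is known.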



Why It Works

The process of rewarding or punishing the computer essentially duplicates the process by which animals learn. Positive results lead to an increase in the likelihood of the rewarded behavior, while negative results lead to a decrease in the likelihood of the punished behavior. By adding or removing pebbles, you are literally increasing or decreasing the true probability of the machine making certain moves in the game.

Consider this stage of a game, where the computer, playing X, must make its move:

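[Board diagram: the computer has placed two Xs and you have placed two Os. Your Os form two in a row, and the bottom center space is still open to complete the line.]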







You probably recognize that the best move (really, the only move to consider) is for the computer to block your impending win by putting its X in the bottom center space. The computer, though, recognizes several possibilities. It considers any legal move. Two moves that it would consider (which means, literally, that it would allow to be drawn randomly out of the coconut shell) are the best move and a bad move:

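[Board diagrams: on the left, the computer places its third X in the bottom center space, blocking your line. On the right, it places its third X in a different space, leaving the bottom center open.]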







When the computer first starts playing the game, both these moves (or behaviors) are equally likely. Other moves are also possible in this situation, and they are also equally likely. The move on the left probably won't result in a loss, at least not immediately, so as pebbles representing that move are added to the coconut, the relative probability of that move increases compared to other moves. The move on the right probably ends in a loss (except against Gilligan, maybe), so the chance of that move being selected next time mathematically decreases, as there are fewer pebbles of that color to be randomly selected.

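The probability of any given move being selected can be represented by this simple expression:

Probability of a move = (number of pebbles of that move's color in the coconut) / (total number of pebbles in the coconut)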
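To see how that ratio shifts with a single reward or punishment, here is a small worked sketch: the chance of a move is its pebble count divided by the total count in the coconut. The bead colors and starting counts are hypothetical, and `move_probabilities` is a name made up for this example.

```python
from fractions import Fraction
from typing import Dict


def move_probabilities(box: Dict[str, int]) -> Dict[str, Fraction]:
    """Chance of each move = pebbles of its color / all pebbles in the coconut."""
    total = sum(box.values())
    if total == 0:
        raise ValueError("no pebbles left in this coconut")
    return {color: Fraction(count, total) for color, count in box.items()}


# Hypothetical coconut with three legal moves and two pebbles for each.
box = {"red": 2, "blue": 2, "green": 2}
start = move_probabilities(box)   # every move starts at 1/3

box["red"] += 1    # red was drawn in a game the computer won or drew
box["blue"] -= 1   # blue was drawn in a game the computer lost
after = move_probabilities(box)   # red 1/2, blue 1/6, green 1/3
```

Note that green's chance is unchanged here only because one pebble was added and one removed; in general, every reward or punishment shifts the odds of all the moves in that coconut.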

The machine begins with an equal number of pebbles or, in other words, an equal likelihood of any of a variety of moves being chosen. Of course, some moves look foolish to our experienced game-playing eye and would never be made in a real game except by the most naive of players. The point that behavioral psychologists argue, though, is that all creatures are novices until they have built up a large pool of experiences that have shaped the basic probabilities that they will engage in a behavior.



Hacking the Hack

There are several ways to modify your machine to make it smarter. For example, you can choose to reward moves that lead to wins more than moves that lead to ties. This should produce a good player more quickly. Michie suggested three beads for a win and one bead for a tie.

If you want to simulate the way animal learning occurs, you can adjust the system so that moves near the end of the game are more crucial than those made at the beginning. This is meant to mirror the observation that reinforcement that comes closest in time to when the behavior occurs is most effective. In the case of Tic-Tac-Toe, mistakes that lead to immediate losses should be dealt with and punished more effectively. By having fewer total beads in use for moves late in the game, the learning will occur more quickly.
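Both tweaks are easy to express as parameters. In the sketch below, the three-beads-for-a-win and one-bead-for-a-tie amounts come from the text, and a loss removes the one pebble that was drawn. The starting supplies in `initial_beads` (four per move for the opening box, shrinking to one later on) are purely illustrative numbers chosen for this example, and all function names are hypothetical.

```python
from fractions import Fraction

# Bead change for the move that was drawn, by game outcome.
REWARD = {"win": 3, "draw": 1, "loss": -1}


def update(count: int, outcome: str) -> int:
    """New bead count for a drawn move after the game ends."""
    return max(0, count + REWARD[outcome])


def initial_beads(marks_on_board: int) -> int:
    """Beads per legal move: fewer for boxes reached later in the game.

    Assumption: 4, 3, 2, 1, 1 for the computer's first through fifth moves.
    """
    computer_turn = marks_on_board // 2  # 0 for the opening move
    return max(1, 4 - computer_turn)


def chance_after_loss(n_moves: int, beads: int) -> Fraction:
    """Chance of repeating a move after it has been punished once."""
    return Fraction(beads - 1, n_moves * beads - 1)


# With three legal moves, a bad move starts at 1/3 either way. One loss
# drops it only to 3/11 with four beads each, but to 0 with one bead each.
slow = chance_after_loss(3, 4)
fast = chance_after_loss(3, 1)
```

The comparison at the end is the whole point of the second tweak: the smaller the supply in a late-game coconut, the more a single punishment changes the odds.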

An obvious upgrade is to make your computer smarter by not even allowing bad moves. Don't even place pebbles representing moves that will result in immediate defeat into your containers. This will solve the problem of your computer's initial low intelligence, but it doesn't really reflect the way animals learn. So, while this might make for a stronger competitor, the Professor would be disappointed in your lack of scientific rigor.
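If you do decide to cheat on the Professor, the filter might look like the sketch below. It assumes boards are nine-character strings read row by row, with cells numbered 0 through 8, and it treats "immediate defeat" as any move after which the opponent can complete a line on the very next turn. The names `wins` and `safe_moves` are invented here, and the example position (your Os in the top center and center, the computer's Xs in two opposite corners) is an assumed layout chosen only so that bottom center is the blocking move.

```python
from typing import List

# The eight winning lines, as cell indexes 0..8 read row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def wins(board: str, mark: str) -> bool:
    """True if `mark` occupies all three cells of any line."""
    return any(all(board[i] == mark for i in line) for line in LINES)


def place(board: str, cell: int, mark: str) -> str:
    """Return a copy of the board with `mark` written into `cell`."""
    return board[:cell] + mark + board[cell + 1:]


def safe_moves(board: str, me: str = "X", opponent: str = "O") -> List[int]:
    """Legal moves that do not hand the opponent a win on the next turn."""
    empty = [i for i, mark in enumerate(board) if mark == " "]
    safe = []
    for move in empty:
        after = place(board, move, me)
        if wins(after, me):
            safe.append(move)  # winning outright is always fine
            continue
        replies = [cell for cell in empty if cell != move]
        if not any(wins(place(after, r, opponent), opponent) for r in replies):
            safe.append(move)
    return safe


# Assumed position: X in cells 0 and 8, O in cells 1 and 4, X to move.
only_block = safe_moves("XO  O   X")  # cell 7 is the bottom center space
```

You would stock each coconut with pebbles only for the moves this function returns. A position where every move loses would leave the coconut empty, which is a case you would need to decide how to handle.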







Hack 53. Do Card Tricks Through the Mail

A shuffled deck of cards is meant to be random. Scientific analyses show that it actually isn't random, and you can capitalize on known probabilities of card distributions to perform an amazing card trick for people you have never met.

Imagine you receive a thick, mysterious envelope in the mail. Rather than having it disposed of by the nearest domestic security officers, you open it and find an ordinary deck of cards and the following set of instructions:

1. Cut the deck.

2. Shuffle the cards once, using a riffle shuffle (defined later in this hack).

3. Cut the deck again.

4. Shuffle the cards one more time using a riffle shuffle.

5. Cut the deck again.

6. Remove the top card of the deck, write it down, and place it anywhere in the deck.

7. Cut the deck again.

8. Shuffle again.

9. Cut one more time.

10. Mail this deck back to the enclosed address (a post office box in Tonganoxie, Kansas, or some other place with a name that conjures up wonder and whimsy).

You follow all these instructions (while wearing protective rubber gloves) and return the deck. About a week later, a smaller envelope arrives. In it is your chosen card! (There also might be a request for $300 and an offer to predict your future, but you just throw the offer away.)

Amazing, yes? Impossible, you say? Thanks to the known likely distribution of shuffled cards, it is more than possible, and even a budding statistician like you can do it. No enrollment in Hogwarts necessary.



How It Works

Quite a bit is known, mathematically, about the effects of various types of shuffles on a deck of cards. Though a thorough shuffle (such as a dovetail or riffle shuffle, which interlaces two halves of the deck) is meant to really scramble up a deck from whatever order the cards were in to some new order that's quite different from the original, parts of the original sequence of cards remain even after several cuts and shuffles.
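One way to watch the original order survive is to count "rising sequences": runs of cards that still appear in their original relative order. A riffle shuffle interlaces two packets without reordering either one, so it can at most double that count. The sketch below is an illustration under assumptions. It models the riffle with a simple random interleave (a binomial cut point, then cards dropped from each packet with probability proportional to its size), it ignores the cuts, and the names `riffle` and `rising_sequences` are made up here.

```python
import random
from typing import List


def riffle(deck: List[int], rng: random.Random) -> List[int]:
    """Randomly interlace two packets, keeping each packet's internal order."""
    cut = sum(rng.random() < 0.5 for _ in deck)  # roughly half the deck
    left, right = deck[:cut], deck[cut:]
    shuffled = []
    while left or right:
        # Drop the next card from a packet with chance proportional to its size.
        if rng.random() * (len(left) + len(right)) < len(left):
            shuffled.append(left.pop(0))
        else:
            shuffled.append(right.pop(0))
    return shuffled


def rising_sequences(deck: List[int]) -> int:
    """Count the rising sequences in a deck whose cards are 0..n-1."""
    position = {card: i for i, card in enumerate(deck)}
    breaks = sum(position[c + 1] < position[c] for c in range(len(deck) - 1))
    return breaks + 1


rng = random.Random()
deck = list(range(52))          # card k starts at position k
for _ in range(3):
    deck = riffle(deck, rng)
count = rising_sequences(deck)  # never more than 8 after three riffles
```

A fully mixed 52-card deck typically has far more than eight rising sequences, which is why three riffles leave so much structure behind.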

Statisticians have analyzed these patterns and published them in scientific journals. The work is similar to that which resulted in the groundbreaking suggestion that one should shuffle a deck of cards exactly seven times to attain the best mix before dealing the next round of hands for poker, spades, or bridge.

Picture a deck of cards in some order. After one shuffle, if the shuffle is perfect, the original order would still be visible within the now supposedly mixed distribution of cards. In fact, there would be two original sequences now overlapping each other, and by taking the alternate cards, you could reconstruct the original overall order.
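The perfect-shuffle case is simple enough to check directly. This sketch assumes an even number of cards, a cut exactly in half, and strict alternation starting with the top half; the names `perfect_riffle` and `unshuffle` are hypothetical.

```python
from typing import List


def perfect_riffle(deck: List[int]) -> List[int]:
    """Cut exactly in half and alternate cards, top half first."""
    if len(deck) % 2:
        raise ValueError("this sketch assumes an even number of cards")
    half = len(deck) // 2
    top, bottom = deck[:half], deck[half:]
    shuffled = []
    for a, b in zip(top, bottom):
        shuffled += [a, b]
    return shuffled


def unshuffle(shuffled: List[int]) -> List[int]:
    """Take the alternate cards to rebuild the two original sequences."""
    return shuffled[0::2] + shuffled[1::2]


deck = list(range(52))
mixed = perfect_riffle(deck)   # starts 0, 26, 1, 27, ...
restored = unshuffle(mixed)    # identical to the original deck
```

Real shuffles are never this tidy, but the same two interleaved sequences are still there; they are just interlaced unevenly.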


