Tải bản đầy đủ - 0 (trang)
Hack 70. Break Codes with Etaoin Shrdlu

Hack 70. Break Codes with Etaoin Shrdlu

Tải bản đầy đủ - 0trang

samelettersubstitutesforthesameletterthroughoutthe

message.Forexample,asimpleciphermightusethe

substitutionpatternshowninTable6-18,inwhichtheletterson

thetoprow(theplaintext)arereplacedbythelettersonthe

bottomrow(theciphertext).



Plaintext

Ciphertext



TableAsinglesubstitutioncipher

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

N A O B P C Q D R E S F T G U H V I W J X K Y L Z M



WithacodeliketheoneshowninTable6-18,thefollowing

plain-textpassage:

Tomappearedonthesidewalkwithabucketofwhitewash

andalong-handledbrush.

appearsinciphertextlikethis:

Jutnhhpnipbugjdpwrbpynfsyrjdnaxospjucydrjpyhwd

ngbufugq-dngbfpbaixwd.

Thepassagelookslikenonsense,butwiththekeyshownin

Table6-18,anyonecouldeasilyreplacethenonsenseletters

withtheoriginalletters,causingtheopeningsentenceofthe

secondparagraphinChapterTwoofTomSawyertoreveal

itself.



UsingProbabilitytoDecodeSubstitution

Ciphers

Ofcourse,therealtaskwhendecipheringciphersistodoit

withoutaccesstothecodekey.Real-lifecodebreakersand

winningcontestantsonWheelofFortuneusethesametoolto

solvetheirproblems:theyapplytheknowndistributionof



lettersinEnglishlanguagewords.

Theadventofcomputers,computeranalysis,andelectronic

copiesofmillionsofbookshasmadethecalculationofexact

probabilitiesforeachletterofthealphabetpossible,though

cryptographers(codemakersandbreakers)haveknownthe

basicsforsometime.Herearesomeofthesebasics:

Themostcommonletter,intermsofusageinEnglish,isE.

TheleastcommonlyusedletterisZ.

ThemostcommonconsonantisT.

JandXarerarelyused,asisQ.

WhenQisused,itisalmostalwaysfollowedbyU.

OnlyAandIareusedasone-letterwordsinEnglish.

Withevenjustthesebasicprobabilityfacts,youcouldbeginto

tackledecodingaciphersuchasourMarkTwainpassage.The

mostcommonlyappearinglettersinthegarbledversionareP

andN.BecauseNisusedasasingle-letterword,itcannotbeE

(NismostlikelyA),soagoodfirstguessforPisthatit

substitutesforE.

Withjustalittleknowledgeofletterdistribution,wehave

alreadyidentifiedthesubstitutesforEandA.Wecan'tbesure

weareright,butlikeanygoodstatistician,wethinkweare

probablyright.Table6-19showsthelikelydistributionforeach

letterofthealphabet.

TableFrequencydistributionoflettersinEnglish



Letter

A



Frequency

8.04percent



B

C

D

E

F

G

H

I

J

K

L

M

N

O

P

Q

R

S

T

U

V

W

X

Y

Z



1.54percent

3.06percent

3.99percent

12.51percent

2.30percent

1.96percent

5.49percent

7.26percent

0.16percent

0.67percent

4.14percent

2.53percent

7.09percent

7.60percent

2.00percent

0.11percent

6.12percent

6.54percent

9.25percent

2.71percent

0.99percent

1.92percent

0.19percent

1.73percent

0.09percent



ETAOINSHRDLU

Thestrangephrase"ETAOINSHRDLU"isamnemonicdevice

(memorytool)forrememberingthemostfrequentlyoccurring

letters.These12lettersaccountforover80percentoftotal

letterfrequency.

YoumightnoticethattheorderoflettersinETAOINSHRDLUis

notexactlytherankorderofpopularityshowninTable6-19.It



iscloseenough,though,andeasiertopronouncethanifitwere

exactlycorrect.Anotherthingtorememberisthatany

"definitive"listofletterprobabilitydependsonthesource

materialforthelettercount.Youcanfindmanydifferentlistsof

letterorderandfrequency,andsomedifferslightlyfromothers.

Forexample,oneorganizationthatproducedalistofstatistical

distributionsoflettersinEnglishtextreliedonacomputer

analysisandactualcountofletteroccurrenceinsevenliterary

classics,suchasJaneEyreandWutheringHeights.Twoofthese

sevenbookswereTarzannovels.I'mguessingthatifwewere

tocomparethattableofletterdistributionswithothers,we

wouldfindthattheproportionalnumberoftimestheletterZ

appearedwasgreaterthanifothersourceswereused.Forthe

commonletters,thoughsuchasE,T,andAthereiswide

agreementontheiruseasbestfirstguessesforcodebreaking.



WheelofFortuneStrategy

OntheTVgameshowWheelofFortune,beforesolvingthebigpuzzleatthe

end,theproducersareniceenoughtoprovidecertainlettersandshowwhether

theyappearinthehangman-typephrase.TheyprovideR,S,T,L,N,andE.

Thesearegiven,ofcourse,becausetheyarecommonletters,andareinourtop

12:ETAOINSHRDLU.Theplayerisallowedtochoosethreemoreconsonantsand

anothervowel.Usingourstatisticalknowledgeofletterfrequency,agoodbasic

strategywouldbetopickAasthevowelandthethreemostcommon

consonantsnotyetshown:H,D,andC.



StatisticalAnalysisofCodedTexts

Here'showyoumightusetheseletterstatsinreallifeto

decodeasecretmessageorsolveapuzzle.Thismethodworks

bestifthecodedtextislengthy,butitworkssurprisinglywell

evenforshorterpassages.Calculatethedistributionofthe

coded,substituteletters(theciphertext),andthencompareit

tothedistributionshowninTable6-19.

Figure6-8showshowthisprocessmightlookgraphically.Only

thefirst10mostcommonlettersareshown,buttheanalysis

wouldusealltheletters.Thisexamplepretendsthatthereisa

lotofcodedtextandthatthesubstituteciphershowninTable

6-18isbeingused.



Figure6-8.Englishletterfrequency(left)and

codedletterfrequency(right)



BecausethemostcommonsubstitutelettersareP,followedby

J,agoodguessforbreakingthecodewouldbetoseewhether

PcouldreallybeEandJcouldreallybeT.Thesefirstguesses

canbemadeallthewaydownthelineforeachletter.By

startingwiththemostfrequentlyappearinglettersandmoving

downthelist,acodebreakercanquicklyseewhetherthese

firsthypothesesarerightorwrongandchangeguessesaround

untilEnglishwordsstarttoappear.



OtherCommonLetterPatterns

Beyondjustknowingthefrequencyofindividualletters

appearing,goodcodebreakersuseprobabilityinformation

aboutotherpatternsofletters:

WordsaremostlikelytostartwithT,O,A,W,orB.

MostwordsendwithanE,T,D,orS.

Iftwolettersaredoubledinaword,theyaremostlikelyto

beSS,EE,TT,FF,orLL.

Frequentlyappearingtwo-letterwordsincludeof,to,in,it,



andis.

Byfar,themostcommonthree-letterwordsaretheand

and.Othercommonthree-letterwordsincludefor,are,and

but.

LettersthattendtocomeinpairsincludeTH,HE,AN,IN,

andER.

Themostfrequentlyusedwordsarethe,of,and,to,in,a,

is,that,be,andit.

Perhapsindicatingwhatpeopletendtowriteabout,thetop

100most-usedwordsinwrittentextsincludedollars,great,

general,andpublic.Debtsjustbarelyfailedtomakethetop

100,butitissurprisinglycommon.



SeeAlso

Agoodexplanationofsinglesubstitutioncipherscanbe

foundundertheentryforfrequencyanalysisat

http://en.wikipedia.org/wiki/Frequency_analysis.

Someofthestatisticsreportedinthishackwerefoundat

http://www.data-compression.comand

http://www.scottbryce.com.Goodinformationandadvice

forsolvingcryptogramsandothercodesusingstatisticscan

befoundatthosesites.







Hack71.DiscoveraNewSpecies



Whileeverydayentirespeciesofcreaturesbecome

extinct,occasionallynewspeciesareidentifiedthatwere

previouslyunknown.Surprisingly,statisticaltools,not

biologicaltools,candothetrick.

Afewyearsback,anewspecies,atypeofpossum,was

identified.Thenewspecieswasnamedtrichosurus

cunninghamii.Trichosurusmeans,um...possum(Iguess),and

thecunninghamiipartreferstoitsdiscoverer,Ross

Cunningham,astatisticianatAustralianNationalUniversity.If

you'dliketohaveaspeciesnamedforyou,here'showstatistics

canhelp.



IdentifyingSpecieswithStatistics

Thereisafamilyofstatisticalanalysesthatlooksatabunchof

variablesandfindsnaturallyoccurringgroupingsamongthem.

Typically,thegroupingsorclustersofvariablesareidentifiedon

thebasisofthecorrelationsamongthem[Hack#11].

Oneprocedurethatusesthisstrategyattemptstofind

underlyingdimensionsorinvisible,giantbasicvariablesthat

accountforabunchoflessimportantvariables.Thisprocedure

isfactoranalysis,andelsewhereweseehowitcan,among

otherthings,beusedtoidentifywriters'styles[Hack#65].

Statisticsisfullofsimilartechniquesthatcanidentify

dimensions,underlyingcauses,andgroupings.Thegoalof

identifyinggroupingsisofgreatestusetobiologicallyinclined

statisticianswhowishtoidentifynewspecies.



Forsomegroupofanimalstotechnicallybeaseparatespecies,

itmustshareauniquesetofbiologicalcharacteristicsthat

makeitdistinctfromsimilaranimals.Sure,animalswithinthe

samefamilyalllookalittledifferentfromeachother,butthen,

peoplelookalotdifferentfromeachotherandweareallone

species(myUncleFrankbeingperhapstheexceptionthat

provestherule).

Ifagroupofanimals,suchasDr.Cunningham'spossums,have

moreincommonwitheachotherthantheydowiththeother

creaturesintheirspecies,theymightbecandidatesfor

considerationasaspeciesintheirownright.Statisticscan

determinethat"morelikeeachotherandmoredifferentfrom

therestofthespeciesthanchancealonewouldproduce"point.

UsingCunningham'sdiscoveryasamodel,thereareafew

stepstofollowforyoutomakeyourowndiscovery.



Collectsomedata

ThispossumexistedinAustralianearpeopleformorethan200

yearsandnoonenoticed.Tobefair,itlookedanawfullotlike

theotherpossums,themostcommonofwhichwasthe

trichosuruscaninus,nowcalledtheshort-earedpossum.

Itwasassumedforsometimethattherewasreallyjustthis

onespeciesofthelittleguys.PartofDr.Cunningham'sjobwas

tocollectandorganizedescriptivedataforthewildlifearound

him.Consequently,hehadatonofveryspecificquantitative

descriptionsofvariouspossumpartseyes,ears,nose,and

throatandmeasurementsofotherphysicalcharacteristics.



Chooseastatisticalmethod

Cunningham'schoicewasatechniquesimilartofactoranalysis



Tài liệu bạn tìm kiếm đã sẵn sàng tải về

Hack 70. Break Codes with Etaoin Shrdlu

Tải bản đầy đủ ngay(0 tr)

×