Chapter 9. Getting Control: Regular Expression Metacharacters


Explanation

This regular expression contains metacharacters. (See Table 9.1.) The first one is a caret (^). The caret metacharacter matches a string only if the string is at the beginning of the line. The period (.) matches any single character, including whitespace, with the exception of the newline. This expression contains three periods, representing any three characters. To find a literal period, or any other character that does not represent itself, precede the character with a backslash to prevent interpretation.

Table 9.1. Metacharacters

Metacharacter    What It Matches

Character Class: Single Characters and Digits
.                Matches any character except a newline
[a-z0-9]         Matches any single character in set
[^a-z0-9]        Matches any single character not in set
\d               Matches one digit
\D               Matches a nondigit, same as [^0-9]
\w               Matches an alphanumeric (word) character
\W               Matches a nonalphanumeric (nonword) character

Character Class: Whitespace Characters
\s               Matches a whitespace character, such as spaces, tabs, and newlines
\S               Matches a nonwhitespace character
\n               Matches a newline
\r               Matches a return
\t               Matches a tab
\f               Matches a formfeed
\b               Matches a backspace (only inside [])
\0               Matches a null character

Character Class: Anchored Characters
\b               Matches a word boundary (when not inside [])
\B               Matches a nonword boundary
^                Matches at the beginning of the line
$                Matches at the end of the line
\A               Matches the beginning of the string only
\Z               Matches the end of the string, or just before a newline at the end of the string
\z               Matches the end of the string only
\G               Matches where the previous m//g left off

Character Class: Repeated Characters
x?               Matches 0 or 1 x
x*               Matches 0 or more occurrences of x
x+               Matches 1 or more occurrences of x
(xyz)+           Matches 1 or more patterns of xyz
x{m,n}           Matches at least m occurrences of x and no more than n occurrences of x

Character Class: Alternative Characters
was|were|will    Matches one of was, were, or will

Character Class: Remembered Characters
(string)         Used for backreferencing (see Examples 9.38 and 9.39)
\1 or $1         Matches first set of parentheses [a]
\2 or $2         Matches second set of parentheses
\3 or $3         Matches third set of parentheses

Character Class: Miscellaneous Characters
\12              Matches that octal value, up to \377
\x81             Matches that hex value
\cX              Matches that control character; e.g., \cC is Ctrl-C and \cV is Ctrl-V
\e               Matches the ASCII ESC character, not backslash
\E               Marks the end of case changing with \U or \L, or of quoting with \Q
\l               Lowercase the next character only
\L               Lowercase characters until the end of the string or until \E
\N               Matches that named character; e.g., \N{greek:Beta}
\p{PROPERTY}     Matches any character with the named property; e.g., \p{IsAlpha}
\P{PROPERTY}     Matches any character without the named property
\Q               Quote metacharacters until \E
\u               Titlecase the next character only
\U               Uppercase until \E
\x{NUMBER}       Matches Unicode NUMBER given in hexadecimal
\X               Matches a Unicode "combining character sequence" string
\[               Matches that metacharacter
\\               Matches a backslash

[a] \1 and $1 are called backreferences. They differ in that the \1 backreference is valid within a pattern, whereas the $1 notation is valid within the enclosing block or until another successful search.
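To make the footnote's distinction concrete, here is a small sketch; the sample strings are made up for illustration. \1 appears inside the pattern to demand a repeat of whatever the first parentheses just captured, while $1 is read after the match has succeeded.

```perl
use strict;
use warnings;

# \1 is used *inside* the pattern: it must match the same text that
# the first set of parentheses captured (here, a doubled word).
my $text = "It was the the best of times";
my $doubled = $text =~ /\b(\w+) \1\b/ ? $1 : '';
print "Doubled word: $doubled\n";

# $1 and $2 are used *after* a successful match, outside the pattern,
# and keep their values until the next successful match.
my $date = "2024-06-15";
my ($year, $month);
if ($date =~ /^(\d{4})-(\d{2})/) {
    ($year, $month) = ($1, $2);
}
print "Year: $year, Month: $month\n";
```

Running it should report the doubled word the, followed by Year: 2024, Month: 06.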



In Example 9.1, the regular expression reads: Search at the beginning of the line for an a, followed by any three single characters, followed by a c. It will match, for example, abbbc, a123c, a   c, or aAx3c, but only if those patterns are found at the beginning of the line.
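Example 9.1 itself is not reproduced in this excerpt; assuming its pattern is /^a...c/ as described above, the following sketch runs it against a few made-up strings and then shows the backslash turning each period into a literal dot.

```perl
use strict;
use warnings;

# Assumed pattern for Example 9.1: an "a" at the start of the line,
# then any three characters (except newline), then a "c".
my @samples = ("abbbc", "a123c", "a   c", "aAx3c", "xabbbc", "abc");
my @matched = grep { /^a...c/ } @samples;
print "$_\n" for @matched;

# A backslash makes each period literal: this accepts "a...c" only.
my @literal = grep { /^a\.\.\.c/ } ("a...c", "abbbc");
print "Literal: @literal\n";
```

"xabbbc" fails because the a is not at the start of the line, and "abc" fails because only one character sits between the a and the c.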








9.1.1. Metacharacters for Single Characters

If you are searching for a particular character within a regular expression, you can use the dot metacharacter to represent a single character, or a character class that matches one character from a set of characters. In addition to the dot and the character class, Perl has added some backslashed symbols (called metasymbols) to represent single characters. (See Table 9.2.)

Table 9.2. Metacharacters for Single Characters

Metacharacter    What It Matches
.                Matches any character except a newline
[a-z0-9_]        Matches any single character in set
[^a-z0-9_]       Matches any single character not in set
\d               Matches a single digit
\D               Matches a single nondigit; same as [^0-9]
\w               Matches a single alphanumeric (word) character; same as [a-zA-Z0-9_]
\W               Matches a single nonalphanumeric (nonword) character; same as [^a-zA-Z0-9_]
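The equivalences in Table 9.2 can be checked directly. This sketch, using an invented sample line, collects every character each metasymbol matches with the g modifier and compares the result with the bracketed class. It assumes plain ASCII input; on Unicode strings, \d and \w can match additional characters unless the /a modifier is used.

```perl
use strict;
use warnings;

my $line = "Order #42: 3 widgets_blue";

# In list context, m//g returns every single character the pattern matched.
my @digits      = $line =~ /\d/g;
my @digit_class = $line =~ /[0-9]/g;
my @words       = $line =~ /\w/g;
my @word_class  = $line =~ /[a-zA-Z0-9_]/g;
my @nonwords    = $line =~ /\W/g;

print "Digits: @digits\n";
print "\\d same as [0-9]: ", ("@digits" eq "@digit_class" ? "yes" : "no"), "\n";
print "\\w same as [a-zA-Z0-9_]: ", ("@words" eq "@word_class" ? "yes" : "no"), "\n";
print "Nonword characters: ", scalar(@nonwords), "\n";
```

The five nonword characters are the three spaces, the # sign, and the colon.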



The Dot Metacharacter

The dot (.) metacharacter matches any single character with the exception of the newline character. For example, the regular expression /a.b/ is matched if the string contains an a, followed by any one single character (except the \n), followed by b, whereas the expression /.../ matches any string containing at least three consecutive characters, none of which is a newline.
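A short sketch of the two claims in this paragraph, using made-up strings: /a.b/ accepts any character between a and b except a newline, and /.../ needs three non-newline characters in a row.

```perl
use strict;
use warnings;

my @strings = ("axb", "a b", "a\nb", "ab");

# Keep the strings in which /a.b/ finds a match.
my @hits = grep { /a.b/ } @strings;

for my $s (@strings) {
    (my $shown = $s) =~ s/\n/\\n/;    # show the newline as \n when printing
    printf "%-5s %s\n", $shown, ($s =~ /a.b/ ? "matches /a.b/" : "no match");
}

# /.../ needs three characters in a row, none of them a newline.
# grep in scalar context returns the number of matching elements.
my $count = grep { /.../ } ("abc", "ab", "a\nbc");
print "Strings with three non-newline characters in a row: $count\n";
```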

Example 9.2.

(The Script)
# The dot metacharacter
1  while(<DATA>){
2      print "Found Norma!\n" if /N..ma/;
   }
__DATA__
Steve Blenheim 101
Betty Boop 201
Igor Chevsky 301
Norma Cord 401
Jonathan DeLoach 501
Karen Evich 601

(Output)
Found Norma!



Explanation

1. The special DATA filehandle gets its input from the text after the __DATA__ token. The while loop is entered, and the first line following the __DATA__ token is read in and assigned to $_. Each time the loop is entered, the next line below __DATA__ is assigned to $_ until all the lines have been processed.

2. The string Found Norma!\n is printed only if the pattern found in $_ contains an uppercase N, followed by any two single characters, followed by an m and an a. It would find Norma, No man, Normandy, etc.
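To see which strings /N..ma/ accepts, the sketch below tries the pattern on a few candidates (the list is illustrative, not from the book). Exactly two characters must sit between the N and the ma, and the match is case sensitive.

```perl
use strict;
use warnings;

# Candidate words for the pattern /N..ma/ from Example 9.2.
my @words = ("Norma", "No man", "Normandy", "Noman", "norma", "Nma");

for my $w (@words) {
    printf "%-9s %s\n", $w, ($w =~ /N..ma/ ? "yes" : "no");
}

# Collect the accepted words for later inspection.
my @found = grep { /N..ma/ } @words;
```

"Noman" is rejected because the two characters after the N are o and m, so the next two are an rather than ma; "norma" is rejected because of the lowercase n.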



The s Modifier: The Dot Metacharacter and the Newline

Normally, the dot metacharacter does not match the newline character, \n, because it matches only the characters within a string up until the newline is reached. The s modifier treats a string with embedded newlines as a single line, rather than a group of multiple lines, and allows the dot metacharacter to treat the newline character the same as any other character it might match. The s modifier can be used with both the m (match) and the s (substitution) operators.

Example 9.3.

(The Script)
# The s modifier and the newline
1  $_="Sing a song of sixpence\nA pocket full of rye.\n";
2  print $& if /pence./s;
3  print $& if /rye\../s;
4  print if s/sixpence.A/twopence, a/s;

(Output)
2  pence
3  rye.
4  Sing a song of twopence, a pocket full of rye.



Explanation

1. The $_ scalar is assigned; it contains two newlines.

2. The regular expression /pence./ contains a dot metacharacter. The dot metacharacter does not match a newline character unless the s modifier is used. The $& special scalar holds the value the pattern found in the last successful search; i.e., pence\n.

3. The regular expression /rye\../ contains a literal period (the backslash makes the period literal), followed by the dot metacharacter, which will match the newline, thanks to the s modifier. The $& special scalar holds the value the pattern found in the last successful search; i.e., rye.\n.

4. The s modifier allows the dot to match the newline character, \n, found in the search string. The matched text, sixpence\nA, is replaced with twopence, a, so the newline disappears and the two lines are joined into one.
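The sketch below isolates the effect of the s modifier on the string from Example 9.3: the same pattern fails without /s and succeeds with it. The substitution works on a copy so that the original string is left untouched, a choice made here for clarity rather than taken from the example.

```perl
use strict;
use warnings;

my $text = "Sing a song of sixpence\nA pocket full of rye.\n";

# Without /s the dot refuses to match the embedded newline.
my $plain  = $text =~ /sixpence.A/  ? "matched" : "no match";
# With /s the dot matches the newline like any other character.
my $single = $text =~ /sixpence.A/s ? "matched" : "no match";
print "Without /s: $plain\nWith /s:    $single\n";

# The same modifier works on a substitution; copy first so $text is kept.
(my $joined = $text) =~ s/sixpence.A/twopence, a/s;
print $joined;
```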



The Character Class

A character class represents one character from a set of characters. For example, [abc] matches an a, b, or c; [a-z] matches one character from the range a through z; and [0-9] matches one character in the range of digits between 0 and 9. If the character class contains a leading caret (^), then the class represents any one character not in the set; for example, [^a-zA-Z] matches a single character not in the range from a to z or A to Z, and [^0-9] matches a single character not in the range between 0 and 9.[1] To represent a number between 10 and 13, use 1[0-3], not [10-13].

[1] Don't confuse the caret inside square brackets with the caret used as a beginning-of-line anchor. See Table 9.7 on page 258.
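The range and negation rules, and the [10-13] pitfall, can be demonstrated with a few made-up values. The patterns are anchored with ^ and $ so that each test has to account for the whole string.

```perl
use strict;
use warnings;

my @numbers = (9, 10, 11, 12, 13, 14, 3);

# 1[0-3]: the digit 1 followed by one digit from 0 to 3.
my @good = grep { /^1[0-3]$/ } @numbers;

# [10-13] is ONE character class holding '1', the range '0-1', and '3';
# it matches a single character, so only one-digit strings can pass here.
my @bad = grep { /^[10-13]$/ } @numbers;

# [^0-9]: any single character that is not a digit.
my @nondigits = "Room 12B" =~ /[^0-9]/g;

print "1[0-3]  : @good\n";
print "[10-13] : @bad\n";
print "Non-digits in 'Room 12B': ", join("", @nondigits), "\n";
```

The [10-13] version accepts only the value 3, which shows why the class cannot express a two-digit range.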



Perl provides additional symbols, called metasymbols, to represent a character class. The symbols \d and \D represent a single digit and a single nondigit, respectively; they are the same as [0-9] and [^0-9]. Similarly, \w and \W represent a single word character and a single nonword character, respectively; they are the same as [A-Za-z_0-9] and [^A-Za-z_0-9].

Example 9.4.

(From a Script)
1  while(<DATA>){
2      print if /[A-Z][a-z]eve/;
   }
__DATA__
Steve Blenheim 101
Betty Boop 201
Igor Chevsky 301
Norma Cord 401
Jonathan DeLoach 501
Karen Evich 601

(Output)
Steve Blenheim 101



Explanation

1. The special DATA filehandle gets its input from the text after the __DATA__ token. The while loop is entered, and the first line following the __DATA__ token is read in and assigned to $_. Each time the loop is entered, the next line after __DATA__ is assigned to $_ until all the lines have been processed.

2. The line $_ is printed only if $_ contains a pattern matching one uppercase letter [A-Z], followed by one lowercase letter [a-z], followed by eve.
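A quick way to probe the pattern from Example 9.4 is to try it on a handful of invented names; only strings containing an uppercase letter, a lowercase letter, and then eve in sequence are printed.

```perl
use strict;
use warnings;

# Hypothetical candidates for /[A-Z][a-z]eve/ from Example 9.4.
my @names = ("Steve Blenheim", "Steven", "steve", "STEVE", "Eve");
my @hits  = grep { /[A-Z][a-z]eve/ } @names;
print "$_\n" for @hits;
```

"steve" lacks the uppercase letter, "STEVE" lacks the lowercase letters, and "Eve" has no room for two characters before eve.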

Example 9.5.

(The Script)
# The bracketed character class
1  while(<DATA>){
2      print if /[A-Za-z0-9_]/;
   }
__DATA__
Steve Blenheim 101
Betty Boop 201
Igor Chevsky 301
Norma Cord 401
Jonathan DeLoach 501
Karen Evich 601

(Output)
Steve Blenheim 101
Betty Boop 201
Igor Chevsky 301
Norma Cord 401
Jonathan DeLoach 501
Karen Evich 601
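Example 9.5 prints every line because a character class matches just one character, and every data line contains at least one letter, digit, or underscore. The sketch below, using hypothetical sample lines, contrasts that unanchored test with an anchored one that requires the entire line to consist of word characters.

```perl
use strict;
use warnings;

my @lines = ("Steve Blenheim 101", "Norma_Cord401", "***", "");

# An unanchored class needs just ONE matching character somewhere.
my @any  = grep { /[A-Za-z0-9_]/ } @lines;
# Anchored at both ends with +, every character must be in the class.
my @only = grep { /^[A-Za-z0-9_]+$/ } @lines;

print "Contains a word character: ", scalar(@any), " line(s)\n";
print "Only word characters: @only\n";
```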


