Appendix A. The Evolution of the awk Language

This book describes the GNU implementation of awk, which follows the POSIX
specification. Many longtime awk users learned awk programming with the original
awk implementation in Version 7 Unix. (This implementation was the basis for awk
in Berkeley Unix, through 4.3-Reno. Subsequent versions of Berkeley Unix, and, for
a while, some systems derived from 4.4BSD-Lite, used various versions of gawk for
their awk.) This chapter briefly describes the evolution of the awk language, with
cross-references to other parts of the book where you can find more information.

To save space, we have omitted information on the history of features in gawk from
this edition. You can find it in the online documentation.



Major Changes Between V7 and SVR3.1

The awk language evolved considerably between the release of Version 7 Unix (1978)
and the new version that was first made generally available in System V Release 3.1
(1987). This section summarizes the changes, with cross-references to further
details:

The requirement for ‘;’ to separate rules on a line (see awk Statements Versus Lines)
User-defined functions and the return statement (see User-Defined Functions)
The delete statement (see The delete Statement)
The do-while statement (see The do-while Statement)
The built-in functions atan2(), cos(), sin(), rand(), and srand() (see Numeric Functions)
The built-in functions gsub(), sub(), and match() (see String-Manipulation Functions)
The built-in functions close() and system() (see Input/Output Functions)
The ARGC, ARGV, FNR, RLENGTH, RSTART, and SUBSEP predefined variables (see Predefined Variables)
Assignable $0 (see Changing the Contents of a Field)
The conditional expression using the ternary operator ‘?:’ (see Conditional Expressions)
The expression ‘indx in array’ outside of for statements (see Referring to an Array Element)
The exponentiation operator ‘^’ (see Arithmetic Operators) and its assignment operator form ‘^=’ (see Assignment Expressions)
C-compatible operator precedence, which breaks some old awk programs (see Operator Precedence (How Operators Nest))
Regexps as the value of FS (see Specifying How Fields Are Separated) and as the third argument to the split() function (see String-Manipulation Functions), rather than using only the first character of FS
Dynamic regexps as operands of the ‘~’ and ‘!~’ operators (see Using Dynamic Regexps)
The escape sequences ‘\b’, ‘\f’, and ‘\r’ (see Escape Sequences)
Redirection of input for the getline function (see Explicit Input with getline)
Multiple BEGIN and END rules (see The BEGIN and END Special Patterns)
Multidimensional arrays (see Multidimensional Arrays)
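
Several of the features listed above can be seen working together in one short program. The following is only an illustrative sketch: it assumes that the `awk` on your PATH is a POSIX-conforming implementation, and the shell variable name `new_awk_demo`, the function `square()`, and the array `seen` are made up for this example.

```shell
# Exercise several features that arrived with the "new awk" of SVR3.1.
new_awk_demo=$(printf 'alpha beta gamma\n' | LC_ALL=C awk '
# A user-defined function with a return statement.
function square(x) { return x ^ 2 }     # "^" is the exponentiation operator

BEGIN { print "first BEGIN rule" }
BEGIN { print "second BEGIN rule" }      # multiple BEGIN rules run in order

{
    # match() sets RSTART and RLENGTH as a side effect.
    if (match($0, /b[a-z]+/))
        print "match:", RSTART, RLENGTH

    # gsub() edits $0 in place and returns the number of replacements.
    n = gsub(/a/, "A")
    print n, $0

    # The "in" operator outside a for statement, the ternary operator,
    # and the delete statement.
    seen["beta"] = 1
    print (("beta" in seen) ? "present" : "absent")
    delete seen["beta"]
    print (("beta" in seen) ? "present" : "absent")

    # A do-while loop always runs its body at least once.
    i = 3
    do { total += square(i); i-- } while (i > 0)
    print "sum of squares:", total
}')
printf '%s\n' "$new_awk_demo"
```

Under those assumptions the program prints the two BEGIN messages, then `match: 7 4`, `5 AlphA betA gAmmA`, `present`, `absent`, and `sum of squares: 14`.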



Changes Between SVR3.1 and SVR4

The System V Release 4 (1989) version of Unix awk added these features (some of
which originated in gawk):

The ENVIRON array (see Predefined Variables)
Multiple -f options on the command line (see Command-Line Options)
The -v option for assigning variables before program execution begins (see Command-Line Options)
The -- signal for terminating command-line options
The ‘\a’, ‘\v’, and ‘\x’ escape sequences (see Escape Sequences)
A defined return value for the srand() built-in function (see Numeric Functions)
The toupper() and tolower() built-in string functions for case translation (see String-Manipulation Functions)
A cleaner specification for the ‘%c’ format-control letter in the printf function (see Format-Control Letters)
The ability to dynamically pass the field width and precision ("%*.*d") in the argument list of printf and sprintf() (see Format-Control Letters)
The use of regexp constants, such as /foo/, as expressions, where they are equivalent to using the matching operator, as in ‘$0 ~ /foo/’ (see Using Regular Expression Constants)
Processing of escape sequences inside command-line variable assignments (see Assigning variables on the command line)
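
A few of the SVR4 additions are easiest to appreciate side by side. The sketch below assumes a POSIX-conforming `awk`; the environment variable `GREETING`, the awk variables `width`, `prec`, and `matched`, and the shell variable `svr4_demo` are hypothetical names chosen for illustration.

```shell
# -v assignments, ENVIRON, toupper(), "%*.*d", srand()'s return value,
# and a regexp constant used as an ordinary expression.
svr4_demo=$(printf 'foo bar\nbaz\n' | GREETING=hello awk -v width=8 -v prec=3 '
BEGIN {
    # ENVIRON exposes the environment of the awk process.
    print toupper(ENVIRON["GREETING"])

    # Each "*" takes its value (width, then precision) from the argument list.
    printf "[%*.*d]\n", width, prec, 42

    # srand() returns the seed that was in effect before the call.
    srand(42)
    print srand(7)
}
{
    matched = /foo/          # shorthand for: matched = ($0 ~ /foo/)
    print $0, matched
}')
printf '%s\n' "$svr4_demo"
```

With these inputs the output is `HELLO`, `[     042]`, `42`, `foo bar 1`, and `baz 0`, one per line.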



Changes Between SVR4 and POSIX awk

The POSIX Command Language and Utilities standard for awk (1992) introduced the
following changes into the language:

The use of -W for implementation-specific options (see Command-Line Options)
The use of CONVFMT for controlling the conversion of numbers to strings (see Conversion of Strings and Numbers)
The concept of a numeric string and tighter comparison rules to go with it (see Variable Typing and Comparison Expressions)
The use of predefined variables as function parameter names is forbidden (see Function Definition Syntax)
More complete documentation of many of the previously undocumented features of the language
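
The CONVFMT and numeric-string rules are subtle enough that a small experiment helps. This is a sketch only, assuming a POSIX-conforming `awk`; the names `posix_demo`, `numeric`, and `stringy` are invented for the example.

```shell
# CONVFMT governs number-to-string conversion; OFMT governs print.
# Fields that look like numbers are "numeric strings" and compare as numbers.
posix_demo=$(echo '10 9' | LC_ALL=C awk '
BEGIN { CONVFMT = "%.2f" }
{
    x = 3.14159
    s = x ""                     # concatenation converts x using CONVFMT
    print s
    print x                      # print uses OFMT, which is still "%.6g"

    numeric = ($1 > $2)          # both fields are numeric strings: 10 > 9
    stringy = ($1 "" > $2 "")    # forced to strings: "10" sorts before "9"
    print numeric, stringy
}')
printf '%s\n' "$posix_demo"
```

The three output lines are `3.14`, `3.14159`, and `1 0`. The last line shows why the comparison rules matter: the same two fields compare differently depending on whether they are treated as numbers or as strings.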

In 2012, a number of extensions that had been commonly available for many years
were finally added to POSIX. They are:

The fflush() built-in function for flushing buffered output (see Input/Output Functions)
The nextfile statement (see The nextfile Statement)
The ability to delete all of an array at once with ‘delete array’ (see The delete Statement)
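
Two of these additions, ‘delete array’ and fflush(), can be illustrated with a minimal sketch. It assumes an `awk` that implements the 2012 additions (gawk, mawk, and Brian Kernighan's awk all do); the array `count` and the shell variable `posix2012_demo` are hypothetical names.

```shell
# Count distinct input lines, then clear the whole array in one statement.
posix2012_demo=$(printf 'a\nb\na\n' | awk '
{ count[$0]++ }
END {
    n = 0
    for (k in count) n++
    print "distinct keys:", n

    delete count                 # removes every element at once

    n = 0
    for (k in count) n++
    print "after delete:", n

    fflush()                     # flush buffered output before exiting
}')
printf '%s\n' "$posix2012_demo"
```

This prints `distinct keys: 2` followed by `after delete: 0`. Before ‘delete array’ was available, the usual idiom was a loop that deleted the elements one at a time.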

See Common Extensions Summary for a list of common extensions not permitted by the
POSIX standard.

The 2008 POSIX standard can be found online at
http://www.opengroup.org/onlinepubs/9699919799/.



Extensions in Brian Kernighan’s awk

Brian Kernighan has made his version available via his home page (see Other Freely
Available awk Implementations).

This section describes common extensions that originally appeared in his version of
awk:

The ‘**’ and ‘**=’ operators (see Arithmetic Operators and Assignment Expressions)
The use of func as an abbreviation for function (see Function Definition Syntax)
The fflush() built-in function for flushing buffered output (see Input/Output Functions)

See Common Extensions Summary for a full list of the extensions available in his awk.



Extensions in gawk Not in POSIX awk

The GNU implementation, gawk, adds a large number of features. They can all be
disabled with either the --traditional or --posix options (see Command-Line Options).

A number of features have come and gone over the years. This section summarizes the
additional features over POSIX awk that are in the current version of gawk.

Additional predefined variables:
    The ARGIND, BINMODE, ERRNO, FIELDWIDTHS, FPAT, IGNORECASE, LINT, PROCINFO, RT, and TEXTDOMAIN variables (see Predefined Variables)

Special files in I/O redirections:
    The /dev/stdin, /dev/stdout, /dev/stderr, and /dev/fd/N special filenames (see Special Filenames in gawk)
    The /inet, /inet4, and /inet6 special files for TCP/IP networking using ‘|&’ to specify which version of the IP protocol to use (see Using gawk for Network Programming)

Changes and/or additions to the language:
    The ‘\x’ escape sequence (see Escape Sequences)
    Full support for both POSIX and GNU regexps (see Chapter 3)
    The ability for FS and for the third argument to split() to be null strings (see Making Each Character a Separate Field)
    The ability for RS to be a regexp (see How Input Is Split into Records)
    The ability to use octal and hexadecimal constants in awk program source code (see Octal and hexadecimal numbers)
    The ‘|&’ operator for two-way I/O to a coprocess (see Two-Way Communications with Another Process)
    Indirect function calls (see Indirect Function Calls)
    Directories on the command line produce a warning and are skipped (see Directories on the Command Line)

New keywords:
    The BEGINFILE and ENDFILE special patterns (see The BEGINFILE and ENDFILE Special Patterns)
    The switch statement (see The switch Statement)

Changes to standard awk functions:
    The optional second argument to close() that allows closing one end of a two-way pipe to a coprocess (see Two-Way Communications with Another Process)
    POSIX compliance for gsub() and sub() with --posix
    The length() function accepts an array argument and returns the number of elements in the array (see String-Manipulation Functions)
    The optional third argument to the match() function for capturing text-matching subexpressions within a regexp (see String-Manipulation Functions)
    Positional specifiers in printf formats for making translations easier (see Rearranging printf Arguments)
    The split() function’s additional optional fourth argument, which is an array to hold the text of the field separators (see String-Manipulation Functions)

Additional functions only in gawk:
    The gensub(), patsplit(), and strtonum() functions for more powerful text manipulation (see String-Manipulation Functions)
    The asort() and asorti() functions for sorting arrays (see Controlling Array Traversal and Array Sorting)
    The mktime(), systime(), and strftime() functions for working with timestamps (see Time Functions)
    The and(), compl(), lshift(), or(), rshift(), and xor() functions for bit manipulation (see Bit-Manipulation Functions)
    The isarray() function to check if a variable is an array or not (see Getting Type Information)
    The bindtextdomain(), dcgettext(), and dcngettext() functions for internationalization (see Internationalizing awk Programs)

Changes and/or additions in the command-line options:
    The AWKPATH environment variable for specifying a path search for the -f command-line option (see Command-Line Options)
    The AWKLIBPATH environment variable for specifying a path search for the -l command-line option (see Command-Line Options)
    The -b, -c, -C, -d, -D, -e, -E, -g, -h, -i, -l, -L, -M, -n, -N, -o, -O, -p, -P, -r, -S, -t, and -V short options. Also, the ability to use GNU-style long-named options that start with --; and the --assign, --bignum, --characters-as-bytes, --copyright, --debug, --dump-variables, --exec, --field-separator, --file, --gen-pot, --help, --include, --lint, --lint-old, --load, --non-decimal-data, --optimize, --posix, --pretty-print, --profile, --re-interval, --sandbox, --source, --traditional, --use-lc-numeric, and --version long options (see Command-Line Options)
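
A handful of the gawk-only functions named above can be tried with a short sketch. It assumes that a `gawk` binary is installed (the guard below falls back to a message if it is not); the shell variable `gawk_demo` and the arrays `parts` and `words` are hypothetical names chosen for the example.

```shell
# These functions are gawk extensions; other awks will reject the program.
if command -v gawk >/dev/null 2>&1; then
    gawk_demo=$(echo 'key=0x1A' | gawk '
    {
        # Third argument to match(): parts[1] and parts[2] receive the
        # text matched by the parenthesized subexpressions.
        match($0, /([a-z]+)=(.+)/, parts)
        print parts[1], strtonum(parts[2])    # strtonum() understands hex

        # gensub() returns the new string and leaves $0 untouched.
        print gensub(/([a-z]+)=(.+)/, "\\2=\\1", 1)

        # length() applied to an array, and a bit-manipulation function.
        split("a b c", words)
        print length(words), and(12, 10)
    }')
else
    gawk_demo="gawk not installed"
fi
printf '%s\n' "$gawk_demo"
```

When gawk is available, the output is `key 26`, `0x1A=key`, and `3 8`. Running the same program with --traditional or --posix should fail, since those options disable the extensions.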

Support for the following obsolete systems was removed from the code and the
documentation for gawk version 4.0:

Amiga
Atari
BeOS
Cray
MIPS RiscOS
MS-DOS with the Microsoft Compiler
MS-Windows with the Microsoft Compiler
NeXT
SunOS 3.x, Sun 386 (Road Runner)
Tandem (non-POSIX)
Prestandard VAX C compiler for VAX/VMS
GCC for VAX and Alpha has not been tested for a while.

Support for the following obsolete system was removed from the code for gawk
version 4.1:

Ultrix



Common Extensions Summary

The following table summarizes the common extensions supported by gawk, Brian
Kernighan’s awk, and mawk, the three most widely used freely available versions of
awk (see Other Freely Available awk Implementations).

Feature                      BWK awk   mawk   gawk   Now standard

‘\x’ escape sequence
FS as null string
/dev/stdin special file
/dev/stdout special file
/dev/stderr special file
delete without subscript
fflush() function
length() of an array
nextfile statement
** and **= operators
func keyword
BINMODE variable
RS as regexp
Time-related functions

(The per-column support marks for these rows are missing from this copy of the table.)

Regexp Ranges and Locales: A Long Sad Story

This section describes the confusing history of ranges within regular expressions
and their interactions with locales, and how this affected different versions of
gawk.

The original Unix tools that worked with regular expressions defined character
ranges (such as ‘[a-z]’) to match any character between the first character in the
range and the last character in the range, inclusive. Ordering was based on the
numeric value of each character in the machine’s native character set. Thus, on
ASCII-based systems, ‘[a-z]’ matched all the lowercase letters, and only the
lowercase letters, as the numeric values for the letters from ‘a’ through ‘z’ were
contiguous. (On an EBCDIC system, the range ‘[a-z]’ includes additional
nonalphabetic characters as well.)

Almost all introductory Unix literature explained range expressions as working in
this fashion, and in particular, would teach that the “correct” way to match
lowercase letters was with ‘[a-z]’, and that ‘[A-Z]’ was the “correct” way to match
uppercase letters. And indeed, this was true.[104]

The 1992 POSIX standard introduced the idea of locales (see Where You Are Makes a
Difference). Because many locales include other letters besides the plain 26
letters of the English alphabet, the POSIX standard added character classes (see
Using Bracket Expressions) as a way to match different kinds of characters besides
the traditional ones in the ASCII character set.

However, the standard changed the interpretation of range expressions. In the "C"
and "POSIX" locales, a range expression like ‘[a-dx-z]’ is still equivalent to
‘[abcdxyz]’, as in ASCII. But outside those locales, the ordering was defined to be
based on collation order.

What does that mean? In many locales, ‘A’ and ‘a’ are both less than ‘B’. In other
words, these locales sort characters in dictionary order, and ‘[a-dx-z]’ is
typically not equivalent to ‘[abcdxyz]’; instead, it might be equivalent to
‘[ABCXYabcdxyz]’, for example.

This point needs to be emphasized: much literature teaches that you should use
‘[a-z]’ to match a lowercase character. But on systems with non-ASCII locales, this
also matches all of the uppercase characters except ‘A’ or ‘Z’! This was a
continuous cause of confusion, even well into the twenty-first century.

To demonstrate these issues, the following example uses the sub() function, which
does text replacement (see String-Manipulation Functions). Here, the intent is to
remove trailing uppercase characters:

    $ echo something1234abc | gawk-3.1.8 '{ sub("[A-Z]*$", ""); print }'
    something1234a



This output is unexpected, as the ‘bc’ at the end of ‘something1234abc’ should not
normally match ‘[A-Z]*’. This result is due to the locale setting (and thus you may
not see it on your system).

Similar considerations apply to other ranges. For example, ‘["-/]’ is perfectly
valid in ASCII, but is not valid in many Unicode locales, such as en_US.UTF-8.

Early versions of gawk used regexp matching code that was not locale-aware, so
ranges had their traditional interpretation.

When gawk switched to using locale-aware regexp matchers, the problems began;
especially as both GNU/Linux and commercial Unix vendors started implementing
non-ASCII locales, and making them the default. Perhaps the most frequently asked
question became something like, “Why does ‘[A-Z]’ match lowercase letters?!?”

This situation existed for close to 10 years, if not more, and the gawk maintainer
grew weary of trying to explain that gawk was being nicely standards-compliant, and
that the issue was in the user’s locale. During the development of version 4.0, he
modified gawk to always treat ranges in the original, pre-POSIX fashion, unless
--posix was used (see Command-Line Options).[105]

Fortunately, shortly before the final release of gawk 4.0, the maintainer learned
that the 2008 standard had changed the definition of ranges, such that outside the
"C" and "POSIX" locales, the meaning of range expressions was undefined.[106]

By using this lovely technical term, the standard gives license to implementors to
implement ranges in whatever way they choose. The gawk maintainer chose to apply the
pre-POSIX meaning both with the default regexp matching and when --traditional or
--posix are used. In all cases gawk remains POSIX-compliant.
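
For scripts that may still run under an older gawk or another locale-aware awk, two defensive idioms follow from the history above. This is a sketch rather than a recommendation from the original text: it assumes an `awk` that implements POSIX character classes, and the shell variable names `c_locale` and `char_class` are invented for the example.

```shell
# Idiom 1: force the "C" locale, where [A-Z] means exactly the 26 ASCII
# uppercase letters, so the trailing lowercase "abc" is left alone.
c_locale=$(echo something1234abc | LC_ALL=C awk '{ sub(/[A-Z]*$/, ""); print }')

# Idiom 2: use a POSIX character class, which states the intent directly
# and does not depend on how a range is ordered.
char_class=$(echo something1234ABC | LC_ALL=C awk '{ sub(/[[:upper:]]*$/, ""); print }')

printf '%s\n%s\n' "$c_locale" "$char_class"
```

The first command leaves its input unchanged (`something1234abc`), because nothing at the end of the line is an uppercase letter. The second strips the trailing `ABC`, on implementations that support character classes.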


