
5. Correct, Beautiful, Fast (in That Order): Lessons from Designing XML Verifiers

Elliotte Rusty Harold

This is the story of two routines that perform input verification for XML, the first in JDOM, and the second in XOM. I was intimately involved in the development of both, and while the two code bases are completely separate and share no common code, the ideas from the first clearly trickled into the second. The code, in my opinion, gradually became more beautiful. It certainly became faster.

Speed was the driving factor in each successive refinement, but in this case the improvements in speed were accompanied by improvements in beauty as well. I hope to dispel the myth that fast code must be illegible, ugly code. On the contrary, I believe that more often than not, improvements in beauty lead to improvements in execution speed, especially taking into account the impact of modern optimizing compilers, just-in-time compilers, RISC (reduced instruction set computer) architectures, and multi-core CPUs.



5.1. The Role of XML Validation

XML achieves interoperability by rigorously enforcing certain rules about what may and may not appear in an XML document. With a few very small exceptions, a conforming processor can process any well-formed XML document and can identify (and not attempt to process) malformed documents. This ensures a high degree of interoperability between platforms, parsers, and programming languages. You don't have to worry that your parser won't read my document because yours was written in C and runs on Unix, while mine was written in Java and runs on Windows.

Fully maintaining XML correctness normally involves two redundant checks on the data:

1. Validation occurs on input. As a parser reads an XML document, it checks the document for well-formedness and, optionally, validity. Well-formedness checks purely syntactic constraints, such as whether every start tag has a matching end tag. This is required of all XML parsers. Validity means that only elements and attributes specifically listed in a Document Type Definition (DTD) appear, and only in the proper positions.

2. Verification happens on output. When generating an XML document through an XML API such as DOM, JDOM, or XOM, the parser checks all strings passing through the API to make sure they're legal in XML.

While input validation is more thoroughly defined by the XML specification, output verification can be equally important. In particular, it is critical for debugging and making sure that the code is correct.
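The input-side check can be observed with the JAXP parser that ships with the JDK. The following is only a minimal sketch of a well-formedness test; the class name WellFormednessDemo and the method isWellFormed are hypothetical and are not part of JDOM or XOM.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: reports whether a string parses as well-formed XML.
final class WellFormednessDemo {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A conforming parser must reject malformed input with a fatal error,
            // which JAXP surfaces as a SAXException.
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            // Not expected for in-memory input with a default configuration.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));
        System.out.println(isWellFormed("<a><b></a>"));
    }
}
```

The second document has a start tag with no matching end tag, so the parser refuses it. Nothing comparable happens automatically on the output side, which is why an API has to verify strings itself.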










5.2. The Problem

The very first beta releases of JDOM did not verify the strings used to create element names, text content, or pretty much anything else. Programs were free to generate element names that contained whitespace, comments that ended in hyphens, text nodes that contained nulls, and other malformed content. Maintaining the correctness of the generated XML was completely left up to the client programmer.

This bothered me. While XML is simpler than some alternatives, it is not simple enough that it can be fully understood without immersing yourself in specification arcana, such as exactly which Unicode code points are or are not legal in XML names and text content.

JDOM aimed to be an API that brought XML to the masses. JDOM aimed to be an API that, unlike DOM, did not require a two-week course and an expensive expert mentor to learn to use properly. To enable this, JDOM needed to lift as much of the burden of understanding XML from the programmer as possible. Properly implemented, JDOM would keep the programmer from making mistakes.

There are numerous ways JDOM could do this. Some of them fell out as a direct result of its data model. For instance, in JDOM it is not possible to overlap elements (<p>Sally said, <quote>let's go to the park.</p>. Then let's play ball.</quote>). Because JDOM's internal representation is a tree, there's simply no way to generate this markup from JDOM. However, a number of other constraints need to be checked explicitly, such as whether:

- The name of an element, attribute, or processing instruction is a legal XML name

- Local names do not contain colons

- Attribute namespaces do not conflict with the namespaces of their parent element or sibling attributes

- Every Unicode surrogate character appears as part of a surrogate pair consisting of one high surrogate followed by one low surrogate

- Processing instruction data does not contain the two-character string ?>

Whenever the client supplies a string for use in one of these areas, it should be checked to see that it meets the relevant constraints. The details vary, but the basic approach is the same.
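Two of the constraints listed above are simple enough to sketch directly. This is a minimal illustration of the general approach, not JDOM's actual code; the class ConstraintChecks and its method names are hypothetical.

```java
// Hypothetical helpers illustrating two of the explicit checks.
final class ConstraintChecks {

    // True if every surrogate in s belongs to a pair made of one high
    // surrogate immediately followed by one low surrogate.
    static boolean hasWellFormedSurrogates(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                boolean hasPartner = i + 1 < s.length()
                        && Character.isLowSurrogate(s.charAt(i + 1));
                if (!hasPartner) {
                    return false;   // high surrogate with no low surrogate after it
                }
                i++;                // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                return false;       // low surrogate with no high surrogate before it
            }
        }
        return true;
    }

    // True if the processing instruction data does not contain "?>",
    // which would end the processing instruction prematurely.
    static boolean isLegalProcessingInstructionData(String data) {
        return !data.contains("?>");
    }
}
```

Both methods follow the same pattern: scan the client-supplied string and report the first violation. A real verifier would also have to decide how to report the failure to the caller.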

For purposes of this chapter, I'm going to examine the rules for checking XML 1.0 element names.

In the XML 1.0 specification (part of which is given in Example 5-1), rules are given in a Backus-Naur Form (BNF) grammar. Here #xdddd represents the Unicode code point with the hexadecimal value dddd. [#xdddd-#xeeee] represents all Unicode code points from #xdddd to #xeeee.

Example 5-1. BNF grammar for checking XML names (abridged)

    BaseChar      ::= [#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x0...
                      | [#x00D8-#x00F6] | [#x00F8-#x00FF] | [#x0100-#x0...
                      | [#x0134-#x013E] | [#x0141-#x0148] | [#x014A-#x0...
                      | [#x0180-#x01C3] ...
    NameChar      ::= Letter | Digit | '.' | '-' | '_' | ':' | Combin...
    Name          ::= (Letter | '_' | ':') (NameChar)*
    Letter        ::= BaseChar | Ideographic
    Ideographic   ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]
    Digit         ::= [#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x0...
                      | [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0...
                      | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] | [#x0BE7-#x0...
                      | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0...
                      | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0...
    Extender      ::= #x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x...
                      | #x3005 | [#x3031-#x3035] | [#x309D-#x309E] | [...
    CombiningChar ::= [#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#...
                      | [#x0591-#x05A1] | [#x05A3-#x05B9] | [#x05BB... | [#x05C1-#x05C2] | #x05C4 | [#x064B-#x0652] | ...
                      | [#x06D6-#x06DC] | [#x06DD-#x06DF] | [#x06E0... | [#x06E7-#x06E8] | [#x06EA-#x06ED] ...



























The complete set of rules would take up several pages here, as there are over 90,000 characters in Unicode to consider. In particular, the rules for BaseChar and CombiningChar have been shortened in this example.

To verify that a string is a legal XML name, it is necessary to iterate through each character in the string and verify that it is a legal name character as defined by the NameChar production.
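The iteration can be sketched as follows. To stay short, this illustration covers only the ASCII portion of the grammar in Example 5-1 (the first two BaseChar ranges, the first Digit range, and the punctuation characters), so it rejects the many legal non-ASCII names. It also applies the stricter rule that the Name production gives for the first character. The class AsciiNameCheck and its methods are hypothetical names, not code from JDOM or XOM.

```java
// Hypothetical ASCII-only sketch of checking a string against the Name production.
final class AsciiNameCheck {

    // Letter, restricted to [#x0041-#x005A] | [#x0061-#x007A]
    static boolean isAsciiLetter(char c) {
        return (c >= 0x0041 && c <= 0x005A) || (c >= 0x0061 && c <= 0x007A);
    }

    // Digit, restricted to [#x0030-#x0039]
    static boolean isAsciiDigit(char c) {
        return c >= 0x0030 && c <= 0x0039;
    }

    // First character of a Name: Letter | '_' | ':'
    static boolean isNameStart(char c) {
        return isAsciiLetter(c) || c == '_' || c == ':';
    }

    // NameChar: Letter | Digit | '.' | '-' | '_' | ':'
    // CombiningChar and Extender are omitted because neither contains ASCII characters.
    static boolean isNameChar(char c) {
        return isAsciiLetter(c) || isAsciiDigit(c)
                || c == '.' || c == '-' || c == '_' || c == ':';
    }

    static boolean isAsciiXmlName(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        if (!isNameStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, isAsciiXmlName("para-1.a") is accepted, while "1para", "-para", and "my name" are rejected. Extending the same structure to the full grammar is a matter of adding the remaining ranges.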










5.3. Version 1: The Naïve Implementation

My initial contribution to JDOM (shown in Example 5-2) simply deferred the rule checks to Java's Character class. The checkXMLName method returns an error message if an XML name is invalid, and null if it's valid. This itself is a questionable design; it should probably throw an exception if the name is invalid, and return void in all other cases. Later in this chapter, you'll see how future versions addressed this.

Example 5-2. The first version of name character verification

    private static String checkXMLName(String name) {
        // Cannot be empty or null
        if ((name == null) || (name.length() == 0)
                || (name.trim().equals(""))) {
            return "XML names cannot be null or empty";
        }

        // Cannot start with a number
        char first = name.charAt(0);
        if (Character.isDigit(first)) {
            return "XML names cannot begin with a number.";
        }
        // Cannot start with a $
        if (first == '$') {
            return "XML names cannot begin with a dollar sign ($).";
        }
        // Cannot start with a hyphen
        if (first == '-') {
            return "XML names cannot begin with a hyphen (-).";
        }

        // Ensure valid content
        for (int i = 0, len = name.length(); i < len; i++) {
            char c = name.charAt(i);
            if ((!Character.isLetterOrDigit(c))
                    && (c != '-')
                    && (c != '$')
                    && (c != '_')) {
                return c + " is not allowed in XML names.";
            }
        }

        // We got here, so everything is OK
        return null;
    }

This method was straightforward and easy to understand. Unfortunately, it was wrong. In particular:

- It allowed names that contained colons. Because JDOM attempted to maintain namespace well-formedness, this had to be fixed.

- The Java Character.isLetterOrDigit and Character.isDigit methods aren't perfectly aligned with XML's definition of letters and digits. Java considers some characters as letters that XML doesn't, and vice versa.

- The Java rules change from one version of Java to the next. XML's rules don't.

Nonetheless, this was a reasonable first attempt. It did catch a large percentage of malformed names and didn't reject too many well-formed ones. It worked especially well in the common case when all the names were ASCII. Even so, JDOM strove for broader applicability than that. An improved implementation that actually followed XML's rules was called for.
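One concrete mismatch between Java and XML can be shown with digits. The sketch below compares Character.isDigit against a table of the ranges in the Digit production of the XML 1.0 specification; the upper bounds that are cut off in Example 5-1 are taken from the specification itself. The class DigitMismatch and the method isXmlDigit are hypothetical names used only for this illustration.

```java
// Hypothetical comparison of Java's digit test with the XML 1.0 Digit production.
final class DigitMismatch {

    // Inclusive code point ranges of the XML 1.0 Digit production.
    private static final int[][] XML_DIGIT_RANGES = {
        {0x0030, 0x0039}, {0x0660, 0x0669}, {0x06F0, 0x06F9},
        {0x0966, 0x096F}, {0x09E6, 0x09EF}, {0x0A66, 0x0A6F},
        {0x0AE6, 0x0AEF}, {0x0B66, 0x0B6F}, {0x0BE7, 0x0BEF},
        {0x0C66, 0x0C6F}, {0x0CE6, 0x0CEF}, {0x0D66, 0x0D6F},
        {0x0E50, 0x0E59}, {0x0ED0, 0x0ED9}, {0x0F20, 0x0F29}
    };

    static boolean isXmlDigit(char c) {
        for (int[] range : XML_DIGIT_RANGES) {
            if (c >= range[0] && c <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char ascii = '7';            // U+0037 DIGIT SEVEN
        char fullwidth = '\uFF17';   // U+FF17 FULLWIDTH DIGIT SEVEN
        System.out.println(Character.isDigit(ascii) + " vs " + isXmlDigit(ascii));
        System.out.println(Character.isDigit(fullwidth) + " vs " + isXmlDigit(fullwidth));
    }
}
```

Java treats the fullwidth digit as a digit, while the XML 1.0 production does not list it, so a verifier built on Character.isDigit would disagree with the specification for that character.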






