Chapter 6. Working with XML Data in Excel Spreadsheets


6.1 Separating Data and Logic

When spreadsheets first appeared, they brilliantly blurred the distinction between programming and information. Spreadsheet users could enter their data and work on it without having to do things like "programming." All the information could reside in a single file, readily shared, and copy and paste functionality along with a few basic functions ensured that spreadsheets were easy to learn. An unknown but clearly vast amount of business decision-making has rested on spreadsheets, and an incredible amount of business data is stored in spreadsheets.

This power has come at some cost, however. While spreadsheets are accessible, their mixing of data and logic has created a few problems. While copy and paste works well for simple spreadsheets, it becomes complicated quickly if, for example, users try to combine logic from multiple spreadsheets. Suddenly development style matters. Spreadsheet software, with its smart copy and paste features and support for multiple workbooks, has done a lot to simplify this process, but the work involved in making these pieces communicate is still very real. Mergers and acquisitions, for instance, often face a serious challenge in reconciling the spreadsheets used by decision-makers at the various organizations.

Even on a smaller scale, the combination of data and logic that make spreadsheets so powerful can create some substantial annoyances. I work, for example, with data I need to analyze on a weekly, monthly, quarterly, and annual basis. I use the same basic logic for all of this analysis. The company I work for makes it available in Excel spreadsheets, generated from a database. I end up with an enormous number of largely duplicate spreadsheets over time, as only the data has changed. There's no simple way for me to aggregate the information from multiple spreadsheets, and if I want to make a change to the logic, I have to make that change every time I download new information. That thoroughly discourages me from making logic changes.



Another cost of spreadsheets is that they act as roach motels: data comes in but it never goes back out to databases, except as spreadsheets. This problem will be addressed in the next chapter.



Excel has addressed these issues to some degree with features like ODBC integration with databases. Instead of storing all the information in spreadsheets directly, the user can specify an area of the spreadsheet to be populated with information from a database query. In places where you trust your users with such access or can provide secure facilities to provide the information, this can be genuinely useful stuff. Users can analyze information using the CPU power on their desktops, customize how they see the data, and manipulate it without ever (hopefully) having to request development of custom processes. They can load new data into their spreadsheets whenever they need to do so, without fear of overwriting the logic they've so painstakingly created.

Unfortunately, that scenario only works for a limited number of cases where users have direct (or nearly direct) access to information. There are many untrusted users, as well as users who travel or are otherwise disconnected. There are lots of users who need access to historical information, and may need to process that information a few times before actually letting it into the final spreadsheet. There are users with intermittent connections, who access their information through things like web servers and file servers.

In these cases, using XML as a base format for data works very nicely. XML files are self-contained, and are easily sent as attachments in email or loaded from a file or web server without any special infrastructure. Instead of users having direct access to a database, they can be given access to copies of the parts of the database that interest them. If users want to tinker with the data for forecasting, for instance, or just to make themselves feel briefly better about their results, they can tinker without having any impact on the original data source. Users who want to aggregate information from multiple data sources can do so using either Excel's own tools or the wide variety of XML-processing tools available.
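As a rough illustration of aggregating outside Excel, here is a minimal Python sketch that sums figures across several XML exports using the standard library's xml.etree.ElementTree. The element names (sales, sale, region, total) and the two sample documents are invented for this example; they are not taken from any real export format.

```python
import xml.etree.ElementTree as ET

# Hypothetical weekly exports. The vocabulary (sales/sale/region/total)
# is an assumption made for this sketch only.
WEEK_1 = "<sales><sale><region>East</region><total>100</total></sale></sales>"
WEEK_2 = (
    "<sales>"
    "<sale><region>East</region><total>50</total></sale>"
    "<sale><region>West</region><total>70</total></sale>"
    "</sales>"
)

def aggregate_totals(documents):
    """Sum the <total> of every <sale>, grouped by <region>, across documents."""
    totals = {}
    for text in documents:
        root = ET.fromstring(text)
        for sale in root.findall("sale"):
            region = sale.findtext("region")
            amount = float(sale.findtext("total"))
            totals[region] = totals.get(region, 0.0) + amount
    return totals

totals = aggregate_totals([WEEK_1, WEEK_2])
```

Because each export is a self-contained file, the same function could be pointed at attachments saved from email or files fetched from a web server without touching the original database.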

Users can also treat Excel as a tool for creating and manipulating XML data, provided that the data structures fit neatly into Excel's expectations of columns and rows. While Excel is in some ways a more limited XML editor than Word, it also provides a much simpler interface, one that is easy for users to set up and use themselves.



6.2 Loading XML into an Excel Spreadsheet

There are several different ways to load XML data into Excel. Some are useful mostly for quick exploration and maybe some editing, while others are more appropriate for creating spreadsheets that use XML as a data source that can be easily replaced with new data whenever appropriate. All of these mechanisms share a common approach for showing XML data in the spreadsheet, so it's worth taking a moment to examine how Excel handles XML structures before moving into the mechanics of importing data.

When Excel opens an XML file, it imports data from it. If you make changes to the XML file while Excel is working with the data it has imported from that file, changes to the XML will not be reflected in the Excel spreadsheet.



6.2.1 Tables and Trees

Excel, like all spreadsheets, is built on a grid. Information is organized into rows and columns, and this worksheet grid (as well as relationships among multiple worksheet grids in a workbook) is used to create cross-references between different sections of information. Within the grid, Excel is enormously flexible. Information doesn't have to follow neat table structures; pricing data could, if desired, run diagonally down a spreadsheet. It's easier to work with ranges of information if it stays in a single row or column, though, so most spreadsheets combine table areas that contain raw data and then either tables of results or cells along the fringes of the tables.

XML has no built-in notion of a grid. While it's certainly possible to represent a spreadsheet's rows and columns of cells within a worksheet as XML (and Chapter 7 will explore how Microsoft's chosen to do this), there's no guarantee that any given XML document will neatly fit into the native structures of Excel. There are a few simple but critical conditions that must apply to XML documents for them to be used easily as source data for Excel:



Tree structures that produce rows

Excel works best on XML documents when they conform to its structural expectations. The root element of the XML document should act as the primary container for a table of information. Each of the child elements of the root element should represent a row. Each of the child elements (or attributes) of the row elements should represent a cell in the grid. Roughly, this looks like:





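<root>
    <row>
        <cell-name1>...value...</cell-name1>
        <cell-name2>...value...</cell-name2>
        <cell-name3>...value...</cell-name3>
        <cell-name4>...value...</cell-name4>
    </row>
    <row>
        <cell-name1>...value...</cell-name1>
        <cell-name2>...value...</cell-name2>
        <cell-name3>...value...</cell-name3>
        <cell-name4>...value...</cell-name4>
    </row>
    ....
</root>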





Excel also works well with cells expressed as attributes:




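<root>
    <row cell-name1="value" cell-name2="value" cell-name3="value"
         cell-name4="value"/>
    <row cell-name1="value" cell-name2="value" cell-name3="value"
         cell-name4="value"/>
    ....
</root>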





Attributes and elements can also be mixed:





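<root>
    <row cell-name1="value" cell-name2="value">
        <cell-name3>...value...</cell-name3>
        <cell-name4>...value...</cell-name4>
    </row>
    <row cell-name1="value" cell-name2="value">
        <cell-name3>...value...</cell-name3>
        <cell-name4>...value...</cell-name4>
    </row>
    ....
</root>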





Excel is pretty relaxed about the order in which these appear as well, as it uses the names of elements and attributes rather than their order when creating a map.
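To make the row-and-cell idea concrete, here is a small Python sketch of how a tree of this shape can be flattened into a grid keyed by name rather than position. It illustrates the general mapping only and makes no claim about how Excel implements it. The names root, row, and cell-name1 through cell-name4 are placeholders, and the sample values are invented.

```python
import xml.etree.ElementTree as ET

# Placeholder document: cells appear both as attributes and as child
# elements, and the second row lists them in a different order.
SAMPLE = """
<root>
  <row cell-name1="a" cell-name2="b">
    <cell-name3>c</cell-name3>
    <cell-name4>d</cell-name4>
  </row>
  <row cell-name2="f" cell-name1="e">
    <cell-name4>h</cell-name4>
    <cell-name3>g</cell-name3>
  </row>
</root>
"""

def tree_to_rows(xml_text):
    """Turn each child of the root into a dict of cell name -> value."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root:
        cells = dict(row.attrib)              # cells expressed as attributes
        for child in row:                     # cells expressed as elements
            cells[child.tag] = (child.text or "").strip()
        rows.append(cells)
    return rows

def to_grid(rows):
    """Build a header line plus one line per row, with columns matched by name."""
    columns = sorted({name for row in rows for name in row})
    grid = [columns]
    for row in rows:
        grid.append([row.get(name) for name in columns])
    return grid

grid = to_grid(tree_to_rows(SAMPLE))
```

Since the columns are matched by name, the reordered second row still lands in the right cells.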



It is possible to extract portions of XML documents that look like these structures, even if the rest of the document looks different, but it does take a few extra steps.



Regular structure

When Excel works with an XML document, it represents the data as rows and columns. It's very difficult for Excel to determine which rows and columns to create if the data of the document isn't consistent. It does make a best effort, but there are limits. The occasional missing piece of information shouldn't cause drastic difficulties, but extra information may not be imported, and consistency makes results much more predictable.
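If you want to know how regular a document is before handing it to Excel, a quick check along the following lines can help. This is a sketch under assumptions: the vocabulary (root, row, title, price, note) is hypothetical, and the report simply counts how many rows lack each cell name. It does not reproduce any check Excel performs.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical data: the second row is missing a price, and only the
# third row carries a note.
SAMPLE = """
<root>
  <row><title>A</title><price>10</price></row>
  <row><title>B</title></row>
  <row><title>C</title><price>12</price><note>signed</note></row>
</root>
"""

def column_report(xml_text):
    """Return {cell name: number of rows in which that cell is absent}."""
    root = ET.fromstring(xml_text)
    seen = Counter()
    total_rows = 0
    for row in root:
        total_rows += 1
        names = set(row.attrib) | {child.tag for child in row}
        seen.update(names)
    return {name: total_rows - count for name, count in seen.items()}

missing = column_report(SAMPLE)
```

A name that is absent from most rows (note, here) is the kind of extra information worth reviewing before import.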



No mixed content

One of XML's best features for working with documents is the ability to mix elements and text together freely. A classic simple use of mixed content is highlighting information in bold or italic:



<p>This is in <b>bold</b> and this is in <i>italic</i></p>


Unfortunately, these structures fit very badly with Excel's view of XML data as cells in a grid. If you need to process XML data that includes mixed content, you should either use Word (which is designed to support it) or pre-process your XML to strip out the extra markup.
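One way to do that pre-processing is sketched below in Python, assuming the inline markup carries no information you need to keep. The element names (row, comment, b, i) are hypothetical; the function replaces an element's children with the plain text they contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical row whose <comment> cell contains mixed content.
SAMPLE = (
    "<row><comment>This is in <b>bold</b> and this is in "
    "<i>italic</i>.</comment></row>"
)

def flatten_mixed_content(element):
    """Replace the element's children with the plain text they contain."""
    text = "".join(element.itertext())   # text of the element and all descendants
    for child in list(element):          # copy the list: we mutate while looping
        element.remove(child)
    element.text = text
    return element

row = ET.fromstring(SAMPLE)
flatten_mixed_content(row.find("comment"))
flattened = ET.tostring(row, encoding="unicode")
```

The result keeps the words but drops the highlighting, leaving a cell that holds nothing but text.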



Schema for type information (optional)

While Excel doesn't require XML Schema files that describe the XML documents you use, schemas can be a very convenient tool both for describing the information that you'll be including in a spreadsheet to Excel and for sanity-checking the documents users work with in the Excel environment. If there isn't a schema, Excel makes a pretty good best effort to analyze data and guess what schema would be appropriate.
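To give a feel for what guessing types from data involves, here is a deliberately simple Python heuristic. It is not Excel's algorithm, which isn't described here; the type labels and the month/day/year date format are assumptions chosen for the sketch.

```python
from datetime import datetime

def guess_type(values):
    """Guess one type label for a column of string values (illustrative only)."""
    def all_parse(parser):
        # True only if every value in the column parses without error.
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    # Assumed date format: month/day/year, as in 10/5/2003.
    if all_parse(lambda value: datetime.strptime(value, "%m/%d/%Y")):
        return "date"
    return "string"
```

A single value that doesn't fit demotes the whole column to a more general type, which is one reason an explicit schema gives more predictable results than inference.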



Limited depth

Excel does well with lists of information, but can really only present two levels of lists, representing rows and cells. If a document has many layers of lists, or uses elements containing elements with the same name (recursive markup, commonly used in lists), Excel will not be able to import all of the data.
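A rough pre-flight check for this condition can be sketched in Python as follows. The threshold of three element levels (container, row, cell) is my reading of the "rows and cells" description, not a documented Excel limit, and both sample documents are invented.

```python
import xml.etree.ElementTree as ET

FLAT = "<root><row><a>1</a><b>2</b></row></root>"
NESTED = "<list><item>one<list><item>two</item></list></item></list>"

def depth(element):
    """Number of element levels from this element down to its deepest leaf."""
    return 1 + max((depth(child) for child in element), default=0)

def has_recursive_markup(element, ancestors=()):
    """True if any element appears inside an ancestor with the same name."""
    if element.tag in ancestors:
        return True
    path = ancestors + (element.tag,)
    return any(has_recursive_markup(child, path) for child in element)

def fits_rows_and_cells(xml_text):
    """Heuristic: at most container/row/cell levels and no recursion."""
    root = ET.fromstring(xml_text)
    return depth(root) <= 3 and not has_recursive_markup(root)
```

Here fits_rows_and_cells(FLAT) passes, while the nested list in NESTED fails on both counts.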

Effectively, Excel only works well with a small subset of the many possible XML document structures. The Excel subset, however, is an extremely common subset in practice. Enormous amounts of data are available in XML formats that work well with Excel.



6.2.2 Opening XML Documents Directly

The standard Excel dialog box for opening files shows XML files (or files ending in the extension .xml) right along with Excel spreadsheets, as shown in Figure 6-1.



Figure 6-1. XML files appearing in the Excel Open dialog box



Only one of the choices presented here is a traditional Excel spreadsheet, twoPlusTwo.xls. The other files are XML files. XML files that Excel knows belong to Microsoft Word (thanks to the mso-application processing instruction), the ch02-x series, are marked with the Word icon, while ch0601.xml, an Excel SpreadsheetML file, has the Excel icon. XML files using other vocabularies get a different icon. On my system, they get a Mozilla logo, but they may have a different logo on your system, depending on what XML-processing software you have installed.
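You can look for the same processing instruction yourself. The Python sketch below uses xml.dom.minidom, which keeps processing instructions that appear before the root element, and returns the instruction's data if one is present. The three sample documents are stripped-down stand-ins written for this example; real Word and Excel files have namespaced roots and far more content.

```python
from xml.dom import minidom

# Simplified stand-ins for real files (assumption: only the processing
# instruction matters for this check).
WORD_DOC = '<?xml version="1.0"?><?mso-application progid="Word.Document"?><doc/>'
EXCEL_DOC = '<?xml version="1.0"?><?mso-application progid="Excel.Sheet"?><Workbook/>'
PLAIN_DOC = '<?xml version="1.0"?><sales/>'

def office_application(xml_text):
    """Return the data of the mso-application instruction, or None if absent."""
    document = minidom.parseString(xml_text)
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "mso-application"):
            return node.data
    return None
```

A document with no such instruction returns None, which corresponds to the "other vocabularies" case.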

Whatever logo appears, however, you can attempt to open any XML file. If the XML contains anything other than Excel's own SpreadsheetML, covered in Chapter 7, you'll see the dialog box shown in Figure 6-2.



Figure 6-2. Dialog box for choosing how to handle XML document importation



If the XML document you open contains any elements named html, you won't see the dialog box shown in Figure 6-2. Instead, Excel will attempt to open it as an HTML document. It even seems to do this if the elements that look like HTML are in another namespace.
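If you suspect a document will trip over this behavior, you can scan it first. The following Python sketch looks for any element whose local name is html, ignoring its namespace, since the namespace doesn't seem to matter to Excel. The sample document and the urn:example:other namespace in the usage note are invented.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix, if any, from a tag name."""
    return tag.rsplit("}", 1)[-1]

def contains_html_element(xml_text):
    """True if any element, in any namespace, has the local name 'html'."""
    root = ET.fromstring(xml_text)
    return any(local_name(element.tag) == "html" for element in root.iter())
```

For example, contains_html_element('<report><html xmlns="urn:example:other"/></report>') is true even though that html element has nothing to do with HTML.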



6.2.2.1 Opening documents as a list

We'll start with a simple XML document recording (imaginary) sales of books to explore how these different options work, shown in Example 6-1.



Example 6-1. A simple XML document for analysis in Excel









10/5/2003


