Hour 8. Storing Data in Microsoft Azure Storage Blob



FIGURE 8.1 Visual representation of storage concepts in Microsoft Azure.

Note

Because HDInsight currently supports only block blobs, this hour focuses only on blob storage. This book does not discuss the other Azure storage services.



Benefits of Azure Storage Blob over HDFS

Hadoop clusters have relied on traditional HDFS for storage since the early days. However, HDInsight relies on blob storage to store user data and uses HDFS only as temporary storage for intermediate processing results and temporary job data.

These primary motivating factors strengthen the case for the use of blob storage:

Blob storage facilitates the safe decommissioning of an HDInsight cluster without the loss of user data.

Because data storage is not dependent on the cluster, you can easily decommission the cluster whenever it is not in use, which delivers additional cost benefits.

Blob storage also provides some secondary benefits:

Because the data resides in the blob store instead of in HDFS on a compute cluster, multiple HDInsight clusters and other applications can easily share this data.

Blob storage has built-in scaling capabilities. With HDFS storage, on the other hand, scaling out storage involves adding data nodes to the cluster. Because blob storage is decoupled from the cluster, storage can be scaled out without scaling out compute, and vice versa. This is much harder to achieve with traditional HDFS, in which storage and compute are tied together.

Blob storage can be geo-replicated, providing the additional benefits of geographic recovery and data redundancy.



Azure Storage Explorer Tools

Several tools are available for accessing Azure Storage. This section explores some of them.



Azure Storage Explorer

Azure Storage Explorer is a free, Windows Explorer-like tool available for download on CodePlex at http://azurestorageexplorer.codeplex.com/.

To use Azure Storage Explorer, you must have an Azure storage account name and an account key. To obtain the account name and access key, navigate to the Azure Management Portal and, from the Storage tab, select the Manage Access Keys item at the bottom of the page. This brings up the account name and access keys (see Figure 8.2).



FIGURE 8.2 Retrieving the Azure storage account name and access keys.

Click the Add Account button in Azure Storage Explorer and populate the storage account name and storage account key you obtained earlier (see Figure 8.3).



FIGURE 8.3 Populating the storage account name and key in Azure Storage Explorer.

Azure Storage Explorer is now ready for use. With its Windows Explorer-like interface, it allows for the convenient addition and removal of containers and files.



AZCopy

AZCopy is a command-line utility that you can incorporate into custom applications to transfer data into and out of Azure storage. AZCopy is similar in use to other Microsoft file copy utilities, such as robocopy, and also supports nested directories and files. For example, the following command copies all subdirectories and files in the C:\Data directory on the local disk as block blobs to the container named MyStorageContainer in the storage account MyStorageAccount. The /S switch (recursive mode) ensures that files in any subdirectories are also copied.


AzCopy C:\Data https://MyStorageAccount.blob.core.windows.net/MyStorageContainer/ /destkey:AccessKey /S
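To make the effect of the /S switch concrete, here is a small Python sketch that computes which blob URLs a recursive upload of a local directory would target. It is only a model, not AzCopy's implementation: the helper name blob_urls_for_upload is hypothetical, its input is a list of file paths relative to the source directory, and it assumes that each subdirectory path becomes part of the blob name with / as the separator. The account and container names in the usage are lowercase because Azure requires lowercase account and container names.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def blob_urls_for_upload(relative_files, account, container, recursive=False):
    """Model which block-blob URLs an upload of a local directory targets.

    relative_files: paths relative to the source directory, e.g. r"logs\\2014\\b.txt".
    recursive: True models the /S switch; False skips files in subdirectories.
    (Hypothetical helper for illustration only.)
    """
    base = f"https://{account}.blob.core.windows.net/{container}/"
    urls = []
    for rel in relative_files:
        parts = PureWindowsPath(rel).parts
        if len(parts) > 1 and not recursive:
            continue  # file lives in a subdirectory; skipped without /S
        # Assumption: the relative path, joined with "/", becomes the blob name
        blob_name = "/".join(parts)
        urls.append(base + quote(blob_name))  # quote() keeps "/" by default
    return urls


if __name__ == "__main__":
    files = ["a.txt", r"logs\2014\b.txt"]
    for url in blob_urls_for_upload(files, "mystorageaccount", "mystoragecontainer", recursive=True):
        print(url)
```

Dropping recursive=True leaves only a.txt in the result, which is the behavior the /S switch exists to change.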



Azure PowerShell

Azure PowerShell provides cmdlets to manage Azure through Windows PowerShell. Azure PowerShell is installed from the Microsoft Web Platform Installer (see Figure 8.4).



FIGURE 8.4 Installing Windows Azure PowerShell.

The cmdlets need subscription details to connect to and manage the services. You can provide subscription information in two ways:

Using a management certificate

Signing in directly to Azure with user credentials (the Azure AD method)

The Azure AD method is the easier of the two, but the credentials are available to Azure PowerShell for only 12 hours before they expire. Before you can use Azure PowerShell to upload data to blob storage, you need the names of the destination storage account and container. The easiest way to look up an existing storage account and container, or to create new ones, is to navigate to the Storage tab in the Azure Management Portal. Clicking the New button creates a new storage account; selecting an existing account and clicking the Containers tab lists the existing containers for that storage account.

To copy a file from local disk to Azure Storage Blob, follow these steps:

1. Launch the Azure PowerShell console and type Add-AzureAccount. This opens a pop-up window.

2. Provide the email address and password associated with the Azure subscription in the pop-up window.

3. After Azure authentication, use the following script to upload the Sample.txt file from the C:\Data folder to blob storage:




$storageAccountName = "StorageAccountName"
$containerName = "ContainerName"
$fileName = "C:\Data\Sample.txt"
$blobName = "example/data/sample.txt"

# Obtain the storage account's primary key
$storageAccountKey = Get-AzureStorageKey $storageAccountName | %{ $_.Primary }

# Create a storage context object
$destContext = New-AzureStorageContext -StorageAccountName $storageAccountName `
    -StorageAccountKey $storageAccountKey

# Finally, copy a local file to the destination blob container
Set-AzureStorageBlobContent -File $fileName -Container $containerName `
    -Blob $blobName -Context $destContext



Note: Blob Storage Is a Key/Value Pair Store

Using the / character in the blob name in the previous example does not imply the existence of example or data directories on blob storage. Because blob storage is a key/value pair store, no underlying hierarchical directory structure is present; the / characters are simply part of the blob's name (its key).
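A toy Python model helps show why example/ and data/ are not real directories. The FlatBlobContainer class below is hypothetical: it stores each blob under its full name in a flat dictionary, and its list_blobs method loosely mirrors the prefix and delimiter parameters of the Blob service's List Blobs operation, grouping names into virtual directories only at listing time.

```python
class FlatBlobContainer:
    """Toy model of a blob container: a flat mapping from blob name to bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, name, data):
        # The whole name, "/" characters included, is the key; no directories exist.
        self._blobs[name] = data

    def list_blobs(self, prefix="", delimiter=None):
        """Return (blob names, virtual directory prefixes) under prefix."""
        names, prefixes = [], set()
        for name in sorted(self._blobs):
            if not name.startswith(prefix):
                continue
            rest = name[len(prefix):]
            if delimiter and delimiter in rest:
                # Collapse deeper names into a virtual "directory" prefix.
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                names.append(name)
        return names, sorted(prefixes)


if __name__ == "__main__":
    container = FlatBlobContainer()
    container.put("example/data/sample.txt", b"hello")
    container.put("readme.txt", b"x")
    print(container.list_blobs())               # flat: every full name
    print(container.list_blobs(delimiter="/"))  # grouped into virtual prefixes
```

Without a delimiter, the listing is simply the flat set of names; the apparent folder structure appears only when a client asks for names to be grouped on /.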



Hadoop Command Line

You can use Hadoop commands to manipulate files and folders on Azure Storage Blob. The following exercise explores the process of uploading data to the default file system (Azure Storage Blob for HDInsight) using Hadoop commands.

Note: Behavior of Native HDFS-Specific Commands

Only commands that are specific to the native HDFS implementation (referred to as DFS), such as fsck and dfsadmin, show different behavior on Azure Storage Blob. Other HDFS commands behave the same on Azure Storage Blob as on conventional HDFS storage.



Try It Yourself: Uploading Data to the Default File System Using the Hadoop Command Line

Follow these steps to upload a sample file to the default file system using the copyFromLocal command:

1. Use a remote desktop connection to log in remotely to the name node of the HDInsight cluster.

GO TO Refer to Hour 7, “Exploring Typical Components of HDFS Cluster,” for more details on enabling a remote desktop connection to the HDInsight name node.

2. Create a text file with some content in the C:\temp directory (call it sample.txt) on the name node.

3. Open the Hadoop command line from the desktop shortcut and run the following command:


hadoop fs -copyFromLocal C:\temp\sample.txt /example/data/sample.txt



4. Because the HDInsight default file system is Azure Storage Blob, the file /example/data/sample.txt is actually uploaded to Azure Storage Blob, not HDFS.

5. Optionally, to verify that the file has been uploaded to Azure Storage Blob, browse the blob storage with Azure Storage Explorer and look for the file example/data/sample.txt.



HDInsight Storage Architecture Details

HDInsight supports both the Hadoop Distributed File System (HDFS) and Azure Storage Blob for storing data. Azure Storage Blob is the default file system.

Tip

Using traditional HDFS for user data storage is not recommended. Traditional HDFS is more transitory in nature, and HDInsight uses it to store intermediate results generated by MapReduce jobs and packages.

Microsoft’s implementation of HDFS based on Azure Storage Blob is called Windows Azure Storage Blob (WASB). Figure 8.5, adapted from azure.microsoft.com, shows the architecture of an HDInsight cluster and illustrates Azure Storage Blob and HDFS side by side.



FIGURE 8.5 HDInsight blob storage architecture.

As you can see in Figure 8.5, conventional HDFS storage (DFS) is still available to the head node (master) and compute nodes (workers), alongside the full-featured HDFS implementation on WASB.

Conventional HDFS storage is accessed using the following URI:


hdfs://<namenodehost>:<port>/<path>



You can access blob storage using the following fully qualified URI scheme:


wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>



The URI with the wasb prefix provides unencrypted access. The wasbs prefix provides SSL-encrypted access.
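As a sketch of how the parts of the fully qualified scheme fit together, the hypothetical parse_wasb_uri function below splits a wasb or wasbs URI into container, account, path, and an encrypted flag. It assumes the authority always has the form <container>@<account>.blob.core.windows.net and rejects anything else; it does not contact Azure or validate that the container exists.

```python
from urllib.parse import urlsplit

BLOB_SUFFIX = ".blob.core.windows.net"


def parse_wasb_uri(uri):
    """Split a fully qualified wasb[s] URI into its components (illustrative only)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a wasb or wasbs URI: {uri}")
    # The authority is "<container>@<account>.blob.core.windows.net"
    container, sep, host = parts.netloc.partition("@")
    if not sep or not container or not host.endswith(BLOB_SUFFIX):
        raise ValueError("expected <container>@<account>.blob.core.windows.net")
    return {
        "container": container,
        "account": host[: -len(BLOB_SUFFIX)],
        "path": parts.path,
        "encrypted": parts.scheme == "wasbs",  # wasbs = SSL, wasb = unencrypted
    }


if __name__ == "__main__":
    print(parse_wasb_uri(
        "wasbs://mycontainer@myaccount.blob.core.windows.net/example/data/sample.txt"))
```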

Note: ASV Versus WASB

The term ASV (Azure Storage Vault), used in earlier versions of HDInsight to refer to Azure Storage Blob, has been deprecated in favor of WASB. Therefore, the asv:// syntax is no longer supported from HDInsight version 3.0 onward; it has been replaced with the wasb:// syntax.

The fully qualified URI scheme containing the container name and storage account name is not required when you access files on blob storage configured as the default file system from the Hadoop command line. In fact, you can use the relative path and the absolute path interchangeably.

For example, the sample.txt file uploaded earlier can be referred to using any one of the following three URI schemes from the Hadoop command line:




wasb://mycontainer@myaccount.blob.core.windows.net/example/data/sample.txt
wasb:///example/data/sample.txt
/example/data/sample.txt



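Note: Fully Qualified Domain Name with WASB

When a wasb URI names the container and storage account, the account's fully qualified domain name (for example, myaccount.blob.core.windows.net) is required and cannot be shortened to the account name alone.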
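The equivalent forms listed above work because Hadoop fills in whatever is missing from the default file system: a bare path gets both the scheme and the authority, and a path with an empty authority, such as wasb:///example/data/sample.txt, gets the authority. The Python sketch below models that resolution for absolute paths only. The function name qualify and the default file system value are illustrative assumptions, and relative paths (which Hadoop resolves against the user's working directory) are deliberately rejected rather than guessed.

```python
from urllib.parse import urlsplit, urlunsplit


def qualify(path, default_fs):
    """Expand an absolute path to a fully qualified URI (illustrative model).

    default_fs is assumed to look like
    "wasb://mycontainer@myaccount.blob.core.windows.net".
    """
    default = urlsplit(default_fs)
    p = urlsplit(path)
    if not p.path.startswith("/"):
        raise ValueError("relative paths resolve against the working directory; not modeled")
    scheme = p.scheme or default.scheme
    # Borrow the default authority only when the scheme matches the default FS
    netloc = p.netloc or (default.netloc if scheme == default.scheme else "")
    return urlunsplit((scheme, netloc, p.path, "", ""))


if __name__ == "__main__":
    fs = "wasb://mycontainer@myaccount.blob.core.windows.net"
    for p in ("wasb://mycontainer@myaccount.blob.core.windows.net/example/data/sample.txt",
              "wasb:///example/data/sample.txt",
              "/example/data/sample.txt"):
        print(qualify(p, fs))
```

All three inputs resolve to the same fully qualified URI, which is why they can be used interchangeably when blob storage is the default file system.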

To query the traditional HDFS, determine the HDFS URI by examining the dfs.namenode.rpc-address property in the file %hadoop_home%\etc\hadoop\hdfs-site.xml. For example, if the HDFS URI is hdfs://headnodehost:9000, you can query the HDFS contents using the following command:


hadoop fs -ls hdfs://headnodehost:9000/
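If you prefer to script the lookup instead of reading hdfs-site.xml by eye, the following Python sketch extracts dfs.namenode.rpc-address with the standard library's ElementTree and prefixes it with hdfs://. The function names get_property and hdfs_uri_from_site_xml are hypothetical, and the sketch assumes the property holds a plain host:port value, as in the headnodehost:9000 example.

```python
import xml.etree.ElementTree as ET


def get_property(config_xml, name):
    """Return the <value> of a named <property> in Hadoop *-site.xml text, or None."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return prop.findtext("value")
    return None


def hdfs_uri_from_site_xml(hdfs_site_xml):
    """Build an hdfs:// URI from dfs.namenode.rpc-address (assumed host:port)."""
    address = get_property(hdfs_site_xml, "dfs.namenode.rpc-address")
    if address is None:
        raise KeyError("dfs.namenode.rpc-address is not set")
    return "hdfs://" + address.strip()


if __name__ == "__main__":
    sample = """<configuration>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>headnodehost:9000</value>
  </property>
</configuration>"""
    print(hdfs_uri_from_site_xml(sample))
```

The resulting URI is what you pass to hadoop fs -ls, with a trailing / to list the root.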



Configuring the Default File System

Blob storage is the default storage for an HDInsight cluster; however, it is possible (though not recommended) to change the default file system to local HDFS. Because local HDFS is transitory in nature, data stored on HDFS is lost when the cluster is decommissioned.

To modify this setting, connect remotely to the HDInsight name node. (For detailed instructions on remote login, refer to the Try It Yourself “Exploring the Services on HDInsight NameNode” in Hour 7.) You can change the default file system setting by modifying the fs.defaultFS property in the file %hadoop_home%\etc\hadoop\core-site.xml (see Figure 8.6). First, determine the HDFS URI by examining the dfs.namenode.rpc-address property in %hadoop_home%\etc\hadoop\hdfs-site.xml. Then open core-site.xml in Notepad and change the fs.defaultFS property value to that HDFS URI (for example, hdfs://headnodehost:9000) so that HDFS becomes the default storage. Save and close the file.
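The Notepad edit can also be expressed as a small Python function, shown here as a sketch rather than a supported tool. set_default_fs is a hypothetical name; it rewrites (or adds) the fs.defaultFS property in the XML text it is given and returns the new text. ElementTree drops XML comments and the XML declaration when it reserializes the document, so keep a backup of core-site.xml before writing the result back.

```python
import xml.etree.ElementTree as ET


def set_default_fs(core_site_xml, new_uri):
    """Return core-site.xml text with fs.defaultFS set to new_uri (sketch)."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "fs.defaultFS":
            value = prop.find("value")
            if value is None:  # property present but missing its <value>
                value = ET.SubElement(prop, "value")
            value.text = new_uri
            break
    else:
        # Property absent: append a new <property> element to <configuration>
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "fs.defaultFS"
        ET.SubElement(prop, "value").text = new_uri
    # Note: comments and the XML declaration are not preserved
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    original = ("<configuration><property><name>fs.defaultFS</name>"
                "<value>wasb://mycontainer@myaccount.blob.core.windows.net</value>"
                "</property></configuration>")
    print(set_default_fs(original, "hdfs://headnodehost:9000"))
```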



FIGURE 8.6 Modifying the default file system.

Use the command hadoop fs -ls / to query the default file system and verify that it has changed to HDFS.



Understanding the Impact of Blob Storage on Performance and Data Locality

A core principle of Hadoop is data locality: the data to be processed resides close to the compute nodes, on local disks. This proximity minimizes data movement because each compute node is supposed to work on the data set available locally.

In complete contrast, in HDInsight, data is stored on Azure Storage Blob, separate from the cluster. This seems to conflict with the idea of data locality and to involve moving data to compute instead of moving compute to data. Not really!

Azure Flat Network Storage provides performance comparable to reads from disk. In addition, not all queries operate on the complete data in the cluster. Most queries ask for a subset of data, which further reduces the amount of data to be read and transferred from blob storage.

Note

When it comes to writes, Azure Storage Blob is a clear winner. A write operation to HDFS (assuming a replication factor of 3) is not complete until all three copies are written. With Azure Storage Blob, on the other hand, the write operation is complete after the first copy is written. Azure subsequently takes care of replication, for better write performance.

Overall, Azure Storage Blob seems to be a reasonably good choice for storage.



Summary

This hour introduced Microsoft Azure Storage services and discussed the benefits of using Azure Storage Blob as the default file system in HDInsight. Traditional Hadoop distributions have used HDFS as the default file system, but Azure Storage Blob provides several additional benefits that favor its use as the default file system in HDInsight. This hour also introduced popular Azure Storage Explorer tools and examined the impact of blob storage on performance and data locality.



Q&A

Q. Are other Microsoft Azure Storage Explorer tools available besides the ones discussed in this hour?

A. Yes, many other tools are available. For an elaborate list of tools and a feature comparison, refer to a blog post from the Microsoft Azure Storage team, at http://blogs.msdn.com/b/windowsazurestorage/archive/2014/03/11/windows-azure-storage-explorers-2014.aspx.

Q. Are Windows Azure and Microsoft Azure the same technologies?

A. Yes. Windows Azure was renamed Microsoft Azure on April 3, 2014. Refer to the following announcement from Microsoft about this name change: http://azure.microsoft.com/blog/2014/03/25/upcoming-name-change-for-windows-azure/.



Quiz

1. What is the default file system in an HDInsight cluster?

2. Is traditional HDFS still available and accessible in an HDInsight cluster?

3. Which command-line tool can be used conveniently in custom applications to transfer data to and from Microsoft Azure storage?



Answers

1. Microsoft Azure Storage Blob is the default file system in HDInsight clusters.

2. Traditional HDFS is still available; however, it is more transitory, and its use is generally discouraged.

3. AZCopy is a command-line utility that can be incorporated into custom applications to transfer data into and out of Azure storage. AZCopy is similar to other Microsoft file copy utilities, such as robocopy, and also supports nested directories and files.



Hour 9. Working with Microsoft Azure HDInsight Emulator

What You’ll Learn in This Hour:

Getting Started with HDInsight Emulator

Setting Up Microsoft Azure Emulator for Storage

Using an HDInsight cluster for development, exploration, and experimentation is not always feasible because of the cost of keeping a multinode cluster constantly up and running. Although you can drop and spin up an HDInsight cluster on demand to test MapReduce job functionality and save cost, you have a better alternative.

Microsoft provides the HDInsight emulator for such scenarios. The emulator is a single-node cluster that is well suited for development and experimentation.

This hour focuses on the steps involved in setting up the HDInsight emulator and firing a MapReduce job to test the emulator functionality.



Getting Started with HDInsight Emulator

As the name suggests, the HDInsight emulator provides an emulation of the Azure HDInsight cluster. Given the costs of keeping a multinode cluster up and running for development and testing, developing and testing in the Azure HDInsight environment is not always feasible.

The HDInsight emulator comes to the rescue by providing a single-node local development environment (a pseudo-cluster) for Azure HDInsight.

Keep the following important points in mind when developing on the HDInsight emulator:

The HDInsight emulator is a pseudo-cluster, a single-node deployment with all services (name node, data node, and so on) running on the same node.

HDFS is the default file system for the single-node cluster.

The emulator is meant for development scenarios only.

Although the default file system for the emulator is HDFS, it is possible to switch over to Microsoft Azure blob storage. In fact, the HDInsight emulator can pair up with the Microsoft Azure emulator to emulate the Azure storage development experience locally, as you explore in subsequent sections.



Setting Up Microsoft HDInsight Emulator

You can set up the Microsoft HDInsight emulator on a workstation that meets certain prerequisites.

Only the 64-bit versions of the following operating systems are supported:

Windows 7 Service Pack 1

Windows Server 2008 R2 Service Pack 1


