Chapter 31. System-Level Fault Tolerance (Clustering/Network Load Balancing)


applications, user data, and networking services is required when unexpected downtime is unacceptable. Windows Server 2003 provides several methods of improving system- or server-level fault tolerance by using a few of the services included in the Enterprise and Datacenter platforms.

Chapter 30, "File System Fault Tolerance," discussed file-level fault tolerance, including the Distributed File System (DFS) and volume shadow copies. This chapter covers system-level fault tolerance using Windows Server 2003 network load balancing (NLB) and the Microsoft Cluster Service (MSCS). These built-in clustering technologies provide load-balancing and failover capabilities that can be used to increase fault tolerance for many different types of applications and network services. Each of these clustering technologies is different in many ways. Choosing the correct type of clustering depends on the applications and services that will be hosted on the cluster.

Windows Server 2003 technologies such as NLB and MSCS improve fault tolerance for applications and network services, but before these technologies can be leveraged effectively, basic server stability best practices must be put in place.

This chapter focuses on the policies and procedures needed to create an environment that supports a fault-tolerant network. Additionally, this chapter contains the step-by-step procedures needed to make server hardware more reliable through the successful implementation of NLB and MSCS.







Building Fault-Tolerant Systems

Building fault-tolerant computing systems consists of carefully planning and configuring server hardware and software, network devices, and power sources. Purchasing quality server and network hardware is a good start to building a fault-tolerant system, but the proper configuration of this hardware is equally important. Also, providing this equipment with stable line power that is backed up by a battery or generator adds fault tolerance to the network. Last but not least, proper tuning of server operating systems helps enhance availability of network services such as file shares, print servers, network applications, and authentication servers.



Using Uninterruptible Power Supplies

Connecting line power to server and network devices through uninterruptible power supplies (UPSs) not only provides conditioned incoming power by removing voltage spikes and providing steady line voltage levels, but it also provides battery backup power. When line power fails, the UPS switches to battery mode, which should provide ample time to shut down the server or network device without risk of damaging hardware or corrupting data. UPS manufacturers commonly provide software that can send network notifications, run scripts, or even gracefully shut down servers when power thresholds are met. One final word on power is that most computer and network hardware manufacturers provide device configurations that incorporate redundant power supplies designed to keep the system powered up in the event of a single power supply failure.

During power outages, many system administrators find out which critical devices are not connected to a UPS, and the race begins to shut down and shift power from non-critical devices. To avoid these situations, administrators need to perform regular inspections of critical hardware devices in server rooms and network closets to ensure that all necessary servers, network routers, switches, hubs, and firewalls are backed by battery power. When power to a server fails and the battery provides only a few minutes for users to save data and close connections to reduce the chance of data corruption, it is essential for the network to remain available.
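As a rough illustration of the threshold-driven behavior of UPS software, the following Python sketch maps a single UPS status reading to an action. The function name, its parameters, and the 15- and 5-minute thresholds are hypothetical and chosen only for this example; actual vendor software exposes its own settings and interfaces.

```python
def ups_action(on_battery, runtime_minutes, notify_below=15, shutdown_below=5):
    """Decide what monitoring software should do for one UPS reading.

    All thresholds are illustrative assumptions, not vendor defaults.
    """
    if not on_battery:
        return "none"        # line power is present
    if runtime_minutes <= shutdown_below:
        return "shutdown"    # begin a graceful server shutdown
    if runtime_minutes <= notify_below:
        return "notify"      # send a network notification
    return "monitor"         # on battery, but ample runtime remains


# Four sample readings: (on battery?, estimated runtime in minutes)
readings = [(False, 40), (True, 30), (True, 12), (True, 4)]
actions = [ups_action(on_battery, minutes) for on_battery, minutes in readings]
```

Keeping the notification threshold well above the shutdown threshold gives users time to save data before servers begin shutting down.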



Choosing Networking Hardware for Fault Tolerance

Network design can also incorporate fault tolerance by creating redundant network routes and by utilizing technologies that can group devices together for the purposes of load balancing and device failover. Load balancing is the process of spreading requests across multiple devices to keep individual device load at an acceptable level. Failover is the process of moving services offered on one device to another upon device failure, to maintain availability.
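The two definitions above can be illustrated with a minimal Python sketch. The Pool class, its method names, and the device names are hypothetical and exist only for illustration; real load balancers and teaming software use their own algorithms.

```python
from itertools import cycle


class Pool:
    """Toy device pool: round-robin load balancing with failover."""

    def __init__(self, devices):
        self.devices = list(devices)
        self.healthy = set(self.devices)
        self._ring = cycle(self.devices)

    def mark_failed(self, device):
        self.healthy.discard(device)

    def mark_recovered(self, device):
        if device in self.devices:
            self.healthy.add(device)

    def route(self):
        """Return the next healthy device in round-robin order."""
        if not self.healthy:
            raise RuntimeError("no healthy devices available")
        # At most len(devices) steps are needed to reach a healthy device.
        for _ in range(len(self.devices)):
            device = next(self._ring)
            if device in self.healthy:
                return device
        raise RuntimeError("no healthy devices available")


pool = Pool(["switch-a", "switch-b", "switch-c"])
before = [pool.route() for _ in range(3)]   # load is spread across all three
pool.mark_failed("switch-b")
after = [pool.route() for _ in range(4)]    # requests skip the failed device
```

In this sketch, load balancing is the rotation through the pool, and failover is the act of skipping a device once it has been marked as failed.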

Networking hardware such as Ethernet switches, routers, and network cards can be configured to provide fault-tolerant services through load-balancing applications or through features within the network device firmware or operating system. Refer to the manufacturer's documentation to research fault-tolerant configurations available in your organization's network devices.

For more robust redundant network card configurations, third-party hardware vendors have created network card teaming and network card fault-tolerant software applications. These technologies allow client/server communication to fail over from one network interface card (NIC) to another in the event of a NIC failure. Also, they can be configured to balance network requests across all the NICs in one server simultaneously. Refer to the particular hardware manufacturer's documentation to find out whether a compatible teaming application is available for your network card.



Note

Windows Server 2003 network load balancing does not allow multiple NICs on the same server to participate in the same NLB cluster.



Selecting Server Storage for Redundancy

Server disk storage usually contains user data and/or operating system files, which makes it a critical server subsystem that should incorporate fault tolerance. There are a few different ways to create fault-tolerant disk storage for the Windows Server 2003 operating system. The first is creating Redundant Arrays of Inexpensive Disks (RAID) using disk controller configuration utilities, and the second is creating the RAID disks using dynamic disk configuration from within the Windows Server 2003 operating system.

Using two or more disks, different RAID-level arrays can be configured to provide fault tolerance that can withstand disk failures and still provide uninterrupted disk access. Implementing hardware-level RAID configured and stored on the disk controller is preferred over the software-level RAID configurable within Windows Server 2003 Disk Management because the disk management and synchronization processes in hardware-level RAID are offloaded to the RAID controller. With these processes offloaded from the operating system to the RAID controller, the operating system will perform better overall.

Another good reason to provide hardware-level RAID is that the configuration of the disks does not depend on the operating system, which gives administrators greater flexibility when it comes to recovering server systems and performing upgrades. Refer to Chapter 22, "Windows Server 2003 Management and Maintenance Practices," for more information on ways to create RAID arrays using Windows Server 2003 Disk Management. Also, refer to the manufacturer's documentation on creating RAID arrays on your RAID disk controller.
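To see why an array can keep serving data after one disk is lost, consider parity-based RAID (RAID 5, for example), in which a parity block is the bytewise XOR of the data blocks in a stripe. The Python sketch below is illustrative only: the function names and the three-block stripe are assumptions made for this example, and real controllers work on much larger blocks and rotate parity across the disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


stripe = [b"DISK0...", b"DISK1...", b"DISK2..."]   # hypothetical data blocks
parity = xor_blocks(stripe)

# Simulate losing the second disk and rebuilding its block.
recovered = rebuild([stripe[0], stripe[2]], parity)
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields exactly the missing block, which is what allows reads to continue while a failed disk is replaced.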



Improving Application Reliability

An application's reliability is greatly dependent on the software code and the hardware it is running on. Administrators can make applications more reliable on Windows Server 2003 by running legacy client/server applications in lower application compatibility modes; these modes improve overall reliability by isolating each application instance in a separate memory location. If one instance crashes, the remaining instances and the server itself remain available and unaffected. Reliability for client/server-based applications written for Windows Server 2003 can be improved by deploying these applications on clusters. Windows Server 2003 Enterprise and Datacenter servers provide two different clustering technologies that enhance application reliability by providing server load balancing and failover capabilities.







Examining Windows Server 2003 Clustering Technologies

Windows Server 2003 provides two clustering technologies, which are included on the Enterprise and Datacenter server platforms. Clustering is the grouping of independent server nodes that are accessed and viewed on the network as a single system. When an application is run from a cluster, the end user can connect to a single cluster node to perform their work, or each request can be handled by multiple nodes in the cluster. In cases where data is read-only, the client may request data and receive the information from all the nodes in the cluster, improving overall performance and response time.

The first clustering technology Windows Server 2003 provides is Cluster Service, also known as Microsoft Cluster Service (MSCS). The Cluster Service provides system fault tolerance through a process called failover. When a system fails or is unable to respond to client requests, the clustered services are taken offline and moved from the failed server to another available server, where they are brought online and begin responding to existing and new connections and requests. Cluster Service is best used to provide fault tolerance for file, print, enterprise messaging, and database servers.

The second Windows Server 2003 clustering technology is network load balancing (NLB), which is best suited to provide fault tolerance for front-end Web applications and Web sites, Terminal servers, VPN servers, and streaming media servers. NLB provides fault tolerance by having each server in the cluster individually run the network services or applications, removing any single points of failure. Certain applications (for example, Terminal Services) require a client to connect to the same server during the entire session, while clients viewing Web sites can request pages from any node in the cluster during a visit. Configuring how client/server communication is divided and balanced across the servers is dependent on the application's needs.
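How requests are divided depends on whether the application needs a client to keep returning to the same node. The Python sketch below contrasts the two cases using a simple hash. It is not the actual NLB distribution algorithm; the pick_node function, the node names, and the client address are hypothetical.

```python
import hashlib


def pick_node(nodes, client_ip, client_port, sticky):
    """Choose a node for a connection.

    sticky=True  hashes only the client IP, so a client always lands on
                 the same node (suits session-bound applications such as
                 Terminal Services).
    sticky=False hashes IP and port, so separate connections from one
                 client can spread across nodes (suits stateless Web pages).
    """
    key = client_ip if sticky else f"{client_ip}:{client_port}"
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]


nodes = ["node1", "node2", "node3"]
sticky_choices = {pick_node(nodes, "10.0.0.5", port, sticky=True)
                  for port in range(50000, 50020)}
spread_choices = {pick_node(nodes, "10.0.0.5", port, sticky=False)
                  for port in range(50000, 50020)}
```

With sticky selection, every connection from the same client address maps to one node; without it, each connection is free to land on any node in the cluster.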



Note

Microsoft does not support running both MSCS and NLB on the same computer due to potential hardware sharing conflicts between the two technologies.



Reviewing Cluster Terminology

Before you can design and implement MSCS and NLB clusters, you must understand certain clustering terminology. The following list describes key terms associated with Windows Server 2003 clustering:

Cluster: A cluster is a group of independent servers that are accessed and viewed on the network as a single system.

Node: A node is an independent server that is a member of a cluster.

Cluster resource: A cluster resource is a network application or service defined and managed by the cluster application. Some examples of cluster resources are network names, IP addresses, logical disks, and file shares.

Cluster resource group: Cluster resources are contained within a cluster in a logical set called a cluster resource group, commonly referred to as a cluster group. Cluster groups are the units of failover within the cluster. When a cluster resource fails and cannot be restarted automatically, the entire cluster group is taken offline and failed over to another available cluster node.

Cluster virtual server: A cluster virtual server is a cluster resource group that contains a network name and IP address resource. Virtual server resources are accessed either by the domain name system (DNS) or NetBIOS name resolution or directly from the IP address. The name and IP address remain the same regardless of which cluster node the virtual server is running on.

Cluster heartbeat: The cluster heartbeat is the communication that is kept between individual cluster nodes and is used to determine node status. Typically, heartbeat communication between nodes must take no longer than 500 milliseconds, or the nodes may believe that there is a failure and commence cluster group failovers.

Cluster quorum disk: The cluster quorum disk maintains the definitive cluster configuration data. MSCS uses a quorum disk or disks and requires continuous access to the cluster configuration data contained within it. The quorum contains configuration data defining which server nodes actively participate in the cluster, what applications and services are defined in the cluster, and the current states of the resources and the individual nodes. This data is used to determine whether a particular resource group or groups need to be failed over to an available cluster node in the event of a failure on an active node. If a cluster node loses access to the quorum, the Cluster Service will fail on that node. In a typical MSCS cluster, the quorum resource is located on a shared storage device.

Local quorum resource: Like the quorum resource, the local quorum contains the cluster configuration data. Unlike the standard quorum device that is usually housed on a shared disk, the local quorum is kept on a node's local disk. The local quorum resource was created for single-node cluster configurations, commonly used for cluster application development and testing.

Majority Node Set (MNS) resource: The MNS resource is the quorum resource used for a Majority Node Set cluster. The MNS resource maintains consistent configuration data across all the nodes in the cluster. If the MNS quorum is lost, it can be recovered by "forcing the quorum" on a remaining cluster node. Refer to the Windows Server 2003 online help and look for the topic "Forcing the Quorum in a Majority Node Set Cluster."

Generic cluster resource: Generic cluster resources were created to define cluster-unaware applications within a cluster group. This gives the ability to fail the resource over to another node in the cluster when the active node fails. This resource is not monitored by the cluster application; therefore, application failure does not result in a restart or failover scenario. Generic cluster resources include the generic application, generic script, and generic service resources. For more information on these resources, refer to the Windows Server 2003 Help and Support tool and search for "generic cluster resources."

Cluster-aware application: A cluster-aware application provides a mechanism by which the Cluster Service can test the application availability to determine whether it is functioning as desired. When a cluster-aware application fails, the cluster can stop and restart the application as necessary on the same node and, if necessary, move it to another available node where it can be restarted.



Cluster-unaware application: A cluster-unaware application can run on a cluster, but the application itself is not monitored by the Cluster Service. This means that the cluster can fail over the application only in the event that another resource fails in the cluster group. If the application stops responding, the cluster is not aware and therefore cannot restart it. Keep in mind that there are other ways to manage cluster-unaware applications outside the cluster, and in some cases these approaches may be the only option. For more information on how to install and configure generic applications, refer to the Windows Server 2003 Help and Support and search for "generic application resource type."

Failover: Failover is the process of a cluster group moving from the current active node to another available node in the cluster. Failover occurs when a server becomes unavailable or when a resource in the cluster group fails and cannot recover within the failure threshold.

Failback: Failback is the process of a cluster group moving back to a preferred node after the preferred node resumes cluster membership. Failback must be configured within a cluster group for this to happen. The cluster group must have a preferred node defined and a failback threshold configured. A preferred node is the node you would like your cluster group to run on during regular cluster operation. When a group is failing back, the cluster is performing the same failover operation but is triggered by a server rejoining or resuming cluster operation instead of by a server or resource failure.
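The relationship between resources, cluster groups, and failover described in the list above can be sketched in Python as follows. This is a simplified model, not the Cluster Service's implementation: the ClusterGroup class, the threshold parameter, and the group and node names are hypothetical.

```python
class ClusterGroup:
    """Toy model of a cluster group as the unit of failover."""

    def __init__(self, name, nodes, threshold=3):
        self.name = name
        self.nodes = list(nodes)      # possible owners, in order
        self.owner = self.nodes[0]    # node currently hosting the group
        self.threshold = threshold    # restarts allowed before failover
        self.restarts = 0

    def resource_failed(self, online_nodes):
        """Handle a failure of any resource in the group.

        Restart in place until the threshold is reached, then move the
        whole group to another online node.
        """
        if self.restarts < self.threshold:
            self.restarts += 1
            return f"restart on {self.owner}"
        return self.fail_over(online_nodes)

    def fail_over(self, online_nodes):
        candidates = [n for n in self.nodes
                      if n in online_nodes and n != self.owner]
        if not candidates:
            return "group offline: no available node"
        self.owner = candidates[0]
        self.restarts = 0
        return f"failed over to {self.owner}"


group = ClusterGroup("FileShareGroup", ["NODE1", "NODE2"], threshold=2)
events = [group.resource_failed({"NODE1", "NODE2"}) for _ in range(3)]
```

In this model, the first two failures are handled by restarting on the current owner, and the third moves the entire group, which mirrors the idea that the group, not the individual resource, is what fails over.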



Note

Plan carefully when considering failback. For more information, refer to the "Configuring Failover and Failback" section later in this chapter.
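As a hedged illustration of how failback differs from failover, the Python sketch below decides which node should own a cluster group after a change in cluster membership. The next_owner function and its parameters are hypothetical, and the model omits details such as the failback threshold mentioned in the terminology list.

```python
def next_owner(current_owner, preferred_node, online_nodes, failback_enabled):
    """Return the node that should own the group after a membership change."""
    if current_owner not in online_nodes:
        # Failover: the current owner is gone, so pick a remaining node.
        remaining = sorted(online_nodes)
        return remaining[0] if remaining else None
    if (failback_enabled
            and preferred_node in online_nodes
            and current_owner != preferred_node):
        # Failback: the preferred node has rejoined the cluster.
        return preferred_node
    return current_owner


# NODE1 is the preferred node and fails; NODE2 takes over.
after_failure = next_owner("NODE1", "NODE1", {"NODE2"}, failback_enabled=True)
# NODE1 rejoins: the group moves back only if failback is configured.
after_rejoin = next_owner(after_failure, "NODE1", {"NODE1", "NODE2"}, True)
without_failback = next_owner(after_failure, "NODE1", {"NODE1", "NODE2"}, False)
```

The sketch shows why failback deserves planning: with it enabled, a node that rejoins the cluster triggers a second move of the group, which interrupts clients a second time.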



Active/Passive Clustering Mode

Active/passive clustering occurs when one node in the cluster provides clustered services while the other available node or nodes remain online but do not provide services or applications to end users. When the active node fails, the cluster groups previously running on that node are failed over to the passive node, causing the node's participation in the cluster to go from passive to active state to begin servicing client requests.

This configuration is usually implemented with database servers that provide access to data that is stored in only one location and is too large to replicate throughout the day. One advantage of active/passive mode is that if each node in the cluster has similar hardware specifications, there is no performance loss when a failover occurs. The only real disadvantage of this mode is that the passive node's hardware resources cannot be leveraged during regular daily cluster operation.



Note

Active/passive configurations are a great choice for keeping cluster administration and maintenance as low as possible. For example, the passive node can be used to test updates and other patches without directly impacting production. However, it is nonetheless important to test in an isolated lab environment or, at a minimum, after hours or during predefined maintenance windows.


