Chapter 14.  Block Device Drivers

14.1. Block Devices Handling

Each operation on a block device driver involves a large number of kernel components; the most important ones are shown in Figure 14-1.

Let us suppose, for instance, that a process issued a read() system call on some disk file (we'll see that write requests are handled essentially in the same way). Here is what the kernel typically does to service the process request:

1. The service routine of the read() system call activates a suitable VFS function, passing to it a file descriptor and an offset inside the file. The Virtual Filesystem is the upper layer of the block device handling architecture, and it provides a common file model adopted by all filesystems supported by Linux. We have described at length the VFS layer in Chapter 12.

Figure 14-1. Kernel components affected by a block device operation

2. The VFS function determines if the requested data is already available and, if necessary, how to perform the read operation. Sometimes there is no need to access the data on disk, because the kernel keeps in RAM the data most recently read from or written to a block device. The disk cache mechanism is explained in Chapter 15, while details on how the VFS handles the disk operations and how it interfaces with the disk cache and the filesystems are given in Chapter 16.

3. Let's assume that the kernel must read the data from the block device, thus it must determine the physical location of that data. To do this, the kernel relies on the mapping layer, which typically executes two steps:



a. It determines the block size of the filesystem including the file and computes the extent of the requested data in terms of file block numbers. Essentially, the file is seen as split in many blocks, and the kernel determines the numbers (indices relative to the beginning of file) of the blocks containing the requested data.

b. Next, the mapping layer invokes a filesystem-specific function that accesses the file's disk inode and determines the position of the requested data on disk in terms of logical block numbers. Essentially, the disk is seen as split in blocks, and the kernel determines the numbers (indices relative to the beginning of the disk or partition) corresponding to the blocks storing the requested data. Because a file may be stored in nonadjacent blocks on disk, a data structure stored in the disk inode maps each file block number to a logical block number.[*]

[*] However, if the read access was done on a raw block device file, the mapping layer does not invoke a filesystem-specific method; rather, it translates the offset in the block device file to a position inside the disk or disk partition corresponding to the device file.



We will see the mapping layer in action in Chapter 16, while we will present some typical disk-based filesystems in Chapter 18.

4. The kernel can now issue the read operation on the block device. It makes use of the generic block layer, which starts the I/O operations that transfer the requested data. In general, each I/O operation involves a group of blocks that are adjacent on disk. Because the requested data is not necessarily adjacent on disk, the generic block layer might start several I/O operations. Each I/O operation is represented by a "block I/O" (in short, "bio") structure, which collects all information needed by the lower components to satisfy the request.



The generic block layer hides the peculiarities of each hardware block device, thus offering an abstract view of the block devices. Because almost all block devices are disks, the generic block layer also provides some general data structures that describe "disks" and "disk partitions." We will discuss the generic block layer and the bio structure in the section "The Generic Block Layer" later in this chapter.

5. Below the generic block layer, the "I/O scheduler" sorts the pending I/O data transfer requests according to predefined kernel policies. The purpose of the scheduler is to group requests of data that lie near each other on the physical medium. We will describe this component in the section "The I/O Scheduler" later in this chapter.

6. Finally, the block device drivers take care of the actual data transfer by sending suitable commands to the hardware interfaces of the disk controllers. We will explain the overall organization of a generic block device driver in the section "Block Device Drivers" later in this chapter.

As you can see, there are many kernel components that are concerned with data stored in block devices; each of them manages the disk data using chunks of different length:

The controllers of the hardware block devices transfer data in chunks of fixed length called "sectors." Therefore, the I/O scheduler and the block device drivers must manage sectors of data.

The Virtual Filesystem, the mapping layer, and the filesystems group the disk data in logical units called "blocks." A block corresponds to the minimal disk storage unit inside a filesystem.

As we will see shortly, block device drivers should be able to cope with "segments" of data: each segment is a memory page or a portion of a memory page including chunks of data that are physically adjacent on disk.

The disk caches work on "pages" of disk data, each of which fits in a page frame.

The generic block layer glues together all the upper and lower components, thus it knows about sectors, blocks, segments, and pages of data.
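To make the block-oriented view used by the mapping layer concrete, the following user-space sketch computes the file block numbers that cover a byte range, as in the first mapping step described earlier. The names block_range and file_block_range are hypothetical and are not kernel interfaces; the sketch assumes a nonzero block size and a request of at least one byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: first and last file block numbers
 * (indices relative to the beginning of the file) covering a request. */
struct block_range {
    uint64_t first;
    uint64_t last;
};

/* Compute the file blocks containing bytes [offset, offset + length).
 * Assumption: block_size is nonzero and length is at least 1. */
struct block_range file_block_range(uint64_t offset, uint64_t length,
                                    uint32_t block_size)
{
    struct block_range r;

    assert(block_size != 0 && length != 0);
    r.first = offset / block_size;
    r.last = (offset + length - 1) / block_size;
    return r;
}
```

For example, with 1,024-byte blocks, a 3,000-byte read starting at offset 5,000 touches file blocks 4 through 7.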

Even if there are many different chunks of data, they usually share the same physical RAM cells. For instance, Figure 14-2 shows the layout of a 4,096-byte page. The upper kernel components see the page as composed of four block buffers of 1,024 bytes each. The last three blocks of the page are being transferred by the block device driver, thus they are inserted in a segment covering the last 3,072 bytes of the page. The hard disk controller considers the segment as composed of six 512-byte sectors.



Figure 14-2. Typical layout of a page including disk data
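The arithmetic behind this layout can be checked with a short user-space sketch. The sizes are the ones used in the example above; the type seg_layout and the function segment_for_blocks are hypothetical names introduced only for illustration.

```c
#include <assert.h>

#define PAGE_BYTES   4096u  /* page size used in the example */
#define BLOCK_BYTES  1024u  /* block buffer size used in the example */
#define SECTOR_BYTES 512u   /* sector size seen by the disk controller */

/* Hypothetical description of a segment inside one page */
struct seg_layout {
    unsigned offset;   /* byte offset of the segment inside the page */
    unsigned length;   /* segment length in bytes */
    unsigned sectors;  /* number of 512-byte sectors in the segment */
};

/* Describe the segment formed by `count` consecutive block buffers,
 * starting at block buffer `first` (0-based) within the page. */
struct seg_layout segment_for_blocks(unsigned first, unsigned count)
{
    struct seg_layout s;

    assert(count > 0 && (first + count) * BLOCK_BYTES <= PAGE_BYTES);
    s.offset = first * BLOCK_BYTES;
    s.length = count * BLOCK_BYTES;
    s.sectors = s.length / SECTOR_BYTES;
    return s;
}
```

Calling segment_for_blocks(1, 3) describes the last three block buffers of the page: a segment starting at byte 1,024, 3,072 bytes long, made of six sectors.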



In this chapter we describe the lower kernel components that handle the block devices (generic block layer, I/O scheduler, and block device drivers), thus we focus our attention on sectors, blocks, and segments.



14.1.1. Sectors

To achieve acceptable performance, hard disks and similar devices transfer several adjacent bytes at once. Each data transfer operation for a block device acts on a group of adjacent bytes called a sector. In the following discussion, we say that groups of bytes are adjacent when they are recorded on the disk surface in such a manner that a single seek operation can access them. Although the physical geometry of a disk is usually very complicated, the hard disk controller accepts commands that refer to the disk as a large array of sectors.

In most disk devices, the size of a sector is 512 bytes, although there are devices that use larger sectors (1,024 and 2,048 bytes). Notice that the sector should be considered as the basic unit of data transfer; it is never possible to transfer less than one sector, although most disk devices are capable of transferring several adjacent sectors at once.

In Linux, the size of a sector is conventionally set to 512 bytes; if a block device uses larger sectors, the corresponding low-level block device driver will do the necessary conversions. Thus, a group of data stored in a block device is identified on disk by its position (the index of the first 512-byte sector) and its length as number of 512-byte sectors. Sector indices are stored in 32- or 64-bit variables of type sector_t.
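The kind of conversion a low-level driver might perform can be sketched in user space as follows. This is only an illustration under stated assumptions: the macro LINUX_SECTOR_SIZE, the type sector_extent, and the function to_hw_sectors are hypothetical names, and the sketch assumes the request is aligned to the hardware sector size.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional sector size; the macro name is local to this sketch */
#define LINUX_SECTOR_SIZE 512u

/* Hypothetical extent: start index and length, both in sectors */
struct sector_extent {
    uint64_t start;
    uint64_t count;
};

/* Convert an extent expressed in 512-byte sectors into hardware
 * sectors of hw_sector_size bytes (for instance 1,024 or 2,048).
 * Assumption: the extent is aligned to the hardware sector size. */
struct sector_extent to_hw_sectors(struct sector_extent e512,
                                   uint32_t hw_sector_size)
{
    struct sector_extent hw;
    uint32_t ratio;

    assert(hw_sector_size >= LINUX_SECTOR_SIZE);
    assert(hw_sector_size % LINUX_SECTOR_SIZE == 0);
    ratio = hw_sector_size / LINUX_SECTOR_SIZE;
    assert(e512.start % ratio == 0 && e512.count % ratio == 0);

    hw.start = e512.start / ratio;
    hw.count = e512.count / ratio;
    return hw;
}
```

With 2,048-byte hardware sectors, twelve 512-byte sectors starting at index 40 correspond to three hardware sectors starting at index 10.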



14.1.2. Blocks

While the sector is the basic unit of data transfer for the hardware devices, the block is the basic unit of data transfer for the VFS and, consequently, for the filesystems. For example, when the kernel accesses the contents of a file, it must first read from disk a block containing the disk inode of the file (see the section "Inode Objects" in Chapter 12). This block on disk corresponds to one or more adjacent sectors, which are looked at by the VFS as a single data unit.

In Linux, the block size must be a power of 2 and cannot be larger than a page frame. Moreover, it must be a multiple of the sector size, because each block must include an integral number of sectors. Therefore, on 80x86 architecture, the permitted block sizes are 512, 1,024, 2,048, and 4,096 bytes.
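The three constraints above translate directly into a small check. The following is a user-space sketch, not kernel code; block_size_is_valid is a hypothetical name, and the sector and page sizes are passed in as parameters rather than taken from any kernel header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if block_size satisfies the rules stated in the text:
 * a power of 2, a multiple of the sector size, and no larger than
 * a page frame. */
bool block_size_is_valid(uint32_t block_size, uint32_t sector_size,
                         uint32_t page_size)
{
    assert(sector_size != 0 && page_size != 0);

    if (block_size == 0)
        return false;
    if ((block_size & (block_size - 1)) != 0)  /* not a power of 2 */
        return false;
    if (block_size % sector_size != 0)         /* not whole sectors */
        return false;
    return block_size <= page_size;
}
```

With 512-byte sectors and 4,096-byte pages, the check accepts exactly 512, 1,024, 2,048, and 4,096, and rejects values such as 256, 1,536, or 8,192.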

The block size is not specific to a block device. When creating a disk-based filesystem, the administrator may select the proper block size. Thus, several partitions on the same disk might make use of different block sizes. Furthermore, each read or write operation issued on a block device file is a "raw" access that bypasses the disk-based filesystem; the kernel executes it by using blocks of largest size (4,096 bytes).

Each block requires its own block buffer, which is a RAM memory area used by the kernel to store the block's content. When the kernel reads a block from disk, it fills the corresponding block buffer with the values obtained from the hardware device; similarly, when the kernel writes a block on disk, it updates the corresponding group of adjacent bytes on the hardware device with the actual values of the associated block buffer. The size of a block buffer always matches the size of the corresponding block.

Each buffer has a "buffer head" descriptor of type buffer_head. This descriptor contains all the information needed by the kernel to know how to handle the buffer; thus, before operating on each buffer, the kernel checks its buffer head. We will give a detailed explanation of all fields of the buffer head in Chapter 15; in the present chapter, however, we will only consider a few fields: b_page, b_data, b_blocknr, and b_bdev.

The b_page field stores the page descriptor address of the page frame that includes the block buffer. If the page frame is in high memory, the b_data field stores the offset of the block buffer inside the page; otherwise, it stores the starting linear address of the block buffer itself. The b_blocknr field stores the logical block number (i.e., the index of the block inside the disk partition). Finally, the b_bdev field identifies the block device that is using the buffer head (see the section "Block Devices" later in this chapter).
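As a rough illustration of these four fields, here is a user-space mock. It is not the kernel's definition of buffer_head: the struct name mock_buffer_head, the field types, and the helper buffer_first_sector are assumptions made for this sketch, and the block size is passed as a parameter. The helper shows how a logical block number relates to 512-byte sector indices inside the partition.

```c
#include <assert.h>
#include <stdint.h>

struct mock_page;          /* stand-in for the page descriptor type */
struct mock_block_device;  /* stand-in for the block device descriptor */

/* Simplified mock holding only the fields discussed in the text;
 * the types are assumptions, not the kernel's. */
struct mock_buffer_head {
    struct mock_page *b_page;          /* page frame holding the buffer */
    char *b_data;                      /* address, or offset if high memory */
    uint64_t b_blocknr;                /* logical block number */
    struct mock_block_device *b_bdev;  /* device using this buffer head */
};

/* Index of the first 512-byte sector of the block, relative to the
 * beginning of the partition. Assumption: block_size is a multiple
 * of 512, as required for any valid block size. */
uint64_t buffer_first_sector(const struct mock_buffer_head *bh,
                             uint32_t block_size)
{
    assert(bh != 0 && block_size >= 512 && block_size % 512 == 0);
    return bh->b_blocknr * (block_size / 512);
}
```

For instance, logical block 10 of a filesystem with 4,096-byte blocks begins at 512-byte sector 80 of its partition.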



14.1.3. Segments

We know that each disk I/O operation consists of transferring the contents of some adjacent sectors from or to some RAM locations. In almost all cases, the data transfer is directly performed by the disk controller with a DMA operation (see the section "Direct Memory Access (DMA)" in Chapter 13). The block device driver simply triggers the data transfer by sending suitable commands to the disk controller; once the data transfer is finished, the controller raises an interrupt to notify the block device driver.

The data transferred by a single DMA operation must belong to sectors that are adjacent on disk. This is a physical constraint: a disk controller that allows DMA transfers to non-adjacent sectors would have a poor transfer rate, because moving a read/write head on the disk surface is quite a slow operation.
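One consequence of this adjacency constraint is that a request touching scattered disk locations has to be split into several transfers. The following user-space sketch counts how many runs of consecutive logical block numbers a request contains; each run could be served by one transfer, while a gap forces a new one. The function count_adjacent_runs is a hypothetical helper, and the sketch assumes the block numbers are given in request order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the runs of consecutive block numbers in blocks[0..n-1].
 * A new run starts at the first element and whenever a block is
 * not the immediate successor of the previous one. */
size_t count_adjacent_runs(const uint64_t *blocks, size_t n)
{
    size_t runs = 0;

    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || blocks[i] != blocks[i - 1] + 1)
            runs++;
    }
    return runs;
}
```

For the block list 100, 101, 102, 200, 201, 350 the function reports three runs, so at least three separate transfers would be needed.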

Older disk controllers support "simple" DMA operations only: in each such operation, data is transferred from or to memory cells that are physically contiguous in RAM. Recent disk controllers, however, may also support the so-called scatter-gather DMA transfers: in each such operation, the data can be transferred from or to several noncontiguous memory areas.

For each scatter-gather DMA transfer, the block device driver must send to the disk controller:

The initial disk sector number and the total number of sectors to be transferred

A list of descriptors of memory areas, each of which consists of an address and a length.

The disk controller takes care of the whole data transfer; for instance, in a read operation the controller fetches the data from the adjacent disk sectors and scatters it into the various memory areas.
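The information sent to the controller, and the scattering it performs on a read, can be mimicked in user space. This is a simulation under stated assumptions, not a driver interface: sg_area, sg_request, and sg_scatter are hypothetical names, the "disk data" is an ordinary byte array, and memcpy stands in for the DMA engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One memory area descriptor: an address and a length */
struct sg_area {
    unsigned char *addr;
    size_t len;
};

/* What the driver hands to the controller for one transfer */
struct sg_request {
    uint64_t first_sector;        /* initial disk sector number */
    uint32_t nr_sectors;          /* total number of 512-byte sectors */
    const struct sg_area *areas;  /* list of memory area descriptors */
    size_t nr_areas;
};

/* Simulate the controller side of a read: disk_data holds the
 * contents of the adjacent sectors, which are copied in order into
 * the memory areas. Returns the number of bytes stored. */
size_t sg_scatter(const struct sg_request *req,
                  const unsigned char *disk_data)
{
    size_t total, done = 0;

    assert(req != NULL && disk_data != NULL);
    total = (size_t)req->nr_sectors * 512;
    for (size_t i = 0; i < req->nr_areas && done < total; i++) {
        size_t chunk = req->areas[i].len;

        if (chunk > total - done)
            chunk = total - done;  /* do not overrun the transfer */
        memcpy(req->areas[i].addr, disk_data + done, chunk);
        done += chunk;
    }
    return done;
}
```

A two-sector read scattered into a 256-byte area and a 768-byte area fills the first area with bytes 0 to 255 of the disk data and the second with bytes 256 to 1,023.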

To make use of scatter-gather DMA operations, block device drivers must handle the data in units called segments. A segment is simply a memory page or a portion of a memory page that includes the data of some adjacent disk sectors. Thus, a scatter-gather DMA operation may involve several segments at once.

Notice that a block device driver does not need to know about blocks, block sizes, and block buffers. Thus, even if a segment is seen by the higher levels as a page composed of several block buffers, the block device driver does not care about it.

As we'll see, the generic block layer can merge different segments if the corresponding page frames happen to be contiguous in RAM and the corresponding chunks of disk data are adjacent on disk. The larger memory area resulting from this merge operation is called physical segment.

Yet another merge operation is allowed on architectures that handle the mapping between bus addresses and physical addresses through a dedicated bus circuitry (the IO-MMU; see the section "Direct Memory Access (DMA)" in Chapter 13). The memory area resulting from this kind of merge operation is called hardware segment. Because we will focus on the 80x86 architecture, which has no such dynamic mapping between bus addresses and physical addresses, we will assume in the rest of this chapter that hardware segments always coincide with physical segments.
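The two conditions for merging segments into a physical segment (contiguous in RAM and adjacent on disk) can be written as a simple predicate. This is a user-space sketch, not the generic block layer's code; mock_segment and segments_can_merge are hypothetical names, and the sketch assumes segment lengths are whole numbers of 512-byte sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical segment description */
struct mock_segment {
    uint64_t phys_addr;  /* physical address of the first byte in RAM */
    uint32_t len;        /* length in bytes, a multiple of 512 */
    uint64_t sector;     /* first 512-byte sector on disk */
};

/* Return true if b can be appended to a to form one larger area:
 * b must start where a ends, both in RAM and on disk. */
bool segments_can_merge(const struct mock_segment *a,
                        const struct mock_segment *b)
{
    assert(a != 0 && b != 0);
    assert(a->len % 512 == 0);

    return a->phys_addr + a->len == b->phys_addr &&
           a->sector + a->len / 512 == b->sector;
}
```

Two 4,096-byte segments at physical addresses 0x1000 and 0x2000 that map to sectors 100 and 108 can be merged; if either the RAM addresses or the disk sectors leave a gap, they cannot.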







14.2. The Generic Block Layer

The generic block layer is a kernel component that handles the requests for all block devices in the system. Thanks to its functions, the kernel may easily:

Put data buffers in high memory (the page frame(s) will be mapped in the kernel linear address space only when the CPU must access the data, and will be unmapped right after).

Implement, with some additional effort, a "zero-copy" schema, where disk data is directly put in the User Mode address space without being copied to kernel memory first; essentially, the buffer used by the kernel for the I/O transfer lies in a page frame mapped in the User Mode linear address space of a process.

Manage logical volumes such as those used by LVM (the Logical Volume Manager) and RAID (Redundant Array of Inexpensive Disks): several disk partitions, even on different block devices, can be seen as a single partition.

Exploit the advanced features of the most recent disk controllers, such as large onboard disk caches, enhanced DMA capabilities, onboard scheduling of the I/O transfer requests, and so on.



14.2.1. The Bio Structure

The core data structure of the generic block layer is a descriptor of an ongoing I/O block device operation called bio. Each bio essentially includes an identifier for a disk storage area: the initial sector number and the number of sectors included in the


