
# Chapter 5. Measuring Time and Complexity


IN THIS CHAPTER

The Marriage of Theory and Practice

System Influences

In previous chapters, you have seen how to identify parts of programs that are likely candidates for optimization. You have also seen that it is necessary to determine whether it is worth optimizing an existing algorithm, or whether a completely new algorithm is needed to achieve the performance goals. However, when you have a choice between a large number of algorithms, how do you decide which will be fastest? Do you implement all of them and do speed tests, or can something intelligent be said beforehand? Also, how exactly can you perform easy speed comparisons between different algorithms?

This chapter introduces you to the tools and background information you will need when optimizing your software. The concepts laid out here are used throughout this book to prove and demonstrate the presented theory. The subjects covered in this chapter are: a notation of algorithm complexity, a handy timing function to use in your code, system influences to look out for when performing timing tests, and tips to decrease the influence of the system.

The Marriage of Theory and Practice

This section introduces two techniques for providing efficiency information on algorithms. The first technique is a theoretical description of the complexity of an algorithm, the O(n) notation. It can be used to compare the merits of the underlying concepts of algorithms, without doing any implementation. The second technique introduced in this section is a timing function that can be used to time implemented code. It is used throughout this book to time suggested coding solutions.

Algorithm Complexity (the O Notation)

The O notation is used to compare the efficiency of different algorithms. This notation expresses the mathematical complexity of an algorithm. It tells you, for example, how the number of elements stored in a database influences the amount of time that sorting the database will take. So, by looking at the O notation of a sorting algorithm, you can determine the impact on sorting time when the number of elements increases. In the O notation, the letter O is followed by an expression between brackets. This expression contains the letter n, which denotes the number of elements on which the algorithm is to be unleashed. Here are some example complexities:

O(n)

O(n^2)

In the first example, O(n), the algorithm complexity is a direct function of the number of elements on which the algorithm will be unleashed. This means, for example, that if searching through a database of 1,000 elements takes one second, it will take two seconds to search through a database of 2,000 elements. It also means that three times the number of elements to search will take three times as long. An example of an O(n) algorithm is one that handles each element the same number of times during processing. Traversing a linked list or an array from beginning to end is an example of this.
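Such a linear traversal can be sketched as follows (a minimal illustration, not code from this book; the function name SumAll is hypothetical):

```cpp
#include <cstddef>
#include <vector>

// O(n): each element is visited exactly once, so doubling the number
// of elements doubles the amount of work.
long SumAll(const std::vector<long>& data)
{
    long sum = 0;
    for (std::size_t i = 0; i < data.size(); i++)
    {
        sum += data[i];
    }
    return sum;
}
```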

In the second example, O(n^2), the time taken by an algorithm increases with the square of the number of elements; twice as many elements to handle will take four times as long, and so on.

Of course, these notations can serve only as indications; the exact amount of time that an algorithm will take depends on many variables. Think of how close a base of information already is to being sorted. Also, some searching algorithms use starting assumptions; the closer these are to the actual truth, the faster searching will be. The O notation is therefore used to express general characteristics of algorithms: worst-case time, best-case time, average time.

So how can you use this notation? Think of a sorting algorithm: When sorting small bases of information, you will hardly notice any difference between the performance of different sorting algorithms. However, as data size increases, choosing the right sorting algorithm can easily mean the difference between waiting 500 seconds or 24 hours. This may sound extreme, but it can and does happen. The following table demonstrates this by comparing four different complexities found in sorting algorithms (for examples of algorithms, refer to Chapter 10):

| Elements | O(log2 n) | O(n)   | O(n log2 n) | O(n^2)      |
|----------|-----------|--------|-------------|-------------|
| 10       | 3         | 10     | 33          | 100         |
| 100      | 7         | 100    | 664         | 10,000      |
| 1,000    | 10        | 1,000  | 9,966       | 1,000,000   |
| 10,000   | 13        | 10,000 | 132,877     | 100,000,000 |
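The entries in this table follow directly from the complexity expressions, rounded to whole steps. A small sketch, not from the book, that reproduces them:

```cpp
#include <cmath>

// Worst-case step counts predicted by each complexity for n elements,
// rounded to the nearest whole step as in the table above.
long StepsLog2(long n)    { return std::lround(std::log2((double)n)); }
long StepsLinear(long n)  { return n; }
long StepsNLog2(long n)   { return std::lround(n * std::log2((double)n)); }
long StepsSquared(long n) { return n * n; }
```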

The first column of the previous table indicates the number of elements to sort. The remaining columns represent worst-case sorting times for algorithms with different complexities. Let's say the base time is one minute. This means that sorting 10,000 elements with an algorithm with an O(log2 n) complexity takes 13 minutes. Sorting the same number of elements with an algorithm with an O(n^2) complexity takes 100,000,000 minutes, which is over 190 years! That is something you probably do not want to wait for. This also means that choosing the right algorithm will improve performance far more than tweaking the wrong algorithm ever could. What you do when tweaking an algorithm is change its base time, but not the number of algorithmic steps it needs to take. In the example given by the previous table, this could mean optimizing the O(n^2) algorithm to perform sorting in 100,000,000 * 30 seconds instead of 100,000,000 * 60 seconds. This means the new implementation is twice as fast, but sadly, it still takes over 80 years to complete its task in the worst case, and it still does not compare to the 13 minutes of the O(log2 n) algorithm. What you are looking for is thus an algorithm with as low a complexity as possible.

You have already seen an example of an algorithm with O(n) complexity. But what about those other complexities? Algorithms with O(n^2) complexity are those which, depending on the number of elements, need to do some processing for all other elements. Think of adding an element to the back of a linked list. When you have no pointer to the last element, you need to traverse the list from beginning to end each time you add an element. The first element is added to an empty list, the fifth element traverses the first four, the tenth element traverses the first nine, and so on.
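A sketch of this situation (illustrative code, not from the book; Node, Append, and Length are hypothetical names):

```cpp
#include <cstddef>

// Minimal singly linked list node.
struct Node { long value; Node* next; };

// Without a tail pointer, every append walks the entire list first.
// Inserting n elements this way costs 0 + 1 + ... + (n-1) traversal
// steps in total, which is O(n^2).
void Append(Node*& head, long value)
{
    Node* node = new Node{value, nullptr};
    if (head == nullptr)
    {
        head = node;                // first element: empty list, no traversal
        return;
    }
    Node* p = head;
    while (p->next != nullptr)      // traverse from beginning to end
    {
        p = p->next;
    }
    p->next = node;
}

// Helper: number of elements in the list.
std::size_t Length(const Node* head)
{
    std::size_t n = 0;
    for (const Node* p = head; p != nullptr; p = p->next) n++;
    return n;
}
```

Keeping a pointer to the last element would turn each append into an O(1) step.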

Other complexities you are very likely to come across in software engineering are O(log2 n) and O(n log2 n). The first n in n log2 n is easy to explain; it simply means that something needs to be done for all elements. The log2 n part is created by algorithmic choices that continuously split in half the number of data elements that the algorithm is interested in. This kind of algorithm is used, for instance, in a game in which you have to guess a number that someone is thinking of by asking as few questions as possible. A first question could be, Is the number even or odd? When the numbers have a limited range, from 1 to 10 for instance, a second question could be, Is the number higher than 5? Each question halves the number of possible answers.
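Binary search over a sorted array works the same way; each comparison halves the remaining candidates, giving O(log2 n) behavior. A sketch, not code from this book:

```cpp
#include <vector>

// Returns the index of target in the sorted vector, or -1 if absent.
// Each comparison discards half of the remaining candidates: O(log2 n).
int BinarySearch(const std::vector<long>& sorted, long target)
{
    int low = 0;
    int high = (int)sorted.size() - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;
        if (sorted[mid] == target) return mid;
        if (sorted[mid] < target)
            low = mid + 1;          // discard the lower half
        else
            high = mid - 1;         // discard the upper half
    }
    return -1;
}
```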

Algorithms with O(1) complexity are those for which processing time is completely independent of the number of elements. A good example of this is accessing an array with an index: int a = array[4]; no matter how many elements are placed into this array, this operation will always take the same amount of time.

Final remarks on the O notation:

- Constants prove to be insignificant when comparing different O(n) expressions, and can be ignored; O(2n^2) and O(9n^2) are both considered to be O(n^2) compared to O(n), and so on.

- Linking algorithms of different complexities creates an algorithm with the highest of the linked complexities; when you incorporate a step of O(n^2) complexity into an algorithm with O(n) complexity, the resulting complexity is O(n^2).

- Nesting algorithms (in effect multiplying their impact) creates an algorithm with multiplied complexity; O(n) nested with O(log2 n) produces O(n log2 n).
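The nesting rule can be made concrete with a small counting sketch (illustrative, not from the book): an O(n) outer loop around an inner loop that halves its range each step performs roughly n * log2(n) units of work.

```cpp
// Counts the units of work done by an O(n) loop nested with an
// O(log2 n) loop; the total is n * floor(log2(n)), i.e. O(n log2 n).
long NestedWork(long n)
{
    long units = 0;
    for (long i = 0; i < n; i++)                   // O(n) outer loop
    {
        for (long span = n; span > 1; span /= 2)   // O(log2 n) inner loop
        {
            units++;
        }
    }
    return units;
}
```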

The Timing Function

This section introduces a timing function which is used throughout this book to time different solutions to performance problems. It shows how and why it can be used. The source code for the timing function can be found in the accompanying files BookTools.h and BookTools.cpp.

Consider the Mul/Shift example program in Listing 5.1:

Listing 5.1 Mul/Shift Example

```cpp
#include <iostream.h>
#include "booktools.h"

long MulOperator()
{
    long i, j = 1031;

    for (i = 0; i < 20000000; i++)
    {
        j *= 2;
    }
    return 0;
}

long MulShift()
{
    long i, j = 1031;

    for (i = 0; i < 20000000; i++)
    {
        j <<= 1;
    }
    return 0;
}

void main(void)
{
    cout << "Time in MulOperator: " << time_fn(MulOperator, 5) << endl;
    cout << "Time in shift operator: " << time_fn(MulShift, 5) << endl;
}
```

This Mul/Shift example program uses two different techniques for multiplying a variable (j) by two. The function MulOperator() uses the standard arithmetic operator *, whereas the function MulShift() uses a bitwise shift to the left, <<. The performance of both multiplication functions is timed with a function called time_fn(). This function is explained later in this chapter.

You will agree that the choice between writing j *= 2 or j <<= 1 has no real impact on code reliability, maintainability, or complexity. However, when you write a program that at some point needs to do this kind of multiplication on a large base of data, it is important to know beforehand which technique is fastest (or smallest). This is especially true when multiplication is used widely throughout the sources you write for a certain target system.

So how does this test help you? The result expected of the Mul/Shift example by anyone with a technical background would be that the logical shift is much faster than the multiplication, simply because this action is easier for (most) microprocessors to perform. However, when you run this test using Microsoft's Developer Studio on an x86-compatible processor, you will see that both techniques are equally fast. How is this possible? The answer becomes clear when you look at the assembly generated by the compiler (how to obtain assembly listings is explained in Chapter 4).

The following assembly snippets show the translations of the two multiplication techniques for Microsoft's Developer Studio on an x86-compatible processor.

```
// Code for the multiplication:
; 16 :     j *= 2;
    shl    DWORD PTR _j$[ebp], 1

// Code for the shift:
; 36 :     j <<= 1;
    shl    DWORD PTR _j$[ebp], 1
```

Without even knowing any assembly, it is easy to see why both multiplication functions are equally fast; apparently the compiler was smart enough to translate both multiplication commands into a bitwise shift to the left (shl). So, in this situation, taking time out to optimize your data multiplication routines to use shifts instead of multipliers would have been a complete waste of time. (Just for fun, look at what kind of assembly listing you get when you multiply by 3: j *= 3;) Similarly, if you had run the Mul/Shift example on a MIPS, you would have noticed that for this particular processor the multiplication is in fact faster than the shift operator. This is why it is important to find out specific target and compiler behavior before trying to optimize in these kinds of areas.

The Mul/Shift example introduced a second method for you to time your functions (the first method, using profiling as a means of generating timing information, was introduced in Chapter 4). The Mul/Shift example uses the function time_fn(). You can find the definition of this timing function in the file BookTools.h.

You can find the implementation of the timing function in Listing 5.2 and in the file BookTools.cpp.

The best way to use the timing function is to simply add both booktools files to the makefile of your software project.

Listing 5.2 The Timing Function

```cpp
unsigned long time_fn(long (*fn)(void), int nrSamples = 1)
{
    unsigned long average = 0;
    clock_t tBegin, tEnd;

    for (int i = 0; i < nrSamples; i++)
    {
        tBegin = clock();
        fn();
        tEnd = clock();
        average += tEnd - tBegin;
    }
    return ((unsigned long) average / nrSamples);
}
```

The time_fn() function receives two parameters: The first is a pointer to the function which it has to time; the second is the number of timing samples it will take. In the Mul/Shift example, you see that the time_fn() function is called first with a pointer to the multiplier function, and then again with a pointer to the shift function. In both cases, five samples are requested. The default number of samples is just one.

The actual timing part of the function is done via the clock() function. This function returns the number of processor timer ticks that have elapsed since the start of the process. By noting the clock value twice (once before executing the function which is to be timed, and once after it is finished) you can approximate quite nicely how much time was spent. The following section explains how to minimize external influences on this timing. Note that the overhead created by the for loops in the MulOperator() and MulShift() functions of the Mul/Shift example is also timed. This is of no consequence to the timing results, as you are interested only in the relation between the results of the two functions, and both functions contain the exact same overhead.

The behavior of the clock() function is not fully specified by the ANSI standard, so its usage can be slightly different per system. This is why several #ifdef compiler directives can be found in the booktools files. The example systems (Linux/UNIX and Windows 9x) used in this book are separated by the fact that the Developer Studio automatically creates a definition _MSC_VER. When using time_fn() for systems other than the example systems used in this book, you should consult the relevant manuals to check whether there are differences in the use of the clock() function.
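On compilers with support for standard C++, a portable alternative to time_fn() can be sketched with std::chrono, which sidesteps the per-system differences of clock(). This is an illustrative variant, not part of the BookTools files:

```cpp
#include <chrono>

// Same idea as time_fn(): run fn() nrSamples times and return the
// average duration, here in milliseconds, using the portable
// std::chrono::steady_clock instead of clock().
unsigned long time_fn_chrono(long (*fn)(void), int nrSamples = 1)
{
    using namespace std::chrono;
    unsigned long total = 0;

    for (int i = 0; i < nrSamples; i++)
    {
        steady_clock::time_point tBegin = steady_clock::now();
        fn();
        steady_clock::time_point tEnd = steady_clock::now();
        total += (unsigned long)duration_cast<milliseconds>(tEnd - tBegin).count();
    }
    return total / nrSamples;
}
```

steady_clock is monotonic, so it is not affected by system clock adjustments during the measurement.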

System Influences

When doing timing tests as proposed in the previous section, it is important to realize that you are in fact timing system behavior while you are running your code. This means that, because your piece of code is not the only thing the system is dealing with, other factors will influence your test results. The most intrusive factors are other programs running on the system. This influence can of course be minimized by running as few other programs on your system as possible while performing timing tests, and increasing the priority of the process you are timing as much as possible. Still, some influences will remain because they are inherent to the operating system. This section discusses the remaining system influences and how to deal with them.

Cache Misses

Cache is memory that is used as a buffer between two or more memory-using objects. The cache that is referred to in this section is the buffer between the CPU and the internal memory of a computer system. Most CPU architectures split this cache into two levels. Level 1 cache is located physically in the CPU itself, and level 2 cache is located outside, but close to, the CPU. Level 1 cache is usually referred to as l1 in technical literature; level 2 cache is usually referred to as l2. Figure 5.1 depicts the two-level cache concept.

Figure 5.1. CPU level 1 and level 2 cache.

The reasons for splitting the cache into two levels have to do with the different characteristics of the cache locations. In general, available space for cache inside a CPU is smaller than outside, with the added advantage that accessing cache inside the CPU is often faster. On most architectures, l2 has to be accessed with a clock frequency which is a fraction of the CPU's clock frequency. So, as a rule of thumb: l1 is smaller and faster than l2.

Usage of caching means that a CPU does not need to retrieve each unit of data from memory separately; rather, a block of data is copied from memory into cache in one action. The advantage of this is that part of the overhead of transferring data from memory to CPU (for example, finding an address and doing bus interaction) is incurred only once for a large number of memory addresses. The CPU uses this cached data until it needs data from a memory address that is not cached. At this point a cache miss is incurred and a new block of data needs to be transferred from memory to cache. Of course, data is not only read; it can be changed or overwritten as well. When a cache miss occurs, it is often necessary to transfer the cache back to internal memory first, to save the changes. Each time a cache miss is incurred, processing halts while the copy actions take place. It would, therefore, be unfortunate if a cache miss causes a block of cached data to be switched for a new block, after which another cache miss switches the new block back for the original block again. To minimize the chances of these kinds of scenarios occurring, further refinements of the caching concept are often introduced in system architecture, as the following two sections explain.

Using Cache Pages

With cache paging, the available cache resource is divided into blocks of a certain size. Each block is called a page. When a cache miss occurs, only one page (or a few pages) is actually overwritten with new data, and the rest of the pages stay untouched. In this scenario, at least the page in which the cache miss occurred has to be replaced. This does mean that for each page, a separate administration is needed to keep track of the memory addresses that the page represents and whether changes have been made to the data in the page.

Using Separate Data and Instruction Caches

With separate data and instruction caches, the available cache resource is split into two functional parts: one part for caching CPU instructions, which is a copy of part of the executable image of the active process; and another part for caching data, which is a copy of a subset of the data that is used by the active process.

These strategies are, of course, very generic, as they cannot take into account any specific characteristics of the software that will run on a certain system. Software designers and implementers can, however, design their software in such a way that cache misses are likely to be minimized. When software is designed to run on a specific system, even more extreme optimizations can be made by taking into account actual system characteristics, such as l1 and l2 cache sizes.
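One classic example of such a design choice (a sketch under general assumptions, not code taken from this chapter) is picking the traversal order of a two-dimensional array: visiting elements in the order they are laid out in memory lets each cached block be used completely before the next one is loaded.

```cpp
const int ROWS = 256;
const int COLS = 256;

// Row-major traversal: consecutive iterations touch consecutive
// memory addresses, so most accesses hit data already in the cache.
long SumRowMajor(const long (&m)[ROWS][COLS])
{
    long sum = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            sum += m[r][c];
    return sum;
}

// Column-major traversal of the same array: each step jumps a whole
// row ahead in memory, which can cause many more cache misses on
// arrays larger than the cache.
long SumColMajor(const long (&m)[ROWS][COLS])
{
    long sum = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            sum += m[r][c];
    return sum;
}
```

Both functions compute the same sum; only their memory access pattern, and therefore their cache behavior, differs.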

Different techniques for minimizing cache misses and page faults are presented in a later section titled "Techniques for Minimizing System Influences."

Page Faults
