Chapter 4. Enterprise Technologies and Big Data Business Intelligence


held within the operational-level information systems of an organization. Moreover,
database structure is leveraged with queries to generate information. Higher up the
analytic food chain are analytical processing systems. These systems leverage multidimensional
structures to answer more complex queries and provide deeper insight into
business operations. On a larger scale, data is collected from throughout the enterprise and
warehoused in a data warehouse. It is from these data stores that management gains
insight into broader corporate performance and KPIs.
This chapter covers the following topics:
• Online Transaction Processing (OLTP)
• Online Analytical Processing (OLAP)
• Extract Transform Load (ETL)
• Data Warehouses
• Data Marts
• Traditional BI
• Big Data BI

Online Transaction Processing (OLTP)
OLTP is a software system that processes transaction-oriented data. The term "online
transaction" refers to the completion of an activity in real time rather than through batch processing.
OLTP systems store operational data that is normalized. This data is a common source of
structured data and serves as input to many analytic processes. Big Data analysis results
can be used to augment OLTP data stored in the underlying relational databases. OLTP
systems, for example a point-of-sale system, execute business processes in support of
corporate operations. As shown in Figure 4.1, they perform transactions against a
relational database.

Figure 4.1 OLTP systems perform simple database operations to provide sub-second
response times.
The queries supported by OLTP systems consist of simple insert, delete and update
operations with sub-second response times. Examples include ticket reservation systems,
banking systems and point-of-sale systems.
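To make the "simple insert, delete and update operations" concrete, here is a minimal Python sketch using the standard-library `sqlite3` module as a stand-in for the relational database behind an OLTP system. The table and column names (`sale`, `item`, `qty`) are illustrative assumptions, not part of any system described in this chapter. Each short operation runs as its own transaction.

```python
import sqlite3


def run_oltp_demo() -> list:
    """Run a few short OLTP-style transactions against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical point-of-sale table.
    conn.execute(
        "CREATE TABLE sale (id INTEGER PRIMARY KEY, item TEXT NOT NULL, qty INTEGER NOT NULL)"
    )
    # Each `with conn:` block is one transaction: committed on success, rolled back on error.
    with conn:
        conn.execute("INSERT INTO sale (item, qty) VALUES (?, ?)", ("ticket", 2))
        conn.execute("INSERT INTO sale (item, qty) VALUES (?, ?)", ("coffee", 1))
    with conn:
        conn.execute("UPDATE sale SET qty = ? WHERE item = ?", (3, "ticket"))
    with conn:
        conn.execute("DELETE FROM sale WHERE item = ?", ("coffee",))
    rows = conn.execute("SELECT item, qty FROM sale ORDER BY id").fetchall()
    conn.close()
    return rows


print(run_oltp_demo())  # [('ticket', 3)]
```

Each statement touches a single row by key, which is what keeps OLTP response times short.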

Online Analytical Processing (OLAP)
Online analytical processing (OLAP) systems are used for processing data analysis
queries. OLAP systems form an integral part of business intelligence, data mining and machine
learning processes. They are relevant to Big Data in that they can serve both as a data
source and as a data sink that is capable of receiving data. They are used in diagnostic,
predictive and prescriptive analytics. As shown in Figure 4.2, OLAP systems perform
long-running, complex queries against a multidimensional database whose structure is
optimized for performing advanced analytics.

Figure 4.2 OLAP systems use multidimensional databases.
OLAP systems store historical data that is aggregated and denormalized to support fast
reporting capability. They further use databases that store historical data in
multidimensional structures and can answer complex queries based on the relationships
between multiple aspects of the data.
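The idea of precomputing aggregates over several "aspects" (dimensions) of denormalized historical data can be sketched in plain Python. This is a toy illustration, assuming a hypothetical sales history with `region`, `product` and `year` dimensions; real OLAP engines use far more efficient multidimensional storage. The sketch precomputes a sum for every subset of dimensions, which is what lets questions such as "sales by region" or "sales by product and year" be answered by lookup instead of by rescanning the rows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical denormalized history: each row repeats its dimension values.
SALES = [
    {"region": "North", "product": "Auto", "year": 2012, "amount": 100},
    {"region": "North", "product": "Home", "year": 2012, "amount": 50},
    {"region": "South", "product": "Auto", "year": 2013, "amount": 70},
    {"region": "South", "product": "Home", "year": 2013, "amount": 30},
]
DIMENSIONS = ("region", "product", "year")


def aggregate(rows, dims):
    """Sum `amount` grouped by the given tuple of dimension names."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return dict(totals)


def build_cube(rows):
    """Precompute aggregates for every subset of dimensions (a tiny 'cube')."""
    cube = {}
    for n in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, n):
            cube[dims] = aggregate(rows, dims)
    return cube


cube = build_cube(SALES)
print(cube[("region",)])  # {('North',): 150, ('South',): 100}
print(cube[()])           # {(): 250}
```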

Extract Transform Load (ETL)
Extract Transform Load (ETL) is a process of loading data from a source system into a
target system. The source system can be a database, a flat file, or an application. Similarly,
the target system can be a database or some other storage system.
ETL represents the main operation through which data warehouses are fed data. A Big
Data solution encompasses the ETL feature set for converting data of different types.
Figure 4.3 shows that the required data is first obtained or extracted from the sources, after
which the extracts are modified or transformed by the application of rules. Finally, the
data is inserted or loaded into the target system.

Figure 4.3 An ETL process can extract data from multiple sources and transform it for
loading into a single target system.
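The three ETL stages can be sketched as three small Python functions. This is a minimal illustration, assuming a hypothetical CSV flat file as the source, an in-memory SQLite table as the target, and a few made-up transformation rules (trim and title-case names, upper-case country codes, reject rows with no name).

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract from an operational system.
SOURCE_CSV = """customer_id,name,country
1, alice ,us
2,Bob,UK
3,,de
"""


def extract(text):
    """Extract: read raw rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply rules (clean names, normalize country, drop unnamed rows)."""
    out = []
    for r in rows:
        name = r["name"].strip().title()
        if not name:
            continue  # rule: reject records without a name
        out.append((int(r["customer_id"]), name, r["country"].strip().upper()))
    return out


def load(rows, conn):
    """Load: insert the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    with conn:
        conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT * FROM customer ORDER BY id").fetchall())
# [(1, 'Alice', 'US'), (2, 'Bob', 'UK')]
```

Keeping the stages as separate functions mirrors Figure 4.3: extracts from several sources could be concatenated before `transform` and still be loaded into a single target.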

Data Warehouses
A data warehouse is a central, enterprise-wide repository consisting of historical and
current data. Data warehouses are heavily used by BI to run various analytical queries, and
they usually interface with an OLAP system to support multi-dimensional analytical
queries, as shown in Figure 4.4.

Figure 4.4 Batch jobs periodically load data into a data warehouse from operational
systems like ERP, CRM and SCM.
Data pertaining to multiple business entities from different operational systems is
periodically extracted, validated, transformed and consolidated into a single denormalized
database. With periodic data imports from across the enterprise, the amount of data
contained in a given data warehouse will continue to increase. Over time this leads to
slower query response times for data analysis tasks. To resolve this shortcoming, data
warehouses usually contain optimized databases, called analytical databases, to handle
reporting and data analysis tasks. An analytical database can exist as a separate DBMS, as
in the case of an OLAP database.

Data Marts
A data mart is a subset of the data stored in a data warehouse that typically belongs to a
department, division, or specific line of business. Data warehouses can have multiple data
marts. As shown in Figure 4.5, enterprise-wide data is collected and business entities are
then extracted. Domain-specific entities are persisted into the data warehouse via an ETL
process.

Figure 4.5 A data warehouse's single version of "truth" is based on cleansed data,
which is a prerequisite for accurate and error-free reports, as per the output shown on
the right.
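Because a data mart is a department-specific subset of the warehouse, it can be derived with a single filtered `CREATE TABLE ... AS SELECT`. The minimal sketch below uses SQLite; the `warehouse_fact` and `sales_mart` tables and the `department` column are hypothetical, chosen only to show the subset relationship.

```python
import sqlite3


def build_sales_mart(conn):
    """Create a sales data mart as a department-specific subset of the warehouse."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS sales_mart")
        # Only the rows and columns the sales department needs are copied over.
        conn.execute(
            "CREATE TABLE sales_mart AS "
            "SELECT policy_id, region, premium FROM warehouse_fact WHERE department = 'sales'"
        )


conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table holding data for several departments.
conn.execute(
    "CREATE TABLE warehouse_fact (policy_id INTEGER, department TEXT, region TEXT, premium REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_fact VALUES (?, ?, ?, ?)",
    [(1, "sales", "North", 500.0), (2, "claims", "North", 0.0), (3, "sales", "South", 300.0)],
)
build_sales_mart(conn)
print(conn.execute("SELECT * FROM sales_mart ORDER BY policy_id").fetchall())
# [(1, 'North', 500.0), (3, 'South', 300.0)]
```

Rebuilding the mart after each warehouse load keeps it consistent with the warehouse's single version of "truth".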

Traditional BI
Traditional BI primarily utilizes descriptive and diagnostic analytics to provide
information on historical and current events. It is not "intelligent" because it only provides
answers to correctly formulated questions. Correctly formulating questions requires an
understanding of business problems and issues and of the data itself. BI reports on
different KPIs through:
• ad-hoc reports
• dashboards

Ad-hoc Reports
Ad-hoc reporting is a process that involves manually processing data to produce custom-made
reports, as shown in Figure 4.6. The focus of an ad-hoc report is usually on a
specific area of the business, such as its marketing or supply chain management. The
generated custom reports are detailed and often tabular in nature.

Figure 4.6 OLAP and OLTP data sources can be used by BI tools for both ad-hoc
reporting and dashboards.

Dashboards
Dashboards provide a holistic view of key business areas. The information displayed on
dashboards is generated at periodic intervals in real time or near-real time. The presentation
of data on dashboards is graphical in nature, using bar charts, pie charts and gauges, as
shown in Figure 4.7.

Figure 4.7 BI tools use both OLAP and OLTP to display the information on
dashboards.
As previously explained, data warehouses and data marts contain consolidated and
validated information about enterprise-wide business entities. Traditional BI cannot
function effectively without data marts because they contain the optimized and segregated
data that BI requires for reporting purposes. Without data marts, data needs to be extracted
from the data warehouse via an ETL process on an ad-hoc basis whenever a query needs
to be run. This increases the time and effort to execute queries and generate reports.
Traditional BI uses data warehouses and data marts for reporting and data analysis
because they allow complex data analysis queries with multiple joins and aggregations to
be issued, as shown in Figure 4.8.

Figure 4.8 An example of traditional BI.

Big Data BI
Big Data BI builds upon traditional BI by acting on the cleansed, consolidated enterprise-wide
data in the data warehouse and combining it with semi-structured and unstructured
data sources. It comprises both predictive and prescriptive analytics to facilitate the
development of an enterprise-wide understanding of business performance.
While traditional BI analyses generally focus on individual business processes, Big Data
BI analyses focus on multiple business processes simultaneously. This helps reveal
patterns and anomalies across a broader scope within the enterprise. It also leads to data
discovery by identifying insights and information that may have been previously absent or
unknown.

Big Data BI requires the analysis of unstructured, semi-structured and structured data
residing in the enterprise data warehouse. This requires a "next-generation" data
warehouse that uses new features and technologies to store cleansed data originating from
a variety of sources in a single uniform data format. The coupling of a traditional data
warehouse with these new technologies results in a hybrid data warehouse. This
warehouse acts as a uniform and central repository of structured, semi-structured and
unstructured data that can provide Big Data BI tools with all of the required data. This
eliminates the need for Big Data BI tools to connect to multiple data sources to
retrieve or access data. In Figure 4.9, a next-generation data warehouse establishes a
standardized data access layer across a range of data sources.

Figure 4.9 A next-generation data warehouse.
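The idea of storing data from structured, semi-structured and unstructured sources "in a single uniform data format" behind one access layer can be sketched as a set of small adapters. This is a toy illustration under several assumptions: the `UniformRecord` type, the source labels and the "`<customer_id>:` prefix" convention for free-text notes are all hypothetical, and a real hybrid warehouse would rely on dedicated storage and integration technology rather than Python lists.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class UniformRecord:
    """Hypothetical single uniform format held by the hybrid warehouse."""
    source: str
    customer_id: str
    attributes: dict = field(default_factory=dict)


def from_structured(csv_text):
    """Adapter for structured rows (e.g. a CRM export)."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield UniformRecord("crm", row.pop("customer_id"), dict(row))


def from_semi_structured(json_text):
    """Adapter for semi-structured JSON documents (e.g. social media posts)."""
    for doc in json.loads(json_text):
        yield UniformRecord("social", str(doc["user"]["customer_id"]), {"post": doc["text"]})


def from_unstructured(notes):
    """Adapter for free text; assumes each note starts with '<customer_id>:'."""
    for note in notes:
        cid, _, body = note.partition(":")
        yield UniformRecord("call_center", cid.strip(), {"note": body.strip()})


def query(warehouse, customer_id):
    """Single access layer: BI tools query here instead of each source."""
    return [r for r in warehouse if r.customer_id == customer_id]


warehouse = [
    *from_structured("customer_id,name\n7,Ann Lee\n8,Raj Rao\n"),
    *from_semi_structured('[{"user": {"customer_id": 7}, "text": "Thinking of switching insurer"}]'),
    *from_unstructured(["7: asked about cancellation fees"]),
]
print([r.source for r in query(warehouse, "7")])  # ['crm', 'social', 'call_center']
```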

Traditional Data Visualization
Data visualization is a technique whereby analytical results are graphically communicated
using elements like charts, maps, data grids, infographics and alerts. Graphically
representing data can make it easier to understand reports, view trends and identify
patterns.
Traditional data visualization provides mostly static charts and graphs in reports and
dashboards, whereas contemporary data visualization tools are interactive and can provide
both summarized and detailed views of data. They are designed to help people who lack
statistical and/or mathematical skills to better understand analytical results without having
to resort to spreadsheets.
Traditional data visualization tools query data from relational databases, OLAP systems,
data warehouses and spreadsheets to present both descriptive and diagnostic analytics
results.

Data Visualization for Big Data
Big Data solutions require data visualization tools that can seamlessly connect to
structured, semi-structured and unstructured data sources and are further capable of
handling millions of data records. Data visualization tools for Big Data solutions generally
use in-memory analytical technologies that reduce the latency normally attributed to
traditional, disk-based data visualization tools.
Advanced data visualization tools for Big Data solutions incorporate predictive and
prescriptive data analytics and data transformation features. These tools eliminate the need
for data pre-processing methods, such as ETL. The tools also provide the ability to
directly connect to structured, semi-structured and unstructured data sources. As part of
Big Data solutions, advanced data visualization tools can join structured and unstructured
data that is kept in memory for fast data access. Queries and statistical formulas can then
be applied as part of various data analysis tasks for viewing data in a user-friendly format,
such as on a dashboard.
Common features of visualization tools used in Big Data:
• Aggregation – provides a holistic and summarized view of data across multiple
contexts
• Drill-down – enables a detailed view of the data of interest by focusing in on a data
subset from the summarized view
• Filtering – helps focus on a particular set of data by filtering away the data that is
not of immediate interest
• Roll-up – groups data across multiple categories to show subtotals and totals
• What-if analysis – allows related factors to be changed dynamically so that multiple
outcomes can be visualized.
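The five features listed above can be illustrated with plain Python over a tiny, hypothetical sales table (region, quarter, month, sales). This sketch shows only the data operations a visualization tool would perform before charting; it assumes nothing about any particular tool's API, and the what-if function simply applies a single uniform growth factor as an example of a "related factor".

```python
from collections import defaultdict

# Hypothetical sample rows: (region, quarter, month, sales)
ROWS = [
    ("North", "Q1", "Jan", 10), ("North", "Q1", "Feb", 12), ("North", "Q2", "Apr", 8),
    ("South", "Q1", "Jan", 5), ("South", "Q2", "May", 9),
]


def aggregate(rows):
    """Aggregation: one summarized figure per region."""
    totals = defaultdict(int)
    for region, _, _, sales in rows:
        totals[region] += sales
    return dict(totals)


def filter_rows(rows, region):
    """Filtering: keep only the rows of immediate interest."""
    return [r for r in rows if r[0] == region]


def roll_up(rows):
    """Roll-up: subtotals per quarter plus a grand total."""
    subtotals = defaultdict(int)
    for _, quarter, _, sales in rows:
        subtotals[quarter] += sales
    return dict(subtotals), sum(subtotals.values())


def drill_down(rows, quarter):
    """Drill-down: the monthly detail behind one quarterly figure."""
    detail = defaultdict(int)
    for _, q, month, sales in rows:
        if q == quarter:
            detail[month] += sales
    return dict(detail)


def what_if(rows, growth):
    """What-if: total sales if every figure changed by the factor `growth`."""
    return round(sum(r[3] for r in rows) * (1 + growth), 2)


print(aggregate(ROWS))                      # {'North': 30, 'South': 14}
print(roll_up(ROWS))                        # ({'Q1': 27, 'Q2': 17}, 44)
print(drill_down(ROWS, "Q1"))               # {'Jan': 15, 'Feb': 12}
print(roll_up(filter_rows(ROWS, "North")))  # ({'Q1': 22, 'Q2': 8}, 30)
print(what_if(ROWS, 0.10))                  # 48.4
```

The features compose: filtering followed by roll-up gives one region's subtotals, which is the kind of interaction a dashboard exposes through clicks instead of code.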
Case Study Example

Enterprise Technology
ETI employs OLTP in almost every business function. Its policy quotation, policy
administration, claims management, billing, enterprise resource planning (ERP) and
customer relationship management (CRM) systems are all OLTP-based. An
example of ETI's employment of OLTP occurs whenever a new claim is submitted,
since this results in the creation of a new record in the claim table found
within the relational database used by the claims management system. Similarly, as
the claim gets processed by the claim adjuster, its status changes from submitted to
assigned, from assigned to processing and finally to processed through simple
database update operations.
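The claim lifecycle described above (submitted, assigned, processing, processed) maps naturally onto one insert followed by a series of single-row updates. The following is a minimal sketch with SQLite, assuming a hypothetical `claim` table and helper names; it is not ETI's actual schema. The update is guarded on the previous status so that two adjusters cannot silently overwrite each other's change.

```python
import sqlite3

# Allowed status transitions, following the lifecycle described for ETI's claims.
NEXT_STATUS = {"submitted": "assigned", "assigned": "processing", "processing": "processed"}


def submit_claim(conn, policy_no):
    """Insert a new claim record (hypothetical schema)."""
    with conn:
        cur = conn.execute(
            "INSERT INTO claim (policy_no, status) VALUES (?, 'submitted')", (policy_no,)
        )
    return cur.lastrowid


def advance_claim(conn, claim_id):
    """Move a claim one step along its lifecycle with a simple UPDATE."""
    row = conn.execute("SELECT status FROM claim WHERE id = ?", (claim_id,)).fetchone()
    if row is None:
        raise KeyError(claim_id)
    current = row[0]
    if current not in NEXT_STATUS:
        raise ValueError(f"claim {claim_id} is already {current!r}")
    new = NEXT_STATUS[current]
    with conn:
        # Guard on the old status so a concurrent change is not overwritten.
        cur = conn.execute(
            "UPDATE claim SET status = ? WHERE id = ? AND status = ?", (new, claim_id, current)
        )
    if cur.rowcount != 1:
        raise RuntimeError("claim status changed concurrently")
    return new


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim (id INTEGER PRIMARY KEY, policy_no TEXT, status TEXT)")
claim_id = submit_claim(conn, "POL-001")
print([advance_claim(conn, claim_id) for _ in range(3)])
# ['assigned', 'processing', 'processed']
```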
The EDW is populated weekly via multiple ETL operations that involve extracting
data from tables in the relational databases used by operational systems, validating
and transforming the data and loading it into the EDW's database. Data extracted
from the operational systems is in a flat file format that is first imported into a
staging database, where it is transformed by the execution of various scripts. One
ETL process that deals with customer data involves the application of several data
validation rules, one of which is to confirm that each customer has both the first name and
surname fields populated with meaningful characters. Also, as part of the same ETL
process, the first two lines of the address are joined together.
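The two customer-data rules mentioned above, validating the name fields and joining the first two address lines, could be expressed as a small transformation step. This is a minimal sketch; the field names are hypothetical, and "meaningful characters" is interpreted here (as an assumption) as "contains at least one letter". The separator used to join the address lines (", ") is likewise an assumption.

```python
def has_meaningful_text(value):
    """Assumed rule: the value is present and contains at least one letter."""
    return value is not None and any(ch.isalpha() for ch in value)


def transform_customer(record):
    """Return a cleaned copy of `record`, or None if it fails validation."""
    # Validation rule: both first name and surname must be meaningful.
    if not (has_meaningful_text(record.get("first_name"))
            and has_meaningful_text(record.get("surname"))):
        return None
    cleaned = dict(record)
    # Transformation rule: join the first two address lines into one field.
    parts = [cleaned.pop("address_line1", "").strip(), cleaned.pop("address_line2", "").strip()]
    cleaned["address"] = ", ".join(p for p in parts if p)
    return cleaned


print(transform_customer({"first_name": "Ana", "surname": "Silva",
                          "address_line1": "12 High St", "address_line2": "Flat 3"}))
# {'first_name': 'Ana', 'surname': 'Silva', 'address': '12 High St, Flat 3'}
print(transform_customer({"first_name": "???", "surname": "Silva",
                          "address_line1": "", "address_line2": ""}))
# None
```

Rejected records would typically be routed to an exception table in the staging database for review rather than discarded.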
The EDW includes an OLAP system where data is kept in the form of cubes that
enable the execution of various reporting queries. For example, the policy cube is
made up of calculations of policies sold (the fact table) and dimensions of location,
type and time (dimension tables). The analysts perform queries on different cubes as
part of business intelligence (BI) activities. For security and fast query response, the
EDW further contains two data marts. One of them consists of claim and
policy data that is used by the actuaries and the legal team for various data analyses,
including risk assessment and regulatory compliance assurance. The second one
contains sales-related data that is used by the sales team to monitor sales and set
future sales strategies.
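The policy cube described above (a fact table of policies sold plus location, type and time dimension tables) corresponds to a classic star schema. The sketch below builds a tiny version in SQLite and runs the kind of multi-join, aggregating query a BI analyst would issue. All table names, column names and figures are hypothetical sample values, not ETI data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_type (type_id INTEGER PRIMARY KEY, policy_type TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_policy_sales (
    location_id INTEGER REFERENCES dim_location,
    type_id INTEGER REFERENCES dim_type,
    time_id INTEGER REFERENCES dim_time,
    policies_sold INTEGER
);
INSERT INTO dim_location VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_type VALUES (1, 'Auto'), (2, 'Home');
INSERT INTO dim_time VALUES (1, 2013, 'Q1'), (2, 2013, 'Q2');
INSERT INTO fact_policy_sales VALUES (1, 1, 1, 40), (1, 2, 1, 15), (2, 1, 2, 25), (2, 2, 2, 10);
"""

# Slice the cube by year, then aggregate policies sold by region and policy type.
QUERY = """
SELECT l.region, t.policy_type, SUM(f.policies_sold) AS sold
FROM fact_policy_sales f
JOIN dim_location l ON f.location_id = l.location_id
JOIN dim_type t ON f.type_id = t.type_id
JOIN dim_time d ON f.time_id = d.time_id
WHERE d.year = ?
GROUP BY l.region, t.policy_type
ORDER BY l.region, t.policy_type
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for row in conn.execute(QUERY, (2013,)):
    print(row)
# ('North', 'Auto', 40)
# ('North', 'Home', 15)
# ('South', 'Auto', 25)
# ('South', 'Home', 10)
```

A dedicated OLAP engine would typically precompute such aggregates rather than joining at query time; the SQL form simply makes the fact/dimension relationship explicit.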

Big Data Business Intelligence
As established, ETI currently employs BI that falls into the category of traditional
BI. One particular dashboard used by the sales team displays various policy-related
KPIs via different charts, such as a breakdown of sold policies by type, region and
value, and policies expiring each month. Different dashboards inform agents of their
current performance, such as commissions earned and whether or not they are on
track for achieving their monthly targets. Both of these dashboards are fed data
from the sales data mart.
In the call center, a scoreboard provides vital statistics related to daily operations of
the center, such as the number of calls in queue, average waiting time, number of
calls dropped and calls by type. This scoreboard is fed data directly from the
CRM's relational database with a BI product that provides a simple user interface
for constructing different SQL queries that are periodically executed to obtain
required KPIs. The legal team and the actuaries, however, generate some ad-hoc
reports that resemble a spreadsheet. Some of these reports are sent to the regulatory
authorities as part of assuring continuous regulatory compliance.
ETI believes that the adoption of Big Data BI will greatly help in achieving its
strategic goals. For example, the incorporation of social media along with a call
center agent's notes may provide a better understanding of the reasons behind a
customer's defection. Similarly, the legitimacy of a filed claim can be ascertained
more quickly if valuable information can be harvested from the documents
submitted at the time a policy was purchased and cross-referenced against the claim
data. This information can then be correlated with similar claims to detect fraud.
With regard to data visualization, the BI tools used by the analysts currently only
operate on structured data. In terms of sophistication and ease of use, most of these
tools provide point-and-click functionality where either a wizard can be used or the
required fields can be selected manually from the relevant tables displayed
graphically to construct a database query. The query results can then be displayed
by choosing the relevant charts and graphs. The end result is a dashboard where
different statistics are displayed. The dashboard can be configured to add filtering,
aggregation and drill-down options. An example of this could be a user who clicks
on a quarterly sales figures chart and is taken to a monthly breakdown of sales
figures. Although a dashboard that provides the what-if analysis feature is not
currently supported, having one would allow the actuaries to quickly ascertain
different risk levels by changing relevant risk factors.

Part II: Storing and Analyzing Big Data

Chapter 5 Big Data Storage Concepts
Chapter 6 Big Data Processing Concepts
Chapter 7 Big Data Storage Technology
Chapter 8 Big Data Analysis Techniques
As presented in Part I, the drivers behind Big Data adoption are both business- and
technology-related. In the remainder of this book, the focus shifts from providing a high-level
understanding of Big Data and its business implications to covering key concepts
related to the two main Big Data concerns: storage and analysis.
Part II has the following structure:
• Chapter 5 explores key concepts related to the storage of Big Data datasets. These
concepts inform the reader of how Big Data storage has radically different
characteristics than the relational database technology common to traditional
business information systems.
• Chapter 6 provides insights into how Big Data datasets are processed by leveraging
distributed and parallel processing capabilities. This is further illustrated with an
examination of the MapReduce framework, which shows how it leverages a divide-and-conquer
approach to efficiently process Big Data datasets.
• Chapter 7 expands upon the storage topic, showing how the concepts from Chapter 5
are implemented with different flavors of NoSQL database technology. The
requirements of batch and real-time processing modes are further explored from the