I/O, Networking, and Parallel Computing


Chapter 8. I/O, Networking, and Parallel Computing

In this chapter, we will explore how Julia interacts with the outside world, reading from standard input and writing to standard output, files, networks, and databases. Julia provides asynchronous networking I/O using the libuv library. We will see how to handle data in Julia. We will also discover the parallel processing model of Julia.

In this chapter, the following topics are covered:

Basic input and output
Working with files (including the CSV files)
Using DataFrames
Working with TCP sockets and servers
Interacting with databases
Parallel operations and computing



Basic input and output

Julia’s vision on input/output (I/O) is stream-oriented, that is, reading or writing streams of bytes. We will introduce different types of streams, such as file streams, in this chapter. Standard input (stdin) and standard output (stdout) are constants of the type TTY (an abbreviation for the old term, Teletype) that can be used in the Julia code to read from and write to (refer to the code in Chapter 8\io.jl):

read(STDIN, Char): This command waits for a character to be entered, and then returns that character; for example, when you type in J, this returns 'J'
write(STDOUT, "Julia"): This command types out Julia5 (the added 5 is the number of bytes in the output stream; it is not added if the command ends in a semicolon (;))

STDIN and STDOUT are simply streams and can be replaced by any stream object in the read/write commands. readbytes is used to read a number of bytes from a stream into a vector:

readbytes(STDIN, 3): This command waits for an input (for example, abe), reads 3 bytes from it, and then returns 3-element Array{Uint8,1}: 0x61 0x62 0x65
readline(STDIN): This command reads all the input until a newline character \n is entered; for example, if you type Julia and press ENTER, this returns "Julia\r\n" on Windows and "Julia\n" on Linux

If you need to read all the lines from an input stream, use the eachline method in a for loop, for example:

stream = STDIN
for line in eachline(stream)
    print("Found $line")
    # process the line
end



For example:

First line of input
Found First line of input
2nd line of input
Found 2nd line of input
3rd line…
Found 3rd line…



To test whether you have reached the end of an input stream, use eof(stream) in combination with a while loop as follows:

while !eof(stream)
    x = read(stream, Char)
    println("Found: $x")
    # process the character
end



We can experiment with replacing stream by STDIN in these examples.
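The stream functions above can be tried without typing at a terminal. The following is a minimal sketch, assuming Julia 1.x (where STDIN is spelled stdin and readbytes(stream, n) is written read(stream, n)); an in-memory IOBuffer stands in for standard input so that the result is repeatable. The variable names found, chars, and bytes are illustrative only.

```julia
# Assumes Julia 1.x; IOBuffer stands in for stdin so the example is repeatable.
stream = IOBuffer("First line\n2nd line\n")

# eachline yields one line at a time, without the trailing newline by default.
found = String[]
for line in eachline(stream)
    push!(found, "Found $line")
end

# Character-by-character reading guarded by eof, as in the while loop above.
chars = Char[]
stream2 = IOBuffer("Jul")
while !eof(stream2)
    push!(chars, read(stream2, Char))
end

# read(stream, n) is the Julia 1.x spelling of readbytes(stream, n).
bytes = read(IOBuffer("abe"), 3)

println(found)
println(chars)
println(bytes)
```

Because any IO object can take the place of the buffer, the same loops should work unchanged on stdin or on an open file.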



Working with files

To work with files, we need the IOStream type. IOStream is a type with the supertype IO and has the following characteristics:

The fields are given by names(IOStream)
4-element Array{Symbol,1}: :handle :ios :name :mark

The types are given by IOStream.types
(Ptr{None},Array{Uint8,1},String,Int64)



The file handle is a pointer of the type Ptr, which is a reference to the file object. Opening and reading a line-oriented file with the name example.dat is very easy:

# code in Chapter 8\io.jl
fname = "example.dat"
f1 = open(fname)

fname is a string that contains the path to the file, using escaping of special characters with \ when necessary; for example, in Windows, when the file is in the test folder on the D: drive, this would become d:\\test\\example.dat. The f1 variable is now an IOStream() object.



To read all lines one after the other in an array, use data = readlines(f1), which returns 3-element Array{Union(ASCIIString,UTF8String),1}:

"this is line 1.\r\n"
"this is line 2.\r\n"
"this is line 3."



For processing line by line, now only a simple loop is needed:

for line in data
    println(line) # or process line
end
close(f1)



Always close the IOStream object to clean and save resources. If you want to read the file into one string, use readall (for example, see the program word_frequency in Chapter 5, Collection Types). Use this only for relatively small files because of the memory consumption; this can also be a potential problem when using readlines.

There is a convenient shorthand with the do syntax for opening a file, applying a function process, and closing it automatically. This goes as follows (file is the IOStream object in this code):

open(fname) do file
    process(file)
end



As you can recall, in the Map, filter, and list comprehensions section in Chapter 3, Functions, do creates an anonymous function, and passes it to open. Thus, the previous code example would have been equivalent to open(process, fname). Use the same syntax for processing a file fname line by line without the memory overhead of the previous methods, for example:

open(fname) do file
    for line in eachline(file)
        print(line) # or process line
    end
end



Writing a file requires first opening it with a "w" flag, then writing strings to it with write, print, or println, and then closing the file handle, which flushes the IOStream object to the disk:

fname = "example2.dat"
f2 = open(fname, "w")
write(f2, "I write myself to a file\n")
# returns 25 (bytes written)
println(f2, "even with println!")
close(f2)



Opening a file with the "w" option will clear the file if it exists. To append to an existing file, use "a".
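The do syntax and the "w" and "a" modes described above can be combined in one round trip. This is a minimal sketch, assuming Julia 1.x (where readall(f) is written read(f, String)); the file is created in a temporary directory, and the names lines and whole are illustrative only.

```julia
# Assumes Julia 1.x; the file lives in a fresh temporary directory.
fname = joinpath(mktempdir(), "example2.dat")

# "w" creates or truncates the file; the do-block closes it automatically.
open(fname, "w") do file
    write(file, "line 1\n")
end

# "a" appends to the existing content instead of clearing it.
open(fname, "a") do file
    println(file, "line 2")
end

# Read back line by line without loading the whole file into memory.
lines = String[]
open(fname) do file
    for line in eachline(file)
        push!(lines, line)
    end
end

# read(fname, String) is the Julia 1.x spelling of readall.
whole = read(fname, String)
println(lines)
```

Using the do form for writing as well as reading means the handle is closed even if the body throws, so no explicit close call is needed.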

To process all the files in the current folder (or a given folder as an argument to readdir()), use this for loop:

for file in readdir()
    # process file
end



Reading and writing CSV files

A CSV file is a comma-separated file. The data fields in each line are separated by commas "," or another delimiter such as semicolons ";". These files are the de-facto standard for exchanging small and medium amounts of tabular data. Such files are structured so that one line contains data about one data object, so we need a way to read and process the file line by line. As an example, we will use the data file Chapter 8\winequality.csv that contains 1,599 sample measurements, 12 data columns, such as pH and alcohol per sample, separated by a semicolon. In the following screenshot, you can see the top 20 rows:

[Screenshot: top 20 rows of winequality.csv]



In general, the readdlm function is used to read in the data from the CSV files:

# code in Chapter 8\csv_files.jl:
fname = "winequality.csv"
data = readdlm(fname, ';')



The second argument is the delimiter character (here, it is ;). The resulting data is a 1600x12 Array{Any,2} array of the type Any because no common type could be found:

"fixed acidity"  "volatile acidity"  …  "alcohol"  "quality"
7.4              0.7                 …  9.4        5.0
7.8              0.88                …  9.8        5.0
7.8              0.76                …  9.8        5.0





If the data file is comma separated, reading it is even simpler with the following command:

data2 = readcsv(fname)



The problem with what we have done until now is that the headers (the column titles) were read as part of the data. Fortunately, we can pass the argument header=true to let Julia put the first line in a separate array. It then naturally gets the correct datatype, Float64, for the data array. We can also specify the type explicitly, such as this:

data3 = readdlm(fname, ';', Float64, '\n', header=true)



The third argument here is the type of data, which is a numeric type, String or Any. The next argument is the line separator character, and the fifth indicates whether or not there is a header line with the field (column) names. If so, then data3 is a tuple with the data as the first element and the header as the second, in our case, (1599x12 Array{Float64,2}, 1x12 Array{String,2}). (There are other optional arguments to define readdlm; see the help option.) In this case, the actual data is given by data3[1] and the header by data3[2].
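To see the (data, header) tuple without the wine file at hand, here is a minimal sketch, assuming Julia 1.x, where readdlm lives in the DelimitedFiles standard library rather than in Base. The file sample.csv, its two columns, and its values are made up for illustration and are not the wine data.

```julia
using DelimitedFiles  # home of readdlm and writedlm in Julia 1.x

# A tiny semicolon-separated file with made-up values (not the wine data).
fname = joinpath(mktempdir(), "sample.csv")
write(fname, "pH;alcohol\n3.51;9.4\n3.2;9.8\n")

# With header=true, readdlm returns a (data, header) tuple.
measurements, header = readdlm(fname, ';', Float64, '\n', header=true)

println(size(measurements))
println(header)
```

Destructuring the tuple directly into two names plays the same role as indexing with data3[1] and data3[2] in the text.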

Let’s continue working with the variable data. The data forms a matrix, and we can get the rows and columns of data using the normal array-matrix syntax (refer to the Matrices section in Chapter 5, Collection Types). For example, the third row is given by row3 = data[3,:] with data: 7.8 0.88 0.0 2.6 0.098 25.0 67.0 0.9968 3.2 0.68 9.8 5.0, representing the measurements for all the characteristics of a certain wine.

The measurements of a certain characteristic for all wines are given by a data column, for example, col3 = data[:,3] represents the measurements of citric acid and returns a column vector 1600-element Array{Any,1}: "citric acid" 0.0 0.0 0.04 0.56 0.0 0.0 … 0.08 0.08 0.1 0.13 0.12 0.47.

If we need columns 2-4 (volatile acidity to residual sugar) for all wines, extract the data with x = data[:,2:4]. If we need these measurements only for the wines on rows 70-75, get these with y = data[70:75, 2:4], returning a 6x3 Array{Any,2} output as follows:

0.32   0.57  2.0
0.705  0.05  1.9
…
0.675  0.26  2.1



To get a matrix with the data from columns 3, 6, and 11, execute the following command:

z = [data[:,3] data[:,6] data[:,11]]
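The indexing forms used in this section can be checked on a small matrix. This is a minimal sketch, assuming Julia 1.x (where data[3, :] returns a one-dimensional vector rather than a 1xN matrix); the 4x5 matrix is made up so that each entry encodes its own row and column.

```julia
# A small made-up matrix standing in for the wine data: entry = 10*row + column.
data = [10 * r + c for r in 1:4, c in 1:5]

row3 = data[3, :]          # third row (a 1-D vector in Julia 1.x)
col3 = data[:, 3]          # third column
block = data[2:3, 2:4]     # rows 2-3 of columns 2-4
z = [data[:, 1] data[:, 3] data[:, 5]]   # selected columns placed side by side

println(row3)
println(block)
println(size(z))
```

Writing column vectors separated by spaces inside brackets concatenates them horizontally, which is why z comes out as a matrix with one column per selected field.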



It would be useful to create a type Wine in the code. For example, if the data is to be passed around functions, it will improve the code quality to encapsulate all the data in a single data type, like this:

type Wine
    fixed_acidity::Array{Float64}
    volatile_acidity::Array{Float64}
    citric_acid::Array{Float64}
    # other fields
    quality::Array{Float64}
end



Then, we can create objects of this type to work with them, like in any other object-oriented language, for example, wine1 = Wine(data[1,:]...), where the elements of the row are splatted with the ... operator into the Wine constructor.
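Splatting a row into a constructor can be sketched as follows, assuming Julia 1.x, where the type keyword has become mutable struct and struct declares an immutable type. This is a hypothetical three-field variant that stores one sample per object with scalar Float64 fields, which is a different layout from the array-typed fields shown above; the row values are made up.

```julia
# Assumes Julia 1.x. Hypothetical three-field version: one Wine per sample.
struct Wine
    fixed_acidity::Float64
    volatile_acidity::Float64
    quality::Float64
end

# A made-up numeric row; the default constructor takes one argument per field.
row = [7.4, 0.7, 5.0]
wine1 = Wine(row...)

println(wine1.quality)
```

The splat only works when the row has exactly as many elements as the type has fields, so a header row or a row of the wrong length raises a MethodError.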

To write to a CSV file, the simplest way is to use the writecsv function for a comma separator, or the writedlm function if you want to specify another separator. For example, to write an array data to a file partial.dat, you need to execute the following command:

writedlm("partial.dat", data, ';')
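A write followed by a read is a quick way to confirm what writedlm produces. The following is a minimal sketch, assuming Julia 1.x with the DelimitedFiles standard library; the 2x2 matrix is made up, and the file is placed in a temporary directory rather than the working folder.

```julia
using DelimitedFiles  # home of readdlm and writedlm in Julia 1.x

# A small made-up matrix and a temporary output path.
data = [1.5 2.0; 3.25 4.0]
fname = joinpath(mktempdir(), "partial.dat")

# Write with ';' as the field separator, one matrix row per line.
writedlm(fname, data, ';')

# Inspect the raw text and read it back with the same delimiter.
text = read(fname, String)
roundtrip = readdlm(fname, ';')

print(text)
```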



If more control is necessary, you can easily combine the more basic functions from the previous section. For example, the following code snippet writes 10 tuples of three numbers each to a file:

# code in Chapter 8\tuple_csv.jl
fname = "savetuple.csv"
csvfile = open(fname, "w")
# writing headers:
write(csvfile, "ColNameA,ColNameB,ColNameC\n")
for i = 1:10
    tup(i) = tuple(rand(Float64, 3)...)
    write(csvfile, join(tup(i), ","), "\n")
end
close(csvfile)



Using DataFrames

If you measure n variables (each of a different type) of a single object of observation, then you get a table with n columns for each object row. If there are m observations, then we have m rows of data. For example, given the student grades as data, you might want to compute the average grade for each socioeconomic group, where grade and socioeconomic group are both columns in the table, and there is one row per student.

The DataFrame is the most natural representation to work with such an (m x n) table of data. It is similar to a pandas DataFrame in Python or a data.frame in R. A DataFrame is a more specialized tool than a normal array for working with tabular and statistical data, and it is defined in the DataFrames package, a popular Julia library for statistical work. Install it in your environment by typing in Pkg.add("DataFrames") in the REPL. Then, import it into your current workspace with using DataFrames. Do the same for the packages DataArrays and RDatasets (which contains a collection of example datasets mostly used in the R literature).

A common case in statistical data is that data values can be missing (the information is not known). The DataArrays package provides us with the unique value NA, which represents a missing value, and has the type NAtype. The result of the computations that contain the NA values mostly cannot be determined, for example, 42 + NA returns NA. (Julia v0.4 also has a new Nullable{T} type, which allows you to specify the type of a missing value.) A DataArray{T} array is a data structure that can be n-dimensional, behaves like a standard Julia array, and can contain values of the type T, but it can also contain the missing (Not Available) values NA and can work efficiently with them. To construct them, use the @data macro:

# code in Chapter 8\dataarrays.jl
using DataArrays
using DataFrames
dv = @data([7, 3, NA, 5, 42])



This returns 5-element DataArray{Int64,1}: 7 3 NA 5 42.

The sum of these numbers is given by sum(dv) and returns NA. One can also assign the NA values to the array with dv[5] = NA; then, dv becomes [7, 3, NA, 5, NA]. Converting this data structure to a normal array fails: convert(Array, dv) returns ERROR: NAException.

How to get rid of these NA values, supposing we can do so safely? We can use the dropna function, for example, sum(dropna(dv)) returns 15. If you know that you can replace them with a value v, use the array function:

repl = -1
sum(array(dv, repl)) # returns 13
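The same arithmetic can be reproduced without the DataArrays package. This is a minimal sketch, assuming Julia 1.x, where the built-in missing value plays the role that NA plays above; skipmissing and coalesce are used here as rough analogues of dropna and array, not as the same functions.

```julia
# Assumes Julia 1.x, where the built-in `missing` plays the role of NA.
dv = [7, 3, missing, 5, 42]

total = sum(dv)                      # missing propagates, like 42 + NA
dv[5] = missing                      # the element type already allows missing
kept = sum(skipmissing(dv))          # analogous to sum(dropna(dv))
replaced = sum(coalesce.(dv, -1))    # analogous to sum(array(dv, -1))

println(total)
println(kept)
println(replaced)
```

skipmissing returns a lazy iterator rather than a new array, so nothing is copied until the values are actually summed.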



A DataFrame is a kind of in-memory database, versatile in the ways you can work with the data. It consists of columns with names such as Col1, Col2, Col3, and so on. Each of these columns is a DataArray that has its own type, and the data it contains can be referred to by the column name as well, so we have substantially more forms of indexing. Unlike two-dimensional arrays, columns in a DataFrame can be of different types. One column might, for instance, contain the names of students and should therefore be a string. Another column could contain their age and should be an integer.

We construct a DataFrame from the program data as follows:

# code in Chapter 8\dataframes.jl
using DataFrames
# constructing a DataFrame:
df = DataFrame()
df[:Col1] = 1:4
df[:Col2] = [e, pi, sqrt(2), 42]
df[:Col3] = [true, false, true, false]
show(df)



Notice that the column headers are used as symbols. This returns the following 4x3 DataFrame object:

[Screenshot: the 4x3 DataFrame df]



We could also have used the full constructor as follows:

df = DataFrame(Col1 = 1:4, Col2 = [e, pi, sqrt(2), 42], Col3 = [true, false, true, false])



You can refer to the columns either by an index (the column number) or by a name; both of the following expressions return the same output:

show(df[2])
show(df[:Col2])

This gives the following output:

[2.718281828459045,3.141592653589793,1.4142135623730951,42.0]



To show the rows or subsets of rows and columns, use the familiar slice (:) syntax, for example:

To get the first row, execute df[1,:]. This returns 1x3 DataFrame.

| Row | Col1 | Col2    | Col3 |
|-----|------|---------|------|
| 1   | 1    | 2.71828 | true |


