Mastering Julia - Second Edition
Users of R will be aware of the success of data frames when employed in analyzing datasets, a success that has been mirrored by Python with the pandas package.
Julia too adds data frame support through the use of a DataFrames package.
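Data frame processing in Julia builds on three basic types, as follows:

- missing: the indicator that a data value is missing. It is the single instance of the Missing type, which is now part of Julia's Base.
- DataArray: an extension to the Array type that can contain missing values. The DataArrays package is now deprecated; in current Julia an ordinary array with element type Union{Missing, T} fills this role.
- DataFrame: a data structure for representing tabular datasets.

Data frames are such a large topic that we will look at them in some depth when we consider statistical computing.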
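Current Julia keeps its missing-value machinery in Base, so a minimal sketch needs no packages. The snippet below assumes Julia 1.x. It shows how `missing` propagates through arithmetic and comparisons. It also shows how an ordinary `Vector{Union{Missing, Float64}}` plays the role that `DataArray` once did.

```julia
# `missing` is the singleton instance of the type `Missing` (part of Base).
x = missing
println(typeof(x))          # Missing

# Arithmetic and comparisons propagate `missing` instead of throwing an error.
println(x + 1)              # missing
println(x == 1)             # missing
println(ismissing(x))       # true

# An ordinary vector whose element type is Union{Missing, Float64} can hold
# missing values, which is the job DataArray used to do.
cost = [10.1, 7.9, missing, 4.5]
println(eltype(cost))       # Union{Missing, Float64}

# Count the missing entries, then substitute a default value with `coalesce`.
println(count(ismissing, cost))   # 1
filled = coalesce.(cost, 0.0)
println(filled)             # [10.1, 7.9, 0.0, 4.5]
```

Substituting 0.0 is only for illustration. For statistics, a default value biases the result, which is why the examples that follow drop or skip missing entries instead.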
However, here’s some code to get a flavor of processing data with these packages:
```julia
julia> using DataFrames

julia> df1 = DataFrame(ID = 1:4, Cost = [10.1, 7.9, missing, 4.5])
4×2 DataFrame
│ Row │ ID │ Cost    │
├─────┼────┼─────────┤
│ 1   │ 1  │ 10.1    │
│ 2   │ 2  │ 7.9     │
│ 3   │ 3  │ missing │
│ 4   │ 4  │ 4.5     │
```
Common operations, such as computing the mean or var of the Cost column, return missing because of the missing value in row 3:
```julia
julia> using Statistics

julia> mean(df1[!, :Cost])
missing
```
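Sometimes the goal is only to ignore the gaps in a single column. In that case, Base's `skipmissing` iterator is an alternative to building a new data frame. The sketch below works on a plain vector that stands in for the Cost column of df1, so it needs only the Statistics standard library. With DataFrames loaded, the same calls should apply to the column itself.

```julia
using Statistics

# A plain vector standing in for the Cost column of df1.
cost = [10.1, 7.9, missing, 4.5]

# Without special handling, the missing value propagates through the reduction,
# so this result is `missing`.
@show mean(cost)

# skipmissing wraps the vector in a lazy iterator that yields only the
# non-missing values. The statistics are therefore computed over 10.1, 7.9 and 4.5.
μ = mean(skipmissing(cost))
σ = std(skipmissing(cost))
@show μ σ
```

Unlike dropping rows, `skipmissing` works one column at a time. It does not keep the columns of a table aligned row by row.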
We can create a new data frame by dropping ALL rows with missing values, and now statistical functions can be applied as normal:
```julia
julia> df2 = dropmissing(df1)
3×2 DataFrame
│ Row │ ID │ Cost │
├─────┼────┼──────┤
│ 1   │ 1  │ 10.1 │
│ 2   │ 2  │ 7.9  │
│ 3   │ 4  │ 4.5  │

julia> (μ, σ) = (mean(df2[!, :Cost]), std(df2[!, :Cost]))
(7.5, 2.8213471959331766)
```
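Conceptually, dropping every row that contains a missing value means keeping only the row indices at which no column is missing. The following Base-only sketch represents the table as a NamedTuple of column vectors. The helper name `drop_incomplete_rows` is hypothetical and is not part of DataFrames, which provides `dropmissing` for this job.

```julia
using Statistics

# Hypothetical helper (not a DataFrames function): keep only the rows in
# which none of the columns holds a missing value.
function drop_incomplete_rows(table::NamedTuple)
    nrows = length(first(table))
    # A row is complete when every column is non-missing at that index.
    keep = [all(col -> !ismissing(col[i]), table) for i in 1:nrows]
    # Filter each column with the same mask. `identity.` narrows the element
    # type, so Union{Missing, Float64} becomes Float64 once no missing remains.
    return map(col -> identity.(col[keep]), table)
end

# A NamedTuple of columns standing in for df1.
t1 = (ID = collect(1:4), Cost = [10.1, 7.9, missing, 4.5])
t2 = drop_incomplete_rows(t1)

println(t2.ID)      # [1, 2, 4]
println(t2.Cost)    # [10.1, 7.9, 4.5]

# With the incomplete row gone, the statistics can be applied as normal.
(μ, σ) = (mean(t2.Cost), std(t2.Cost))
println((μ, σ))
```

Using a single mask for all columns keeps the rows aligned. This is the behaviour the dropmissing call above relies on.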
We will cover data frames in much greater detail when we consider data I/O in Chapter 6.
For now, we will look at the Tables API, implemented in the Tables.jl package, which is used by a large number of other packages.