Follow these steps to use Dask to perform some computations on a DataFrame object:
- First, we need to load the data from sample.csv into a Dask DataFrame:
import dask.dataframe as dd
data = dd.read_csv("sample.csv")
- Next, we perform a standard calculation on the columns of the DataFrame, here adding the lower and upper columns:
sum_data = data.lower + data.upper
print(sum_data)
Unlike with pandas, the result is not computed immediately. Dask only records the operation, so the print statement shows a description of a lazy Dask Series rather than its values:
Dask Series Structure:
npartitions=1
    float64
        ...
dtype: float64
Dask Name: add, 6 tasks
- To actually get the result, we need to use the compute method:
result = sum_data.compute()
print(result.head())
The result is now shown as expected:
0   -0.911811
1    0.947240
2   -0.552153
3   -0.429914
4    1.229118
dtype: float64
- We compute the means of the final two columns in exactly the same...