In this chapter, we discussed why data exploration is important and how we can perform exploratory analysis on datasets.
These are the main techniques and concepts that we discussed:
Sampling is a technique for randomly selecting a subset of the given dataset so that the results we compute on the selected data can be generalized to the complete dataset.
Weight vectors are important when the dataset that we have or gather is not representative of the actual data.
Why it is necessary to know the column types, and how summary functions help in getting the gist of the dataset.
Mean, median, mode, standard deviation, variance, and scalar statistics, and how they are implemented in Julia.
Measuring the variation in a dataset is important, and z-scores and entropy are useful tools for doing so.
Once the data has been cleaned and its basic structure is understood, visualization can provide further insight.
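
As a compact recap of several of the ideas listed above, the following is a minimal sketch that uses only Julia's standard library (Statistics and Random) rather than the packages used earlier in the chapter. The data, the weights, the seed, and the helper name shannon_entropy are all hypothetical and chosen purely for illustration. The entropy computed here is the Shannon entropy of the observed value frequencies, which is one common definition and is assumed rather than taken from the chapter.

```julia
using Statistics  # standard library: mean, median, std, var
using Random      # standard library: seeded RNG and shuffle

# Hypothetical data, invented for this sketch.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Sampling: a simple random sample without replacement.
# Shuffling a copy and taking the first n elements is one easy way to do it.
rng = MersenneTwister(42)   # fixed seed so the sample is reproducible
sample_size = 4
sampled = shuffle(rng, data)[1:sample_size]

# Weight vector: hypothetical weights that count the last four
# observations twice, for example to correct for under-representation.
weights = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
weighted_mean = sum(weights .* data) / sum(weights)

# Scalar statistics. corrected=false gives the population
# (divide by n) variance and standard deviation.
mu = mean(data)
med = median(data)
sigma = std(data; corrected=false)
variance = var(data; corrected=false)

# z-scores: how many standard deviations each value lies from the mean.
zscores = (data .- mu) ./ sigma

# Shannon entropy (in bits) of the empirical distribution of the values.
function shannon_entropy(xs)
    counts = Dict{eltype(xs), Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    n = length(xs)
    return -sum((c / n) * log2(c / n) for c in values(counts))
end

println("sample        = ", sampled)
println("weighted mean = ", weighted_mean)
println("mean, median  = ", mu, ", ", med)
println("std, var      = ", sigma, ", ", variance)
println("z-scores      = ", zscores)
println("entropy       = ", shannon_entropy(data))
```

The sketch computes the population rather than the sample standard deviation only to keep the arithmetic easy to check by hand; dropping corrected=false gives the sample versions.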