A probability distribution assigns a probability to each outcome, or subset of outcomes, of a random experiment. Every random experiment, and in fact the data of every data science experiment, follows some probability distribution. Knowing which distribution the data follow is important before starting the analysis, and it also guides the choice of machine learning algorithms to apply. Note that in a multivariate dataset each variable may follow its own distribution, so the variables in a dataset need not all follow the same one.
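To make the definition concrete, here is a minimal sketch in plain Julia (no packages needed) that models a fair six-sided die as a discrete distribution. The names die and prob are illustrative only. Each outcome is allotted a probability, the probabilities sum to 1, and the probability of a subset (an event) is the sum over its outcomes.

```julia
# A fair six-sided die: each outcome is allotted probability 1/6.
# Rational numbers (1//6) keep the arithmetic exact.
die = Dict(face => 1//6 for face in 1:6)

# The probabilities of all outcomes must sum to 1.
@assert sum(values(die)) == 1

# The probability of a subset of outcomes (an event) is the sum of
# the probabilities of the outcomes it contains.
prob(event) = sum(die[face] for face in event)

even = [2, 4, 6]
println(prob(even))   # prints 1//2
```

Using exact rationals rather than floating-point numbers avoids rounding error, so the check that the probabilities sum to 1 holds exactly.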
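To get ready, the Distributions package has to be installed and imported. On Julia 1.0 and later, the package manager itself must first be loaded with using Pkg; the package is then installed with the Pkg.add() function, as follows:
using Pkg
Pkg.add("Distributions")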
Then the package has to be imported into the current session, which is done with the using command, as follows:
using Distributions
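Once the package is loaded, a quick way to check that it works is to build a distribution, evaluate it, sample from it, and fit it back to data. The sketch below is illustrative, assuming Distributions is installed as above. It uses the package's Normal and Poisson types along with its pdf, cdf, rand, fit and mean functions. The seed value and the count-valued Poisson variable are hypothetical, chosen only to show that different variables in a dataset can follow different distributions.

```julia
using Distributions
using Random

Random.seed!(42)  # arbitrary seed so the random draws are reproducible

# A standard normal distribution (mean 0, standard deviation 1).
d = Normal(0.0, 1.0)

println(pdf(d, 0.0))   # density at x = 0, about 0.3989
println(cdf(d, 0.0))   # P(X <= 0), which is 0.5 for a standard normal

# Draw 1,000 samples, then estimate the parameters back from the data
# (fit uses maximum likelihood estimation).
x = rand(d, 1000)
d_hat = fit(Normal, x)
println(d_hat)         # estimated mean and standard deviation, near 0 and 1

# Hypothetical second variable: count data, which a Poisson
# distribution describes better than a normal one.
counts = rand(Poisson(3.0), 1000)
p_hat = fit(Poisson, counts)
println(mean(p_hat))   # estimated rate, near 3
```

The exact fitted values depend on the random draws, so only their approximate size is noted in the comments.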