statsmodels ships with many sample datasets in its distribution. The complete list can be found at https://github.com/statsmodels/statsmodels/tree/master/statsmodels/datasets.
In this tutorial, we will concentrate on the copper dataset, which contains information about copper prices, world consumption, and other parameters.
Before we start, we might need to install patsy, a library for describing statistical models. The easiest way to find out whether it is needed is to run the code. If you get errors related to patsy, execute either of the following commands:
$ sudo easy_install patsy
$ pip install --upgrade patsy
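Instead of waiting for an error, we can also check up front whether patsy is importable. The following is a minimal sketch using only the Python standard library; the helper name is_installed is our own and is not part of statsmodels or patsy:

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be located without importing it."""
    # find_spec() returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if is_installed("patsy"):
    print("patsy is available")
else:
    print("patsy is missing; install it with one of the commands above")
```

The check only tells us that the module can be found, not that its version is recent enough, so an upgrade may still be required if errors persist.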
In this section, we will load a dataset from statsmodels as a pandas DataFrame or Series object.
The function we need to call is load_pandas(). Load the data as follows:
import statsmodels.api
data = statsmodels.api.datasets.copper.load_pandas()
This loads the data in a Dataset object, which contains pandas...