Given some descriptors of a song, the goal of this problem is to predict the year when the song was produced. That's basically a regression problem, since the target variable to predict is a number in the range between 1922 and 2011.
For each song, in addition to the year of production, 90 attributes are provided. All of them are related to the timbre: 12 of them relate to the timbre average and 78 attributes describe the timbre's covariance; all the features are numerical (integer or floating point numbers).
The dataset is composed of more than half a million observations. As for the competition behind the dataset, the authors tried to achieve the best results using the first 463,715 observations as a training set and the remaining 51,630 for testing.
The metric used to evaluate the results is the Mean Absolute Error (MAE) between the predicted year and the real year of production for the songs composing the testing set. The goal is to minimize the error measure.