By data analysis packages, we mean software designed for analyzing data. A program that performs a simple statistical regression is one example; software implementing machine-learning algorithms is another.
Saddle is Scala's answer to R and to Python's pandas package. It can read structured data in a variety of formats, including CSV and HDF5. The data can be loaded into frames and then manipulated much as you would in similar software. You can perform statistical analysis and build your own statistical methods on top of the data structures Saddle provides. Saddle is examined in detail in a separate chapter. It can be found at the following website:
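To make the load-then-manipulate workflow concrete, here is a minimal sketch. It assumes the Saddle 1.3.x API (`CsvFile`, `CsvParser.parse`, `CsvParser.parseDouble`, `Frame.withColIndex`, `Frame.withRowIndex`, `Frame.firstCol`); other Saddle versions may name these differently. The CSV contents and the `writeSample` and `priceChange` helpers are hypothetical, invented only so the example is self-contained.

```scala
import java.io.{File, PrintWriter}
import org.saddle._
import org.saddle.io._

object SaddleSketch {
  // Hypothetical helper: writes a tiny, made-up CSV file so no external data is needed.
  def writeSample(): String = {
    val file = File.createTempFile("prices", ".csv")
    file.deleteOnExit()
    val out = new PrintWriter(file)
    try {
      out.println("day,open,close")
      out.println("d1,10.0,11.0")
      out.println("d2,11.0,10.5")
    } finally out.close()
    file.getPath
  }

  // Hypothetical helper: loads the CSV into a Frame and derives a new Series
  // from two existing columns (assumes the Saddle 1.3.x CSV API).
  def priceChange(path: String): Series[String, Double] = {
    // Every cell is read as a String, indexed by integer row and column positions.
    val raw: Frame[Int, Int, String] = CsvParser.parse(CsvFile(path))
    val frame: Frame[String, String, Double] = raw
      .withColIndex(0)                  // first row becomes the column labels
      .withRowIndex(0)                  // first column ("day") becomes the row labels
      .mapValues(CsvParser.parseDouble) // convert the remaining cells to Double
    // Element-wise subtraction of two columns aligned on the row index.
    frame.firstCol("close") - frame.firstCol("open")
  }

  def main(args: Array[String]): Unit =
    println(priceChange(writeSample()))
}
```

The same `Frame` could be fed into Saddle's statistics or into your own analysis code; the column arithmetic here simply shows that frames behave like their pandas counterparts.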
Apache's MLlib library provides machine-learning algorithms for the Spark platform. It can be accessed from Scala as well as from Java and Python. It supports basic statistical methods for data analysis, various regression and classification methods, k-means clustering, dimensionality reduction, and optimization methods. The number of algorithms in the library continues to grow. The MLlib library can be found at the following website:
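As an illustration of the k-means clustering mentioned above, the sketch below uses MLlib's RDD-based API (`KMeans.train(data, k, maxIterations)` from `org.apache.spark.mllib.clustering`) with a local Spark master. The four 2-D points, the choice of k = 2, and the `KMeansSketch` object are made up for illustration; it assumes Spark's core and MLlib artifacts are on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

object KMeansSketch {
  // Trains a k-means model on a tiny, made-up 2-D dataset with two obvious groups.
  def train(sc: SparkContext): KMeansModel = {
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1), // first group
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)  // second group
    )).cache() // k-means makes several passes over the data
    KMeans.train(points, 2, 20) // k = 2 clusters, at most 20 iterations
  }

  def main(args: Array[String]): Unit = {
    // Run Spark in-process using all local cores.
    val sc = new SparkContext(
      new SparkConf().setAppName("KMeansSketch").setMaster("local[*]"))
    try {
      val model = train(sc)
      model.clusterCenters.foreach(println)                 // learned centroids
      println(model.predict(Vectors.dense(0.05, 0.05)))     // cluster id for a new point
    } finally sc.stop()
  }
}
```

Note that since Spark 2.0 this RDD-based `spark.mllib` API is in maintenance mode, and the DataFrame-based `spark.ml` package is the recommended entry point for new code; the RDD version is used here only because it keeps the sketch short.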