Overview of this book

If you are looking to build data science models that are ready for production, Java is a strong option. With the help of libraries such as MLlib, Weka, DL4j, and more, you can efficiently perform the data science tasks you need. This book provides modern recipes to solve both common and not-so-common data science problems. We start with recipes to help you obtain, clean, index, and search data. Then you will learn a variety of techniques to analyze, learn from, and retrieve information from data. You will also learn how to handle big data, apply deep learning to data, and visualize data. Finally, you will work through recipes for taking data science to production, writing distributed data science applications, and more, all of which will come in handy at work.
Table of Contents (16 chapters)
Creating and saving an Attribute-Relation File Format (ARFF) file

Weka's native file format is called Attribute-Relation File Format (ARFF). An ARFF file has two logical parts: the first is called the header, and the second is called the data. The header part has three physical sections that must be present in an ARFF file: the name of the relation, the attributes or features, and their data types and ranges. The data part has one physical section, which must also be present to generate a machine-learning model. The header part of an ARFF file looks like the following:

% 1. Title: Iris Plants Database
% 2. Sources:
%      (a) Creator: R.A. Fisher
%      (b) Donor: Michael Marshall (MARSHALL%[email protected])
%      (c) Date: July, 1988
@RELATION iris
@ATTRIBUTE sepallength     NUMERIC
@ATTRIBUTE sepalwidth      NUMERIC
@ATTRIBUTE petallength     NUMERIC
@ATTRIBUTE petalwidth...
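To make the header/data split concrete, here is a minimal sketch that writes an ARFF file using only the JDK (java.nio.file), not Weka. The class name ArffWriterSketch, the Attribute helper, the output file name iris-sketch.arff, the class attribute, and the two data rows are assumptions for illustration, since the listing above is cut off after petalwidth. In a Weka project you would more likely build a weka.core.Instances object and write it with weka.core.converters.ArffSaver.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class ArffWriterSketch {

    /** Hypothetical helper type: one @ATTRIBUTE declaration (name plus ARFF type). */
    public static final class Attribute {
        final String name;
        final String type; // e.g. NUMERIC, or a nominal set such as {a,b,c}

        public Attribute(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Builds ARFF text: the header part (relation + attributes), then the data part. */
    public static String buildArff(String relation, List<Attribute> attributes, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        // Header: the name of the relation
        sb.append("@RELATION ").append(relation).append('\n').append('\n');
        // Header: each attribute together with its data type or range
        for (Attribute a : attributes) {
            sb.append("@ATTRIBUTE ").append(a.name).append(' ').append(a.type).append('\n');
        }
        // Data part: the @DATA marker, then one comma-separated row per instance
        sb.append('\n').append("@DATA").append('\n');
        for (String[] row : rows) {
            if (row.length != attributes.size()) {
                throw new IllegalArgumentException(
                        "Row has " + row.length + " values, expected " + attributes.size());
            }
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    /** Saves the ARFF text as UTF-8 (creating or truncating the file) and returns its path. */
    public static Path save(String arff, Path target) throws IOException {
        return Files.write(target, arff.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        List<Attribute> attributes = Arrays.asList(
                new Attribute("sepallength", "NUMERIC"),
                new Attribute("sepalwidth", "NUMERIC"),
                new Attribute("petallength", "NUMERIC"),
                new Attribute("petalwidth", "NUMERIC"),
                // Assumed nominal class attribute; the listing in the text stops before it
                new Attribute("class", "{Iris-setosa,Iris-versicolor,Iris-virginica}"));
        // Illustrative rows in the shape of the Iris data
        List<String[]> rows = Arrays.asList(
                new String[] {"5.1", "3.5", "1.4", "0.2", "Iris-setosa"},
                new String[] {"4.9", "3.0", "1.4", "0.2", "Iris-setosa"});

        String arff = buildArff("iris", attributes, rows);
        Path out = save(arff, Paths.get("iris-sketch.arff")); // hypothetical output file name
        System.out.println("Wrote " + out.toAbsolutePath());
        System.out.print(arff);
    }
}
```

Writing the text by hand keeps the file layout visible, but this sketch does not quote names or values that contain spaces or commas, which ARFF requires, so treat it as a teaching aid rather than a general-purpose writer.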