Developing a machine learning application
In this section, we present a machine learning example for textual analysis. Refer to Chapter 6, Using Spark SQL in Machine Learning Applications, for more details on the machine learning code presented in this section.
The Dataset used in the following example contains 1,080 documents of textual business descriptions of Brazilian companies, categorized into nine categories. You can download this Dataset from https://archive.ics.uci.edu/ml/datasets/CNAE-9.
scala> import org.apache.spark.sql.Row
scala> val inRDD = spark.sparkContext.textFile("file:///Users/aurobindosarkar/Downloads/CNAE-9.data")
scala> val rowRDD = inRDD.map(_.split(",")).map(attributes => Row(attributes(0).toDouble, attributes(1).toDouble, attributes(2).toDouble, attributes(3).toDouble, attributes(4).toDouble, attributes(5).toDouble,
. . .
attributes(852).toDouble, attributes(853).toDouble, attributes(854).toDouble, attributes(855).toDouble, attributes(856).toDouble))
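Listing every column by index is verbose and easy to get wrong when a file has hundreds of fields (here, indices 0 through 856). As a minimal sketch of one possible alternative, and not the approach used in the listing above, each line can be converted to a Row with Row.fromSeq, so the column count never has to be spelled out. The helper name parseLine is hypothetical, and the code assumes it runs in spark-shell (where spark is already defined), that the file sits at the same local path, and that every field parses as a Double.

```scala
import org.apache.spark.sql.Row

// Hypothetical helper (not part of the original listing): parse one
// comma-separated line into a Row of Doubles, whatever the column count.
def parseLine(line: String): Row =
  Row.fromSeq(line.split(",").map(_.trim.toDouble).toSeq)

// Assumes a spark-shell session and the same local file path as above.
val inRDD = spark.sparkContext.textFile("file:///Users/aurobindosarkar/Downloads/CNAE-9.data")
val rowRDD = inRDD.map(parseLine)

// Optional sanity check: report the row count and the distinct row widths.
// A single width value means every line has the same number of columns.
val widths = rowRDD.map(_.length).distinct().collect()
println(s"rows = ${rowRDD.count()}, distinct row widths = ${widths.mkString(", ")}")
```

The trade-off is that a malformed field raises a NumberFormatException when an action runs, just as it would with the explicit version, so the sanity check is best run once before defining the schema.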
Next, we define a schema for the input...