In the following code snippet, we implement sentiment analysis based on the NLP theory discussed in this chapter. It uses Spark libraries on Twitter JSON records to train models that identify sentiments such as happy or unhappy. It looks for keywords like happy in the Twitter messages and flags each matching message with the value 1, indicating that the message represents a happy sentiment. All other messages are flagged with the value 0, which represents an unhappy sentiment. Finally, the TF-IDF algorithm is applied to turn the messages into weighted features for training the models:
import org.apache.spark.ml.feature.{HashingTF, RegexTokenizer, StopWordsRemover, IDF}
import org.apache.spark.sql.functions._
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import scala.util.{Success, Try}
import sqlContext.implicits._
val sqlContext = new org...
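To make the two ideas above concrete without a Spark cluster, here is a minimal plain-Scala sketch of the keyword labeling rule and of TF-IDF weighting. It is an illustration only, not the chapter's Spark pipeline: the object name `SentimentSketch`, the helper names, and the sample tweets are all hypothetical, and the smoothed IDF formula log((N + 1) / (df + 1)) is assumed here because it matches the one documented for Spark MLlib's IDF.

```scala
// Hypothetical, Spark-free sketch of keyword labeling plus TF-IDF weighting.
object SentimentSketch {

  // Lowercase the text and split on non-word characters, dropping empty tokens.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq

  // Labeling rule from the text: 1 if the message contains the keyword "happy", else 0.
  // Assumption: we match whole tokens, so "unhappy" is not counted as "happy".
  def label(text: String): Int =
    if (tokenize(text).contains("happy")) 1 else 0

  // Smoothed inverse document frequency per term: log((N + 1) / (df + 1)),
  // where N is the number of documents and df the number of documents containing the term.
  def idf(docs: Seq[Seq[String]]): Map[String, Double] = {
    val n = docs.size
    val docFreq = docs.flatMap(_.distinct).groupBy(identity).map {
      case (term, occurrences) => term -> occurrences.size
    }
    docFreq.map { case (term, df) => term -> math.log((n + 1.0) / (df + 1.0)) }
  }

  // TF-IDF for one document: raw term count multiplied by the term's IDF weight.
  // Terms unseen when the IDF weights were computed get weight 0.0.
  def tfidf(doc: Seq[String], idfWeights: Map[String, Double]): Map[String, Double] =
    doc.groupBy(identity).map {
      case (term, occurrences) => term -> occurrences.size * idfWeights.getOrElse(term, 0.0)
    }

  def main(args: Array[String]): Unit = {
    // Made-up sample messages, for illustration only.
    val tweets = Seq(
      "So happy today",
      "I am unhappy about the delay",
      "Happy happy weekend"
    )
    val docs = tweets.map(tokenize)
    val weights = idf(docs)
    tweets.zip(docs).foreach { case (tweet, doc) =>
      println(s"label=${label(tweet)} features=${tfidf(doc, weights)} text=$tweet")
    }
  }
}
```

Matching on whole tokens is a deliberate choice in this sketch: a plain substring test for happy would also flag messages containing unhappy as happy. In the Spark version, the same roles are played by a tokenizer, HashingTF (which hashes terms into a fixed-size count vector), and IDF, with the resulting vectors fed to a classifier.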