In this subsection, we present a semi-automated topic modeling (TM) technique using Spark. Keeping the other options at their defaults, we train LDA on the dataset downloaded from GitHub at https://github.com/minghui/Twitter-LDA/tree/master/data/Data4Model/test. However, we will use more well-known text datasets in the model reuse and deployment phase later in this chapter.
The following steps show the TM pipeline from reading the data through to printing the topics along with their term weights. Here is the short workflow of the TM pipeline:
object topicmodelingwithLDA {
  def main(args: Array[String]): Unit = {
    val lda = new LDAforTM() // actual computations are done here
    val defaultParams = Params().copy(input = "data/docs/") // loading parameters for training
    lda.run(defaultParams) // training the LDA model with the default parameters
  }
}
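Before moving on, it may help to see what the `Params` case class and the `LDAforTM` class referenced by the driver above could look like. The sketch below is an assumption, not the book's actual implementation: it wires together Spark ML's `RegexTokenizer`, `StopWordsRemover`, `CountVectorizer`, and `LDA` estimators, and the field names (`input`, `k`, `maxIterations`) are hypothetical defaults chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.{RegexTokenizer, StopWordsRemover, CountVectorizer}
import org.apache.spark.ml.clustering.LDA

// Hypothetical training parameters; only `input` is set by the driver above
case class Params(input: String = "", k: Int = 5, maxIterations: Int = 20)

class LDAforTM {
  def run(params: Params): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("LDAforTM")
      .getOrCreate()
    import spark.implicits._

    // Read each file line as one document
    val docs = spark.read.textFile(params.input).toDF("text")

    // Tokenize, drop stop words, and build term-count vectors
    val tokenizer = new RegexTokenizer().setInputCol("text").setOutputCol("tokens")
    val remover = new StopWordsRemover().setInputCol("tokens").setOutputCol("filtered")
    val vectorizer = new CountVectorizer().setInputCol("filtered").setOutputCol("features")

    val tokens = remover.transform(tokenizer.transform(docs))
    val cvModel = vectorizer.fit(tokens)
    val vectors = cvModel.transform(tokens)

    // Train LDA, then print each topic's top terms with their weights
    val ldaModel = new LDA().setK(params.k).setMaxIter(params.maxIterations).fit(vectors)
    val vocab = cvModel.vocabulary
    ldaModel.describeTopics(10).collect().foreach { row =>
      val terms = row.getAs[Seq[Int]]("termIndices")
      val weights = row.getAs[Seq[Double]]("termWeights")
      println(terms.zip(weights)
        .map { case (i, w) => f"${vocab(i)}%s: $w%.4f" }
        .mkString(", "))
    }
    spark.stop()
  }
}
```

The `describeTopics(10)` call returns a DataFrame whose `termIndices` and `termWeights` columns are mapped back to words through the `CountVectorizer` vocabulary, which is what produces the topics-with-weights output described above.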
We also need to import some related packages and libraries:
import edu.stanford.nlp.process.Morphology...