Let's go back to the two options we discussed earlier: building a machine learning/NLP model on Hadoop, and scoring an ML model on Hadoop. We covered the second option, scoring, in depth in the last section. Now, instead of sampling a smaller data-set and scoring it, let's take a larger data-set and build a large-scale machine learning model step by step using PySpark. I am again using the same running data with the same schema:
ID | Comment | Class
---|---|---
UA0001 | I tried calling you, The service was not up to the mark | 1
UA0002 | Can you please update my phone no | 0
UA0003 | Really bad experience | 1
UA0004 | I am looking for an iPhone | 0
UA0005 | Can somebody help me with my password | 1
UA0006 | Thanks for considering my request for | 0
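
As a quick point of reference, the sample above could be represented as a Spark DataFrame along the following lines. This is a minimal sketch: the session setup, application name, and variable names are assumptions for illustration, and with real data you would read the full comment table from storage rather than hard-coding rows.

```python
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session; the app name is illustrative.
spark = SparkSession.builder.appName("comment-classifier").getOrCreate()

# Recreate the sample rows above as a DataFrame with the same schema.
# In practice this would be the full comment history read from disk/HDFS.
comments_df = spark.createDataFrame(
    [
        ("UA0001", "I tried calling you, The service was not up to the mark", 1),
        ("UA0002", "Can you please update my phone no", 0),
        ("UA0003", "Really bad experience", 1),
        ("UA0004", "I am looking for an iPhone", 0),
        ("UA0005", "Can somebody help me with my password", 1),
        ("UA0006", "Thanks for considering my request for", 0),
    ],
    schema=["ID", "Comment", "Class"],
)
comments_df.show(truncate=False)
```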
Consider this schema applied to the last 10 years' worth of comments in the organization. Now, instead of building a classification model on a small sample and then using that pretrained model to score all the comments, let me give you a step-by-step example of how to build the model on the large data-set itself.
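
Before walking through each stage in detail, here is a rough sketch of the kind of pipeline we are about to build with Spark MLlib: tokenization, TF-IDF features, and a logistic regression classifier. It reuses the `comments_df` DataFrame from the sketch above; the variable names, split ratio, and feature dimensionality are assumptions for illustration, not the final implementation.

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Split the comment history into training and held-out test sets.
train_df, test_df = comments_df.randomSplit([0.8, 0.2], seed=42)

# Stage 1: split each comment into words.
tokenizer = Tokenizer(inputCol="Comment", outputCol="words")

# Stages 2-3: hash words into term-frequency vectors, then re-weight by IDF.
tf = HashingTF(inputCol="words", outputCol="rawFeatures", numFeatures=1 << 16)
idf = IDF(inputCol="rawFeatures", outputCol="features")

# Stage 4: a binary classifier over the TF-IDF features.
lr = LogisticRegression(featuresCol="features", labelCol="Class")

pipeline = Pipeline(stages=[tokenizer, tf, idf, lr])

# Fit on the training set and score the held-out comments.
model = pipeline.fit(train_df)
predictions = model.transform(test_df)

# Evaluate with area under the ROC curve.
evaluator = BinaryClassificationEvaluator(labelCol="Class")
print("AUC:", evaluator.evaluate(predictions))
```

One design note on this sketch: `HashingTF` maps words to fixed-size vectors without first fitting a vocabulary, which is one reason a pipeline like this scales to years of comments, since no dictionary has to be collected to the driver.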