This recipe addresses tuning the classifier by paying attention to the mistakes the system makes and then making linguistic adjustments to its parameters and features. We will continue with the sentiment use case from the previous recipe and work with the same data. We will start with a fresh class at src/com/lingpipe/cookbook/chapter3/LinguisticTuning.java.
We have very little data. In the real world, we would insist on more training data: at least 100 examples of the smallest category, negative, with a natural distribution of positives and others.
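As a rough illustration of the 100-example guideline, the following sketch counts labels per category and checks whether the smallest category reaches the threshold. The class name CategoryCountCheck, its methods, and the label strings are hypothetical and are not part of the cookbook code; it assumes the labels have already been read into a list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the cookbook source.
public class CategoryCountCheck {

    // Assumed threshold, taken from the "at least 100" guideline in the text.
    static final int MIN_PER_CATEGORY = 100;

    // Count how many training items carry each category label.
    static Map<String, Integer> countByCategory(List<String> labels) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    // True only if there is at least one category and the smallest
    // category has MIN_PER_CATEGORY items or more.
    static boolean hasEnoughData(Map<String, Integer> counts) {
        return !counts.isEmpty()
            && Collections.min(counts.values()) >= MIN_PER_CATEGORY;
    }

    public static void main(String[] args) {
        // Made-up labels for illustration only.
        List<String> labels = Arrays.asList(
            "positive", "negative", "other", "other", "positive", "other");
        Map<String, Integer> counts = countByCategory(labels);
        System.out.println(counts + " enough=" + hasEnoughData(counts));
    }
}
```

A check like this only looks at raw counts; whether the distribution is natural still depends on how the data was sampled.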
We will jump right in and run some data. The default is data/activeLearningCompleted/disneySentimentDedupe.2.csv, but you can specify your own file on the command line.
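The default-with-override behavior described here can be sketched as follows. This is a minimal illustration, assuming the first command-line argument is the data file; the class name DataPathDefault and the method resolveDataPath are hypothetical, and the actual LinguisticTuning.java may handle its arguments differently.

```java
// Hypothetical sketch of the default-file pattern; not the cookbook's own code.
public class DataPathDefault {

    // Default data file named in the recipe text.
    static final String DEFAULT_DATA =
        "data/activeLearningCompleted/disneySentimentDedupe.2.csv";

    // Use the first command-line argument if one is given,
    // otherwise fall back to the default file.
    static String resolveDataPath(String[] args) {
        return args.length > 0 ? args[0] : DEFAULT_DATA;
    }

    public static void main(String[] args) {
        System.out.println("Training data: " + resolveDataPath(args));
    }
}
```

Running the class with no arguments uses the default file; passing a path as the first argument substitutes your own data.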