It is finally time to build our state-of-the-art, SVM-based news topic classifier with all we just learned.
Load and clean the news dataset with the whole 20 groups:
>>> categories = None
>>> data_train = fetch_20newsgroups(subset='train',
categories=categories, random_state=42)
>>> data_test = fetch_20newsgroups(subset='test',
categories=categories, random_state=42)
>>> cleaned_train = clean_text(data_train.data)
>>> label_train = data_train.target
>>> cleaned_test = clean_text(data_test.data)
>>> label_test = data_test.target
>>> term_docs_train =
tfidf_vectorizer.fit_transform(cleaned_train)
>>> term_docs_test = tfidf_vectorizer.transform(cleaned_test)
Recall that the linear kernel is good at...