Practical Data Science with Python
Before we undertake text analysis, it's often helpful to perform some common cleaning and preprocessing steps.
This often includes:
Cleaning and preparing text can improve the performance of ML algorithms and make the results of an analysis easier to interpret. We'll cover these cleaning and preparation steps in order.
First, lowercasing is quite easy in Python. We simply take a string variable and use the built-in .lower() method. We'll use the book War and Peace by Leo Tolstoy for our text since it's one of the most famous long books. Perhaps we can draw some conclusions about the topics of the book without reading it. The Project Gutenberg website (https://www.gutenberg.org/) will...
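As a minimal sketch of the lowercasing step, the snippet below calls the built-in .lower() method on a short string. The variable names and the sample string are assumptions made for illustration; in practice, the string would hold the full text of the book once it has been loaded.

```python
# Hypothetical sample string standing in for the full book text.
sample_text = "War and Peace, by Leo Tolstoy"

# .lower() returns a new string with all cased characters lowercased.
# Python strings are immutable, so sample_text itself is left unchanged.
lowered_text = sample_text.lower()

print(lowered_text)
```

Because .lower() returns a new string rather than modifying the original, the result needs to be assigned to a variable to be used in later steps.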