Real-world NLP datasets
This chapter uses the same real-world Netflix and Twitter NLP datasets as Chapter 5. Both datasets have been vetted, cleaned, and stored in the pluto_data directory in this book’s GitHub repository. The startup sequence is similar to that of the previous chapters:
- Clone the Python Notebook and Pluto.
- Verify Pluto.
- Locate the NLP data.
- Load the data into pandas.
- View the data.
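The last three steps can be sketched in a few lines of pandas. This is a minimal illustration, not the book's Pluto code: the file name sample_nlp.csv and its columns are hypothetical, and a tiny CSV is written to a temporary pluto_data folder so that the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the pluto_data directory. The real file names
# and columns are not listed here, so we create a tiny CSV for illustration.
data_dir = Path(tempfile.mkdtemp()) / "pluto_data"
data_dir.mkdir()
csv_path = data_dir / "sample_nlp.csv"  # hypothetical file name
csv_path.write_text(
    "title,description\n"
    "Show A,A short plot summary.\n"
    "Show B,Another short plot summary.\n",
    encoding="utf-8",
)

# Locate the NLP data: list the CSV files in the directory.
csv_files = sorted(data_dir.glob("*.csv"))

# Load the data into pandas.
df = pd.read_csv(csv_files[0])

# View the data: print its shape and the first few rows.
print(df.shape)
print(df.head())
```

In the Notebook, the same pattern would apply to the real files in pluto_data once the repository has been cloned.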
Let’s start with the Python Notebook and Pluto.
Python Notebook and Pluto
Start by loading the data_augmentation_with_python_chapter_6.ipynb file into Google Colab or your chosen Jupyter Notebook or JupyterLab environment. From this point onward, we will only display code snippets. The complete Python code can be found in the Python Notebook.
The next step is to clone the repository. We will reuse the code from Chapter 5. The %run statements are used to instantiate Pluto:
# clone Packt GitHub...