Mastering spaCy

By: Duygu Altınok

Overview of this book

spaCy is an industrial-grade, efficient NLP Python library. It offers various pre-trained models and ready-to-use features. Mastering spaCy provides you with end-to-end coverage of spaCy's features and real-world applications. You'll begin by installing spaCy and downloading models, before progressing to spaCy's features and prototyping real-world NLP apps. Next, you'll get familiar with visualizing with spaCy's popular visualizer displaCy. The book also equips you with practical illustrations for pattern matching and helps you advance into the world of semantics with word vectors. Statistical information extraction methods are also explained in detail. Later, you'll cover an interactive business case study that shows you how to combine all spaCy features for creating a real-world NLP pipeline. You'll implement ML models such as sentiment analysis, intent recognition, and context resolution. The book further focuses on classification with popular frameworks such as TensorFlow's Keras API together with spaCy. You'll cover popular topics, including intent classification and sentiment analysis, and use them on popular datasets and interpret the classification results. By the end of this book, you'll be able to confidently use spaCy, including its linguistic features, word vectors, and classifiers, to create your own NLP apps.
Table of Contents (15 chapters)
Section 1: Getting Started with spaCy
Section 2: spaCy Features
Section 3: Machine Learning with spaCy

Installing spaCy

Let's get started by installing and setting up spaCy. spaCy runs on 64-bit CPython on Unix/Linux, macOS/OS X, and Windows. The spaCy 3.x releases used in this book require Python 3.6 or later; the older 2.x releases also supported Python 2.7 and 3.5. CPython is the reference implementation of Python, written in C. If you already have Python running on your system, you are most probably using CPython already, so you don't need to worry about this detail. The newest spaCy releases are always installable via pip (from PyPI, https://pypi.org/) and conda (https://conda.io/en/latest/), two of the most popular Python package managers.
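
If you are unsure which interpreter you are running, the following minimal sketch prints the facts that matter for a spaCy install. It uses only the standard library. The function name describe_interpreter and the MIN_PYTHON constant are illustrative, and the minimum version is an assumption for spaCy 3.x that you should adjust for the release you install.

```python
import platform
import sys

# Assumption: spaCy 3.x needs Python 3.6 or later; adjust for your release.
MIN_PYTHON = (3, 6)


def describe_interpreter():
    """Collect the interpreter facts that matter for a spaCy install."""
    return {
        "implementation": platform.python_implementation(),  # e.g. "CPython"
        "version": platform.python_version(),
        # On a 64-bit build, sys.maxsize is larger than 2**32.
        "is_64bit": sys.maxsize > 2**32,
        "meets_minimum": sys.version_info >= MIN_PYTHON,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```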

pip is the most painless choice as it installs all the dependencies, so let's start with it.

Installing spaCy with pip

You can install spaCy with the following command:

$ pip install spacy

If you have more than one Python version installed on your system (such as Python 2.7, Python 3.6, Python 3.8, and so on), then select the pip associated with the Python version you want to use. For instance, if you want to use spaCy with Python 3.8, you can do the following:

$ pip3.8 install spacy
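
An alternative to picking a versioned pip executable is to invoke pip as a module of the interpreter you intend to use, which guarantees that the package lands in that interpreter's environment. The sketch below only builds the command so you can inspect it. The helper name pip_install_command is hypothetical and not part of spaCy or pip.

```python
import sys


def pip_install_command(package="spacy", upgrade=False):
    """Build a pip command bound to the interpreter running this script."""
    # sys.executable is the full path of the current Python interpreter.
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("-U")
    cmd.append(package)
    return cmd


print(" ".join(pip_install_command()))
```

To actually run the installation from Python, you could pass the returned list to subprocess.run(cmd, check=True).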

If you already have spaCy installed on your system, you may want to upgrade to the latest version of spaCy. We're using spaCy 3.1 in this book; you can check which version you have with the following command:

$ python -m spacy info

This is what the version info output looks like. The following was generated on my Ubuntu machine:

Figure 1.7 – An example spaCy version output
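
If you only need the version number, for example inside a script, you can also read it from the installed package metadata. This is a minimal sketch that assumes Python 3.8 or later for importlib.metadata; the helper name installed_version is illustrative.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package="spacy"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"spaCy version: {version}" if version else "spaCy is not installed")
```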

To upgrade spaCy to the latest available version, run the following command:

$ pip install -U spacy

Installing spaCy with conda

conda support is provided by the conda community. The command for installing spaCy with conda is as follows:

$ conda install -c conda-forge spacy 

Installing spaCy on macOS/OS X

macOS and OS X already ship with Python. You only need to install a recent version of the Xcode IDE. After installing Xcode, please run the following:

$ xcode-select --install

This installs the command-line development tools. Then you will be able to follow the preceding pip commands.

Installing spaCy on Windows

If you have a Windows system, you need to install a version of Visual C++ Build Tools or Visual Studio Express that matches your Python distribution. Here are the official distributions and their matching versions, taken from the spaCy installation guide (https://spacy.io/usage#source-windows):

Figure 1.8 – Visual Studio and Python distribution compatibility table

If you haven't encountered any problems so far, spaCy is installed and running on your system. You should be able to import spaCy in your Python shell:

 import spacy

You have now successfully installed spaCy. Congrats, and welcome to the spaCy universe! If you ran into installation problems, please continue to the next section; otherwise, you can move on to language model installation.
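
For a slightly stronger check than a bare import, you can tokenize a sentence with a blank English pipeline, which needs no downloaded language model. This is a rough sketch: the function name spacy_smoke_test is hypothetical, and it returns None rather than failing when spaCy cannot be imported.

```python
def spacy_smoke_test(text="Welcome to the spaCy universe!"):
    """Return the tokens of `text`, or None if spaCy cannot be imported."""
    try:
        import spacy
    except Exception as error:  # a broken install may raise more than ImportError
        print(f"Could not import spaCy: {error!r}")
        return None
    # spacy.blank builds a tokenizer-only pipeline, so no model download is needed.
    nlp = spacy.blank("en")
    return [token.text for token in nlp(text)]


print(spacy_smoke_test())
```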

Troubleshooting while installing spaCy

You might run into issues during the installation process. The good news is that spaCy is a very popular library, so other developers have most probably encountered the same issues already. Most of them are covered on Stack Overflow (https://stackoverflow.com/questions/tagged/spacy) and in the spaCy GitHub Issues section (https://github.com/explosion/spaCy/issues). In this section, we'll go over the most common issues and their solutions.

Some of the most common issues are as follows:

  • The Python distribution is incompatible: In this case, please upgrade your Python version accordingly and then do a fresh installation.
  • The upgrade broke spaCy: Most probably there are some leftover packages in your installation directories. The best solution is to first remove the spaCy package completely by doing the following:
    $ pip uninstall spacy

    Then do a fresh installation by following the installation instructions given earlier.

  • You're unable to install spaCy on a Mac: Please make sure that you haven't skipped the following step, which installs the Mac command-line tools and enables pip:
    $ xcode-select --install

In general, if you have the correct Python dependencies, the installation process will go smoothly.
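
When an upgrade leaves spaCy in a broken state, it can help to see which spaCy-related distributions are still present before and after uninstalling. The sketch below lists installed distributions whose names start with a given prefix. It assumes Python 3.8 or later, and the helper name spacy_related_packages is illustrative rather than part of spaCy.

```python
from importlib import metadata  # standard library, Python 3.8+


def spacy_related_packages(prefix="spacy"):
    """Map distribution names starting with `prefix` to their versions."""
    found = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
            version = dist.version
        except Exception:
            # Skip distributions whose metadata cannot be read.
            continue
        if name and name.lower().startswith(prefix):
            found[name] = version
    return found


for name, version in sorted(spacy_related_packages().items()):
    print(f"{name}=={version}")
```

If the list still shows entries after pip uninstall spacy, those are candidates to remove before the fresh installation.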

We're all set up and ready for our first usage of spaCy, so let's go ahead and start using spaCy's language models.