Natural Language Processing and Computational Linguistics

Natural Language Processing and Computational Linguistics

By : Bhargav Srinivasa-Desikan

Buy this Book

Natural Language Processing and Computational Linguistics

By: Bhargav Srinivasa-Desikan

Buy this Book

Overview of this book

Modern text analysis is now very accessible using Python and open source tools, so discover how you can now perform modern text analysis in this era of textual data. This book shows you how to use natural language processing, and computational linguistics algorithms, to make inferences and gain insights about data you have. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with these algorithms are available to you right now - with Python, and tools like Gensim and spaCy. You'll start by learning about data cleaning, and then how to perform computational linguistics from first concepts. You're then ready to explore the more sophisticated areas of statistical NLP and deep learning using Python, with realistic language and text samples. You'll learn to tag, parse, and model text using the best tools. You'll gain hands-on knowledge of the best frameworks to use, and you'll know when to choose a tool like Gensim for topic models, and when to work with Keras for deep learning. This book balances theory and practical hands-on examples, so you can learn about and conduct your own natural language processing projects and computational linguistics. You'll discover the rich ecosystem of Python tools you have available to conduct NLP - and enter the interesting world of modern text analysis.

Title Page

Packt Upsell

Contributors

Preface

Free Chapter

What is Text Analysis?

What is text analysis?

Where's the data at?

Garbage in, garbage out

Why should you do text analysis?

Summary

References

Python Tips for Text Analysis

Why Python?

Text manipulation in Python

Summary

References

spaCy's Language Models

spaCy

Installation

Tokenizing text

Summary

References

Gensim – Vectorizing Text and Transformations and n-grams

Introducing Gensim

Vectors and why we need them

Vector transformations in Gensim

n-grams and some more preprocessing

Summary

References

POS-Tagging and Its Applications

What is POS-tagging?

POS-tagging in Python

Training our own POS-taggers

POS-tagging code examples

Summary

References

NER-Tagging and Its Applications

What is NER-tagging?

NER-tagging in Python

Training our own NER-taggers

NER-tagging examples and visualization

Summary

References

Dependency Parsing

Dependency parsing

Dependency parsing in Python

Dependency parsing with spaCy

Training our dependency parsers

Summary

References

Topic Models

What are topic models?

Topic models in Gensim

Topic models in scikit-learn

Summary

References

Advanced Topic Modeling

Advanced training tips

Exploring documents

Topic coherence and evaluating topic models

Visualizing topic models

Summary

References

Clustering and Classifying Text

Clustering text

Starting clustering

K-means

Hierarchical clustering

Classifying text

Summary

References

Similarity Queries and Summarization

Summary

Word2Vec, Doc2Vec, and Gensim

Word2Vec

Doc2Vec

Other word embeddings

Summary

References

Deep Learning for Text

Deep learning

Deep learning for text (and more)

Generating text

Summary

References

Keras and spaCy for Deep Learning

Keras and spaCy

Classification with Keras

Classification with spaCy

Summary

References

Sentiment Analysis and ChatBots

Sentiment analysis

ChatBots

Summary

References

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

Customer Reviews

5 star

4 star

3 star

2 star

1 star

Why should you do text analysis?

We've talked about what text analysis is, where we can find the data, and some of the things to keep in mind before diving into text analysis. But after all, what motivation do you, the reader, have to actually go about doing text analysis?

For starters, it's the sheer abundance of easily available data that we can use. In the big data age, there really is no excuse to not have a look at what all our data really means. In fact, apart from the massive data sets, we can download off the internet, we also have access to small data– text messages, emails, a collection of poems are such examples. You could even do a meta-analysis and run an analysis on this very book! Textual data is even easier to get a hold-off, but far more importantly - it's easy to interpret and understand the results of the analysis. Numbers might not always make sense and are not always appealing to look at - but words are easier for us human beings to appreciate.

Text analysis remains exciting also because we can use data which directly involves the user- our own text conversations, our favorite childhood book, or tweets by our favorite celebrity. The personal nature of text data always adds an extra bit of motivation, and it also likely means we are aware of the nature of the data, and what kind of results to expect.

NLP techniques can also help us construct tools that can assist personal businesses or enterprises – chatbots, for example, are becoming increasingly common in major websites, and with the right approach, it is possible to have a personal chat-bot. This is largely due to a sub-field of machine learning, called Deep Learning, where we use algorithms and structures that are inspired by the structure of the human brain. These algorithms and structures are also referred to as neural networks. Advances in deep learning have introduced to powerful neural networks such as Recurrent Neural Networks(RNNs) and Convolutional Neural Networks(CNNs). Now, even with minimal knowledge of the mathematical functioning of these algorithms, high-level APIs are allowing us to use these tools. Integrating this into our daily life is no longer reserved for computer science researchers or full-time engineers – with the right collection of data and open source packages, this is well within our capabilities.

Open source packages have become industry standard – Google has released and maintains TensorFlow [21], and packages such as scikit-learn [22] are used by Apple and Spotify, and spaCy [23], which we will extensively discuss throughout this book – is used by Quora, a popular question-answer website.

We are no longer limited by either data or the tools – the only two things we would need to do text analysis.

The programming language python will be our friend throughout the book, and all the tools we will use will all be free open-source software. While we move towards open science, we also move towards open source code, and this will remain a key philosophy throughout the book. In the world of research, open source code means academic results are reproducible and available to all those interested. Python remains an easy-to-use and powerful language and serves as a great way to enter the world of natural language processing.

One could argue that the last thing needed was the knowledge of how to apply these tools and to wrangle with the data – but that is precisely the purpose of the book and, hoping to let the reader build their own natural language processing pipelines and models at the end of the journey.

Natural Language Processing and Computational Linguistics

By : Bhargav Srinivasa-Desikan

Natural Language Processing and Computational Linguistics

By: Bhargav Srinivasa-Desikan

Overview of this book

Related Content you might be interested in

Current Title:

Natural Language Processing and Computational Linguistics

Mastering spaCy

The Handbook of NLP with Gensim

Natural Language Processing with Python Quick Start Guide

Why should you do text analysis?