Network Science with Python

By: David Knickerbocker

Overview of this book

Network analysis is often taught with tiny or toy data sets, which limits both what you learn and how far you can apply it. Network Science with Python helps you extract relevant data, build networks, and draw conclusions using practical, industry-standard data sets. You’ll begin by learning the basics of natural language processing, network science, and social network analysis, then move on to programmatically building and analyzing networks. You’ll get a hands-on understanding of data sources, data extraction, working with the data, and drawing insights from it. This is a hands-on book grounded in theory, with specific technical and mathematical details for future reference. As you progress, you’ll learn to construct and clean networks, conduct network analysis and egocentric network analysis, perform community detection, and use network data with machine learning. You’ll also explore network analysis concepts, from the basics to an advanced level. By the end of the book, you’ll be able to identify network data and use it to extract unconventional insights to comprehend the complex world around you.
Table of Contents (17 chapters)
Part 1: Getting Started with Natural Language Processing and Networks
Part 2: Graph Construction and Cleanup
Part 3: Network Science and Social Network Analysis

Additional NLP and network considerations

This has been a marathon of a chapter. Please bear with me a little longer. I have a few final thoughts that I’d like to express, and then we can conclude this chapter.

Data cleanup

First, if you work with language data, there will always be cleanup. Language is messy and difficult. If you are only comfortable working with pre-cleaned tabular data, this is going to feel very messy. I love that messiness, because every project lets me improve my techniques and tactics.

I showed two different approaches for extracting entities: PoS tagging and NER. Both approaches work very well, but consider which one gets us to a clean and useful entity list most quickly and easily. With PoS tagging, we get one token at a time, so multi-word names have to be stitched back together. With NER, we get to entities very quickly, but the models occasionally misbehave or don’t catch everything, so there is always cleanup with this approach as well.
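To make the trade-off concrete, here is a minimal sketch of the kind of cleanup each approach tends to need. It does not call a real tagger or NER model. The `tagged_tokens` and `raw_entities` lists are hand-written, hypothetical stand-ins for their output, and the helper names `group_proper_nouns` and `clean_entities` are my own for illustration. The tags assume the Penn Treebank convention, where NNP and NNPS mark proper nouns.

```python
import re

# Hypothetical (token, tag) pairs standing in for PoS tagger output.
# Assumes Penn Treebank tags: NNP / NNPS mark proper nouns.
tagged_tokens = [
    ("Alice", "NNP"), ("Liddell", "NNP"), ("met", "VBD"),
    ("the", "DT"), ("White", "NNP"), ("Rabbit", "NNP"),
    ("near", "IN"), ("London", "NNP"), (".", "."),
]


def group_proper_nouns(tagged):
    """Join runs of adjacent proper-noun tokens into multi-word entities."""
    entities, current = [], []
    for token, tag in tagged:
        if tag in ("NNP", "NNPS"):
            current.append(token)
        elif current:
            # A non-proper-noun token ends the current run.
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# Hypothetical raw entity strings standing in for NER model output.
raw_entities = ["Alice's", "the White Rabbit", "Alice", "London.", "  London"]


def clean_entities(entities):
    """Normalize entity strings and drop duplicates, keeping first-seen order."""
    cleaned = []
    for entity in entities:
        entity = entity.strip(" .,;:!?")                # stray spaces, punctuation
        entity = re.sub(r"['’]s$", "", entity)          # trailing possessive
        entity = re.sub(r"^(the|a|an)\s+", "", entity,  # leading article
                        flags=re.IGNORECASE)
        if entity and entity not in cleaned:
            cleaned.append(entity)
    return cleaned


pos_entities = group_proper_nouns(tagged_tokens)
ner_entities = clean_entities(raw_entities)
print(pos_entities)
print(ner_entities)
```

Neither helper is complete. The PoS grouping would wrongly merge two different names that happen to sit next to each other, and the NER cleanup only handles the few patterns listed. Real text will need more rules, added as you find new problems in your own data.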

There is no silver bullet. I want to use whatever...