Network Science with Python

By: David Knickerbocker

Overview of this book

Network analysis is often taught with tiny or toy data sets, leaving you with a limited scope of learning and practical usage. Network Science with Python helps you extract relevant data, draw conclusions, and build networks using industry-standard, practical data sets. You’ll begin by learning the basics of natural language processing, network science, and social network analysis, then move on to programmatically building and analyzing networks. You’ll get a hands-on understanding of data sources, how to extract and interact with the data, and how to draw insights from it. This is a hands-on book grounded in theory, with specific technical and mathematical details for future reference. As you progress, you’ll learn to construct and clean networks, conduct network analysis, egocentric network analysis, and community detection, and use network data with machine learning. You’ll also explore network analysis concepts, from the basics to an advanced level. By the end of the book, you’ll be able to identify network data and use it to extract unconventional insights to comprehend the complex world around you.
Table of Contents (17 chapters)
Part 1: Getting Started with Natural Language Processing and Networks
Part 2: Graph Construction and Cleanup
Part 3: Network Science and Social Network Analysis

Selecting a model

For this exercise, my goal is simply to show you how network data may be useful in ML, not to go into great detail about ML. There are many, many, many thick books on the subject. This is a book about how NLP and networks can be used together to understand the hidden strings that exist around us and the influence that they have on us. So, I am going to speed past the discussion of how different models work. For this exercise, we are going to use one very useful and powerful model that often works well enough: Random Forest.

Random Forest can take both numeric and categorical data as input (some implementations, such as scikit-learn’s, expect categorical features to be encoded as numbers first). Our chosen features should work very well for this exercise. Random Forest is also easy to set up and experiment with, and it makes it easy to see which features the model found most useful for its predictions.
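
To make those two points concrete (easy setup, and easy inspection of what the model found useful), here is a minimal sketch using scikit-learn's RandomForestClassifier. It is not this book's actual dataset or code: the data is synthetic, the feature names (degree, clustering, pagerank, noise) are hypothetical stand-ins for network metrics, and the labeling rule is an assumption made up purely for the demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration only):
# each row plays the role of a node, each column a hypothetical network feature.
rng = np.random.default_rng(42)
feature_names = ["degree", "clustering", "pagerank", "noise"]
X = rng.random((500, len(feature_names)))

# Made-up labeling rule: the label depends only on "degree" and "pagerank".
y = ((X[:, 0] + X[:, 2]) > 1.0).astype(int)

# Hold out a test set so the score reflects unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Setting up and training the model takes two lines.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)

# feature_importances_ holds one score per feature; the scores sum to 1.
ranked = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"test accuracy: {accuracy:.2f}")
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Because the made-up label depends only on the degree and pagerank columns, those two should rank highest in the importance list, which is the kind of quick sanity check that makes Random Forest convenient to experiment with.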

Other models would work. I attempted to use k-nearest neighbors and had nearly the same level of success, and I’m sure that Logistic regression...