Machine Learning for Streaming Data with Python

By: Joos Korstanje

Overview of this book

Streaming data is a fast-growing area of data science and machine learning. As business needs become more demanding, many use cases require real-time analysis as well as real-time machine learning. This book will help you get up to speed with data analytics for streaming data, with a strong focus on adapting machine learning and other analytics to the streaming case. You will first learn about the architecture for streaming and real-time machine learning. Next, you will look at state-of-the-art frameworks for streaming data, such as River. Later chapters focus on industrial use cases for streaming data, such as online anomaly detection. As you progress, you will discover various challenges and learn how to mitigate them. You will also learn best practices that will help you use streaming data to generate real-time insights. By the end of this book, you will have gained the confidence you need to use streaming data in your machine learning models.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Part 1: Introduction and Core Concepts of Streaming Data

Chapter 1: An Introduction to Streaming Data

Technical requirements

A short history of data science

Working with streaming data

Real-time data formats and importing an example dataset in Python

Summary

Further reading

Chapter 2: Architectures for Streaming and Real-Time Machine Learning

Technical requirements

Defining your analytics as a function

Understanding microservices architecture

Communicating between services through APIs

Demystifying the HTTP protocol

Building a simple API on AWS

Big data tools for real time streaming

Summary

Further reading

Chapter 3: Data Analysis on Streaming Data

Technical requirements

Descriptive statistics on streaming data

Introduction to sampling theory

Overview of the main descriptive statistics

Real-time visualizations

Building basic alerting systems

Summary

Further reading

Part 2: Exploring Use Cases for Data Streaming

Chapter 4: Online Learning with River

Technical requirements

What is online machine learning?

Using River for online learning

Summary

Further reading

Chapter 5: Online Anomaly Detection

Technical requirements

Defining anomaly detection

Exploring use cases of anomaly detection

Comparing anomaly detection and imbalanced classification

Algorithms for detecting anomalies in River

Going further with anomaly detection

Summary

Further reading

Chapter 6: Online Classification

Technical requirements

Defining classification

Identifying use cases of classification

Overview of classification algorithms in River

Summary

Further reading

Chapter 7: Online Regression

Technical requirements

Defining regression

Use cases of regression

Overview of regression algorithms in River

Summary

Further reading

Chapter 8: Reinforcement Learning

Technical requirements

Defining reinforcement learning

The main steps of a reinforcement learning model

Exploring Q-learning

Deep Q-learning

Using reinforcement learning for streaming data

Use cases of reinforcement learning

Implementing reinforcement learning in Python

Summary

Further reading

Part 3: Advanced Concepts and Best Practices around Streaming Data

Chapter 9: Drift and Drift Detection

Technical requirements

Defining drift

Introducing model explicability

Measuring drift

Measuring drift in Python

Counteracting drift

Summary

Further reading

Chapter 10: Feature Transformation and Scaling

Technical requirements

Challenges of data preparation with streaming data

Scaling data for streaming

Transforming features in a streaming context

Summary

Further reading

Chapter 11: Catastrophic Forgetting

Technical requirements

Introducing catastrophic forgetting

Catastrophic forgetting in online models

Detecting catastrophic forgetting

Model explicability versus catastrophic forgetting

Summary

Further reading

Chapter 12: Conclusion and Best Practices

Going further

Summary

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Comparing anomaly detection and imbalanced classification

To separate positive cases from negative cases, the standard go-to family of methods is classification. For the problems described, as long as you have historical data on at least a few positive and negative cases, you can use classification algorithms. However, you run into a very common problem: only very few observations are anomalies. This is generally known as the problem of imbalanced data.

The problem of imbalanced data

Imbalanced datasets are datasets in which the classes of the target variable occur with very uneven frequencies. A common example is website sales: among 1,000 visitors, you often have at least 900 who are just browsing, as opposed to maybe 100 who actually buy something.
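As a small illustration of what "imbalanced" means in numbers, the following sketch counts the classes in the website example. The label encoding (1 for a purchase, 0 for browsing only) and the exact 100/900 split are assumptions made for this illustration; they are not a dataset from the book.

```python
from collections import Counter

# Hypothetical labels for 1,000 website visitors (illustrative assumption):
# 1 = visitor bought something, 0 = visitor only browsed.
labels = [1] * 100 + [0] * 900

# Count how often each class occurs.
counts = Counter(labels)
total = sum(counts.values())

# Share of each class in the dataset.
shares = {label: count / total for label, count in counts.items()}

# Ratio between the most and least frequent class.
imbalance_ratio = max(counts.values()) / min(counts.values())

for label, share in sorted(shares.items()):
    print(f"class {label}: {counts[label]} observations ({share:.0%})")
print(f"majority-to-minority ratio: {imbalance_ratio:.0f} to 1")
```

In this made-up split, buyers account for 10% of the observations, so the majority class is nine times as frequent as the minority class. Real anomaly detection problems are often far more skewed than this.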

Using classification methods carelessly on imbalanced data is prone to errors. Imagine that you fit a classification model that needs to predict for each website visitor...
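One well-known way in which careless classification goes wrong on imbalanced data can be sketched as follows. This is an illustration under assumed numbers (100 buyers and 900 browsers) and a deliberately trivial, hypothetical "model" that always predicts the majority class; it is not necessarily the example the text goes on to develop.

```python
# Hypothetical ground truth (illustrative assumption):
# 100 buyers (1) and 900 visitors who only browsed (0).
y_true = [1] * 100 + [0] * 900

# A trivial "model" that ignores its input and always predicts "no purchase".
y_pred = [0] * len(y_true)

# Accuracy: share of all predictions that are correct.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall on the positive class: share of actual buyers that were found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(y_true)
recall = true_positives / actual_positives

print(f"accuracy = {accuracy:.2f}")
print(f"recall on buyers = {recall:.2f}")
```

The accuracy looks respectable at 0.90, yet the recall of 0.00 shows that not a single buyer was identified. This is why metrics that look at the minority class separately are usually more informative than plain accuracy when the classes are this uneven.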