Hands-On Machine Learning for Algorithmic Trading

By: Stefan Jansen

Overview of this book

The explosive growth of digital data has boosted the demand for expertise in trading strategies that use machine learning (ML). This book enables you to use a broad range of supervised and unsupervised algorithms to extract signals from a wide variety of data sources and create powerful investment strategies. This book shows how to access market, fundamental, and alternative data via API or web scraping and offers a framework to evaluate alternative data. You’ll practice the ML workflow from model design, loss metric definition, and parameter tuning to performance evaluation in a time series context. You will understand ML algorithms such as Bayesian and ensemble methods and manifold learning, and will know how to train and tune these models using pandas, statsmodels, sklearn, PyMC3, xgboost, lightgbm, and catboost. This book also teaches you how to extract features from text data using spaCy, classify news and assign sentiment scores, and use gensim to model topics and learn word embeddings from financial reports. You will also build and evaluate neural networks, including RNNs and CNNs, using Keras and PyTorch to exploit unstructured data for sophisticated strategies. Finally, you will apply transfer learning to satellite images to predict economic activity and use reinforcement learning to build agents that learn to trade in the OpenAI Gym.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Machine Learning for Trading

How to read this book

The rise of ML in the investment industry

Design and execution of a trading strategy

ML and algorithmic trading strategies

Summary

Market and Fundamental Data

How to work with market data

How to work with fundamental data

Efficient data storage with pandas

Summary

Alternative Data for Finance

The alternative data revolution

Evaluating alternative datasets

The market for alternative data

Working with alternative data

Summary

Alpha Factor Research

Engineering alpha factors

Seeking signals – how to use zipline

Separating signal and noise – how to use alphalens

Alpha factor resources

Summary

Strategy Evaluation

How to build and test a portfolio with zipline

How to measure performance with pyfolio

How to avoid the pitfalls of backtesting

How to manage portfolio risk and return

Summary

The Machine Learning Process

Learning from data

The machine learning workflow

Summary

Linear Models

Linear regression for inference and prediction

The multiple linear regression model

How to build a linear factor model

Shrinkage methods – regularization for linear regression

How to use linear regression to predict returns

Linear classification

Summary

Time Series Models

Analytical tools for diagnostics and feature extraction

Univariate time series models

Multivariate time series models

Summary

Bayesian Machine Learning

How Bayesian machine learning works

Probabilistic programming with PyMC3

Summary

Decision Trees and Random Forests

Decision trees

Random forests

Summary

Gradient Boosting Machines

Adaptive boosting

Gradient boosting machines

Fast scalable GBM implementations

How to interpret GBM results

Summary

Unsupervised Learning

Dimensionality reduction

Clustering

Summary

Working with Text Data

How to extract features from text data

From text to tokens – the NLP pipeline

From tokens to numbers – the document-term matrix

Text classification and sentiment analysis

Summary

Topic Modeling

Learning latent topics: goals and approaches

Latent semantic indexing

Probabilistic latent semantic analysis

Latent Dirichlet allocation

Summary

Word Embeddings

How word embeddings encode semantics

Word vectors from SEC filings using gensim

Sentiment analysis with Doc2vec

Bonus – Word2vec for translation

Summary

Deep Learning

Deep learning and AI

How to design a neural network

How to build a neural network using Python

How to train a neural network

How to use DL libraries

How to optimize neural network architectures

Summary

Convolutional Neural Networks

How ConvNets work

How to design and train a CNN using Python

Transfer learning – faster training with less data

How to detect objects

Recent developments

Summary

Recurrent Neural Networks

How RNNs work

How to build and train RNNs using Python

Summary

Autoencoders and Generative Adversarial Nets

How autoencoders work

Designing and training autoencoders using Python

How GANs work

Summary

Reinforcement Learning

Key elements of RL

How to solve RL problems

Dynamic programming – Value and Policy iteration

Q-learning

Deep reinforcement learning

Reinforcement learning for trading

Summary

Next Steps

Key takeaways and lessons learned

ML for trading in practice

Conclusion

Other Books You May Enjoy

Leave a review - let other readers know what you think

Summary

In this chapter, we explored numerous techniques and options to process unstructured data with the goal of extracting semantically meaningful, numerical features for use in machine learning models.

We covered the basic tokenization and annotation pipeline and illustrated its implementation for multiple languages using spaCy and TextBlob. We built on these results to represent documents as numerical vectors using the bag-of-words model. We learned how to refine the preprocessing pipeline and then used the vectorized text data for classification and sentiment analysis.
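The pipeline summarized above (tokenize the text, count terms into a document-term matrix, then train a classifier on those count vectors) can be sketched without any third-party libraries. This is a minimal illustration under stated assumptions, not the chapter's code: a lowercase regex tokenizer stands in for spaCy or TextBlob, multinomial naive Bayes is used as one example of a text classifier, and the helper names (`tokenize`, `document_term_matrix`, `vectorize`, `NaiveBayesClassifier`) as well as the toy headlines and labels are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep alphabetic runs only.
    # A crude stand-in for spaCy/TextBlob tokenization (no lemmas, no POS tags).
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Bag-of-words: return (vocab, rows) where rows[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


def vectorize(text, vocab):
    # Map a new document onto the training vocabulary; unseen terms are dropped.
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocab]


class NaiveBayesClassifier:
    """Hypothetical multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_terms = len(X[0])
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Total count of each term across all documents of class c
            term_totals = [sum(col) for col in zip(*rows)]
            denom = sum(term_totals) + self.alpha * n_terms
            self.log_lik[c] = [math.log((t + self.alpha) / denom) for t in term_totals]
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Log-posterior up to a constant: log prior + sum of count * log-likelihood
            scores = {
                c: self.log_prior[c] + sum(n * w for n, w in zip(x, self.log_lik[c]))
                for c in self.classes
            }
            preds.append(max(scores, key=scores.get))
        return preds


# Toy, made-up training data (not from the book)
train_docs = [
    "profits rose and the outlook is strong",
    "strong growth and record profits",
    "losses widened and the outlook is weak",
    "weak demand and falling revenue",
]
train_labels = ["pos", "pos", "neg", "neg"]

vocab, X_train = document_term_matrix(train_docs)
model = NaiveBayesClassifier().fit(X_train, train_labels)
print(model.predict([vectorize("record profits and strong demand", vocab)]))  # ['pos']
```

Add-one smoothing keeps a term that never appeared with a class from driving that class's score to zero, and summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on longer documents.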

In the remaining two chapters on alternative text data, we will learn how to summarize text using unsupervised learning to identify latent topics (in the next chapter) and examine techniques to represent words as vectors that reflect the context of word usage and have been used very...
