Book Image

Python for Finance Cookbook - Second Edition

By : Eryk Lewinson
5 (1)
Book Image

Python for Finance Cookbook - Second Edition

5 (1)
By: Eryk Lewinson

Overview of this book

Python is one of the most popular programming languages in the financial industry, with a huge collection of accompanying libraries. In this new edition of the Python for Finance Cookbook, you will explore classical quantitative finance approaches to data modeling, such as GARCH, CAPM, factor models, as well as modern machine learning and deep learning solutions. You will use popular Python libraries that, in a few lines of code, provide the means to quickly process, analyze, and draw conclusions from financial data. In this new edition, more emphasis was put on exploratory data analysis to help you visualize and better understand financial data. While doing so, you will also learn how to use Streamlit to create elegant, interactive web applications to present the results of technical analyses. Using the recipes in this book, you will become proficient in financial data analysis, be it for personal or professional projects. You will also understand which potential issues to expect with such analyses and, more importantly, how to overcome them.
Table of Contents (18 chapters)
16
Other Books You May Enjoy
17
Index

Splitting data into training and test sets

Having completed the EDA, the next step is to split the dataset into training and test sets. The idea is to have two separate datasets:

  • Training set—on this part of the data we train a machine learning model,
  • Test set—this part of the data was not seen by the model during training and is used to evaluate its performance.

By splitting the data this way we want to prevent overfitting. Overfitting is a phenomenon that occurs when a model finds too many patterns in data used for training and performs well only on that particular data. In other words, it fails to generalize to unseen data.

This is a very important step in the analysis, as doing it incorrectly can introduce bias, for example, in the form of data leakage. Data leakage can occur when, during the training phase, a model observes information to which it should not have access. We follow up with an example. A common scenario is that of imputing missing values with the feature...