Book Image

Python for Finance Cookbook

By : Eryk Lewinson
Book Image

Python for Finance Cookbook

By: Eryk Lewinson

Overview of this book

Python is one of the most popular programming languages used in the financial industry, with a huge set of accompanying libraries. In this book, you'll cover different ways of downloading financial data and preparing it for modeling. You'll calculate popular indicators used in technical analysis, such as Bollinger Bands, MACD, RSI, and backtest automatic trading strategies. Next, you'll cover time series analysis and models, such as exponential smoothing, ARIMA, and GARCH (including multivariate specifications), before exploring the popular CAPM and the Fama-French three-factor model. You'll then discover how to optimize asset allocation and use Monte Carlo simulations for tasks such as calculating the price of American options and estimating the Value at Risk (VaR). In later chapters, you'll work through an entire data science project in the financial domain. You'll also learn how to solve the credit card fraud and default problems using advanced classifiers such as random forest, XGBoost, LightGBM, and stacked models. You'll then be able to tune the hyperparameters of the models and handle class imbalance. Finally, you'll focus on learning how to use deep learning (PyTorch) for approaching financial tasks. By the end of this book, you’ll have learned how to effectively analyze financial data using a recipe-based approach.
Table of Contents (12 chapters)

Exploratory data analysis

The second step, after loading the data, is to carry out Exploratory Data Analysis (EDA). By doing this, we get to know the data we are supposed to work with. Some insights we try to gather are:

  • What kind of data do we actually have, and how should we treat different types?
  • What is the distribution of the variables?
    • Are there outliers in the data, and how can we treat them?
    • Are any transformations required? For example, some models work better with (or require) normally distributed variables, so we might want to use techniques such as log transformation.
    • Does the distribution vary per group (for example, gender or education level)?
  • Do we have cases of missing data? How frequent are these, and in which variables?
  • Is there a linear relationship between some variables (correlation)?
  • Can we create new features using the existing set of variables? An example...