Python Feature Engineering Cookbook - Second Edition
Categorical variables are those whose values are selected from a group of categories or labels. For example, the Gender variable with the values of Male and Female is categorical, and so is the marital status variable with the values of never married, married, divorced, and widowed. In some categorical variables, the labels have an intrinsic order; for example, in the Student’s grade variable, the values of A, B, C, and Fail are ordered, with A being the highest grade and Fail being the lowest. These are called ordinal categorical variables. Variables in which the categories do not have an intrinsic order are called nominal categorical variables, such as the City variable, with the values of London, Manchester, Bristol, and so on.
The values of categorical variables are often encoded as strings. Most mathematical and machine learning models operate on numerical input, so to train them we need to transform those strings into numbers. The act of replacing strings with numbers is called categorical encoding. In this chapter, we will discuss multiple categorical encoding methods.
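To make the idea concrete, here is a minimal sketch in plain Python, without any library. It encodes an ordinal variable (the student's grade) as integers that preserve the order of the labels, and a nominal variable (the city) as binary indicators so that no order is implied. The sample values, the names `grade_order` and `city_labels`, and the specific integers are assumptions chosen for illustration; they are not taken from the chapter's recipes.

```python
# Hypothetical sample data; the variables follow the examples in the text.
grades = ["B", "A", "Fail", "C", "A"]
cities = ["London", "Bristol", "Manchester", "London"]

# Ordinal variable: map each label to an integer that preserves the order.
# The exact numbers are an assumption; only their relative order matters.
grade_order = {"Fail": 0, "C": 1, "B": 2, "A": 3}
grades_encoded = [grade_order[g] for g in grades]

# Nominal variable: one binary indicator per category (one-hot style),
# so the encoding does not suggest that one city ranks above another.
city_labels = sorted(set(cities))
cities_encoded = [
    [int(city == label) for label in city_labels]
    for city in cities
]

print(grades_encoded)
print(city_labels)
print(cities_encoded)
```

Integer codes suit the grade variable because A really is higher than Fail. Assigning integers to the cities in the same way would invent an order that does not exist, which is why the sketch uses one indicator per category instead.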
This chapter will cover the following recipes: