Case classes have an inherent limitation: they can hold only 22 attributes. Catch 22, if you will. While many datasets fit within that budget, the 22-field limit is a serious drawback for wider ones. In this recipe, we'll take a sample Student dataset (http://archive.ics.uci.edu/ml/datasets/Student+Performance), which has 33 features, and see how we can work around this limitation.
In Scala 2.10, case classes cannot encapsulate more than 22 fields, because the companion objects generated for them at compile time need matching FunctionN and TupleN classes (the companion extends FunctionN, and its unapply method returns a TupleN), and neither is defined beyond 22. Let's take the example of the Employee case class that we created in Chapter 2, Getting Started with Apache Spark DataFrames:
case class Employee(id:Int, name:String)
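To see where FunctionN and TupleN come in, here is a minimal sketch that uses the compiler-generated companion of Employee. It assumes Scala 2.x (in Scala 3, case class companions no longer extend FunctionN, so the first and last calls would not compile). The object name CompanionDemo and the sample values are hypothetical, for illustration only.

```scala
case class Employee(id: Int, name: String)

object CompanionDemo {
  def main(args: Array[String]): Unit = {
    // The generated companion object extends Function2[Int, String, Employee],
    // so it can be used directly as a two-argument function value.
    val ctor: (Int, String) => Employee = Employee

    // unapply packs the two fields into a Tuple2 wrapped in an Option.
    val fields: Option[(Int, String)] = Employee.unapply(Employee(1, "Alice"))

    // tupled is inherited from Function2 and builds an Employee from a Tuple2.
    val fromTuple: Employee = Employee.tupled((2, "Bob"))

    println(ctor(3, "Carol")) // prints Employee(3,Carol)
    println(fields)           // prints Some((1,Alice))
    println(fromTuple)        // prints Employee(2,Bob)
  }
}
```

A two-field case class needs Function2 and Tuple2; by the same logic, a 33-field Student case class would need Function33 and Tuple33, which the Scala standard library does not define.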