Questions
Before heading to the next chapter, readers are invited to attempt an upgrade of the flight performance model. The idea is to feed in a couple more predictors that improve the flight delay ML process, so that its predictions become deeper and more incisive.
Here are a few questions to open further vistas of learning:
- What is a parquet file, and what are its advantages, especially when a dataset becomes larger and data shuffling between nodes becomes necessary?
- What are the advantages of data compressed in a columnar format?
- Occasionally, you might run into this error: "Unable to find encoder stored in Dataset. Primitive types (Int, String, and so on) and Product types (case classes) are supported by importing spark.implicits._". How do you get around this error? What is the root cause? Hint: build a simple dataframe with a dataset from the first chapter. Use the spark.read approach and attempt a printSchema on it. If that produces the aforementioned error, investigate if it could...
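
As a starting point for the hint, the steps could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the book's own solution. The `Flight` case class, its column names, the `EncoderSketch` object, and the two-row sample CSV are all hypothetical and exist only to keep the example self-contained; substitute a dataset from the first chapter. It assumes Spark SQL is on the classpath and runs in local mode. Note that the encoder message is reported by the Scala compiler when a call needs an implicit `Encoder` (for example `.as[T]` or `.map`) and none is in scope, so a good experiment is to comment out the `import spark.implicits._` line and recompile.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, for illustration only. It is declared at the
// top level (outside any method) so that Spark can derive an encoder for it.
final case class Flight(carrier: String, delayMinutes: Int)

object EncoderSketch {

  // Writes a tiny made-up CSV so that the sketch needs no external file.
  def writeSampleCsv(): Path = {
    val path = Files.createTempFile("flights", ".csv")
    val rows = "carrier,delayMinutes\nAA,12\nDL,0\n"
    Files.write(path, rows.getBytes(StandardCharsets.UTF_8))
    path
  }

  // Reads the CSV with spark.read, prints the schema, and converts the
  // untyped DataFrame into a typed Dataset[Flight].
  def loadFlights(spark: SparkSession, path: Path): Dataset[Flight] = {
    // Brings the implicit encoders for primitives and case classes into
    // scope. Without this import, the .as[Flight] call below does not
    // compile, and the compiler reports the "Unable to find encoder" message.
    import spark.implicits._

    val df = spark.read
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // infer an integer column for delayMinutes
      .csv(path.toString)

    df.printSchema() // works on a plain DataFrame

    df.as[Flight]    // needs an implicit Encoder[Flight]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("encoder-sketch")
      .master("local[*]") // assumption: local mode is enough for the exercise
      .getOrCreate()

    try {
      loadFlights(spark, writeSampleCsv()).show()
    } finally {
      spark.stop()
    }
  }
}
```

The same import applies in the Spark shell, where the session is already bound to the name `spark`.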