So far, all the elements in our DataFrames were simple types. DataFrames support three additional compound types: arrays, maps, and structs.
The first compound type that we will look at is the struct. A struct is similar to a case class: it stores a set of key-value pairs with a fixed set of keys. If we convert an RDD of case class instances containing nested case classes to a DataFrame, Spark converts the nested objects to structs.
Let's imagine that we want to serialize The Lord of the Rings characters. We might use the following object model:
case class Weapon(name: String, weaponType: String)
case class LotrCharacter(name: String, weapon: Weapon)
We want to create a DataFrame of LotrCharacter instances. Let's create some dummy data:
scala> val characters = List(
  LotrCharacter("Gandalf", Weapon("Glamdring", "sword")),
  LotrCharacter("Frodo", Weapon("Sting", "dagger")),
  LotrCharacter("Aragorn", Weapon("Anduril", "sword"))
)
characters: List[LotrCharacter] = List(...)
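To see the nested case class become a struct, we can convert this list to a DataFrame and inspect its schema. The following is a sketch assuming a Spark shell session where `sc` (the SparkContext) and the `toDF` implicits are in scope; the printed schema reflects Spark's standard case-class mapping, where non-Option fields are still marked nullable:

```scala
// Parallelize the list and convert it to a DataFrame.
// The nested Weapon case class becomes a struct column.
scala> val characterDF = sc.parallelize(characters).toDF

// printSchema shows the struct's fields indented under "weapon":
scala> characterDF.printSchema
root
 |-- name: string (nullable = true)
 |-- weapon: struct (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- weaponType: string (nullable = true)
```

Fields inside a struct can then be addressed with dot notation, for example `characterDF.select("weapon.name")`.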