In the previous chapter, we learned how to load data from a relational database into an RDD using JdbcRDD. Spark 1.4 supports loading data directly into a DataFrame from a JDBC resource. This recipe explores how to do it.
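As a rough preview, Spark's jdbc data source is configured with an option map that contains at least a url entry and a dbtable entry. The Scala sketch below only builds such a map for MySQL. The object name PersonJdbcOptions, the host, port, database name, and credentials are hypothetical placeholders. Passing user and password as URL query parameters assumes MySQL Connector/J and credentials that contain no special URL characters.

```scala
// Hypothetical helper: builds the option map for Spark's "jdbc" data source.
// Host, port, database, and credentials are placeholders, not values from this recipe.
object PersonJdbcOptions {
  def mysqlOptions(host: String, port: Int, database: String, table: String,
                   user: String, password: String): Map[String, String] =
    Map(
      // MySQL Connector/J accepts user and password as URL query parameters
      "url" -> s"jdbc:mysql://$host:$port/$database?user=$user&password=$password",
      // dbtable names the table to read into the DataFrame
      "dbtable" -> table
    )

  // Possible usage in spark-shell (Spark 1.4), not executed here, assuming the
  // MySQL driver JAR is on the classpath:
  //   val person = sqlContext.read.format("jdbc")
  //     .options(PersonJdbcOptions.mysqlOptions("localhost", 3306, "hadoopdb", "person", "hduser", "secret"))
  //     .load()
}
```

DataFrameReader in Spark 1.4 also offers read.jdbc(url, table, properties) with a java.util.Properties object; the option-map form is shown here because it is easy to inspect.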
Make sure that the JDBC driver JAR is visible on the client node and on all the slave nodes on which executors will run.
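One rough way to confirm that the driver is visible is to try loading its class. The sketch below is not part of the recipe; DriverCheck and driverVisible are hypothetical names, and the class name com.mysql.jdbc.Driver assumes MySQL Connector/J 5.x. Calling the helper in spark-shell checks only the client JVM. The commented task-side check samples whichever executors run those tasks and does not guarantee that every slave node is covered.

```scala
import scala.util.Try

// Hypothetical helper for checking that a JDBC driver class is on the classpath.
object DriverCheck {
  // True if the class can be loaded by the current JVM; a missing class
  // (ClassNotFoundException) is caught by Try and reported as false.
  def driverVisible(className: String): Boolean =
    Try(Class.forName(className)).isSuccess

  // Possible usage in spark-shell (not executed here):
  //   DriverCheck.driverVisible("com.mysql.jdbc.Driver")   // client side
  //   sc.parallelize(1 to 8, 8)
  //     .map(_ => scala.util.Try(Class.forName("com.mysql.jdbc.Driver")).isSuccess)
  //     .collect()                                           // inside tasks
}
```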
Create a table named person in MySQL using the following DDL:

CREATE TABLE `person` (
  `person_id` int(11) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(30) DEFAULT NULL,
  `last_name` varchar(30) DEFAULT NULL,
  `gender` char(1) DEFAULT NULL,
  `age` tinyint(4) DEFAULT NULL,
  PRIMARY KEY (`person_id`)
);
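Populate the table with some sample data. Because person_id is generated by AUTO_INCREMENT, each statement names the four columns it supplies:

Insert into person (first_name,last_name,gender,age) values('Barack','Obama','M',53);
Insert into person (first_name,last_name,gender,age) values('Bill','Clinton','M',71);
Insert into person (first_name,last_name,gender,age) values('Hillary','Clinton','F',68);
Insert into person (first_name,last_name,gender,age) values('Bill','Gates','M',69);
Insert into person (first_name,last_name,gender,age) values('Michelle','Obama','F...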