To use Apache Spark and Apache Cassandra together, we could write the integration code by hand, but thanks to the open source community coordinated by DataStax we have the Spark-Cassandra connector. As you may recall, Cassandra was conceived at Facebook, later became an Apache project, and grew to such a size that a whole company was created to support it: DataStax.
DataStax is the company that drives much of Apache Cassandra's development. Among other useful tools, DataStax has developed the Spark-Cassandra connector, a powerful open source library with three main goals:
- Expose Cassandra tables as Spark RDDs.
- Write Spark RDDs to Cassandra.
- Execute CQL queries within Spark applications.
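The three goals above map onto a small set of connector calls. The following is a minimal sketch, assuming Spark 1.6 with a matching 1.6.x release of the connector on the classpath and a Cassandra node listening on 127.0.0.1. The keyspace `demo_ks`, the table `word_counts`, and the object name `ConnectorSketch` are hypothetical and used only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector

object ConnectorSketch {
  // Hypothetical keyspace and table names, for illustration only.
  val Keyspace = "demo_ks"
  val Table = "word_counts"

  def roundTrip(sc: SparkContext): Set[(String, Int)] = {
    // Goal 3: execute CQL statements from inside a Spark application.
    CassandraConnector(sc.getConf).withSessionDo { session =>
      session.execute(
        s"CREATE KEYSPACE IF NOT EXISTS $Keyspace " +
          "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")
      session.execute(
        s"CREATE TABLE IF NOT EXISTS $Keyspace.$Table (word text PRIMARY KEY, total int)")
      // Start from an empty table so the read-back below is predictable.
      session.execute(s"TRUNCATE $Keyspace.$Table")
    }

    // Goal 2: write a Spark RDD to a Cassandra table (tuple fields map to columns in order).
    val data = sc.parallelize(Seq(("spark", 10), ("cassandra", 20)))
    data.saveToCassandra(Keyspace, Table, SomeColumns("word", "total"))

    // Goal 1: expose the Cassandra table as a Spark RDD of typed tuples.
    val rows = sc.cassandraTable[(String, Int)](Keyspace, Table).select("word", "total")
    rows.collect().toSet
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a single local Cassandra node at 127.0.0.1.
    val conf = new SparkConf(true)
      .setAppName("spark-cassandra-connector-sketch")
      .setMaster("local[2]")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    try {
      roundTrip(sc).foreach { case (word, total) => println(s"$word -> $total") }
    } finally {
      sc.stop()
    }
  }
}
```

The `import com.datastax.spark.connector._` line is what adds `cassandraTable` to `SparkContext` and `saveToCassandra` to RDDs through implicit conversions, so no extra wrapper classes are needed in application code.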
The main features of the Spark-Cassandra connector are:
- Supports Apache Spark versions 1.0 through 1.6
- Supports Apache Cassandra version 2.0 or later
- Supports Scala versions 2.10 and 2.11
- Supports all the Cassandra data types including collections
- Can convert...