One of the design considerations for a new framework is always the compatibility with the old frameworks. For better or worse, most data analysts still work with SQL. The roots of the SQL go to an influential relational modeling paper (Codd, Edgar F (June 1970). A Relational Model of Data for Large Shared Data Banks. Communications of the ACM (Association for Computing Machinery) 13 (6): 377–87). All modern databases implement one or another version of SQL.
While the relational model was influential and important for bringing the database performance, particularly for Online Transaction Processing (OLTP) to the competitive levels, the significance of normalization for analytic workloads, where one needs to perform aggregations, and for situations where relations themselves change and are subject to analysis, is less critical. This section will cover the extensions of standard SQL language for analysis engines traditionally used for big data analytics: Hive and Impala. Both...