Impala is a modern, open source massively parallel processing (MPP) SQL engine designed to work in a Hadoop environment. It executes queries with low latency, which suits use cases that require interactive analytics in a multi-user environment; Hive does not meet expectations for such workloads. Impala is integrated into the Hadoop environment and uses a number of standard Hadoop components such as the Metastore, HDFS, HBase, YARN, and Sentry. Unlike Hive, it does not run MapReduce jobs to get results. Hive uses the MapReduce engine for execution, and the intermediate results of each job are stored on disk, where they act as the input to the next job.
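To make the difference concrete, the following toy Python sketch contrasts the two execution styles on a small filter-then-aggregate query. It is only an illustration under stated assumptions: the sample rows and the function names staged_on_disk and pipelined_in_memory are hypothetical, and neither function reflects the actual Hive or Impala code. The first function writes its intermediate result to a file before the next stage reads it back, in the way chained MapReduce jobs do; the second streams rows from one operator to the next without touching disk.

```python
import json
import os
import tempfile
from collections import defaultdict

# Hypothetical sample data, invented for this illustration.
ROWS = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "west", "amount": 1},
]


def staged_on_disk(rows, min_amount):
    """MapReduce-style: stage 1 writes its output to disk, stage 2 reads it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage1.jsonl")
        # Stage 1: filter the rows and materialize the intermediate result.
        with open(path, "w") as out:
            for row in rows:
                if row["amount"] >= min_amount:
                    out.write(json.dumps(row) + "\n")
        # Stage 2: a separate pass reads the intermediate file and aggregates.
        totals = defaultdict(int)
        with open(path) as src:
            for line in src:
                row = json.loads(line)
                totals[row["region"]] += row["amount"]
    return dict(totals)


def pipelined_in_memory(rows, min_amount):
    """Pipelined style: filtered rows flow straight into the aggregation."""
    # The generator yields one row at a time; nothing is written to disk.
    filtered = (row for row in rows if row["amount"] >= min_amount)
    totals = defaultdict(int)
    for row in filtered:
        totals[row["region"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(staged_on_disk(ROWS, 5))
    print(pipelined_in_memory(ROWS, 5))
```

Both functions return the same totals. The only difference is the disk round trip between stages, which is the kind of overhead that adds latency when a query is compiled into a chain of MapReduce jobs.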
Impala is a massively parallel processing (MPP) distributed query execution engine that utilizes the resources of an existing Hadoop cluster. It does not use MapReduce; however, it does take advantage of Hadoop's data locality by processing data on the nodes where it is stored. Let's discuss the Impala architecture and its components in detail.
The following diagram shows the Impala...