Apache Hive Essentials - Second Edition

Overview of this book

In this book, we prepare you for your journey into big data by first introducing the background of the big data domain, along with the process of setting up and getting familiar with your Hive working environment. Next, the book guides you through discovering and transforming the value of big data with the help of examples. It also hones your skills in using the Hive language efficiently. Toward the end, the book focuses on advanced topics, such as performance, security, and extensions in Hive, which will guide you on exciting adventures on this worthwhile big data journey. By the end of the book, you will be familiar with Hive and able to work efficiently to find solutions to big data problems.
Table of Contents (12 chapters)

Data exchanging with LOAD

To move data, Hive uses the LOAD statement. Move here means that data loaded from HDFS is moved to the target table/partition and no longer exists in its original place (files loaded with LOCAL are copied from the client host instead). The LOCAL keyword in the LOAD statement specifies that the files are located on the client host. If the LOCAL keyword is not specified, the files are loaded from the full Uniform Resource Identifier (URI) specified after INPATH (most of the time, an HDFS path). If the path is not a full URI, the filesystem defaults to the value of the fs.default.name property (fs.defaultFS in newer Hadoop releases, typically set in core-site.xml). The path after INPATH can be a relative path or an absolute path. It points either to a file or to a folder (meaning all files in that folder) to be loaded, but the specified path must not contain subfolders. If the data is loaded into a partitioned table, the partition columns must be specified. The OVERWRITE keyword is used to decide whether...
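The LOCAL, INPATH, and PARTITION rules described above can be sketched in HiveQL as follows. This is a minimal illustration, not taken from the book: the tables employee and employee_partitioned, the partition columns year and month, the file paths, and the hdfs://localhost:9000 address are all hypothetical and would need to match your own environment.

```sql
-- All table names, partition columns, paths, and the namenode
-- address below are hypothetical placeholders.

-- LOCAL given: the file is read from the client host's filesystem.
LOAD DATA LOCAL INPATH '/tmp/employee.txt'
INTO TABLE employee;

-- Loading into a partitioned table: the target partition
-- (here year and month) must be named explicitly.
LOAD DATA LOCAL INPATH '/tmp/employee.txt'
INTO TABLE employee_partitioned
PARTITION (year = 2018, month = 12);

-- LOCAL omitted, path without scheme: the path is resolved against
-- the default filesystem (usually HDFS), and the file is moved
-- out of its original location.
LOAD DATA INPATH '/tmp/hivedemo/employee.txt'
INTO TABLE employee;

-- LOCAL omitted, full URI given after INPATH.
LOAD DATA INPATH 'hdfs://localhost:9000/tmp/hivedemo/employee.txt'
INTO TABLE employee;
```

Each statement names a single file. Pointing INPATH at a folder instead would load every file in that folder, provided the folder contains no subfolders.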