Apache Hive Essentials
Data file optimization covers performance improvements made at the data file level, in terms of file format, compression, and storage.
Hive supports TEXTFILE, SEQUENCEFILE, RCFILE, ORC, and PARQUET file formats. The three ways to specify the file format are as follows:
CREATE TABLE ... STORED AS <File_Format>
ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT <File_Format>
SET hive.default.fileformat=<File_Format> --default fileformat for table
Here, <File_Format> is TEXTFILE, SEQUENCEFILE, RCFILE, ORC, or PARQUET.
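As a minimal sketch of the three ways listed above, the following statements use hypothetical table, column, and partition names (page_view, user_id, dt) that are not from the original text and are assumed only for illustration:

```sql
-- 1. Specify the file format when the table is created.
--    page_view and its columns are hypothetical names.
CREATE TABLE page_view (
  user_id STRING,
  url     STRING
)
PARTITIONED BY (dt STRING)
STORED AS ORC;

-- 2. Change the file format recorded for one partition of an existing table.
--    The partition is assumed to exist already. This updates table metadata
--    only; data files already in the partition are not rewritten.
ALTER TABLE page_view PARTITION (dt = '2018-01-01') SET FILEFORMAT PARQUET;

-- 3. Set the default file format for the session. Tables created afterwards
--    without a STORED AS clause use this format.
SET hive.default.fileformat=ORC;
```

Because SET FILEFORMAT changes only the metadata, it is usually applied to a table or partition that is empty or whose files are already in the new format.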
We can load a text file directly into a table that uses the TEXTFILE format. To load data into a table that uses any other file format, we need to load the data into a TEXTFILE format table first. Then, use INSERT OVERWRITE TABLE <target_file_format_table> SELECT * FROM <text_format_source_table> to convert the data and insert it into the table with the expected file format.
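The two-step load described above can be sketched as follows. The table names (employee_text, employee_orc), the columns, the '|' delimiter, and the path /tmp/employee.txt are all assumptions made for illustration, not details from the original text:

```sql
-- Staging table in TEXTFILE format; a text file can be loaded into it directly.
-- Names, columns, and delimiter are hypothetical.
CREATE TABLE employee_text (
  name   STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

-- Hypothetical local path; replace with the location of the real text file.
LOAD DATA LOCAL INPATH '/tmp/employee.txt'
OVERWRITE INTO TABLE employee_text;

-- Target table in the desired file format (ORC in this sketch).
CREATE TABLE employee_orc (
  name   STRING,
  salary DOUBLE
)
STORED AS ORC;

-- Hive rewrites the rows in the target table's format during the insert.
INSERT OVERWRITE TABLE employee_orc
SELECT * FROM employee_text;
```

Once the data is verified in the target table, the staging table can be dropped if it is no longer needed.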
The file formats supported by Hive and their optimizations are as follows...