Apache Hive Essentials - Second Edition

By: Dayong Du

Overview of this book

In this book, we prepare you for your journey into big data by first introducing you to the background of the big data domain, along with the process of setting up and getting familiar with your Hive working environment. Next, the book guides you through discovering and transforming the value of big data with the help of examples. It also hones your skills in using the Hive language in an efficient manner. Toward the end, the book focuses on advanced topics, such as performance, security, and extensions in Hive, which will guide you on exciting adventures on this worthwhile big data journey. By the end of the book, you will be familiar with Hive and able to work efficiently to find solutions to big data problems.
Table of Contents (12 chapters)

A short history

In the 1960s, when computers became a more cost-effective option for businesses, people started to use databases to manage data. Later, in the 1970s, relational databases became more popular for business needs, since they connected physical data closely and easily with logical business entities. In the next decade, Structured Query Language (SQL) became the standard query language for databases. The effectiveness and simplicity of SQL motivated many people to use databases and brought them within reach of a wide range of users and developers. Databases soon became the standard tool for data applications and management, and remained so for a long time.

Once plenty of data had been collected, people started to think about how to deal with the historical data, and the term data warehousing emerged in the 1990s. From that time onward, people discussed how to evaluate current performance by reviewing historical data. Various data models and tools were created to help enterprises effectively manage, transform, and analyze their historical data. Traditional relational databases also evolved to provide more advanced aggregation and analytic functions, as well as optimizations for data warehousing. The leading query language was still SQL, but it was more intuitive and powerful compared to previous versions. The data was still well structured and the models were normalized.

As we entered the 2000s, the internet gradually became the top industry, creating the majority of data in terms of both variety and volume. Newer technologies, such as social media analytics, web mining, and data visualization, helped many businesses process massive amounts of data for a better understanding of their customers, products, competition, and markets. The data volume grew and the data formats changed faster than ever before, which forced people to search for new solutions, especially in the research and open source communities. As a result, big data became a hot topic and a challenging field for many researchers and companies.

However, in every challenge there lies great opportunity. In the 2010s, Hadoop, one of the big data open source projects, started to gain wide attention due to its open source license, active communities, and power to deal with large volumes of data. This was one of the few times that an open source project led changes in technology trends before any commercial software product did. Soon after, NoSQL databases, real-time analytics, and machine learning quickly followed, becoming important components on top of the Hadoop big data ecosystem. Armed with these big data technologies, companies were able to review the past, evaluate the present, and grasp future opportunities.