Cloud Analytics with Microsoft Azure - Second Edition

By: Has Altaiar, Jack Lee, Michael Peña

Overview of this book

Cloud Analytics with Microsoft Azure serves as a comprehensive guide for big data analysis and processing using a range of Microsoft Azure features. This book covers everything you need to build your own data warehouse and learn numerous techniques to gain useful insights by analyzing big data. The book begins by introducing you to the power of data with big data analytics, the Internet of Things (IoT), machine learning, artificial intelligence, and DataOps. You will learn about cloud-scale analytics and the services Microsoft Azure offers to empower businesses to discover insights. You will also be introduced to the new features and functionalities added to the modern data warehouse. Finally, you will look at two real-world business use cases to demonstrate high-level solutions using Microsoft Azure. The aim of these use cases will be to illustrate how real-time data can be analyzed in Azure to derive meaningful insights and make business decisions. You will learn to build an end-to-end analytics pipeline on the cloud with machine learning and deep learning concepts. By the end of this book, you will be proficient in analyzing large amounts of data with Azure and using it effectively to benefit your organization.
Table of Contents (7 chapters)

Top business drivers for adopting data analytics in the cloud

Different companies have different reasons for adopting data analytics using a public cloud such as Microsoft Azure. But more often than not, it boils down to three major reasons: rapid growth and scale, reducing costs, and driving innovation.

Rapid growth and scale

Enterprises and businesses need to rapidly expand their digital footprint. With the rapid growth of mobile applications (particularly rich media such as images and videos), IoT sensors, and social media, there is far more data to capture than ever before. Enterprises and businesses therefore need to scale their infrastructure to support these demands, as company databases continuously grow from gigabytes of data to terabytes, or even petabytes.

End users are more demanding now than ever. If your application does not respond within seconds, the user is more likely to disengage from your service or product.

Scaling matters not only for the consumers of your applications but also for the data scientists, data engineers, and data analysts who analyze a company's data. You cannot expect your data engineers to handle massive chunks of data (gigabytes to terabytes) and run scripts to test your data models on a single machine. Even if you run this workload on a single high-performance server instance, the test will take days or weeks to finish, and it will cause performance bottlenecks for the end users who are consuming the same database.

With a modern data warehouse such as Azure Synapse Analytics, you have some managed capabilities to scale, such as a dedicated caching layer. Caching allows analysts, engineers, and scientists to get query results faster.

Reducing costs

Due to scaling demands, enterprises and businesses need a mechanism to expand their data infrastructure in a cost-effective and financially viable way. Setting up an on-premises data warehouse is often too expensive. The following are just a few of the cost considerations:

  • The waiting time for server delivery and associated internal procurement processes
  • Networking and other physical infrastructure costs, such as hardware cooling and data center real estate
  • Professional services costs associated with setting up and maintaining these servers
  • Licensing costs (if any)
  • The productivity lost from people and teams who cannot ship their products faster

With a modern data warehouse, you can spin up new high-performance servers with high-performance graphics cards on demand. And with the use of a cloud provider such as Microsoft Azure, you will only need to pay for the time that you use these servers. You can shut them down if you don't need them anymore. Not only can you turn them off on demand, but if it turns out that a particular service is not suitable for your requirements, you can delete these resources and just provision a different service.

Azure also provides a discount for "reserved" instances that you commit to using for a specific amount of time. These are very helpful for databases, storage solutions, and applications that need to be up 24/7 with minimal downtime.
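
The trade-off between paying only for the hours you use and committing to a reserved instance can be illustrated with a little arithmetic. The following is a minimal sketch, not a pricing tool: the hourly rate and the discount are hypothetical placeholders rather than actual Azure prices, the function names are invented for this illustration, and the model assumes that a reservation is billed for every hour of the month whether or not the server is running.

```python
# All rates and discounts below are hypothetical placeholders, not Azure prices.
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def pay_as_you_go_cost(hourly_rate: float, hours_used: float) -> float:
    """Monthly cost when you pay only for the hours the server runs."""
    return hourly_rate * hours_used


def reserved_cost(hourly_rate: float, discount: float) -> float:
    """Monthly cost of a reservation: discounted rate, billed for every hour.

    Assumption: the commitment is charged whether or not the server is used.
    """
    return hourly_rate * (1 - discount) * HOURS_PER_MONTH


def break_even_hours(discount: float) -> float:
    """Monthly usage above which the reservation becomes the cheaper option."""
    return (1 - discount) * HOURS_PER_MONTH


if __name__ == "__main__":
    rate, discount = 2.00, 0.40  # hypothetical: $2 per hour, 40% discount
    for hours in (160, 438, 730):
        on_demand = pay_as_you_go_cost(rate, hours)
        reserved = reserved_cost(rate, discount)
        print(f"{hours} h: pay-as-you-go {on_demand:.2f}, reserved {reserved:.2f}")
    print(f"break-even at about {break_even_hours(discount):.0f} hours per month")
```

Under these assumptions, a server that runs only during working hours stays cheaper on pay-as-you-go pricing, while a workload that is up 24/7 benefits from the reservation.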

Driving innovation

Companies need to innovate constantly in this very competitive market; otherwise, someone else will rise up and take their market share. No one can predict the future with 100% accuracy, so companies need a mechanism to explore new things based on what they know.

Good examples of this are the business process outsourcing (BPO) and telecommunications (telco) industries, where there are petabytes of data that may not have been explored yet. With Microsoft Azure's modern data warehouse, actors in such industries can have the infrastructure to do data exploration. With Azure Synapse Analytics, Power BI, and Azure Machine Learning, they can explore their data to drive business decisions. They might come up with a data model that detects fraudulent actions, or better understand their customers' preferences and expectations to improve satisfaction ratings. With advanced analytics, these companies can make decisions that are relevant today (and possibly in the future) and are not restricted to analyzing historical data.

What if you want to create an autonomous vehicle? You will need a robust data warehouse to store your datasets, along with a tremendous amount of data processing capacity. You need to capture massive amounts of data (for example, the pictures or videos that the car is continuously capturing) and come up with a response almost instantly based on your dataset and algorithms.

Using a cloud provider such as Microsoft Azure would allow you to test and validate your ideas early on, without a massive investment. With various Azure services and related tools such as GitHub and Visual Studio, you can rapidly prototype your ideas and explore possibilities. What if it turns out that the product or service that you or your team is working on does not really gain traction? If you are doing this on-premises, you will still have high liability and operations costs since you physically own the infrastructure, in addition to any associated licensing and services costs.