-
Book Overview & Buying
-
Table Of Contents
Cloud Analytics with Microsoft Azure - Second Edition
By :
Microsoft Azure is an enterprise-grade set of cloud computing services created by Microsoft using their own managed data centers. Azure is the only cloud with a true end-to-end analytics solution. With Azure, analysts can derive insights in seconds from all enterprise data. Azure provides a mature and robust data flow without limitations on concurrency.
Azure supports Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and SaaS. Many government institutions across the world, as well as 95% of Fortune 500 companies, use Azure, ranging from industries such as healthcare and financial services to retail and manufacturing.
Microsoft is a technology conglomerate that has empowered many people to achieve more with less for decades with their software, tools, and platforms. Azure provides flexibility. Familiar Microsoft tools and infrastructures (such as SQL Server, Windows Server, Internet Information Services (IIS), and .NET) or tools such as MySQL, Linux, PHP, Python, Java, or any other open source technologies can all run on the Azure cloud. Gone are the days when you could only work on a walled-garden set of tools and technologies.
Azure provides you with various products and services, depending on your needs. You have the option of doing everything in a bespoke way, from managing your IaaS by spinning up Windows Server virtual machines with Enterprise SQL Server installed, to using a managed PaaS offering such as Azure Synapse Analytics.
Figure 1.2 shows the wide range of data-specific Azure tools and services that can be used to create end-to-end data pipelines:
Figure 1.2: Microsoft Azure data-related services
Azure grants you the flexibility to choose the best approach to solve a problem for yourself, rather than being forced to bend a less adaptable product to perform an unnatural function. You're not just limited to SQL Server, either. You also have the flexibility to choose other types of databases or storage, whether through a service installed on a Linux server or containerized solution, or a managed platform (such as Azure Cosmos DB for your Cassandra and MongoDB instances). This is very important because, in the real world, different scenarios require different solutions, tools, and products.
Microsoft Azure provides you with an end-to-end platform, from Azure Active Directory for managing your user identity and access to Azure IoT offerings (such as IoT Hub) for gathering data from hundreds and thousands of IoT devices. It also provides services such as development tools and cloud hosting options for getting your developers up to speed, as well as various analytics and machine learning tools that enable data scientists, data engineers, and data analysts to be more productive (more on this in Chapter 3, Processing and visualizing data).
The full spectrum of Azure services is too wide to cover here, so instead, this book will focus on the key data warehousing and business intelligence suite of products: Azure Data Lake, Azure Synapse Analytics, Power BI, and Azure Machine Learning.
Microsoft views security as the top priority. When it comes to data, privacy and security are non-negotiable; there will always be threats. Azure has the most advanced security and privacy features in the analytics space. Azure services support data protection through Virtual Networks (VNets) so that, even though they are in the cloud, data points cannot be accessed by the public internet. Only the users in the same VNet can communicate with each other. For web applications, you get a Web Application Firewall (WAF) provided by Azure Application Gateway, which ensures that only valid requests can get into your network.
With role-based access control (authorization), you can ensure that only those with the right roles, such as administrators, have access to specific components and the capabilities of different resources. Authentication, on the other hand, ensures that if you don't have the right credentials (such as passwords), you will not be able to access a resource. Authorization and authentication are built into various services and components of Microsoft Azure with the help of Azure Active Directory.
Azure also provides a service called Azure Key Vault. Key Vault allows you to safely store and manage secrets and passwords, create encryption keys, and manage certificates so that applications do not have direct access to private keys. By following this pattern with Key Vault, you do not have to hardcode your secrets and passwords in your source code and script repository.
Azure Synapse Analytics uses ML and AI to protect your data. In Azure SQL, Microsoft provides advanced data security to ensure that your data is protected. This includes understanding if your database has vulnerabilities, such as port numbers that are publicly available. These capabilities also allow you to be more compliant with various standards, such as General Data Protection Regulation (GDPR), by ensuring that customer data that are considered sensitive are classified. Azure SQL also recently announced their new features, row-level security (RLS) and column-level security (CLS), to control access to rows and columns in a database table, based on the user characteristics.
Microsoft invests at least $1 billion each year in the cybersecurity space, including the Azure platform. Azure holds various credentials and awards from independent assessment bodies, which means that you can trust Azure in all security aspects, from physical security (such that no unauthorized users can get physical access to data centers) to application-level security.
These are a few security features that you need to consider if you are maintaining your own data center.
Azure changed the industry by making data analytics cost-efficient. Before the mass adoption of cloud computing, in order to plan for data analytics with terabytes, or even petabytes, of data, you needed to properly plan things and ensure that you had the capital expenditure to do it. This would mean very high upfront infrastructure and professional services costs, just to get started. But with Azure, you can start small (many of the services have free tiers). You can scale your cloud resources effortlessly up or down, in or out, in minutes. Azure has democratized scaling capability by making it economically viable and accessible for everyone, as shown in Figure 1.3:
Figure 1.3: Microsoft Azure regions
Microsoft Azure currently has over 60 data center regions supporting over 140 countries. Some enterprises and business industries require that your data is hosted within the same country as business operations. With the availability of different data centers worldwide, it is easy for you to expand to other regions. This multi-region approach is also beneficial in terms of making your applications highly available.
The true power of the cloud is its elasticity. This allows you to not only scale resources up but also scale them down when necessary. In data science, this is very useful because data science entails variable workloads. When data scientists and engineers are analyzing a dataset, for instance, there is a need for more computation. Azure, through services such as Azure Machine Learning (more on this in Chapter 3, Processing and visualizing data), allows you to scale according to demand. Then, during off-peak times (such as weekends, and 7 PM to 7 AM on weekdays), when the scientists and engineers don't need the processing power to analyze data, you can scale down your resources so that you don't have to pay for running resources 24/7. Azure basically offers a pay-as-you-go or pay-for-what-you-use service.
Azure also provides a Service Level Agreement (SLA) for their services as their commitments to ensure uptime and connectivity for their production customers. If downtime or an incident occurs, they will apply service credits (rebates) to the resources that were affected. This will give you peace of mind as your application will always be available with a minimal amount of downtime.
There are different scaling approaches and patterns that Microsoft Azure provides:
Another advantage of your business being in the cloud is the availability of your services. With Azure, it is easier to make your infrastructure and resources geo-redundant—that is, available to multiple regions and data centers across the world. Say you want to expand your business from Australia to Canada. You can achieve that by making your SQL Server geo-redundant so that Canadian users do not need to query against the application and database instance in Australia.
Azure, despite being a collective suite of products and service offerings, does not force you to go "all in." This means that you can start by implementing a hybrid architecture of combined on-premises data centers and cloud (Azure). There are different approaches and technologies involved in a hybrid solution, such as using Virtual Private Networks (VPNs) and Azure ExpressRoute, if you need dedicated access.
With Azure Synapse Analytics, through data integrations, Azure allows you to get a snapshot of data sources from your on-premises SQL Server. The same concept applies when you have other data sources from other cloud providers or SaaS products; you have the flexibility to get a copy of that data to your Azure data lake. This flexibility is highly convenient because it does not put you in a vendor lock-in position where you need to do a full migration.
Change the font size
Change margin width
Change background colour