We’ll explore the services in Azure that help you analyze your data, both structured and unstructured. Azure can help whether you typically work with structured tabular data or you’re looking to reason over large or complex unstructured big data coming from devices, services, and applications that require a more sophisticated level of processing and scale beyond traditional data warehousing.
Azure Essentials: Data Analytics
Azure has a comprehensive set of services to ingest, store, and analyze data of almost all types and scales, spanning table, file, streaming, and other data types. The Azure platform provides tools across the data analytics lifecycle. You can ingest data into Azure using robust services for batch or real-time ingestion, so that you can capture events as they are generated by your devices and services.
Store structured or unstructured data globally on a virtually unlimited scale. Train and prepare your data and data stores to derive insights and create predictive and prescriptive models on your data using machine learning and deep learning techniques. Furthermore, you can extend these capabilities to the real-time processing of streaming or log data.
Automated AI & ML Analysis
You can even leverage artificial intelligence, or AI, with machine learning and cognitive services for automated machine analysis. And finally, you can serve and publish this analyzed data to an operational or analytical store to help with visualizing it as part of reports and dashboards. Your apps can also use this data directly and securely while meeting your performance needs.
Let’s walk through these services in a bit more detail.
The first step in data analysis is connecting disparate datasets from multiple sources and ingesting them into Azure. Your data might originate in your data center, in cloud services, or span both. For batch ingestion of your data, Azure Data Factory is the primary service that you’ll want to use. This is an ingestion, orchestration, and scheduling service. It determines what happens when certain events occur and which engines to use to analyze and optimally process your data. It allows you to create sophisticated data pipelines that run from ingestion of the data through processing and storing to making it available for your end users and apps to tap into.
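To make the idea of a pipeline concrete, here is a minimal sketch in plain Python of steps running in order from ingestion through processing to storage. It does not use Azure Data Factory or any Azure SDK; the `Pipeline` class, the step functions, and the sample records are all hypothetical and only illustrate the orchestration pattern described above.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]


class Pipeline:
    """Hypothetical orchestrator: runs named steps in order and logs each one."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Step]] = []
        self.log: List[str] = []

    def add_step(self, name: str, step: Step) -> "Pipeline":
        self.steps.append((name, step))
        return self  # returning self allows chained add_step calls

    def run(self, records: List[Record]) -> List[Record]:
        for name, step in self.steps:
            records = step(records)
            self.log.append(f"{name}: {len(records)} records")
        return records


def ingest(raw: List[Record]) -> List[Record]:
    # Keep only rows that actually carry a reading.
    return [r for r in raw if r.get("temp_c") is not None]


def process(rows: List[Record]) -> List[Record]:
    # Enrich each row with a derived Fahrenheit value.
    return [{**r, "temp_f": r["temp_c"] * 9 / 5 + 32} for r in rows]


store: List[Record] = []  # stands in for a real data store


def save(rows: List[Record]) -> List[Record]:
    store.extend(rows)
    return rows


raw = [
    {"device": "a", "temp_c": 20.0},
    {"device": "b", "temp_c": None},
    {"device": "c", "temp_c": 100.0},
]

pipeline = (
    Pipeline()
    .add_step("ingest", ingest)
    .add_step("process", process)
    .add_step("store", save)
)
result = pipeline.run(raw)
print(pipeline.log)
```

In a real deployment the steps would be activities that call out to storage and compute services, and the scheduling and event triggers would be handled by the orchestration service rather than by a single `run` call.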
There are other data movement capabilities in Azure too. If you’ve got a massive one-time upload, you may want to use the Azure Import/Export service to manage the bulk loading of large datasets into Azure Blob storage and Azure Files by shipping drives to an Azure data center. If your data is structured, the Azure Database Migration Service migrates data from on-premises structured databases directly into Azure, maintaining the same relational structures used by your current apps.
Ingesting Real-Time Data Streams
Azure also has engines for ingesting real-time data streams. These engines are capable of ingesting data at a fast pace and catering to your processing needs down the line. Azure Event Hubs enables large-scale telemetry and event ingestion, with durable buffering and low latency, from millions of devices and events.
Azure IoT Hub is a device-to-cloud telemetry data service to track and understand the state of your devices and assets. And if you’ve got custom operations to perform and you want to scale out your ingestion engines with custom logic, Azure also supports the open-source Apache Kafka in HDInsight as a managed, high-throughput, low-latency service for real-time data. And of course, you can use the Azure CLI, or command-line interface, to programmatically target and ingest multiple data formats into Azure. If you’re a developer, you can use the APIs in the Azure Software Development Kit, or SDK, to bring in your data.
All the tools and services we just described can bring data into Azure, and as you plan how you ingest data, you’ll also plan for where and how the data will be stored in Azure. Azure Blob storage can store massive datasets, irrespective of their structure or lack of it, and keep them ready for analysis, including video, images, scientific datasets, and more. And as a managed service, you don’t need to worry about the knobs and dials; it just takes care of itself.
If you’ve got particularly demanding analytical throughput requirements, or you have huge files that need to be optimized for analysis, you want a specialized big data store. Azure Data Lake Store can serve that purpose. It lets you analyze all your data, whether structured or unstructured, with the very high throughput generally desired by analytic applications. It can store trillions of files, and a single file can be larger than one petabyte in size.
For operational and transactional data in structured or relational form, you can use Azure SQL DB. This works like SQL Server, but as an Azure service, so you don’t need to worry about managing or scaling your host infrastructure. Of course, you can also keep existing database apps and host them on Windows- or Linux-based virtual machines. For analytical data that’s been aggregated over the years, Azure SQL Data Warehouse provides an elastic, petabyte-scale service that lets you dynamically scale your data either on-premises or in Azure.
For NoSQL capabilities, if you’re bringing in data that’s schema-agnostic, Azure Cosmos DB is a turnkey, globally distributed NoSQL database service. It allows you to use key-value, graph, and document data together, with multiple consistency levels to cater to your app requirements. Whatever the need, Azure has an optimal store for you.
Interestingly, all these stores integrate seamlessly with the analytics engines as sources of data. With your data now stored in Azure, there are many analytics options for training and preparing your data, ranging from highly scalable, hands-on approaches to data engineering to automated machine analytics on serverless infrastructure.
Open-Source Analytic Capabilities
Azure Databricks is an optimized Apache Spark-based analytics cluster service offering the best of Spark with collaborative notebooks and enterprise features. It integrates with Azure Active Directory, and it also gives you native connectors to bring in other Azure data services.
Azure Databricks is your hub for Spark-based analytics, whether it’s batch, streaming, or machine learning. We’ve also got HDInsight, a managed cluster service for a variety of open-source big data analytics workloads. It helps you clean, curate, process, and transform your data, in addition to scaling your machine learning workloads. Using HDInsight, you can create scale-out clusters for Hadoop, Spark, Hive, HBase, Storm, and Microsoft R Server without the need to monitor and administer the underlying infrastructure.
Azure Data Lake Analytics is a scale-out compute engine. Much as you would with traditional SQL infrastructure, you can develop and run large-scale, parallel data transformation and processing programs in U-SQL over petabytes of data from your data lake. You can even leverage the familiarity and extensibility of U-SQL to scale your machine learning models from R or Python to work against massive amounts of data. Most importantly, it’s a serverless environment, so you request and use compute resources on a per-query basis. You don’t have to worry about maintaining large clusters, which makes scaling and parallel execution easy.
Azure also has engines for processing real-time data streams. To analyze data logged in real time from devices, sensors, and more, Azure Stream Analytics offers a powerful event processing engine that, together with Event Hubs, allows you to ingest millions of events and find patterns, detect anomalies, power dashboards, or automate event-driven actions in real time, with the simplicity and familiarity of a SQL-like language for processing real-time streams. Azure HDInsight and Azure Databricks also allow you to leverage streaming capabilities within the scale-out processing engines, like Structured Streaming in Spark. For more advanced analytics, Azure Machine Learning and Microsoft Machine Learning Server provide you the infrastructure and tools to analyze data, create high-quality data models, and train and orchestrate machine learning as you build intelligent apps and services.
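As a rough illustration of what "finding patterns and detecting anomalies" over a stream can mean, the sketch below groups events into fixed, non-overlapping time windows, averages each device’s readings per window, and flags windows whose average crosses a threshold. It is plain Python, not the Stream Analytics query language or any Azure API; the function names, the event tuples, the 10-second window, and the threshold of 90 are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def tumbling_window_avg(events, window_seconds):
    """Average values per (window start, device) over fixed, non-overlapping windows.

    Each event is a (timestamp_seconds, device, value) tuple.
    """
    buckets = defaultdict(list)
    for ts, device, value in events:
        # Round the timestamp down to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        buckets[(window_start, device)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


def flag_anomalies(averages, threshold):
    """Return the (window start, device) keys whose average exceeds the threshold."""
    return [key for key, avg in averages.items() if avg > threshold]


# Hypothetical sensor readings: (timestamp in seconds, device id, temperature)
events = [
    (0, "sensor-1", 20.0),
    (5, "sensor-1", 22.0),
    (12, "sensor-1", 95.0),
    (14, "sensor-1", 97.0),
    (3, "sensor-2", 21.0),
]

averages = tumbling_window_avg(events, window_seconds=10)
alerts = flag_anomalies(averages, threshold=90.0)
print(averages)
print(alerts)
```

A streaming engine applies the same kind of windowed aggregation continuously as events arrive, rather than over a finished list, and expresses it declaratively in a SQL-like query.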
In addition to these tools, scale-out cluster technologies like Azure Databricks also allow scalable machine learning with Spark ML and deep learning libraries.
Beyond this, we’ve also built a number of AI services, called Cognitive Services, that provide prebuilt intelligence for vision, speech, and text understanding and interpretation. Finally, once you’ve been able to analyze and derive insights from this data, you’ll want to serve this enriched data to your users.
Within Azure, the best destination for all this analyzed data is Azure SQL Data Warehouse, where you can combine new insights with historical trends and drive a targeted conversation by maintaining one version of the data for your organization. Azure SQL Data Warehouse not only supports seamless connectivity to the analytics tools and services, it also integrates well with business intelligence tools.
For example, Azure Analysis Services and Power BI provide powerful options to find and share further data insights. If the analyzed data contains insights valuable to your end consumers, these can be populated into operational stores like Azure SQL DB and Azure Cosmos DB so that web and app experiences are augmented by those insights.
You can even pipe data directly to your apps with Azure platform tools for developers, including Visual Studio, Azure Machine Learning Workbench, or custom serverless apps and services using Azure Functions.
With Azure, you can ensure that data is consumed only by the intended users and groups, securely authenticated by Azure Active Directory, while network performance SLAs and privacy requirements are met using Azure ExpressRoute.
You can even hold the keys to your data once it’s in the cloud with Azure key management services. So that was an overview of the key services in Azure that make up the data analytics lifecycle.