Takeaways from Microsoft Ignite 2021 — Azure Data and Artificial Intelligence

Dinesh Kumar P
7 min read · Dec 11, 2021

At Microsoft Ignite 2021, 90 new services and updates were introduced! This post is a summary of questions from folks at a couple of meetups held to discuss the product announcements, especially in the Data, Analytics, and Governance space.

Azure Data

1. How does ‘Azure Synapse Pathway’ accelerate the migration process from other data warehouses into Azure Synapse?

Azure Synapse Pathway automates the translation of SQL code from your existing data warehouse (like Snowflake or BigQuery) into Synapse. Earlier, backend engineers had to manually rewrite such critical, production-running SQL code.

Azure Synapse Pathway significantly reduces migration cost and cuts the time to translate thousands of lines of SQL code from months to minutes. It translates Data Definition Language (DDL) and Data Manipulation Language (DML) statements into T-SQL that is compatible with Azure Synapse SQL.

Illustration: the Azure Synapse Pathway code translation lifecycle, which reduces migration cost and auto-translates other SQL definitions into Azure Synapse SQL
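As a toy illustration of the kind of rewriting Pathway automates, the sketch below maps a few source-warehouse column types to T-SQL equivalents. The mapping table is made up for this example and is not Pathway's actual rule set.

```python
import re

# Illustrative mapping only (not Pathway's real rules): a few
# source-warehouse column types and plausible T-SQL counterparts.
TYPE_MAP = {
    "NUMBER": "DECIMAL(38, 0)",
    "FLOAT8": "FLOAT",
    "STRING": "NVARCHAR(MAX)",
}

def translate_ddl(ddl: str) -> str:
    """Rewrite recognized column types in a CREATE TABLE statement."""
    pattern = re.compile(r"\b(" + "|".join(TYPE_MAP) + r")\b")
    return pattern.sub(lambda m: TYPE_MAP[m.group(1)], ddl)

src = "CREATE TABLE orders (id NUMBER, note STRING)"
print(translate_ddl(src))
# CREATE TABLE orders (id DECIMAL(38, 0), note NVARCHAR(MAX))
```

A real translator also has to handle functions, procedural code, and dialect-specific syntax, which is exactly the manual effort Pathway removes.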

2. How is the ‘noisy neighbour’ problem addressed in Azure Stream Analytics?

Azure Stream Analytics Dedicated now provides single-tenant hosting for increased reliability, with no noise from other tenants. Customer resources are truly “isolated” and perform better during bursts in traffic.

Earlier, jobs from different customers all ran in the same multi-tenant environment; when a few big jobs consumed the majority of the available resources, other submitted jobs suffered performance issues.

When you build a service to be shared by multiple customers or tenants, you can build it to be multitenanted. A benefit of multitenant systems is that resources can be pooled and shared among tenants. The noisy neighbor problem occurs when one tenant’s performance is degraded because of the activities of another tenant.
The noisy neighbour problem can occur when tenants compete for the total available capacity in a multi-tenant environment
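The effect can be sketched with a toy capacity model (my own illustration, not how Stream Analytics actually allocates resources): in a shared pool, one tenant's burst throttles everyone, while per-tenant pools contain the blast radius.

```python
# Toy model: tenants draw from a fixed capacity pool. When demand
# exceeds capacity, everyone is throttled proportionally.

def share(capacity: float, demands: dict) -> dict:
    """Proportionally split a shared pool among competing tenant demands."""
    total = sum(demands.values())
    scale = min(1.0, capacity / total)
    return {t: d * scale for t, d in demands.items()}

demands = {"tenant_a": 80, "tenant_b": 10, "tenant_c": 10}  # tenant_a bursts

# Multi-tenant: the pool is oversubscribed, so every tenant is throttled
# because of tenant_a's burst.
print(share(50, demands))

# Dedicated: each tenant has its own pool, so only tenant_a feels
# its own burst.
print({t: share(30, {t: d})[t] for t, d in demands.items()})
```

In the shared case, tenants b and c get only half their (modest) demand; with per-tenant pools they get everything they asked for.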

3. Can you talk about the barrier that exists between ‘OLTP’ and ‘OLAP’ systems, and how Azure breaks that barrier down?

For any product or business application, a database is essential for two purposes: handling transactional queries (reads and writes) and handling analytical queries (mostly reads).

The OLTP capability takes care of the business, i.e. the product; that's why it is called Transaction Processing. Similarly, the OLAP capability takes care of the product's analytics needs; hence it is called Analytical Processing.

At small scale, the same database takes care of all needs. But beyond a certain scale, a single database cannot handle both OLTP and OLAP workloads. That's when a new database is introduced into the product's architecture to handle OLAP.

  • Azure Cosmos DB stores data in row-based format by default. Additionally, it saves the data in column-store format within Cosmos DB itself, auto-synced within about two minutes. The column-store format makes it possible to serve a class of analytics workloads.
  • Further, to run near-real-time complex analytics and Spark-based machine learning pipelines, you need a strong backend engine like Azure Synapse. Azure Synapse Link creates a tight, seamless integration between Azure Cosmos DB and Azure Synapse Analytics.

Note that manual ETL to sync between Cosmos DB and Synapse is avoided: Azure Synapse Link leverages a new cloud-native hybrid transactional and analytical processing (HTAP) capability.

Architecture diagram for Azure Synapse Analytics integration with Azure Cosmos DB via Azure Synapse Link
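The row-store vs. column-store split that Synapse Link bridges can be sketched in a few lines. This is a conceptual illustration with made-up data, not the actual storage format of either service.

```python
# The same records kept row-wise (good for transactions) and
# column-wise (good for analytics) — the two shapes HTAP unifies.

rows = [
    {"id": 1, "city": "Chennai", "amount": 120},
    {"id": 2, "city": "Austin",  "amount": 75},
    {"id": 3, "city": "Chennai", "amount": 40},
]

# OLTP access pattern: fetch one whole record by its key.
by_id = {r["id"]: r for r in rows}

# Column store: pivot the same data so a scan touches only needed columns.
columns = {k: [r[k] for r in rows] for k in rows[0]}

# OLAP access pattern: aggregate one column without reading the rest.
total = sum(columns["amount"])
print(by_id[2])  # {'id': 2, 'city': 'Austin', 'amount': 75}
print(total)     # 235
```

Auto-sync means the columnar copy is maintained for you, so no ETL pipeline has to perform this pivot.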

4. Can you talk about new features released within ‘Azure Cosmos DB’?

a. Partial document update: Instead of fetching and replacing a whole document, you can now define the specific updates you want to make. This allows path-level updates to specific fields/properties in an Azure Cosmos DB document, which reduces network bandwidth usage and saves users a reasonable amount of cost.
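As a sketch of the semantics: partial document update takes a list of JSON-patch-style operations (the op names set, replace, add, remove, and incr mirror the documented Cosmos DB operations). The `apply_patch` helper below is my own toy applier over top-level paths only, not the SDK.

```python
# Toy applier for JSON-patch-style operations; illustrates the semantics
# of partial update without calling the service. Top-level paths only.

def apply_patch(doc: dict, ops: list) -> dict:
    doc = dict(doc)  # leave the original untouched
    for op in ops:
        field = op["path"].lstrip("/")
        if op["op"] in ("set", "replace", "add"):
            doc[field] = op["value"]
        elif op["op"] == "remove":
            doc.pop(field, None)
        elif op["op"] == "incr":
            doc[field] = doc.get(field, 0) + op["value"]
    return doc

doc = {"id": "order-42", "status": "pending", "retries": 0}
ops = [
    {"op": "set", "path": "/status", "value": "shipped"},
    {"op": "incr", "path": "/retries", "value": 1},
]
print(apply_patch(doc, ops))
# {'id': 'order-42', 'status': 'shipped', 'retries': 1}
```

The bandwidth saving comes from sending only `ops` over the wire instead of the full document.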

b. Continuous backup and point-in-time restore: Provides ongoing backups and enables customers to recover and restore data from any point within the past 30 days.

c. Role-based access control (RBAC): Azure Active Directory (Azure AD) integration enables customers to assign “roles” to users and applications, which provides a granular, well-defined way to control the data accessed by users and applications.

5. What are the new sources upon which ‘Azure Purview’ can scan & classify data residing in them?

Azure Purview is a unified data governance solution. It creates a holistic, up-to-date map of your data landscape, organized as Collections, through automated data discovery and sensitive-data classification.

Now, customers can also discover and govern data across their serverless and dedicated SQL pools in Azure Synapse workspaces. Note that AWS S3, SAP, and Oracle databases are already available in preview as data sources that customers can scan with Purview.

Multi-cloud data source governance

6. What is the advantage of having a managed instance for Apache Cassandra cluster?

It provides automated deployment and scaling operations for a managed Apache Cassandra data center and cluster. This also helps you accelerate hybrid scenarios and reduce ongoing maintenance. A few advantages:

  • Deployment is made simple, with only a few button clicks.
  • Scaling of nodes is fully managed and simple.
  • Apart from the default metric collector, you can also integrate with Azure Monitor for health monitoring and diagnostic analysis.
  • Hybrid deployments (on-premises or other cloud environments) are also possible via Azure ExpressRoute.
  • In short, the cloud's own advantage, i.e. pay-as-you-go pricing that incurs cost only for the nodes your workload needs.
Managed service for Apache Cassandra

Azure AI

7. Can you talk about the personalized search experience and its improvements in Azure Cognitive Search?

Semantic (intent-based) search leverages some of the most advanced natural language models to improve the relevance and ranking of search results.
In general, search is keyword-based, which is the industry norm. With the semantic search capability, results over any format of your data are surfaced based on intent.

Surface the most relevant results for your users based on search intent and not just keywords
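To make the keyword vs. intent distinction concrete, here is a rough sketch of how the two query styles differ at the REST level. The field names (`queryType`, `queryLanguage`, `semanticConfiguration`) are based on my recollection of the public Cognitive Search REST API of that era; verify them against the current API docs before use.

```python
import json

# Same search text, two ranking modes: plain keyword matching vs.
# semantic ranking driven by language models.
keyword_query = {"search": "capital of France", "queryType": "simple"}
semantic_query = {
    "search": "capital of France",
    "queryType": "semantic",           # switches on semantic ranking
    "queryLanguage": "en-us",
    "semanticConfiguration": "default",  # hypothetical config name
}
print(json.dumps(semantic_query, indent=2))
```

Everything else about the index stays the same; only the query asks for intent-based ranking.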

8. What feature helps avoid manual data entry, especially in online banking and the hotel industry?

The new feature is rolled out in Form Recognizer, an Azure Cognitive Service. It now supports pre-built extraction for identification documents (IDs) and invoices, plus the ability to read data in 64 additional languages (73 languages in total now).

In services like online banking transactions and hotel registration, manual data entry is reduced via automatic extraction of data from documents like passports, driver's licenses, etc.

The capability to extract text, key-value pairs, and tables from documents also helps with data extraction from invoices in industries like Procurement and Supply Chain.

Custom examples were extracted using a custom model trained with five PDF files of each form type
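A sketch of how extracted key-value pairs might be consumed downstream; the result structure and the confidence threshold here are illustrative, not the exact SDK response model.

```python
# Illustrative extraction result: key-value pairs with confidence scores,
# as a document-extraction service might return for an invoice.
result = {
    "key_value_pairs": [
        {"key": "Invoice No", "value": "INV-1009", "confidence": 0.98},
        {"key": "Total", "value": "$1,250.00", "confidence": 0.94},
    ]
}

# Accept high-confidence fields automatically; route the rest to a human
# for review instead of falling back to full manual data entry.
auto, review = {}, {}
for kv in result["key_value_pairs"]:
    target = auto if kv["confidence"] >= 0.95 else review
    target[kv["key"]] = kv["value"]
print(auto)    # {'Invoice No': 'INV-1009'}
print(review)  # {'Total': '$1,250.00'}
```

This straight-through-processing pattern is where the manual-entry savings come from: only low-confidence fields need a human.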

Power Platform

9. Microsoft has introduced Microsoft Power Fx as the low-code programming language for everyone. Why does a low-code platform need a language?

Microsoft Power Fx is a general-purpose programming language based on spreadsheet-like formulas, especially useful for Citizen Data Scientists.
Coming to the question: why does low code need a programming language? Point-and-click tools are great for quickly assembling experiences and workflows, but many real-world solutions need a layer of logic that goes beyond what is practical to drag and drop, e.g. “Show a list of customers who signed up in the last 7 days within 15 miles of this location.”

A simple example of code written that makes avatars move based on slider
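That example query cannot be built by drag and drop alone. Here is the same logic expressed in Python (rather than Power Fx) over made-up data, using the haversine formula for the distance check.

```python
import math
from datetime import datetime, timedelta, timezone

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula (Earth radius ~3959 mi)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 3959 * 2 * math.asin(math.sqrt(a))

now = datetime.now(timezone.utc)
here = (40.7128, -74.0060)  # "this location": New York, for illustration
customers = [
    {"name": "Ada",  "signed_up": now - timedelta(days=2),  "lat": 40.73, "lon": -74.00},
    {"name": "Lin",  "signed_up": now - timedelta(days=30), "lat": 40.73, "lon": -74.00},
    {"name": "Omar", "signed_up": now - timedelta(days=1),  "lat": 34.05, "lon": -118.24},
]

# "Signed up in the last 7 days AND within 15 miles of here."
recent_nearby = [
    c["name"] for c in customers
    if now - c["signed_up"] <= timedelta(days=7)
    and miles_between(here[0], here[1], c["lat"], c["lon"]) <= 15
]
print(recent_nearby)  # ['Ada']
```

In Power Fx this would be a single declarative formula; the point is that a formula language gives makers exactly this expressive layer without leaving the low-code tool.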

10. Can you talk about the new Power BI Premium architecture and features now in preview?

The team has unveiled the next-generation architecture for Power BI Premium. It simplifies the management of premium capacities and reduces management overhead, including control over auto-scaling.

Earlier, when resources became over-allocated, catastrophic errors could occur, and services that required full isolation, like paginated reports, could not be provisioned.

In Gen 2 this is possible. Even though capacity is not dedicated, resources are drawn from a massive pool as needed, so the performance level is guaranteed and metered on CPU cycles. With this new architecture, workloads that need full isolation, like paginated reports, are supported, and memory is no longer a constraint.

Overall,

Azure helps businesses stay nimble in an increasingly complex market. For a consolidated view of the release updates from Microsoft Ignite, check out https://news.microsoft.com/ignite-march-2021-book-of-news/


Dinesh Kumar P

Product @Kissflow | Microsoft MVP - Data Platform | Low code & No code passionate