What is Azure Synapse?
Put simply, Synapse is a service in Azure that is meant to take care of all of your enterprise analytics needs in one single place. It’s a radical goal when you consider that the analytics system in many companies is generally scattered around the organization, often resulting in siloed and clunky architecture.
A comprehensive data analytics platform like Synapse aims to bring democratization, coherence, and governance to your enterprise analytics data estate by providing a unified platform and limitless Analytics-as-a-Service (AaaS).
How does Azure Synapse Analytics work?
Let’s take a look at each feature individually.
All-in-one platform
Azure Synapse Analytics can best be described as a “one-stop shop” for all enterprise analytics. Whether you’re extracting data from a data lake, transforming it on a Spark cluster, preparing it for data warehouse (DW) consumption in a SQL pool, engineering it for a machine learning use case, or serving it in a Power BI dashboard, you can do it all (and more) using Synapse.
Starting with SQL, Synapse provides both serverless and dedicated SQL pool options. This means that you can support both data lake and data warehouse use cases and choose the most cost-effective pricing option for each workload.
Data lake integration brings together relational and nonrelational data, so you can query files in the lake with the same service you use to build data warehousing solutions.
Support for cloud-native hybrid transactional and analytical processing (HTAP) enables you to get data from all types of databases, both relational and NoSQL, in a single click.
Deep integration of Azure Machine Learning and Azure Cognitive Services facilitates a seamless Artificial Intelligence and Business Intelligence experience. There is also a broad choice of languages, including Python, T-SQL, Scala, and .NET.
Finally, you can use Azure Monitor with Synapse to log and track critical performance metrics.
SQL on-demand
Synapse provides a fully managed serverless SQL pool that you can use to analyze data immediately, at low cost, using familiar tools and languages. The cost is based on the data processed by your queries.
The serverless pool scales automatically with the query load. There is no charge for reserved resources; you are charged only for the data processed by the queries you run, which makes this a true pay-per-use model.
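To make the pay-per-use model concrete, here is a minimal Python sketch that estimates the charge for a set of queries from the amount of data each one processes. The price per terabyte, the per-query minimum, and the function names are illustrative assumptions rather than published rates, so check the current Azure pricing page before relying on any of the numbers.

```python
import math

# Illustrative assumptions only, not official rates.
ASSUMED_PRICE_PER_TB_USD = 5.00   # hypothetical price per TB of data processed
ASSUMED_MIN_MB_PER_QUERY = 10     # hypothetical minimum billed per query
MB_PER_TB = 1024 * 1024           # assumes binary units


def estimate_query_cost(data_processed_mb,
                        price_per_tb=ASSUMED_PRICE_PER_TB_USD,
                        min_mb=ASSUMED_MIN_MB_PER_QUERY):
    """Estimate the cost in USD of one query from the data it processes."""
    if data_processed_mb < 0:
        raise ValueError("data_processed_mb must be non-negative")
    # Round up to a whole MB, then apply the assumed per-query minimum.
    billed_mb = max(math.ceil(data_processed_mb), min_mb)
    return billed_mb / MB_PER_TB * price_per_tb


def estimate_workload_cost(query_sizes_mb):
    """Sum the estimated cost of several queries."""
    return sum(estimate_query_cost(size) for size in query_sizes_mb)


if __name__ == "__main__":
    sizes_mb = [2, 500, 1024 * 1024]  # three queries of very different sizes
    print(f"Estimated cost: ${estimate_workload_cost(sizes_mb):.4f}")
```

The point of the sketch is that cost tracks the data scanned, not the time a cluster sits idle, so trimming what a query reads (fewer columns, partitioned folders) is what lowers the bill.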
There is no infrastructure to set up or maintain. You can access data in ADLS Gen2, Spark tables, and Cosmos DB using T-SQL.
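As a rough illustration of querying lake files with T-SQL, the Python sketch below builds an OPENROWSET query of the kind a serverless SQL pool accepts for Parquet files in ADLS Gen2. The storage account, container, and folder names are hypothetical placeholders, and the helper function is made up for this example; it only assembles the query text and does not connect to anything.

```python
def build_parquet_query(storage_account, container, folder, top=10):
    """Return a T-SQL query string that reads Parquet files from ADLS Gen2.

    All names passed in are placeholders for this example.
    """
    if not isinstance(top, int) or top <= 0:
        raise ValueError("top must be a positive integer")
    for name in (storage_account, container, folder):
        # Keep the sketch safe: refuse values that would break the SQL literal.
        if "'" in name:
            raise ValueError("names must not contain single quotes")

    url = (
        f"https://{storage_account}.dfs.core.windows.net/"
        f"{container}/{folder}/*.parquet"
    )
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


if __name__ == "__main__":
    # Hypothetical account, container, and folder names.
    print(build_parquet_query("examplelake", "sales", "2021"))
```

You could paste the generated text into a SQL script in Synapse Studio, or send it from any SQL Server-compatible client pointed at the workspace's serverless endpoint.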
Synapse for Data Warehousing
Synapse was formerly known as Azure SQL Data Warehouse (DW), a cloud-based Platform-as-a-Service (PaaS) offering built on large-scale, distributed, massively parallel processing (MPP) relational database technology, in the same class as Amazon Redshift or Snowflake.
Azure SQL DW has been a great platform for high-volume analytic workloads. Synapse Analytics not only retains Azure SQL DW at its heart, but also adds greater capability in terms of data ingestion, processing, and serving.
Big Data with Apache Spark runtime
Apache Spark is known as the Big Data platform that killed Hadoop. With its large-scale SQL, batch processing, stream processing, and machine learning offerings, Apache Spark has emerged as the undisputed leader in its area.
Interestingly, Azure already provides the latest versions of Apache Spark with open-source libraries (in the form of Azure Databricks) as a service. An obvious question you may ask here is: “Why do you need a new Spark service when you already have Databricks in Azure?”
Here is a quick comparison.
Both can access and leverage a data lake and Delta Lake.
Synapse has the open-source version of Delta Lake, whereas Databricks has the proprietary Databricks Delta Lake, which is built on top of the open-source version.
Both platforms can be used for machine learning apps.
Synapse has built-in support for Azure ML and MLflow. Databricks has broader support for popular libraries like TensorFlow, Keras, and PyTorch, but Synapse is rapidly catching up and is expected to accommodate these sooner rather than later.
Both can do SQL analysis and data warehousing.
However, Azure Synapse is the hands-down winner in this space, as it provides a full T-SQL-based relational model experience. With Databricks you’re limited to Spark SQL, with which most BI/DW professionals are (still) not comfortable.
Dashboards/Reports/BI
Synapse is, again, seen as the preferred choice here. Power BI is now integrated into Synapse Studio and can be used to develop and deploy BI dashboards. In addition, an autoscaling option is available alongside fixed-at-max utilization of nodes.
Integration with Power BI
Power BI has emerged as one of the most popular visualization tools in the last few years. Its ease of connectivity, performance, cost, governance and security, and visualization capabilities make it a leader in the BI space.
Power BI is now tightly integrated with Azure Synapse Studio, which means you can create, modify, and publish your reports from within Synapse.
Power BI reports can connect to either a serverless SQL pool or a dedicated SQL pool. As for why you should choose the combination of Synapse and Power BI: it carries lower risk and lower monetary cost for implementation, deployment, and maintenance.
Stream Analytics
The need for, and therefore the importance of, “real-time” analytics is growing everywhere. As a turnkey capability, Synapse supports streaming data from Event Hubs, IoT Hubs, Kafka, or Azure Storage, with egress capacity of up to 200 Mbps.
Post-streaming, you can process the streamed data with your business logic and aggregations through native T-SQL queries, all within the Synapse engine.
If needed, you can use the ML capabilities built into the SQL engine to do real-time scoring.
Azure Synapse Link for Dataverse
For those who don’t know what Dataverse is, it was formerly known as Common Data Service, and is essentially a data store for Microsoft-based business services. Microsoft applications like D365 Sales, Customer Service, and Talent store data in Dataverse tables.
Azure has recently announced a direct link from Synapse to Dataverse, which makes it straightforward to get the data from these applications into the Synapse environment and perform SQL analysis, Python-based processing for AI and ML, and Power BI visualizations right away. [As of Nov ’21, this is still in Public Preview.]
A unified workspace
With Azure Synapse Studio, a developer can perform data integration, data exploration, data warehousing, big data analytics, and machine learning tasks from a single, unified environment.
Security and Privacy
Azure Synapse offers cutting-edge security and privacy features, including always-on data encryption, dynamic real-time data masking, automated threat detection, and authentication through single sign-on and Azure Active Directory.
The platform also includes access control features like column-level security and native row-level security for additional protection and privacy within your team.
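The Python sketch below is only a conceptual model of what column-level and row-level security do. It is not how Synapse implements them: in the real service these rules are declared in T-SQL on the database itself. The table, users, and rules are invented for illustration.

```python
# Hypothetical table contents.
ORDERS = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0, "card_number": "4111"},
    {"order_id": 2, "region": "APAC", "amount": 75.5, "card_number": "5500"},
    {"order_id": 3, "region": "EMEA", "amount": 310.0, "card_number": "3400"},
]

# Hypothetical policy: which rows and which columns each user may see.
USER_REGION = {"alice": "EMEA", "bob": "APAC"}
USER_COLUMNS = {
    "alice": {"order_id", "region", "amount"},
    "bob": {"order_id", "region", "amount", "card_number"},
}


def query_orders(user, columns):
    """Return the requested columns of the rows this user is allowed to see."""
    denied = set(columns) - USER_COLUMNS[user]
    if denied:
        # Column-level security: asking for a restricted column is refused.
        raise PermissionError(f"{user} may not read: {sorted(denied)}")
    # Row-level security: a filter is applied silently to every query.
    visible = [row for row in ORDERS if row["region"] == USER_REGION[user]]
    return [{col: row[col] for col in columns} for row in visible]


if __name__ == "__main__":
    print(query_orders("alice", ["order_id", "amount"]))
```

The design point is that the filtering happens in the data layer rather than in each report or application, so every tool connecting as a given user sees the same restricted view.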
In terms of compliance, Azure offers more certifications than any other cloud provider to ensure that your data collection and data use practices comply with industry-specific, regional, state, and national compliance standards.
Why use Azure Synapse Analytics?
As Microsoft puts it, Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing and big data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated resources — at scale.
To learn more about how to properly measure your own enterprise data, reach out to our experts.