A platform company at its core, Microsoft is less concerned with migrating monolithic applications than with building a complete data integration and management layer that captures the value-add workloads tied to those applications, all while maximizing clients’ underlying Azure infrastructure usage. To replicate this approach for the AI era, Microsoft has spent years integrating its various data services, from Synapse to Power BI, to automate customers’ entire data pipelines and prepare them for AI adoption. The result is Microsoft Fabric, a new end-to-end SaaS-like data platform that could help Microsoft reach new audiences and spur Azure growth in the continued race for cloud and AI dominance.
Microsoft Is Investing in Data Cloud to Support Its GenAI Strategy
What Is Microsoft Fabric?
Simply put, Microsoft Fabric is a unified data platform comprising seven core Azure data services: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehouse, Synapse Real Time Analytics, Power BI and Data Activator. While Microsoft Fabric makes it easier for customers to connect different personas within an organization, from data engineers to business analysts, the hallmark of the new service is its simplified pricing model, which charges customers based on the total amount of IaaS resources consumed, rather than the compute and storage for each individual Azure data service.
When we interview enterprise buyers, we continue to find that consolidating point solutions in favor of complete, integrated platforms is a common trend, and Fabric is bound to resonate with customers trying to control runaway cloud costs in a still widely uncertain economy.
The other defining attribute of Microsoft Fabric is the architecture it is built on, OneLake. Microsoft Fabric is based on a repository that allows customers to query data not just in SQL databases but also in object storage, as is customary in the data lake architecture.
With OneLake, we see Microsoft moving squarely into the data lake space. Given the symbiotic relationship between data lakes, which are designed for unstructured data, and generative AI (GenAI), OneLake is Microsoft’s under-the-hood way of ensuring that customers can easily load data from multiple sources, put it through the Fabric platform for data management and visualization, and build GenAI applications.
Altogether, the unification of Microsoft OneLake and Fabric is the right step for Microsoft and exemplifies how far the company has been willing to go to execute its AI-based growth strategy.
Fabric Will Help Microsoft Change the PaaS Landscape but Not Without Infringing on Partners
As highlighted in TBR’s 3Q23 Cloud Data Services Market Landscape, Amazon Web Services (AWS) is the clear leader in the cloud data warehouse market, with Microsoft falling squarely in second place and not significantly ahead of Google Cloud and Snowflake. Azure Synapse has not gained the same level of interest and traction in the market as AWS’ Redshift and Google Cloud’s BigQuery. As a result, Microsoft partnered with Databricks in 2017, developing and delivering the first-party Azure Databricks service.
Partnering with Databricks was a strategic move that ensured customers had an effective data analytics platform natively available on Azure without relying on Synapse alone. With Fabric, however, we now see Microsoft essentially re-delivering Synapse as part of a more complete product that gets to the heart of what customers want: an end-to-end set of capabilities that automate entire data pipelines from data collection and ingestion up to analytics and visualization.
This approach should bring Synapse into more client conversations while helping Microsoft expand its reach outside the analytics department. This, of course, raises the question: What becomes of Microsoft’s partnership with Databricks? As part of OneLake, the architecture underpinning Fabric, Microsoft is leveraging Delta Lake — Databricks’ protocol for storing data in an open table format — and this move could persuade Databricks customers to adopt Fabric.
Even still, Microsoft OneLake adopts the data lakehouse architecture pioneered by Databricks, and with Fabric’s feature-rich set of upper-stack capabilities, customers may be more inclined to go all in with Microsoft Fabric and its comprehensive pricing model, which would bring a new layer of competition to the Microsoft-Databricks relationship.
This trend is indicative of what we are seeing across the cloud landscape. The hyperscalers, even those perceived as more partner friendly, are expanding into new areas of the cloud stack, posing potential risks to their partners, especially as customers continue to indicate their interest in consolidating point solutions.
That said, coopetition is nothing new in the cloud landscape, and vendors are getting more adept at navigating competitive differences to deliver outcome-specific solutions to their joint customers.
Perhaps the best example is the relationship between AWS and Snowflake, which are both spending millions of dollars to get legacy data warehouse customers to Snowflake’s platform on AWS. While AWS would naturally prefer customers adopt its own data warehouse service — Redshift — over Snowflake, AWS has realized the trade-off of forfeiting some Redshift customers to Snowflake as long as those customers are running on AWS infrastructure.
Microsoft Fabric is much broader than the data warehouse, but if AWS and Snowflake are a barometer of a successful partnership, Microsoft and Databricks will similarly learn to overcome these obstacles.
With Fabric, we expect Microsoft will slowly chip away at AWS’ share and potentially Snowflake’s and Databricks’ in the coming years. However, it is important to note we do not see Fabric as any kind of direct threat to pure-play data cloud platforms, particularly Snowflake, which has an established presence and reputation in the data warehouse space specifically, not to mention easy inroads into AWS’ customer base.
In our talks with enterprise buyers, we often find customers value Snowflake because it allows them to run separate workloads as part of a shared data layer that is not tied to any specific cloud infrastructure. Despite the multicloud capabilities in OneLake, nothing changes the fact that the core data warehousing capabilities within Synapse are still built specifically for Azure infrastructure to integrate seamlessly with other Azure services.
We have no doubt Fabric will be attractive to Microsoft-centric shops, but attracting customers invested with other cloud providers may be a more difficult feat, solidifying Snowflake’s and Databricks’ unique value propositions.
Data Lakes and GenAI Go Hand in Hand, and Microsoft Wants to Be the First Hyperscaler Strongly Associated with the Architecture
One other interesting consideration with Fabric is Microsoft’s choice of open table format. Considering its partnership with Databricks, Microsoft has opted for Delta Lake, although it plans to add external support for two other popular frameworks: Apache Iceberg and Hudi.
In general, for customers that want to build a data lake, Delta Lake is the preferred format while Apache Iceberg is more aligned with data warehouses. Defaulting to Delta Lake reflects Microsoft’s intent to remain relevant with Databricks customers, while allowing customers to query data on object storage (Amazon S3 and eventually Google Cloud Storage) reflects Microsoft’s commitment to the data lake architecture.
Due to data lakes’ ability to combine both structured and unstructured data for prescriptive analytics use cases, they are becoming increasingly popular and, in some scenarios, offer customers a way to bypass data warehouse operations altogether. GenAI, which relies on unstructured data sources, such as documents or images, will fuel customers’ desire to consolidate data warehouses into data lakes, leading us to believe that Databricks is in a strong position despite Microsoft’s Fabric announcement.
This is also one of the reasons why Snowflake is trying to add more features that support unstructured and semistructured data in hopes of changing its perception in the market from a data warehouse company to a data lake company.
The hyperscalers, however, have arguably been behind in their data lake services and messaging, and with OneLake, Microsoft wants to make sure it is the hyperscaler most strongly associated with data lakes and, by extension, GenAI.
GenAI Enablement Sits at the Heart of Microsoft’s PaaS Strategy
Considering Microsoft has arguably made the biggest splash in generative AI, the company’s latest PaaS developments come as no surprise. As TBR discussed in our 2Q23 Cloud Ecosystems Market Landscape, a large language model (LLM) is only as good as the data that goes inside, which means the ability to establish a centralized, single source of truth is very important for an enterprise pursuing a serious generative AI strategy.
OneLake’s ability to provide an enterprisewide repository and a no-code API to manage data will help the company address this need, and the GenAI tools embedded within Fabric will help accelerate the transition to unified data pipelines.
Three Copilot solutions, mostly in preview today, are embedded within Fabric: Copilot for Data Science and Data Engineering, Copilot for Data Factory, and Copilot for Power BI. Broadly, the Copilot solutions in Microsoft Fabric enable code generation capable of automating routine tasks and expediting the transformation from raw data to structured data, which is what LLMs hunger for.
The integrations built over the years between Microsoft’s platform assets and its application portfolios ensure there is plenty of raw data entering Fabric, which, as it becomes structured, presents an ideal environment for enterprises to pursue custom GenAI development. This is where the Azure OpenAI Service enters the conversation.
While the Copilot solutions offered by Microsoft provide quick-and-easy access to GenAI capabilities, true transformational value will be unlocked as enterprises build their own GenAI applications around their proprietary data and business processes, presenting a large opportunity for Microsoft.
The Azure OpenAI Service has been enabling customers to train LLMs on their proprietary data since it became generally available in January 2023, and, at Ignite 2023, Microsoft took another step forward with the public preview launch of Azure AI Studio. A new addition to the Azure OpenAI Service, Azure AI Studio brings together developer tools like Azure AI SDK with the company’s growing catalog of foundation models to enable customers to build their own copilots and other generative AI applications.
As more enterprises pursue custom GenAI development, the unified approach to data management offered by Microsoft Fabric and OneLake will become more valuable, drawing interest from enterprises with large Microsoft footprints, yet coopetition at the data layer will remain the standard.
Ultimately, Microsoft’s priority is ensuring all data can be easily fed into its foundation model service, so integrations that connect the Azure OpenAI Service with third-party data leaders like Snowflake and Databricks will prove to be popular alternatives to Microsoft’s end-to-end approach.
Microsoft Is Not Just after the Data Layer: The Race for Hybrid Cloud Control Plane Continues as Azure Arc Reaches 21,000 Customers
Throughout this report, we have touched on Microsoft’s pursuit of the data layer, but it is important to note that Microsoft’s PaaS capabilities are much broader and extend closer to the box. Owing to Windows Server, Microsoft has captured a significant portion of the enterprise OS layer, allowing the company to effectively move into the multicloud control plane, which Microsoft calls Azure Arc.
Best thought of as an abstraction layer that stitches together infrastructure assets for capabilities like monitoring, provisioning and observability, all while securing the OS instance, Azure Arc has amassed 21,000 customers in the span of four years.
In recent quarters we have seen Microsoft become increasingly transparent in its customer reporting. For instance, in 2Q23 and 3Q23 Azure Arc customer count grew 150% and 140% year-to-year, respectively, which implies a customer count of just 7,200 in 2Q22. This is much lower than the 21,000 customers announced in 3Q23 and indicates vast interest from Microsoft’s install base of customers trying to bridge the gap between the cloud and legacy data center.
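The arithmetic behind these figures can be checked with a short sketch. It uses only the counts and growth rates quoted above (7,200 customers in 2Q22, 21,000 in 3Q23, and growth of 150% and 140%). The function names and the resulting 2Q23 and 3Q22 values are our own illustrative derivations, not figures reported by Microsoft.

```python
# Back-of-the-envelope check of the Azure Arc customer figures quoted in the text.
# Inputs come from the article; the derived values are implied, not reported.

def count_after_growth(prior_count: float, yoy_growth_pct: float) -> float:
    """Customer count one year later, given year-to-year growth in percent."""
    return prior_count * (1 + yoy_growth_pct / 100)

def count_before_growth(current_count: float, yoy_growth_pct: float) -> float:
    """Customer count one year earlier, given year-to-year growth in percent."""
    return current_count / (1 + yoy_growth_pct / 100)

# 2Q22 count of 7,200 growing 150% year-to-year implies the 2Q23 count.
implied_2q23 = count_after_growth(7_200, 150)

# 3Q23 count of 21,000 after 140% year-to-year growth implies the 3Q22 count.
implied_3q22 = count_before_growth(21_000, 140)

print(f"Implied 2Q23 customers: {round(implied_2q23):,}")
print(f"Implied 3Q22 customers: {round(implied_3q22):,}")
```

Under these assumptions, growth of 150% means the count multiplied by 2.5 over the year, so the implied figures show Azure Arc adding most of its customers within the last four quarters.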
Another factor driving the platform’s success is Microsoft’s early support for both virtual machines (VMs) and Kubernetes. This approach contrasts with Google Cloud, whose primary goal is getting customers to move away from VMs and use containers. In other words, Google Cloud wants customers to use GKE (Google Kubernetes Engine) on premises to containerize a VM and keep it there, but also wants customers to build net-new, cloud-native apps in containers.
Google Cloud did launch Anthos for VMs in 2021, which we viewed as a direct counterattack to Azure Arc, albeit not a very effective one. Anthos’ customer count is comparatively low, which could suggest the company has not been as adept at tapping into the VMware customer base and attracting enterprises that are not ready to migrate VMs.
We will continue to monitor Azure Arc’s growing customer count in the coming quarters, and it will be interesting to see if Microsoft begins to leverage Fabric to support other managed data services outside Azure SQL via Arc to turn the hybrid platform into a more complete, centralized management layer.