1H24 Cloud Data Services Market Landscape
TBR Spotlight Reports represent an excerpt of TBR’s full subscription research. Full reports and the complete data sets that underpin benchmarks, market forecasts and ecosystem reports are available as part of TBR’s subscription service.
Vendors lead with the data lake architecture and emerging frameworks to sell a message of data intelligence amid rampant GenAI adoption
The race for Apache Iceberg mindshare is on
Data lakes remain a valuable way for enterprises to simultaneously store structured and unstructured data, particularly as the latter increases due to generative AI (GenAI) and large language models (LLMs). Data lakes are also a direct driver of the rising popularity of Apache Iceberg, an open-source table format well regarded by developers for its ability to store data in tables and move that data freely across any data lake architecture.
Whether a customer is creating their own data lake (e.g., on Amazon Web Services [AWS]) or deploying a data lake platform as a product (e.g., Databricks), Iceberg is playing an increasingly large role in helping customers navigate their big data estates with minimal vendor lock-in.
The investments of the two data lake giants, Snowflake and Databricks, best illustrate the budding role of Apache Iceberg and its growing community. Earlier this year Snowflake adopted Apache Iceberg as the native format for its platform and subsequently launched Polaris, a tool that allows customers to catalog the data stored in Iceberg tables.
Within only a matter of days, Databricks, which created Delta Lake, an Apache Iceberg alternative, moved into the space with its acquisition of Tabular. Because Tabular was created by the founders of Apache Iceberg, the acquisition marginalizes Snowflake’s recent investments and its intent to attract more Iceberg-heavy users, which generally include digital and cloud-native companies. The hyperscalers, primarily AWS and Microsoft, work closely with Snowflake and Databricks and benefit from their respective integrations, which use Iceberg to boost interoperability for joint customers.
For example, Microsoft announced that its data platform, Fabric, which is based on a data lake architecture (OneLake), will support Iceberg via Snowflake. This is a major win for Snowflake that elevates the company’s role as an ISV partner in the Microsoft Fabric ecosystem and further challenges Databricks, which, due to its native first-party integration with Azure, has always had a rich and unique relationship with Microsoft.
A select number of vendors are leading the shift to data intelligence
Though partly fueled by the marketing hype vendors use to differentiate themselves, data intelligence has become an emerging topic in the market, led by GenAI. At its core, data intelligence refers to the use of AI on data to deliver insights tailored to the business. The other core component of data intelligence is the underlying data architecture.
Databricks is largely associated with formalizing the concept of data intelligence and even markets its platform as the Data Intelligence Platform to convey the value of having both the data lakehouse architecture and the AI components (in Databricks’ case, Mosaic AI) that allow customers to build, train and fine-tune models. Other vendors have similarly adapted their messaging around data intelligence.
For example, as part of what it now calls its Data Intelligence vision, Oracle Analytics announced Intelligent Data Lake, a reworking of existing Oracle Cloud Infrastructure (OCI) services like cataloging and integration that creates a single abstraction layer supporting both Apache Iceberg and Delta Lake formats.
Hyperscalers are taking different approaches to address the symbiotic relationship between data architecture and AI
Microsoft and Google Cloud are integrating and productizing their data services as complete solutions, exposing a lack of maturity in AWS’ fragmented approach
Microsoft made a big move when it launched Fabric, which essentially integrates seven disparate Azure data services — from data warehousing up to analytics — as part of a single platform underpinned by a unified data lake. Today, Fabric has amassed over 14,000 paid customers and a growing ecosystem of global systems integrators (GSIs) and ISVs building and selling applications on top of the platform.
Google Cloud, which has always had a strong play in data analytics, is trying to better unify key data and analytics capabilities in BigQuery to deliver a more complete, single-product experience. This includes BigLake, Google Cloud’s storage abstraction layer, and services like Dataplex, so customers can perform Dataplex governance tasks like lineage and profiling without having to leave the BigQuery interface.
Though Google Cloud’s approach may lack the level of integration found in Microsoft Fabric, the company’s direction is clear: helping customers simplify their data estates and ultimately capturing more analytics and AI workloads.
AWS’ approach is different. Though it offers the broadest set of data tools and services, from storage and ingestion up to governance, AWS still lacks the platform mindset and strategy of its peers.
To be fair, the company has been working to better integrate services within its own ecosystem by improving data sharing between the operational database and the data warehouse (e.g., “zero-ETL” integration between Aurora and Redshift), but customers continue to stress that they have to take on more burden in the back end when crafting a data architecture on AWS.
This dynamic only reinforces the importance of AWS’ partnerships with complete data cloud platforms like Snowflake and Databricks, but of course Microsoft is also making sure it keeps these companies elevated within the Fabric ecosystem.
The GSIs are playing a prominent role in multiple facets of data, which could speak to maturing ecosystems and hyperscalers’ efforts to productize the entire data life cycle
Customers indicated that the GSIs play a prominent role in all aspects of the data strategy, from change management to data architecture to governance. Just 12% of respondents say the GSIs were involved in their analytics stack, but this seemingly low percentage could have several explanations.
First, establishing the data architecture, or re-architecting disparate IT assets, such as data warehouses, is top of mind for many customers right now as they recognize it is a necessary step in GenAI deployment.
Second, the hyperscalers and pure-play data platform companies are becoming more adept at delivering integrated solutions that provide upper-stack capabilities, such as analytics based on a holistic data lake architecture. Microsoft Fabric, which has a growing ecosystem of both GSI and ISV partners, is a top example.
TBR’s newly launched Voice of the Partner Ecosystem Report found that cloud providers expect data strategy and management to be the biggest growth area coming from partners over the next two years. In fact, data strategy and management ranked higher than GenAI on its own, which is telling of what the cloud providers expect from their partners.
Though Informatica’s cloud-first vision will erode lucrative license and support revenue streams, the company is showing early signs of its ability to expand margins
Despite no longer selling perpetual licenses and actively migrating its support base to Intelligent Data Management Cloud (IDMC), Informatica’s gross margins continue to expand.
Meanwhile, GAAP operating margin increased over 300 basis points year-to-year in 2Q24 as Informatica continues to benefit from economies of scale and signs larger, more strategic contracts with customers.
Recognizing that it is navigating a highly competitive landscape, Snowflake is increasing its investments in R&D. For context, R&D accounts for a notable 50% of Snowflake’s total revenue.