Databricks Pivots Around Data Intelligence to Address GenAI Use Cases

Just like it did with the data lakehouse five years ago, Databricks is establishing another paradigm with data intelligence, which has the data lakehouse architecture at its core but is infused with generative AI (GenAI). Data intelligence was a key theme throughout Databricks Data & AI Summit and signals Databricks’ intentions to further democratize AI and ultimately help every company become an AI company.

A Brief Databricks Backstory

Founded by the creators of Apache Spark, Databricks is known as a trailblazer for launching new concepts in the world of data, such as Delta Lake, the open table format with over 1 billion yearly downloads, and the “lakehouse” architecture, which reflects Databricks’ effort to combine the best of what the data lake and data warehouse offer. Launched in 2020, the lakehouse architecture can handle both structured and unstructured data, and addresses the data engineer and business analyst personas in a single platform.

 

Delta Lake and Unity Catalog, which governs the data stored in Delta tables, serve as the basis for the lakehouse architecture and are part of Databricks’ longtime strategy of simplifying the data estate and, by default, AI. But with the advent of GenAI, which is causing the amount of unstructured data to proliferate, Databricks has spearheaded yet another market paradigm, pushing the company beyond its core areas of data ingestion and governance into data intelligence.

 

At the heart of data intelligence are the lakehouse architecture and Mosaic AI, the rebranded result of last year’s MosaicML acquisition, which equipped Databricks with the tools to help customers train, build and fine-tune large language models (LLMs). These also happen to be the same technologies Databricks used to build its own open-source LLM, DBRX, sending a compelling message to customers that they, too, can build their own models and use the Mosaic AI capabilities to contextualize their data and tailor it to their business, thus achieving true data intelligence.

What Is Data Intelligence?

Databricks’ executives and product managers largely communicated the definition of data intelligence through demonstrations. One of the more compelling demos showed how Mosaic AI can be used to create an agent that will build a social media campaign, including an image and caption for that campaign, to boost sales.

 

The demo depicted how a user can use transaction data as a tool to supplement a base model, such as Meta’s Llama 3. This demo was key to highlighting one of Databricks’ product announcements, the Shutterstock ImageAI model, which is built on Databricks in partnership with Shutterstock and marks Databricks’ foray into the multimodal model space.

 

The exercise created an image for the fictional social media campaign that included a company’s bestselling product (chosen through transaction data) and a catchy slogan. But to convey the contrast between data intelligence and general intelligence, the demonstrator removed the “intelligence” (all the data-enabled tools that exist in Unity Catalog) and generated the image again. This time, the image did not include the bestselling product and was accompanied by a much more generic slogan.

 

This demo reinforced the importance of contextualized data in GenAI and the role of Unity Catalog, which helps govern the data being used, and Mosaic AI, which allows developers to use enterprise data as tools for creating agents (e.g., customer support bots).
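The contrast the demo drew can be illustrated with a minimal Python sketch. It does not use Mosaic AI, Unity Catalog or any Databricks API; the transaction records, the function names and the prompt wording are all hypothetical, chosen only to show how a data-backed tool changes what an agent asks a base model to produce.

```python
from collections import Counter

# Hypothetical transaction records (an assumption for illustration only;
# in the demo, this kind of data sat behind tools governed in Unity Catalog).
TRANSACTIONS = [
    {"product": "trail shoes", "quantity": 3},
    {"product": "rain jacket", "quantity": 1},
    {"product": "trail shoes", "quantity": 2},
    {"product": "water bottle", "quantity": 4},
]


def best_selling_product(transactions):
    """Hypothetical 'tool': return the product with the highest total quantity sold."""
    totals = Counter()
    for row in transactions:
        totals[row["product"]] += row["quantity"]
    product, _ = totals.most_common(1)[0]
    return product


def build_campaign_prompt(transactions=None):
    """Build the request an agent would pass to a base model.

    With transaction data available, the request names the bestseller.
    Without it, the agent can only make a generic request.
    """
    if transactions:
        product = best_selling_product(transactions)
        return (
            "Create a social media image and slogan featuring "
            f"our bestselling product: {product}."
        )
    return "Create a social media image and slogan for our store."


if __name__ == "__main__":
    print(build_campaign_prompt(TRANSACTIONS))  # data-informed request
    print(build_campaign_prompt())              # generic request, tools removed
```

Calling `build_campaign_prompt()` with no data mirrors the second half of the demo, where the tools were removed and the output lost its tie to the business.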

 

Data intelligence is about not only the context behind the data but also making that context a reality for the enterprise. For instance, in the above scenario, the demonstrator was able to put the image and slogan into Slack and share them with the marketing team through a single prompt. In this example, it is clear how a customer with Databricks skills could use GenAI in their business.

Databricks’ Acquisition of Tabular Is a Blow to Snowflake and a Surefire Way to Stay Relevant in the Microsoft Ecosystem

As a company born on the values of openness and reducing lock-in, Databricks pioneered Delta Lake to ensure any engine can access the data sitting in a data lake. Delta Lake remains the most widely adopted lakehouse format today, handling over 90% of the data processed in Databricks, and is supported by other companies, as 66% of contributions to the open-source software come from outside Databricks.

 

But over the past few years, we have seen Apache Iceberg gain traction as a notable alternative, garnering significant investment from data cloud platforms, including Snowflake. When Databricks announced its acquisition of Tabular ― created by the founders of Apache Iceberg ― days before the Data & AI Summit, it signified a strategic shift that will help Databricks target a new set of prospects who are all in on Iceberg, including many digital natives.

 

The general availability of Databricks’ Delta Universal Format (UniForm), which helps unify tables from different formats, indicates the company’s intention to make Delta and Iceberg more interoperable and, over time, potentially reduce the differences between the two formats, though this may be a longer-term vision.

 

The Tabular acquisition in some ways also marginalizes Snowflake’s steps to become more relevant as a Microsoft Fabric partner. Available through Azure as a first-party native service, Databricks has always had a unique relationship with Microsoft, and Delta serves as the basis for Microsoft Fabric. But Microsoft’s recent announcement to support Iceberg tables with Snowflake in a push for more interoperability was notable, and now with Tabular, Databricks can ensure it remains competitive in the Microsoft Fabric ecosystem.

It Is All About Governance

First announced three years ago, Unity Catalog has emerged as one of Databricks’ more popular products, allowing customers to govern not just their tables but also their AI models, an increasingly important component in GenAI.

 

At the event, Databricks announced it will open source Unity Catalog, which we watched happen during the Day 2 keynote, when Unity Catalog was uploaded to GitHub. Despite Unity Catalog’s mounting success, this announcement is not surprising and only reinforces the company’s commitment to fostering the most open and interoperable data estate.

 

It is very early days, but open sourcing Unity Catalog could help drive adoption, especially as governance of GenAI technologies remains among the top adoption barriers.

Databricks SQL Is Gaining Momentum

It is no secret that Databricks and Snowflake have been moving into one another’s territories. Databricks, with its expertise in AI and machine learning (ML), has been progressing down the stack, trying to capture data warehouse workloads. Snowflake, with its expertise in data warehousing, is looking to get in on the AI opportunity and address the core Databricks audience of data scientists and engineers.

 

Snowflake’s early lead in the data warehouse and strong relationship with Amazon Web Services (AWS) could be making it more difficult for Databricks to attract workloads. Given that headwind and the sheer size of the market, there may never be a scenario in which Databricks becomes a “standard” in enterprise accounts for data warehousing. But Databricks’ messaging of “the best data warehouse is a lakehouse” certainly seems to be working.

 

Traditionally, customers have come to Databricks for jobs like Spark processing and ETL (Extract, Transform, Load), but customers are increasingly looking to Databricks for their data warehouse. These customers fall into two groups. In the first group, customers on legacy systems, such as Oracle, are fed up with the licensing and are looking to modernize. In the second group, existing cloud customers are looking for a self-contained environment with less lock-in, compared to vendors like Snowflake, or are seeking to avoid challenges with system management and scale after having worked with hyperscalers.

 

As highlighted by Databricks Co-founder and Chief Architect Reynold Xin, Databricks SQL is the company’s fastest-growing product, with over 7,000 customers, or roughly 60% of Databricks’ total customer base. During his keynote, Xin touted a startup time of five seconds for Databricks SQL Serverless and automatic optimizations that make BI workloads four times faster than they were two years ago. Provided Databricks can continue to enhance performance while pushing the boundaries on ease of use to better compete with Snowflake and other vendors in attracting less technical business personas, we expect this momentum will continue and will challenge competitors to raise the bar for their own systems.

Databricks Is Bringing an Added Layer of Value to the BI Stack

Databricks AI/BI is a new service available to all Databricks SQL customers that allows them to ask questions using natural language (Genie) and perform analytics (Dashboards). In a demo, we saw the two user interfaces (UIs) in action: BI offers common features like no-code drag and drop and cross-filtering, and AI includes the conversational experience where customers can ask questions about their data.

 

Databricks AI/BI may lack some of the complex features of incumbent BI tools, but ultimately these are not the goals of the offering. The true value is in the agents that can understand the question the business analyst is asking and hoping to visualize. Databricks’ approach exposes the challenges of bolting on generic LLMs to a BI tool. But the company is not interested in keeping this value confined to its own BI capabilities. Staying true to its culture of openness, Databricks announced at the event that it will open up its API to partners, ensuring Power BI, Tableau and Google Looker customers can take advantage of data intelligence in these BI environments.

Conclusion

With its lakehouse architecture, which was founded on the principles of open-source software and reduced lock-in, Databricks is well positioned to help customers achieve data intelligence and deploy GenAI. The core lakehouse architecture will remain Databricks’ secret sauce, but acquisitions, including those of MosaicML and Tabular, are allowing Databricks to broaden the scope of its platform to tap into new customer bases and serve new use cases.

 

If Databricks can continue to lower the skills barrier for its technology and sell the partner ecosystem around its platform, the company will no doubt strengthen its hold on the data cloud market and make competitors, including the hyperscalers in certain instances, increasingly nervous.

Blending Industry Expertise with Cybersecurity Credibility: Insights From PwC’s EMEA Financial Services Team

Compelling Cybersecurity Needs Meet PwC’s Capabilities

A July 1, 2024, briefing by PwC’s EMEA Financial Services (FS) team provided TBR with a closer look at PwC’s largest industry practice by revenue and the ways the firm has blended industry expertise with cybersecurity managed services experience and credibility. Julian Wakeham, UK EMEA Consulting Financial Services leader; Moritz Anders, Digital Identity lead, Cyber Security & Privacy, Germany; and Joshua Khosa, Service lead, Cyber Managed Services, Germany, steered the discussion for PwC.

 

Anders said FS clients’ three compelling cybersecurity needs (compliance, cost optimization and talent) shaped PwC’s approach to cybersecurity managed services and, in TBR’s view, will be consistent revenue drivers for PwC as those needs will be perpetual. Recruiting and retaining highly specialized cybersecurity experts, for example, remains outside the core functions of most enterprises, yet cybersecurity risks continue evolving, necessitating that consultancies step into that role. A significant part of PwC’s value, therefore, comes from assembling and deploying experts in both cybersecurity and the underpinning enterprise technologies.

 

Critically, according to Anders, PwC has approached cybersecurity managed services not as an IT play, where it can simply throw technology and people at the problems, but as an ongoing business challenge best tackled through a highly automated architecture and a sustained focus on business outcomes.

 

Echoing the three compelling cybersecurity needs highlighted above, Anders and Khosa provided details about a use case with a Europe-based bank that delivered three clear business outcomes: “compliance and audit readiness, operational efficiency, and enhanced security,” with the last relying, in part, on PwC using its alliance partners to keep emerging technologies and updates flowing to the client.

 

In TBR’s view, the compliance and audit-readiness components reflect PwC’s legacy strengths and brand around governance, risk, and compliance, and the operational efficiency outcomes build on the firm’s decades-old emphasis on and experience with operations consulting. In short, PwC continues playing to its strengths.

 

At the end of the briefing, the PwC team was asked why this particular Europe-based bank chose PwC for a complicated, multiyear cybersecurity managed services engagement. Anders said PwC remained direct and humble throughout the selection process, informing the client, without marketing spin, what PwC could and could not do well.

 

Among the strengths PwC brought to the table, according to Anders, was Europe-based talent at scale, in contrast to competitors, which relied on offshore resources. Wakeham noted PwC’s flexibility, focus on business problems (and not just selling technology solutions), PwC’s Industry Edge+ as a key enabler for business model reinvention, and the “deep trust” PwC’s clients have in the firm. 

Business Model Reinvention, Ecosystem Strategy and Expansive Capabilities

Reflecting on the EMEA FS briefing and previous discussions with PwC across topics and capabilities as diverse as people advisory services, IoT and generative AI, TBR made a few observations. First, PwC’s focus on “business model reinvention” was mentioned at the beginning and end of the discussion, with Wakeham acknowledging that the firm did not create that term or idea but explaining that PwC’s own market research indicated the importance to CEOs of that strategic focus. TBR reported earlier this year on PwC’s ideas around business model reinvention and notes that while previous strategic shifts have taken time to gain traction across the PwC member firms, business model reinvention appears to have considerable momentum and heft.

 

Second, PwC’s alliances strategy appears to be evolving as both the competitive and ecosystem landscapes change, with increased expectations that technology partners will bring business to PwC. In contrast to the usual equivocation and lack of details around how ecosystem partners can play a role in PwC’s go-to-market strategy, the EMEA FS team provided both a direct answer to TBR’s question about whether partners bring PwC into client projects and an explanation for the underlying reasons why software vendors would introduce PwC into an engagement.

 

PwC maintains a vast array of technology partnerships across cybersecurity, enterprise platforms, cloud, IoT and more, necessitating a well-managed ecosystem effort and providing extensive opportunities to gain new clients and expand within existing accounts. Continually refining the ecosystem playbook will be vital to PwC’s continued success.

 

Lastly, PwC’s EMEA FS team provided another example of the breadth of the firm’s capabilities, an element of PwC’s value proposition that can sometimes be forgotten when focusing too intently on one piece of the overall firm. For example, in cybersecurity managed services, PwC brings expertise and capabilities in cyber incident response, smart cyber defense, cybersecurity upskilling, identity and access management, and OT & IT security, to name a few.

 

TBR believes the extent of PwC’s capabilities and offerings, while not unique, can sometimes be lost on clients and ecosystem partners that are focused on the immediate services the firm is bringing to their engagement. If PwC remains focused on business model reinvention and continues evolving its ecosystem strategy, the breadth of the firm’s capabilities will become the underlying strength that sustains PwC’s success.