AI Inferencing Takes Center Stage at Red Hat Summit 2025
In late May, Red Hat welcomed thousands of developers, IT decision makers and partners to its annual Red Hat Summit at the Boston Convention and Exhibition Center (BCEC). Like the rest of the market, Red Hat has pivoted toward AI inferencing, and this conference marked the company’s formal entry into that market with the productization of vLLM, the open-source project that has been shaping AI model execution over the past two years. Red Hat’s push into AI inferencing does not necessarily suggest a deemphasis on model alignment use cases (e.g., fine-tuning, distillation), which were the company’s big strategic focus last year; rather, it is a recognition that inferencing is a production workload and that running models to generate responses is where the business value lies. Red Hat’s ability to embed open-source innovation within its products and lower the cost per model token presents a sizable opportunity. Interestingly, Red Hat’s prospects are also evolving in more traditional markets. For instance, Red Hat’s virtualization customer base has tripled over the past year, with virtualization emerging as a strategic driver across the company’s broader business, including among communication service providers (CSPs) adopting virtualized RAN and modernizing their IT stacks and mobile cores.
Red Hat pivots around AI inferencing
Rooted in Linux, the basis of OpenShift, Red Hat has always had a unique ability to repurpose its assets to expand into new markets and use cases. Of course, AI is the most relevant example, and two years ago Red Hat formally entered the market with Red Hat Enterprise Linux (RHEL) AI, the tool Red Hat uses to engage AI developers, and OpenShift AI, for model lifecycle management and MLOps (machine learning operations) at scale. These assets have made up the Red Hat AI platform, but at the Red Hat Summit the company introduced a third component, AI Inference Server, in addition to new partnerships and integrations designed to make agentic AI and inferencing realities within the enterprise.
AI and generative AI (GenAI) are rapidly evolving, but the associated core challenges and adoption barriers, including the high cost of AI models and the sometimes arduous nature of providing business context, remain largely unchanged. With IBM’s small language models (SLMs) and Red Hat’s focus on reducing alignment complexity, the two companies have crafted a strategy aimed squarely at these challenges: the goal is not to develop the next big AI algorithm but rather to serve tangible enterprise use cases in both the cloud and the data center.
Everyone is aware of Red Hat’s track record of delivering enterprise-grade open-source innovation, and if Red Hat’s disruption with Linux over two decades ago is any indication, the company is well positioned to deliver real, cost-effective enterprise solutions based on reasoning models and AI inferencing.
Red Hat productizes vLLM to mark entry into AI inferencing
Though perhaps lesser known, vLLM, an upstream open-source project with roughly half a million downloads in any given week, underpins how most large language models (LLMs) are served today. At its core, vLLM is an inference server that helps address “inference-time scaling,” the budding notion that the longer a model runs, or “thinks,” the better the result will be. The challenge with this approach is the cost of running the model for a longer period of time, but vLLM’s single-server architecture is designed to optimize GPU utilization, ultimately reducing the cost per token of the AI model. Various industry leaders are leading contributors to the project, namely NVIDIA (despite having its own AI model serving stack), Google, and Neural Magic, which Red Hat acquired earlier this year.
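For readers who want to see what this looks like in practice, below is a minimal sketch of batch generation with the upstream vLLM Python API; the model name and sampling settings are illustrative placeholders, not Red Hat defaults.

```python
# Minimal sketch: batch generation with the upstream vLLM Python API.
# The model name and sampling settings are illustrative placeholders.
from vllm import LLM, SamplingParams

# vLLM batches requests and manages GPU memory internally (its paged KV
# cache), which is what drives higher utilization and lower cost per token.
llm = LLM(model="facebook/opt-125m")  # any supported Hugging Face causal LM

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain inference-time scaling in one paragraph."], params)

for out in outputs:
    print(out.outputs[0].text)
```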
Leveraging its rich history of turning open-source projects into enterprise products, Red Hat launched AI Inference Server, based on vLLM and marking the company’s first offering from the Neural Magic acquisition. AI Inference Server is included with both RHEL AI and OpenShift AI but can also run as its own stand-alone server. Though perhaps inclined to emphasize IBM’s watsonx models, Red Hat is extending its values of flexibility, choice and meeting customers where they are to AI Inference Server. The new offering supports accelerators beyond IBM’s, including those from NVIDIA, AMD, Intel, Amazon Web Services (AWS) and Google Cloud, and offers Day 0 support for a range of LLMs, meaning that as soon as a new model is released, Red Hat works with the provider to optimize the model for vLLM and validate it on Red Hat’s platform.
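Because AI Inference Server is built on vLLM, applications can typically talk to it the way they would to any vLLM deployment, through an OpenAI-compatible endpoint. The sketch below assumes such an endpoint; the URL, API key and model identifier are placeholders, not documented Red Hat values.

```python
# Sketch: querying a vLLM-based inference server through its
# OpenAI-compatible API. The URL, key and model id are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

resp = client.chat.completions.create(
    model="granite-3-8b-instruct",  # hypothetical validated model id
    messages=[{"role": "user", "content": "What does Day 0 model support mean?"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```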
Building on vLLM’s early success, Red Hat used the summit to announce llm-d, a new open-source project. llm-d transcends vLLM’s single-server architecture, allowing inference to run in a distributed manner and further reducing the cost per token. Given the costs involved, most will agree that inferencing at scale will necessitate distributed infrastructure, and several recent examples across the tech landscape have alluded to this. llm-d is launching with support from many of the same contributors as vLLM, including NVIDIA and Google (llm-d runs on both GPUs and TPUs [tensor processing units]).
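To ground the cost-per-token framing, the back-of-the-envelope arithmetic below shows why throughput, whether from better utilization on one server or from distributing inference across many, is the lever that matters; every number here is an assumption for illustration, not a measured or vendor-published figure.

```python
# Back-of-the-envelope cost-per-token arithmetic. All inputs are
# assumptions for illustration, not measured or vendor-published figures.
GPU_COST_PER_HOUR = 4.00   # assumed hourly GPU price, USD
TOKENS_PER_SECOND = 1_500  # assumed aggregate generation throughput

def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3_600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

print(f"${cost_per_million_tokens(GPU_COST_PER_HOUR, TOKENS_PER_SECOND):.2f} per 1M tokens")
# Doubling effective throughput at the same spend halves the cost per token:
print(f"${cost_per_million_tokens(GPU_COST_PER_HOUR, TOKENS_PER_SECOND * 2):.2f} per 1M tokens")
```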
Partnership with Meta around MCP is all about empowering developers and making agentic AI enterprise-ready
If Google’s launch of the A2A (Agent2Agent) protocol is any indication, Anthropic’s Model Context Protocol (MCP), which aims to standardize how applications provide context and tools to LLMs, is gaining traction. At the Red Hat Summit, Red Hat committed to MCP by announcing it will deliver Meta’s Llama Stack, integrated with MCP, in OpenShift AI and RHEL AI.
To be clear, Red Hat supports a range of models, but Meta went the open-source route early on, and this move brings Llama Stack, Meta’s open-source framework for building applications specifically on Llama models, into the Red Hat environment. This not only exposes Red Hat to another ecosystem but also provides a consistent set of APIs for building on it. Enlisting Meta at the API layer is an important aspect of this solution, as it enables customers to consume the solution and build new agentic applications, with MCP playing a key role in grounding those applications in enterprise context. It is still early days for MCP, and making the protocol truly relevant in enterprise use cases will take time and advancement in security and governance. But Red Hat supporting MCP within its products, even indirectly through Llama Stack, signals the framework’s potential and Red Hat’s role in bringing it to the enterprise.
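For context on what MCP actually standardizes, the sketch below uses the reference MCP Python SDK to expose a single tool that an MCP-aware agent could discover and call; the server name and tool are hypothetical examples, not part of Red Hat’s or Meta’s offerings.

```python
# Sketch: a minimal MCP server exposing one tool, using the reference
# MCP Python SDK (pip install "mcp[cli]"). The server name and tool
# below are hypothetical examples for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory")  # hypothetical server name

@mcp.tool()
def get_stock_level(sku: str) -> int:
    """Return the current stock level for a SKU (stubbed for illustration)."""
    return {"SKU-001": 42}.get(sku, 0)

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default, for an MCP-aware client or agent
```

The point of the protocol is that any MCP-aware agent framework can discover and invoke get_stock_level without custom glue code, which is the kind of standardization that agentic applications in the enterprise will depend on.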
Who would have thought we would be discussing virtualization in 2025?
In 2025, in a world centered on AI, you don’t often hear of a company putting virtualization at the top of its strategic imperatives list. However, everyone has seen how Broadcom’s takeover of VMware has caused a ripple in the market, with customers seeking cheaper, more flexible alternatives that will not disrupt their current cloud transformation journeys. In fact, when we surveyed enterprise IT decision makers, 42% of respondents indicated they still intend to use VMware, but most plan to do so in a reduced capacity. Of those planning to continue using VMware, a notable 83% are still evaluating other options.*
“Options both have increased the prices across the board, 20% to 30%, which is pretty significant. So, you could say myself and my peers are not very happy with the Broadcom method on that, and we’re looking at, you know, definitely options to migrate off VMware when possible. We’re definitely looking at Citrix, and then options from Red Hat and Microsoft.” — CTO Portfolio Manager, Consumer Packaged Goods
As a reminder, after Red Hat revolutionized Linux in the early 2000s, the company’s next big endeavor was virtualization. With the rise of cloud-native architectures, Red Hat quickly pivoted around containers, and this is where the company remains most relevant today. However, through the KVM (kernel-based virtual machine) hypervisor, which would eventually be integrated with OpenShift, virtualization has always been a part of the portfolio. Over the past year, given the opportunity surrounding the VMware customer base, Red Hat has actively revisited its virtualization roots in a few primary ways.
First, given the risky nature of switching virtualization platforms, Red Hat crafted a portfolio of high-touch services around OpenShift Virtualization, including Migration Factory and a fixed-price offering called Virtualization Migration Assessment. These services from Red Hat Consulting, which are offered in close alignment with global systems integrator (GSI) partners, help customers migrate virtual machines (VMs) as quickly as possible while minimizing risk, largely by migrating VMs first and modernizing them later.
Second, Red Hat has focused on increasing public cloud support. Red Hat announced at the summit that OpenShift Virtualization is now available on Microsoft Azure, Google Cloud and Oracle Cloud Infrastructure (OCI), in addition to previously announced support for IBM Cloud and AWS, officially making the platform available on all major public clouds. Making OpenShift Virtualization available across the entire cloud ecosystem reinforces how serious Red Hat is about capturing these virtualization opportunities. These integrations will make it easier for customers to use their existing cloud spend commitments to offload VMware workloads to the cloud of their choice while maintaining the same cloud-native experience they are used to.
Of course, there will always be a level of overlap between Red Hat and the hyperscalers. Ultimately, though, the hyperscalers recognize Red Hat’s role in addressing the hybrid reality and enterprises’ need to move workloads consistently across clouds and within data centers, and they welcome a feature-rich platform like OpenShift that will spin the meter on their infrastructure.
With virtualization, Red Hat is allowing partners to sell infrastructure modernization and AI as part of the same story
At the conference, we heard from established Red Hat customers that have extended their Linux and container investments to virtualization. Examples included Ford and Emirates NBD, which has over 37,000 containers in production and is now migrating 9,000 VMs to Red Hat OpenShift Virtualization for a more consistent tech stack. Based on our conversations with customers, these scenarios — where VMs and containers run side by side — are not an easy sell and require a level of buy-in across the organization.
That said, if customers can overcome some of these change management hurdles, this side-by-side approach can offer numerous benefits, largely by creating greater consistency between legacy and cloud-native applications without significant refactoring. Though some GSIs may be better suited to the infrastructure layer than others, partners should recognize the opportunity to use OpenShift Virtualization to have client discussions around broader AI transformations. One of the compelling aspects of Red Hat is that even as it progressed through different phases — Linux, virtualization, containers and now AI — the hybrid platform foundation has remained unchanged. If customers can modernize their infrastructure on the same platform, introducing AI models via OpenShift AI becomes much more compelling.
Virtualization remains a key driver of telecom operator uptake of Red Hat solutions, but AI presents a significant upsell opportunity
Over the past few years, Red Hat has leveraged its virtualization technology in the CSP market, making significant progress in landing new CSP accounts and expanding its account share within this unique vertical. The company’s growth in this market has been aided by factors such as Broadcom’s acquisition of VMware, which initially caused a wave of CSPs to migrate to Red Hat due to the uncertainty surrounding VMware’s portfolio road map. Broadcom’s price hikes are causing a second wave of switching that TBR anticipates will continue for several years.
However, Red Hat has also succeeded in more deeply penetrating the telecom vertical due to its savvy marketing, which at times emphasizes that its solutions are “carrier-grade,” along with persistent efforts to convince CSPs’ CIO and CTO organizations that virtualization and hybrid multicloud strategies carry significant ROI. This has led to strong adoption of Red Hat OpenStack and OpenShift, although the Ansible Automation Platform has lagged in CSP adoption, as this customer segment prefers the free, open-source version of Ansible.
As CSPs iterate on their AI strategies and increasingly embrace edge compute investments, Red Hat has the opportunity to play a significant role, including with its new AI Inference Server. CSPs need to invest upfront to capitalize on the cost efficiency and revenue generation opportunities offered by AI, and Red Hat can help guide them in this direction. CSPs have difficulty moving quickly when new, disruptive technologies emerge and, with AI specifically, have trouble evaluating and testing AI models themselves due to a lack of in-house expertise. Additionally, they feel constrained by regulations and are concerned about compromising data privacy. Red Hat’s dedicated telecom vertical services can help alleviate these concerns and accelerate CSPs’ investments in AI infrastructure.
Final thoughts
Based on our best estimate, roughly 85% of AI’s current use is focused on training and only 15% on inferencing, but the inverse could be true in the not-too-distant future. Not only that, but AI inferencing will likely occur at distributed locations for the purposes of latency and scale, a dynamic that plays directly to Red Hat’s hybrid platform and its ability to help customers “write once, deploy anywhere,” which remains core to the company’s value proposition. That is one of the compelling aspects of a platform-first approach; even as new components such as AI models are introduced, the core foundation remains unchanged.
Red Hat’s new innovations, including AI Inference Server and the llm-d project, do not necessarily suggest a deemphasis on model alignment via assets like InstructLab, but it is clear the company is pivoting to address the inference opportunity. With its trusted experience productizing open-source innovation and its ability to operate within a broad technology ecosystem of hyperscalers, OEMs and chip providers, Red Hat is in a somewhat unique position to help transition AI inference from an ideal to an enterprise reality.
Further, Red Hat’s virtualization prospects are growing, as TBR’s interactions with customers continue to indicate that they are looking for new alternatives. If the hyperscalers’ recent earnings reports are any indication, the GenAI hype is waning, and we suspect many enterprises will refocus on infrastructure modernization to ultimately move beyond basic chatbots and lay the groundwork for the more strategic applications that inferencing will enable. It will be interesting to see how Red Hat capitalizes on new virtualization opportunities with its hyperscaler and services partners as part of a joint effort to bring customers to a modern platform, where VMs and containers can coexist and drive discussions around AI.
*From TBR’s 2H25 IT Infrastructure Customer Research