Sheer Scale of GTC 2025 Reaffirms NVIDIA’s Position at the Epicenter of the AI Revolution
NVIDIA is the undisputed leader of the AI market, and no other company’s annual event matches the impact its GPU Technology Conference (GTC) has on the broader information technology market. GTC 2025 took place March 17-21 in San Jose, Calif., drawing a record-breaking 25,000 in-person attendees and 300,000 virtual attendees, with nearly 400 exhibitors on-site to showcase solutions built on NVIDIA’s AI and accelerated computing platforms.
NVIDIA GTC 2025: Pioneering the future of AI and accelerated computing
In 2024 NVIDIA CEO and cofounder Jensen Huang called NVIDIA GTC the “Woodstock of AI,” but to lead off the 2025 event’s keynote address at the SAP Center, he aptly changed his phrasing, calling GTC 2025 “the Super Bowl of AI,” adding that “the only difference is that everybody wins at this Super Bowl.”
While the degree to which every tech vendor “wins” in AI will vary, NVIDIA currently serves as the rising tide that is lifting all boats — in this case, hardware makers, ISVs, cloud providers, colocation vendors and service providers — to help accelerate market growth despite the economic and geopolitical struggles that have hampered technology spending in the post-COVID era. NVIDIA’s significant investments not as a GPU company but as a platform company — delivering on innovations in full-stack AI and accelerated computing infrastructure and software — have provided much of the foundation upon which vendors across the tech ecosystem continue to build their AI capabilities.
During the event, which also took place at the nearby San Jose McEnery Convention Center, Huang shared his vision for the future, emphasizing the immense scale of the inference opportunity while introducing new AI platforms to support what the company sees as the next frontiers of AI. Additionally, he reaffirmed NVIDIA’s commitment to supporting the entire AI ecosystem by building AI platforms, rather than AI solutions, to drive coinnovation and create value across the entire ecosystem.
The transformation of traditional data centers into AI factories represents a $1 trillion opportunity
The introduction of ChatGPT in November 2022 captured the attention of businesses around the world and marked the beginning of the generative AI (GenAI) revolution. Since then, organizations across all industries have invested in the exploration of GenAI technology and are increasingly transitioning from the prototyping phase to the deployment phase, leveraging the power of inference to create intelligent agents, power autonomous vehicles and drive other operational efficiencies. As AI innovation persists, driven largely by the vision of Huang and the increasingly capital-rich company behind him, new AI paradigms are emerging and NVIDIA is helping the entire AI ecosystem to prepare and adapt.
The rise of reasoning
On Jan. 27, 2025, otherwise known as DeepSeek Monday, NVIDIA stock closed down 17.0% from the previous trading session, with investors believing DeepSeek’s innovations would materially reduce the total addressable market for AI infrastructure. DeepSeek claimed that by using a combination of model compression and other software optimization techniques, it had vastly reduced the amount of time and resources required to train its competitive AI reasoning model, DeepSeek-R1. However, at GTC 2025, NVIDIA argued that investors had misunderstood the implications for the inference side of the AI model equation.
Traditional knowledge-based models can quickly return answers to users’ queries, but because basic knowledge-based models rely solely on the corpus of data that they are trained on, they are limited in their ability to address more complex AI use cases. To enhance the quality of model outputs, AI model developers are increasingly leveraging post-training techniques such as fine-tuning, reinforcement learning, distillation, search methods and best-of-n sampling. However, more recently test-time scaling, also known as long thinking, has emerged as a technique to vastly expand the reasoning capabilities of AI models, allowing them to address increasingly complex queries and use cases.
From one scaling law to three
In the past, pre-training scaling was the single law dictating how applying compute resources would impact model performance, with model performance improving as pre-training compute resources increased. However, at GTC 2025, NVIDIA explained two additional scaling laws in effect — post-training scaling and test-time scaling. As their names suggest, model pre-training and post-training are on the AI model training side of the equation. However, test-time scaling takes place during inference, allocating more computational resources during the inference phase to allow a model to reason through several potential responses before outputting the best answer.
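One simple form of spending extra compute at inference time is best-of-n sampling, which the article lists among the techniques developers use. The following is a minimal sketch under stated assumptions: `generate_candidate` and `score` are hypothetical stand-ins for a model call and a verifier, and the "answer close to 42" scoring rule is purely illustrative.

```python
import random


def generate_candidate(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one sampled model response."""
    # A real system would call a model here; this fakes a numeric answer.
    return f"{prompt} -> {rng.randint(0, 100)}"


def score(candidate: str) -> float:
    """Hypothetical verifier: prefers answers close to 42."""
    value = int(candidate.rsplit(" ", 1)[-1])
    return -abs(value - 42)


def best_of_n(prompt: str, n: int, seed: int = 0) -> str:
    """Spend roughly n times the inference compute, then keep the top-scoring candidate."""
    rng = random.Random(seed)
    candidates = [generate_candidate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)


single = best_of_n("What is 6 * 7?", n=1)
many = best_of_n("What is 6 * 7?", n=16)
print(single, "| score", score(single))
print(many, "| score", score(many))
```

Because both calls share a seed, the 16-candidate run includes the single-candidate answer, so its score can only match or improve on it. The cost is a proportional increase in generated tokens.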
Traditional AI models operate quickly, generating hundreds of tokens to output a response. However, with test-time scaling, reasoning models generate thousands or even tens of thousands of thinking tokens before outputting an answer. As such, NVIDIA expects the new world of AI reasoning to drive more than 100 times the token generation, equating to more than 100 times the revenue opportunity for AI factories.
During an exclusive session with industry analysts, Huang said, “Inference is the hardest computing at scale problem [the world has ever seen],” dispelling the misconception that inference is somehow easier and demands fewer resources than training. The remark also indirectly supports his belief that the transformation of traditional data centers into AI factories will drive total data center capital expenditures (capex) to $1 trillion or more by 2028.

NVIDIA Revenue, Growth and Projections (Source: TBR)
While on the surface, $1 trillion in data center capex by 2028 sounds like a lofty threshold, TBR believes the capex amount and timeline are feasible considering NVIDIA’s estimate that 2024 data center capex was around $400 billion.
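The growth rate implied by those two figures can be checked with a short calculation. This sketch assumes the roughly $400 billion 2024 baseline cited above, an endpoint of exactly $1 trillion in 2028, and four years of steady compounding. The function name is illustrative.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate needed to go from start to end."""
    return (end / start) ** (1 / years) - 1


# Assumed inputs: ~$400B in 2024 (NVIDIA's estimate) and $1T in 2028.
rate = implied_cagr(400e9, 1e12, 2028 - 2024)
print(f"{rate:.1%}")  # prints 25.7%
```

Under these assumptions, data center capex would need to grow at a compound annual rate of roughly 26% to reach the $1 trillion mark.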
Additionally, during 1Q25, announcements of investment commitments to build out data centers became increasingly common, and TBR expects this trend to accelerate over the next few years. For example, in January the Trump administration announced the Stargate Project, which intends to invest $500 billion over the next four years to build new AI infrastructure in the United States.
However, it is worth noting that Stargate’s $500 billion figure represents more than just AI servers; it includes other items such as the construction of new energy infrastructure to power data centers. TBR believes the same holds true for NVIDIA’s $1 trillion figure, especially when considering TBR’s 2024 total AI server market estimate of $39 billion.
The more you buy, the more you make: NVIDIA innovates to maximize potential AI factory revenue
To support the burgeoning demands of AI, NVIDIA is staying true to the playbook through which it has already derived so much success — investing in platform innovation and the support of its growing partner ecosystem to drive the adoption of AI technology across all industries.
AI factory revenue relies on user productivity
Reasoning capabilities allow models to meet the demands of a wider range of increasingly complex AI use cases. Although the revenue opportunity of AI factories increases as AI reasoning drives an exponential rise in token generation, expanding token generation also creates bottlenecks within AI factories and inevitably there is a tradeoff. To maximize revenue potential, AI factories must optimize the balance between token volume and cost per token.
From the perspective of an AI inference service user, experience comes down to the speed at which answers are generated and the accuracy of those answers. Accuracy is tied directly to the underlying AI model(s) powering the service and can be treated as a constant in this scenario, while the speed at which answers are generated for a single user is dictated by the rate of output token generation for that specific user. Having more GPUs dedicated to serving a single user results in an increased rate of output token generation for that user and is something that users are typically willing to pay a premium for.
However, in general, as more GPUs are dedicated to serving a single user, the overall output token generation of the AI factory falls. On the opposite end of the spectrum, an AI factory can maximize its overall output token generation by changing GPU resource allocations to serve a greater number of users at the same time; however, this has a negative impact on the rate of output tokens generated per user, increasing request latency and thereby detracting from the user’s experience.
As NVIDIA noted during the event, to maximize revenue, AI factories must optimize the balance of total factory output token generation and the rate of output token generation per user. However, once the optimal allocation of GPU resources is determined, revenue opportunity hits a threshold. As such, to increase the productivity and revenue opportunity of AI factories, NVIDIA supports the AI ecosystem with its investments in the development of increasingly performant GPUs, allowing for greater total factory output token generation as well as increased rates of output token generation per user.
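The tradeoff described above can be illustrated with a toy model. Every constant here is an assumption chosen for illustration, not an NVIDIA figure: per-user speed is assumed to scale sublinearly with GPUs per user, and the price users will pay per token is assumed to rise with speed but saturate.

```python
# Toy model of the throughput-versus-interactivity tradeoff.
# All constants below are illustrative assumptions, not NVIDIA figures.

TOTAL_GPUS = 64           # hypothetical AI factory size
BASE_RATE = 50.0          # tokens/s one user gets from one GPU (assumed)
SCALING_EXPONENT = 0.7    # below 1: extra GPUs per user give diminishing returns (assumed)
SPEED_HALF_VALUE = 100.0  # tokens/s at which users pay half the maximum price (assumed)


def per_user_rate(gpus_per_user: int) -> float:
    """Output tokens per second seen by a single user."""
    return BASE_RATE * gpus_per_user ** SCALING_EXPONENT


def factory_throughput(gpus_per_user: int) -> float:
    """Total output tokens per second across all concurrently served users."""
    concurrent_users = TOTAL_GPUS // gpus_per_user
    return concurrent_users * per_user_rate(gpus_per_user)


def price_per_token(rate: float) -> float:
    """Saturating premium for speed, in arbitrary revenue units."""
    return rate / (rate + SPEED_HALF_VALUE)


def revenue(gpus_per_user: int) -> float:
    """Revenue per second: tokens produced times what users pay per token."""
    return factory_throughput(gpus_per_user) * price_per_token(per_user_rate(gpus_per_user))


options = [1, 2, 4, 8, 16, 32, 64]
best = max(options, key=revenue)
for g in options:
    print(f"{g:>2} GPUs/user: {per_user_rate(g):7.1f} tok/s per user, "
          f"{factory_throughput(g):7.1f} tok/s total, revenue {revenue(g):7.1f}")
print("best allocation:", best, "GPUs per user")
```

With these assumed constants, revenue peaks at an intermediate allocation rather than at either extreme. Faster GPUs would raise `BASE_RATE` and so lift the whole curve, which is the lever the article attributes to NVIDIA’s road map.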
During his keynote address, Huang laid out NVIDIA’s four-year GPU road map, detailing the upcoming Blackwell Ultra as well as the NVIDIA GB300 NVL72 rack, which leverages Blackwell Ultra and features an updated NVL72 design for improved energy efficiency and serviceability. Additionally, he discussed the company’s Vera Rubin architecture, which is set for release in late 2026 and marks the shift from HBM3/HBM3e to HBM4 memory, as well as Vera Rubin Ultra, which is expected in 2027 and will leverage HBM4e memory to deliver higher memory bandwidth. To round out NVIDIA’s four-year road map, Huang announced the company’s Feynman GPU architecture, which is slated for release in 2028.
Scale up before you scale out, but NVIDIA supports both
In combination with NVIDIA’s updated GPU architecture road map, Huang revealed preliminary technical specifications for the Vera Rubin NVL144 and Rubin Ultra NVL576 racks, with each system being built on iterative generations of the company’s ConnectX SuperNIC and NVLink technologies, promising stronger networking performance with respect to increased bandwidth and higher throughput. NVIDIA’s growing focus on NVL rack systems underscores Huang’s philosophy that organizations should scale up before they scale out, prioritizing the deployment of fewer densely configured AI systems compared to a greater number of less powerful systems to drive simplicity and workload efficiency.

2024 Data Center GPU Market Share (Source: TBR)
Networking has become, and continues to become, more integral to NVIDIA’s business as the company’s industry-leading advancements in accelerated compute have necessitated full-stack AI infrastructure innovation. While NVIDIA drives accelerated computing efficiency on and close to the motherboard through the design of increasingly high-performance GPUs and CPUs and its ongoing investments in ConnectX and NVLink, the company is also heavily invested in driving AI infrastructure efficiency through its networking platform investments in Quantum-X InfiniBand and Spectrum-X Ethernet.
Although copper is well suited for short-distance data transmissions, fiber optics is more effective over long distances. As such, the scale-out of AI factories requires an incredible number of optical transceivers to connect every NIC (network interface card) to every switch, making transceivers the single largest hardware component in a typical AI data center. NVIDIA estimates that optical transceivers consume approximately 10% of total computing power in most AI data centers. During his keynote address, Huang announced NVIDIA Photonics, which the company describes as a coinvention across an ecosystem of copackaged optics partners, to reduce power consumption and the number of discrete components in an AI data center.
Leveraging components from partners, including TSMC, Sumitomo and Corning, NVIDIA Photonics allows NVIDIA to replace pluggable optical transceivers with optical engines that are copackaged with the switch ASIC. This allows optical fibers to plug directly into the switch with the onboard optical engine processing and converting incoming data — in the form of optical signals — into electrical signals that can then be immediately processed by the switch. Liquid-cooled Quantum-X Photonic switch systems are expected to become available later this year ahead of the Spectrum-X Photonic switch systems that are coming in 2026. NVIDIA claims that the new systems improve power efficiency by 3.5x while also delivering 10x higher resiliency and 1.3x faster time to deploy compared to traditional AI data center architectures leveraging pluggable optical transceivers.
Securing the developer base
Adjacent to what the company is doing in the data center, NVIDIA announced other, more accessible Blackwell-based hardware platforms, including RTX PRO Series GPUs, DGX Spark and DGX Station, at GTC 2025. At CES (Consumer Electronics Show) 2025 in January, NVIDIA made two major announcements: Project DIGITS, a personal AI supercomputer that provides AI researchers, data scientists and students with access to the Grace Blackwell platform; and the next-generation GeForce RTX 50 Series of consumer desktop and laptop GPUs for gamers, creators and developers.
Building on these announcements, at GTC 2025 NVIDIA introduced DGX Spark, the new name of the previously announced Project DIGITS, which leverages the NVIDIA GB10 Grace Blackwell Superchip and ConnectX-7 to deliver 1,000 AI TFLOPS (tera floating-point operations per second) of performance in an energy-efficient and compact form factor. DGX Spark will come pre-installed with the NVIDIA AI software stack to support local prototyping, fine-tuning and inferencing of models with up to 200 billion parameters, and NVIDIA OEM partners ASUS, Dell Technologies, HP Inc. and Lenovo are already building their own branded versions.
To complement its recently unveiled GeForce RTX 50 Series, NVIDIA announced a comprehensive lineup of RTX PRO Series GPUs for laptops, desktops and servers with “PRO” denoting the solutions’ intent to support enterprise applications. At the top end of the lineup, RTX PRO 6000 will deliver up to 4,000 AI TFLOPS of performance, making it the most powerful discrete desktop GPU ever created. While DGX Spark systems will be available beginning in July, DGX Station is expected to be released toward the end of the year. DGX Station promises to be the highest-performing desktop AI supercomputer, featuring the GB300 Grace Blackwell Ultra Desktop Superchip and ConnectX-8, with OEM partners, including ASUS, BOXX, Dell Technologies, HP Inc., Lambda and Supermicro, building systems. Together, these announcements highlight NVIDIA’s commitment to democratizing AI and supporting developers.
Software is the most important feature of NVIDIA GPUs
In TBR’s 1Q24 Semiconductor Market Landscape, NVIDIA led all vendors in terms of trailing 12-month (TTM) corporate revenue growth, with hardware revenue accounting for an estimated 88.9% of the company’s TTM top line. However, while NVIDIA’s industry-leading top-line growth continues to be driven primarily by increasing GPU and AI infrastructure systems sales, the reason customers choose NVIDIA hardware ultimately boils down to two interrelated factors: the company’s developer ecosystem and its AI platform strategy.
The CUDA advantage
In 2006 NVIDIA introduced CUDA (Compute Unified Device Architecture), a parallel computing platform and programming model purpose-built to enable the acceleration of workloads beyond graphics. With CUDA, developers gained the ability to code applications optimized to run on NVIDIA GPUs. Since CUDA’s inception, NVIDIA has relentlessly invested in strengthening CUDA, supporting backward compatibility, publishing new CUDA libraries, and giving developers new resources to optimize the performance and simplify the building of applications.
As such, many legacy AI applications and libraries are rooted in CUDA, whose documentation is light years ahead of competing platforms, such as AMD ROCm. With respect to driving AI efficiency, several NVIDIA executives and spokespeople at GTC 2025 circled back to the notion that, when it comes to enabling the most complex AI workloads of today and tomorrow, software optimization is as important as, if not more important than, infrastructure innovation and optimization, underscoring the unique value behind NVIDIA’s CUDA-optimized GPUs. In short, at the heart of NVIDIA’s comprehensive AI stack and competitive advantage is CUDA, and as Huang emphasized to the attending industry analysts, “Software is the most important feature of NVIDIA GPUs.”
A new framework for AI inference
As the AI inference boom materializes, NVIDIA has leveraged the programmability of its GPUs to optimize the performance of reasoning models at scale, with Huang introducing NVIDIA Dynamo at GTC 2025. Dynamo is an open-source modular inference framework that was designed to serve GenAI models in multinode distributed environments and specifically developed for accelerating and scaling AI reasoning models to maximize token revenue generation.
The framework leverages a technique called “disaggregated serving,” which separates the processing of input tokens in the prefill phase of inference from the processing of output tokens in the decode phase. Traditional large language model (LLM) deployments leverage a single GPU or GPU node for both the prefill and decode phases, but each phase has different resource requirements, with prefill being compute-bound and decode being memory-bound. Splitting the phases across separate GPUs allows each to be provisioned and optimized independently. As NVIDIA’s VP of Accelerated Computing Ian Buck put it, “Dynamo is the Kubernetes of GPU orchestration.”
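The separation of prefill and decode can be sketched in a few lines. This is a conceptual illustration only and does not use Dynamo’s actual API: the class, the function names, the round-robin worker selection and the list-of-strings "KV cache" are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class PrefillResult:
    request_id: int
    kv_cache: list  # toy stand-in for the key-value tensors


def prefill(request_id: int, prompt: str) -> PrefillResult:
    """Compute-bound phase: process every input token in a single pass."""
    return PrefillResult(request_id, prompt.split())


def decode(state: PrefillResult, max_new_tokens: int) -> list:
    """Memory-bound phase: emit tokens one at a time, extending the KV cache."""
    output = []
    for step in range(max_new_tokens):
        token = f"tok{step}"  # placeholder for a sampled token
        state.kv_cache.append(token)
        output.append(token)
    return output


class DisaggregatedServer:
    """Hypothetical server with separate worker pools for each phase."""

    def __init__(self, prefill_workers: list, decode_workers: list):
        self.prefill_workers = prefill_workers
        self.decode_workers = decode_workers

    def serve(self, request_id: int, prompt: str, max_new_tokens: int) -> dict:
        # Round-robin selection is an assumption made for this sketch.
        p = self.prefill_workers[request_id % len(self.prefill_workers)]
        d = self.decode_workers[request_id % len(self.decode_workers)]
        state = prefill(request_id, prompt)
        # In a real deployment the KV cache would move between GPUs here.
        tokens = decode(state, max_new_tokens)
        return {"prefill_worker": p, "decode_worker": d,
                "tokens": tokens, "kv_cache_len": len(state.kv_cache)}


server = DisaggregatedServer(["prefill-0", "prefill-1"],
                             ["decode-0", "decode-1", "decode-2"])
result = server.serve(4, "why is the sky blue", max_new_tokens=3)
print(result)
```

Because the two pools are sized independently, an operator could add decode workers for long reasoning outputs without also adding prefill capacity.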
To optimize the utilization of GPU resources for distributed inference, Dynamo’s Planner feature continuously monitors GPU capacity metrics in distributed inference environments to make real-time decisions on whether to serve incoming user requests using disaggregated or aggregated serving while also selecting and dynamically shifting GPU resources to serve prefill or decode inference phases.
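A planner of this kind can be pictured as a small control loop. The rules below are a hypothetical heuristic written for illustration, not Dynamo’s actual logic. The metric names, thresholds and the one-GPU-at-a-time rebalancing step are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    prefill_queue_depth: int   # requests waiting for prefill
    decode_utilization: float  # fraction of decode capacity in use, 0.0 to 1.0


@dataclass
class Allocation:
    prefill_gpus: int
    decode_gpus: int


def choose_mode(m: Metrics, queue_threshold: int = 4) -> str:
    """Assumed rule: light load stays aggregated, heavy prefill load is split out."""
    return "disaggregated" if m.prefill_queue_depth >= queue_threshold else "aggregated"


def rebalance(a: Allocation, m: Metrics,
              queue_threshold: int = 4, util_threshold: float = 0.85) -> Allocation:
    """Shift one GPU toward whichever phase is the current bottleneck."""
    prefill_backed_up = m.prefill_queue_depth >= queue_threshold
    decode_saturated = m.decode_utilization >= util_threshold
    if prefill_backed_up and not decode_saturated and a.decode_gpus > 1:
        return Allocation(a.prefill_gpus + 1, a.decode_gpus - 1)
    if decode_saturated and not prefill_backed_up and a.prefill_gpus > 1:
        return Allocation(a.prefill_gpus - 1, a.decode_gpus + 1)
    return a  # both phases balanced, or both saturated: leave allocation unchanged


current = Allocation(prefill_gpus=2, decode_gpus=6)
burst = Metrics(prefill_queue_depth=10, decode_utilization=0.5)
print(choose_mode(burst), rebalance(current, burst))
```

A production planner would act on richer signals, but the sketch shows the two decisions the article describes: which serving mode to use, and where to place GPUs.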
To further drive inference efficiencies by reducing request latency and time to first token, Dynamo has a Smart Router feature to minimize key-value (KV) cache recomputation. KV cache can be thought of as the model’s contextual understanding of a user’s input. As the size of the input increases, KV cache computation increases quadratically, and if the same request is frequently executed, this can lead to excessive KV cache recomputation, reducing inference efficiency. Dynamo Smart Router works by assigning an overlap score to each new inference request as it arrives and then using that score to route the request to the best-suited resource (i.e., whichever available resource has the highest overlap between its KV cache and the user’s request), minimizing KV cache recomputation and freeing up GPU resources.
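The routing idea can be sketched with a simple prefix-overlap score. This is an illustrative simplification rather than Dynamo’s implementation: it assumes overlap is measured as the number of leading tokens a request shares with a sequence already cached on a worker, and it ignores worker load. All names are hypothetical.

```python
def prefix_overlap(a: list, b: list) -> int:
    """Number of leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(request_tokens: list, worker_caches: dict) -> tuple:
    """Pick the worker whose cached sequences share the longest prefix with the request."""
    best_worker, best_score = None, -1
    for worker, cached_sequences in worker_caches.items():
        worker_score = max(
            (prefix_overlap(request_tokens, seq) for seq in cached_sequences),
            default=0,
        )
        if worker_score > best_score:
            best_worker, best_score = worker, worker_score
    return best_worker, best_score


# Hypothetical state: which token sequences each worker already holds KV cache for.
caches = {
    "gpu0": [["you", "are", "a", "helpful", "assistant"]],
    "gpu1": [["translate", "this", "text"]],
}
request = ["you", "are", "a", "helpful", "assistant", "summarize", "this"]
print(route(request, caches))  # prints ('gpu0', 5)
```

Routing to `gpu0` means only the two new tokens need fresh prefill work, since the first five are already covered by that worker’s cache.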
Additionally, Dynamo leans on its Distributed KV Cache Manager feature to support both distributed and disaggregated inference serving and to offer hierarchical caching capabilities. Calculating KV cache is resource intensive, but as AI demand increases, so does the volume of KV cache that must be stored to minimize KV cache recomputation. Dynamo Distributed KV Cache Manager leverages advanced caching policies to prioritize the placement of frequently accessed data closer to the GPU, with less accessed data being offloaded farther from the GPU.
As such, the hottest KV cache data is stored on GPU memory with progressively colder data being offloaded to shared CPU host memory, solid-state drives (SSDs) or networked object storage. Leveraging these key features, NVIDIA claims Dynamo maximizes resource utilization, yielding up to 30 times higher performance for AI factories running reasoning models like DeepSeek-R1 on NVIDIA Blackwell. Additionally, NVIDIA leaders state that while designed specifically for the inference of AI reasoning models, Dynamo can double token generation when applied to traditional knowledge-based LLMs on NVIDIA Hopper.
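The hot-to-cold tiering described above resembles a multi-level least-recently-used cache. The sketch below assumes an LRU policy and tiny per-tier capacities for illustration. NVIDIA describes its policies only as “advanced,” so this is not a claim about how Dynamo decides placement.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy hierarchical cache: the first tier is hottest, the last is coldest."""

    def __init__(self, capacities: dict):
        self.order = list(capacities)        # e.g. ["gpu", "cpu", "ssd"]
        self.capacities = dict(capacities)   # max entries per tier (assumed units)
        self.tiers = {name: OrderedDict() for name in self.order}

    def put(self, key, value) -> None:
        for tier in self.tiers.values():     # drop any stale copy first
            tier.pop(key, None)
        self._insert(0, key, value)

    def get(self, key):
        for name in self.order:
            if key in self.tiers[name]:
                value = self.tiers[name].pop(key)
                self._insert(0, key, value)  # promote on access
                return value
        return None                          # miss: KV cache must be recomputed

    def tier_of(self, key):
        for name in self.order:
            if key in self.tiers[name]:
                return name
        return None

    def _insert(self, level: int, key, value) -> None:
        name = self.order[level]
        tier = self.tiers[name]
        tier[key] = value
        if len(tier) > self.capacities[name]:
            old_key, old_value = tier.popitem(last=False)  # least recently used
            if level + 1 < len(self.order):
                self._insert(level + 1, old_key, old_value)  # demote one tier down
            # Entries evicted from the last tier are simply dropped.


cache = TieredKVCache({"gpu": 2, "cpu": 2, "ssd": 2})
for request_id in ["a", "b", "c"]:
    cache.put(request_id, f"kv-{request_id}")
print(cache.tier_of("a"))  # prints cpu
cache.get("a")
print(cache.tier_of("a"), cache.tier_of("b"))  # prints gpu cpu
```

Accessing `a` promotes it back to the GPU tier and pushes the least recently used GPU entry, `b`, down to CPU memory.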
The Super Bowl but everybody wins
NVIDIA’s astronomical revenue growth and relentless innovation road map aside, perhaps nothing emphasizes the degree of importance the company holds over the future of the entire AI market more than the number of partners that are clamoring to gain a foothold using NVIDIA as a launching point. The San Jose McEnery Convention Center was filled with nearly 400 exhibitors showcasing how NVIDIA’s AI and accelerated computing platforms are driving innovation across all industries. NVIDIA GTC is no longer a conference highlighting the innovations of a single company; it is the epicenter of showcasing AI opportunity, and every company that wishes to play a role in the market was in attendance.
The broad swath of NVIDIA’s partner ecosystem was represented. Infrastructure OEMs and ODMs displayed systems built on NVIDIA reference architectures, while NVIDIA Inception startups highlighted their own diverse codeveloped AI solutions. However, perhaps the most compelling and largest-scale example of NVIDIA relying on its partners to deliver AI solutions to end customers came from the company’s global systems integrator (GSI) partners.
NVIDIA provides the platform; partners provide the solution
The world’s leading GSIs, including Accenture, Deloitte, EY, Infosys and Tata Consultancy Services (TCS), all showcased how they are leveraging NVIDIA’s AI Enterprise software platform — comprising NIMs, NeMo and Blueprints — to help customers build and deploy their own customized AI solutions with a heavy emphasis on agentic AI. While some of the largest enterprises in the world have the talent required to build bespoke AI solutions, many other organizations rely on NVIDIA-certified GSI partners with training and expertise in NVIDIA’s AI technologies to develop and deploy AI solutions.
Agentic AI has emerged as the next frontier of AI, using reasoning and iterative planning to solve complex, multistep problems autonomously, leading to enhanced productivity and user experiences. NVIDIA AI Enterprise’s tools help make this possible, and at GTC 2025, NVIDIA business leaders shed light on three overarching reasons why NVIDIA AI Enterprise has resonated with end customers and NVIDIA partners alike.
First, NVIDIA AI Enterprise builds on CUDA to deliver software-optimized full-stack acceleration, much like other NVIDIA AI platforms. Business leaders essentially explained NIMs — the building blocks of AI Enterprise — as an opinionated way of running a GenAI model on a GPU in the most efficient way possible.
Second, NVIDIA AI Enterprise is enterprise grade, meaning that the thousands of first- and third-party libraries constituting the platform are constantly maintained with AI copilots scanning for security threats and AI agents patching software autonomously. Additionally, enterprises demand commitments to maintenance and standard APIs that are not going to change, and NVIDIA AI Enterprise ticks these boxes while also offering tiered levels of support services on top of the platform.
Finally, because NIMs are packaged as containers that can be deployed on Kubernetes, AI Enterprise is extremely portable, allowing the platform to deliver a consistent experience across a variety of environments.
Autonomous vehicles are the tip of the physical AI iceberg
Several of NVIDIA’s automotive partners also attended GTC 2025, displaying their vehicles inside and outside the convention center. These partners all leverage at least one of NVIDIA’s three computing platforms comprising the company’s end-to-end solutions for autonomous vehicles, with several partners leveraging NVIDIA’s entire platform — including General Motors (GM), whose adoption of NVIDIA AI, simulation and accelerated compute was announced by Huang during the GTC 2025 keynote address.
While autonomous vehicles are perhaps the most tangible example, NVIDIA’s three computer systems can be used to build robots of all kinds, ranging from industrial robots used on manufacturing lines to surgical robots supporting the healthcare industry. The three computers required to build physical AI include NVIDIA DGX, which is leveraged for model pre-training and post-training; NVIDIA OVX, which is leveraged for simulation to further train, test and validate physical AI models; and NVIDIA AGX, which acts as the robot runtime and is used to safely deploy distilled physical AI models in the real world.
Following the emergence of agentic AI, NVIDIA sees physical AI as the next wave of artificial intelligence, and the company has already codeveloped foundation models and simulation frameworks to support advancements in the field with industry-leading partners, such as Disney Research and Google DeepMind.
Conclusion
The sheer scale of NVIDIA GTC 2025 reaffirmed NVIDIA’s position at the epicenter of the AI revolution, with Huang’s keynote address filling all the available seating in the SAP Center. Born from Huang’s long-standing vision of accelerating workloads by applying parallel processing, NVIDIA’s relentless investments in the R&D of the entire AI stack, from GPUs to interconnect and software platforms to developer resources, remain the driving force behind the AI giant’s success and seemingly insurmountable lead over competitors.
NVIDIA’s first-mover advantage in accelerated computing was predicated on the company’s CUDA platform and its ability to allow developers to optimize applications running on NVIDIA GPUs. Nearly 20 years later, NVIDIA continues to leverage CUDA and its robust ecosystem of developers to create innovative AI platforms, such as Omniverse and AI Enterprise, that attract partners from every corner of the technology ecosystem. By swimming in its own lane and relying on its growing NVIDIA Partner Network to deliver AI systems and solutions to end customers, NVIDIA has built an unrivaled ecosystem of partners whose actions on the front lines with end customers facilitate the near-infinite gravity behind the company’s AI platforms.