Technology Business Research, Inc., is pleased to announce the launch of the AI & GenAI Technology Market Landscape, a new research report focusing on some of the more influential AI startups that are making generative AI (GenAI) a reality for many enterprises as well as for their cloud delivery partners, which play a critical role in this new market.

In our initial research we analyze alliance relationships, specifically how AI vendors — including both large language model (LLM) providers and GenAI facilitators — are working with major hyperscalers and SaaS vendors. Additionally, we look at where AI startups are investing; key trends, such as the emergence of multimodal models; and what we can expect from vendors in the coming quarters. Vendors covered include AI21 Labs, Anthropic, Cohere, Databricks, Google, Hugging Face, Meta, OpenAI and Stability AI.

The first publication of this semiannual report is now available. If you are a current TBR client and believe you have access to the full research via your employer’s enterprise license, or if you would like to learn how to access the full research, click here.

Highlights from the AI & GenAI Technology Market Landscape

Key Trend: Multimodal models

Most foundation model vendors have made multimodal models a strategic priority as they look to expand the number of GenAI use cases
Unlike traditional LLMs, multimodal AI can process and interpret several types of data inputs at the same time, such as text, images and sounds. This versatility makes multimodal models critical for expanding the viable use cases for generative AI, specifically to support the creation of marketing content. According to TBR’s 2H23 Cloud Applications Customer Research, marketing content creation is currently a top-five use case for GenAI.

Cloud service providers (CSPs) and foundation model vendors alike have worked to develop multimodal models internally or to form collaborations that harness GenAI’s data interpretation capabilities. Meta launched its open-source multimodal AI model, ImageBind, which can process text, audio, visual, movement, thermal and depth data simultaneously. When connected to sensors, ImageBind can perceive the surrounding environment’s 3D shape, temperature and sound. OpenAI’s GPT-4V, the newest evolution of its GPT model series, allows users to include text, voice and images in prompts to create content. Google’s Gemini is also a multimodal foundation model that identifies and generates text, images, video, code and audio. Gemini is known for its performance on the Massive Multitask Language Understanding (MMLU) benchmark.

TBR believes CSPs and foundation model vendors will drive innovation in multimodal models to improve data interpretation and insights across all business segments.