Red Hat OpenShift AI
Integrations
- InstructLab (LAB)
- vLLM
- Model Context Protocol (MCP)
- Llama Stack
- Docling
- NVIDIA NIXL
- Tekton Pipelines
Pricing Details
- Standard Red Hat subscription model, extended with 'AI Units' for flexible tracking of compute and MaaS consumption.
Features
- llm-d Distributed Inference Engine
- InstructLab Taxonomy-based Model Customization
- Model Context Protocol (MCP) native integration
- Model-as-a-Service (MaaS) with AI Gateway
- TrustyAI Governance and Bias Monitoring
Description
Red Hat OpenShift AI 3.0: Agentic & Distributed Architecture
As of January 2026, RHOAI has transitioned to version 3.0, with a focus on distributed inference and support for autonomous agents. The platform integrates llm-d, a high-performance engine that optimizes LLM serving on Kubernetes via the Gateway API Inference Extension.
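For illustration, a minimal client-side sketch of calling a model served by an llm-d deployment through the inference gateway, assuming the OpenAI-compatible completion API that vLLM-based servers expose; the gateway route, model name, and token below are hypothetical placeholders.

```python
import requests

# Hypothetical route exposed by the inference gateway in front of an llm-d deployment.
GATEWAY_URL = "https://llm-gateway.apps.example.com/v1/chat/completions"
MODEL_NAME = "granite-3-8b-instruct"  # placeholder model identifier

def chat(prompt: str, token: str) -> str:
    """Send a single chat completion request through the inference gateway."""
    response = requests.post(
        GATEWAY_URL,
        headers={"Authorization": f"Bearer {token}"},
        json={
            "model": MODEL_NAME,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Summarize the benefits of disaggregated prefill.", token="<api-token>"))
```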
Orchestration & Model Alignment Layer
The core stack now features an integrated InstructLab toolkit, enabling taxonomy-based model customization without catastrophic forgetting.
- llm-d Distributed Serving: Uses NVIDIA NIXL and DeepEP for low-latency Mixture-of-Experts (MoE) communication, enabling large models to scale across multi-node GPU clusters.
- Agentic Infrastructure: Native support for the Model Context Protocol (MCP) and Llama Stack APIs, enabling AI agents that interact with enterprise data through standardized connectors (a minimal client sketch follows this list).
- Hardware Profiles: Replaces legacy accelerator profiles, providing granular control over NPU, GPU (H200/B200), and IBM Z/Power resource allocation.
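To make the agentic layer concrete, here is a minimal sketch using the reference MCP Python SDK to connect an agent to a data connector; the server command and the `search_documents` tool are hypothetical stand-ins for an enterprise MCP server.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Hypothetical MCP server wrapping an enterprise data source.
server = StdioServerParameters(command="python", args=["enterprise_data_server.py"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the tools the connector exposes, then invoke one of them.
            tools = await session.list_tools()
            print("Available tools:", [t.name for t in tools.tools])

            result = await session.call_tool(
                "search_documents",  # hypothetical tool name
                arguments={"query": "Q4 revenue by region"},
            )
            print(result.content)

if __name__ == "__main__":
    asyncio.run(main())
```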
Data Ingestion & TrustyAI Governance
RHOAI 3.0 incorporates the Docling project for advanced unstructured data ingestion, converting complex documents into AI-ready formats for RAG and fine-tuning.
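As a minimal sketch of the ingestion step, the upstream Docling Python API can convert a source document into Markdown suitable for a RAG index or fine-tuning data preparation; the input file name is a placeholder.

```python
from docling.document_converter import DocumentConverter

# Convert a source document (PDF, DOCX, HTML, ...) into Docling's unified representation.
converter = DocumentConverter()
result = converter.convert("quarterly_report.pdf")  # placeholder path or URL

# Export to Markdown, an AI-ready format for downstream chunking and indexing.
markdown = result.document.export_to_markdown()
print(markdown[:500])
```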
- TrustyAI Bias Monitoring: Automated detection of model drift and bias is now GA, utilizing the Kubernetes-native TrustyAI operator for real-time inference auditing.
- Models-as-a-Service (MaaS): A centralized model catalog and AI Gateway provide secure, metered access to internal models with built-in quota management (see the client sketch below).
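A sketch of what metered access through the AI Gateway might look like from the consumer side, assuming the gateway exposes an OpenAI-compatible endpoint; the base URL, API key, and model identifier are hypothetical values issued via the platform's catalog and quota management.

```python
from openai import OpenAI

# Hypothetical MaaS endpoint and per-team API key issued by the AI Gateway;
# requests made with this key count against the team's quota.
client = OpenAI(
    base_url="https://maas.apps.example.com/v1",
    api_key="team-finance-xxxxx",
)

completion = client.chat.completions.create(
    model="granite-3-8b-instruct",  # placeholder entry from the model catalog
    messages=[{"role": "user", "content": "Classify this invoice line item."}],
)
print(completion.choices[0].message.content)
```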
Evaluation Guidance
Technical architects should prioritize the llm-d engine for MoE model deployments to optimize TCO. Organizations migrating from 2.x must update their pipeline definitions to the 3.0 spec before they can use Hardware Profiles. Verify that your vector store integration uses the new Llama Stack compatibility layer to keep agent orchestration future-proof.
Tool Pros and Cons
Pros
- Streamlined AI/ML lifecycle
- Scalable Kubernetes platform
- Enhanced team collaboration
- Automated MLOps
- Simplified deployment
Cons
- Kubernetes expertise needed
- Learning curve
- OpenShift costs