Schedule AI Workloads at Scale

KAI Scheduler is a Kubernetes-native scheduler for AI workloads that optimises GPU allocation across the full AI lifecycle, from data processing through training to inference, while keeping resource shares fair across teams.

kai · cluster=prod-01 scheduling

$ kubectl get queues
NAME              PRIORITY   PARENT          CHILDREN
department-a      medium                     team-research-a,team-inference
department-b      medium                     team-training,team-research-b
team-inference    high       department-a
team-research-a   medium     department-a
team-training     medium     department-b
team-research-b   low        department-b

$ kubectl describe queue team-research-a
Name:         team-research-a
Spec:
  parentQueue: department-a
  resources:
    cpu: { quota: 64, limit: 128, overQuotaWeight: 1 }
    gpu: { quota: 8,  limit: 16,  overQuotaWeight: 1 }
Status:
  allocated:
    cpu: 52
    gpu: 14

$ kubectl get schedulingshards
NAME      AGE
default   12d

Efficient

Bin-packing and gang scheduling place workloads more tightly on GPUs, reducing node fragmentation so the same cluster runs more workloads at once.

Fair

Hierarchical queues with deserved-share and time-based fairness keep every team making progress, so no queue is starved of resources.

Scalable

Designed and continuously tested to manage large-scale GPU clusters with thousands of nodes and high-throughput workloads, with topology awareness and GPU sharing.

How it works

One scheduler. Every workload.

From quick interactive notebooks to multi-node distributed training, KAI keeps your GPU fleet busy without starving anyone.

Purpose-built for managing AI workloads on Kubernetes

Hierarchical Queues

Multi-level queue tree with quotas, limits, and over-quota borrowing across teams.
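
As a rough illustration of over-quota borrowing, the sketch below gives each queue its quota first and then splits spare GPUs in proportion to overQuotaWeight, never exceeding a queue's limit. The `Queue` class, the `fair_shares` function, the `demand` field, and the redistribution loop are assumptions made for this example, not KAI's actual implementation; the sketch also assumes the quotas fit within cluster capacity.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    quota: float              # guaranteed share
    limit: float              # hard cap, quota plus borrowed resources
    over_quota_weight: float  # relative claim on spare capacity
    demand: float             # GPUs the queue is asking for (hypothetical field)


def fair_shares(queues, capacity):
    """Return {queue name: GPUs} for one flat level of queues."""
    # Step 1: every queue receives its demand up to its quota.
    shares = {q.name: min(q.demand, q.quota) for q in queues}
    remaining = capacity - sum(shares.values())

    # Step 2: split what is left by weight, capped by demand and limit.
    # Queues that fill up drop out and their unused offer is redistributed.
    active = [q for q in queues if min(q.demand, q.limit) > shares[q.name]]
    while remaining > 1e-9 and active:
        total_weight = sum(q.over_quota_weight for q in active)
        if total_weight <= 0:
            break
        granted = 0.0
        still_hungry = []
        for q in active:
            offer = remaining * q.over_quota_weight / total_weight
            room = min(q.demand, q.limit) - shares[q.name]
            take = min(offer, room)
            shares[q.name] += take
            granted += take
            if room - take > 1e-9:
                still_hungry.append(q)
        remaining -= granted
        if granted <= 1e-9:
            break
        active = still_hungry
    return shares


queues = [
    Queue("team-research-a", quota=8, limit=16, over_quota_weight=1, demand=16),
    Queue("team-inference", quota=8, limit=16, over_quota_weight=3, demand=10),
]
shares = fair_shares(queues, capacity=24)
print(shares)
```

In this made-up scenario team-inference only wants two GPUs beyond its quota, so the rest of the spare capacity flows to team-research-a, which ends up above its quota but still below its limit.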

Gang Scheduling & Elastic Workloads

All-or-nothing placement for distributed training; min and max replicas that grow and shrink with available capacity, governed by fairness rules.
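
The all-or-nothing idea can be sketched in a few lines: try to place every pod of a gang against a copy of the free capacity, and commit only if all of them fit. The elastic variant places the minimum replica count as a gang and then adds replicas one at a time up to the maximum. The function names, the first-fit node choice, and the node names are assumptions for illustration only; this sketch also leaves out the fairness rules that govern growth in KAI.

```python
def place_gang(free_gpus, pod_requests):
    """Place all pods or none. Mutates free_gpus only on success."""
    trial = dict(free_gpus)  # work on a copy so a failure leaves no trace
    placement = []
    for request in sorted(pod_requests, reverse=True):
        # First-fit: take the first node with enough free GPUs.
        node = next((n for n, free in trial.items() if free >= request), None)
        if node is None:
            return None  # one pod does not fit, so the whole gang waits
        trial[node] -= request
        placement.append((node, request))
    free_gpus.update(trial)  # commit
    return placement


def place_elastic(free_gpus, gpus_per_pod, min_replicas, max_replicas):
    """Gang-place the minimum, then grow opportunistically up to the maximum."""
    placement = place_gang(free_gpus, [gpus_per_pod] * min_replicas)
    if placement is None:
        return None
    for _ in range(max_replicas - min_replicas):
        extra = place_gang(free_gpus, [gpus_per_pod])
        if extra is None:
            break
        placement.extend(extra)
    return placement


cluster = {"node-a": 4, "node-b": 1}
print(place_gang(cluster, [2, 2, 2]))  # third pod cannot fit
print(cluster)                         # capacity is unchanged

cluster = {"node-a": 4, "node-b": 2}
print(place_elastic(cluster, gpus_per_pod=2, min_replicas=2, max_replicas=5))
```

Working on a copy is the design choice that matters here: a partially placed gang never holds GPUs that another workload could have used.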

GPU Sharing

Time slicing, MPS (Multi-Process Service), and MIG (Multi-Instance GPU), so inference and dev workloads no longer occupy whole devices.

Topology-Aware

Topology-aware placement for workloads, including hierarchical topology-aware scheduling for hierarchical PodGroups.

Queue & Workload Priority

Per-queue priority classes plus per-workload priority. Critical jobs preempt cleanly; non-critical jobs back off.

Time-based Fairshare

Tracks historical GPU usage over a configurable window so over-quota resources are distributed fairly over time, not just at a single moment.
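
One way to picture a usage window is the sketch below: it keeps recent usage samples, discards those older than the window, and offers over-quota resources first to the queue that has used the least recently. The `UsageWindow` class, the `over_quota_order` function, the GPU-seconds unit, and the lowest-usage-first ordering are assumptions for illustration, not a description of KAI's internals. Samples are assumed to arrive in time order.

```python
from collections import deque


class UsageWindow:
    """Sliding window of per-queue GPU usage samples."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, queue, gpu_seconds), oldest first

    def record(self, timestamp, queue, gpu_seconds):
        self.samples.append((timestamp, queue, gpu_seconds))

    def usage(self, now):
        # Drop samples that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()
        totals = {}
        for _, queue, gpu_seconds in self.samples:
            totals[queue] = totals.get(queue, 0.0) + gpu_seconds
        return totals


def over_quota_order(queues, window, now):
    """Queues with the least recent usage are served first."""
    used = window.usage(now)
    return sorted(queues, key=lambda q: used.get(q, 0.0))


window = UsageWindow(window_seconds=3600)
window.record(0, "team-training", 1000)
window.record(1000, "team-research-a", 500)
window.record(3000, "team-training", 200)

teams = ["team-training", "team-research-a"]
print(over_quota_order(teams, window, now=3500))
print(over_quota_order(teams, window, now=4000))  # the oldest sample has expired
```

Once the large early sample from team-training ages out of the window, that queue moves back to the front of the line, which is the over-time behaviour the feature describes.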

From the community

Talks & recordings

KubeCon EU 2026 · April 2026

GPU Reservations: Maximising Utilisation and Fairness Across Teams

How KAI manages GPU reservations to balance utilisation with fairness across teams.

KubeCon EU 2026 · April 2026

Lessons Learned Orchestrating Multi-Tenant GPUs on OpenShift AI with KAI

Real-world deployment patterns from production multi-tenant GPU clusters.

KubeCon NA 2025 · November 2025

Mind the Topology: Smarter Scheduling for AI Workloads on Kubernetes

How Topology-Aware Scheduling optimises placement for disaggregated serving architectures.

January 2026

Ensuring Balanced GPU Allocation with Time-Based Fairshare

How KAI tracks historical GPU usage to allocate shared resources fairly across teams over time, not just at the current moment.

October 2025

GPU Scheduling with KAI and vCluster

Running KAI inside virtual clusters provisioned by vCluster: isolated tenants, safe upgrades, and a sandbox for testing scheduler changes without touching the host cluster.

October 2025

Enable Gang Scheduling and Workload Prioritisation in Ray with KAI

Configuring KubeRay to schedule with KAI: gang scheduling for RayCluster, hierarchical queues with priority classes, and preemption between Ray workloads.

May 2025

Building an Elastic GPU Cluster with KAI and Luna Autoscaler

How Luna pairs with KAI to scale GPU capacity to actual queue demand, not to every pending pod the API server happens to see.

May 2025

Optimising GPU Usage in ZenML Pipelines with KAI

Wiring fractional GPU sharing into ZenML pipelines so multiple ML steps share a physical GPU on Kubernetes for tighter utilisation and lower cost.

April 2025

NVIDIA Open-Sources Run:ai Scheduler to Foster Community Collaboration

The Run:ai scheduler is open-sourced under Apache 2.0. A walkthrough of what it brings to Kubernetes clusters and the architecture that makes it work.

Join the Community

CNCF Slack · GitHub · Mailing List · Community Calls

KAI Scheduler is a Cloud Native Computing Foundation sandbox project.