AIBrix on EKS

warning

Deploying ML models on EKS requires access to GPU or Neuron instances. If your deployment is failing, the most common cause is missing access to these resources. Some deployment patterns also rely on Karpenter autoscaling or static node groups; if nodes are not initializing, check the Karpenter logs or node group events to diagnose the issue.
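The troubleshooting advice above can be sketched as a few commands. This is a minimal diagnostic pass, assuming Karpenter is installed in the `karpenter` namespace (the project default; adjust to your install) and that the AWS CLI is configured:

```shell
# Confirm GPU/Neuron nodes are actually joining the cluster
kubectl get nodes -L node.kubernetes.io/instance-type

# Inspect Karpenter logs for provisioning failures
# (assumes the default "karpenter" namespace and deployment name)
kubectl logs -n karpenter deployment/karpenter | grep -i error

# For static node groups, surface health issues reported by EKS
# (<cluster> and <nodegroup> are placeholders for your names)
aws eks describe-nodegroup --cluster-name <cluster> \
  --nodegroup-name <nodegroup> --query 'nodegroup.health.issues'
```

A common failure mode is an instance-type quota or missing capacity in the chosen Availability Zones, which shows up in the Karpenter logs as unschedulable-pod or insufficient-capacity errors.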

info

These instructions deploy only the base AIBrix cluster. To deploy specific models for inference or training, refer to this AI page for end-to-end instructions.

What is AIBrix?

AIBrix is an open source initiative designed to provide essential building blocks to construct scalable GenAI inference infrastructure. AIBrix delivers a cloud-native solution optimized for deploying, managing, and scaling large language model (LLM) inference, tailored specifically to enterprise needs.

Key Features and Benefits

  • LLM Gateway and Routing: Efficiently manage and direct traffic across multiple models and replicas.
  • High-Density LoRA Management: Streamlined support for lightweight, low-rank adaptations of models.
  • Distributed Inference: Scalable architecture to handle large workloads across multiple nodes.
  • LLM App-Tailored Autoscaler: Dynamically scale inference resources based on real-time demand.
  • Unified AI Runtime: A versatile sidecar enabling metric standardization, model downloading, and management.
  • Heterogeneous-GPU Inference: Cost-effective SLO-driven LLM inference using heterogeneous GPUs.
  • GPU Hardware Failure Detection: Proactive detection of GPU hardware issues.

Deploying the Solution

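The deployment itself is driven by the blueprint's Terraform configuration. The commands below are a generic sketch of that flow, not the blueprint's exact script; the `<region>` and `<cluster-name>` values are placeholders, and the blueprint's own README remains the authoritative source:

```shell
# Initialize and apply the blueprint's Terraform configuration
# (run from the blueprint's infrastructure directory)
terraform init
terraform apply -auto-approve

# Point kubectl at the newly created EKS cluster
# (<region> and <cluster-name> are placeholders for your values)
aws eks update-kubeconfig --region <region> --name <cluster-name>
```

Applying the configuration typically provisions the VPC, the EKS cluster, node groups or Karpenter NodePools, and the AIBrix add-ons in one pass, so expect the apply step to take a while.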

Verify Deployment

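Once the apply completes, a quick way to verify the base cluster is to check that the AIBrix components are healthy. This sketch assumes the components land in the `aibrix-system` namespace (the project default; adjust if your install customizes it):

```shell
# Control-plane pods should all be Running/Ready
kubectl get pods -n aibrix-system

# AIBrix installs its own CRDs; confirm they registered
kubectl get crds | grep -i aibrix

# The gateway/routing services should have endpoints
kubectl get svc -n aibrix-system
```

If pods are stuck in Pending, that usually points back to the GPU/Neuron capacity issues described in the warning at the top of this page.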

Clean Up

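Tearing the environment down mirrors the deployment. The commands below are illustrative; if the blueprint ships a dedicated cleanup script, prefer that, since it can destroy resources in the required order:

```shell
# Destroy the Terraform-managed resources
# (run from the same directory used for the apply)
terraform destroy -auto-approve

# Double-check for orphaned load balancers that would block VPC deletion
aws elbv2 describe-load-balancers \
  --query 'LoadBalancers[].LoadBalancerName'
```

Load balancers and ENIs created dynamically by Kubernetes controllers are the most common leftovers; deleting the Kubernetes Services that created them before running `terraform destroy` avoids a stuck VPC deletion.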