AIBrix

AIBrix is an open source initiative designed to provide essential building blocks to construct scalable GenAI inference infrastructure. AIBrix delivers a cloud-native solution optimized for deploying, managing, and scaling large language model (LLM) inference, tailored specifically to enterprise needs.

Features

  • LLM Gateway and Routing: Efficiently manage and direct traffic across multiple models and replicas.
  • High-Density LoRA Management: Streamlined support for lightweight, low-rank adaptations of models.
  • Distributed Inference: Scalable architecture to handle large workloads across multiple nodes.
  • LLM App-Tailored Autoscaler: Dynamically scale inference resources based on real-time demand.
  • Unified AI Runtime: A versatile sidecar enabling metric standardization, model downloading, and management.
  • Heterogeneous-GPU Inference: Cost-effective SLO-driven LLM inference using heterogeneous GPUs.
  • GPU Hardware Failure Detection: Proactive detection of GPU hardware issues.

Deploying the Solution

Checking AIBrix Installation

Run the following command to check the AIBrix installation:

kubectl get pods -n aibrix-system

Wait until all the pods are in the Running status.
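
If you prefer a single command that blocks until everything is up, kubectl wait can poll the namespace for you (a convenience sketch; the 5-minute timeout is an arbitrary choice):

kubectl wait --for=condition=Ready pods --all -n aibrix-system --timeout=300s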

Running a model on AIBrix

We will now run the DeepSeek-R1-Distill-Llama-8B model using AIBrix on EKS.

Run the following command:

kubectl apply -f blueprints/inference/aibrix/deepseek-distill.yaml

This deploys the model in the deepseek-aibrix namespace. Wait a few minutes, then run:

kubectl get pods -n deepseek-aibrix

Wait for the pod to reach the Running state.
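
If the pod stays in a non-Running state longer than expected, tailing the model server logs is a quick way to see whether the model weights are still downloading. The pod name below is a placeholder; copy the real one from the kubectl get pods output above.

kubectl logs -f -n deepseek-aibrix <pod-name>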

Accessing the model using the Gateway

The Gateway is designed to serve LLM requests and provides features such as dynamic model and LoRA adapter discovery, per-user budgeting of request counts and token usage, streaming, and advanced routing strategies such as prefix-cache-aware routing and support for heterogeneous GPU hardware. To access the model through the Gateway, run the following command:

kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 &
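
The Envoy service name ends in a generated hash (903790dc above) that may differ in your cluster. If the port-forward cannot find the service, list the services in the namespace and substitute the name you see:

kubectl get service -n envoy-gateway-system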

Once the port-forward is running, you can test the model by sending a request to the Gateway.

ENDPOINT="localhost:8888"
curl -v http://${ENDPOINT}/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-8b",
    "prompt": "San Francisco is a",
    "max_tokens": 128,
    "temperature": 0
  }'
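
Since the Gateway fronts an OpenAI-compatible API, the chat completions endpoint should also work; the request below is a minimal sketch assuming the backing model server exposes /v1/chat/completions:

curl http://${ENDPOINT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-8b",
    "messages": [{"role": "user", "content": "Give me a two-sentence summary of Kubernetes."}],
    "max_tokens": 128
  }'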

Cleanup

Caution: To avoid unwanted charges to your AWS account, delete all the AWS resources created during this deployment.
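
As a starting point, here is a minimal cleanup sketch for the resources created in this section; the wider EKS cluster and the AIBrix installation itself should be removed however they were provisioned:

# Stop the background port-forward started earlier
kill %1

# Remove the model deployment created from the blueprint
kubectl delete -f blueprints/inference/aibrix/deepseek-distill.yaml

# Delete the namespace if the blueprint did not remove it
kubectl delete namespace deepseek-aibrix --ignore-not-found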