AIBrix
AIBrix is an open source initiative designed to provide essential building blocks to construct scalable GenAI inference infrastructure. AIBrix delivers a cloud-native solution optimized for deploying, managing, and scaling large language model (LLM) inference, tailored specifically to enterprise needs.
Features
- LLM Gateway and Routing: Efficiently manage and direct traffic across multiple models and replicas.
- High-Density LoRA Management: Streamlined support for lightweight, low-rank adaptations of models.
- Distributed Inference: Scalable architecture to handle large workloads across multiple nodes.
- LLM App-Tailored Autoscaler: Dynamically scale inference resources based on real-time demand.
- Unified AI Runtime: A versatile sidecar enabling metric standardization, model downloading, and management.
- Heterogeneous-GPU Inference: Cost-effective SLO-driven LLM inference using heterogeneous GPUs.
- GPU Hardware Failure Detection: Proactive detection of GPU hardware issues.
Deploying the Solution
Checking AIBrix Installation
Run the following command to check the AIBrix installation:
kubectl get pods -n aibrix-system
Wait until all the pods are in the Running state.
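Optionally, instead of polling, you can block until every pod in the namespace reports ready; the 300-second timeout below is an arbitrary choice:
kubectl wait --for=condition=Ready pod --all -n aibrix-system --timeout=300s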
Running a model on AIBrix
We will now run the DeepSeek-R1-Distill-Llama-8B model using AIBrix on EKS.
Run the following command:
kubectl apply -f blueprints/inference/aibrix/deepseek-distill.yaml
This deploys the model in the deepseek-aibrix namespace. Wait a few minutes, then run:
kubectl get pods -n deepseek-aibrix
Wait for the pod to reach the Running state.
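As above, you can block until the pod is ready instead of polling; the longer timeout here is an arbitrary allowance for image pulls and model download:
kubectl wait --for=condition=Ready pod --all -n deepseek-aibrix --timeout=900s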
Accessing the model using the Gateway
The Gateway is designed to serve LLM requests and provides features such as dynamic model and LoRA adapter discovery, per-user request-count and token-usage budgeting, streaming, and advanced routing strategies such as prefix-cache-aware routing and routing across heterogeneous GPU hardware. To access the model through the Gateway, run the following command:
kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 &
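Note: the trailing hash in the service name (903790dc above) is generated per installation, so it will likely differ in your cluster. You can look up the Envoy service created for the AIBrix gateway with:
kubectl get svc -n envoy-gateway-system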
Once the port-forward is running, you can test the model by sending a request to the Gateway.
ENDPOINT="localhost:8888"
curl -v http://${ENDPOINT}/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-8b",
    "prompt": "San Francisco is a",
    "max_tokens": 128,
    "temperature": 0
  }'
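The Gateway exposes an OpenAI-compatible API, so the chat completions endpoint should also work; a minimal sketch reusing the same model name (the message content is just an example):
curl http://${ENDPOINT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-8b",
    "messages": [{"role": "user", "content": "What is Kubernetes?"}],
    "max_tokens": 128,
    "temperature": 0
  }'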
Cleanup
To avoid unwanted charges to your AWS account, delete all the AWS resources created during this deployment.
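At a minimum, stop the port-forward and remove the model resources by reversing the earlier apply; this is a minimal sketch, and the namespace delete is a safeguard in case the blueprint manifest created it:
# Stop the background port-forward started earlier (adjust the job spec if you have other background jobs)
kill %1
# Remove the model resources created from the blueprint manifest
kubectl delete -f blueprints/inference/aibrix/deepseek-distill.yaml
# Remove the namespace if it still exists
kubectl delete namespace deepseek-aibrix --ignore-not-found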