📄️ Llama-3-8B with vLLM on Inferentia2
Serve the Meta-Llama-3-8B-Instruct model on AWS Inferentia2 using Ray and vLLM for optimized inference performance.
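A minimal sketch of that stack's general shape: a Ray Serve deployment wrapping a vLLM engine pointed at Neuron hardware. The model ID, core count, and engine arguments below are illustrative assumptions, and `device="neuron"` assumes a vLLM build with the transformers-neuronx backend; see the linked guide for the exact configuration.

```python
from fastapi import FastAPI
from ray import serve
from vllm import LLM, SamplingParams

app = FastAPI()


@serve.deployment(ray_actor_options={"resources": {"neuron_cores": 2}})
@serve.ingress(app)
class VLLMServer:
    def __init__(self):
        # device="neuron" targets Inferentia2; tensor_parallel_size shards
        # the model across the requested Neuron cores (illustrative values).
        self.llm = LLM(
            model="meta-llama/Meta-Llama-3-8B-Instruct",
            device="neuron",
            tensor_parallel_size=2,
            max_num_seqs=4,
        )

    @app.post("/generate")
    def generate(self, prompt: str) -> str:
        outputs = self.llm.generate(
            [prompt], SamplingParams(max_tokens=128, temperature=0.7)
        )
        return outputs[0].outputs[0].text


deployment = VLLMServer.bind()
```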
📄️ Mistral-7B on Inferentia2
Serve Mistral-7B models on AWS Inferentia accelerators for efficient inference.
📄️ Llama-3-8B on Inferentia2
Serve Llama-3 models on AWS Inferentia accelerators for efficient inference.
📄️ Llama-2 on Inferentia2
Serve Llama-2 models on AWS Inferentia accelerators for efficient inference.
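Both Llama guides follow the same compile-then-serve pattern on Neuron hardware. A minimal sketch using transformers-neuronx, assuming the model ID, `tp_degree`, and sequence length shown (the guides' exact values may differ):

```python
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

model_id = "meta-llama/Meta-Llama-3-8B"  # swap in a Llama-2 checkpoint as needed

# Shard the model across Neuron cores and compile it for Inferentia2.
model = LlamaForSampling.from_pretrained(
    model_id, batch_size=1, tp_degree=8, amp="f16"
)
model.to_neuron()

tokenizer = AutoTokenizer.from_pretrained(model_id)
input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

# Autoregressive sampling runs on the Neuron cores.
generated = model.sample(input_ids, sequence_length=256, top_k=50)
print(tokenizer.decode(generated[0]))
```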
📄️ Stable Diffusion on Inferentia2
Serve Stable Diffusion models on AWS Inferentia accelerators for efficient image generation.
📄️ Ray Serve High Availability
Configure Ray Serve for high availability so model-serving workloads on EKS tolerate replica and node failures.
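At the application level, availability in Ray Serve comes from running multiple replicas and letting Serve restart unhealthy ones. A minimal sketch with illustrative health-check thresholds; cluster-level GCS fault tolerance is configured separately in the RayService spec and is not shown here:

```python
from ray import serve


@serve.deployment(
    num_replicas=2,               # survive the loss of a single replica
    health_check_period_s=10,     # Serve calls check_health() on this cadence
    health_check_timeout_s=30,    # replica is restarted if probes time out
)
class Predictor:
    def __init__(self):
        self.ready = True  # stand-in for real model loading

    def check_health(self):
        # Raising marks this replica unhealthy; Serve replaces it and routes
        # traffic to the surviving replicas in the meantime.
        if not self.ready:
            raise RuntimeError("replica not ready")

    def __call__(self, request) -> str:
        return "ok"


app = Predictor.bind()
# serve.run(app)  # deploy on a running Ray cluster
```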