HTTP Serving

Models deployed to Amazon SageMaker AI managed inference must serve HTTP requests on /ping and /invocations on port 8080. Most LLM serving frameworks support this by default, but predictive models themselves lack this capability, so the HTTP server has to be built manually. To date, the following HTTP servers are supported:

Web Server | Description
---------- | -----------
Flask      | Lightweight Python web framework
FastAPI    | Modern, fast API framework with automatic docs

Flask

Under Construction

This section of the documentation is under construction.
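While this section is being written, the following is a minimal sketch of what a Flask server implementing the SageMaker contract could look like. It assumes a JSON request body and uses a placeholder sum over a hypothetical "features" field in place of a real model; the route names and port come from the contract described above.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # Health check: SageMaker expects HTTP 200 when the container is ready.
    return "", 200

@app.route("/invocations", methods=["POST"])
def invocations():
    payload = request.get_json(force=True)
    # Placeholder logic standing in for a real predictive model.
    prediction = sum(payload.get("features", []))
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    # SageMaker routes inference traffic to port 8080 inside the container.
    app.run(host="0.0.0.0", port=8080)
```

In a real container, the model would typically be loaded once at startup and the development server replaced by a production WSGI server such as gunicorn.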

FastAPI

Under Construction

This section of the documentation is under construction.