HTTP Serving

Models deployed to Amazon SageMaker AI managed inference must serve HTTP requests on /ping and /invocations on port 8080. Most LLM serving frameworks support this by default, but predictive models themselves lack this capability, so the HTTP server has to be built manually. To date, the following HTTP servers are supported:

Web Server | Description
---------- | -----------
Flask      | Lightweight Python web framework
FastAPI    | Modern, fast API framework with automatic docs

Flask

Under Construction

This section of the documentation is under construction.
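While this section is being written, the following is a minimal sketch of what a Flask server implementing the SageMaker contract could look like. It assumes a JSON request body and uses a placeholder sum over a hypothetical "features" field in place of a real model; the route names and port come from the contract described above.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # Health check: SageMaker expects HTTP 200 when the container is ready.
    return "", 200

@app.route("/invocations", methods=["POST"])
def invocations():
    payload = request.get_json(force=True)
    # Placeholder logic standing in for a real predictive model.
    prediction = sum(payload.get("features", []))
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    # SageMaker routes inference traffic to port 8080 inside the container.
    app.run(host="0.0.0.0", port=8080)
```

In a real container, the model would typically be loaded once at startup and the development server replaced by a production WSGI server such as gunicorn.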

FastAPI

Under Construction

This section of the documentation is under construction.