HTTP Serving
Models deployed to Amazon SageMaker AI managed inference must respond to HTTP requests on /ping and /invocations on port 8080. Most LLM serving frameworks support this by default, but predictive models themselves lack this capability, so for predictive models the HTTP server has to be built manually. To date, the following HTTP servers are supported:
| Web Server | Description |
|---|---|
| Flask | Lightweight Python web framework |
| FastAPI | Modern, fast API framework with automatic docs |
Flask
Under Construction
This section of the documentation is under construction.
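Until this section is complete, the following is a minimal sketch of a Flask server exposing the /ping and /invocations endpoints on port 8080, as SageMaker managed inference requires. The `predict` function here is a hypothetical placeholder; in a real container it would load and invoke your model.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict(payload):
    # Hypothetical placeholder for real model inference.
    return {"predictions": payload.get("instances", [])}

@app.route("/ping", methods=["GET"])
def ping():
    # SageMaker calls /ping for health checks; an empty 200 signals healthy.
    return "", 200

@app.route("/invocations", methods=["POST"])
def invocations():
    # SageMaker forwards inference requests to /invocations.
    payload = request.get_json(force=True)
    return jsonify(predict(payload))

if __name__ == "__main__":
    # SageMaker expects the container to listen on port 8080.
    app.run(host="0.0.0.0", port=8080)
```

In production you would typically run this app behind a WSGI server such as Gunicorn rather than Flask's built-in development server.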
FastAPI
Under Construction
This section of the documentation is under construction.