Inference API

The Inference API listens on port 8080 and is only accessible from localhost by default. To change this default, see MMS Configuration.
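For example, to make the API reachable from other hosts, you can bind it to all interfaces. A minimal config.properties sketch, assuming the inference_address property covered in MMS Configuration:

# config.properties - bind the inference API to all interfaces on port 8080
inference_address=http://0.0.0.0:8080

Pass the file when starting the server, e.g. multi-model-server --start --mms-config config.properties.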

There are three types of APIs:

  1. API description - Describe MMS inference APIs with an OpenAPI 3.0 specification
  2. Health check API - Check the MMS health status
  3. Predictions API - Make prediction calls against MMS

API Description

To view a full list of the inference APIs, use the following command:

curl -X OPTIONS http://localhost:8080

The output is in OpenAPI 3.0.1 JSON format. You can use it to generate client code; see swagger codegen for details.
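For example, you can save the specification to a file and feed it to the swagger-codegen CLI. A sketch, assuming swagger-codegen is installed (the file and output directory names are arbitrary):

# Save the OpenAPI spec, then generate a Python client from it
curl -s -X OPTIONS http://localhost:8080 > mms-api.json
swagger-codegen generate -i mms-api.json -l python -o mms-client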

Health check API

MMS supports a ping API that you can use to check the MMS health status:

curl http://localhost:8080/ping

If the server is running, the response should be:

{
  "health": "healthy!"
}
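This makes /ping convenient for startup scripts and container health checks. A minimal shell sketch that polls until the server reports healthy (the retry count and interval are arbitrary):

# Poll /ping up to 30 times, one second apart, until MMS reports healthy
for i in $(seq 1 30); do
  curl -sf http://localhost:8080/ping | grep -q healthy && break
  sleep 1
done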

Predictions API

MMS 1.0 supports 0.4-style API calls, but those APIs are deprecated and will be removed in a future release. See Deprecated API for details.

For each loaded model, you can make a REST call to the URI: /predictions/{model_name}

curl Example

curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg

curl -X POST http://localhost:8080/predictions/resnet-18 -T kitten.jpg

or, sending the image as a multipart form field instead of the raw request body:

curl -X POST http://localhost:8080/predictions/resnet-18 -F "data=@kitten.jpg"

The result is JSON that identifies the image as most likely a tabby cat. The highest-probability prediction:

{
    "class": "n02123045 tabby, tabby cat",
    "probability": 0.42514491081237793,
    ...
}
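If you have jq installed, you can pretty-print the response on the command line; a sketch (exact field paths for extracting values depend on the full response shape):

# Pretty-print the prediction response with jq
curl -s -X POST http://localhost:8080/predictions/resnet-18 -F "data=@kitten.jpg" | jq .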

Deprecated API

The MMS 0.4-style predict API is kept for backward compatibility and will be removed in a future release.

curl Example

curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg

curl -X POST http://localhost:8080/resnet-18/predict -F "data=@kitten.jpg"