
Multi Model Server Benchmarking

The benchmarks measure the performance of MMS on various models. They support a number of built-in models as well as a custom model passed in as a path or URL to the .model file, and they run a variety of benchmarks against these models (see the benchmarks section below). The benchmarks are driven by a python3 script on the user's machine, which uses jmeter to generate load. MMS runs on the same machine in a docker instance to avoid network latencies. The benchmark must be run from within the full MMS repo because it executes the local code as the version of MMS (recompiling it between runs) for ease of development.

Installation

Ubuntu

The script is mainly intended to run on an Ubuntu EC2 instance. For this reason, we have provided an install_dependencies.sh script that installs everything needed to execute the benchmark in this environment. All you need to do is clone the MMS repo and run this script.

MacOS

For Mac, you should have python3 and java installed. If you wish to run the default benchmarks featuring a docker-based instance of MMS, you will need to install docker as well. Finally, you will need to install jmeter with plugins, which can be accomplished by running mac_install_dependencies.sh.

Other

For other environments, manual installation is necessary. The list of dependencies to install can be found below or by reading the Ubuntu installation script.

To run, the benchmarking script requires python3, java, jmeter with plugins, and (for the default docker-based benchmarks) docker.
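As a quick way to confirm these prerequisites are available, the sketch below checks that the expected executables are on the PATH. The binary names used here (for example, jmeter) are assumptions and may differ depending on how the tools were installed:

    # check_prereqs.py - sanity check for the benchmark prerequisites.
    # The binary names below are assumptions; adjust them for your setup.
    import shutil

    for tool in ("python3", "java", "jmeter", "docker"):
        path = shutil.which(tool)
        print(f"{tool:8s} -> {path if path else 'NOT FOUND'}")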

Models

Most of the pre-loaded models for the benchmark can be found in the MMS model zoo. We currently support the following:

Benchmarks

We support several basic benchmarks:

We also support compound benchmarks:

Examples

Run basic latency test on default resnet-18 model
./benchmark.py latency

Run basic throughput test on default resnet-18 model.
./benchmark.py throughput

Run all benchmarks
./benchmark.py --all

Run using the noop_v1.0 model
./benchmark.py latency -m noop_v1.0

Run on GPU (4 gpus)
./benchmark.py latency -g 4

Run with a custom image
./benchmark.py latency -i {imageFilePath}

Run with a custom model (currently this works only for CNN-based models that accept an image as input; support for more input types will be added to this command in the future)
./benchmark.py latency -c {modelUrl} -i {imageFilePath}

Run with custom options
./benchmark.py repeated_scale_calls --options scale_up_workers 100 scale_down_workers 10

Run against an already running instance of MMS
./benchmark.py latency --mms 127.0.0.1 (defaults to http, port 80, management port = port + 1)
./benchmark.py latency --mms 127.0.0.1:8080 --management-port 8081
./benchmark.py latency --mms https://127.0.0.1:8443
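When targeting an already running instance, it can help to first verify that both the inference and management ports respond before starting a long benchmark run. Below is a minimal check, assuming MMS's standard /ping health endpoint and /models management endpoint and the 8080/8081 ports from the example above; adjust the addresses for your setup:

    # check_mms.py - verify an MMS instance is reachable before benchmarking.
    # Assumes the standard MMS /ping (inference) and /models (management)
    # endpoints on ports 8080/8081; requires the requests package.
    import requests

    INFERENCE = "http://127.0.0.1:8080"
    MANAGEMENT = "http://127.0.0.1:8081"

    print(requests.get(f"{INFERENCE}/ping").json())     # e.g. {"status": "Healthy"}
    print(requests.get(f"{MANAGEMENT}/models").json())  # registered models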

Run verbose with only a single loop
./benchmark.py latency -v -l 1

Benchmark options

The full list of options can be found by running with the -h or --help flags.

Profiling

Frontend

The benchmarks can be used in conjunction with standard profiling tools such as JProfiler to analyze the system performance. JProfiler can be downloaded from their website. Once downloaded, open up JProfiler and follow these steps:

  1. Run MMS directly through gradle (do not use docker). This can be done either on your machine or on a remote machine accessible through SSH.
  2. In JProfiler, select “Attach” from the ribbon and attach to the ModelServer. The process name in the attach window should be “com.amazonaws.ml.mms.ModelServer”. If it is on a remote machine, select “On another computer” in the attach window and enter the SSH details. For the session startup settings, you can leave it with the defaults. At this point, you should see live CPU and Memory Usage data on JProfiler’s Telemetries section.
  3. Select Start Recordings in JProfiler’s ribbon
  4. Run the benchmark script targeting your running MMS instance. The command might look something like ./benchmark.py throughput --mms https://127.0.0.1:8443. It can be run on either your local machine or a remote machine (if MMS is running remotely), but we recommend running the benchmark on the same machine as the model server to avoid confounding network latencies.
  5. Once the benchmark script has finished running, select Stop Recordings in JProfiler’s ribbon

Once you have stopped recording, you should be able to analyze the data. Useful sections to examine are CPU views > Call Tree and CPU views > Hot Spots, which show where the processor time is going.

Backend

The benchmarks can also be used to analyze the backend performance using cProfile. Running the benchmark itself requires no additional packages, but viewing the profiling output does: run pip install snakeviz to install it. To run the python profiling, follow these steps (a sketch of the underlying profiling flow is shown after the steps):

  1. In the file mms/model_service_worker.py, set the constant BENCHMARK to True at the top to enable profiling.
  2. Run the benchmark and MMS. They can either be run automatically inside the docker container or separately against an already running instance using the --mms flag.
  3. Run snakeviz /tmp/mmsPythonProfile.prof to view the profiling data. It should start up a web server on your machine and automatically open the page.
  4. Don’t forget to set BENCHMARK = False in the model_service_worker.py file after you are finished.
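For reference, the sketch below shows the general cProfile pattern this kind of worker-side profiling relies on: wrap the hot code path in a profiler and dump the stats to /tmp/mmsPythonProfile.prof so snakeviz can read them. The handle function here is a hypothetical stand-in for the worker's inference path, not MMS code:

    # profile_sketch.py - illustrative only; handle() is a hypothetical
    # stand-in for the model worker's inference path.
    import cProfile
    import pstats

    PROFILE_PATH = "/tmp/mmsPythonProfile.prof"

    def handle(batch):
        # placeholder "inference" work
        return [x * 2 for x in batch]

    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(1000):
        handle(list(range(64)))
    profiler.disable()
    profiler.dump_stats(PROFILE_PATH)

    # View interactively with:  snakeviz /tmp/mmsPythonProfile.prof
    # or print a quick text summary:
    pstats.Stats(PROFILE_PATH).sort_stats("cumulative").print_stats(10)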