Job Manager

The worker module.

class lorien.tune.job_manager.AWSBatchJobManager(target: str, jobs: List[lorien.tune.job.Job], configs: argparse.Namespace)

AWS Batch job manager class.

desc() -> str

Return a description shown at the beginning of progress bar while tuning.

relaunch_hanging_jobs(batch_client, jobs_desc: List[Dict[str, Any]])

Periodically check for halted or failed jobs and relaunch them.

Parameters
  • batch_client (botocore.client.Batch) -- boto3 client for AWS Batch. Note that boto3 types are not annotated because doing so requires an additional package.

  • jobs_desc (List[Dict[str, Any]]) -- Job descriptions extracted from each job.
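
As a rough illustration of the kind of check this method performs, the sketch below scans job descriptions shaped like the entries that boto3's Batch describe_jobs call returns (with "jobId", "status", and "startedAt" in milliseconds) and picks out jobs that look failed or stuck. The helper name find_jobs_to_relaunch, the timeout_sec threshold, and the selection rule are assumptions for illustration; the policy Lorien actually applies is not described here.

```python
import time
from typing import Any, Dict, List, Optional


def find_jobs_to_relaunch(
    jobs_desc: List[Dict[str, Any]],
    timeout_sec: float,
    now_ms: Optional[float] = None,
) -> List[str]:
    """Return IDs of jobs that failed or have run longer than timeout_sec.

    Hypothetical helper: the real relaunch policy is an assumption.
    """
    if now_ms is None:
        now_ms = time.time() * 1000  # AWS Batch timestamps are in milliseconds
    to_relaunch = []
    for desc in jobs_desc:
        status = desc.get("status")
        if status == "FAILED":
            to_relaunch.append(desc["jobId"])
        elif status == "RUNNING":
            started_at = desc.get("startedAt")
            # Treat a job as hanging if it has been running past the timeout.
            if started_at is not None and now_ms - started_at > timeout_sec * 1000:
                to_relaunch.append(desc["jobId"])
    return to_relaunch


example = [
    {"jobId": "a", "status": "FAILED"},
    {"jobId": "b", "status": "RUNNING", "startedAt": 0},
    {"jobId": "c", "status": "RUNNING", "startedAt": 9_000},
    {"jobId": "d", "status": "SUCCEEDED"},
]
stuck = find_jobs_to_relaunch(example, timeout_sec=5, now_ms=10_000)
```

Passing now_ms explicitly keeps the check deterministic, which makes it easy to exercise without a live AWS account.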

resume_job_states()

Check if the tuning jobs have correct metadata.

tune_impl(progress)

Tune workloads with AWS Batch.

Parameters

progress (tqdm) -- The pre-configured progress bar, to be updated as tuning proceeds.

class lorien.tune.job_manager.JobManagerBase(target: str, jobs: List[lorien.tune.job.Job], configs: argparse.Namespace)

The base class of job manager.

abstract desc() -> str

Return a description shown at the beginning of progress bar while tuning.

num_jobs() -> int

Return the total number of jobs to be tuned by this manager.

replay_trace()

Update the current state from the trace file.

abstract resume_job_states()

Resume the jobs that were being tuned when the state was dumped.

tune() -> List[lorien.tune.result.TuneResult]

Tune workloads on the servers via RPC.

Returns

tune_results -- Each result is either the absolute performance, the speedup over the previous performance, or an error message.

Return type

List[TuneResult]

abstract tune_impl(progress: tqdm.std.tqdm)

Workload tuning implementation. The tuning results are directly stored in self.job_n_results.

Parameters

progress (tqdm) -- The pre-configured progress bar, to be updated as tuning proceeds.
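
To show how the abstract methods fit together, here is a minimal sketch of a concrete manager. It uses a stand-in base class, JobManagerSketch, that only mirrors the method names documented above so that the example runs without Lorien installed. The constructor arguments, the (job, result) layout of job_n_results, and the body of tune() are assumptions, not the actual JobManagerBase implementation.

```python
import abc
from typing import Any, List, Tuple


class JobManagerSketch(abc.ABC):
    """Stand-in mirroring the documented JobManagerBase methods (not the real class)."""

    def __init__(self, target: str, jobs: List[Any]):
        self.target = target
        self.jobs = jobs
        # Assumed layout: a list of (job, result) pairs.
        self.job_n_results: List[Tuple[Any, Any]] = []

    @abc.abstractmethod
    def desc(self) -> str:
        """Description shown at the beginning of the progress bar."""

    @abc.abstractmethod
    def resume_job_states(self) -> None:
        """Resume jobs that were being tuned when the state was dumped."""

    @abc.abstractmethod
    def tune_impl(self, progress: Any) -> None:
        """Tune the jobs and store the results in self.job_n_results."""

    def num_jobs(self) -> int:
        return len(self.jobs)

    def tune(self) -> List[Any]:
        # The real tune() hands tune_impl a tqdm bar; a plain list stands in here.
        progress: List[Any] = []
        self.tune_impl(progress)
        return [result for _, result in self.job_n_results]


class EchoJobManager(JobManagerSketch):
    """Hypothetical manager that 'tunes' a job by echoing its name."""

    def desc(self) -> str:
        return "Echo"

    def resume_job_states(self) -> None:
        pass  # Nothing to resume in this toy example.

    def tune_impl(self, progress: Any) -> None:
        for job in self.jobs:
            self.job_n_results.append((job, f"tuned:{job}"))
            progress.append(job)  # Stands in for progress.update(1).


manager = EchoJobManager("llvm", ["conv2d", "dense"])
results = manager.tune()
```

Under these assumptions, callers only invoke tune(), while each backend supplies its own desc(), resume_job_states(), and tune_impl().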

class lorien.tune.job_manager.LocalJobManager(target: str, jobs: List[lorien.tune.job.Job], configs: argparse.Namespace)

Local job manager class.

desc() -> str

Return a description shown at the beginning of progress bar while tuning.

resume_job_states()

Resume the jobs that were being tuned when the state was dumped.

tune_impl(progress)

Tune workloads locally.

Note

The local tuner does not update the progress bar, in order to keep the console output concise.

Parameters

progress (tqdm) -- The pre-configured progress bar, to be updated as tuning proceeds.

class lorien.tune.job_manager.RPCJobManager(target: str, jobs: List[lorien.tune.job.Job], configs: argparse.Namespace)

RPC job manager class.

desc() -> str

Return a description shown at the beginning of progress bar while tuning.

resume_job_states()

Resume the jobs that were being tuned when the state was dumped.

tune_impl(progress)

Tune workloads with RPC hosts.

Parameters

progress (tqdm) -- The pre-configured progress bar, to be updated as tuning proceeds.

lorien.tune.job_manager.register_job_manager(name: str, format_help: str) -> Callable

Register job manager.

Parameters
  • name (str) -- The job manager name.

  • format_help (str) -- The job manager specific config format description.

Returns

reg -- A callable function for registration.

Return type

Callable
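
Because the function takes a name and help text and returns a callable, it reads like a decorator factory. The sketch below shows that general pattern with a stand-in function and registry. The names register_job_manager_sketch and JOB_MANAGER_TABLE, the use of reg as a class decorator, and the duplicate-name check are assumptions for illustration rather than Lorien's actual implementation.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry mapping a manager name to (class, config format help).
JOB_MANAGER_TABLE: Dict[str, Tuple[type, str]] = {}


def register_job_manager_sketch(name: str, format_help: str) -> Callable:
    """Return a callable that registers a class under the given name."""

    def reg(manager_cls: type) -> type:
        if name in JOB_MANAGER_TABLE:
            raise RuntimeError(f"Job manager {name!r} has already been registered")
        JOB_MANAGER_TABLE[name] = (manager_cls, format_help)
        return manager_cls  # Return the class unchanged so it works as a decorator.

    return reg


@register_job_manager_sketch("dummy", "no additional configs required")
class DummyJobManager:
    """Placeholder class used only to demonstrate registration."""


registered_cls, registered_help = JOB_MANAGER_TABLE["dummy"]
```

With such a registry, a command-line front end could look up a manager class by name and print its format_help text when describing the available options.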