Job Manager
The worker module.
- class lorien.tune.job_manager.AWSBatchJobManager(target: str, jobs: List[lorien.tune.job.Job], configs: argparse.Namespace)
AWS Batch job manager class.
- desc() → str
Return a description shown at the beginning of progress bar while tuning.
- relaunch_hanging_jobs(batch_client, jobs_desc: List[Dict[str, Any]])
Periodically check for halted or failed jobs and relaunch them.
- Parameters
batch_client (botocore.client.Batch) -- boto3 client for AWS Batch. The boto3 types are not annotated because doing so requires an additional package.
jobs_desc (List[Dict[str, Any]]) -- Job descriptions extracted from each job.
- resume_job_states()
Check if the tuning jobs have correct metadata.
- tune_impl(progress)
Tune workloads with AWS Batch.
- Parameters
progress (tqdm) -- The formulated progress bar to be updated progressively.
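The check described for relaunch_hanging_jobs can be illustrated with a minimal sketch. It does not reproduce Lorien's implementation: the helper name `select_jobs_to_relaunch`, the timeout rule, and the sample data are hypothetical. Only the `jobId`, `status`, and `startedAt` fields follow the shape of AWS Batch job descriptions, where `startedAt` is a Unix timestamp in milliseconds.

```python
from typing import Any, Dict, List


def select_jobs_to_relaunch(
    jobs_desc: List[Dict[str, Any]], now_ms: int, timeout_ms: int
) -> List[str]:
    """Return IDs of jobs that failed or have been running longer than timeout_ms.

    Hypothetical helper: the selection rule is an assumption for illustration.
    """
    selected = []
    for desc in jobs_desc:
        status = desc.get("status")
        if status == "FAILED":
            # A failed job is always a relaunch candidate in this sketch.
            selected.append(desc["jobId"])
        elif status == "RUNNING":
            # Treat a job as hanging once it has run past the timeout.
            started_ms = desc.get("startedAt", now_ms)
            if now_ms - started_ms > timeout_ms:
                selected.append(desc["jobId"])
    return selected


# Made-up job descriptions for demonstration only.
jobs_desc = [
    {"jobId": "a1", "status": "RUNNING", "startedAt": 1_000},
    {"jobId": "b2", "status": "FAILED", "startedAt": 5_000},
    {"jobId": "c3", "status": "SUCCEEDED", "startedAt": 2_000},
    {"jobId": "d4", "status": "RUNNING", "startedAt": 9_500},
]
to_relaunch = select_jobs_to_relaunch(jobs_desc, now_ms=10_000, timeout_ms=3_000)
```

Passing the current time in as an argument keeps the check deterministic, which makes it easy to test without a live AWS Batch client.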
- class lorien.tune.job_manager.JobManagerBase(target: str, jobs: List[lorien.tune.job.Job], configs: argparse.Namespace)
The base class of job manager.
- abstract desc() → str
Return a description shown at the beginning of progress bar while tuning.
- num_jobs() → int
Return the total number of jobs to be tuned by this manager.
- replay_trace()
Update the current state from the trace file.
- abstract resume_job_states()
Resume the jobs that were being tuned when the state was dumped.
- tune() → List[lorien.tune.result.TuneResult]
Tune workloads on the servers via RPC.
- Returns
tune_results -- The result can be either the absolute performance, the speedup over the last performance, or the error message.
- Return type
List[TuneResult]
- abstract tune_impl(progress: tqdm.std.tqdm)
Workload tuning implementation. The tuning results are directly stored in self.job_n_results.
- Parameters
progress (tqdm) -- The formulated progress bar to be updated progressively.
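The split between the concrete tune() entry point and the abstract tune_impl() hook resembles the template-method pattern. The following is a minimal sketch of that pattern under stated assumptions, not Lorien's code: `SketchJobManagerBase` and `EchoJobManager` are hypothetical names, jobs are plain strings rather than lorien.tune.job.Job objects, and a plain list stands in for the tqdm progress bar.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchJobManagerBase(ABC):
    """Hypothetical stand-in for a job manager base class."""

    def __init__(self, target: str, jobs: List[str]):
        self.target = target
        self.jobs = jobs
        self.results: List[str] = []

    def num_jobs(self) -> int:
        """Return the total number of jobs handled by this manager."""
        return len(self.jobs)

    @abstractmethod
    def desc(self) -> str:
        """Return a short description for the progress display."""

    @abstractmethod
    def tune_impl(self, progress: List[str]) -> None:
        """Backend-specific tuning; stores its results on self.results."""

    def tune(self) -> List[str]:
        # A list stands in for the progress bar in this sketch.
        progress: List[str] = []
        self.tune_impl(progress)
        return self.results


class EchoJobManager(SketchJobManagerBase):
    """Toy backend that records a message per job instead of tuning."""

    def desc(self) -> str:
        return "Echo"

    def tune_impl(self, progress: List[str]) -> None:
        for job in self.jobs:
            self.results.append(f"tuned {job} for {self.target}")
            progress.append(job)


manager = EchoJobManager("llvm", ["conv2d", "dense"])
results = manager.tune()
```

With this layout, each backend only overrides the abstract hooks, while callers always go through the shared tune() method.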
- class lorien.tune.job_manager.LocalJobManager(target: str, jobs: List[lorien.tune.job.Job], configs: argparse.Namespace)
Local job manager class.
- desc() → str
Return a description shown at the beginning of progress bar while tuning.
- resume_job_states()
Resume the jobs that were being tuned when the state was dumped.
- tune_impl(progress)
Tune workloads locally.
Note
The local tuner does not update the progress bar, which keeps the console output concise.
- Parameters
progress (tqdm) -- The formulated progress bar to be updated progressively.
- class lorien.tune.job_manager.RPCJobManager(target: str, jobs: List[lorien.tune.job.Job], configs: argparse.Namespace)
RPC job manager class.
- desc() → str
Return a description shown at the beginning of progress bar while tuning.
- resume_job_states()
Resume the jobs that were being tuned when the state was dumped.
- tune_impl(progress)
Tune workloads with RPC hosts.
- Parameters
progress (tqdm) -- The formulated progress bar to be updated progressively.
- lorien.tune.job_manager.register_job_manager(name: str, format_help: str) → Callable
Register job manager.
- Parameters
name (str) -- The job manager name.
format_help (str) -- The job manager specific config format description.
- Returns
reg -- A callable function for registration.
- Return type
Callable
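A function that takes a name and returns a callable for registration is commonly used as a class decorator. The sketch below shows that general pattern only. It is not Lorien's implementation: `register_manager_sketch`, `JOB_MANAGER_TABLE`, and `DemoJobManager` are hypothetical names, and how Lorien stores or uses `format_help` is an assumption here.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry mapping a manager name to (format_help, class).
JOB_MANAGER_TABLE: Dict[str, Tuple[str, type]] = {}


def register_manager_sketch(name: str, format_help: str) -> Callable:
    """Return a decorator that records a class under the given name."""

    def reg(cls: type) -> type:
        if name in JOB_MANAGER_TABLE:
            raise RuntimeError(f"Job manager {name} is already registered")
        JOB_MANAGER_TABLE[name] = (format_help, cls)
        # Return the class unchanged so it can still be used directly.
        return cls

    return reg


@register_manager_sketch("demo", "demo: takes no extra configuration")
class DemoJobManager:
    """Placeholder class registered for demonstration."""


help_text, manager_cls = JOB_MANAGER_TABLE["demo"]
```

Looking managers up by name in a table like this lets a command-line option select the backend without the caller importing each class.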