BUDA job submission
This document describes how to run Docker jobs on a BOINC project that supports BUDA. It assumes that you have job-submission access on the project.
We call BUDA applications 'science apps'. Each science app has a name, like 'worker' or 'autodock'. A science app can have 'variants' that use different types of computer hardware. The name of a variant is cpu if it uses a single CPU; otherwise it's the name of a plan class. There might be variants for 1 CPU, for N CPUs, and for various GPU types.
The BUDA tools use the user file sandbox for uploading files to the BOINC server. To access it, go to Computing / File sandbox in the project menu bar.
In the menu bar of the BOINC project's web site, select Computing / Job Submission. Then click on BUDA. This shows a list of existing science apps and their variants.
You can
- add or delete a science app;
- add or delete a variant;
- submit jobs to a variant.
The form for adding a variant includes
- a plan class name (leave it blank for 1-CPU variants);
- a set of 'app files', selected from your file sandbox. This includes:
  - a Dockerfile
  - a main program to run in the container
  - other app files if needed
- a list of input file names;
- a list of output file names.
The Dockerfile should specify app/ as the working directory. For example:
FROM debian
WORKDIR /app
CMD ./main_2.sh
This specifies an image based on the latest Debian (from Docker Hub) and a main program (in this case a shell script) main_2.sh. This file, and any executables it runs, must be included in the set of app files.
If you need to pass command-line arguments into the container, the Dockerfile should contain e.g.
ENV ARGS ""
CMD ./main_2.sh ${ARGS}
The arguments are available in the container in the environment variable ARGS. You can access them from a C program using getenv(), or in a shell script as $ARGS.
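If your main program is a Python script rather than a C program or shell script, the same variable can be read from os.environ. This is an assumption about your setup: the image would then need a Python interpreter, which a minimal base image may not provide. The sketch below is illustrative only. The function name read_args is hypothetical, and splitting the value with shell-style quoting rules is an assumption about how the arguments are formatted.

```python
import os
import shlex


def read_args(environ=None):
    """Return the arguments found in the ARGS environment variable as a list.

    `environ` defaults to the process environment; passing a dict makes the
    function easy to try outside a container.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("ARGS", "")
    # Assumption: arguments are separated by whitespace and may use
    # shell-style quoting. An unset or empty ARGS yields an empty list.
    return shlex.split(raw)


if __name__ == "__main__":
    print(read_args())
```

Because the Dockerfile above also passes ${ARGS} on the command line, a program can alternatively read the same values from its normal argument list.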
You submit BUDA jobs in 'batches': a single job, or hundreds, or thousands. The jobs in a batch have different input files, different command-line arguments, or both.
A batch of jobs is described by a 'batch file': a zip file containing one directory per job. Each directory contains the input files for that job, and an optional file cmdline containing command-line arguments to be passed to the main program. If there are input files shared among all jobs, these can be put in a directory shared_input_files. This can save disk space in some cases.
For example, suppose that the app takes input files file1, file2, and file3, and that all jobs use the same file1 but different file2 and file3. In this case the batch file could contain
shared_input_files/
    file1
jobname1/
    [cmdline]
    file2
    file3
    ...
jobname2/
    [cmdline]
    file2
    file3
    ...
...
The file names in the shared directory and in each job directory must match the variant's list of input file names. Each job must have all input files.
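The batch file layout described above can be produced with any zip tool. As one possibility, here is a minimal Python sketch using the standard zipfile module. The function names build_batch_file and missing_inputs and the shape of the jobs dictionary are hypothetical, chosen only for illustration. The check assumes that a file in shared_input_files counts toward every job's required inputs, which is one reading of the rule above.

```python
import zipfile


def build_batch_file(zip_path, jobs, shared_files=None):
    """Write a batch zip: one directory per job, plus optional shared files.

    jobs: {job_name: {"files": {name: bytes}, "cmdline": str or None}}
    shared_files: {name: bytes}, stored under shared_input_files/
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in (shared_files or {}).items():
            zf.writestr(f"shared_input_files/{name}", data)
        for job_name, job in jobs.items():
            for name, data in job["files"].items():
                zf.writestr(f"{job_name}/{name}", data)
            # The cmdline file is optional; write it only if given.
            if job.get("cmdline"):
                zf.writestr(f"{job_name}/cmdline", job["cmdline"])
    return zip_path


def missing_inputs(zip_path, input_names):
    """Return {job_name: [missing input file names]} for incomplete jobs."""
    shared, per_job = set(), {}
    with zipfile.ZipFile(zip_path) as zf:
        for entry in zf.namelist():
            top, sep, rest = entry.partition("/")
            if not sep:
                continue  # stray top-level file, not part of any job
            if top == "shared_input_files":
                if rest:
                    shared.add(rest)
                continue
            files = per_job.setdefault(top, set())
            if rest and not rest.endswith("/"):
                files.add(rest)
    # Assumption: shared files satisfy the requirement for every job.
    required = set(input_names) - shared
    return {job: sorted(required - files)
            for job, files in per_job.items() if required - files}
```

Running missing_inputs with the variant's list of input file names before uploading can catch a job directory that lacks a file. An empty result means every job has all of its inputs.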
Assuming you've created a BUDA app and variant, you can submit jobs to it as follows:
- Prepare a batch file and upload it to your file sandbox.
- In the Computing menu on the project web site, select Job submission.
- In the Job submission page, click BUDA.
- Click Submit next to the variant you want.
- Select the batch file from the file list.
- You can specify command-line arguments to be passed to all jobs in the batch (before those specified in the batch file).
- You can enable debugging output. This will include all Docker commands and their output in the stderr output of each job. This is handy for debugging problems with your Dockerfile.
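The ordering rule in the steps above (batch-wide arguments first, then the arguments from a job's cmdline file) can be sketched as follows. This only illustrates the ordering and is not the server's actual code. The function name effective_args is hypothetical, and whitespace splitting with shell-style quoting is an assumption.

```python
import shlex


def effective_args(batch_args, job_cmdline=""):
    """Combine arguments in the order described: batch-wide, then per-job."""
    # Assumption: both strings use whitespace-separated, shell-style arguments.
    return shlex.split(batch_args) + shlex.split(job_cmdline)
```

For example, batch-wide arguments "--fast" combined with a job's cmdline "--seed 7" give the list ["--fast", "--seed", "7"].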
After submitting a batch of jobs, you're taken to a web page for the batch. This shows, among other things, how many of the jobs have completed. Reload it to update this information.
You can click on a job to see its status (and if it failed, the stderr output). You can view or download its input files.
On the batch page, you can click to download a zip file of the output files of all completed jobs. These filenames have the form
batch_<batchid>__job_<jobname>__file_<filename>
where
- batchid is the (integer) ID of the batch;
- jobname is the job name (the directory name from the batch file);
- filename is the name of the output file as written by the app.
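If you post-process the downloaded zip, the naming scheme above can be split back into its parts. Here is a minimal sketch using a regular expression. The function names parse_output_name and group_by_job are hypothetical. The pattern assumes that job names do not themselves contain the string __file_, since otherwise the split is ambiguous.

```python
import re

# batch_<batchid>__job_<jobname>__file_<filename>
_OUTPUT_NAME = re.compile(
    r"^batch_(?P<batchid>\d+)__job_(?P<jobname>.+?)__file_(?P<filename>.+)$"
)


def parse_output_name(name):
    """Return (batchid, jobname, filename), or None if the name doesn't match."""
    m = _OUTPUT_NAME.match(name)
    if m is None:
        return None
    return int(m["batchid"]), m["jobname"], m["filename"]


def group_by_job(names):
    """Map each job name to {output filename: name inside the zip}."""
    grouped = {}
    for name in names:
        parsed = parse_output_name(name)
        if parsed is None:
            continue  # skip anything that doesn't follow the scheme
        _, jobname, filename = parsed
        grouped.setdefault(jobname, {})[filename] = name
    return grouped
```

The list of names can come from zipfile.ZipFile(path).namelist() on the downloaded archive.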
When you're done with the batch, you can 'retire' it. This removes its input and output files from the server.