This repository demonstrates how LLMs can be run on JASMIN's Orchid GPU cluster using Ollama served from a Singularity container.
- JASMIN uses Singularity v3.7, so it is recommended that you install the same version locally to build and deploy the runner (see the documentation).
- To run on JASMIN you will need to sign up and request access to the `jasmin-login` service to be able to access the login servers and the scientific computing VMs. This will also give you access to the LOTUS batch computing cluster, but to access the Orchid (GPU) cluster you will also need to request access to the `orchid` service.
To get started, first build the runner locally:

```
./build-runner.sh
```

This will create a `.sif` file, which is an image file that Singularity can run as a container.
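As a quick sanity check that the image built correctly, you can run a one-off command inside it with `singularity exec`; the image name below is an assumption, so substitute whatever `.sif` the script produced:

```
# Run a one-off command inside the freshly built image (image name is an assumption)
singularity exec ollama-runner.sif ollama --version
```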
On some Linux systems, building with the `--fakeroot` option may not be possible. To get around this problem you can either try configuring the user namespace or remove the `--fakeroot` option from `build-runner.sh` and run with `sudo`.
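As a rough illustration of those two workarounds (the `singularity config fakeroot` subcommand is standard in Singularity v3.5+, but the image and definition file names are assumptions):

```
# Option 1: grant your user a subuid/subgid mapping so --fakeroot works (needs admin rights)
sudo singularity config fakeroot --add $USER

# Option 2: drop --fakeroot from build-runner.sh and build the image as root instead
sudo singularity build ollama-runner.sif ollama.def
```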
Once the runner is created you should test locally to ensure everything works correctly. `run.sbatch` contains Slurm directives so that it can be run on Orchid, but it can also be run locally as a script. The runner will query the LLM with the prompts defined in `input.txt`. The file is a list of inputs for the LLM that can be replaced with whatever prompts you would like to give (an illustrative example follows the note below).

Note: These prompts are individual; they do not constitute a chat history, i.e. the LLM will not be aware of previous prompts when it responds to subsequent ones.
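For illustration, `input.txt` might look something like this, with one standalone prompt per line (the exact format expected by the runner is an assumption, so check the copy shipped with the repository):

```
What is the capital of Sri Lanka?
Summarise the difference between supervised and unsupervised learning.
Write a haiku about GPUs.
```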
To run:

```
./run.sbatch
```
Assuming everything worked correctly, this should produce `output.txt`, which contains the output of all the prompts submitted to the LLM in the format:

```
Query: What is the capital of Sri Lanka?
Response: The capital of Sri Lanka is Colombo, and it's also the largest city in the country by population.

Query:
...
```
You can configure the parameters for your run by editing `params.sh`. The input and output files are defined using the `INPUT_FILE` and `OUTPUT_FILE` variables respectively. You can also change the LLM to use by modifying `MODEL`. The model you select must be available on Ollama, e.g. to use the llama3.2 model, modify the line in `params.sh` to:

```
MODEL=llama3.2
```

Note: Larger models can take some time to download and will be very slow to run if your GPU does not have enough VRAM. It is best to test locally with a smaller model (e.g. tinyllama, llama3.2) and then use larger models when running on Orchid.
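Putting those options together, a minimal `params.sh` might look like the following sketch (only the variables described above are shown; the real file may contain more):

```
# Minimal sketch of params.sh; see the repository copy for the full set of options
MODEL=llama3.2          # any model available on Ollama
INPUT_FILE=input.txt    # file containing the prompts to submit
OUTPUT_FILE=output.txt  # where the query/response pairs are written
```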
If you've not used JASMIN before, it is best to familiarise yourself via the getting started steps in the documentation. The following instructions assume you have set up an SSH key for secure access.
To run on JASMIN's Orchid cluster, first bundle up the local files and transfer them to your JASMIN workspace via the login server. A convenience script, `tarball.sh`, is provided to tarball only the necessary files (where `$JASMIN_USER` is your JASMIN username):

```
./tarball.sh
scp orchid-ollama-testbed.tar.gz $JASMIN_USER@login-02.jasmin.ac.uk:/home/users/$JASMIN_USER/
```
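If you are curious what the bundling step does, `tarball.sh` presumably amounts to something like the following (the file list and archive layout are assumptions; the script in the repository is authoritative):

```
# Rough equivalent of tarball.sh: archive only what the job needs, so that it
# unpacks into an orchid-ollama-testbed/ directory on JASMIN
tar -czf orchid-ollama-testbed.tar.gz \
    --transform 's,^,orchid-ollama-testbed/,' \
    run.sbatch params.sh input.txt *.sif
```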
Now you can log in to JASMIN and access one of the sci VMs, e.g. to access `sci-vm-03` via `login-02`:

```
ssh -A $JASMIN_USER@login-02.jasmin.ac.uk
ssh sci-vm-03.jasmin.ac.uk
```
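If you will be doing this regularly, a `ProxyJump` entry in your local `~/.ssh/config` collapses the two-hop login into a single command (the host alias is illustrative; replace the placeholder with your JASMIN username):

```
# ~/.ssh/config snippet: jump through login-02 to reach the sci VM directly
Host jasmin-sci
    HostName sci-vm-03.jasmin.ac.uk
    User <your-jasmin-username>
    ProxyJump <your-jasmin-username>@login-02.jasmin.ac.uk
    ForwardAgent yes
```

With this in place, `ssh jasmin-sci` takes you straight to the sci VM.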
Next, extract the bundled files:

```
tar -xf orchid-ollama-testbed.tar.gz
```
Finally, submit the Slurm job:

```
cd orchid-ollama-testbed
sbatch run.sbatch
```

By default, the standard error and output should appear in the files `$JOB_NUMBER.err` and `$JOB_NUMBER.out` respectively.
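The Slurm directives at the top of `run.sbatch` are what control where the job runs and where those files end up; they are likely of this general shape (partition name, resource requests and output patterns are assumptions, so defer to the file in the repository and the JASMIN docs):

```
#!/bin/bash
#SBATCH --partition=orchid   # Orchid GPU partition (name is an assumption)
#SBATCH --gres=gpu:1         # request a single GPU
#SBATCH --time=00:30:00      # wall-clock limit
#SBATCH -o %j.out            # standard output -> <job number>.out
#SBATCH -e %j.err            # standard error  -> <job number>.err
```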
The status of your jobs submitted on JASMIN can be checked via:

```
squeue -u $JASMIN_USER
```

Typical status codes for the job include:

- `PD` = pending
- `R` = running
- `CD` = completed
- `F` = failed
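Once a job has left the queue it no longer appears in `squeue`, but `sacct` can still report on it (the `--format` fields below are just one useful selection):

```
# Inspect a finished job by its ID, e.g. its state, exit code and elapsed time
sacct -j $JOB_NUMBER --format=JobID,State,ExitCode,Elapsed
```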
Once the job is complete, check to ensure the query responses have been successfully sent to `output.txt`.