Our lab recently got awarded two GPUs on a neighboring university X's cluster, and it fell on me to submit jobs to it. I've figured out the logistics but am at a loss on how to decide what resources to request from the scheduler. (We use PBS Pro.)
To be exact, one of the header lines that the job-scheduler example script contains is this:
#PBS -l walltime=72:00:00,select=1:ncpus=12:ompthreads=12:ngpus=2:mem=186gb
I know that walltime is just a rough estimate of how long the job should take, plus some buffer. The rest is cryptic to me. I have a very superficial/naive idea of how even a single processing unit works, and that random-access memory is something like a speed bottleneck for some of those really hungry operations like sparse matrix multiplication. That is about the extent of what I know about what memory (and I assume they mean RAM here) is in the context of a single processor.
The question is then how to reconcile the number of CPUs with the two GPUs we were granted, and how to estimate how much memory to request. Does the memory belong to the CPUs while the GPUs just do the work (is it pooled from individual chips?)
My understanding is that any one of the PIs with access to the cluster can request MANY nodes (select=n) and MANY CPUs (ncpus), but we also got the privilege to throw some GPUs on top. I would appreciate it a ton if anybody could either elucidate the interplay there… or point me to a not-too-technical resource to get started.
Thanks a lot!