simplify slurm and qsub

| categories: orgmode, qsub, slurm |

slurm and qsub (link anyone?) are beautiful cluster schedulers. If you work on a cluster, you probably use one of them. I use both, as well as some old computers that don't have schedulers at all. I manage my runs from an orgmode "notebook", with a table that tells my scripts which resource uses which scheduler.

The usual way to use slurm and qsub is to submit a little shell script that tells the nodes how to divide their tasks, which environment variables matter, which command we are running, and so on. If you work on clusters, you probably have a zillion copies of these little scripts.

First, most of the information is identical across runs, so why not keep a single template in the home directory? Instead of the absolute path of the current run, insert %s; instead of the number of MPI processes, insert %d … you get the idea. I call my template .slurm_cmds.
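For illustration, such a template might look like the sketch below. The directive names and the job name are hypothetical (your cluster's required exports and options will differ); what matters is the placement of the %d/%s placeholders, which the script below fills in:

```shell
#!/bin/bash
# ~/.slurm_cmds -- a hypothetical template; %d and %s are filled in later
#SBATCH --job-name=run-mit
#SBATCH --ntasks=%d

cd %s
srun -n %d ./%s
```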

Now we need a script that automatically fills those %x placeholders with the real information and submits the result to the queue:

#!/usr/bin/perl -w
# purpose : insert a job into the slurm queue
# syntax : slurm_run.pl number_of_processes cmd
# number_of_processes = the number of cores the job is expected to use.
# this is not verified - consistency with the compilation under MPI is
# just assumed and is the responsibility of the user.
# cmd = the executable (usually a binary) you wish to put in the queue.
# the file .slurm_cmds is expected to be found in the $HOME directory.
# this file is a template batch file with all the needed exports and an
# srun call. slurm_run.pl just reads the template, fills the necessary
# info into the right places, and sends the newly formed batch file to
# the queue.
#
# depends on : (1) the perl Env and Cwd libraries,
# (2) the $HOME/.slurm_cmds template
#
# Copyright 2012 Avi Gozolchiani (http://tiny.cc/avigoz)
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

use Env;
use Cwd;
$currWorkDir = &Cwd::cwd();
# parse the command line
$n_proc = shift // die "syntax error : slurm_run number_of_processes cmd\n";
$cmd    = shift // die "syntax error : slurm_run number_of_processes cmd\n";
# define file names (both source and target); the batch file is named
# after the command, with any leading path stripped
($cmd_base = $cmd) =~ s{.*/}{};
$slurm_template = "$HOME/.slurm_cmds";
$batch_name     = "run-mit.batch_$cmd_base";
# open the files
open SLURMTEMP, $slurm_template or die "couldn't find the template file\n";
open BATCH, ">$batch_name" or die "couldn't write a temporary batch file\n";
# copy each line from the source template to the target, with the
# necessary changes: printf uses the template line as its format string,
# so the %s/%d placeholders get filled in here
while (<SLURMTEMP>) {
    last if /^\s*$/;            # stop at the first blank line
    if (/cd/) {                 # working-directory line: gets %s
        printf BATCH $_, $currWorkDir;
    } elsif (/srun/) {          # srun line: gets %d and %s
        printf BATCH $_, $n_proc, $cmd;
    } elsif (/SBATCH/) {        # SBATCH directives: get %d (extra args
        printf BATCH $_, $n_proc;   # are ignored on lines without one)
    } else {                    # everything else passes through as-is
        print BATCH $_;
    }
}
close BATCH;
# send to queue
print `sbatch -x n03 ./$batch_name`;
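The substitution itself is nothing more than printf treating each template line as a format string. The same idea, demonstrated in the shell with a hypothetical srun line:

```shell
# one srun line from the template, with %d/%s placeholders
template='srun -n %d ./%s'
# fill in the process count and command name, as slurm_run.pl does
printf "$template\n" 16 my_model
# → srun -n 16 ./my_model
```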

The last line submits my fresh batch file to the slurm queue. I can monitor its progress via:

squeue -o '%.7i %.9P %.50j %.8u %.2t %.10M %.5D %.6C %R'

the "%.50j" is important, since we want to know the full job names.

The "-x n03" part in slurm_run.pl was added since our system admin asked me to not use node 03. Is there a better way to consistently do it?

Copyright (C) 2015 by Avi Gozolchiani. See the License for information about copying.
