simplify slurm and qsub
December 28, 2014 at 06:48 PM | categories: orgmode, qsub, slurm
slurm and qsub (link anyone?) are beautiful cluster schedulers. If you work on a cluster, you probably use one. I use both, as well as some old computers which don't have schedulers. I manage my runs from an orgmode "notebook", with a table that tells my scripts which resource uses which scheduler.
The usual way to use slurm and qsub is to submit a little shell script that tells the nodes how to divide their tasks, sets the important environment variables, names the command to run, etc. If you work on clusters you probably have a zillion copies of these little scripts.
First, most of the information in these scripts is identical, so why not create a template in the home directory? Instead of the absolute path of the current run, insert %s; instead of the number of MPI processes, insert %d … you get the idea. I call my template .slurm_cmds.
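To make the idea concrete, here is a minimal sketch of what such a template could look like. The contents are my guess at a typical case, not the actual .slurm_cmds file: the `--ntasks` directive and the `OMP_NUM_THREADS` export are assumptions, and the template is written to a scratch file so nothing in $HOME is touched. The last lines fill one placeholder line with the shell's own printf, just to show the mechanism.

```shell
#!/bin/sh
# Hypothetical template in the spirit of ~/.slurm_cmds (contents are assumptions).
# It is written to a scratch file so the real home directory is left alone.
template=$(mktemp)
cat > "$template" <<'EOF'
#!/bin/sh
#SBATCH --ntasks=%d
export OMP_NUM_THREADS=1
cd %s
srun -n %d %s
EOF

# Fill the srun line with printf: %d takes the process count, %s the command.
srun_line=$(grep '^srun' "$template")
filled=$(printf "$srun_line" 8 ./a.out)
echo "$filled"   # prints: srun -n 8 ./a.out
```

One thing to watch for: since template lines are used as printf formats, a literal % on such a line (say, in an output file pattern) would have to be written as %%.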
Now we need to automatically create batch files from the template, by replacing all those %x placeholders with our real information, and submit them to the queue:
    #!/usr/bin/perl -w
    # purpose : insert a job to the slurm queue
    # syntax : slurm_run.pl number_of_processes cmd
    # number_of_processes= the number of cores that are expected to be used by
    # the job. this is not verified - so consistency with the compilation under
    # MPI is just assumed and is the responsibility of the user.
    # cmd = the executable (usually binary) you wish to include in the queue
    # the file .slurm_cmds is expected to be found on the $HOME directory.
    # this file is a template batch file with all the needed exports and a srun
    # call. slurm_run.pl just reads the template, replaces the necessary info to
    # the right places, and sends the new formed batch file to the queue.
    #
    # depends on : (1) the perl Env and Cwd libraries ,
    # (2) the $HOME/.slurm_cmds template
    #
    # Copyright 2012 Avi Gozolchiani (http://tiny.cc/avigoz)
    # This program is free software: you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation, either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program. If not, see <http://www.gnu.org/licenses/>.

    # $Log$
    use Env qw(HOME);   # makes $ENV{HOME} available as $HOME
    use Cwd;
    $currWorkDir = Cwd::cwd();
    # parse cmd line (the // operator needs perl 5.10 or newer)
    $n_proc = shift(@ARGV) // die "syntax error : slurm_run number_of_processes cmd\n";
    $cmd = shift(@ARGV) // die "syntax error : slurm_run number_of_processes cmd\n";
    # define file names (both source and target)
    $slurm_template = "$HOME/.slurm_cmds";
    # tag the batch file with the first argument (the process count)
    $batch_name = "run-mit.batch_$n_proc";
    # open the files
    open SLURMTEMP, '<', $slurm_template or die "couldn't find the template file\n";
    open BATCH, '>', $batch_name or die "couldn't write a temporary batch file\n";
    # copy each line from the source template to the target, with
    # the necessary changes. matching lines are used as printf formats,
    # so their %s / %d placeholders get filled in.
    while(<SLURMTEMP>){
      last if length($_)==0;
      if(/cd/){
        printf BATCH $_, $currWorkDir;
      }elsif(/srun/){ # if(/cd/){
        printf BATCH $_, $n_proc, $cmd;
      }elsif(/SBATCH/){ # if(/cd/){ ... }elsif(/srun/){
        printf BATCH $_, $n_proc;
      }else{ # if(/cd/){... }elsif(/srun/){...}elsif(/SBATCH/){
        print BATCH $_;
      } # if(/cd/){... }elsif(/srun/){...}elsif(/SBATCH/){..}else{
    } # while(<SLURMTEMP>){
    close SLURMTEMP;
    close BATCH;
    # send to queue
    print `sbatch -x n03 ./$batch_name`;
The last line submits my fresh batch file to the slurm queue. I can monitor its progress via:
    squeue -o '%.7i %.9P %.50j %.8u %.2t %.10M %.5D %.6C %R'
The "%.50j" is important: it widens the job-name column to 50 characters, so we can see the full job names.
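Typing that format string every time gets old. A possible shortcut, as a sketch: the squeue man page documents a SQUEUE_FORMAT environment variable that stands in for the -o option, so the string could be exported once. Where to put the export (for example ~/.profile) is my assumption, and I would check the man page of the installed slurm version before relying on it.

```shell
#!/bin/sh
# Same format string as in the squeue call above, stored once.
# Assumption: the installed squeue honors SQUEUE_FORMAT, as its man page describes.
SQUEUE_FORMAT='%.7i %.9P %.50j %.8u %.2t %.10M %.5D %.6C %R'
export SQUEUE_FORMAT

# On a machine with slurm installed, a bare call should then pick it up:
#   squeue
echo "$SQUEUE_FORMAT"
```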
The "-x n03" part in slurm_run.pl is there because our system admin asked me not to use node 03. Is there a better way to do this consistently?
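One candidate, as an untested sketch: sbatch reads its options from #SBATCH lines in the batch script, and --exclude is the long form of -x, so the exclusion could live in the template rather than being hard-coded in the perl script. The node name is the one from above; whether this suits a given cluster setup is an assumption.

```shell
#!/bin/sh
# Line that could be added near the top of the template.
# Assumption: the cluster's sbatch accepts --exclude as an #SBATCH directive.
exclude_directive='#SBATCH --exclude=n03'

# With that line in the template, the submit step would no longer need -x:
#   sbatch ./$batch_name
echo "$exclude_directive"
```

Since the line contains no % it passes through the printf step of slurm_run.pl unchanged.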
Copyright (C) 2015 by Avi Gozolchiani. See the License for information about copying.