Using a parallel pool
Matlab can be parallelized using parpool. For many users, this can be as simple as replacing a for loop with a parfor loop, as long as the problem is pleasingly parallel and the individual iterations of the loop do not depend on the outcomes of the others.
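As a minimal sketch, a pleasingly parallel loop might look like the following; the function name is a placeholder for whatever independent work each iteration performs.

```
% Each iteration is independent of the others, so the serial "for"
% can simply be replaced with "parfor".
n = 100;
results = zeros(1, n);
parfor i = 1:n
    results(i) = some_expensive_function(i);  % placeholder function
end
```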
Best practices for parfor on Blue Crab
We recommend that users who require multiple, simultaneous parallel pools follow these steps when using parfor in order to avoid name collisions in the location where Matlab writes the temporary files required to organize its "workers" during the parallel calculation. This method can also be used if you would like these files to be written to another location for debugging or performance reasons.
- Put the following commands in your SLURM script. You can select a full path to the temporary location here. You are welcome to put these files in your home directory (~/) or the data directory (~/data) on our ZFS filesystem. If your calculation requires many parallel executions, we recommend against using the Lustre filesystem (~/scratch) for this kind of I/O, because it will write many small files.
export TMPDIR=$(pwd)/matlab_tmp  # or select another path
mkdir -p $TMPDIR
- Instead of using the command parpool(4), use the following commands to create one worker per processor. Note that your SLURM request should use --cpus-per-task to select the number of processors; Matlab does not use "tasks", which are used exclusively for MPI jobs. These commands must appear before any parfor command in your script.

pc = parcluster('local');
nproc = str2num(getenv('SLURM_CPUS_PER_TASK'));
pc.NumWorkers = nproc;
job_folder = fullfile(getenv('TMPDIR'), getenv('SLURM_JOB_ID'));
mkdir(job_folder);
pc.JobStorageLocation = job_folder;
parpool(pc, nproc)
- At the end of your script, remove the pool with these commands.

poolobj = gcp('nocreate');
delete(poolobj);
This procedure gives you maximum control over the files required to support parallel execution.
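The steps above can be tied together in a single SLURM batch script, sketched below. The job name, time limit, and the name of the Matlab script (my_parfor_script.m) are placeholders; adjust them for your own job.

```shell
#!/bin/bash
# Sketch of a batch script using the per-job temporary directory.
#SBATCH --job-name=matlab_parfor
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=1:00:00

# Per-job temporary location for Matlab worker files
export TMPDIR=$(pwd)/matlab_tmp  # or select another path
mkdir -p $TMPDIR

# Run the Matlab script (which contains the parcluster/parpool setup)
matlab -nodisplay -nosplash -r "run('my_parfor_script.m'); exit"
```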