|
|
|
|
|
To preserve an environment with the desired installations and files on the hard disk (even after a machine is deleted), you can select "Create Snapshot" from the "Action" menu of a currently running machine. Once the snapshot is saved, any new machine can be booted with that installation and hard-disk environment: when launching a new instance, choose "Instance Snapshot" from the "Select Boot Source" menu under "Source", then select the desired snapshot. Note that saving a snapshot may take some time, but booting from one (once saved) is fast; OpenStack is optimized for this.
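If the OpenStack command-line client is configured for the project (an assumption; the steps above use the dashboard), the same workflow looks roughly like this, with placeholder instance, flavor, and network names:

```
# save the current state of a running instance as a snapshot image
openstack server image create --name my-snapshot my-instance

# later, boot a new machine from that snapshot
openstack server create --image my-snapshot --flavor <flavor> --network <network> my-new-instance
```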
|
|
|
|
|
|
|
|
## SLURM Cluster
|
|
|
|
The rest of the wiki consists mainly of information for operating the SLURM-scheduled portion of Julia.
|
|
|
|
|
|
|
|
### Storage
|
|
To transfer files, use the domain julia-storage.uni-wuerzburg.de (e.g. `ssh username@julia-storage.uni-wuerzburg.de`).
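For example, with standard tools (the file and directory names are placeholders):

```
# copy a single file to the cluster
scp results.dat username@julia-storage.uni-wuerzburg.de:~/

# synchronize a whole directory back to the local machine
rsync -av username@julia-storage.uni-wuerzburg.de:~/project/ ./project/
```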
|
|
|
|
|
|
|
|
### Compiling
|
|
By default, the Intel compiler suite is installed on the new Julia login nodes but not added to the PATH variable; this is readily done by executing `source /usr/local/etc/intel_mpi.sh`. For ALF you can choose the Intel environment (instead of SuperMUC or Jureca ...) and it should work just fine.
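To check that the compilers are now picked up, something along these lines should work (the exact compiler and wrapper names depend on the installed Intel suite):

```
source /usr/local/etc/intel_mpi.sh
which ifort mpiifort   # Intel Fortran compiler and its MPI wrapper
mpiifort -v            # print the version to confirm the setup
```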
|
|
On virtual machines within OpenStack there is one issue:
|
Apparently the Intel compiler option -xHost (automated architecture-dependent vectorization) ...
|
|
|
|
|
Jobs may be compiled from the Julia login node, `ssh username@julia.uni-wuerzburg.de`, or in the OpenStack.
|
|
|
|
|
|
|
|
### Interactive Sessions
|
|
|
|
|
|
Program performance may be tested on the OpenStack machines, but interactive sessions are also available from the login node. To get one, use `srun --pty bash`.
|
|
|
|
|
|
It is a good idea to check first whether any resources are available for interactive sessions by running `sinfo` and looking for nodes in the "idle" state. If your desired partition is not available (see below for partitions), there may be idle nodes in other partitions that you can use for testing. Request a specific partition for the interactive session with `srun --pty -p <partition name> bash`.
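A typical sequence might look like this (the partition name and resource flags are only examples):

```
sinfo                                        # look for nodes in the "idle" state
srun --pty bash                              # interactive shell on the default partition
srun --pty -p ib bash                        # or request a specific partition
srun --pty -N 1 -n 4 --time=01:00:00 bash    # optionally limit tasks and run time
exit                                         # leave the session and release the resources
```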
|
|
|
|
|
|
|
|
### Job submission
|
|
To submit jobs via SLURM, use `ssh username@julia.uni-wuerzburg.de` and then, in the directory of your job script, run `sbatch <job_script>`.
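For example (the directory, script name, and job ID are placeholders):

```
ssh username@julia.uni-wuerzburg.de
cd <job directory>
sbatch <job_script>    # submit; prints the job ID
squeue -u $USER        # check the status of your jobs
scancel <jobid>        # cancel a job if needed
```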
|
|
|
|
|
|
|
|
### Sample Job Script
|
|
For a generic job (serial, embarrassingly parallel, or parallel, depending on options chosen):
|
|
|
|
|
|
```
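#!/bin/bash
# Example values -- adjust the #SBATCH settings to your job (see the description below).
# One node; ntasks-per-node * cpus-per-task gives the 32 cores requested on it
# (for a purely threaded code use 1 task and 32 cpus per task instead).
#SBATCH -N 1
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1
# Maximum run time of 24 hours
#SBATCH --time=24:00:00
# Placeholders: the source environment (Debian by default, "Ubuntu" as an alternative)
# and the working directory where the executable and input files are found.
#SBATCH --export=<source>
#SBATCH --workdir=<working directory>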
|
srun ./prog.out <arguments>
```
|
|
|
|
|
This will set up one node (N) with 32 cores (ntasks-per-node * cpus-per-task) and allow the job to run for 24 hours at most. Because the partition is not specified, the job could end up on a "standard" node or an "ib" node, depending on the cores requested and the availability of nodes (more on partitions below). The source (export) will be the default Debian installation ("Ubuntu" is an alternative option), and the working directory (workdir) is set so that the executable and input files can be found (`/home/<user>` is currently the default working directory). Output files will be placed in the working directory as well. The executable will be able to make use of all the cores requested on the node.
|
|
|
|
|
|
|
|
### Partitions
|
|
There are four partitions in the SLURM portion of the cluster: "standard*" (the asterisk marks the default partition in `sinfo`), "ib", "fat", and "gpu". A partition can be specified in a job script by adding an `#SBATCH -p <partition name>` line. The standard*, ib, and gpu partitions each consist of nodes with 32 threads and 384 GB of memory per node. If no partition is specified, jobs will end up either in the standard* partition or, if no standard* nodes are available, in the ib partition.
|
|
|
|
|
|
The ib partition nodes additionally have an FDR 56 Gbit/s Mellanox InfiniBand interconnect on each node, and the gpu partition nodes additionally have two NVIDIA Tesla P100 graphics cards on each node. The fat partition nodes each have 2 TB of memory and can handle 144 threads.
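For example, a large-memory job could request the fat partition in its script (the core count and time limit are placeholders to adapt):

```
#!/bin/bash
# Request the fat partition (2 TB of memory, up to 144 threads per node).
#SBATCH -p fat
#SBATCH -N 1
#SBATCH --cpus-per-task=144
#SBATCH --time=12:00:00

srun ./prog.out <arguments>
```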
|
|
|
|
|
|
|
|
### `srun` vs. `mpirun` vs. `mpiexec`
|
|
For not quite embarrassingly parallel jobs, see https://docs.computecanada.ca/wiki/Advanced_MPI_scheduling#Why_srun_instead_of_mpiexec_or_mpirun.3F
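In short (a generic illustration, not specific to Julia): inside a job script, `srun` already knows the allocation granted by SLURM, so the process count does not have to be repeated by hand.

```
# inside an sbatch script that requested, e.g., --ntasks=32

srun ./prog.out <arguments>             # srun inherits the SLURM allocation and binding

# roughly equivalent to, but preferred over:
# mpirun -np 32 ./prog.out <arguments>
```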
|
|
|
|
|
|
|
|
### Cluster Info from the Login Node
|
|
To get information on the nodes available, use `sinfo`.
|
|
|
|
|
|
To get the specifications for a specific node on the `sinfo` list, use `scontrol show node <nodename>`.
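For a quick overview, the following calls can be combined (the format string, node name, and partition name are only examples):

```
sinfo                                      # partitions and node states (idle, alloc, ...)
sinfo -o "%P %D %c %m %l"                  # partition, node count, CPUs/node, memory/node, time limit
scontrol show node <nodename>              # full specification of one node
scontrol show partition <partition name>   # limits and node list of a partition
```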
|