Cryo-EM Cluster @ LSI
Setup
Gaining Access
1. Submit a ticket to lsiit@umich.edu requesting that a user account be added to the Cianfrocco Lab.
- Important! Make sure to request 'bash' as your shell.
2. Once approved, log into the server:
- On campus:
ssh -Y [uniqname]@cryoem.lsi.umich.edu
- Off campus:
- First connect to Michigan's VPN (learn more here)
ssh -Y [uniqname]@cryoem.lsi.umich.edu
3. If this is your first time logging into the system, you need to create a keypair so that you can launch jobs onto multiple nodes of the cluster. Read more here about how to set this up.
- After testing that your ssh key pair works, log out of the node by typing 'exit'
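For orientation, here is a minimal sketch of what creating such a keypair usually involves. It rests on assumptions that are not confirmed here: that home directories are shared between nodes and that the cluster accepts standard OpenSSH keys. The helper name make_cluster_key is made up for this example. The linked instructions take precedence if they differ.

```shell
# Sketch under assumptions: home directories are shared between nodes and
# the cluster accepts standard OpenSSH keys. make_cluster_key is a
# hypothetical helper, not something installed on the cluster.
make_cluster_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    key="$ssh_dir/id_rsa"
    auth="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"

    # Create a key without a passphrase, but never overwrite an existing one.
    if [ ! -f "$key" ]; then
        ssh-keygen -q -t rsa -b 4096 -N "" -f "$key"
    fi

    # Authorize the public key for your own account (only once).
    touch "$auth"
    if ! grep -qxF -- "$(cat "$key.pub")" "$auth"; then
        cat "$key.pub" >> "$auth"
    fi
    chmod 600 "$auth"
}

# On the cluster you would run:  make_cluster_key
```

The key has no passphrase so that jobs can move between nodes without prompting. For that reason it is safest to use it only inside the cluster.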
Environment setup
If this is your first time logging in, you need to add this line:
source /Users/mcianfro/software/cianfrocco.sh
as the last line of the file:
~/.bashrc
You can do this using the vi text editor.
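If you would rather not edit the file by hand, the line can also be appended from the shell. The sketch below is one way to do it. The helper name add_source_line is invented for this example. It appends the line only when it is not already present, so running it twice is harmless.

```shell
# Hypothetical helper (not part of the cluster setup): append a line to a
# file only if that exact line is not already in it.
add_source_line() {
    line="$1"
    file="$2"
    touch "$file"
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# On the cluster you would run:
# add_source_line 'source /Users/mcianfro/software/cianfrocco.sh' "$HOME/.bashrc"
```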
Now, log out of the cluster by typing:
$ exit
And then log back in using the command you used above.
This will give you access to:
- Modules (see more below)
- RELION cluster submission script template
- Two useful cluster commands:
- q - list all of your current jobs
- qq - list jobs of all users on cluster
Additional notes about software environment
If you only want the RELION environment variables, add this line:
source /Users/mcianfro/software/relion.sh
If you only want the module environment, add this line:
source /Users/mcianfro/software/modules.sh
Software setup
Modules
You are free to use the 'standard' way to use the cluster, which is SBGrid software (see below for how to use SBGrid). We prefer 'modules' because they are far more intuitive than SBGrid and let you switch between software packages very quickly.
'Modules' allow you to load and unload specific software packages with a few simple commands.
module list
Lists all modules that are currently loaded
module avail
Lists all modules that are available to load
module load [module name]
Loads software into your environment
module unload [module name]
Unloads software from your environment
module clear
Clears all modules from your environment
NOTE: You can only use modules if you are sourcing the file /Users/mcianfro/software/init.sh.
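As an illustration of switching software with these commands, the sketch below wraps unload, load, and list in one function. The function name switch_module and the module names in the usage comment are placeholders. Run module avail on the cluster to see the real names.

```shell
# Hypothetical helper: swap one loaded module for another and show the result.
switch_module() {
    old="$1"
    new="$2"
    # The module command only exists after the setup file has been sourced.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; source the setup file first" >&2
        return 1
    fi
    module unload "$old"
    module load "$new"
    module list
}

# Example with placeholder names (check `module avail` for real ones):
# switch_module some-package/1.0 some-package/2.0
```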
SBGrid
If you'd rather use SBGrid to load your software (instead of modules), type the following into your terminal:
$ sbgrid
Interacting with the cluster
Terminology
Nodes - an individual unit of the cluster that houses CPUs or GPUs.
Job Queue - the cluster has different types of nodes: CPU, GPU, and high-memory. The job queue you select determines which type of node your job runs on.
- batch - default queue (all nodes with <= 256GB memory)
- himem - consists of machines with >256GB memory (Currently just 1 machine)
- gpu - consists of GPU nodes
Walltime - Estimated time for your job to complete. If the job takes longer than the wall time, it will be terminated.
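To show how queue, node count, and walltime fit together in a job request, here is a minimal sketch that prints a submission script. It assumes the scheduler is Torque/PBS, which the qdel command and the job listing format suggest but which is not stated outright. The helper make_job_script is hypothetical and is not the lab's RELION submission template.

```shell
# Sketch only: assumes a Torque/PBS scheduler. make_job_script is a
# hypothetical helper that prints a job script to standard output.
make_job_script() {
    queue="$1"      # batch, himem, or gpu
    nodes="$2"      # number of nodes
    ppn="$3"        # processes per node
    walltime="$4"   # HH:MM:SS; the job is terminated when this runs out
    cat <<EOF
#!/bin/bash
#PBS -q ${queue}
#PBS -l nodes=${nodes}:ppn=${ppn}
#PBS -l walltime=${walltime}
cd "\$PBS_O_WORKDIR"
echo "Running on \$(hostname)"
EOF
}

# Print a script requesting one GPU node, 4 processes, and 2 hours.
make_job_script gpu 1 4 02:00:00
```

On a Torque/PBS system you would redirect the output to a file and submit it with qsub. Set the walltime somewhat above your estimate, since a job that overruns it is terminated.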
Cluster specifications
Compute nodes (50)
- 20 cores total per node
- 256 GB memory
GPU nodes (4)
- 32 cores total per node
- 128 GB memory
- 2 x NVIDIA GeForce GTX 1070 GPUs (8 GB memory ea.)
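The per-node core counts above determine how many nodes a job has to request. As a small worked example, the hypothetical helper below rounds up the number of nodes needed for a given number of processes.

```shell
# Hypothetical helper: nodes needed for a number of processes, rounded up.
nodes_needed() {
    procs="$1"
    cores_per_node="$2"
    echo $(( (procs + cores_per_node - 1) / cores_per_node ))
}

nodes_needed 50 20   # compute nodes (20 cores each): prints 3
nodes_needed 64 32   # GPU nodes (32 cores each): prints 2
```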
Monitoring jobs
To check only YOUR jobs on the cluster:
$ q
control.hpc.lsi.umich.edu:
                                                                                  Req'd     Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory    Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
94466                   mcianfro    batch    Class2D/job003/r 150377 1     1      --        01:00:00  C --
94467                   mcianfro    batch    Class2D/job003/r 150483 1     1      --        01:00:00  R 00:00:02
This gives you a rundown of your jobs that are running or completed/canceled on the cluster.
Items displayed:
- Job ID - This is the ID number you will use to terminate the job (or monitor its status)
- Username - uniqname of person who submitted the job
- Queue - type of computing nodes used by this job (see Job Queue under Terminology above)
- SessID - ID tag for submission
- NDS - Nodes requested / used for job
- TSK - Number of processes per node
- Req'd Memory - Requested memory (RAM) for job
- Req'd Time - Requested wall time for job
- S - State of job: C - Canceled/completed; R - Running
- Elap Time - Length of time job has been running
If you want to see ALL jobs on the cluster, type:
$ qq
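When the listing gets long, a per-state summary can be easier to read. The sketch below counts jobs by the S column. It assumes the column layout shown in the q example above (state in the 10th whitespace-separated field) and job names without spaces. The function name count_jobs_by_state is made up for this example.

```shell
# Hypothetical helper: count jobs per state from a job listing on stdin.
# Only lines that start with a numeric Job ID are counted.
count_jobs_by_state() {
    awk '$1 ~ /^[0-9]+/ { n[$10]++ } END { for (s in n) print s, n[s] }' | sort
}

# Demonstration using the two jobs from the listing above.
count_jobs_by_state <<'EOF'
94466 mcianfro batch Class2D/job003/r 150377 1 1 -- 01:00:00 C --
94467 mcianfro batch Class2D/job003/r 150483 1 1 -- 01:00:00 R 00:00:02
EOF
# prints:
# C 1
# R 1
```

On the cluster you would pipe the real listing into it, for example q | count_jobs_by_state.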
Terminating job
To terminate a job:
$ qdel [Job ID]
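To terminate several jobs at once, you can pull the Job IDs out of the listing first. This is a sketch under the same layout assumption as the q example (Job ID in the first field, state in the 10th). The helper job_ids_in_state is hypothetical.

```shell
# Hypothetical helper: print the Job ID of every job in the given state.
job_ids_in_state() {
    awk -v state="$1" '$1 ~ /^[0-9]+/ && $10 == state { print $1 }'
}

# Demonstration using the two jobs from the q example.
job_ids_in_state R <<'EOF'
94466 mcianfro batch Class2D/job003/r 150377 1 1 -- 01:00:00 C --
94467 mcianfro batch Class2D/job003/r 150483 1 1 -- 01:00:00 R 00:00:02
EOF
# prints: 94467

# On the cluster (left commented out because it deletes jobs):
# for id in $(q | job_ids_in_state R); do qdel "$id"; done
```

Review the printed IDs before passing them to qdel, since a terminated job cannot be resumed.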