Cryo-EM Cluster @ LSI

Setup

Gaining Access

1. Submit a ticket to lsiit@umich.edu requesting that a user account be added to the Cianfrocco Lab.

  • Important! Make sure to request 'bash' as your shell.

2. Once approved, log into the server:

  • On campus:
    • ssh -Y [uniqname]@cryoem.lsi.umich.edu
  • Off campus:
    • First connect to Michigan's VPN (learn more here)
    • ssh -Y [uniqname]@cryoem.lsi.umich.edu

3. If this is your first time logging into the system, you need to create an SSH key pair so that you can launch jobs onto multiple nodes of the cluster. Read more here about how to set this up.

  • After testing that your ssh key pair works, log out of the node by typing 'exit'
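
The linked key pair instructions are not reproduced on this page, so the following is only a generic OpenSSH sketch of step 3. It assumes that your home directory is shared across all cluster nodes and that a passphrase-less key is acceptable for job launching; both are assumptions, and setup_cluster_key is a hypothetical helper name.

```shell
# Create a key pair (if none exists) and authorize it for this same account.
# Assumption: the home directory is shared by every node in the cluster.
setup_cluster_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"

    # Never overwrite an existing key. -N "" sets an empty passphrase so that
    # jobs can hop between nodes without prompting.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -b 4096 -N "" -f "$ssh_dir/id_rsa"
    fi

    # Add the public key to authorized_keys only once.
    touch "$ssh_dir/authorized_keys"
    grep -qxF -- "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" \
        || cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage on the cluster (uses ~/.ssh by default):
# setup_cluster_key
```

If the lab's own instructions differ from this sketch, follow those instead.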

Environment setup

If this is your first time logging in, you need to add this line:

source /Users/mcianfro/software/cianfrocco.sh

as the last line of the file:

~/.bashrc

You can do this using the vi text editor.
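
If you would rather not edit the file by hand, the line can also be appended from the command line. The sketch below uses a hypothetical helper, append_line_once, which adds the line only when it is not already present, so running it twice does no harm.

```shell
# Append LINE to FILE unless FILE already contains that exact line.
append_line_once() {
    file="$1"
    line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage on the cluster:
# append_line_once ~/.bashrc 'source /Users/mcianfro/software/cianfrocco.sh'
```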

Now, log out of the cluster by typing:

$ exit

And then log back in using the command you used above.

This will give you access to:

  • Modules (see more below)
  • RELION cluster submission script template
  • Two useful cluster commands:
      • q - list all of your current jobs
      • qq - list jobs of all users on cluster

Additional notes about software environment

If you only want the RELION environment variables set, add this line:

source /Users/mcianfro/software/relion.sh

If you only want the module environment, add this line:

source /Users/mcianfro/software/modules.sh

Software setup

Modules

You are free to use the 'standard' way to use the cluster, which is SBGrid software (see below for how to use SBGrid). We prefer 'modules' because they are WAY more intuitive than SBGrid and let you switch software extremely fast.

'Modules' allow you to load and unload specific software packages with a few simple commands.

  • module list Lists all modules that are currently loaded
  • module avail Lists all modules that are available to load
  • module load [module name] Loads software into your environment
  • module unload [module name] Unloads software from your environment
  • module clear Clears all modules from your environment

NOTE: You can only use modules if you are sourcing the file /Users/mcianfro/software/init.sh.
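
A typical session with the commands above might look like the sketch below. The module name "relion" is a hypothetical placeholder; use a name that actually appears in the output of module avail.

```shell
# "relion" is a hypothetical module name; pick a real one from `module avail`.
MODULE_NAME="relion"

if type module >/dev/null 2>&1; then
    module avail                   # see what can be loaded
    module load "$MODULE_NAME"     # add the package to your environment
    module list                    # confirm that it is loaded
    module unload "$MODULE_NAME"   # remove it again
else
    # The module command only exists after the environment file is sourced.
    echo "module command not found: source the environment file first" >&2
fi
```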

SBGrid

If you'd rather use SBGrid to load your software (instead of modules), just type:

$ sbgrid

into your terminal.

Interacting with the cluster

Terminology

Nodes - an individual unit of the cluster that houses CPUs or GPUs.

Job Queue - the cluster has different types of nodes with CPUs, GPUs, or high memory. The job queue you select determines which set of nodes your job runs on:

  • batch - default queue (all nodes with <= 256GB memory)
  • himem - consists of machines with >256GB memory (Currently just 1 machine)
  • gpu - consists of GPU nodes

Walltime - Estimated time for your job to complete. If a job runs longer than its walltime, it will be terminated.
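
To show how these terms fit together, here is a minimal job script sketch. It assumes the scheduler accepts Torque/PBS-style #PBS directives and a qsub command, which this page does not state (it only shows qdel and qstat-like output). The job name, resource values, and file name are all illustrative.

```shell
# Write a small example job script. Every #PBS value below is illustrative,
# and the #PBS syntax itself is an assumption (Torque/PBS-style scheduler).
cat > example_job.sh <<'EOF'
#!/bin/bash
# Job name shown in the queue listing
#PBS -N example_job
# Queue: batch, himem, or gpu
#PBS -q batch
# One node, 20 processes on that node
#PBS -l nodes=1:ppn=20
# Walltime: the job is terminated if it runs longer than 1 hour
#PBS -l walltime=01:00:00

# Start in the directory the job was submitted from, if the scheduler sets it.
cd "${PBS_O_WORKDIR:-.}"
echo "Running on $(hostname)"
EOF

# Submit it (commented out so the sketch is safe to run anywhere):
# qsub example_job.sh
```

For RELION jobs, start from the RELION cluster submission script template mentioned above instead of this sketch.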

Cluster specifications

Compute nodes (50)

  • 20 cores total per node
  • 256 GB memory

GPU nodes (4)

  • 32 cores total per node
  • 128 GB memory
  • 2 x NVIDIA GeForce GTX 1070 GPUs (8 GB memory ea.)
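
For a sense of scale when sizing requests, the totals implied by these numbers can be worked out with shell arithmetic. This sketch assumes the memory figures listed above are per node.

```shell
# Figures copied from the specifications above (memory assumed to be per node).
compute_nodes=50; compute_cores=20; compute_mem_gb=256
gpu_nodes=4; gpu_cores=32; gpu_mem_gb=128; gpus_per_node=2

echo "Compute: $((compute_nodes * compute_cores)) cores, $((compute_nodes * compute_mem_gb)) GB memory"
# prints: Compute: 1000 cores, 12800 GB memory

echo "GPU: $((gpu_nodes * gpu_cores)) cores, $((gpu_nodes * gpu_mem_gb)) GB memory, $((gpu_nodes * gpus_per_node)) GPUs"
# prints: GPU: 128 cores, 512 GB memory, 8 GPUs
```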

Monitoring jobs

To check only YOUR jobs on the cluster:

$ q

control.hpc.lsi.umich.edu: 
                                                                                  Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
94466                   mcianfro    batch    Class2D/job003/r 150377     1      1       --   01:00:00 C       -- 
94467                   mcianfro    batch    Class2D/job003/r 150483     1      1       --   01:00:00 R  00:00:02

This will give you the rundown of jobs that are running or completed/canceled on the cluster.

Items displayed:

  • Job ID - This is the ID number you will use to terminate the job (or monitor its status)
  • Username - uniqname of person who submitted the job
  • Queue - type of computing nodes used by this job (see Job Queue under Terminology above)
  • SessID - ID tag for submission
  • NDS - Nodes requested / used for job
  • TSK - Number of processes per node
  • Req'd Memory - Requested memory (RAM) for job
  • Req'd Time - Requested wall time for job
  • S - State of job: C - Canceled/completed; R - Running
  • Elap Time - Length of time job has been running
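
Because the listing is plain whitespace-separated text, it can be filtered with standard tools. The sketch below defines a hypothetical helper, running_job_ids, that prints the Job ID of every job in state R. It assumes the column layout shown above, with the state in the 10th column and no spaces in job names.

```shell
# Print the Job ID of every job whose state column (S, the 10th field) is R.
# Header and separator lines are skipped because they do not start with a digit.
running_job_ids() {
    awk '$1 ~ /^[0-9]/ && $10 == "R" { print $1 }'
}

# Demonstration on the sample listing from this page:
running_job_ids <<'EOF'
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
94466                   mcianfro    batch    Class2D/job003/r 150377     1      1       --   01:00:00 C       --
94467                   mcianfro    batch    Class2D/job003/r 150483     1      1       --   01:00:00 R  00:00:02
EOF
# prints: 94467

# On the cluster: q | running_job_ids
```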

If you want to see ALL jobs on the cluster, type:

$ qq

Terminating a job

To terminate a job:

$ qdel [Job ID]
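
When several jobs need to be stopped, it can help to preview the commands first. The sketch below defines a hypothetical helper, cancel_jobs, that only prints the qdel commands by default and runs them when DRY_RUN=0 is set. The job IDs are the ones from the sample listing above.

```shell
# Cancel each job ID given as an argument.
# By default this is a dry run that only prints the commands;
# set DRY_RUN=0 to actually call qdel.
cancel_jobs() {
    for id in "$@"; do
        if [ "${DRY_RUN:-1}" = "0" ]; then
            qdel "$id"
        else
            echo "qdel $id"
        fi
    done
}

cancel_jobs 94466 94467
# prints:
# qdel 94466
# qdel 94467

# To really cancel the jobs on the cluster:
# DRY_RUN=0 cancel_jobs 94466 94467
```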