Keller and Evans Lab: CU Research Computing

This document will mostly cover specific instructions for using RC in the Vrieze and Keller labs. We will try to update this, but RC is a bit of a moving target, so some of what is written below may now be outdated.

Getting started

General documentation for using RC is on their website. We recommend that ALL new users first read these overviews on that webpage. In particular:
  • Logging In (which you've already done)
  • Duo 2-factor Authentication (which you've already done)
  • Allocations
  • Node Types
  • Filesystems
  • The modules system
  • The PetaLibrary
  • Running applications with Jobs
  • Batch Jobs and Job Scripting
  • Interactive Jobs
  • Useful Slurm commands
  • Job Resource Information
  • squeue status and reason codes
  • Containerization on Summit

Overview of best practices

Richard Border's slides from October 2019 (might be out of date)

Logging in

Put these settings in your ~/.ssh/config file so you only have to enter your OTP once per session, instead of for every ssh connection you make. See the slides describing this, and instructions to do the same on PuTTY and Bitvise.

# These rules only apply for connections to login.rc.colorado.edu
Host login.rc.colorado.edu
# Set up a master ssh session the first time; subsequent
# connections reuse the existing master connection
ControlMaster auto
ControlPath ~/.ssh/%r@%h:%p
# Keep the ssh connection open, even when the last session closes
ControlPersist yes
# X forwarding. Remove this on a Mac if you
# don't want it to start X11 each time you
# connect to RC
ForwardX11 yes
# Compress the data stream.
Compression yes
# Send keep alive packets to prevent disconnects from
# CU wireless and behind NAT devices
ServerAliveInterval 60

These settings should work from Mac and Linux. For Windows, see the slides. On a Mac, those settings will cause X11 to start. If you don't want that to happen, then remove the ForwardX11 yes line.
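With ControlMaster enabled, the first connection to login.rc.colorado.edu prompts for your password and OTP, and later ssh or scp connections silently reuse that master socket. A quick sketch of checking on (or closing) the shared connection using standard OpenSSH control commands:

# First connection: prompts for password + OTP and becomes the master
ssh login.rc.colorado.edu

# From another local terminal: reuses the master, no new OTP prompt
ssh login.rc.colorado.edu hostname

# Check whether a master connection is alive, or shut it down explicitly
ssh -O check login.rc.colorado.edu
ssh -O exit login.rc.colorado.edu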

For those with access to Summit (ONLY!), here are the steps for using it:

#From a login node:
ssh -YC <uname>@shas07{01-15}

#In your shell script: no need to include -A UCB00000442, but do include
#SBATCH --partition=shas

#To run R:
ml load R
ml load gcc
R

Don't save temporary files in /pl/active/KellerLab

Everything you save to /pl/active/KellerLab (and /pl/active/IBG) is backed up automatically by the system. This is generally good, unless you're saving files that you only need for the moment, or at most for a few days. We don't need these files long term and we don't need them backed up, but they will be backed up, and they'll count against our total storage allocation for the next year. The last thing we want is to pay for storage for a bunch of large temporary files that are also backed up.

If you delete temporary files after they've been automatically backed up, the backups live on for a year before they're finally removed from the system entirely. That's one year of paying for storage for unneeded files.

Fortunately, RC has a solution for this, called 'scratch' space. Scratch space is meant for temporary files that do not need to be backed up. Scratch space has several advantages, including:

  • Scratch is faster
  • Scratch is not backed up (this can be a good thing!)
  • We don't have to pay for scratch

So unless you know the file you're creating is important to save long-term, use the scratch space.

When using janus nodes, scratch data should be written to

/lustre/janus_scratch/<username> # or /rc_scratch/<username>

When using himem (aka blanca-ibg), scratch data can be written to

/lustre/janus_scratch/<username> # or /rc_scratch/<username> # or /local/scratch/<username>

When using blanca-ibgc1, scratch can be written to

/rc_scratch/<username> # or /local/scratch/<username>

You will have to manually create a directory to put your files in; the alternative is to make a big mess with files all over and annoy other users. lustre and rc_scratch are network filesystems, and will appear identical on all nodes that connect to them. /local/scratch is local to the particular node used, so something saved on bnode0108 will not be visible from himem04. The size of /local/scratch depends on which node is used, but it is not large. The "df" command does not work on rc_scratch, so it's unclear how much space is available. In the future lustre will be going away, and will probably be replaced by rc_scratch.
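For example, to set up and use your own directory on rc_scratch (a minimal sketch; substitute whichever scratch filesystem applies to the nodes you use):

# One-time setup: create your own directory on scratch
mkdir -p /rc_scratch/$USER

# Write temporary output there instead of the PetaLibrary, e.g.
sort -T /rc_scratch/$USER big_input.txt > /rc_scratch/$USER/big_input.sorted.txt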

Slurm

Queues

#If you want to run on ibg himem, you need to load the right module
module load slurm/blanca

#Then in your shell script
#SBATCH --qos=blanca-ibg

#If you want to run on normal queues, then:
module load slurm/slurm

#Then in your shell script, one of the below, depending on what queue you want
#SBATCH --qos=himem
#SBATCH --qos=crestone
#SBATCH --qos=janus
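Putting those pieces together, a minimal job script for the blanca-ibg qos might look like the sketch below; the resource requests and script name are placeholders to adjust for your own job.

#!/bin/sh
#SBATCH --qos=blanca-ibg
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4gb
#SBATCH --time=1:00:00

ml load R
Rscript myscript.R   # placeholder for your actual analysis

# Then submit it from a login node:
# module load slurm/blanca && sbatch myjob.sh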

Getting information on jobs

#To check our balance on our allocations and get the account id#
sbank balance statement
sacctmgr -p show user <username>   #alternatively, to find the acct#

#To see how busy the nodes are. To see how many janus nodes are available, look for the
#number under NODES where STATE is "idle" for PARTITION "janus" and TIMELIMIT 1-00:00:00.
sinfo -l

#Checking on submissions for a user
squeue -u <username>             #To see your job statuses (R is for running, PD pending, CG completing, CD completed, F failed, TO timeout)
squeue -u <username> -t RUNNING
squeue -u <username> -t PENDING
squeue -u <username> --start     #Get an estimate of when jobs will start

#Detailed information on a queue (who is running on it, how many cpus requested, memory requested, time information, etc.)
squeue -q blanca-ibg -o %u,%c,%e,%m,%j,%l,%L,%o,%R,%t | column -ts ','

#Current status of queues
qstat -i      #To see jobs that are currently pending (helpful for seeing if a queue is overbooked)
qstat -r      #To see jobs that are currently running
qstat -a      #To see jobs that are running OR are queued
qstat -a -n   #To see all jobs, including which nodes they are running on
qstat -r -n   #To see running jobs, and which nodes they are running on

#Other commands
showq-slurm -o -U -q <partition>   #List job priority order for current user (you) in given partition
scontrol show jobid -dd <jobid>    #List detailed information for a job (useful for troubleshooting).
                                   #More info: https://www.rc.colorado.edu/book/export/html/613
pbsnodes -a                        #To look at the status of each node

### Once a job has completed, you can get additional information
### that was not available during the run. This includes run
### time, memory used, etc.
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed      #Stats on completed jobs by jobID
sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed   #View same info for all jobs of a user

#To check graphically how much storage is being taken up in the /work/KellerLab folder
xdiskusage /work/KellerLab/sizes

Running and Controlling jobs

sbatch <shell.script.name.sh>        #Run shell script
sinteractive --nodelist=bnode0102    #Run interactive job on node "bnode0102"
scancel <jobid>                      #Cancel one job
scancel -u <username>                #Cancel all jobs for user
scancel -t PENDING -u <username>     #Cancel all pending jobs for user
scancel --name myJobName             #Cancel one or more jobs by name
scontrol hold <jobid>                #Pause job
scontrol resume <jobid>              #Resume job

Advanced commands

### Suspend all running jobs for a user (takes into account job arrays)
squeue -ho %A -t R | xargs -n 1 scontrol suspend

### Resume all suspended jobs for a user
squeue -o "%.18A %.18t" -u <username> | awk '{if ($2 == "S") {print $1}}' | xargs -n 1 scontrol resume

### After resuming, check if any are still suspended
squeue -ho %A -u $USER -t S | wc -l

Interactive Jobs

Interactive session on IBG himem. Note that this command really starts up a job, then launches screen, then connects you to that screen session. If you want to use your emacs key combinations to move the cursor around, you'll have to remap the Ctrl-A binding in your ~/.screenrc file, since screen uses Ctrl-A to initiate its own commands. The following .screenrc line rebinds the escape key from Ctrl-A to Ctrl-O:

# replace Ctrl-A with Ctrl-O
escape ^Oo

This only needs to be done once.

Then launch your interactive job on the IBG himem node.

module load slurm/blanca && sinteractive --qos=blanca-ibg

Or onto any free himem node

module load slurm/blanca && sinteractive --qos=blanca
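Once the interactive job starts (it may sit in the queue briefly), you are dropped into a shell on the allocated node. A quick way to confirm where you landed:

# Inside the sinteractive session:
hostname             # which blanca node you were allocated
echo $SLURM_JOB_ID   # the job ID backing this interactive session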

Job Arrays to submit many independent jobs in parallel

The easiest way to submit many independent jobs in parallel is to create a 'job array', which lets you loop over chromosomes (or any other index) with a single script. An example sbatch shell script to tabix index 22 chromosomes is:

#!/bin/sh
#SBATCH --mem=1gb
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --qos=blanca-ibgc1
#SBATCH --time=2:00:00
#SBATCH --array 1-22

tabix -fp vcf chr${SLURM_ARRAY_TASK_ID}impv1.vcf.gz
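Each array task gets its own value of SLURM_ARRAY_TASK_ID (1 through 22 here), so task 7 runs tabix on chr7impv1.vcf.gz, and so on. A sketch of submitting it, assuming the script above is saved as tabix_array.sh (the file name is just an example):

# Submit all 22 tasks with a single sbatch call
module load slurm/blanca
sbatch tabix_array.sh

# Check on the individual array tasks
squeue -u $USER

If you want to limit how many array tasks run at once, Slurm also accepts a throttle in the array specification, e.g. --array 1-22%5 to cap it at five concurrent tasks.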

GNU Parallel to submit more complicated sets of jobs in parallel

For more complicated sets of commands, you can also create a text file (e.g., test.txt) listing the parameters to feed into your script, one job per line with space-separated fields:

1 0 99
1 100 199
1 200 299

You can then feed this text file to the parallel command. Here's an example for the ibgc1 minicluster:

cat test.txt | parallel --colsep ' ' sbatch --qos blanca-ibgc1 test.sh
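To preview what parallel will submit before actually running anything, GNU parallel's --dry-run flag prints the constructed commands instead of executing them; with the test.txt above it should print something like the commented lines:

# Print the sbatch commands that would be generated, one per line of test.txt
cat test.txt | parallel --dry-run --colsep ' ' sbatch --qos blanca-ibgc1 test.sh
# sbatch --qos blanca-ibgc1 test.sh 1 0 99
# sbatch --qos blanca-ibgc1 test.sh 1 100 199
# sbatch --qos blanca-ibgc1 test.sh 1 200 299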

Here's a command for janus

cat test.txt | parallel --colsep ' ' sbatch --qos janus --reservation=janus-serial test.sh

Here, test.sh is a bash script that runs tabix using the arguments parallel passes in from each line of test.txt:

#!/bin/sh
#SBATCH -A <account number>
#SBATCH --mem=20gb
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --time=0:30:00

assertNotEmpty() {

  : "${!1:? "$1 is empty, aborting."}"

}

: ${chr:=$1}
export chr
assertNotEmpty chr

: ${startpos:=$2}
export startpos
assertNotEmpty startpos

: ${endpos:=$3}
export endpos
assertNotEmpty endpos

tabix -h chr${chr}/chr${chr}impv1.vcf.gz ${chr}:${startpos}-${endpos} | bgzip -c > chr$chr/chr${chr}impv1.${chr}_$startpos-$endpos.vcf.gz
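Because the script only falls back to the positional arguments ($1, $2, $3) when chr, startpos, and endpos are not already set, you can also submit a single region by hand, either positionally or through sbatch's --export option (the values below are arbitrary examples):

# Run one region via positional arguments
sbatch --qos blanca-ibgc1 test.sh 1 200 299

# Or pass the values as environment variables instead
sbatch --qos blanca-ibgc1 --export=ALL,chr=1,startpos=200,endpos=299 test.sh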

Compiling software

RC intentionally keeps some header files off the login nodes to dissuade people from trying to compile on those nodes. Instead, use the janus-compile nodes to compile your software. Log in to a login node and then run

ssh janus-compile[1-4]
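For example, a build session might look like the sketch below; the source directory and install prefix are placeholders, though /work/KellerLab/opt matches where the lab's other software lives (see the next section):

# From a login node, hop onto one of the compile nodes
ssh janus-compile2

# Hypothetical configure-and-build of a tool from source
cd ~/src/sometool
./configure --prefix=/work/KellerLab/opt
make && make install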

KellerLab software

These instructions are for accessing the software and utilities that have been installed by the KellerLab. There are two choices.

Source the KellerLab preferences

This will import all of the KellerLab preferences, including changing the prompt and adding some other useful aliases. To do this, add the following to the bottom of your .my.bashrc

. /work/KellerLab/opt/bash_profile

an easy way to do that is by pasting the following into a terminal on RC

echo ". /work/KellerLab/opt/bash_profile" >> ~/.my.bashrc

Only add the paths

If you prefer your own command prompt and other settings, then you can just add the paths to your .my.bashrc To do that, add the following lines to the bottom of your .my.bashrc

PATH=${PATH}:/work/KellerLab/opt/bin
MANPATH=${MANPATH}:/work/KellerLab/opt/share/man
export PERL5LIB=/work/KellerLab/opt/lib/perl5/

or run this

head -3 /work/KellerLab/opt/bash_profile >> ~/.my.bashrc

Emacs and ESS

ESS on RC

To add the ESS package to Emacs, add the following lines to the bottom of your .emacs file

;;getting ESS to work
(load "/work/KellerLab/opt/ESS/ess-13.09-1/lisp/ess-site.el")

(require 'ess-site)

ESS on your local Emacs

Running Emacs (or Aquamacs) on your local computer while editing files and running R on RC is not difficult, but it requires some setup.

Setup

First, edit your local .emacs file and add the following statement

(setq tramp-ssh-controlmaster-options
      (concat
       "-o ControlPath=~/.ssh/%%r@%%h:%%p "
       "-o ControlMaster=auto -o ControlPersist=no"))

This will cause tramp to share the same SSH socket as your login.rc.colorado.edu SSH sessions. That way you can open files without having to re-enter your OTP.

Opening a remote file with tramp

Opening a file on a remote host with tramp is similar to opening a local file. Press ctrl-x ctrl-f and then for the remote file enter /ssh:login.rc.colorado.edu: followed by the path to the file, for example /ssh:login.rc.colorado.edu:/work/KellerLab/myfile.R or /ssh:login.rc.colorado.edu:scripts/mystuff.R. Once a single path element (/work or ~/, for example) is entered, tab completion will work.

Activating a remote R session in ESS

Activating a remote R session is accomplished by using a shell within Emacs. It is most convenient to open a new Emacs frame with ctrl-x 5 2. Then within the new frame enter M-x shell. Then from the shell within Emacs run ssh login.rc.colorado.edu and then from that shell run ssh himem04 or ssh node0219 or whatever to get to the location where you actually want to run R.

Once you have a shell on your job node, run R by typing R. Then run M-x ess-remote and press enter to accept the Dialect prompt that comes up.

Once that is done you will be able to edit an R script in your local Emacs and submit lines to the R process on the RC job node.
