This document will mostly cover specific instructions for using RC in the Vrieze and Keller labs. We will try to keep it updated, but RC is a bit of a moving target, so some of what is written below may now be outdated.

======= Getting started =======

General documentation for using RC is [[https://curc.readthedocs.io/en/latest/|on their website]]. We recommend that ALL new users first read the overviews on that site, in particular:

  * Logging In
  * Duo 2-factor Authentication
  * Allocations
  * Node Types
  * Filesystems
  * The modules system
  * The PetaLibrary
  * Running applications with Jobs
  * Batch Jobs and Job Scripting
  * Interactive Jobs
  * Useful Slurm commands
  * Job Resource Information
  * squeue status and reason codes
  * Containerization on Summit

======= Overview of best practices =======

[[https://docs.google.com/presentation/d/1yQToDgohYZIzwu9NL0Z5ORpmN-9qeY3cpT7RDusEg-w/edit?usp=sharing|Richard Border's slides]] from October 2019 (might be out of date).

======= Logging in =======

Put these settings in your ''~/.ssh/config'' file so you only have to enter your OTP once per session, instead of for every ssh connection you make. See the [[https://docs.google.com/presentation/d/1FMir2LDbBJffXZ5aMjIJhRA_7HO99xrxT__AFxmgU_c/edit?usp=sharing|slides]] describing this, and instructions to do the same with [[https://www.chiark.greenend.org.uk/~sgtatham/putty/|PuTTY]] and [[https://www.bitvise.com/ssh-client|Bitvise]].

<code>
# These rules only apply for connections to login.rc.colorado.edu
Host login.rc.colorado.edu
    # Set up a master ssh session the first time, and have subsequent
    # connections reuse the existing master connection
    ControlMaster auto
    ControlPath ~/.ssh/%r@%h:%p
    # Keep the ssh connection open, even when the last session closes
    ControlPersist yes
    # X forwarding. Remove this on a Mac if you don't want it to
    # start X11 each time you connect to RC
    ForwardX11 yes
    # Compress the data stream
    Compression yes
    # Send keep-alive packets to prevent disconnects from
    # CU wireless and behind NAT devices
    ServerAliveInterval 60
</code>

These settings should work from Mac and Linux. For Windows, see the [[https://docs.google.com/presentation/d/1FMir2LDbBJffXZ5aMjIJhRA_7HO99xrxT__AFxmgU_c/edit?usp=sharing|slides]]. On a Mac, those settings will cause X11 to start. If you don't want that to happen, remove the ''ForwardX11 yes'' line.

For those with access to Summit (ONLY!), here are the steps to using it:

<code>
# From a login node:
ssh -YC <username>@shas07{01-15}

# In your shell script: no need to include
#   -A UCB00000442 --partition=shas

# To run R:
ml load R
ml load gcc
R
</code>
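A quick way to confirm that the connection sharing configured above is actually working (and to shut the shared connection down cleanly if you change ''~/.ssh/config'') is OpenSSH's ''-O'' control commands. This is just a sketch; the exact output wording varies by OpenSSH version:

<code>
# Ask the shared master connection whether it is still alive
ssh -O check login.rc.colorado.edu    # prints something like "Master running (pid=...)" if sharing is working

# Cleanly shut down the shared master connection, e.g. after editing ~/.ssh/config
ssh -O exit login.rc.colorado.edu
</code>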
======= Don't save temporary files in /pl/active/KellerLab =======

Everything you save to /pl/active/KellerLab (and /pl/active/IBG) is backed up automatically by the system. This is generally good, unless you're saving files that you only need for the moment, or at most for a few days. We don't need these files long term and we don't need them backed up, but they will be backed up, and **they'll count against our total storage allocation for the next year**. The last thing we want is to pay for storage for a bunch of large temporary files that are also backed up. Even if you delete the temporary files after they're automatically backed up, they live on for a year before they're finally removed entirely from the system. That's one year of paying for storage for unneeded files.

Fortunately, RC has a solution for this, called //scratch// space. Scratch space is meant for temporary files that do not need to be backed up. Scratch space has several advantages, including:

  * Scratch is faster
  * Scratch is not backed up (this can be a good thing!)
  * We don't have to pay for scratch

**So unless you know the file you're creating is important to save long-term, use the scratch space.**

When using **janus** nodes, scratch data should be written to:

<code>
/lustre/janus_scratch/
# or
/rc_scratch/
</code>

When using **himem** aka **blanca-ibg**, scratch data can be written to:

<code>
/lustre/janus_scratch/
# or
/rc_scratch/
# or
/local/scratch/
</code>

When using **blanca-ibgc1**, scratch can be written to:

<code>
/rc_scratch/
# or
/local/scratch/
</code>

You will have to manually create a directory to put your files in. You can also just make a big mess with files all over and annoy other users.

lustre and rc_scratch are network filesystems, and will appear identical on all nodes that connect to them. /local/scratch is local to the particular node used, so something saved on bnode0108 will not be visible from himem04. The size of /local/scratch depends on which node is used, but it is not large. The "df" command does not work on rc_scratch, so it's unclear how much space is available. In the future lustre will be going away, and will probably be replaced by rc_scratch.

======= Slurm =======

====== Queues ======

<code>
# If you want to run on IBG himem, you need to load the right module
module load slurm/blanca

# then in your shell script
#SBATCH --qos=blanca-ibg

# If you want to run on normal queues, then:
module load slurm/slurm

# then in your shell script, one of the below, depending on what queue you want
#SBATCH --qos=himem
#SBATCH --qos=crestone
#SBATCH --qos=janus
</code>

====== Getting information on jobs ======

<code>
# To check our balance on our allocations and get the account id
sbank balance statement
sacctmgr -p show user             # alternative way to find the account id

# To see how busy the nodes are. For seeing how many janus nodes are available, look for the
# number under NODES where STATE is "idle" for PARTITION "janus" and TIMELIMIT 1-00:00:00.
sinfo -l

# Check on submissions for a user
squeue -u <username>

# To see your job statuses (R running, PD pending, CG completing, CD completed, F failed, TO timeout)
squeue -u <username> -t RUNNING
squeue -u <username> -t PENDING
squeue -u <username> --start      # get an estimate of when jobs will start

# Detailed information on a queue (who is running on it, how many cpus requested,
# memory requested, time information, etc.)
squeue -q blanca-ibg -o %u,%c,%e,%m,%j,%l,%L,%o,%R,%t | column -ts ','

# Current status of queues
qstat -i        # jobs that are currently pending (helpful for seeing if a queue is overbooked)
qstat -r        # jobs that are currently running
qstat -a        # jobs that are running OR queued
qstat -a -n     # all jobs, including which nodes they are running on
qstat -r -n     # running jobs, and which nodes they are running on

# Other commands
showq-slurm -o -U -q <partition>  # list job priority order for the current user (you) in a given partition
scontrol show jobid -dd <jobid>   # detailed information for a job (useful for troubleshooting);
                                  # more info: https://www.rc.colorado.edu/book/export/html/613
pbsnodes -a                       # look at the status of each node

# Once a job has completed, you can get additional information that was not
# available during the run. This includes run time, memory used, etc.
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed      # stats on completed jobs by job ID
sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed   # same info for all jobs of a user

# To check graphically how much storage is being taken up in the /work/KellerLab folder
xdiskusage /work/KellerLab/sizes
</code>
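For example, if a job died and you suspect it ran out of memory, a quick check is to compare the memory it actually used (MaxRSS) against the memory it requested (ReqMem). This is just a sketch; the job ID below is made up, so substitute your own from squeue or sacct output:

<code>
# Compare actual memory use (MaxRSS) to requested memory (ReqMem) for a finished job
sacct -j 1234567 --format=JobID,JobName,State,Elapsed,ReqMem,MaxRSS

# If MaxRSS is close to (or above) ReqMem, resubmit with a larger --mem request
</code>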
====== Running and Controlling jobs ======

<code>
sbatch <script.sh>                      # submit a shell script
sinteractive --nodelist=bnode0102       # run an interactive job on node "bnode0102"
scancel <jobid>                         # cancel one job
scancel -u <username>                   # cancel all jobs for a user
scancel -t PENDING -u <username>        # cancel all pending jobs for a user
scancel --name myJobName                # cancel one or more jobs by name
scontrol hold <jobid>                   # pause a job
scontrol resume <jobid>                 # resume a job
</code>

====== Advanced commands ======

<code>
# Suspend all running jobs for a user (takes into account job arrays)
squeue -ho %A -t R -u $USER | xargs -n 1 scontrol suspend

# Resume all suspended jobs for a user
squeue -o "%.18A %.18t" -u $USER | awk '{if ($2 =="S"){print $1}}' | xargs -n 1 scontrol resume

# After resuming, check if any are still suspended
squeue -ho %A -u $USER -t S | wc -l
</code>

====== Interactive Jobs ======

**Interactive session on IBG himem**. Note that this command really starts up a job, then launches screen, then connects you to that screen session. If you want to use your emacs key combinations to move the cursor around, you'll have to remap the Ctrl-A binding in your ''~/.screenrc'' file, since screen uses Ctrl-A to initiate its own commands. This line changes screen's command key from Ctrl-A to Ctrl-O:

<code>
# replace Ctrl-A with Ctrl-O
escape ^Oo
</code>

This only needs to be done once. Then launch your interactive job on the IBG himem node:

<code>
module load slurm/blanca && sinteractive --qos=blanca-ibg
</code>

Or onto any free himem node:

<code>
module load slurm/blanca && sinteractive --qos=blanca
</code>

====== Job Arrays to submit many independent jobs in parallel ======

The easiest way to submit many independent jobs in parallel is to create a 'job array'. This allows you to loop through chromosomes very easily. An example sbatch shell script to tabix index 22 chromosomes is:

<code>
#!/bin/sh
#SBATCH --mem=1gb
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --qos=blanca-ibgc1
#SBATCH --time=2:00:00
#SBATCH --array 1-22

tabix -fp vcf chr${SLURM_ARRAY_TASK_ID}impv1.vcf.gz
</code>

====== GNU Parallel to submit more complicated sets of jobs in parallel ======

For more complicated sets of commands, you can also create a text file (e.g., test.txt) with the list of parameters you need to feed into your script, one job per line. For example:

<code>
1 0 99
1 100 199
1 200 299
</code>

You can then feed this text file to the parallel command. Here's an example for the ibgc1 minicluster:

<code>
cat test.txt | parallel --colsep ' ' sbatch --qos blanca-ibgc1 test.sh
</code>

Here's a command for janus:

<code>
cat test.txt | parallel --colsep ' ' sbatch --qos janus --reservation=janus-serial test.sh
</code>

Here the file test.sh is a bash script that simply runs tabix on the arguments taken from each line of test.txt:

<code>
#!/bin/sh
#SBATCH -A <account>
#SBATCH --mem=20gb
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --time=0:30:00

assertNotEmpty() {
    : "${!1:? "$1 is empty, aborting."}"
}

: ${chr:=$1}
export chr
assertNotEmpty chr

: ${startpos:=$2}
export startpos
assertNotEmpty startpos

: ${endpos:=$3}
export endpos
assertNotEmpty endpos

tabix -h chr${chr}/chr${chr}impv1.vcf.gz ${chr}:${startpos}-${endpos} | bgzip -c > chr$chr/chr${chr}impv1.${chr}_$startpos-$endpos.vcf.gz
</code>
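If you need many of these windows, you don't have to type test.txt by hand. Here is a small sketch that writes a parameter file like the one above; the chromosome, window size, and range are made-up values, so adjust them to your data:

<code>
#!/bin/bash
# Write one "chr start end" line per 100-bp window for chromosome 1
chr=1
window=100
last=299
for start in $(seq 0 $window $last); do
    end=$((start + window - 1))
    echo "$chr $start $end"
done > test.txt

# Then submit one job per line as described above:
# cat test.txt | parallel --colsep ' ' sbatch --qos blanca-ibgc1 test.sh
</code>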
======= Compiling software =======

RC intentionally keeps some header files off the login nodes to dissuade people from trying to compile on those nodes. Instead, use the janus-compile nodes to compile your software. Log in to a login node and then run:

<code>
ssh janus-compile[1-4]
</code>

======= KellerLab software =======

These instructions are for accessing the software and utilities that have been installed by the Keller lab. There are two choices.

====== Source the KellerLab preferences ======

This will import all of the KellerLab preferences, including changing the prompt and adding some other useful aliases. To do this, add the following to the bottom of your ''.my.bashrc'':

<code>
. /work/KellerLab/opt/bash_profile
</code>

An easy way to do that is by pasting the following into a terminal on RC:

<code>
echo ". /work/KellerLab/opt/bash_profile" >> ~/.my.bashrc
</code>

====== Only add the paths ======

If you prefer your own command prompt and other settings, then you can just add the paths to your ''.my.bashrc''. To do that, add the following lines to the bottom of your ''.my.bashrc'':

<code>
PATH=${PATH}:/work/KellerLab/opt/bin
MANPATH=${MANPATH}:/work/KellerLab/opt/share/man
export PERL5LIB=/work/KellerLab/opt/lib/perl5/
</code>

or run this:

<code>
head -3 /work/KellerLab/opt/bash_profile >> ~/.my.bashrc
</code>

======= Emacs and ESS =======

====== ESS on RC ======

To add the ESS package to Emacs, add the following lines to the bottom of your ''.emacs'' file:

<code>
;; getting ESS to work
(load "/work/KellerLab/opt/ESS/ess-13.09-1/lisp/ess-site.el")
(require 'ess-site)
</code>

====== ESS on your local Emacs ======

Running Emacs (or Aquamacs) on your local computer, but editing files and using R on RC, is not difficult, but it requires some setup.

===== Setup =====

First, edit your local .emacs file and add the following statement:

<code>
(setq tramp-ssh-controlmaster-options
      (concat "-o ControlPath=~/.ssh/%%r@%%h:%%p "
              "-o ControlMaster=auto -o ControlPersist=no"))
</code>

This will cause tramp to share the same SSH socket as your login.rc.colorado.edu SSH sessions. That way you can open files without having to re-enter your OTP.

===== Opening a remote file with tramp =====

Opening a file on a remote host with tramp is similar to opening a local file. Press ''ctrl-x ctrl-f'' and then for the remote file enter ''/ssh:login.rc.colorado.edu:'' followed by the path to the file, for example ''/ssh:login.rc.colorado.edu:/work/KellerLab/myfile.R'' or ''/ssh:login.rc.colorado.edu:scripts/mystuff.R''. Once a single path element (''/work'' or ''~/'', for example) is entered, tab completion will work.

===== Activating a remote R session in ESS =====

Activating a remote R session is accomplished by using a shell within Emacs. It is most convenient to open a new Emacs frame with ''ctrl-x 5 2''. Within the new frame, enter ''M-x shell''. From the shell within Emacs, run ''ssh login.rc.colorado.edu'', and then from that shell run ''ssh himem04'' or ''ssh node0219'' or whatever to get to the node where you actually want to run R. Once you have a shell on your job node, run R by typing ''R''. Then run ''M-x ess-remote'' and press enter to accept the dialect it suggests. Once that is done you will be able to edit an R script in your local Emacs and submit lines to the R process on the RC job node.
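As a concrete sketch of that workflow, the commands typed into the Emacs ''*shell*'' buffer might look like the following (the node name is just an example; use whichever node your job is actually running on):

<code>
# inside the Emacs shell buffer (M-x shell)
ssh login.rc.colorado.edu   # reuses the shared ControlMaster socket, so no new OTP prompt
ssh himem04                 # hop to the node where your interactive job is running
R                           # start R on that node, then run M-x ess-remote back in Emacs
</code>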