======= CU Research Computing =======

  
This document will mostly cover specific instructions for using RC in the Vrieze and Keller labs. We will try to update this, but RC is a bit of a moving target, so some of what is written below may now be outdated.
  
  
======= Getting started =======

General documentation for using RC is [[https://curc.readthedocs.io/en/latest/|on their website]]. We recommend that ALL new users first read these overviews. In particular:
  * Logging In (which you've already done)
  * Duo 2-factor Authentication (which you've already done)
  * Allocations
  * Node Types
  * Filesystems
  * The modules system
  * The PetaLibrary
  * Running applications with Jobs
  * Batch Jobs and Job Scripting
  * Interactive Jobs
  * Useful Slurm commands
  * Job Resource Information
  * squeue status and reason codes
  * Containerization on Summit


======= Overview of best practices =======

This was written by Richard Border on Oct 8, 2019:
{{file_example.jpg}}


======= Logging in =======
  
Put these settings in your ''~/.ssh/config'' file so you only have to enter your OTP once per session, instead of for every ssh connection you make.
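As a hedged sketch only (the host name, username, and socket path below are placeholders; use whatever block the lab actually recommends), an SSH connection-sharing setup of this kind typically looks like:

  #Placeholders: substitute your own username, and create the ~/.ssh/sockets directory first.
  Host login.rc.colorado.edu
      User your_rc_username
      #Reuse one authenticated connection for subsequent sessions:
      ControlMaster auto
      ControlPath ~/.ssh/sockets/%r@%h-%p
      #Keep the master connection alive for 4 hours (example value):
      ControlPersist 4h
      #Remove this line if you don't want X11 forwarding:
      ForwardX11 yes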
  
These settings should work from Mac and Linux. I'm not sure how to do the equivalent from Windows with Putty. On a Mac, those settings will cause X11 to start. If you don't want that to happen, then remove the ''ForwardX11 yes'' line.

For those with access to Summit (ONLY!), here are the steps to using it:

  #From a login node:
  ssh -YC <uname>@shas07{01-15}
  
  #In your shell script, there is no need to include -A UCB00000442; just specify:
  #SBATCH --partition=shas
  
  #To run R:
  ml load R
  ml load gcc
  R
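As a hedged illustration of how those pieces fit together (the job name, walltime, task count, and R script name are made up for this example), a minimal Summit batch script might look like:

  #!/bin/bash
  #SBATCH --partition=shas
  #SBATCH --job-name=example_job
  #SBATCH --time=01:00:00
  #SBATCH --ntasks=1
  #(job name, walltime, and task count above are example values)
  
  ml load R
  ml load gcc
  Rscript my_analysis.R   #my_analysis.R is a placeholder script

Submit it with ''sbatch'' in the usual way.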
  
  
  
  
You will have to manually create a directory to put your files in. (You can also just make a big mess with files all over and annoy other users.) lustre and rc_scratch are network filesystems, and will appear identical on all nodes that connect to them. /local/scratch is local to the particular node used, so something saved on bnode0108 will not be visible from himem04. The size of /local/scratch depends on which node is used, but it is not large. The "df" command does not work on rc_scratch, so it's unclear how much space is available. In the future lustre will be going away, and will probably be replaced by rc_scratch.
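As a small illustration (the /rc_scratch mount point and directory name are assumptions; check what is actually mounted on your node), you could set up a personal working directory like this:

  #make a personal directory on rc_scratch instead of scattering files at the top level
  mkdir -p /rc_scratch/$USER/my_project   #my_project is a placeholder name
  cd /rc_scratch/$USER/my_project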
  
  
======= Slurm =======



====== Queues ======


  #if you want to run on ibg himem, you need to load the right module
  module load slurm/blanca
  
  #then in your shell script
  #SBATCH --qos=blanca-ibg
  
  #If you want to run on normal queues, then:
  module load slurm/slurm
  
  #then in your shell script, use one of the below, depending on what queue you want
  #SBATCH --qos=himem
  #SBATCH --qos=crestone
  #SBATCH --qos=janus

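To connect the queue settings above to an actual submission, here is an illustrative sequence; the script name is a placeholder, and it is assumed to contain the ''#SBATCH --qos=blanca-ibg'' line shown above:

  module load slurm/blanca     #point the Slurm commands at Blanca
  sbatch my_blanca_job.sh      #placeholder script containing "#SBATCH --qos=blanca-ibg"
  squeue -u $USER              #check that the job is queued or running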
  
  
  
  
  #To check our balance on our allocations and get the account id#
  sbank balance statement
  sacctmgr -p show user <username> #alternatively, to find the acct#
  
  #To see how busy the nodes are. For seeing how many janus nodes are available, look for the
  #number under NODES where STATE is "idle" for PARTITION "janus" and TIMELIMIT 1-00:00:00.
  sinfo -l
  
  #checking on submissions for a user
  squeue -u <username>  #To see your job statuses (R is for running, PD pending, CG completing, CD completed, F failed, TO timeout)
  squeue -u <username> -t RUNNING
  squeue -u <username> -t PENDING
  squeue -u <username> --start #Get an estimate of when jobs will start
  
  #detailed information on a queue (who is running on it, how many cpus requested, memory requested, time information, etc.)
  squeue -q blanca-ibg -o %u,%c,%e,%m,%j,%l,%L,%o,%R,%t | column -ts ','
  
  #current status of queues
  qstat -i #To see jobs that are currently pending (this is helpful for seeing if the queue is overbooked)
  qstat -r #To see jobs that are currently running
  qstat -a #To see jobs that are running OR are queued
  qstat -a -n #To see all jobs, including which nodes they are running on
  qstat -r -n #To see running jobs, and which nodes they are running on
  
  #other commands
  showq-slurm -o -U -q <partition>  #List job priority order for current user (you) in given partition
  scontrol show jobid -dd <jobid>   #List detailed information for a job (useful for troubleshooting). More info [[https://www.rc.colorado.edu/book/export/html/613|here]].
  pbsnodes -a #To look at the status of each node
  
  ### Once job has completed, you can get additional information
  sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed     #Stats on completed jobs by jobID
  sacct -u <username> --format=JobID,JobName,MaxRSS,Elapsed  #View same info for all jobs of user
  
  #To check graphically how much storage is being taken up in the /work/KellerLab folder
  xdiskusage /work/KellerLab/sizes
  
  
  
====== Running and Controlling jobs ======
  
  
  sbatch <shell.script.name.sh>     #run shell script
  sinteractive --nodelist=bnode0102 #run interactive job on node "bnode0102"
  scancel <jobid>                   #Cancel one job
  scancel -u <username>             #Cancel all jobs for user
This only needs to be done once.
  
Then launch your interactive job on the IBG himem node:
  
  module load slurm/blanca && sinteractive --qos=blanca-ibg


Or onto any free himem node:

  module load slurm/blanca && sinteractive --qos=blanca
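If the sinteractive wrapper on the cluster accepts the usual Slurm resource flags (it is typically a thin wrapper around the scheduler, but this is an assumption worth verifying), you can also request a specific walltime and core count; the values here are purely illustrative:

  module load slurm/blanca && sinteractive --qos=blanca-ibg --time=02:00:00 --ntasks=4   #2 hours, 4 cores (example values)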
  
  
  
  tabix -h chr${chr}/chr${chr}impv1.vcf.gz ${chr}:${startpos}-${endpos} | bgzip -c > chr$chr/chr${chr}impv1.${chr}_$startpos-$endpos.vcf.gz
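The ${chr}, ${startpos}, and ${endpos} pieces are ordinary shell variables; a self-contained version with made-up region coordinates (illustrative values only) would be:

  #illustrative region: set these to the chromosome and positions you actually need
  chr=22
  startpos=16000000
  endpos=17000000
  tabix -h chr${chr}/chr${chr}impv1.vcf.gz ${chr}:${startpos}-${endpos} | bgzip -c > chr${chr}/chr${chr}impv1.${chr}_${startpos}-${endpos}.vcf.gz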



======= Compiling software =======

RC intentionally keeps some header files off the login nodes to dissuade people from trying to compile on those nodes. Instead, use the janus-compile nodes to compile your software. Log in to a login node and then run

  ssh janus-compile[1-4]
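As a hedged example of what a compile session might look like (the package name, source directory, and install prefix are placeholders, and the modules you need depend on the software being built):

  ssh janus-compile1                                 #any of janus-compile1 through janus-compile4
  module load gcc                                    #load a compiler (adjust to what the package needs)
  cd ~/src/some_package                              #placeholder source directory
  ./configure --prefix=$HOME/software/some_package
  make && make install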
  
  