keller_and_evans_lab:cu_research_computing

Differences

This shows you the differences between two versions of the page.

keller_and_evans_lab:cu_research_computing [2016/07/18 19:30]
matthew_keller /* Getting information on jobs */
keller_and_evans_lab:cu_research_computing [2020/02/12 09:03] (current)
lessem [Don't save temporary files in /work/KellerLab]
Line 1: Line 1:
-General documentation for using RC is [[[https://www.rc.colorado.edu/support|on their website]]]. This document will mostly cover specific instructions for using it in the Vrieze and Keller labs. 
  
 +This document will mostly cover specific instructions for using RC in the Vrieze and Keller labs. We will try to update this, but RC is a bit of a moving target, so some of what is written below may now be outdated.
  
-======= Logging in ======= 
  
 +======= Getting started =======
 +
 +General documentation for using RC is [[https://curc.readthedocs.io/en/latest/|on their website]]. We recommend that ALL new users first read these overviews on that webpage. In particular: \\
 +Logging In <which you've already done>
 +Duo 2-factor Authentication <which you've already done>
 +Allocations
 +Node Types
 +Filesystems
 +The modules system
 +The PetaLibrary
 +Running applications with Jobs
 +Batch Jobs and Job Scripting
 +Interactive Jobs
 +Useful Slurm commands
 +Job Resource Information
 +squeue status and reason codes
 +Containerization on Summit
 +
 +
 +======= Overview of best practices =======
 +
 +[[https://docs.google.com/presentation/d/1yQToDgohYZIzwu9NL0Z5ORpmN-9qeY3cpT7RDusEg-w/edit?usp=sharing|Richard Border's slides]] from October 2019 (might be out of date)
 +
 +
 +
 +======= Logging in =======
  
-Put these settings in your ''~/.ssh/config'' file so you only have to enter your OTP once per session, instead of for every ssh connection you make.
 +Put these settings in your ''~/.ssh/config'' file so you only have to enter your OTP once per session, instead of for every ssh connection you make. See the [[https://docs.google.com/presentation/d/1FMir2LDbBJffXZ5aMjIJhRA_7HO99xrxT__AFxmgU_c/edit?usp=sharing|slides]] describing this, and instructions to do the same on [[https://www.chiark.greenend.org.uk/~sgtatham/putty/|PuTTY]] and [[https://www.bitvise.com/ssh-client|Bitvise]].
      
   # These rules only apply for connections to login.rc.colorado.edu
Line 13: Line 38:
   ControlMaster auto
   ControlPath ~/.ssh/%r@%h:%p
 +  # Keep the ssh connection open, even when the last session closes
 +  ControlPersist yes
   # X forwarding. Remove this on a Mac if you
   # don't want it to start X11 each time you
Line 24: Line 51:
  
  
-These settings should work from Mac and Linux. I'm not sure how to do the equivalent from Windows with Putty. On a Mac, those settings will cause X11 to start. If you don't want that to happen, then remove the ''ForwardX11 yes'' line.
 +These settings should work from Mac and Linux. For Windows, see the [[https://docs.google.com/presentation/d/1FMir2LDbBJffXZ5aMjIJhRA_7HO99xrxT__AFxmgU_c/edit?usp=sharing|slides]]. On a Mac, those settings will cause X11 to start. If you don't want that to happen, then remove the ''ForwardX11 yes'' line.
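With ''ControlPersist yes'', the shared master connection keeps running in the background after your last session closes. The sketch below, which assumes a reasonably recent OpenSSH client, shows one way to inspect or close that master connection by hand. The function names are hypothetical helpers; only the host name comes from the config rules on this page.

```shell
#!/bin/bash
# Host name taken from the ssh config rules on this page.
RC_LOGIN_HOST="login.rc.colorado.edu"

# Hypothetical helper: report whether a master connection is currently running.
rc_master_status() {
  ssh -O check "$RC_LOGIN_HOST"
}

# Hypothetical helper: ask the master connection to exit.
# The next ssh connection will prompt for your OTP again.
rc_master_close() {
  ssh -O exit "$RC_LOGIN_HOST"
}
```

Defining the helpers does not open any connection; call ''rc_master_status'' or ''rc_master_close'' only when you want to check or tear down the shared session.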
 + 
 +For those with access to Summit (ONLY!), here are the steps for using it:
 +
 +  #From a login node, connect to one of the nodes shas0701 through shas0715 (replace NN with 01-15):
 +  ssh -YC <uname>@shas07NN
 +
 +  #In your shell script (no need to include -A UCB00000442):
 +  #SBATCH --partition=shas
 +   
 +  #To run R: 
 +  ml load R 
 +  ml load gcc 
 +  R 
  
  
-======= Don't save temporary files in /work/KellerLab =======
 +======= Don't save temporary files in /pl/active/KellerLab =======
  
-Everything you save to /work/KellerLab is backed up automatically by the system. This is generally good, unless you're saving files that you only need for the moment, or at most for a few days. We don't need these files long term and we don't need them backed up, but they will be backed up, and **they'll count against our total storage allocation for the next year**. The last thing we want is to pay for storage for a bunch of large temporary files that are also backed up.
 +Everything you save to /pl/active/KellerLab (and /pl/active/IBG) is backed up automatically by the system. This is generally good, unless you're saving files that you only need for the moment, or at most for a few days. We don't need these files long term and we don't need them backed up, but they will be backed up, and **they'll count against our total storage allocation for the next year**. The last thing we want is to pay for storage for a bunch of large temporary files that are also backed up.
  
 If the temporary files are deleted by you after they're automatically backed up, they live on for a year before they're finally removed entirely from the system. That's one year of paying for storage for unneeded files.
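One way to keep short-lived files out of the backed-up tree is to create them in a throwaway directory and delete it when the script exits. The sketch below is only an illustration: ''SCRATCH_BASE'' is a hypothetical variable standing in for whatever non-backed-up scratch location you use, and it falls back to ''$TMPDIR'' or ''/tmp'' when unset.

```shell
#!/bin/bash
# SCRATCH_BASE is a hypothetical placeholder for a non-backed-up location;
# it defaults to $TMPDIR, or /tmp if that is unset.
SCRATCH_BASE="${SCRATCH_BASE:-${TMPDIR:-/tmp}}"

# Create a unique working directory for intermediate files.
workdir="$(mktemp -d "${SCRATCH_BASE}/kellerlab_tmp.XXXXXX")"

# Remove the directory automatically when the script exits.
trap 'rm -rf "$workdir"' EXIT

# Write intermediate output here instead of under /pl/active/KellerLab.
echo "intermediate data" > "$workdir/step1.tmp"
```

Only final results that need to be kept would then be copied into the backed-up lab directory.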
Line 63: Line 105:
  
 ======= Slurm =======
 +
 +
 +
 +====== Queues ======
 +
 +
 +#if you want to run on ibg himem, you need to load the right module
 +module load slurm/blanca
 +
 +#then in your shell script
 +#SBATCH --qos=blanca-ibg
 +
 +#If you want to run on normal queues, then:
 +module load slurm/slurm
 +
 +#then in your shell script, one of the below, depending on what queue you want
 +#SBATCH --qos=himem
 +#SBATCH --qos=crestone
 +#SBATCH --qos=janus
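
Putting the pieces above together, a minimal batch script for the ibg himem queue might look like the sketch below. Only the ''--qos'' value comes from this page; the job name, time limit, task count, and output file name are placeholder assumptions to adjust for your own job.

```shell
#!/bin/bash
# QOS from this page; run "module load slurm/blanca" before calling sbatch.
#SBATCH --qos=blanca-ibg
# Placeholder values below (assumptions, not lab defaults).
#SBATCH --job-name=example_job
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
# %j in the output file name expands to the job id.
#SBATCH --output=example_job.%j.out

# Slurm sets SLURM_JOB_NODELIST inside a job; fall back to the local host name elsewhere.
node="${SLURM_JOB_NODELIST:-$(uname -n)}"
echo "Running on ${node}"
```

Because the ''#SBATCH'' lines are ordinary comments to bash, the same file can also be run directly for a quick check before submitting it.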
 +
 +
  
  
Line 71: Line 134:
 #To check our balance on our allocations and get the account id#
 sbank balance statement
 +sacctmgr -p show user <username> #alternatively to find the acct#
  
 #To see how busy the nodes are. For seeing how many janus nodes are available, look for the
Line 76: Line 140:
 sinfo -l
  
-squeue -u <username>
 +#checking on submissions for a user
 +squeue -u <username>  #To see your job statuses (R is for running, PD pending, CG completing, CD completed, F failed, TO timeout)
 squeue -u <username> -t RUNNING
 squeue -u <username> -t PENDING
-showq-slurm -o -U -q <partition>  #List job priority order for current user (you) in given partition
-scontrol show jobid -dd <jobid>   #List detailed information for job (useful for troubleshooting)
-squeue -u mmkeller #To see your job statuses (R is for running, PD pending, CG completing, CD completed, F failed, TO timeout)
-squeue -u mmkeller --start #Get an estimate of when jobs will start
-pbsnodes -a #To look at the status of each node
 +squeue -u <username> --start #Get an estimate of when jobs will start
 +
 +#detailed information on queue (who is running on it, how many cpus requested, memory requested, time information, etc.)
 +squeue -q blanca-ibg -o %u,%c,%e,%m,%j,%l,%L,%o,%R,%t | column -ts ','
 +
 +#current status of queues
 qstat -i #To see jobs that are currently pending (this is helpful for seeing if queue is overbooked)
 qstat -r #To see jobs that are currently running
Line 90: Line 156:
 qstat -r -n #To see running jobs, and which nodes they are running on
  
 +#other commands
 +showq-slurm -o -U -q <partition>  #List job priority order for current user (you) in given partition
 +scontrol show jobid -dd <jobid>   #List detailed information for a job (useful for troubleshooting). More info [https://www.rc.colorado.edu/book/export/html/613 here].
 +pbsnodes -a #To look at the status of each node
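
When you have many jobs queued, a per-state count can be easier to read than the full ''squeue'' listing. The sketch below is a hypothetical helper that tallies state codes (R, PD, CG, and so on) read from standard input; the offline demo uses made-up state codes rather than real cluster output.

```shell
#!/bin/bash
# Hypothetical helper: read one state code per line on stdin
# and print "STATE=count" for each distinct state.
summarize_states() {
  sort | uniq -c | awk '{print $2 "=" $1}'
}

# On the cluster (not run here): -h drops the header, -o %t prints only the state code.
#   squeue -u "$USER" -h -o %t | summarize_states

# Offline demo with made-up state codes.
printf 'R\nPD\nR\nCG\n' | summarize_states
```

Keeping the counting step separate from the ''squeue'' call means the helper can be tried on any list of state codes without touching the scheduler.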
  
 ### Once job has completed, you can get additional information
Line 103: Line 173:
  
  
-====== Controlling jobs ======
 +====== Running and Controlling jobs ======
  
  
 +sbatch <shell.script.name.sh> #run shell script
 +sinteractive --nodelist=bnode0102 #run interactive job on node "bnode0102"
 scancel <jobid>                  #Cancel one job
 scancel -u <username>            #Cancel all jobs for user
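
To cancel or inspect a job later you need its job id. The sketch below shows one way to capture it at submission time using the ''--parsable'' option of ''sbatch''. The function name and the script name in the usage comment are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: submit a script and print only the numeric job id.
submit_job() {
  local script="$1"
  local raw
  # --parsable makes sbatch print "jobid" or "jobid;cluster" instead of a sentence.
  raw="$(sbatch --parsable "$script")" || return 1
  # Strip the optional ";cluster" suffix.
  echo "${raw%%;*}"
}

# Usage on the cluster (not run here; my_script.sh is a placeholder):
#   jobid="$(submit_job my_script.sh)"
#   squeue -j "$jobid"
#   scancel "$jobid"
```

Capturing the id this way avoids parsing the usual "Submitted batch job ..." message.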
keller_and_evans_lab/cu_research_computing.1468891847.txt.gz · Last modified: 2016/07/18 19:30 by matthew_keller