Basics
Look at huge files:
Basic file manipulation:
Look at folder sizes:
Remove a lot of files in different folders
List all files modified in the current directory (./) in last 9 days:
Rename multiple files in a folder
Copy a file over the network (see also rsync)
Copy entire directory over the network (see also rsync)
Change default permissions for the group
Compressing files
Running multiple jobs at once in a shell script
Pausing for a given amount of time between starting processes in a shell script
Look at all files that have changed in last xx days
Look at multiple files at same time
Refresh .bashrc or .bash_profile files
Download files using command line
Copy (or move) files that have changed within the last 5 days
Get basic information about the computer or node that you are on

Basics

moves or renames files

mv   file destination

copies files

cp    file destination

lists files in directory

ls
  ls -ltrah

look at what processes are running on server (for CPU and RAM usage)

top

look at what processes are using input/output onto hard disks

iotop

see what processes mmkeller is running

ps -u mmkeller

change/modify permissions; in this case, add read & write permissions to the group

chmod g+rw  directory

change owner of directory to newguy

chown newguy directory

change group of directory to km

chgrp km directory

Look at huge files:

After opening the file with the command below, type -S (chop long lines instead of wrapping them) and -N (show line numbers) inside less

less file

Many times you want tab delimited columns to easily read the file, here's how:

column -t file | less -SN

finding particular files:

ls -l *has*these*words*lastword | grep -v notthiswordthough

“*” means anything in between these words

or get just the last 500 lines and 'pipe' them to less:

tail -n500 file | less

Count number of rows in a file:

wc -l file

Count number of columns in a file:

first line only:

awk -F ' ' '{print NF; exit}' file

maximum number of columns on any line in the file:

awk '{ if (NF > max) max = NF } END { print max }' file
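The row and column counts above can be tried together on a small file. This is only a sketch: the file demo.txt and its contents are made up, and it is deliberately ragged so that the first-line count and the maximum differ.

```shell
# Build a tiny space-delimited file whose second row has an extra column
# (hypothetical data, written to a throwaway directory).
tmpdir=$(mktemp -d)
printf 'a b c\nd e f g\nh i\n' > "$tmpdir/demo.txt"

# Number of rows.
rows=$(wc -l < "$tmpdir/demo.txt")
# Number of columns on the first line only.
first_cols=$(awk '{print NF; exit}' "$tmpdir/demo.txt")
# Largest column count found on any line.
max_cols=$(awk '{ if (NF > max) max = NF } END { print max }' "$tmpdir/demo.txt")

echo "rows=$rows first=$first_cols max=$max_cols"
rm -r "$tmpdir"
```

If the first-line count and the maximum disagree, the file probably has ragged rows and is worth checking before using cut on it.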

Webpage on looking at large files in Unix: Look at Large Files

Basic file manipulation:

Remove the first 6 columns of a file:

cut -f 7- -d ' ' infile > outfile

Keep columns 400 through 897 of a file separated by commas:

cut -f 400-897 -d ',' infile > outfile

Keep only columns 1, 2, 3, 7, 8, & 10 of a file:

cut -f 1,2,3,7,8,10 file.ped > checkfile.ped

Keep all rows starting with the characters “UCL”:

grep '^UCL' input > output

Keep all rows that have matching characters in the file called 'IID-list':

grep -f IID-list input > output

Copy the top line of file1 and add it to the top of file2 (file1 itself is left unchanged):

head -n1 file1 > header
  cat header file2 > file3

change all 1's to 2's and 0's to 1's (important if wanting to use 0/1 allelic codes in PLINK); note that tr changes every 1 and 0 character in the file, including any in ID columns:

tr 1 2 < file1 > file2
  tr 0 1 < file2 > file3
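A single tr call can do both substitutions at once. This avoids the intermediate file and the risk of running the two steps in the wrong order (doing 0 to 1 first would turn every original 0 into a 2). A minimal sketch on made-up inline data:

```shell
# tr maps each character in the first set to the character at the same
# position in the second set, so 0 -> 1 and 1 -> 2 happen in one pass.
recoded=$(printf '0 1 1 0\n' | tr '01' '12')
echo "$recoded"
```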

An alternative way to substitute a value using sed (like find and replace to all items in a file):

sed 's/string/cheese/g' < infile > outfile

(Note, use a different file name for the outfile or else sed will return an empty file)

Using awk and perl to split a column into multiple columns:

awk '{print $6}' file | perl -pe 's{/}{\t}g' > newfile

awk grabs the 6th column, and perl converts each / character into a tab delimiter

awk 'NR>1 {print $6}' file | perl -pe 's{/}{\t}g' > newfile

the same command, but skipping the first row (the column name)

Using perl to change a column copied from .xls (carriage-return line endings) into a flat file column:

perl -pe 's/\x0D/\n/g' copied_xls_column.txt > newfile.txt

merge two files using a common ID column (first column); both files must already be sorted on that column:

join file1 file2 > file3
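Because join expects sorted input, it is usually safest to sort both files on the ID column first. A minimal sketch with two made-up files, one of them deliberately out of order:

```shell
tmpdir=$(mktemp -d)
# Hypothetical inputs; file1 is not sorted by ID.
printf 'id2 b\nid1 a\n' > "$tmpdir/file1"
printf 'id1 x\nid2 y\n' > "$tmpdir/file2"

# Sort each file on the first column, which is join's default join field.
sort -k1,1 "$tmpdir/file1" > "$tmpdir/file1.sorted"
sort -k1,1 "$tmpdir/file2" > "$tmpdir/file2.sorted"
join "$tmpdir/file1.sorted" "$tmpdir/file2.sorted" > "$tmpdir/file3"

merged=$(cat "$tmpdir/file3")
echo "$merged"
rm -r "$tmpdir"
```

Rows whose ID appears in only one file are dropped by default.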

Remove first row of a flat file:

sed '1d' filename > filename2

Alternatively, for a file with 500 rows:

tail -n499 filename > filename2
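The tail approach above needs the row count in advance. Assuming a tail that supports the -n +N form, the following sketch drops the first row regardless of file length, using made-up inline data:

```shell
# tail -n +2 starts output at line 2, i.e. everything except the first row.
without_header=$(printf 'header\nrow1\nrow2\n' | tail -n +2)
echo "$without_header"
```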

Links for file manipulation in UNIX: For working with large files, see this page: Large Files in Unix

And for text processing commands, see this page: Text Processing in Unix

Look at folder sizes:

du -h --max-depth 1

if you want to sort the above

du --max-depth 1 /home/ | sort -nr

Remove a lot of files in different folders

this will remove file ss3.out that exists in 100s of folders in the current directory

find ./ -name 'ss3.out' -exec rm {} \;

The words following the -exec option are the command that you want to execute, i.e., rm in this case.

{} is a placeholder: each filename returned by the search is substituted there.

\; is the terminating string, and is required at the end of the command.

Remove files starting with “mm” that exist in 100s of folders in the current directory (quote the pattern so the shell does not expand it before find sees it)

find ./ -name 'mm*' -exec rm {} \;

Before doing the above, you might look to see if it is going to remove the files you want!

find ./ -name 'mm*' > look
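The preview-then-delete habit can be rehearsed safely in a scratch directory. In this sketch the folder names run1 and run2 and the file names are hypothetical, and -type f is added so that only regular files are matched.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/run1" "$tmpdir/run2"
touch "$tmpdir/run1/mm_a.out" "$tmpdir/run2/mm_b.out" "$tmpdir/run2/keep.txt"

# Step 1: preview. The quoted pattern reaches find unexpanded.
find "$tmpdir" -type f -name 'mm*' -print

# Step 2: the same search, now removing each match.
find "$tmpdir" -type f -name 'mm*' -exec rm {} \;

# Only keep.txt should be left.
remaining=$(find "$tmpdir" -type f | wc -l)
echo "files left: $remaining"
rm -r "$tmpdir"
```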

List all files modified in the current directory (./) in last 9 days:

find ./ -mtime -9 -exec ls -lt {} \;

Same thing, but only R scripts modified in last 9 days:

find ./ -mtime -9 -name "*.R" -exec ls -lt {} \;

Rename multiple files in a folder

Use the rename function. This is the Perl-based rename; some systems ship a different rename with another syntax. A perl expression must come first. E.g., to change all .txt files to .bak:

rename 's/\.txt$/.bak/' *.txt

To rename all files beginning with “NEW.” and change them to “OLD.”:

rename 's/^NEW\./OLD./' NEW.*
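Where the Perl-based rename is not available, a plain shell loop with mv gives the same result. This is a sketch of that alternative, not the rename command itself; the file names are made up.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a.txt" "$tmpdir/b.txt"

# ${f%.txt} strips the trailing .txt from the name; .bak is then appended.
for f in "$tmpdir"/*.txt; do
    mv -- "$f" "${f%.txt}.bak"
done

renamed=$(ls "$tmpdir")
echo "$renamed"
rm -r "$tmpdir"
```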

Copy a file over the network (see also rsync)

scp files.to.copy user@server.colorado.edu:~/myfolder/subfolder

Copy entire directory over the network (see also rsync)

scp -r directory.to.copy user@server.colorado.edu:~/myfolder/subfolder

Change default permissions for the group

Add this line to your ~/.bashrc file:

umask 002 #let group have read/write/execute permissions

Compressing files

A single file, converting the file to a compressed file

gzip filename.ext

A single file, leaving the original file unchanged

gzip -c filename.ext > new.zipped.file.gz

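Multiple files (the compressed streams are concatenated, so gunzip gives back one combined file rather than the separate originals)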

gzip -c filename.ext filename2.ext > new.zipped.file.gz

Extract a gzipped file

gunzip new.zipped.file.gz
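A quick round trip shows that the -c form leaves the original in place. The file contents here are invented for the example.

```shell
tmpdir=$(mktemp -d)
echo "some data" > "$tmpdir/filename.ext"

# -c writes the compressed stream to stdout, so the original file is untouched.
gzip -c "$tmpdir/filename.ext" > "$tmpdir/new.zipped.file.gz"

# gunzip -c likewise decompresses to stdout and leaves the .gz file in place.
roundtrip=$(gunzip -c "$tmpdir/new.zipped.file.gz")
echo "$roundtrip"
rm -r "$tmpdir"
```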

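A directory (tar and gzip in one step; because of the z option the result is gzip-compressed, so name it .tar.gz)

tar -cvzf tar.file.name.tar.gz directory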

Extract a tar'd directory

tar -xvf tar.file.name.tar

Extract a tar.gz directory

tar -zxvf tar.file.name.tar.gz

Extract MULTIPLE tar.gz directories

for i in *.tar.gz; do tar -zxvf "$i"; done
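The create, list, and extract steps fit together as in the sketch below. The directory name results and the file note.txt are hypothetical, and -C is used so the archive stores a relative path.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/results"
echo "hello" > "$tmpdir/results/note.txt"

# Create a gzip-compressed archive; -C changes into $tmpdir first,
# so the archive contains results/... rather than an absolute path.
tar -czf "$tmpdir/results.tar.gz" -C "$tmpdir" results

# List the contents without extracting, then extract into a fresh directory.
tar -tzf "$tmpdir/results.tar.gz"
mkdir "$tmpdir/restored"
tar -xzf "$tmpdir/results.tar.gz" -C "$tmpdir/restored"

restored=$(cat "$tmpdir/restored/results/note.txt")
echo "$restored"
rm -r "$tmpdir"
```

Listing with -t before extracting is a cheap way to check where the files will land.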

Running multiple jobs at once in a shell script

 ./S800-loop1.sh &
 ./S800-loop2.sh &
 ./S800-loop3.sh &
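If the script should not finish until all background jobs are done, the shell builtin wait can be added after the last job. In this sketch the function job is a hypothetical stand-in for the ./S800-loop scripts.

```shell
tmpdir=$(mktemp -d)
# Stand-in for a real job: sleep briefly, then write a marker file.
job() { sleep 1; echo "done" > "$tmpdir/job$1.out"; }

job 1 &
job 2 &
job 3 &

# wait blocks until every background job started by this shell has exited.
wait

finished=$(ls "$tmpdir" | wc -l)
echo "finished jobs: $finished"
rm -r "$tmpdir"
```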

Pausing for a given amount of time between starting processes in a shell script

 ./S800-loop1.sh &
 sleep 45m
 ./S800-loop2.sh &
 sleep 45m
 ./S800-loop3.sh &
 sleep 45m

Look at all files that have changed in last xx days

find /directory -type f -ctime -xx | more

Look at multiple files at same time

less file1 file2 file3 
  :n forward to next file 
  :p backward to previous file

Refresh .bashrc or .bash_profile files

After you have modified one of the files above, the changes do not take effect in shells that are already open. You can either restart the computer, log out and log back in, or do this:

source ~/.bash_profile #assuming that you've modified .bash_profile

Download files using command line

lwp-download http://pngu.mgh.harvard.edu/~purcell/plink/dist/plink-1.07-x86_64.zip

OR

wget http://pngu.mgh.harvard.edu/~purcell/plink/dist/plink-1.07-x86_64.zip

Copy (or move) files that have changed within the last 5 days

find ./ -type f -mtime -5 -exec cp {} ~/new/path/folder \;

Make sure that the target folder isn't in the folder being found; i.e., that ~/new/path/folder isn't in ./. Otherwise, you'll start trying to copy the contents of the folder itself back into the folder.

Get basic information about the computer or node that you are on

cat /proc/cpuinfo
  cat /proc/cpuinfo | grep processor
  cat /proc/cpuinfo | grep processor | wc -l
