IBG Wiki


### PSYCH 3102 Behavioral Genetics
### Lab 1 – downloading and exploring a genome
### Author: Scott Vrieze

######################################
### STEP 1, get a terminal working ###
######################################
###
### If you have a PC running Windows, install CYGWIN
###   https://www.cygwin.com/
### CYGWIN will automatically install in C:/cygwin/
### (the 64-bit installer defaults to C:/cygwin64/ instead)
###
### On a mac or linux computer, open up a terminal
###

#################################################
### STEP 2, make a directory in which to work ###
#################################################
### On a Windows PC, open up Cygwin.
###
### On a Mac or linux computer, open a terminal.
###
### A window with text will open up. This box is BY FAR the most
### powerful thing on your computer. The trick is learning how to use
### it.
###
### Let's practice!

### Let's see where you are by listing the contents of the directory
ls

### Let's create a new directory called “bg”, where we can work.
mkdir bg

### Check that you created that directory by running ls again
ls

## Now, move into the bg directory
cd bg

## You should be in the bg directory. Run ls to see what's in here
ls

## The result should come up blank, because there's nothing in this
## directory yet! Let's find something interesting to put in here.
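
### Side note, not part of the original lab: ls shows what is IN a
### directory, while pwd prints WHICH directory you are in. The sketch
### below repeats the steps above inside a throwaway directory (made
### with mktemp, an assumption for illustration) so it cannot disturb
### your real files.

```shell
# Work in a throwaway directory so nothing in your home directory changes.
scratch=$(mktemp -d)
cd "$scratch"

mkdir -p bg        # -p: do not complain if bg already exists
cd bg

here=$(pwd)        # pwd prints the full path of the current directory
echo "$here"

contents=$(ls)     # empty, because the new directory has no files yet
echo "files in bg: [$contents]"
```

### The path printed by pwd should end in /bg, and the list between the
### square brackets should be empty.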

############################################
### STEP 3, download the practice genome ###
############################################
###
### The dataset is in our google drive folder, with the following
### direct link:
###   https://drive.google.com/file/d/0B608ps4vtHUaWFNOWXJqZ0tDMXc/
###
### Download the file and then move it to, on a Windows computer:
###   C:/cygwin/home/<username>/bg/
###   (or C:/cygwin64/home/<username>/bg/ if Cygwin installed there)
### On a mac:
###   /Users/<username>/bg/
### On linux:
###   /home/<username>/bg/

### Let's check and see if you got it right. Open a terminal and run
cd bg

### Then list the contents of the directory
ls
### You should see something like the following output:
###   $ ls
###   hu916767_20170324191934.txt
###
### If that's what you saw, congratulations, you put the file in the
### right place!

################################################
### STEP 4, look at the contents of the file ###
################################################
###
### In your terminal, go to the bg folder, then type
less hu916767_20170324191934.txt

### That should open the file in your terminal. You can scroll up or
### down using the arrow keys. To scroll faster you can press the
### space bar. To close the “less” session, press “q”.

### What if we just want to look at the first few lines?
head hu916767_20170324191934.txt

### The last few lines?
tail hu916767_20170324191934.txt

### How many variants are there? Try “wc -l”. This will give you the
### number of lines in the file, which is approximately the number of
### variants.
wc -l hu916767_20170324191934.txt
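
### Why only "approximately"? One likely reason is that files like this
### start with a few header lines. The sketch below is not part of the
### original lab. It ASSUMES header lines start with "#" and builds a
### tiny made-up file (tab-separated rsid, chromosome, position,
### genotype) to show the difference between counting all lines and
### counting only non-header lines.

```shell
# Build a tiny stand-in file: two '#' header lines plus three variant lines.
# The column layout and the rs numbers are made up for illustration.
toy=$(mktemp)
printf '# example header line\n# rsid\tchromosome\tposition\tgenotype\n' > "$toy"
printf 'rs0000001\t1\t1000\tAA\nrs0000002\t1\t2000\tAG\nrs0000003\t2\t3000\tGG\n' >> "$toy"

total=$(wc -l < "$toy")            # counts every line, headers included
variants=$(grep -vc '^#' "$toy")   # -v: keep lines NOT starting with #, -c: count them
echo "all lines: $total, non-header lines: $variants"
rm "$toy"
```

### To try the same idea on the real file, replace "$toy" with
### hu916767_20170324191934.txt, after checking with head that its
### header lines really do start with "#".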

### That's a lot of variants. How can I extract a certain variant,
### without scrolling through the whole file?
grep 'rs9430244' hu916767_20170324191934.txt

### Try another one
grep 'rs8176719' hu916767_20170324191934.txt

### Huh, what does DD mean? I thought nucleotides could be A, C, T, or
### G. Also – Google that variant. What phenotype does it affect? What
### phenotype does this person have?

### We can also grab both variants, if we wanted to
grep -E 'rs8176719|rs9430244' hu916767_20170324191934.txt
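
### One caveat, offered as a sketch rather than as part of the original
### lab: grep matches its pattern anywhere on a line, so a short ID can
### also match longer IDs that begin the same way. The -w option asks
### for whole-word matches only. The IDs below are made up, and whether
### this matters for the two variants above is not something the lab
### states.

```shell
# Three made-up variant lines whose IDs share the prefix rs123.
toy=$(mktemp)
printf 'rs123\t1\t1000\tAA\nrs1234\t1\t2000\tAG\nrs12345\t2\t3000\tGG\n' > "$toy"

loose=$(grep -c 'rs123' "$toy")    # substring match: also hits rs1234 and rs12345
exact=$(grep -cw 'rs123' "$toy")   # -w: count only lines where rs123 is a whole word
echo "substring matches: $loose, whole-word matches: $exact"
rm "$toy"
```

### Here -c makes grep print the number of matching lines instead of
### the lines themselves.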

######################################
### STEP 5, join commands together ###
######################################
###
### Now we'll do something called “piping”. Piping allows you to run a
### command on a file, then send the output of that command to a new
### command, and possibly on to a new command. Let's give it a try.

### We saw above that “grep” allows you to extract all lines that
### match a certain character string. We also saw that “wc -l” counts
### the number of lines. Can we combine these two commands?
###
### Let's extract all the variants that are “GG”
grep 'GG' hu916767_20170324191934.txt

### OK, that didn't work so well, the output just kept scrolling past
### on our screen. Instead, we'll use a pipe to send that output to wc -l
grep 'GG' hu916767_20170324191934.txt | wc -l

### How about all SNPs that are homozygous?
grep -E 'GG|CC|TT|AA' hu916767_20170324191934.txt | wc -l

### How about all the variants, indels included, that are homozygous?
grep -E 'GG|CC|TT|AA|II|DD' hu916767_20170324191934.txt | wc -l

### How many indels are there?
grep -E 'II|DD|ID|DI' hu916767_20170324191934.txt | wc -l
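
### Pipes can chain more than two commands. The sketch below is not
### part of the original lab. It ASSUMES a tab-separated layout with
### the genotype in column 4 and "#" header lines, and uses a made-up
### five-variant file to tabulate every genotype in one pass.

```shell
# Made-up stand-in file: one header line and five variant lines.
toy=$(mktemp)
printf '# rsid\tchromosome\tposition\tgenotype\n' > "$toy"
printf 'rs101\t1\t1000\tAA\nrs102\t1\t2000\tAG\nrs103\t2\t3000\tGG\n' >> "$toy"
printf 'rs104\t2\t4000\tAA\nrs105\t3\t5000\tDD\n' >> "$toy"

# grep drops header lines, cut keeps column 4, sort puts equal genotypes
# next to each other, and uniq -c counts each run of equal lines.
table=$(grep -v '^#' "$toy" | cut -f4 | sort | uniq -c)
echo "$table"

# Same idea ending in wc -l: count lines whose genotype column is exactly
# one of the doubled-letter codes.
homozygous=$(grep -v '^#' "$toy" | cut -f4 | grep -E '^(AA|CC|GG|TT|II|DD)$' | wc -l)
echo "homozygous: $homozygous"
rm "$toy"
```

### Restricting the match to the genotype column avoids counting a line
### just because two matching letters happen to appear somewhere else
### on it.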

### Now, a little trickier. How many variants are on chromosome 1? Is
### this command going to work? Why or why not?
grep -E '1' hu916767_20170324191934.txt
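
### Once you have thought about the question, here is one way to check
### your reasoning. This sketch is not part of the original lab. It
### ASSUMES tab-separated columns with the chromosome in column 2, and
### uses four made-up lines to compare a match anywhere on the line
### against a match on the chromosome column alone.

```shell
# Made-up stand-in file: only the first line is on chromosome 1.
toy=$(mktemp)
printf 'rs101\t1\t1000\tAA\nrs102\t2\t2000\tAG\n' > "$toy"
printf 'rs103\t11\t3000\tGG\nrs104\t3\t4000\tCT\n' >> "$toy"

anywhere=$(grep -c '1' "$toy")                    # a "1" anywhere on the line
chrom1=$(cut -f2 "$toy" | grep -x '1' | wc -l)    # column 2 is exactly "1"
echo "lines containing a 1: $anywhere, lines on chromosome 1: $chrom1"
rm "$toy"
```

### The -x option makes grep match only when the whole line equals the
### pattern, so chromosome 11 is not counted as chromosome 1.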