Expression quantitative trait loci (eQTL) analysis identifies genetic variants associated with gene expression levels. Proper quality control (QC) is essential to ensure reliable results. Below, we outline standard QC steps for genotype data, RNA-seq count data, and covariates, along with checks for proper data integration.
Genotype data typically comes in formats such as VCF, PLINK (.bed/.bim/.fam), or similar.
The RNA-seq count file used for eQTL analysis is a gene expression count matrix (TSV/CSV format). The matrix represents raw or normalized gene expression counts for multiple samples.
Structure of the File:
1. Rows: Genes (or transcripts, depending on the quantification
method).
2. Columns: Individual samples (biological replicates, conditions,
etc.).
3. First column: Unique gene/transcript identifiers (e.g., Ensembl IDs
or gene symbols).
4. Remaining columns: Integer values representing raw or normalized
counts per sample.
You are going to be performing an eQTL analysis today. You will be using data from the GEUVADIS project. The GEUVADIS (Genetic European Variation in Health and Disease) Project is an international initiative aimed at characterizing the impact of genetic variation on gene expression. By integrating whole-genome sequencing with RNA sequencing data from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, GEUVADIS has provided a rich resource for studying expression quantitative trait loci (eQTLs). This dataset has been instrumental in understanding how genetic variants influence gene regulation across populations and has contributed to advances in functional genomics, disease mapping, and precision medicine. The project’s findings serve as a foundation for linking genetic variation to molecular phenotypes, enhancing our ability to interpret genome-wide association studies (GWAS) and their implications for human health.
You have been provided with 3 files for this practical:
geuvadis_chr22_genotypes.txt: QC’d genotype data from chr22 (MAF > 5%, SNVs only, HWE p-value < 1e-6 removed).
geuvadis_chr22_rnaseq.txt: PEER-residualized RPKM on chr22 genes (some non-coding genes removed).
geuvadis_covariates.txt: Sample covariates (including sex and genotype PCs).
Load required R packages for the practical and set your working directory.
# Load necessary libraries
library(data.table) # data I/O
library(dplyr) # data frame manipulation
library(stringr) # string manipulation
library(MatrixEQTL) # eQTL calculation
library(coloc) # Colocalization analysis
library(locuszoomr) # Locuszoom plots
library(EnsDb.Hsapiens.v75) # Gene mapping for locuszoom plots
# Set working directory
#setwd("~/practicals/4.3.GenomeAnnotation_WeiZhou/final")
You will now load the genotype, RNA-Seq, and covariate files and answer some basic questions about their contents.
Genotyping was performed through low-coverage whole-genome sequencing.
The genotype data was filtered using PLINK2.
Original genotypes were downloaded from the 1000 Genomes Project.
The dataset has been restricted to subjects from the YRI (Yoruba in Ibadan, Nigeria) population.
geno = fread("/usr/local/data/geuvadis_chr22_genotypes.txt", data.table = F)
head(geno[,1:10])
Q1: How many samples are there? A: 108
nrow(geno)
## [1] 108
Q2: How many variants are included? A: 95,190
ncol(geno) - 6
## [1] 95190
The RNA-Seq data was obtained from the GEUVADIS project.
It contains normalized gene expression counts.
rna = fread("/usr/local/data/geuvadis_chr22_rnaseq.txt", data.table = F)
head(rna)[,1:10]
Q3: How many subjects are there? A: 462
ncol(rna) - 4
## [1] 462
Q4: How many genes are there? A: 420
nrow(rna)
## [1] 420
Q5: Confirm that all genes are on chromosome 22.
table(rna$Chr)
##
## 22
## 420
Covariate information was downloaded from the 1000 Genomes Project.
Genetic principal components were generated previously for the 2024 IGES tutorial on PRS calculation.
covar = fread("/usr/local/data/geuvadis_covariates.txt", data.table = F)
head(covar)
Q6: How many subjects are included? A: 4,978
nrow(covar)
## [1] 4978
Q7: How many columns are included? A: 14
ncol(covar)
## [1] 14
Q8: Are there any missing data in the covariate file? A: Yes
Q9: What are some of the options you could do if you had missing values in the matrix?
For our purposes, we will subset to samples present across all 3 files. This will remove missing data from the covariates file.
Q10: Do all files have the expected format?
Q11: Do all files have the same set of subjects? A: No
MatrixEQTL analysis requires 3 files: genotypes, normalized expression, and covariates. Now you will focus on preparing the genotypes, RNA-Seq, and covariate data for use by MatrixEQTL.
To prepare genotype data for MatrixEQTL, we need to prepare 2 files:
# Extract genotype values
geno_vals = geno %>% dplyr::select(IID, starts_with("22"))
rownames(geno_vals) = geno_vals$IID
geno_vals$IID = NULL
geno_vals = geno_vals[rownames(geno_vals) %in% colnames(rna), ]
geno_vals = as.data.frame(t(geno_vals))
# Filter rows where each homozygous genotype appears at least 4 times
geno_vals = geno_vals[rowSums(geno_vals == 0) >= 3 & rowSums(geno_vals == 2) >= 3, ]
# Write out formatted genotypes
fwrite(geno_vals, "geno_vals.txt", row.names = T)
# genotype locus information
geno_locs = str_split(rownames(geno_vals), ":", simplify = T)
geno_locs = as.data.frame(cbind(rownames(geno_vals), geno_locs))
colnames(geno_locs) = c("snp", "chr", "pos")
geno_locs$pos = str_split(geno_locs$pos, "\\_", simplify = T)[,1]
# Write out formatted genotype positions
fwrite(geno_locs, "geno_locs.txt")
To prepare RNA-Seq data for MatrixEQTL, we need to prepare 2 files:
rna_vals = rna[, c(2, 5:ncol(rna))]
rownames(rna_vals) = rna_vals$Gene_Symbol
rna_vals$Gene_Symbol = NULL
rna_vals = rna_vals[, colnames(rna_vals) %in% colnames(geno_vals)] #445
# Write out formatted covariates
fwrite(rna_vals, "rna_vals.txt", row.names = T)
# RNA locus information
rna_locs = rna[, 2:4]
rna_locs$Coord = rna_locs$Coord - 1e6
rna_locs$Coord2 = rna_locs$Coord + 2e6
names(rna_locs) = c("geneid", "chr", "s1", "s2")
# Write out formatted covariates
fwrite(rna_locs, "rna_locs.txt")
We will use sex and 5 genetic principal components as covariates in modeling.
rownames(covar) = covar$`Sample name`
covar = covar %>% dplyr::select(Sex, PC1, PC2, PC3, PC4, PC5)
covar = covar[rownames(covar) %in% colnames(rna_vals), , drop = F]
# Convert sex to a numeric (binary) variable
covar$Sex[covar$Sex == "male"] = 0
covar$Sex[covar$Sex == "female"] = 1
# Order covariates by sample name
covar = covar[order(rownames(covar)), ]
# Confirm sample order matches between files
all(rownames(covar) == colnames(rna_vals))
## [1] TRUE
# Write out formatted covariates
fwrite(as.data.frame(t(covar)), "covariates.txt")
You are going to perform both cis-eQTL and trans-eQTL analyses simultaneously. A cis-eQTL analysis focuses on testing SNPs physically near (or in) the gene being tested. The cis definition must be specified. A common cis-definition is: +/-Mb from the TSS, such that for a given gene, only SNPs inside the specified window will be tested for association with gene expression. Because each gene has its’ own location in the genome, the set of SNPs being tested varies by gene. A cis-eQTL analysis focuses on regions surrounding the gene that are most likely to harbor regulatory elements that affect the gene. Where the cis-eQTL analysis focuses on SNPs physically close to a given gene, a trans-eQTL analysis will test for associations with all SNPs that are not close to the gene (ie on the same chromosome, but far away, or on different chromosomes. A trans-eQTL analysis involves conducting many statistical tests (MANY! SNPs per gene x many genes), thus multiple testing correction is imperative to control for false positive associations.
You will use the software package MatrixEQTL, designed for fast eQTL analysis on large datasets. MatrixEQTL can test for association between genotype and gene expression using linear regression with either additive or ANOVA genotype effects. The models can include covariates to account for factors as population stratification, sex, and clinical/experimental variables variables. For more details see Shabalin (2012) [LINK].
To perform both eQTL analyses, you will specify the following model:
PEER-normalized gene expression ~ Sex + PC1 + PC2 + PC3 + PC4 + PC5
snps = SlicedData$new()
snps$fileDelimiter = ","
snps$fileOmitCharacters = "NA"
snps$fileSkipRows = 1
snps$fileSkipColumns = 1
snps$fileSliceSize = 2000
snps$LoadFile("geno_vals.txt")
## Rows read: 2,000
## Rows read: 4,000
## Rows read: 6,000
## Rows read: 8,000
## Rows read: 10,000
## Rows read: 12,000
## Rows read: 14,000
## Rows read: 16,000
## Rows read: 18,000
## Rows read: 20,000
## Rows read: 22,000
## Rows read: 24,000
## Rows read: 26,000
## Rows read: 28,000
## Rows read: 30,000
## Rows read: 32,000
## Rows read: 34,000
## Rows read: 36,000
## Rows read: 38,000
## Rows read: 40,000
## Rows read: 42,000
## Rows read: 44,000
## Rows read: 46,000
## Rows read: 48,000
## Rows read: 50,000
## Rows read: 52,000
## Rows read: 54,000
## Rows read: 56,000
## Rows read: 56684 done.
# Read in genotype positions
snpspos = read.table("geno_locs.txt", sep = ",", header = T)
gene = SlicedData$new()
gene$fileDelimiter = ","
gene$fileOmitCharacters = "NA"
gene$fileSkipRows = 1
gene$fileSkipColumns = 1
gene$fileSliceSize = 2000
gene$LoadFile("rna_vals.txt")
## Rows read: 420 done.
# Read in gene positions
genepos = read.table("rna_locs.txt", sep = ",", header = T)
# Read in covariates
cvrt = SlicedData$new()
cvrt$fileDelimiter = ","
cvrt$fileOmitCharacters = "NA"
cvrt$fileSkipRows = 1
cvrt$fileSkipColumns = 0
cvrt$LoadFile("covariates.txt")
## Rows read: 6 done.
# Run MatrixEQTL
me = Matrix_eQTL_main(
snps = snps,
gene = gene,
cvrt = cvrt,
output_file_name = "trans_eqtls.txt",
pvOutputThreshold = 1e-3, # p-value output threshold for trans-eqtls
useModel = modelLINEAR,
errorCovariance = numeric(),
verbose = TRUE,
output_file_name.cis = "cis_eqtls.txt",
pvOutputThreshold.cis = 1e-3, # p-value output threshold for cis-eqtls
snpspos = snpspos,
genepos = genepos,
cisDist = 1e6, # distance threshold for cis associations
pvalue.hist = "qqplot",
min.pv.by.genesnp = FALSE,
noFDRsaveMemory = FALSE)
## Matching data files and location files
## 420 of 420 genes matched
## 56684 of 56684 SNPs matched
## Task finished in 0.011 seconds
## Reordering genes
## Task finished in 1.368 seconds
## Processing covariates
## Task finished in 0.002 seconds
## Processing gene expression data (imputation, residualization)
## Task finished in 0.002 seconds
## Creating output file(s)
## Task finished in 0.042 seconds
## Performing eQTL analysis
## 3.44% done, 40 cis-eQTLs, 1,054 trans-eQTLs
## 6.89% done, 142 cis-eQTLs, 1,819 trans-eQTLs
## 10.34% done, 243 cis-eQTLs, 2,635 trans-eQTLs
## 13.79% done, 523 cis-eQTLs, 3,793 trans-eQTLs
## 17.24% done, 583 cis-eQTLs, 4,309 trans-eQTLs
## 20.68% done, 786 cis-eQTLs, 5,395 trans-eQTLs
## 24.13% done, 1,104 cis-eQTLs, 6,173 trans-eQTLs
## 27.58% done, 1,213 cis-eQTLs, 6,877 trans-eQTLs
## 31.03% done, 1,392 cis-eQTLs, 7,785 trans-eQTLs
## 34.48% done, 1,467 cis-eQTLs, 8,576 trans-eQTLs
## 37.93% done, 1,600 cis-eQTLs, 9,278 trans-eQTLs
## 41.37% done, 1,690 cis-eQTLs, 9,953 trans-eQTLs
## 44.82% done, 1,764 cis-eQTLs, 10,663 trans-eQTLs
## 48.27% done, 1,785 cis-eQTLs, 11,436 trans-eQTLs
## 51.72% done, 1,908 cis-eQTLs, 12,824 trans-eQTLs
## 55.17% done, 2,008 cis-eQTLs, 13,475 trans-eQTLs
## 58.62% done, 2,175 cis-eQTLs, 14,164 trans-eQTLs
## 62.06% done, 2,308 cis-eQTLs, 15,107 trans-eQTLs
## 65.51% done, 2,536 cis-eQTLs, 15,670 trans-eQTLs
## 68.96% done, 2,949 cis-eQTLs, 16,335 trans-eQTLs
## 72.41% done, 3,064 cis-eQTLs, 17,031 trans-eQTLs
## 75.86% done, 3,159 cis-eQTLs, 17,839 trans-eQTLs
## 79.31% done, 3,235 cis-eQTLs, 18,704 trans-eQTLs
## 82.75% done, 3,466 cis-eQTLs, 19,876 trans-eQTLs
## 86.20% done, 3,517 cis-eQTLs, 20,541 trans-eQTLs
## 89.65% done, 3,571 cis-eQTLs, 21,413 trans-eQTLs
## 93.10% done, 3,683 cis-eQTLs, 22,201 trans-eQTLs
## 96.55% done, 3,816 cis-eQTLs, 22,917 trans-eQTLs
## 100.00% done, 3,944 cis-eQTLs, 23,162 trans-eQTLs
## Task finished in 1.495 seconds
##
You will now do some preliminary evaluation of the eQTLs you have generated.
# Check the structure of results
str(me)
## List of 5
## $ time.in.sec: Named num 2.71
## ..- attr(*, "names")= chr "elapsed"
## $ param :List of 11
## ..$ output_file_name : chr "trans_eqtls.txt"
## ..$ pvOutputThreshold : num 0.001
## ..$ useModel : int 117348
## ..$ errorCovariance : num[0 , 1]
## ..$ verbose : logi TRUE
## ..$ output_file_name.cis : chr "cis_eqtls.txt"
## ..$ pvOutputThreshold.cis: num 0.001
## ..$ cisDist : num 1e+06
## ..$ pvalue.hist : chr "qqplot"
## ..$ min.pv.by.genesnp : logi FALSE
## ..$ dfFull : num 79
## $ all :List of 4
## ..$ ntests : num 23807280
## ..$ neqtls : num 27106
## ..$ hist.bins : num [1:6175] 0.00 2.24e-309 2.51e-309 2.82e-309 3.16e-309 ...
## ..$ hist.counts: num [1:6174] 0 0 0 0 0 0 0 0 0 0 ...
## $ trans :List of 5
## ..$ ntests : num 21306927
## ..$ neqtls : num 23162
## ..$ eqtls :'data.frame': 23162 obs. of 6 variables:
## .. ..$ snps : chr [1:23162] "22:33006654_C" "22:33007374_G" "22:33007514_G" "22:33008121_T" ...
## .. ..$ gene : chr [1:23162] "IGLL3P" "IGLL3P" "IGLL3P" "IGLL3P" ...
## .. ..$ statistic: num [1:23162] -6.53 -6.53 -6.53 -6.53 -6.4 ...
## .. ..$ pvalue : num [1:23162] 5.85e-09 5.85e-09 5.85e-09 5.85e-09 1.03e-08 ...
## .. ..$ FDR : num [1:23162] 0.0311 0.0311 0.0311 0.0311 0.0437 ...
## .. ..$ beta : num [1:23162] -0.33 -0.33 -0.33 -0.33 -0.319 ...
## ..$ hist.bins : num [1:6175] 0.00 2.24e-309 2.51e-309 2.82e-309 3.16e-309 ...
## ..$ hist.counts: num [1:6174] 0 0 0 0 0 0 0 0 0 0 ...
## $ cis :List of 5
## ..$ ntests : num 2500353
## ..$ neqtls : num 3944
## ..$ eqtls :'data.frame': 3944 obs. of 6 variables:
## .. ..$ snps : chr [1:3944] "22:46687220_G" "22:46688208_C" "22:46688321_T" "22:46689229_C" ...
## .. ..$ gene : chr [1:3944] "TTC38" "TTC38" "TTC38" "TTC38" ...
## .. ..$ statistic: num [1:3944] -16.8 -16.8 -16.8 -16.8 -16.3 ...
## .. ..$ pvalue : num [1:3944] 7.43e-28 7.43e-28 7.43e-28 7.43e-28 5.38e-27 ...
## .. ..$ FDR : num [1:3944] 4.64e-22 4.64e-22 4.64e-22 4.64e-22 2.69e-21 ...
## .. ..$ beta : num [1:3944] -5.07 -5.07 -5.07 -5.07 -5.08 ...
## ..$ hist.bins : num [1:6175] 0.00 2.24e-309 2.51e-309 2.82e-309 3.16e-309 ...
## ..$ hist.counts: num [1:6174] 0 0 0 0 0 0 0 0 0 0 ...
## - attr(*, "class")= chr [1:2] "list" "MatrixEQTL"
You will now plot a quantile-quantile (QQ) plot, which visually compares the distribution of observed p-values from your analysis to the expected uniform distribution under the null hypothesis. This plot helps assess whether there is an enrichment of small p-values, indicating potential true associations, or if the distribution follows the null expectation. A well-calibrated analysis should show most points aligning along the diagonal reference line, while deviations (particularly at the lower end of p-values) suggest significant signals. Systematic inflation or deflation of p-values can indicate confounding effects, population structure, or technical biases, which may require further investigation and correction. MatrixEQTL provides functionality that automatically returns a QQ plot stratified by cis and trans eQTLs.
# Generate QQ plots
plot(me)
Q12: Do cis and trans eQTLs seem well-calibrated? A: Yes
Q13: Do cis or trans eQTLs demonstrate stronger hits? A: cis-eQTLs
The title of the QQ plot displays how many tests were performed for cis- and trans-eqtls, respectively.
Q14: How many cis- and trans- associations were tested? A: 2,500,353 cis, 21,306,927 trans
# Create objects for cis and trans eqtls
cis = me$cis$eqtls
trans = me$trans$eqtls
Q15: What is the smallest nominal p-value you’ve found? A: 7.43e-28
Q16: What is the smallest FDR-adjusted p-value you’ve found? A: 4.64e-22
cis[1, ]
Cis-eQTL analysis tests numerous SNP-gene pairs, increasing the risk of false positives. MatrixEQTL controls the false discovery rate (FDR) using the q-value method, which estimates the proportion of false positives among significant findings. Q-values are less conservative than the Bonferroni correction and balance statistical power and false positive control, making them ideal for large-scale eQTL studies. A threshold of q < 0.05 is typically used to identify significant eQTLs while minimizing false associations.
Q17: How many significant (FDR < 0.05) cis-eQTLs (SNP-gene pairs) have you found? A: 510
nrow(cis[cis$FDR <= 0.05, ])
## [1] 510
Q18: How many cis eGenes have you found? A: 40
length(table(cis[cis$FDR <= 0.05, ]$gene))
## [1] 40
Q19: How many trans-eQTLs (SNP-gene pairs) have you found? A: 5
nrow(trans[trans$FDR <= 0.05, ])
## [1] 5
Q20: How many trans eGenes have you found? A: 1
length(table(trans[trans$FDR <= 0.05, ]$gene))
## [1] 1
head(cis)
head(trans)
You will now calculate distance between SNPs and eGenes. Once you’ve done so, plot histograms of distances for both cis and trans eQTLs.
# define positions
cis$chrom = str_split(cis$snps, ":", simplify = T)[,1]
cis$pos = str_split(cis$snps, ":", simplify = T)[,2]
cis$pos = str_split(cis$pos, "_", simplify = T)[,1]
trans$chrom = str_split(trans$snps, ":", simplify = T)[,1]
trans$pos = str_split(trans$snps, ":", simplify = T)[,2]
trans$pos = str_split(trans$pos, "_", simplify = T)[,1]
cis$distance = as.numeric(cis$pos) - as.numeric(genepos$s1[match(cis$gene, genepos$geneid)] + 1e6)
trans$distance = as.numeric(trans$pos) - as.numeric(genepos$s1[match(trans$gene, genepos$geneid)] + 1e6)
# Plot histograms of distances for cis and trans eQTLs
hist(cis$distance, main = "Cis eQTL Distances", xlab = "Distance from TSS (bp)", col = "red", breaks = 50)
Q21: Do cis-eQTLs appear enriched for a certain distance? A: Gene start sites
Now plot trans-eQTL distances.
hist(trans$distance, main = "Trans eQTL Distances", xlab = "Distance from TSS (bp)", col = "blue", breaks = 50)
Next you are going to visualize some of the cis-eQTLs. The idea is to select a specific SNP-gene pair and plot gene expression by genotype.
# Extract top eGenes for cis and trans
top_cis_gene = cis$gene[1]
top_trans_gene = trans$gene[1]
geno_sub_cis = geno_vals[rownames(geno_vals) %in% cis$snps[1], , drop = FALSE]
rna_sub_cis = rna_vals[rownames(rna_vals) %in% top_cis_gene, , drop = FALSE]
df_cis = data.frame(SNP = as.factor(t(geno_sub_cis)), Gene = as.numeric(t(rna_sub_cis)))
boxplot(Gene ~ SNP, data = df_cis, main = paste("Expression of", top_cis_gene, "by Genotype"))
geno_sub_trans = geno_vals[rownames(geno_vals) %in% trans$snps[1], , drop = FALSE]
rna_sub_trans = rna_vals[rownames(rna_vals) %in% top_trans_gene, , drop = FALSE]
df_trans = data.frame(SNP = as.factor(t(geno_sub_trans)), Gene = as.numeric(t(rna_sub_trans)))
boxplot(Gene ~ SNP, data = df_trans, main = paste("Expression of", top_trans_gene, "by Genotype"))
Q22: Which allele is associated with higher gene expression of TTC38? A: the minor allele
Q23: Is the eQTL effect size larger for the TTC38 cis-eQTL or for the IGLL3P trans-eQTL? Hint: the eQTL effect size can be quantified as the slope of the regression, or the log allelic fold change (aFC), which represents the fold difference in gene expression between haplotypes carrying the reference and alternative alleles at a given genetic variant. A: TTC38
Another way to visualize eQTL results is to plot summary statistics along the genome. In GWAS, we typically use a Manhattan plot to display associations across the entire genome. However, for cis-eQTL analysis, we are often interested in specific regions surrounding a gene, as each gene has its own cis-window (~20K genes in total). To explore associations at the gene level, LocusZoom is a widely used tool for visualizing genetic associations in their genomic context.
You will generate LocusZoom plots for both the top cis-eQTL and top trans-eQTL associations.
First, extract summary statistics for all SNP-gene pairs tested for the gene of interest from the MatrixEQTL output file.
Next, you will generate a LocusZoom plot for each gene.
To further refine association signals, you can use LDlink to retrieve LD information. LDlink requires an API token, which you can obtain by registering an account at the following link: https://ldlink.nih.gov/?tab=apiaccess. After receiving your token via email, replace the ‘token =’ argument in the code below with your access token.
cis$snps = stringr::str_split(cis$snps, "_", simplify = T)[,1]
cis$snps = paste0("chr", cis$snps)
# Plot top cis eGene
loc = locus(data = cis[cis$gene == cis$gene[1], ], gene = cis$gene[1], flank = 1e5, ens_db = "EnsDb.Hsapiens.v75")
## TTC38, chromosome 22, position 46563858 to 46789905
## 181 SNPs/datapoints
summary(loc)
## Gene TTC38
## Chromosome 22, position 46,563,858 to 46,789,905
## 181 SNPs/datapoints
## 7 gene transcripts
## 7 protein_coding
## Ensembl version: 75
## Organism: Homo sapiens
## Genome build: GRCh37
loc = link_LD(loc, pop = "YRI", token = "47d02c2de92e")
##
## LDlink server is working...
## Matched 178 SNPs (26.7 secs)
loc = link_recomb(loc, table = "hapMapRelease24YRIRecombMap")
## Retrieving recombination data from UCSC
locus_plot(loc)
Q24: Where, relative to TTC38, are the strongest associations? A: gene body
# Trans
trans$snps = stringr::str_split(trans$snps, "_", simplify = T)[,1]
trans$snps = paste0("chr", trans$snps)
# Plot top trans eGene
loc = locus(data = trans[trans$gene == trans$gene[1], ], gene = trans$gene[1], flank = c(1e4, 8e6), ens_db = "EnsDb.Hsapiens.v75")
## IGLL3P, chromosome 22, position 25704223 to 33716047
## 6 SNPs/datapoints
summary(loc)
## Gene IGLL3P
## Chromosome 22, position 25,704,223 to 33,716,047
## 6 SNPs/datapoints
## 240 gene transcripts
## 89 protein_coding, 39 pseudogene, 34 lincRNA, 33 antisense, 9 misc_RNA, 9 snRNA, 8 sense_intronic, 7 miRNA, 4 rRNA, 3 snoRNA, 2 IG_C_pseudogene, 2 IG_V_pseudogene, 1 processed_transcript
## Ensembl version: 75
## Organism: Homo sapiens
## Genome build: GRCh37
loc = link_LD(loc, pop = "YRI", token = "47d02c2de92e")
##
## LDlink server is working...
## Matched 6 SNPs (23.7 secs)
loc = link_recomb(loc, table = "hapMapRelease24YRIRecombMap")
## Retrieving recombination data from UCSC
locus_plot(loc, filter_gene_name = "IGLL3P") # only plot eGene for gene track due to large chromosomal window
Q25: How far is the SNP from IGLL3P? A: ~ 7 Mb
Now that you have identified eQTLs, the next step is to explore the SNPs and eGenes using publicly available databases. These resources provide insights into the functional context of genetic variants and their role in gene regulation and disease.
Use the UCSC genome browser to visualize the genomic location of your top cis-eQTL.
Q26: What rsid is associated with our variant of interest? A: rs73886792
This rsID can now be used for further exploration in Open Targets Genetics.
Use Open Targets Genetics to investigate associations between your variant and gene expression, disease traits, and drug targets.
Q27: Which population has the highest effect allele frequency? A: African/African-American
Q28: What genes does this variant act as an eQTL for? A: TTC38, TRMU, PKDREJ
Use the GTEx Portal to analyze tissue-specific gene expression and regulatory variants.
Q29: What tissue is TTC38 most highly expressed in? A: Liver
Q30: What cell type is TTC38 most frequently expressed in? A: Adipocytes
Q31: What is the top single tissue eqtl? A: chr22_46292311_C_A_b38; rs6008552
Q32: What tissue is this eQTL in? A: Artery - Tibial
Q33: What is the top single tissue sQTL? A: chr22_46292311_C_A_b38; rs6008552
Q34: What tissue is this sQTL in? A: Nerve - Tibial