import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
from math import log, isnan
%matplotlib inline
Import hail and create the HailContext if you haven't already done so.
import hail
hc = hail.HailContext()
The Hail expression language is used everywhere in Hail: filtering conditions, describing covariates and phenotypes, generating synthetic data, plotting, exporting, etc. You can evaluate a Hail expression with the HailContext method eval_expr_typed. It returns a tuple containing the result of evaluating the expression and the type of the expression. Use eval_expr if you just want the value. We'll use eval_expr_typed throughout so you can become more comfortable with types in Hail.
Let's start with simple primitive types: Boolean, Int, Double, String. Hail expressions are passed as Python strings to Hail methods.
hc.eval_expr_typed('true') # the Boolean literals are true and false
The return value is True, not true. Why? When values are returned by Hail methods, they are automatically converted to the corresponding Python value.
hc.eval_expr_typed('123')
hc.eval_expr_typed('123.45')
String literals are denoted with double-quotes.
hc.eval_expr_typed('"Hello, world"')
Like R, all values in Hail can be missing. Most operations, like addition, return missing if any of their inputs is missing. There are a few special operations for manipulating missing values. There is also a missing literal, NA, but you have to specify its type. Missing Hail values are converted to None in Python.
hc.eval_expr_typed('NA: Int') # missing Int
hc.eval_expr_typed('NA: Dict[Genotype, Double]')
hc.eval_expr_typed('1 + NA: Int')
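The propagation rule described above can be mimicked in plain Python, with None standing in for a missing Hail value. This is only an analogy to build intuition, not how Hail is implemented; the helper name add_missing_aware is made up for this sketch.

```python
from typing import Optional


def add_missing_aware(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Return a + b, or None (standing in for Hail's NA) if either input is missing."""
    if a is None or b is None:
        return None
    return a + b


print(add_missing_aware(1, 2))     # both inputs defined
print(add_missing_aware(1, None))  # one input missing, so the result is missing
```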
You can test missingness with isDefined and isMissing.
hc.eval_expr_typed('isDefined(1)')
hc.eval_expr_typed('isDefined(NA: Int)')
hc.eval_expr_typed('isMissing(NA: Double)')
orElse lets you convert missing to a default value, and orMissing lets you turn a value into missing based on a condition.
hc.eval_expr_typed('orElse(5, 2)')
hc.eval_expr_typed('orElse(NA: Int, 2)')
hc.eval_expr_typed('orMissing(true, 5)')
hc.eval_expr_typed('orMissing(false, 5)')
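As a rough mental model, the two functions behave like the plain-Python helpers below, where None plays the role of a missing value. The names or_else and or_missing are hypothetical stand-ins written for this sketch; they are not part of the Hail Python API.

```python
def or_else(value, default):
    """Return value unless it is missing (None), in which case return default."""
    return default if value is None else value


def or_missing(condition, value):
    """Return value when condition is true, otherwise missing (None)."""
    return value if condition else None


print(or_else(5, 2))         # value is defined, so it is kept
print(or_else(None, 2))      # value is missing, so the default is used
print(or_missing(True, 5))   # condition holds, so the value is kept
print(or_missing(False, 5))  # condition fails, so the result is missing
```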
You can assign a value to a variable with a let expression. Here is an example.
hc.eval_expr_typed('let a = 5 in a + 1')
The variable, here a, is only visible in the body of the let, that is, the expression following in. You can assign multiple variables. Variable assignments are separated by and. Each variable is visible in the right-hand side of the following variables as well as in the body of the let. For example:
hc.eval_expr_typed('''
let a = 5
and b = a + 1
in a * b
''')
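One way to picture the scoping rule is as nested function calls: each binding is in scope for everything that follows it and nowhere else. The plain-Python sketch below mirrors the two-variable let above; it is an analogy only, and the name let_result is made up for illustration.

```python
# `let a = 5 and b = a + 1 in a * b`, written as nested lambdas:
# a is visible when b is defined, and both are visible in the body.
let_result = (lambda a: (lambda b: a * b)(a + 1))(5)

print(let_result)  # 5 * 6
```

Neither a nor b exists outside the lambdas, just as let-bound variables are not visible outside the let body.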
Unlike in many other languages, conditionals in Hail are expressions that return a value. The arms of the conditional must have the same type. The predicate must be of type Boolean. If the predicate is missing, the value of the entire conditional is missing. Here are some simple examples.
hc.eval_expr_typed('if (true) 1 else 2')
hc.eval_expr_typed('if (false) 1 else 2')
hc.eval_expr_typed('if (NA: Boolean) 1 else 2')
hc.eval_expr_typed('if (true) 1 else "two"') # type error, Int and String incompatible
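The missing-predicate rule can be sketched in plain Python, again using None for a missing value. The function hail_like_if is a hypothetical helper for illustration. Unlike Hail, Python does not check that both arms have the same type, so that part of the rule is not modeled here.

```python
def hail_like_if(predicate, then_value, else_value):
    """Pick an arm based on predicate; a missing (None) predicate yields missing."""
    if predicate is None:
        return None
    return then_value if predicate else else_value


print(hail_like_if(True, 1, 2))
print(hail_like_if(False, 1, 2))
print(hail_like_if(None, 1, 2))  # missing predicate, so the result is missing
```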
Hail has several compound types: Array[T], Set[T], Dict[K, V], Structs and Aggregable[T]. T, K and V can be any type, including other compound types.
Arrays are similar to Python's lists, except they must be homogeneous: that is, each element must be of the same type. Arrays are 0-indexed. Here are some examples of simple array expressions.
Array literals are constructed with square brackets.
hc.eval_expr_typed('[1, 2, 3, 4, 5]')
Arrays are indexed with square brackets and support Python's slice syntax.
hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a[0]')
hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a[1:3]')
hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a[1:]') # slice to the end, a[:4] to slice from the beginning
hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a.length')
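Because the indexing and slicing syntax follows Python's, the same operations on a Python list are a convenient way to predict what the Hail expressions above should return. This sketch uses only built-in Python and does not call Hail.

```python
a = [1, 2, 3, 4, 5]

first = a[0]      # first element (0-indexed)
middle = a[1:3]   # elements at positions 1 and 2; the end index is exclusive
tail = a[1:]      # from position 1 to the end
head = a[:4]      # from the beginning up to, but not including, position 4
length = len(a)   # the Hail expression uses a.length instead

print(first, middle, tail, head, length)
```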
Arrays can be transformed with the functional operators filter and map. These operations return a new array; they never modify the original.
# keep the elements that are less than 10
hc.eval_expr_typed('[1, 2, 22, 7, 10, 11].filter(x => x < 10)')
# square the elements of an array
hc.eval_expr_typed('[1, 2, 22, 7, 10, 11].map(x => x * x)')
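For readers more familiar with Python, filter and map correspond closely to list comprehensions. The sketch below reproduces the two expressions above in plain Python and shows that the input list is left unchanged.

```python
values = [1, 2, 22, 7, 10, 11]

# Counterpart of .filter(x => x < 10): keep the elements less than 10.
small = [x for x in values if x < 10]

# Counterpart of .map(x => x * x): square every element.
squares = [x * x for x in values]

print(small)
print(squares)
print(values)  # the original list is untouched
```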
The full list of methods on arrays can be found in the Hail documentation for the Array type.
Numeric arrays, like Array[Int] and Array[Double], have additional operations like max, mean, median and sort. For a full list, see, for example, the documentation for Array[Int]. Here are a few examples.
hc.eval_expr_typed('[1, 2, 22, 7, 10, 11].sum()')
hc.eval_expr_typed('[1, 2, 22, 7, 10, 11].max()')
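To sanity-check results like these, the same summaries can be computed with Python built-ins. This is a plain-Python cross-check that does not call Hail; the variable names are chosen for this sketch only.

```python
values = [1, 2, 22, 7, 10, 11]

total = sum(values)
largest = max(values)
average = sum(values) / float(len(values))  # float() keeps this correct on Python 2 as well
ordered = sorted(values)                    # returns a new list; values is unchanged

print(total, largest, average, ordered)
```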
Structs are a collection of named values known as fields. Hail does not have tuples like Python. Unlike arrays, the values can be heterogeneous. Unlike Dicts, the set of names is part of the type and must be known statically. Structs are constructed with a syntax similar to Python's dict syntax. Struct fields are accessed using the . syntax.
x, t = hc.eval_expr_typed('{gene: "ACBD", function: "LOF", nHet: 12}')
print(x)
print(t)
hc.eval_expr_typed('let s = {gene: "ACBD", function: "LOF", nHet: 12} in s.gene')
hc.eval_expr_typed('let s = NA: Struct { gene: String, function: String, nHet: Int} in s.gene')
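A loose Python analogy for a Struct is a named tuple: the field names are fixed by the type, and the field values may have different types. The class GeneSummary and the helper get_gene below are hypothetical names for this sketch. The helper assumes that accessing a field of a missing struct yields missing, in line with the general rule that operations on missing inputs return missing.

```python
from collections import namedtuple

# The field names are part of the type, as with a Hail Struct.
GeneSummary = namedtuple("GeneSummary", ["gene", "function", "nHet"])


def get_gene(s):
    """Return the gene field, or None if the whole struct is missing (None)."""
    return None if s is None else s.gene


s = GeneSummary(gene="ACBD", function="LOF", nHet=12)
print(get_gene(s))
print(get_gene(None))  # missing struct, so the field is missing too
```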
Write an expression that mean-centers and variance-normalizes an array. Return a struct with 3 fields: the mean, the variance and the normalized array. We've provided a template to get you started. Fill in the <?>.
hc.eval_expr_typed('''
let a = [1, -2, 11, 3, -2]
and mu = <?>
and var = a.map(x => <?>).sum()
in {mean: <?>, variance: <?>, normalized: <?>}
''')
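To check your answer, you can compute the same quantities in plain Python. This sketch assumes the population variance (dividing by the number of elements) and normalization by the standard deviation; the exercise does not state which convention is intended, so treat these numbers as one possible reference rather than the expected output.

```python
from math import sqrt

a = [1, -2, 11, 3, -2]
n = float(len(a))

mu = sum(a) / n                                  # mean
var = sum((x - mu) ** 2 for x in a) / n          # population variance (assumed convention)
normalized = [(x - mu) / sqrt(var) for x in a]   # mean-centered and scaled

print(mu, var, normalized)
```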
What if a contains a missing value, NA: Int? Will your code still work?