import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
from math import log, isnan

%matplotlib inline

Import hail and create the HailContext if you haven't already done so.

import hail
hc = hail.HailContext()

Hail Expression Language

The Hail expression language is used everywhere in Hail: filtering conditions, describing covariates and phenotypes, generating synthetic data, plotting, exporting, etc. You can evaluate a Hail expression with the HailContext method eval_expr_typed. eval_expr_typed returns a tuple with the result of evaluating the expression and the type of the expression. Use eval_expr if you just want the value. We'll use eval_expr_typed throughout so you can become more comfortable with types in Hail.

Primitive Types

Let's start with the simple primitive types: Boolean, Int, Double, and String. Hail expressions are passed as Python strings to Hail methods.

hc.eval_expr_typed('true') # the Boolean literals are true and false

The return value is True, not true. Why? When values are returned by Hail methods, they are automatically converted to the corresponding Python value.

hc.eval_expr_typed('123')

hc.eval_expr_typed('123.45')

String literals are denoted with double-quotes.

hc.eval_expr_typed('"Hello, world"')

Primitive types support all the usual operations you'd expect. For details, refer to the documentation on operators and types.

Missingness

As in R, any value in Hail can be missing. Most operations, such as addition, return missing if any of their inputs is missing. There are a few special operations for manipulating missing values. There is also a missing literal, NA, but you have to specify its type. Missing Hail values are converted to None in Python.

hc.eval_expr_typed('NA: Int') # missing Int

hc.eval_expr_typed('NA: Dict[Genotype, Double]')

hc.eval_expr_typed('1 + NA: Int')

You can test missingness with isDefined and isMissing.

hc.eval_expr_typed('isDefined(1)')

hc.eval_expr_typed('isDefined(NA: Int)')

hc.eval_expr_typed('isMissing(NA: Double)')

orElse lets you convert a missing value to a default value, and orMissing lets you turn a value into missing based on a condition.

hc.eval_expr_typed('orElse(5, 2)')

hc.eval_expr_typed('orElse(NA: Int, 2)')

hc.eval_expr_typed('orMissing(true, 5)')

hc.eval_expr_typed('orMissing(false, 5)')
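
The missingness rules above can be mimicked in plain Python, with None standing in for NA. This is only an illustrative sketch, not Hail code: the helpers add, or_else and or_missing are hypothetical names chosen to mirror +, orElse and orMissing, and the handling of a missing predicate in or_missing is an assumption.

```python
# Plain-Python analogy for Hail missingness; None plays the role of NA.
# All helper names here are hypothetical, not part of the Hail API.

def add(x, y):
    """Like Hail's +: the result is missing if either input is missing."""
    if x is None or y is None:
        return None
    return x + y

def or_else(x, default):
    """Like orElse: replace a missing value with a default."""
    return default if x is None else x

def or_missing(predicate, value):
    """Like orMissing: keep value if predicate is true, otherwise missing."""
    # Assumption: a missing predicate also yields a missing result.
    if predicate is None:
        return None
    return value if predicate else None

print(add(1, None))          # None
print(or_else(5, 2))         # 5
print(or_else(None, 2))      # 2
print(or_missing(True, 5))   # 5
print(or_missing(False, 5))  # None
```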

Let

You can assign a value to a variable with a let expression. Here is an example.

hc.eval_expr_typed('let a = 5 in a + 1')

The variable (here, a) is visible only in the body of the let, that is, the expression following in. You can assign multiple variables; the assignments are separated by and. Each variable is visible in the right-hand sides of the assignments that follow it, as well as in the body of the let. For example:

hc.eval_expr_typed('''
let a = 5
and b = a + 1
 in a * b
''')
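
One way to read a multi-variable let is as a sequence of nested scopes, where each binding is visible to everything after it. The Python sketch below is an analogy only and says nothing about how Hail evaluates let internally; let_example and nested are hypothetical names.

```python
# Analogy only: each binding is visible to later bindings and to the body.
def let_example():
    a = 5          # let a = 5
    b = a + 1      # and b = a + 1  (a is visible here)
    return a * b   # in a * b

# The same expression written as explicitly nested scopes.
nested = (lambda a: (lambda b: a * b)(a + 1))(5)

print(let_example())  # 30
print(nested)         # 30
```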

Conditionals

Unlike in many other languages, conditionals in Hail are expressions: they return a value. The arms of the conditional must have the same type, and the predicate must be of type Boolean. If the predicate is missing, the value of the entire conditional is missing. Here are some simple examples.

hc.eval_expr_typed('if (true) 1 else 2')

hc.eval_expr_typed('if (false) 1 else 2')

hc.eval_expr_typed('if (NA: Boolean) 1 else 2')

hc.eval_expr_typed('if (true) 1 else "two"') # type error, Int and String incompatible
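
The conditional rules above can be sketched in plain Python, with None standing in for a missing Boolean. The helper if_else is hypothetical and not part of Hail. Note one difference: Hail rejects mismatched arm types when it type-checks the expression, whereas this sketch can only check when it runs.

```python
# Analogy only: None stands in for a missing (NA) Boolean predicate.
# if_else is a hypothetical helper, not a Hail function.

def if_else(predicate, then_value, else_value):
    # Both arms must have the same type, as in Hail.
    if type(then_value) is not type(else_value):
        raise TypeError("arms of a conditional must have the same type")
    # A missing predicate makes the whole conditional missing.
    if predicate is None:
        return None
    return then_value if predicate else else_value

print(if_else(True, 1, 2))   # 1
print(if_else(False, 1, 2))  # 2
print(if_else(None, 1, 2))   # None
```

Calling if_else(True, 1, "two") raises a TypeError, mirroring the type error in the last Hail example.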

Arrays

Hail has several compound types: Array[T], Set[T], Dict[K, V], Struct, and Aggregable[T]. T, K, and V can be any type, including other compound types. An Array[T] is similar to a Python list, except that it must be homogeneous: every element must be of the same type. Arrays are 0-indexed. Here are some examples of simple array expressions.

Array literals are constructed with square brackets.

hc.eval_expr_typed('[1, 2, 3, 4, 5]')

Arrays are indexed with square brackets and support Python's slice syntax.

hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a[0]')

hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a[1:3]')

hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a[1:]') # slice to the end; use a[:4] to slice from the beginning

hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a.length')
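
Because Hail arrays borrow Python's index and slice syntax, running the same operations on a Python list gives a quick way to predict the results. This sketch assumes Hail slices follow Python's half-open convention (start included, end excluded), as the examples above suggest; it does not cover negative indices.

```python
a = [1, 2, 3, 4, 5]

print(a[0])    # 1
print(a[1:3])  # [2, 3]
print(a[1:])   # [2, 3, 4, 5]
print(a[:4])   # [1, 2, 3, 4]
print(len(a))  # 5  (a.length in Hail)
```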

Arrays can be transformed with the functional operators filter and map. These operations return a new array and never modify the original.

# keep the elements that are less than 10
hc.eval_expr_typed('[1, 2, 22, 7, 10, 11].filter(x => x < 10)')

# square the elements of an array
hc.eval_expr_typed('[1, 2, 22, 7, 10, 11].map(x => x * x)')
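
For readers more familiar with Python, the two Hail expressions above correspond roughly to list comprehensions. This is a plain-Python comparison, not Hail code; the variable names are illustrative.

```python
values = [1, 2, 22, 7, 10, 11]

# filter(x => x < 10): keep the elements that are less than 10
small = [x for x in values if x < 10]

# map(x => x * x): square every element
squares = [x * x for x in values]

print(small)    # [1, 2, 7]
print(squares)  # [1, 4, 484, 49, 100, 121]
print(values)   # [1, 2, 22, 7, 10, 11]  (the original is unchanged)
```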

The full list of methods on arrays can be found in the documentation for the Array type.

Numeric Arrays

Numeric arrays, such as Array[Int] and Array[Double], have additional operations like max, mean, median, and sort. For a full list, see, for example, the documentation for Array[Int]. Here are a few examples.

hc.eval_expr_typed('[1, 2, 22, 7, 10, 11].sum()')

hc.eval_expr_typed('[1, 2, 22, 7, 10, 11].max()')
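
As a cross-check, the same reductions can be computed on a Python list. This is plain Python rather than Hail, so the values should agree but the result types Hail reports (for example Int versus Double) may differ.

```python
values = [1, 2, 22, 7, 10, 11]

total = sum(values)
largest = max(values)
ordered = sorted(values)
average = total / float(len(values))  # float() keeps this a true division

print(total)    # 53
print(largest)  # 22
print(ordered)  # [1, 2, 7, 10, 11, 22]
print(average)  # 53 / 6, roughly 8.83
```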

Structs

A Struct is a collection of named values known as fields. Hail does not have tuples as Python does. Unlike the elements of an array, the field values can be heterogeneous. Unlike the keys of a Dict, the set of field names is part of the type and must be known statically. Structs are constructed with a syntax similar to Python's dict syntax, and Struct fields are accessed using the . syntax.

x, t = hc.eval_expr_typed('{gene: "ACBD", function: "LOF", nHet: 12}')
print(x)
print(t)

hc.eval_expr_typed('let s = {gene: "ACBD", function: "LOF", nHet: 12} in s.gene')

hc.eval_expr_typed('let s = NA: Struct { gene: String, function: String, nHet: Int} in s.gene')
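
A rough Python counterpart to a Struct is a dataclass: the field names and types are fixed by the class definition rather than by the data. The sketch below is an analogy, not Hail code. GeneRecord and get_gene are hypothetical names, and treating field access on a missing struct as missing is an assumption based on the general missingness rule.

```python
from dataclasses import dataclass  # requires Python 3.7 or later
from typing import Optional

# Analogy only: the set of fields is part of the type, as with a Hail Struct.
@dataclass(frozen=True)
class GeneRecord:
    gene: str
    function: str
    nHet: int

def get_gene(s: Optional[GeneRecord]) -> Optional[str]:
    # Assumption: accessing a field of a missing struct gives missing.
    return None if s is None else s.gene

s = GeneRecord(gene="ACBD", function="LOF", nHet=12)
print(s.gene)          # ACBD
print(get_gene(s))     # ACBD
print(get_gene(None))  # None
```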

Exercise

Write an expression that mean-centers and variance-normalizes an array. Return a struct with three fields: the mean, the variance, and the normalized array. We've provided a template to get you started. Fill in each <?>.

hc.eval_expr_typed('''
let a = [1, -2, 11, 3, -2]
and mu = <?>
and var = a.map(x => <?>).sum() 
 in {mean: <?>, variance: <?>, normalized: <?>}
''')

What if a contains a missing value, NA: Int? Will your code still work?
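
To check the numbers your Hail expression returns, you can compute reference values in plain Python. This sketch is not the Hail solution. It assumes the population variance (dividing by the number of elements), which the exercise does not specify, and it does not handle missing values.

```python
from math import sqrt

a = [1, -2, 11, 3, -2]
n = float(len(a))  # float() keeps the divisions below true divisions

mu = sum(a) / n
# Population variance (divide by n); this convention is an assumption.
var = sum((x - mu) ** 2 for x in a) / n
normalized = [(x - mu) / sqrt(var) for x in a]

result = {"mean": mu, "variance": var, "normalized": normalized}
print(result["mean"])
print(result["variance"])
print(result["normalized"])
```

After normalization the values should have mean 0 and variance 1, which is a useful sanity check on either implementation.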