{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Practical 2: The Hail expression language\n",
    "\n",
    "## The cells of this practical can be entered (by cut and paste) into the IPython console.\n",
    "\n",
    "## Before entering the first cell, make sure you have changed to the directory hail-practical. Skip the first cell if you haven't closed the IPython console since running the last practical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn\n",
    "from math import log, isnan\n",
    "from pprint import pprint\n",
    "import matplotlib.patches as mpatches\n",
    "\n",
    "from hail import *\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "def qqplot(pvals):\n",
    "    spvals = sorted([x for x in pvals if x and not(isnan(x))])\n",
    "    exp = [-log(float(i) / len(spvals), 10) for i in np.arange(1, len(spvals) + 1, 1)]\n",
    "    obs = [-log(p, 10) for p in spvals]\n",
    "    plt.scatter(exp, obs)\n",
    "    plt.plot(np.arange(0, max(max(exp), max(obs))), c=\"red\")\n",
    "    plt.xlabel(\"Expected p-value (-log10 scale)\")\n",
    "    plt.ylabel(\"Observed p-value (-log10 scale)\")\n",
    "    plt.xlim(xmin=0)\n",
    "    plt.ylim(ymin=0)\n",
    "\n",
    "hc = HailContext()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hail Expression Language\n",
    "\n",
    "The Hail expression language is used everywhere in Hail: filtering conditions, describing covariates and phenotypes, generating synthetic data, plotting, exporting, etc.  You can evaluate a Hail expression with the HailContext method [eval_expr_typed](https://hail.is/hail/hail.HailContext.html#hail.HailContext.eval_expr_typed).  `eval_expr_typed` returns a tuple with the result of evaluating the expression and the type of the expression.  Use [eval_expr](https://hail.is/hail/hail.HailContext.html#hail.HailContext.eval_expr) if you just want the value.  We'll use `eval_expr_typed` throughout so you can become more comfortable with types in Hail.\n",
    "\n",
    "## Primitive Types\n",
    "\n",
    "Let's start with simple primitive types: Boolean, Int, Double, String.  Hail expressions are passed as Python strings to Hail methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "hc.eval_expr_typed('true') # the Boolean literals are true and false"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The return value is `True`, not `true`.  Why?  When values are returned by Hail methods, they are automatically converted to the corresponding Python value.\n",
    "\n",
    "String literals are denoted with double-quotes.\n",
    "\n",
    "Note, we use variables a, b, ... so you don't have to cut and paste quite so many cells."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "a = hc.eval_expr_typed('123')\n",
    "b = hc.eval_expr_typed('123.45')\n",
    "c = hc.eval_expr_typed('\"Hello, world\"')\n",
    "\n",
    "print a\n",
    "print b\n",
    "print c"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise\n",
    "\n",
    "Primitive types support all the usual operations you'd expect.  For details, refer to the documentation on [functions](https://hail.is/hail/functions.html), [operators](https://hail.is/hail/operators.html) and [types](https://hail.is/hail/types.html).  What's the difference between operators and functions?  Operators are symbols like `+` and `*` that are written infix, while functions have names and are called with parentheses, like `f(5)`.\n",
    "\n",
    "Try a few simple expressions with operators on primitives.  Divide two integers.  Can you compare strings?  You can concatenate strings with `+`.  What's the log base 10 of 1024?  (Hint: it's a function.)\n",
    "\n",
    "Experiment with some expressions by filling in `<?>`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "hc.eval_expr_typed('<?>')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Missingness\n",
    "\n",
    "As in R, any value in Hail can be missing.  Most operations, like addition, return missing if any of their inputs is missing.  There are a few special operations for manipulating missing values.  There is also a missing literal, `NA`, but you have to specify its type.  Remember, `e: Int` just means that `e` has type `Int`.  Missing Hail values are converted to `None` in Python.  You can test missingness with `isDefined` and `isMissing`.\n",
    "\n",
    "Here are some examples.  Before you evaluate them, guess what each result will be."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "a = hc.eval_expr_typed('NA: Int') # missing Int\n",
    "\n",
    "b = hc.eval_expr_typed('1 + NA: Int')\n",
    "\n",
    "c = hc.eval_expr_typed('isDefined(1)')\n",
    "\n",
    "d = hc.eval_expr_typed('isDefined(NA: Int)')\n",
    "\n",
    "e = hc.eval_expr_typed('isMissing(NA: Double)')\n",
    "\n",
    "print a\n",
    "print b\n",
    "print c\n",
    "print d\n",
    "print e"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Let\n",
    "\n",
    "You can assign a value to a variable with a `let` expression.  Here is an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "hc.eval_expr_typed('let a = 5 in a + 1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The variable, here `a`, is only visible in the body of the let, the expression following `in`.  You can assign multiple variables.  Variable assignments are separated by `and`.  Each variable is visible in the right-hand sides of the assignments that follow it as well as in the body of the let.\n",
    "\n",
    "Python triple quote strings can span multiple lines.  This can be useful for writing long Hail expressions.\n",
    "\n",
    "For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "hc.eval_expr_typed('''\n",
    "let a = 5\n",
    "and b = a + 1\n",
    " in a * b\n",
    "''')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conditionals\n",
    "\n",
    "Unlike in many other languages, conditionals in Hail are expressions: they return a value.  The arms of the conditional must have the same type.  The predicate must be of type Boolean.  If the predicate is missing, the value of the entire conditional is missing.  This differs from R, where a missing condition is an error.  Here are some simple examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "a = hc.eval_expr_typed('if (true) 1 else 2')\n",
    "\n",
    "b = hc.eval_expr_typed('if (false) 1 else 2')\n",
    "\n",
    "c = hc.eval_expr_typed('if (NA: Boolean) 1 else 2')\n",
    "\n",
    "print a\n",
    "print b\n",
    "print c"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "hc.eval_expr_typed('if (true) 1 else \"two\"') # type error, Int and String incompatible"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Arrays\n",
    "\n",
    "Hail has several compound types: `Array[T]`, `Set[T]`, `Dict[K, V]`, `Struct`s and `Aggregable[T]`.  `T`, `K` and `V` can be any type, including other compound types.  `Array[T]` are similar to Python's lists, except they must be homogenous: that is, each element must be of the same type.  Arrays are 0-indexed.  Here are some examples of simple array expressions.\n",
    "\n",
    "Array literals are constructed with square brackets.\n",
    "\n",
    "Arrays are indexed with square brackets and support Python's slice syntax."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "a = hc.eval_expr_typed('[1, 2, 3, 4, 5]')\n",
    "\n",
    "b = hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a[0]')\n",
    "\n",
    "c = hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a[1:3]')\n",
    "\n",
    "d = hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a[1:]') # slice to the end, a[:4] to slice from the beginning\n",
    "\n",
    "e = hc.eval_expr_typed('let a = [1, 2, 3, 4, 5] in a.length')\n",
    "\n",
    "print a\n",
    "print b\n",
    "print c\n",
    "print d\n",
    "print e"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Arrays can be transformed with functional operators `filter` and `map`.  These operations return a new array, never modify the original."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# keep the elements that are less than 10\n",
    "a = hc.eval_expr_typed('let a = [1, 2, 22, 7, 10, 11] in a.filter(x => x < 10)')\n",
    "\n",
    "# square the elements of an array\n",
    "b = hc.eval_expr_typed('let a = [1, 2, 22, 7, 10, 11] in a.map(x => x * x)')\n",
    "\n",
    "print a\n",
    "print b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The full list of methods on arrays can be found [here](https://hail.is/hail/types.html#array-t)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Numeric Arrays\n",
    "\n",
    "Numeric arrays, like `Array[Int]` and `Array[Double]` have additional operations like `max`, `mean`, `median`, `sort`.  For a full list, see, for example, [Array[Int]](https://hail.is/hail/types.html#array-int).  Here are a few examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "a = hc.eval_expr_typed('[1, 2, 22, 7, 10, 11].sum()')\n",
    "\n",
    "b = hc.eval_expr_typed('[1, 2, 22, 7, 10, 11].max()')\n",
    "\n",
    "print a\n",
    "print b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `Struct`s\n",
    "\n",
    "`Struct`s are a collection of named values known as fields.  Hail does not have tuples like Python.  Unlike arrays, the values can be heterogenous.  Unlike `Dict`s, the set of names are part of the type and must be known statically.  `Struct`s are constructed with a syntax similar to Python's `dict` syntax.  `Struct` fields are accessed using the `.` syntax."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "x, t = hc.eval_expr_typed('{gene: \"ACBD\", function: \"LOF\", nHet: 12}')\n",
    "print x\n",
    "print t"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "hc.eval_expr_typed('let s = {gene: \"ACBD\", function: \"LOF\", nHet: 12} in s.gene')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "Let's do a series of exercises to transform an array.  First, fill in the `<?>` below to compute the mean of `a`.  The mean is often denoted by the Greek letter mu."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "hc.eval_expr_typed('''\n",
    "let a = [1, -2, 11, 3, -2] \n",
    "and mu = <?>\n",
    " in mu''')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Second, let's compute the variance of `a`.  Remember, the variance is the sum of the squares of the residual differences from the mean.  Note, Hail has no square function.  You'll need the mean you computed above.  Note, there is currently no square operation in Hail, so you can multiplication or the `pow` function (for example, `pow(2, 3) == 8`)."
   ]
  },
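  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a reference for the exercise above, here is the variance written out, assuming the population convention (dividing by $n$ rather than $n - 1$).  The symbols $n$ (the length of `a`), $a_i$ (its elements) and $\\sigma^2$ are notation introduced here for illustration, not Hail names.\n",
    "\n",
    "$$\\mu = \\frac{1}{n} \\sum_{i=1}^{n} a_i, \\qquad \\sigma^2 = \\frac{1}{n} \\sum_{i=1}^{n} (a_i - \\mu)^2$$\n",
    "\n",
    "For `a = [1, -2, 11, 3, -2]` this gives $\\mu = 11 / 5 = 2.2$ and $\\sigma^2 = 114.8 / 5 = 22.96$, which you can use to check your answer."
   ]
  },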
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "hc.eval_expr_typed('''\n",
    "let a = [1, -2, 11, 3, -2]\n",
    "and mu = <?>\n",
    "and var = a.map(x => <?>).sum()\n",
    " in var\n",
    "''')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, put it all together to return a structure that contains the mean, variance and the array `a` mean-centered and variance-normalized (Z-score)."
   ]
  },
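  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, one common way to write the Z-score is sketched below.  The symbols $a_i$ (the elements of `a`), $\\mu$ (their mean), $\\sigma^2$ (their variance) and $z_i$ are notation introduced here for illustration, not Hail names, and the sketch assumes the population variance (dividing by the number of elements).\n",
    "\n",
    "$$z_i = \\frac{a_i - \\mu}{\\sigma}, \\qquad \\sigma = \\sqrt{\\sigma^2}$$\n",
    "\n",
    "Under that assumption, the normalized array has mean 0 and variance 1, which is a quick way to check your result."
   ]
  },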
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "hc.eval_expr_typed('''\n",
    "let a = [1, -2, 11, 3, -2]\n",
    "and mu = <?>\n",
    "and var = a.map(x => <?>).sum()\n",
    "and norm = <?>\n",
    " in {mean: <?>, variance: <?>, normalized: <?>}\n",
    "''')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What if `a` contains an missing value `NA: Int`?  Will your code still work?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [conda root]",
   "language": "python",
   "name": "conda-root-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}