Real-World Tutorial: hapLOHseq
hapLOHseq is a program developed by Anthony San Lucas of our lab (based on hapLOH, Scheet and Vattathil), which analyzes allelic imbalance in genomic sequence data.
Running syqada tutorial and selecting the HAPLOHSEQ tutorial will populate your tutorial directory with four directories (control, templates, tasks, and example_files), as well as a script, example_run.sh, that invokes syqada.
Note that the bin directory contains three subdirectories, which contain hapLOHseq
executables for MacOSX and two versions of Red Hat Linux 5.6 that run on our compute servers
and our cluster. If you are using another Unix OS, you will need to fetch and compile hapLOHseq from source at http://haplohseq.scheetsoftware.org.
The files used to configure the workflow can be found in the control directory:
HAPLOHSEQ.protocol: the file that specifies the steps in the workflow.
HAPLOHSEQ.config: the file that specifies the dependencies and parameters of the workflow.
HAPLOHSEQ.samples: the names of samples to process through the workflow.
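Before editing the control files, it can help to confirm that the tutorial directory was populated as described. The following is a minimal sketch, not part of SyQADA: the directory and file names come from the text above, while the helper name missing_entries and the assumption that everything sits under a single tutorial root are illustrative only.

```python
from pathlib import Path

# Names taken from the tutorial text. Their placement under one tutorial
# root directory is an assumption of this sketch.
EXPECTED_DIRS = ["control", "templates", "tasks", "example_files"]
EXPECTED_CONTROL_FILES = [
    "HAPLOHSEQ.protocol",
    "HAPLOHSEQ.config",
    "HAPLOHSEQ.samples",
]


def missing_entries(tutorial_root):
    """Return the expected paths that do not exist under tutorial_root."""
    root = Path(tutorial_root)
    # Directories first, in the order the tutorial lists them.
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    # Then the three control files inside control/.
    missing += [
        f"control/{name}"
        for name in EXPECTED_CONTROL_FILES
        if not (root / "control" / name).is_file()
    ]
    return missing
```

Calling missing_entries(".") from inside the tutorial directory should return an empty list if the layout matches the description.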
To use this protocol on your own NGS data, it should be sufficient to modify the config file to point to your input data and modify the samples file to match your inputs. Here, you’ll see that the config file points to a subdirectory of the example_files directory, so if you run:
% syqada auto
the program will plow through two steps and produce an estimate of allelic imbalance on a tumor sample. There is also a plotting step that should work, but it is commented out. If you wish to try it, you will need to modify the config file to point to your R installation, and then study the task and template before uncommenting the line.
What hapLOHseq Does
If you wish to compare results, this script:
example_files/example/example_run.sh
executes the three steps of the same NGS allelic imbalance analysis workflow outside of syqada.
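One simple way to compare the two runs is a byte-for-byte check of corresponding output files. This is only a sketch: the helper name same_contents is hypothetical, and where each run writes its output depends on your config, so the paths must be supplied by you. Identical bytes are a stricter test than equivalent results, so a mismatch may still warrant a closer look rather than indicating an error.

```python
import filecmp


def same_contents(path_a, path_b):
    """Return True when the two files are byte-for-byte identical."""
    # shallow=False forces a comparison of file contents rather than
    # only the os.stat() signatures (size and modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For example, same_contents(syqada_output, script_output) could be called with the two paths to tumor_exome_haplohseq.posterior.dat produced by each run.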
This is the behavior of the hapLOHseq workflow:
1. Phase the het sites in tumor_exome.vcf using a utility phasing script
   (simple_phaser.py) provided with the haplohseq bundle. You can
   instead use MACH, fastPHASE, BEAGLE or your phasing software of
   choice. However, there are 2 output files that need to be formatted
   as they are for MACH. Examples of such files are generated using
   our simple_phaser.py script and can be found in the example_output
   directory after step 1 is executed. See tumor_example.hap and
   tumor_exome.pos.
2. Run haplohseq to assign AI probabilities across the
   genome for the test sample. The detailed report that includes this
   information (tumor_exome_haplohseq.posterior.dat) can be found in the
   example_output directory.
3. Call R to generate a plot for
   visualization of the haplohseq AI probabilities.
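The phasing step works on heterozygous ("het") sites in the VCF. As an illustration of what that means, the sketch below picks out het sites from VCF text lines by reading the standard GT field. It is not how simple_phaser.py is implemented, which this tutorial does not describe; the function names is_het and het_sites are hypothetical, and the sketch assumes uncompressed, tab-delimited VCF records with at least one sample column.

```python
import re


def is_het(genotype):
    """True for a diploid genotype with two different called alleles."""
    # GT alleles are separated by "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", genotype)
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def het_sites(vcf_lines, sample_index=0):
    """Yield (chrom, pos) for records where the chosen sample is het."""
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns 0-7 are fixed, 8 is FORMAT, 9 onward are samples.
        if len(fields) <= 9 + sample_index:
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        sample_values = fields[9 + sample_index].split(":")
        genotype = sample_values[format_keys.index("GT")]
        if is_het(genotype):
            yield fields[0], int(fields[1])
```

With a file on disk, this could be used as list(het_sites(open("tumor_exome.vcf"))), assuming the file is in the current directory.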