.. SyQADA documentation master file, created by
   sphinx-quickstart on Tue Oct  1 09:49:40 2013.

SyQADA - A system for automating bioinformatic workflows
========================================================

License::

  SyQADA is made available under the GNU GPL version 3 license
  (http://www.gnu.org/copyleft/gpl.html), a text copy of which is available in
  this bundle.

SyQADA provides a system for creating, running, and monitoring the
progress of each step of a workflow, based on a project configuration
that includes a sample file listing sample names, a tool-suite configuration file,
and a protocol file that identifies the tasks to be performed
and designates a script template for each step.

The reasons for SyQADA's existence are discussed in :ref:`motivation`.

Here's where convention asks for a :ref:`Quick Start<Cicero>` Guide.
Abandon All Hope Ye Who Enter Here: I regret to say that the
:ref:`SyQADA Quick Start Guide<Cicero>` is pathetic (for those who
have already abandoned hope, see :ref:`cheat_sheet`). I don't see how to
explain how to manage a bioinformatics workflow in three easy steps.
Before going further, I suggest you at least read the :ref:`Caveats`.

SyQADA's only dependencies are Python (roughly version 3.5 or later), bash, and
a Unix operating system
(in addition to the kernel, SyQADA invokes a small number of standard Unix commands).
SyQADA relies on the Unix file system to record its progress and allow users
to understand that progress easily. SyQADA is designed to simplify, to the
extent possible, construction of the scripts necessary to run an analysis
project on a set of data representing some set of samples. It cannot eliminate
the kinds of problems one faces in large-scale computation, but I hope you find that it
simplifies dealing with them.

SyQADA strives to simplify organization of analysis projects and create
an environment in which it is easy to reproduce a workflow. SyQADA
creates a standard file structure for each step and names error and
output files appropriately for each sample. It creates scripts that
can be run manually or through SyQADA, either on a local Unix machine
(including MacOS X) or on the MDACC clusters (the Nautilus cluster runs PBS,
and the Shark cluster runs LSF; references to clusters in this document
usually identify them by their cluster management software, i.e., PBS or LSF).
The cluster interfaces expect to find local settings for queue size in
the resources directory, so that an external user can, we hope, modify only
a specific set of named constants to adapt SyQADA to a different queueing policy.
The settings provided are specific to MDACC.

Several existing workflows are included that perform diverse sequence
and variant analysis tasks, including GATK-based sequence alignment
and recalibration; GATK variant calling; calling of somatic variants
(mutect, indelocator); vtools annotation of variants using a variety
of genomic resources; birdseed and haploh; GATK and haplohseq; download
of TCGA data; etc.

Creation of new workflows is fairly straightforward. Script templates
are relatively simple to construct (using a simple text editor such as
emacs or vi) from a working example invocation of a computation. Some
help is provided in this manual.


Contents
========

.. toctree::
   :maxdepth: 2

   License
   Caveats
   Motivation
   ReleaseNotes
   installation
   Before_You_Begin
   tutorial_basic
   tutorial_hapLOHseq
   SyQADA_Structure
   taskfiles
   templates
   Commands
   Running_A_Workflow
   Building_A_New_Workflow
   Validation
   replication
   iteration
   qa
   tutorial_features
   workflows
   glossary
   Troubleshooting
   Architecture
   Cicero
   scheet_cheat_sheet
   For_Development
   System_Internals
   shell
   Funding
   hacks

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`