.. SyQADA documentation master file, created by
   sphinx-quickstart on Tue Oct 1 09:49:40 2013.

SyQADA - A system for automating bioinformatic workflows
========================================================

License::

    SyQADA is made available under the GNU GPL version 3 license
    (http://www.gnu.org/copyleft/gpl.html), a text copy of which is
    available in this bundle.

SyQADA provides a system for creating, running, and monitoring the
progress of each step of a workflow, based on a project configuration
that includes a sample file listing sample names, a tool-suite
configuration file, and a protocol file that identifies the tasks to be
performed and designates a script template for each step.

The reasons for SyQADA's existence are discussed in :ref:`motivation`.

Here's where convention asks for a :ref:`Quick Start` Guide.
Abandon All Hope Ye Who Enter Here: I regret to say that the
:ref:`SyQADA Quick Start Guide` is pathetic. (For those who have
already abandoned hope: :ref:`cheat_sheet`.) I don't see a way to
explain how to manage a bioinformatics workflow in three easy steps.

Before going further, I suggest you at least read the :ref:`Caveats`.

SyQADA's only dependencies are Python 3.5+ (ish), bash, and the Unix
operating system (beyond the kernel, SyQADA invokes a small number of
standard Unix commands). SyQADA relies on the Unix file system to record
its progress and to let users follow that progress easily.

SyQADA is designed to simplify, as far as possible, the construction of
the scripts needed to run an analysis project on data from a set of
samples. It cannot eliminate the kinds of problems one faces in
large-scale computation, but I hope you find that it simplifies dealing
with them.

SyQADA strives to simplify the organization of analysis projects and to
create an environment in which a workflow is easy to reproduce. SyQADA
creates a standard file structure for each step and names the error and
output files appropriately for each sample.
It creates scripts that can be run either manually or by SyQADA, on a
local Unix machine (including Mac OS X) or on the MDACC clusters. The
Nautilus cluster runs PBS and the Shark cluster runs LSF; this document
usually identifies the clusters by their cluster management software,
that is, PBS or LSF. The cluster interfaces expect to find local
settings for queue size in the resources directory, so that an external
user can, we hope, adapt SyQADA to a different queueing policy by
modifying only a specific set of named constants. The settings provided
are specific to MDACC.

Several existing workflows are included that perform diverse sequence
and variant analysis tasks, including GATK-based sequence alignment and
recalibration; GATK variant calling; calling of somatic variants
(mutect, indelocator); vtools annotation of variants using a variety of
genomic resources; birdseed and haploh; GATK and haplohseq; and download
of TCGA data.

Creating new workflows is fairly straightforward. Script templates are
relatively simple to construct with a plain text editor such as emacs or
vi, starting from a working example invocation of the computation. This
manual provides some help.

Contents
========

.. toctree::
   :maxdepth: 2

   License
   Caveats
   Motivation
   ReleaseNotes
   installation
   Before_You_Begin
   tutorial_basic
   tutorial_hapLOHseq
   SyQADA_Structure
   taskfiles
   templates
   Commands
   Running_A_Workflow
   Building_A_New_Workflow
   Validation
   replication
   iteration
   qa
   tutorial_features
   workflows
   glossary
   Troubleshooting
   Architecture
   Cicero
   scheet_cheat_sheet
   For_Development
   System_Internals
   shell
   Funding
   hacks

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`