syqada auto

SyQADA expects to work in the directory in which it is invoked, not in the bin directory where the executable resides. By convention, control and configuration files are stored in a subdirectory named control. This convention is codified: if you place the configuration, sample, and protocol files in the control directory and name them:

MY_PROJECT.config
MY_PROJECT.samples
MY_PROJECT.protocol

you may then invoke SyQADA simply as:

>>> syqada auto
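The naming convention means a project is identified by the shared stem of three files in control/. As a minimal sketch (not SyQADA's own code; the function name find_project_names and the choice to start from *.protocol files are assumptions for illustration), this is one way to check which project names in a control directory satisfy the convention:

```python
from pathlib import Path

# The three suffixes the naming convention requires for one project.
REQUIRED_SUFFIXES = (".config", ".samples", ".protocol")

def find_project_names(control_dir="control"):
    """Return sorted project names that have all three control files.

    Hypothetical helper: candidate names are taken from *.protocol files,
    then kept only if the matching .config and .samples files also exist.
    """
    control = Path(control_dir)
    candidates = {p.stem for p in control.glob("*.protocol")}
    return sorted(
        name
        for name in candidates
        if all((control / f"{name}{suffix}").is_file() for suffix in REQUIRED_SUFFIXES)
    )
```

If this returns more than one name, the convention alone does not say which project to run, so it is worth keeping one project per control directory.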

MY_PROJECT.protocol: This should be either a file containing a series of tasks and their attribute information (see Protocol Construction, or workflows/example/control/Example.protocol for an example) or a list of task names and their references (see workflows/example/control/Example.reference for an example).

MY_PROJECT.config: This should contain absolute path references to executables and reference files (such as hg19.fasta), expressed as colon-separated key : value entries, one per line, such as:

sourcedata        : /a/directory/containing/sample/data

The config file reader tolerates shell environment expressions, but they should be used with caution. I make use of $TEAM_ROOT because it is relatively stable and meant to be as consistent across systems as is feasible without a dedicated engineering team. Even that can lead you astray in the wrong environment, of course, so be warned.
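To make the config format and the environment-variable caveat concrete, here is a minimal sketch of a reader for colon-separated entries. It is not SyQADA's actual reader: the name read_config, the skipping of "#" comment lines, and splitting on the first colon only are all assumptions. It uses Python's os.path.expandvars, which leaves undefined variables untouched rather than failing, which is exactly how a shell expression can silently lead you astray.

```python
import os

def read_config(text):
    """Parse 'key : value' lines into a dict, expanding $VARS in values.

    Hypothetical sketch: blank lines and lines starting with '#' are
    skipped (an assumption), and only the first colon splits key from value.
    """
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key : value' entry: {raw!r}")
        # Undefined variables are left as literal text, e.g. "$TEAM_ROOT/x".
        config[key.strip()] = os.path.expandvars(value.strip())
    return config
```

With TEAM_ROOT set to /opt/team, the line "sourcedata : $TEAM_ROOT/data" yields "/opt/team/data". With TEAM_ROOT unset, the value stays "$TEAM_ROOT/data", a path that will only fail later.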

MY_PROJECT.samples: This should contain a list of sample names, one per line, that will be used in filenames and elsewhere. If the first line is the header of a tab-delimited file, the sample file can also contain phenotype information that can be used as input parameters to task templates (see The Sample File).
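The two sample-file forms can be illustrated with a short sketch. It is hypothetical, not SyQADA's parser: it assumes that a tab in the first line marks it as a header, and that the first column holds the sample name. Each sample comes back paired with a dict of its phenotype columns, which is empty for the plain one-name-per-line form.

```python
def read_samples(text):
    """Return (sample_name, phenotypes) pairs from either sample-file form.

    Assumptions: a tab in the first non-blank line means it is a header,
    and the first column holds the sample name.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    if "\t" in lines[0]:
        header = lines[0].split("\t")
        rows = []
        for ln in lines[1:]:
            fields = ln.split("\t")
            # Remaining columns become phenotype values keyed by header name.
            rows.append((fields[0], dict(zip(header[1:], fields[1:]))))
        return rows
    # Plain form: one sample name per line, no phenotype information.
    return [(ln.strip(), {}) for ln in lines]
```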

--notifications <path>: The name of a file to which progress notes are written. syqada auto defaults this to a path on a webserver running on one of the HAPS. The eventual goal is to create an RSS feed. In the meantime, the most recent event is written at the top of the file, so that a web retrieval does not need to scroll to see the current state.
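The newest-first layout of the notifications file amounts to prepending rather than appending. A minimal sketch, assuming a plain-text file and a timestamp prefix (the function name post_notification and the timestamp format are illustrative, not SyQADA's):

```python
from datetime import datetime
from pathlib import Path

def post_notification(path, message):
    """Write message as the first line of the file, keeping older events below."""
    target = Path(path)
    previous = target.read_text() if target.exists() else ""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Newest event first, so a web reader sees current state without scrolling.
    target.write_text(f"{stamp} {message}\n{previous}")
```

Rewriting the whole file on each event is fine for a modest log. A long-running project with many events would eventually want trimming or a real feed format.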

syqada auto reads the protocol file and then determines what to do. If no task directories exist, it creates them and their METADATA files. If --init is specified, it stops after creating them.

Without --init, syqada auto then evaluates each task directory in turn, starting from 01, and runs a BatchRunner for any task that has not completed. When the BatchRunner finishes, SyQADA tests the task for completion and proceeds to the next task if there are no errors.
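The control flow described above (skip completed tasks, run the rest in order, stop at the first failure without attempting repair) can be sketched as follows. This is only an illustration: Task, run_auto and the run_batch callback are hypothetical stand-ins, not SyQADA classes or its BatchRunner interface.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """Hypothetical stand-in for one numbered task directory."""
    name: str                     # e.g. "01-align"; zero-padded so names sort numerically
    complete: bool = False
    errors: list = field(default_factory=list)

def run_auto(tasks, run_batch):
    """Run incomplete tasks in order; stop at the first task with errors.

    run_batch(task) stands in for running a BatchRunner and records any
    errors on the task. Returns (names_run, failed_name_or_None).
    """
    ran = []
    for task in sorted(tasks, key=lambda t: t.name):
        if task.complete:
            continue              # already finished: move to the next task
        run_batch(task)
        ran.append(task.name)
        if task.errors:
            return ran, task.name  # no automatic repair: stop and report
        task.complete = True
    return ran, None
```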

At this time, syqada auto does nothing to repair a task that shows a failure. However, it does do some simple parsing of stderr to identify common problems and give hints as to their possible causes. Failed tasks must be handled manually with syqada manage and syqada batch. See those sub-commands for details.

syqada auto has three special and mutually exclusive flags.

  • init
  • ignore
  • status

Typical invocation does not require any flags. The meaning of the options follows.

>>> syqada auto --init

simply constructs any unconstructed METADATA for the first uninitialized task in the protocol file. It is typically used to determine manually whether the job scripts for a batch have been correctly created before running syqada auto.

>>> syqada auto --ignore batchdir-prefix1 [batchdir-prefix2 ...]

will ignore errors in any step whose batch directory begins with one of the listed prefixes. Note that --ignore 0 1 will ignore errors in any of the first 19 batches: 01 through 09 match the prefix 0, and 10 through 19 match the prefix 1.
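The "first 19 batches" arithmetic follows from plain string-prefix matching on zero-padded directory names. A minimal sketch, assuming directory names of the form NN-step (the function name ignored is illustrative):

```python
def ignored(batch_dir, prefixes):
    """True if the batch directory name starts with any of the ignore prefixes."""
    # str.startswith accepts a tuple and returns True if any element matches.
    return batch_dir.startswith(tuple(prefixes))

batches = [f"{n:02d}-step" for n in range(1, 31)]
hits = [b for b in batches if ignored(b, ["0", "1"])]
# hits runs from "01-step" through "19-step": 9 matches for "0" plus 10 for "1".
```

The same reasoning means that --ignore 1 alone covers 10 through 19 but not 01, so give a full two-digit prefix such as 01 when you mean exactly one step.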

>>> syqada auto --prototype

(New in 2.1) When running syqada auto or syqada batch, --prototype creates all the job scripts for the desired step and then runs only one of them, quitting without submitting the remainder if that job fails. If the job succeeds, syqada proceeds as usual and submits the remaining jobs as a normal bolus.

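The point of --prototype is to spend one job, not thousands, discovering that a step is misconfigured. The decision it makes can be sketched as below. This is an illustration only: run_job and submit are hypothetical caller-supplied callbacks, not SyQADA functions.

```python
def prototype_then_submit(jobs, run_job, submit):
    """Run jobs[0] first; submit the rest only if it succeeds.

    run_job(job) -> bool and submit(list_of_jobs) are hypothetical callbacks.
    Returns True if the remaining jobs were submitted.
    """
    if not jobs:
        return False
    first, rest = jobs[0], jobs[1:]
    if not run_job(first):
        # Prototype failed: quit before flooding the queue with doomed jobs.
        return False
    submit(rest)
    return True
```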
>>> syqada auto --resume batch-step-prefix (e.g., --resume 09)

(New in 2.1) --resume assumes that all steps numerically prior to the resume step have finished correctly, and resumes automatic processing from that step, saving a great deal of logfile inspection in large multi-step projects. My personal practice is to resume from the step immediately before the one I want to execute, just to make sure that it has actually succeeded.
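The selection --resume implies can be sketched as a numeric filter over step directories. This assumes names of the form NN-name; steps_to_run is a hypothetical helper, not part of SyQADA. Steps below the resume number are simply not examined, which is why starting one step early, as suggested above, is a cheap safety check.

```python
def steps_to_run(step_dirs, resume_prefix):
    """Return the steps from resume_prefix onward, in numeric order.

    Assumes step directories are named with a leading number, e.g. "09-call".
    Earlier steps are treated as finished and are not re-examined.
    """
    def number(name):
        return int(name.split("-", 1)[0])
    start = int(resume_prefix)
    return sorted((d for d in step_dirs if number(d) >= start), key=number)
```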

When you find yourself restarting a huge task with thousands of jobs, such as may occur with many samples split by chromosome (common in haplohseq, for example), you are likely to find that startup is delayed, because the standard behavior is to evaluate the current status to verify that it makes sense to proceed. If you have enough faith that there are no issues, you can short-circuit this check and go straight to job processing by adding the --faith option to the syqada auto command.