.. _syqada_auto:

###################
syqada auto
###################

SyQADA expects to work in the directory in which it is invoked, `not` the `bin` directory
in which the executable sits.
By convention, control and configuration files are stored in a subdirectory named
`control`. This convention is codified, in that if you place the configuration, sample,
and protocol files in the control directory and name them::

  MY_PROJECT.config
  MY_PROJECT.samples
  MY_PROJECT.protocol

you may then invoke SyQADA simply as:

>>> syqada auto

    *MY_PROJECT.protocol*
    This should be either a file containing a series of tasks and their attribute
    information (see :ref:`protocol`, or workflows/example/control/Example.protocol for an example)
    or a list of task names and their references
    (see workflows/example/control/Example.reference for an example).

    *MY_PROJECT.config*
    This should contain absolute path references to executables
    and reference files (such as hg19.fasta). They are expressed as
    colon separated entries, one per line, such as::

      sourcedata        : /a/directory/containing/sample/data

    The config file reader tolerates
    shell environment expressions, but they should be used with caution.
    I make use of $TEAM_ROOT because that is relatively well fixed and meant to be
    as consistent across systems as feasible without a dedicated engineering team.
    Even that can lead you astray in the wrong environment of course, so
    be warned.

    *MY_PROJECT.samples*
    Should contain a list of sample names that will be used in filenames, etc.,
    one name to a line. If the first line is the header of a tab-delimited file,
    then the sample file can also contain phenotype information that can be used
    as input parameters to task templates (see :ref:`sample_file`). 
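
The colon-separated config format described above can be illustrated with a minimal sketch. This is not SyQADA's own reader: the function name ``read_config`` is hypothetical, and the use of ``os.path.expandvars`` to handle shell environment expressions is an assumption about how such expansion might be done.

```python
import os


def read_config(text):
    """Parse 'key : value' lines into a dict (hypothetical helper, not SyQADA code)."""
    config = {}
    for line in text.splitlines():
        # Split on the first colon only, so later colons stay in the value.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # blank line, or not a 'key : value' entry
        # Assumption: environment references such as $TEAM_ROOT are expanded
        # from the current environment; unknown variables are left untouched.
        config[key.strip()] = os.path.expandvars(value.strip())
    return config
```

With ``TEAM_ROOT`` set in the environment, a value written as ``$TEAM_ROOT/...`` comes back as an absolute path. An unset variable is left verbatim in the value, which is one way such expressions can lead you astray.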

  --notifications <path>
    The name of a file to which progress notes will be written. By default, *syqada auto*
    sets this to a path to a file on a webserver running on one of the HAPS. The eventual
    goal is to create an RSS feed. In the meantime, the most recent event is written
    at the top of the file so that a web retrieval does not need to scroll to see the
    current state.
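
The newest-first behavior of the notifications file can be sketched as follows. The helper name ``prepend_event`` is hypothetical, and rewriting the whole file on each event is an assumption made for brevity, not a description of SyQADA's implementation.

```python
from pathlib import Path


def prepend_event(path, message):
    """Put message on the first line of the file, keeping older events below it."""
    target = Path(path)
    # Read whatever has been recorded so far (nothing, on the first call).
    previous = target.read_text() if target.exists() else ""
    # Rewrite the file with the newest event at the top.
    target.write_text(message.rstrip("\n") + "\n" + previous)
```

A reader fetching the file then sees the current state on the first line and the history below it.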

*syqada auto* reads the protocol file and then determines what to do. If
no task directories exist, it creates them and their METADATA files.
If *--init* is specified, it stops after creating them.

Without *--init*, *syqada auto* then proceeds to evaluate each
directory in turn from 01 on, running a BatchRunner if the task has not completed.
Upon completion of the BatchRunner, SyQADA tests the task for completion
and proceeds to the next task if there are no errors.

At this time, *syqada auto* does nothing to repair a task that shows a failure.
However, it does do some simple parsing of stderr to identify common problems
and give hints as to their possible causes.
Failed tasks must be handled manually with *syqada manage* and *syqada batch*.
See those sub-commands for details.
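
The control flow described above can be summarized in a short sketch. Everything here is hypothetical: ``is_complete`` and ``run_batch`` are stand-in callables for SyQADA's completion test and BatchRunner, and the function is meant only to show the order of decisions, not the real code.

```python
def run_protocol(tasks, is_complete, run_batch):
    """Walk tasks in order; return the first task that fails, or None if all finish."""
    for task in sorted(tasks):  # zero-padded names such as 01-..., 02-... sort in step order
        if is_complete(task):
            continue  # already finished, so move on to the next task
        run_batch(task)  # stand-in for running a BatchRunner on this task
        if not is_complete(task):
            return task  # no automatic repair: stop and leave the task to the user
    return None
```

A non-``None`` return value corresponds to the point at which the failed task has to be handled manually.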

*syqada auto* has three special, mutually exclusive flags:

.. hlist::
   :columns: 3

   * init
   * ignore
   * status

Typical invocation does not require any flags. The meaning of the options follows.

>>> syqada auto --init

simply constructs any unconstructed METADATA for the first
uninitialized task in the protocol file.  It is typically used to determine
manually whether the job scripts for a batch have been correctly
created before running syqada auto.

>>> syqada auto --ignore batchdir-prefix1 [batchdir-prefix2 ...]

will ignore errors in steps beginning with any prefix in the list. Note that `--ignore 0 1`
will ignore errors in any of the first 19 batches, because the prefix 0 matches batches 01 through 09
and the prefix 1 matches batches 10 through 19.
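
The prefix arithmetic can be checked with a small sketch. The directory names of the form ``NN-step`` and the helper ``is_ignored`` are hypothetical; only the rule of matching by leading characters is taken from the description above.

```python
def is_ignored(batch_dir, prefixes):
    """True if the batch directory name begins with any of the given prefixes."""
    return batch_dir.startswith(tuple(prefixes))


# Hypothetical batch directories 01-step through 25-step.
batches = [f"{n:02d}-step" for n in range(1, 26)]
ignored = [b for b in batches if is_ignored(b, ["0", "1"])]
print(len(ignored))  # prints 19
```

Here ``0`` matches 01 through 09 and ``1`` matches 10 through 19, so a short prefix can silence far more batches than intended.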

>>> syqada auto --prototype

    (New in 2.1)
    When running syqada auto or batch, --prototype creates all the
    jobscripts for the desired step and then runs only one of them,
    quitting without submitting the remainder if that job fails. If
    the job succeeds, syqada proceeds as usual and a normal bolus of
    the remaining jobs is submitted for execution.
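
The idea behind ``--prototype`` can be sketched as follows. The names ``run_job`` and ``submit_all`` are hypothetical stand-ins for running a single job script and submitting a bolus of jobs. This illustrates the decision only and makes no claim about SyQADA's internals.

```python
def run_with_prototype(jobs, run_job, submit_all):
    """Run the first job alone; submit the remainder only if it succeeds."""
    if not jobs:
        return False  # nothing to prototype
    first, rest = jobs[0], jobs[1:]
    if not run_job(first):
        return False  # the prototype failed, so the remaining jobs are not submitted
    submit_all(rest)  # the prototype succeeded, so proceed as usual
    return True
```

The benefit is that a broken template costs one failed job instead of one failure per sample.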

>>> syqada auto --resume batch-step-prefix (e.g., --resume 09)

    (New in 2.1)
    --resume assumes that all steps numerically prior to the resume
    step have finished correctly, and resumes automatic processing
    from that step, thus saving a great deal of logfile inspection in
    the case of large multi-step projects. My personal practice is to
    resume from the step immediately before the one I would like to
    execute, just to make sure that it has actually succeeded.
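
A minimal sketch of how a resume prefix might select the steps to process, assuming zero-padded directory names so that lexical order equals step order. The function ``steps_from`` and the example names are hypothetical.

```python
def steps_from(batch_dirs, resume_prefix):
    """Return the steps from the first one matching resume_prefix onward."""
    ordered = sorted(batch_dirs)  # assumes zero-padded prefixes such as 07, 08, 09, 10
    for index, name in enumerate(ordered):
        if name.startswith(resume_prefix):
            return ordered[index:]  # everything earlier is assumed to have finished correctly
    return []  # no step matches the prefix
```

For example, resuming at ``08`` when the target step is ``09`` re-checks step 08 before continuing, which matches the practice described above.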

When you find yourself restarting a huge task with thousands of jobs,
such as may occur with many samples split by chromosome (common in
haplohseq, for example), you are likely to find that the startup is
delayed because standard behavior is to evaluate current status to
verify that it makes sense to proceed. If you have enough faith that
there are no issues, you can short-circuit this behavior and go
straight to job processing by adding the `--faith` option to the
`syqada auto` command.