Replication

Replication workflows support experimental comparison of different parameter settings for a program or programs over the same set of samples. Every given value of each replication parameter is combined with every given value of the others, so that for parameters P1 and P2 having 3 and 4 given values respectively, 12 replicate task directories (3 x 4) will be created per task in the workflow, with the METADATA file for each one containing a distinct pair of values for P1 and P2. No maximum is placed on the number of combinations, so use restraint: each replicate is a complete, presumably computation-intensive run of the task.

Replication is invoked by specifying the replication in the preamble of the protocol file (i.e., the lines before the first TASKDEF):

replicate = P1=value1,value2,value3
replicate = P2=value1,value2,value3,value4 P3=value1,value2

The parameters and their values can be specified space-separated on the same line or on separate lines, according to taste. syqada will then write a file named control/PROJECTNAME.replication showing the numbered replicate values one to a line. This file could potentially be used in subsequent analyses, but it is not required and syqada itself ignores it.

In the absence of any replicate line in the protocol file, there will be no replication performed.
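The expansion described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not syqada's own parser: the function names parse_replicate_lines and expand are hypothetical, and the sketch assumes that values contain no spaces or commas.

```python
from itertools import product

def parse_replicate_lines(lines):
    """Collect {parameter: [values]} from 'replicate = NAME=v1,v2 ...' lines."""
    params = {}
    for line in lines:
        key, _, rest = line.partition("=")
        if key.strip() != "replicate":
            continue  # not a replicate line
        for spec in rest.split():  # several parameters may share one line
            name, _, values = spec.partition("=")
            params[name] = values.split(",")
    return params

def expand(params):
    """Return one dict per combination of parameter values."""
    names = list(params)
    value_lists = [params[n] for n in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

preamble = [
    "replicate = P1=value1,value2,value3",
    "replicate = P2=value1,value2,value3,value4 P3=value1,value2",
]
params = parse_replicate_lines(preamble)

# P1 and P2 alone give 3 x 4 = 12 replicates, as in the text.
p1_p2 = expand({name: params[name] for name in ("P1", "P2")})
print(len(p1_p2))

# Adding P3 doubles that to 3 x 4 x 2 = 24.
print(len(expand(params)))
```

Each added parameter multiplies the replicate count by its number of values, which is why the count grows so quickly.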

Scatter

A TASKDEF that contains a scatter attribute causes the creation of one directory for each combination of values of the parameters listed in the scatter attribute. Assuming our replication configuration looks like the replicate lines above, any TASKDEF that contains either scatter or gather will be treated specially, as follows:

TASKDEF = stepX path-to-taskfile
scatter = P1,P3

means create replicate execution directories for each combination of values of P1 and P3 (3 x 2 = 6 directories, in our example). These will be named:

0N-stepX-P1_value1-P3_value1
...
0N-stepX-P1_value3-P3_value2

The 0N-stepX-P1_value3-P3_value2/METADATA would contain:

replicate:
  P1 = value3
  P3 = value2

These terms are available to be interpolated into a task’s script template wherever {P1} or {P3} appears.
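As a rough sketch of the naming and interpolation just described, the following Python builds the six directory names for scatter = P1,P3 and fills a template from the replicate terms. The helper scatter_dirs, the literal "0N" prefix, and the template string are all hypothetical, and Python's str.format stands in for syqada's own interpolation, which may differ in detail.

```python
from itertools import product

replication = {
    "P1": ["value1", "value2", "value3"],
    "P2": ["value1", "value2", "value3", "value4"],
    "P3": ["value1", "value2"],
}

def scatter_dirs(step, prefix, scatter, replication):
    """Return (directory name, replicate terms) for each scattered combination."""
    names = list(replication) if scatter == ["all"] else scatter
    dirs = []
    for combo in product(*(replication[n] for n in names)):
        terms = dict(zip(names, combo))
        suffix = "-".join(f"{n}_{v}" for n, v in terms.items())
        dirs.append((f"{prefix}-{step}-{suffix}", terms))
    return dirs

# "0N" is the placeholder step number used in the text, not a real prefix.
dirs = scatter_dirs("stepX", "0N", ["P1", "P3"], replication)
for name, terms in dirs:
    print(name, terms)

# Hypothetical template line; {P1} and {P3} are replaced by the replicate terms.
template = "run-tool --alpha {P1} --beta {P3}"
last_name, last_terms = dirs[-1]
command = template.format(**last_terms)
print(command)
```

The terms dictionary for each directory holds the same pairs that the METADATA replicate section lists.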

A TASKDEF that contains:

scatter = all

will expand all parameters named in the replication config and create one directory for each of the P1 x P2 x P3 combinations (3 x 4 x 2 = 24 in our example). Unlike version 1.0, which used a simpler and uglier approach, there is no protection from excessive replication, so beware: the number of replicates is the product of the numbers of values, and you could consume a lot of cycles.

However, a possible advantage of the scatter-gather approach is that instead of effectively running all P1 x P2 x P3 copies of the workflow at every step, you only run as many replicates of each task as are necessary to scatter or gather the appropriate terms. (Incredible as it may seem, it has actually been useful to scatter one term, gather, then scatter a different term for batch management.)

The example Features.protocol has four steps. The first scatters by two parameters (2x2 or four replicates), the second gathers one of the two parameters (leaving it scattered by the other, so two replicates), the third adds a third parameter for scatter (2x2 or four replicates), and the fourth gathers all parameters (one replicate). SyQADA-1.0 would have scattered all three used parameters through each of the first three steps, producing 2x2x2 replicates of each of the first three tasks, and making programming of the second (partial gather) step much more difficult.
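The replicate counts in the Features.protocol walk-through can be tallied with a small model. This is a sketch under stated assumptions: the parameter names A, B and C are hypothetical (the text does not name them), each is taken to have two values, a scatter is modeled as adding terms to the set currently scattered, and a gather as removing them.

```python
from math import prod

# Hypothetical parameters, two values each, matching the 2x2 counts in the text.
sizes = {"A": 2, "B": 2, "C": 2}

steps = [
    ("step1", "scatter", ["A", "B"]),  # scatter by two parameters
    ("step2", "gather", ["B"]),        # gather one of the two
    ("step3", "scatter", ["C"]),       # add a third parameter
    ("step4", "gather", ["all"]),      # gather everything
]

def replicate_counts(steps, sizes):
    """Return (step name, number of replicates) for each step in order."""
    active = []  # parameters currently scattered
    counts = []
    for name, action, terms in steps:
        if action == "scatter":
            names = list(sizes) if terms == ["all"] else terms
            active += [t for t in names if t not in active]
        elif action == "gather":
            if terms == ["all"]:
                active = []
            else:
                active = [t for t in active if t not in terms]
        # prod() of an empty sequence is 1: a fully gathered step is one task.
        counts.append((name, prod(sizes[t] for t in active)))
    return counts

counts = replicate_counts(steps, sizes)
print(counts)

# Scattering all three parameters through every step, as described for 1.0:
print(prod(sizes.values()))
```

Under these assumptions the four steps need 4 + 2 + 4 + 1 = 11 task replicates in total, against 8 for each of the first three steps when every parameter is scattered throughout.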

Replication Directory Structure gives an example directory layout. SyQADA simply runs these tasks in order, completing all replicates of the first step before beginning the second step.

Gather

A protocol task that includes the specification:

TASKDEF stepY path-to-taskfile
gather = P1

will, instead of replicating, create a task or tasks whose inputdir is a regular expression. This permits the program run by the job to collate the outputs of the previous step by the named parameter(s), presumably for plotting or analysis to evaluate the relative performance of the various replicates. The specification:

gather = all

creates a single task. Thus, two steps that specify:

TASKDEF stepA path
scatter = all

TASKDEF stepB path
gather = all

will perform the way syqada-1.0 performed, with the exception that the directory names now include the parameter specs instead of being numbered.
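To make the idea of a regular-expression inputdir concrete, here is a sketch of how a pattern could select the scattered directories that one gather task collates. The actual pattern syqada generates is not shown here, so gather_pattern and its arguments are hypothetical; the sketch also assumes that parameter values contain no hyphens.

```python
import re
from itertools import product

# Directory names as produced by scatter = P1,P3 in the Scatter section.
scattered = [
    f"0N-stepX-P1_{p1}-P3_{p3}"
    for p1, p3 in product(["value1", "value2", "value3"], ["value1", "value2"])
]

def gather_pattern(step_prefix, scattered_names, gathered, fixed):
    """Build a regex that wildcards gathered parameters and pins the rest."""
    parts = []
    for name in scattered_names:
        if name in gathered:
            parts.append(f"{name}_[^-]+")  # any value of a gathered parameter
        else:
            parts.append(re.escape(f"{name}_{fixed[name]}"))
    return re.compile(re.escape(step_prefix) + "-" + "-".join(parts) + "$")

# gather = P1: one task per remaining P3 value; this is the one for P3 = value2.
pattern = gather_pattern("0N-stepX", ["P1", "P3"], {"P1"}, {"P3": "value2"})
matches = [d for d in scattered if pattern.match(d)]
print(matches)

# gather = all: a single task whose pattern matches every scattered directory.
everything = gather_pattern("0N-stepX", ["P1", "P3"], {"P1", "P3"}, {})
print(len([d for d in scattered if everything.match(d)]))
```

In this model, gathering P1 leaves one task per value of P3, each seeing the three P1 outputs for that value, while gather = all collapses to a single task that sees all six.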

Inherited Scatter

If a workflow step that follows a scatter does not contain a gather = all (or its equivalent, listing all the parameters scattered by the previous scatter), it will inherit the replication of the previous scatter and create the corresponding number of batch directories. This is illustrated in step 02 of the Tutorial on Special Features.