Generating Workflows

The second axis of the WorkflowHub project targets the generation of realistic synthetic workflow traces with a variety of characteristics. The WorkflowGenerator class uses recipes of workflows (as described in Analyzing Traces) for creating many different synthetic workflows based on distributions of workflow task runtime, and input and output file sizes. The resulting workflows are represented in the WorkflowHub JSON format, which is already supported by simulation frameworks such as WRENCH.

Workflow Recipes

The WorkflowHub package provides a number of workflow recipes for generating realistic synthetic workflow traces. Each recipe may provide its own methods for instantiating a WorkflowRecipe object, depending on the properties that define the structure of the actual workflow. For instance, the code snippet below shows how to instantiate recipes for the Epigenomics and 1000Genome workflows:

from workflowhub.generator import EpigenomicsRecipe, GenomeRecipe

# creating an Epigenomics workflow recipe
epigenomics_recipe = EpigenomicsRecipe.from_sequences(num_sequence_files=2, num_lines=100, bin_size=10)

# creating a 1000Genome workflow recipe
genome_recipe = GenomeRecipe.from_num_chromosomes(num_chromosomes=3, num_sequences=10000, num_populations=1)

All workflow recipes also provide a common method (from_num_tasks) for instantiating a WorkflowRecipe object as follows:

from workflowhub.generator import EpigenomicsRecipe, GenomeRecipe

# creating an Epigenomics workflow recipe
epigenomics_recipe = EpigenomicsRecipe.from_num_tasks(num_tasks=9)

# creating a 1000Genome workflow recipe
genome_recipe = GenomeRecipe.from_num_tasks(num_tasks=5)

Note that num_tasks defines the upper bound for the total number of tasks in the workflow, so a generated workflow may contain fewer tasks than requested. Each workflow recipe may also define its own lower bound, below which the workflow structure cannot be guaranteed. Please refer to the documentation of each workflow recipe for the lower bound values.
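
The bound check described above can be illustrated with a small stand-in class. This is only a sketch: ToyRecipe, MIN_TASKS, and the choice to raise ValueError are hypothetical names and behavior chosen for illustration, not part of the WorkflowHub API. Consult each recipe's documentation for its real lower bound and error handling.

```python
class ToyRecipe:
    """Hypothetical stand-in for a workflow recipe (not WorkflowHub code)."""

    # Assumed smallest task count that still yields a valid structure.
    MIN_TASKS = 4

    def __init__(self, num_tasks: int) -> None:
        # Upper bound on the number of tasks in a generated workflow.
        self.num_tasks = num_tasks

    @classmethod
    def from_num_tasks(cls, num_tasks: int) -> "ToyRecipe":
        # Reject requests that cannot preserve the workflow structure.
        if num_tasks < cls.MIN_TASKS:
            raise ValueError(
                f"num_tasks must be at least {cls.MIN_TASKS}, got {num_tasks}"
            )
        return cls(num_tasks)


# A request at or above the lower bound is accepted.
recipe = ToyRecipe.from_num_tasks(num_tasks=9)

# A request below the lower bound is rejected.
try:
    ToyRecipe.from_num_tasks(num_tasks=2)
    rejected = False
except ValueError:
    rejected = True
```

Validating the bound when the recipe is created, rather than when a workflow is built, surfaces an unsatisfiable request before any generation work starts.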

The current list of available workflow recipes includes:

The Workflow Generator

Synthetic workflow traces are generated using the WorkflowGenerator class. This class takes as input a WorkflowRecipe object (see above), and provides two methods for generating synthetic workflow traces:

  • build_workflow(): generates a single synthetic workflow trace based on the workflow recipe used to instantiate the generator.
  • build_workflows(): generates a number of synthetic workflow traces based on the workflow recipe used to instantiate the generator.

The build methods use the workflow recipe to generate realistic synthetic workflow traces. The workflow structure follows the composition rules defined in the recipe, while task runtimes and input and output data sizes are drawn from distributions obtained from actual workflow execution traces (see Analyzing Traces).
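
To make the idea of distribution-driven generation concrete, the sketch below draws runtimes and output sizes for a few tasks using only the Python standard library. It does not reproduce WorkflowHub's internals: the TASK_PROFILES table, its parameter values, the sample_task helper, and the use of a log-normal distribution are all assumptions made for illustration.

```python
import random

# Hypothetical per-task-type (mu, sigma) parameters. In practice these would
# come from distributions fitted to real execution traces.
TASK_PROFILES = {
    "map": {"runtime": (2.0, 0.5), "output_bytes": (10.0, 1.0)},
    "reduce": {"runtime": (3.0, 0.3), "output_bytes": (8.0, 0.7)},
}


def sample_task(task_type: str, rng: random.Random) -> dict:
    """Draw one runtime and one output size for the given task type."""
    profile = TASK_PROFILES[task_type]
    mu_rt, sigma_rt = profile["runtime"]
    mu_sz, sigma_sz = profile["output_bytes"]
    return {
        "type": task_type,
        # Log-normal draws are always positive, which suits durations.
        "runtime": rng.lognormvariate(mu_rt, sigma_rt),
        "output_bytes": int(rng.lognormvariate(mu_sz, sigma_sz)),
    }


rng = random.Random(42)  # fixed seed so repeated runs give the same draws
tasks = [sample_task(t, rng) for t in ("map", "map", "reduce")]
```

Passing an explicit random.Random instance keeps the sampling reproducible and avoids touching the global random state.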

Each generated trace is represented as a Workflow object (which is itself an extension of the NetworkX DiGraph class). The Workflow class provides two methods for writing the generated workflow trace into files:

Examples

The following example creates a Seismology workflow recipe based on the number of pairs of signals used to estimate earthquake STFs (num_pairs), builds a synthetic workflow trace, and writes the synthetic trace to a JSON file.

from workflowhub import WorkflowGenerator
from workflowhub.generator import SeismologyRecipe

# creating a Seismology workflow recipe based on the number
# of pairs of signals used to estimate earthquake STFs
recipe = SeismologyRecipe.from_num_pairs(num_pairs=10)

# creating an instance of the workflow generator with the
# Seismology workflow recipe
generator = WorkflowGenerator(recipe)

# generating a synthetic workflow trace of the Seismology workflow
workflow = generator.build_workflow()

# writing the synthetic workflow trace into a JSON file
workflow.write_json('seismology-workflow.json')

The example below generates several Cycles (agroecosystem) synthetic workflow traces, each bounded by an upper limit on the number of tasks per workflow.

from workflowhub import WorkflowGenerator
from workflowhub.generator import CyclesRecipe

# creating a Cycles workflow recipe based on the number of tasks per workflow
recipe = CyclesRecipe.from_num_tasks(num_tasks=1000)

# creating an instance of the workflow generator with the
# Cycles workflow recipe
generator = WorkflowGenerator(recipe)

# generating 10 synthetic workflow traces of the Cycles workflow
workflows_list = generator.build_workflows(num_workflows=10)

# writing each synthetic workflow trace into a JSON file
for count, workflow in enumerate(workflows_list, start=1):
    workflow.write_json('cycles-workflow-{:02}.json'.format(count))