This example shows how to put together a basic machine learning pipeline. It fetches a dataset from OpenML, trains several model types on a prediction target, and selects the best model based on its inference score.
#!/usr/bin/env nextflow

import groovy.json.JsonSlurper

params.dataset_name = 'wdbc'
params.train_models = ['dummy', 'gb', 'lr', 'mlp', 'rf']
params.outdir = 'results'

workflow {
    // fetch dataset from OpenML
    ch_datasets = fetch_dataset(params.dataset_name)

    // split dataset into train/test sets
    (ch_train_datasets, ch_predict_datasets) = split_train_test(ch_datasets)

    // perform training
    (ch_models, ch_train_logs) = train(ch_train_datasets, params.train_models)

    // perform inference
    ch_predict_inputs = ch_models.combine(ch_predict_datasets, by: 0)
    (ch_scores, ch_predict_logs) = predict(ch_predict_inputs)

    // select the best model based on inference score
    ch_scores
        | max { new JsonSlurper().parse(it[2])['value'] }
        | subscribe { dataset_name, model_type, score_file ->
            def score = new JsonSlurper().parse(score_file)
            println "The best model for ${dataset_name} was ${model_type}, with ${score['name']} = ${score['value']}"
        }
}
// view the entire code on GitHub ...
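The final selection step can be mirrored in plain Python to make the logic concrete: each item flowing through ch_scores is a (dataset_name, model_type, score_file) tuple, and the best model is the one whose score file holds the highest value. This is only a sketch, and the fabricated file names and the {"name": ..., "value": ...} JSON schema are assumptions inferred from the println line above, not taken from the real pipeline outputs.

```python
import json
import tempfile
from pathlib import Path

def select_best_model(scores):
    """Return the (dataset_name, model_type, score_file) tuple whose
    JSON score file contains the highest 'value' field — the same idea
    as the `max` operator in the workflow."""
    def score_value(item):
        with open(item[2]) as f:
            return json.load(f)["value"]
    return max(scores, key=score_value)

# Fabricated score files for illustration only.
tmpdir = Path(tempfile.mkdtemp())
scores = []
for model, value in [("dummy", 0.50), ("lr", 0.92), ("rf", 0.95)]:
    score_file = tmpdir / f"{model}.json"
    score_file.write_text(json.dumps({"name": "accuracy", "value": value}))
    scores.append(("wdbc", model, str(score_file)))

dataset_name, model_type, best_file = select_best_model(scores)
best = json.loads(Path(best_file).read_text())
print(f"The best model for {dataset_name} was {model_type}, "
      f"with {best['name']} = {best['value']}")
```

Running this sketch prints a line analogous to the workflow's subscribe output, with the highest-scoring model winning.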
To run this pipeline on your computer, you will need Nextflow. Install it by entering the following command in your terminal:
$ curl -fsSL get.nextflow.io | bash
Then launch the pipeline with this command:
$ nextflow run ml-hyperopt -profile wave
Nextflow will automatically download the pipeline's GitHub repository and build a Docker image on the fly using Wave, so the first execution may take a few minutes to complete depending on your network connection.
NOTE: Nextflow 22.10.0 or newer is required to run this pipeline with Wave.