W4MRUtils

Introduction: package history and purpose

Workflow4Metabolomics (W4M) provides tools to help people process metabolomic datasets. Its preferred channel is the web-based interface Galaxy, which is particularly suited to analysts with no skills in software programming or programming languages in general. Galaxy is also compatible with a large variety of languages, which makes it possible to combine tools written in different programming languages. This is a major advantage given the diversity of steps (and thus tools) needed for comprehensive metabolomic data processing.

Nonetheless, a significant part of the tools provided by W4M are coded in R. Given the diversity of tools, W4M chose to adopt common standards across them, in particular regarding data table formats and basic data checks. It also intends to gather utility functions that help build R-based Galaxy tools following W4M standards, along with functions that easily handle basic actions on the W4M table data format.

The W4MRUtils package is meant to gather such functions, to enable W4M R-based Galaxy tools to have easy access to these standardised approaches.

How to use the package

What you should know about the W4M standards

Some functions are closely linked to Galaxy integration, such as the parse_args function, but a large variety of functions can be helpful independently of Galaxy. To use these functions, it is useful to understand the choices made by W4M, in particular regarding formats, as these are often used as input parameters for functions.

The main format aspect to consider is the W4M 3-tables format. Metabolomic data, like other types of ’omics data, can be divided into 3 main types of information:

These 3 types of information can be stored in 3 distinct tables. In the package, the chosen format is the following:

These 3 tables are standard inputs and/or outputs for several of the functions found in this package. Consequently, when you need to specify sample-wise or variable-wise information, you will generally be asked to store it in one of these tables. The same goes for function outputs: the generated information is usually found in one of these tables.
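As a rough illustration of the 3-tables idea, the sketch below builds three toy data frames in base R. The object names, column names, orientation (variables as rows of the data matrix, samples as columns) and values are all assumptions made for illustration, not taken from this document; check the W4M documentation for the exact layout a given function expects.

```r
# Toy example of three linked tables (all names and values are illustrative).
# Assumption: variables are rows and samples are columns in the data matrix.
data_matrix <- data.frame(
  variable = c("M100T50", "M200T75"),
  s1 = c(1200, 340),
  s2 = c(1100, 410),
  s3 = c(1300, 0),
  stringsAsFactors = FALSE
)

# One row per sample, holding sample-wise information.
sample_metadata <- data.frame(
  sample = c("s1", "s2", "s3"),
  group = c("control", "control", "treated"),
  stringsAsFactors = FALSE
)

# One row per variable, holding variable-wise information.
variable_metadata <- data.frame(
  variable = c("M100T50", "M200T75"),
  mz = c(100.01, 200.02),
  rt = c(50, 75),
  stringsAsFactors = FALSE
)

# The identifiers are what tie the three tables together.
samples_match <- identical(colnames(data_matrix)[-1], sample_metadata$sample)
variables_match <- identical(data_matrix$variable, variable_metadata$variable)
```

Checking that sample and variable identifiers agree across the tables, as in the last two lines, is the kind of basic consistency check that a shared format makes straightforward.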

About the types of functions in the package

There are several types of functions in the package. Broadly, you will find functions enabling the following:

Usage example for Galaxy tools

library(W4MRUtils)

TOOL_NAME <- "A Test Tool" #nolint
logger <- get_logger(TOOL_NAME)
logger$info("Parsing parameters...")
#> [   info-A Test Tool-16:31:35] - Parsing parameters...
args <- optparse_parameters(
  input = optparse_character(),
  output = optparse_character(),
  threshold = optparse_numeric(default = 10),
  args = commandArgs()
)
logger$info("Parsing parameters OK.")
#> [   info-A Test Tool-16:31:35] - Parsing parameters OK.
show_galaxy_header(
  TOOL_NAME,
  "1.2.0",
  args = args,
  logger = logger,
  show_sys = FALSE
)
#> [   info-A Test Tool-16:31:35] - Job starting time:
#> [   info-A Test Tool-16:31:35] - ven. 08 sept. 2023 16:31:35
#> [   info-A Test Tool-16:31:35] - ------------------------------------------------------------------------------
#> [   info-A Test Tool-16:31:35] - Parameters used in A Test Tool - version 1.2.0:
#> [   info-A Test Tool-16:31:35] - 
#> [   info-A Test Tool-16:31:35] - List of 4
#> [   info-A Test Tool-16:31:35] -  $ help     : logi FALSE
#> [   info-A Test Tool-16:31:35] -  $ input    : chr "/tmp/RtmpsJ5uhs/./input.csv"
#> [   info-A Test Tool-16:31:35] -  $ output   : chr "/tmp/RtmpsJ5uhs/./output.csv"
#> [   info-A Test Tool-16:31:35] -  $ threshold: num 2
#> [   info-A Test Tool-16:31:35] - ------------------------------------------------------------------------------
check_parameters <- function(args, logger) {
  logger$info("Checking parameters...")
  check_one_character(args$input)
  check_one_character(args$output)
  check_one_numeric(args$threshold)
  if (args$threshold < 1) {
    logger$errorf(
      "The threshold is too low (%s). Cannot continue", args$threshold
    )
    stopf("Threshold = %s is too low.", args$threshold)
  }
  if (args$threshold < 3) {
    logger$warningf(
      paste(
        "The threshold is very low (%s).",
        "This may lead to erroneous results."
      ),
      args$threshold
    )
  }
  logger$info("Parameters OK.")
}

read_input <- function(args, logger) {
  logger$infof("Reading file in %s ...", args$input)
  return(read.csv(args$input))
}

process_input <- function(args, logger, table) {
  # Rows whose first column is above the threshold are kept.
  keep <- which(table[, 1] > args$threshold)
  n_filtered <- nrow(table) - length(keep)
  logger$infof(
    "%s values filtered (%s%%)",
    n_filtered,
    round(n_filtered / nrow(table) * 100, 2)
  )
  if (n_filtered == nrow(table)) {
    logger$warning("All values have been filtered.")
  }
  return(table[keep, ])
}

write_output <- function(args, logger, output) {
  logger$info("Writing output file...")
  write.table(output, args$output)
}

# check_parameters(args, logger$sublogger("Param Checking"))
# input <- read_input(args, logger$sublogger("Inputs Reader"))
# print(input)
# output <- process_input(args, logger$sublogger("Inputs Processing"), input)
# print(output)
# write_output(args, logger$sublogger("Output Writer"), output)
# show_galaxy_footer(TOOL_NAME, "1.2.0", logger = logger, show_packages = FALSE)
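
To make the intent of the filtering step in process_input easier to follow, here is a small base R sketch on a made-up table. The data frame toy and the threshold value are hypothetical. The sketch counts removed rows by summing a logical mask, so only rows that fail the test are counted.

```r
# Hypothetical input table: the first column holds the values to test.
toy <- data.frame(
  intensity = c(1, 5, 12, 30),
  label = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
threshold <- 10

# TRUE for rows to keep (first column strictly above the threshold).
keep <- toy[, 1] > threshold

# Rows that are filtered out are those where keep is FALSE.
n_filtered <- sum(!keep)
pct_filtered <- round(n_filtered / nrow(toy) * 100, 2)

# which() drops any NA comparison results instead of producing NA rows.
filtered <- toy[which(keep), ]
```

With this toy table, two of the four rows fall at or below the threshold, so half of the values are filtered and the rows labelled "c" and "d" remain.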