Introduction

This document is designed for the developer who want to create a cloud provider or container extension for the DockerParallel package. We will discuss the structure of the package and calling order of the package function.

Package structure

The package assumes a server-worker-client structure, their relationship can be summarized as follow

{width=90%}

When the user creates a DockerCluster object, the object serves as the client. The server will receive the computation jobs from the client and send them to the workers. The server and workers are defined by the DockerContainer objects. They can be dynamically created on the cloud using the functions provided by the corresponding CloudProvider object. However, it is possible to have the server and some workers running outside of the cloud as the user might have them in some other platforms.

The design of the package is twofold. It contains two sets of APIs, user APIs and developer APIs. The user APIs are called by the user to start the cluster on the cloud. The developer APIs expose the internal functions that are used by the DockerCluster object to collect the container information and deploy the container on the cloud. Below shows the difference between the user and developer APIs

{width=90%}

The user issues the high-level command to the DockerCluster object(e.g. startCluster). The DockerCluster object will call the downstream functions to supply what the user needs. CloudConfig stores the cluster static settings such as jobQueueName and workerNumber. CloudRuntime keeps the runtime information of the server and workers. CloudProvider provides the functions to run the container on the cloud and DockerContainer defines which container should be used. As the cluster might need to deploy both the server and workers, the DockerCluster object defines both serverContainer and workerContainer as its slots. Since a DockerCluster object corresponds to a cluster on the cloud, all the before-mentioned components behave like the environment object in R. Therefore, it is possible to change the value in the DockerCluster object inside a function and broadcast the effect to the same object outside of the function.

The class CloudProvider and DockerContainer are generalizable, the developer can define a new cloud provider by defining a new class which inherits CloudProvider. The same rule applies to DockerContainer as well. In the rest of the document, we will discuss the implementation details of the DockerCluster object.

The big picture

In this section, we use the function DockerCluster$startCluster as an example to show how the components of the DockerCluster object work together to deploy a cluster on the cloud.

{width=90%}

The gray color means the function is from DockerCluster, red means the CloudProvider and blue means the DockerContainer. Note that the flowchart only contains these three classes as CloudConfig and CloudRuntime are purely the class for storing the cluster information. In most cases, it is DockerCluster's job to manage the data in the cluster, there is no need to change the value of CloudConfig and CloudRuntime in the function defined for CloudProvider or DockerContainer. The only exception is the function reconnectDockerCluster where the cloud provider is responsible for setting all the values in the cluster.

Accessor functions

The package provides getters/setters for the developer to access the values in the DockerCluster object. The argument is always the DockerCluster object. The high-level accessors are

.getCloudProvider
.getCloudConfig
.getServerContainer
.getWorkerContainer
.getCloudRuntime

Note that there are no setters for them as they are of the reference class. The accessors for the CloudConfig values are

## Getter
.getJobQueueName
.getWorkerNumber
.getWorkerHardware
.getServerHardware
.getServerWorkerSameLAN
.getServerClientSameLAN
.getServerPassword
.getServerPort

## Setter
.setJobQueueName
.setWorkerNumber
.setWorkerHardware
.setServerHardware
.setServerWorkerSameLAN
.setServerClientSameLAN
.setServerPassword
.setServerPort

The accessors for the CloudRuntime values are

## Getter
.getWorkerHandles
.getServerHandle
.getServerPrivateIp
.getServerPublicIp

## Setter
.addWorkerHandles
.removeWorkerHandles
.setServerHandle
.setServerPrivateIp
.setServerPublicIp

In most cases, only the getter are required to be called in the CloudProvider or DockerContainer functions.

The docker container

DockerContainer defines the docker image and its environment variables. Its class definition is

setRefClass(
    "DockerContainer",
    fields = list(
        name = "CharOrNULL",
        maxWorkerNum = "integer",
        environment = "list",
        image = "character"
    )
)

where name is the an optional slot for the developer to name and distinguish the server and worker container, maxWorkerNum defines the maximum number of workers that can be run in the same container. environment is the environment variable in the container. image is the image used by the container. Note that a minimum container should define at least the image slot. The other slots are optional.

The generic function for the DockerContainer class are

configServerContainerEnv
configWorkerContainerEnv
registerParallelBackend
deregisterParallelBackend
getServerContainer
getExportedNames
getExportedObject

A minimum container should override configServerContainerEnv, configWorkerContainerEnv and registerParallelBackend. The rest is optional, but in most cases they should also be defined.

Configure the container environment

The motivation for configuring the container environment is to allow developer to pass the server/worker IP, port, password to the container before deploying them on the cloud(please see The big picture). Therefore, server can be password protected and worker can find the server via the settings in the environment variable. The function prototypes are

setGeneric("configServerContainerEnv", function(container, cluster, verbose){
    standardGeneric("configServerContainerEnv")
})

and

setGeneric("configWorkerContainerEnv", function(container, cluster, workerNumber, verbose){
    standardGeneric("configWorkerContainerEnv")
})

where container is the worker container and cluster is the DockerCluster object. workerNumber defines how many workers should be run in the same container. Please keep in mind that the user might also define the environment variables in the container, so you should insert your environment variables to DockerContainer$environment, not overwrite the entire environment object.

Since the container object is a reference object, it is recommended to call $copy() before set the environment variable to avoid the side affect.

Parallel backend

It is the container's duty to register the parallel backend as neither the cloud provider nor the cluster knows which backend your container supports. The function prototypes are

setGeneric("registerParallelBackend", function(container, cluster, verbose, ...){
    standardGeneric("registerParallelBackend")
})

and

setGeneric("deregisterParallelBackend", function(container, cluster, verbose, ...){
    standardGeneric("deregisterParallelBackend")
})

where container is the worker container and cluster is the DockerCluster object. The backend can be anything defined globally that the user could use to do the parallel computing. The popular choices are foreach or BiocParallel. Other backends are also possible. The default deregisterParallelBackend can deregister the foreach backend. If your backend is not from foreach, you should also define deregisterParallelBackend.

Server container

The function getServerContainer is purely for obtaining the server container from the worker container. By doing so the user only need to provide the worker container to the cluster. The prototype is

setGeneric("getServerContainer", function(workerContainer){
    standardGeneric("getServerContainer")
})

This function is optional, but we recommend to define it as otherwise the user must explicitly provide both the server and worker container to the cluster if both need to be deployed by the cloud.

Extension to the container

You can export the functions and variables in the container via getExportedNames and getExportedObject. The prototypes are

setGeneric("getExportedNames", function(x){
    standardGeneric("getExportedNames")
})

and

setGeneric("getExportedObject", function(x, name){
    standardGeneric("getExportedObject")
})

getExportedNames defines the exported names and getExportedObject gets the exported object. They will be called by the cluster and the user can see the exported objects from DockerCluster$serverContainer$... and DockerCluster$workerContainer$....

If the exported object is a function, the exported function will be defined in an environment such that the DockerCluster object is assigned to the variable cluster. In other words, the exported function can use the variable cluster without define it. For example, if we export the function

foo <- function(){
  cluster$startCluster()
}

the user can call foo via DockerCluster$workerContainer$foo() with no argument and the cluster will be started. This can be useful if the developer needs to change anything in the cluster without asking the user to provide the DockerCluster object. If the function has the argument cluster, the argument will be removed from the function when the function is exported to the user. The user would not be bothered with the redundant cluster argument.

The cloud provider

CloudProvider provides functions to deploy the container on the cloud. Its generic functions are

initializeProvider
runDockerServer
runDockerWorkers
getDockerInstanceIps
getDockerInstanceStatus
IsDockerInstanceInitializing
IsDockerInstanceRunning
IsDockerInstanceStopped
killDockerInstances
dockerClusterExists
reconnectDockerCluster

The function names should be self-explained. A minimum cloud provider only needs to define runDockerServer, runDockerWorkers and getDockerInstanceIps. However, many important features will be missing if you do not define the optional functions.

Initialize the cloud provider

initializeProvider allows the developer to initialize the cloud provider. It will be automatically called by the DockerCluster before running the server and workers on the cloud. This function can be omitted if the cloud provider does not require any initialization. Its generic is

setGeneric("initializeProvider", function(provider, cluster, verbose){
    standardGeneric("initializeProvider")
})

where the provider is the cloud provider object, cluster is the DockerCluster object which contains the provider. verbose is an integer showing the verbose level.

run the server/worker container

runDockerServer and runDockerWorkers implement the core functions of the cloud provider. The generics are

setGeneric("runDockerServer", function(provider, cluster, container, hardware, verbose){
    standardGeneric("runDockerServer")
})

setGeneric("runDockerWorkers",
           function(provider, cluster, container, hardware, workerNumber, verbose){
    standardGeneric("runDockerWorkers")
})

where container should be an object that has DockerContainer as its parent class. hardware is from the class DockerHardware. workerNumber indicates how many workers should be run on the cloud. The return value of the runDockerServer should be a server handle and runDockerWorkers should be a list of worker handles where each handle corresponds to a worker. The worker handles can be duplicated if multiple workers share the same container. The handle can be any object that is used by the cloud provider to identify the running container.

Not all slots in the container object need to be used by the cloud provider. Only the slots name, environment and image should be handled. The others will be processed by the container object itself. The environment slot in the container has been configured before passing to the cloud provider as The big picture) shows. However, the worker number is set to 1 for each container. If the cloud provider have enough resources to support multiple workers in one container, the provider should call configWorkerContainerEnv with the new worker number again to overwrite the previous settings. This can be useful when deploying multiple containers with one worker in each container are slower than deploying one container with multiple workers. It is providers responsibility to make sure the worker number does not exceed the number limitation specified in container$maxWorkerNum

The argument hardware specifies the hardware for each container. As the hardware is different for each cloud provider, the base class DockerHardware only contains the most import hardware parameters, that is, it only has the slots cpu, memory and id. Although the cluster does not have any hard restriction on how these three parameters will be explained by the cloud provider, we recommend to explain them as follow

  1. cpu: The CPU unit used by each worker, 1024 units corresponds to a physical CPU core.
  2. memory: The memory used by each worker, the unit is MB.
  3. id: The id of the hardware. It is a character that is meaningful for the provider. This can be missing if the cloud provider do not need this slot.

Get the IPs of the running container

getDockerInstanceIps needs to return the public/private IP of the container. Its generic is

setGeneric("getDockerInstanceIps", function(provider, instanceHandles, verbose){
    standardGeneric("getDockerInstanceIps")
})

where instanceHandles is a list of handles that are returned by runDockerServer or runDockerWorkers. The return value should be a data.frame with two columns publicIp and privateIp with each row corresponds to a handle in instanceHandles. If an instance does not have the public IP, it should return an empty character “”.

Get the status of the instance

The function getDockerInstanceStatus, IsDockerInstanceInitializing, IsDockerInstanceRunning and IsDockerInstanceStopped are used to query the status of the running docker instance. The developer only needs to define getDockerInstanceStatus and the rest can be done by the default methods. The generic for getDockerInstanceStatus is

setGeneric("getDockerInstanceStatus", function(provider, instanceHandles, verbose){
    standardGeneric("getDockerInstanceStatus")
})

where instanceHandles is a list of handles that are returned by runDockerServer or runDockerWorkers. The return value is a character vector with each element corresponding to a handle in instanceHandles. The vector element must be one of the three values "initializing", "running" or "stopped"

Kill a running docker

killDockerInstances should be able to kill the container instance given the instance handle. Its generic is

setGeneric("killDockerInstances", function(provider, instanceHandles, verbose){
    standardGeneric("killDockerInstances")
})

where instanceHandles is a list of handles that are returned by runDockerServer or runDockerWorkers.

Check if the same cluster has existed on the cluster

Sometimes users might have a running cluster on the cloud and want to reuse the cluster. The functions dockerClusterExists and reconnectDockerCluster are designed to achieve it. The generics are

setGeneric("dockerClusterExists", function(provider, cluster, verbose){
    standardGeneric("dockerClusterExists")
})
setGeneric("reconnectDockerCluster", function(provider, cluster, verbose){
    standardGeneric("reconnectDockerCluster")
})

A cluster is defined by the jobQueueName, if a running cluster on the cloud has the same job queue name, it should be treated as the cluster the user wants. Unlike the previous functions, reconnectDockerCluster is responsible to set all missing values in CloudConfig and CloudRuntime in the cluster as it is hard for the user to provide the correct information to the cluster. Please use the setters defined in Accessor functions) to set them.

Extension to the provider

The provider also supports exporting APIs to the user, it follows the same rule as the container and user can find them in cluster$cloudProvider$.... Please see Extension to the container) for the details.

Unit test for the extension

We provide a general purpose unit test function for the developers to test their extensions. The test function uses testthat framework. Since the package needs a provider and a container package to work, the developer needs to define both components in the test file.

provider <- ECSFargateProvider::ECSFargateProvider()
container <- BiocFEDRContainer::BiocFEDRWorkerContainer()
generalDockerClusterTest(
  cloudProvider = provider, 
  workerContainer = container,
  workerNumber = 3L,
  testReconnect = TRUE)

Please see ?generalDockerClusterTest for more information,