utiml: Utilities for multi-label learning

Adriano Rivolli

2021-05-28

Version: 0.1.5

The utiml package is a framework to support multi-label processing, similar to Mulan on Weka. It is simple to use and extend. This tutorial explains the main topics related to the utiml package. More details and examples are available in the utiml repository.

1. Introduction

The general purpose of utiml is to be an alternative for multi-label processing in R. The main methods available in this package are organized into the following groups: pre-processing utilities, sampling methods, classification methods, threshold methods and evaluation methods.

The utiml package needs the mldr package to handle multi-label datasets. It will be installed together with utiml [1].

The installation process is similar to other packages available on CRAN:

install.packages("utiml")

Once installed, you can load the utiml package (the mldr package will also be loaded):

library("utiml")
## Loading required package: mldr
## Loading required package: parallel
## Loading required package: ROCR

The utiml package ships with two multi-label datasets: a synthetic toy dataset called toyml and a real-world dataset called foodtruck. To learn how to load your own dataset, we suggest reading the mldr documentation. The toyml dataset contains 100 instances, 10 features and 5 labels; its purpose is to support small tests and examples.

head(toyml)
##        att1      att2      att3      att4     att5      att6      att7
## 1 -0.150258  0.000461  0.237302  0.004333 0.086273  0.611953 -0.040632
## 2  0.219093  0.023877  0.038309 -0.041287 0.013978  0.277978  0.147673
## 3  0.137491  0.042125  0.011613  0.066545 0.388947 -0.312591 -0.163133
## 4 -0.318716 -0.054081  0.005198  0.085436 0.660657  0.011783  0.096005
## 5  0.004815  0.659007  0.023343 -0.135839 0.063470 -0.207688  0.091519
## 6  0.336280 -0.140629 -0.032099 -0.365930 0.004982  0.124665 -0.133950
##       iatt8     iatt9    ratt10 y1 y2 y3 y4 y5
## 1 -0.215861  0.447483  0.611953  1  1  0  1  0
## 2 -0.592199 -0.164926  0.277978  1  1  0  1  0
## 3 -0.426994 -0.564884 -0.312591  1  1  0  1  0
## 4 -0.526278  0.505936  0.011783  1  1  0  0  1
## 5  0.170262  0.389038 -0.207688  1  1  0  0  0
## 6  0.652938  0.961077  0.124665  1  1  0  0  0

The foodtruck dataset contains different types of cuisines to be predicted from user preferences and habits. The dataset has 12 labels:

foodtruck$labels
##                 index count       freq     IRLbl   SCUMBLE SCUMBLE.CV
## street_food        22   295 0.72481572  1.000000 0.1249889  1.0276904
## gourmet            23   120 0.29484029  2.458333 0.1396873  0.7994104
## italian_food       24    43 0.10565111  6.860465 0.2059097  0.4101859
## brazilian_food     25    72 0.17690418  4.097222 0.1463292  0.7305315
## mexican_food       26    41 0.10073710  7.195122 0.2491880  0.2759161
## chinese_food       27    16 0.03931204 18.437500 0.2831969  0.3981316
## japanese_food      28    36 0.08845209  8.194444 0.2113363  0.5936371
## arabic_food        29    25 0.06142506 11.800000 0.2840999  0.3441985
## snacks             30    67 0.16461916  4.402985 0.1526898  0.5780729
## healthy_food       31    33 0.08108108  8.939394 0.2138170  0.5414302
## fitness_food       32    30 0.07371007  9.833333 0.2268195  0.5120827
## sweets_desserts    33   154 0.37837838  1.915584 0.1730439  0.5959228

The following section gives an overview of how to conduct a multi-label experiment. After that, we explore each group of methods and its particularities.

2. Overview

After loading the multi-label dataset, some data processing may be necessary. The pre-processing methods are utilities that manipulate mldr datasets. Suppose that we want to normalize the attribute values (between 0 and 1); we can do:

mytoy <- normalize_mldata(toyml)

Next, we want to split the dataset into two stratified partitions (train and test), containing 65% and 35% of the instances, respectively:

ds <- create_holdout_partition(mytoy, c(train=0.65, test=0.35), "iterative")
names(ds)
## [1] "train" "test"

Now the ds object has two elements, ds$train and ds$test; the first will be used to create a model and the second to test it. For example, using the Binary Relevance multi-label method with Random Forest [2] as the base algorithm, we can do:

brmodel <- br(ds$train, "RF", seed=123)
prediction <- predict(brmodel, ds$test)

The prediction is an object of class mlresult that contains the probabilities (also called confidences or scores) and the bipartition values:

head(as.bipartition(prediction))
##    y1 y2 y3 y4 y5
## 8   0  1  0  1  0
## 10  1  1  0  1  0
## 12  0  1  0  1  0
## 14  0  1  0  1  0
## 18  0  1  0  1  0
## 19  0  1  0  0  0
head(as.probability(prediction))
##       y1    y2    y3    y4    y5
## 8  0.196 0.948 0.046 0.904 0.216
## 10 0.588 0.956 0.006 0.610 0.104
## 12 0.188 0.942 0.090 0.698 0.248
## 14 0.384 0.850 0.040 0.798 0.280
## 18 0.292 0.838 0.416 0.640 0.296
## 19 0.088 0.844 0.316 0.472 0.086
head(as.ranking(prediction))
##    y1 y2 y3 y4 y5
## 8   4  1  5  2  3
## 10  3  1  5  2  4
## 12  4  1  5  2  3
## 14  3  1  5  2  4
## 18  5  1  3  2  4
## 19  4  1  3  2  5

A threshold strategy can be applied. For example, RCUT keeps the k top-ranked labels of each instance (here k = 2):

newpred <- rcut_threshold(prediction, 2)
head(newpred)
##    y1 y2 y3 y4 y5
## 8   0  1  0  1  0
## 10  0  1  0  1  0
## 12  0  1  0  1  0
## 14  0  1  0  1  0
## 18  0  1  0  1  0
## 19  0  1  0  1  0

Now we can evaluate the models and check whether the RCUT threshold improved the results:

result <- multilabel_evaluate(ds$test, prediction, "bipartition")
thresres <- multilabel_evaluate(ds$test, newpred, "bipartition")

round(cbind(Default=result, RCUT=thresres), 3)
##                 Default  RCUT
## F1                0.681 0.718
## accuracy          0.557 0.581
## hamming-loss      0.223 0.211
## macro-AUC         0.565 0.565
## macro-F1          0.480 0.434
## macro-precision   0.587 0.490
## macro-recall      0.463 0.448
## micro-AUC         0.814 0.814
## micro-F1          0.711 0.730
## micro-precision   0.706 0.714
## micro-recall      0.716 0.746
## precision         0.710 0.714
## recall            0.752 0.810
## subset-accuracy   0.143 0.114

Per-label evaluation details can be obtained using:

result <- multilabel_evaluate(ds$test, prediction, "bipartition", labels=TRUE)
result$labels
##          AUC        F1  accuracy    balacc precision    recall TP TN FP FN
## y1 0.7133333 0.2857143 0.8571429 0.5833333 0.5000000 0.2000000  1 29  1  4
## y2 0.6620370 0.8852459 0.8000000 0.5625000 0.7941176 1.0000000 27  1  7  0
## y3 0.6494253 0.2500000 0.8285714 0.5660920 0.5000000 0.1666667  1 28  1  5
## y4 0.2859848 0.6923077 0.5428571 0.4204545 0.6428571 0.7500000 18  1 10  6
## y5 0.5166667 0.2857143 0.8571429 0.5833333 0.5000000 0.2000000  1 29  1  4

3. Pre-processing

The pre-processing methods were developed to facilitate some operations with multi-label data. Each pre-processing method receives an mldr dataset and returns another mldr dataset. You can use them as needed.

Here is an overview of the pre-processing methods:

# Fill sparse data
mdata <- fill_sparse_mldata(toyml)

# Remove unique attributes
mdata <- remove_unique_attributes(toyml)

# Remove the attributes "iatt8", "iatt9" and "ratt10"
mdata <- remove_attributes(toyml, c("iatt8", "iatt9", "ratt10"))

# Remove labels with less than 10 positive or negative examples
mdata <- remove_skewness_labels(toyml, 10)

# Remove the labels "y2" and "y3"
mdata <- remove_labels(toyml, c("y2", "y3"))

# Remove the examples without any labels
mdata <- remove_unlabeled_instances(toyml)

# Replace nominal attributes
mdata <- replace_nominal_attributes(toyml)

# Normalize the predictive attributes between 0 and 1
mdata <- normalize_mldata(mdata)

4. Sampling

4.1 Subsets

If you want to create a specific or a random subset of a dataset, you can use the methods create_subset and create_random_subset, respectively. In the first case, you specify which rows and, optionally, which attributes you want. In the second case, you just define the number of instances and, optionally, the number of attributes.
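The two subset methods described above can be sketched as follows. The row indices, attribute indices and sizes are arbitrary values chosen for illustration:

```r
library(utiml)

# Specific subset: instances 1 to 20 and the first 5 attributes
# (both index vectors are arbitrary choices for this sketch).
sub1 <- create_subset(toyml, 1:20, 1:5)

# Random subset: 30 instances and 6 attributes drawn at random.
sub2 <- create_random_subset(toyml, 30, 6)

# Both results are mldr datasets; compare their sizes.
c(sub1$measures$num.instances, sub2$measures$num.instances)
```

In both cases the label columns are kept, so the result can be used like any other mldr dataset.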

4.2 Holdout

To create two or more partitions of the dataset, we use the method create_holdout_partition. The first argument is an mldr dataset, the second is the size of the partitions and the third is the partition method. The options are random, iterative and stratified. The iterative method stratifies by label and the stratified method stratifies by labelset. The method returns a list whose names are those defined in the second parameter.
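A minimal sketch of the three partition methods follows. The partition names and proportions are illustrative assumptions:

```r
library(utiml)

# Two equal partitions, stratified by label ("iterative").
toy <- create_holdout_partition(toyml, c(train = 0.5, test = 0.5), "iterative")
sapply(toy, function(part) part$measures$num.instances)

# Three partitions with custom names, using the "random" method.
toy3 <- create_holdout_partition(toyml, c(a = 0.4, b = 0.3, c = 0.3), "random")
names(toy3)

# Two partitions stratified by labelset ("stratified").
toy2 <- create_holdout_partition(toyml, c(train = 0.6, test = 0.4), "stratified")
names(toy2)
```

Each element of the returned list is itself an mldr dataset, so it can be passed directly to a classification method.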

4.3 k-Folds

The simplest way to run a k-fold cross validation is by using the method cv:
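The call behind the output shown next is not preserved in this text. A call of roughly this shape produces a summary of that kind; the dataset, the base algorithm and every cv.* value below are assumptions chosen for illustration, and the SVM learner requires the e1071 package:

```r
library(utiml)

# 5-fold cross validation of Binary Relevance with an SVM base learner.
# Arguments that are not cv.* (here base.algorithm) are passed on to
# the multi-label method.
results <- cv(foodtruck, method = "br", base.algorithm = "SVM",
              cv.folds = 5, cv.sampling = "stratified",
              cv.measures = "example-based", cv.seed = 123)

# Named numeric vector with one value per requested measure.
round(results, 4)
```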

##              F1        accuracy    hamming-loss       precision          recall 
##          0.5191          0.4408          0.1519          0.6982          0.4810 
## subset-accuracy 
##          0.2580

To obtain detailed results of the folds, use the parameter cv.results, such that:
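As above, the original call is not preserved here. The following sketch, with an assumed method, base algorithm and sampling strategy, returns per-fold results of the same shape as the tables below (the RF learner requires the randomForest package):

```r
library(utiml)

# cv.results = TRUE makes cv return a list instead of a summary vector.
results <- cv(toyml, method = "br", base.algorithm = "RF",
              cv.folds = 10, cv.results = TRUE,
              cv.sampling = "random", cv.measures = "example-based")

# One row per fold, one column per multi-label measure.
round(results$multilabel, 4)

# Per-label measures averaged over the folds.
round(sapply(results$labels, colMeans), 4)
```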

##           F1 accuracy hamming-loss precision recall subset-accuracy
##  [1,] 0.6500   0.5500         0.22    0.7500 0.6667             0.3
##  [2,] 0.6633   0.5583         0.26    0.7000 0.6833             0.2
##  [3,] 0.6233   0.5000         0.28    0.6833 0.6833             0.1
##  [4,] 0.7800   0.6833         0.16    0.8000 0.8167             0.4
##  [5,] 0.6467   0.5083         0.30    0.6500 0.7833             0.1
##  [6,] 0.6433   0.5167         0.24    0.6500 0.7333             0.1
##  [7,] 0.6400   0.5167         0.26    0.7000 0.6500             0.1
##  [8,] 0.7200   0.6083         0.24    0.7833 0.7500             0.3
##  [9,] 0.5233   0.4000         0.32    0.6667 0.5000             0.0
## [10,] 0.8067   0.7000         0.14    0.8500 0.8500             0.3
##            y1   y2   y3     y4   y5
## accuracy 0.82 0.76 0.78 0.6000 0.83
## balacc    NaN  NaN  NaN 0.4355 0.50
## TP       0.10 7.30 0.10 6.0000 0.00
## TN       8.10 0.30 7.70 0.0000 8.30
## FP       0.20 1.90 0.40 3.1000 0.00
## FN       1.60 0.50 1.80 0.9000 1.70

Finally, to run a k-fold cross validation manually, you can use create_kfold_partition. This method returns an object of type kFoldPartition, which is then passed to the method partition_fold to create the datasets of each fold.
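The manual procedure can be sketched as follows. The number of folds, the sampling methods and the RF base algorithm (which requires the randomForest package) are illustrative assumptions:

```r
library(utiml)

# 3 folds stratified by label; train and predict on each fold in turn.
kfcv <- create_kfold_partition(toyml, k = 3, "iterative")
preds <- lapply(1:3, function(k) {
  fold <- partition_fold(kfcv, k)   # list with the train and test datasets
  model <- br(fold$train, "RF")
  predict(model, fold$test)
})

# 5 folds stratified by labelset, reserving one fold for validation.
kfcv5 <- create_kfold_partition(toyml, 5, "stratified")
fold1 <- partition_fold(kfcv5, 1, has.validation = TRUE)
names(fold1)
```

With has.validation = TRUE, the returned list gains a validation dataset in addition to the train and test datasets.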

5. Classification Methods

Multi-label classification is a supervised learning task that seeks to learn and predict one or more labels together. The approaches can be grouped into problem transformation and algorithm adaptation. Next, we provide more details about the methods and their specificities.

5.1 Transformation methods and Base Algorithms

The transformation methods require a base algorithm (binary or multi-class) and use its predictions to compose the multi-label result. The utiml package accepts a set of default base algorithms.

Each base algorithm requires a specific package, which you need to install manually because it is not installed together with utiml. The following learners are supported:

Use       Name                                       Package       Call
CART      Classification and regression trees        rpart         rpart::rpart(…)
C5.0      C5.0 Decision Trees and Rule-Based Models  C50           C50::C5.0(…)
KNN       K Nearest Neighbor                         kknn          kknn::kknn(…)
MAJORITY  Majority class prediction                  -             -
NB        Naive Bayes                                e1071         e1071::naiveBayes(…)
RANDOM    Random prediction                          -             -
RF        Random Forest                              randomForest  randomForest::randomForest(…)
SVM       Support Vector Machine                     e1071         e1071::svm(…)
XGB       eXtreme Gradient Boosting                  xgboost       xgboost::xgboost(…)
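Since these packages are not installed with utiml, it can be useful to check which ones are missing before choosing a base algorithm. The following is a small base R sketch, not a utiml function; the variable names are hypothetical:

```r
# Packages behind the base algorithms listed in the table above.
pkgs <- c("rpart", "C50", "kknn", "e1071", "randomForest", "xgboost")

# requireNamespace() returns FALSE, instead of raising an error,
# when a package is not installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
missing_pkgs

# Uncomment to install whatever is missing from CRAN:
# install.packages(missing_pkgs)
```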

To perform a classification, it is first necessary to create a multi-label model. The available methods are:

Method   Name                                              Approach
br       Binary Relevance (BR)                             one-against-all
brplus   BR+                                               one-against-all; stacking
cc       Classifier Chains                                 one-against-all; chaining
clr      Calibrated Label Ranking (CLR)                    one-versus-one
dbr      Dependent Binary Relevance (DBR)                  one-against-all; stacking
ebr      Ensemble of Binary Relevance (EBR)                one-against-all; ensemble
ecc      Ensemble of Classifier Chains (ECC)               one-against-all; ensemble
eps      Ensemble of Pruned Set (EPS)                      powerset
homer    Hierarchy Of Multi-label classifiER (HOMER)       hierarchy
lift     Learning with Label specIfic FeaTures (LIFT)      one-against-all
lp       Label Powerset (LP)                               powerset
mbr      Meta-Binary Relevance (MBR or 2BR)                one-against-all; stacking
ns       Nested Stacking (NS)                              one-against-all; chaining
ppt      Pruned Problem Transformation (PPT)               powerset
prudent  Pruned and Confident Stacking Approach (Prudent)  one-against-all; stacking
ps       Pruned Set (PS)                                   powerset
rakel    Random k-labelsets (RAkEL)                        powerset
rdbr     Recursive Dependent Binary Relevance (RDBR)       one-against-all; stacking
rpc      Ranking by Pairwise Comparison (RPC)              one-versus-one

The first and second parameters of every multi-label method are always the same: the multi-label dataset and the base algorithm, respectively. However, a method may also have specific parameters of its own.
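As a sketch of method-specific parameters, the calls below set a label order for Classifier Chains and the ensemble size for ECC. The values are arbitrary, and the RF base algorithm requires the randomForest package:

```r
library(utiml)

# Classifier Chains with an explicit label order (the chain argument).
ccmodel <- cc(toyml, "RF", chain = c("y5", "y4", "y3", "y2", "y1"))

# Ensemble of Classifier Chains with 5 members (the m argument).
eccmodel <- ecc(toyml, "RF", m = 5)
```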

Beyond the parameters of each multi-label method, you can also set parameters for the base algorithm in the same call.
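For instance, arguments that the multi-label method does not recognize are handed to the base learner. The parameter values below are illustrative assumptions; cost and scale belong to e1071::svm, and ntree belongs to randomForest::randomForest:

```r
library(utiml)

# cost and scale are forwarded to the SVM base learner.
brmodel <- br(toyml, "SVM", cost = 10, scale = TRUE)

# m is an ecc parameter; ntree is forwarded to the Random Forest learner.
eccmodel <- ecc(toyml, "RF", m = 6, ntree = 200)
```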

After building the model, use the predict method to predict new data. Some predict methods require specific arguments, and you can also pass arguments to the base algorithm. By default, every base learner predicts probabilities, so do not set the base algorithm's own probability-related parameters; instead, use the probability parameter defined by the multi-label predict method.

The predict method returns an object of type mlresult. It always contains the bipartition and the probability values, so you can use as.bipartition, as.probability and as.ranking to obtain specific values.

5.2 Algorithm adaptation

So far, only a single adaptation method is available: mlknn.
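A minimal sketch of both prediction modes, assuming a Binary Relevance model with the RF base algorithm (which requires the randomForest package) and reusing the training data for simplicity:

```r
library(utiml)

brmodel <- br(toyml, "RF", seed = 123)

# Default behaviour: the base learners predict probabilities.
pred_prob <- predict(brmodel, toyml)

# Ask the multi-label predict method for bipartition-based predictions.
pred_bip <- predict(brmodel, toyml, probability = FALSE)

head(as.probability(pred_prob))
head(as.bipartition(pred_bip))
```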
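The adaptation method is used like the transformation methods, except that no base algorithm is given. In this sketch the value of k is an arbitrary choice:

```r
library(utiml)

# ML-kNN with 3 neighbours; no base algorithm is needed.
model <- mlknn(toyml, k = 3)
pred <- predict(model, toyml)
head(as.bipartition(pred))
```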

5.3 Seed and Multicores

Almost all multi-label methods can run in parallel. The training and prediction methods receive a parameter called cores that specifies the number of cores used to run the method. Some multi-label methods cannot run on multiple cores, so read the documentation of each method for more details.

If you need reproducibility, you can set a specific seed through the seed parameter.

The cv method also supports multiple cores.
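Both parameters can be combined, as in this sketch. The number of cores and the seed value are arbitrary, and the RF base algorithm requires the randomForest package:

```r
library(utiml)

# Train and predict with 2 cores and a fixed seed.
brmodel <- br(toyml, "RF", cores = 2, seed = 1984)
prediction <- predict(brmodel, toyml, cores = 2, seed = 1984)
head(as.probability(prediction))
```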
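For cv, the number of cores is given through cv.cores. The sketch below assumes an ECC model with a small ensemble to keep the run short; all values are illustrative:

```r
library(utiml)

# m, subsample and attr.space are ecc parameters; the cv.* arguments
# control the cross validation itself.
results <- cv(toyml, method = "ecc", base.algorithm = "RF",
              m = 3, subsample = 0.9, attr.space = 0.9,
              cv.folds = 5, cv.cores = 2, cv.seed = 123)
round(results, 3)
```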

6. Thresholds

The threshold methods receive an mlresult object and return a new mlresult, except for scut_threshold, which returns the threshold values. These methods mainly change the bipartition values using the probability values.

# Use a fixed threshold for all labels 
newpred <- fixed_threshold(prediction, 0.4)

# Use a specific threshold for each label 
newpred <- fixed_threshold(prediction, c(0.4, 0.5, 0.6, 0.7, 0.8))

# Use the MCut approach to define the threshold
newpred <- mcut_threshold(prediction)

# Use the PCut threshold
newpred <- pcut_threshold(prediction, ratio=0.65)

# Use the RCut threshold
newpred <- rcut_threshold(prediction, k=3)

# Choose the best threshold values based on a Mean Squared Error 
thresholds <- scut_threshold(prediction, toyml, cores = 2)
newpred <- fixed_threshold(prediction, thresholds)

# Predict only the labelsets present in the train data
newpred <- subset_correction(prediction, toyml)

7. Evaluation

To evaluate multi-label models, you can use the method multilabel_evaluate. There are two ways to call this method:

toy <- create_holdout_partition(toyml)
brmodel <- br(toy$train, "SVM")
prediction <- predict(brmodel, toy$test)

# Using the test dataset and the prediction
result <- multilabel_evaluate(toy$test, prediction)
print(round(result, 3))
##                F1          accuracy average-precision               clp 
##             0.737             0.631             0.816             0.400 
##          coverage      hamming-loss         macro-AUC          macro-F1 
##             2.133             0.207             0.510             0.347 
##   macro-precision      macro-recall       margin-loss         micro-AUC 
##             0.307             0.400             1.200             0.756 
##          micro-F1   micro-precision      micro-recall               mlp 
##             0.748             0.767             0.730             0.600 
##         one-error         precision      ranking-loss            recall 
##             0.267             0.767             0.219             0.761 
##   subset-accuracy               wlp 
##             0.267             0.600
# Build a confusion matrix
confmat <- multilabel_confusion_matrix(toy$test, prediction)
result <- multilabel_evaluate(confmat)
print(confmat)
## Multi-label Confusion Matrix
## 
## Absolute Matrix:
## -------------------------------------
##              Expected_1 Expected_0 TOTAL
## Prediction_1         46         14    60
## Predicion_0          17         73    90
## TOTAL                63         87   150
## 
## Proportinal Matrix:
## -------------------------------------
##              Expected_1 Expected_0 TOTAL
## Prediction_1      0.307      0.093   0.4
## Predicion_0       0.113      0.487   0.6
## TOTAL             0.420      0.580   1.0
## 
## Label Matrix
## -------------------------------------
##    TP FP FN TN Correct Wrong  %TP  %FP  %FN  %TN %Correct %Wrong MeanRanking
## y1  0  0  3 27      27     3 0.00 0.00 0.10 0.90     0.90   0.10        3.70
## y2 22  8  0  0      22     8 0.73 0.27 0.00 0.00     0.73   0.27        1.00
## y3  0  0  6 24      24     6 0.00 0.00 0.20 0.80     0.80   0.20        3.43
## y4 24  6  0  0      24     6 0.80 0.20 0.00 0.00     0.80   0.20        2.00
## y5  0  0  8 22      22     8 0.00 0.00 0.27 0.73     0.73   0.27        4.87
##    MeanScore
## y1      0.19
## y2      0.80
## y3      0.20
## y4      0.64
## y5      0.13

The confusion matrix summarizes a lot of data, and can be merged. For example, using a k-fold experiment:

kfcv <- create_kfold_partition(toyml, k=3)
confmats <- lapply(1:3, function (k) {
  toy <- partition_fold(kfcv, k)
  model <- br(toy$train, "RF")
  multilabel_confusion_matrix(toy$test, predict(model, toy$test))
})
result <- multilabel_evaluate(merge_mlconfmat(confmats))

It is possible to choose which measures will be computed:

# Example-based measures
result <- multilabel_evaluate(confmat, "example-based")
print(names(result))
## [1] "F1"              "accuracy"        "hamming-loss"    "precision"      
## [5] "recall"          "subset-accuracy"
# Subset accuracy, F1 measure and hamming-loss
result <- multilabel_evaluate(confmat, c("subset-accuracy", "F1", "hamming-loss"))
print(names(result))
## [1] "F1"              "hamming-loss"    "subset-accuracy"
# Ranking and label-based measures
result <- multilabel_evaluate(confmat, c("label-based", "ranking"))
print(names(result))
##  [1] "average-precision" "coverage"          "macro-AUC"        
##  [4] "macro-F1"          "macro-precision"   "macro-recall"     
##  [7] "margin-loss"       "micro-AUC"         "micro-F1"         
## [10] "micro-precision"   "micro-recall"      "one-error"        
## [13] "ranking-loss"
# To see all the supported measures you can try
multilabel_measures()
##  [1] "F1"                "accuracy"          "all"              
##  [4] "average-precision" "bipartition"       "clp"              
##  [7] "coverage"          "example-based"     "hamming-loss"     
## [10] "label-based"       "label-problem"     "macro-AUC"        
## [13] "macro-F1"          "macro-based"       "macro-precision"  
## [16] "macro-recall"      "margin-loss"       "micro-AUC"        
## [19] "micro-F1"          "micro-based"       "micro-precision"  
## [22] "micro-recall"      "mlp"               "one-error"        
## [25] "precision"         "ranking"           "ranking-loss"     
## [28] "recall"            "subset-accuracy"   "wlp"

8. How to Contribute

The utiml repository is available at https://github.com/rivolli/utiml. If you want to contribute to the development of this package, contact us; you will be very welcome.

Please report any bugs or suggestions via the e-mail address listed on CRAN or the GitHub page.


  1. You may also be interested in mldr.datasets

  2. Requires the randomForest package.