Coding overview

data in human readable and tidyverse format

A typical spreadsheet might show ORSA (own risk and solvency assessment) results in the shape represented by the following data.frame:

id	time	ratio	SCR	BSCR	operational	life	market	l_expenses	l_CAT	m_equity	and so on
1	2017	230	100	80	25	33	50	..	..	..	..
2	2018	225	103	85	25	33	57	..	..	..	..
3	2019	227	107	90	23	37	60	..	..	..	..
..

One can discern several parts. The first columns hold the id of each SCR composition and its ‘meta’ attributes (time, ratio). The remaining columns describe the components of each SCR. The value of each item is found at the intersection of its column and row.

data in the format prescribed by ‘ggplot2’ (tidyverse format)

ggplot2, the foundation on which the plotting part of this package is built, expects data in a tidyverse format. Each row in the data describes only one data point, i.e. the value of one SCR item for one specific ‘id’.

The following code transfers data (for example 2, a single SCR plot) from a spreadsheet in the same shape as the “human format” above to tidyverse format (the numbers differ, though!):

data <- readxl::read_xlsx(path = "path/filename.xlsx", sheet = "ex2_data")
# gather every column except id, time and ratio into description/value pairs
data <- tidyr::gather(data,
                      key = description,
                      value = value,
                      -id, -time, -ratio)
sii_z_ex2_data <- data.frame(time = as.numeric(data$time),
                             ratio = as.numeric(data$ratio),
                             # it has to be a factor !! (automatic only in R < 4.0,
                             # otherwise wrap it in factor())
                             description = data$description,
                             value = as.numeric(data$value),
                             id = data$id)

 head(sii_z_ex2_data,7)
#>   time ratio      description value id
#> 1 2017   230              SCR    30  1
#> 2 2017   230             BSCR    35  1
#> 3 2017   230      operational     5  1
#> 4 2017   230 Adjustment-LACDT   -10  1
#> 5 2017   230         BSCR_div    -5  1
#> 6 2017   230           market    20  1
#> 7 2017   230             life    15  1
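
For readers without the spreadsheet at hand, the same reshaping can be sketched in base R on the three rows of the “human format” table above. This is only an illustration, not the package's own code: the names wide, meta, items and long are chosen here, and only the five fully listed risk columns are included.

```r
# the "human format": one row per SCR composition, one column per item
wide <- data.frame(
  id = 1:3, time = 2017:2019, ratio = c(230, 225, 227),
  SCR = c(100, 103, 107), BSCR = c(80, 85, 90),
  operational = c(25, 25, 23), life = c(33, 33, 37), market = c(50, 57, 60)
)

meta  <- c("id", "time", "ratio")       # columns that identify a composition
items <- setdiff(names(wide), meta)     # columns that hold item values

# one block of rows per item column, stacked on top of each other
long <- do.call(rbind, lapply(items, function(item) {
  data.frame(wide[meta], description = item, value = wide[[item]])
}))
long$description <- factor(long$description, levels = items)
long <- long[order(long$id), ]          # keep the rows of one id together

head(long, 5)
```

Each of the 3 x 5 cells of the wide table ends up as one row, which is the one-data-point-per-row shape described above.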

ggsolvencyii: data transformations

When the above data is passed to the package with a (very) basic line such as

ggplot() + geom_sii_risksurface(data = sii_z_ex2_data, mapping = aes(x = time, y = ratio, id = id, value = value, description = description))

a lot happens under the hood. Broadly speaking, the following steps are taken for geom_sii_risksurface and geom_sii_riskoutline:

 1. when `geom_sii_riskoutline` is used for comparison of ids, risk values are moved between data rows
 2. the structure of the SCR composition is expanded with grouping information
 3. the expanded structure is integrated with the data
 4. actual grouping is performed by adding rows
 5. for all elements to be plotted the corner coordinates of the circle segments are calculated
 6. when applicable, rotation and/or "squarification" is applied by changing the corner coordinates
 7. corner coordinates are transformed into a series of points for polygons

shuffling risk values in the data

geom_sii_riskoutline plots (some of) the outlines of the circle segments. As such it can be used for a non-obtrusive plot, or to overlay the composition of one SCR on another (see its use in the vignette showcase). To avoid having to work with two separate datasets, the optional aesthetic comparewithid is available in geom_sii_riskoutline. It is best explained with an example. Compare the data of sii_z_ex1_data with the expanded structures without and with the comparewithid aesthetic. The structure of id = 1 is no longer plotted at its own location (2016, 230) but three times in 2017: the value 23 for SCR is now present three times in the data. This transformation is applied to all (sub)risks.

## the original data
      sii_z_ex1_data[sii_z_ex1_data$description == "SCR", ]
#>    time ratio description    value id comparewithid
#> 1  2016   230         SCR 23.00000  1            NA
#> 2  2017   233         SCR 23.14993  2             1
#> 3  2018   238         SCR 19.99461  3             2
#> 4  2019   243         SCR 15.61773  4             3
#> 5  2017   231         SCR 19.60600  5             1
#> 6  2018   232         SCR 25.74336  6             5
#> 7  2019   232         SCR 21.91342  7             6
#> 8  2017   227         SCR 25.08169  8             1
#> 9  2018   225         SCR 22.43068  9             8
#> 10 2019   226         SCR 21.91607 10             9

#> without passing the aesthetic 'comparewithid': 10 lines of data
#>    description id    x   y    value
#> 35         SCR  1 2016 230 23.00000
#> 34         SCR  2 2017 233 23.14993
#> 33         SCR  3 2018 238 19.99461
#> 31         SCR  4 2019 243 15.61773
#> 39         SCR  5 2017 231 19.60600
#> 38         SCR  6 2018 232 25.74336
#> 32         SCR  7 2019 232 21.91342
#> 36         SCR  8 2017 227 25.08169
#> 37         SCR  9 2018 225 22.43068
#> 40         SCR 10 2019 226 21.91607
#> and with passing the aesthetic 'comparewithid': 9 lines of data
#>    description id    x   y    value
#> 28         SCR  2 2017 233 23.00000
#> 31         SCR  3 2018 238 23.14993
#> 32         SCR  4 2019 243 19.99461
#> 29         SCR  5 2017 231 23.00000
#> 33         SCR  6 2018 232 19.60600
#> 34         SCR  7 2019 232 25.74336
#> 35         SCR  8 2017 227 23.00000
#> 30         SCR  9 2018 225 25.08169
#> 36         SCR 10 2019 226 22.43068
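
The effect shown in the two listings can be reproduced with a few lines of base R. The sketch below is not the package's internal code; it only mimics the visible result for the SCR rows, and the names scr and compare_values are invented for this illustration. For a full dataset the lookup would have to match on id and description together.

```r
# SCR rows of sii_z_ex1_data as printed above
scr <- data.frame(
  id            = 1:10,
  time          = c(2016, 2017, 2018, 2019, 2017, 2018, 2019, 2017, 2018, 2019),
  ratio         = c(230, 233, 238, 243, 231, 232, 232, 227, 225, 226),
  value         = c(23.00000, 23.14993, 19.99461, 15.61773, 19.60600,
                    25.74336, 21.91342, 25.08169, 22.43068, 21.91607),
  comparewithid = c(NA, 1, 2, 3, 1, 5, 6, 1, 8, 9)
)

compare_values <- function(d) {
  # rows without a comparison partner are dropped
  out <- d[!is.na(d$comparewithid), ]
  # each remaining row keeps its own location (time, ratio)
  # but takes over the value of the id it is compared with
  out$value <- d$value[match(out$comparewithid, d$id)]
  out
}

shifted <- compare_values(scr)
shifted[, c("id", "time", "ratio", "value")]
```

Nine rows remain, and the value 23 of id = 1 now sits at the 2017 locations of ids 2, 5 and 8.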

structure: levels, levels, levels…

The foundation of the package is the structure: a representation of the buildup of the SCR from its risks and subrisks. This structure is a data.frame passed as a parameter to the geoms geom_sii_risksurface and geom_sii_riskoutline. The default data.frame is sii_structure_sf16_eng, where ‘sf16’ stands for the standard formula as of 2016 and ‘eng’ for English descriptions.

 head(sii_structure_sf16_eng, 15)
#> # A tibble: 15 x 3
#>    description      level childlevel
#>    <chr>            <chr> <chr>     
#>  1 SCR              1     2         
#>  2 BSCR             2     3         
#>  3 operational      2     <NA>      
#>  4 Adjustment-LACDT 2d    <NA>      
#>  5 BSCR_div         3d    <NA>      
#>  6 market           3     4.01      
#>  7 life             3     4.02      
#>  8 non-life         3     4.03      
#>  9 health           3     4.04      
#> 10 cp-default       3     <NA>      
#> 11 intangibles      3     <NA>      
#> 12 market_div       4.01d <NA>      
#> 13 m_interestrate   4.01  <NA>      
#> 14 m_equity         4.01  <NA>      
#> 15 m_property       4.01  <NA>

A Dutch version, sii_structure_sf16_nld, is present in the package.

The hierarchy of the elements in description is determined by level and their components (childlevel). SCR has the mandatory level “1” (a character value). Rows whose level carries the suffix ‘d’ are diversification items.
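
How level and childlevel tie the rows together can be illustrated with a small lookup. This is a sketch on a hand-typed subset of the table above, not a function of the package: components_of is a hypothetical helper, and it assumes that a diversification row belongs to the level named before its ‘d’ suffix.

```r
# subset of sii_structure_sf16_eng, typed in from the printout above
structure <- data.frame(
  description = c("SCR", "BSCR", "operational", "Adjustment-LACDT", "BSCR_div",
                  "market", "life", "market_div",
                  "m_interestrate", "m_equity", "m_property"),
  level       = c("1", "2", "2", "2d", "3d", "3", "3", "4.01d",
                  "4.01", "4.01", "4.01"),
  childlevel  = c("2", "3", NA, NA, NA, "4.01", "4.02", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

components_of <- function(structure, item) {
  child <- structure$childlevel[structure$description == item]
  # items without a childlevel (or unknown items) have no components
  if (length(child) != 1 || is.na(child)) return(character(0))
  # strip a trailing 'd' so diversification rows match their level
  structure$description[sub("d$", "", structure$level) == child]
}

components_of(structure, "SCR")
components_of(structure, "market")
```

Walking down from SCR in this way reproduces the tree: SCR consists of the level 2 rows, BSCR of the level 3 rows, market of the level 4.01 rows.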

For other localizations or for use with internal models, another structure can be passed to the geom. See my interpretation of the internal model of the Dutch insurer “Nationale Nederlanden” in sii_z_ex6_structure. Changing the level numbering or the descriptions of items may make it necessary to change other (parameter) files as well (i.e. levelmax, plotdetails, coloring sets).

expanding the structure: possible grouping

When reporting the SCR composition of a large insurance company, many risks will be present. This can lead to a very cluttered plot in which all information is present but which is difficult to interpret. The package provides the means to restrict the number of items to ‘k’ (in general or for each level separately) by means of the parameter levelmax. This can be an integer, which is applied to all levels, or a data.frame. The default value is 99, so grouping only occurs for levels with more than 99 sub-risks.

Parameter levelmax = sii_levelmax_sf16_995 shows all higher levels (lower level numbers) but restricts the lower levels (higher numbers) to 4 individual risks and 1 grouping of the smallest risks in that level.

sii_levelmax_sf16_995
#> # A tibble: 8 x 2
#>   level levelmax
#>   <chr>    <dbl>
#> 1 1           99
#> 2 2           99
#> 3 3           99
#> 4 4.01         5
#> 5 4.02         5
#> 6 4.03         5
#> 7 4.04         5
#> 8 5            5

Combining the structure and the levelmax-information leads to an expanded structure of which the lines for levels 3 and 4.01 are shown here:

#> # A tibble: 15 x 4
#>    description     level childlevel levelmax
#>    <chr>           <chr> <chr>         <dbl>
#>  1 market          3     4.01             99
#>  2 life            3     4.02             99
#>  3 non-life        3     4.03             99
#>  4 health          3     4.04             99
#>  5 cp-default      3     <NA>             99
#>  6 intangibles     3     <NA>             99
#>  7 market_div      4.01d <NA>             99
#>  8 m_interestrate  4.01  <NA>              5
#>  9 m_equity        4.01  <NA>              5
#> 10 m_property      4.01  <NA>              5
#> 11 m_spread        4.01  <NA>              5
#> 12 m_currency      4.01  <NA>              5
#> 13 m_concentration 4.01  <NA>              5
#> 14 m_illiquidity   4.01  <NA>              5
#> 15 market_other    4.01o <NA>             99

The row with level 4.01o is the added row. Its description combines the description of the row where childlevel = 4.01 (here market) with the value of the parameter aggregatesuffix (default value “other”).
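
The expansion can be imitated in base R. The sketch below is an approximation that reproduces the printout for levels 3 and 4.01; it is not the package's implementation. expand_structure and its default argument are names invented here, and it assumes that levels missing from the levelmax table get the default of 99.

```r
structure <- data.frame(
  description = c("market", "life", "market_div", "m_interestrate", "m_equity",
                  "m_property", "m_spread", "m_currency", "m_concentration",
                  "m_illiquidity"),
  level       = c("3", "3", "4.01d", rep("4.01", 7)),
  childlevel  = c("4.01", "4.02", rep(NA, 8)),
  stringsAsFactors = FALSE
)
levelmax <- data.frame(level = c("3", "4.01", "4.02"),
                       levelmax = c(99, 5, 5),
                       stringsAsFactors = FALSE)

expand_structure <- function(structure, levelmax,
                             aggregatesuffix = "other", default = 99) {
  # look up the maximum number of items per level; unlisted levels get the default
  structure$levelmax <- levelmax$levelmax[match(structure$level, levelmax$level)]
  structure$levelmax[is.na(structure$levelmax)] <- default
  # levels that might need grouping, and the rows they are children of
  grouped <- unique(structure$level[structure$levelmax < default])
  parents <- structure[structure$childlevel %in% grouped, ]
  if (nrow(parents) == 0) return(structure)
  # one possible grouping row ('o' for other) per such level
  other <- data.frame(
    description = paste(parents$description, aggregatesuffix, sep = "_"),
    level       = paste0(parents$childlevel, "o"),
    childlevel  = NA_character_,
    levelmax    = default,
    stringsAsFactors = FALSE
  )
  rbind(structure, other)
}

expanded <- expand_structure(structure, levelmax)
tail(expanded, 3)
```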

integration with data and actual grouping

The data (in tidyverse format!) is combined with the expanded structure by means of a left join on the side of the data. Because the data is not expected to contain o-lines, these will not be present in the merged table. When a possible grouping line is present in the expanded structure, a check is made whether the data contains so many risks for that level that actual grouping is needed. (The dataset can contain fewer risks than the structure that is used; i.e. a pure life insurance company can use the standard sii_structure_sf16_eng without any problems.)
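
The grouping check itself comes down to counting the risks in a level and, if there are too many, summing the smallest ones into one item. The following is a minimal sketch of that idea and not the package's code: group_smallest is a hypothetical helper, the market sub-risk values are made up, and ranking by plain value (rather than, say, absolute value) is an assumption.

```r
group_smallest <- function(values, levelmax, other_name = "other") {
  # no grouping needed when the level fits within levelmax
  if (length(values) <= levelmax) return(values)
  ordered <- sort(values, decreasing = TRUE)
  rank    <- seq_along(ordered)
  # keep the (levelmax - 1) largest items, sum the rest into one item
  keep  <- ordered[rank < levelmax]
  other <- sum(ordered[rank >= levelmax])
  c(keep, setNames(other, other_name))
}

# made-up values for the seven market sub-risks
market <- c(m_interestrate = 8, m_equity = 12, m_property = 5, m_spread = 6,
            m_currency = 2, m_concentration = 1, m_illiquidity = 1)

grouped <- group_smallest(market, levelmax = 5, other_name = "market_other")
grouped
```

With levelmax = 5 this leaves four individual risks and one market_other item, and the total of the level is unchanged.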

Now that it is known which lines in the combined structure/data data.frame should be plotted, it is time to convert the data into circle segments. The SCR of the data row with the largest SCR value is defined as a full circle with radius = 1, whatever the values of x and y. When combining several calls to geom_sii_risksurface and/or geom_sii_riskoutline, the parameter maxscrvalue overrides this extracted value. All plot elements are scaled so that their surface represents the value of the item. Additional manual horizontal and vertical scaling is possible, depending on the range of the x and y values of the axes, to retain the round shape.
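
Because the surface, not the radius, represents the value, the radius of an SCR circle grows with the square root of its value. The lines below sketch this for the SCR values of sii_z_ex1_data; scr_radius is a name chosen for this illustration and the formula is derived from the description above, not taken from the package source.

```r
scr_radius <- function(value, maxscrvalue = max(value)) {
  # area = pi * r^2 must be proportional to value,
  # with r = 1 for the largest SCR (or for maxscrvalue when given)
  sqrt(value / maxscrvalue)
}

scr_values <- c(23.00000, 23.14993, 19.99461, 15.61773, 19.60600,
                25.74336, 21.91342, 25.08169, 22.43068, 21.91607)

radius <- scr_radius(scr_values)
round(radius, 3)

# a shared maximum keeps several geoms on the same scale
radius_shared <- scr_radius(scr_values, maxscrvalue = 30)
```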

For other levels the circle segments are defined by an inner and an outer radius and by the (compass) degrees of the first and last radial line (clockwise). The inner radius is the outer radius of the next higher level. The number of compass degrees is determined by the value of each item as a fraction of the total of its (equal-leveled) ‘peers’. The value, i.e. the surface, then dictates the outer radius.
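
One way to read this paragraph is as a small area calculation: once the inner radius and the angular share of a segment are fixed, the outer radius follows from the surface the value has to occupy. The function below is a sketch of that reading only. segment_outer_radius and its arguments are hypothetical, the fraction is expressed as a share of the full circle, and the package may compute its coordinates differently.

```r
segment_outer_radius <- function(value, fraction, r_inner, maxscrvalue) {
  # a ring segment covering `fraction` of the circle has area
  #   fraction * pi * (r_outer^2 - r_inner^2)
  # and should equal the scaled value pi * value / maxscrvalue
  sqrt(r_inner^2 + value / (fraction * maxscrvalue))
}

# made-up numbers: an item worth 20 on a scale where the largest SCR is 100,
# occupying a quarter of the circle and starting at radius 0.5
r_outer <- segment_outer_radius(value = 20, fraction = 0.25,
                                r_inner = 0.5, maxscrvalue = 100)
segment_area <- 0.25 * pi * (r_outer^2 - 0.5^2)
c(r_outer = r_outer, area = segment_area, target = pi * 20 / 100)
```

The full SCR circle is the special case with fraction = 1 and r_inner = 0, which gives radius 1 for the largest SCR.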

When applicable a rotation is performed: a rotation such that the first radial line of a specific (sub)risk points to 12 o'clock, and/or an added fixed rotation.

A final transformation to a squared form is possible. To keep the surfaces correct, the ‘radial’ lines are adjusted. This might lead to unpredictable results in combination with a description-based rotation or a rotation which is not a multiple of 45 degrees.

The (transformed/rotated) corner points are translated into polygon points (for geom_sii_risksurface) or line segments (for geom_sii_riskoutline).
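
To make the step from corner coordinates to polygon points concrete, the sketch below turns one circle segment into a closed series of points, with compass degrees measured clockwise from 12 o'clock and an optional fixed rotation. segment_polygon and all its arguments are invented for this illustration; squarification and the extra x/y scaling are left out.

```r
segment_polygon <- function(x0, y0, r_inner, r_outer,
                            deg_start, deg_end, rotation = 0, n = 20) {
  # compass degrees: 0 = 12 o'clock, increasing clockwise
  rad <- (seq(deg_start, deg_end, length.out = n) + rotation) * pi / 180
  # walk along the outer arc, then back along the inner arc
  data.frame(
    x = x0 + c(r_outer * sin(rad), r_inner * sin(rev(rad))),
    y = y0 + c(r_outer * cos(rad), r_inner * cos(rev(rad)))
  )
}

# a segment from 0 to 90 degrees around the point (2017, 230)
poly <- segment_polygon(x0 = 2017, y0 = 230, r_inner = 0.5, r_outer = 1,
                        deg_start = 0, deg_end = 90)
head(poly, 3)
```

A data.frame like this has the shape that ggplot2::geom_polygon() accepts for x and y; for an outline only the relevant parts of the path would be kept as line segments.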

The final step is to define which of all these polygons or line segments will actually be plotted. By default everything is plotted, but a data.frame passed to the parameter plotdetails can determine this per level or per description.

In the showcase two data.frames are used, differing only in the column surface and equal for outline1 to outline13. One of them is shown here.

sii_z_ex1_plotdetails
#>    levelordescription surface outline1 outline2 outline3 outline4
#> 1                   1    TRUE       NA     TRUE       NA       NA
#> 2                   2    TRUE     TRUE       NA     TRUE       NA
#> 3                  2d    TRUE       NA       NA       NA       NA
#> 4                   3    TRUE     TRUE     TRUE     TRUE       NA
#> 5                  3d    TRUE       NA       NA       NA       NA
#> 6                4.01   FALSE       NA     TRUE       NA       NA
#> 7               4.01d   FALSE       NA       NA       NA       NA
#> 8               4.01o   FALSE       NA     TRUE       NA       NA
#> 9                4.02   FALSE       NA     TRUE       NA       NA
#> 10              4.02d   FALSE       NA       NA       NA       NA
#> 11              4.02o   FALSE       NA     TRUE       NA       NA
#> 12        operational      NA     TRUE     TRUE     TRUE       NA
#> 13         cp-default      NA     TRUE     TRUE     TRUE       NA
#>    outline11 outline13
#> 1       TRUE      TRUE
#> 2         NA        NA
#> 3         NA        NA
#> 4         NA        NA
#> 5         NA        NA
#> 6       TRUE      TRUE
#> 7         NA        NA
#> 8       TRUE      TRUE
#> 9       TRUE      TRUE
#> 10        NA        NA
#> 11      TRUE      TRUE
#> 12        NA        NA
#> 13        NA        NA

surface is used by geom_sii_risksurface, the other columns by geom_sii_riskoutline. The table is best read as follows: for each risk the line of the corresponding level is used, possibly overruled by the line with the matching description when that line contains an explicit TRUE or FALSE.
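
This reading rule can be written down as a small lookup. The sketch uses a few rows and columns typed in from the table above. resolve_plotdetail is a hypothetical helper and not part of the package, and a result of NA is simply returned as is, since how the geoms treat it is not shown here.

```r
plotdetails <- data.frame(
  levelordescription = c("1", "2", "2d", "3", "operational", "cp-default"),
  surface  = c(TRUE, TRUE, TRUE, TRUE, NA, NA),
  outline1 = c(NA, TRUE, NA, TRUE, TRUE, TRUE),
  outline2 = c(TRUE, NA, NA, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

resolve_plotdetail <- function(plotdetails, level, description,
                               column = "surface") {
  key      <- plotdetails$levelordescription
  by_level <- plotdetails[[column]][match(level, key)]
  by_desc  <- plotdetails[[column]][match(description, key)]
  # an explicit TRUE/FALSE on the description line overrules the level line
  if (!is.na(by_desc)) by_desc else by_level
}

# 'operational' is a level 2 item
resolve_plotdetail(plotdetails, "2", "operational", "outline2")  # description line
resolve_plotdetail(plotdetails, "2", "operational", "surface")   # level line
resolve_plotdetail(plotdetails, "2", "BSCR", "outline2")         # neither is set
```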

Marco van Zanden

2019-01-03

interested ? ….. anyone ????

Vignettes

Have you seen the examples in vignettes ggsolvency and showcase yet?