1 Introduction and Motivation for 'portsort'

Portfolio sorts have been a key part of empirical financial research since the 1970s. Sorting procedures are used extensively to identify and explore relationships between expected returns and asset characteristics, and they are currently the dominant approach in empirical finance for testing and establishing such cross-sectional relationships. Past empirical work featuring portfolio sorts includes the use of price-to-earnings ratios @basu1977investment, book-to-market values @fama1992cross, firm size @banz1981relationship, volatility @ang2006cross, cross-sectional momentum @jegadeesh1993returns and a host of others featuring factors such as liquidity, default risk and downside risk (Value-at-Risk, Lower Partial Moments, etc.).

The portsort R package includes functionality for both conditional and unconditional sorts in up to three dimensions, i.e. assets can be grouped by up to three factors. After the sorting procedure has been conducted, further functionality is available to analyse the turnover, the relative frequency of the assets in each sub-portfolio and the mean sub-portfolio size.

The generic approach in empirical finance is to first sort assets based on some factor into multiple sub-portfolios at some formation point and then examine the out-of-sample performance of these sub-portfolios. Often, long-short zero-cost portfolios are constructed by initiating positions in the top and bottom sub-portfolios (in the case of a univariate sort). The portsort package simplifies this process by allowing the user to automatically backtest univariate, double and triple-sorted portfolios. The output of the core sorting functions includes the out-of-sample sub-portfolio returns and a list of the assets in each sub-portfolio. From this, you can create trading strategies (long-short portfolios), pass the sub-portfolios to a regression analysis or conduct further analysis on the sub-portfolios themselves. The primary goal of the package is to offer academic researchers, students and practitioners an easy way to conduct the portfolio sort procedure.

2 Conducting Conditional and Unconditional Sorts

The portsort package offers functionality for both a conditional and an unconditional sort - a distinction that is still, for unknown reasons, glossed over by most lecturers in undergraduate and MBA financial economics coursework. The differences can be profound and have a major impact on the out-of-sample portfolio results. In the case of a univariate portfolio sort the results are identical; higher-dimension sorts, however, can differ markedly.

2.1 Conditional Sorting Procedure

The conditional sorting procedure is best explained by example. Assume an asset manager has identified two stock factors which, based on prior statistical analysis, he/she believes to be drivers of expected stock returns. Let factor one (Fa) be market capitalization and factor two (Fb) be the book-to-market ratio. A conditional double sort depends strongly on which factor the asset manager decides to sort the stocks by first. Let us assume that the manager thinks that market capitalization has a stronger effect on expected returns. At time t, stocks are first sorted into n sub-portfolios based on market capitalization. Instead of using quintile break-points, the manager decides to form 3 sub-portfolios (tercile portfolios). The result is 3 sub-portfolios sorted from low to high based on market capitalization (a basic univariate sort). Next, the manager sorts the stocks within each of these 3 sub-portfolios into a further m sub-portfolios based on the book-to-market ratio. Let us assume the manager again decides on terciles for ease of interpretation. The stocks within each market capitalization tercile are therefore sorted from low to high into 3 book-to-market sub-portfolios. Overall, 9 sub-portfolios are formed (3 book-to-market sorted portfolios within each market capitalization sorted tercile). Any number of sub-portfolios can be constructed by changing the dimension of each sort. The maximum number of sub-portfolios is n x m in the case of a double sort and n x m x z in the case of a triple sort.
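The steps above can be sketched in base R for a single formation date. This is a minimal illustration only and not the internals of conditional.sort: the nine stocks, their values and the helper bucket() are hypothetical, and the break-points are taken from quantile() with type = 7.

```r
# Hypothetical cross-section of nine stocks at one formation date
stocks <- data.frame(
  id   = paste0("S", 1:9),
  mcap = c(10, 20, 30, 40, 50, 60, 70, 80, 90),
  bm   = c(0.9, 0.5, 0.1, 0.8, 0.2, 0.6, 0.3, 0.7, 0.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assign each value to a group (1 = low, ..., n = high)
# using quantile break-points
bucket <- function(x, probs) {
  breaks <- quantile(x, probs = probs, type = 7)
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

terciles <- 0:3 / 3

# First sort: market capitalization (Fa) over the whole universe
stocks$A <- bucket(stocks$mcap, terciles)

# Second sort: book-to-market (Fb) within each market capitalization group
stocks$B <- ave(stocks$bm, stocks$A, FUN = function(x) bucket(x, terciles))

# Number of stocks in each of the 3 x 3 sub-portfolios
print(table(A = stocks$A, B = stocks$B))
```

Because the second sort is run inside each first-sort group, the nine stocks are spread evenly, one per sub-portfolio, in this toy universe.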

2.2 Unconditional Sorting Procedure

In an unconditional sort, the order of the sort does not matter, i.e. the manager will get the same results regardless of the sort order. Using the example above, the stocks are sorted on Fa and Fb independently (at the same time) into terciles. 'Sort 1' is the stock universe sorted by market capitalization from low to high. The same stocks (regardless of the first sort) are also sorted into book-to-market terciles, called 'Sort 2'. We now have two univariate tercile sorts that are independent of each other. This is the crucial difference between a conditional and an unconditional sort: in the unconditional case, the double or triple sort is formed from the intersections of the independently sorted sub-portfolios (in this case terciles). For example, if stocks 1, 23 and 40 are in the 1st market capitalization tercile, while stocks 1, 20 and 43 are in the 1st book-to-market tercile, the 'L1-L1' sub-portfolio would contain only stock 1, the single stock common to both terciles. Each sub-portfolio therefore contains the stocks that lie in the intersection of the corresponding independently sorted sub-portfolios.
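A minimal base R sketch of the unconditional procedure follows. It assumes a hypothetical universe of nine stocks and a hypothetical helper bucket() built on quantile(); it is not the code used inside unconditional.sort. Each factor is sorted over the full universe, and the sub-portfolios are then formed from the intersections.

```r
# Hypothetical cross-section of nine stocks at one formation date
stocks <- data.frame(
  id   = paste0("S", 1:9),
  mcap = c(10, 20, 30, 40, 50, 60, 70, 80, 90),
  bm   = c(0.1, 0.2, 0.7, 0.3, 0.5, 0.9, 0.4, 0.6, 0.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: tercile group (1 = low, 3 = high) over the values passed in
bucket <- function(x, probs = 0:3 / 3) {
  breaks <- quantile(x, probs = probs, type = 7)
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

# 'Sort 1' and 'Sort 2': both run independently over the full universe
stocks$A <- bucket(stocks$mcap)
stocks$B <- bucket(stocks$bm)

# The low/low sub-portfolio is the intersection of the two lowest terciles
low_cap <- stocks$id[stocks$A == 1]
low_bm  <- stocks$id[stocks$B == 1]
low_low <- intersect(low_cap, low_bm)
print(low_low)

# Number of stocks in every intersection
cell_counts <- table(cap = stocks$A, bm = stocks$B)
print(cell_counts)
```

Because the two sorts are independent, the intersections need not be the same size: in this toy universe the low/low cell holds two stocks while two other cells are empty.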

2.3 Impact on Sub-Portfolios

The key difference between the two sorting procedures is that in a conditional sort, the factor which is sorted on first impacts the second sort. This means that the sub-portfolios formed within the portfolios from the first sort are 'conditional' on the first factor. This may seem obvious, but it can result in large differences relative to the sub-portfolios formed by the unconditional sorting process. In the case of the conditional sort, the factor that is sorted on first will tend to have a much greater influence on forward returns than the other factors that are subsequently sorted conditional on the first factor sort.
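To make the order dependence concrete, the sketch below runs a conditional 3 x 3 sort twice on the same hypothetical nine-stock universe, once with market capitalization first and once with book-to-market first, and lists the stocks that end up in a different sub-portfolio. The data and the helper bucket() are assumptions made for illustration and are not part of portsort.

```r
# Hypothetical cross-section of nine stocks at one formation date
stocks <- data.frame(
  id   = paste0("S", 1:9),
  mcap = c(10, 20, 30, 40, 50, 60, 70, 80, 90),
  bm   = c(0.1, 0.2, 0.7, 0.3, 0.5, 0.9, 0.4, 0.6, 0.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: tercile group (1 = low, 3 = high) over the values passed in
bucket <- function(x, probs = 0:3 / 3) {
  breaks <- quantile(x, probs = probs, type = 7)
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

# Order 1: market capitalization first, book-to-market within each cap group
cap_first <- data.frame(
  id  = stocks$id,
  cap = bucket(stocks$mcap),
  bm  = ave(stocks$bm, bucket(stocks$mcap), FUN = bucket)
)

# Order 2: book-to-market first, market capitalization within each bm group
bm_first <- data.frame(
  id  = stocks$id,
  cap = ave(stocks$mcap, bucket(stocks$bm), FUN = bucket),
  bm  = bucket(stocks$bm)
)

# Stocks whose (cap, bm) sub-portfolio changes when the sort order is swapped
moved <- stocks$id[cap_first$cap != bm_first$cap | cap_first$bm != bm_first$bm]
print(moved)
```

In this toy universe four of the nine stocks change sub-portfolio when the sort order is reversed, whereas an unconditional sort would assign every stock to the same cell regardless of order.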

3 Functions included in portsort

3.1 conditional.sort and unconditional.sort

Both functions compute a univariate, double or triple sort. Function output includes an xts object of sub-portfolio returns and a list of the sub-portfolios. Both the conditional and unconditional sorts are pre-built to sort from low to high so that the interpretation of the out-of-sample return xts object is easier. For example, a double sort with dimension 3 x 3 (i.e. each factor is split into 3 groups) is conducted with Fa and Fb. A graphical depiction of the sort is shown below:

Portfolio composition matrix

The out-of-sample returns for each sub-portfolio are stored in columns 1 to 9 of the output matrix. Column 1 includes the returns for the 'Low-Low' sub-portfolio, column 4 includes returns for the 'Mid-Low' sub-portfolio whilst column 9 includes the 'High-High' sub-portfolio returns. The dimension of each sort can be defined with user-defined break-points.
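The column numbering can be reproduced with a short base R sketch. It assumes that the Fa group varies slowest and the Fb group fastest, which is consistent with the three columns named above (1, 4 and 9); the labels for the remaining columns are inferred from that assumption and should be checked against the package documentation.

```r
# Group labels for each factor, ordered from low to high
levels_a <- c("Low", "Mid", "High")
levels_b <- c("Low", "Mid", "High")

# expand.grid() varies its first argument fastest, so Fb is listed first
grid <- expand.grid(Fb = levels_b, Fa = levels_a, stringsAsFactors = FALSE)

# Label each column as '<Fa group>-<Fb group>'
column_labels <- paste(grid$Fa, grid$Fb, sep = "-")
names(column_labels) <- seq_along(column_labels)
print(column_labels)
```

Under this assumed ordering, columns 1 to 3 hold the low-Fa sub-portfolios, columns 4 to 6 the mid-Fa sub-portfolios and columns 7 to 9 the high-Fa sub-portfolios.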

# Load the portsort package and the pre-loaded data
library(portsort)
library(PerformanceAnalytics)
library(xts)
data(Factors)
# Forward returns, lagged returns and lagged volumes are stored in the Factors list
R.Forward = Factors[[1]]; R.Lag = Factors[[2]]; V.Lag = Factors[[3]]
Fa = R.Lag; Fb = V.Lag
#Specify the dimension of the sort - let's use terciles
dimA = 0:3/3;dimB = 0:3/3;dimC = c(0,1)
# Run the conditional sort with quantiles computed using method 7 from the quantile function (stats package)
sort.output.con = conditional.sort(Fa,Fb,Fc=NULL,R.Forward,dimA,dimB,dimC,type = 7)
# Run the unconditional sort with quantiles computed using method 7 from the quantile function (stats package)
sort.output.uncon = unconditional.sort(Fa,Fb,Fc=NULL,R.Forward,dimA,dimB,dimC, type = 7)

# Compare the risk and return of each sub-portfolio using PerformanceAnalytics
# Set the scale to 365 (cryptocurrency markets have no close) and geometric to FALSE (we are using log returns)
table.AnnualizedReturns(sort.output.con$returns, scale = 365, geometric = FALSE, digits = 3)

##                               1     2      3      4      5      6      7
## Annualized Return         3.292 0.407 -1.677 -0.172 -0.524 -0.481 -0.920
## Annualized Std Dev        1.254 1.071  0.943  1.077  1.066  0.892  1.364
## Annualized Sharpe (Rf=0%) 2.625 0.380 -1.779 -0.160 -0.492 -0.540 -0.674
##                               8      9
## Annualized Return         0.598 -0.826
## Annualized Std Dev        1.238  1.130
## Annualized Sharpe (Rf=0%) 0.483 -0.731

table.AnnualizedReturns(sort.output.uncon$returns, scale = 365, geometric = FALSE, digits = 3)

##                               1     2      3      4      5      6      7
## Annualized Return         3.671 0.384 -2.243 -0.172 -1.294 -0.383 -1.079
## Annualized Std Dev        1.237 1.091  0.999  1.120  1.047  0.948  1.605
## Annualized Sharpe (Rf=0%) 2.967 0.352 -2.246 -0.154 -1.236 -0.404 -0.672
##                               8      9
## Annualized Return         0.681 -0.530
## Annualized Std Dev        1.286  1.168
## Annualized Sharpe (Rf=0%) 0.529 -0.454

3.2 portfolio.turnover

The portfolio.turnover function takes the output of either the conditional or unconditional sort and returns a list which includes an xts object with the turnovers for each rebalancing period and the mean turnover for each sub-portfolio over time.
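As a rough intuition for what a turnover figure measures, the base R sketch below uses one common definition: the share of a sub-portfolio's current holdings that were not held at the previous rebalancing date. This definition, the holdings and the helper turnover_between() are assumptions made for illustration; the exact calculation inside portfolio.turnover may differ.

```r
# Hypothetical membership of one sub-portfolio at three rebalancing dates
holdings <- list(
  t1 = c("MONA", "ETH", "LTC"),
  t2 = c("MONA", "XRP", "LTC"),
  t3 = c("DOGE", "XRP", "ETH")
)

# Hypothetical helper: share of current holdings that are new since the last date
turnover_between <- function(previous, current) {
  length(setdiff(current, previous)) / length(current)
}

# Turnover at each rebalancing date after the first
turnover <- mapply(turnover_between,
                   holdings[-length(holdings)],
                   holdings[-1])
print(turnover)

# Mean turnover of the sub-portfolio over the sample
mean_turnover <- mean(turnover)
print(mean_turnover)
```

A value close to 1 means the sub-portfolio is almost completely replaced at each rebalancing date, which matters once transaction costs are taken into account.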

# Load the portsort package and the pre-loaded data
library(portsort)
library(PerformanceAnalytics)
library(xts)
data(Factors)
# Forward returns, lagged returns and lagged volumes are stored in the Factors list
R.Forward = Factors[[1]]; R.Lag = Factors[[2]]; V.Lag = Factors[[3]]
Fa = R.Lag; Fb = V.Lag
#Specify the dimension of the sort - let's use terciles
dimA = 0:3/3;dimB = 0:3/3;dimC = c(0,1)
# Run either the conditional or unconditional sort function 
sort.output = conditional.sort(Fa,Fb,Fc=NULL,R.Forward,dimA,dimB,dimC)
# Run the turnover function
turnover.output = portfolio.turnover(sort.output)
turnover.output$`Mean Turnover`

##                       1       2         3         4         5         6
## Mean Turnover 0.7639123 0.82181 0.7487352 0.7684092 0.8473862 0.6953345
##                       7         8         9
## Mean Turnover 0.7852726 0.8293985 0.7290613

3.3 portfolio.frequency

The portfolio.frequency function takes as input a rank and the output of one of the sorting functions. The function computes how many times a given asset appeared in each sub-portfolio based on the rank input.
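The idea behind the rank argument can be sketched in base R for a single sub-portfolio. The holdings below are hypothetical and the calculation is an assumption about what such a frequency table contains (a count and the share of rebalancing periods in which the asset appears); it is not the implementation of portfolio.frequency.

```r
# Hypothetical membership of one sub-portfolio over four rebalancing dates
holdings <- list(
  c("MONA", "ETH"),
  c("MONA", "XRP"),
  c("ETH", "MONA"),
  c("XRP", "DOGE")
)

# Count how often each ticker appears, most frequent first
counts <- sort(table(unlist(holdings)), decreasing = TRUE)

# rank = 1 selects the most frequent ticker, rank = 2 the second most, and so on
rank <- 1
top <- data.frame(
  Ticker     = names(counts)[rank],
  Count      = as.integer(counts[rank]),
  Percentage = round(as.integer(counts[rank]) / length(holdings), 3)
)
print(top)
```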

# Load the portsort package and the pre-loaded data
library(portsort)
library(PerformanceAnalytics)
library(xts)
data(Factors)
# Forward returns, lagged returns and lagged volumes are stored in the Factors list
R.Forward = Factors[[1]]; R.Lag = Factors[[2]]; V.Lag = Factors[[3]]
Fa = R.Lag; Fb = V.Lag
#Specify the dimension of the sort - let's use terciles
dimA = 0:3/3;dimB = 0:3/3;dimC = c(0,1)
# Run either the conditional or unconditional sort function 
sort.output = conditional.sort(Fa,Fb,Fc=NULL,R.Forward,dimA,dimB,dimC)
# Run the portfolio.frequency function with rank = 1
# to see which cryptocurrency appeared the most in each sub-portfolio
portfolio.frequency(sort.output, rank = 1)

##                1     2     3     4     5     6     7     8     9
## Ticker      MONA STEEM   XRP   POT  DOGE   ETH  MONA  DOGE   ETH
## Count        212   140   180   159    88   252   185   125   209
## Percentage 0.358 0.236 0.304 0.268 0.148 0.425 0.312 0.211 0.352

# To see which crypto pair appeared the second most, set rank = 2 
portfolio.frequency(sort.output, rank = 2)

##                1     2     3     4     5     6     7     8     9
## Ticker       POT   BTS   LTC   DGD   NXT   LTC   POT STRAT   LTC
## Count        177   120   159   143    80   232   168   116   187
## Percentage 0.298 0.202 0.268 0.241 0.135 0.391 0.283 0.196 0.315

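3.4 portfolio.mean.size

The portfolio.mean.size function takes the output of one of the sorting functions and returns the mean number of assets held in each sub-portfolio over the sample period.
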
# Load the portsort package and the pre-loaded data
library(portsort)
library(PerformanceAnalytics)
library(xts)
data(Factors)
# Forward returns, lagged returns and lagged volumes are stored in the Factors list
R.Forward = Factors[[1]]; R.Lag = Factors[[2]]; V.Lag = Factors[[3]]
Fa = R.Lag; Fb = V.Lag
#Specify the dimension of the sort - let's use terciles
dimA = 0:3/3;dimB = 0:3/3;dimC = c(0,1)
# Run the conditional sort function 
sort.output.con = conditional.sort(Fa,Fb,Fc=NULL,R.Forward,dimA,dimB,dimC)
# Run the unconditional sort function 
sort.output.uncon = unconditional.sort(Fa,Fb,Fc=NULL,R.Forward,dimA,dimB,dimC)

# Investigate mean portfolio size - conditional sort
portfolio.mean.size(sort.output = sort.output.con)

##           1 2 3 4 5 6 7 8 9
## Mean Size 3 3 3 3 2 3 3 3 3

# Investigate mean portfolio size - unconditional sort
portfolio.mean.size(sort.output = sort.output.uncon)

##              1    2    3    4    5    6    7    8    9
## Mean Size 3.33 2.89 2.77 2.78 2.23 2.99 2.88 2.88 3.24

4 Empirical Example using Cryptocurrencies

4.1 Univariate Sort with Cross-Sectional Momentum

Cross-sectional momentum was popularized by @jegadeesh1993returns. First, an asset's return momentum is computed by summing the asset's log returns over the prior s periods. A skip period k is often used to drop the most recent k periods to account for possible short-term reversals. A cross-sectional momentum strategy using the methodology of @jegadeesh1993returns can easily be constructed using the functionality of portsort.
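Written out, the momentum signal described above is a windowed sum of log returns. The notation here is introduced for illustration only: r_{i,t} denotes the log return of asset i in row t of the return series being summed, s is the look-back period and k is the skip period.

```latex
\mathrm{MOM}_{i,t} = \sum_{j=k+1}^{s} r_{i,t-j}
```

With s = 21 and k = 1 the sum runs over the 20 returns from t-21 to t-2, so the most recent return (t-1) is skipped.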

# Load the portsort package and the pre-loaded data
library(portsort)
library(PerformanceAnalytics)
library(xts)
data(Factors)

# Compute momentum for the 26 cryptocurrency pairs - this will become Factor A (Fa)
# The look-back-period
s = 21
# The skip-period to account for short-term reversals
k = 1
# Forward returns, lagged returns and lagged volumes are stored in the Factors list
R.Forward = Factors[[1]]; R.Lag = Factors[[2]]; V.Lag = Factors[[3]]
# Define an empty xts (same shape as R.Lag) to store the momentum calculations
XSMOM = R.Lag
XSMOM[1:nrow(XSMOM),1:ncol(XSMOM)] <- NA
# Compute momentum: for each asset i, sum the returns in rows (t-s) to (t-1-k)
for (i in 1:ncol(R.Lag)){
  for (t in (s + 1):nrow(R.Lag)){
    XSMOM[t,i] = sum(R.Lag[(t-s):(t-1-k),i])
  }
}

# Remove the formation period (s) by using na.omit
XSMOM = na.omit(XSMOM)
# Re-subset R.Forward
R.Forward = R.Forward[(s + 1):nrow(R.Forward), ]

# Specify the factors we need - specify Fb and Fc as NULL
Fa = XSMOM; Fb = NULL; Fc = NULL

#Specify the dimension of the sort - let's use quintiles
dimA = 0:5/5
# Run either the conditional or unconditional sort function (for univariate sorts there is no difference)
XSMOM.output = conditional.sort(Fa=Fa,R.Forward=R.Forward,dimA=dimA)

# Let's now investigate the risk and return profiles of the sub-portfolios
table.AnnualizedReturns(XSMOM.output$returns,scale = 365, geometric = FALSE)

##                                 1       2      3      4      5
## Annualized Return         -0.3627 -0.0646 0.2419 0.1328 0.1116
## Annualized Std Dev         0.9870  0.9260 0.9581 0.9833 1.1359
## Annualized Sharpe (Rf=0%) -0.3674 -0.0698 0.2525 0.1350 0.0982

# Investigate the mean sub-portfolio turnover over the sample period
portfolio.turnover(XSMOM.output)$`Mean Turnover`

##                       1         2         3         4         5
## Mean Turnover 0.2036713 0.4486014 0.4667832 0.4121212 0.1786713

# Let's see which crypto occurred the most in each sub-portfolio
portfolio.frequency(XSMOM.output, rank = 1)

##                1     2     3    4     5
## Ticker      GAME   FCT  DOGE  XMR  MONA
## Count        205   180   152  166   212
## Percentage 0.358 0.315 0.266 0.29 0.371

# Evaluate the mean sub-portfolio size
portfolio.mean.size(XSMOM.output)

##           1 2 3 4 5
## Mean Size 6 5 5 5 5

# Following the methodology of Jegadeesh and Titman, 1993, we will now form a long-short, zero-cost portfolio which initiates a long position in the high momentum sub-portfolio (portfolio 5) and a short position in the low momentum sub-portfolio (portfolio 1)

LS.Portfolio = XSMOM.output$returns[,5] + (-1*XSMOM.output$returns[,1])
# Investigate risk and return
table.AnnualizedReturns(LS.Portfolio,scale = 365, geometric = FALSE)

##                                5
## Annualized Return         0.4742
## Annualized Std Dev        1.0840
## Annualized Sharpe (Rf=0%) 0.4375

# We can now plot the back-tested results
chart.CumReturns(LS.Portfolio, geometric = FALSE, main = "XSMOM Long-Short Portfolio")

4.2 Double Sort with Lagged Returns and Volume

Following the methodology of @gargano2017value and @dickerson2018value we will use both the conditional and unconditional sorting functions to construct double sorted cryptocurrency portfolios based on past returns and trading volumes.

# Load the portsort package and the pre-loaded data
library(portsort)
library(PerformanceAnalytics)
library(xts)
data(Factors)

# Specify the factors we need - lagged returns and lagged volume denominated in BTC
# Forward returns, lagged returns and lagged volumes are stored in the Factors list
R.Forward = Factors[[1]]; R.Lag = Factors[[2]]; V.Lag = Factors[[3]]
Fa = R.Lag; Fb = V.Lag

# Specify the dimension of the sort - let's try a 3x3 sort (3 groups for each factor)
dimA = 0:3/3
dimB = 0:3/3
# Run both the conditional and unconditional sort
sort.con = conditional.sort(Fa=Fa,Fb=Fb,R.Forward = R.Forward,dimA=dimA,dimB=dimB)
sort.uncon = unconditional.sort(Fa=Fa,Fb=Fb,R.Forward = R.Forward,dimA=dimA,dimB=dimB)

# Let's now investigate the risk and return profiles of the sub-portfolios
table.AnnualizedReturns(sort.con$returns,scale = 365, geometric = FALSE)

##                                1      2       3       4       5       6
## Annualized Return         3.2925 0.4065 -1.6769 -0.1718 -0.5243 -0.4815
## Annualized Std Dev        1.2545 1.0711  0.9427  1.0767  1.0662  0.8921
## Annualized Sharpe (Rf=0%) 2.6245 0.3795 -1.7789 -0.1596 -0.4917 -0.5397
##                                 7      8       9
## Annualized Return         -0.9201 0.5976 -0.8255
## Annualized Std Dev         1.3645 1.2380  1.1301
## Annualized Sharpe (Rf=0%) -0.6743 0.4827 -0.7305

table.AnnualizedReturns(sort.uncon$returns,scale = 365, geometric = FALSE)

##                                1      2       3       4       5       6
## Annualized Return         3.6706 0.3843 -2.2425 -0.1725 -1.2943 -0.3831
## Annualized Std Dev        1.2372 1.0908  0.9987  1.1201  1.0471  0.9477
## Annualized Sharpe (Rf=0%) 2.9668 0.3523 -2.2455 -0.1540 -1.2361 -0.4042
##                                 7      8       9
## Annualized Return         -1.0786 0.6806 -0.5303
## Annualized Std Dev         1.6054 1.2864  1.1683
## Annualized Sharpe (Rf=0%) -0.6719 0.5291 -0.4539

# Investigate the mean sub-portfolio turnover over the sample period
portfolio.turnover(sort.con)$`Mean Turnover`

##                       1       2         3         4         5         6
## Mean Turnover 0.7639123 0.82181 0.7487352 0.7684092 0.8473862 0.6953345
##                       7         8         9
## Mean Turnover 0.7852726 0.8293985 0.7290613

portfolio.turnover(sort.uncon)$`Mean Turnover`

##                       1         2         3         4         5         6
## Mean Turnover 0.7341966 0.7498595 0.6931482 0.7365374 0.7418494 0.6301815
##                       7         8         9
## Mean Turnover 0.7634867 0.7728258 0.6894483

# Let's see which crypto occurred the most in each sub-portfolio
portfolio.frequency(sort.con, rank = 1)

##                1     2     3     4     5     6     7     8     9
## Ticker      MONA STEEM   XRP   POT  DOGE   ETH  MONA  DOGE   ETH
## Count        212   140   180   159    88   252   185   125   209
## Percentage 0.358 0.236 0.304 0.268 0.148 0.425 0.312 0.211 0.352

portfolio.frequency(sort.uncon, rank = 1)

##                1    2     3     4    5     6     7    8     9
## Ticker      MONA   SC   XRP   POT DOGE   ETH  MONA  BTS   ETH
## Count        216  148   186   161  148   252   194  148   209
## Percentage 0.364 0.25 0.314 0.272 0.25 0.425 0.327 0.25 0.352

# Evaluate the mean sub-portfolio size
portfolio.mean.size(sort.con)

##           1 2 3 4 5 6 7 8 9
## Mean Size 3 3 3 3 2 3 3 3 3

portfolio.mean.size(sort.uncon)

##              1    2    3    4    5    6    7    8    9
## Mean Size 3.33 2.89 2.77 2.78 2.23 2.99 2.88 2.88 3.24

# Following the methodology of Gargano et al. (2017) and Bianchi and Dickerson (2018), we will now form a long-short, zero-cost portfolio which initiates a long position in the low prior return/low volume sub-portfolio (sub-portfolio 1) and a short position in the high prior return/high volume sub-portfolio (sub-portfolio 9).

Conditional.LS.Portfolio = sort.con$returns[,1] + (-1*sort.con$returns[,9])
Unconditional.LS.Portfolio = sort.uncon$returns[,1] + (-1*sort.uncon$returns[,9])

Portfolios = cbind(Conditional.LS.Portfolio,Unconditional.LS.Portfolio)
colnames(Portfolios) = c("Conditional","Unconditional")
# Chart the logarithmic cumulative returns
chart.CumReturns(Portfolios, geometric = FALSE, legend.loc = "topleft",
                  main = "Sorting Comparison")

# Investigate risk and return
table.AnnualizedReturns(Portfolios,scale = 365, geometric = FALSE)

##                           Conditional Unconditional
## Annualized Return              4.1180        4.2009
## Annualized Std Dev             1.3927        1.3683
## Annualized Sharpe (Rf=0%)      2.9569        3.0700