Thursday, November 22, 2012

Escaping the simplex, part 1

Before tackling the main subject, two quick notes:

  • I did not post for quite a while in part because I followed the Coursera online course Introduction to Computational Finance and Financial Econometrics.  It was a nice refresher, extremely well presented, and including some R.  This did consume enough time to make posting cumbersome though.
  • I started using R studio, and I am quite happy with it under Windows.  This has also impacted my code, as R studio automatically stacks your plots.
The main topic of this post is to motivate an escape from the simplex, i.e. allow for weights either larger than 1 or smaller than 0.  Generally, this implies going short on one asset deemed less favorable and shift the short proceed to a deemed more favorable asset.

The main reason to do so is that the Best Constant Rebalanced Portfolio (BCRP) may lie outside.  This is a well known feature in the more traditional mean-variance analysis, where the constrained portfolio is sparser than the unconstrained one.  We have a similar effect for the BCRP, the constrained optimization results in a sparse portfolio, i.e. many weights are (close to) zero.

We can illustrate this by calculating the weights of the constrained BCRP.  The BCRP function in logopt is currently only supporting optimization on the simplex, and we can reuse the code originally presented in Universal portfolio, part 10  to accumulate the portfolio weights for BCRP across multiple combinations of the reference data.  This shows that the proportion of (close to) zero weights is high.

The code below as is runs for about 7 hours because of the large number of combinations, you can reduce the size of the problem if wanted by editing TupleSizes.

Note that in the past Syntax Highlighter had problems with R-bloggers, you might need to go to the original page to view the code.

# assume the use of RStudio (no explicit management of graphic devices)

library(logopt)
data(nyse.cover.1962.1984)
x <- coredata(nyse.cover.1962.1984)
nStocks <- dim(x)[2] 

EvaluateOnAllTuples <- function(ListName, TupleSizes, fFinalWealth, ...) {
  if (exists(ListName) == FALSE) {
    LocalList <- list()
    for (i in 1:length(TupleSizes)) {
      TupleSize <- TupleSizes[i]
      ws <- combn(x=(1:nStocks), m=(TupleSize), FUN=fFinalWealth, simplify=TRUE, ...)
      LocalList[[i]] <- ws
    }
    assign(ListName, LocalList, pos=parent.frame())
  }
}

TupleSizes <- c(2,3,4,5)

# evaluate the sorted coefficients for best CRP
SortedOptB <- function(cols, ...) {
  x <- list(...)[[1]] ;
  x <- x[,cols]
  b <- bcrp.optim(x)
  return(sort(b))
}

TupleSizes <- c(2,3,4,5)
EvaluateOnAllTuples("lSortedOptBSmall", TupleSizes, SortedOptB, x)

TupleSizes <- c(nStocks-3,nStocks-2,nStocks-1,nStocks)
EvaluateOnAllTuples("lSortedOptBLarge", TupleSizes, SortedOptB, x)

Colors <- c("red", "green", "blue", "brown", "cyan", "darkred","darkgreen")

for (iL in 1:length(lSortedOptBSmall)) {
  nCoeff <- nrow(lSortedOptBSmall[[iL]])
  E <- ecdf(lSortedOptBSmall[[iL]][1,])
  Title <- sprintf("Cumulative PDF for all sorted weights for BCRP of %d assets", nCoeff)
  plot(E, col=Colors[1], xlim=c(0,1), pch=".", main = Title, xlab = "relative weight")
  cat(sprintf("Combinations of %d assets\n", nCoeff))
  for (iB in 1:nCoeff) {
    E <- ecdf(lSortedOptBSmall[[iL]][iB,])
    lines(E, col=Colors[iB], pch=".")
    if (iB < nCoeff) {
       cat(sprintf("Percent of portfolio with %d weight(s) smaller than 0.001: %f\n", iB, E(0.001) ))
    }
  }
}

SmallOf2 <- lSortedOptBSmall[[1]][1,]
hist(SmallOf2,n=25,probability=TRUE, main="Histogram of smallest weight for two assets", 
     xlab="Smallest coefficient")
lines(density(SmallOf2, bw=0.02),col="blue")

# for the large ones, we show only the largest coefficients and show them in 
# opposite order to find how many coefficients are not always insiginifcant
for (iL in 1:length(lSortedOptBLarge)) {
  nCoeff <- nrow(lSortedOptBLarge[[iL]])
  E <- ecdf(lSortedOptBLarge[[iL]][nCoeff,])
  Title <- sprintf("Cumulative PDF for largest sorted weights for BCRP of %d assets", nCoeff)
  plot(E, col=Colors[1], xlim=c(0,1), pch=".", main = Title, xlab = "relative weight")
  cat(sprintf("Combinations of %d assets\n", nCoeff))
  for (iB in 1:nCoeff) {
    iCoeff <- nCoeff-iB+1
    E <- ecdf(lSortedOptBLarge[[iL]][iCoeff,])
    if (E(0.001) < 0.9999) {
      lines(E, col=Colors[iB], pch=".")
      cat(sprintf("Percent of portfolio with %d weight(s) smaller than 0.001: %f\n", iCoeff, E(0.001) ))
    }
  }
}

Only a subset of the results are shown below, first the cumulative probability function for the weights of the coefficients for all combinations of 5 assets.  The graph shows that the smallest coefficient is almost always 0 and the second coefficient is also very small all of the time.  The textual output shows that the second coefficient is insignificant 91% of the time or equivalently 91% of the best portfolio only uses 3 of the 5 possible assets.


Combinations of 5 assets
Percent of portfolio with 1 weight(s) smaller than 0.001: 0.996499
Percent of portfolio with 2 weight(s) smaller than 0.001: 0.914086
Percent of portfolio with 3 weight(s) smaller than 0.001: 0.499448
Percent of portfolio with 4 weight(s) smaller than 0.001: 0.067975



The density for the two asset case shows clearly that there is a high peak at zero.


Finally the textual output and the ECDF shows that for 33 possible assets, only up to 7 are present in the BCRP


Combinations of 33 assets
Percent of portfolio with 33 weight(s) smaller than 0.001: 0.000000
Percent of portfolio with 32 weight(s) smaller than 0.001: 0.000000
Percent of portfolio with 31 weight(s) smaller than 0.001: 0.000000
Percent of portfolio with 30 weight(s) smaller than 0.001: 0.000000
Percent of portfolio with 29 weight(s) smaller than 0.001: 0.065126
Percent of portfolio with 28 weight(s) smaller than 0.001: 0.707843
Percent of portfolio with 27 weight(s) smaller than 0.001: 0.947059


All this points to the fact that most of the weights of the BCRP end up on the boundary of the simplex, and that removing that specific constraint would get an even better solution, at least in term of terminal wealth.  We'll investigate further this in future posts.