logopt: a journey in R, Python, finance and open source: 2012

Thursday, November 22, 2012

Escaping the simplex, part 1

Before tackling the main subject, two quick notes:

I did not post for quite a while in part because I followed the Coursera online course Introduction to Computational Finance and Financial Econometrics. It was a nice refresher, extremely well presented, and including some R. This did consume enough time to make posting cumbersome though.
I started using R studio, and I am quite happy with it under Windows. This has also impacted my code, as R studio automatically stacks your plots.

The main topic of this post is to motivate an escape from the simplex, i.e. allow for weights either larger than 1 or smaller than 0. Generally, this implies going short on one asset deemed less favorable and shift the short proceed to a deemed more favorable asset.

The main reason to do so is that the Best Constant Rebalanced Portfolio (BCRP) may lie outside. This is a well known feature in the more traditional mean-variance analysis, where the constrained portfolio is sparser than the unconstrained one. We have a similar effect for the BCRP, the constrained optimization results in a sparse portfolio, i.e. many weights are (close to) zero.

We can illustrate this by calculating the weights of the constrained BCRP. The BCRP function in logopt is currently only supporting optimization on the simplex, and we can reuse the code originally presented in Universal portfolio, part 10 to accumulate the portfolio weights for BCRP across multiple combinations of the reference data. This shows that the proportion of (close to) zero weights is high.

The code below as is runs for about 7 hours because of the large number of combinations, you can reduce the size of the problem if wanted by editing TupleSizes.

Note that in the past Syntax Highlighter had problems with R-bloggers, you might need to go to the original page to view the code.

# assume the use of RStudio (no explicit management of graphic devices)

library(logopt)
data(nyse.cover.1962.1984)
x <- coredata(nyse.cover.1962.1984)
nStocks <- dim(x)[2] 

EvaluateOnAllTuples <- function(ListName, TupleSizes, fFinalWealth, ...) {
  if (exists(ListName) == FALSE) {
    LocalList <- list()
    for (i in 1:length(TupleSizes)) {
      TupleSize <- TupleSizes[i]
      ws <- combn(x=(1:nStocks), m=(TupleSize), FUN=fFinalWealth, simplify=TRUE, ...)
      LocalList[[i]] <- ws
    }
    assign(ListName, LocalList, pos=parent.frame())
  }
}

TupleSizes <- c(2,3,4,5)

# evaluate the sorted coefficients for best CRP
SortedOptB <- function(cols, ...) {
  x <- list(...)[[1]] ;
  x <- x[,cols]
  b <- bcrp.optim(x)
  return(sort(b))
}

TupleSizes <- c(2,3,4,5)
EvaluateOnAllTuples("lSortedOptBSmall", TupleSizes, SortedOptB, x)

TupleSizes <- c(nStocks-3,nStocks-2,nStocks-1,nStocks)
EvaluateOnAllTuples("lSortedOptBLarge", TupleSizes, SortedOptB, x)

Colors <- c("red", "green", "blue", "brown", "cyan", "darkred","darkgreen")

for (iL in 1:length(lSortedOptBSmall)) {
  nCoeff <- nrow(lSortedOptBSmall[[iL]])
  E <- ecdf(lSortedOptBSmall[[iL]][1,])
  Title <- sprintf("Cumulative PDF for all sorted weights for BCRP of %d assets", nCoeff)
  plot(E, col=Colors[1], xlim=c(0,1), pch=".", main = Title, xlab = "relative weight")
  cat(sprintf("Combinations of %d assets\n", nCoeff))
  for (iB in 1:nCoeff) {
    E <- ecdf(lSortedOptBSmall[[iL]][iB,])
    lines(E, col=Colors[iB], pch=".")
    if (iB < nCoeff) {
       cat(sprintf("Percent of portfolio with %d weight(s) smaller than 0.001: %f\n", iB, E(0.001) ))
    }
  }
}

SmallOf2 <- lSortedOptBSmall[[1]][1,]
hist(SmallOf2,n=25,probability=TRUE, main="Histogram of smallest weight for two assets", 
     xlab="Smallest coefficient")
lines(density(SmallOf2, bw=0.02),col="blue")

# for the large ones, we show only the largest coefficients and show them in 
# opposite order to find how many coefficients are not always insiginifcant
for (iL in 1:length(lSortedOptBLarge)) {
  nCoeff <- nrow(lSortedOptBLarge[[iL]])
  E <- ecdf(lSortedOptBLarge[[iL]][nCoeff,])
  Title <- sprintf("Cumulative PDF for largest sorted weights for BCRP of %d assets", nCoeff)
  plot(E, col=Colors[1], xlim=c(0,1), pch=".", main = Title, xlab = "relative weight")
  cat(sprintf("Combinations of %d assets\n", nCoeff))
  for (iB in 1:nCoeff) {
    iCoeff <- nCoeff-iB+1
    E <- ecdf(lSortedOptBLarge[[iL]][iCoeff,])
    if (E(0.001) < 0.9999) {
      lines(E, col=Colors[iB], pch=".")
      cat(sprintf("Percent of portfolio with %d weight(s) smaller than 0.001: %f\n", iCoeff, E(0.001) ))
    }
  }
}

Only a subset of the results are shown below, first the cumulative probability function for the weights of the coefficients for all combinations of 5 assets. The graph shows that the smallest coefficient is almost always 0 and the second coefficient is also very small all of the time. The textual output shows that the second coefficient is insignificant 91% of the time or equivalently 91% of the best portfolio only uses 3 of the 5 possible assets.

Combinations of 5 assets
Percent of portfolio with 1 weight(s) smaller than 0.001: 0.996499
Percent of portfolio with 2 weight(s) smaller than 0.001: 0.914086
Percent of portfolio with 3 weight(s) smaller than 0.001: 0.499448
Percent of portfolio with 4 weight(s) smaller than 0.001: 0.067975

The density for the two asset case shows clearly that there is a high peak at zero.

Finally the textual output and the ECDF shows that for 33 possible assets, only up to 7 are present in the BCRP

Combinations of 33 assets
Percent of portfolio with 33 weight(s) smaller than 0.001: 0.000000
Percent of portfolio with 32 weight(s) smaller than 0.001: 0.000000
Percent of portfolio with 31 weight(s) smaller than 0.001: 0.000000
Percent of portfolio with 30 weight(s) smaller than 0.001: 0.000000
Percent of portfolio with 29 weight(s) smaller than 0.001: 0.065126
Percent of portfolio with 28 weight(s) smaller than 0.001: 0.707843
Percent of portfolio with 27 weight(s) smaller than 0.001: 0.947059

All this points to the fact that most of the weights of the BCRP end up on the boundary of the simplex, and that removing that specific constraint would get an even better solution, at least in term of terminal wealth. We'll investigate further this in future posts.

Sunday, September 23, 2012

Universal portfolio, part 11

First an apology, the links to the Universal Portfolio paper have stopped working. This is because the personal webpage of Thomas Cover at Stanford has been taken down, but fortunately the content moved elsewhere. The new link is Universal Portfolio and hopefully this one will be stable.

Note that there are many available copies on the web but most (like this one) are for something that seems to be a slighly reworked version dated October 23 1996. The text appears mostly identical to the published version, but it does not include the figures.

In the rest of this post, I discuss the data used by Cover. That data is included in logopt as nyse.cover.1962.1984. It contains the relative prices for 36 NYSE stocks between 1962 and 1984.

> range(index(nyse.cover.1962.1984))
[1] "1962-07-03" "1984-12-31"

> colnames(nyse.cover.1962.1984)

[1] "ahp" "alcoa" "amerb" "arco" "coke" "comme" "dow" "dupont" "espey" "exxon" "fisch" "ford" "ge" "gm" "gte" "gulf" "hp"

[18] "ibm" "inger" "iroqu" "jnj" "kimbc" "kinar" "kodak" "luken" "meico" "merck" "mmm" "mobil" "morris" "pandg" "pills" "schlum" "sears"

[35] "sherw" "tex"

The names are not stickers, some guessing and with some help from an other person using the series gives the table below (and if anybody knows about the one without expansion yet. please post a comment).

Abbreviation	Company name	Current ticker
ahp	?	?
alcoa	Alcoa	AA
amerb	American Brands aka Fortune Brands	-
arco	?	?
coke	Coca-Cola	KO
comme	Commercial Metals	CMC
dow	Dow Chemicals	DOW
dupont	DuPont	DD
espey	Espey Manufacturing	ESP
exxon	Exxon Mobil	XOM
coke	Coca-Cola	KO
fisch	Fischbach Corp	-
ford	Ford	F
ge	General Electric	GE
gm	General Motors	GM*
gte	GTE Corporation	-
gulf	Gulf Oil (now Chevron)	CVX
hp	Hewlett-Packard	HPQ
ibm	IBM	IBM
inger	Ingersoll-Rand	IR
iroq	Iroquois Brands	-
jnj	Johnson & Johnson	JNJ
kimbc	Kimberly-Clark	KMB
kinar	Kinark?	-
kodak	Eastman Kodak	EKDKQ
luken	Lukens?	-
meico	?	?
merck	Merck	MRK
mmm	3M	MMM
mobil	Exxon Mobil	XOM
morris	Philip Morris	PM
pandg	Procter & Gamble	PG
pills	Pillsbury, now part of General Mills	-
schlum	Schlumberger	SLB
sears	Sears Holdings	SHLD
sherw	Sherwin-Williams	SHW
tex	Texaco, now Chevron	CVX

There is a lot of diversity across the different stocks, we saw that in two ways:

by showing the global time evolution of all stocks in time
by showing the growth rate at two times separated by N market days (shown as a price relative between the two dates).

# Some statistics on the NYSE series

library(logopt)
x <- coredata(nyse.cover.1962.1984)
w <- logopt:::x2w(x)
nDays <- dim(x)[1]
nStocks <- dim(x)[2] 
Days <- 1:nDays
iWin <- 1 ; plot(1:10)
Time <- index(nyse.cover.1962.1984)

# for each stock calculate:
# - min, max
# - average geometric return

MaxFinal <- max(w[nDays,])
MinFinal <- min(w[nDays,])
MaxAll <- max(w)
MinAll <- min(w)

if(length(dev.list()) < iWin) { x11() } ; iWin <- iWin + 1 ; dev.set(iWin) ;
plot(Time, w[,1], col="gray", ylim=range(w), log="y", type="l")
for (i in 1:nStocks) {
 lines(Time, w[,i], col="gray")
 if (w[nDays,i] == MaxFinal) { cat(sprintf("Stock with best final value: %s finishing at %.2f\n", colnames(w)[i], MaxFinal)) ; iMax <- i }
 if (w[nDays,i] == MinFinal) { cat(sprintf("Stock with worst final value: %s finishing at %.2f\n", colnames(w)[i], MinFinal)) ; iMin <- i }
 if (max(w[,i]) == MaxAll) { cat(sprintf("Stock with best peak value: %s at %.2f\n", colnames(w)[i], MaxAll)) }
 if (min(w[,i]) == MinAll) { cat(sprintf("Stock with worst valley value: %s at %.2f\n", colnames(w)[i], MinAll)) } 
}

lines(Time, w[,iMax], col="green")
lines(Time, w[,iMin], col="red")
lines(Time, apply(w,1,mean), col="blue")
grid()

# do a summary across n quotes
nDelta <- 1200
wD <- w[(nDelta+1):nDays,] / w[1:(nDays-nDelta),]
Time <- Time[1:(nDays-nDelta)]
MaxDAll <- max(wD)
MinDAll <- min(wD)
if(length(dev.list()) < iWin) { x11() } ; iWin <- iWin + 1 ; dev.set(iWin) ;
plot(Time, wD[,1], col="gray", ylim=range(wD), log ="y", type="l")
for (i in 1:nStocks) {
 lines(Time, wD[,i], col="gray")
 if (max(wD[,i]) == MaxDAll) { cat(sprintf("Stock with best gain on %s days: %s at %.2f\n", nDelta, colnames(w)[i], MaxDAll)) }
 if (min(wD[,i]) == MinDAll) { cat(sprintf("Stock with worst lost on %s days: %s at %.2f\n", nDelta, colnames(w)[i], MinDAll)) } 
} 
lines(Time, apply(wD,1,mean), col="blue")

grid()

This gives the following textual answer and graphs. Note that there are many alternate ways to present this information, in particular the package PerformanceAnalytics.

Stock with worst final value: dupont finishing at 3.07
Stock with worst valley value: meico at 0.26
Stock with best final value: morris finishing at 54.14
Stock with best peak value: schlum at 90.12
Stock with best gain on 1200 days: espey at 15.84
Stock with worst lost on 1200 days: meico at 0.07

This sequence forms a nice reference covering a long period of time, and has been used in many studies of portfolio selection algorithms. But the series has a number of serious problems:

Survivorship bias
The time range corresponds to a time where quotes were not yet decimal.

Friday, August 10, 2012

Universal portfolio, part 10

Part 9 compared the wealth of Universal against other portfolio selection algorithms by using the experimental cumulative distribution function of the relative wealth. This leads to a very compact representation, but it completely hides the absolute level evolution as the number of stocks in a portfolio increases.

The code below uses a slightly different approach, it uses a scatterplot where the final absolute wealth of two different algorithms are used for the x and y axes. The main diagonal corresponds to both algorithms having an equal final wealth. To provide more information, side graphs with the marginal probability of final wealth are included.

Finally, an other reference is added, the best CRP, using the optimization code in package logopt.

As in part 9, the code recalculates the value of Universal final wealth across all 4-tuples and thus takes one day to run if you did not have yet the results in your environment, be warned. Also, the code presentation uses Syntax Highlighter, I am still experimenting with the best way to present R code.

# Performance of Universal compared to some references

library(logopt)
x <- coredata(nyse.cover.1962.1984)
w <- logopt:::x2w(x)
nDays <- dim(x)[1]
nStocks <- dim(x)[2] 
Days <- 1:nDays
iWin <- 1 ; plot(1:10)

TupleSizes <- c(2,3,4)
EvaluateOnAllTuples <- function(ListName, TupleSizes, fFinalWealth, ...) {
 if (exists(ListName) == FALSE) {
  LocalList <- list()
  for (i in 1:length(TupleSizes)) {
   TupleSize <- TupleSizes[i]
   ws <- combn(x=(1:nStocks), m=(TupleSize), FUN=fFinalWealth, simplify=TRUE, ...)
   LocalList[[i]] <- ws
  }
  assign(ListName, LocalList, pos=parent.frame())
 }
}

UniversalFinalWealth <- function(cols, ...) {
 x <- list(...)[[1]] ; n <- list(...)[[2]]
 uc <- universal.cover(x[,cols], 20)
 return(uc[length(uc)])
}
EvaluateOnAllTuples("lUniversalFinalWealth", TupleSizes, UniversalFinalWealth, x, 20)

BestStockFinalWealth <- function(cols, ...) { 
 w <- list(...)[[1]]
 return(max(w[nDays,cols])) 
}
EvaluateOnAllTuples("lBestStockFinalWealth", TupleSizes, BestStockFinalWealth, w)

UcrpFinalWealth <- function(cols, ...) {
 x <- list(...)[[1]]
 ucrp <- crp(x[,cols])
 return(ucrp[length(ucrp)]) 
}
EvaluateOnAllTuples("lUcrpFinalWealth", TupleSizes, UcrpFinalWealth, x)

BhFinalWealth <- function(cols, ...) {
 x <- list(...)[[1]]
 ubh <- bh(x[,cols])
 return(ubh[length(ubh)]) 
}
EvaluateOnAllTuples("lBhFinalWealth", TupleSizes, BhFinalWealth, x)

BestCrpFinalWealth <- function(cols, ...) { 
 x <- list(...)[[1]]
 bopt <- bcrp.optim(x[,cols])
 bcrp <- crp(x[,cols],bopt)
 return(bcrp[length(bcrp)]) 
}
EvaluateOnAllTuples("lBestCrpFinalWealth", TupleSizes, BestCrpFinalWealth, x)

Colors <- c("blue","green","red")

CompareFinalAbsoluteWealth <- function( L0, L1, MainString, TupleSizes=TupleSizes, clip=0.01,
                           Colors=c("blue","green","red"), PlotChar = 19, PlotSize = 0.5,
                                   XLabel="Universal wealth", YLabel="Other wealth") {
 nLines <- min(length(L0),length(L1))
 if (clip > 0) { MaxUp <- quantile(c(L0[[nLines]], L1[[nLines]]), 1-clip) }
 else          { MaxUp <- max(L0, L1) }
 XLims = c(0, MaxUp)
 layout( matrix( c(0,2,2,1,3,3,1,3,3),ncol=3) )
 d.x <- density(L0[[1]])
 plot(d.x$x, d.x$y, xlim=XLims, type='l', col=Colors[1], main="Density on x axis", xlab="", ylab="")
 grid()
 for (i in 1:nLines) {
  d.x <- density(L0[[i]])
  lines(d.x$x, d.x$y, type='l', col=Colors[i])
  abline(v=mean(L0[[i]]), col=Colors[i])
 }
 d.y <- density(L1[[1]])
 plot(d.y$y, d.y$x, ylim=XLims, xlim=rev(range(d.y$y)), type='l', col=Colors[1], , main="Density on y axis", xlab="", ylab="")
 grid()
 for (i in 1:nLines) {
  d.y <- density(L1[[i]])
  lines(d.y$y, d.y$x, type='l', col=Colors[i])
  abline(h=mean(L1[[i]]), col=Colors[i])
 }
 plot(L0[[nLines]], L1[[nLines]], col=Colors[nLines], pch=PlotChar, cex=PlotSize, 
  xlab=XLabel, ylab=YLabel, xlim= XLims, ylim= XLims, type="p", main=MainString) 
 for (j in 1:nLines) {
  i <- nLines - j + 1
  points(L0[[i]], L1[[i]], col=Colors[i], pch=PlotChar, cex=PlotSize) 
  rug(L0[[i]], col=Colors[i],ticksize=0.01*i)
  rug(L1[[i]], col=Colors[i],ticksize=0.01*i, side=2)
 }
 abline(0,1,col="lightgray",lwd=2)
 legend("topright", legend=c("2 stocks","3 stocks","4 stocks"), pch=PlotChar, col=Colors, bg="white")
 grid()
}


if(length(dev.list()) < iWin) { x11() } ; iWin <- iWin + 1 ; dev.set(iWin) ;
CompareFinalAbsoluteWealth(lUniversalFinalWealth, lBestStockFinalWealth, "Universal relative to best stock final wealth", TupleSizes) 

if(length(dev.list()) < iWin) { x11() } ; iWin <- iWin + 1 ; dev.set(iWin) ;
CompareFinalAbsoluteWealth(lUniversalFinalWealth, lBhFinalWealth, "Universal relative to uniform buy and hold final wealth", TupleSizes) 

if(length(dev.list()) < iWin) { x11() } ; iWin <- iWin + 1 ; dev.set(iWin) ;
CompareFinalAbsoluteWealth(lUniversalFinalWealth, lUcrpFinalWealth, "Universal relative to uniform CRP final wealth", TupleSizes) 

if(length(dev.list()) < iWin) { x11() } ; iWin <- iWin + 1 ; dev.set(iWin) ;
CompareFinalAbsoluteWealth(lUniversalFinalWealth, lBestCrpFinalWealth, "Universal relative to best CRP final wealth", TupleSizes) 

# a function to compare the ECDF of two lists of final wealths
CompareFinalRelativeWealth <- function( L0, L1, MainString, TupleSizes=TupleSizes, 
                              Colors=c("blue","green","red"), PlotChar = ".",
                                   XLabel="Ratio of final wealths", YLabel="Cumulative probability") {
 nLines <- min(length(L0),length(L1))
 LR <- list() ; XLims = c()
 for(i in 1:nLines) { LR[[i]] <- L0[[i]]/L1[[i]] ; XLims <- range(XLims, LR[[i]]) }
 plot(ecdf(L0[[1]]/L1[[1]]), pch=PlotChar, col=Colors[1], main=MainString,
  xlab=XLabel, ylab=YLabel, xlim= XLims)
 abline(v=1,col="gray",lwd=2)
 for (i in 1:nLines) {
  lines(ecdf(L0[[i]]/L1[[i]]), pch=PlotChar, col=Colors[i])
 }
 legend("bottomright", legend=c("2 stocks","3 stocks","4 stocks"), fill=Colors)
 grid()
 # show best relative wealth and its composition
 for (i in 1:length(TupleSizes)) {
  TupleSize <- TupleSizes[i]
  BestTuple <- which.max(LR[[i]])
  BestStocks <- combn(1:nStocks, TupleSize)[,BestTuple]
  cat(sprintf("Max final relative wealth %.4f for stocks: ", LR[[i]][BestTuple]))
  cat(colnames(x)[BestStocks]) ; cat("\n")
 }
}

if(length(dev.list()) < iWin) { x11() } ; iWin <- iWin + 1 ; dev.set(iWin) ;
CompareFinalRelativeWealth(lUniversalFinalWealth, lBestCrpFinalWealth, "Universal relative to best CRP final wealth", TupleSizes)

The first graph compares Universal to the best stock in the portfolio. It also shows that the density function is not an ideal solution when the set of possible outcomes is discrete. The ecdf function would be a better choice in this case. The plot is also slightly misleading when compared to the equivalent plot of relative wealth in part 9. The problem is that points overlap and so the 2D density cannot be assessed except for the set of blue points.

The next graph shows a comparison of Universal and Uniform CRP. The graph works reasonably well in this case because both axes have smooth marginal distributions. As for the corresponding graph of relative wealth, we can clearly see that UCRP is simply better, but now we can also see that this is especially true for the best performing portfolios.

The comparison between Universal and Uniform Buy and Hold below also works well. And this time also we can see that the ratio gets better for better performing portfolios, but now with Universal the better algorithm

Finally the comparison between the Best CRP (in hindsight) and Universal shows that BCRP is always better. Because the probability density function of the best CRP is less smooth, the comparison is not perfect, but it seems that the ratio degrades as the number of stocks in the portfolio increases. This is expected given that the performance bound of Universal decreases as the number of stocks increase.

Plotting the ECDF of the relative wealth shows this effect much more vividly, clearly illustrating the advantage of looking at the same data in different fashions.

Wednesday, July 25, 2012

Universal portfolio, part 9

Part 8 was discussing the distribution of the absolute wealth of the Universal Portfolio across all possible tuples of length 2, 3 and 4.

However, comparing the absolute wealth against some reference, especially against simple portfolio selection algorithm provides a better view of the exact performance of the Universal algorithm. Because we want to compute and compute multiple references, the code uses functions to be more generic. The code remains terse (88 lines, including comments) and could be even terser.

Writing the code in this fashion shows again the strength of R, some specific aspects used here:

Passing functions as parameters of other functions
Using frames to access information outside the function scope, including assigning new values.
Use of ... to write variadic functions (I am still learning that, the current code works but is not elegant)
Use of ::: to access non exported functions from a package

The code recalculates the value of Universal final wealth across all 4-tuples and thus takes one day to run, be warned.

The first graph shows the final wealth of Universal versus the final wealth of the best stock in the tuple. Contrary to the absolute wealth, the relative wealth decreases as the number of stocks in the tuple increases and is also generally significantly less than 1. The red curve shows that for about 70% of the 4-tuples, Universal final wealth is below the wealth of the best stock in the tuple.

Obviously, it is impossible to know in advance which stock will have the best final wealth, so the comparison is slightly unfair. The next two comparisons however are against two of the simplest causal selection algorithms:

Equally weighted buy and hold, UBH for short, U for uniform
Equally weighted Constant Rebalanced Portfolio, usually UCRP for Uniform CRP. This one is interesting as Universal itself is based on a weighted combination of all CRP.

As BH is by construction worse than the best stock, we expect BH to fare better and indeed it does. First Universal is generally better than BH, and second the relative performance increases as the number of stocks in the tuple increase. In general, Universal is much better than UBH. The cumulative distribution does still shows a rather long and thin tail.

Unfortunately this does not carry to UCRP, UCRP happens to be a very good performer, and a very tough reference to beat for any portfolio selection algorithm.

Against UCRP:

Universal generally loses to UCRP (around 90% of the time) and sometimes pretty badly
Increasing the tuple size increases the relative performance when UCRP is better than Universal and decreases the performance when the opposite is true.

Interestingly, the textual output shows that the composition of the best absolute or relative Universal portfolios differ significantly for the comparison against UCRP.

Max final wealth 78.4742 for stocks: comme kinar

Max final wealth 111.6039 for stocks: comme kinar meico

Max final wealth 138.5075 for stocks: comme espey kinar meico

Max final relative wealth 4.3379 for stocks: iroqu kinar

Max final relative wealth 5.6186 for stocks: espey iroqu kinar

Max final relative wealth 5.2758 for stocks: coke espey iroqu kinar

Max final relative wealth 5.9302 for stocks: iroqu kinar

Max final relative wealth 8.6374 for stocks: espey iroqu kinar

Max final relative wealth 8.8390 for stocks: espey iroqu kinar meico

Max final relative wealth 1.3043 for stocks: dupont morris

Max final relative wealth 1.1980 for stocks: dupont morris sears

Max final relative wealth 1.1330 for stocks: dupont morris schlum sears

Wednesday, July 18, 2012

Universal portfolio, part 8

We extend the analysis of part 7 by calculating the final wealth for all tuples of 3 and 4 stocks, this is a simple extension but it also shows the most important problem of the Universal portfolio algorithm, its exponential complexity in the number of stocks.

The first part shows the cumulative distribution of the absolute final wealth for all possible tuples of 2, 3 and 4 stocks. The results are qualitatively the same as before, a thin tail of good results. It also a shows increase in absolute wealth as the number of stocks increase.

The code remains quite compact, R is simply wonderful for this type of analysis.

library(logopt)
x <- coredata(nyse.cover.1962.1984)
w <- logopt:::x2w(x)
nDays <- dim(x)[1]
nStocks <- dim(x)[2]
Days <- 1:nDays
iWin <- 1 ; plot(1:10)

FinalWealth <- function(cols) {
xij <- x[,cols]
uc <- universal.cover(xij, 20)
return(uc[length(uc)])
}

TupleSizes <- c(2,3,4)
if (exists("lws") == FALSE) {
lws <- list()
for (i in 1:length(TupleSizes)) {
TupleSize <- TupleSizes[i]
ws <- combn(1:nStocks, TupleSize, FinalWealth)
lines(ecdf(ws), pch=".", col=Colors[i])
lws[[i]] <- ws
}
}

# show the ecdf
if(length(dev.list()) < iWin) { x11() } ; iWin <- iWin + 1 ; dev.set(iWin) ;
plot(ecdf(lws[[3]]), pch=".", col="red", main="Cumulative probability of final wealth",
xlab="Final wealth", ylab="Cumulative probability")
lines(ecdf(lws[[2]]), pch=".", col="green")
lines(ecdf(lws[[1]]), pch=".", col="blue")
legend("bottomright", legend=c("2 stocks","3 stocks","4 stocks"), fill=c("blue","green","red"))
grid()

# show best absolute wealth and its composition
for (i in 1:length(TupleSizes)) {
TupleSize <- TupleSizes[i]
BestTuple <- which.max(lws[[i]])
BestStocks <- combn(1:nStocks, TupleSize)[,BestTuple]
cat(sprintf("Max final wealth %.4f for stocks: ", lws[[i]][BestTuple]))
cat(colnames(x)[BestStocks]) ; cat("\n")
}

This gives the following output and graph

Max final wealth 78.4742 for stocks: comme kinar
Max final wealth 111.6039 for stocks: comme kinar meico
Max final wealth 138.5075 for stocks: comme espey kinar meico

The highlighted line of code is there for a reason, calculating the final wealth for all 4-tuples takes about one full day on my machine. So the code is only run once, then the magic of automatically saving the environment takes care of avoiding recalculating the result.

But why is the running time so long? The main problem of the Universal Portfolio algorithm is that it needs to evaluate an integral on the simplex, and the curse of dimensionality rears its ugly head. The general approach for a numerical integration in n dimensions is exponential in the number of dimensions. Furthermore we calculate the wealth for all possible n-tuples and this increases also exponentially in n when n is much smaller than the total number of stocks concerned.

As usual R code can provide a more vivid representation, the following code generates a plot showing the complexity evolution as the number of stocks increase for 36 stocks and a grid spacing of 0.05. The code also produces a table for small the small values of n. The code uses the ::: operator, it was new to me and allows to call package functions that are not part of the exposed interface.

library(logopt)
x <- coredata(nyse.cover.1962.1984)
nStocks <- dim(x)[2]

Sizes <- 1:nStocks
nCs <- 0 * Sizes
nSs <- 0 * Sizes

for (i in 1:length(Sizes)) {
n <- Sizes[i]
nCs[i] <- choose(nStocks, n)
nSs[i] <- logopt:::count.grid.points(n, 20)
}

plot(nCs * nSs, type="l", col="blue", log="y",ylim=range(nCs,nSs,nCs*nSs))
lines(nCs, col="red")
lines(nSs, col="green")
grid()

cat(sprintf("n C(36,n) S(n,20) C(36,n)*S(n,20) Ratios to previous value\n"))
for(i in 2:6) {
n <- Sizes[i]
cat(sprintf("%d %10d %10d %15.0f",n,nCs[i],nSs[i], nCs[i] * nSs[i]))
if(i > 2) {
cat(sprintf("%7.1f %7.1f %7.1f", nCs[i]/nCs[i-1], nSs[i]/nSs[i-1],
(nCs[i] * nSs[i]) / (nCs[i-1] * nSs[i-1])))
}
cat("\n")
}

n C(36,n) S(n,20) C(36,n)*S(n,20) Ratios to previous value

2 630 21 13230

3 7140 231 1649340 11.3 11.0 124.7

4 58905 1771 104320755 8.2 7.7 63.2

5 376992 10626 4005916992 6.4 6.0 38.4

6 1947792 53130 103486188960 5.2 5.0 25.8

The exponential complexity makes the Universal Porfolio algorithm impractical for large portfolio, a serious limitation. Some authors have devised more efficient approaches, that either try to approximate Universal Portfolio at reduced complexity or are inspired by the ideas of Universal but with less complexity and general either no bounds or worse bounds.

Saturday, July 7, 2012

Universal portfolio, part 7

After reproducing all original figures and tables from Universal Portfolios, R coupled with modern processors allows to perform some more analysis.

First we calculate the final wealth of the universal portfolio for all possible pairs of stocks, and show the corresponding cumulative probability function, and which pair is the best. This shows how concise R code can be.

library(logopt)
x <- coredata(nyse.cover.1962.1984)
w <- logopt:::x2w(x)
nDays <- dim(x)[1]
nStocks <- dim(x)[2]
Days <- 1:nDays

FinalWealth <- function(cols) {
xij <- x[,cols]
uc <- universal.cover(xij, 20)
return(uc[length(uc)])
}

ws <- combn(1:nStocks, 2, FinalWealth)
plot(ecdf(ws),col="blue",pch=19,cex=0.5)
grid()

# show the best pair
BestIdx <- combn(1:nStocks,2)[,which.max(ws)]
cat(sprintf("Max final wealth: %.4f for pair (%s,%s)\n", max(ws), colnames(x)[BestIdx[1]], colnames(x)[BestIdx[2]]))

The result is the figure below, plus this line

Max final wealth: 78.4742 for pair (comme,kinar)

There are two interesting observations to make:

There is a long, thin tail. In other words, there are some pairs giving good results, but they are relatively rare.
The best pair happens to be one of the examples in Cover's article, probably not a coincidence.

An even better metric is the ratio between the final wealth and the wealth of the best stock in the pair. It is expected that the universal portfolio gets better performance when the underlying stocks perform better. The code below is very similar to the code above, but calculating the ratio of wealth as a final step. It needs the same prolog as above (not shown).

WealthRatio <- function(cols) {

xij <- x[,cols]

uc <- universal.cover(xij, 20)

return(uc[length(uc)]/max(w[length(uc),cols]))

}

wrs <- combn(1:nStocks, 2, WealthRatio)

plot(ecdf(wrs),col="blue",pch=19,cex=0.5)

grid()

# show the best and worst pairs

BestIdx <- combn(1:nStocks,2)[,which.max(wrs)]

cat(sprintf("Max increase from best stock: %.4f for pair (%s,%s)\n", max(wrs), colnames(x)[BestIdx[1]], colnames(x)[BestIdx[2]]))

WorstIdx <- combn(1:nStocks,2)[,which.min(wrs)]

cat(sprintf("Min ratio from best stock: %.4f for pair (%s,%s)\n", min(wrs), colnames(x)[WorstIdx[1]], colnames(x)[WorstIdx[2]]))

The result is the figure below, plus these lines

Max increase from best stock: 4.3379 for pair (iroqu,kinar)

Min ratio from best stock: 0.3773 for pair (dupont,morris)

And we can do similar observations:

There is a long, thin tail, even thinner than for the wealth itself. In other words, there are some pairs giving good results, but they are relatively rare.
In the majority of the case (about 60%), the wealth of universal portfolio is below the wealth of the best stock in the pair, sometimes significantly so.
The best pair happens to be another one of the examples in Cover's article, probably not a coincidence either.

The general feeling is that universal portfolio is a valid approach, but maybe not as performing as could be deduced from the original article that cherry picked the examples.

Saturday, June 30, 2012

R is fun

As mentioned in Universal portfolio, part 6, the wealth reported in Table 8.4 of Universal Portfolios could not be reproduced. An other observation is that the random weight vectors reported in Table 8.4 are shown in descending lexicographic order, except for the last one, suggesting possibly that there is an error in the original table.

R code allows to perform a simple experiment, take subsets of the full set of weights and check if we can reproduce the reported wealth. Explicitly we'll try all subsets that exclude from 1 to 7 values in the original set. This shows the expressive power of R, and how fast modern computers operate. The whole search only takes a few line of codes and execute very fast even if this entails computing the mean of 280,599 different subsets.

As it turns out, two different subsets with 18 elements match the reported wealth, a surprising result. No other subsets match. The code below must be appended after the code used in Universal portfolio, part 6.

# remove selected entries to try to get the published value of 98.4240
# R is fun and modern computers are fast
BestAppr <- 0
nPruned <- 0
nC <- length(crps)
cat("\n")
for (m in seq(nC-1, nC-7)) {
CrpPruned <- combn(crps, m, mean)
nPruned <- nPruned + length(CrpPruned)
Best <- min(abs(CrpPruned - 98.4240))
cat(sprintf("Min delta from Cover with %d samples is %.4f\n", m, Best))
}
cat(sprintf("\n%d different subsets tried\n", nPruned))
cat("\n Subsets matching the wealth reported in Table 8.4\n")

# so at least one subset of 18 samples match the published value, show them
Indices <- combn(1:22,18)
for (i in 1:(dim(Indices)[2])) {
PrunedMean <- mean(crps[Indices[,i]])
if (abs(PrunedMean - 98.4240) < 0.0001) {
cat(Indices[,i])
cat(sprintf(" %.5f\n",PrunedMean))
}
}

This gives the following output

Min delta from Cover with 21 samples is 2.3056

Min delta from Cover with 20 samples is 0.0398

Min delta from Cover with 19 samples is 0.0116

Min delta from Cover with 18 samples is 0.0000

Min delta from Cover with 17 samples is 0.0003

Min delta from Cover with 16 samples is 0.0008

Min delta from Cover with 15 samples is 0.0002

280599 different subsets tried

Subsets matching the wealth reported in Table 8.4

1 2 3 4 5 6 9 10 11 12 13 14 15 17 19 20 21 22 98.42403

1 2 3 4 5 6 9 11 12 13 14 15 17 18 19 20 21 22 98.42403

Sunday, June 10, 2012

Universal portfolio, part 6

The final table in Universal Portfolios introduces leverage. It indirectly also shows the dangers of rebalancing on margin, while Kin Ark increases 4.2 times, at 50% margin it goes to nothing.

The code below reproduces Table 8.4, again as plain text only. For unknown reason, the wealth of the universal portfolio cannot be reproduced, while all other values do match.

# table 8.4 of Cover "Universal Portfolios"

library(logopt)

x <- nyse.cover.1962.1984

xck <- x[,c("comme","kinar")]

nDays <- dim(xck)[1]

Days <- 1:nDays

# calculate the margined sequence

r <- 0.000233 # seems to be r <- ((1+0.06) ^ (1/250)) - 1 rounded

yck <- (2 * xck) - 1 - r

xyck <- cbind(xck[,1], yck[,1], xck[,2], yck[,2])

mck <- cumprod(xyck)[nDays]

cat(sprintf("Commercial metals %.4f\n", mck[1,1]))

cat(sprintf("Commercial metals on margin %.4f\n", mck[1,2]))

cat(sprintf("Kin Ark %.4f\n", mck[1,3]))

cat(sprintf("Kin Ark on margin %.4f\n", mck[1,4]))

cat(sprintf("r = %.6f/day = 6%%/year\n",r))

# same "random" b as in Cover

b <- c(0.8, 0.2, 0.0, 0.0)

b <- rbind(b, c(0.8, 0.1, 0.0, 0.1))

b <- rbind(b, c(0.6, 0.1, 0.1, 0.2))

b <- rbind(b, c(0.6, 0.0, 0.4, 0.0))

b <- rbind(b, c(0.5, 0.0, 0.2, 0.3))

b <- rbind(b, c(0.4, 0.0, 0.4, 0.2))

b <- rbind(b, c(0.3, 0.5, 0.1, 0.1))

b <- rbind(b, c(0.3, 0.4, 0.1, 0.2))

b <- rbind(b, c(0.3, 0.2, 0.2, 0.3))

b <- rbind(b, c(0.3, 0.1, 0.2, 0.4))

b <- rbind(b, c(0.3, 0.0, 0.1, 0.6))

b <- rbind(b, c(0.2, 0.7, 0.0, 0.1))

b <- rbind(b, c(0.2, 0.2, 0.3, 0.3))

b <- rbind(b, c(0.1, 0.8, 0.1, 0.0))

b <- rbind(b, c(0.1, 0.5, 0.2, 0.2))

b <- rbind(b, c(0.1, 0.4, 0.2, 0.3))

b <- rbind(b, c(0.1, 0.3, 0.1, 0.5))

b <- rbind(b, c(0.1, 0.2, 0.4, 0.3))

b <- rbind(b, c(0.1, 0.1, 0.2, 0.6))

b <- rbind(b, c(0.0, 0.5, 0.4, 0.1))

b <- rbind(b, c(0.0, 0.4, 0.2, 0.4))

b <- rbind(b, c(0.2, 0.5, 0.1, 0.2))

BestValue <- 0

crps <- b[,1] * 0

for (i in 1:length(crps)) {

crps[i] <- crp(xyck, b[i,])[nDays]

if(crps[i] > BestValue) {

BestValue <- crps[i]

BestB <- b[i,]

}

cat(sprintf("Sn* = %.4f bn* =(%.1f, %.1f, %.1f, %.1f)\n",

BestValue, BestB[1], BestB[2], BestB[3], BestB[4]))

cat(sprintf("Best constituent stock %.4f\n",max(mck)))

cat(sprintf("Wealth achieved by universal portfolio Sn^ = %.4f\n", mean(crps)))

cat(sprintf("\n b Sn(b)\n\n"))

for (i in 1:length(crps)) {

cat(sprintf("(%.1f, %.1f, %.1f %.1f) %8.4f\n",b[i,1], b[i,2], b[i,3], b[i,4], crps[i]))

}

Giving the following output

Commercial metals 52.0203

Commercial metals on margin 19.7335

Kin Ark 4.1276

Kin Ark on margin 0.0000

r = 0.000233/day = 6%/year

Sn* = 262.4021 bn* =(0.2, 0.5, 0.1, 0.2)

Best constituent stock 52.0203

Wealth achieved by universal portfolio Sn^ = 108.0784

b Sn(b)

(0.8, 0.2, 0.0 0.0) 57.0535

(0.8, 0.1, 0.0 0.1) 148.9951

(0.6, 0.1, 0.1 0.2) 207.1143

(0.6, 0.0, 0.4 0.0) 140.7803

(0.5, 0.0, 0.2 0.3) 60.8358

(0.4, 0.0, 0.4 0.2) 47.6074

(0.3, 0.5, 0.1 0.1) 212.8928

(0.3, 0.4, 0.1 0.2) 261.0452

(0.3, 0.2, 0.2 0.3) 89.0330

(0.3, 0.1, 0.2 0.4) 19.4840

(0.3, 0.0, 0.1 0.6) 0.7700

(0.2, 0.7, 0.0 0.1) 121.0142

(0.2, 0.2, 0.3 0.3) 45.2562

(0.1, 0.8, 0.1 0.0) 67.5882

(0.1, 0.5, 0.2 0.2) 233.6328

(0.1, 0.4, 0.2 0.3) 112.6695

(0.1, 0.3, 0.1 0.5) 12.7702

(0.1, 0.2, 0.4 0.3) 19.4840

(0.1, 0.1, 0.2 0.6) 0.2354

(0.0, 0.5, 0.4 0.1) 225.2524

(0.0, 0.4, 0.2 0.4) 31.8076

(0.2, 0.5, 0.1 0.2) 262.4021

Saturday, June 9, 2012

Universal portfolio, part 5

The first three tables in Universal Portfolios presents the same information in numerical form as some of the plots. The following code generates all three tables by defining a function then calling it with suitable parameters. The output is text only, so does not match the typography of the original.

# tables 8.1, 8.2 and 8.3 of Cover "Universal Portfolios"
require(logopt)
x <- nyse.cover.1962.1984

ShowTable <- function(a,b) {
cat(sprintf("%s versus %s\n", a, b))
cat(sprintf("------------------\n"))
cat(sprintf("b Sn(b)\n"))
cat(sprintf("------------------\n"))
xab <- x[,c(a,b)]
nDays <- dim(xab)[1]
Days <- 1:nDays
alphas <- seq(0,1,by=0.05)
crps <- alphas
for (i in 1:length(crps)) {
crps[i] <- crp(xab, c(1-alphas[i], alphas[i]))[nDays]
cat(sprintf("%.2f %9.4f\n",1-alphas[i], crps[i]))
}
cat(sprintf("------------------\n"))
cat(sprintf("Target wealth: Sn* = %.4f\n",max(crps)))
cat(sprintf("Best rebalanced portfolio: bn* = %4.2f\n",1-alphas[which.max(crps)]))
cat(sprintf("Best constituent stock: %.4f\n", max(c(crps[1], crps[length(crps)]))))
cat(sprintf("Universal wealth: Sn^ = %.4f\n\n\n\n", mean(crps)))
}

ShowTable("iroqu","kinar")
ShowTable("comme","kinar")
ShowTable("comme","meico")

iroqu versus kinar
------------------
b Sn(b)
------------------
1.00 8.9151
0.95 13.7712
0.90 20.2276
0.85 28.2560
0.80 37.5429
0.75 47.4513
0.70 57.0581
0.65 65.2793
0.60 71.0652
0.55 73.6190
0.50 72.5766
0.45 68.0915
0.40 60.7981
0.35 51.6645
0.30 41.7831
0.25 32.1593
0.20 23.5559
0.15 16.4196
0.10 10.8910
0.05 6.8737
0.00 4.1276
------------------
Target wealth: Sn* = 73.6190
Best rebalanced portfolio: bn* = 0.55
Best constituent stock: 8.9151
Universal wealth: Sn^ = 38.6727

comme versus kinar
------------------
b Sn(b)
------------------
1.00 52.0203
0.95 68.2890
0.90 85.9255
0.85 103.6415
0.80 119.8472
0.75 132.8752
0.70 141.2588
0.65 144.0035
0.60 140.7803
0.55 131.9910
0.50 118.6854
0.45 102.3564
0.40 84.6655
0.35 67.1703
0.30 51.1127
0.25 37.3042
0.20 26.1131
0.15 17.5315
0.10 11.2883
0.05 6.9704
0.00 4.1276
------------------
Target wealth: Sn* = 144.0035
Best rebalanced portfolio: bn* = 0.65
Best constituent stock: 52.0203
Universal wealth: Sn^ = 78.4742

comme versus meico
------------------
b Sn(b)
------------------
1.00 52.0203
0.95 61.0165
0.90 70.0625
0.85 78.7602
0.80 86.6815
0.75 93.4026
0.70 98.5414
0.65 101.7927
0.60 102.9589
0.55 101.9691
0.50 98.8869
0.45 93.9033
0.40 87.3172
0.35 79.5057
0.30 70.8890
0.25 61.8932
0.20 52.9162
0.15 44.3012
0.10 36.3178
0.05 29.1538
0.00 22.9160
------------------
Target wealth: Sn* = 102.9589
Best rebalanced portfolio: bn* = 0.60
Best constituent stock: 52.0203
Universal wealth: Sn^ = 72.6289

The final table is more complex to reproduce and also more interesting, we'll tackle it in a next post.

Pages