Sunday, 19 July 2015

Using optim() to get 'quick and dirty' solutions, a case study on network graphs

The optim() function in base R, is a generalized optimizer for continuous functions. It looks for the minimum of some objective function that you define across some continuous parameter space that you define.

In this post, I show how to use optim() to find a (inelegant, but workable) solution to do something very complex, plot a network of nodes based on the shortest path between them, with relatively little programming effort.

First, a simpler example:

If you didn't know the lm() function, you could use to find the least-squares line through set of points.
- The solution space is R2, the possible values of (intercept, slope).
- The objective function is the sum of squared error.

```get_ss_error = function(beta, x0, y0)
{
yhat = beta[1] + x0*beta[2]
error = sum( (y0 - yhat)^2 )
return(error)
}```
```set.seed(1234)
x0 = rnorm(1000)
y0 = -3 + 5*x0 + rnorm(1000)

## linear model solution
lm(y0 ~ x0)\$coef
(Intercept)          x0
-2.984010    5.055713

## optimizer solution
optim(par=c(0,0), get_ss_error, x0=x0, y0=y0)\$par
[1] -2.984321  5.055940

```
In this case optim() isn't using (X'X)-1 X'y to find the best solution, it just uses one of its general optimization methods to try different values for slope and intercept until it comes up with a pair that has close to the minimum value for get_ss_error. Note that the first parameter passed to get_ss_error is the potential solution. The optim() function will assume this about functions, I think.

Now let's try something more interesting than simple regression. In April, Jeremy Diachuk was playing with Thaumcraft, an expansion to Minecraft which uses a transmutation mechanic. That is, elements can be turned into other elements, but any given element can only be directly turned into a couple of other elements. For example, "Earth" can be changed into "Life" (and "Life" back into "Earth"), but "Earth" can't be changed into "Heal". However, "Life" can be changed into "Heal".

We can describe the transmutation relationships as an undirected network graph, and the shortest path length between each element is the number of transmutations needed to turn one element into another. "Earth" transmutes to "Life" transmutes to "Heal", which is a shortest path, so "Earth" is two transmutations away from "Heal". We call this distance in a network the geodesic distance.

Jeremy wanted a visual arrangement of some key elements that showed their relative geodesic distance from each other. There are programs out there to do this, but I thought it would be faster/better to build something in R using optim() than try to use an out of the box system. I was right... I think.

First, we needed some means to find the geodesic distance between each element. Jeremy gave me a formatted list of the possible transmutations (available here), and it was a simple call to the shortest.path() function in the igraph package in R to get a matrix of the geodesic distances.

```library(igraph)
library(permute)

sp = shortest.paths(g)
```

For the sake of a simpler picture and faster optimization, we weren't interested in all 49 of the elements, so we eliminated unimportant ones from the matrix. Note that we couldn't do this before finding the shortest paths because they may have use to make those paths.

`## Full list in downloadable code `
```ignore_list = c("CRAFT","SLIME","WRATH",...,"COLD","TAINT")

ignore_idx = which(row.names(sp) %in% ignore_list)
sp = sp[-ignore_idx, -ignore_idx]
```

Next, we needed...
- An objective function (the difference between geodesic distance and the usual meaning of distance, euclidean distance)
- A parameter space (the location of each of the key elements in a plot), and
- A function that takes in the parameter space, and gives out the objective function

`# Full comments in downloadable version `
```euclid_vs_geodesic = function( x, sp, importance)
{
size = length(x) / 2
y = x[ 1:Nnodes]
x = x[ 1:Nnodes + Nnodes ]

for(i in 1:Nnodes)
{
for(j in (1:Nnodes)[-i])
{
eu_dist =  sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2 )
geo_dist = sp[i,j]

thisbad = abs( log( eu_dist / geo_dist))^3
}
}
}
```

The function euclid_vs_geodesic takes a vector of values 'x', breaks it into x and y coordinates for each element, and compares the euclidean distance between each coordinate pair with the number of transmutations that separate each element, and sums them up to get a 'badness' score. We can feed all this into optim to find the optimal placement of the elements in a 2-D plot.

`# Full comments in downloadable version `
```Nnodes = nrow(sp)
importance = rep(1,Nnodes)
init = rnorm(2*Nnodes)

res = optim(par=init,
fn=euclid_vs_geodesic, sp=sp, importance=importance,
control=list(trace=1, maxit=200),
method="L-BFGS-B", lower = rep(-4,2*Nnodes), upper = rep(4,2*Nnodes))
```

Using the results from res, we can plot:

```x = res\$par[1:Nnodes]
y = res\$par[Nnodes + 1:Nnodes]
plot(x,y, cex=0.1)
text( labels=row.names(sp), x=x, y=y)

```

Here, words that are close to each other represent elements that require few transmutations between each other. 'Light' (lower left) can be transmuted to 'Fire' and 'Energy' simply, but converting 'Light' to 'Motion' is hard, and turning 'Light' into 'Undead' is even harder.

Also, even the edges between the nodes aren't shown, shortest paths between nodes are visually implied. "Life" (just right of center), is close to the line between "Earth" and "Heal", implying the 'Earth to Life to Heal' shortest path.

-------------

The optim() function is also good for getting the inverse of a solution. In January I used quantile regression to find curves for the scores of the best and worst 10% of players in a video game after some amount of practice time. The original solution took time and score, and give back the estimated proportion of players that would reach that score by that time.

However, if you were instead interested in tuning a game to find the score that, say, 90% of players could reach within half an hour, optim() could be used to find that by setting the parameter space to 'score', setting 'time' as function parameter, and using 'distance from 90%' as the objective function.

One major drawback is that, in order to do this, each iteration inside the optim() function had to run the quantile regression code all over again. This made finding the optimal score or play time about ten times slower than finding the quantiles.

I call these 'quick and dirty' solutions because, although they work, they aren't guaranteed to be elegant, revealing, or computationally efficient. However, if you want to get a sense of a solution fast, it's hard to beat optim() for its simplicity of use.

Finally, optim() can also be used for combinatorial optimization, but I haven't covered that here.