
Friday 24 June 2016

I read this: Spatial and Spatio-Temporal Bayesian Models with R-INLA


Spatial and Spatio-temporal Bayesian Models with R-INLA, by Marta Blangiardo and Michela Cameletti, is a short textbook that introduces INLA (integrated nested Laplace approximations) and some preliminaries, and works through the math and R code of a variety of problems that the R-INLA package can tackle. Several non-spatial regression models are also covered in enough detail to get you started on R-INLA in general.


 
Why I read it: 

I’m working on a research problem in which INLA was recommended as a possible solution. I was sold on the idea, but also warned that the original paper describing the INLA method was hard going. (Link) When Dr. Dave Campbell says something is hard, you can safely translate that as ‘herculean’. The paper (Rue et al., 2009) was 74 pages of material too intimidating for me to examine. So I looked for other summaries of INLA, and came across this book, which describes INLA and the use of an R package being developed by a community called R-INLA. R-INLA is not available on CRAN at the time of writing this, but it is available at www.r-inla.org

What is INLA: 

INLA, or integrated nested Laplace approximations, is a method for estimating the likelihood in high-dimensional cases: it integrates out unimportant variables, and then approximates the likelihood using the product of the marginals of those assumed-unimportant variables. Compared to MCMC, or Markov chain Monte Carlo methods, INLA scales much better with the number of dimensions, but it relies on the quality of that extra layer of approximation. Also, INLA is deterministic, rather than simulation-based like most popular Bayesian methods, including ABC (approximate Bayesian computation). This has the advantage of replicability, but the disadvantage that you can’t improve the results simply by throwing more computing power at it.


Who is this book for: 

This book is for people with some statistical and mathematical fluency already. If you feel ready for Julian J. Faraway’s “Extending the Linear Model with R”, a popular book for teaching GLMs, then you are probably ready for this R-INLA book. That is, a background in calculus and statistical inference is required, and backgrounds in R and in Bayesian statistics are helpful.

From what I’ve read (Chapters 4 and 5), R-INLA is a beast that can provide quick solutions to a vast set of popular analyses, and solutions of any sort for high-dimensional problems. It could be worth your reading time, especially if the INLA method becomes more popular.

There are 300 pages in this text, but 100 pages are dedicated to bringing the reader up to speed on using R and some basic analyses, on Bayesian theory, and on Bayesian computational methods like MCMC, Metropolis-Hastings, and BUGS.

The rest is a guide to applying INLA to Bayesian regression, hierarchical models (my focus for this project), spatial models, and spatio-temporal models.

What I learned about using R-INLA:

At the basic level, the inla() function in R-INLA is used a lot like the glm() function:
1. Specify a formula, y ~ x. (You can also specify custom functions; see below.)
2. Specify a family of distributions for the likelihood.
3. Specify a dataset.
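A minimal sketch of that workflow, with made-up data (the data frame dat and the variables x and y are mine, not the book’s):

library(INLA)   # the R-INLA package, installed from www.r-inla.org

# toy data for illustration
dat <- data.frame(x = rnorm(100))
dat$y <- 2 + 3 * dat$x + rnorm(100)

fit <- inla(y ~ x,                # 1. the formula
            family = "gaussian",  # 2. the likelihood family
            data = dat)           # 3. the dataset

summary(fit)   # posterior summaries, much like summary() on a glm() fit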

However, under the hood, it’s performing a Bayesian regression (or other estimation, if you wish), with very diffuse priors by default. For a regression, for example, you can:
- change the priors on the intercept and all the regression parameters, with the control.fixed=list() setting in inla().
- change the prior on the precision (the inverse of the variance, sigma squared) of the response, with the control.family=list() setting. This defaults to a Gamma(1, 10^-5), which you can replace with another gamma or even a distribution from another family.
The book provides a way to specify the standard deviation sigma directly, but it’s a bit of an awkward workaround.
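A sketch of both overrides, continuing with the toy dat from above (the particular prior values are illustrative, not recommendations; note that R-INLA places the prior on the log of the precision, hence "loggamma"):

fit <- inla(y ~ x,
            family = "gaussian",
            data = dat,
            # priors on the fixed effects (intercept and slopes)
            control.fixed = list(mean.intercept = 0, prec.intercept = 0.01,
                                 mean = 0, prec = 0.01),
            # prior on the Gaussian observation precision
            control.family = list(hyper = list(
              prec = list(prior = "loggamma", param = c(1, 0.00001)))))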

To get the fitted values that you would normally get from $fitted in an lm() or glm() model, you have to specify control.predictor = list(compute=TRUE) in inla() to produce any fitted values in your model object. If you wish to predict for new data, include those rows in the dataset with the response set to missing, and use link= in the control.predictor list to specify which link function should be used for the fitted values of those rows.
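A sketch of that prediction pattern, again with the toy dat (the new x values are made up; link = 1 refers to the first, and here only, likelihood family):

# append the rows to predict for; their responses are NA
newdat <- rbind(dat, data.frame(x = c(-1, 0, 1), y = NA))

fit <- inla(y ~ x, family = "gaussian", data = newdat,
            control.predictor = list(compute = TRUE, link = 1))

fit$summary.fitted.values   # includes fitted values for the NA rows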

1. The custom functions that you can specify in the formula that gets fed into inla() have many options that allow you to specify that you are doing something more complex than a linear regression. The list of currently available ones is here: http://www.r-inla.org/models/latent-models

For example, you can set up an autocorrelation model by adding f(varname, model="rw1", hyper=list(…)) to the formula, where "rw1" stands for a random walk of order 1, hyper defines the hyper-prior (like the one set for the precision of estimates), and prior and param inside that list define the prior for the parameter(s) of the effect of varname. You can also specify multiple such terms by adding them as f(var1, …) + f(var2, …) + ….
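A sketch of an rw1 term, assuming a time index column (the index and the hyper-prior values are illustrative):

dat$time <- 1:100   # an illustrative time index

formula <- y ~ x +
  f(time, model = "rw1",
    hyper = list(prec = list(prior = "loggamma", param = c(1, 0.01))))

fit <- inla(formula, family = "gaussian", data = dat)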

If you want to treat levels of a particular variable as random effects, use f(varname, model="iid"), and inla() will treat the effects of varname as independent and identically distributed values from some common distribution. This is particularly useful in hierarchical models. These random effects can be used with different family specifications.
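A sketch of iid random intercepts, with a made-up grouping factor:

dat$group <- rep(1:10, each = 10)   # an illustrative grouping variable

fit <- inla(y ~ x + f(group, model = "iid"),
            family = "gaussian", data = dat)

fit$summary.random$group   # posterior summaries of the group effects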

2. When specifying a family, the default is Gaussian. If you want to reduce the effect of outliers, use a family with fatter tails, like Gosset’s Student’s t. Remember, this is maximizing likelihood, not minimizing squared error, so using a fatter-tailed family means that the likelihood doesn’t drop off as quickly as deviations get larger. As with other parts of inla(), you can override the defaults (like the degrees of freedom for the t) with a list in control.family.
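A one-line sketch, assuming "T" is the Student-t family name as given on the R-INLA likelihoods page below:

# swap the Gaussian likelihood for a heavier-tailed Student t
fit_t <- inla(y ~ x, family = "T", data = dat)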

Other options, listed at http://www.r-inla.org/models/likelihoods, include Weibull, skew normal, binomial (for logistic regression), Poisson (for regression on counts), and Cox (as in Cox proportional hazards, for survival data).



What else I learned:

A Markov random field, or MRF, is a network of variables that shows explicitly which variables depend on which others. A connection between nodes represents a dependency, and a lack of a connection between nodes represents conditional independence. If the variables form a multivariate normal distribution, you have a GMRF, or Gaussian Markov random field. I can see how a concept like this would be useful in identifying which variables could be safely integrated out. The ‘Markov’ part specifies that only the immediate connections imply dependence.
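A tiny sketch of the GMRF idea (my own example, not the book’s): zeros in the precision matrix encode conditional independence, even though the covariance matrix is generally dense.

# a 3-node chain: 1 -- 2 -- 3, so nodes 1 and 3 are conditionally
# independent given node 2, which shows up as Q[1,3] = 0
Q <- matrix(c( 2, -1,  0,
              -1,  2, -1,
               0, -1,  2), nrow = 3, byrow = TRUE)

Sigma <- solve(Q)   # the implied covariance matrix; note it has no zeros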

The approxfun() function. It’s in base R, and it takes two numeric vectors of equal length, x and y, and makes a function that you can use to put any new x in to get an interpolated y value. By default, it does nothing more sophisticated than linear interpolation, but base R’s splinefun() covers the spline-based case. Functions output by approxfun() can be used just like other continuous functions of x, like sin() and cos(), which is very handy when integrating with integrate().
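A quick base-R sketch of that pattern:

# build a linear interpolator from (x, y) pairs sampled from sin()
x <- seq(0, pi, length.out = 25)
y <- sin(x)

f <- approxfun(x, y)

f(1.234)                             # interpolated value at a new x
integrate(f, lower = 0, upper = pi)  # close to the true integral, 2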



Original paper: 

Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319-392.
