Building ecological models bit-by-bit

David L Miller
CREEM, University of St Andrews

useR! 2015
Ålborg, Denmark
1 July 2015

# Motivation

• Ecological questions: how many animals are there?
• How?
• Statistical methods (usable by ecologists)
• Software (usable by ecologists)

# Proposal

• Easier to understand (process-based)
• Simplification of workflow
• Less time waiting (and shorter waits)
• Diagnostics as we go
• Let’s do more of this!

# Case study: distance sampling

# Distance sampling (in 1 slide)

Code for animation at https://gist.github.com/dill/2b0c120d5484d338d8ef

# Modelling detection

• Model $$\mathbb{P} \left[ \text{animal detected } \vert \text{ animal at distance } y\right] = g(y;\boldsymbol{\theta})$$
• Calculate the average probability of detection:

$\hat{p}_i = \frac{1}{w} \int_0^w g(y; \mathbf{z}_i, \boldsymbol{\hat{\theta}}) \text{d}y$

• Horvitz-Thompson-type estimators ($$s_i$$ is the size of the $$i$$th of $$n$$ detected groups):

$\hat{N} = \sum_{i=1}^n \frac{s_i}{\hat{p}_i}$

• (or model-based estimators; see Miller et al. (2013))
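
As a rough illustration of the two formulas above, here is a minimal base-R sketch. It assumes a half-normal detection function with no covariates; the value of `sigma_hat`, the truncation distance and the group sizes are made up for the example and do not come from any real survey.

```r
# Half-normal detection function: g(0) = 1, decreasing with distance y.
# (The half-normal form is an assumption made for this sketch.)
g <- function(y, sigma) exp(-y^2 / (2 * sigma^2))

# Average detection probability within truncation distance w:
# p = (1/w) * integral from 0 to w of g(y) dy
p_hat <- function(sigma, w) {
  integrate(g, lower = 0, upper = w, sigma = sigma)$value / w
}

w <- 4            # truncation distance (illustrative)
sigma_hat <- 1.5  # pretend this is the fitted scale parameter

p <- p_hat(sigma_hat, w)

# Closed form of the same integral for the half-normal, as a sanity check
p_closed <- sigma_hat * sqrt(2 * pi) * (pnorm(w / sigma_hat) - 0.5) / w

# Horvitz-Thompson-type estimate: each detected group of size s_i
# is scaled up by 1 / p_i (here p_i is the same for every group)
s <- c(1, 3, 2, 1, 5)  # made-up group sizes
N_hat <- sum(s / p)
```

With covariates $$\mathbf{z}_i$$, `p` would become a vector with one entry per detection, and the final sum would stay the same.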

# Mark-recapture distance sampling

• Buckland et al (2004), Borchers et al (1998)
• Distance sampling (DS) assumes $$g(0)=1$$ (i.e. you see everything right in front of you)
• Use 2 observers, set up trials
• Add an extra likelihood component (binomial, mark-recapture) to account for $$g(0)<1$$

$\mathcal{L} = \mathcal{L}_g \mathcal{L}_\Omega$

# Mark-recapture distance sampling animation

Code for animation at https://gist.github.com/dill/2b0c120d5484d338d8ef

Partition in likelihood == partition in software

# Syntax example

library(Distance2)
# MR model
mr.io <- mrds(data, truncation=4,
              model=mr(mode="io", formula=~distance))
## do checking of mr part

# DS model
ds1 <- ds(data, truncation=4)
## do checking of ds part

mrds.io <- mr.io + ds1

+ is a really useful operator

# Let + do the work

• define +.class
• let + compute the resulting components (likelihood, AIC)
• update classes/functions (summary, predict etc)
• clearer interface for users
• (likelihood components add on log scale)
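
The list above could be sketched with an S3 method along the following lines. This is a toy, not the Distance2 implementation: the class name `lik_part`, its fields and the log-likelihood values are all hypothetical, chosen only to show how `+` can combine components.

```r
# Toy constructor for a fitted likelihood component
# (hypothetical class "lik_part", invented for this sketch)
lik_part <- function(name, loglik, n_par) {
  structure(list(parts = name, loglik = loglik, n_par = n_par),
            class = "lik_part")
}

# S3 method for "+": independent likelihood components multiply,
# so their log-likelihoods (and parameter counts) add
"+.lik_part" <- function(e1, e2) {
  structure(list(parts  = c(e1$parts, e2$parts),
                 loglik = e1$loglik + e2$loglik,
                 n_par  = e1$n_par + e2$n_par),
            class = "lik_part")
}

# Made-up fitted components standing in for the mr and ds parts
mr_part <- lik_part("mr", loglik = -40.2,  n_par = 2)
ds_part <- lik_part("ds", loglik = -110.5, n_par = 1)

full <- mr_part + ds_part

# AIC of the combined model from the summed components
aic <- -2 * full$loglik + 2 * full$n_par
```

Because `+` returns an object of the same class, further components can be chained (`a + b + c`) and methods such as `summary` or `predict` only need to be written once for that class.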

“Inspired by” (“stolen from”) ggplot2

# Can we do this for other model classes?

• Distance sampling is well suited to this
• Componentised likelihoods (thanks to David Borchers)
• Where (conditional) independence is not required
• Just using + to “add” components

# Something like…?

# linear model
mod <- lm(response ~ x1, data=data)
mod <- mod + lm_var(x2)
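
For comparison, `lm_var()` and `+` on `lm` objects are hypothetical, but base R can already do something close with `update()`. A small sketch on simulated data (the data frame below is made up for the example):

```r
# Simulated data, purely for illustration
set.seed(1)
data <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
data$response <- 1 + 2 * data$x1 - data$x2 + rnorm(50)

# linear model
mod <- lm(response ~ x1, data = data)

# "add" x2 to the existing model: update() refits with the new formula
mod2 <- update(mod, . ~ . + x2)
```

The `+` proposal would wrap this kind of refit behind an operator, so that each intermediate model can be checked before the next term is added.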

Less trivially:

# include correlation structure in an nlme model
library(nlme)
mod <- gls(response ~ x1, data=data)
mod_AR1 <- mod + corAR1(form=~sample|group)

Refit using the previous fit's estimates as starting parameters? See also update().
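
What exists today, assuming the nlme package (where `corAR1` lives) and a `gls()` fit, looks roughly like this. The data are simulated for the example and carry no real autocorrelation; the point is only the workflow.

```r
library(nlme)

# Simulated grouped data: 5 groups, 10 ordered samples each (made up)
set.seed(1)
d <- data.frame(group  = factor(rep(1:5, each = 10)),
                sample = rep(1:10, times = 5),
                x1     = rnorm(50))
d$response <- 1 + 2 * d$x1 + rnorm(50)

# Fit without correlation first, so this part can be checked on its own
mod <- gls(response ~ x1, data = d)

# Then refit with an AR(1) structure within each group
mod_AR1 <- update(mod, correlation = corAR1(form = ~ sample | group))

# Compare the two fits
aic_tab <- AIC(mod, mod_AR1)
```

A `+` method for such objects could simply dispatch to `update()` internally.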

# Conclusion

• Do folks think this kind of thing is useful?
• In which areas is it useful?
• Avoid optimality issues by refitting “full” model at end?
• Encourage users to perform model checking?
• Don’t just fit the most complicated model?

# Thanks!

Talk available:
http://converged.yt/talks/useR2015/talk.html

# References

• Buckland, S. T., Anderson, D. R., Burnham, K. P., Laake, J. L., Borchers, D. L., & Thomas, L. (2001). Introduction to Distance Sampling. OUP.
• Buckland, S. T., Anderson, D. R., Burnham, K. P., Laake, J. L., Borchers, D. L., & Thomas, L. (2004). Advanced Distance Sampling. OUP.
• Borchers, D. L., Buckland, S. T., Goedhart, P. W., Clarke, E. D., & Hedley, S. L. (1998). Horvitz-Thompson Estimators for Double-Platform Line Transect Surveys. Biometrics, 54(4), 1221. http://doi.org/10.2307/2533652
• Borchers, D. L., Buckland, S. T., & Zucchini, W. (2002). Estimating Animal Abundance: Closed populations. Springer.
• Miller, D. L., Burt, M. L., Rexstad, E. A., & Thomas, L. (2013). Spatial models for distance sampling data: recent developments and future directions. Methods in Ecology and Evolution, 4(11), 1001–1010. http://doi.org/10.1111/2041-210X.12105