Building ecological models bit-by-bit

David L Miller

CREEM, University of St Andrews

useR! 2015

Ålborg, Denmark

1 July 2015

Motivation

Ecological questions

How many animals are there?

How?

Statistical methods

(Usable by ecologists)

Software

(Usable by ecologists)

- Big (BIG) (Bayesian?) models
- Encapsulate many processes
- “Throw everything in and see what happens”
- (Perceived) Workflow:
  - Formulate a giant likelihood
  - Wait for a long time
  - …
  - Get results/diagnostics
  - Optional: repeat (convergence/model checking fails)

- \(\geq 2\) stage modelling
- Optimality issues either way?
- Propagation of uncertainty “hard”?

- Thorough model checking
- Fitting non-complicated models
- (Many simple vs. one giant)

- Easier to understand (process-based)
- Simplification of workflow
- Less time waiting (and shorter waits)
- Diagnostics as we go
- Let’s do more of this!

Case study: distance sampling

distance sampling

Code for animation at https://gist.github.com/dill/2b0c120d5484d338d8ef

- Model \(\mathbb{P} \left[ \text{animal detected } \vert \text{ animal at distance } y\right] = g(y;\boldsymbol{\theta})\)
- Calculate the average probability of detection:

\[ \hat{p}_i = \frac{1}{w} \int_0^w g(y; \mathbf{z}_i, \boldsymbol{\hat{\theta}}) \text{d}y \]

- Horvitz-Thompson-type estimators:

\[ \hat{N} = \sum_{i=1}^n \frac{s_i}{\hat{p}_i} \]

- (or model-based estimators; see Miller et al. (2013))
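
As a rough sketch of the two steps above in R: the half-normal form \(g(y) = \exp(-y^2 / 2\sigma^2)\) is an assumed choice (the slides do not fix a detection function), \(s_i\) is read here as the size of the \(i\)th observed group, and the truncation distance, scale parameters and group sizes are made-up values. The function names `g`, `p_hat` and `ht_abundance` are hypothetical.

```r
# Half-normal detection function (an assumed form for g)
g <- function(y, sigma) exp(-y^2 / (2 * sigma^2))

# Average detection probability: integrate g over (0, w), divide by w
p_hat <- function(sigma, w) {
  integrate(g, lower = 0, upper = w, sigma = sigma)$value / w
}

# Horvitz-Thompson-type estimate: sum of s_i / p_i over observations
ht_abundance <- function(s, p) sum(s / p)

w <- 100                      # truncation distance (made up)
sigma_hat <- c(30, 30, 45)    # per-observation scale estimates (made up)
s <- c(1, 3, 2)               # observed group sizes (made up)

p <- vapply(sigma_hat, p_hat, numeric(1), w = w)
N_hat <- ht_abundance(s, p)
N_hat
```

Each observation is weighted by the inverse of its estimated detection probability, so `N_hat` is always at least `sum(s)`.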

- Buckland et al (2004), Borchers et al (1998)
- Distance sampling (DS) assumes \(g(0)=1\) (i.e. you see everything right in front of you)
- Use 2 observers, set up trials
- Add an extra likelihood *component* to account for this
  - binomial, mark-recapture

\[ \mathcal{L} = \mathcal{L}_g \mathcal{L}_\Omega \]

Code for animation at https://gist.github.com/dill/2b0c120d5484d338d8ef

Partition in likelihood == partition in software

`+` is a really useful operator

- Let `+` do the work
  - define `+.class`
- Let `+` compute the resulting components
  - likelihood
  - AIC
  - update classes/functions (`summary`, `predict` etc.)
- clearer interface for users
- (likelihood components add on log scale)
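
A minimal sketch of how this could look with S3 classes, assuming the components are independent so that their log-likelihoods add. The class name `lik_component`, its constructor and the numeric values are all hypothetical and only illustrate the mechanism; they are not taken from any existing package.

```r
# Hypothetical constructor for one likelihood component
lik_component <- function(label, loglik, npar) {
  structure(list(parts = label, loglik = loglik, npar = npar),
            class = "lik_component")
}

# S3 method for `+`: log-likelihoods and parameter counts add
"+.lik_component" <- function(e1, e2) {
  lik_component(c(e1$parts, e2$parts),
                e1$loglik + e2$loglik,
                e1$npar + e2$npar)
}

# Update a standard generic so it works on the combined object
AIC.lik_component <- function(object, ..., k = 2) {
  -2 * object$loglik + k * object$npar
}

# Made-up values, for illustration only
detection <- lik_component("detection function", loglik = -250.3, npar = 2)
mark_recap <- lik_component("mark-recapture", loglik = -80.1, npar = 3)

full <- detection + mark_recap
full$parts
AIC(full)
```

Because `+` returns an object of the same class, further components can be chained on, and methods such as `summary` or `predict` could be extended in the same way.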

“Inspired” (“stolen”) from `ggplot2`

- Distance sampling good for this
- Componentised likelihoods (thanks to David Borchers)

- What about other models?
- Where (conditional) independence is not required
- Just using `+` to “add” components

```
# linear model
mod <- lm(response ~ x1, data=data)
# proposed syntax: lm_var() is a hypothetical helper that adds x2 to the model
mod <- mod + lm_var(x2)
```
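
One way the proposed `lm` syntax could be made to work is an S3 `+` method that refits via `update`. This is only a sketch: `lm_var` and `+.lm` do not exist in base R and are defined here for illustration, and the built-in `mtcars` data stands in for `data`.

```r
# Hypothetical helper: records the name of the variable to add
lm_var <- function(x) {
  structure(list(term = deparse(substitute(x))), class = "lm_var")
}

# S3 method for `+` on "lm" objects: refit with the extra term
"+.lm" <- function(e1, e2) {
  stopifnot(inherits(e2, "lm_var"))
  new_formula <- as.formula(paste(". ~ . +", e2$term))
  update(e1, new_formula)
}

mod <- lm(mpg ~ wt, data = mtcars)
mod2 <- mod + lm_var(hp)
formula(mod2)
```

The refit goes through `update`, so the original call (including its `data` argument) is re-evaluated with the extended formula.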

less trivially

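```
# include correlation structure in an nlme model
library(nlme)  # provides corAR1()
# schematic call: a real nlme() fit also needs fixed effects and starting values
mod <- nlme(response ~ x1, data=data)
# proposed syntax: nlme does not define `+` for fitted models
mod_AR1 <- mod + corAR1(form=~sample|group)
```

Refit using `nlme`: starting parameters? See also `update`.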
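
For comparison, a sketch of how a `+` method could wrap the existing `update` route. It assumes the `nlme` package and uses `gls()` on the `Ovary` data shipped with that package (rather than a nonlinear fit) so that it runs without starting values; the `+.gls` method is hypothetical and is not part of `nlme`.

```r
library(nlme)

# Base model without a correlation structure
mod <- gls(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time), data = Ovary)

# Hypothetical `+` method: refit with the supplied correlation structure
"+.gls" <- function(e1, e2) {
  stopifnot(inherits(e2, "corStruct"))
  update(e1, correlation = e2)
}

# AR(1) errors within each mare
mod_AR1 <- mod + corAR1(form = ~ 1 | Mare)
mod_AR1$modelStruct$corStruct
```

This refits from scratch; reusing the first fit's estimates as starting values would need extra work in the method.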

- Do folks think this kind of thing is useful?
- In which areas is it useful?

- Avoid optimality issues by refitting “full” model at end?
- Encourage users to perform model checking?
- Don’t just fit the most complicated model?

Talk available:

http://converged.yt/talks/useR2015/talk.html

- Buckland, S. T., Anderson, D. R., Burnham, K. P., Laake, J. L., Borchers, D. L., & Thomas, L. (2001). Introduction to Distance Sampling. OUP.
- Buckland, S. T., Anderson, D. R., Burnham, K. P., Laake, J. L., Borchers, D. L., & Thomas, L. (2004). Advanced Distance Sampling. OUP.
- Borchers, D. L., Buckland, S. T., Goedhart, P. W., Clarke, E. D., & Hedley, S. L. (1998). Horvitz-Thompson Estimators for Double-Platform Line Transect Surveys. Biometrics, 54(4), 1221. http://doi.org/10.2307/2533652
- Borchers, D. L., Buckland, S. T., & Zucchini, W. (2002). Estimating Animal Abundance: Closed populations. Springer.
- Miller, D. L., Burt, M. L., Rexstad, E. A., & Thomas, L. (2013). Spatial models for distance sampling data: recent developments and future directions. Methods in Ecology and Evolution, 4(11), 1001–1010. http://doi.org/10.1111/2041-210X.12105