# Density surface models

Duke University, 13 February 2014

# Who is this guy?

• Statistician by training (St Andrews)
• PhD University of Bath, w. Simon Wood
• Postdoc, University of Rhode Island
• Research fellow at CREEM
• Developer of distance sampling software
Spatial modelling

# What do we want to do?

• Relate covariates to animal abundance
• Estimate abundance in a spatially explicit way
• Calculate uncertainty
• Results interpretable by biologists/ecologists
• Often using mixed/historical data

# Line transects

# Data setup

# Some “problems”

• How to model covariate effects?
• Model (term) selection
• Response distribution
• Uncertain detection
• Availability
• Autocorrelation

# How do we deal with detectability?

• Distance sampling! – Fit detection functions
• Estimate $$\mathbb{P}(\text{ detection } | \text{ object at distance } x) = g(x)$$
• Calculate average detection probability = $$\frac{1}{w}\int_0^w g(x) \text{ d}x$$ (where $$w$$ is truncation distance)

# Distance sampling
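The average detection probability $$\frac{1}{w}\int_0^w g(x) \text{ d}x$$ can be approximated numerically once a form for $$g$$ is chosen. The sketch below is in plain Python, not the Distance or mrds API, and assumes a half-normal detection function with hypothetical values for the scale `sigma` and truncation distance `w`; none of these choices come from the slides.

```python
import math

def half_normal(x, sigma):
    """Assumed detection function g(x) = exp(-x^2 / (2 sigma^2)), so g(0) = 1."""
    return math.exp(-x * x / (2.0 * sigma * sigma))

def average_detection_probability(g, w, n_steps=10_000):
    """Approximate (1/w) * integral_0^w g(x) dx with the midpoint rule."""
    h = w / n_steps
    total = sum(g((i + 0.5) * h) for i in range(n_steps))
    return total * h / w

# Hypothetical values: scale sigma = 100 m, truncation distance w = 250 m.
sigma, w = 100.0, 250.0
p_numeric = average_detection_probability(lambda x: half_normal(x, sigma), w)

# Closed form for the half-normal, used only to check the numerical integral.
p_exact = sigma * math.sqrt(math.pi / 2.0) * math.erf(w / (sigma * math.sqrt(2.0))) / w

print(round(p_numeric, 4), round(p_exact, 4))
```

Any other detection function can be passed in as `g`; the closed-form check only applies to the half-normal case.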

• Lots of other stuff going on here!
• Covariates that affect detectability
• Double observer ($$g(0)<1$$)
• Detection function formulations

# Distance sampling software

• Distance for Windows (6.2 out soon!)
  • Easy to use Windows software
  • Len Thomas, Eric Rexstad, Laura Marshall
• Distance R package
  • Simple way to fit detection functions
  • Me!
• mrds R package
  • More complex analyses - double observer surveys
  • Jeff Laake, me

# Two pages generalized additive models (I)

If we are modelling counts:

$\mathbb{E}(n_j) = \exp \left\{ \beta_0 + \sum_k f_k(z_{jk}) \right\}$

• $$n_j$$ has some count distribution (quasi-Poisson, Tweedie, negative binomial)
• $$f_k$$ are smooth functions (splines $$\Rightarrow f_k(x)=\sum_l \beta_l b_l(x)$$)
• $$f_k$$ can just be fixed effects $$\Rightarrow$$ GLM
• Add in random effects, correlation structures $$\Rightarrow$$ GAMM
• Wood (2006) is a good intro book
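To make the basis expansion $$f_k(x)=\sum_l \beta_l b_l(x)$$ concrete, here is a minimal Python sketch of the mean function $$\exp\{\beta_0 + \sum_k f_k(z_{jk})\}$$. It is not how mgcv builds its splines: the piecewise-linear "tent" basis, the knots and the coefficients are all hypothetical, chosen only because they are easy to check by hand.

```python
import math

def tent_basis(x, knots):
    """Evaluate piecewise-linear 'tent' functions b_l(x), one per knot.

    A simple stand-in for a spline basis (an assumption for illustration).
    """
    values = []
    for l, knot in enumerate(knots):
        b = 0.0
        if x == knot:
            b = 1.0
        elif l > 0 and knots[l - 1] < x < knot:
            b = (x - knots[l - 1]) / (knot - knots[l - 1])
        elif l < len(knots) - 1 and knot < x < knots[l + 1]:
            b = (knots[l + 1] - x) / (knots[l + 1] - knot)
        values.append(b)
    return values

def smooth(x, coefs, knots):
    """f(x) = sum_l beta_l * b_l(x)."""
    return sum(beta * b for beta, b in zip(coefs, tent_basis(x, knots)))

def expected_count(beta0, covariates, smooths):
    """E(n_j) = exp(beta0 + sum_k f_k(z_jk)) for one segment j."""
    eta = beta0 + sum(smooth(z, coefs, knots)
                      for z, (coefs, knots) in zip(covariates, smooths))
    return math.exp(eta)

# Hypothetical smooth of depth with three knots and three coefficients.
depth_knots = [0.0, 50.0, 100.0]
depth_coefs = [0.0, 0.8, -0.4]
print(expected_count(-1.0, [25.0], [(depth_coefs, depth_knots)]))
```

Halfway between the first two knots the smooth takes the value 0.4, so the expected count is `exp(-1.0 + 0.4)`. Fitting the model means estimating the `beta` values; the functional form stays the same.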

# Two pages generalized additive models (II)

Minimise the distance between data and model while also minimising the wiggliness penalty for each smooth:

$\lambda_k \int_\Omega \left( \frac{\partial^2 f_k(z_k)}{\partial z_k^2} \right)^2 \text{ d}z_k$
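The penalty is usually taken to be the integrated squared second derivative of the smooth, which is zero for a straight line and grows as the function gets wigglier. The Python sketch below approximates that integral with finite differences for two hypothetical functions; it illustrates the idea only and is not how mgcv evaluates its penalties.

```python
import math

def wiggliness(f, lower, upper, n_steps=2000):
    """Approximate the integral of f''(z)^2 over [lower, upper].

    Uses a central second difference at each midpoint, so f is evaluated
    up to half a step outside the interval.
    """
    h = (upper - lower) / n_steps
    total = 0.0
    for i in range(n_steps):
        z = lower + (i + 0.5) * h
        second = (f(z - h) - 2.0 * f(z) + f(z + h)) / (h * h)
        total += second * second * h
    return total

def straight_line(z):
    return 2.0 + 0.5 * z

def wiggly(z):
    return math.sin(4.0 * z)

penalty_line = wiggliness(straight_line, 0.0, math.pi)
penalty_wiggly = wiggliness(wiggly, 0.0, math.pi)

# The smoothing parameter lambda scales how much the wiggliness costs.
lam = 0.1  # hypothetical value
print(lam * penalty_line, lam * penalty_wiggly)
```

A large smoothing parameter makes the wiggly function expensive and pushes the fit towards the straight line; a value near zero leaves the smooth almost unpenalised.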

# Two options for response

## $$n_j$$ - raw counts per segment

$\mathbb{E}(n_j) = A_j \hat{p}_j \exp \left\{ \beta_0 + \sum_k f_k(z_{jk}) \right\}$

## $$\hat{n}_j$$ - H-T estimate per segment

$\mathbb{E}(\hat{n}_j) = A_j \exp \left\{ \beta_0 + \sum_k f_k(z_{jk}) \right\}$

$\hat{n}_j = \sum_{i \text{ in segment } j} \frac{s_i}{\hat{p}_i}$
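The Horvitz-Thompson response can be computed directly from the observation table. The Python sketch below takes $$s_i$$ to be the size of observed group $$i$$ and $$\hat{p}_i$$ its estimated detection probability; the segment labels and numbers are made up for illustration, and this is not the dsm package's own code.

```python
from collections import defaultdict

def horvitz_thompson_per_segment(observations):
    """Sum s_i / p_i over the observations that fall in each segment.

    observations: iterable of (segment_id, group_size, detection_probability).
    Segments with no observations do not appear here; their estimate is 0.
    """
    n_hat = defaultdict(float)
    for segment, size, p_hat in observations:
        if not 0.0 < p_hat <= 1.0:
            raise ValueError("detection probability must be in (0, 1]")
        n_hat[segment] += size / p_hat
    return dict(n_hat)

# Hypothetical observations: (segment, group size s_i, estimated p_i).
obs = [("seg1", 2, 0.5), ("seg1", 1, 0.25), ("seg3", 3, 0.6)]
n_hat = horvitz_thompson_per_segment(obs)
print(n_hat)
```

With the raw-count response the detection probability instead enters the offset $$A_j \hat{p}_j$$, so the two options differ only in where detectability is accounted for.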

# The dsm package

• Design “inspired by” (“stolen from”) mgcv
• Easy to build simple models, possible to build complex ones
• Syntax example:

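    library(dsm)

    # detection.function: fitted detection function (e.g. from Distance or mrds)
    # segment.data: one row per segment; observation.data: one row per detection
    model <- dsm(count ~ s(x, k=10) + s(depth, k=6),
                 detection.function,
                 segment.data,
                 observation.data,
                 family=negbin(theta=0.1))
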
• Utility functions: variance estimation, plotting, prediction etc

# Case study I - Seabirds in RI waters

# RI seabirds - Aims

• Wind development in RI/MA waters
• Map of usage
• Estimate uncertainty
• Combine maps (Zonation)

Photo by jackanapes on flickr (CC BY-NC-ND)

# RI seabirds - Detection function modelling

# RI seabirds - Spatial covariates

# RI seabirds - The model

• Availability
  • correction factor from previous experimental work
  • $$p_j \times \mathbb{P}(\text{available for detection})$$
• Term selection by approximate $$p$$-values
• Covariates are collinear (curvilinear)
• select - extra penalty so that terms can be shrunk out of the model
• REML - better optimisation objective

From Fig. 1 of Wood (2011)

# RI seabirds - Covariate effects

# RI seabirds - Results

# RI seabirds - Uncertainty

# Case study II - black bears in AK

• Area of 26,482 km² (~ size of VT/MA)
• Double observer surveys using Piper Super Cubs
• 1238 transects of 35 km each, 2001-2003

# 1238 transects

# Survey protocol

• Surveys in spring: bears are present, but there is not too much foliage
• Generally search uphill
• Double observer (Borchers et al., 2006)
• Curtain between pilot and observer; light system
• Go off transect and circle to ID

# Black bears

• Truncate at 22 m and 450 m, leaving 351 groups (out of ~44,000 segments)
• Group size 1-3 (lone bears, sow with cubs)
• 1402 m elevational cutoff

# Final model

• bivariate smooth of location
• smooth of elevation
• bivariate smooth of slope and aspect

# Abundance estimate for GMU13E

• MRDS estimate: ~1500 black bears
• DSM estimate: ~1200 black bears (968 - 1635, CV ~13%)
• Not a huge difference, so why bother?
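As a rough consistency check on the DSM interval, the Python sketch below back-calculates the point estimate and CV implied by the limits 968 and 1635. It assumes the interval is a 95% log-normal interval of the form $$(\hat{N}/C, \hat{N} C)$$ with $$C = \exp\{z \sqrt{\log(1 + \text{CV}^2)}\}$$; the slides do not say how the interval was computed, so this is an assumption.

```python
import math
from statistics import NormalDist

def lognormal_summary(lower, upper, level=0.95):
    """Recover the estimate and CV implied by a log-normal interval.

    Assumes lower = N / C and upper = N * C, C = exp(z * sqrt(log(1 + cv^2))).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    estimate = math.sqrt(lower * upper)            # geometric mean of the limits
    sd_log = math.log(upper / lower) / (2.0 * z)   # sd on the log scale
    cv = math.sqrt(math.exp(sd_log ** 2) - 1.0)
    return estimate, cv

estimate, cv = lognormal_summary(968.0, 1635.0)
print(round(estimate), round(cv, 3))
```

Under that assumption the limits imply an estimate of about 1258 and a CV of about 13%, in line with the rounded figures on the slide.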

# Abundance map

# CV map

# Conclusions

• Flexible spatial models
• GLMs + random effects + smooths + other extras
• autocorrelation can be modelled
• Large areas, makes sense
• Spatial component is very helpful for managers
• Two-stage models can be useful!
• Estimating temporal trends

# Thanks

• Rhode Island: Kris Winiarski, Peter Paton, Scott McWilliams
• Alaska: Earl Becker, Becky Strauch, Mike Litzen, Dave Filkill
• Elsewhere: Mark Bravington, Natalie Kelly, Eric Rexstad, Louise Burt, Len Thomas, Steve Buckland

# Randomised quantile residuals

• Goodness of fit testing
• Dunn, PK, and GK Smyth. Randomized Quantile Residuals. Journal of Computational and Graphical Statistics 5, no. 3 (1996): 236–244.
• Back transform for exactly Normal residuals
• Fewer problems with artefacts
• (Thanks to Natalie Kelly at CSIRO for the tip)
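The construction in Dunn and Smyth (1996) for a discrete response is: draw $$u$$ uniformly between $$F(y-1)$$ and $$F(y)$$, where $$F$$ is the fitted CDF, then back-transform with the standard Normal quantile function. The Python sketch below does this for a Poisson response with a known mean. Poisson is an assumption made only because its CDF is easy to write down; the models in this talk use other count distributions, and this is not the code behind `rqgam.check`.

```python
import math
import random
from statistics import NormalDist

def poisson_cdf(y, mu):
    """P(Y <= y) for Y ~ Poisson(mu), by direct summation."""
    if y < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for k in range(1, y + 1):
        term *= mu / k
        total += term
    return min(total, 1.0)

def randomised_quantile_residual(y, mu, rng):
    """Draw u ~ Uniform(F(y-1), F(y)) and return the Normal quantile of u."""
    lower = poisson_cdf(y - 1, mu)
    upper = poisson_cdf(y, mu)
    u = lower + rng.random() * (upper - lower)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)

def simulate_poisson(mu, rng):
    """Simulate one Poisson(mu) count (Knuth's multiplication method)."""
    limit = math.exp(-mu)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

rng = random.Random(1)
mu = 1.5  # hypothetical fitted mean, the same for every observation
counts = [simulate_poisson(mu, rng) for _ in range(2000)]
residuals = [randomised_quantile_residual(y, mu, rng) for y in counts]
```

When the model is correct the residuals behave like a standard Normal sample, so the usual Q-Q and residual plots can be read without the banding that raw count residuals show.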

# gam.check

# rqgam.check