Recent advances in spatial modelling of distance sampling surveys

David L Miller (@millerdl)
CREEM, University of St Andrews
converged.yt

Ecological Society of America Annual Conference
Baltimore, Maryland
10 August 2015

Density surface models

(Spatial models that account for detectability)

(…and more)
(This talk is a rough guide)

(Go to converged.yt, “Talks”)

# $$\geq 2$$-stage models

Hedley and Buckland (2004). Miller et al (2013).

Detectability

# Distance sampling - line transects

Code for animation at https://gist.github.com/dill/2b0c120d5484d338d8ef

# Detection functions

• “Fit to the histogram”
• Model $$\mathbb{P} \left[ \text{animal detected } \vert \text{ animal at distance } y\right] = g(y;\boldsymbol{\theta})$$
• Calculate the average probability of detection:

$\hat{p}_i = \frac{1}{w} \int_0^w g(y; \boldsymbol{\hat{\theta}}) \text{d}y$

• Horvitz-Thompson-type estimators:

$\hat{N} = \sum_{i=1}^n \frac{s_i}{\hat{p}_i}$

(where $$s_i$$ are group/cluster sizes)
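
As a rough numerical illustration of the two formulas above, the sketch below assumes a half-normal detection function, $$g(y;\sigma)=\exp(-y^2/2\sigma^2)$$. That is one common choice, not something these slides fix, and the scale, truncation distance and group sizes are made up. All object names are hypothetical.

```r
# Half-normal detection function (an assumed form, for illustration only)
g <- function(y, sigma) exp(-y^2 / (2 * sigma^2))

sigma_hat <- 50   # hypothetical fitted scale parameter (metres)
w <- 150          # hypothetical truncation distance (metres)

# Average detection probability: (1/w) * integral of g from 0 to w
p_hat <- integrate(g, lower = 0, upper = w, sigma = sigma_hat)$value / w

# Horvitz-Thompson-type estimate from made-up group sizes s_i;
# with no covariates, every group shares the same p_hat
s <- c(1, 2, 1, 4, 3)
N_hat <- sum(s / p_hat)

p_hat
N_hat
```

Here `N_hat` refers only to the area actually covered by the transects. In practice a package such as Distance would estimate the detection function parameters from the observed distances.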

# Distance sampling (extensions)

• Covariates that affect detectability (Marques et al, 2007)
• Perception bias ($$g(0)<1$$) (Burt et al, 2014)
• Availability bias (Winiarski et al, 2013; Borchers et al, 2013)
• Detection function formulations (Miller and Thomas, 2015)
• Measurement error (Marques, 2004)

Figure from Marques et al (2007)

Spatially explicit data

# Data setup

Ursus from PhyloPic.

# Case study - black bears in AK

• Area of 26,482 km² (~area of VT or MA)
• Double observer surveys using Piper Super Cubs
• 1238 transects, each 35 km, 2001-2003

# 1238 transects

Spatially explicit models

# Spatial model

$\mathbb{E}(\hat{n}_j) = A_j\exp \left\{ \beta_0 + \sum_k f_k(z_{jk}) \right\}$

• $$\hat{n}_j$$ has some count distribution (Horvitz-Thompson estimate)
• $$f_k$$ are smooth functions (splines $$\Rightarrow f_k(x)=\sum_l \beta_l b_l(x)$$)
• $$f_k$$ can just be fixed effects $$\Rightarrow$$ GLM
• Add in random effects, correlation structures $$\Rightarrow$$ GAMM
• $$A_j$$ is area of segment
• R package dsm
• Wood (2006) is a good intro to GAMs
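
The model above can be sketched directly in mgcv, with $$\log A_j$$ entering as an offset. This is a minimal illustration on simulated data, not the dsm workflow itself: the covariates, the "true" surface and the Tweedie parameters are all invented for the example.

```r
library(mgcv)
set.seed(1)

# Simulated segments: a location covariate, depth, and segment area A_j
n_seg <- 300
seg <- data.frame(x = runif(n_seg),
                  depth = runif(n_seg, 10, 100),
                  area = runif(n_seg, 0.5, 1.5))

# Invented expected abundance per segment: A_j * exp(linear predictor)
mu <- seg$area * exp(0.5 + sin(2 * pi * seg$x) - 0.01 * seg$depth)

# Tweedie-distributed stand-in for the estimated abundances n_j
seg$n_hat <- rTweedie(mu, p = 1.3, phi = 1)

# log(A_j) is an offset, so the smooths describe density rather than count
fit <- gam(n_hat ~ s(x, k = 10) + s(depth, k = 6) + offset(log(area)),
           data = seg, family = tw(), method = "REML")
summary(fit)
```

Replacing `s(depth, k = 6)` with a plain `depth` term gives the GLM-style fixed effect mentioned in the list above.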

Back to those bears…

# “Bears don’t like to go too high”

# “Bears like to sunbathe”

# Abundance map

What could go wrong?
“Of course our response distribution is correct…”

# Response distributions

• “Classically”: quasi-Poisson (I’ve not seen data like this)
• Lately: Tweedie, negative binomial
• Exponential family given power parameter
• (mgcv can now estimate power parameters via tw() and nb())

“We selected the right covariates!”

# Model selection

• All possible subsets - expensive; stepwise - path dependence
• Approximate $$p$$-values (Marra & Wood, 2012)
• Term selection by shrinkage to zero effect (Marra & Wood, 2011)

“We removed correlated covariates!”

# Concurvity

$\text{Altitude} = f(x,y) + \epsilon \quad \text{or} \quad \text{Chlorophyll A} = f(\text{SST}) + \epsilon$
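
A one-dimensional version of the first relationship can be simulated to show why checking linear correlation is not enough. In this sketch (simulated data, hypothetical variable names) altitude is a smooth but non-linear function of a single coordinate `x`, so the two are nearly uncorrelated, yet the concurvity measures from `mgcv::concurvity()` are high.

```r
library(mgcv)
set.seed(2)

n <- 1000
x <- runif(n, -1, 1)
alt <- x^2 + rnorm(n, sd = 0.05)            # altitude = f(x) + noise
y <- sin(3 * x) + alt + rnorm(n, sd = 0.3)  # arbitrary response

# Linear correlation misses the relationship: x^2 is symmetric about 0
cor(x, alt)

fit <- gam(y ~ s(x) + s(alt), method = "REML")

# Rows: worst / observed / estimate; values near 1 mean a term can be
# well approximated by the other terms in the model
conc <- concurvity(fit, full = TRUE)
conc
```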

• Not just correlation!
• mgcv::concurvity() computes measures for fitted models

“Variance was estimated correctly”

# Uncertainty propagation

• Major criticism of $$\geq2$$-stage models
• Uncertainty from detection function AND spatial model (and…)
• Refit model with “extra” term – zero mean effect, variance contribution

Williams et al (2011). Bravington, Hedley and Miller (in prep)

“What spatial autocorrelation?”

# Autocorrelation

• $$\text{AR}(p)$$ process (“obvious” structure)
• Can use GEE/GAMM structure for autocorrelation along transects
• In general this is unstable
• Random effects are sparse
• Splines are “dense”
• $$\Rightarrow$$ bad for optimisation

Software

# The dsm package

• Design “inspired by” (“stolen from”) mgcv
• Easy to build simple models, possible to build complex ones
• Syntax example:

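    # counts per segment modelled as smooths of x and depth
    model <- dsm(count ~ s(x, k = 10) + s(depth, k = 6),
                 detection_function,  # fitted detection function
                 segment_data,        # one row per segment
                 observation_data,    # links each observation to its segment
                 family = tw())       # Tweedie response distribution

• Utility functions: variance estimation, plotting, prediction etc.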

# Distance sampling software

• Distance R package
  • Simple way to fit detection functions
  • Me!
• mrds R package
  • More complex analyses - double observer surveys
  • Jeff Laake, me
• Distance for Windows
  • Easy to use Windows software
  • Len Thomas, Eric Rexstad, Laura Marshall

Conclusions

# Conclusions

• Existing statistical framework (GAM)
• Flexible spatial models
• Detectability
• GLMs + random effects + smooths + other extras
• autocorrelation can be modelled
• accounting for uncertainty
• Large, heterogeneous areas
• Spatial component is v. helpful for managers
• Two-stage models can be useful!
• Modular model checking

# Acknowledgements

• St Andrews: Eric Rexstad, Len Thomas, Laura Marshall
• CSIRO: Mark Bravington, Natalie Kelly
• Alaska: Earl Becker, Becky Strauch, Mike Litzen, Dave Filkill

Funding from Alaska Department of Fish and Game

# Thanks!

Slides (with extra content) available at
converged.yt

Course at Duke in October:
nicholas.duke.edu/del/distance

Appendices

“Our spatial smoother fit well”

# Appendix - Smoothing in awkward regions

Ramsay (2002). Wood, Bravington & Hedley (2008).

# Appendix - Miller and Wood (2014)

• Calculate within-area distances
• Use multidimensional scaling to project (usually into a high-dimensional space)
• Use Duchon splines for smoothing
• Use GCV/REML for dimension selection

# Appendix - Smoothing in less awkward regions

• “Remove” troublesome parts of the thin plate spline
• Do this carefully (Fourier transform)
• Nullspace (plane) terms replaced with low-frequency terms

Miller and Kelly (in prep)

“Our detection functions look great!”

# Mixture model detection functions

Data from Daniel Pike, Bjarni Mikkelsen and Gísli Vikingsson. Marine Research Institute, Iceland.

• Miller and Thomas (2015)

“Our parameter estimates are fine!”

# Smoothing parameter estimation by REML

• GCV tends to undersmooth (Reiss & Ogden, 2009)
• REML much better, esp. with correlated covariates

Taken from Wood (2011).
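
Switching between the two criteria in mgcv is a one-argument change. The sketch below fits the same smooth to simulated data (all values invented) with each method and prints the selected smoothing parameter and effective degrees of freedom. A single simulated dataset will not necessarily reproduce the undersmoothing pattern reported in the literature.

```r
library(mgcv)
set.seed(3)

n <- 200
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.5)

# Same model, two smoothing parameter selection criteria
fit_gcv  <- gam(y ~ s(x, k = 20), method = "GCV.Cp")
fit_reml <- gam(y ~ s(x, k = 20), method = "REML")

# Selected smoothing parameters and effective degrees of freedom
c(GCV = unname(fit_gcv$sp), REML = unname(fit_reml$sp))
c(GCV = sum(fit_gcv$edf), REML = sum(fit_reml$edf))
```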

“Our residuals are fine!”

# Residual checking (gam.check)

# Randomised quantile residuals

• Count data is nasty for goodness of fit
• Dunn & Smyth (1996)
• Back transform for exactly Normal residuals
• Fewer problems with artefacts
• dsm::rqgam.check
• (Thanks to Natalie Kelly at CSIRO for the tip)
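
The idea can be sketched in a few lines of base R for Poisson counts. This is an illustration of the Dunn & Smyth construction under a known mean, not the code behind `dsm::rqgam.check`. The function name and the simulated values are hypothetical.

```r
set.seed(4)

# Randomised quantile residuals for Poisson counts:
# draw u uniformly between F(y - 1) and F(y), then map to the Normal scale
rq_resid_pois <- function(y, mu) {
  lower <- ppois(y - 1, mu)   # F(y - 1); equals 0 when y = 0
  upper <- ppois(y, mu)       # F(y)
  qnorm(runif(length(y), min = lower, max = upper))
}

mu <- rep(3, 5000)            # hypothetical fitted means
y <- rpois(length(mu), mu)    # counts simulated from the same model
r <- rq_resid_pois(y, mu)

# Near 0 and 1 respectively when the model is correct
c(mean = mean(r), sd = sd(r))
```

Because the randomisation spreads each discrete count over its probability interval, the banding seen in ordinary residual plots for counts disappears.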

# rqgam.check

“Nope, no problems with availability”

# Availability

• “Simple correction factor” for diving animals (Winiarski et al 2014)
• Borchers & co have many solutions using Hidden Markov Models

# More references

• Dunn, PK, and GK Smyth (1996). Randomized Quantile Residuals. Journal of Computational and Graphical Statistics 5(3) 236–244.
• Miller, DL, & L Thomas (2015). Mixture models for distance sampling detection functions. PLoS ONE.
• Miller, DL, & SN Wood (2014). Finite area smoothing with generalized distance splines. Environmental and Ecological Statistics, 21(4), 715–731.
• Ramsay, T (2002). Spline smoothing over difficult regions. Journal of the Royal Statistical Society, Series B 64, 307–319.
• Winiarski, KJ, DL Miller, PWC Paton, and SR McWilliams (2014). A Spatial Conservation Prioritization Approach for Protecting Marine Birds Given Proposed Offshore Wind Energy Development. Biological Conservation 169 79–88.
• Wood, SN, MV Bravington, & SL Hedley (2008). Soap film smoothing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5), 931–955.