Spatial modelling of distance sampling surveys

David L Miller
CREEM, University of St Andrews

British Trust for Ornithology
Thetford, UK
26 June 2015

Ecological questions
How many animals are there?
Where are all the animals?
Why are they there?
Density surface models

(Spatial models that account for detectability)

(…and more)

# $$\geq 2$$-stage models

Hedley and Buckland (2004). Miller et al (2014).

Spatially explicit models

# Data setup

Ursus from PhyloPic.

# Two options for response

## $$n_j$$

• raw counts per segment
• model offset is effective area ($$A_j \hat{p}_j$$)

## $$\hat{n}_j$$

• Horvitz-Thompson estimate per segment

$\hat{n}_j = \sum_{i \text{ in segment } j} \frac{s_i}{\hat{p}_i}$

• model offset is then area ($$A_j$$)

# Generalized additive models (in two pages) (I)

If we are modelling counts:

$\mathbb{E}(\hat{n}_j) = A_j\exp \left\{ \beta_0 + \sum_k f_k(z_{jk}) \right\}$

• $$\hat{n}_j$$ has some count distribution (quasi-Poisson, Tweedie, negative binomial, ziP)
• $$A_j$$ is area of segment
• $$f_k$$ are smooth functions (splines $$\Rightarrow f_k(x)=\sum_l \beta_l b_l(x)$$)
• $$f_k$$ can just be fixed effects $$\Rightarrow$$ GLM
• Add-in random effects, correlation structures $$\Rightarrow$$ GAMM
• Wood (2006) is a good intro book

# Generalized additive models (in two pages) (II)

Minimise distance between data and model while minimizing:

$\lambda_k \int_\Omega \frac{\partial^2 f_k(z_k)}{\partial z_k^2} \text{ d}z_k$

## “just wiggly enough”

Fitting via REML, see Wood (2011).

# Response distributions

• “Classically”: quasi-Poisson (I’ve not seen data like this)
• Lately: Tweedie, negative binomial
• Exponential family given power parameter
• (mgcv can now estimate power parameters via tw() and nb())

# Model selection

• All possible subsets - expensive; stepwise - path dependence
• Term selection by shrinkage to zero effect (Marra & Wood, 2011)
• Approximate $$p$$-values (Marra and Wood, 2012)
Software:
dsm package

# The dsm package

• Design “inspired by” (“stolen from”) mgcv
• Easy to build simple models, possible to build complex ones
• Utility functions: variance estimation, plotting, prediction etc
install.packages(devtools)
devtools::install_github("DistanceDevelopment/dsm")

# dsm syntax

• Syntax example:

model <- dsm(count ~ s(x,k=10) + s(depth,k=6),
detection_function,
segment_data,
observation_data,
family=tw())
Case study:
Rhode Island seabirds

# RI seabirds

• Aerial survey
• 2 years of effort
• Transects off RI coast, nr Block Island, Long Island sound
• binned distances – 3 bins

# RI seabirds - Aims

• Wind development in RI/MA waters
• Map of usage
• Estimate uncertainty
• Combine maps (Zonation)

Photo by jackanapes on flickr (CC BY-NC-ND)

load(url("http://converged.yt/talks/BTO-dsm/loon-data.RData"))

# Segment data

head(seg)
##     Sample.Label SeasonYear       Date Surveyed. Wind.speed Wind.direction
## 1 101-10/12/2010 Winter1011 10/12/2010       Yes         <5              N
## 3 101-09/12/2011 Winter1112 09/12/2011       Yes      10-15             SW
## 4 101-28/01/2012 Winter1112 28/01/2012       Yes       5-10             SW
## 5 102-10/12/2010 Winter1011 10/12/2010       Yes         <5              N
## 7 102-28/01/2012 Winter1112 28/01/2012       Yes       5-10             SW
## 8 102-09/12/2011 Winter1112 09/12/2011       Yes      10-15             SW
##         Weather Visibility Waveheight Whitecaps Observer_Left
## 1 Partly Cloudy         >5         <1   Minimal            KW
## 3 Partly Cloudy         >5        2-3   Minimal            KW
## 4         Sunny         >5        1-2   Minimal            KW
## 5 Partly Cloudy         >5         <1   Minimal            KW
## 7         Sunny         >5        1-2   Minimal            KW
## 8 Partly Cloudy         >5        2-3   Minimal            KW
##   Observer_Right Onsurvey_Left Onsurvey_Right Glare_Left Glare_Right
## 1             JV           Yes            Yes   Moderate        None
## 3             EJ           Yes            Yes       None     Minimal
## 4             EJ           Yes            Yes       None        None
## 5             JV           Yes            Yes   Moderate        None
## 7             EJ           Yes            Yes       None        None
## 8             EJ           Yes            Yes       None     Minimal
##   Observerconditions_Left Observerconditions_Right Comments Season Year
## 1                    Good                Excellent          Winter 2010
## 3                    Good                     Good          Winter 2011
## 4               Excellent                Excellent          Winter 2012
## 5                    Good                Excellent          Winter 2010
## 7               Excellent                Excellent          Winter 2012
## 8                    Good                     Good          Winter 2011
##       gchl eff.sides        tx       ty        bx        by         x
## 1 3.568708         1 -44.23136 14.42102 -44.09166 12.153487 -44.16151
## 3 5.528956         1 -44.23136 14.42102 -44.09166 12.153487 -44.16151
## 4 5.528956         1 -44.23136 14.42102 -44.09166 12.153487 -44.16151
## 5 3.179644         1 -44.09166 12.15349 -43.95196  9.885946 -44.02181
## 7 4.356627         1 -44.09166 12.15349 -43.95196  9.885946 -44.02181
## 8 4.356627         1 -44.09166 12.15349 -43.95196  9.885946 -44.02181
##          y   Effort     depth    depthm    slope roughness phimedian
## 1 13.28725 2269.998  -66.9324 -20.83643 1.131463  1.352299  1.057448
## 3 13.28725 2269.998  -66.9324 -20.83643 1.131463  1.352299  1.057448
## 4 13.28725 2269.998  -66.9324 -20.83643 1.131463  1.352299  1.057448
## 5 11.01972 2269.998 -129.5995 -39.91604 0.651089  0.522652  1.872487
## 7 11.01972 2269.998 -129.5995 -39.91604 0.651089  0.522652  1.872487
## 8 11.01972 2269.998 -129.5995 -39.91604 0.651089  0.522652  1.872487
##   distancelandkm gchl_winter gchl_spring gchl_summer gchl_fall      fcpi
## 1       1.755095    4.441985    8.242430   15.327192  6.308177 0.5833333
## 3       1.755095    4.441985    8.242430   15.327192  6.308177 0.5833333
## 4       1.755095    4.441985    8.242430   15.327192  6.308177 0.5833333
## 5       3.875946    3.721898    6.793428    7.609529  5.299252 0.4666667
## 7       3.875946    3.721898    6.793428    7.609529  5.299252 0.4666667
## 8       3.875946    3.721898    6.793428    7.609529  5.299252 0.4666667
##   gchl_long Transect.Label
## 1  8.449691              1
## 3  8.449691              1
## 4  8.449691              1
## 5  6.066381              1
## 7  6.066381              1
## 8  6.066381              1

# Observation data

head(obs.loons)
##           Date     Time Species Group Location Bin Observer Length size
## 106 17/02/2011 18991230    COLO  Loon On Water   A       JV   7448    1
## 301 28/01/2012 18991230    COLO  Loon On Water   A    Ellen   7448    1
## 446 23/02/2011 18991230    COLO  Loon On Water   A       KW   7448    1
## 457 07/02/2011 18991230    COLO  Loon On Water   A       KW   7448    1
## 814 23/02/2011 18991230    COLO  Loon On Water   B       JV   7448    1
## 830 09/01/2012 18991230    COLO  Loon On Water   A       KW   7448    1
##     object          x         y distbegin distend distance Season Year
## 106    106 -17.525139 -33.33390        44     163      119 Winter 2011
## 301    301 -10.015215 -27.73880        44     163      119 Winter 2012
## 446    446  -3.793853 -25.40426        44     163      119 Winter 2011
## 457    457 -10.447177 -25.32420        44     163      119 Winter 2011
## 814    814 -14.415200 -21.22522       164     432      268 Winter 2011
## 830    830  27.030791 -21.05565        44     163      119 Winter 2012
##     SeasonYear Transect.Label    Sample.Label
## 106 Winter1011              8  819-17/02/2011
## 301 Winter1112             10 1022-28/01/2012
## 446 Winter1011             12 1222-23/02/2011
## 457 Winter1011             10 1021-07/02/2011
## 814 Winter1011              9  919-23/02/2011
## 830 Winter1112             21 2122-09/01/2012

# Fit a detection function

• Hazard-rate with group size as a covariate
• More on this in Winiarski et al (2013)
library(Distance)
## Loading required package: mrds
## This is mrds 2.1.13
## Built: R 3.2.0; ; 2015-06-24 16:41:23 UTC; unix
hr.df.size <- ds(obs.loons, formula=~size, key="hr",
truncation=list(right=1000, left=44))
## Columns "distbegin" and "distend" in data: performing a binned analysis...
## Cannot perform AIC adjustment term selection when covariates are used.
## Fitting hazard-rate key function
## AIC= 1580.124
## No survey area information supplied, only estimating detection function.

# Fitted detection function

plot(hr.df.size)

# Spatial model

library(dsm)
## Loading required package: mgcv
## This is mgcv 1.8-6. For overview type 'help("mgcv-package")'.
## This is dsm 2.2.8
## Built: R 3.2.0; ; 2015-06-25 13:24:11 UTC; unix
loon.model.hr <- dsm(Nhat~s(gchl_long, k=10)+
s(depthm, k=10)+
s(y, k=10),
hr.df.size, seg, obs.loons,
family=nb(), availability=0.7,
method="REML")

# dsm summary

summary(loon.model.hr)
##
## Family: Negative Binomial(0.105)
##
## Formula:
## Nhat ~ s(gchl_long, k = 10) + s(depthm, k = 10) + s(y, k = 10) +
##     offset(off.set)
##
## Parametric coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -14.62165    0.07504  -194.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##                edf Ref.df Chi.sq  p-value
## s(gchl_long) 4.347  5.357  68.39 6.51e-13 ***
## s(depthm)    4.111  5.160  28.08 4.41e-05 ***
## s(y)         4.125  5.147  42.56 6.20e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) =  0.077   Deviance explained = 29.7%
## -REML = 3183.2  Scale est. = 1         n = 2067

# Plotting smooth terms

plot(loon.model.hr, pages=1)

Model checking

# Residual checking

gam.check(loon.model.hr)

# Randomised quantile residuals

• Count data is nasty for goodness of fit
• Dunn & Smyth (1996)
• Back transform for exactly Normal residuals
• Fewer problems with artefacts
• dsm::rqgam.check
• (Thanks to Natalie Kelly at CSIRO for the tip)

# rqgam.check

rqgam.check(loon.model.hr)

Inference

# Prediction grid

head(pred)
##   OBJECTID_1 Join_Cou_1 TARGET_F_1 LONGITUDE LATITUDE roughness phimedian
## 0          1          1          0 -71.68000 40.92203  0.082416  2.223548
## 1          2          1          1 -71.65626 40.92207  0.050347  2.340658
## 2          3          1          2 -71.63251 40.92209  0.056074  2.440206
## 3          4          1          3 -71.60877 40.92212  0.042995  2.498072
## 4          5          1          4 -71.58502 40.92214  0.036595  2.540494
## 5          6          1          5 -71.56128 40.92215  0.038354  2.552859
##   latitude_2 longitude1         x         y distancelandkm cellaream
## 0     40.925   -71.6750 -28.56647 -27.57289       71.51344     4e+06
## 1     40.925   -71.6625 -26.57151 -27.56911       76.54190     4e+06
## 2     40.925   -71.6375 -24.57647 -27.56578       81.77255     4e+06
## 3     40.925   -71.6125 -22.58151 -27.56300       81.74481     4e+06
## 4     40.925   -71.5875 -20.58648 -27.56077       81.65078     4e+06
## 5     40.925   -71.5625 -18.59145 -27.55911       82.07670     4e+06
##   chl_fall chl_winter chl_spring chl_summer gchl_winter gchl_spring
## 0 2.449310   2.251554   2.457058   1.263831    2.098199    2.337756
## 1 2.456197   2.339274   2.354573   1.281547    2.182396    2.233763
## 2 2.149591   2.934982   2.307914   1.075004    2.540287    2.192996
## 3 2.136389   2.824368   2.576211   0.887268    2.430343    2.412229
## 4 1.814823   2.608232   2.541746   0.872087    2.357794    2.397891
## 5 2.518508   2.439878   2.220565   0.851456    2.329046    2.083287
##   gchl_summer gchl_fall    depthm     depth    slope       fcpi gchl_long
## 0   1.1652222  2.275435 -50.41504 -165.4037 0.127724 0.03333333  1.647738
## 1   1.1777266  2.264136 -50.42341 -165.4311 0.033266 0.03333333  1.607348
## 2   1.0211105  2.099305 -51.29684 -168.2967 0.058634 0.01666667  1.542477
## 3   0.8770048  2.075592 -52.78529 -173.1801 0.020804 0.04166667  1.569906
## 4   0.8577625  1.777506 -53.22737 -174.6305 0.067070 0.02500000  1.548057
## 5   0.8364021  2.257372 -54.21326 -177.8650 0.061565 0.02500000  1.458858
##   width height
## 0     2      2
## 1     2      2
## 2     2      2
## 3     2      2
## 4     2      2
## 5     2      2

# Estimating abundance

• Need to specify
• area of prediction cells
• all covariates used in analysis
sum(predict(loon.model.hr, newdata=pred, off.set=pred$cellaream)) ## [1] 5844.445 # Mapping abundance # Estimating uncertainty summary(dsm.var.gam(loon.model.hr, pred, off.set=pred$cellaream))
## Summary of uncertainty in a density surface model calculated
##  analytically for GAM, with delta method
##
## Approximate asymptotic confidence interval:
##       5%     Mean      95%
## 4419.042 5844.445 7729.626
## (Using delta method)
##
## Point estimate                 : 5844.445
## Standard error                 : 809.3481
## CV of detection function       : 0.03710214
## CV from GAM                    : 0.1384816
## Total coefficient of variation : 0.1434

# Mapping uncertainty

plot(dsm.var.gam(loon.model.hr, split(pred, 1:nrow(pred)),
off.set=pred\$cellaream),
observations=FALSE,
gg.grad=scale_fill_gradient())

Conclusions

# Conclusion

• Detectability
• Flexible spatial models
• GLMs + random effects + smooths + other extras
• autocorrelation can be modelled
• accounting for uncertainty
• Large, heterogeneous areas
• Spatial component is v. helpful for managers
• Two-stage models can be useful!

# Acknowledgements

• St Andrews: Eric Rexstad, Len Thomas, Laura Marshall
• CSIRO: Mark Bravington, Natalie Kelly
• Rhode Island: Kris Winiarski, Peter Paton, Scott McWilliams

Funding from Rhode Island Department of Natural Resources.

# Thanks!

Slides available at
http://converged.yt/talks/BTO-dsm/talk.html

# References

• Dunn, PK, and GK Smyth. Randomized Quantile Residuals. Journal of Computational and Graphical Statistics 5, no. 3 (1996): 236–244.
• Marra, G, & Wood, SN (2011). Practical variable selection for generalized additive models. Computational Statistics and Data Analysis, 55(7), 2372–2387.
• Marra, G and SN Wood (2012). Coverage properties of confidence intervals for generalized additive model components. Scandinavian Journal of Statistics 39(1), 53–74.
• Miller, DL, ML Burt, EA Rexstad and L Thomas. Spatial Models for Distance Sampling Data: Recent Developments and Future Directions. Methods in Ecology and Evolution 4, no. 11 (2013): 1001–1010.
• Williams, R, SL Hedley, TA Branch, MV Bravington, AN Zerbini, & KP Findlay (2011). Chilean Blue Whales as a Case Study to Illustrate Methods to Estimate Abundance and Evaluate Conservation Status of Rare Species. Conservation Biology, 25(3), 526–535.
• Winiarski, KJ, ML Burt, Eric Rexstad, DL Miller, CL Trocki, PWC Paton, and SR McWilliams. Integrating Aerial and Ship Surveys of Marine Birds Into a Combined Density Surface Model: a Case Study of Wintering Common Loons. The Condor 116, no. 2 (2014): 149–161.
• Winiarski, KJ, DL Miller, PWC Paton, and SR McWilliams. A Spatial Conservation Prioritization Approach for Protecting Marine Birds Given Proposed Offshore Wind Energy Development. Biological Conservation 169 (2014): 79–88.
• Wood, SN, MV Bravington, & SL Hedley (2008). Soap film smoothing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5), 931–955.
Appendices

# Appendix - Smoothing in less awkward regions

• “Remove” troublesome parts of the thin plate spline
• Do this carefully (Fourier transform)
• Nullspace (plane) terms replaced w. low freq

Miller and Kelly (in prep)

# Appendix - Concurvity

$\text{Altitude} = f(x,y) + \epsilon \quad \text{or} \quad \text{Chlorophyll A} = f(\text{SST}) + \epsilon$

• Not just correlation!
• mgcv::concurvity() computes measures for fitted models

# Autocorrelation

• Can use GEE/GAMM structure for autocorrelation along transects
• $$\text{AR}(p)$$ process (“obvious” structure)
• In general this is unstable
• Random effects are sparse
• Splines are “dense”
• $$\Rightarrow$$ bad for optimisation