In the last chapter we looked at how we could model the probability of an animal being detected using the distances to the animals from the transect. A little thought leads us to believe that in addition to distance, other factors will influence the detection probability. For example: a pod of 50 whales will be much easier to spot than a single whale; it’s harder to see birds early in the morning when it’s dark and misty; and it’s much more difficult to see (or do) anything when the sea is extremely choppy. We’d like to take these factors into account when we’re building our detection function, to ensure that we can model the detection process as accurately as possible.

Distance sampling methods have a property known as pooling robustness which means that if we assume that the heterogeneity in detectability is relatively small (i.e. there’s not much variation between animals in how detectable they are), then estimates of abundance will not be biased (though they might have high variance; Burnham et al. (2004), Section 11.12). This means that usually we do not need to worry about bias but if we can model the heterogeneity in detectability then we can improve the precision of our abundance estimates (Marques et al., 2007).

Let’s return to the pantropical spotted dolphin example from the previous chapter. During the survey data was collected on Beaufort sea state – a measure of the roughness of the sea (1 is “light air” with small waves, 5 “”fresh breeze" with moderate waves1). Observers also recorded the size of each group of dolphins that were detected. We can plot these against the observed distances…
Figure 1: Exploratory data analysis for the effects on recorded distance of Beaufort sea state (left) and observed group size (right). Dashed lines indicate the fit of a linear model through the data (giving a slightly decreasing distance with increasing sea state and increasing distance with increasing group size. Solid lines show a LOESS smooth through the data. Note that the data shown have had the observations with the two largest observed group sizes removed.

The plot indicates that there is some effect of both of these covariates on the distances that we can observe dolphins at. As one might expect, the rougher the sea (i.e. higher Beaufort), the shorter distance one observes animals at (though there is an odd spike at sea state 2; left panel). For the group size, we see an increase in observed distance as the size increases (solid line is a linear fit through the data, the dashed line is a LOESS smooth (Cleveland, Grosse and Shyu, 1991) which highlights the steep increase at smaller group sizes (right panel). It’s worth bearing in mind that sea state and group size both affect the observed distances in different ways and the above plots are only taking two dimensional slices through their effects (e.g. small groups in choppy seas are probably not very visible but larger groups may be) – so the above plots shouldn’t be over-interpreted.

Adding covariates into our models

The above plots plus some simple physical reasoning lead us to believe that the covariates have an effect on the detection probability – how do we include the covariates in our models though?

In the half-normal detection function we looked at in the last chapter, we saw that the scale parameter (σ) was the only parameter of the model. Decreasing σ made the detection function go to zero more quickly. We might expect that the covariates had a similar effect: animals are visible at larger distances when their groups are bigger: changing the rate of the drop-off in the detection function. If we are to embed the covariate effects in the scale parameter, we could re-write the scale as:
$$ \sigma_i = \exp\left( \beta_0 + \sum_{k=1}^K \beta_k z_{ik} \right) $$
now we’re not estimating the scale parameters, but rather the βk parameters. β0 represents a kind of “intercept” (in the sense that it’s the value that the scale parameter will take when all covariates are zero; βk for k = 1, …, K give the parameters to be estimates for the K covariates, which for a given observation, i are zik. Usually K is relatively small (we usually only include a couple of covariates in our model).

Now we’ve seen how covariates can be justified and included from a mathematical perspective, we can see how to actually fit such a model in Distance2. Let’s start by fitting a model with group size as a covariate2 :

hn.dolphins.size <- ds(mexdolphins, truncation=8000, model=df(scale=~size))
## Summary of fitted detection function
## Transect type          : line 
## Number of observations : 47 
## Distance range         : 0 - 8000 
## AIC                    : 836.0817 
## Detection function     : Half-normal 
## Detection function parameters
##               Estimate        SE
## (Intercept) 6.91408348 0.5091799
## size        0.03947174 0.0193140
## Kolmogorov-Smirnov p-value : 0.8957201 
## Cramer-von Mises p-value   : 0.8579195 
##            Estimate        SE        CV
## Average p 0.6601482 0.1228513 0.1860966

We can see that the summary now includes an additional row of parameters, giving the effect of the parameter for group size. One way to investigate the effect of adding the size covariate graphically by plotting the fitted model:

Figure 2: Fitted detection function for the pantropical spotted dolphin data where the size was included as a covariate. Left plot shows the detection function when size is fixed at its median value. Right plot shows the detection function evaluated with the detection function set to the 25%, 50% and 75% quantiles of group size.

The right panel of the plot shows that larger groups are easier to detect than smaller ones.

Methods for including covariates in distance sampling detection functions have a long history (Beavers and Ramsey (1998) is an early contribution to this area). The beginning of a rigorous approach to such models is Marques and Buckland (2003), though Marques et al. (2007) is somewhat more accessible.

For the half-normal model we obtained an average detectability of 0.7231, where as now the average probability of detection is 0.6601. Though this is the average (across observed covariates)3 detectability, for a given group size we have a different value of the detectability, mathematically we can now view the probability of detection and the detection function as functions of the covariates4. So now, for each observed value of a covariate we have a probability of detection. Mathematically:
$$ \hat{p_i} = \frac{1}{w}\int_0^w g(x; \boldsymbol{\beta}, \mathbf{z}_i) \text{d}x, $$
we can look at the values of $\hat{p_i}$ for our dolphin model by calling the predict method:

##  [1] 0.3609790 0.9999242 0.9994546 0.9724441 0.8290257 0.7652063 0.6872192
##  [8] 0.9999997 0.9960868 0.5988589 0.9999242 1.0000000 0.9999985 0.9999985
## [15] 1.0000000 0.9999242 0.9999242 0.3470822 0.3208042 0.2850027 0.3470822
## [22] 0.7652063 0.6872192 0.7652063 0.3470822 0.3084022 0.4216515 0.9999895
## [29] 0.5074922 0.9960868 0.9987996 0.4216515 0.6872192 0.9148433 0.8052838
## [36] 0.9987996 0.9411026 0.3470822 0.6872192 0.9999895 0.8290257 0.9991908
## [43] 0.9999985 0.9960868 0.9873138 0.9873138 0.9724441

where probabilities are in order of the observations. It’s easier to look at the quantiles:

##        0%       25%       50%       75%      100% 
## 0.2850027 0.6430390 0.9411026 0.9996894 1.0000000

in both preceding results we see there is a wide range of probabilities of detection. These differences will become important later in Abdundance estimation.

We can look again at the question asked at the start of this chapter and ask how does group size affect probability of detection?

plot(mexdolphins$size[!$size)], predict(hn.dolphins.size), xlab="Group size", ylab="Probability of detection")
Figure 3: Probability of detection plotted against observed group size, we can see that with increasing group size, we have increasing probability of detection.

Adding and selecting covariates

We can add the Beaufort sea state covariate into the model, alongside group size by simply adding it into the formula. Before doing so we ensure that sea state is a factor covariate:

mexdolphins$beaufort <- as.factor(mexdolphins$beaufort)
hn.dolphins.size.beaufort <- ds(mexdolphins, truncation=8000, model=df(scale=~size+beaufort))
## Warning in ds(mexdolphins, truncation = 8000, model = df(scale = ~size + :
## Unable to invert Hessian: using pseduo-inverse
## Summary of fitted detection function
## Transect type          : line 
## Number of observations : 47 
## Distance range         : 0 - 8000 
## AIC                    : 838.1266 
## Detection function     : Half-normal 
## Detection function parameters
##                Estimate           SE
## (Intercept)  6.59546436 1.330671e+00
## size         0.04059144 3.299831e-02
## beaufort2    8.29262592 9.043820e+06
## beaufort3    0.51333594 8.863909e-01
## beaufort4    0.42457606 1.364223e+00
## beaufort5   -0.85601876 8.281355e-01
## Kolmogorov-Smirnov p-value : 0.8545092 
## Cramer-von Mises p-value   : 0.908566 
##            Estimate        SE        CV
## Average p 0.5717546 0.1711626 0.2993637

As Beaufort sea state is included as a factor covariate, which gives us a number of extra parameters in the model. We can see the effect of adding the covariate in a plot:

Figure 4: Fitted detection function for the pantropical spotted dolphin data where the size and Beaufort sea state were included as covariates. Left plot shows the detection function when size and Beaufort sea state are fixed at their median values. The middle plot shows the detection function with size held at its median value and Beaufort sea state varying, where as the right plot shows the effect varying group size over its quantile values and fixing Beaufort sea state at its median.

The middle plot above shows that there is little difference in the lower sea states, but a considerable difference once we get to sea state 5, when we see a much faster decrease in probability of detection.

Just as when deciding between the half-normal and hazard-rate detection functions in the previous chapter, we can use the AIC to evaluate relative model fit and goodness of fit testing/Q-Q plots to evaluate absolute model fit.

We can see some improvement in the Q-Q plot results between the model with and without the group size covariate in the following plot:
Figure 5: Comparison of quantile-quantile plots for two models fitted to the spotted dolphin data. Left plot shows the model without any covariates, the right with observed group size included as a covariate. Note that the points in the right plot are much closer to the line y = x.

We can compare the models that we’ve fitted so far in a table, but we first fit a model with only Beaufort sea state as a covariate, so complete the set of model combinations:

hn.dolphins.beaufort <- ds(mexdolphins, truncation=8000, model=df(scale=~beaufort))
## Warning in ds(mexdolphins, truncation = 8000, model = df(scale =
## ~beaufort)): Unable to invert Hessian: using pseduo-inverse
## Summary of fitted detection function
## Transect type          : line 
## Number of observations : 47 
## Distance range         : 0 - 8000 
## AIC                    : 846.0192 
## Detection function     : Half-normal 
## Detection function parameters
##               Estimate           SE
## (Intercept)  8.1816931 2.576122e-01
## beaufort2    8.7871511 1.225235e+07
## beaufort3    0.3839194 4.283888e-01
## beaufort4   -0.0619388 4.621588e-01
## beaufort5    0.3542033 5.984973e-01
## Kolmogorov-Smirnov p-value : 0.4006785 
## Cramer-von Mises p-value   : 0.5706475 
##            Estimate        SE        CV
## Average p 0.6693629 0.1268596 0.1895229

We can now build a table5 :

Model Formula # pars Pa CV(Pa) AIC
2 Half-normal ~size 2 0.6601482 0.1860966 836.0817
4 Half-normal ~size + beaufort 6 0.5717546 0.2993637 838.1266
1 Half-normal ~1 1 0.7230935 0.1895098 842.3143
3 Half-normal ~beaufort 5 0.6693629 0.1895229 846.0192

Using the table, we can see that the best model by AIC is the one with only size included as a covariate. Also note that the AIC-best model also has the lowest coefficient of variation (CV(Pa); in general we expect covariate models to be more precise Marques et al. (2007)).


In this chapter we’ve seen how adding covariates to the detection function can account for heterogeneity in the detection probability. Covariate models make intuitive sense too, since we do not expect all individuals (or groups) in the population to have the same detectability at a given distance. This intuitive interpretation allows for direct interpretation of covariates, which may be biologically interesting. The ability to include covariates in an analysis will be particularly useful in Adundance estimation, where we will investigate stratification of abundance estimates by covariate.


Beavers, S. and Ramsey, F. (1998) Detectability analysis in transect surveys. Journal of Wildlife Management, 948–957.

Burnham, K.P., Buckland, S.T., Laake, J.L., Borchers, D.L., Marques, T.A., Bishop, J.R., et al. (2004) Further topics in distance sampling. Advanced distance sampling (eds S.T. Buckland), D.R. Anderson), K.P. Burnham), J.L. Laake), D.L. Borchers), & L. Thomas), Oxford University Press.

Cleveland, W.S., Grosse, E. and Shyu, M.W. (1991) Local regression models. Statistical models in s (eds J.M. Chambers), & T.J. Hastie), pp. 309–376. Pacific Grove: Wadsworth; Brooks.

Marques, F. and Buckland, S.T. (2003) Incorporating covariates into standard line transect analyses. Biometrics, 59, 924–935.

Marques, T.A., Thomas, L., Fancy, S. and Buckland, S.T. (2007) Improving estimates of bird density using multiple-covariate distance sampling. The Auk, 124, 1229–1243.

  1. The scale actually goes up to 12 (“hurricane”) and beyond, see and for more information. One may often find that Beaufort sea state is recorded as decimals (state of 4.5 for example), it’s tricky to work out what to do with these and is worth consulting those who collected the data.

  2. Note that a covariate called size has a special meaning in Distance2 as it is used in the estimation of abundance via the Horvitz-Thompson estimator (see Adundance estimation).

  3. Note that when we refer to Pa as the “average detection probability”. The average is calculated over the observations. We used the same term for Pa in the previous chapter when talking about non-covariate models to be consistent with what is reported in Distance2. See Abundance estimation for more information on how this is calculated.

  4. Strictly speaking we can write the probability of detection as a function of the covariates: $\hat{p_i}(z_i)$, though the shorter pi form is used to save space. Hopefully the subscript will be a sufficient reminder.

  5. More details on how this was done in Improving the fit of detection functions.