In the last chapter we saw how to estimate abundance, but more important than these abundance estimates alone (referred to as *point estimates*), are their variances. The *N̂*s we estimated are realisations of a stochastic process, that’s why we are leveraging statistical machinery against the problem.

Understanding where uncertainty comes from allows one to improve a survey design and check that models are reasonable. Presenting uncertainty in an understandable way to those who want to use your results also allows them to understand the potential shortcomings of the results.

In this chapter we will talk about how to estimate uncertainty in estimates of abundance, and in particular *how* these expressions are derived, so you can get an idea of what the uncertainty represents. Although this chapter is a little more statistically heavy, hopefully in understanding the material presented here you will be able to better understand what the causes of large uncertainties might be in your model – that is to say: “stick with it, it’s worth it”.

We can identify the sources of uncertainty in our abundance estimates by thinking about the Horvitz-Thompson-like estimator for the abundance we saw in Estimating abundance:

$$
\hat{N} = \frac{A_\mathcal{R}}{A_\text{covered}}\sum_{i=1}^n \frac{s_i}{\hat{p_i}}.
$$

Thinking about which components of the estimator we are uncertain about, we can see that we have at least:

*Number of observations*: If we repeated our survey, we would expect that the number of animals we observed (*n*) to change each time. This is because the detection process is random and also because we don’t expect the animals to be stationary in the intervening period between surveys. It’s convenient (when estimating the uncertainty) for us to think about the*encounter rate*rather than just*n*. For line transects the encounter rate is defined as*n*/*L*(where*L*is the total length of the lines surveyed). For point transects the encounter rate is defined as*n*/*K*(where*K*is the total number of points visited).*Model parameters*: since we estimate the parameters of the detection function, we have some certainty about the values of those parameters. For example, as we have seen so far in the`summary`

results, the scale parameter(s) of the half-normal detection function are accompanied by standard errors. We need to incorporate that uncertainty into our estimates of probability/probabilities of detection (*p*_{i}) and hence abundance (*propagating*the uncertainty).

Plus, perhaps:

*Group size*: We account for animals in groups via*s*_{i}above, however, it is often difficult to count how many animals are in the group (particularly in a marine situation where cetaceans or birds may be diving and surfacing). We may therefore attach uncertainty to our measurements of group size (*s*_{i}).*Line length*(for line transect models): If the study design was randomly placed (either completely or as a zig-zag or grid offset^{1}), the line lengths may be random, as various physical features may get in the way of our line placement.

As mentioned above, it’s easier for us to think about the uncertainty in *n*/*L*, rather than *n* alone. The estimators themselves look much like the estimators of variances you’ll have seen in a regular statistics class. When we have a line transect analysis, we need to take into account the line length, so we weight by this.

For line transects we can estimate the variance of the encounter rate as^{2}

$$
\widehat{\text{Var}}\left(\frac{n}{L}\right) = \frac{K}{L^2 (K-1)} \sum_{k=1}^{K} l_k^2 \left( \frac{n_k}{l_k} - \frac{n}{L}\right)^2
$$

where *l*_{k} is the length of line *k* (which there are *K* of) and *L* is the total line length (i.e. ∑_{k}*l*_{k} = *L*). The number of observations per line is denoted *n*_{k}, so *n* = ∑_{k}*n*_{k}.

Whereas for point transects we have:

$$
\widehat{\text{Var}}\left(\frac{n}{Kt}\right) = \frac{1}{t^2K(K-1)} \sum_{k=1}^{K} \left( n_k - \frac{n}{K}\right)^2
$$

if there were *K* points, each of which was visited *t* times^{3}, so the total number of survey points is *K**t*. These estimators have quite similar forms, though for the point case, we don’t have a “length” to deal with, we instead think about numbers of visits and points.

Note that there are several variations on the estimator of the variance of encounter rate. The variations can be useful when particular survey setups are used. These are covered and compared comprehensively (though perhaps rather mathematically) in Fewster et al. (2009). We’ll stick to using these estimators in this chapter, as they generally performs well in a wide variety of situations.

The uncertainty from the estimation of the detection function parameters comes from usual maximum likelihood theory. In general in order to find parameters, we maximise a likelihood (a product of the probability densities for our observerations); the parameter values with the highest likelihood will be those selected.

As a simple analogy, if we thought of a hazard-rate model, we could think of it’s two parameters (shape and scale parameters) as the axes of a coordinate system and the likelihood of the parameters as the height of a given point in that system. We’re interested in finding the location of the highest point (the maximum likelihood estimate; MLE) – this is what the optimizer will try to do^{4}.

Thinking about being at that highest point, as we look around ideally we’d like to be at the top of a very sharp spike. On the other hand, we may actually just be on a ridge (with a thin path of approximately equal height stretching out in front of us) or we might just be on a small, flattish hill. We can characterise how “pointy” the MLE is mathematically by looking at it’s second derivatives (called the *Hessian matrix*). If we’re on a ridge or a flat hill we are uncertain about our estimate, since there are other values that are nearby that give very similar likelihoods for different parameter values^{5}.

We can calculate the uncertainty of the model parameters by taking the Hessian matrix (which is a byproduct of the optimisation anyway) and inverting it, this will give us the variance of the parameters.

To convert our uncertainty in parameters to uncertainty in the probability of detection, we use the *sandwich estimator*. This takes our parameter variance and pre- and post-multiplies it by the derivatives of the Horvitz-Thompson estimator with respect to the parameters – moving the estimate of variance into detectability-space rather than parameter-space:

$$
\text{Var}(\hat{P}_a) = \left[ \frac{\partial \hat{P}_a}{\partial \boldsymbol{\theta}} {\Bigg|}_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}\right]^T
(-H)^{-1} \left[ \frac{\partial \hat{P}_a}{\partial \boldsymbol{\theta}} {\Bigg|}_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}\right]
$$

where $\frac{\partial \hat{P}_a}{\partial \boldsymbol{\theta}}$ is the derivative of the probability of detection with respect to the model parameters (the “bread” of the sandwich) and *H*^{ − 1} is the inverse of the Hessian matrix (the “filling” of the sandwich). The vertical lines with $\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}$ at the bottom indicate that we evaluate the derivatives at their maximum likelihood values.

Note that in the above we use *P*_{a} even in the covariate case. When covariates are included in the detection function we can find *P*_{a} by working out what the value of *P*_{a} would be needed to get the estimated *N̂* by calculating *P*_{a} = *n*/*N̂*.

We have now found the uncertainty in both the parameter estimates and the encounter rate. We now assume that these are independent and use the fact that squared coefficients of variation sum^{6}

$$
\text{CV}(\hat{N})^2 = \text{CV}\left(\frac{n}{L}\right)^2 + \text{CV}\left(\frac{1}{\hat{P_a}}\right)^2
$$

for line transects or

$$
\text{CV}(\hat{N})^2 = \text{CV}\left(\frac{n}{K}\right)^2 + \text{CV}\left(\frac{1}{\hat{P_a}}\right)^2
$$

for point transects.

It is usually the case that the uncertainty stemming from the parameters of the detection function is dwarfed by the uncertainty from the encounter rate. This can be a useful check that calculations are correct.

These calculations are made by `Distance2`

when `abundance_estimate`

is called to calculate abundance. As one might expect, `ER_Variance`

gives the variance of $\frac{n}{K}$ and `Model_Variance`

gives the variance of $\hat{P_a}$.

As mentioned in the previous chapter, stratification will reduce variance. This is because we are breaking up the CV terms in the above equations into sums of groups of observations that are more similar (e.g. they are from the same habititat type). This kind of post-stratification can be very useful.

In this chapter we’ve looked at the potential sources of uncertainty in our estimates of abundance. We’ve looked at one possible way to estimate uncerttainty coming from the encounter rate (though there are many options depending on the survey setup) and we’ve seen how uncertainty in detection function parameters impacts our overall uncertainty via the detectability.

- Barry and Welsh (2001) initially prompted the writing of the next paper, questioning the then “standard” distance sampling variance estimator.
- Fewster et al. (2009) gives a more technical overview of the various variance estimators for distance sampling and a large simulation study evaluating them.
- Innes et al. (2002) provides an interesting account of how one addresses abundance estimation for a complex survey situation.
- Strindberg and Buckland (2004) gives information on zig-zag designs for line transect surveys.
- Buckland et al. (2007) gives information on line transect surveying for plants and special considerations necessary.

Barry, S. and Welsh, A. (2001) Distance sampling methodology. *Journal of the Royal Statistical Society. Series B, Statistical Methodology*, **63**, 23–31.

Buckland, S.T., Borchers, D.L., Johnston, A., Henrys, P.A. and Marques, T.A. (2007) Line Transect Methods for Plant Surveys. *Biometrics*, **63**, 989–998.

Fewster, R.M., Buckland, S.T., Burnham, K.P., Borchers, D.L., Jupp, P.E., Laake, J.L., et al. (2009) Estimating the Encounter Rate Variance in Distance Sampling. *Biometrics*, **65**, 225–236.

Goodman, L.A. (1960) On the Exact Variance of Products. *Journal of the American Statistical Association*, **55**, 708.

Innes, S., Heide-Jørgensen, M.P., Laake, J.L., Laidre, K.L., Cleator, H.J., Richard, P., et al. (2002) Surveys of belugas and narwhals in the Canadian High Arctic in 1996. *NAMMCO Scientific Publications*, **4**, 169–190.

Strindberg, S. and Buckland, S.T. (2004) Zigzag Survey Designs in Line Transect Sampling., **9**, 443–461.

Survey design is out of the scope for us, though there are many good references on the topic. See Further reading.↩

This is the estimator “R2” from Fewster et al. (2009).↩

This is estimator “P1” from Fewster et al. (2009). Note that there are ifferent estimators for different survey setups (e.g. when each point is surveyed a different number of times), see Web Appendix B of Fewster et al. (2009). The authors note that results vary based on the statification scheme used.↩

It’s easy to think theoretically and think we can always find the maximum likelihood estimate (and therefore the “best” parameter estimates) but optimisation is hard so the best we can hope for is a “locally” optimum result (the best result for where we looked).↩

We actually use the

*inverse*of the Hessian, since the derivative of a very pointy peak would be large and this is actually when we are very certain about the paramers, therefore we invert.↩Note that squared coefficients of variation sum

*approximately*, see Goodman (1960).↩