So far we’ve looked at how to model the probability of detection by modelling the detection function. Though this is interesting in itself, we usually only use the detection function as a stepping stone to get to an abundance estimate for the population that we’re interested in. This chapter explores how to estimate abundance once a detection function has been fitted, revisiting some of the examples we’ve seen so far and introducing some new ones.

Our basic technique in this chapter is to estimate the abundance for the area we covered in the survey, then scale this up to the larger study area. This scaling up involves assuming the area we scale up to is the same as the area that we surveyed and that the whole area is homogeneous. Sometimes that’s not reasonable, so we’ll go on to talk about estimating abundance in smaller areas (*strata*), for which we believe our assumption of homogeneity is reasonable.

One of the outputs of the `summary`

s that we saw in the previous sections is the average detectability^{1}. In What else affects detectability? we saw that different average detectabilities were calculated for each unique covariate combination (in models where covariates were included). We can use the average detectability to estimate abundance.

We’ll begin by estimating the number of animals in the covered area (that is, the area we actually surveyed – the area we covered by transects out to the truncation distance). To compute this abundance estimate (which we’ll denote *N̂*_{covered}) we use a *Horvitz-Thompson-like estimator* (Thompson, 2002), which one can think of as “correcting” the size of the observations based on the probability of detection (more precisely we re-weight by the inverse of the average probability of detection). The Horvitz-Thompson estimator for the abundance in the covered area is defined as^{2}

$$
\hat{N}_\text{covered} = \sum_{i=1}^n \frac{s_i}{\hat{P_a}},
$$

where *s*_{i} is the size of the group of the *i*^{th} observation. So for each group (or individual, if *s*_{i} = 1 for every observation) we saw, inflate them by *P*_{a}, then sum up over the groups.

Returning to the pantropical dolphins, we can do this very simply in R (recalling that the `predict`

function will give us the average probabilities of detection for the observations):

```
data(mexdolphins)
mex.hn <- ds(mexdolphins, truncation=8000, model=df(scale=~size))
# we use is.na() here to remove the sampling locations where no observations
# occurred. See "Getting your data into shape".
sum(mexdolphins$size[!is.na(mexdolphins$size)]/predict(mex.hn))
```

`## [1] 5235.904`

In this case we fitted a simple half-normal model without covariates but the same code would work for a model where a covariate is included, as `predict`

will give the correct average probabilities of detection. In this case we change our notation slightly mathematically, denoting the average probability of detection per observation as *p*_{i}:

$$
\hat{N}_\text{covered} = \sum_{i=1}^n \frac{s_i}{\hat{p_i}},
$$

Note that above we estimated the number of *individuals* in the covered area. It may be interesting to calculate the number of *groups* that we saw, for example if animals occur in family/social units. In which case we can replace `mexdolphins$size`

with `1`

and obtain a *group abundance*:

`sum(1/predict(mex.hn))`

`## [1] 71.19614`

Now, neither of these numbers are particularly useful on their own, we’d rather know about the wider study area.

The beauty of the Horvitz-Thompson estimator for the covered area is we require only probabilities in order to estimate abundance (we’ll denote the abundance generally as *N̂*). To extrapolate beyond the covered area, we now need to include the area covered by transects (denoted *A*_{covered}) and the area of the region we want to scale up to (which we’ll denote *A*_{ℛ}). More precisely, we use the following estimator:

$$
\hat{N} = \frac{A_\mathcal{R}}{A_\text{covered}}\sum_{i=1}^n \frac{s_i}{\hat{p_i}}.
$$

Now we could modify our R code, above and calculate the covered area and total area and simply pre-multiply by the ratio of these quantities but this will become unwieldy when it comes to calculations for multiple strata (see below) and more importantly, when it comes to estimating our uncertainty in our estimates of abundance (see How certain are we in our estimates?). We can instead use the function `abundance_estimate`

to calculate the abundance for us, provided we have our data in a particular format (“`flatfile`

”, see Getting your data into shape).

Let’s try this out using the `mexdolphins`

data:

`mex.abund <- abundance_estimate(mex.hn, ~1)`

We’ve supplied two arguments to the function: a fitted detection function (which will contain the relevant data) and a formula that defines the stratification (see below for more on this). Here we specify the formula as simply `~1`

indicating that there is no stratification and `abundance_estimate`

will use the `Area`

column in `mexdolphins`

to calculate *A*_{ℛ}.

We can see abundance estimates and other summary statistics by just printing the object that `abundance_estimate`

returns (or by not supplying an object to store the result, the same information will be printed).

`mex.abund`

```
## Summary statistics:
## Area Effort CoveredArea n_observations n_samples ER
## Total 1.6793e+12 5598900 89582400000 47 45 8.394506e-06
##
## Abundance estimates:
## Estimate Variance CV Model_Variance ER_Variance
## Total 71529.01 610602157 0.3454593 488525.1 610113632
```

The first table lists summary statistics, in order:

`Area`

: total study area,*A*_{ℛ},`Effort`

: total survey effort (line length or number of points),`CoveredArea`

: area covered by the survey,*A*_{covered},`n_observations`

: number of observations,`n_samples`

: number of transects visited,`ER`

: encounter rate, number of observations, divided by total line length (more on this in the next chapter, How certain are we in our estimates?).

Note that the numbers for `Area`

, `Effort`

and `CoveredArea`

above are very large as they are in measured metres (squared for `Area`

and `CoveredArea`

).

The second table lists the abundance estimate (`Estimate`

) then various metrics relating to our uncertainty in that estimate. The latter will be covered in detail in the next chapter, How certain are we in our estimates?.

The above abundance estimate assumes that the whole study area is homogeneous and that the pantropical spotted dolphins are uniformly distributed within it. Unfortunately this usually is not true. We often survey terrestrial areas where part of the study area is a bog, part is open fields part is forested; or a marine environment where depth, chlorophyll or salinity changes in a way that affects the study species. In later chapters we’ll find that in the case of the Gulf of Mexico, dolphins are influenced by the water depth (see also Miller et al. (2013), Appendix A).

To illustrate the use of strata we now look at some simulated data based on a survey of Antarctic minke whales (*Balaenoptera bonaerensis*). The data is simulated from models fitted to data from the International Whaling Commission’s International Decade of Cetacean Research Southern Ocean Whale and Ecosystem Research (IWC IDCR-SOWER) programme 1992-1993 austral summer surveys (Branch and Butterworth, 2001). An example of the simulated data^{3} is shown below. The analysis here is based on examples originally written by Eric Rexstad^{4}.

Note that there is not only more effort (more transects) near the shore (furthest south wiggly line) but also more observations. It is thought that minke whales are associated to the sea ice, where they feed on krill (Williams et al., 2014).

We begin, as usual by fitting a detection function to the data, truncating at 1.5km:

```
# load the minke data
data(minke)
minke$Region.Label.Area <- minke$Area
minke_hn <- ds(minke, truncation=1.5)
summary(minke_hn)
```

```
## Summary of fitted detection function
## Transect type : line
## Number of observations : 88
## Distance range : 0 - 1.5
## AIC : 3950.751
##
## Detection function : Half-normal
##
## Detection function parameters
## Estimate SE
## (Intercept) -0.3411766 0.01140948
##
## Kolmogorov-Smirnov p-value : 0.6694128
## Cramer-von Mises p-value : 0.9194977
##
## Estimate SE CV
## Average p 0.5733038 0.005852128 0.01020773
```

The area above the stepped line in the above map is the North region and below is the South. We can estimate the abundance for each of these areas individually using the Horvitz-Thompson like estimator for each area. For example, for the North area:

$$
\hat{N}_\text{North} = \frac{A_\text{North}}{A_\text{North, covered}}\sum_{i \in \text{North}} \frac{s_i}{\hat{p_i}}.
$$

where *A*_{North} is the total area of the North study area, *A*_{North, covered} is the covered area within the North study area, and *i* ∈ North denotes the observations in the North area.

The column `Region.Label`

separates the Northern and Southern regions of the study area, we can find their areas with the following code:

`unique(minke[,c("Region.Label", "Region.Label.Area")])`

```
## Region.Label Region.Label.Area
## 1 South 84734
## 46 North 630582
```

As before, we supply the fitted detection function object and a formula. This time the formula is `~Region.Label`

:

`abundance_estimate(minke_hn, ~Region.Label)`

```
## Summary statistics:
## Area Effort CoveredArea n_observations n_samples ER
## North 630582 1358.38 4075.14 49 12 0.03607238
## South 84734 484.41 1453.23 39 13 0.08051031
## Total 715316 1842.79 5528.37 88 25 0.04775368
##
## Abundance estimates:
## Estimate Variance CV Model_Variance ER_Variance
## North 13225.44 23363573.4 0.3654764 15000.31 23348573.1
## South 3966.46 796479.5 0.2250009 1349.23 795130.3
## Total 17191.90 24169050.5 0.2859603 25347.07 24143703.4
```

The output is similar to that of the pantropical spotted dolphins, though now we have an extra two lines, giving the statistics for each strata (note the the areas are as we expected).

As a comparison of the resulting abundance estimates, we can set the total area and look at what the abundance would be estimated as if we considered a non-stratified analysis:

```
minke_unstratified <- minke
minke_unstratified$Area <- sum(unique(minke_unstratified$Region.Label.Area))
minke_hn_unstrat <- ds(minke_unstratified, truncation=1.5)
abundance_estimate(minke_hn_unstrat, ~1)
```

```
## Summary statistics:
## Area Effort CoveredArea n_observations n_samples ER
## Total 715316 1842.79 5528.37 88 25 0.04775368
##
## Abundance estimates:
## Estimate Variance CV Model_Variance ER_Variance
## Total 19860.89 22106447 0.2367341 33828.06 22072619
```

As we can see, ignoring the stratification leads to a much larger estimate of abundance, since we are extrapolating the average abundance of the two areas over the two strata, which have rather different numbers of observations (perhaps due to the ice association mentioned above).

We can also include the region as a covariate in our detection function and again estimate the abundance:

```
minke_hn_region <- ds(minke, truncation=1.5, model=df(scale=~Region.Label))
abundance_estimate(minke_hn_region, ~Region.Label)
```

```
## Summary statistics:
## Area Effort CoveredArea n_observations n_samples ER
## North 630582 1358.38 4075.14 49 12 0.03607238
## South 84734 484.41 1453.23 39 13 0.08051031
## Total 715316 1842.79 5528.37 88 25 0.04775368
##
## Abundance estimates:
## Estimate Variance CV Model_Variance ER_Variance
## North 0 0 NaN 0 0
## South 0 0 NaN 0 0
## Total 0 0 NaN 0 0
```

Calculations for the abundance are the same for point transects as for lines, aside from the calculation of the covered area which of course is given by the area of the circular samplers not lines.

`Average p`

)An additional interesting statistic we might want to calculate is the overall average probability of detection (with the average being taken over all the observations). This can be done by noting that if the sum were simply over the numerator of the fraction, we’d have the number of individuals observed. We can then think of the estimate of *N̂* as this number divided by some “average” detection probability, we’ll refer to this as $\hat{P_a}$ and this is the quantity given in the `Average p`

given in the `summary`

output above (calculated as $\hat{P_a} = n/\hat{N}_\text{covered}$). For detection functions without covariates this will just equal the average detectability.

This chapter has looked at how we calculate abundance once we’ve estimated our detection function. We’ve looked into how to calculate three useful summary statistics: the abundance in the study area or strata (*N̂*), the average detectability over the samples ($\hat{P_a}$) and the abundance in the area covered in the survey (*N̂*_{covered}).

Although these summary statistics seem like the end point of the analysis, of equal (if not greater) importance is the uncertainty around the estimates. Understanding the variability in the abundance we’ve calculated will enable us to feel confident (or not) in our results. This will be covered in the next chapter.

- Thompson (2002) is a thorough and accessible text on sampling theory, covering the theoretical basis of the Horvitz-Thompson estimator.
- Miller et al. (2013) Appendix A analyses the Gulf of Mexico dolphin data using a spatially explicit model. This will be revisited in later chapters, but is referenced here for completeness.
- Borchers, Buckland and Zucchini (2002) covers in more detail how to estimate abundance from many different survey types.

Borchers, D.L., Buckland, S.T. and Zucchini, W. (2002) *Estimating Animal Abundance*. Springer.

Branch, T.A. and Butterworth, D.S. (2001) Southern Hemisphere minke whales: standardised abundance estimates from the 1978/79 to 1997/98 IDCR-SOWER surveys. *Journal of Cetacean Research and Management*, **3**, 143–174.

Hedley, S.L. and Buckland, S.T. (2004) Spatial models for line transect sampling., **9**, 181–199.

Miller, D.L., Burt, M.L., Rexstad, E.A. and Thomas, L. (2013) Spatial models for distance sampling data: recent developments and future directions. *Methods in Ecology and Evolution*, **4**, 1001–1010.

Thompson, S.K. (2002) *Sampling*, 2nd ed. Wiley.

Williams, R., Kelly, N., Boebel, O., Friedlaender, A.S., Herr, H., Kock, K.H., et al. (2014) Counting whales in a challenging, changing environment. *Scientific Reports*, **4**, 1–6.

Here, as elsewhere, “average detectability” and “average probability of detection are used as synonyms. Though the latter is more precise, the former is much less unwieldy.↩

Note we could write this as $n/\hat{P_a}$, but we leave the estimator in this form for comparison with the covariate case.↩

The figure was adapted from Hedley and Buckland (2004). The data and figure do not necessarily match up, but to give a rough representation of the simulated survey effort and observations.↩

Eric’s analysis, showing how to perform a stratified analysis can be found on the Distance website at http://distancesampling.org/R/vignettes/minke.html.↩