In the previous chapter we saw that when we model detectability in our surveys we can reliably estimate abundance. The previous examples were simulations, so we knew the true abundance; it’s now time to get our hands dirty with some survey data.

Let’s begin by looking at data collected on pantropical spotted dolphins (*Stenella attenuata*) in the Gulf of Mexico^{1}. The data were collected by NOAA’s Southeast Fisheries Science Center during several shipboard surveys of the Gulf of Mexico in 1996, as part of the GulfCet II Program; 47 groups of pantropical spotted dolphins were observed by the crew of the Oregon II.

We can start by loading the `Distance2` package, which contains both the data we want to analyse and the functions we’ll use later.

```
# load the package
library(Distance2)
# load the dolphin data
data(mexdolphins)
```

For now we’ll ignore the exact format of the data (which is simply a `data.frame`; we’ll investigate its other columns in the next chapter) and concentrate on the `distance` column. Looking at a histogram of the distances, as we did previously:

`hist(mexdolphins$distance, main="", xlab="Distance (m)")`

As with the data we simulated in the previous chapter, we can see that there is a decrease in frequency with increasing distance from the transect line (though this is a little less well behaved in reality). If the assumption of strip transect sampling were true this histogram would be flat, indicating that the dolphins could be detected with certainty at all distances.

We now want to model the decrease in the histogram of distances in order to correct the counts we have, and go on to estimate abundance.

The relationship between count frequency and distance is described by a *detection function*. More formally, we model the probability that we detect an animal given that it is at a certain distance (*x*, say) from the line, mathematically:

ℙ(animal detected | animal at distance *x*)

In general, we’ll refer to the detection function as *g*, which for now we’ll just say is a function of the distances (*x*) and its parameter(s) (generally *θ* or $\boldsymbol{\theta}$), which we’ll estimate.

$$ g(x; \sigma)=\exp\left(-\frac{x^2}{2\sigma^2}\right), $$

This is the *half-normal* detection function; here we denote the parameter we wish to estimate as *σ*, referred to as the *scale* parameter.
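To get a feel for the role of *σ*, we can code the half-normal formula directly in base R (this is just the equation above, not anything from `Distance2`) and evaluate it at a few distances for two scale parameters:

```
# half-normal detection function, as defined above
g_hn <- function(x, sigma) exp(-x^2 / (2 * sigma^2))

# detection probability at 0m, 1000m and 2000m for two scales
g_hn(c(0, 1000, 2000), sigma = 1000)
g_hn(c(0, 1000, 2000), sigma = 3000)
# detection is certain at zero distance; a larger sigma gives a
# slower fall-off with increasing distance
```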

We’d like to have the shape of the detection function mimic the shape of the histogram; we’ll go into more detail on how that’s done below but for now let’s try to fit a detection function to the Gulf of Mexico dolphin data…

`hn.df <- ds(mexdolphins, truncation=8000)`

The function `ds` will fit the detection function for us (to the data set `mexdolphins`). We set the `truncation` to be 8000m (more on truncation below) and fit a (default) half-normal model for the detection function. Let’s see what that looks like before we examine the model in further detail:

`plot(hn.df, main="")`

As with most R objects, we can use the `summary` method to find out further information about our fitted detection function:

`summary(hn.df)`

```
## Summary of fitted detection function
## Transect type : line
## Number of observations : 47
## Distance range : 0 - 8000
## AIC : 842.3143
##
## Detection function : Half-normal
##
## Detection function parameters
## Estimate SE
## (Intercept) 8.579708 0.2300667
##
## Kolmogorov-Smirnov p-value : 0.1447243
## Cramer-von Mises p-value : 0.3236715
##
## Estimate SE CV
## Average p 0.7230935 0.1370333 0.1895098
```

The summary tells us how many observations were used to fit the data, the truncation distance, which detection function we used, the AIC, goodness of fit test *p*-values, and the parameter estimates and “`Average p`”, along with their associated uncertainties.

It is worth noting that the parameter estimate (marked `(Intercept)`) is on the log scale, so *σ* = exp(8.58) (why the parameters are reported in this way will become apparent in the next chapter, What else affects detectability?).
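To recover *σ* on the response scale, we simply exponentiate the reported intercept:

```
# back-transform the log-scale (Intercept) to get the scale parameter
sigma <- exp(8.579708)
sigma
# roughly 5300m
```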

Mathematically the hazard-rate detection function is defined as:

$$
g(x; \sigma, b)=1-\exp\left(-\left(\frac{x}{\sigma}\right)^{-b}\right).
$$

The hazard-rate detection function has two parameters: *σ*, the scale parameter as in the half-normal and *b*, which is called the *shape* parameter. The role of the shape parameter is to control the size of the *shoulder* of the detection function – how far detection remains certain. We can see its effect in comparing between plots where *b* = 5 and *b* = 1, a larger *b* gives a longer distance out to which we observe animals with certainty.
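A quick base-R sketch of the hazard-rate formula shows the effect of the shape parameter (again, this is just the equation above, nothing from `Distance2`):

```
# hazard-rate detection function, as defined above
g_hr <- function(x, sigma, b) 1 - exp(-(x / sigma)^(-b))

# with b = 5, detection stays near certain out to about sigma...
g_hr(c(0.5, 1, 2), sigma = 1, b = 5)
# ...while with b = 1 the fall-off starts much closer to the line
g_hr(c(0.5, 1, 2), sigma = 1, b = 1)
```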

We can fit a hazard-rate detection function to the dolphin data in a similar way to half-normal, though now we must specify a detection function model:

`hr.df <- ds(mexdolphins, truncation=8000, model=df(model=~hr))`

we can then plot this, as before:

`plot(hr.df)`

We can again use the `summary` function:

`summary(hr.df)`

```
## Summary of fitted detection function
## Transect type : line
## Number of observations : 47
## Distance range : 0 - 8000
## AIC : 842.1967
##
## Detection function : Hazard-rate
##
## Detection function parameters
## Estimate SE
## (Intercept) 7.945285 0.9700149
## shape 0.000000 0.7602676
##
## Kolmogorov-Smirnov p-value : 0.6058832
## Cramer-von Mises p-value : 0.634694
##
## Estimate SE CV
## Average p 0.5754871 0.3286864 0.5711446
```

Note that there are now two rows of parameters, one for scale and then one for shape.

The hazard-rate model has a rather different average detection probability from the half-normal model (0.5755 rather than 0.7231 for the half-normal) and hence we’ll end up with rather different abundance estimates. How do we decide between these two models?

In general, we’d like to fit a number of different detection functions, choose the best and use that for our subsequent inference. In order to choose which detection function is “best”, we must first define what we mean by “best”. One can view this as a three stage process:

- amongst the candidate models, which are “reasonable”,
- perform goodness of fit testing on the reasonable models,
- amongst those reasonable models with good goodness of fit, which is the best?

First we need to ensure that the models that we are comparing are “reasonable”. What I mean by this is they pass the following checks:

**Biologically reasonable**: It’s important, before fitting models, to ensure that models are biologically plausible. Rexstad and coauthors (1988) show that one can easily construct datasets of covariates that predict the phenomena of interest well but are actually totally unrelated. Although their example is deliberately silly, it highlights an important principle: avoiding a modelling philosophy of “throw everything in and see what happens”. This will be particularly pertinent in What else affects detectability?, when we talk about factors other than distance that affect detectability.

**Convergent**: It is sometimes the case that when fitting a model, a warning will be reported about the optimisation not having converged. In this case the results are not necessarily reliable. We’ll cover troubleshooting such models in Improving the fit of detection functions, offering some ways to fix non-convergent models. For now, let’s just say that it’s generally a bad idea to include such models in the model selection process.

**Eyeball test**: Inspecting fitted model plots can tell you a lot. One should not underestimate the amount of information that can be gleaned from simply plotting the detection function. It is possible for a model to converge but with rather “wacky” parameters, leaving a model that doesn’t resemble the data at all. Ensuring that such models are investigated (again, more on this in Improving the fit of detection functions) and excluded is another important step towards model selection.

Goodness of fit testing involves measuring discrepancies between the model’s description of the data and the data themselves. Goodness of fit tests give an **absolute** measure of model fit.

Three statistical tests are commonly used to assess goodness of fit for detection functions, these are: *χ*^{2}, Kolmogorov-Smirnov and Cramer-von Mises tests. The latter two tests are based on quantifying the results of quantile-quantile plots (Q-Q plots), so we’ll talk about them first.

A Q-Q plot allows us to graphically assess how similar samples from two distributions are. To judge goodness of fit for detection functions, we are interested in comparing the cumulative distribution function (CDF) and the empirical distribution function (EDF). The CDF (of the detection function) evaluates the probability of observing an animal at a distance less than or equal to a given value (*x*, say); we usually denote the CDF as *F*(*x*). The EDF gives the proportion of observed distances less than or equal to a given value. In other words, we’re judging whether the number of observations up to a given distance is in line with what the model says it should be. The “given values” that we use are the observed distances. If our model fits well then we should have agreement between these two measures, so plotting them against each other should lead to a straight line at *y* = *x*. Points above the line indicate that the model thinks there should be more observations than there were; points below the line indicate that there were more observations than the model thinks.

We can plot a Q-Q plot for our fitted detection functions using the `gof_tests` function in `Distance2`:

`gof_tests(hn.df)`

```
## Goodness of fit testing for distance sampling model
##
## Kolmogorov-Smirnov test:
## Test statistic : 0.16713
## p-value : 0.14472
## Cramer-von Mises test:
## Test statistic : 0.17416
## p-value : 0.32367
```

The function also prints out statistics for the two tests we’ll investigate next. First let’s look at the plot. The grey line indicates the *y* = *x* line; we can see that there seem to be rather more points below that line than above, which indicates that we may have a problem with our model fit. To investigate this further we need to formalise what we mean by “far from the line *y* = *x*”, which we’ll achieve by looking at two statistical tests.

We can see from the above Q-Q plot that there are some deviations from the line *y* = *x*. The Kolmogorov-Smirnov test asks the question “what’s the largest vertical distance between a point and the *y* = *x* line?” It uses this distance as a statistic to test the null hypothesis that the samples (EDF and CDF in our case) are from the same distribution (and hence that our model fits well). If the deviation between the *y* = *x* line and the points is too large, we reject the null hypothesis and say the model doesn’t have a good fit. The *p*-value for the test is given in the `gof_tests` output and is 0.1447, telling us that the deviation is not too severe.

Rather than looking at the single biggest difference between the *y* = *x* line and the points in the Q-Q plot, we might prefer to take all the differences between line and points into account, since many small deviations may together be more telling than one large one. In this case the Cramer-von Mises test can help us. Its null hypothesis is the same, but the statistic it uses is based on the sum of the squared deviations of each point from the line. For the half-normal detection function we can see above that the *p*-value (of 0.3237) also doesn’t indicate that the deviations are too severe.
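To make the two statistics concrete, here is a toy base-R sketch (not `Distance2`’s internal implementation): given the CDF values of the observed distances under a model, the Kolmogorov-Smirnov statistic is the single largest EDF–CDF gap, while a Cramer-von Mises-style statistic accumulates the squared gaps across all points:

```
set.seed(1)
# pretend these are CDF values of 47 observed distances under the model;
# if the model were perfect they would look like a uniform sample
cdf_vals <- sort(runif(47))
edf_vals <- (1:47) / 47

ks_stat <- max(abs(edf_vals - cdf_vals))  # single largest deviation
cvm_ish <- sum((edf_vals - cdf_vals)^2)   # all squared deviations
```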

To illustrate these ideas further, we can perform the goodness of fit testing for the hazard-rate model:

`gof_tests(hr.df, plot=FALSE)`

```
## Goodness of fit testing for distance sampling model
##
## Kolmogorov-Smirnov test:
## Test statistic : 0.11125
## p-value : 0.60588
## Cramer-von Mises test:
## Test statistic : 0.09027
## p-value : 0.63469
```

Note that using the `plot=FALSE` argument, we can prevent the plot from being shown and just print the test statistics and *p*-values. The output for the hazard-rate model gives us no evidence that the distributions of the CDF and EDF are not the same. The Cramer-von Mises test is more powerful (since we are using more information in calculating its statistic), so we should usually pay it more attention.

The next plot shows the Q-Q plot for the hazard-rate model, with the Kolmogorov-Smirnov test statistic highlighted in blue and the differences used in the calculation of the Cramer-von Mises statistic highlighted in red.


The results from the goodness of fit tests show that neither of our proposed detection functions lead to a particularly poor fit. We therefore go on to think about comparing the models to each other.

*χ*^{2} tests have been omitted here. They’re most appropriate for binned data, so we’ll come back to them when we look at binning distances later on.

Once we have a reasonable set of models, we want to be able to rank them in order of how well they represent the data. It is relatively easy to make a model fit the data too well (in theory at least, we can make our model for the detection function arbitrarily complex and reproduce the histogram of distances exactly). To avoid this, we can penalise our measure of model fit (the *log-likelihood*) by the number of parameters we use – making us justify the extra complexity by an appropriately-sized payoff in fit. Akaike’s Information Criterion (AIC) is the metric that we’ll use to perform model selection for the detection function. This ensures that we pick models that represent the data well while remaining parsimonious. In contrast to goodness of fit testing, AIC only gives a **relative** measure of model fit.

We see the AIC reported for the two models above in their summaries. We can also access the AIC via the `$AIC` element of the model object:

`hn.df$AIC`

`## [1] 842.3143`

`hr.df$AIC`

`## [1] 842.1967`

A smaller AIC is better, but if two models’ AICs are within 2 of each other and differ by only one parameter (as we see for the two detection functions above), we can discard the more complex model, as it only marginally decreases the AIC at the cost of one additional parameter: hardly parsimonious. On this basis we discard the hazard-rate model, leaving the half-normal as our chosen detection function at this point.

In Improving the fit of detection functions we’ll look at the `summarize_models` function in `Distance2`, which can be used to summarise results from many models in a table.

At this point it’s worth noting that we’re never interested in the absolute values of the AIC, but rather the difference between the best candidate model’s AIC and the others’. Articles will often include tables with a column labelled ΔAIC containing these differences. Burnham and Anderson (2002) is a thorough treatment of AIC in an ecological modelling context, as well as a more philosophical guide to statistical modelling.
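For the two dolphin models, such a ΔAIC column would simply be computed as:

```
# AIC values reported in the summaries above
aics <- c(hn = 842.3143, hr = 842.1967)
aics - min(aics)
# hn: 0.1176, hr: 0
```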

With a fitted detection function we can evaluate the probability of observing an animal given its distance from the transect (the definition of the detection function, *g*), but what we’d rather know is the probability of detecting an animal on average (since our aim is to make inference about the population of animals, rather than any single observation in particular). To do this we integrate distance out of the detection function – removing the conditioning on the distance.

Before we do that, it’s important to remember that in order to do distance sampling we have to make some assumption about the distribution of the animals with respect to the transects. If we design our survey well we can assume that animals are *uniformly distributed with respect to the transect* – meaning there are no “clumps” of animals at certain distances^{2}. We denote the distribution of animals with respect to the transect by *π*_{y}(*y*) in general. For line transects, we let *π*_{x}(*x*) = 1/*w* (since we are assuming the distribution is uniform between 0 and the truncation distance, *w*). We need to include this assumption in our calculation of average detectability.

Denoting the average probability of detection as $\hat{P_a}$, for line transects we can write^{3} :

$$
\hat{P_a} = \frac{1}{w}\int_0^w g(x;\boldsymbol{\theta}) \text{d}x
$$

where *w* denotes the truncation distance (which we set to 8000m). For the pantropical dolphins we obtain $\hat{P_a}=$ 0.7231 (see also the output of the `summary`, under `Average p`/`Estimate`).
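We can check this figure by hand in base R, plugging the half-normal formula and the back-transformed *σ* from the summary into `integrate` (a sketch of the calculation, not how `Distance2` computes it internally):

```
# half-normal detection function, as defined earlier
g_hn <- function(x, sigma) exp(-x^2 / (2 * sigma^2))

w <- 8000               # truncation distance in metres
sigma <- exp(8.579708)  # scale from the summary, back-transformed

# average probability of detection: (1/w) * integral of g from 0 to w
p_a <- integrate(g_hn, lower = 0, upper = w, sigma = sigma)$value / w
p_a
# approximately 0.723, matching Average p in the summary
```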

These average probabilities of detection are used to calculate abundance estimates as we’ll see in Abundance estimation.

So far we’ve looked at the dolphin data from the Gulf of Mexico, a line transect survey. Now let’s look at point transect data and show that fitting a detection function to such data is not very different.

We’ll be looking at data consisting of 1485 observations of amakihi (*Hemignathus virens*), a bird species native to Hawaii^{4}. Data were collected as part of a larger survey (of many species) on the island of Hawaii. The surveys were performed over 7 periods between July 1992 and April 1995. There were 41 point transects, although they were not all surveyed every time.

As for the dolphins, we can again load the data, which is bundled with `Distance2`:

`data(amakihi)`

Plotting the histogram of observed distances:

`hist(amakihi$distance, main="", xlab="Distance (m)")`

The histogram is the shape we saw in the previous chapter: an increase up to a point (due to the geometry of the sampler), then a drop-off after that point. We also notice that, unlike the dolphin example, the observed distances trail off rather more gradually, with a few distances greater than 200m (the median observed distance is 45m). We probably don’t want to model the long tail in the distances, as these observations are likely outliers, so we will *truncate* the data. Before we do, let’s see what happens when we don’t truncate.

`hn.df.notrunc <- ds(amakihi, transect="point")`

```
## Warning in check_truncation(truncation, data): No truncation supplied,
## using maximum distance
```

Notice two differences here in our call to `ds`: we supply `transect="point"` to tell `ds` that our model is of point transect data, and we didn’t include a `truncation`. When we don’t supply a truncation distance, `ds` uses the largest observed distance and warns us that this is happening.

We can again look at summary information for the fitted model:

`summary(hn.df.notrunc)`

```
## Summary of fitted detection function
## Transect type : point
## Number of observations : 1485
## Distance range : 0 - 250
## AIC : 14201.9
##
## Detection function : Half-normal
##
## Detection function parameters
## Estimate SE
## (Intercept) 3.742683 0.009661874
##
## Kolmogorov-Smirnov p-value : 1.346919e-11
## Cramer-von Mises p-value : 1.236632e-05
##
## Estimate SE CV
## Average p 0.05701686 0.001101779 0.01932374
```

We estimate $\hat{P_a}=$ 0.057. This time we use *π*_{r}(*r*) = 2*r*/*w*^{2}, so

$$
\hat{P_a} = \frac{2}{w^2}\int_0^w r g(r;\boldsymbol{\theta}) \text{d}r,
$$

for point transects.
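Again we can verify the reported figure by direct integration, back-transforming the intercept from the summary above (a sketch, assuming the fitted scale is *σ* = exp(3.742683)):

```
# half-normal detection function, as defined earlier
g_hn <- function(r, sigma) exp(-r^2 / (2 * sigma^2))

w <- 250                # truncation: the largest observed distance
sigma <- exp(3.742683)  # scale from the summary, back-transformed

# point transect version: weight by r to account for survey geometry
p_a <- 2 / w^2 * integrate(function(r) r * g_hn(r, sigma), 0, w)$value
p_a
# approximately 0.057, matching Average p in the summary
```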

As for line transects we can plot the resulting model along with goodness of fit information:

```
par(mfrow=c(1,2))
plot(hn.df.notrunc, xlab="Distance (m)", main="")
gof_tests(hn.df.notrunc)
```

```
## Goodness of fit testing for distance sampling model
##
## Kolmogorov-Smirnov test:
## Test statistic : 0.09307
## p-value : 0
## Cramer-von Mises test:
## Test statistic : 3.99489
## p-value : 1e-05
```

We can see that many points lie below the line *y* = *x*, indicating that the model thinks there should be fewer observations up to each distance than were observed (the *p*-values of the goodness of fit tests also reflect this). This, combined with the detection function going to zero well before the truncation distance (at a little less than 150m, while the largest observed distance was 250m), suggests the model fits poorly.

Note that the histogram plotted with the detection function here has been rescaled to take into account the increasing area searched with increasing radial distance. We can plot the fitted model’s probability density function with the unscaled histogram by supplying the `pdf=TRUE` option to `plot`:

`plot(hn.df.notrunc, xlab="Distance (m)", pdf=TRUE, main="")`

We can see the density for distances greater than 150m is very small.

Given the poor goodness of fit results, we can truncate the data to improve the fit of the detection function. A perennial question in distance sampling is “at what distance do I truncate?”

We’d rather not throw away too much of our (or our field workers’) hard-earned data – only enough to improve the fit of our model. There is no general rule for when and where to truncate; however, Buckland et al. (2001) suggest for line transects choosing a truncation such that *g*(*w*) ≈ 0.15 (p. 103) and, for point transects (due to the aforementioned geometry considerations), choosing the truncation such that *g*(*w*) ≈ 0.1 (p. 151). These rules of thumb may or may not make sense in a given situation; for example, for the pantropical dolphins we effectively didn’t truncate at the analysis stage, as it appeared that truncation had occurred during data collection (there are no “outliers” as there are for the amakihi data).
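As a rough illustration, the half-normal formula can be inverted to find the distance at which *g*(*w*) hits a target value. Using the *σ* from the untruncated amakihi fit above is something of a circular shortcut (that *σ* would itself change with truncation), so treat this only as a starting point:

```
# invert g(w) = exp(-w^2 / (2 sigma^2)) = p to solve for w
w_for_g <- function(p, sigma) sigma * sqrt(-2 * log(p))

sigma <- exp(3.742683)  # from the untruncated half-normal fit above
w_for_g(0.1, sigma)     # point transect rule of thumb, about 91m
```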

A typical procedure to ensure that the truncation used is “right” is to fit models with several different truncation distances, checking that the truncation you choose gives good goodness of fit results and few (or no) outliers.

We’ll compare two further truncation distances: one at 150m, and one at 82.5m as used in Marques et al. (2007).

```
hn.df.trunc.150 <- ds(amakihi, transect="point", truncation=150)
hn.df.trunc.82.5 <- ds(amakihi, transect="point", truncation=82.5)
```

Having fitted the models, we can again compare summaries and goodness of fit, along with detection function plots.

`summary(hn.df.trunc.150)`

```
## Summary of fitted detection function
## Transect type : point
## Number of observations : 1474
## Distance range : 0 - 150
## AIC : 13898.02
##
## Detection function : Half-normal
##
## Detection function parameters
## Estimate SE
## (Intercept) 3.707559 0.01210836
##
## Kolmogorov-Smirnov p-value : 3.466577e-08
## Cramer-von Mises p-value : 5.69763e-07
##
## Estimate SE CV
## Average p 0.1474673 0.003543475 0.02402889
```

`summary(hn.df.trunc.82.5)`

```
## Summary of fitted detection function
## Transect type : point
## Number of observations : 1243
## Distance range : 0 - 82.5
## AIC : 10833.84
##
## Detection function : Half-normal
##
## Detection function parameters
## Estimate SE
## (Intercept) 3.580267 0.02011161
##
## Kolmogorov-Smirnov p-value : 0.0003152646
## Cramer-von Mises p-value : 0.003577949
##
## Estimate SE CV
## Average p 0.3514386 0.01127421 0.03208017
```

Although there is not a large change in the parameter estimate between the two models, this (along with the change in truncation) does lead to a large difference in the average detectability (and therefore in any abundance estimates). Note that we can’t compare the AICs of models with different truncations, as the models are fitted to different data.

We can see that the quantile-quantile plot is much improved when truncation is set at 82.5m (right side): the points are closer to the line *y* = *x*. Looking at the goodness of fit test results, though, we still don’t see a very compelling story for this model.

`gof_tests(hn.df.trunc.82.5, plot=FALSE)`

```
## Goodness of fit testing for distance sampling model
##
## Kolmogorov-Smirnov test:
## Test statistic : 0.05934
## p-value : 0.00032
## Cramer-von Mises test:
## Test statistic : 0.93083
## p-value : 0.00358
```

At this point we’ll stop looking at the truncation and turn back to detection function model selection and compare the half-normal with hazard-rate model, keeping the truncation at 82.5m.

```
hr.df.trunc.82.5 <- ds(amakihi, transect="point", truncation=82.5, model=df(model=~hr))
summary(hr.df.trunc.82.5)
```

```
## Summary of fitted detection function
## Transect type : point
## Number of observations : 1243
## Distance range : 0 - 82.5
## AIC : 10807.55
##
## Detection function : Hazard-rate
##
## Detection function parameters
## Estimate SE
## (Intercept) 3.4545394 0.06311553
## shape 0.8342906 0.06533068
##
## Kolmogorov-Smirnov p-value : 0.06280827
## Cramer-von Mises p-value : 0.3344041
##
## Estimate SE CV
## Average p 0.3285788 0.02013387 0.06127561
```

We can see that this model has a much better AIC than the half-normal model (hazard-rate: 10807.55, versus 10833.84 for the half-normal). Plotting the detection function and performing goodness of fit tests indicates that this model is much better than the half-normal.

```
## Goodness of fit testing for distance sampling model
##
## Kolmogorov-Smirnov test:
## Test statistic : 0.03731
## p-value : 0.06281
## Cramer-von Mises test:
## Test statistic : 0.16984
## p-value : 0.3344
```

This chapter has shown how detection functions are fitted in `Distance2` and how to select a detection function. We covered both line and point transects, as well as how to select the truncation distance.

In the next chapter we’ll look at different models for the detection function and how we can use additional data to model detectability when it relies on other variables as well as distance.

Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L. and Thomas, L. (2001) *Introduction to Distance Sampling*. Oxford University Press.

Burnham, K.P. and Anderson, D.R. (2002) *Model Selection and Multimodel Inference*. Springer Science & Business Media.

Cox, M.J., Borchers, D.L., Demer, D.A., Cutter, G.R. and Brierley, A.S. (2011) Estimating the density of Antarctic krill (Euphausia superba) from multi-beam echo-sounder observations using distance sampling methods. *Journal of the Royal Statistical Society-Series C Applied Statistics*, **60**, 301–316.

Marques, T.A., Buckland, S.T., Bispo, R. and Howland, B. (2012) Accounting for animal density gradients using independent information in distance sampling surveys. *Statistical Methods & Applications*, **22**, 67–80.

Marques, T.A., Buckland, S.T., Borchers, D.L., Tosh, D. and McDonald, R.A. (2010) Point Transect Sampling Along Linear Features. *Biometrics*, **66**, 1247–1255.

Marques, T.A., Thomas, L., Fancy, S. and Buckland, S.T. (2007) Improving estimates of bird density using multiple-covariate distance sampling. *The Auk*, **124**, 1229–1243.

Rexstad, E.A., Miller, D., Flather, C., Anderson, E., Hupp, J. and Anderson, D.R. (1988) Questionable multivariate statistical inference in wildlife habitat and community studies. *The Journal of Wildlife Management*, **52**, 794–798.

For convenience the data are bundled in an R-friendly format, although all of the code necessary for creating the data from the Distance project files is available at github.com/dill/mexico-data. The original OBIS-SEAMAP page for the data may be found at the SEFSC GoMex Oceanic 1996 survey page.↩

One such problem arises when line transects are positioned to run parallel to some geographical feature that attracts animals (e.g. hedgerows). A number of articles address issues arising from such surveys by modelling the effect of the feature (Marques et al., 2010; Cox et al., 2011; Marques et al., 2012).↩

More generally, we’d write $\hat{P_a} = \int_0^w g(y;\boldsymbol{\theta}) \pi_y(y) \text{d}y$.↩

Data are again bundled with `Distance2`. Thanks go to Steven Fancy of the National Park Service for providing the data. A more detailed analysis is provided in the associated paper (Marques et al., 2007).↩