The half-normal and hazard-rate detection functions seen in the previous chapters appear adequate for the datasets we’ve addressed so far, but there are a number of other modelling options available for the detection function. This chapter gives an overview of these models and how they can be fitted using Distance2. We also revisit model selection and look at the function summarize_models, which builds a comparison table of fitted models, before looking at common fitting problems and how to work around them.

## Additional models for the detection function

Here we’ll look at two classes of detection function: key plus adjustment (K+A) models (also known as key function plus adjustment terms models) and mixture models. Both of these classes fit into the AIC model selection framework discussed so far and can be tested for goodness of fit, so it’s simply a case of understanding their formulation and how to fit them in Distance2.

Both the half-normal and hazard-rate detection functions are somewhat limited in their flexibility. A simple adaptation of these detection functions is to consider them to be “key” functions which can be augmented with one or more “adjustment” terms (Buckland, 1992; Buckland et al., 2001). The general form of these models mathematically is:
$$g(y) = \frac{k(y)\left[ 1 + \sum_{j=2}^J \alpha_j a_j(y)\right]}{k(0)\left[ 1 + \sum_{j=2}^J \alpha_j a_j(0)\right]}$$
where we are modelling our detection function $g(y)$ with the key function $k(y)$ (e.g. half-normal) and $J$ adjustment functions $a_j(y)$, each multiplied by a corresponding coefficient $\alpha_j$ which we’ll estimate. The denominator (which is simply the numerator evaluated at zero distance) ensures that detection is certain at zero distance ($g(0) = 1$). Different $a_j(y)$s are considered to be appropriate for each key function.
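To make the formula concrete, here is a minimal base-R sketch (purely illustrative, not Distance2 internals) of a half-normal key with a single cosine adjustment of order 2; the truncation distance w, scale sigma and coefficient alpha are made-up values:

```r
# Half-normal key with one cosine adjustment of order 2, following the
# K+A formula above. All parameter values here are made up for illustration.
w <- 8000                                              # right truncation distance
keyfn <- function(y, sigma) exp(-y^2 / (2 * sigma^2)) # half-normal key k(y)
adjfn <- function(y, order) cos(order * pi * y / w)   # cosine adjustment a_j(y)

g <- function(y, sigma, alpha) {
  num <- keyfn(y, sigma) * (1 + alpha * adjfn(y, 2))
  den <- keyfn(0, sigma) * (1 + alpha * adjfn(0, 2))
  num / den
}

g(0, sigma = 3000, alpha = 0.3)  # 1: detection is certain at zero distance
```

The division by the numerator evaluated at zero distance is what guarantees $g(0) = 1$ regardless of the value of the adjustment coefficient.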

In general we add up to J adjustments; this means that adjustments of order 2, …, J are included.¹ It is rarely the case that a model with more than 3 adjustment terms is selected by AIC.

Which key and adjustment combinations (and which particular orders) are appropriate is decided by whether the key and adjustments are orthogonal. In simple terms, orthogonality mathematically describes functions that don’t “overlap”; we’d like the functions not to overlap because we want each adjustment to contribute something “new” to the detection function (rather than adding to a part of the shape that’s already present).
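As a small numerical illustration of this idea in base R: cosine terms of different orders are orthogonal, in the sense that their product integrates to zero over the (rescaled) truncation interval, so each order can contribute something “new”:

```r
# Numerical check that cosine adjustment terms of different orders are
# orthogonal over the rescaled truncation interval [0, 1].
a <- function(y, order) cos(order * pi * y)

both <- function(y) a(y, 2) * a(y, 3)  # product of order-2 and order-3 terms
integrate(both, 0, 1)$value            # approximately 0
```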

There is some controversy in the distance sampling community over whether one should include covariates in K+A models. Leaving aside the more philosophical issue of whether one should include covariates in a model that includes adjustments, we are left with an issue of practicality. It’s highly likely that the model for the detection function will become overparameterised in this case, so in general the use of adjustments when covariates are used in the model is not encouraged. There are additional considerations covered in the Monotonicity and Model selection sections below.

#### Half-normal K+A models

When using a half-normal key, two types of adjustment are considered appropriate: cosine series and Hermite polynomials.² We can look at these functions graphically:

These two models can be accessed in Distance2 via the model= argument to df(). For example looking at the Gulf of Mexico dolphin data again:

# half-normal with cosine adjustments of order 2
dolphins.hn_cos <- ds(mexdolphins, truncation=8000, df(model=~hn+cos(2)))
# half-normal with Hermite adjustments of order 2
dolphins.hn_hermite <- ds(mexdolphins, truncation=8000, df(model=~hn+hermite(2)))
Plotting these detection functions along with our half-normal model from the previous chapter, we can see that the Hermite polynomial adjustments have had no effect on the fit, whereas the cosine adjustments have made the function rather too wiggly:

For the half-normal with Hermite polynomial detection function, the adjustment didn’t appear to have any effect on the shape of the detection function. We can investigate the estimated parameters of the model and see that the parameter that controls the influence of the Hermite adjustment is very small:

#### Monotonicity

Plots of the models above show a common property amongst the fitted detection functions: they always decrease with increasing distance (they are monotonically decreasing). This is no coincidence; in fact it takes some computational effort to ensure that this happens. In this section we’ll first examine why monotonicity is a reasonable request to make of your detection functions, then go on to show what can go wrong when detection functions are non-monotonic.

Monotonicity makes sense because we are much more likely to see animals near us than further away. We may see “bumps” in the histogram of distances further out, but we assume this is just noise in the data; our model is that we see fewer objects as their distance from the transect increases. If there is some very large bump in the histogram, we suspect either that there was some problem during the survey (some feature caused animals to appear to “clump together” at a certain distance) or that the detection function is overfitting to random variation in the data (by chance there are more observations at one further distance than at others). One cause of the former is a transect surveyed parallel to some geographical feature that the animals are interested in: for example, if a transect is a road running through fields, then birds may sit on the fences and be less observable in the fields in between.

By default Distance2 will try to constrain those functions which can become non-monotonic during fitting. To see what can go wrong, we can use the monotonicity control option to force the fitting routine not to constrain the function to be monotonic:

dolphins.hn_cos_nomono <- ds(mexdolphins, truncation=8000, df(model=~hn+cos(2)), control=list(monotonicity=FALSE))

summary(dolphins.hn_cos_nomono)
## Summary of fitted detection function
## Transect type          : line
## Number of observations : 47
## Distance range         : 0 - 8000
## AIC                    : 841.118
##
## Detection function     : Half-normal + cosine(2) adjustments
##
## Detection function parameters
##              Estimate        SE
## (Intercept) 8.5842206 0.2327356
## cos(2)      0.3699387 0.1962412
##
## Kolmogorov-Smirnov p-value : 0.6798172
## Cramer-von Mises p-value   : 0.7620765
##
##            Estimate       SE        CV
## Average p 0.5240601 0.147298 0.2810708

Comparing this summary to the previous one above (when monotonicity is enforced), we can see that there is a difference in the coefficient estimated for the cosine part of the model. We can also look at a plot of the model to make the problem more obvious.

plot(dolphins.hn_cos_nomono)

We can also find out whether and where detection functions are non-monotonic using the df_check_mono() function in Distance2. We run the function on a fitted detection function object; if the function is monotonic it will return TRUE. If the function is non-monotonic, df_check_mono will return FALSE and give warnings explaining how the non-monotonicity has manifested itself.

Using the above analysis to illustrate the checking function:

df_check_mono(dolphins.hn_cos)
## Detection function is not strictly monotonic!
## [1] FALSE
df_check_mono(dolphins.hn_cos_nomono)
## Detection function is not strictly monotonic!
## [1] FALSE
We can include the plot=TRUE argument to show where the non-monotonicity occurs:

So in the former case we don’t have to be too concerned about the non-monotonicity in the function; the latter case, however, shows a much more seriously non-monotonic function, and we should discard that model.

As mentioned above, fitting the detection function so that its shape is constrained is a computationally complicated task. In Distance2 we simply take a set of distances between the left and right truncation points and evaluate the detection function at each of them. The optimiser then checks that as distance increases, the value of the detection function decreases; parameter values that lead to an increasing (non-monotonic) shape are discarded. This method is not foolproof, as the number of distances at which we evaluate the detection function affects whether the algorithm “knows” that the detection function is non-monotonic. Increasing the number of points leads to a longer model fitting time, since we are doing more calculations and comparisons.
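The grid-based idea can be sketched in a few lines of base R (a simplified stand-in for what the package does, not its actual code):

```r
# Evaluate a detection function over a grid of distances and flag any
# increase from one grid point to the next (a simplified monotonicity check).
check_mono <- function(detfn, left = 0, right = 1, n_pts = 100, tol = 1e-8) {
  grid <- seq(left, right, length.out = n_pts)
  vals <- detfn(grid)
  all(diff(vals) <= tol)  # TRUE if the function never (appreciably) increases
}

# A plain half-normal passes the check...
check_mono(function(y) exp(-y^2 / (2 * 0.25^2)))                     # TRUE
# ...but a half-normal with a large cosine adjustment need not:
check_mono(function(y) exp(-y^2 / 2) * (1 + 0.8 * cos(2 * pi * y)))  # FALSE
```

As in the text, the resolution of the grid (n_pts here) controls both how reliably non-monotonicity is detected and how much computation is required.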

Returning to the considerations above regarding the addition of adjustments to models which contain covariates, it’s worth noting that checking monotonicity gets much harder when covariates are included. df_check_mono simply evaluates g(y) over a grid and checks that the values always decrease going from left to right, but for a covariate model there are effectively as many functions to check as there are observed values of the covariates (or rather, unique combinations of them). This is relatively straightforward when there is a single factor with a few levels, but quickly becomes a computational nightmare.

Fortunately, as we’ll see next, there is another way to formulate detection functions that ensures they are monotonic by construction.

### Mixture models

An alternative way to formulate the detection function is by use of mixture models. A mixture model takes advantage of the mathematical fact that a (non-negatively) weighted sum of monotonically decreasing functions is itself monotonically decreasing – this works around the above monotonicity issues in a simple way.

Miller and Thomas (2015) developed a mixture model detection function using a sum of half-normal functions:
$$g(y) = \sum_{j=1}^J \phi_j \exp\left(-\frac{y^2}{2 \sigma_j^2}\right)$$
where we refer to the number of components or points in such a model and denote this $J$. Each of the $J$ components is weighted by a $\phi_j$, which controls the amount of influence that component has in the model. We impose the condition that the $\phi_j$s must sum to one, which also ensures that $g(0) = 1$.
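A minimal base-R sketch of this formula (with illustrative parameter values, not fitted to any data) shows the monotonicity guarantee directly:

```r
# 2-point half-normal mixture: a weighted sum of half-normal components.
# The weights phis must sum to one, which also gives g(0) = 1.
g_mix <- function(y, sigmas, phis) {
  stopifnot(isTRUE(all.equal(sum(phis), 1)))
  rowSums(sapply(seq_along(phis),
                 function(j) phis[j] * exp(-y^2 / (2 * sigmas[j]^2))))
}

y <- seq(0, 100, length.out = 50)
gy <- g_mix(y, sigmas = c(10, 40), phis = c(0.5, 0.5))
gy[1]               # 1: detection is certain at zero distance
all(diff(gy) <= 0)  # TRUE: the mixture is monotonically decreasing
```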

Plotting a mixture model detection function gives some intuition about how the model works:

In the above plot the two dashed lines are added (with equal weighting of 1/2 each) to produce the bold line. Because each component is monotonic, the bold line is guaranteed to be monotonic too.

Since even the simplest useful mixture model detection function (a 2-point mixture) uses 3 parameters, it’s recommended that this formulation be used for relatively large data sets. For that reason we’ll switch to the amakihi data we saw in the previous chapter to illustrate how to fit these models:

data(amakihi)
# again, we'll truncate at 82.5m
# 1-point mixture -- just a half-normal model
amakihi.hn_mix1 <- ds(amakihi, transect="point", truncation=82.5, model=df(~hn+mix(1)))
# 2-point mixture
amakihi.hn_mix2 <- ds(amakihi, transect="point", truncation=82.5, model=df(~hn+mix(2)))
# 2-point mixture with observer as a covariate
amakihi.hn_mix2_obs <- ds(amakihi, transect="point", truncation=82.5, model=df(~hn+mix(2), scale=~obs))
We can again plot the resulting models:

Note that a “1-point” mixture is simply a half-normal detection function as described in Models for detectability.

### Model selection

At this point we have quite a lot of models (7 for the dolphins and 3 for the amakihi) and it would be nice to return to the topic of model selection. Fortunately, all of the models described here fit into the AIC framework that we saw in the previous chapters. We can also use the goodness of fit procedures that we’ve seen so far.

Ignoring goodness of fit for now, we can use the summarize_models function in Distance2 to create a table of model results (sorted by AIC) for easy comparison. For example, for the models of dolphins we can produce the results table by simply giving the function the models we wish to consider as arguments.

summarize_models(dolphins.hn_cos, dolphins.hn_hermite, dolphins.hr_cos, dolphins.hr_poly, dolphins.fourier1, dolphins.fourier12)
##                         Model Formula # pars       P_a   CV(P_a)      AIC
## 1     Half-normal + cosine(2)      ~1      2 0.5498216 0.2997429 841.4398
## 5         Uniform + cosine(1)      ~1      1 0.6996426 0.1889087 841.8044
## 6       Uniform + cosine(1,2)      ~1      2 0.6263006 0.2769566 842.6838
## 3     Hazard-rate + cosine(2)      ~1      3 0.5492290 0.2979662 843.2615
## 4 Hazard-rate + polynomial(2)      ~1      3 0.5759385 0.6443184 844.1967
## 2    Half-normal + Hermite(2)      ~1      2 0.7230932 0.2744102 844.3143

Note that the numbering of rows corresponds to the order in which the models were passed to summarize_models. Columns are as follows:

• Model gives a short description of fitted model (this may be ambiguous so the row numbers may be helpful in working out which model is which).
• Formula describes the covariate model (just ~1 when there are no covariates).
• # pars gives the number of parameters in the model.
• P_a lists the average probability of detection.
• CV(P_a) gives the coefficient of variation of average probability of detection giving an indication of uncertainty in the model (more on this in How certain are we in our estimates?).
• AIC finally lists Akaike’s information criterion for the model.
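As a toy illustration of the sorting, here is a miniature version of such a table built by hand in base R (using AIC and P_a values from the table above, rather than calling summarize_models):

```r
# Build a small results table by hand and sort it by AIC,
# as summarize_models does for fitted models.
results <- data.frame(
  Model = c("Half-normal + cosine(2)", "Half-normal + Hermite(2)",
            "Uniform + cosine(1)"),
  P_a   = c(0.5498216, 0.7230932, 0.6996426),
  AIC   = c(841.4398, 844.3143, 841.8044)
)
results[order(results$AIC), ]  # smallest (best) AIC first
```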

We can do the same thing for the detection functions fitted to the amakihi data too:

summarize_models(amakihi.hn_mix1, amakihi.hn_mix2, amakihi.hn_mix2_obs)
##                         Model Formula # pars       P_a    CV(P_a)      AIC
## 3 2-point half-normal mixture    ~obs      5 0.2788664 0.05789090 10778.69
## 2 2-point half-normal mixture      ~1      3 0.2834901 0.06204498 10805.48
## 1                 Half-normal      ~1      1 0.3514386 0.03208017 10833.84

The Formula column now includes ~obs for the model fitted with observer as a covariate.

These tables can provide a useful summary when many models have been fitted, but they don’t excuse the investigator from performing the model checking and plotting described previously. They are a convenient utility, not a fast track to completing an analysis.

As mentioned above, we don’t recommend adding adjustments to models that include covariates, as the combination can lead to detection probabilities greater than 1 at some distances and makes it difficult to diagnose monotonicity problems. A further issue is that in general we don’t expect detection function models to have many parameters (usually models with fewer than 5 parameters are enough to capture the shape of the detection function). Although this “parameter hungry” argument can also be levelled against mixture model detection functions, they at least do not suffer from the additional issue of implausible function shapes.

## Model fitting problems

There are times when one can write the code to fit a model in R but the model will not fit: one might receive error messages about convergence failures or invalid parameter values. These errors can happen for a variety of reasons and a comprehensive treatment is impossible, but here are some possible causes of, and fixes for, model fitting problems.

• Truncation: Model fitting can often fail if the truncation distance is too large, as there may be no value of the model parameters that accommodates all of the data that were collected. Decreasing the truncation distance and refitting the model may allow the model to be fitted.
• Too many parameters: When including covariates in the model the number of parameters to estimate can often be large (especially when a factor with many levels is included). In this case fitting the simplest model and increasing its complexity one covariate at a time may help. One can also use the parameter values from the previous model fit as starting values for the more complex model, which at least gives the optimizer a good starting place (see ?starting_values for more information on how to do this in Distance2).
• Optimization algorithm: The default optimizer used by ds is relatively robust but it may be necessary to use additional options to allow a larger space of parameter values to be explored. A useful option is to supply the following option to ds: control=list(optimx.method=c("SANN","nlminb")) which adds "SANN" to the optimizers that ds will use (normally only nlminb is used). SANN uses simulated annealing (Press et al., 1990), which may allow us to explore more of the parameter space.
• Covariate scaling: Covariates are often collected on very different scales; for example Beaufort sea state will usually vary between 0 and 5, but time of day might be measured in minutes from a set point, so could be in the hundreds. These different scales can cause problems for the optimizer, so by default ds will attempt to rescale all the covariates by multiplying each by that covariate’s standard deviation divided by the standard deviation of the distances. This usually works fairly well, but one might need to rescale manually in some situations. In this case the control option parscale can be set, e.g. by supplying control=list(parscale=c(1,0.01,1)) to ds to perform the rescaling.
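The covariate rescaling in the last point can be sketched as follows (an illustration of the scaling rule described above, with made-up data; the real calculation happens inside ds):

```r
# Rescale a covariate by its standard deviation divided by the standard
# deviation of the observed distances (made-up illustrative data).
rescale_covariate <- function(z, distances) z * (sd(z) / sd(distances))

distances <- c(12, 340, 25, 800, 410, 95)  # observed distances
minutes   <- c(5, 600, 45, 720, 300, 150)  # time of day, in minutes
rescaled  <- rescale_covariate(minutes, distances)
sd(rescaled) / sd(minutes)                 # the scale factor that was applied
```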

The above are the most common ways in which a model can fail to fit, though there are many other potential pitfalls. Building detection functions in small increments of complexity can make diagnosing these issues significantly easier.

It is also worth noting at this point that although convergence issues can cause the model to fail to fit entirely, it is also possible that the model will fit but the parameters the optimizer finds are not optimal. One indication that parameters may not be optimal is large uncertainty associated with them (which can be found in the model summary output); in this case, following the above advice on setting starting values and choosing an optimizer may help.

## Recap

In this chapter we looked at additional models for the detection function that can be used in a variety of different situations when simple half-normal or hazard-rate detection functions are not flexible enough. All these models can be selected by AIC and checked for goodness of fit via Kolmogorov-Smirnov and Cramer-von Mises tests. The summarize_models function then allows one to easily review fitted models. Finally we looked at what can be done when models fail to fit. In the next chapter we’ll look at how we can estimate abundance once we’ve successfully fitted and selected our detection function.

The models described above expand the set of possibilities considerably beyond those from the previous chapters. There are yet more models for the detection function to consider, though unfortunately there is not enough space here to give details on them all. A brief summary:

• Gamma and 2-part normal detection functions are useful when we can assume that detectability is certain at some point but that point is not at zero distance. This can happen during aerial surveys when observers struggle to see what is directly below them but can spot animals at a given distance with certainty. Gamma detection functions are covered in Becker and Quang (2009).
• Exponential power series is a more flexible alternative to the hazard-rate (Otto and Pollock, 1990).
• Negative exponential: can be useful when very spiked data are collected but usually is only recommended to salvage a survey gone badly wrong (as the negative exponential has no “shoulder”). A mixture model approach is usually more reliable when data are very spiked (see e.g. Buckland, 1992; Buckland et al., 2001; Eidous, 2011).

The original literature on these models is spread across a number of papers.

• Key function plus adjustment-type models were first proposed by Buckland (1992) and further elaborated on in Buckland et al. (2001).
• Mixture model detection functions were proposed by Miller and Thomas (2015).

## References

Becker, E. and Quang, P.X. (2009) A gamma-shaped detection function for line-transect surveys with mark-recapture and covariate data. 14, 207–223.

Buckland, S.T. (1992) Fitting density functions with polynomials. Applied Statistics, 41, 63–76.

Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L. and Thomas, L. (2001) Introduction to Distance Sampling. Oxford University Press.

Eidous, O.M. (2011) A Semi-Parametric Model for Line Transect Sampling without the Shoulder Condition. 1–15.

Miller, D.L. and Thomas, L. (2015) Mixture models for distance sampling detection functions. PLoS ONE.

Otto, M.C. and Pollock, K.H. (1990) Size Bias in Line Transect Sampling: A Field Test. Biometrics, 46, 239–245.

Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T. (1990) Numerical recipes. Cambridge University Press.

1. It doesn’t really make sense to include only higher-order terms, i.e. to ignore order 2 and include only order 3 rather than both 2 and 3. As the adjustment order increases the complexity of the adjustment increases, and the terms are mutually orthogonal.

2. Further mathematical details of Hermite polynomials can be found in many textbooks, Wolfram’s page is a good starting point.