The half-normal and hazard-rate detection functions seen in the previous chapters appear adequate for the datasets we’ve addressed so far, but there are a number of other modelling options available for the detection function. This chapter will give an overview of these models and how they can be fitted using Distance2. We also revisit model selection and look at the function summarize_models, which can build a table of fitted models, before looking at common fitting problems and how to work around them.

Additional models for the detection function

Here we’ll look at two classes of detection function: key plus adjustment (K+A; also known as key plus adjustment terms) models and mixture models. Both of these classes fit into the AIC model selection framework discussed so far and can be tested for goodness of fit, so it’s a case of understanding their formulation and how to fit them in Distance2.

Key plus adjustment models

Both the half-normal and hazard-rate detection functions are somewhat limited in their flexibility. A simple adaptation of these detection functions is to consider them to be “key” functions which can be augmented with one or more “adjustment” terms (Buckland, 1992; Buckland et al., 2001). The general form of these models mathematically is:
$$ g(y) = \frac{k(y)\left[ 1 + \sum_{j=2}^J \alpha_j a_j(y)\right]}{k(0)\left[ 1 + \sum_{j=2}^J \alpha_j a_j(0)\right]} $$
where we are modelling our detection function $g(y)$ with the key function $k(y)$ (e.g. half-normal) and adjustment functions $a_j$, each multiplied by a corresponding coefficient $\alpha_j$ that we’ll estimate. The denominator (which is simply the numerator evaluated at zero distance) ensures that detection is certain at zero distance ($g(0) = 1$). Different $a_j(y)$ are considered appropriate for each key function.
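To make this formula concrete, here is a minimal sketch in base R (outside of Distance2, with arbitrary illustrative parameter values, not fitted estimates) that evaluates a half-normal key with a single order-2 cosine adjustment:

# evaluate g(y) for a half-normal key plus one cosine adjustment of order 2,
# following the key plus adjustment formula above
# (base R only; sigma and alpha2 are arbitrary values for illustration)
w <- 8000       # truncation distance
sigma <- 3000   # half-normal scale parameter
alpha2 <- 0.2   # coefficient of the order-2 cosine adjustment
key <- function(y) exp(-y^2/(2*sigma^2))
adj <- function(y) cos(2*pi*y/w)
g <- function(y) key(y)*(1 + alpha2*adj(y)) / (key(0)*(1 + alpha2*adj(0)))
g(0)     # 1 -- detection is certain at zero distance
g(4000)  # detection probability part-way out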

In general we add adjustments up to order $J$; this means that adjustments of order 2, …, $J$ are included¹. It is rarely the case that a model with more than 3 adjustment terms is selected by AIC.

Which key and adjustment combinations, and which particular orders, are appropriate is determined by whether the key and adjustments are orthogonal. In simple terms, orthogonality mathematically describes functions that don’t “overlap”; we’d like the functions not to overlap as we want each adjustment to contribute something “new” to the detection function (rather than adding to a part of the shape that’s already present).
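To give a rough numerical flavour of what orthogonality means here (purely illustrative base R, not part of Distance2), we can check that cosine adjustments of two different orders integrate to zero against each other over the truncation interval, i.e. they don’t “overlap”:

# cosine adjustments of orders 2 and 3 over (0, w)
w <- 8000
a2 <- function(y) cos(2*pi*y/w)
a3 <- function(y) cos(3*pi*y/w)
# product of different orders integrates to (approximately) zero...
integrate(function(y) a2(y)*a3(y), lower=0, upper=w)$value
# ...whereas a term against itself does not (gives w/2)
integrate(function(y) a2(y)^2, lower=0, upper=w)$value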

There is some controversy in the distance sampling community over whether one should include covariates in K+A models. Leaving aside the more philosophical issue of whether one should include covariates in a model that includes adjustments, we are left with an issue of practicality. It’s highly likely that the model for the detection function will become overparameterised in this case, so in general the use of adjustments when covariates are used in the model is not encouraged. There are additional considerations covered in the Monotonicity and Model selection sections below.

Half-normal K+A models

When using a half-normal key, two adjustments are considered appropriate: cosine series and Hermite polynomials². We can look at these functions graphically:
Figure 1: Half-normal key plus adjustment models. Left, with cosine adjustment terms of order 3; right, with Hermite polynomial adjustment terms of order 3. The solid black line shows the detection function, the solid grey line is the half-normal key and the dashed grey lines show the two adjustments (scaled and shifted to be on the same axes).

These two models can be accessed in Distance2 via the model= argument to df(). For example looking at the Gulf of Mexico dolphin data again:

# half-normal with cosine adjustments of order 2
dolphins.hn_cos <- ds(mexdolphins,truncation=8000, df(model=~hn+cos(2)))
# half-normal with Hermite adjustments of order 2
dolphins.hn_hermite <- ds(mexdolphins,truncation=8000, df(model=~hn+hermite(2)))
Plotting these detection functions along with our half-normal model from the previous chapter, we can see that the Hermite polynomial adjustment has had no effect on the fit, whereas the cosine adjustment has made the function rather too wiggly:
Figure 2: Detection function models fitted to the Mexico dolphin data, left to right: half-normal, half-normal with cosine adjustment of order 2, half-normal with Hermite polynomials of order 2.

For the half-normal with Hermite polynomial detection function, the adjustment didn’t appear to have any effect on the shape of the detection function. Looking at the estimated parameters of the model, we can see that the parameter that controls the influence of the Hermite adjustment is very small:

dolphins.hn_hermite$par
##   (Intercept)    hermite(2) 
##  8.579719e+00 -2.866084e-06

It therefore seems reasonable to rule out this model as we are estimating an extra parameter for no gain. The half-normal with cosine model (second panel) seems like a much better candidate.

Hazard-rate K+A models

It is also possible to combine the hazard-rate detection function with adjustment terms to lead to a more flexible fit for the detection function. We can again use cosine adjustments, just as we did with the half-normal detection function above. Instead of Hermite polynomials, we use simple polynomials.

Plotting an example of these two adjustments:
Figure 3: Hazard-rate key with adjustments. Left, with cosine adjustment terms of order 3; right, with simple polynomial adjustment terms of order 3. The solid black line shows the detection function, the solid grey line is the hazard-rate key and the dashed grey line shows the adjustment (rescaled to be on the same axes).

The syntax for the hazard-rate adjustment models is very similar to that of the half-normal models:

# hazard-rate with a cosine adjustment of order 2
dolphins.hr_cos <- ds(mexdolphins,truncation=8000, df(model=~hr+cos(2)))
# hazard-rate with a simple polynomial adjustment of order 2
dolphins.hr_poly <- ds(mexdolphins,truncation=8000, df(model=~hr+poly(2)))
We can again plot the resulting models:
Figure 4: Detection function models fitted to the Mexico dolphin data, left to right: hazard-rate, hazard-rate with cosine adjustment of order 2, hazard-rate with simple polynomial of order 2.

Fourier models

A final detection function model that fits into the “key plus adjustment” formulation is the “Fourier” type model. In this case we have a uniform key function and a cosine series as adjustments to it. We can write this model as:
$$ g(y) = \frac{\frac{1}{w}\left[ 1 + \sum_{j=1}^J \alpha_j \cos \left( \frac{j \pi y}{w} \right) \right]}{\frac{1}{w}\left[ 1 + \sum_{j=1}^J \alpha_j \cos(0)\right]} $$
Note that here the key “function” is 1/w and the summation of the cosines starts at j = 1 (rather than j = 2 above).
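As a small illustration (base R only, with arbitrary coefficient values rather than fitted estimates), we can evaluate this uniform-key model directly; note that the 1/w cancels and, since every cosine term equals 1 at zero distance, the denominator reduces to 1 plus the sum of the coefficients:

# Fourier-type detection function: uniform key with cosine adjustments of
# orders 1 and 2 (alpha values are arbitrary illustrations, not estimates)
w <- 8000
alpha <- c(0.6, 0.1)
g_fourier <- function(y) {
  num <- 1 + sum(alpha*cos(seq_along(alpha)*pi*y/w))
  den <- 1 + sum(alpha)   # all cosines are 1 at y = 0
  num/den
}
g_fourier(0)     # 1 by construction
g_fourier(4000)  # cos(pi/2) = 0 and cos(pi) = -1 enter here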

We can again look at the function graphically:
Figure 5: Fourier detection function; uniform key with cosine adjustments of order 1 and 2. The solid black line shows the detection function, the solid grey line is the constant uniform key and the dashed grey lines show the two adjustments (scaled and shifted to be on the same axes).

Fourier models can be specified in a similar way to the half-normal and hazard-rate models above:

# uniform with a cosine adjustment of order 1
dolphins.fourier1 <- ds(mexdolphins,truncation=8000, df(model=~unif+cos(1)))
# uniform with cosine adjustments of order 1&2
dolphins.fourier12 <- ds(mexdolphins,truncation=8000, df(model=~unif+cos(1,2)))
We can again plot the fitted models:
Figure 6: Detection function models fitted to the Mexico dolphin data. Left: uniform with cosine adjustment of order 1. Right: uniform with cosine adjustments of order 1 and 2.

Recap

So far we’ve looked at some new options for modelling the detection function. They are summarised in the following table along with their mathematical forms:

| Key function | Mathematical form | Adjustment | Mathematical form |
|--------------|-------------------|------------|-------------------|
| uniform | $1/w$ | cosine | $\sum_{j=1}^J \alpha_j \cos(j \pi y/w)$ |
| half-normal | $\exp\left(-\frac{y^2}{2 \sigma^2}\right)$ | cosine | $\sum_{j=2}^J \alpha_j \cos(j \pi y/w)$ |
| | | Hermite polynomial | $\sum_{j=2}^J \alpha_j H_{2j}(y/w)$ |
| hazard-rate | $1-\exp\left[-\left(\frac{y}{\sigma}\right)^{-b}\right]$ | cosine | $\sum_{j=2}^J \alpha_j \cos(j \pi y/w)$ |
| | | simple polynomial | $\sum_{j=2}^J \alpha_j (y/w)^{2j}$ |

Monotonicity

Plots of the models above show a common property amongst the fitted detection functions: they always decrease with increasing distance (they are monotonically decreasing). This is no coincidence; in fact it takes some computational effort to ensure that this happens. In this section we’ll first examine why monotonicity is a reasonable request to make of your detection functions, then go on to show what can go wrong when detection functions are non-monotonic.

Monotonicity makes sense as we are much more likely to see animals near us than further away. We may see “bumps” in the histogram of distances further out, but we assume this is just noise in the data; our model is that we see fewer objects as their distance from the transect increases. If there is some very large bump in the histogram, we suspect either that there has been some problem during the survey (some feature has caused animals to appear to “clump together” at a certain distance) or that the detection function is overfitting to random variation in the data (by chance there are more observations at one distance further out than at others). One cause of the former is a transect surveyed parallel to some geographical feature that the animals are attracted to; for example, if a transect is a road running through fields, birds may sit on fences and be less observable in the fields in between.

By default Distance2 will try to constrain those functions which can become non-monotonic during fitting. To see what can go wrong, we can set the monotonicity control option and force the fitting routine not to constrain the function to be monotonic:

dolphins.hn_cos_nomono <- ds(mexdolphins,truncation=8000, df(model=~hn+cos(2)), control=list(monotonicity=FALSE))

summary(dolphins.hn_cos_nomono)
## Summary of fitted detection function
## Transect type          : line 
## Number of observations : 47 
## Distance range         : 0 - 8000 
## AIC                    : 841.118 
## 
## Detection function     : Half-normal + cosine(2) adjustments 
## 
## Detection function parameters
##              Estimate        SE
## (Intercept) 8.5842206 0.2327356
## cos(2)      0.3699387 0.1962412
## 
## Kolmogorov-Smirnov p-value : 0.6798172 
## Cramer-von Mises p-value   : 0.7620765 
## 
##            Estimate       SE        CV
## Average p 0.5240601 0.147298 0.2810708

Comparing this summary to the previous one above (where monotonicity is enforced), we can see that there is a difference in the coefficient estimated for the cosine part of the model. We can also look at a plot of the model to make the problem more obvious.

plot(dolphins.hn_cos_nomono)
Figure 7: Half-normal with order 2 cosine adjustments fitted to the dolphin data, when the fitting does not constrain the detection function to be monotonically decreasing.

We can also find out whether and where the detection functions are non-monotonic using the df_check_mono() function in Distance2. We run the function on a fitted detection function object; if the function is monotonic it will return TRUE. If the function is non-monotonic, df_check_mono will return FALSE and give warnings explaining how the non-monotonicity has manifested itself.

Using the above analysis to illustrate the checking function:

df_check_mono(dolphins.hn_cos)
## Detection function is not strictly monotonic!
## [1] FALSE
df_check_mono(dolphins.hn_cos_nomono)
## Detection function is not strictly monotonic!
## [1] FALSE
We can include the plot=TRUE argument to show where the non-monotonicity occurs:
Figure 8: Montonicity check plot for the half-normal with cosine adjustment of order 2, fitted to the dolphin data. The non-monotonic part of the detection function is highlighted in red.

In the former case we don’t have to be too concerned about the non-monotonicity in the function; the second case, however, shows a much more serious departure from monotonicity, and we should discard that model.

As mentioned above, fitting the detection function so that its shape is constrained is a computationally complicated task. In Distance2 we simply take a set of distances between the left and right truncation and evaluate the detection function at them. The optimiser then checks that the value of the detection function decreases as distance increases from one evaluation point to the next; parameter values that produce an increasing stretch are discarded. This method is not foolproof, as the number of distances at which we evaluate the detection function affects whether the algorithm “knows” that the detection function is non-monotonic. Increasing the number of points leads to a longer time taken to fit the model, since we are doing more calculations and comparisons.
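The following base R sketch illustrates this kind of grid-based check (it is an illustration of the idea, not Distance2’s actual internal code), using a half-normal plus cosine curve with parameter values loosely based on the unconstrained fit above:

# grid-based monotonicity check: evaluate the detection function on a grid of
# distances and test that successive values never increase
# (parameters loosely based on the unconstrained fit above; illustration only)
w <- 8000
sigma <- exp(8.58)
a2 <- 0.37
g <- function(y) exp(-y^2/(2*sigma^2))*(1 + a2*cos(2*pi*y/w)) / (1 + a2)
y_grid <- seq(0, w, length.out=100)
all(diff(g(y_grid)) <= 0)   # FALSE: the cosine term makes the curve rise again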

Returning to the considerations above regarding the addition of adjustments to models which contain covariates, it’s worth noting that detecting non-monotonicity gets much harder when covariates are included. df_check_mono simply evaluates g(y) over a grid and checks that the points always decrease when going left to right, but for a covariate model there are effectively as many functions to check as there are different observed values of the covariates (or rather their unique combinations). This is relatively straightforward when there is a single factor with a few levels, but quickly becomes a computational nightmare.
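For instance, with a single factor covariate the grid check has to be repeated once per level. A rough base R sketch (with made-up scale parameters for a hypothetical observer covariate) of that idea:

# one monotonicity check per level of a factor covariate: here each level of a
# hypothetical observer covariate gets its own half-normal scale parameter
# (values are arbitrary illustrations, not fitted estimates)
sigma_by_obs <- c(A=2000, B=3500, C=1200)
g_hn <- function(y, sigma) exp(-y^2/(2*sigma^2))
y_grid <- seq(0, 8000, length.out=100)
sapply(sigma_by_obs, function(s) all(diff(g_hn(y_grid, s)) <= 0))
##    A    B    C
## TRUE TRUE TRUE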

Fortunately, as we’ll see next, there is another way to construct detection functions that ensures they are monotonic by construction.

Mixture models

An alternative way to formulate the detection function is by use of mixture models. A mixture model takes advantage of the mathematical fact that a positively-weighted sum of monotonically decreasing functions is itself monotonically decreasing – this works around the above monotonicity issues in a simple way.

Miller and Thomas (2015) developed a mixture model detection function using a sum of half-normal functions:
$$ g(y) = \sum_{j=1}^J \phi_j \exp\left(-\frac{y^2}{2 \sigma_j^2}\right) $$
we refer to the number of components or points in such a model and denote this $J$. Each of the $J$ components is weighted by a coefficient $\phi_j$, which controls the amount of influence that component has in the model. We impose the condition that the $\phi_j$ must sum to one.
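A small base R sketch (with arbitrary values for the $\sigma_j$ and $\phi_j$, not a fitted model) shows how the components combine and why the result is monotonic by construction:

# 2-point half-normal mixture: a weighted sum of two half-normal components
# (sigma and phi values are arbitrary illustrations)
sigmas <- c(500, 3000)   # scale parameter for each component
phis <- c(0.5, 0.5)      # mixture weights, constrained to sum to one
g_mix <- function(y) sapply(y, function(yy) sum(phis*exp(-yy^2/(2*sigmas^2))))
g_mix(0)   # 1, since the weights sum to one
y_grid <- seq(0, 8000, length.out=200)
all(diff(g_mix(y_grid)) <= 0)   # TRUE: each component decreases, so the sum does too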

Plotting a mixture model detection function gives some intuition about how the model works:
Figure 9: Half-normal mixture model detection function. The solid black line shows the detection function, the dashed grey lines show the two mixture components.

In the above plot the two dashed lines are added (with equal weighting of 1/2 each) to produce the bold line. Because each component is monotonic, the black line is guaranteed to be monotonic.

Since the simplest (2-point) mixture model detection function uses 3 parameters, it’s recommended that this formulation be used for relatively large data sets. For that reason we’ll switch to the amakihi data we saw in the previous chapter to illustrate how to fit the model:

data(amakihi)
# again, we'll truncate at 82.5m
# 1-point mixture -- just a half-normal model
amakihi.hn_mix1 <- ds(amakihi, transect="point", truncation=82.5, model=df(~hn+mix(1)))
# 2-point mixture
amakihi.hn_mix2 <- ds(amakihi, transect="point", truncation=82.5, model=df(~hn+mix(2)))
# 2-point mixture with observer as a covariate
amakihi.hn_mix2_obs <- ds(amakihi, transect="point", truncation=82.5, model=df(~hn+mix(2), scale=~obs))
We can again plot the resulting models:
Figure 10: Mixture model detection functions fitted to the Hawaiian amakihi data. Left: 1-point half-normal mixture model (equivalent to using ~hn). Right: 2-point mixture of half-normals.
Figure 11: Mixture model detection function with 2 components fitted to the Hawaiian amakihi data with observer included as a factor covariate.

Note that a “1-point” mixture is simply a half-normal detection function as described in Models for detectability.

Model selection

At this point we have quite a lot of models (7 for the dolphins and 3 for the amakihi) and it would be nice to return to the topic of model selection. Fortunately, all of the models described here fit into the AIC framework that we saw in the previous chapters. We can also use the goodness of fit procedures that we’ve seen so far.

Ignoring goodness of fit for now, we can use the summarize_models function in Distance2 to create a table of model results (sorted by AIC) for easy comparison. For example, for the models of dolphins we can produce the results table by simply giving the function the models we wish to consider as arguments.

summarize_models(dolphins.hn_cos, dolphins.hn_hermite, dolphins.hr_cos, dolphins.hr_poly, dolphins.fourier1, dolphins.fourier12)
##                         Model Formula # pars       P_a   CV(P_a)      AIC
## 1     Half-normal + cosine(2)      ~1      2 0.5498216 0.2997429 841.4398
## 5         Uniform + cosine(1)      ~1      1 0.6996426 0.1889087 841.8044
## 6       Uniform + cosine(1,2)      ~1      2 0.6263006 0.2769566 842.6838
## 3     Hazard-rate + cosine(2)      ~1      3 0.5492290 0.2979662 843.2615
## 4 Hazard-rate + polynomial(2)      ~1      3 0.5759385 0.6443184 844.1967
## 2    Half-normal + Hermite(2)      ~1      2 0.7230932 0.2744102 844.3143

Note that the numbering of rows corresponds to the order in which the models were passed to summarize_models. Columns are as follows:

  • Model gives a short description of the fitted model (this may be ambiguous, so the row numbers may be helpful in working out which model is which).
  • Formula describes the covariate model (just ~1 when there are no covariates).
  • # pars gives the number of parameters in the model.
  • P_a lists the average probability of detection.
  • CV(P_a) gives the coefficient of variation of average probability of detection giving an indication of uncertainty in the model (more on this in How certain are we in our estimates?).
  • AIC finally lists Akaike’s information criterion for the model.

We can do the same thing for the detection functions fitted to the amakihi data too:

summarize_models(amakihi.hn_mix1, amakihi.hn_mix2, amakihi.hn_mix2_obs)
##                         Model Formula # pars       P_a    CV(P_a)      AIC
## 3 2-point half-normal mixture    ~obs      5 0.2788664 0.05789090 10778.69
## 2 2-point half-normal mixture      ~1      3 0.2834901 0.06204498 10805.48
## 1                 Half-normal      ~1      1 0.3514386 0.03208017 10833.84

The Formula column now includes ~obs for the model fitted with observer as a covariate.

These tables can provide a useful summary when many models have been fitted, but they don’t excuse the investigator from performing the model checking and plotting described previously. They are a convenient utility, not a fast track to completing an analysis.

As mentioned above, we don’t recommend adding adjustments to models that include covariates, as this can lead to detection probabilities greater than 1 at some distances and makes it difficult to diagnose monotonicity problems. Another potential problem is that in general we don’t expect detection function models to have many parameters (models with fewer than 5 parameters are usually enough to capture the shape of the detection function). Although the “parameter hungry” argument can also be levelled against mixture model detection functions, they at least do not suffer from the additional issue of implausible function shapes.

Model fitting problems

There are times when one can write the code to fit a model in R but it will not fit. One might receive error messages about convergence failures or invalid parameter values. These errors can happen for a variety of reasons and a comprehensive treatment is impossible, but here are some possible reasons and fixes for model fitting problems.

The above reasons are the most common ways in which a model can fail to fit, though there are many other potential pitfalls. Building detection functions in small increments of complexity can make diagnosing these issues significantly easier.

It is also worth noting at this point that although convergence issues can cause the model to fail to fit entirely, it is also possible that the model will fit but the parameters the optimiser finds are not optimal. One indication that parameters may not be optimal is large uncertainty associated with them (which can be found in the model summary output); in this case, following the above advice on setting starting values and choosing an optimiser may help.

Recap

In this chapter we looked at additional models for the detection function that can be used in a variety of different situations when simple half-normal or hazard-rate detection functions are not flexible enough. All these models can be selected by AIC and checked for goodness of fit via Kolmogorov-Smirnov and Cramer-von Mises tests. The summarize_models function then allows one to easily review fitted models. Finally we looked at what can be done when models fail to fit. In the next chapter we’ll look at how we can estimate abundance once we’ve successfully fitted and selected our detection function.

Further reading

The models described above expand the set of possibilities considerably on those from the previous chapters. There are yet more models for the detection function to consider, though unfortunately not enough time here to give details on them all. A brief summary:

The original literature on these models is spread across a number of papers.

References

Becker, E. and Quang, P.X. (2009) A gamma-shaped detection function for line-transect surveys with mark-recapture and covariate data. Journal of Agricultural, Biological, and Environmental Statistics, 14, 207–223.

Buckland, S.T. (1992) Fitting density functions with polynomials. Applied Statistics, 41, 63–76.

Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L. and Thomas, L. (2001) Introduction to Distance Sampling. Oxford University Press.

Eidous, O.M. (2011) A Semi-Parametric Model for Line Transect Sampling without the Shoulder Condition, 1–15.

Miller, D.L. and Thomas, L. (2015) Mixture models for distance sampling detection functions. PLoS ONE, 10, e0118726.

Otto, M.C. and Pollock, K.H. (1990) Size Bias in Line Transect Sampling: A Field Test. Biometrics, 46, 239–245.

Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T. (1990) Numerical recipes. Cambridge University Press.


  1. It doesn’t really make sense to include only higher-order terms, i.e. ignoring order 2 and including only order 3 rather than both 2 and 3. As the adjustment order increases, the complexity of the adjustment increases and each term is orthogonal to the others.

  2. Further mathematical details of Hermite polynomials can be found in many textbooks, Wolfram’s page is a good starting point.