Yes! You can do that in mgcv
This is the page for a talk I gave at the University of Edinburgh Centre for Statistics Seminar, 11 March 2024.
If you just want the talk slides click here.
Below is a per-slide set of references, in case that’s useful to anyone…

More about me at converged.yt  
You can find out more about BioSS here and UKCEH here.  
Unfortunately, as the gradual decay of Twitter continues, I couldn’t find Jenny’s tweet to link here.  
Well, sort of.  
Wood (2017) is a good starting point.  
You can see a neat Shiny app that allows you to play with this here (built as part of a BioSS-UKCEH Framework course). It takes a while to load as it uses wasm to run R in your browser. Source is here.  
Different penalties basically imply different basis functions, so the penalty really tells us a lot about how the model works. Simon has spent a lot of time thinking about how to estimate smoothing parameters, e.g.: Wood (2008), Wood (2011), Wood et al. (2016) and Wood & Fasiolo (2017).  
Proof of this is straightforward. Writing \(\boldsymbol{d}(x)\) for the vector whose \(k\)th element is \(\partial^{2}b_{k}(x)/\partial x^{2}\): \[\begin{align*} \lambda\int\left(\frac{\partial^{2}s(x)}{\partial x^{2}}\right)^{2}\text{d}x &= \lambda\int\left(\frac{\partial^{2}}{\partial x^{2}}\sum_{k}\beta_{k}b_{k}(x)\right)^{2}\text{d}x\\ &=\lambda\int\left(\sum_{k}\beta_{k}\frac{\partial^{2}b_{k}(x)}{\partial x^{2}}\right)^{2}\text{d}x\\ &=\lambda\int\left(\boldsymbol{\beta}^{\intercal}\boldsymbol{d}(x)\right)^{2}\text{d}x\\ &=\lambda\int\boldsymbol{\beta}^{\intercal}\boldsymbol{d}(x)\boldsymbol{d}(x)^{\intercal}\boldsymbol{\beta}\,\text{d}x\\ &=\lambda\boldsymbol{\beta}^{\intercal}\left(\int\boldsymbol{d}(x)\boldsymbol{d}(x)^{\intercal}\text{d}x\right)\boldsymbol{\beta}\\ &=\lambda\boldsymbol{\beta}^{\intercal}\boldsymbol{S}\boldsymbol{\beta},\quad\text{where } \boldsymbol{S} = \int\boldsymbol{d}(x)\boldsymbol{d}(x)^{\intercal}\text{d}x. \end{align*}\] 
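As a quick illustration of the derivation above (a minimal sketch, assuming mgcv is installed; the variable names are mine), you can pull the penalty matrix \(\boldsymbol{S}\) straight out of a smooth construction:

```r
# Sketch: inspect the penalty matrix S for a cubic regression spline,
# via mgcv's smooth constructor (smoothCon)
library(mgcv)

x <- seq(0, 1, length.out = 100)
sm <- smoothCon(s(x, bs = "cr", k = 10),
                data = data.frame(x = x))[[1]]
# the penalty matrix from the derivation: beta^T S beta is the penalty
S <- sm$S[[1]]
dim(S)  # k x k, here 10 x 10
```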

All these examples are stolen from the mgcv manual pages. You can see the code for fitting and plotting them here. 

(You don’t have to be a Bayesian if you don’t want to be.)  
I have a preprint (Miller, 2019), which looks into these results, their implications and history a bit more.  
Simple posterior sampling code for this example is here. You can find out more about uncertainty estimation for GAMs in this wee explainer I wrote. 
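For flavour, here is a hedged sketch of the kind of posterior sampling involved (simulated data and variable names are my own, not the linked example’s code):

```r
# Sketch: draw from the approximate posterior N(beta_hat, V_beta) of a GAM's
# coefficients, then map the draws through the linear predictor matrix
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 200, verbose = FALSE)   # simulated example data
b <- gam(y ~ s(x0) + s(x1), data = dat)

# 100 coefficient draws from the approximate posterior
betas <- rmvn(100, coef(b), vcov(b))
Xp <- predict(b, type = "lpmatrix")          # maps coefficients to predictions
preds <- Xp %*% t(betas)                     # 100 posterior draws per data point
```

Summaries (pointwise quantiles, functions of predictions, etc.) then come straight from the columns of `preds`.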

See, again, Miller (2019) and (as we’ll talk about later) Miller et al. (2019).  
See here if you don’t understand this reference.  
Boring penalties can be good! For example, they can end up being sparse, which is very useful!  
Figure from Jacobson et al. (2022), where we built a fairly complicated model of beaked whale response to US Navy sonar.  
Details of this are in Miller et al. (2019).  
This penalty looks like a nightmare, but you can do some clever stuff in mgcv using a linear transformation of the log smoothing parameters. See here for the code, and section 4 of the supplementary materials for the maths. 

For the last point, we might detach a little from the spirit of the talk, as some of the software implementations may perform better with given structures (e.g., some might handle sparsity more efficiently, some may deal with big data more easily, etc.).  
This section is based on Pedersen et al. (2019).  
In the paper, we describe these models in full, with multiple mgcv implementations for each scenario. 
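One of the variants the paper describes (a global smooth plus group-level wiggly deviations) can be sketched roughly as follows; the simulated data and settings here are illustrative, not the paper’s own code:

```r
# Sketch of a hierarchical GAM: a global smooth of x2 plus per-group
# deviations via a factor-smooth ("fs") basis
library(mgcv)

set.seed(3)
dat <- gamSim(4, n = 300, verbose = FALSE)   # simulated data with factor fac

b <- gam(y ~ s(x2) +                         # global smooth
             s(x2, fac, bs = "fs", m = 1),   # group-level smooth deviations
         data = dat)
```

The `"fs"` basis shares one smoothing parameter across groups, which is part of what makes these models practical to fit.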

Eric Pedersen, Noam Ross and Gavin Simpson maintain the MRFtools package to assist with this kind of situation. 

This section is based on work that BioSS and UKCEH did in collaboration last year. A paper is being prepared, but in the meantime here are the course notes for a workshop we ran at the end of the project.  
This isn’t the most “functional data analysis”-y way to explain this, but it worked well with a non-statistical audience.  
Coding this up is surprisingly straightforward, thanks to the linear functional methods that are built in to mgcv. Find out more here. 
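A minimal sketch of such a linear functional term (toy data of my own; see mgcv’s `linear.functional.terms` help page for the real details): when the argument of `s()` is a matrix and `by` is a matrix of the same dimension, mgcv sums the smooth over the columns.

```r
# Sketch: each response depends on a whole covariate "curve" via the
# summation convention, f_i = sum_j L[i,j] * f(X[i,j])
library(mgcv)

set.seed(5)
n <- 100; p <- 50
X <- matrix(runif(n * p), n, p)   # one covariate curve per row
L <- matrix(1 / p, n, p)          # integration/quadrature weights

y <- rowMeans(sin(4 * X)) + rnorm(n, sd = 0.1)
b <- gam(y ~ s(X, by = L))        # linear functional term
```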

Ken Newman and I are working on this as part of an aphid arrival project with BioSS as part of a work package for the Scottish Government.  
Photo from SASA’s flickr. Data come from the UK suction trap network (find out more here); they were collected and prepared by Rothamsted Research and SASA in Scotland, as well as many partner organisations.  
These plots show the date of detection of the 10th aphid in traps around the UK.  
We’re preparing a manuscript on this topic and hope to have it completed this year. Again, these models are very easy to specify in mgcv; see Wood (2017), section 7.4.2. 
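Purely as illustration (a generic sketch, not the manuscript’s actual model), smoothing an arrival date over space might look something like:

```r
# Sketch: day-of-year of (say) 10th detection, smoothed over space
# with a 2D thin plate regression spline; data are simulated
library(mgcv)

set.seed(4)
n <- 200
lon <- runif(n); lat <- runif(n)
doy <- 100 + 30 * sin(pi * lon) * cos(pi * lat) + rnorm(n, sd = 5)

b <- gam(doy ~ s(lon, lat), data = data.frame(lon, lat, doy))
```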

A recent paper (that I reviewed) does a great job of showing how to do exactly this: Dovers et al. (2024). Original work on the Berman-Turner device is Baddeley & Turner (2000) and Berman & Turner (1992). The inlabru approach is covered in Simpson et al. (2016). 
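A hedged 1D sketch of the Berman-Turner device (my own toy setup, not any of these papers’ code): the point process log-likelihood \(\sum_i \log\lambda(x_i) - \int_D \lambda(x)\,\text{d}x\) is approximated by a weighted Poisson regression over the observed points plus dummy quadrature points.

```r
# Sketch: Berman-Turner device on [0, 1]
library(mgcv)

set.seed(2)
obs   <- rbeta(60, 2, 5)                  # observed event locations
dummy <- seq(0, 1, length.out = 200)      # dummy quadrature points

x <- c(obs, dummy)
z <- rep(c(1, 0), c(length(obs), length(dummy)))  # 1 = data, 0 = dummy
w <- rep(1 / length(x), length(x))        # crude equal quadrature weights

# pseudo-response z/w with prior weights w recovers the approximate
# likelihood; quasipoisson avoids non-integer response warnings
fit <- gam(I(z / w) ~ s(x), family = quasipoisson(), weights = w)
```

Fitted values of `fit` on the quadrature points then approximate the intensity \(\lambda(x)\).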

This is covered in some depth in the above papers. See also Silverman (1982) for an important detail of the “trick”.  
See Bravington et al. (2021) for all the details on this.  
A quick introduction to these kinds of models (“density surface models”, where we have a detection probability derived from distance sampling) can be found in Miller et al. (2013).  
More info on bootstrap failure can be found in Rubin (1981).  
See the paper for a complete description of this!  
This full process is covered in Miller et al. (2022), based on a collaboration with the NOAA Southwest Fisheries Science Centre and Duke University.  
Based on various employments, contracts, visits etc.  
I’m trying to develop a website with these resources at calgary.converged.yt. Please let me know if you want to contribute!  
converged.yt/talks/edinburghgam2024 
References
Baddeley, A., & Turner, R. (2000). Practical Maximum Pseudolikelihood for Spatial Point Patterns. Australian & New Zealand Journal of Statistics, 42(3), 283–322.
Berman, M., & Turner, T. R. (1992). Approximating Point Process Likelihoods with GLIM. Applied Statistics, 41(1), 31. https://doi.org/10.2307/2347614
Bravington, M. V., Miller, D. L., & Hedley, S. L. (2021). Variance Propagation for Density Surface Models. Journal of Agricultural, Biological and Environmental Statistics. https://doi.org/10.1007/s13253-021-00438-2
Dovers, E., Stoklosa, J., & Warton, D. I. (2024). Fitting log-Gaussian Cox processes using generalized additive model software. The American Statistician, 1–17. https://doi.org/10.1080/00031305.2024.2316725
Jacobson, E. K., Henderson, E. E., Miller, D. L., Oedekoven, C. S., Moretti, D. J., & Thomas, L. (2022). Quantifying the response of Blainville’s beaked whales to U.S. Naval sonar exercises in Hawaii. Marine Mammal Science, 38(4), 1549–1565. https://doi.org/10.1111/mms.12944
Miller, D. L. (2019). Bayesian views of generalized additive modelling. arXiv:1902.01330 [Stat]. https://arxiv.org/abs/1902.01330
Miller, D. L., Becker, E. A., Forney, K. A., Roberts, J. J., Cañadas, A., & Schick, R. S. (2022). Estimating uncertainty in density surface models. PeerJ, 10, e13950. https://doi.org/10.7717/peerj.13950
Miller, D. L., Burt, M. L., Rexstad, E. A., & Thomas, L. (2013). Spatial models for distance sampling data: Recent developments and future directions. Methods in Ecology and Evolution, 4(11), 1001–1010. https://doi.org/10.1111/2041-210X.12105
Miller, D. L., Glennie, R., & Seaton, A. E. (2019). Understanding the Stochastic Partial Differential Equation Approach to Smoothing. Journal of Agricultural, Biological and Environmental Statistics. https://doi.org/10.1007/s13253-019-00377-z
Pedersen, E. J., Miller, D. L., Simpson, G. L., & Ross, N. (2019). Hierarchical generalized additive models in ecology: An introduction with mgcv. PeerJ, 7, e6876. https://doi.org/10.7717/peerj.6876
Rubin, D. (1981). The Bayesian Bootstrap. The Annals of Statistics, 9(1), 130–134.
Silverman, B. W. (1982). On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method. The Annals of Statistics, 10(3), 795–810. https://doi.org/10.1214/aos/1176345872
Simpson, D., Illian, J. B., Lindgren, F., Sørbye, S. H., & Rue, H. (2016). Going off grid: Computationally efficient inference for log-Gaussian Cox processes. Biometrika, 103(1), 49–70. https://doi.org/10.1093/biomet/asv064
Wood, S. N. (2008). Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3), 495–518.
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(1), 3–36.
Wood, S. N. (2017). Generalized Additive Models. An Introduction with R (2nd ed.). CRC Press.
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. Biometrics, 73(4), 1071–1081. https://doi.org/10.1111/biom.12666
Wood, S. N., Pya, N., & Säfken, B. (2016). Smoothing Parameter and Model Selection for General Smooth Models. Journal of the American Statistical Association, 111(516), 1548–1563. https://doi.org/10.1080/01621459.2016.1180986