Predictive Soil Mapping (PSM) is based on applying statistical and/or machine learning techniques to fit models for the purpose of producing spatial and/or spatiotemporal predictions of soil variables, i.e. maps of soil properties and classes at different resolutions. It is a multidisciplinary field combining statistics, data science, soil science, physical geography, remote sensing, geoinformation science and a number of other sciences (Scull et al. 2003; McBratney, Mendonça Santos, and Minasny 2003; Henderson et al. 2004; Boettinger et al. 2010; Zhu et al., n.d.). Predictive Soil Mapping with R is about understanding the main concepts behind soil mapping, mastering R packages that can be used to produce high-quality soil maps, and optimizing all processes involved so that production costs can also be reduced.
The main idea behind predictive, as opposed to traditional expert-based, soil mapping is that the production of maps: (a) is based on state-of-the-art statistical methods to ensure the objectivity of maps (including objective uncertainty assessment rather than expert judgment), and (b) is driven by automation of processes, so that overall soil data production costs can be reduced and map updates implemented without the need for large investments. R is in that sense a logical platform for developing PSM workflows and applications, especially thanks to the vibrant and productive R spatial interest group and to increasingly professional soil data packages such as soiltexture, aqp, soilprofile, soilDB and similar.
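To make the fit-then-predict idea concrete, the following is a minimal sketch in base R on simulated data. It is not one of the methods used in this book: a simple linear model stands in for the statistical or machine learning model, and all variable names (`samples`, `elev`, `soc`, `grid`) and values are hypothetical.

```r
# Minimal PSM sketch on simulated data (all names and values are hypothetical).
set.seed(42)

# 1. Simulated "soil samples": locations, one covariate and a target variable.
n <- 200
samples <- data.frame(x = runif(n, 0, 100), y = runif(n, 0, 100))
samples$elev <- 500 + 2 * samples$x - 1.5 * samples$y + rnorm(n, sd = 5)
samples$soc  <- 10 + 0.01 * samples$elev + rnorm(n, sd = 0.5)

# 2. Fit a model relating the soil variable to the covariate.
#    A linear model is used here only as a simple stand-in.
m <- lm(soc ~ elev, data = samples)

# 3. Build a regular prediction grid that carries the same covariate.
grid <- expand.grid(x = seq(0, 100, by = 5), y = seq(0, 100, by = 5))
grid$elev <- 500 + 2 * grid$x - 1.5 * grid$y

# 4. Predict at every grid node and keep the standard error of the fitted mean
#    as a basic, model-based measure of uncertainty.
pred <- predict(m, newdata = grid, se.fit = TRUE)
grid$soc_pred <- pred$fit
grid$soc_se   <- pred$se.fit

head(grid)
```

The resulting `grid` data frame is the "map": one predicted value and one uncertainty estimate per grid node. In a real workflow the covariates would come from raster layers and the model would typically be more flexible than a linear regression.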
The book is divided into sections covering theoretical concepts, preparation of covariates, model selection and evaluation, prediction, and visualization and distribution of final maps. Most chapters contain R code examples that illustrate the main processing steps and give practical instructions to developers and applied users.
Most of the methods described in this book are based on the following publications:
Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., and Gräler, B. (2018) Random Forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ Preprints.
Sanderman, J., Hengl, T., Fiske, G., (2017) The soil carbon debt of 12,000 years of human land use. PNAS, doi:10.1073/pnas.1706103114
Ramcharan, A., Hengl, T., Nauman, T., Brungard, C., Waltman, S., Wills, S., & Thompson, J. (2018). Soil Property and Class Maps of the Conterminous United States at 100-Meter Spatial Resolution. Soil Science Society of America Journal, 82(1), 186-201.
Hengl, T., Leenaars, J. G., Shepherd, K. D., Walsh, M. G., Heuvelink, G. B., Mamo, T., et al. (2017) Soil nutrient maps of Sub-Saharan Africa: assessment of soil nutrient content at 250 m spatial resolution using machine learning. Nutrient Cycling in Agroecosystems, 109(1), 77–102.
Hengl T, Mendes de Jesus J, Heuvelink GBM, Ruiperez Gonzalez M, Kilibarda M, Blagotic A, et al. (2017) SoilGrids250m: Global gridded soil information based on machine learning. PLoS ONE 12(2): e0169748. doi:10.1371/journal.pone.0169748
Shangguan, W., Hengl, T., de Jesus, J. M., Yuan, H., & Dai, Y. (2017). Mapping the global depth to bedrock for land surface modeling. Journal of Advances in Modeling Earth Systems, 9(1), 65-88.
Hengl, T., Roudier, P., Beaudette, D., & Pebesma, E. (2015) plotKML: scientific visualization of spatio-temporal data. Journal of Statistical Software, 63(5).
Gasch, C. K., Hengl, T., Gräler, B., Meyer, H., Magney, T. S., & Brown, D. J. (2015) Spatio-temporal interpolation of soil water, temperature, and electrical conductivity in 3D+T: The Cook Agronomy Farm data set. Spatial Statistics, 14, 70–90.
Hengl, T., Nikolic, M., & MacMillan, R. A. (2013) Mapping efficiency and information content. International Journal of Applied Earth Observation and Geoinformation, 22, 127–138.
Hengl, T., Heuvelink, G. B., & Rossiter, D. G. (2007) About regression-kriging: from equations to case studies. Computers & Geosciences, 33(10), 1301–1315.
Hengl, T. (2006) Finding the right pixel size. Computers & Geosciences, 32(9), 1283–1298.
Some other publications / books on the subject of Predictive Soil Mapping include:
Malone, B.P, Minasny, B., McBratney, A.B., (2016) Using R for Digital Soil Mapping. Progress in Soil Science ISBN: 9783319443270, 262 pages.
California Soil Resource Lab, (2017) Open Source Software Tools for Soil Scientists, UC Davis.
McBratney, A.B., Minasny, B., Stockmann, U. (Eds) (2018) Pedometrics. Progress in Soil Science ISBN: 9783319634395, 720 pages.
FAO, (2018) Soil Organic Carbon Mapping Cookbook. 2nd ed. ISBN: 9789251304402
Readers are also encouraged to obtain and study the following R books before following some of the more complex exercises in this book:
Bivand, R., Pebesma, E., Rubio, V., (2013) Applied Spatial Data Analysis with R. Use R Series, Springer, Heidelberg, 2nd Ed. 400 pages.
Kabacoff, R.I., (2011) R in Action: Data Analysis and Graphics with R. Manning publications, ISBN: 9781935182399, 472 pages.
Kuhn, M., Johnson, K. (2013) Applied Predictive Modeling. Springer Science, ISBN: 9781461468493, 600 pages.
Lovelace, R., Nowosad, J., Muenchow, J., (2018) Geocomputation with R. forthcoming book with CRC Press.
Reimann, C., Filzmoser, P., Garrett, R., Dutter, R., (2008) Statistical Data Analysis Explained: Applied Environmental Statistics with R. Wiley, Chichester, 337 pages.
For the most recent developments in the R-spatial community refer to https://r-spatial.github.io and/or the R-sig-geo mailing list.
This book is constantly updated and contributions are welcome (through pull requests, but also through adding new chapters) provided that some minimum requirements are met. To contribute a complete new chapter please contact the editors first. Some minimum requirements to contribute a chapter are:
- The data used in the majority of the chapter needs to be available, preferably via an R package or a web source.
- The chapter should focus on implementing computing in R (it should be written as an R tutorial).
- All examples should be computationally efficient, with no more than 30 seconds of computing time per process on a single-core system.
- The theoretical basis for methods and the interpretation of results should be based on peer-reviewed publications. This book is not intended to host primary research / experimental results, but only to supplement existing research publications.
- The chapter should consist of at least 1500 words and at most 3500 words.
- The topic of the chapter must be closely connected to the theme of soil mapping, soil geographical databases, methods for processing spatial soil data and similar.
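One possible way for a contributor to check an example against the 30-second computing budget listed above is to wrap it in `system.time()`. The sketch below is only an illustration: the helper `within_budget()` and the simulated data are hypothetical and are not part of the book's tooling. It measures elapsed wall-clock time and does not itself enforce the single-core condition.

```r
# Hypothetical helper: time an expression and compare it with a budget in seconds.
within_budget <- function(expr, budget_secs = 30) {
  # system.time() forces evaluation of 'expr' and returns user/system/elapsed times
  elapsed <- system.time(expr)[["elapsed"]]
  list(elapsed = elapsed, ok = elapsed <= budget_secs)
}

# Simulated data standing in for a chapter example.
set.seed(1)
d <- data.frame(x = rnorm(1e4))
d$y <- 2 * d$x + rnorm(1e4)

# Time a single processing step (here a simple model fit).
res <- within_budget(lm(y ~ x, data = d))
res$ok
```

Each processing step of a chapter can be wrapped in this way, and any step for which `res$ok` is `FALSE` is a candidate for subsetting the data or simplifying the model.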
In principle, all submitted chapters should also closely follow the five pillars of Wikipedia, especially: Verifiability, Reproducibility, No original research, Neutral point of view, Good faith, No conflict of interest, and No personal attacks.
bookdown::render_book("index.Rmd")  # to build the book
browseURL("docs/index.html")  # to view it
The authors are grateful for the numerous contributions from colleagues around the world, especially for the contributions by current and former ISRIC - World Soil Information colleagues: Robert MacMillan, Gerard Heuvelink, Johan Leenaars, Jorge Mendes de Jesus, Wei Shangguan, David G. Rossiter, and many others. ISRIC is a research foundation funded primarily by the Dutch Government. The authors are also grateful for the support received via the AfSIS project, which has been funded by the Bill and Melinda Gates Foundation (BMGF) and the Alliance for a Green Revolution in Africa (AGRA). Many soil data processing examples in the book are based on R code developed by Dylan Beaudette, Pierre Roudier, Julian Moeys, Brendan Malone and many other developers. The author is also grateful for the comments and suggestions on the methods explained in the book made by Travis Nauman, Amanda Ramcharan, David G. Rossiter and Julian Moeys.
SoilGrids are based on numerous soil profile data sets that have been kindly contributed by various national and international agencies: the USA National Cooperative Soil Survey Soil Characterization database (http://ncsslabdatamart.sc.egov.usda.gov/) and profiles from the USA National Soil Information System, Land Use/Land Cover Area Frame Survey (LUCAS) Topsoil Survey database (Tóth, Jones, and Montanarella 2013), Africa Soil Profiles database (Leenaars 2014), Australian National Soil Information by CSIRO Land and Water (Karssies 2011; Searle 2014), Mexican National soil profile database (Instituto Nacional de Estadística y Geografía (INEGI) 2000) provided by the Mexican Instituto Nacional de Estadística y Geografía / CONABIO, Brazilian national soil profile database (Cooper et al. 2005) provided by the University of São Paulo, Chinese National Soil Profile database (Shangguan et al. 2013) provided by the Institute of Soil Science, Chinese Academy of Sciences, soil profile archive from the Canadian Soil Information System (MacDonald and Valentine 1992) and Forest Ecosystem Carbon Database (FECD), ISRIC-WISE (Batjes 2009), The Northern Circumpolar Soil Carbon Database (Hugelius et al. 2013), eSOTER profiles (Van Engelen and Dijkshoorn 2012), SPADE (Hollis et al. 2006), Unified State Register of soil resources RUSSIA (Version 1.0. Moscow — 2014), National Database of Iran provided by the Tehran University, points from the Dutch Soil Information System (BIS) prepared by Wageningen Environmental Research, and others. 
We are also grateful to USA’s NASA, USGS and USDA agencies, the European Space Agency Copernicus projects, and JAXA (Japan Aerospace Exploration Agency) for distributing vast amounts of remote sensing data (especially MODIS, Landsat, Copernicus land products and elevation data), and to the Open Source software developers of the packages rgdal, sp, raster, caret, mlr, ranger, h2o and similar, without which predictive soil mapping would most likely not be possible.
This book has been inspired by the Geocomputation with R book, an Open Access book edited by Robin Lovelace, Jakub Nowosad and Jannes Muenchow. Many thanks to Robin Lovelace for helping with rmarkdown and for giving some initial tips for compiling and organizing the book. The author is also grateful to the numerous software/package developers, especially Edzer Pebesma, Roger Bivand, Robert Hijmans, Markus Neteler, Tim Appelhans, and Hadley Wickham, who have enabled a generation of researchers and applied projects.
Every effort has been made to trace copyright holders of the materials used in this book. Should we, despite all our efforts, have overlooked any contributors, please contact the author and we shall correct this unintentional omission without delay and acknowledge any overlooked contributions and contributors in future updates.
Data availability: All data used in this book is available either through R packages or via the GitHub repository. Unless otherwise stated, all code presented is available under the GNU General Public License v2.0.
Copyright: © 2018 Hengl et al.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.