2 Software installation and first steps

Edited by: T. Hengl

This section contains instructions on how to install and use the software needed to run predictive soil mapping and to export results to GIS or web applications. Like most of the book, it has been written for Linux users, but it should not be too difficult to adapt to Microsoft Windows and/or Mac OS.

2.1 List of software in use


Figure 2.1: Software combination used in this book.

For processing the covariates, we used a combination of Open Source GIS software, primarily SAGA GIS (Conrad et al. 2015), the R packages raster (Hijmans and van Etten 2017) and sp (Pebesma and Bivand 2005), and GDAL (Mitchell and GDAL Developers 2014) for reprojecting, mosaicking and merging tiles. GDAL and the R packages for parallel computing are well suited for processing large datasets.
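To illustrate how GDAL and R's parallel tools can be combined for large data, here is a minimal sketch (not the workflow used in this book) that compresses a set of GeoTIFF tiles by calling gdal_translate once per tile via parallel::mclapply(). The function name translate_tiles(), the "_c.tif" output suffix and the tile names are hypothetical. The sketch assumes gdal_translate is on the PATH and a Unix-like OS, since mclapply() relies on forking and supports only mc.cores = 1 on Windows.

```r
library(parallel)

# Hypothetical helper: compress each GeoTIFF tile with gdal_translate,
# running one gdal_translate process per tile in parallel.
translate_tiles <- function(tiles, cores = 1, exe = "gdal_translate",
                            dry_run = FALSE) {
  one_tile <- function(tile) {
    out <- sub("\\.tif$", "_c.tif", tile)  # output name: <tile>_c.tif
    args <- c("-co", "COMPRESS=DEFLATE", shQuote(tile), shQuote(out))
    # In dry-run mode, only return the command that would be executed
    if (dry_run) return(paste(exe, paste(args, collapse = " ")))
    system2(exe, args)
    out
  }
  # mclapply() forks worker processes (mc.cores > 1 is not supported on Windows)
  unlist(mclapply(tiles, one_tile, mc.cores = cores))
}

# Usage with placeholder tile names:
# translate_tiles(c("tile_001.tif", "tile_002.tif"), cores = 4)
```

Running one external GDAL process per tile keeps each job independent, so the number of cores can be chosen to match the machine.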

Software (required):

The R script used in this tutorial can be downloaded from GitHub. As a gentle introduction to the R programming language and soil classes in R, we recommend the chapter on importing and using soil data. More examples of SAGA GIS + R usage can be found in the soil covariates chapter. To visualize spatial predictions in a web browser or Google Earth, you could also consider following the soil web-maps tutorial. For a gentle introduction to the R programming language and spatial classes in R, we recommend the Geocomputation with R book. We also highly recommend obtaining the R reference card.

2.2 Installing software on Ubuntu OS

On Ubuntu (often the recommended standard for the GIS community), the main software can be installed within 10–20 minutes. We start by installing GDAL, proj4 and some libraries that you might need later on:

sudo apt-get install libgdal-dev libproj-dev libjasper-dev
sudo apt-get install gdal-bin python-gdal
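After installing, a quick way to confirm from within R that the GDAL command-line tools are reachable is to query the system PATH. The helper check_tools() below is a hypothetical convenience wrapper around base R's Sys.which(), which returns an empty string for programs it cannot find; the same call can be used for other tools mentioned in this chapter, such as saga_cmd or java.

```r
# Hypothetical helper: TRUE for each tool that Sys.which() finds on the PATH
check_tools <- function(tools = c("gdalinfo", "gdalwarp", "gdal_translate")) {
  found <- Sys.which(tools)          # "" when a program is not found
  setNames(nzchar(found), tools)
}

check_tools()
check_tools(c("saga_cmd", "java"))
```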

Next, we can install R and RStudio. For R, you can use the CRAN distribution or the optimized distribution Microsoft R Open, provided by Microsoft (which acquired the former REvolution company):

wget https://mran.blob.core.windows.net/install/mro/3.4.3/microsoft-r-open-3.4.3.tar.gz
tar -xf microsoft-r-open-3.4.3.tar.gz
cd microsoft-r-open/
sudo ./install.sh

Note that R versions are constantly being updated, so you will need to replace the URL based on the information provided on the home page (http://mran.microsoft.com). Once you run install.sh, you will have to accept the license terms twice before the installation can be completed. If everything went successfully, you can get the session info with:

sessionInfo()
#> R version 3.4.3 (2017-11-30)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 16.04.4 LTS
#> 
#> Matrix products: default
#> BLAS: /opt/microsoft/ropen/3.4.3/lib64/R/lib/libRblas.so
#> LAPACK: /opt/microsoft/ropen/3.4.3/lib64/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] methods   stats     graphics  grDevices utils     datasets  base     
#> 
#> other attached packages:
#>  [1] raster_2.6-7           rgdal_1.2-16           plotKML_0.5-9         
#>  [4] quantregForest_1.3-7   RColorBrewer_1.1-2     randomForest_4.6-12   
#>  [7] gstat_1.1-5            rpart_4.1-11           plyr_1.8.4            
#> [10] aqp_1.15               boot_1.3-20            sp_1.2-5              
#> [13] GSIF_0.5-4             microbenchmark_1.4-2.1 RevoUtils_10.0.7      
#> [16] RevoUtilsMath_10.0.1  
#> 
#> loaded via a namespace (and not attached):
#>  [1] splines_3.4.3       Formula_1.2-2       highr_0.6          
#>  [4] latticeExtra_0.6-28 pixmap_0.4-11       yaml_2.1.16        
#>  [7] pillar_1.0.1        backports_1.1.2     lattice_0.20-35    
#> [10] digest_0.6.13       checkmate_1.8.5     colorspace_1.3-2   
#> [13] htmltools_0.3.6     Matrix_1.2-12       XML_3.98-1.9       
#> [16] bookdown_0.7.12     scales_0.5.0        intervals_0.15.1   
#> [19] htmlTable_1.11.1    tibble_1.4.1        ggplot2_2.2.1      
#> [22] RSAGA_0.94-5        nnet_7.3-12         lazyeval_0.2.1     
#> [25] survival_2.41-3     magrittr_1.5        evaluate_0.10.1    
#> [28] MASS_7.3-47         xts_0.10-1          foreign_0.8-69     
#> [31] class_7.3-14        FNN_1.1             tools_3.4.3        
#> [34] dismo_1.1-4         shapefiles_0.7      data.table_1.10.4-3
#> [37] stringr_1.2.0       munsell_0.4.3       cluster_2.0.6      
#> [40] plotrix_3.7         colorRamps_2.3      compiler_3.4.3     
#> [43] e1071_1.6-8         spacetime_1.2-1     rlang_0.1.6        
#> [46] classInt_0.1-24     grid_3.4.3          rstudioapi_0.7     
#> [49] htmlwidgets_0.9     base64enc_0.1-3     rmarkdown_1.8      
#> [52] gtable_0.2.0        codetools_0.2-15    reshape_0.8.7      
#> [55] gridExtra_2.3       zoo_1.8-0           knitr_1.18         
#> [58] Hmisc_4.0-3         rprojroot_1.3-1     stringi_1.1.6      
#> [61] Rcpp_0.12.14        acepack_1.4.1       xfun_0.1
system("gdalinfo --version")

This shows, for example, that this installation of R runs under Ubuntu 16.04 LTS and uses the BLAS/LAPACK libraries installed with Microsoft R Open, while the last command prints the installed GDAL version so you can check that it is up to date. Using an optimized distribution of R (read more about “The Benefits of Multithreaded Performance with Microsoft R Open”) is especially important if you plan to use R for production purposes, i.e. to speed up computing and the generation of soil maps for large numbers of pixels.
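A rough way to see the benefit of an optimized BLAS is to time the same large matrix product under each R distribution. The sketch below assumes that, with default settings and finite input values, R hands crossprod() on a dense numeric matrix to the BLAS library it is linked against. The function name time_matmul() and the matrix size are arbitrary choices, and timings will differ between machines.

```r
# Time one dense matrix cross-product; with default settings R normally hands
# this computation to the BLAS library shown in the "BLAS:" line of sessionInfo().
time_matmul <- function(n = 1000, seed = 1) {
  set.seed(seed)
  m <- matrix(rnorm(n * n), nrow = n)
  system.time(crossprod(m))[["elapsed"]]  # wall-clock seconds
}

time_matmul(1000)
```

Running the same call under CRAN R and under Microsoft R Open on the same machine gives a simple, if crude, comparison.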

To install RStudio we can run:

sudo apt-get install gdebi-core
wget https://download1.rstudio.org/rstudio-1.1.447-amd64.deb 
sudo gdebi rstudio-1.1.447-amd64.deb
sudo rm rstudio-1.1.447-amd64.deb

Again, RStudio is constantly updated, so you might have to adjust the RStudio version and distribution in the commands above.

Predictive soil mapping is about making maps, and maps require GIS software so that one can open, view, overlay and analyze them. The GIS software recommended for soil mapping in this book includes SAGA GIS, QGIS, GRASS GIS and Google Earth. To install SAGA GIS on Ubuntu, we can use:

sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
sudo apt-get update
sudo apt-get install saga

If the installation was successful, you should also be able to access the SAGA command line from R by using:

system("saga_cmd --version")

To install QGIS (https://download.qgis.org/), you might first have to add the QGIS Debian repository to your package sources:

sudo sh -c 'echo "deb http://qgis.org/debian xenial main" >> /etc/apt/sources.list'  
sudo sh -c 'echo "deb-src http://qgis.org/debian xenial main " >> /etc/apt/sources.list'  
sudo apt-get update 
sudo apt-get install qgis python-qgis qgis-plugin-grass

Other utility software you might need includes htop and iotop, which allow you to monitor running processes (CPU and memory use) and disk I/O while processing:

sudo apt-get install htop iotop

and some additional libraries used by devtools, geoR and similar packages can be installed via:

sudo apt-get install build-essential automake \
        libcurl4-openssl-dev pkg-config libxml2-dev \
        libfuse-dev mtools libpng-dev libudunits2-dev

You might also need 7-Zip (the p7zip-full package) for easier compression and pigz for parallelized compression:

sudo apt-get install pigz zip unzip p7zip-full 

2.3 RStudio

RStudio is, in principle, the main R scripting environment and can be used to control all other software used in the course. A more detailed RStudio tutorial is available at: RStudio Online Learning. Consider also following some spatial data tutorials, e.g. by James Cheshire (http://spatial.ly/r/). Below is an example of an RStudio session with the R editor on the right and the R console on the left.


Figure 2.2: RStudio is a commonly used R editor written in C++.

To install all R packages required by a script at once, you can use:

list.of.packages <- c("rgdal", "raster", "GSIF", "plotKML", 
        "nnet", "plyr", "ROCR", "randomForest", 
        "psych", "mda", "h2o", "dismo", "grDevices", 
        "snowfall", "hexbin", "lattice", "ranger", 
        "soiltexture", "aqp", "colorspace",
        "randomForestSRC", "ggRandomForests", "scales",
        "xgboost", "parallel", "doParallel", "caret")
# keep only the packages that are not installed yet, then install those
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

This checks which of the listed packages are already installed and installs only the missing ones. You can put these lines at the top of each R script you share, so that anybody using the script automatically gets all missing packages.
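The same check can be wrapped into a small reusable function. install_missing() below is a hypothetical helper (not part of any package) that returns the names of the missing packages and installs them only when install = TRUE, which is convenient for checking what a script would install before actually doing it.

```r
# Hypothetical helper: return the packages from 'pkgs' that are not installed
# and, if install = TRUE, install them (extra arguments go to install.packages)
install_missing <- function(pkgs, install = TRUE, ...) {
  to_install <- setdiff(pkgs, rownames(installed.packages()))
  if (install && length(to_install) > 0) install.packages(to_install, ...)
  to_install
}

# Report what would be installed, without installing anything:
install_missing(c("raster", "ranger", "caret"), install = FALSE)
```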

The h2o package requires Java, so you should first install it, for example with:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
java -version

2.4 plotKML and GSIF packages

Many examples in the GSIF course rely on five of the most commonly used packages for spatial data: (1) sp and rgdal, (2) raster, (3) plotKML and (4) GSIF. To install the most up-to-date versions of plotKML and GSIF, you can also use the R-Forge versions of the packages:

if(!require(GSIF)){
  install.packages("GSIF", repos=c("http://R-Forge.R-project.org"), 
                 type = "source", dependencies = TRUE)
}

A copy of the most up-to-date stable versions of plotKML and GSIF is also available on GitHub. To run only a specific function from the GSIF package, you could, for example, do:

source_https <- function(url, ...) {
   # load package
   require(RCurl)
   # download:
   cat(getURL(url, followlocation = TRUE, 
       cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")), 
       file = basename(url))
   source(basename(url))
}
source_https("https://raw.githubusercontent.com/cran/GSIF/master/R/OCSKGM.R")
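If RCurl is not available, a similar effect can be obtained with base R only. The sketch below (the name source_url_env() is hypothetical) downloads the script with download.file() and evaluates it with sys.source() in a fresh environment, so the sourced functions do not overwrite objects in your workspace and no local copy is left behind. Note that a single file taken out of a package may still depend on other functions from that package.

```r
# Hypothetical helper: download an R script and evaluate it in its own
# environment (base R only, no RCurl needed)
source_url_env <- function(url) {
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp))                      # remove the local copy afterwards
  download.file(url, destfile = tmp, quiet = TRUE)
  env <- new.env()
  sys.source(tmp, envir = env)              # defines objects inside 'env' only
  env
}

# Usage (requires internet access):
# gsif_fun <- source_url_env("https://raw.githubusercontent.com/cran/GSIF/master/R/OCSKGM.R")
# ls(gsif_fun)  # names of the objects defined in the downloaded file
```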

To test whether these packages work properly, create soil maps and visualize them in Google Earth by running the following lines of code (see also the function fit.gstatModel):

library(GSIF)
library(sp)
library(boot)
library(aqp)
library(plyr)
library(rpart)
library(splines)
library(gstat)
library(quantregForest)
library(plotKML)
demo(meuse, echo=FALSE)
omm <- fit.gstatModel(meuse, om~dist+ffreq, meuse.grid, method="quantregForest")
#> Fitting a Quantile Regression Forest model...
#> Fitting a 2D variogram...
#> Saving an object of class 'gstatModel'...
om.rk <- predict(omm, meuse.grid)
#> Subsetting observations to fit the prediction domain in 2D...
#> Prediction error for 'randomForest' model estimated using the 'quantreg' package.
#> Generating predictions using the trend model (RK method)...
#> [using ordinary kriging]
#> 100% done
#> Running 5-fold cross validation using 'krige.cv'...
#> Creating an object of class "SpatialPredictions"
om.rk
#>   Variable           : om 
#>   Minium value       : 1 
#>   Maximum value      : 17 
#>   Size               : 153 
#>   Total area         : 4964800 
#>   Total area (units) : square-m 
#>   Resolution (x)     : 40 
#>   Resolution (y)     : 40 
#>   Resolution (units) : m 
#>   Vgm model          : Exp 
#>   Nugget (residual)  : 2.34 
#>   Sill (residual)    : 8.32 
#>   Range (residual)   : 5760 
#>   RMSE (validation)  : 1.7 
#>   Var explained      : 75.2% 
#>   Effective bytes    : 1226 
#>   Compression method : gzip
#plotKML(om.rk)

Figure 2.3: Example of plotKML output.

2.5 Connecting R and SAGA GIS

SAGA GIS is extensive GIS geoprocessing software with over 600 functions. SAGA GIS cannot be installed from RStudio (it is not an R package). Instead, you need to install SAGA GIS following the installation instructions on the software homepage. After you have installed SAGA GIS, you can send processes from R to SAGA GIS by using the saga_cmd command line interface:

# path to the SAGA command line tool (adjust the Windows path to your installation)
if(.Platform$OS.type == "windows"){
  saga_cmd = "C:/Progra~1/SAGA-GIS/saga_cmd.exe"
} else {
  saga_cmd = "saga_cmd"
}
system(paste(saga_cmd, "-v"))
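Hard-coding the saga_cmd path works, but a slightly more robust option is to look for saga_cmd on the PATH first. find_saga_cmd() below is a hypothetical helper; the Windows fallback path is copied from the example above and is an assumption about where SAGA GIS is installed on your machine.

```r
# Hypothetical helper: locate saga_cmd on the PATH, otherwise try an assumed
# Windows install location; returns NA if SAGA GIS cannot be found
find_saga_cmd <- function(win_default = "C:/Progra~1/SAGA-GIS/saga_cmd.exe") {
  on_path <- Sys.which("saga_cmd")
  if (nzchar(on_path)) return(unname(on_path))
  if (.Platform$OS.type == "windows" && file.exists(win_default)) {
    return(win_default)
  }
  NA_character_
}

saga_cmd <- find_saga_cmd()
if (is.na(saga_cmd)) message("saga_cmd not found: check the SAGA GIS installation")
```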

To use a SAGA GIS function, you need to carefully follow its command line arguments. For example, to derive hillshading from a DEM:

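library(plotKML)
library(rgdal)
library(raster)
data("eberg_grid")
gridded(eberg_grid) <- ~x+y
proj4string(eberg_grid) <- CRS("+init=epsg:31467")
# make sure the output folder exists before writing the SAGA grid
dir.create("./extdata", showWarnings = FALSE)
writeGDAL(eberg_grid["DEMSRT6"], "./extdata/DEMSRT6.sdat", "SAGA")
# keep the whole command on one shell line (a newline would split it in two)
system(paste(saga_cmd, 'ta_lighting 0 -ELEVATION "./extdata/DEMSRT6.sgrd"',
             '-SHADE "./extdata/hillshade.sgrd" -EXAGGERATION 2'))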

Figure 2.4: Deriving hillshading using SAGA GIS and then visualizing the result in R.
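Long saga_cmd calls are easy to break with misplaced quotes or line breaks. One option, sketched below under the assumption that each parameter is passed as -NAME value (as in the ta_lighting call above), is to build the argument vector from named R arguments and pass it to system2(). saga_args() is a hypothetical helper, not part of RSAGA or SAGA GIS.

```r
# Hypothetical helper: build the saga_cmd argument vector from named
# parameters; every value is shell-quoted so paths with spaces stay intact
saga_args <- function(library, tool, ...) {
  params <- list(...)
  flags <- unlist(lapply(names(params), function(p)
    c(paste0("-", p), shQuote(as.character(params[[p]])))))
  c(library, as.character(tool), flags)
}

hs_args <- saga_args("ta_lighting", 0,
                     ELEVATION = "./extdata/DEMSRT6.sgrd",
                     SHADE = "./extdata/hillshade.sgrd",
                     EXAGGERATION = 2)
# system2(saga_cmd, hs_args)  # same hillshading call as in the example above
```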

2.6 Connecting R and GDAL

Another very important piece of software for handling spatial data (and especially for exchanging and converting spatial data) is GDAL. GDAL also needs to be installed separately (on Windows machines use, e.g., “gdal-201-1800-x64-core.msi”) and can then be called from the command line:

if(.Platform$OS.type == "windows"){
  gdal.dir <- shortPathName("C:/Program files/GDAL")
  gdal_translate <- paste0(gdal.dir, "/gdal_translate.exe")
  gdalwarp <- paste0(gdal.dir, "/gdalwarp.exe") 
} else {
  gdal_translate = "gdal_translate"
  gdalwarp = "gdalwarp"
}
system(paste(gdalwarp, "--help"))

We can use GDAL to reproject the grid from the previous example to geographic coordinates (WGS84):

system(paste(gdalwarp, './extdata/DEMSRT6.sdat ./extdata/DEMSRT6_ll.tif -t_srs "+proj=longlat +datum=WGS84"'))
library(raster)
plot(raster("./extdata/DEMSRT6_ll.tif"))

Figure 2.5: Ebergotzen DEM reprojected in geographical coordinates.
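The gdalwarp call above uses gdalwarp's default nearest-neighbour resampling; for a continuous variable such as elevation, a smoother method is often preferred. The sketch below (reproject_grid() is a hypothetical helper) assembles a gdalwarp call with an explicit resampling method (-r) and -overwrite, so the script can be rerun without deleting the previous output. With run = FALSE it only returns the arguments.

```r
# Hypothetical helper: reproject a grid with gdalwarp, choosing the
# resampling method and overwriting any existing output file
reproject_grid <- function(src, dst, t_srs = "+proj=longlat +datum=WGS84",
                           method = "bilinear", exe = "gdalwarp", run = TRUE) {
  warp_args <- c("-overwrite", "-r", method, "-t_srs", shQuote(t_srs),
                 shQuote(src), shQuote(dst))
  if (run) system2(exe, warp_args)  # only call GDAL when run = TRUE
  warp_args
}

# Inspect the arguments without running gdalwarp:
reproject_grid("./extdata/DEMSRT6.sdat", "./extdata/DEMSRT6_ll.tif", run = FALSE)
```

For categorical grids (e.g. soil classes), method = "near" keeps the original class values.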