Soil Research
Soil, land care and environmental research
RESEARCH ARTICLE

Tree-based techniques to predict soil units

H. S. K. Pinheiro A,D, P. R. Owens B, L. H. C. Anjos A, W. Carvalho Júnior C and C. S. Chagas C
Author Affiliations

A Agronomy Institute – Soil Department, Federal Rural University of Rio de Janeiro, Rodovia BR 465, Km 7, Campus Universitário, Zona Rural, 23897-000 Seropédica, RJ, Brazil.

B USDA Dale Bumpers Small Farms Research Center, 6883 S State Highway 23, Booneville, AR 72927, USA.

C Embrapa Solos (National Center of Soil Research), R. Jardim Botânico 1024, Rio de Janeiro, RJ, Brazil.

D Corresponding author. Email: lenask@gmail.com

Soil Research - https://doi.org/10.1071/SR16060
Submitted: 5 March 2016; Accepted: 24 April 2017; Published online: 1 June 2017

Abstract

Quantitative soil–landscape models offer a way to conduct soil surveys in which statistical tools predict the natural patterns of occurrence of particular map units across a landscape. The aim of the present study was to predict soil units in a watershed with wide variation in landscape conditions. The approach relied on modelling soil-forming factors in order to understand the variability of the landscape components in the region. Layers were generated for landscape attributes related to pedogenesis, namely elevation, slope, curvature, compound topographic index, Euclidean distance from stream networks, a landforms map, a clay minerals index, an iron oxide index and the normalised difference vegetation index, and were used together with an existing geology map. The soil classification was adapted from the World Reference Base for Soil Resources, and the predominant reference soil groups observed were Ferralsols, Acrisols, Gleysols, Cambisols, Fluvisols and Regosols. The soil units were predicted with decision tree (DT) and random forest (RF) algorithms. Model performance was evaluated with statistical indices, with the coherence between predicted units and the legacy map, and with accuracy checks based on control samples. The RF model performed best, with statistical indices considered excellent (overall accuracy = 0.966, kappa = 0.962). Assessed against the control points, the map had an overall accuracy of 67.89% and a kappa value of 61.39%.

Additional keywords: digital soil mapping, landscape modelling, pedology, soil classification, soil landscape.
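The abstract lists the remote-sensing and terrain covariates but not their formulas. The sketch below uses commonly cited definitions, all of which are assumptions here rather than the authors' stated methods: NDVI from near-infrared and red reflectance, the clay minerals index as the SWIR1/SWIR2 band ratio and the iron oxide index as the red/blue band ratio (Landsat TM bands 5/7 and 3/1, respectively), and the compound topographic index as ln(a / tan β), where a is the specific catchment area and β the local slope. The functions work on single pixel values; the minimum-slope guard is a hypothetical choice.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalised difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def clay_minerals_index(swir1: float, swir2: float) -> float:
    """Band ratio SWIR1 / SWIR2 (assumed: Landsat TM band 5 / band 7)."""
    return swir1 / swir2


def iron_oxide_index(red: float, blue: float) -> float:
    """Band ratio Red / Blue (assumed: Landsat TM band 3 / band 1)."""
    return red / blue


def compound_topographic_index(specific_catchment_area: float,
                               slope_rad: float,
                               min_tan_slope: float = 1e-4) -> float:
    """CTI = ln(a / tan(beta)).

    specific_catchment_area: upslope area per unit contour width (m^2/m).
    slope_rad: local slope in radians.
    min_tan_slope: hypothetical floor so flat cells do not divide by zero.
    """
    if specific_catchment_area <= 0:
        raise ValueError("specific catchment area must be positive")
    tan_beta = max(math.tan(slope_rad), min_tan_slope)
    return math.log(specific_catchment_area / tan_beta)


if __name__ == "__main__":
    # Hypothetical reflectances and terrain values for one pixel
    print(ndvi(nir=0.5, red=0.1))
    print(clay_minerals_index(swir1=0.3, swir2=0.15))
    print(iron_oxide_index(red=0.2, blue=0.1))
    print(compound_topographic_index(100.0, math.atan(1.0)))  # ln(100)
```

Wet, flat valley floors get high CTI values and steep, convergence-poor slopes get low ones, which is why the index is a common predictor for hydromorphic units such as Gleysols.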
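The abstract names decision tree and random forest algorithms without describing them. As a hedged illustration of how a random forest assigns a soil unit to a pixel, here is a minimal from-scratch sketch of Breiman's scheme: each tree is grown on a bootstrap sample of the training points, each split picks the Gini-best threshold among a random subset of `mtry` covariates, and the forest predicts by majority vote. This is not the authors' implementation. The covariates, class labels and parameter values (`n_trees`, `mtry`, `max_depth`) are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum(p_k^2) of a set of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


class Node:
    def __init__(self, label: str, feature: Optional[int] = None,
                 threshold: float = 0.0, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None):
        self.label = label          # majority class of training samples at this node
        self.feature = feature      # covariate index tested here; None marks a leaf
        self.threshold = threshold
        self.left = left            # branch for x[feature] <= threshold
        self.right = right          # branch for x[feature] > threshold

    def predict(self, x: Sequence[float]) -> str:
        if self.feature is None:
            return self.label
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.predict(x)


def build_tree(X: Sequence[Sequence[float]], y: Sequence[str], mtry: int,
               max_depth: int, rng: random.Random, depth: int = 0) -> Node:
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return Node(majority)
    best = None  # (weighted Gini impurity, feature index, threshold)
    # Random forests consider only a random subset of mtry covariates per split
    for f in rng.sample(range(len(X[0])), mtry):
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no covariate in the subset separates the samples
        return Node(majority)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(majority, f, t,
                build_tree([X[i] for i in li], [y[i] for i in li], mtry, max_depth, rng, depth + 1),
                build_tree([X[i] for i in ri], [y[i] for i in ri], mtry, max_depth, rng, depth + 1))


def fit_forest(X: Sequence[Sequence[float]], y: Sequence[str], n_trees: int = 25,
               mtry: int = 1, max_depth: int = 5, seed: int = 0) -> List[Node]:
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) training points with replacement
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                mtry, max_depth, rng))
    return trees


def predict_forest(trees: List[Node], x: Sequence[float]) -> str:
    # Each tree votes for a class; the forest returns the most common vote
    return Counter(tree.predict(x) for tree in trees).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical training points: [elevation (m), CTI] -> soil unit
    X = [[10, 15], [20, 14], [30, 13], [40, 12],
         [200, 8], [250, 7], [300, 6], [400, 5]]
    y = ["Gleysol"] * 4 + ["Ferralsol"] * 4
    forest = fit_forest(X, y, n_trees=25, mtry=1, max_depth=3, seed=0)
    print(predict_forest(forest, [5, 16]), predict_forest(forest, [500, 4]))
```

Growing a single tree on all the data, with every covariate considered at each split, gives roughly the CART-style decision tree counterpart. For real mapping work an established implementation such as the R `randomForest` package or scikit-learn's `RandomForestClassifier` would be the usual choice; this version only makes the mechanics visible.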
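The abstract reports two sets of agreement figures: overall accuracy and kappa for the RF model (0.966 and 0.962), and the same statistics against control points (67.89% and 61.39%, the latter corresponding to a kappa of 0.6139). As a sketch of how such figures are derived, the functions below compute overall accuracy and Cohen's kappa from a confusion matrix whose rows are reference (observed) classes and whose columns are predicted classes. The 3-by-3 matrix in the usage example is invented for illustration and is not the study's data.

```python
from typing import Sequence


def overall_accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on the diagonal (reference class == predicted class)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e): agreement beyond chance."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / total
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if predictions were independent of the reference labels
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical confusion matrix for three soil units (not the study's data)
    cm = [[20, 3, 2],
          [4, 15, 1],
          [1, 2, 12]]
    print(f"overall accuracy = {overall_accuracy(cm):.4f}")  # 47 / 60 ≈ 0.7833
    print(f"kappa = {cohen_kappa(cm):.4f}")                   # 157 / 235 ≈ 0.6681
```

Kappa is always at or below overall accuracy when chance agreement is positive, which is consistent with the control-point kappa (61.39%) being lower than the control-point accuracy (67.89%).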

