Tree-Based Techniques for Predicting the Compression Index of Clayey Soils

Document Type: Regular Article


1 Geofirst Pty Ltd., 2/7 Luso Drive, Unanderra, NSW 2526, Australia

2 Department of Civil Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur, 50603, Malaysia

3 Assistant Professor, Department of Engineering, Payame Noor University, Tehran, Iran

4 Department of Civil Engineering, Technical and Vocational University (TVU), Tehran, Iran


The compression index is a key parameter for estimating the primary consolidation settlement of clayey soils, but determining it is time-consuming and laborious. In the present study, we therefore developed two tree-based models, random forest (RF) and extreme gradient boosting (XGBoost), to predict the compression index of clayey soils. To establish the two models, we compiled a dataset of 391 consolidation tests on soils from previously published research. The dataset comprises six parameters: the initial void ratio, natural water content, liquid limit, plasticity index, specific gravity, and compression index. The first five parameters are the models’ inputs, and the compression index is the models’ output. We trained both models on 90% of the dataset and used the remaining 10% to assess the trained models, a split consistent with the published research. Several statistical metrics, including the coefficient of determination (R²), root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), serve as the criteria for assessing the models’ performance. The results show that the RF model predicts the compression index more accurately than the XGBoost model, outperforming it on both the training and testing datasets. The RF model achieves R² of 0.928 and 0.818, RMSE of 0.016 and 0.025, MAPE of 7.046% and 10.082%, and MAE of 0.012 and 0.020 on the training and testing datasets, respectively. The sensitivity analysis reveals that the initial void ratio has a significant impact on the compression index of clayey soils.
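The 90%/10% split and the four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of the coefficient of determination, RMSE, MAE, and MAPE, and assumes the paper's definitions match them, which this section does not confirm. The function names `split_90_10` and `regression_metrics`, the fixed random seed, and the four sample values are hypothetical and are not taken from the study's dataset.

```python
import math
import random


def split_90_10(rows, seed=0):
    """Shuffle a copy of `rows` and return (train, test), about 90% / 10%.

    The seed is an assumption made only so the sketch is reproducible.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(0.10 * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, and MAPE (in percent) for paired values.

    Assumes y_true is not constant and contains no zeros, which holds
    for a strictly positive quantity such as the compression index.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Splitting 391 placeholder record indices, the dataset size quoted above.
train_rows, test_rows = split_90_10(range(391))

# Made-up measured and predicted compression indices (illustration only).
measured = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
scores = regression_metrics(measured, predicted)
```

With 391 records, this rounding rule holds out 39 tests and leaves 352 for training; the study's exact partition may differ. In practice the predictions would come from the fitted RF and XGBoost models rather than from hand-written lists.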

