Article title: Predicting the Distribution of Arsenic in Groundwater by a Geospatial Machine Learning Technique in the Two Most Affected Districts of Assam, India: The Public Health Implications
The resources contain grid-averaged arsenic (As) concentrations (mean concentrations) and predictor variables for the Jorhat and Golaghat districts of Assam. The GPS coordinates carry an error of ±250 m, introduced for location privacy. A basic workflow for the random forest model in a Python environment is also provided. The final model was selected through random 10-fold cross-validation and recursive feature elimination, and was then used to predict the probability of arsenic occurrence at unsampled locations. We also performed spatial cross-validation. The results were consistent and confirmed the overall distribution of high-, moderate-, and low-risk zones for As in groundwater.
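The model-selection steps described above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the workflow shipped with the resource: the data are synthetic, the binary target (grid-mean As above some threshold), the predictor count, the 0.25-degree spatial blocks, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GroupKFold, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the gridded dataset (NOT the published data).
n_cells, n_predictors = 300, 6
X = rng.normal(size=(n_cells, n_predictors))
lon = rng.uniform(93.5, 94.6, n_cells)  # hypothetical grid-cell longitudes
lat = rng.uniform(26.0, 27.0, n_cells)  # hypothetical grid-cell latitudes
# Assumed binary target: 1 where grid-mean As exceeds a chosen threshold.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n_cells) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=50, random_state=0)

# 1) Random 10-fold cross-validation on all predictors.
random_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
random_auc = cross_val_score(rf, X, y, cv=random_cv, scoring="roc_auc")

# 2) Recursive feature elimination, scored by cross-validation.
rfe_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
selector = RFECV(rf, step=1, cv=rfe_cv, scoring="roc_auc")
selector.fit(X, y)
X_sel = X[:, selector.support_]

# 3) Spatial cross-validation: cells in the same 0.25-degree block are
#    always held out together, so test cells are not next to training cells.
col = np.floor((lon - lon.min()) / 0.25).astype(int)
row = np.floor((lat - lat.min()) / 0.25).astype(int)
blocks = col * 10 + row
spatial_auc = cross_val_score(
    rf, X_sel, y, cv=GroupKFold(n_splits=5), groups=blocks, scoring="roc_auc"
)

# 4) Fit the final model and predict probabilities at new locations.
final_model = rf.fit(X_sel, y)
X_new = rng.normal(size=(5, n_predictors))[:, selector.support_]
proba = final_model.predict_proba(X_new)[:, 1]

print(f"random 10-fold AUC:  {random_auc.mean():.2f}")
print(f"spatial 5-fold AUC:  {spatial_auc.mean():.2f}")
print(f"predictors retained: {int(selector.support_.sum())} of {n_predictors}")
```

Comparing the random and spatial scores is the point of step 3: if the spatially blocked score falls well below the random one, the random estimate is probably inflated by spatial autocorrelation between neighbouring grid cells.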
Some of the original data can be downloaded from: https://www.hydroshare.org/resource/bbe23dfacab647568a18dc338114d6d7/ (reference: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2017WR022485)
The published research article can be found here: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021GH000585