Land Use Change Modelling Using Logistic Regression, Random Forest and Additive Logistic Regression in Kubu Raya Regency, West Kalimantan
1 Department of Statistics, IPB University, Jl. Lingkar Akademik Kampus IPB Dramaga, Bogor 16680, Indonesia
2 Centre for International Forestry Research – World Agroforestry Centre, Jl. CIFOR, Situ Gede, Sindang Barang, Bogor 16115, Indonesia
*) Correspondence: alfanugraha@apps.ipb.ac.id
Abstract
Kubu Raya District, in West Kalimantan Province, has extensive wetland ecosystems, including high-density swamp or peatland ecosystems and a large area of mangroves. Wetland ecosystems provide habitat for fauna, a source of livelihood for surrounding communities and a reservoir of carbon stocks. Most of the land in Kubu Raya District is peatland, which has long been used for agriculture and as a source of livelihood for the community. Because of this vast peat area, the district also faces a high risk of peat fires. This study aims to predict land use changes in Kubu Raya District using three statistical machine learning models, namely Logistic Regression (LR), Random Forest (RF) and Additive Logistic Regression (ALR). Land cover maps were acquired from the Ministry of Environment and Forestry and reclassified into six land cover classes at a resolution of 100 m. These maps were used to classify the land use or land cover class of Kubu Raya District for 2009, 2015 and 2020. RF achieved higher accuracy and F1 score than LR and ALR. The outcome of this study is expected to provide knowledge and recommendations for future sustainable development planning and management in Kubu Raya District.
Keywords: land use change modelling, wetlands, logistic regression, random forest, additive logistic regression.
1. Introduction
Peat forests can store many times more carbon than tropical rain forests or forests on mineral soils (Adeolu et al., 2018; Aditya et al., 2020). Peat forests also host diverse biota that interact with each other to form peat ecosystems. Despite these benefits, peat forests are vulnerable to forest and land degradation (Abraham et al., 2023). Land clearing is commonly followed by burning, and dry peatland is highly susceptible to fire. Fires on peatland release carbon into the atmosphere. The oxidation of peat releases further greenhouse gases into the atmosphere, producing high greenhouse gas emissions (Abuhay et al., 2023).
Emissions from degraded peatlands are greater than those from other ecosystems. Rahsia et al. (2021) measured CO2 emissions from peatlands that burned in Pontianak City in May to July 2019 and found that the CO2 flux during the measurement period ranged from 183 to 595 tons of CO2. Aguilera et al. (2023) estimated that deforested and drained tropical peatlands release 31 Mg C per hectare per year. Emissions from peatlands have a significant impact on the surrounding biodiversity. Greenhouse gases trap heat in the atmosphere and raise the Earth's temperature. Conserving peat ecosystem areas can help prevent a future temperature rise of more than 2 degrees Celsius (Akbar et al., 2023).
The protection and management of peat ecosystems aim to preserve peat ecosystem functions and prevent their loss. Both are governed by a Regulation of the Government of the Republic of Indonesia. The regulation also requires stakeholders to prepare a Peat Ecosystem Protection and Management Plan (RPPEG) document that contains analysis and recommendations on peat ecosystem management (Beroho et al., 2023). This plan is summarised into documents for the protection and management of peat ecosystems at the provincial and district levels. The RPPEG is expected to ensure the preservation of peat ecosystem functions in West Kalimantan, specifically in Kubu Raya District.
Kubu Raya District is a regency in West Kalimantan Province with extensive peat wetlands that support numerous economic activities. The most common livelihoods on peatland include farming, fish farming and other activities of high economic value (Wang, 2022). However, peatland use frequently disregards the principle of sustainable peatland use. One activity that causes widespread damage is peatland drainage, which dries out the peat and leads to peat fires, notably in the dry season. The Indonesian government has therefore prioritised Kubu Raya District for peat restoration. Under these conditions, land use change analysis is essential to support the government's goal of preserving peat ecosystem functions and to assist in preparing the RPPEG documents for Kubu Raya District.
Land use change can be analysed by building a model. A land use change model helps explain the process of land use change and its driving factors (Bohai et al., 2023; Cao et al., 2023). The model can also predict changes in land use and land cover through simulation in a geographic information system, using a statistical machine learning approach. The most common classification model is Logistic Regression (LR), which is a Generalised Linear Model (GLM) (Chen et al., 2022). Linear models such as linear regression assume that the response variable is normally distributed. In practice, however, response variables often follow binomial, Poisson, gamma or other exponential family distributions. The GLM was developed to handle such responses.
A GLM has three components: (i) a random component, which specifies the response variable and its probability distribution; (ii) a linear predictor; and (iii) a link function, which connects the random component to the linear predictor. In LR, the random component follows a binomial distribution and the link function is the natural log of the odds, known as the logit link function. However, LR only captures a linear relationship between the logit and the explanatory variables (Tsiripidis, 2023). Nonlinear relationships can be captured by replacing the linear predictor of the GLM with additive components. Hastie & Tibshirani (1985) extended the GLM with this additive approach, producing the Generalised Additive Model (GAM) explained in Subchapter 2.6. A GAM uses a series of smoothing splines to express the nonlinear relationship between the expected mean response and a set of predictors. As with the GLM, a GAM with a logit link function is known as Additive Logistic Regression (ALR). ALR can model more complex relationships than a GLM (Daba et al., 2022).
Previous land use change modelling studies include an application of LR in the Western Highlands of Vietnam and of GAM in a tropical mountain landscape of northern Ecuador. A study in Waterloo, Ontario, Canada compared Markov chain, LR, ALR and survival analysis (Siddik et al., 2022). However, those study areas do not include peatlands, and their analyses covered only forest, agricultural and developed areas. In general, peatlands are terrestrial wetland ecosystems. This study therefore aims to develop a land use change model and predict changes in the existing wetlands of Kubu Raya District, West Kalimantan Province. It uses three models: LR from the GLM family, ALR from the GAM family and Random Forest (RF), a tree-based model that can also capture nonlinear relationships (Emmanuel et al., 2023).
According to the forest and land fire early warning and detection system, West Kalimantan experienced the largest changes due to fires between 2014 and 2019. Various factors contribute to forest and land fires in West Kalimantan. However, the large area of peatland, particularly in Kubu Raya District, is the main potential source of fire disasters. These peatlands are prone to fires caused by human activities that clear and drain peatland for conversion into plantations (Gao et al., 2023). Numerous studies have sought to prevent fire through a hydrological approach (Assidik et al., 2021) and through forest fire hazard modelling with the Hybrid Fire Index. Others have raised awareness through community empowerment (Akbar et al., 2023). This study takes another approach, land use change modelling, to help prevent and overcome peatland fires. The observation years are 2009, 2015 and 2020. The research focuses on the wetland land cover class because peatland falls under this class. The comparison of model performance aims to identify potential changes in wetlands and so support the development of useful planning recommendations.
2. Research Methods
2.1. Study Area
The study area is Kubu Raya District. Kubu Raya was split from Pontianak District under Law No. 35 of 2007 and has an area of 6,985.20 km². It is divided into nine sub-districts, as shown in Figure 1. Kubu Raya lies in the western part of West Kalimantan Province, between longitudes 108°35' E and 109°58' E and latitudes 0°44' N and 1°01' S. The district consists of mainland areas, coastal islands and sea areas. Kubu Raya lies in the downstream section of the Kapuas watershed, so many natural products from the upstream Kapuas River flow into the district. This flow supports processing industries for various natural commodities along the Kapuas River. Kubu Raya has 39 small islands in its coastal marine area, inhabited predominantly by fishing communities. The population has limited access to public services because of unequal development in remote areas.
2.2. Data
The land cover (LC) maps of the Ministry of Environment and Forestry (MoEF) cover the entire territory of Indonesia. The national map was therefore clipped to the administrative boundary of the study area (see the illustration below). Over the observation period, the maps show numerous changes in land cover, mainly due to fires and other land use change activities across the wetlands. The LC MoEF map has a resolution of 100 m × 100 m, so each pixel represents an area of one hectare (Gaur, 2023). Following the IPCC documents, the response variable classes are forest land (F), cropland (C), grassland (G), wetlands (W), settlements (S) and other lands (O).
Land use and land cover change is caused by a combination of driving factors. The explanatory variables (driving factors) were selected by considering the complexity of the process, the conditions of the research site and the predetermined response variables. They are based on the previous studies listed in Table 1, and their data sources are shown in Table 2.
Table 1. Explanatory variables and the studies on which they are based
| Variable | Description | Source |
|---|---|---|
| X1 | Distance to road | |
| X2 | Distance to river | (Ghosh, 2020) |
| X3 | Slope | (Baig, 2021) |
| X4 | Elevation | (Baig, 2021) |
| X5 | Distance to lost wetlands | (Ghosh, 2020) |
| X6 | Population density | (Ghosh, 2020) |
| X7 | Distance to city | (Ghosh, 2020) |
Table 2. Data and data sources
| Data | Source |
|---|---|
| LC 2009, 2015, 2020 | LC MoEF |
| Distance to road | Maps of the Earth's Surface (RBI) from the Geospatial Information Agency (BIG) |
| Distance to river | RBI from BIG |
| Slope | SRTM-DEM from the United States Geological Survey (USGS) |
| Elevation | SRTM-DEM from USGS |
| Distance to lost wetlands | LC MoEF 2009 and 2015 |
| Population density | Central Bureau of Statistics (BPS) Kubu Raya 2010 |
| Distance to city | LC MoEF |
2.3. Classification of Land Cover Classes
The MoEF data contain 23 LC classes. These were regrouped into six LC classes for the response variable, following the definitions in the Intergovernmental Panel on Climate Change (IPCC) (2003) document. The six classes are forest land (F), cropland (C), grassland (G), wetlands (W), settlements (S) and other lands (O). Reclassification captures land cover changes in more targeted classes, with a focus on wetland cover, the class confirmed to have experienced significant changes. Table 3 details the reclassification of the LC classes.
Table 3. Reclassification of the LC MoEF classes into the IPCC LC classes
| LC MoEF Code | LC MoEF Class | ID | LC IPCC |
|---|---|---|---|
| 2001 | Primary dryland forest | 1 | Forest land |
| 2002 | Secondary dryland forest | 1 | Forest land |
| 2004 | Primary mangrove forest | 4 | Wetlands |
| 20041 | Secondary mangrove forest | 4 | Wetlands |
| 2005 | Primary swamp forest | 4 | Wetlands |
| 20051 | Secondary swamp forest | 4 | Wetlands |
| 2006 | Plantation forest | 1 | Forest land |
| 2007 | Dry shrub | 3 | Grassland |
| 2010 | Estate crop | 2 | Cropland |
| 2012 | Settlement areas | 5 | Settlements |
| 2014 | Bare ground | 6 | Other lands |
| 2500 | Cloud | 6 | Other lands |
| 3000 | Savanna and grasses | 3 | Grassland |
| 5001 | Open water | 4 | Wetlands |
| 20071 | Wet shrub | 4 | Wetlands |
| 20091 | Pure dry agriculture | 2 | Cropland |
| 20092 | Mixed dry agriculture | 2 | Cropland |
| 20093 | Paddy field | 2 | Cropland |
| 20094 | Fish pond/aquaculture | 4 | Wetlands |
| 20121 | Port and harbour | 5 | Settlements |
| 20122 | Transmigration areas | 5 | Settlements |
| 20141 | Mining areas | 5 | Settlements |
| 50011 | Open swamp | 4 | Wetlands |
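The reclassification in Table 3 is a lookup from each MoEF code to an IPCC class ID. The stdlib-only Python sketch below applies it to a tiny hypothetical grid, where None marks cells outside the district boundary. The study performed this step on raster data in R, so the constant and function names here are illustrative assumptions only.

```python
# Illustrative sketch: MoEF LC code -> IPCC class ID (1-6), following Table 3.
MOEF_TO_IPCC = {
    2001: 1, 2002: 1, 2006: 1,                               # forest land
    2010: 2, 20091: 2, 20092: 2, 20093: 2,                   # cropland
    2007: 3, 3000: 3,                                        # grassland
    2004: 4, 20041: 4, 2005: 4, 20051: 4,
    5001: 4, 20071: 4, 20094: 4, 50011: 4,                   # wetlands
    2012: 5, 20121: 5, 20122: 5, 20141: 5,                   # settlements
    2014: 6, 2500: 6,                                        # other lands
}
IPCC_NAMES = {1: "Forest land", 2: "Cropland", 3: "Grassland",
              4: "Wetlands", 5: "Settlements", 6: "Other lands"}


def reclassify(grid, table=MOEF_TO_IPCC):
    """Return a new grid with every MoEF code replaced by its IPCC class ID.

    None (outside the district) is kept; an unknown code raises KeyError,
    which flags classes missing from the lookup table.
    """
    return [[None if code is None else table[code] for code in row] for row in grid]


# Hypothetical 2 x 2 grid
print(reclassify([[2001, 20094], [None, 2500]]))  # [[1, 4], [None, 6]]
```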
2.4. Logistic Regression
LR belongs to the GLM family. A GLM consists of a random component, a linear predictor and a link function relating the two. In LR, the response variable (the random component of the GLM) is assumed to follow a binomial distribution. The natural parameter of the binomial distribution is the log odds of outcome 1, the so-called logit of $\pi$, where $\pi$ is the probability of that outcome. The logit is therefore the canonical link function for binary random components.
Through the logit link, LR models the probability of land use and land cover change as a function of the explanatory variables; in this study, the change modelled is the change in wetlands. The probability of land use change at each pixel is a function of the values of the explanatory variables at the same pixel. LR is employed to model the conversion of wetlands in Kubu Raya District (Géant et al., 2023). The LR model is written as Equation 1:
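$$\pi(\mathbf{x}_i) = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip})} \tag{1}$$

where $\pi(\mathbf{x}_i)$ is the probability that pixel $i$ is wetland, $x_{i1}, \dots, x_{ip}$ are the values of the $p$ explanatory variables at pixel $i$, $\beta_0$ is the intercept and $\beta_1, \dots, \beta_p$ are the regression coefficients.
The model can be linearised by applying the link function that transforms $\pi(\mathbf{x}_i)$ into the natural parameter; this link is known as the canonical link. Here $g(\pi(\mathbf{x}_i))$ is the logit link function of $\pi(\mathbf{x}_i)$, commonly termed the logit transformation, as shown in Equation 2:

$$g(\pi(\mathbf{x}_i)) = \ln\left(\frac{\pi(\mathbf{x}_i)}{1 - \pi(\mathbf{x}_i)}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} \tag{2}$$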
The probability of wetland occurrence at each pixel can be estimated in R with the glm() function and the argument family = "binomial". The lulcc package uses this function to perform LR analysis.
LR is a common predictive method in land use change modelling. LR was initially applied to deforested land in Massachusetts, and later to land use change in Thailand by Buya (2020) and in Vietnam by Huu et al. (2022). The simulation results of those studies show that LR predicts the quantity of land use change reasonably accurately. However, its accuracy depended on the quantity and completeness of the driving factors.
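R's glm() fits a binomial GLM by iteratively reweighted least squares (IRLS). For the canonical logit link, IRLS coincides with Newton-Raphson. The stdlib-only Python sketch below illustrates Equations 1 and 2 and an IRLS fit for a single hypothetical explanatory variable, such as a scaled distance to road. The data, function names and fixed iteration count are assumptions for illustration; this is not the lulcc implementation.

```python
import math


def inv_logit(eta):
    """Inverse logit (Equation 1): linear predictor -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Logit link (Equation 2): probability -> log odds."""
    return math.log(p / (1.0 - p))


def fit_logistic_1d(x, y, iters=25):
    """Fit logit(pi) = b0 + b1 * x by Newton-Raphson / IRLS (one predictor).

    Assumes the 0/1 data are not perfectly separable, so the MLE exists.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = inv_logit(b0 + b1 * xi)
            w = mu * (1.0 - mu)            # IRLS weight = binomial variance
            g0 += yi - mu                  # score for the intercept
            g1 += (yi - mu) * xi           # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi             # entries of X'WX
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^{-1} X'(y - mu), solved for the 2 x 2 case
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


# Hypothetical data: scaled distance to road, 1 = wetland converted
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 1, 0, 1, 0, 1, 0, 0]
b0, b1 = fit_logistic_1d(x, y)
print(b0, b1, inv_logit(b0 + b1 * 2.0))  # probability of conversion at x = 2
```

At convergence, the intercept's score equation forces the fitted probabilities to sum to the number of observed conversions. This gives a quick check on the fit.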
2.5. Random Forest
Random Forest (RF) is an ensemble method in which many decision trees are grown to form a forest, and the analysis is performed on this group of trees. RF solves classification or prediction problems by a majority vote over the trees. Girma et al. (2022) described RF for a dataset of size n with q explanatory variables in the following stages:
i. A bootstrap sample of size n is drawn with replacement from the training dataset.
ii. A tree is grown to its maximum size without pruning. At each node, m explanatory variables (m < q) are selected at random, and the best of them is chosen to split the node into two new nodes. Splitting continues until the minimum node size is reached.
iii. Steps (i) and (ii) are repeated L times to obtain L decision trees.
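The three stages above can be sketched in plain Python. This minimal, unoptimised illustration covers bootstrap sampling, random selection of m candidate variables at each node (scored by Gini impurity), unpruned growth down to a minimum node size, and majority voting. It is not the randomForest package, and the function names and toy data are hypothetical. The defaults n_trees=500 and m=2 mirror the settings reported in this study.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(X, y, features):
    """Best (feature, threshold) among the candidates by weighted Gini, or None."""
    best, best_score = None, float("inf")
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(X, y, m, min_size, rng):
    """Step (ii): unpruned tree; only m random variables are tried per node."""
    if len(set(y)) == 1 or len(y) <= min_size:
        return majority(y)                               # leaf
    features = rng.sample(range(len(X[0])), min(m, len(X[0])))
    split = best_split(X, y, features)
    if split is None:                                    # candidates constant here
        return majority(y)
    f, t = split
    go_left = [row[f] <= t for row in X]
    left = grow_tree([r for r, g in zip(X, go_left) if g],
                     [v for v, g in zip(y, go_left) if g], m, min_size, rng)
    right = grow_tree([r for r, g in zip(X, go_left) if not g],
                      [v for v, g in zip(y, go_left) if not g], m, min_size, rng)
    return (f, t, left, right)


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_trees=500, m=2, min_size=1, seed=0):
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):                             # step (iii): L trees
        idx = [rng.randrange(n) for _ in range(n)]       # step (i): bootstrap
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                m, min_size, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])


# Toy data: two informative variables and one noise variable (hypothetical)
X = [[1, 2, 0], [2, 1, 1], [0, 3, 0], [3, 0, 1],
     [7, 8, 0], [8, 6, 1], [9, 9, 0], [6, 7, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = random_forest(X, y, n_trees=25, m=2, seed=1)
print(predict_forest(forest, [1, 1, 0]), predict_forest(forest, [8, 8, 1]))
```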
As with LR, RF is available in lulcc and is fitted with the randomForest() function using its default settings. These are 500 trees (ntree = 500) and two variables randomly sampled as candidates at each split (mtry = 2). For classification, the randomForest default mtry is the floor of the square root of the number of explanatory variables, which gives 2 for the seven variables in Table 1.
RF has been used to estimate land cover and its change in urban areas. One such study compared the machine learning algorithms RF, K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) with the deep learning algorithm DeepLabv3+, a Deep Neural Network (DNN) architecture. For urban targets, DeepLabv3+ recorded the highest overall pixel accuracy and F1 score, followed by RF, KNN and SVM. Among the traditional machine learning algorithms, RF performed well across the classes but failed to map river water accurately.
2.6. Additive Logistic Regression
The Generalised Additive Model (GAM) accommodates nonlinear effects of the independent variables without requiring the structure of these effects to be specified explicitly. The GAM extends the GLM by replacing the linear predictor with an additive component. ALR is the member of the GAM family in which the additive component is combined with the logit link of the LR model. The result is the additive logistic model in Equation 3:
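$$g(\pi(\mathbf{x}_i)) = \ln\left(\frac{\pi(\mathbf{x}_i)}{1 - \pi(\mathbf{x}_i)}\right) = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_p(x_{ip}) \tag{3}$$

where $\pi(\mathbf{x}_i)$ is the probability of wetland at pixel $i$ and $f_1, \dots, f_p$ are smooth functions of the explanatory variables. The additive component $\beta_0 + \sum_{j=1}^{p} f_j(x_{ij})$ is therefore related to $\pi(\mathbf{x}_i)$ through the logit link function.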
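As with the GLM, the probability of wetland in the GAM is given by Equation 4:

$$\pi(\mathbf{x}_i) = \frac{\exp\left(\beta_0 + \sum_{j=1}^{p} f_j(x_{ij})\right)}{1 + \exp\left(\beta_0 + \sum_{j=1}^{p} f_j(x_{ij})\right)} \tag{4}$$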
The model is fitted with the backfitting algorithm combined with the local scoring method, which estimates the smooth functions nonparametrically. A previous land use change modelling study reported that GAM outperformed LR, Markov chain and survival analysis.
The ALR model is not supported in the current version of the lulcc package. In R, ALR can be fitted with the mgcv package (Huo et al., 2022). The lulcc fitting and prediction routines were therefore modified in the source code to use mgcv. The probability of wetland occurrence under ALR can then be computed with the gam() function and the argument family = "binomial".
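Local scoring and backfitting can be illustrated with a small stdlib Python sketch. The outer local-scoring loop forms the working response and working weights of the logistic model. The inner loop backfits each smooth function to its weighted partial residuals. A Gaussian kernel smoother stands in for smoothing splines, and the bandwidth, iteration counts and data are assumptions. Note that mgcv's gam() does not run this classical backfitting loop: it estimates penalised regression splines and selects smoothing parameters by GCV or REML.

```python
import math


def kernel_smooth(x, r, w, bandwidth):
    """Weighted Nadaraya-Watson smoother (Gaussian kernel) of values r on x."""
    out = []
    for xi in x:
        num = den = 0.0
        for xj, rj, wj in zip(x, r, w):
            k = wj * math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2)
            num += k * rj
            den += k
        out.append(num / den)
    return out


def wmean(values, w):
    return sum(v * wi for v, wi in zip(values, w)) / sum(w)


def fit_alr(X, y, bandwidth=1.0, outer=20, inner=10):
    """Additive logistic regression: local scoring around weighted backfitting.

    X: list of predictor columns; y: list of 0/1 responses.
    Returns (b0, fs), where fs[j][i] is the fitted f_j(x_ij).
    """
    n, p = len(y), len(X)
    ybar = sum(y) / n
    b0 = math.log(ybar / (1 - ybar))                 # intercept-only start
    fs = [[0.0] * n for _ in range(p)]
    for _ in range(outer):                           # local scoring loop
        eta = [b0 + sum(f[i] for f in fs) for i in range(n)]
        mu = [1 / (1 + math.exp(-e)) for e in eta]
        w = [m * (1 - m) for m in mu]                # working weights
        z = [e + (yi - m) / wi for e, yi, m, wi in zip(eta, y, mu, w)]  # working response
        for _ in range(inner):                       # backfitting loop
            b0 = wmean([z[i] - sum(f[i] for f in fs) for i in range(n)], w)
            for j in range(p):
                partial = [z[i] - b0 - sum(fs[k][i] for k in range(p) if k != j)
                           for i in range(n)]
                fj = kernel_smooth(X[j], partial, w, bandwidth)
                c = wmean(fj, w)
                fs[j] = [v - c for v in fj]          # centre f_j for identifiability
    return b0, fs


def fitted_probabilities(b0, fs):
    """Equation 4: inverse logit of the additive predictor."""
    n = len(fs[0])
    return [1 / (1 + math.exp(-(b0 + sum(f[i] for f in fs)))) for i in range(n)]


# Hypothetical U-shaped relationship: conversion is likelier at both extremes
x = [-3 + 0.5 * k for k in range(13)]
y = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1]
b0, fs = fit_alr([x], y)
probs = fitted_probabilities(b0, fs)
```

A linear logit could not represent this U shape, which is the motivation for ALR given in the Introduction.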
2.7. Data Analysis Procedure
Land use change modelling was conducted in R with the lulcc package, which supports binary logistic regression and random forests (Jafarpour et al., 2022). The workflow was modified to include GAM using the mgcv package (Kumar et al., 2023). The analysis procedure (Figure 2) was as follows:
1. Collect LC maps of 2009, 2015 and 2020 and spatial parameters.
2. Produce land use change maps and transition probability matrices to identify the dominant land use changes.
3. Combine all the maps.
4. Draw a 5% sample of the data using simple random sampling. Results on the sample are compared with those on the overall data to assess how data size affects the models.
5. Split the sample data into a training dataset (80%) and a test dataset (the remaining 20%).
6. Fit the LR, RF and ALR models to the training dataset.
7. Evaluate the models on the test dataset using accuracy, F1 score and AUC-ROC (Area Under the Receiver Operating Characteristic curve) (Pontius & Parmentier, 2014).
8. Repeat steps 5 to 7 ten times, for both the sample data and the overall data, and average the goodness-of-fit values of each model.
9. Generate prediction maps with the best-performing model according to step 7, and validate them against the actual map using a multi-class confusion matrix.
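Steps 4, 5 and 8 can be sketched with Python's standard library. The sample is drawn once, and the 80/20 split is then redrawn on every repetition. The function names and the dummy evaluate() callback are hypothetical stand-ins for the model fitting and scoring of steps 6 and 7. Rounding may give split sizes that differ slightly from the counts reported in Subchapter 3.4.

```python
import random


def draw_sample(n_cells, frac=0.05, rng=None):
    """Step 4: simple random sample (without replacement) of cell indices."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(range(n_cells), round(frac * n_cells))


def train_test_split(indices, train_frac=0.8, rng=None):
    """Step 5: shuffle a copy of the indices and cut it into train/test parts."""
    if rng is None:
        rng = random.Random(0)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def repeated_holdout(indices, evaluate, repeats=10, train_frac=0.8, seed=0):
    """Step 8: repeat steps 5-7 and average evaluate(train, test) over repeats."""
    rng = random.Random(seed)
    scores = [evaluate(*train_test_split(indices, train_frac, rng))
              for _ in range(repeats)]
    return sum(scores) / len(scores), scores


# 5% of the 853,451 one-hectare cells of Kubu Raya
sample = draw_sample(853_451)
train, test = train_test_split(sample)
# Dummy evaluator standing in for "fit on train, score on test"
mean_score, scores = repeated_holdout(sample, lambda tr, te: len(tr) / (len(tr) + len(te)))
```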
3. Results and Discussion
3.1. Land Cover Maps of Kubu Raya District
Figure 3 shows the map of Kubu Raya District with the six LC classes, covering a total area of 853,451 hectares (ha) on a grid of 1,024 × 1,375 pixels. The area of each class was calculated as its number of pixels multiplied by the area of one cell. This calculation was performed with the terra package in R. The LC maps from 2009 to 2020 show that wetlands and cropland dominated Kubu Raya District. The area of each LC class in each observed year is presented in Table 4.
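The area calculation is the pixel count multiplied by the cell area, which is 1 ha for a 100 m × 100 m pixel. The stdlib Python sketch below illustrates it on a hypothetical grid and then reproduces the 2009 percentages of Table 4 from the reported areas. The function names are illustrative; the study itself used terra in R.

```python
from collections import Counter

CELL_AREA_HA = 1.0  # 100 m x 100 m = 10,000 m^2 = 1 ha


def class_areas(grid, cell_area_ha=CELL_AREA_HA):
    """Area per class = pixel count x cell area; None marks cells outside the district."""
    counts = Counter(v for row in grid for v in row if v is not None)
    return {cls: n * cell_area_ha for cls, n in counts.items()}


def area_percentages(areas, ndigits=2):
    """Share of the total area for each class, in percent."""
    total = sum(areas.values())
    return {cls: round(100 * a / total, ndigits) for cls, a in areas.items()}


# 2009 areas (ha) as reported in Table 4
areas_2009 = {"Forest land": 6691, "Cropland": 276056, "Grassland": 1975,
              "Wetlands": 556461, "Settlements": 3849, "Other lands": 8419}
pct_2009 = area_percentages(areas_2009)
```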
In 2009, wetlands covered 556,461 ha, or 65.20% of Kubu Raya District, whereas cropland covered 32.35%. By 2015, wetlands had decreased sharply to 425,489 ha (49.86%), while cropland had increased by 10.68 percentage points to 367,212 ha (43.03%). Wetlands continued to decline over the following five years, covering 396,542 ha (46.46%) in 2020. In that year, cropland became the largest class, at 46.58% of Kubu Raya District.
Forest land increased appreciably between 2009 and 2020, largely because plantation forests in the LC MoEF data are counted as forest land in the IPCC LC classes. Settlements continued to expand, reaching 10,524 ha in 2020, particularly near Pontianak City. Other lands changed most dynamically. They occupied about 0.99% of Kubu Raya District in 2009, increased to about 2.58% in 2015 and fell back to 1.13% in 2020. Grassland changed only minimally from 2009 to 2020.
Table 4. Area of each LC class in Kubu Raya District in 2009, 2015 and 2020
| No | LC | 2009 (ha) | 2009 (%) | 2015 (ha) | 2015 (%) | 2020 (ha) | 2020 (%) |
|---|---|---|---|---|---|---|---|
| 1 | Forest land | 6,691 | 0.78 | 33,280 | 3.90 | 37,754 | 4.42 |
| 2 | Cropland | 276,056 | 32.35 | 367,212 | 43.03 | 397,505 | 46.58 |
| 3 | Grassland | 1,975 | 0.23 | 1,404 | 0.16 | 1,489 | 0.17 |
| 4 | Wetlands | 556,461 | 65.20 | 425,489 | 49.86 | 396,542 | 46.46 |
| 5 | Settlements | 3,849 | 0.45 | 4,027 | 0.47 | 10,524 | 1.23 |
| 6 | Other lands | 8,419 | 0.99 | 22,039 | 2.58 | 9,637 | 1.13 |
3.2. Spatial Data Layers of The Independent Variables
The explanatory variable maps were derived from several sources, clipped to the boundary of Kubu Raya District and adjusted to the grid of the LC MoEF map. The distance to road variable was generated from arterial, local and collector roads. It was used as a driving factor on the premise that land closer to a road changes faster. The same reasoning applies to distance to the main river and distance to the city, and likewise to population density, since the spatial proximity of roads and cities is considered an important influencing factor (Ghosh et al., 2020).
Slope and elevation were derived from the SRTM DEM (Table 2), on the assumption that they are strongly related to land use and land cover change. Baig et al. (2021) included slope because altitude dynamics are frequently associated with anthropogenic activity. Distance to lost wetlands was computed from the wetland area lost to land use change activities between 2009 and 2015. Proximity to lost wetlands is essential for identifying the vulnerable parts of a wetland area (Ghosh et al., 2020). The explanatory variable maps are shown in Figure 4.
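The distance to lost wetlands layer can be sketched in two steps. First, flag pixels that were wetland (IPCC ID 4) in 2009 but another class in 2015. Second, compute each pixel's Euclidean distance to the nearest flagged pixel. The brute-force stdlib Python version below is only illustrative and is far too slow for the full 1,024 × 1,375 grid, where a raster tool such as terra would be used instead. The maps and function names are hypothetical.

```python
import math

WETLAND = 4  # IPCC class ID for wetlands (Table 3)


def lost_wetland_mask(lc_2009, lc_2015):
    """True where a pixel was wetland in 2009 but another class in 2015."""
    return [[a == WETLAND and b is not None and b != WETLAND
             for a, b in zip(r1, r2)]
            for r1, r2 in zip(lc_2009, lc_2015)]


def distance_to_mask(mask, cell_size_m=100.0):
    """Euclidean distance (m) from each pixel centre to the nearest True pixel."""
    targets = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]
    if not targets:
        raise ValueError("mask contains no target pixels")
    return [[cell_size_m * min(math.hypot(i - ti, j - tj) for ti, tj in targets)
             for j in range(len(row))]
            for i, row in enumerate(mask)]


# Hypothetical 2 x 3 maps: one wetland pixel becomes cropland (ID 2)
lc09 = [[4, 4, 2], [4, 2, 2]]
lc15 = [[4, 2, 2], [4, 2, 2]]
mask = lost_wetland_mask(lc09, lc15)
dist = distance_to_mask(mask)
```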
3.3. Land Use Change 2009-2015
Land use change was identified by constructing a transition matrix for 2009 to 2015 to reveal the pattern of dominant changes between those two years. The transition area matrix contains 29 non-zero transitions for this period, including the six persistence transitions on the diagonal (Table 5). The matrix confirms that the dominant change was from wetlands to other LC classes: 91,545 ha became cropland, 23,595 ha forest land and 16,557 ha other lands. Yang et al. (2023) identified land use change driven by human land utilisation as an essential factor in the occurrence of forest and land fires.
Table 5. Land use change transition area matrix, 2009-2015 (ha); rows are 2009 classes and columns are 2015 classes
| 2009 \ 2015 | F | C | G | W | S | O |
|---|---|---|---|---|---|---|
| Forest land (F) | 6,230 | 126 | 70 | 242 | 0 | 23 |
| Cropland (C) | 2,188 | 272,110 | 34 | 192 | 356 | 1,176 |
| Grassland (G) | 392 | 125 | 1,149 | 287 | 0 | 22 |
| Wetlands (W) | 23,595 | 91,545 | 132 | 424,586 | 46 | 16,557 |
| Settlements (S) | 0 | 205 | 19 | 0 | 3,625 | 0 |
| Other lands (O) | 875 | 3,101 | 0 | 182 | 0 | 4,261 |
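A transition area matrix such as Table 5 is a cross-tabulation of two co-registered maps. The row sums then reproduce the areas of the first year and the column sums those of the second. The stdlib Python sketch below shows both operations. The function names and toy maps are hypothetical, and cells are assumed to be 1 ha.

```python
from collections import Counter

CLASSES = ["F", "C", "G", "W", "S", "O"]


def transition_matrix(lc_from, lc_to, classes=CLASSES, cell_area_ha=1.0):
    """Area (ha) moving from each class at time 1 (rows) to each class at time 2 (columns).

    Both maps must share the same grid; None marks cells outside the district.
    """
    counts = Counter(
        (a, b)
        for row_from, row_to in zip(lc_from, lc_to)
        for a, b in zip(row_from, row_to)
        if a is not None and b is not None
    )
    return [[counts[(a, b)] * cell_area_ha for b in classes] for a in classes]


def marginals(matrix):
    """Row sums (area at time 1) and column sums (area at time 2)."""
    return [sum(row) for row in matrix], [sum(col) for col in zip(*matrix)]


# Table 5 (ha): its margins should equal the 2009 and 2015 areas in Table 4
T5 = [[6230, 126, 70, 242, 0, 23],
      [2188, 272110, 34, 192, 356, 1176],
      [392, 125, 1149, 287, 0, 22],
      [23595, 91545, 132, 424586, 46, 16557],
      [0, 205, 19, 0, 3625, 0],
      [875, 3101, 0, 182, 0, 4261]]
rows_2009, cols_2015 = marginals(T5)
```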
3.4. Land Use Change Modelling
Model construction begins by taking a 5% sample of 42,673 points from the entire 2009 LC map, drawn regularly within each land cover class. Sampling shortens model-building time compared with using all LC data. It is also intended to keep the model from tending to predict only the majority LC class. The values of the LC map and of each explanatory variable map are then extracted at the sample point coordinates to form the dataset. The dataset is divided into training data, roughly 80% of the sample (34,140 points), and test data (the remaining 8,533 points).
The LR, RF and ALR models were fitted to the training data and then evaluated on the test data. This process was repeated 10 times to obtain 10 goodness-of-fit values. All of the above steps were applied to the entire observation dataset as well as to the sample data (Salmona et al., 2023).
Table 6 compares the accuracy, F1 score and AUC of the LR, RF and ALR models on the 5% sample and on the entire data. On average, RF performed better than LR and ALR on both datasets. RF reached an accuracy of 0.950 on the sample data and 0.952 on all data, with F1 scores of 0.962 and 0.964, respectively.
The accuracy and F1 scores of the models are also shown as boxplots in Figure 5 and Figure 6. The goodness-of-fit values for the 5% sample data have a slightly wider spread than those for the overall data.
Table 6. Goodness-of-fit of the LR, RF and ALR models on the 5% sample and on the entire (100%) data
| Goodness-of-fit | LR 5% | LR 100% | RF 5% | RF 100% | ALR 5% | ALR 100% |
|---|---|---|---|---|---|---|
| Accuracy | 0.856 | 0.854 | 0.950 | 0.952 | 0.880 | 0.878 |
| F1 | 0.892 | 0.890 | 0.962 | 0.964 | 0.909 | 0.907 |
| AUC | 0.899 | 0.898 | 0.990 | 0.991 | 0.937 | 0.936 |
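Accuracy and F1 score in Table 6 follow from the confusion counts on the test data. A minimal stdlib Python sketch for the binary wetland (1) versus non-wetland (0) case appears below. Treating wetland as the positive class for F1 is an assumption, since the choice is not stated above, and the labels are invented.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of test pixels whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical test labels (1 = wetland) and predictions
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
```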
The ability of each model to distinguish wetland from non-wetland classes was also measured with the ROC curve, which plots the true positive rate against the false positive rate. A model classifies well if its ROC curve lies above the diagonal and the area under the curve (AUC) is close to 1. In Figure 7, RF delivers the best performance on both datasets, with the curve furthest from the diagonal and AUC values of 0.990 and 0.991. ALR follows with AUC values of 0.937 and 0.936, while LR has the lowest AUC values, 0.899 and 0.898, on the respective datasets. The ROC results indicate that RF distinguishes well between pixels that change into wetland and those that change into non-wetland (Salako et al., 2023).
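The AUC can be computed in two equivalent ways. One is the trapezoidal area under the ROC points obtained by lowering the decision threshold. The other is the probability that a randomly chosen wetland pixel receives a higher predicted probability than a randomly chosen non-wetland pixel, with ties counted as one half. The stdlib Python sketch below implements both on invented scores. It is an illustration, not the routine used in the study.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold through the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(zip(scores, y_true), key=lambda s: -s[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(order):
        threshold = order[i][0]
        while i < len(order) and order[i][0] == threshold:  # tied scores together
            if order[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def auc_rank(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical wetland labels and predicted wetland probabilities
y = [1, 1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
```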
In each model's probability map, pixels likely to become wetland appear in lighter colours (values close to 1), and pixels likely to become non-wetland appear darker (values close to 0). The actual wetland area in 2009 was 556,461 ha.
The total wetland area predicted by RF, approximately 557,871 ha, is closest to this actual value, whereas ALR predicts 565,154 ha and LR 574,144 ha. RF, as the best-performing model, was then used to predict all LC classes, with the full 2020 dataset as test data. Figure 8 presents the probability maps for forest land, cropland, grassland, wetlands, settlements and other lands, respectively.
Figure 9 shows the prediction map of the best model, obtained by overlaying the prediction maps of all LC classes. The predicted wetland area in 2020 is 391,723 ha, which is 4,819 ha less than the actual wetland area of 396,542 ha.
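The overlay rule for the six probability maps is not spelled out above. One common choice, assumed here, assigns each pixel the class with the highest predicted probability and then counts pixels to obtain predicted areas. The stdlib Python sketch below illustrates this rule on two hypothetical classes. The function names are illustrative, and ties go to the class listed first.

```python
def overlay_max_probability(prob_maps):
    """Assign each pixel the class whose probability map has the highest value.

    prob_maps: dict mapping class name -> 2-D list of probabilities
    (None outside the district). Ties go to the class listed first.
    """
    classes = list(prob_maps)
    template = prob_maps[classes[0]]
    return [
        [None if template[i][j] is None
         else max(classes, key=lambda c: prob_maps[c][i][j])
         for j in range(len(template[i]))]
        for i in range(len(template))
    ]


def predicted_area(class_map, cls, cell_area_ha=1.0):
    """Predicted area (ha) of one class: pixel count x cell area."""
    return sum(v == cls for row in class_map for v in row) * cell_area_ha


# Hypothetical probability maps for two classes
prob_maps = {"W": [[0.7, 0.2], [None, 0.5]],
             "C": [[0.3, 0.8], [None, 0.5]]}
class_map = overlay_max_probability(prob_maps)
```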
3.5. Discussion
This study seeks to answer the question: how accurately do different statistical methods predict potential changes from wetlands to non-wetlands (Zarandian et al., 2023)? To answer it, three conceptually different approaches to modelling land use change were developed and compared for accuracy: a parametric model (LR), a tree-based model (RF) and a nonparametric model (ALR). These approaches were used to model six land cover classes and were tested with 2020 data for Kubu Raya District, West Kalimantan. Overall accuracy was highest for RF, followed by ALR and then LR, with both the small sample and the full data.
LR and ALR were designed to represent one-to-one land use change, with only two classes: wetlands to non-wetlands and non-wetlands to wetlands (Ren et al., 2023). Although RF can address multiclass classification directly, it was run in the same binary manner. To develop the prediction map, a binary model was run for each class against all other classes, producing the six probability maps. On the ROC curves, the evaluation metric for this binary classification problem, RF again outperformed the other approaches (Penny et al., 2023).
Using LR as a classification model has several advantages because it requires neither complex parameters nor specific model treatment. LR also runs much faster than the other two models, with each iteration lasting only a few seconds. However, spatial autocorrelation in the error terms of LR biases the standard errors of the parameter estimates. Spatial autocorrelation affects spatial models because a pixel's value can be predicted from its neighbours (Karasiak et al., 2022). This phenomenon has also been reported in several studies (Baig et al., 2021; Buya, 2020; Gaur, 2023). A simple strategy to reduce the impact of spatial autocorrelation is to fit the predictive model on a random sample of the data (Liu et al., 2023).
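Global Moran's I is a standard way to check for spatial autocorrelation, for example in model residuals or a binary wetland map. It was not reported in this study and is added only as an illustrative diagnostic. The stdlib Python sketch below uses binary rook (4-neighbour) weights and ignores cells outside the district (None). Values near +1 indicate clustering, values near -1 indicate a checkerboard-like alternation, and values near 0 indicate little spatial structure. The function name and toy grids are hypothetical.

```python
def morans_i(grid):
    """Global Moran's I with binary rook (4-neighbour) weights; None cells ignored."""
    cells = {(i, j): v for i, row in enumerate(grid)
             for j, v in enumerate(row) if v is not None}
    n = len(cells)
    mean = sum(cells.values()) / n
    dev = {k: v - mean for k, v in cells.items()}
    cross, w_sum = 0.0, 0
    for (i, j), d in dev.items():
        for nb in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if nb in dev:                 # neighbour exists inside the district
                cross += d * dev[nb]
                w_sum += 1
    return (n / w_sum) * cross / sum(d * d for d in dev.values())


# Hypothetical 4 x 4 binary maps (1 = wetland)
checkerboard = [[(i + j) % 2 for j in range(4)] for i in range(4)]
clustered = [[0, 0, 1, 1] for _ in range(4)]
```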
In contrast, ALR and RF each took about half an hour per iteration for each simulated class. Robinson (2018) reported that ALR took over 85 minutes to run the simulation model because of the iterative backfitting of the smoothing functions. In R, fitting the ALR classification model requires a smooth function to be specified for each explanatory variable assumed to have a nonlinear relationship with the response (Aguilera et al., 2023). The response variable must also be coded as a factor (categorical) data type. The smoothing parameters in this research were estimated with the default method, Generalised Cross Validation (GCV). Guarderas et al. (2022) instead used beta regression, with Restricted Maximum Likelihood (REML) to estimate the smoothing parameters. Their ALR models conservatively explained between 21% and 42% of the variation in the individual land cover transitions in their study region. In addition, RF hyperparameters were not tuned in this study (Xia et al., 2023). The number of trees and the number of candidate variables at each split were left at their defaults of 500 and 2, respectively. Nevertheless, RF predicted more accurately than LR and ALR. RF is therefore recommended for land use change analysis, although ALR may also be suitable if several of its parameters are tuned to improve the model (Ren et al., 2023).
Between 2009 and 2015, wetland cover decreased dramatically as cropland took over wetland areas. This conversion drove a substantial expansion of cropland, accompanied by significant changes in forest land. Sequestration, that is, a change from land with low carbon stocks to land with high carbon stocks, is expected to reduce emissions. However, the forest class used in this study includes production forest, for instance industrial plantation forest (Raihan et al., 2023). Another finding is that wetland changes tended to occur where wetlands were close to roads, which indicates that distance to road is a significant variable in explaining the conversion of wetlands to non-wetland classes. The second important variable in this study was elevation. A related study that incorporates wetlands in its analysis is the wetland conversion risk assessment of East Kolkata, India, which identified areas adjacent to megacities and other municipal areas, as well as areas with the highest population density, as high-risk zones for wetland conversion (Ghosh and Das, 2020). Another land use simulation, for Selangor, Malaysia (Baig et al., 2021), concluded that land use change is typically influenced by population and demand growth.
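Transitions such as the wetland-to-cropland conversion described above are usually summarised in a transition (cross-tabulation) matrix built from two co-registered land cover maps. The following is a minimal sketch, assuming both maps share the same grid and are stored as nested lists of class labels; the function names and any labels passed to them are hypothetical, not taken from this study's data or from the lulcc package.

```python
from collections import Counter


def transition_matrix(before, after, classes):
    """Cross-tabulate two co-registered land cover maps of identical shape.

    Returns counts[from_class][to_class] = number of cells with that transition.
    Cells whose labels are not in `classes` are ignored.
    """
    if len(before) != len(after) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(before, after)
    ):
        raise ValueError("maps must have the same shape")
    pairs = Counter(
        (a, b)
        for row_a, row_b in zip(before, after)
        for a, b in zip(row_a, row_b)
    )
    return {src: {dst: pairs[(src, dst)] for dst in classes} for src in classes}


def net_change(counts, cls):
    """Cells gained by `cls` from other classes minus cells it lost to them."""
    gained = sum(counts[src][cls] for src in counts if src != cls)
    lost = sum(n for dst, n in counts[cls].items() if dst != cls)
    return gained - lost
```

The diagonal of the matrix holds persistence and the off-diagonal cells hold each from-to transition, so a dominant wetland-to-cropland entry would show up directly as a large `counts["wetland"]["cropland"]` value.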
The model's performance in detecting wetlands is a notable component of this study. However, certain limitations must be recognised and addressed. Focusing only on the response variable, without accounting for the effect of a broader set of predictor variables, may cause misclassification. ALR was expected to perform better than LR and RF, but its minimal design features and parameters may not have maximised the model's ability to classify. Future research will seek to identify other related factors or covariates, such as weather, hydrological and socio-economic aspects (Verburg et al., 2019). A more detailed classification of land cover (LC) classes would give a more accurate representation of existing conditions in Kubu Raya. Although a one-to-one (single-transition) model can reveal the effects of predictors on a specific transition, a many-to-one land use change model can enrich the output and reduce simulation time (Akbar et al., 2023). Future work could also investigate and implement other statistical machine learning or deep learning approaches in the lulcc package, such as mixed effects models, ensemble learning or neural network variants, with hyperparameter tuning to obtain optimal performance. Finally, the model could be validated with the Total Operating Characteristic (TOC), as described by Liu et al. (2021), in place of the popular receiver operating characteristic (ROC), since TOC is claimed to offer more information and a distinct interpretation.
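TOC can be computed directly from predicted change probabilities. The sketch below is a minimal illustration under stated assumptions: every cell has a predicted probability of change and an observed 0/1 change label, all cells count equally (so the stratified-sampling extension described by Liu et al. (2021) is not implemented), and tied scores enter the curve together. Each TOC point pairs the number of cells flagged as change (hits plus false alarms) with the number of hits, and the area under the TOC curve, rescaled to the parallelogram between its minimum and maximum bounds, equals the familiar ROC area under the curve. The function names are hypothetical.

```python
def toc_points(scores, labels):
    """Return TOC points as (hits + false alarms, hits) pairs.

    Cells enter in descending order of score; tied scores enter together.
    labels: 1 = observed change, 0 = no observed change.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0, 0)]
    hits = flagged = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Add every cell that shares this score before recording a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            hits += ranked[i][1]
            flagged += 1
            i += 1
        points.append((flagged, hits))
    return points


def toc_auc(points):
    """Area under the TOC curve, rescaled to the parallelogram (equals ROC AUC)."""
    n_total, n_presence = points[-1]
    n_absence = n_total - n_presence
    if n_presence == 0 or n_absence == 0:
        raise ValueError("need both changed and unchanged cells")
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    # Subtract the area under the minimum bound, divide by the parallelogram area.
    return (area - n_presence ** 2 / 2) / (n_presence * n_absence)
```

Unlike the ROC curve, the TOC keeps the raw counts, so the quantity of observed change (the top of the y-axis) and the size of the study extent (the end of the x-axis) remain visible on the plot.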
4. Conclusion
From 2009 to 2020, Kubu Raya District was primarily characterised by wetlands and cropland, with wetlands occupying nearly half of the district. During that period, the largest transformation occurred in cropland, which expanded substantially. The predictions of land use change for the wetlands class in Kubu Raya District show that the Random Forest (RF) model gives the best average goodness of fit compared with the Logistic Regression and Additive Logistic Regression models. Fitting the models to a 5% sample of points did not produce a significantly different goodness of fit from using all spatial data in Kubu Raya District. Based on these findings, RF is expected to be able to predict possible changes in wetlands. Furthermore, synchronised spatial plans may prevent detrimental land use change and support sustainable peatland restoration and management in Kubu Raya District.
Conceptualization: Alfa Nugraha Pradana, Anik Djuraidah, Agus Mohamad Soleh; methodology: Alfa Nugraha Pradana, Anik Djuraidah, Agus Mohamad Soleh; investigation: Alfa Nugraha Pradana, Anik Djuraidah; writing—original draft preparation: Alfa Nugraha Pradana, Anik Djuraidah; writing—review and editing: Alfa Nugraha Pradana, Anik Djuraidah, Agus Mohamad Soleh; visualization: Alfa Nugraha Pradana, Anik Djuraidah. All authors have read and agreed to the published version of the manuscript.
References
Abraham, Charlotte González, Cynthia Flores, Santana Sonia, Rodríguez Ramírez, and Marcela Olguín. (2023). Long-Term Pathways Analysis to Assess the Feasibility of Sustainable Land-Use and Food Systems in Mexico. Sustainability Science, 18(1), 469–84. doi: 10.1007/s11625-022-01243-7.
Abuhay, Wassie, Temesgen Gashaw, and Lewoye Tsegaye. (2023). Assessing Impacts of Land Use/Land Cover Changes on the Hydrology of Upper Gilgel Abbay Watershed Using the SWAT Model. Journal of Agriculture and Food Research, 12, 100535. doi: 10.1016/j.jafr.2023.100535.
Adeolu, Adesiji R., Thamer A. Mohammad, Nik N. Nik Daud, Alexander K. Sayok, Rory Padfield, and Stephanie Evers. (2018). Soil Carbon and Nitrogen Dynamics in a Tropical Peatland. Elsevier Inc.
Aditya, Jeremy, Prananto Budiman, Rudiyanto Rudiyanto, and Peter Grace. (2020). Drainage Increases CO2 and N2O Emissions from Tropical Peat Soils. Global Change Biology, 26(8), 1–18. doi: 10.1111/gcb.15147.
Aguilera-Benavente, Francisco, and Nikolai Shurupov. (2023). Combining a Land Parcel Cellular Automata (LP-CA) Model with Participatory Approaches in the Simulation of Disruptive Future Scenarios of Urban Land Use Change. Computers, Environment and Urban Systems, 99, 101895. doi: 10.1016/j.compenvurbsys.2022.101895.
Akbar, Ali, Milad Zhoolideh, Hossein Azadi, Ju-hyoung Lee, and Jürgen Scheffran. (2023). Interactions of Land-Use Cover and Climate Change at Global Level: How to Mitigate the Environmental Risks and Warming Effects. Ecological Indicators, 146, 109829. doi: 10.1016/j.ecolind.2022.109829.
Baig, Mohammed Feras, Muhammad Raza Ul Mustafa, Imran Baig, Husna Binti Takaijudin, and Muhammad Talha Zeshan. (2021). Assessment of Land Use Land Cover Changes and Future Predictions Using CA-ANN Simulation for Selangor, Malaysia. Water, MDPI, 14(3), 1–17. doi: 10.3390/w14030402.
Beroho, Mohamed, Hamza Briak, El Khalil Cherif, Imane Boulahfa, Abdessalam Ouallali, Rachid Mrabet, Fassil Kebede, Alexandre Bernardino, and Khadija Aboumaria. (2023). Future Scenarios of Land Use/Land Cover (LULC) Based on a CA-Markov Simulation Model: Case of a Mediterranean Watershed in Morocco. Remote Sensing, MDPI, 15, 1162. doi: 10.3390/rs15041162.
Zhang, Yongbin, Caiyao Kou, Mingyue Liu, Weidong Man, and Fuping Li. (2023). Estimation of Coastal Wetland Soil Organic Carbon Content in Western Bohai Bay Using Remote Sensing, Climate, and Topographic Data. Remote Sensing, MDPI, 15(4241), 1–20. doi: 10.3390/rs15174241.
Buya, Suhaimee. (2020). Modelling of Land-Use Change in Thailand Using Binary Logistic Regression and Multinomial Logistic Regression. Arabian Journal of Geosciences, 13(12), 437. doi: 10.1007/s12517-020-05451-2.
Cao, Min, Ya Tian, Kai Wu, Min Chen, Yu Chen, Xue Hu, Zhongchang Sun, Lijun Zuo, Huadong Guo, Hui Lin, and Guonian Lü. (2023). Future Land-Use Change and Its Impact on Terrestrial Ecosystem Carbon Pool Evolution along the Silk Road under SDG Scenarios. Science Bulletin, 68(7), 740–49. doi: 10.1016/j.scib.2023.03.012.
Chen, Yuhan, Jia Wang, Nina Xiong, Lu Sun, and Jiangqi Xu. (2022). Impacts of Land Use Changes on Net Primary Productivity in Urban Agglomerations under Multi-Scenarios Simulation. Remote Sensing, MDPI, 14(1755), 1–21. doi: 10.3390/rs14071755.
Daba, Mekonnen H., and Songcai You. (2022). Quantitatively Assessing the Future Land-Use/Land-Cover Changes and Their Driving Factors in the Upper Stream of the Awash River Based on the CA–Markov Model and Their Implications for Water Resources Management. Sustainability, MDPI, 14(1538), 1–29. doi: 10.3390/su14031538.
Emmanuel, Balogun, Abdulla Al, Ajeyomi Adedoyin, Zullyadini A. Rahaman, Ologun Emmanuel, Mahir Shahrier, Bushra Monowar, Muhammad Tauhidur, and Olarewaju Timilehin. (2023). Monitoring and Predicting the Influences of Land Use/Land Cover Change on Cropland Characteristics and Drought Severity Using Remote Sensing Techniques. Environmental and Sustainability Indicators, 18, 100248. doi: 10.1016/j.indic.2023.100248.
Gao, Chunliu, Deqiang Cheng, Javed Iqbal, and Shunyu Yao. (2023). Yellow River Region (GYRR) Land Cover and the Relationship Analysis with Mountain Hazards. Land, MDPI, 12(340), 1–24. doi: 10.3390/land12020340.
Gaur, Srishti. (2023). A Comprehensive Review on Land Use/Land Cover (LULC) Change Modelling for Urban Development: Current Status and Future Prospects. Sustainability, MDPI, 15(903), 1–12. doi: 10.3390/su15020903.
Géant, Chuma B., Mushagalusa N. Gustave, and Serge Schmitz. (2023). Mapping Small Inland Wetlands in the South-Kivu Province by Integrating Optical and SAR Data with Statistical Models for Accurate Distribution Assessment Democratic Republic of Congo. Scientific Reports, 13(17626), 1–23. doi: 10.1038/s41598-023-43292-7.
Ghosh, Sasanka, and Arijit Das. (2020). Wetland Conversion Risk Assessment of East Kolkata Wetland: A Ramsar Site Using Random Forest and Support Vector Machine Model. Journal of Cleaner Production, 275, 123475. doi: 10.1016/j.jclepro.2020.123475.
Girma, Rediet, Christine Fürst, and Awdenegest Moges. (2022). Land Use Land Cover Change Modelling by Integrating Artificial Neural Network with Cellular Automata-Markov Chain Model in Gidabo River Basin, Main Ethiopian Rift. Environmental Challenges, 6, 100419. doi: 10.1016/j.envc.2021.100419.
Hastie, Trevor, and Robert Tibshirani. (1985). Generalized Additive Models: Some Applications. In Generalized Linear Models: Proceedings of the GLIM 85 Conference held at Lancaster, UK, Sept. 16–19, 1985 (pp. 66–81). Springer US.
Huo, Jingeng, Zhenqin Shi, Wenbo Zhu, Hua Xue, and Xin Chen. (2022). A Multi-Scenario Simulation and Optimization of Land Use with a Markov–FLUS Coupling Model: A Case Study in Xiong'an New Area, China. Sustainability, MDPI, 14(2425), 1–20. doi: 10.3390/su14042425.
Huu, Cuong Nguyen, Cuong Nguyen Van, Tien Nguyen, and Ngoc My. (2022). Modelling Land-Use Changes Using Logistic Regression in Western Highlands of Vietnam: A Case Study of Lam Dong Province. Agriculture and Natural Resources, 56(5), 935–44. doi: 10.34044/j.anres.2022.56.5.08.
Jafarpour, Kamran, Ali Shamsoddini, Mir Najaf, Faizah Binti, Che Ros, and Ali Khedmatzadeh. (2022). Predicting Spatial and Decadal of Land Use and Land Cover Change Using Integrated Cellular Automata Markov Chain Model Based Scenarios (2019–2049) Zarriné-Rūd River Basin in Iran. Environmental Challenges, 6, 100399. doi: 10.1016/j.envc.2021.100399.
Kumar, Nitesh, and Maurya Sana. (2023). Land Use/Land Cover Dynamics Study and Prediction in Jaipur City Using CA Markov Model Integrated with Road Network. GeoJournal, 88(1), 137–60. doi: 10.1007/s10708-022-10593-9.
Li, Xiang, Zhaoshun Liu, Shujie Li, and Yingxue Li. (2022). Multi-Scenario Simulation Analysis of Land Use Impacts on Habitat Quality in Tianjin Based on the PLUS Model Coupled with the InVEST Model. Sustainability, MDPI, 14(6923), 1–18. doi: 10.3390/su14116923.
Liu, Lintao, Shouchao Yu, Hengjia Zhang, Yong Wang, and Chao Liang. (2023). Analysis of Land Use Change Drivers and Simulation of Different Future Scenarios: Taking Shanxi Province of China as an Example. International Journal of Environmental Research and Public Health, MDPI, 20(1626), 1–19. doi: 10.3390/ijerph20021626.
Liu, Zhen, and Robert Gilmore Pontius Jr. (2021). The Total Operating Characteristic from Stratified Random Sampling with an Application to Flood Mapping. Remote Sensing, MDPI, 13(19), 3922. doi: 10.3390/rs13193922.
Assidik, M. L., I. Soekarno, Widyaningtias, and I. A. Humam. (2021). Water Balance Analysis and Hydraulic Structure Design to Prevent Peatland Fires. IOP Conference Series: Earth and Environmental Science, 758(1), 1–7. doi: 10.1088/1755-1315/758/1/012006.
Karasiak, N., J.-F. Dejoux, C. Monteil, and D. Sheeren. (2022). Spatial Dependence between Training and Test Sets: Another Pitfall of Classification Accuracy Assessment in Remote Sensing. Machine Learning, 111(7), 2715–40. doi: 10.1007/s10994-021-05972-1.
Guarderas, Paulina, Franz Smith, and Marc Dufrene. (2022). Land Use and Land Cover Change in a Tropical Mountain Landscape of Northern Ecuador: Altitudinal Patterns and Driving Forces. PLoS ONE, 17, 1–26. doi: 10.1371/journal.pone.0260191.
Penny, Jessica, Carlos M. Ordens, Steve Barnett, Slobodan Djordjević, and Albert S. Chen. (2023). Small-Scale Land Use Change Modelling Using Transient Groundwater Levels and Salinities as Driving Factors – An Example from a Sub-Catchment of Australia's Murray-Darling Basin. Agricultural Water Management, 278, 108174. doi: 10.1016/j.agwat.2023.108174.
Rahsia, Shandra Andina, Evi Gusmayanti, and Rossie W. Nusantara. (2021). Emisi Karbondioksida (CO2) Lahan Gambut Pasca Kebakaran Tahun 2018 Di Kota Pontianak [Carbon Dioxide (CO2) Emissions from Peatland after the 2018 Fires in Pontianak City]. Jurnal Ilmu Lingkungan, 18(2), 384–391. doi: 10.14710/jil.18.2.384-391.
Raihan, Asif, Tarig Ali, Maruf Mortula, and Rahul Gawai. (2023). Spatiotemporal Analysis of the Impacts of Climate Change on Mangroves Located in the United Arab Emirates. Journal of Sustainable Development of Energy, Water and Environment Systems, 11(3), 1–19. doi: 10.13044/j.sdewes.d11.0460.
Ren, Dong-Feng, Aihua Cao, and Fei-yue Wang. (2023). Response and Multi-Scenario Prediction of Carbon Storage and Habitat Quality to Land Use in Liaoning Province, China. Sustainability, MDPI, 15(4500), 1–23. doi: 10.3390/su15054500.
Sun, Bo, and Derek T. Robinson. (2018). Comparison of Statistical Approaches for Modelling Land-Use Change. Land, MDPI, 7(144), 1–33. doi: 10.3390/land7040144.
Salako, Gabriel, David J. Russell, Andres Stucke, and Einar Eberhardt. (2023). Assessment of Multiple Model Algorithms to Predict Earthworm Geographic Distribution Range and Biodiversity in Germany: Implications for Soil-Monitoring and Species-Conservation Needs. Biodiversity and Conservation, 32(7), 2365–94. doi: 10.1007/s10531-023-02608-9.
Salmona, Yuri Botelho, Eraldo Aparecido, Trondoli Matricardi, David Lewis Skole, Andrade Silva, Osmar De Ara, Coelho Filho, Marcos Antonio Pedlowski, James Matos Sampaio, Leidi Cahola, and Reuber Albuquerque. (2023). A Worrying Future for River Flows in the Brazilian Cerrado Provoked by Land Use and Climate Changes. Sustainability, MDPI, 15(4251), 1–24. doi: 10.3390/su15054251.
Seena, Sahadevan, Christiane Baschien, Juliana Barros, Kandikere R. Sridhar, Manuel A. S. Graça, Heikki Mykrä, and Mirco Bundschuh. (2023). Ecosystem Services Provided by Fungi in Freshwaters: A Wake-up Call. Hydrobiologia, 850(12–13), 2779–94. doi: 10.1007/s10750-022-05030-4.
Siddik, Sifat, Shibli Sadik, Atikur Rahman, and Nazrul Islam. (2022). The Impact of Land Use and Land Cover Change on Groundwater Recharge in Northwestern Bangladesh. Journal of Environmental Management, 315(4), 115130. doi: 10.1016/j.jenvman.2022.115130.
Tsiripidis, Ioannis. (2023). Simulating Future Land Use and Cover of a Mediterranean Mountainous Area: The Effect of Socioeconomic Demands and Climatic Changes. Land, MDPI, 12(253), 1–23. doi: 10.3390/land12010253.
Verburg, Peter H., Peter Alexander, Tom Evans, Nicholas R. Magliocca, Ziga Malek, Mark D. A. Rounsevell, and Jasper Van Vliet. (2019). Beyond Land Cover Change: Towards a New Generation of Land Use Models. Current Opinion in Environmental Sustainability, 38, 77–85. doi: 10.1016/j.cosust.2019.05.002.
Wang, Baixue, and Weiming Cheng. (2022). Effects of Land Use/Cover on Regional Habitat Quality under Different Geomorphic Types Based on InVEST Model. Remote Sensing, MDPI, 14(1279), 1–34. doi: 10.3390/rs14051279.
Xia, Chuyu, Jian Zhang, Jing Zhao, Fei Xue, Qiang Li, Kai Fang, and Zhuang Shao. (2023). Exploring Potential of Urban Land-Use Management on Carbon Emissions – A Case of Hangzhou, China. Ecological Indicators, 146, 109902. doi: 10.1016/j.ecolind.2023.109902.
Yang, Haijiang, Xiaohua Gou, Bing Xue, Weijing Ma, Wennong Kuang, Zhenyu Tu, Linlin Gao, Dingcai Yin, and Junzhou Zhang. (2023). Research on the Change of Alpine Ecosystem Service Value and Its Sustainable Development Path. Ecological Indicators, 146(3), 109893. doi: 10.1016/j.ecolind.2023.109893.
Zarandian, Ardavan, and Fatemeh Mohammadyari. (2023). Scenario Modelling to Predict Changes in Land Use/Cover Using Land Change Modeler and InVEST Model: A Case Study of Karaj Metropolis, Iran. Environmental Monitoring and Assessment, 195(273), 1–22. doi: 10.1007/s10661-022-10740-2.