Forum Geografi, 33(2), 2019; DOI: 10.23917/forgeo.v33i2.8351

Suitable Proportion Sample of Holdout Validation for Spatial Rainfall Interpolation in Surrounding the Makassar Strait

Giarno 1,2*, Muhammad Pramono Hadi 1, Slamet Suprayogi 1, Sigit Heru Murti 1

1 Faculty of Geography, Gadjah Mada University, Yogyakarta, Indonesia

2 Paotere Maritime Station, BMKG Makassar, Indonesia

 

*) Corresponding Author (e-mail: giarno97182@gmail.id)

Received: 28 June 2019 / Accepted: 18 January 2020 / Published: 23 January 2020

Abstract

Spatial rainfall interpolation requires a suitable number of validation samples to maintain accuracy. Generally, the larger the area which can be predicted, the better the interpolation. In addition, the data used for validation should be separated from the modelling data. Moreover, the number of samples determines the optimal proportion of independent sites. The objective of this study is to determine the optimal sample ratio for holdout validation in interpolation methods; the Makassar Strait was chosen as the study location because of its daily rainfall variation. The accuracy of the sample selection is tested using correlation, root mean square error (RMSE), mean absolute error (MAE) and the indicators of contingency tables. The results show that accuracy depends on the size of the modelling data: the more extensive the data used for interpolation, the better the accuracy. However, if the rain gauge data are separated according to province, the accuracy varies with the number of independent samples. For rainfall interpolation, it is recommended to use a minimum of 75% of the data sites for modelling to maintain accuracy. Comparison between kriging and inverse distance weighting (IDW) indicates that IDW is better. Moreover, rainfall characteristics affect the accuracy and the portion of the independent sample.

Keywords: validation, independent sample, spatial interpolation, rainfall, Makassar Strait.

 

Abstract (translated from Indonesian)

The accuracy of spatial rainfall interpolation requires an appropriate number of samples in order to remain good. The larger the area that can be predicted, the better the interpolation. The data used for validation should be separate from the data used for interpolation, that is, independent data. A further problem is that the sample proportion needs to be tested to determine the optimal proportion without reducing accuracy. The Makassar Strait was chosen because of the high rainfall variability in this region. The objective of this study is to obtain the optimal sample proportion for the interpolation of rainfall data. Accuracy was tested using correlation, RMSE, MAE and contingency table indicators. The results show that the proportion of data used for the model strongly affects accuracy: the larger the proportion, the better the accuracy. If the data are separated by province, the proportion of independent samples used for validation varies. It is recommended to use at least 75% of all data to maintain accuracy. Compared with kriging, interpolation using IDW is better, with higher accuracy. Rainfall characteristics also affect the sample proportion.

Keywords: holdout validation, independent sample, spatial interpolation, rainfall, Makassar Strait.

1. Introduction

The modelling of some fields, such as agriculture, ecology and hydroelectricity, requires in situ rainfall data (Goovaerts, 2000; Jia et al., 2011; Kyriakidis et al., 2001; Langella et al., 2010; Li and Shao, 2010). Nevertheless, the availability of rainfall data is commonly unsatisfactory because of inadequate rain gauge coverage. Spatial interpolation is one solution to the lack of measurements. Generally, there are two methods of interpolation (Ly et al., 2013). The first is deterministic, such as the Thiessen polygon (THP), spline (SPL) and inverse distance weighting (IDW). The second method uses spatial variance, or geostatistics, such as kriging.

The application of spatial interpolation in rainfall data is unique because the accuracy of the interpolation results depends on time and place. Kriging, as a geostatistical interpolation method, has generally been found to be the best method (Ly, et al., 2013; Wijemannage, 2014; Firdaus and Talib, 2016; Javari, 2017), but occasionally deterministic interpolation may also be the best method (Keblouti, et al., 2012; Ly, et al., 2013). Therefore, both deterministic and geostatistical interpolation for daily precipitation can be used with nearly the same performance (Ly, et al., 2013; Chen, et al., 2017). Nevertheless, if the density of data is sufficient, geostatistical methods produce better results than IDW (Eischeid, et al., 2000), but this situation is uncommon in the Indonesian Maritime Continent (IMC). Rainfall measurement has many constraints and is expensive, so data can be inadequate. Short-term accumulated rainfall is more difficult to measure than longer-term accumulation because it fluctuates in line with changing rainfall events. Monthly rainfall is more varied than yearly; in the latter there are striking differences between the wet season and the low precipitation of the dry season. The IDW method can be chosen as an alternative because it is easy and simple to apply. Comparison of interpolation results has shown that IDW interpolation can also be better than other methods (Ahrens, 2006; Keblouti, et al., 2012; Ly et al., 2013; Yang et al., 2015).
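To make the "easy and simple to apply" claim concrete, the following is a minimal sketch of IDW, not the implementation used in this study. The function name `idw_estimate`, the power of 2 and the use of plane (Euclidean) distances are all assumptions made for illustration; the paper does not state these settings.

```python
import math

def idw_estimate(target, stations, power=2.0):
    """Estimate rainfall at `target` from (coordinates, rainfall) pairs.

    Assumptions for illustration only: plane coordinates and a distance
    power of 2, neither of which is specified in the paper.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), rainfall in stations:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            # The target coincides with a gauge, so return its value directly.
            return rainfall
        weight = 1.0 / distance ** power
        weighted_sum += weight * rainfall
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical gauges: two sites 2 units apart with 10 mm and 20 mm of rain.
gauges = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint_rain = idw_estimate((1.0, 0.0), gauges)
near_first_rain = idw_estimate((0.5, 0.0), gauges)
print(midpoint_rain, near_first_rain)
```

Closer gauges receive larger weights, so the estimate near the first gauge is pulled towards its value, while the midpoint receives the plain average.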

The performance of spatial interpolation concerns not only the interpolation methods, but also the location, the validation methods and the sample used for validation. Most validation methods assess the accuracy of an interpolation method by calculating the correlation value, root mean square error (RMSE) and mean absolute error (MAE) (Weber & Englund, 1994; Weng, 2002; Hu et al., 2004; Weng, 2006; Tewolde, 2010). However, these parameters only assess the value of variables, not events. For this reason, to measure the accuracy of an event, for example whether rain was predicted correctly, a contingency table is used. Therefore, in this study the assessment of accuracy includes such a table (Jolliffe & Stephenson, 2003).

The choice of sample sites and the number of samples also affect accuracy. Although the sample location can be specified (Tabios et al., 1985), it is generally randomly selected (Ly et al., 2013). Randomization helps guarantee an objective assessment of accuracy. If a location has little data, then cross-validation is used. On the other hand, if the data are quite extensive, evaluation uses independent locations at specific sites. Since there are a considerable number of rain gauges around the Makassar Strait, evaluation in this work uses sites that were selected randomly. Moreover, since the density of data has a big effect on accuracy, the proportion of data used for interpolation and the proportion used for validation need to be balanced so that the estimation is optimal. Because of the uniqueness of the rain patterns in the Makassar Strait, evaluations were also conducted for each province with different rain patterns.

2. Research Method

2.1. Study Area and Data

The Makassar Strait, which is located in the middle part of the tropical IMC, was chosen for the study. The region is bounded by two seas, the Sulawesi Sea and the Java Sea, and two large islands, Kalimantan and Sulawesi. On the eastern side of the strait, Sulawesi has a complex topography; conversely, on the western side lie the flat lands of Kalimantan. The Asian-Australian monsoon greatly influences the region’s rainfall. Since it is located on the equator, the IMC has a wet, hot, humid climate all year round. The region is also greatly influenced by various global weather phenomena. The monsoons, Madden-Julian Oscillation (MJO), Indian Ocean Dipole (IOD) and El Niño-Southern Oscillation (ENSO) all affect rainfall. Moreover, local phenomena such as sea-land breezes and mountain-valley breezes also contribute to the complex rainfall in the region (D’Arrigo & Wilson, 2008; Qian, 2007; Hidayat & Kizu, 2010; Renggono, 2011; Hashiguchi et al., 2013; Lee, 2015; Martono & Wardoyo, 2017).

The most influential weather phenomenon is the monsoon. There are two major seasons in Indonesia, the dry monsoon season and the rainy monsoon season. Commonly, the wet season in most of Indonesia is from September to March and the dry season from March or June (depending on the area) to September, but the times of onset and ending of the seasons vary from place to place (Giarno et al., 2012). Besides the Asian-Australian monsoon, which is the most influential rainfall driver in this region, other global phenomena which affect it in the short term include the Madden–Julian Oscillation (MJO). This is a phenomenon with a 40–50-day oscillation occurring in tropical areas, which can suppress mean sea-level pressure, creating convection over tropical regions, particularly the IMC (Madden & Julian, 1972). Many studies have demonstrated that the MJO has an important influence on the diurnal cycle of precipitation over the IMC, particularly in Indonesia and Papua New Guinea (Peatman et al., 2014; Peatman et al., 2015; Vincent & Lane, 2016). This situation means that rain is uneven across the region; one place may have a different onset of the rainy season than another (Giarno et al., 2012), which also makes the accuracy of remote sensing rainfall estimates vary, even between places that are close geographically (Giarno et al., 2018).

Figure 1. Rain gauge locations in the area surrounding the Makassar Strait, consisting of Kalimantan Timur or Kaltim (top left), Kalimantan Selatan or Kalsel (bottom left), Sulawesi Tengah or Sulteng (top right) and Sulawesi Selatan or Sulsel (bottom right).

2.2. Choice of Independent Sample

Rainfall interpolation can be validated in three ways: by using independent locations, by cross-validation or by comparison with a hydrological model. If a region has a large amount of data, then evaluation uses separate datasets. In this case, the datasets are divided into two: one set of data for interpolation and the other for validation (Tabios et al., 1985; Ly et al., 2013). Ideal validation uses this method because the process is completely independent of the model, but a region rarely has sufficient data available for interpolation. Therefore, cross-validation is generally used (Isaaks et al., 1990). The third method compares the results of interpolation with a hydrological model, with the interpolated rainfall used as input for the hydrological model.

Some studies have employed limited rainfall data, hence the wider use of cross-validation compared to other methods. The number of rain gauges used to validate spatial rainfall interpolation by cross-validation is generally fewer than 50 (Wagner et al., 2012; Wang et al., 2014; Xu et al., 2015; Firdaus & Talib, 2016), although cross-validation is sometimes performed with more than 100 locations (Javari, 2017). In these studies, kriging and its derivatives were generally found to be better than other methods. However, in some places the performances of IDW and kriging were found to be similar, albeit in complex regions (Otieno et al., 2014; Xu et al., 2015). Besides the interpolation method, gauge density should be considered because it affects accuracy, and the number of rain gauges used for evaluation also causes variations in accuracy. Moreover, the use of a fixed number of rain gauges to test accuracy, that is, the hold-out method, is less common in the evaluation of rainfall interpolation methods than cross-validation. Although cross-validation is adequate for evaluation when there are many datasets, it uses multiple train-test splits and therefore requires more computing power and run time than the hold-out method.

Previous studies have shown that the number of independent samples used for accuracy tests affects the results. Therefore, the aim of this study is to test how the number of independent samples affects the accuracy of the spatial interpolation of rainfall. For this reason, rain gauge data are divided into two parts, namely data used for the interpolation of rainfall and data used to evaluate the accuracy of the interpolation results. The test samples were selected, based on rainfall characteristics, in proportions that are multiples of 5%, namely 5%, 10%, 15%, 20%, 25%, 30%, 35% and 40% of the available data.
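The division of gauges into modelling and validation sets at each proportion can be sketched as follows. This is an illustrative sketch only: the function name `holdout_split`, the fixed random seed and the network of 200 gauges are hypothetical and do not come from the study.

```python
import random

def holdout_split(gauge_ids, validation_fraction, seed=0):
    """Randomly split gauge IDs into (modelling, validation) lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(gauge_ids)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical network of 200 gauges; the real number of sites is not assumed here.
gauges = [f"G{i:03d}" for i in range(200)]
VALIDATION_PERCENTAGES = (5, 10, 15, 20, 25, 30, 35, 40)

splits = {pct: holdout_split(gauges, pct / 100) for pct in VALIDATION_PERCENTAGES}
for pct, (modelling, validation) in splits.items():
    print(f"{pct:2d}% held out: {len(modelling)} modelling, {len(validation)} validation")
```

Each proportion produces a single train-test split, which is what keeps the hold-out method cheaper than cross-validation.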

2.3. Evaluation Method

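Validation in this study used root mean square error (RMSE), mean absolute error (MAE) and correlation as indicators of accuracy. The deviation between rainfall estimates and observed rainfall was measured by RMSE and MAE, which were formulated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(E_i - O_i\right)^2} \tag{1}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|E_i - O_i\right| \tag{2}$$

where $E_i$ and $O_i$ refer to rainfall estimates and rainfall recorded by rain gauges respectively, and $n$ is the number of paired values. Correlation (r) using the Pearson correlation coefficient was then computed to ascertain the strength and direction of the relationship between rainfall estimates and rainfall from independent locations, using the following equation, in which $\bar{E}$ and $\bar{O}$ are the means of the estimates and observations:

$$r = \frac{\sum_{i=1}^{n}\left(E_i - \bar{E}\right)\left(O_i - \bar{O}\right)}{\sqrt{\sum_{i=1}^{n}\left(E_i - \bar{E}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}} \tag{3}$$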
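The three indicators described above (RMSE, MAE and Pearson correlation) can be computed as in the sketch below, which uses their standard definitions. The function names and the sample values are hypothetical and serve only as an illustration.

```python
import math

def rmse(estimates, observations):
    """Root mean square error between paired estimates and observations."""
    squared = [(e - o) ** 2 for e, o in zip(estimates, observations)]
    return math.sqrt(sum(squared) / len(squared))

def mae(estimates, observations):
    """Mean absolute error between paired estimates and observations."""
    absolute = [abs(e - o) for e, o in zip(estimates, observations)]
    return sum(absolute) / len(absolute)

def pearson_r(estimates, observations):
    """Pearson correlation coefficient of the paired values."""
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_o = sum(observations) / n
    covariance = sum((e - mean_e) * (o - mean_o)
                     for e, o in zip(estimates, observations))
    spread_e = sum((e - mean_e) ** 2 for e in estimates)
    spread_o = sum((o - mean_o) ** 2 for o in observations)
    if spread_e == 0 or spread_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_e * spread_o)

# Hypothetical daily rainfall (mm) at four validation gauges.
estimated = [0.0, 5.0, 12.0, 3.0]
observed = [1.0, 4.0, 10.0, 3.0]
print(rmse(estimated, observed), mae(estimated, observed), pearson_r(estimated, observed))
```

RMSE penalizes large deviations more heavily than MAE, which is why the two can rank sample proportions differently.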



Correlation measures the size and direction of the relationship between two variables. This indicator is commonly used to evaluate spatial remote sensing rainfall estimates. A contingency table can be used to evaluate a specific event, such as a rain event (Jolliffe & Stephenson, 2003), as shown in Table 1. Four criteria are involved. Hits refer to the number of cases in which both the estimate and the observation indicate rain, while false alarms refer to the number of cases in which the estimate indicates rain but no rain was measured. Misses refer to the number of cases in which the estimate indicates no rain although the rain gauge recorded precipitation. Finally, correct negatives refer to cases in which both the rain gauge and the estimate indicate no rain.

Table 1. Contingency table for rain event evaluation

| Tool | Event | Rain gauge: Rain | Rain gauge: No rain |
|---|---|---|---|
| Rainfall estimate | Rain | Hits | False alarm |
| Rainfall estimate | No rain | Miss | Correct negative |

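Five indicators were used in this work: ACC (accuracy), BIAS, FAR (false alarm ratio), POD (probability of detection) and CSI (critical success index) (equations 4 to 8). No single contingency indicator is the most powerful, so they are used together. ACC is strongly influenced by the dominant event category, while BIAS is affected by the numbers of false alarms and misses. FAR is sensitive to false alarms, while POD is affected by the number of misses. Finally, CSI is sensitive to hits and ignores correct negatives. The indicators are formulated as below:

$$\mathrm{ACC} = \frac{\text{hits} + \text{correct negatives}}{\text{hits} + \text{false alarms} + \text{misses} + \text{correct negatives}} \tag{4}$$

$$\mathrm{BIAS} = \frac{\text{hits} + \text{false alarms}}{\text{hits} + \text{misses}} \tag{5}$$

$$\mathrm{FAR} = \frac{\text{false alarms}}{\text{hits} + \text{false alarms}} \tag{6}$$

$$\mathrm{POD} = \frac{\text{hits}}{\text{hits} + \text{misses}} \tag{7}$$

$$\mathrm{CSI} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{false alarms}} \tag{8}$$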
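The five contingency indicators can be sketched in code using their standard definitions. The rain/no-rain threshold of 0 mm and all names below are assumptions for illustration; the threshold used in the study is not stated.

```python
def contingency_counts(estimates, observations, rain_threshold=0.0):
    """Count hits, false alarms, misses and correct negatives.

    A value above `rain_threshold` counts as rain; the threshold of 0 mm
    is an assumption, not a setting taken from the paper.
    """
    hits = false_alarms = misses = correct_negatives = 0
    for estimate, observation in zip(estimates, observations):
        estimated_rain = estimate > rain_threshold
        observed_rain = observation > rain_threshold
        if estimated_rain and observed_rain:
            hits += 1
        elif estimated_rain:
            false_alarms += 1
        elif observed_rain:
            misses += 1
        else:
            correct_negatives += 1
    return hits, false_alarms, misses, correct_negatives

def _ratio(numerator, denominator):
    # Return NaN instead of failing when a category is empty.
    return numerator / denominator if denominator else float("nan")

def contingency_scores(hits, false_alarms, misses, correct_negatives):
    """Return ACC, BIAS, FAR, POD and CSI from the four counts."""
    total = hits + false_alarms + misses + correct_negatives
    return {
        "ACC": _ratio(hits + correct_negatives, total),
        "BIAS": _ratio(hits + false_alarms, hits + misses),
        "FAR": _ratio(false_alarms, hits + false_alarms),
        "POD": _ratio(hits, hits + misses),
        "CSI": _ratio(hits, hits + misses + false_alarms),
    }

# Hypothetical paired values (mm) and hypothetical pooled counts.
counts = contingency_counts([0.0, 2.5, 0.0, 4.0, 1.0], [0.0, 3.0, 1.2, 0.0, 0.5])
scores = contingency_scores(hits=50, false_alarms=20, misses=30, correct_negatives=100)
print(counts, scores)
```

Perfect scores are 1 for ACC, BIAS, POD and CSI, and 0 for FAR, which is the sense in which the text speaks of values being close to or far from perfect.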

3. Results and Discussion

3.1. Results

Because the region studied is so wide and has different rainfall characteristics, the independent sample selection was evaluated with two approaches: evaluation over the whole area and evaluation by region. In the first evaluation, all the data were pooled, then split into modelling data for interpolation and independent data for determining accuracy. The second evaluation was very similar to the first, except that the data splitting and interpolation were conducted by region, as shown in Figure 1.
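The second, region-by-region approach amounts to drawing the independent sample inside each province instead of from the pooled network. The sketch below illustrates this under stated assumptions: the function name, the seed and the gauge counts per province are hypothetical, and only the four region names come from the text.

```python
import random

def split_by_region(gauges_by_region, validation_fraction, seed=0):
    """Draw the validation sample separately within each region."""
    rng = random.Random(seed)
    result = {}
    for region, gauge_ids in gauges_by_region.items():
        shuffled = list(gauge_ids)
        rng.shuffle(shuffled)
        n_validation = max(1, round(len(shuffled) * validation_fraction))
        result[region] = (shuffled[n_validation:], shuffled[:n_validation])
    return result

# Hypothetical gauge counts per province; the real network sizes are not assumed.
network = {
    "Kaltim": [f"KT{i}" for i in range(40)],
    "Kalsel": [f"KS{i}" for i in range(60)],
    "Sulteng": [f"ST{i}" for i in range(50)],
    "Sulsel": [f"SS{i}" for i in range(120)],
}
regional = split_by_region(network, 0.10)
for region, (modelling, validation) in regional.items():
    print(region, len(modelling), len(validation))
```

Splitting within each region guarantees that every province contributes validation gauges, which a single pooled random draw does not.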

3.1.1. Accuracy of Spatial Interpolation in the Area Surrounding Makassar Strait

The amount of data used for interpolation affects the accuracy of both methods, although IDW is more varied than kriging (KRIG), as shown in Table 2. Normally, the accuracy of rain predictions is assessed using correlation, RMSE and MAE, and these were used here to evaluate the ratio of independent samples. The results show that when the number of independent samples is smaller, the correlation tends to be higher and RMSE and MAE lower. The only minor anomaly is in MAE, where the value for 30% of independent samples is lower than those for 20% and 25%. Evaluation based on the ability to predict rainfall events supports these conclusions through the values of POD, ACC and CSI. These indicators are closer to their perfect scores when the number of independent samples used for evaluation is lower, although not necessarily at the smallest portion.

The best ACC and FAR values are obtained when 85% of the data are used for IDW interpolation. However, ACC is almost the same for all sample ratios, while FAR moves farther from its perfect value as the proportion used for evaluation increases. BIAS is similar to ACC in that its value is almost the same for all proportions. However, BIAS is greater than 1, which means that rain events are overestimated, that is, there are more false alarms than misses. In addition, the number of hits is directly proportional to the amount of data used for interpolation. Based on the CSI value, the best proportion of independent samples is 20%, although the CSI values for proportions of 5% to 20% are almost the same.
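The link between a BIAS above 1 and an excess of false alarms over misses follows from the usual frequency-bias definition, which is assumed here. Writing H, F and M for hits, false alarms and misses (symbols introduced only for this note):

```latex
\mathrm{BIAS} = \frac{H + F}{H + M}
\quad\Longrightarrow\quad
\mathrm{BIAS} - 1 = \frac{F - M}{H + M},
\qquad\text{so}\qquad
\mathrm{BIAS} > 1 \iff F > M .
```

As an illustration, if the IDW value of 1.097 at 5% in Table 2 is read as a score computed from pooled counts, then F − M ≈ 0.097 (H + M), meaning false alarms exceed misses by roughly 9.7% of the observed rain events.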

Table 2. Accuracy of rainfall spatial interpolation in the area surrounding the Makassar Strait. The grey color refers to the best indicator

| Method | Independent sample (%) | Correlation | RMSE | MAE | ACC | BIAS | POD | FAR | CSI |
|---|---|---|---|---|---|---|---|---|---|
| IDW | 5 | 0.307 | 11.018 | 6.274 | 0.805 | 1.097 | 0.587 | 0.428 | 0.451 |
| IDW | 10 | 0.317 | 11.935 | 6.672 | 0.811 | 1.059 | 0.577 | 0.418 | 0.450 |
| IDW | 15 | 0.263 | 11.425 | 6.693 | 0.822 | 1.117 | 0.552 | 0.396 | 0.451 |
| IDW | 20 | 0.251 | 12.841 | 7.100 | 0.801 | 1.028 | 0.567 | 0.413 | 0.458 |
| IDW | 25 | 0.256 | 12.808 | 7.058 | 0.802 | 1.059 | 0.563 | 0.438 | 0.444 |
| IDW | 30 | 0.251 | 12.648 | 6.828 | 0.804 | 1.028 | 0.544 | 0.420 | 0.446 |
| IDW | 35 | 0.227 | 12.971 | 7.390 | 0.807 | 1.041 | 0.515 | 0.431 | 0.427 |
| IDW | 40 | 0.218 | 14.144 | 7.933 | 0.793 | 1.029 | 0.507 | 0.451 | 0.415 |
| Kriging | 5 | 0.338 | 28.577 | 17.976 | 0.761 | 0.770 | 0.462 | 0.350 | 0.378 |
| Kriging | 10 | 0.238 | 32.123 | 20.792 | 0.747 | 0.764 | 0.420 | 0.407 | 0.346 |
| Kriging | 15 | 0.273 | 32.281 | 20.110 | 0.760 | 0.922 | 0.436 | 0.412 | 0.349 |
| Kriging | 20 | 0.233 | 31.159 | 19.401 | 0.759 | 0.914 | 0.396 | 0.444 | 0.326 |
| Kriging | 25 | 0.248 | 33.242 | 21.147 | 0.715 | 0.854 | 0.444 | 0.408 | 0.357 |
| Kriging | 30 | 0.206 | 33.245 | 20.587 | 0.753 | 1.092 | 0.415 | 0.466 | 0.333 |
| Kriging | 35 | 0.223 | 30.581 | 18.762 | 0.759 | 0.844 | 0.407 | 0.460 | 0.336 |
| Kriging | 40 | 0.232 | 32.476 | 20.428 | 0.749 | 0.819 | 0.411 | 0.450 | 0.336 |

With kriging interpolation, accuracy also increases with the amount of modelling data, as shown in Table 2. In brief, the lower the ratio of independent samples used for validation, the better the accuracy. This can be concluded from the values of correlation, RMSE, ACC, POD, FAR and CSI, all of which are best with 5% of independent samples. However, based on the value of BIAS, the proportion of 15% was the best. Overall, IDW is preferable to kriging for rainfall interpolation. The comparison of performance shows that the RMSE and MAE of IDW are less than half those of kriging, while for the other parameters the IDW results are better, but only slightly.
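The size of the gap between the two methods can be checked directly from the RMSE and MAE columns of Table 2. The short sketch below copies those values and computes the kriging-to-IDW error ratio for each sample proportion; the variable and function names are introduced only for illustration.

```python
# RMSE and MAE from Table 2, ordered by independent sample of 5% to 40%.
SAMPLE_PERCENTAGES = (5, 10, 15, 20, 25, 30, 35, 40)
IDW_RMSE = (11.018, 11.935, 11.425, 12.841, 12.808, 12.648, 12.971, 14.144)
KRIG_RMSE = (28.577, 32.123, 32.281, 31.159, 33.242, 33.245, 30.581, 32.476)
IDW_MAE = (6.274, 6.672, 6.693, 7.100, 7.058, 6.828, 7.390, 7.933)
KRIG_MAE = (17.976, 20.792, 20.110, 19.401, 21.147, 20.587, 18.762, 20.428)

def error_ratios(kriging_errors, idw_errors):
    """Return kriging error divided by IDW error for each proportion."""
    return [k / i for k, i in zip(kriging_errors, idw_errors)]

rmse_ratios = error_ratios(KRIG_RMSE, IDW_RMSE)
mae_ratios = error_ratios(KRIG_MAE, IDW_MAE)

for pct, rmse_ratio, mae_ratio in zip(SAMPLE_PERCENTAGES, rmse_ratios, mae_ratios):
    print(f"{pct:2d}%: RMSE ratio {rmse_ratio:.2f}, MAE ratio {mae_ratio:.2f}")
```

Every ratio exceeds 2, so over the whole study area the kriging errors are more than double the IDW errors at each tested proportion.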

3.1.2. Accuracy According to Rainfall Pattern Region

The area around the Makassar Strait has distinct rainfall regimes. East Kalimantan (Kaltim) has an equatorial rainfall pattern. South Kalimantan (Kalsel) and South Sulawesi (Sulsel) have a monsoonal rainfall pattern, although it is more strongly monsoonal in South Sulawesi than in South Kalimantan. Finally, rainfall in Central Sulawesi (Sulteng) is very low compared to the other three regions. Inverse distance weighting (IDW) and kriging (KRIG) were employed to test performance, and the ratio of independent samples was also considered. The notation IDW05 means that IDW is used as the interpolation method with an independent sample of 5% of the existing data; IDW10 means 10%, and so on. The relationship between the number of samples used for the spatial interpolation and the accuracy in each region is shown in Tables 3 to 6.

Table 3. Accuracy of rainfall spatial interpolation in the Kaltim region. The grey color refers to the best indicator

| Method | Independent sample (%) | Correlation | RMSE | MAE | ACC | BIAS | POD | FAR | CSI |
|---|---|---|---|---|---|---|---|---|---|
| IDW | 5 | -0.012 | 8.622 | 6.121 | 0.664 | 0.950 | 0.450 | 0.638 | 0.285 |
| IDW | 10 | 0.083 | 10.360 | 6.093 | 0.773 | 1.197 | 0.558 | 0.503 | 0.403 |
| IDW | 15 | 0.124 | 9.074 | 5.405 | 0.776 | 1.152 | 0.562 | 0.481 | 0.404 |
| IDW | 20 | 0.201 | 14.468 | 7.945 | 0.751 | 1.112 | 0.601 | 0.384 | 0.478 |
| IDW | 25 | 0.159 | 14.208 | 8.300 | 0.730 | 0.999 | 0.563 | 0.376 | 0.466 |
| IDW | 30 | 0.105 | 13.227 | 7.169 | 0.750 | 1.132 | 0.514 | 0.505 | 0.397 |
| IDW | 35 | 0.112 | 15.117 | 8.535 | 0.725 | 1.249 | 0.580 | 0.435 | 0.455 |
| IDW | 40 | 0.126 | 12.515 | 7.067 | 0.722 | 1.455 | 0.532 | 0.525 | 0.380 |
| Kriging | 5 | 0.011 | 16.151 | 13.169 | 0.666 | 0.906 | 0.520 | 0.483 | 0.352 |
| Kriging | 10 | 0.152 | 17.714 | 11.726 | 0.635 | 1.034 | 0.407 | 0.577 | 0.240 |
| Kriging | 15 | 0.070 | 20.946 | 13.518 | 0.666 | 0.832 | 0.476 | 0.399 | 0.388 |
| Kriging | 20 | 0.110 | 16.312 | 11.371 | 0.709 | 1.375 | 0.517 | 0.544 | 0.342 |
| Kriging | 25 | 0.137 | 17.840 | 11.506 | 0.704 | 1.168 | 0.503 | 0.478 | 0.363 |
| Kriging | 30 | 0.100 | 19.806 | 13.099 | 0.685 | 1.244 | 0.506 | 0.477 | 0.387 |
| Kriging | 35 | 0.136 | 19.301 | 11.543 | 0.692 | 1.070 | 0.462 | 0.487 | 0.341 |
| Kriging | 40 | 0.114 | 19.954 | 12.428 | 0.682 | 1.156 | 0.456 | 0.456 | 0.345 |

The number of rainfall data sites used in the modelling is not always directly proportional to the accuracy, as can be seen in Table 3. The accuracy indicators show that a small portion of independent samples, such as 5% of the overall data, is not always the most accurate. For IDW, almost all the parameters show that a 5% portion of independent locations is less accurate than larger portions, apart from the RMSE value. Kriging behaves similarly, apart from RMSE and POD. Nevertheless, insufficient modelling data will clearly reduce accuracy. The independent sample size in East Kalimantan should be no more than 25%.

The interpolation method also affects accuracy. Compared to kriging, the IDW method is clearly better in East Kalimantan: correlation tends to be higher, while RMSE and MAE are lower. Moreover, the IDW contingency indicators, such as ACC, POD, BIAS, FAR and CSI, are closer to their perfect values than those of kriging. The RMSE and MAE values of kriging are almost twice those of IDW. However, if correlation is emphasized, the IDW method should use an independent sample portion of more than 10% in the East Kalimantan region.

On the other hand, in the South Kalimantan region a larger amount of modelling data improves accuracy, as shown in Table 4. Using a 5% or 10% portion of independent samples in this region improves performance compared to larger portions, for both IDW and kriging, except for BIAS, FAR and CSI. Therefore, the portion of independent samples for validation in South Kalimantan should not exceed 10%. The interpolation method also affects accuracy, with the IDW method slightly better than kriging in the South Kalimantan region, although the accuracy parameters of the two methods have almost the same values. In this area, IDW overestimates compared to kriging, based on the BIAS value, which is greater than 1 for IDW. Furthermore, the advantage of the kriging method over IDW is that it can increase the number of hits, as can be seen from the CSI value.

Table 4. Accuracy of rainfall spatial interpolation in the Kalsel region. The grey color refers to the best indicator

| Method | Independent sample (%) | Correlation | RMSE | MAE | ACC | BIAS | POD | FAR | CSI |
|---|---|---|---|---|---|---|---|---|---|
| IDW | 5 | 0.227 | 12.598 | 8.035 | 0.690 | 1.016 | 0.450 | 0.538 | 0.289 |
| IDW | 10 | 0.109 | 10.484 | 6.834 | 0.681 | 1.061 | 0.356 | 0.636 | 0.210 |
| IDW | 15 | 0.138 | 12.597 | 7.964 | 0.630 | 1.235 | 0.408 | 0.534 | 0.259 |
| IDW | 20 | 0.159 | 13.425 | 8.370 | 0.639 | 1.183 | 0.434 | 0.508 | 0.290 |
| IDW | 25 | 0.177 | 12.535 | 7.949 | 0.654 | 1.245 | 0.426 | 0.542 | 0.271 |
| IDW | 30 | 0.146 | 15.051 | 9.577 | 0.598 | 0.981 | 0.410 | 0.493 | 0.296 |
| IDW | 35 | 0.127 | 15.970 | 10.033 | 0.598 | 0.985 | 0.397 | 0.505 | 0.283 |
| IDW | 40 | 0.105 | 15.098 | 9.379 | 0.603 | 1.385 | 0.401 | 0.570 | 0.248 |
| Kriging | 5 | 0.158 | 13.012 | 9.668 | 0.680 | 0.953 | 0.443 | 0.512 | 0.286 |
| Kriging | 10 | 0.142 | 20.676 | 14.988 | 0.553 | 0.859 | 0.474 | 0.383 | 0.340 |
| Kriging | 15 | 0.141 | 18.416 | 12.546 | 0.590 | 0.985 | 0.407 | 0.484 | 0.273 |
| Kriging | 20 | 0.108 | 20.226 | 13.677 | 0.603 | 0.938 | 0.389 | 0.491 | 0.271 |
| Kriging | 25 | 0.083 | 18.038 | 12.447 | 0.637 | 1.111 | 0.380 | 0.593 | 0.225 |
| Kriging | 30 | 0.070 | 20.081 | 13.216 | 0.588 | 1.105 | 0.395 | 0.529 | 0.259 |
| Kriging | 35 | 0.083 | 21.080 | 13.772 | 0.581 | 0.823 | 0.354 | 0.516 | 0.240 |
| Kriging | 40 | 0.115 | 20.623 | 13.950 | 0.596 | 0.880 | 0.391 | 0.489 | 0.270 |

Table 5. Accuracy of rainfall spatial interpolation in the Sulteng region. The grey color refers to the best indicator

| Method | Independent sample (%) | Correlation | RMSE | MAE | ACC | BIAS | POD | FAR | CSI |
|---|---|---|---|---|---|---|---|---|---|
| IDW | 5 | 0.226 | 7.909 | 4.994 | 0.724 | 0.739 | 0.344 | 0.627 | 0.201 |
| IDW | 10 | 0.153 | 8.439 | 4.660 | 0.725 | 0.926 | 0.314 | 0.681 | 0.183 |
| IDW | 15 | 0.187 | 10.011 | 5.363 | 0.714 | 0.881 | 0.321 | 0.576 | 0.220 |
| IDW | 20 | 0.152 | 9.615 | 5.101 | 0.709 | 1.096 | 0.340 | 0.638 | 0.212 |
| IDW | 25 | 0.166 | 10.711 | 5.514 | 0.695 | 1.034 | 0.342 | 0.592 | 0.236 |
| IDW | 30 | 0.122 | 12.981 | 6.262 | 0.715 | 0.934 | 0.290 | 0.639 | 0.190 |
| IDW | 35 | 0.069 | 15.783 | 7.563 | 0.683 | 0.878 | 0.280 | 0.636 | 0.192 |
| IDW | 40 | 0.125 | 14.639 | 7.705 | 0.653 | 0.786 | 0.327 | 0.551 | 0.236 |
| Kriging | 5 | 0.239 | 15.957 | 10.977 | 0.618 | 0.839 | 0.434 | 0.397 | 0.345 |
| Kriging | 10 | 0.184 | 13.767 | 8.430 | 0.661 | 0.928 | 0.341 | 0.571 | 0.216 |
| Kriging | 15 | 0.143 | 17.781 | 10.879 | 0.652 | 0.982 | 0.341 | 0.560 | 0.223 |
| Kriging | 20 | 0.145 | 16.057 | 9.270 | 0.684 | 1.057 | 0.344 | 0.613 | 0.200 |
| Kriging | 25 | 0.156 | 18.991 | 11.335 | 0.649 | 0.872 | 0.357 | 0.529 | 0.239 |
| Kriging | 30 | 0.129 | 18.092 | 10.340 | 0.632 | 0.917 | 0.319 | 0.559 | 0.199 |
| Kriging | 35 | 0.142 | 22.892 | 12.949 | 0.659 | 1.108 | 0.373 | 0.532 | 0.242 |
| Kriging | 40 | 0.096 | 19.760 | 11.168 | 0.651 | 0.746 | 0.283 | 0.551 | 0.196 |

In the eastern part of the Makassar Strait, the influence of the independent sample on accuracy differs from that in the western part of the strait. In the Central Sulawesi and South Sulawesi regions, the amount of rainfall data used for modelling is directly proportional to accuracy, especially for the IDW method, as shown in Tables 5 and 6. These tables show that small proportions of independent samples, 5% or 10% of the data, are almost always better than larger portions. Evaluation using correlation, RMSE, MAE and ACC shows that small independent samples increase correlation and ACC, and reduce RMSE and MAE, compared with larger samples. In addition, the contingency parameters BIAS, POD, FAR and CSI indicate variations in the accuracy of rainfall estimates when predicting rain events. The number of overestimates in South Sulawesi is higher than in Central Sulawesi, as seen in BIAS values of more than 1 for various numbers of independent samples. However, the accuracy, or number of hits, in South Sulawesi is better than in Central Sulawesi, judging by the higher ACC and CSI values.

Based on correlation, RMSE, MAE and ACC, the greater the volume of data used for modelling, the more accurate the interpolation. The IDW method gives the best interpolation, and the evaluation sample portion in South Sulawesi and Central Sulawesi (Sulteng) should not exceed 10% of the data. However, if the contingency table parameters are added to the evaluation, fluctuations are found across the sample ratios. A larger volume of modelling data does not always produce better predictions and also results in more overestimated events. Both the South Sulawesi and Central Sulawesi regions show fluctuations in the values of BIAS, POD, FAR and CSI. The calculations show that proportions of 20% - 25% of the data in the IDW method in fact result in better accuracy than lower proportions. Moreover, based on the values of POD and BIAS, the number of misses in Central Sulawesi is higher than in South Sulawesi.

Table 6. Accuracy of rainfall spatial interpolation in the Sulsel region. The grey color refers to the best indicator

| Method | Independent sample (%) | Correlation | RMSE | MAE | ACC | BIAS | POD | FAR | CSI |
|---|---|---|---|---|---|---|---|---|---|
| IDW | 5 | 0.418 | 9.553 | 5.412 | 0.783 | 0.893 | 0.494 | 0.427 | 0.350 |
| IDW | 10 | 0.366 | 8.799 | 4.855 | 0.761 | 0.964 | 0.472 | 0.455 | 0.325 |
| IDW | 15 | 0.368 | 9.609 | 5.182 | 0.763 | 1.062 | 0.461 | 0.481 | 0.320 |
| IDW | 20 | 0.344 | 11.170 | 6.016 | 0.745 | 0.923 | 0.476 | 0.419 | 0.352 |
| IDW | 25 | 0.355 | 11.323 | 6.009 | 0.747 | 0.972 | 0.494 | 0.425 | 0.362 |
| IDW | 30 | 0.323 | 11.542 | 6.239 | 0.748 | 1.115 | 0.484 | 0.465 | 0.346 |
| IDW | 35 | 0.347 | 11.727 | 6.403 | 0.736 | 1.168 | 0.499 | 0.450 | 0.358 |
| IDW | 40 | 0.298 | 11.811 | 6.272 | 0.742 | 1.079 | 0.448 | 0.453 | 0.337 |
| Kriging | 5 | 0.318 | 27.272 | 16.123 | 0.700 | 0.819 | 0.383 | 0.444 | 0.275 |
| Kriging | 10 | 0.313 | 27.401 | 16.186 | 0.751 | 0.847 | 0.418 | 0.486 | 0.283 |
| Kriging | 15 | 0.359 | 27.447 | 15.667 | 0.769 | 1.020 | 0.473 | 0.476 | 0.315 |
| Kriging | 20 | 0.357 | 35.019 | 23.967 | 0.648 | 0.731 | 0.459 | 0.341 | 0.355 |
| Kriging | 25 | 0.334 | 29.276 | 17.378 | 0.734 | 1.000 | 0.431 | 0.490 | 0.287 |
| Kriging | 30 | 0.379 | 40.179 | 27.101 | 0.625 | 0.866 | 0.500 | 0.345 | 0.365 |
| Kriging | 35 | 0.305 | 24.794 | 13.992 | 0.764 | 0.936 | 0.350 | 0.500 | 0.242 |
| Kriging | 40 | 0.319 | 40.142 | 27.622 | 0.592 | 0.749 | 0.410 | 0.383 | 0.296 |

In addition, the portion of data used for interpolation in Central Sulawesi has little effect on the accuracy of kriging. The best portion of independent sample data in this area is around 5% - 10%. This means that the modelling must use at least 90% of the data, since the area to be predicted should be as large as possible. Unlike in Central Sulawesi, validation in South Sulawesi suggests an evaluation portion of 15% - 40% of the whole data for kriging. Validation using RMSE and MAE shows that accuracy in kriging is generally in line with the number of samples. Moreover, the best correlation used 15% - 30% of independent data for validation, although this can fluctuate. For IDW, the best share of independent validation data tends to be between 5% and 15%. The kriging results therefore run contrary to the assumption that fewer independent data give better accuracy. The evaluation in each of these locations showed that IDW interpolation has better accuracy than kriging, because RMSE and MAE in IDW are smaller than in kriging. Moreover, the magnitudes of BIAS, POD and FAR for IDW are closer to their ideal values than for kriging.

3.2. Discussion

Generally, a higher density of observed rainfall data is associated with greater interpolation accuracy. However, this work shows that, with hold-out validation over the whole region and in each region, this is not always the case. Based on correlation, the rain gauge data from all around the Makassar Strait indicate that the best portion of data for interpolation is 90%. On the other hand, based on RMSE and MAE, the highest proportion of 95% is recommended when using the IDW method. The kriging method appears more stable, with higher data densities producing better accuracy. However, validation of the whole dataset does not take into account rainfall variability in the area, since daily rainfall in the region varies greatly with place, time, topography and position. Even the accuracy of remote sensing rainfall estimation products, such as the TRMM rainfall estimates, varies (Giarno et al., 2018). This research found that the topography, the position of land with respect to water and the causes of rain affect the accuracy of rainfall prediction.

Validation based on rainfall patterns in the vicinity of the Makassar Strait showed that the sample size relates to accuracy, although increasing the volume of modelling data does not always improve accuracy. The four locations chosen for accuracy testing showed that the sample portion had a different effect in each place. East Kalimantan has an equatorial rainfall pattern, while South Kalimantan has a monsoonal pattern. The results show that the best sample size for evaluation in East Kalimantan should be no more than 25%. In contrast, in the South Kalimantan region, accuracy increases with the amount of rainfall data used for modelling, and the best portion of independent samples in this region ranged between 5% and 10%. Moreover, the IDW method is slightly better than kriging in both the South Kalimantan and East Kalimantan regions. Furthermore, East Kalimantan has few data and an equatorial rainfall pattern, and its weather varies more than in South Kalimantan, where the data are denser and the rainfall is of a monsoonal type.

Conversely, the area in the eastern part of the Makassar Strait has more rain gauges than the western part. Validation showed that accuracy increased with the volume of modelling data, especially using the IDW method. Validation portions of 5% or 10% were almost always better than larger ones. Both the number of overestimates and the number of hits are higher in South Sulawesi than in Central Sulawesi. Moreover, the best evaluation sample portion for IDW modelling should be no more than 25% and 5% - 10% in Central and South Sulawesi respectively.
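The hits and overestimates mentioned here come from a rain/no-rain contingency table, from which POD, FAR and BIAS are derived. The sketch below uses the standard definitions from forecast verification. The 1 mm rain threshold and the sample values are assumptions made for illustration, and the study may define its indicators differently.

```python
def contingency_scores(obs, pred, threshold=1.0):
    """Return (POD, FAR, BIAS) for rain/no-rain events at a given threshold (mm)."""
    hits = false_alarms = misses = 0
    for o, p in zip(obs, pred):
        obs_rain = o >= threshold
        pred_rain = p >= threshold
        if obs_rain and pred_rain:
            hits += 1            # rain observed and predicted
        elif pred_rain:
            false_alarms += 1    # rain predicted but not observed
        elif obs_rain:
            misses += 1          # rain observed but not predicted
    nan = float("nan")
    observed_events = hits + misses
    predicted_events = hits + false_alarms
    pod = hits / observed_events if observed_events else nan
    far = false_alarms / predicted_events if predicted_events else nan
    bias = predicted_events / observed_events if observed_events else nan
    return pod, far, bias

# Hypothetical daily rainfall (mm) at eight held-out gauges (illustration only)
observed = [0.0, 0.0, 2.0, 5.0, 0.0, 10.0, 0.5, 3.0]
predicted = [0.0, 1.5, 3.0, 0.2, 0.0, 8.0, 2.0, 4.0]

pod, far, bias = contingency_scores(observed, predicted)
print(f"POD = {pod:.2f}, FAR = {far:.2f}, BIAS = {bias:.2f}")
```

The ideal values are POD = 1, FAR = 0 and BIAS = 1. A BIAS above 1 means that rain events are predicted more often than they are observed.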

Previous research has generally found kriging to be the best method (Ly et al., 2013; Wijemannage et al., 2016; Firdaus & Talib, 2016; Javari, 2017), although Karydas et al. (2009) found no differences between kriging and other methods such as IDW. However, this work confirms that deterministic interpolation can be the best method (Keblouti et al., 2012). Rainfall accumulation varies on a yearly, monthly and daily basis, with considerable precipitation in the wet season and little in the dry season. Therefore, both deterministic and geostatistical interpolation can be used for daily precipitation with similar performance (Ly et al., 2013; Chen et al., 2017). However, geostatistical methods will give better results than IDW if the data have sufficient density (Eischeid et al., 2000). The results of this study differ slightly, in that kriging is generally worse than IDW, except in South Kalimantan, where IDW and kriging have almost the same accuracy. Moreover, this study shows that the accuracy of rainfall interpolation obtained using IDW can be better than with other methods (Ahrens, 2006; Keblouti et al., 2012; Ly et al., 2013; Yang et al., 2015).

4. Conclusion

Holdout validation in this study showed that the best rainfall sample size for validation in East Kalimantan should be no more than 25%, whereas in the other regions surrounding the Makassar Strait it should be no more than 10%. IDW spatial interpolation is superior to the kriging method for all rain gauge rainfall in the areas surrounding the Makassar Strait. Both IDW and kriging can be used for interpolation with almost the same performance in terms of correlation, but the other parameters showed that IDW has better accuracy than kriging. The RMSE and MAE values of kriging are almost twice those of IDW. Moreover, although geostatistical methods commonly give greater improvement than IDW if the data have sufficient density, IDW in this work retains robust accuracy even when the portions of rainfall data change. Only in South Kalimantan is the accuracy of IDW and kriging virtually identical.

Acknowledgements

The data for this research were provided by the Indonesian meteorological agency (BMKG). The authors especially thank the Banjarbaru Climatology Station and the Maros Climatology Station.

References

Ahrens, B. (2006) Distance in spatial interpolation of daily rain gauge data. Hydrol. Earth Syst. Sci., Vol.10, pp.197–208.

Chen, T., Ren, L., Yuan, F., Yang, X., Jiang, S., Tang, T., Liu, Y., Zhao, C. & Zhang, L. (2017) Comparison of Spatial Interpolation Schemes for Rainfall Data and Application in Hydrological Modeling. Water, Vol.9, No.342, pp.1-18.

D'Arrigo, R. & Wilson, R. (2008) Short Communication: El Niño and Indian Ocean influences on Indonesian drought: implications for forecasting rainfall and crop productivity. International Journal of Climatology, Vol.28, pp.611-616.

Eischeid, J. K., Pasteris, P. A., Diaz, H. F., Plantico, M. S. & Lott, N. J. (2000) Creating a serially complete, national daily time series of temperature and precipitation for the Western United States. J. Appl. Meteorol., Vol.39, pp.1580–1591.

Firdaus, N. N. M. & Talib, S. A. (2016) Spatial interpolation of monthly precipitation in Selangor, Malaysia-Comparison and evaluation of methods. Journal of Applied and Physical Sciences, Vol.2, No.1, pp.1-9.

Giarno, Zadrach, L. D. & Mustofa, M. A. (2012) Kajian awal musim hujan dan awal musim kemarau di Indonesia. Jurnal Meteorologi dan Geofisika, Vol.1, pp.1–8.

Giarno, Hadi, M. P., Suprayogi, S. & Murti, S. H. (2018) Distribution of Accuracy of TRMM Daily Rainfall in Makassar Strait. Forum Geografi, Vol.32, No.1, pp.38-52.

Goovaerts, P. (2000) Geostatistical approaches for incorporating elevation into the spatial interpolation of rainfall. J. Hydrol., Vol.228, No.1, pp.113-129.

Hashiguchi, H., Tabata, Y., Yamamoto, M. K., Marzuki, Mori, S., Yamanaka, M. D., Syamsudin, F. & Manik, T. (2013) Observational Study on Diurnal Precipitation Cycle over Indonesian Maritime Continent. Journal of Disaster Research, Vol.8, pp.1–9.

Hidayat, R. and Kizu, S. (2010) Influence of the Madden–Julian Oscillation on Indonesian rainfall variability in austral summer. International Journal of Climatology, Vol.30, pp.1816-1825.

Hu, K., Li, B., Lu, Y. & Zhang, F. (2004) Comparison of various spatial interpolation methods for non-stationary regional soil mercury content. Environmental Science, Vol.25, pp.132–137.

Isaaks, E.H. & Srivastava, R.M. (1990) An introduction to applied geostatistics. New York, USA: Oxford University Press.

Javari, M. (2017) Comparison of interpolation methods for modeling spatial variations of Precipitation in Iran. International Journal of Environmental and Science Education, Vol.12, No.5, pp.1037-1054.

Jia, S. F., Zhu, W. B., Lu, A. F. and Yan, T. T. (2011) A statistical spatial downscaling algorithm of TRMM precipitation based on NDVI and DEM in the Qaidam Basin of China. Remote Sensing of Environment, Vol.115, pp.3069–3079.

Jolliffe, I. T. and Stephenson, D. B. (2013) Forecast Verification: A Practitioner's Guide in Atmospheric Science. 2nd Edition, Wiley.

Karydas, C. G., Gitas, I. Z., Koutsogiannaki, E., Simantiris, N. L. and Silleos, G. N. (2009) Evaluation of spatial interpolation techniques for mapping agricultural topsoil properties in Crete. EARSeL eProceedings 8.

Keblouti, M., Ouerdachi, L. and Boutaghane, H. (2012) Spatial Interpolation of Annual Precipitation in Annaba–Algeria–Comparison and Evaluation of Methods. Energy Procedia, Vol.18, pp.468–475.

Kyriakidis, P.C., Kim, J. and Miller, N.L. (2001) Geostatistical mapping of precipitation from rain gauge data using atmospheric and terrain characteristics. J. Appl. Meteorol., Vol.40, No.11, pp.1855-1877.

Langella, G., Basile, A., Bonfante, A. and Terribile, F. (2010) High-resolution space-time rainfall analysis using integrated ANN inference systems. Journal of Hydrology, Vol.387, No.3-4, pp.328–342.

Lee, H. S. (2015) General Rainfall Patterns in Indonesia and the Potential Impacts of Local Seas on Rainfall Intensity. Water, Vol.7, pp.1750-1768.

Li, M. & Shao, Q. X. (2010) An improved statistical approach to merge satellite rainfall estimates and raingauge data. Journal of Hydrology, Vol.385, No.1-4, pp.51–64.

Ly, S., Charles, C. and Degré, A. (2013) Different methods for spatial interpolation of rainfall data for operational hydrology and hydrological modeling at watershed scale. A review. Biotechnol. Agron. Soc. Environ. Vol.17, No.2, pp.392-406.

Madden, R. A., and Julian, P. R. (1972) Description of global-scale circulation cells in the tropics with a 40–50 day period. Journal of the Atmospheric Sciences, Vol.29, No.6, pp.1109–1123.

Martono, M. & Wardoyo, T. (2017) Impacts of El Niño 2015 and the Indian Ocean Dipole 2016 on Rainfall in the Pameungpeuk and Cilacap Regions. Forum Geografi, Vol.31, No.2, pp.184–195.

Tewolde, M. G., Beza, T. A., Costa, A. C. and Painho, M. (2010) Comparison of Different Interpolation Techniques to Map Temperature in the Southern Region of Eritrea, 13th AGILE International Conference on Geographic Information Science.

Otieno, H., Yang, J., Liu, W. and Han, D. (2014) Influence of rain gauge density on interpolation method selection. J Hydrol Eng, Vol.19, No.11, pp.1-8.

Peatman, S. C., Matthews, A. J. and Stevens, D. P. (2014) Propagation of the Madden–Julian Oscillation through the Maritime Continent and scale interaction with the diurnal cycle of precipitation. Quarterly Journal of the Royal Meteorological Society, Vol.140, No.680, pp.814–825.

Peatman, S. C., Matthews, A. J. and Stevens, D. P. (2015) Propagation of the Madden–Julian Oscillation and scale interaction with the diurnal cycle in a high-resolution GCM. Climate Dynamics, Vol.45, No.9, pp.2901–2918.

Qian, J. H. (2007) Why precipitation is mostly concentrated over islands in the maritime continent. Journal of the Atmospheric Sciences, Vol.65, pp.1428–1441.

Renggono, F. (2011) Pola sebaran hujan di DAS Larona. Jurnal Sains & Teknologi Modifikasi Cuaca, Vol.12, pp.17–24.

Tabios, G.Q. and Salas, J.D. (1985) A comparative analysis of techniques for spatial interpolation of precipitation. Water Resour. Bull., Vol.21, pp.265-380.

Vincent, C. L. & Lane, T. P. (2016) Evolution of the diurnal precipitation cycle with the passage of a Madden–Julian oscillation event through the Maritime Continent. Monthly Weather Review, Vol.144, No.5, pp.1983–2005.

Wagner, P. D., Fiener, P., Wilken, F., Kumar, S. and Schneider, K. (2012) Comparison and evaluation of spatial interpolation schemes for daily rainfall in data scarce regions. Journal of Hydrology, Vol.464–465, pp.388–400.

Wang, S., Huang, G. H., Lin, Q. G., Li, Z., Zhang, H. and Fan, Y. R. (2014) Comparison of interpolation methods for estimating spatial distribution of precipitation in Ontario, Canada. Int. J. Climatol., Vol.34, pp.3745–3751.

Weber, D. D. and Englund, E. J. (1994) Evaluation and comparison of spatial interpolators II. Math. Geol., Vol.26, pp.589–603.

Weng, Q. (2002) Quantifying uncertainty of digital elevation models derived from topographic maps. In: Richardson D, van Oosterom P (eds) Advances in Spatial Data Handling, Springer-Verlag, New York, pp.403–418.

Weng, Q. (2006) An evaluation of spatial interpolation accuracy of elevation data. In Progress in Spatial Data Handling, Riedl A, Kainz W, Elmes GA (eds), Springer-Verlag, Berlin, pp.805–824.

Wijemannage, A. L. K., Ranagalage, M. and Perera, E. N. C. (2016) Comparison of spatial interpolation methods for rainfall data over Sri Lanka, ACRS Proceedings.

Xu, W., Zou, Y., Zhang, G. and Linderman, M. (2015) A comparison among spatial interpolation techniques for daily rainfall data in Sichuan Province, China. Int. J. Climatol., Vol.35, pp.2898–2907.

Yang, X., Xie, X., Liu, D. L., Ji, F. and Wang, L. (2015) Spatial Interpolation of Daily Rainfall Data for Local Climate Impact Assessment over Greater Sydney Region. Advances in Meteorology, http://dx.doi.org/10.1155/2015/563629.

 

© 2019 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC-BY-NC-ND) license (http://creativecommons.org/licenses/by/4.0/).

 
