Regression kriging to improve basal area and growing stock volume estimation based on remotely sensed data, terrain indices and forest inventory of black pine forests

Background: The use of satellite imagery to quantify forest metrics has become popular because of the high costs associated with the collection of data in the field. Methods: Multiple linear regression (MLR) and regression kriging (RK) techniques were used for the spatial interpolation of basal area (G) and growing stock volume (GSV) based on Landsat 8 and Sentinel-2. The performance of the models was tested using the repeated k-fold cross-validation method. Results: The prediction accuracy of G and GSV was strongly related to forest vegetation structure and spatial dependency. The nugget value of semivariograms suggested a moderately spatial dependence for both variables (nugget/sill ratio~70%). Landsat 8 and Sentinel-2 based RK explained approximately 52% of the total variance in G and GSV. Root-mean-square errors were 7.84 m2 ha-1 and 49.68 m3 ha-1 for G and GSV, respectively. Conclusions: The diversity of stand structure particularly at the poorer sites was considered the principal factor decreasing the prediction quality of G and GSV by RK. New Zealand Journal of Forestry Science Bolat et al. New Zealand Journal of Forestry Science (2020) 50:4 https://doi.org/10.33494/nzjfs502020x49x


Introduction
Forest inventory studies are conducted to quantify forest attributes such as basal area (G), growing stock volume (GSV), biomass and carbon sequestration that are providing essential information for forest managers. G has been often used as an important auxiliary variable to determine competition indices (Contreras et al. 2011), stand density (Curtis 1982), and diameterheight increment and mortality (Kuehne et al. 2016). Determination of the GSV (i.e. standing volume) is needed to assess silvicultural treatments for ensuring sustainable wood production in a managed forest (O'hehir & Nambiar 2010).
Measuring G and GSV can be time consuming particularly for precision forestry. As a consequence, foresters are using remote sensing for the estimation of forest metrics in remote and difficult locations.
Keywords: Forestry, k-fold cross-validation, Landsat 8, Sentinel-2, semi-arid region They found that using regression kriging resulted in the greatest precision and almost unbiased estimates of G for Loblolly pine and Slash pine. The authors concluded that kriging is robust for the interpolation of forest variables such as G, GSV, and other carbon and forestry metrics such as stand density, stand age and dominant height. Maselli and Chiesi (2006) evaluated three methods relying on remotely sensed data including k-nearest neighbour method, a locally calibrated regression analysis and a kriging method for forest volume estimation in a Mediterranean region. Results showed that all three methods were similar in terms of their correlation coefficient and root-mean-square error, but the kriging method was slightly better with regard to lower residual variance. dos Reis et al. (2018) assessed a variety of methods including a random forest algorithm and a kriging method integrated with Landsat Thematic Mapper (TM) data for the spatial prediction of G and GSV. Multiple linear regression and artificial neural network models had the poorest performance for the estimation of timber volume of Eucalyptus clonal stands.
In forest inventories, a wide range of data types are gathered in terms of forest variables as well as geographical locations. The relationships between a response variable and explanatory variable(s) can differ from one sample unit to another because of sampling density, site index and stand age. In order to obtain unbiased residual estimates, local (random) effects should be introduced in a model (Fortin et al. 2007). If a model accounts for local effects adequately, it may predict the target variable at an acceptable level of accuracy (Magnussen et al. 2017). Geostatistical approaches have the ability to account for unknown random effects in a local forest area, providing approximately unbiased estimates of residuals (Isaaks & Srivastava 1989). Also, the forest metrics are spatially correlated within the same locations. Although multiple regression analysis is user-friendly and easy to use for forest managers, it does not account for spatial autocorrelation in data. Therefore, especially in case of a trend in the error variance across predictions, this method can be inappropriate in the prediction of forest metrics due to its biased parameter estimates. RK uses residuals from the least-square methods (i.e. multivariate regression). The improvements on the predictions can be important when taking spatial autocorrelation in the data into account. Although there are a variety of approaches to consider spatial autocorrelation in data, the development and implementation of these methods require a strong statistical background. In this context, RK is both userfriendly and relatively simple, which may suit the needs of forest managers. Therefore, the objective of this study was to evaluate the performance of Landsat 8 and Sentinel-2 based RK and MLR for improving the spatial predictions of G and GSV.

Study area
The study area is within the Inner Anatolia semi-arid climate in the Black Sea -steppe transitional zone in Turkey ( Figure 1). It is bounded by latitudes 40° 29′ 09″-40° 30′ 44″ N and longitudes 33° 25′ 47″-33° 27′ 19″ E (WGS 1984 UTM Zone 36N). In the study area, average annual temperature and total precipitation are approximately 11.3 °C and 412 mm, respectively, while elevations range from 1280 to 1642 m (Turkish State Meteorological Service n.d.). Terrain variables of the study area were obtained from a digital elevation map with a twelve-metre resolution ( Table 1). The FIGURE 1. Location of the study area and its location in Turkey study area consists of 380 hectares, where Anatolian black pine (Pinus nigra Arnold. subsp. pallasiana Lamb Holmboe) is widely distributed. In this region, harsh ecological conditions such as low precipitation, high temperatures, and poor soils contribute towards poor forest growth (Barbati et al. 2014;Erşahin et al. 2016;Göl & Abay 2003). Such conditions adversely influence the distribution and abundance of conifer forests in this region (Çolak & Rotherham 2006). Anatolian black pine has been frequently planted because it adapts well to the semi-arid conditions of this region (Konukcu 2001).

Field research
The study area was afforested in the 1961-1965 period and have been thinned every ten years since establishment. A systematic sampling with 51 plots distributed within a 200 x 200 m grid were measured in 2016. Circular plot sizes ranged from 400 to 800 m 2 depending on the crown closure as suggested by the forest management guidelines for Turkey (Anonymous 2008). We calculated descriptive statistics for basal area (G, m 2 ha -1 ), and growing stock volume (GSV, m 3 ha -1 ) using the SAS software ® . (Table 2). G and GSV of each sample plot were calculated by Eq. 1 and Eq. 2 (Şenyurt 2017). In each sample plot, tree diameters at breast height (dbh) were measured with 0.1 cm-precision for each living tree (dbh ≥ 8 cm) using calipers. Trees with unsuppressed growth (due to lack of competition) representing potential site productivity were selected to assess the site index of each sample plot. Tree heights were measured with 1% precision using the Vertex IV ultrasound instrument. Top height was calculated as the mean height of the 100 tallest trees within a hectare. The mean age of each sample plot was measured using increment cores extracted from five trees that were selected based on the mean square diameter of the sample plot. Subsequently, the site index of each sample plot was assessed using the site index equation developed by Kalıpsız (1963) for black pine stands for a base age of 100 years. Relative density (RD) of each sample plot was calculated by the stand density index of Curtis (1982)

Satellite images, processing, and data
Landsat 8 and Sentinel-2 satellite images were obtained from the United States Geological Survey Earth Explorer data portal (USGS 2000) and acquired on 02 and 20 August 2016, respectively. WGS 1984 (UTM Zone 36) projection system was used for orthorectification and georeferencing of the satellite images with first order nearest neighbourhood rules. The atmospheric correction was applied to Landsat 8 and Sentinel-2 images using the ATCOR module of QGIS ® Software. Then, the geometric correction was applied using twenty ground control points such as crossings, bridges, and hill tops through a Global Positioning System (GPS).
Inventory was carried out using circular plots of 400, 600 and 800 m 2 . A single pixel obtained from Landsat 8 covers an area of 900 m 2 (30x30 m). Sentinel-2 has bands with a spatial resolution of 10, 20 and 60 m, which cover an area of 100, 200 and 3600 m 2 , respectively. Since Landsat 8 and Sentinel-2 spatial resolution (i.e. B9 and B10 bands of Sentinel-2) are higher than sample plot size, the position of some sample plots may not be the center of a pixel. In such a circumstance, the buffer zone was applied for obtaining more representative data. For those plots that did not coincide with the center of the corresponding pixel, the average of all those pixels comprised by a plot was calculated.

Statistical and geostatistical analysis Multiple linear regression (MLR)
MLR has been used to assess the relationships between forest structural attributes and remotely sensed data (Naesset 2002). We focused on the predictability of G and GSV using MLR and regression kriging (RK) based on Landsat 8, Sentinel-2 data, and terrain indices. For this purpose, the statistical significance of the terrain variables and of the variables obtained from Landsat 8 and Sentinel-2 were tested at 0.05 significance level using the forward variable selection technique in SAS ® software.
Where; b 0 is a constant coefficient, b i is the vector of independent variables, x i is an independent variable that accounts for variation of a dependent variable and e is the model residual. The model residuals are expected to have a normal distribution.

Regression kriging (RK)
The sample plots in the same location are inherently interdependent (spatially correlated). Therefore, in fitting a model, it is highly important to consider the spatial autocorrelation in data in order to improve the model performance. RK uses the spatial autocorrelation in the residuals from MLR, and therefore may improve the predictions, as MLR does not consider spatial autocorrelation in data. Kriging methods have been increasingly used for interpolating spatially dependent data in recent decades.
RK consists of three steps. Initially, MLR is carried out to estimate the regression parameters. Then, the residuals from MLR are incorporated into ordinary kriging to account for prediction uncertainty using ArcGIS ® software. Finally, the values of the target variable were calculated by adding the predictions of MLR and the kriged values of the residuals (Odeh et al. 1995) (Figure 2).
Where: φ 0 and φ i are the estimated values by MLR, x i is an independent variable that accounts for dependent variable variation, n is the number of observations, λ i is ordinary kriging weight and, ε is the kriged model residual at measurement locations.

Predictive performance and validation of MLR and RK
The repeated k-fold cross-validation with n repetition procedure was used to adequately exploit our small sample data (n=51) using SAS ® software. Five-fold with 10 repetitions was applied in the cross-validation procedure, which allowed to reduce prediction biases. The data were randomly split into two groups (training and testing subsamples) using the unrestricted random sampling method at each repetition, and the different random subsamples were selected for the training and testing purposes. This process was repeated ten times for determining the best model parameters at the 0.05 significance level. Finally, the best predictive MLR and RK models were compared using goodness-of-fit statistics including the root-mean-square error (RMSE), the mean absolute error (MAE) and the adjusted coefficient of determination (R 2 adj ) (Kozak & Kozak 2003) (Eq. 6 to 8). The similarities between the observed and the predicted values were assessed by the correlation coefficient (r). RK and MLR were further evaluated by plotting the residuals for each model. Also, the distribution of the relative error percent, RE (%), was used for model comparison.

Satellite Indices
Satellite Indices Satellite Indices Satellite Indices Landsat 8

B1-B7, B9
Sentinel-2 S1-S12, S8a Sentinel - (2017) Where; D i and D i are the observed and the predicted values, respectively, D i is the mean of the observed values, and n T and k are the total number of the observations and independent variables, respectively.

Results
We used multiple linear regression (MLR) to predict basal area (G) and growing stock volume (GSV) based on Landsat 8 and Sentinel-2 data and terrain indices choosing explanatory variable(s) at the 0.05 level of significance. The band values and vegetation indices of Landsat 8 and Sentinel-2 (Table 3) and terrain variables (Table 1) were analyzed to predict G and GSV. When Landsat 8 and the terrain variables were considered, EVI, elevation and STI independent variables were significant at the 0.05 level of significance. When Sentinel-2 and the terrain variables were considered, NLI NIRn2 , elevation and STI were found to be significant at P<0.05 (Table 5). Goodness-of-fit statistics for MLR and RK are given in Table 4. RK performed better than MLR for both prediction and validation datasets, particularly when using RK based on Landsat 8 data. In both data sets, the models based on Sentinel-2 data performed worse than those based on Landsat 8. In summary, the prediction and validation statistics suggested using RK based on Landsat 8 data to predict G and GSV (Table 4).
Residual plots for G and GSV are displayed in Figure 3. The residual patterns of RK and MLR had no trend, suggesting that the predictions were unbiased for all cases. Figure 4a shows the relative errors (REs) for the model predictions against different values of G and GSV. The number of observed data used in the predictions is an important factor determining modeling performance or prediction quality. Interestingly, in general, Figure 4a indicated that the models performed better at G ≤20 and G ≥40 m 2 ha -1 , while the number of observed data were lower at these Gs. However, we believe that those lower RE values for G predictions at ≤20 and ≥40 m 2 ha -1 could be biased. Figure 4b shows a similar case for the GSV models (the number of observed data are not shown in the figure).
A graph of the observed versus modelled (predicted) Gs, are shown in Figure 5. These graphics suggest that G estimates obtained from Landsat 8 showed greater similarity to the observations than those obtained from Sentinel-2 (r=0.72). In contrast, GSV estimates obtained from Sentinel-2 were more precise than those obtained from Landsat 8 (r=0.75).
The experimental semivariograms for the residuals of G and GSV are shown in Figure 6. Nugget values for Landsat 8 are lower than those for Sentinel-2. The nugget values are 70% for both G and GSV when using Landsat 8 and 70% for G and 65% for GSV when using Sentinel-2. Cambardella et al. (1994) proposed that a variable with a nugget ratio <25% is assumed to be strongly spatially dependent, between 25% and 75% moderately spatially ̭ ̵ FIGURE 2: Flowchart of regression kriging application used to spatially interpolate G and GSV dependent, and >75% weakly spatially dependent. The nugget value was quite high in our research field, suggesting small-scale variabilities in the study area, in other words, a weak autocorrelation in our data. The surface maps of G and GSV obtained by RK are shown in Figure 7.

Discussion
The goodness-of-fit-statistics suggested that RK models were adequate for both G and GSV predictions. Although the residual distributions showed that RK models were unbiased for G and GSV estimates, the relative error distributions suggested that RK models were biased for lower and higher values of G and GSV. Since plots within the same sampling unit are inherently correlated, the assumption of independence of observations is generally violated in the ordinary least square methods (e.g. multivariate regression) (Gregorie 1987). This feature results frequently in large error variance in residuals achieved by MLR (Fortin et al. 2007). In this study, the small error variance occurred due to low spatial dependence in the residuals   (Figure 6). While MLR may be more suitable in the case of low spatial dependency in the data, it does not take into account the spatial structure in data (Palmer et al. 2009). RK has the capability to account for these structures in continuous variables and has the advantages of the variable selection (Odeh et al. 1995). Therefore, we recommend the use of RK thus providing the surface map of the predicted values and accounting for measurement uncertainties.
The study area is located in the Anatolian steppe transitional zone, covered by a semi-arid ecosystem (Göl & Abay 2003). In this region, tree (or stand) growth is limited due to the ecological conditions such as low precipitation, extreme temperature, and also poor soils. In this type of ecosystems, the General Directorate of Forestry of Turkey frequently recommends the Anatolian black pine for afforestation (Konukcu 2001). However, the harsh environmental conditions may lead   to highly variable stands. Therefore, the explanatory variables (e.g. NDVI) are likely to be less correlated with the response variable, as suggested by our study. In our view, the forest vegetation structure limited the performance of RK, as the study site was composed of poor growth and thinned patches. In other words, sample plots with similar diameters at breast height may differ in tree heights ranging from 20 to 24 m (8.2%), 15 to 19 m (57.4%) and 10 to 14 m (34.4%). These conditions led to poor correlations between the spectral data and the ground measurements. Another reason for suboptimal predictions was the thinning treatments in the study area. Since the stands were partially thinned, the sample plots with similar G showed the differences in terms of crown closure and stem density, which results in a discrepancy between the spectral data and the stand characteristics, leading to imprecise and biased models. Thinning treatments and harsh environmental conditions led to the spatial discontinuities, suggesting a weak spatial dependency of the G and GSV in the study area. In other words, the high short-range variability as evidenced by high nugget effect (~70%) occurred in the data (Dai et al. 2014;Gilbert & Lowell 1997;Gunnarsson et al. 1998;Maselli & Chiesi 2006;Viana et al. 2012). The nugget effect is one of the key factors decreasing the interpolation quality by kriging (Isaaks & Srivastava 1989). Our results suggest that a systematic sampling (200 x 200 m grid) resulted in a high nugget effect in a semi-arid region (nugget effect ~70% for our data). In this context, the results of this study are important, evidencing that the sampling distance and scheme used in National Forest Inventory is not proper for geostatistical studies and that additional random subsamples are needed in finer resolution for a successful geostatistical analysis and interpolation. In these types of forests, Destan et al. (2013) and Corona et al. (2014) advised that inventory should be carried out with smaller sampling distances and that including more auxiliary variables representing spatial variability (e.g. soil and topography) may improve the performance of the kriging methods. In this study, aspect, slope, elevation, TWI, STI, and CI were used as explanatory variables to improve the predictions. Along with EVI indices of Landsat 8, NLI NIRn2 indices of Sentinel-2, elevation and STI contributed to explain the total variance of G and GSV at the 0.05 significance level. Consequently, our results are promising as RK explained approximately fifty percent of total variance observed in both G and GSV obtained from Landsat 8 and Sentinel-2.

Conclusions
EVI obtained from Landsat 8, NLI NIRn2 obtained from Sentinel-2, elevation and STI were the best independent variables explaining G and GSV. RK performed adequately to predict G from Landsat 8 and GSV from Sentinel-2. RK and MLR had similar residual scatterplots. However, the relative error distributions showed that especially in GSV estimates RK based on Landsat 8 performed better than MLR based on both Landsat 8 and Sentinel-2.
A highly random distribution of G and GSV occurred in our study site as shown by high nugget variance, suggesting a weak autocorrelation from a geostatistical perspective, which might be attributed to the coexistence in short distances of high-low productive patches, the coarse sampling scheme and undefined thinning practices. This suggests the sampling scheme may have been inadequate to detect the autocorrelation in G and GSV. Across such sites, a geostatistical sampling scheme should be performed with shorter sampling distances to improve modeling of semivariograms around the origin, which may help to decrease the nugget effect. Sampling sites having equal distributions of site index values can increase the performance of MLR and RK. The sampling sites with thinning practices should be excluded to improve the variance explained. In conclusion, we found that reasonably accurate predictions for forest planning can be achieved using Landsat 8 and Sentinel-2 through the RK method.