Diameter distribution model development of tropical hybrid Eucalyptus clonal plantations in Sumatera, Indonesia: A comparison of estimation methods

Background: Effective forest management and planning often requires information about the distribution of volume by size and product classes. Size-class models describe the diameter distribution and provide information by diameter class, such as the number of trees, basal area, and volume per unit of area. A successful diameterdistribution model requires high flexibility yet robust prediction of its parameters. To our knowledge, there are no studies regarding diameter distribution models for Eucalyptus hybrids in Indonesia. Therefore, the aim of this study was to compare different recovery methods for predicting parameters of the 3-parameter Weibull distribution for characterising diameter distributions of Eucalyptus hybrid clone plantations, on Sumatera Island of Indonesia. Methods: The parameter recovery approach was proposed to be compatible with stand-average growth and yield models developed based on the same data. Three approaches where compared: moment-based recovery, percentilebased prediction and hybrid methods. The ultimate goal was to recover Weibull parameters from future stand attributes, which were predicted from current stand attributes using regression models. Results: In this study, the moment method was found to give the overall lowest mean error-index and Kolmogorov– Smirnov (KS) statistic, followed by the hybrid and percentile methods. The moment-based method better fit long tails on both sides of the distribution and exhibited slightly greater flexibility in describing plots with larger variance than the other methods. Conclusions: The Weibull approach appeared relatively robust in determining diameter distributions of Eucalyptus hybrid clone plantation in Indonesia, yet some refinements may be necessary to characterize more complex distributions. New Zealand Journal of Forestry Science Waldy et al. New Zealand Journal of Forestry Science (2022) 52:1 https://doi.org/10.33494/nzjfs522022x151x


Introduction
Effective forest management and planning often requires information about the distribution of volume by size and product classes (Burkhart & Tomé 2012). Size-class and individual-tree models are required for predicting distributions of multiple products (Weiskittel Keywords: 3-parameter Weibull; parameter recovery method; moment-based estimates; percentile estimates; hybrid parameter estimates; Indonesia, Eucalyptus hybrids The concept of diameter distributions is not new to forestry literature (de Liocourt 1898). Several probability density functions are used to describe the structure of forest stands, including Weibull distributions (Bailey & Dell 1973). The Weibull distribution is one of the most widely used diameter distribution models (Burkhart & Tomé 2012) and one of the best-performing models among other distributions (Eisfeld et al. 2005).
The Weibull distribution is a flexible distribution function capable of fitting a variety of distribution shapes (Poudel & Cao 2013). It has a lower bound and calculations of proportions of trees across diameter classes is straightforward and does not require numeric integration (Cao 2012). In addition, the parameters of the Weibull distribution are generally well correlated with several stand-level attributes (Bailey & Dell 1973), such as dominant height, quadratic mean diameter, and mean diameter (Sghaier et al. 2016). Because of its flexibility, the Weibull distribution is applicable to many different species and forest structures from pure, single cohort, even-aged stands to multispecies, multicohort, uneven-aged stands (Mattney & Sullivan 1982;Hyink & Moser 1983;Knowe et al. 1994;Knowe & Stein 1995;Siipilehto 1999;Cao 2004;Newton et al. 2005).
The three-parameter Weibull distribution function is commonly used to quantify tree diameter distributions because of: its flexibility in fitting a variety of shapes and degrees of skewness; the relative simplicity of estimating its parameters; and the cumulative distribution has a closed-form solution (Bailey & Dell 1973;Bowling et al. 1989;Little 1983;Matney & Sullivan 1982;Rennolls et al. 1985;Schreuder & Swank 1974;Zarnoch et al. 1991). The probability density function (PDF) for the threeparameter Weibull distribution is specified by: (1) = 0, otherwise Integration yields a closed-form cumulative distribution function (CDF): (2) The parameter α is referred to as the location parameter and defines the minimum value of the distribution, β is the scale parameter, γ is the shape parameter and is diameter at breast height (Clutter et al. 1983). The β and γ parameters must be positive, while α mathematically can be positive, zero, or negative (provided -α ≥ 0). For diameter distribution applications ≥ 0.
For shape (γ) parameters less than 1, the Weibull distribution assumes a classic inverse J-shape distribution typically found in uneven-aged stands, while when γ equals 1, the negative exponential distribution results. Mound shape curves typical of even-aged stands are produced for γ greater than 1 (Burkhart & Tomé 2012). When γ is equal 3.6, the Weibull distribution is symmetrical, similar to a normal distribution shape. Right-skewed curves are defined for γ less than 3.6 and left-skewed curves for γ greater than 3.6. As γ approaches infinity, the distribution approaches a spike over a single point (Burkhart & Tomé 2012). The location parameter often is assumed known in many cases, so it is logical to set this parameter to the smallest value or the lower limit of diameter measurement (Kershaw et al. 2016).
Parameter estimates based on maximum likelihood methods for the Weibull distribution requires individual tree data (Bolker 2008). Maximum likelihood estimation has several desirable statistical properties, such as consistency and asymptotic normality (Bury 1999;Royle & Dozario 2008), and provides better estimates compared to other methods (Zhou & McTague 1996). However, it requires more computational resources (Cao & McCarty 2006) and precise individual tree measurements. In most forestry applications, diameter distributions are generally predicted from characteristics measured in a stand of interest. Hyink and Moser (1983) presented a generalized framework for estimating diameter distributions using parameter prediction methods (PPM) and parameter recovery methods (PRM). In the PPM approach, the future parameters of the distribution model are directly predicted from the current parameters and other information about the stand such as density, basal area and volume. (Kershaw et al. 2016). The PPM uses the location (α), scale (β), and shape (γ) parameters as the dependent variables, which have been previously estimated using maximum likelihood methods (Cao 2004). In the PRM approach, future values of stand parameters are directly predicted, and the parameters of the diameter distribution are derived from the stand parameters (Hyink & Moser 1983;Kershaw et al. 2016).
According to Siipilehto and Mehtätalo (2013), there are two main options for PRM approaches: moment-based and percentile-based estimation. The PRM moment-based approach solves for the Weibull parameters typically using the moments of the diameter distribution that are estimated from regression equations using a variety of stand characteristics (Bowling et al. 1989;Hyink & Moser 1983;Lindsay et al. 1996;Matney & Sullivan 1982;Newton et al. 2004;Strub & Burkhart 1975) or from stand-level models predicting future stand conditions (Clutter et al. 1983;Waldy et al. 2021). The percentile-based method predicts the Weibull parameters using percentiles of the diameter distribution that also can be estimated from stand characteristics (Bailey et al. 1989;Brooks et al. 1992;Lohrey & Bailey 1976;Knowe 1992;Magnussen 1986;McTague & Bailey 1987). Other methods, such as a hybrid approach (McTague & Bailey 1987), cumulative distribution function regressions (CDFR; Cao 2004), and modified CDFRs (Poudel & Cao 2013) also have been proposed.
A successful diameter-distribution model requires robust predictions of its parameters. The PRM approach was proposed because its estimates were compatible with stand-average growth and yield models developed from the underlying diameter distribution data (Hyink & Moser 1983). Other methods were proposed because of the nature of the underlying data and/or the objectives of the study. In this study, the overall objective of this analysis was to evaluate different parameter recovery methods for predicting parameters of the Weibull PDF for characterising diameter distributions of Eucalyptus hybrid clone plantations in Sumatera, Indonesia. The specific objectives were to: (1) compare moment-based, percentile-based and hybrid methods; (2) determine the best approach for robust estimation of diameter distributions across a full range of current and future predicted stand conditions; and (3) based on observed performance, predict moments and/or percentiles from stand and site characteristics using nonlinear regression analyses.

Study site
This study was conducted in Sector Teso East of PT. Riau Andalan Pulp and Paper, a member of Asia Pacific Resources International Holding Limited (APRIL) Group and used inventory plot data from Eucalyptus hybrid plantations. Teso East is 19,600 ha in size and is located in the central region of Sumatera Island, Riau Province, Indonesia in the Kampar and Kuantan Singingi regencies between 101° 18′ E and 101° 32′ E, and 00° 09′ N and 00° 03′ N ( Figure 1). The region is characterised by a wet tropical climate with average rainfall ranging from 2000-3000 mm per year, and the average rainy days is around 160 days per year. The annual average temperature is 27.6 o C with an average minimum of 21.8 o C and an average maximum of 35 o C.
Until now, Estate Teso East has had its fifth rotation. The first rotation was planted with Acacia mangium in 1995 and then Eucalyptus sp. was planted on a large scale starting in 2010. Based on the soil characteristics, this study location was dominated by soil horizon B (topsoil) and C (parent material). This plantation area is relatively flat with slopes ranging from 0-15% and low elevation ranging from 30-90 metres above sea level.

Data collection
In APRIL plantations, the plot layout used systematic sampling with random starting points. Initial sampling intensity was 1% of total stand area (one plot represents 4 ha area) with an additional 2-5% sampling intensity for a pre-harvest inventory (one year before harvest). Plots were circular with a radius of 11.28 m (0.04 ha). First measurements were made at six months after planting and regularly continued at twelve-month intervals until harvesting. dbh was measured beginning at 18 months. All live trees with dbh of 1.0 cm and greater on each FIGURE 1: Location of the study area in the Sector Teso East, central region of Sumatera plot were measured using a diameter tape at 1.3 metres above the ground as measured from the uphill side of the stem. Each tree was assigned a status (live or dead), and assessed for wind damage, pests, and diseases.
This study only used the most recently established eucalyptus clone stands that were planted 2013 to 2016 with an initial spacing of 3 x 2 m (1667 trees per ha) and had at least three consecutive measurements. There are 2808 measurements in this study, where clone A and B accounted 2,476 and 332 plots, respectively. Table  1 summarises stand attributes and their associated increments by clone. In comparison to clone A, clone B is recently developed by the APRIL Group concession. The maximum age of existing inventory data for clone B is 42 months. Because of its promising growth (Table 1), it was important to include this clone in the modeling conducted in this study.

Maximum likelihood estimation (MLE)
Weibull parameters were estimated using the individual tree dbh measurements from each plot at each measurement period using maximum likelihood estimation (MLE) methods (Johnson et al. 1995;Casella & Berger 2001). The location parameter (α) was set to 1 cm in this study (minimum dbh measured), and the MLE estimates of the scale (β) and shape (γ) parameters were used as reference distributions to compare with the recovered parameter estimates. The likelihood function of the Weibull PDF (Eq. 1): (3) MLEs where obtained by minimising the negative of the logarithm of the likelihood function.

Moment-based parameter recovery
In moment-based parameter recovery, regression equations, as a function of mean top height, stand density, age, or relative density, are used to predict the arithmetic mean diameter (D̄, the first moment) and Waldy et al. New Zealand Journal of Forestry Science (2022)  the quadratic mean diameter (DQ, the square root of the second moment). As with the MLE estimates, we set α =1, our minimum measured dbh. Weibull parameters were then recovered from the arithmetic mean diameter (D̄) and quadratic mean diameter (DQ) using (Burkhart & Tomé 2012): where was predicted DQ, D̄ was predicted D̄, and α, β, and γ were recovered Weibull parameters. Eqs. 4 and 5 were numerically solved to recover the parameters using the algorithms developed by Kershaw and Maguire (1995).

Percentile-based parameter recovery
Percentile estimation of the Weibull parameters are relatively easy to obtain given the closed form solution of the cumulative distribution and the geometric interpretation of the distribution parameters (Zarnoch & Dell 1985). Similar to the moment estimates, the percentile estimates are functions of the distribution parameters and are often highly correlated with forest stand characteristics (Borders et al. 1987;Knowe 1992;Knowe et al. 1992). If three sample percentiles are known, each can be equated to its corresponding Weibull cumulative distribution function value and the three equations can be solved iteratively for estimates of α, β, and γ (Burkhart & Tomé 2012 Solving Eq.6 for D p yields: The scale parameter, β, is given by: Given two percentiles p 1 and p 2 where p 1 < p 2 , γ is estimated using: Theoretically, any two percentiles can be used; however, Bailey et al. (1989) found best performance resulted when percentiles represented a broader proportion of the distribution. In this study, we used the 25 th and 99 th diameter percentiles to recover parameters β and γ of the Weibull distribution.

Hybrid parameter recovery methods
Moments and percentiles are combined in the hybrid approach (Bailey et al. 1989;Brooks et al. 1992;Knowe et al. 2005;Lee & Coble 2006;Coble & Lee 2008;and Jiang & Brooks 2009). As in the other approaches, the location parameter was expected to be known (i.e., α = 1), and the β and γ parameters were recovered from a moment estimate and two percentile estimates using (Bailey et al. 1989): Here we used estimates of DQ, 25 th and 99 th diameter percentiles to estimate β and γ.

Moment and percentile estimation
The quadratic mean diameter, arithmetic mean diameter, and the percentiles were calculated for each plot and measurement period using the individual tree dbh data. Several regression equation forms were used to predict the diameter moments or percentiles (e.g., Matney & Farrar 1992;Baldwin & Feduccia 1987;Cao 2004) based on stand characteristic. In this study, the logistic equation was used to predict arithmetic mean diameter (D̄) and a modified general form of the regression equation from Cao (2004) to predict diameter percentiles with additional DQ variables to ensure compatibility with stand-level models (Waldy et al. 2021). DQ was estimated using the relationship of basal area and stand density. In Waldy et al. (2021), the stand density model from Clutter et al. (1983) and basal area model derived from the Schumacher polymorphic equation were the best fit models based on several models evaluated, for projecting future attributes to any point in time. These time-based models gave DQ estimates with rMSE = 0.72 cm and explained 85% of the variability (Waldy et al. 2021).
Arithmetic mean diameter (D̄) was estimated using a logistic equation and the estimated DQ: Finally, percentiles were estimated using: where D̄ was arithmetic mean diameter (cm); D i was the i th diameter percentile (cm); DQ was the quadratic mean diameter (cm); RD was a relative density measure defined as the ratio of actual density to the maximum density attainable in a stand with the same mean tree size; HT was the top height (m); AGE was stand age (months); b i were regression parameters and ϵ was a random error term. Equation (12) assured that DQ > D̄ because the logistic component, , and exp(x) > 0.
Fitted Eqs. 12 and 13 were used to predict each plot × measurement period moments.
Nonlinear regression was used to fit Eqs 12 and 13 and appropriate goodness of fit criteria were used to evaluate the moment and percentiles estimates. We then included clone and site class as random effects into the regression equations and nonlinear mixed effects methods (Pinheiro & Bates 2000) were used estimate fixed and random effects for each model. A likelihood ratio test used to assess significance of random effects (Weiskittel et al. 2011).

Model evaluation
The Kolmogorov-Smirnov (KS) statistic (Massey 1951) and an error-index (EI; Reynolds et al. 1988) were computed for each method to evaluate the three prediction methods. Using a significance level of 5%, the KS test was used to compare the estimated cumulative frequency and the observed frequency. The method producing the lowest average KS statistics and errorindex values was considered the best method. All estimation and analyses were carried out using the R statistical language (R Core Team 2020).

Moment and percentile prediction models
Using nonlinear least square analysis, all coefficients for Equation 12 and 13 to predict arithmetic mean diameter and diameter percentiles were significant (p < 0.05). Consequently, all parameters associated with stand characteristics (DQ, RD, HT, and AGE) were included for mixed-effect analysis that involved clone and site class within clone as random effects (Table 2). Based on the likelihood ratio test, the best random-effects models included all coefficients as random effects of clone and site class within the clone (Table 2). Parameter estimates and their associated standard errors (in parentheses), random effects standard deviations, and goodness-of-fit statistics for the arithmetic mean diameter, 25 th and 99 th diameter percentiles are shown in Table 2. The full model with fixed and random effects accounted for 99%, 77% and 93% of the variation for D̄, D 25 and D 99 prediction models, respectively ( Table 2). The D 25 percentile model had the lowest performance compared with the D̄ and D 99 models. Differences between clones were greater than differences across sites within clones for almost all modelling parameters, except b 1 and b 2 associated with the D 99 model (Table 2). Coefficient estimates (fixed + random effects) by clone and site classes for all prediction models are shown in Table 3.

Characteristics of Maximum Likelihood Estimates
The estimated Weibull scale parameters, β, based on the MLE method ranged from 3.95 to 15.11 and the shape parameters, γ, ranged from 1.68 to 11.50 (Table 4). Based on the K-S test, estimated diameter distributions were not significantly (p > 0.05) different from the observed diameter distributions for 68% (CI = 95%) and 80% (CI = 99%) of the observed Eucalyptus hybrid clone diameter distributions (Table 5). The scale parameter increased, and the shape parameter decreased, with increasing age and site class for both clones ( Figure  2). For a similar stand age, clone A had lower scale and shape parameters than clone B. In addition, clone B had smaller variation in scale parameter but higher variation in shape parameter than clone A (Figure 2).  average scale parameter, the PCT and HYB methods were relatively closer to the MLE based on the average shape parameter (Table 4; Figure 3). Shape parameters for HYB method were similar with the PCT that derived from the same variables and formulation (Table 4; Figure 3). Like most other applications, the shape parameter was more difficult to model than the scale parameter ( Figure 3).

Diameter distribution model comparisons
Based on the statistical model evaluation of three PRMs, the MOM had the best fit based on the KS statistic at 0.1757, followed by the HYB and PCT with KS statistic at 0.1988 and 0.2125, respectively (Table 5). The KS statistic also indicated the coverage (# of observed distributions fitting within the 95% CI) of estimated distributions were 49% (MOM), 26% (PCT), and 37% (HYB); and 66% (MOM), 44% (PCT), and 54% (HYB) at 99% confidence (Table 5). The MOM also had the lowest mean error-index at 26.5218, followed by HYB (27.7307) and PCT (31.4708) ( Table 5). In terms of the differences in precision for predicting the Weibull parameter, the MOM has the lowest variability with the standard deviation of error index at 6.7118 and the PCT had the highest variability at 9.7699. Although all approaches allow for a direct mathematical link between the predicted overall stand characteristics and a diameter distribution that is

Equation Parameter
Estimate coefficient by clone and site class    consistent with those characteristics, the moment-based method (MOM) indicated the best fit to the observed data when compared to the other methods (PCT and HYB). For evaluation, some graphical examination of the performance of the three PRMs were conducted. Figure  4 illustrates the three methods for typical plots of the diameter distributions observed in the Eucalyptus hybrid clone plantations. The plots represent a range of clones, stand ages, and the variation of distributions typically observed in the region. Figure 4a shows a unimodal distribution for clone A at 30 months and illustrates that the three methods perform equally well for modeling the distribution of that plot. Figure 4b shows a multimodal distribution. In this case, none of the three methods fitted the plot well and all missed the valley (8-9 cm) and the peak (13-14 cm). However, the MOM and PCT were better fits for the peak (6-7 cm) than the HYB. In Figure  4c for clone B at 30 months with a distribution that was close to normal, the MOM was the better fit than the others. The PCT and HYB show a similar pattern for this plot. While in Figure 4d with an irregular distribution, the MOM tends to deal with tails of both sides and MOM exhibited slightly greater flexibility in describing the larger variance than the two other methods.

Discussion
To be compatible with a recently developed standlevel growth and yield model (Waldy et al. 2021), the predicted quadratic mean diameter played an important role in recovering parameters of the Weibull distribution that characterised the future diameter distributions. In this study, the prediction models were quite good at predicting arithmetic mean diameter and 99 th percentiles but relatively poor at predicting the 25 th percentile. This means that stand attributes, especially DQ and AGE, have a stronger correlation with the arithmetic mean diameter and higher percentiles than lower percentiles, which has been found in similar analyses. For example, Cao (2012) found similar trends in predicting future diameter distributions of loblolly pine (Pinus taeda L.). He found that stand attributes explained more variability in estimating higher percentiles (50 th percentile and above) than lower percentiles. The method of moments is one of the most accurate methods for estimating the Weibull distribution parameters (Al-Fawzan 2000;Nanang 1998;Ueno & Osawa 1987;Shifley & Lentz 1985). Moments are the preferred method in growth and yield models because they ensure numeric compatibility and generally require fewer equations (Weiskittel et al. 2011). Our study indicated the MOM had a higher percentage of coverage for estimated distributions that fit the observed diameter distributions than PCT and HYB. This higher percentage for MOM in this study was likely because of sufficient sample sizes to model moment based recovery with an average of 60 trees per plot (plot size 0.04 ha). Bankston (2019) reported that larger plot sizes resulted in more accurately predicted diameter distributions. However, he suggested a plot size of 0.08 ha might be sufficient for model building for data from unthinned stands. A substantial decrease in error was no longer evident when plot size increased from 0.08 to 0.10 hectares. Shiver (1988) suggested a sample size of not less than 50 trees is required to obtain satisfactorily accurate estimates of the Weibull parameters, while simulation studies from Saborowski (1994) indicated that a sample with n = 80 could generally be expected to produce satisfactory results. To get a better diameter distribution that represented plot-level measurements, modifying the plot size is not a logical decision for the well-established forest inventory system applied by the APRIL group. Instead of using Weibull distributions, other probability density functions (e.g., beta, Johnson's SB, gamma or lognormal) are potentially worth examining to describe the structure of a Eucalyptus hybrid clone in this region. Goodwin (2020) suggested several variants of the Weibull distribution. While some of those models produced better estimates of diameter distributions, Goodwin's recommendation for plantations was to use the 3-P Weibull distribution, as done in this study.
Using simulated development of the MOM model predictions, the peakedness of the distribution was reduced for both clones with increasing stand age. The diameter distribution of clone A tended to become more positively skewed and the variation increased with increasing stand age ( Figure 5), partly because of mortality in the lower tree strata, and thinnings from below, which remove the suppressed trees or crown thinning (thinning from above), which remove dominant and co-dominant trees (van Laar & Akca 2007). The number of trees decreased in lower diameter classes and increased in upper classes, shifting curves to the right and increasing the flattening degree with increasing age. In clone B, the shape curves for all ages were almost similar with no significant increases in variation. These findings were relatively logical because the mortality for this clone was very low, so the diameter growth tends to be relatively uniform.

Conclusions
In this study, the moment method was found to give the lowest mean error-index and KS statistic, followed by the hybrid and percentile methods. Although all three methods had difficulty in describing multimodal diameter distributions, the moment method tended to be more robust with tails on both sides of the distribution and exhibited slightly greater flexibility in describing the larger variance than the two other methods. Overall, the Weibull approach appeared relatively robust in estimating diameter distributions of Eucalyptus hybrid clone plantation in Indonesia yet some refinements may be necessary to characterise more complex distributions and longer rotations. Simulations are based on prediction standlevel variables (SD, BA, HT, RD and DQ) using site class = 26 and initial stand density 1667 trees ha -1 .