Evaluating basic density calibrations based on NIR spectra recorded on the three wood faces and subject to different mathematical treatments

Background: Near infrared (NIR) spectroscopy has been successfully applied to estimate the chemical, physical and mechanical properties of various biological materials, including wood. This study aimed to evaluate basic density calibrations based on NIR spectra collected from three wood faces and subject to different mathematical treatments.
Methods: Diffuse reflectance NIR spectra were recorded using an integrating sphere on the transverse, radial and tangential surfaces of 278 wood specimens of Eucalyptus urophylla x Eucalyptus grandis. Basic density of the wood specimens was determined in the laboratory by the immersion method and correlated with NIR spectra by Partial Least Squares regression. Different mathematical treatments were then applied to the data, including Standard Normal Variate, Multiplicative Scatter Correction, First and Second Derivatives, Normalization, Autoscale and MeanCenter transformations.
Results: The predictive model based on NIR spectra measured on the transverse surface performed the best (R²cv = 0.85 and RMSE = 25.5 kg/m³) while the model developed from the NIR spectra measured on the tangential surface had the poorest performance (R²cv = 0.53 and RMSE = 46.8 kg/m³). The difference in performance between models based on original (untreated) and mathematically treated spectra was minimal.
Conclusions: Multivariate models fitted to NIR spectra were found to be efficient for predicting the basic density of Eucalyptus wood, especially when based on spectra measured on the transverse surface. For this data set, models based on the original spectra and mathematically treated spectra had similar performance. The reported findings show that mathematical transformations are not always able to extract more information from NIR spectra.
Amaral et al. New Zealand Journal of Forestry Science (2021) 51:2 https://doi.org/10.33494/nzjfs512021x100x


Introduction
The determination of density and its variation within the tree, in both the radial and longitudinal directions, is fundamental for understanding wood quality (Silva et al. 2015). Thus, determining the basic density of wood has become a crucial step in the management routine of forest-based companies.
To assess the quality of wood in forest plantations, forestry companies need rapid and cost-effective techniques, since analysing it using conventional methods can consume time and money, which makes the [...]
Keywords: NIR signature; multivariate statistics; wood properties; Eucalyptus timber; hardwood.
[...] per sample); being non-invasive; suitable for use in the production line, allowing online or real-time analysis; requiring minimum sample preparation; and using relatively simple instruments that can be transported over long distances (Muñiz 2012). NIR spectra are correlated with material composition or properties determined by standardised methods using multivariate tools in order to generate a predictive model (Meder et al. 2010). Multivariate regression models developed from NIR spectra have been successfully used to estimate wood density across a range of species (Schimleck et al. 1999; Gindl et al. 2001; Schimleck et al. 2005; Jiang et al. 2006; Jones et al. 2006; Mora et al. 2008; Hein et al. 2009; Arriel et al. 2019).
Distortions and errors in the predictive models developed from NIR spectra can occur, since part of the spectral information may not be correlated with the investigated property. Control over experimental procedures and post-processing of the data are required to remove, reduce or standardise irrelevant information and, consequently, to improve the quality of the signal used in the calibration by removing the imperfections present in the original spectra without changing the information they contain (Naes et al. 2002). Therefore, research is required to define the most appropriate way to apply the technique and generate reliable predictions. For example, Hein et al. (2010) demonstrated that the chemical properties of Eucalyptus urophylla wood are better estimated from spectra measured in milled wood than in whole (unprocessed) wood. Costa et al. (2018) investigated the parameter settings of the NIR spectrometer and which wood surface is most suitable for measuring spectra and generating models to estimate wood density in Eucalyptus. According to Sandak et al. (2016), there is still a need to better understand the fundamental issues and the impact that the most commonly utilised methods for pre-processing spectral data have on predictive models for wood properties (and those of other ligno-cellulosic materials). In short, it is still not clear how mathematical transformations affect the spectral variation on different wood surfaces.
The aim of this study, therefore, was to evaluate basic density calibrations based on NIR spectra recorded on transverse, radial and tangential wood surfaces and subject to different mathematical treatments. Common mathematical treatments were applied to the NIR spectra collected from Eucalyptus wood samples, and Partial Least Squares (PLS) regressions for estimating wood density were developed and compared.

Origin and sample preparation
The specimens used in this study were obtained from a progeny test of Eucalyptus urophylla x Eucalyptus grandis located in Minas Gerais State in Brazil. Trees were felled at 6 and 6.5 years of age. Central boards produced from 10 trees were air-dried and clapboards were cut so that the radial and tangential planes were well aligned with the surface of each wood piece. A total of 278 wood specimens were produced with nominal size of 50 mm x 25 mm x 25 mm (L x R x T) in length, width and thickness, respectively. Only defect-free specimens (without cracks or knots) were considered for NIR spectroscopic analysis.
NIR spectra acquisition and basic density determination
NIR spectra were recorded from 12,500 cm⁻¹ to 3,600 cm⁻¹ with a spectral resolution of 8 cm⁻¹ in diffuse reflection mode using a Fourier transform NIR spectrometer (Model Vector 22/N, MPA, Bruker Optik GmbH, Ettlingen, Germany). An integrating sphere was used to obtain the NIR spectra used in this study. The integrating sphere is a lead sulphide detection system, which receives the incident ray after reflection in the sample. NIR spectra were recorded on the radial, tangential and transverse surfaces of each wood specimen (Figure 1).
NIR spectra readings were performed in an acclimatised room (temperature of 20°C and a relative humidity of 65%). Under these conditions, the moisture content of wood specimens stabilised at 12%. After spectra acquisition, the basic density of the wood specimens was determined as the ratio between the mass of the oven-dried specimens and their saturated volume (measured by the immersion method) according to standard NBR 11941 (NBR 2003).
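The basic density definition above (oven-dry mass divided by saturated volume) can be illustrated with a minimal sketch. The function name and the specimen values are hypothetical and are not taken from the study; the only fixed fact used is the unit conversion 1 g/cm³ = 1000 kg/m³.

```python
def basic_density_kg_m3(oven_dry_mass_g, saturated_volume_cm3):
    """Basic density = oven-dry mass / saturated volume, returned in kg/m^3."""
    if saturated_volume_cm3 <= 0:
        raise ValueError("saturated volume must be positive")
    # 1 g/cm^3 corresponds to 1000 kg/m^3
    return oven_dry_mass_g / saturated_volume_cm3 * 1000.0


# Hypothetical specimen (illustrative values only, not data from the study):
# 14.4 g oven-dry mass and 31.25 cm^3 saturated volume
example_density = basic_density_kg_m3(14.4, 31.25)
print(round(example_density, 1))
```

In the immersion method the saturated volume is typically obtained from the displaced water, so the second argument would come from that measurement rather than from the nominal specimen dimensions.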

Multivariate statistics
Partial Least Squares regressions (PLS-R) were developed to describe the relationship between basic density of wood and NIR spectra for each specimen surface using the Unscrambler software (Camo AS, Norway, v.9.7). For calibration and validation, only the spectral range from 9,000 cm⁻¹ to 4,000 cm⁻¹ was considered, as indicated by Costa et al. (2018). The number of latent variables used in these regressions was automatically suggested by the software.
FIGURE 1: Transverse, radial and tangential surfaces of the wood.
In order to suppress part of the noise and improve the signal quality, the following pre-treatments were applied: Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), Savitzky-Golay Derivatives (13-point filter and first- and second-order polynomials: 1D or 2D), Normalisation (N), Autoscale (AS) and Centre and Scale (CS). Anomalous samples were detected from studentised residuals and leverage plots and excluded from the models. SNV and the first derivative are two pre-treatments commonly used to remove distortions and errors in NIR spectra. The SNV method is applied to every spectrum individually. The average and standard deviation of all the data points for that spectrum are calculated. The average value is subtracted from the absorbance for every data point and the result is divided by the standard deviation (Reis et al. 2013). The first derivative is widely applied to original spectra obtained from wood; it helps to resolve overlapping peaks in the same region and corrects baseline shifts in wood spectra that result from particle morphology (Costa et al. 2018).
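The two pre-treatments described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Unscrambler implementation: the function names are hypothetical, SNV uses the sample standard deviation (n - 1), and the derivative is the least-squares slope in a 13-point moving window, which is what a Savitzky-Golay first derivative reduces to for a first- or second-order polynomial. Points closer than six positions to either end are dropped rather than padded.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own mean and SD."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample SD (n - 1): an assumption, software may differ
    if sd == 0:
        raise ValueError("a constant spectrum cannot be scaled")
    return [(a - mu) / sd for a in spectrum]


def sg_first_derivative(spectrum, half_window=6, spacing=1.0):
    """First derivative as the least-squares slope in a moving window.

    half_window=6 gives the 13-point window mentioned in the text.
    Only positions with a full window are returned (no edge padding).
    """
    m = half_window
    if len(spectrum) < 2 * m + 1:
        raise ValueError("spectrum is shorter than the filter window")
    offsets = range(-m, m + 1)
    # Slope of a least-squares line through the window: sum(i * y_i) / sum(i^2)
    norm = sum(i * i for i in offsets) * spacing
    derivative = []
    for centre in range(m, len(spectrum) - m):
        slope = sum(i * spectrum[centre + i] for i in offsets) / norm
        derivative.append(slope)
    return derivative


# Toy spectrum (illustrative values only): a linear baseline with slope 2
toy = [2.0 * i + 1.0 for i in range(20)]
print(snv([1.0, 2.0, 3.0]))
print(sg_first_derivative(toy)[:3])
```

Because SNV is computed per spectrum, it needs no reference to the rest of the data set, whereas MSC (not sketched here) relies on a reference spectrum such as the mean of the calibration set.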
PLS-R models were evaluated by cross-validation. The data were divided at random into six subsets, each containing 46 or 47 specimens. Preliminary models were developed using data from five of the subsets and validated using the remaining subset that was not used to develop the model. Thus, each preliminary model was calibrated with 231 or 232 samples and validated using the remaining 47 or 46 samples. This procedure was repeated six times, so that all subsets were used for validation. The final model for each wood surface (and for each mathematical treatment) had its regression coefficients calculated from the average of the six preliminary models.
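The six-segment split can be sketched as follows. This is a minimal illustration, assuming a simple random shuffle followed by round-robin assignment; the function name and the seed are hypothetical, and the actual segment assignment made by the software is not known from the text.

```python
import random


def k_fold_indices(n_samples, k=6, seed=0):
    """Shuffle sample indices and deal them into k folds whose sizes differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is arbitrary, for reproducibility only
    return [indices[i::k] for i in range(k)]


folds = k_fold_indices(278, k=6)
for held_out in folds:
    # Calibration set: every specimen that is not in the held-out fold
    calibration = [i for fold in folds if fold is not held_out for i in fold]
    # A model would be fitted on `calibration` and used to predict `held_out` here
    print(len(calibration), len(held_out))
```

With 278 specimens, two folds hold 47 specimens and four hold 46, so each preliminary model is calibrated on 231 or 232 specimens.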
The ranking of models was based on the following criteria: (1) coefficient of determination of the model in the cross-validation (R²cv); (2) root mean square error of cross-validation (RMSEcv); (3) number of latent variables (VL) used in the calibration; and (4) ratio of performance to deviation (RPD). The RMSEcv measures the efficiency of the calibration model in predicting the property of interest in a batch of unknown samples. The RPD was first used by Williams and Sobering (1993) and is the ratio between the standard deviation of the measured values and the standard error of the predicted values. According to Williams and Sobering (1993), calibrations with RPD between 2 and 3 are classified as "sufficient for approximate predictions" and those with RPD between 3 and 5 are considered "satisfactory for prediction".
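The ranking statistics can be computed as in the sketch below. It assumes that R² is calculated as 1 - SS_res/SS_tot and that RPD is the sample standard deviation of the measured values divided by the RMSE; the software used in the study may apply slightly different conventions (for example, a squared correlation coefficient or a bias-corrected error). The function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    return sqrt(mean((m - p) ** 2 for m, p in zip(measured, predicted)))


def r_squared(measured, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (an assumed convention)."""
    centre = mean(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - centre) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rpd(measured, predicted):
    """Ratio of performance to deviation: SD of measured values / RMSE."""
    return stdev(measured) / rmse(measured, predicted)


# Toy densities in kg/m^3 (illustrative values only, not data from the study)
measured = [400.0, 450.0, 500.0, 550.0]
predicted = [410.0, 440.0, 510.0, 540.0]
print(rmse(measured, predicted), r_squared(measured, predicted), rpd(measured, predicted))
```

As a plausibility check under the same assumption, dividing the reported standard deviation of 68 kg/m³ by the reported RMSEcv values of 25.5 and 46.8 kg/m³ gives about 2.67 and 1.45, close to the RPD range the study reports; the small differences are presumably due to rounding.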

Effect of wood surfaces on NIR model performance
The mean basic density of wood specimens was 462 kg/m³ with a standard deviation of 68 kg/m³. The basic density was higher in the wood specimens from 6.5-year-old trees (494 kg/m³) than in the specimens taken from 6-year-old trees (416 kg/m³). The goodness-of-fit statistics associated with the PLS-R models for estimating the wood density from untreated NIR spectra collected from different surfaces of the specimen are given in Table 1.
The best model for predicting wood density was developed using untreated NIR spectra recorded on the transverse wood surface. This model (Model 1) had R²c = 0.86 and RMSEc = 24.3 kg/m³ in calibration (Table 1). Cross-validation using six subsets of 46-47 wood specimens yielded an R²cv of 0.85 and RMSEcv of 25.5 kg/m³. Model 2, developed from spectra collected from the radial surface, had R²cv = 0.70 and RMSEcv = 37.3 kg/m³ (Table 1). Both these models are considered satisfactory. On the other hand, Model 3, which is based on NIR spectra recorded on the tangential surface, had the poorest performance (R²cv = 0.53 and RMSEcv = 46.8 kg/m³). The relationship between wood density values obtained from the gravimetric method and those estimated from NIR-based Models 1, 2 and 3 is shown in Figure 2.

Effect of mathematical treatment on NIR models
A comparison of the mean untreated spectra recorded on transverse, radial and tangential wood surfaces with those treated using the first derivative and standard normal variate methods is presented in Figure 3. These two mathematical treatments enhanced the quality of the signal and improved the goodness-of-fit statistics associated with the PLS-R models (Table 2).
The mathematical treatment that resulted in the best fit statistics in calibration was the second derivative (Model 7), which yielded an R²c of 0.90 and RMSEc of 21.5 kg/m³. However, for cross-validations, the best fit statistics were obtained for models developed from NIR spectra treated using the first derivative (Model 6) and standard normal variate (SNV, Model 4) methods, which yielded the highest R²cv (0.86) and lowest RMSEcv (25.4 kg/m³).
R²c: coefficient of determination of the calibration; RMSEc: root mean square error of calibration; R²cv: coefficient of determination of the cross-validation; RMSEcv: root mean square error of cross-validation; RPD: ratio of performance to deviation; VL: latent variable.

NIR models for wood density
Numerous studies have been undertaken with the objective of developing NIR spectroscopic models for estimating wood density (Tsuchikawa & Schwanninger 2013; Tsuchikawa & Kobori 2015). In Eucalyptus, Viana et al. (2010) studied six different clones at 6 years of age and found that models fitted to NIR spectra were efficient at predicting basic density, chemical and anatomical properties. The goodness-of-fit statistics obtained for the predictive models for wood density reported in the present study (R² = 0.53 to 0.85 and RPD = 1.46 to 2.68) are similar to those from other studies that have used NIR spectroscopy to estimate wood density. For example, the models developed by Schimleck et al. (1999) to predict the basic density of Eucalyptus globulus wood had R² values between 0.62 and 0.80. In Larix decidua Mill., Gindl et al. (2001) developed models based on NIR spectra that had R² values of 0.98-0.99 in calibrations and 0.95-0.97 in cross-validations. Jones et al. (2006) evaluated the basic density of Pinus taeda L. wood samples taken from trees ranging in age from 21 to 26 years across three different regions of Georgia, USA. Their models to predict the basic density from untreated spectra had an R² of 0.90 and RPD of 2.28 using six latent variables.

Anisotropic effect on NIR-based models
In the present study, the most robust models were developed using NIR spectra recorded on the transverse and radial surfaces (Models 1 and 2 of Table 1) while those developed using spectra collected on the tangential surface had poorer performance (Model 3, Table 1). Similar results were obtained in other studies carried out across a range of different species. For example, Jiang et al. (2006) also compared the accuracy of estimation of basic density of Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) wood from NIR spectra recorded on the tangential, radial and transverse surfaces. They reported that the best predictive model was generated from the transverse surface. Hein et al. (2009) evaluated the robustness of models for predicting wood density in 14-year-old Eucalyptus urophylla using spectra taken from the three wood surfaces. They concluded that the best PLS-R models for wood density predictions were derived from radial surface NIR spectra and that models developed from tangential surface spectra had the poorest performance. Schimleck et al. (2005) compared models to predict wood properties based on radial and transverse faces of strips from Pinus taeda. They found that differences between the two sets of calibrations were small, indicating that either face could be used for NIR analysis.
These differences in performance of models developed using spectra obtained from transverse, radial or tangential surfaces can be explained by the variation in the wood anatomical structure in the different planes (Costa et al. 2018). Variation in the exposure of the anatomical features in the different planes affects the absorbance and reflectance of NIR radiation. NIR spectra obtained from radial and transverse surfaces represent wood formed during a period ranging from several months through to multiple years, whereas the NIR spectra taken from the tangential surface represent wood produced over a short period of time. Therefore, tangential spectra are less representative of the range of variation in wood properties of the entire specimen. Moreover, the NIR spectra obtained in this study represent only a few millimetres of material and were used to predict the wood density of specimens with dimensions of 50 x 25 x 25 mm.
FIGURE 2: Wood basic density values determined by immersion method and estimated by models based on NIR spectra recorded on transverse (model 1), radial (model 2) and tangential (model 3) wood surfaces.
FIGURE 3: Averaged untreated NIR spectra (a); first derivative NIR spectra (b); and NIR spectra after standard normal variate (c); recorded on transverse, radial and tangential wood surfaces.
In this study, the effect of applying mathematical treatments to the NIR spectra was negligible. According to Sandak et al. (2016), the goal of signal (spectra) pre-processing is to eliminate or minimise variability within spectra that is not related to the investigated property of interest. Our study showed that there were no meaningful differences in goodness-of-fit statistics between models based on original (untreated) and mathematically treated spectra. Therefore, mathematical transformations are not always able to extract more information from NIR spectra.

Conclusions
Overall, the key finding from this study was that NIR spectroscopy in conjunction with multivariate analysis could generate efficient models to predict density in Eucalyptus wood. Models for estimating wood density based on NIR spectra recorded on transverse or radial wood surfaces had better predictive performance than those developed from NIR spectra collected from tangential wood surfaces.
Mathematical transformations that have improved the performance of models in previous studies did not improve the fit statistics for the models developed in the present study. Despite this, we recommend that these mathematical transformations be applied and tested in future studies as there may be situations where they can significantly improve model performance.
Treat: treatment; R²c: coefficient of determination of the calibration; RMSEc: root mean square error of calibration; R²cv: coefficient of determination of the cross-validation; RMSEcv: root mean square error of cross-validation; RPD: ratio of performance to deviation; VL: latent variable.