Article

Consensual Regression of Lasso-Sparse PLS models for Near-Infrared Spectra of Food

1 College of Electrical and Electronic Engineering, Wenzhou University, Wenzhou 325035, China
2 Xuetian Salt Industry Group Co., Ltd., Changsha 410004, China
* Author to whom correspondence should be addressed.
Submission received: 18 September 2022 / Revised: 21 October 2022 / Accepted: 25 October 2022 / Published: 29 October 2022

Abstract

In some cases, quantitative models built on near-infrared spectra (NIRS) give unreliable predictions, and choosing a suitable number of latent variables (LVs) for partial least squares (PLS) regression is difficult. For such cases, strategies that fuse member models carrying important information have gained increasing attention in recent research. In this work, a series of PLS regression models was developed with an increasing number of LVs and taken as member models. The least absolute shrinkage and selection operator (Lasso) was then employed as the model-selection step to discard uninformative PLS member models. Deviation-weighted fusion (DW-F), partial least squares regression coefficient fusion (PLS-F), and ridge regression coefficient fusion (RR-F) were then comparatively used to fuse the remaining member models. Three spectral datasets, comprising six attributes in NIR data of corn, apple, and marzipan, respectively, were used to validate the feasibility of this fusion algorithm. The six fusion models for these attributes performed better than the single optimal PLS model, with the improvement in root mean squared error of prediction (RMSEP) reaching up to 80%; the fusion also discarded more than half of the member models. DW-F in particular showed excellent fusing capacity and obtained the best performance. The results show that the preferred strategy of the DW-F model combined with Lasso selection makes full use of the spectral information and significantly improves the prediction accuracy of fusion models.

1. Introduction

Visible-near infrared spectroscopy (Vis-NIRS) (400~2526 nm) is widely used as an analytical tool in fields such as agricultural and food safety and nutrition, pharmaceutical composition and activity, environmental contamination and monitoring, petrochemicals, and bio-polymers [1,2,3,4,5,6]. Vis-NIRS has many advantages over traditional methods, such as speed, non-destructiveness, simple sample preparation, ease of operation, short operator training, and low cost [1,2,3,4,5,6,7,8,9]. Absorption peaks recorded in the spectral profile represent the vibrational information of hydrogen-containing groups, and tiny differences in peak intensity reflect variations in the contents of the corresponding molecules or materials [2,8,10]. Quantitative or qualitative relationships between peak intensity and an attribute can be modeled by chemometric methods, instead of being judged by the naked eye [1,2,4,11].
However, the visible near-infrared spectral signal mixes contributions from many hydrogen-containing X–H groups in organic molecules (where X can be C, O, N, S, etc.), and it overlaps many absorptions from the overtones and combination bands of these vibrations [10,11]. To some degree, this leads to weakened signal intensity, ambiguous spectra, and severe overlaps; it affects the accuracy of the quantitative relationship and thus hinders the use of NIRS to solve practical problems. Many multivariate analytical methods, including multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS), artificial neural networks (ANN), and so on, have been proposed to calibrate the regression model. Most application cases show that the PLS algorithm is particularly suitable for building quantitative models on NIR datasets [2,8,12,13]. PLS finds the directions of largest variance in the spectral matrix that explain variation in the property to be modeled, and forms latent variables (LVs) by projecting the spectra onto those directions (i.e., the score matrix) for the regression analysis.
A critical issue with PLS is that, because each LV is calculated from its covariance with the property and with the other variables, every original measured variable contributes to each LV. According to the general principle of minimizing the root mean squared error of cross-validation (RMSECV), an appropriate number of LVs should be determined when developing the PLS regression model to optimize its predictive performance. However, the minimum RMSECV is influenced by many factors, and in practical applications the number of LVs is often suggested to be smaller than the optimum to avoid uncertainty deviation (i.e., over-fitting) at the prediction stage [14]. As a result, it is difficult to determine the best number of LVs for the PLS model. Additionally, most spectral applications are based on a single model, which can generally achieve a relatively satisfactory performance [15]. However, in some particular situations, such as small sample sets with high-dimensional data, diverse sample sources, and differing measurement conditions, it is challenging for a single model to achieve satisfactory results [1,13,16]. A single model can only deal with a specific pattern in part of the spectral data; since the near-infrared spectrum has resonant overlaps and the practical modes are complex, it is challenging for a general single modeling method to integrate all patterns into one stable model [2,13,17]. This may lead to missing useful information, and may also lower the prediction accuracy of the regression model, especially the stability of the calibration model [18,19,20].
In recent years, fusion modeling strategies have been widely studied and applied in many fields to mitigate the shortcomings of single-model methods [13,19,21,22]. In this work, the fusion strategy is proposed at the decision level: the outputs of the member models are used as the inputs of the fusion model. Since the score vectors (i.e., derived information) extracted by PLS are orthogonal to each other, the mapping spaces formed by different numbers of the top score vectors (i.e., latent variables) are independent, and the positions of samples in these projection spaces also differ. To make full use of the original spectra and the derived information, PLS regression models were developed with an increasing number of LVs and taken as potential member models. The least absolute shrinkage and selection operator (Lasso) is a regularization technique for estimating generalized linear models, usually employed as an alternative to stepwise regression and dimensionality-reduction techniques [23,24]. In this work, Lasso is used to sparsify the series of PLS member models and select the important ones. Based on the idea of multi-model modeling, three fusion regression algorithms operating on the outputs of spectral calibration models are proposed. The first is PLS regression coefficient fusion, a consensual model that optimizes the latent variables of the partial least squares algorithm and then fuses the regression coefficients. The second is ridge regression coefficient fusion, which adds a small perturbation λ·I (where λ is the ridge parameter and I is the identity matrix) to the least squares estimate. The last is deviation-weighted fusion, which integrates the potential member models into a consensual model by assigning them different weightings.
The prediction performances of these consensual models were compared to those of the optimal PLS model, which was developed on the Unscrambler software (version 9.7, CAMO, Oslo, Norway).

2. Theory and Algorithm

2.1. Theory of Lasso

Lasso is a matrix-sparsifying method for compressing information from a high-dimensional dataset. It was proposed by Tibshirani [24,25] to reduce the dimensionality of inputs using a regularization technique whose function is to minimize the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant. Lasso adds an L1 regularization term to a linear model, as Equation (1) shows. Here, the inputs to the Lasso regression are the values (i.e., y_ij) predicted by the member models. The coefficients (β_lasso) of the Lasso model can be solved, and some are exactly zero, indicating hardly any influence on the linear model; this achieves the purpose of screening the highly influential variables [20].
$$\hat{\beta}_{lasso} = \mathop{\arg\min}_{\beta_0,\,\beta}\left\{ \sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{P} y_{ij}\,\beta_j\Big)^2 + \lambda\sum_{j=1}^{P}\left|\beta_j\right| \right\} \qquad (1)$$
where N is the number of observations; y_ij is the output of the j-th member model for observation i; y_i is the response at observation i; and λ is a nonnegative regularization parameter. The parameters β_0 and β are a scalar and a P-vector, respectively. By re-weighting the potential indicators, Lasso can be used to fuse member models into a fusion model: it weighs the importance of each member model and selects the items with large weightings for the final fusion model. As a result, Lasso can eliminate irrelevant variables that have little influence on the final model and simplify the model structure.
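As a minimal illustration (not the authors' Matlab code), the objective in Equation (1) can be solved by cyclic coordinate descent with soft-thresholding. Here the columns of `Y` stand for the member-model predictions y_ij; the function name `lasso_cd` and the scaling of `lam` (Equation (1)'s λ rescaled by 2N) are assumptions of this sketch.

```python
import numpy as np

def lasso_cd(Y, y, lam, n_iter=500):
    """Coordinate-descent Lasso.

    Minimises (1/(2N)) * sum_i (y_i - b0 - sum_j Y_ij * b_j)^2 + lam * sum_j |b_j|,
    a rescaled form of Equation (1). Columns of Y are member-model outputs.
    """
    N, P = Y.shape
    b = np.zeros(P)
    b0 = y.mean()
    for _ in range(n_iter):
        b0 = (y - Y @ b).mean()              # intercept update
        for j in range(P):
            # partial residual with coordinate j removed
            r_j = y - b0 - Y @ b + Y[:, j] * b[j]
            rho = Y[:, j] @ r_j / N
            z = (Y[:, j] @ Y[:, j]) / N
            # soft-thresholding: exact zeros arise when |rho| <= lam
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return b0, b
```

Driving λ above the largest absolute (centered) cross-product between a column and the response zeroes every coefficient, which is exactly the screening behavior described above.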

2.2. Deviation Weight Fusion (DW-F)

The principle of the DW-F algorithm is that a series of member models, previously constructed from different samples or features, is integrated into a consensual model by assigning optimized coefficients to the outputs of these member models. The fusion strategy can be explained as follows: assume x_k is a spectral matrix and f_k(x_k) is the k-th PLS model predicting the sample attribute y. The fusion model F(x), as Equation (2) shows, is a linear combination of the member models f_1(x_1), f_2(x_2), …, f_N(x_N) with optimized weightings w_k [15].
$$F(x) = \sum_{k=1}^{N} w_k f_k(x_k) \qquad (2)$$

$$E(x) = y - F(x) = \sum_{k=1}^{N} w_k\big(y - f_k(x_k)\big) = \sum_{k=1}^{N} w_k e_k \qquad (3)$$

$$\begin{cases} E^2 = \displaystyle\sum_{k=1}^{N} w_k^2 e_k^2 + 2\sum_{k=1}^{N}\sum_{i>k}^{N} w_k w_i e_k e_i = \sum_{k=1}^{N} w_k^2 e_k^2 + 2\sum_{k=1}^{N}\sum_{i>k}^{N} w_k w_i r_{ki} e_k e_i \\[4pt] \arg\min\Big(\displaystyle\sum_{k=1}^{N} w_k^2 e_k^2 + 2\sum_{k=1}^{N}\sum_{i>k}^{N} w_k w_i r_{ki} e_k e_i\Big) \\[4pt] \text{s.t.}\quad 0 \le w_k \le 1,\ \displaystyle\sum_{k=1}^{N} w_k = 1,\ k \in [1, N] \end{cases} \qquad (4)$$
where F(x) is the fusion model, f_k(x_k) is the k-th member model, w_k is the weighting of the k-th member model, e_k is the deviation vector of the predicted attributes, and r_ki is the correlation coefficient between the vectors e_k and e_i. It is assumed that, in the error formula (Equation (3)), the deviation vector e_k predicted by the k-th member model obeys a normal distribution N(0, σ²) and reflects random factors to some degree. Thus, the deviation vectors (e_1, e_2, …, e_N) are independent of each other, and their combination forms the deviation matrix E, in which each deviation approximately obeys a normal distribution. In addition, the deviation vectors e_i and e_k are weakly correlated, and their correlation r_ki tends to 0. It can be inferred that the term w_i w_k r_ki e_i e_k in Equation (4) is close to 0 and can be neglected. The weightings w_k for each member model can then be optimized from the remaining objective, argmin(Σ_{k=1}^{N} w_k² e_k²) [17].
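With the cross terms dropped, minimizing Σ w_k² e_k² under Σ w_k = 1 has a closed form via Lagrange multipliers: w_k ∝ 1/e_k². A minimal sketch of this weighting rule, with hypothetical function names (not the paper's implementation):

```python
import numpy as np

def dw_weights(errors):
    """Deviation-weighted fusion weights.

    Minimises sum_k w_k^2 * e_k^2 subject to sum_k w_k = 1, w_k >= 0
    (cross terms neglected, as assumed in the text). Setting the
    Lagrangian gradient to zero gives w_k proportional to 1 / e_k^2.
    """
    e2 = np.asarray(errors, dtype=float) ** 2
    inv = 1.0 / e2
    return inv / inv.sum()

def dw_fuse(preds, errors):
    """Fuse member-model predictions (columns of `preds`) with DW weights."""
    w = dw_weights(errors)
    return preds @ w
```

A member model with half the deviation of another thus receives four times the weighting, so poor member models are automatically suppressed.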

2.3. Estimation of Model’s Performance

RMSE (root mean squared error), r (correlation coefficient), and bias are the commonly used parameters for regression models. A well-calibrated model usually has a small RMSE and bias, and a large r (close to 1). Derived from RMSE, RMSECV in the cross-validation stage and RMSEP in the prediction stage are used to intuitively assess the predictions of the developed models. Generally, a qualified calibration model should have small RMSECV and RMSEP, with only a slight difference between them, to avoid under-fitting or over-fitting. In fusion models, RMSECV and Rcv (correlation coefficient of cross-validation) are calculated from the actual value y and the value ŷ predicted by the fusion model F(x) at the cross-validation stage of each member model. Similarly, RMSEP and Rp (correlation coefficient of prediction) are calculated in the same way at the prediction stage.
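The three parameters can be computed directly from the reference and predicted values; a small sketch with a hypothetical helper name:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, correlation coefficient r, and bias, as defined in the text."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # root mean squared error
    r = np.corrcoef(y_true, y_pred)[0, 1]             # Pearson correlation
    bias = np.mean(y_pred - y_true)                   # mean signed deviation
    return rmse, r, bias
```

Evaluated on cross-validated predictions these give RMSECV and Rcv; evaluated on the held-out prediction set they give RMSEP and Rp.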

2.4. Framework of Proposed Fusion Strategy

The aim of the proposed fusion method, PLS-Lasso-DW-F, is to select the valuable member models and improve the performance of the final calibration model. For comparison, two other fusion methods, partial least squares regression coefficient fusion (PLS-F) and ridge regression coefficient fusion (RR-F), were also used to construct fusion models. The main steps of the fusion modeling, as Figure 1 shows, are the following:
(1)
Samples with spectral and attributes data are randomly divided into calibration and prediction subsets with a ratio of 2:1.
(2)
In the calibration dataset, a series of PLS models is developed between spectra and attributes with an increasing number of latent variables (LVs). The maximum number of LVs is N, set to 20 in this work.
(3)
The Lasso screening method is used to select the informative member models: the N PLS models are first taken as potential member models, and K of them are screened out as the subsequent member models.
(4)
Three fusion strategies are employed to fuse these K PLS member models, and the corresponding fusion models are obtained.
(5)
Parameters of these fusion models are evaluated at the cross-validation and prediction stages in order to compare their performances.
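Steps (1)–(5) can be sketched end-to-end in a simplified form. This is not the authors' Matlab pipeline: the PLS1/NIPALS routine below is the standard algorithm, but the member screen keeps models whose calibration RMSE is within a factor of the best, a crude stand-in for the Lasso step, and all names (`pls1_coefficients`, `fuse_pls_members`, `keep_ratio`) are hypothetical.

```python
import numpy as np

def pls1_coefficients(X, y, n_lv):
    """Standard PLS1 (NIPALS) regression vector with n_lv latent variables."""
    xm, ym = X.mean(axis=0), y.mean()
    Xr, yr = X - xm, y - ym
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)        # weight vector
        t = Xr @ w                    # score vector
        tt = t @ t
        p = Xr.T @ t / tt             # X loading
        q = (yr @ t) / tt             # y loading
        Xr -= np.outer(t, p)          # deflate X
        yr -= t * q                   # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)
    return B, xm, ym

def fuse_pls_members(X_cal, y_cal, X_new, max_lv=10, keep_ratio=1.5):
    """Steps (2)-(5): fit PLS members with 1..max_lv LVs, screen them,
    then fuse the survivors with deviation (inverse-squared-error) weights."""
    preds_cal, preds_new, rmse = [], [], []
    for k in range(1, max_lv + 1):                    # step (2): member models
        B, xm, ym = pls1_coefficients(X_cal, y_cal, k)
        p_cal = (X_cal - xm) @ B + ym
        preds_cal.append(p_cal)
        preds_new.append((X_new - xm) @ B + ym)
        rmse.append(np.sqrt(np.mean((y_cal - p_cal) ** 2)))
    rmse = np.array(rmse)
    keep = rmse <= keep_ratio * rmse.min()            # step (3): crude screen
    w = 1.0 / rmse[keep] ** 2                         # step (4): DW weights
    w /= w.sum()
    return np.array(preds_new)[keep].T @ w            # fused predictions
```

Step (1), the 2:1 calibration/prediction split, is left to the caller; any random partition of the rows of the spectral matrix works.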

2.5. Software

In this work, the matrix calculations were performed in Matlab (R2016b, MathWorks Inc., Natick, MA, USA). The PLS algorithm in the iToolbox was used to develop the series of PLS member models [26]. The DW-F fusion code was programmed in-house, and other routines, such as the Lasso and ridge algorithms, were called from the Matlab function library. In addition, as a comparison, the optimal PLS model was developed in the Unscrambler software (version 9.7, CAMO, Oslo, Norway), with adaptive selection of the number of latent variables in the cross-validation stage.

3. Experimental Data

Two public datasets and one experimental spectral dataset, listed in Table 1, are employed to verify the validity and feasibility of the proposed fusion methods:
(1) Corn dataset. These data were originally distributed by Mike Blackburn at Cargill and can be freely downloaded from the Eigenvector website https://eigenvector.com/resources/data-sets/#corn-sec (accessed on 20 December 2021). The set contains 80 corn samples measured on different NIR spectrometers; the spectra collected on the ‘m5’ spectrometer were employed in this work. The spectral range is 1100~2498 nm with an interval of 2 nm, resulting in 700 bands, as Figure 2a shows. Four attributes, namely moisture, oil, protein, and starch contents, are all used as the predicted responses.
(2) Apple dataset. It contains the spectra and soluble solids content (SSC) of 134 ‘Red Fuji’ apples collected from Shandong Province, China. This spectral dataset was first acquired by Yuan et al. (2016) [27] to validate the feasibility of a self-developed portable fruit analyzer. The spectra span 550~985 nm with 2040 wavelengths (Figure 2b). The spectra were cropped by 28 wavelengths at the ends to reduce the noise caused by the photo-electric sensor chip.
(3) Marzipan dataset. This is generally not an ideal dataset for calibration, owing to its small size. It consists of only 32 marzipan (almond paste) samples belonging to nine different classes [28]. The compressed package can be found at www.models.life.ku.dk/Marzipan (accessed on 20 December 2021). The spectra of the samples range from 450~2448 nm with a resolution of 2 nm, as Figure 2c shows. The moisture content of the marzipan was used for the numerical calculations.

4. Results and Discussions

4.1. Performance of General PLS

The PLS regression algorithm was applied to calibrate models between the spectra and the contents of moisture, oil, protein, and starch in the corn data, SSC in the apple data, and moisture in the marzipan data, respectively. First, general PLS models were built in the calibration set using 5-fold cross-validation with systematic (‘syst123’) splitting, and the maximum number of LVs was set to 20. A series of PLS models with an increasing number of LVs, from 1 to 20, was developed as member models for each attribute. Figure 3 shows the RMSECV and RMSEP of the general PLS models on the calibration and prediction sets when predicting the moisture, oil, protein, and starch contents of the corn samples with different numbers of latent variables. With an increasing number of latent variables, the RMSE values of the PLS models for the four corn attributes show a consistent decline: they fall rapidly at first and then gradually settle into a relatively stable region, with slight fluctuation, until reaching a minimum, indicating that models involving more latent variables achieve higher predictive accuracy here.
Figure 4 shows the trends of RMSE in PLS models with different numbers of LVs for SSC in apple and moisture content in marzipan, respectively. The behavior of RMSE is similar to that for the corn data in Figure 3, with an overall downward trend. Usually, the optimal PLS model is determined with a suitable number of latent variables according to the rule of minimum RMSECV. In this work, however, some models required the maximum number of LVs to achieve the best performance, which is likely to lead to over-fitting or under-fitting. Clearly, selecting the number of LVs solely according to the minimum RMSECV may not be reliable, and it is difficult to choose a suitable number of LVs in some practical applications. Thus, a fusion strategy that does not require choosing the number of LVs was proposed to fuse the series of PLS models.

4.2. Sparse Member Models by Lasso

A Lasso linear regression model was developed between the outputs of the above PLS member models and the corresponding attributes. In Lasso regression, the tuning parameter λ controls the degree of coefficient shrinkage; a sufficiently large λ drives coefficients to zero. To avoid the interference of arbitrary manual settings, the mean and standard error of the sum of squared residuals (SSR) of the Lasso regression models were estimated by 10-fold cross-validation in the calibration set. In this work, the tuning parameter λ was obtained adaptively by the rule of minimum RMSE of the developed Lasso model.
Figure 5 depicts the regression coefficients assigned to the PLS member models for the four components of the corn spectral data by the Lasso screening method. On close observation, some coefficients are exactly zero or tend to zero, such as those of the 16 member models (1st to 14th, 17th, and 18th) for corn oil in Figure 5b, and the first 18 member models for corn starch in Figure 5d. More than 80% of the member models are compressed, with their coefficients set to 0. In general, the smaller the absolute coefficient, the less important the member model. The coefficients of the PLS member models vary with the number of latent variables, and it can be concluded that the rejected PLS member models with small coefficients are less important; this can be explained by their relatively few latent variables and higher RMSEP, whereas the later PLS member models usually obtain larger coefficients thanks to their better predictive performance and more detailed information. The developed model can handle the nonlinearity in the high-dimensional spectral data, and Lasso can select the influential variables to reduce computational complexity and improve the interpretability of the model. As a result, Lasso performs the selection of PLS member models and the regression analysis simultaneously.
A similar compression effect of the Lasso method is shown in Figure 6 for apple and marzipan. For predicting the SSC of apples (Figure 6a), the coefficients of the first 11 PLS member models, except the 2nd, are compressed to very small values; among them, eight coefficients are set to 0, indicating that they have almost no influence on the Lasso regression model. In addition, only the 16th PLS model (in Figure 4a) obtains a prediction performance comparable to that of the optimal model or its neighbors. Clearly, judging from the coefficients of the series of PLS member models, even an input with a similarly good predictive performance is not necessarily selected by the Lasso method. For the moisture content of marzipan (Figure 6b), Lasso highlights the last two member models with larger absolute coefficients. In contrast, several of the last PLS member models performed well and were close to each other, yet were not assigned coefficients.
When the number of inputs is compressed by the Lasso method, the regression coefficients of some inputs are set to 0, but others remain very small values close to 0. Analysis of the spectral data showed that these inputs with small coefficients are unlikely to have a significant effect on the final regression model. Thus, to further reduce the number of inputs and the amount of numerical calculation, additional fusion or screening methods should be employed.

4.3. Fusion of Member Models and Comparison

After the above Lasso sparsification of the 20 PLS member models, the coefficients of several PLS member models were compressed to 0, indicating that they contributed hardly anything to the Lasso regression model and needed to be eliminated. The number of retained PLS member models varied from a minimum of 2 to a maximum of 15, depending on the attribute and dataset. Meanwhile, some of the retained PLS member models were discrete while others were adjacent to each other, and experience suggests that the outputs of adjacent member models are, to some degree, highly correlated. Thus, to compress the potentially redundant ones among the retained member models, fusion algorithms, namely PLS-F, RR-F, and DW-F, were further used to fuse the sparsified PLS member models. The parameters RMSECV and Rcv in the cross-validation stage, and RMSEP and Rp in the prediction stage, are reported to estimate the performance of the developed fusion models.
Table 2 lists the evaluation parameters (RMSECV, Rcv, RMSEP, and Rp) of each component predicted by the fusion models and by the optimized PLS models, the latter developed with the optimal number of LVs in the Unscrambler software (version 9.7, CAMO, Oslo, Norway). The serial numbers of the samples used in the calibration and prediction sets were kept consistent between the two software packages, to avoid differences in the overall spectral dataset and to allow comparison of the developed models. In the corn data, even though the models come from the same spectra, the number of PLS member models retained by the Lasso sparsification varied with the attribute: 15 member models were retained for moisture, 12 for protein, and 7 for starch, but only 2 for oil. For the SSC of apple and the moisture content of marzipan, 9 and 6 PLS member models, respectively, were retained after the Lasso compression.

4.4. Comparison and Discussions

A systematic comparison between the optimal PLS model and the fusion models is given in Table 2. The varying degrees of enhancement achieved by the fusion models can be observed in the grouped bar heights of Figure 7. The x-axis represents the six components, namely moisture, oil, protein, and starch in the corn data, SSC in the apple data, and moisture content in the marzipan data. The y-axis represents the percentage improvement in RMSEP of the fusion models over the single optimal PLS model. It can be seen that the ability to predict the sample qualities is enhanced to different extents by the fusion models, which obtain better performances than the single optimal PLS model. In Table 2, the Rcv and Rp between spectra and attributes are almost all higher than 0.9, and the RMSECV and RMSEP of the moisture content in the corn data are significantly reduced from 0.0371 and 0.0375 in the optimized PLS model to 0.0085 and 0.0076 in the PLS-F fusion model, improvements of 77.1% and 79.7%. For the oil content in the corn data, RMSECV and RMSEP are also reduced from 0.0675 and 0.0760 in the single PLS model to 0.0334 and 0.0404 in the PLS-F model, enhancements of 50.5% and 46.8%. For the starch content, the PLS-F model improved RMSEP by 15.9% in the prediction stage. For the protein content, it improved RMSECV by only 1.2%, but RMSEP by 13.4%. Moreover, the PLS-F fusion model reduced the RMSECV and RMSEP of SSC in the apple data from 0.5153 and 0.5651 in the single optimal PLS model to 0.4908 and 0.4863, by 4.75% and 13.94%, respectively.
For the small-sample marzipan data, the PLS-F fusion model also enhanced performance by reducing RMSECV by 80.67% in the calibration set. In the prediction set, however, the performance became slightly worse, with RMSEP rising from 1.1169 to 1.1340 (a change of −1.53%); this may be explained by over-fitting of the model or unevenness of the spectral dataset, since RMSEP is larger than RMSECV in the PLS-F model. Nevertheless, this fusion model still performed better than the models in the literature [27,28] that first introduced the apple and marzipan spectral data. From the above analysis, the PLS-F algorithm is a comparatively effective method for improving the accuracy and robustness of quantitative data analysis.
RR-F models were also constructed following the framework described in Section 2.4. After the modeling stage, the RMSEP of the RR-F spectral fusion model was found to be clearly reduced in comparison with that of the optimal PLS model. In the corn data, RMSEP decreased to 0.0141, 0.0404, 0.0789, and 0.1697 for moisture, oil, protein, and starch, enhancements of 62.4%, 46.8%, 19.2%, and 13.6%, respectively, compared with the single optimal PLS models. For the SSC of the apple data and the moisture of the marzipan data, the RR-F fusion models also achieved comparable improvements, as is visible in Figure 7. It can be concluded that the RR-F model performs better than the single PLS model on the same data and is a viable choice for fusing member models. The DW-F models also performed well and acted better than the single PLS models: DW-F significantly improved RMSEP, with enhancements of 82.7%, 51.5%, 11.3%, 14.9%, 17.5%, and 28.8% over the optimal PLS models for moisture, oil, protein, and starch in corn, SSC in apple, and moisture in marzipan, respectively. Even for the small number of marzipan samples, the RMSEP of the spectral model was reduced from 1.1169 in the optimal PLS model to 0.7948 in the DW-F model, a 28.8% reduction, performing better than the two other fusion models. These results are close to those of Poerio’s research [29], where Lasso introduced sparsity into interval-based PLS models and optimized the regression models for predicting the moisture of corn.
Across the three fusion models and the six sample attributes, PLS-F and RR-F each performed best once, for predicting the starch and protein of corn, respectively, while DW-F performed best for the remaining four attributes and was close to the best for the other two. At the same time, the number of member models involved in the final fusion models is reduced, simplifying input preparation and numerical computation. Thus, it can be concluded that all three fusion models performed well, and the DW-F strategy is preferred.
In terms of modeling principle, PLS-F and RR-F use coefficients estimated by multiple linear regression on the matrix of member-model outputs, assigning weightings to combinations of the predictors with large covariance with the response values. DW-F, by contrast, assigns weightings to member models under the assumption that any two predicted deviation vectors are orthogonal (that is, r_ij = 0), and minimizes the remaining objective $\sum_{i=1}^{n} w_i^2 e_i^2$. In this way, some sub-models are compressed to a slight weighting, or even a weighting of 0, so the method also acts as a variable selection method [15]. Generally, the weighting assignment rule is that the smaller the RMSECV of a sub-model, the larger its weighting, and the larger the RMSECV, the smaller its weighting. This rule can improve the prediction of the final fusion model, but it may also lead to over-fitting [30]. Therefore, the improved weighting allocation strategy of DW-F not only considers the value of RMSECV but also satisfies the requirement of minimizing the objective function, taking into account the variance of each sub-model and the influence of the correlation coefficients between different member models.

5. Conclusions

Fusion of PLS member models combined with Lasso-based sparsification is proposed for calibrating quantitative models without determining the optimal number of latent variables for a spectral dataset. Three fusion algorithms, PLS-F, RR-F, and DW-F, are comparatively introduced to improve the prediction accuracy of spectral models. Three different spectral datasets, of corn, apple, and marzipan, are employed to validate the feasibility of the proposed fusion method. Different numbers of latent variables in the PLS model correspond to different orthogonal subspaces of the spectra, and a quantitative model was developed to fuse these PLS member models without selecting their best latent variables. The statistical results show that almost all fusion models perform better than the conventional PLS model. Through the Lasso sparsification, nearly half of the PLS member models are removed from the fusion. Among the three fusion methods, DW-F optimizes the weightings of the linear combination and obtains the best performance, with Rp above 0.95 and RMSEP of 0.0065, 0.0369, 0.0867, 0.1460, 0.4663, and 0.7948, improvements of 82.7%, 51.5%, 11.3%, 14.9%, 17.5%, and 28.8% for moisture, oil, protein, and starch in corn, SSC in apple, and moisture in marzipan, respectively. Moreover, DW-F can further sparsify the member models and reduce the correlation between the selected member models.

Author Contributions

Conceptualization, L.-M.Y.; data curation, X.Y.; formal analysis, X.F. and X.C. (Xi Chen); funding acquisition, L.-M.Y. and L.L.; project administration, L.-M.Y.; resources, L.-M.Y.; software, G.H.; supervision, X.C. (Xiaojing Chen), L.L. and W.S.; validation, X.F.; visualization, J.Y.; writing—review and editing, L.-M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the National Natural Science Foundation of China (No. 61705168) and the Wenzhou Major Scientific and Technological Innovation Project of China (ZG2021029).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets are available on request. The corn and marzipan datasets can be downloaded from the websites referenced in the corresponding citations.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Nobari Moghaddam, H.; Tamiji, Z.; Akbari Lakeh, M.; Khoshayand, M.R.; Haji Mahmoodi, M. Multivariate analysis of food fraud: A review of NIR based instruments in tandem with chemometrics. J. Food Compos. Anal. 2022, 107, 104343.
  2. Nagy, M.M.; Wang, S.; Farag, M.A. Quality analysis and authentication of nutraceuticals using near IR (NIR) spectroscopy: A comprehensive review of novel trends and applications. Trends Food Sci. Technol. 2022, 123, 290–309.
  3. Zareef, M.; Chen, Q.; Hassan, M.M.; Arslan, M.; Hashim, M.M.; Ahmad, W.; Kutsanedzie, F.Y.H.; Agyekum, A.A. An Overview on the Applications of Typical Non-linear Algorithms Coupled With NIR Spectroscopy in Food Analysis. Food Eng. Rev. 2020, 12, 173–190.
  4. Walsh, K.B.; McGlone, V.A.; Han, D.H. The uses of near infra-red spectroscopy in postharvest decision support: A review. Postharvest Biol. Technol. 2020, 163, 111139.
  5. Ding, L.; Yuan, L.-M.; Sun, Y.; Zhang, X.; Li, J.; Yan, Z. Rapid Assessment of Exercise State through Athlete’s Urine Using Temperature-Dependent NIRS Technology. J. Anal. Methods Chem. 2020, 2020, 8828213.
  6. Baiano, A. Applications of hyperspectral imaging for quality assessment of liquid based and semi-liquid food products: A review. J. Food Eng. 2017, 214, 10–15.
  7. Lohumi, S.; Lee, S.; Lee, H.; Cho, B.-K. A review of vibrational spectroscopic techniques for the detection of food authenticity and adulteration. Trends Food Sci. Technol. 2015, 46, 85–98.
  8. Nicolai, B.M.; Lotze, E.; Peirs, A.; Scheerlinck, N.; Theron, K.I. Non-destructive measurement of bitter pit in apple fruit using NIR hyperspectral imaging. Postharvest Biol. Technol. 2006, 40, 1–6.
  9. Monnier, G.F. A review of infrared spectroscopy in microarchaeology: Methods, applications, and recent trends. J. Archaeol. Sci. Rep. 2018, 18, 806–823.
  10. Xiaobo, Z.; Jiewen, Z.; Povey, M.J.W.; Holmes, M.; Hanpin, M. Variables selection methods in near-infrared spectroscopy. Anal. Chim. Acta 2010, 667, 14–32.
  11. Pasquini, C. Near infrared spectroscopy: A mature analytical technique with new perspectives—A review. Anal. Chim. Acta 2018, 1026, 8–36.
  12. Yun, Y.-H.; Li, H.-D.; Deng, B.-C.; Cao, D.-S. An overview of variable selection methods in multivariate analysis of near-infrared spectra. TrAC Trends Anal. Chem. 2019, 113, 102–115.
  13. Wang, H.-P.; Chen, P.; Dai, J.-W.; Liu, D.; Li, J.-Y.; Xu, Y.-P.; Chu, X.-L. Recent advances of chemometric calibration methods in modern spectroscopy: Algorithms, strategy, and related issues. TrAC Trends Anal. Chem. 2022, 153, 116648.
  14. Nicolai, B.M.; Beullens, K.; Bobelyn, E.; Peirs, A.; Saeys, W.; Theron, K.I.; Lammertyn, J. Nondestructive measurement of fruit and vegetable quality by means of NIR spectroscopy: A review. Postharvest Biol. Technol. 2007, 46, 99–118.
  15. Yuan, L.-M.; Mao, F.; Huang, G.; Chen, X.; Wu, D.; Li, S.; Zhou, X.; Jiang, Q.; Lin, D.; He, R. Models fused with successive CARS-PLS for measurement of the soluble solids content of Chinese bayberry by vis-NIRS technology. Postharvest Biol. Technol. 2020, 169, 111308.
  16. Beć, K.B.; Grabska, J.; Huck, C.W. In silico NIR spectroscopy—A review. Molecular fingerprint, interpretation of calibration models, understanding of matrix effects and instrumental difference. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 279, 121438.
  17. Liu, K.; Chen, X.; Li, L.; Chen, H.; Ruan, X.; Liu, W. A consensus successive projections algorithm—Multiple linear regression method for analyzing near infrared spectra. Anal. Chim. Acta 2015, 858, 16–23.
  18. Singh, M.; Singh, R.; Ross, A. A comprehensive overview of biometric fusion. Inf. Fusion 2019, 52, 187–205.
  19. Modak, S.K.S.; Jha, V.K. Multibiometric fusion strategy and its applications: A review. Inf. Fusion 2019, 49, 174–204.
  20. Li, Y.; Xiong, Y.; Min, S. Data fusion strategy in quantitative analysis of spectroscopy relevant to olive oil adulteration. Vib. Spectrosc. 2019, 101, 20–27.
  21. Barbosa, C.D.; Baqueta, M.R.; Rodrigues Santos, W.C.; Gomes, D.; Alvarenga, V.O.; Teixeira, P.; Albano, H.; Rosa, C.A.; Valderrama, P.; Lacerda, I.C.A. Data fusion of UPLC data, NIR spectra and physicochemical parameters with chemometrics as an alternative to evaluating kombucha fermentation. LWT 2020, 133, 109875.
  22. Wang, X.; Feng, H.; Chen, T.; Zhao, S.; Zhang, J.; Zhang, X. Gas sensor technologies and mathematical modelling for quality sensing in fruit and vegetable cold chains: A review. Trends Food Sci. Technol. 2021, 110, 483–492.
  23. Ye, P.; Ji, G.; Yuan, L.-M.; Li, L.; Chen, X.; Karimidehcheshmeh, F.; Chen, X.; Huang, G. A Sparse Classification Based on a Linear Regression Method for Spectral Recognition. Appl. Sci. 2019, 9, 2053.
  24. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010, 33, 1–22.
  25. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. 1996, 58, 267–288.
  26. Norgaard, L.; Saudland, A.; Wagner, J.; Nielsen, J.P.; Munck, L.; Engelsen, S.B. Interval partial least-squares regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy. Appl. Spectrosc. 2000, 54, 413–419.
  27. Yuan, L.-M.; Cai, J.-R.; Sun, L.; Han, E.; Ernest, T. Nondestructive Measurement of Soluble Solids Content in Apples by a Portable Fruit Analyzer. Food Anal. Methods 2016, 9, 785–794.
  28. Christensen, J.; Nørgaard, L.; Heimdal, H.; Pedersen, J.G.; Engelsen, S.B. Rapid Spectroscopic Analysis of Marzipan—Comparative Instrumentation. J. Near Infrared Spectrosc. 2004, 12, 63–75.
  29. Poerio, D.V.; Brown, S.D. Stacked interval sparse partial least squares regression analysis. Chemom. Intell. Lab. Syst. 2017, 166, 49–60.
  30. Yuan, L.-M.; Mao, F.; Chen, X.; Li, L.; Huang, G. Non-invasive measurements of ‘Yunhe’ pears by vis-NIRS technology coupled with deviation fusion modeling approach. Postharvest Biol. Technol. 2020, 160, 111067.
Figure 1. The framework of the proposed fusion strategy. Note: N and K stand for numbers; PLS: partial least squares; PLS-F: partial least squares regression coefficient fusion; RR-F: ridge regression coefficient fusion; DW-F: deviation-weighted fusion; Rcv: correlation coefficient of cross-validation; Rp: correlation coefficient of prediction; RMSECV: root mean squared errors of cross-validation; RMSEP: root mean squared errors of prediction.
Figure 2. Near-infrared spectra of (a) corn data, (b) apple data, (c) marzipan data.
Figure 3. RMSE of PLS models with different numbers of LVs for (a) moisture, (b) oil, (c) protein, and (d) starch of corn. Note: RMSECV: root mean squared errors of cross-validation; RMSEP: root mean squared errors of prediction; RMSE: root mean squared errors; LVs: latent variables.
Figure 4. RMSE of PLS models with different numbers of LVs for (a) soluble solids content (SSC) of apple and (b) moisture of marzipan.
Figure 5. The Beta2 of inputs in the Lasso regression model for (a) moisture, (b) oil, (c) protein, and (d) starch of corn.
Figure 6. The Beta2 of inputs in the Lasso regression model for (a) SSC of apple and (b) moisture of marzipan. SSC: soluble solids contents.
Figure 7. Percentage improvements in RMSEP of the three fusion models relative to the optimal PLS model for the three spectral datasets.
Table 1. Introduction of spectral data used in this work.

| Spectral Data | Attribute | Samples (Calibration/Prediction) | Spectral Range/nm | Bands | Range | Mean ± SD | CV |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Corn | moisture | 53/27 | 1100~2498 | 700 | 9.377~10.993 | 10.234 ± 0.38 | 0.0372 |
| Corn | oil | 53/27 | 1100~2498 | 700 | 3.088~3.832 | 3.498 ± 0.177 | 0.0506 |
| Corn | protein | 53/27 | 1100~2498 | 700 | 7.654~9.711 | 8.668 ± 0.497 | 0.0575 |
| Corn | starch | 53/27 | 1100~2498 | 700 | 62.826~66.472 | 64.695 ± 0.821 | 0.0127 |
| Apple [27] | SSC | 90/44 | 550~985 | 2040 | 9.75~15.45 | 12.24 ± 1.141 | 0.093 |
| Marzipan [28] | moisture | 22/10 | 450~2448 | 1000 | 6.8~18.6 | 13.57 ± 3.664 | 0.270 |

Note: SSC: soluble solids content; SD: standard deviation; CV: coefficient of variation, i.e., the ratio of SD to mean.
Table 2. Evaluation parameters for moisture, oil, protein, and starch of corn, SSC of apple, and moisture of marzipan by PLS and the three proposed fusion algorithms.

| Modeling Method | Parameter | Corn: Moisture | Corn: Oil | Corn: Protein | Corn: Starch | Apple: SSC | Marzipan: Moisture |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Optimal PLS | Best LVs | 6 | 8 | 14 | 15 | 15 | 8 |
| | RMSECV | 0.0371 | 0.0675 | 0.1078 | 0.2198 | 0.5153 | 0.6613 |
| | Rcv | 0.968 | 0.916 | 0.968 | 0.954 | 0.892 | 0.972 |
| | RMSEP | 0.0375 | 0.0760 | 0.0977 | 0.1716 | 0.5651 | 1.1169 |
| | Rp | 0.978 | 0.899 | 0.983 | 0.974 | 0.873 | 0.928 |
| | Number after Lasso sparse | 15 | 2 | 12 | 7 | 9 | 6 |
| PLS-F | RMSECV | 0.0085 | 0.0334 | 0.1065 | 0.1718 | 0.4908 | 0.1278 |
| | Rcv | 0.991 | 0.965 | 0.978 | 0.973 | 0.903 | 0.996 |
| | RMSEP | 0.0076 | 0.0404 | 0.0846 | 0.1443 | 0.4863 | 1.134 |
| | Rp | 0.989 | 0.952 | 0.992 | 0.986 | 0.916 | 0.916 |
| RR-F | RMSECV | 0.0131 | 0.0357 | 0.0991 | 0.1697 | 0.4865 | 0.122 |
| | Rcv | 0.978 | 0.959 | 0.984 | 0.977 | 0.918 | 0.996 |
| | RMSEP | 0.0141 | 0.0404 | 0.0789 | 0.1482 | 0.481 | 0.9712 |
| | Rp | 0.971 | 0.951 | 0.989 | 0.984 | 0.922 | 0.938 |
| DW-F | RMSECV | 0.008 | 0.0419 | 0.1015 | 0.1711 | 0.4842 | 0.2667 |
| | Rcv | 0.992 | 0.946 | 0.981 | 0.974 | 0.918 | 0.989 |
| | RMSEP | 0.0065 | 0.0369 | 0.0867 | 0.146 | 0.4663 | 0.7948 |
| | Rp | 0.993 | 0.962 | 0.988 | 0.986 | 0.937 | 0.957 |

Note: PLS: partial least squares; Rcv: correlation coefficient of cross-validation; Rp: correlation coefficient of prediction; RMSECV: root mean squared errors of cross-validation; RMSEP: root mean squared errors of prediction; PLS-F: partial least squares regression coefficient fusion; RR-F: ridge regression coefficient fusion; DW-F: deviation-weighted fusion; SSC: soluble solids content.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

Yuan, L.-M.; Yang, X.; Fu, X.; Yang, J.; Chen, X.; Huang, G.; Chen, X.; Li, L.; Shi, W. Consensual Regression of Lasso-Sparse PLS models for Near-Infrared Spectra of Food. Agriculture 2022, 12, 1804. https://0-doi-org.brum.beds.ac.uk/10.3390/agriculture12111804
