Article
Peer-Review Record

Development of Prediction Models for the Pasting Parameters of Rice Based on Near-Infrared and Machine Learning Tools

by Pedro Sousa Sampaio 1,2,3,*, Bruna Carbas 1,4 and Carla Brites 1,2
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 12 March 2023 / Revised: 11 June 2023 / Accepted: 14 June 2023 / Published: 9 August 2023
(This article belongs to the Special Issue Spectral Detection: Technologies and Applications)

Round 1

Reviewer 1 Report

The manuscript discusses the use of machine learning and NIR spectroscopy for predicting pasting parameters. However, it is important to include data on the pasting properties of rice varieties and their correlation with the NIR spectra, which is currently missing from the manuscript. Although the machine learning aspect is adequately described, there is room for improvement. Similarly, the ANN modelling section could be improved. Furthermore, the manuscript does not discuss the prediction of pasting properties using the PLS and ANN models with a random unknown sample.

Abstract

Page 1, Lines 16-29

The experimental design should be explained in brief to understand the study. Also, the use of the generated model should be highlighted.

Page 1, Line 21

The word ‘PLS’ should be deleted.

Page 1, Lines 16-17

The sentence “Rice (Oryza sativa)……….the production processes” should be restructured for better clarity and flow.

 

 

Introduction

Page 1, Line 33

The botanical name “Oryza sativa L.” should be in italics.

Page 1, Line 36

The word ‘defined by’ should be deleted.

Page 2, Lines 46-47

Palatability is a complex trait that depends on many factors, such as texture, flavour, aroma, taste, nutritional content and appearance. Setback should be presented as an indicator of a textural attribute rather than of palatability. The sentence “High setback….. palatability” may be revised.

Page 2, Lines 49-50

A decrease in PV and FV does not always indicate the best starch gelatinization and degradation. The optimum PV and FV values may differ according to starch type, product and process requirements. The sentence “The best…..viscosity (FV)” may be revised.

Page 2, Lines 59-70

Mention the latest literature which indicates the use of NIR for quality evaluation of cereals or cereal-based products.

Materials and Methods

Page 3, Line 101

The title of subheading 2.1 includes ‘quality evaluation’; however, only pasting parameters and amylose content are mentioned. This needs to be rectified.

Page 3, Lines 102-104

The sentence should not start with a numeral. Also, it seems incomplete. Please revise the sentence “166 rice….(2014-2016)”.

Page 3, Section 2.1

Please describe the experimental plan of the study. The details of independent parameters and response variables should be provided.

Page 3, Lines 116-117

What is the meaning of the sentence “Several methods………..calibration step”?

Page 3, Line 122

What do you mean by ‘future samples’ analyte concentration’?

Page 4, Lines 148-156

Please describe the architecture of the network i.e. no. of input layers, hidden layers and output layers, etc. and the basis for the selection of layers. Also, write the criteria for the selection of transfer functions.

Page 4, Lines 159-160

Please check the calculation of spectral distribution for training, testing and validation and revise the sentence “A total of …………(49 spectra)”.

Page 4, Lines 162-163

Describe the method by which the topology of ANN was defined.

Results and Discussion

 

 

Page 4, Line 181

It is mentioned that the NIR spectra were registered for all rice samples; however, it is not clear whether this refers to native rice flour or to rice flour after the pasting profile. What was the particle size of the rice flour? Was flour of the same particle size used for the pasting properties? What were the criteria for choosing the rice flour for NIR spectra absorption?

Page 4, Lines 181-203

The authors should show the NIR spectra of the rice varieties in a figure, either in the manuscript or as supporting material. The main subject of the manuscript is the prediction of pasting properties using machine learning models. Therefore, the authors should discuss the methods by which the pasting data and NIR spectra were correlated.

Page 4, Lines 192-193

Write full forms of MSC and SNV. What was the basis for use of only the MSC+2nd derivative and SNV+2nd derivative for pre-processing? Have the authors tested other pretreatment methods? If yes, write the details in the results and discussion.
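For readers unfamiliar with the two pretreatments named in this comment, the following is a minimal Python sketch of the standard normal variate (SNV) transform followed by a second derivative. It is an illustration only: the toy spectrum is hypothetical, and the plain central second difference is a stand-in assumption, since the record does not say which derivative algorithm or settings the authors used.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(a - m) / s for a in spectrum]


def second_difference(spectrum):
    """Central second difference, a crude stand-in for a smoothed second derivative."""
    return [
        spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
        for i in range(1, len(spectrum) - 1)
    ]


# Hypothetical absorbance values, not taken from the manuscript.
toy_spectrum = [0.10, 0.12, 0.18, 0.30, 0.45, 0.52, 0.50, 0.41]

# SNV first, then the second derivative (the "SNV + 2nd derivative" order).
pretreated = second_difference(snv(toy_spectrum))
print(pretreated)
```

SNV removes per-spectrum offset and scale differences, and the second derivative removes baseline slope, which is why the two are often combined.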

Page 7, Lines 260-261

The sentence seems incomplete “Indicating ……gelatinization”.

Page 5-9, Lines 181-352

The use of iPLS and siPLS is discussed well to generate the model, however, the validation using the generated model for the unknown random sample seems missing in the manuscript. Moreover, the R values ranged from 0.57-0.85 for iPLS and 0.64-0.90 for siPLS. Can these values be considered good for a machine learning model to predict precise results? Did the author validate the model using predictive performance metrics? For better understanding, authors are suggested to provide data of RMSEC for calibration, RMSEP for prediction, R calculated and Rpredicted for both iPLS and siPLS models. It is also advisable to use different pretreatment combinations to improve fitness functions.  

Page 9, Lines 367-368

How was the number of input layers decided? Why were the hidden layers fixed at 10?

Page 9, Table 2

Though the Rcalibration values are high, the Rvalidation and Rtesting values ranged between 0.55-0.85 which might not be considered good for a robust ANN model. Can you recheck the results?

Page 9-11, Lines 354-393

It is advised to show the data of prediction of pasting properties for a random unknown sample using the developed models and compare it with experimental results. Provide the percentage error between the predicted and experimental values of a random unknown sample. A similar exercise should be conducted for iPLS and siPLS models.

Conclusion

Page 11, Lines 399-402

The sentence “This strategy……………resources” needs to be revised.

 

 

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

 

We would like to thank you for the opportunity to improve and clarify the information presented in the manuscript. We are resubmitting, for your consideration, the revised version of our manuscript, which was significantly improved according to the suggestions of Referee 1, with new sections and references added. The tables were improved and corrected to present detailed and complete information. More mathematical formulas were added for a suitable definition of the mathematical treatment. The main changes made to the manuscript are highlighted in yellow.

Author Response File: Author Response.docx

Reviewer 2 Report

Although the topic is interesting, I have some serious issues that have to be clarified.

Regarding the Introduction: the objective (aim) of the work should be clearly stated; this is not the case in this article. The aim of the work is stated too generally.

The background of the study should provide sufficient information about the problem under consideration and be supported with adequate literature. Furthermore, some paragraphs, for instance the paragraph starting with “Partial least squares (PLS) is a quantitative regression algorithm...”, should be rewritten (why mention RMSECV in the Introduction?).

Materials and methods: 2.1. Although experimental results used for modeling were obtained in previous study (properly cited), brief description of pasting parameters (how were they determined) and NIR spectra recording would be important for understanding the concept of the paper.

Subsections 2.2. and 2.3. should be shortened (opposite to 2.1., I know). In these subsections you are repeating well known facts, instead of adding some important information (examples are marked yellow and added comment in pdf version of your manuscript). Secondly, why did you use the correlation coefficient (R) instead of determination coefficient R2?

Results and discussion: This section is very difficult to read, since it repeats information that would be more appropriate for the Introduction or the Materials and Methods section. Also, a comparison with the existing literature is missing. Figures and tables should be clearly explained, with enough detail to make them understandable without reading the text. Regarding the obtained results, I have some serious objections. With the presented values of the correlation coefficient R, the values of the determination coefficient (R2) are lower. That means that the PLS models could be used only for qualitative purposes (for some pasting parameters). Secondly, the RMSECV values are too high for the developed PLS models, which means that these models are inappropriate for the prediction of pasting parameters. You did not choose to present the RMSECV values as percentages. Only RMSECV values lower than 10% would make sense.
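To make the R versus R2 point concrete, the sketch below squares the R ranges quoted in Reviewer 1's Round 1 report (0.57-0.85 for iPLS, 0.64-0.90 for siPLS), taking R2 as the square of R, as this comment implies. The `rmse_percent` helper is an assumption: the reviewer does not say how RMSECV should be normalized, so division by the mean of the reference values is only one possible choice, and the numbers passed to it are hypothetical.

```python
# R ranges as quoted in Reviewer 1's Round 1 report.
reported_r = {"iPLS": (0.57, 0.85), "siPLS": (0.64, 0.90)}


def r_to_r2(r):
    """Determination coefficient taken as the square of the correlation coefficient."""
    return r * r


def rmse_percent(rmse, reference_mean):
    """RMSE as a percentage of the mean reference value (assumed normalization)."""
    return 100.0 * rmse / reference_mean


r2_ranges = {
    model: tuple(round(r_to_r2(r), 2) for r in bounds)
    for model, bounds in reported_r.items()
}

for model, (low, high) in r2_ranges.items():
    print(f"{model}: R2 from {low} to {high}")

# Hypothetical values, only to show the call.
print(rmse_percent(50.0, 1000.0))
```

Under this reading, the lower end of each R range corresponds to an R2 well below 0.5, which is the basis of the reviewer's objection.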

Based on the experimental results presented in Fig. 1A, I see a large number of them; I suppose that each pasting parameter has a different range of experimental values (from minimum to maximum value), causing a shift of the model. Can you explain this, please?

You did not use raw spectra. Why? Regarding the preprocessed spectra, did you perform PCA and then use the factors responsible for 99.9% of the variability in the data as input variables for ANN development? This is unclear to me.

Regarding discussion part 3.2, Artificial Neural Network: the RMSE values for calibration, validation and testing are too high. The R values for validation and testing are low, and if we calculate R2 values, they are even lower, which does not make any sense. Again, I suppose that the range of experimental values differs significantly for each pasting parameter. If these models were developed for laboratory purposes, they are not applicable. Only if you are developing models for industrial use would they have some effect.

Additionally, an architecture with 1154 input layers, 10 hidden neurons (only 10?) and 1 output layer? Please explain.
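The scale of the architecture questioned here can be illustrated with a short parameter count. The sketch assumes a plain fully connected feed-forward network with bias terms and reads the reviewer's figures as 1154 input nodes, 10 hidden neurons and 1 output node; the record does not confirm these details. The 166 rice samples mentioned in the reports are used only as a rough point of comparison.

```python
def dense_parameter_count(layer_sizes):
    """Number of weights plus biases in a fully connected feed-forward network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Assumed reading of the architecture: inputs, hidden neurons, outputs.
layer_sizes = [1154, 10, 1]
n_parameters = dense_parameter_count(layer_sizes)

# Number of rice samples mentioned in the reports.
n_samples = 166

print(n_parameters)
print(n_parameters / n_samples)
```

Under these assumptions the network has far more trainable parameters than rice samples, which gives context for the reviewers' concern about the validation and testing results.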

Abstract and conclusion part should be rewritten.

Some other comments are added to pdf version.

 

 

 

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

We would like to thank you for the opportunity to improve and clarify the information presented in the manuscript. We are resubmitting, for your consideration, the revised version of our manuscript, which was significantly improved according to the suggestions of the referees: new sections were added, the figures were improved and completed, and suitable references were added. The tables were improved with more detailed and complete information. More mathematical formulas were added for a suitable definition. The English language of the manuscript was completely revised. The main changes made to the manuscript are highlighted in yellow.

We look forward to hearing from you in due time regarding our submission and to responding to any further questions and comments you may have.

I would like to inform you that all authors agree to publish this work in your journal.

 

Yours sincerely,

Pedro Sousa Sampaio

Author Response File: Author Response.docx

Reviewer 3 Report

The manuscript “Prediction of the pasting parameters of rice using machine learning and Near-Infrared spectroscopy” by Pedro Sousa Sampaio et al. reported the machine learning algorithms associated with NIR spectroscopy analysis. There are some comments for authors to improve their manuscript:

1. Sec. 2.1: 166 rice samples were chosen for the study, but why did the authors choose rice from 2014-2016? Why not use fresh samples?

2. According to Fig. 1C, there is high absorbance at 4000-6000 cm-1, so why do the authors consider only 4784-4395? Furthermore, Table 1 mentions several spectral regions; why is the region used for iPLS processing always 4784-4396, no matter how the parameters change?

3. The model in 3.1 is not clear enough for readers to understand how it works with the spectra.

4. Sec. 3.1 and 3.1: the authors used different models but without any comparison or discussion, which leaves readers confused about the connection between these methodologies.

5. The study aims for a fast, clean, and non-destructive method; however, there are no data to support the “fast, clean and non-destructive” characteristics.

Author Response

Dear Reviewer,

We would like to thank you for the opportunity to improve and clarify the information presented in the manuscript. We are resubmitting, for your consideration, the revised version of our manuscript, which was significantly improved according to the suggestions of the referees: new sections were added, the figures were improved and completed, and suitable references were added. The tables were improved with more detailed and complete information. More mathematical formulas were added for a suitable definition. The English language of the manuscript was completely reviewed. The main changes made to the manuscript are highlighted in yellow.

We look forward to hearing from you in due time regarding our submission and to answering any further questions and comments you may have.

I would like to inform you that all authors agree to publish this work in your journal.

 

Yours sincerely,

Pedro Sousa Sampaio

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Although the authors have addressed the comments, the reply is still not convincing. It is strongly recommended that they carefully review the manuscript again and address the following points. They should also demonstrate the application of the generated models using new random unknown rice varieties and present the data in the manuscript. Therefore, a major revision is suggested. 

 

Page 1, Lines 16-29

It was requested that the experimental design, methodology, and use of the developed model be highlighted in the abstract. However, only the aim of the study has been mentioned, and these essential details have been left out. Therefore, it is necessary to make the required improvements.

Page 3, Lines 101-114

The author may use abbreviations for pasting properties, as already defined in the introduction section. Please make the necessary changes throughout the manuscript.

Page 3, Lines 101-114

It has been observed that the rice samples used in the study were from the years 2014-2016. As it is evident from various studies, storage can affect the pasting properties of rice. Therefore, it is questionable whether it is feasible to study rice that is 6-7 years old. Furthermore, it is uncertain whether the model generated will produce accurate results for freshly harvested rice cultivars. Please explain why such rice varieties were selected for the study.

Page 5, Lines 201-235

There is still confusion regarding the input layers in the ANN architecture. Although it is mentioned that there are 1154 input layers, it is not described how they were selected in the corresponding section. Additionally, the author has not yet described the independent parameters that serve as the input parameters to the ANN, which would provide the response as pasting properties. With 166 rice samples and 16 scans per sample over a range of 12000-4000 cm-1, the calculation of input layers needs to be clarified.

Prediction of pasting properties of unknown rice varieties  

After examining the correlation coefficients of the prediction models iPLS, siPLS, and the testing and validation coefficients for ANN, there is still uncertainty about the applicability of the developed model to accurately predict results for random unknown rice samples. While the authors addressed the comment in the reply report by mentioning R and RMSEP values and percentage errors, this information is still incomplete. Therefore, it would be beneficial to devote a separate section (3.3) to the application of the models in determining the pasting properties of random unknown rice varieties. This section should include a table that shows the pasting properties estimated using RVA and the developed models (iPLS, siPLS, and ANN), as well as a report of the difference between the experimental and predicted values. A minimum error of less than 10%, for example, is expected.

 

 

Comments for author File: Comments.pdf

Author Response

Response file was attached.

Author Response File: Author Response.pdf

Reviewer 2 Report

Dear authors, unfortunately you did not reply to all of my comments. You did not calculate R^2 values, the RMSE values are still too high, you did not provide the experimental range for each pasting parameter, and you did not convince me of the applicability of the developed models for industrial purposes.

The scientific soundness of the manuscript was improved to some extent; however, further modifications are required in terms of providing essential scientific information and avoiding the repetition of very well-known facts (cite appropriate literature instead). This applies mainly to the MM section.

 

Author Response

Response file  (Referee 2) was attached.

Author Response File: Author Response.docx

Reviewer 3 Report

The authors have revised the manuscript according to the reviewers' comments.

Author Response

Referee 3 did not send any questions.

Round 3

Reviewer 1 Report

Page 3, Lines 101-111

The author was asked to clarify the use of rice varieties harvested between 2014-2016. In response, the author stated that the pasting properties were determined immediately after the harvesting process and recommended that this information be included in section 2.1. Additionally, the moisture content range of the rice samples should also be mentioned.

Page 4, Lines 181-192

The authors were asked to provide details on the calculation of the 1154 input layer. In response, they clarified that 1154 represents the number of subintervals between 12,000–4000 cm-1 during spectra acquisition. It is requested that they also mention the wavenumber step-up value during acquisition.

Prediction of pasting properties of unknown rice varieties (Section 3.3, lines 407-427)

The authors were requested to add a separate section (3.3) detailing the application of their models in determining the pasting properties of unknown rice varieties. Although the authors provided pasting information in their response letter and added Table 3, the requested information is still not presented in the table. It is again advised that the authors show the actual values of pasting properties determined by RVA and compare them with the values predicted using their developed models. They should present the data as mean ± SD and calculate the difference between the experimental and predicted values. Authors should use at least 2-3 rice varieties to show the suitability of models to predict pasting properties, otherwise, there is no meaning to the developed models. Moreover, based on the data, the authors should describe the suitability of their developed models.

 

Please carefully review the manuscript and correct any grammatical errors as well as any inconsistencies in abbreviations and units.

Author Response

Reviewer 1

Manuscript ID: applsci-2297043.R2

Type of manuscript: Article

Title: Development of the prediction models for pasting parameters of rice based on Near-Infrared and machine learning tools

 

Dear Reviewer,

The authors are very grateful to the reviewer for the contributions to improving our work. The manuscript has been changed, taking all the reviewer’s remarks into account. We are therefore resubmitting, for your consideration, the revised version, which was significantly improved according to the suggestions of Referee 1.

We hope that this version of our manuscript can be accepted for publication in Applied Sciences, as an important contribution to the evaluation of rice quality properties.

The main changes made to the manuscript are highlighted in yellow. The English was revised and some sentences were improved.

 

Comment 1: Page 3, Lines 101-111: The author was asked to clarify the use of rice varieties harvested between 2014-2016. In response, the author stated that the pasting properties were determined immediately after the harvesting process and recommended that this information be included in section 2.1. Additionally, the moisture content range of the rice samples should also be mentioned.

 

Answer: Thank you very much for the comment and suggestion. In subsection 2.1, we added that the pasting properties were determined immediately after the harvesting process, together with the moisture content range of the rice samples (lines 107–110).

The moisture content of the rice samples ranged from 12 to 12.5%, as determined by AACC International Method 44-15.02.

 

Comment 2: Page 4, Lines 181-192: The authors were asked to provide details on the calculation of the 1154 input layer. In response, they clarified that 1154 represents the number of subintervals between 12,000–4000 cm-1 during spectra acquisition. It is requested that they also mention the wavenumber step-up value during acquisition.

 

Answer: Thank you for your comment. The input layer (1154) corresponds to the number of wavenumbers in the interval 12,000–4000 cm-1 defined during the spectral resolution step. The wavenumber interval was segmented into 1154 intervals, which were used for model development. Each interval represents about 7 cm-1. In total, the 1154 intervals span 8000 cm-1 (12,000–4000 cm-1). These intervals were used for the ANN, iPLS, and siPLS methods.

The following sentence was added to the manuscript:

“The wavenumber interval was segmented into 1154 data set, and each interval represents about 6.93 cm-1.”
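The step quoted in this answer can be checked with simple arithmetic. The sketch below takes the 12,000–4000 cm-1 range and the count of 1154 from the answer and computes the step under two readings: 1154 equal-width intervals, or 1154 data points including both endpoints. Which reading applies is an assumption, as the record does not settle it.

```python
start_cm1 = 12000.0
end_cm1 = 4000.0
n_variables = 1154

# Total spectral span in cm-1.
span = start_cm1 - end_cm1

# Reading 1: the range is cut into 1154 equal-width intervals.
step_if_intervals = span / n_variables

# Reading 2: 1154 data points with both endpoints included (1153 gaps).
step_if_points = span / (n_variables - 1)

print(f"{step_if_intervals:.2f} cm-1 per interval")
print(f"{step_if_points:.2f} cm-1 between points")
```

Both readings give a step close to the "about 7 cm-1" stated in the answer.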

 

 

Comment: Prediction of pasting properties of unknown rice varieties (Section 3.3, lines 407-427): The authors were requested to add a separate section (3.3) detailing the application of their models in determining the pasting properties of unknown rice varieties. Although the authors provided pasting information in their response letter and added Table 3, the requested information is still not presented in the table. It is again advised that the authors show the actual values of pasting properties determined by RVA and compare them with the values predicted using their developed models.

They should present the data as mean ± SD and calculate the difference between the experimental and predicted values. Authors should use at least 2-3 rice varieties to show the suitability of models to predict pasting properties, otherwise, there is no meaning to the developed models. Moreover, based on the data, the authors should describe the suitability of their developed models.

 

Answer: The mean ± SD of the experimental and predicted data for all pasting parameters was added to Table 3. The differences between the experimental and predicted values are mentioned in the discussion of the results. Following the referee's suggestion, we present a new table (Table 4) containing the experimental RVA parameters and the values estimated using the developed models (iPLS, siPLS, and ANN) for six samples belonging to three different rice varieties.

The text of the section was changed in detail:

 

3.3. Testing External Model

The iPLS, siPLS, and ANN models were tested using 93 external rice spectra and evaluated in terms of R2 and RMSE (Table 3, Fig. 5). According to the values obtained, the ANN method proved to be robust and suitable for predicting the pasting parameters and, consequently, rice quality (Table 3). These models can be considered a robust strategy for rice quality evaluation that is accurate across different rice types, and they show the applicability of NIR spectroscopy and machine learning tools for the rapid assessment of rice quality.

 

 

 

Table 3 - Model for different parameters determined after models’ development.

| Pasting Parameter | Model | Experimental data | Predicted data | R2 | RMSE | % (RMSE) |
|---|---|---|---|---|---|---|
| BD | iPLS | 1238 ± 396 | 1155 ± 459 | 0.95 | 76 | 6.8 |
| | siPLS | | 1134 ± 413 | 0.97 | 43 | 3.8 |
| | ANN | | 1133 ± 423 | 0.98 | 43 | 3.8 |
| FV | iPLS | 2984 ± 349 | 2887 ± 433 | 0.95 | 91 | 3.1 |
| | siPLS | | 2903 ± 468 | 0.91 | 117 | 4.0 |
| | ANN | | 2889 ± 419 | 0.95 | 87 | 3.0 |
| PV | iPLS | 2657 ± 652 | 2474 ± 720 | 0.97 | 97 | 19.0 |
| | siPLS | | 2503 ± 785 | 0.96 | 140 | 9.6 |
| | ANN | | 2468 ± 738 | 0.97 | 125 | 7.6 |
| ST | iPLS | 327 ± 514 | 436 ± 558 | 0.97 | 66 | 4.0 |
| | siPLS | | 419 ± 536 | 0.98 | 53 | 6.0 |
| | ANN | | 407 ± 528 | 0.99 | 50 | 5.0 |
| TR | iPLS | 1419 ± 282 | 1344 ± 313 | 0.95 | 66 | 5.0 |
| | siPLS | | 1326 ± 330 | 0.97 | 57 | 4.2 |
| | ANN | | 1333 ± 306 | 0.98 | 42 | 3.1 |

iPLS – interval PLS; siPLS - synergy interval PLS; ANN - Artificial Neural Networks.

 

 

 

 

The quality analysis methods used in the food industry are time-consuming and expensive because they require special testing instruments. For that reason, the main goal of this study was to develop different models, based on machine learning algorithms, for the rice pasting properties BD, FV, PV, TR, and ST, each characterized by a specific spectral region that has a significant influence on the pasting parameter. This strategy represents a substantial contribution to the rice value chain, from breeding programs to industry and consumers, by focusing on a non-destructive technique for the evaluation of rice quality.

 

 

 

 

Table 4 – Pasting properties predicted using the several models developed.

| Rice type | Breakdown (cP) | iPLS | siPLS | ANN |
|---|---|---|---|---|
| Sprint | 957 | 958 | 957 | 952 |
| Sprint | 941 | 940 | 941 | 936 |
| OP 1203-ceres | 1654 | 1735 | 1665 | 1673 |
| OP 1203-ceres | 1748 | 1840 | 1760 | 1770 |
| ARIETE 104 | 1249 | 1284 | 1254 | 1254 |
| ARIETE 105 | 1242 | 1276 | 1247 | 1247 |

| Rice type | Final Viscosity (cP) | iPLS | siPLS | ANN |
|---|---|---|---|---|
| Sprint | 3235 | 3248 | 3292 | 3238 |
| Sprint | 3261 | 3277 | 3323 | 3266 |
| OP 1203-ceres | 3143 | 3146 | 3182 | 3139 |
| OP 1203-ceres | 3249 | 3263 | 3309 | 3253 |
| ARIETE 104 | 3080 | 3077 | 3107 | 3072 |
| ARIETE 105 | 3051 | 3044 | 3072 | 3041 |

| Rice type | Peak Viscosity (cP) | iPLS | siPLS | ANN |
|---|---|---|---|---|
| Sprint | 2235 | 2215 | 2219 | 2201 |
| Sprint | 2264 | 2245 | 2253 | 2232 |
| OP 1203-ceres | 3229 | 3241 | 3339 | 3248 |
| OP 1203-ceres | 3401 | 3418 | 3531 | 3428 |
| ARIETE 104 | 2774 | 2772 | 2826 | 2769 |
| ARIETE 105 | 2745 | 2742 | 2793 | 2738 |

| Rice type | Setback (cP) | iPLS | siPLS | ANN |
|---|---|---|---|---|
| Sprint | 1000 | 1075 | 1032 | 1010 |
| Sprint | 997 | 1071 | 1028 | 1007 |
| OP 1203-ceres | -87 | -103 | -98 | -102 |
| OP 1203-ceres | -152 | -173 | -166 | -169 |
| ARIETE 104 | 306 | 323 | 310 | 299 |
| ARIETE 105 | 306 | 322 | 309 | 299 |

| Rice type | Trough (cP) | iPLS | siPLS | ANN |
|---|---|---|---|---|
| Sprint | 1278 | 1278 | 1258 | 1265 |
| Sprint | 1323 | 1325 | 1306 | 1310 |
| OP 1203-ceres | 1576 | 1583 | 1578 | 1561 |
| OP 1203-ceres | 1652 | 1661 | 1660 | 1638 |
| ARIETE 104 | 1525 | 1531 | 1523 | 1511 |
| ARIETE 105 | 1503 | 1509 | 1499 | 1489 |

Sprint, OP1203-Ceres and ARIETE are the rice varieties tested in the study.

 

After the development of the prediction models, the test with selected samples allowed the values of each pasting property to be estimated with considerable accuracy. The rice samples belong to different varieties, which shows that the models are suitable for rigorous evaluation regardless of origin and composition. Comparing the experimental and estimated values for each property, the difference was greatest for the models developed with the iPLS algorithm and smallest for the model developed with the neural network (ANN) (Table 4). Based on these results, accurate prediction models can be considered an added value for producers, who, using NIR technology and appropriate machine learning models adjusted to each specific condition, can easily estimate the pasting parameters that may be correlated with rice quality.
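The percentage difference between experimental and predicted values that Reviewer 1 requested can be computed as in the following minimal Python sketch. It uses the breakdown values from Table 4 and assumes that the first numeric column is the experimental RVA value, as the answer above suggests. The absolute percentage error used here is one possible metric, not necessarily the one the authors applied.

```python
# Breakdown values (cP) from Table 4; the first list is assumed to be the RVA reference.
experimental = [957, 941, 1654, 1748, 1249, 1242]
predicted = {
    "iPLS": [958, 940, 1735, 1840, 1284, 1276],
    "siPLS": [957, 941, 1665, 1760, 1254, 1247],
    "ANN": [952, 936, 1673, 1770, 1254, 1247],
}


def percent_errors(reference, estimate):
    """Absolute percentage error of each estimate relative to its reference value."""
    return [100.0 * abs(e - r) / abs(r) for r, e in zip(reference, estimate)]


# Mean and maximum percentage error per model.
summary = {}
for model, values in predicted.items():
    errors = percent_errors(experimental, values)
    summary[model] = (sum(errors) / len(errors), max(errors))

for model, (mean_err, max_err) in summary.items():
    print(f"{model}: mean {mean_err:.2f} %, max {max_err:.2f} %")
```

For these six breakdown samples, every error stays below the 10% threshold that Reviewer 1 mentioned in Round 2, and the largest errors occur for iPLS. The same function can be applied to the other four properties in Table 4.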

 

Author Response File: Author Response.docx
