Proceeding Paper

Combination of Transfer Deep Learning and Classical Machine Learning Models for Multi-View Image Analysis †

Zheng Xu and Cong Wu

1 Department of Mathematics and Statistics, Wright State University, Dayton, OH 45435, USA
2 Department of Computer Science and Engineering, Wright State University, Dayton, OH 45435, USA
* Author to whom correspondence should be addressed.
Presented at the 1st International Online Conference on Mathematics and Applications, 1–15 May 2023; Available online: https://iocma2023.sciforum.net/.
Comput. Sci. Math. Forum 2023, 7(1), 13; https://doi.org/10.3390/IOCMA2023-14401
Published: 28 April 2023

Abstract

Deep learning has become widely used in image analysis. Transfer learning makes it possible to exploit information from other datasets when analyzing the dataset at hand. When only a small number of images is available, transfer learning with pre-trained models, whose coefficients have already been estimated on other datasets, is recommended over full deep learning, in which most model parameters are re-estimated. Deep transfer learning uses pre-trained models with fixed weight parameters in the lower layers; it can therefore be viewed as a two-stage approach: (1) feature extraction from a lower neural network layer and (2) estimation of a neural network that takes the extracted features as inputs. Since Stage 1 of deep transfer learning is essentially feature extraction, we extend this two-stage structure to a more general framework: (1) feature extraction using any of multiple methods and (2) machine learning methods that take the extracted features as inputs. We evaluate the performance of methods with different Stage 1 and Stage 2 choices in predicting the leaf-number phenotype from a multi-view plant imaging dataset, thereby providing an evaluation of two-stage machine learning methods for multi-view plant image phenotyping.

1. Introduction

Deep learning has become widely used in image analysis. A typical deep learning model can have millions of parameters; for example, VGG16, VGG19 and ResNet50 have 138.4 million, 143.7 million and 25.6 million parameters, respectively [1,2,3]. Because these models have so many parameters, large datasets are needed to train them. Researchers have spent considerable resources (time and money) collecting and annotating large datasets for this purpose.
However, in many applications it is neither necessary nor feasible to fully re-train a model on a large annotated dataset: fully re-estimating a deep learning model is costly, often impossible under budget and resource limits, and, with a small dataset, may not yield good performance. For a small dataset, a simple model is often preferred over a complex one [4]. For example, if only limited observations are available, linear regression or low-degree polynomial regression (degree less than four) is often preferred over non-parametric regression models [4]. An alternative is to use pre-trained parameters in a complex model, such as a deep neural network, which is also recommended [4]. Transfer learning was developed to address this issue: it allows researchers to make use of models pre-trained on other datasets when analyzing their own small- or medium-sized dataset [5]. One typical transfer learning method for image analysis is a two-stage approach. In Stage 1, the lower neural network layers of a deep learning model (for example, the 13 convolutional layers of the VGG16 model) are used with weights pre-trained on a large standard dataset such as ImageNet [6] to transform the input images into features. In Stage 2, the extracted features and the ground-truth response (y) are fed into a neural network in which all parameters are estimable [7]. In this way, satisfactory performance can be obtained even with a small dataset.
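To make the two-stage structure concrete, below is a minimal sketch of deep transfer learning with a frozen VGG16 base, assuming TensorFlow/Keras; the head architecture and training settings are illustrative assumptions rather than a prescribed design.

```python
# Minimal sketch: Stage 1 = frozen VGG16 convolutional base with ImageNet
# weights; Stage 2 = a small trainable network on the extracted features.
import tensorflow as tf
from tensorflow.keras import layers, models

# Stage 1: lower layers with pre-trained, fixed weights.
base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False  # keep pre-trained weights fixed

# Stage 2: trainable layers taking the extracted features as inputs.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(128, activation="relu"),  # illustrative layer size
    layers.Dense(1),                       # continuous response y
])
model.compile(optimizer="adam", loss="mse")
# model.fit(images, y, epochs=10)  # images: (n, 224, 224, 3) array
```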
We extend this two-stage approach to a general framework. In Stage 1, a feature extraction method is used to extract features from the input images; this can be principal component analysis or a pre-trained deep neural network with fixed parameters. Stage 2 is a supervised learning problem (regression or classification) with the extracted features from Stage 1 as inputs and the response variable (continuous or categorical y) as output; the Stage 2 method can be, for example, a neural network or a random forest. We evaluate the performance of this general two-stage approach with different Stage 1 and Stage 2 methods.
Our proposed two-stage machine learning strategy is general in that researchers can select appropriate Stage 1 and Stage 2 methods according to their research problem, objective and data. In our previous work [8], we proposed machine learning methods to predict continuous and binary phenotypes from plant images; those methods belong to the general framework proposed here. In Stage 1, we adopt principal component analysis (PCA) to extract features, i.e., principal components (PCs). In Stage 2, we use a range of machine learning methods (random forest, partial least squares and LASSO) to predict plant phenotypes, such as the number of leaves of a plant, from plant images; these methods work well for plant image phenotyping [8]. A sketch of this pipeline is given below.
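As an illustration, a hypothetical scikit-learn version of this PCA-plus-classical-learner pipeline might look as follows; the number of components and the hyperparameters are placeholder choices, not tuned values.

```python
# Sketch of the general two-stage pipeline: Stage 1 = PCA features,
# Stage 2 = an interchangeable supervised learner.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor

def make_two_stage(stage2):
    """Build a two-stage pipeline around any Stage 2 regressor."""
    return Pipeline([
        ("scale", StandardScaler()),     # center and scale each pixel
        ("pca", PCA(n_components=50)),   # Stage 1: feature extraction
        ("model", stage2),               # Stage 2: supervised learning
    ])

# Interchangeable Stage 2 learners (hyperparameters are placeholders):
pipelines = {
    "lasso": make_two_stage(Lasso(alpha=0.1)),
    "pls": make_two_stage(PLSRegression(n_components=10)),
    "rf": make_two_stage(RandomForestRegressor(n_estimators=200)),
}
# e.g. pipelines["lasso"].fit(X_train, y_train).predict(X_test)
```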
Another method that belongs to our proposed two-stage framework is standard deep transfer learning [9]. In Stage 1, the lower layers of a neural network with pre-trained, fixed weights are used to extract features; in Stage 2, these features are fed into the upper layers of the network. Thus, deep learning with the weights of the lower layers fixed also fits within our proposed general framework [9].
Image-based plant phenotyping, i.e., plant image phenotyping, is a rapidly emerging research area concerned with quantitative measurement of the structural and functional properties of plants from plant images [8]. It enables traits to be extracted noninvasively from a large number of plants in a relatively short period of time, and it has the advantages of low cost, high throughput and being non-destructive [10]. Based on plant image phenotyping, agricultural and biological researchers can track the growth dynamics of plants, identify the time of critical events (such as flowering) and quantify morphological traits (such as the number of leaves, plant size and the position of each leaf), which helps them analyze questions such as how different factors (fertilizer amount, temperature and moisture) influence plants [8]. In this article, we evaluate the performance of our general two-stage framework with different Stage 1 and Stage 2 methods for plant image phenotyping, focusing on predicting the number of leaves of a plant from RGB images.
The remainder of the paper is organized as follows: Section 2 specifies the methods and data, Section 3 presents the results, Section 4 provides the discussion, and Section 5 draws conclusions.

2. Materials and Methods

The proposed method is a general two-stage approach. In Stage 1, a feature extraction method is used to extract features from the input images; we adopt principal component analysis in this paper. In our ongoing project, we are evaluating other feature extraction methods, especially pre-trained deep neural network models with predetermined, fixed parameters. In transfer learning, the neural network parameters are pre-trained on a large dataset such as ImageNet, a large image dataset organized according to the WordNet hierarchy, in which each meaningful concept is described by multiple words or word phrases [6]. ImageNet covers roughly 80,000 nouns, with each noun illustrated by about 1000 images on average, and it was created to satisfy researchers' critical need for more data to enable more general machine learning methods [6]. Pre-training neural network parameters on ImageNet and keeping these weights fixed in deep learning have demonstrated advantages in the literature [11].
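For the deep-learning variant of Stage 1, the sketch below (assuming TensorFlow/Keras) extracts fixed-length features with a network whose ImageNet weights stay frozen; the choice of ResNet50 with average pooling is illustrative, and any pre-trained backbone could be substituted.

```python
# Stage 1 as pure feature extraction with fixed ImageNet weights.
import numpy as np
import tensorflow as tf

extractor = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg"
)
extractor.trainable = False  # weights are predetermined and fixed

def extract_features(images: np.ndarray) -> np.ndarray:
    """Map a (n, 224, 224, 3) image batch to (n, 2048) feature vectors."""
    x = tf.keras.applications.resnet50.preprocess_input(images.copy())
    return extractor.predict(x)
```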
Stage 2 is a supervised learning problem (regression or classification) with the extracted features from Stage 1 as inputs and the response variable (continuous or categorical y) as output. We adopted partial least squares (PLS) and LASSO as regression methods, and partial least squares discriminant analysis (PLS-DA) and LASSO as classification methods. LASSO often shows good prediction performance for high-dimensional data through its use of the L1 penalty [4]. When model interpretation is preferred over pure prediction, LASSO is often used to identify the predictors impacting the response variable, assuming sparse signals [4]. With recent progress in explainable machine learning and artificial intelligence, researchers are trying to build models that are easy to interpret instead of black-box models; when an explanation is preferred, LASSO and decision trees are often used because of their good interpretability [4], and a range of methods for the visual interpretability of deep learning has been developed in the literature [12]. In our ongoing project, we are evaluating random forest and neural networks as Stage 2 methods.
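One common way to implement these Stage 2 choices in scikit-learn is sketched below; PLS-DA is emulated by PLS regression on a 0/1 label with a 0.5 cut-off, the LASSO classifier is L1-penalized logistic regression, and all hyperparameters are placeholders rather than the settings used for the results reported here.

```python
# Illustrative Stage 2 methods: PLS/LASSO regression and
# PLS-DA/LASSO classification.
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import Lasso, LogisticRegression

pls_reg = PLSRegression(n_components=10)   # PLS regression
lasso_reg = Lasso(alpha=0.1)               # LASSO regression (L1 penalty)

def pls_da_predict(X_train, y_train, X_test, n_components=10):
    """PLS-DA: regress the 0/1 label with PLS, then threshold at 0.5."""
    pls = PLSRegression(n_components=n_components)
    pls.fit(X_train, y_train.astype(float))
    return (pls.predict(X_test).ravel() > 0.5).astype(int)

# LASSO-style classifier via L1-penalized logistic regression.
lasso_clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
```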
The dataset used in our study is the University of Nebraska Lincoln (UNL) Component Plant Phenotyping Dataset (CPPD) [13]. The UNL-CPPD dataset consists of images of 13 maize plants from two side views (0 degrees and 90 degrees). Plants were imaged once per day from 2 days to 28 days after planting at the UNL Lemnatec Scanalyzer 3D high-throughput phenotyping facility using an RGB camera.
The RGB images were converted to grayscale and resized to 224 × 224, the input size of deep learning models such as VGG16, VGG19 and ResNet50 [1,2,3]. Each grayscale image was thus a numerical matrix of 224 rows and 224 columns, which was vectorized into a column vector of length 224² = 50,176. These vectors were centered and scaled, and principal components were extracted from the centered and scaled vectors representing the images. The extracted principal components were then fed into the Stage 2 machine learning methods (any appropriate supervised learning method can be used) to make predictions.
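The preprocessing just described can be sketched as follows, assuming Pillow, NumPy and scikit-learn; the file locations and the number of components are hypothetical.

```python
# Grayscale conversion, resizing, vectorization, centering/scaling and PCA.
import glob
import numpy as np
from PIL import Image
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def image_to_vector(path: str) -> np.ndarray:
    img = Image.open(path).convert("L")  # RGB -> grayscale
    img = img.resize((224, 224))         # match deep-model input size
    return np.asarray(img, dtype=float).reshape(-1)  # length 224*224 = 50,176

paths = sorted(glob.glob("UNL-CPPD/*.png"))  # hypothetical image locations
X = np.stack([image_to_vector(p) for p in paths])
X = StandardScaler().fit_transform(X)        # center and scale each pixel
pcs = PCA(n_components=50).fit_transform(X)  # Stage 1 features (needs n >= 50)
```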
The phenotype leaf number is the number of leaves in a plant image; it is an integer, and we treat it as a continuous phenotype. The binary variable “leafy” was then defined as leafy = 1 if the leaf number exceeded the median leaf number, and leafy = 0 otherwise. We applied regression methods to predict the phenotype leaf number and classification methods to predict the binary phenotype “leafy”.
Five-fold cross validation (CV) was used to evaluate the performance. In the regression problem, the performance evaluation metrics are Mean Square Error (MSE), Root Mean Square Error (RMSE) and Mean Absolute Deviation (MAD), specified as
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2;$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2};$$

$$\mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} \left|Y_i - \hat{Y}_i\right|,$$
where $Y_i$ is the true response value and $\hat{Y}_i$ is the predicted response value for observation $i$. In the classification problem, the performance evaluation metric is accuracy, i.e., the number of correct classifications divided by the total number of classifications.
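Below is a minimal sketch of the five-fold CV loop and these metrics, assuming scikit-learn; `pcs` and `leaf_count` are hypothetical names for the Stage 1 features and the phenotype.

```python
# Five-fold CV with MSE, RMSE and MAD averaged across folds.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Lasso

def cv_metrics(X, y, model, n_splits=5, seed=0):
    mse, rmse, mad = [], [], []
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        y_hat = model.fit(X[train], y[train]).predict(X[test])
        err = y[test] - np.ravel(y_hat)
        mse.append(np.mean(err ** 2))
        rmse.append(np.sqrt(np.mean(err ** 2)))
        mad.append(np.mean(np.abs(err)))
    return np.mean(mse), np.mean(rmse), np.mean(mad)

# e.g. cv_metrics(pcs, leaf_count, Lasso(alpha=0.1))
# For "leafy": leafy = (leaf_count > np.median(leaf_count)).astype(int),
# and accuracy = mean(predicted label == true label) within each fold.
```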

3. Results

3.1. Performance of Regression Methods

We report in Table 1 the performance of the regression methods for the continuous trait leaf number. The performances of PLS and LASSO are nearly the same, with LASSO performing slightly better in the regression problem. Although the literature often reports superior performance of LASSO over PLS, we cannot conclude which method is better for plant image phenotyping, since only one dataset has been studied; we are evaluating the methods on other datasets in our ongoing project. Given the current results, we recommend adopting our proposed general framework and trying a range of Stage 1 and Stage 2 methods to decide which specific method to use at each stage. In our ongoing project, we will provide more thorough results based on multiple datasets.

3.2. Performance of Classification Methods

We report in Table 2 the performance of the classification methods for the binary trait “leafy”. LASSO performs slightly better than PLS-DA in the classification problem. Although LASSO often outperforms least squares regression, ridge regression and partial least squares in real applications [4], the preferred method still depends on the specific application, so either LASSO or partial least squares could turn out to be the best-performing method on other datasets. We note that the methods were evaluated on only one dataset (the UNL-CPPD multi-view images), so the results are of limited scope; evaluation on multiple datasets would provide stronger evidence of the performance of the methods. The purpose of this article is to propose our methods and communicate them to experts; a more thorough analysis based on multiple datasets will be provided in our ongoing project.

4. Discussion

Our proposed method is a general two-stage framework allowing a free choice of Stage 1 and Stage 2 methods. When the Stage 1 and Stage 2 methods come from the same neural network model, the framework reduces to deep transfer learning. The current report uses principal component analysis as the Stage 1 method and partial least squares and LASSO as the Stage 2 methods on the UNL-CPPD dataset. We note that the current report is of limited scope, and more methods and more datasets are needed for a more thorough analysis. In our ongoing project, we are evaluating the two-stage approach with different Stage 1 methods (deep neural networks and principal component analysis) and Stage 2 methods (partial least squares, LASSO and random forest), and we will compare their performance with that of deep transfer learning in the literature on multiple datasets.
Regarding Stage 1 methods for extracting features from images, two widely used approaches are (1) principal component analysis and (2) pre-trained deep learning models. Both work for plant image phenotyping (image regression, classification and segmentation), as shown in the literature, including two of our previous studies [8,14].
Although our two-stage method is a general framework, its most widely used instance is deep transfer learning, which has already shown great success in the literature. We want to explore the possibilities of other models by varying the Stage 1 and Stage 2 methods. In terms of prediction performance, we expect that deep transfer learning may achieve the best results, but comparing different methods is still worthwhile. The objective of this article is to compare methods within our two-stage general framework for better prediction and interpretation, so that researchers have a better understanding and more tools when developing novel machine learning methods.

5. Conclusions

We have proposed methods to extend the two-stage deep transfer learning models in the literature. Our general two-stage approach can include different Stage 1 methods and different Stage 2 methods. We evaluated the performance of our general two-stage approach with principal component analysis as our Stage 1 method and partial least squares (PLS), partial least squares-discriminant analysis (PLS-DA) and LASSO as our Stage 2 methods based on the UNL-CPPD plant phenotyping dataset.

Author Contributions

Conceptualization, C.W.; methodology, Z.X. and C.W.; software, Z.X. and C.W.; validation, C.W.; formal analysis, Z.X. and C.W.; investigation, Z.X. and C.W.; resources, Z.X. and C.W.; data curation, C.W.; writing—original draft preparation, Z.X. and C.W.; writing—review and editing, Z.X. and C.W.; visualization, C.W.; supervision, Z.X.; project administration, Z.X. and C.W.; funding acquisition, Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data used in this paper are publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CPPD    Component Plant Phenotyping Dataset
CV      Cross Validation
LASSO   Least Absolute Shrinkage and Selection Operator
PC      Principal Component
PLS     Partial Least Squares
PLS-DA  Partial Least Squares Discriminant Analysis
RF      Random Forest
VGG     Visual Geometry Group from Oxford

References

  1. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
  2. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  3. Shafiq, M.; Gu, Z. Deep residual learning for image recognition: A survey. Appl. Sci. 2022, 12, 8972.
  4. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2.
  5. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
  6. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
  7. Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data 2016, 3, 9.
  8. Xu, Z.; Wu, C. Machine Learning and Statistical Approaches for Plant Phenotyping. In Intelligent Image Analysis for Plant Phenotyping; CRC Press: Boca Raton, FL, USA, 2020; pp. 195–220.
  9. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A survey on deep transfer learning. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, 4–7 October 2018; Proceedings, Part III; Springer: Berlin/Heidelberg, Germany, 2018; pp. 270–279.
  10. Das Choudhury, S.; Samal, A.; Awada, T. Leveraging image analysis for high-throughput plant phenotyping. Front. Plant Sci. 2019, 10, 508.
  11. Shermin, T.; Teng, S.W.; Murshed, M.; Lu, G.; Sohel, F.; Paul, M. Enhanced transfer learning with ImageNet trained classification layer. In Proceedings of the Image and Video Technology: 9th Pacific-Rim Symposium, PSIVT 2019, Sydney, NSW, Australia, 18–22 November 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 142–155.
  12. Zhang, Q.S.; Zhu, S.C. Visual interpretability for deep learning: A survey. Front. Inf. Technol. Electron. Eng. 2018, 19, 27–39.
  13. Das Choudhury, S.; Bashyam, S.; Qiu, Y.; Samal, A.; Awada, T. Holistic and component plant phenotyping using temporal image sequence. Plant Methods 2018, 14, 35.
  14. Miao, C.; Xu, Z.; Rodene, E.; Yang, J.; Schnable, J.C. Semantic segmentation of sorghum using hyperspectral data identifies genetic associations. Plant Phenomics 2020, 4216373.
Table 1. Performance of Machine Learning Methods for Continuous Traits.

Method  Criterion  Performance Score
PLS     MSE        2.960
PLS     RMSE       1.720
PLS     MAD        1.024
LASSO   MSE        2.959
LASSO   RMSE       1.719
LASSO   MAD        1.020
Table 2. Performance of Machine Learning Methods for Binary Traits.

Method  Criterion  Performance Score
PLS-DA  Accuracy   0.890
LASSO   Accuracy   0.895