Article

Clustering and Segmentation of Adhesive Pests in Apple Orchards Based on GMM-DC

1 College of Mechanical and Electronic Engineering, Shandong Agricultural University, Tai’an 271018, China
2 Shandong Provincial Laboratory of Agricultural Equipment Intellectualization Engineering, Tai’an 271018, China
3 Shandong Provincial Key Laboratory of Horticultural Machinery and Equipment, Tai’an 271018, China
* Author to whom correspondence should be addressed.
Submission received: 13 October 2023 / Revised: 6 November 2023 / Accepted: 11 November 2023 / Published: 13 November 2023
(This article belongs to the Special Issue In-Field Detection and Monitoring Technology in Precision Agriculture)

Abstract:
The segmentation of individual pests is a prerequisite for pest feature extraction and identification. To address the issue of pest adhesion in the apple orchard pest identification process, this research proposed a pest adhesion image segmentation method based on Gaussian Mixture Model with Density and Curvature Weighting (GMM-DC). First, in the HSV color space, the hue of the image was adjusted and the image was inverted to attenuate threshold crossover points between the foreground and background. Subsequently, threshold segmentation and contour selection methods were used to separate the image background. Next, a shape factor was introduced to determine the regions and quantities of adhering pests, thereby determining the number of model clusters. Then, point cloud reconstruction was performed based on the color and spatial distribution features of the pests. To construct the GMM-DC segmentation model, a spatial density (SD) and spatial curvature (SC) information function were designed and embedded in the GMM. Finally, experimental analysis was conducted on the collected apple orchard pest images. The results showed that GMM-DC achieved an average accurate segmentation rate of 95.75%, an average over-segmentation rate of 2.83%, and an average under-segmentation rate of 1.42%. These results significantly outperformed traditional image segmentation methods. In addition, the original and improved Mask R-CNN models were used as recognition models, and the mean Average Precision was used as the evaluation metric. Recognition experiments were conducted on pest images with and without the proposed segmentation method. With the original and improved Mask R-CNN, the mean Average Precision for pest images segmented with the proposed method reached 92.43% and 96.75%, improvements of 13.01% and 12.18%, respectively, over recognition on the unsegmented images. The experimental results demonstrate that this method provides a theoretical and methodological foundation for accurate pest identification in orchards.

1. Introduction

The apple industry is one of China’s important agricultural sectors and plays a significant role in the agricultural economy. The development speed and quality of the apple industry are crucially influenced by apple yield and quality, with pest infestations being a key factor leading to reduced apple yield and quality. Intelligent identification of pests is an essential foundation for precise pest control and management in apple orchards. With the advancement and application of machine vision technology, the use of machine vision to achieve automatic pest identification has gained widespread attention [1]. However, during pest classification and recognition, situations with multiple adhering pests are common. These situations significantly impact the accuracy of pest identification. Therefore, extracting individual target pests from adhesion images becomes a crucial prerequisite for accurate pest classification and recognition.
In the field of agricultural image adhesive segmentation, existing research mainly focuses on the segmentation of adhesive grains, fruits, leaves, and diseased patches. Yet, there is limited research on the segmentation of adhesive pests. An improved watershed segmentation method used grayscale distribution characteristics to generate accurate distance maps and eliminate false edges [2], resulting in the accurate segmentation of adhesive soybean images. A study used depth information from overlapping fruits to achieve regional segmentation of intersecting fruit areas [3]. An enhanced H-minima watershed segmentation technique was introduced [4]. It incorporated the least squares circle fitting error theory to successfully segment adhesive lesions on cotton leaves. A plant disease segmentation method based on watershed segmentation and K-means clustering was also examined [5]. It achieved background segmentation and adhesive segmentation of plant diseases and performed well when the degree of disease adhesion was low, but it tended to over-segment when dealing with multiple overlapping and adhering diseases. The aforementioned traditional segmentation methods have achieved satisfactory segmentation results to some extent. However, compared with grains, leaves, fruits, and diseased patches, the color, shape, and texture information of pests exhibit intricate diversity due to growth stages and species variations. Consequently, traditional segmentation methods are inadequate for achieving accurate and robust segmentation results for adhesive pests.
Compared with traditional segmentation methods, the application of deep learning methods for pest identification and segmentation has become a major research focus. To identify and classify strawberries at various ripeness levels, a strawberry image segmentation method was introduced [6]. This method is based on a DeepLabV3+ network and allows the neural network to ignore irrelevant feature information. It uses an attention mechanism to focus on important information, thereby improving model image segmentation accuracy with reduced computational effort. However, due to the diversity and complexity of strawberry ripeness, this method still tends to result in over-segmentation. The segmentation of adhesive moths was achieved using the flow orientation-based NCut segmentation algorithm [7]. However, this method is sensitive to lighting conditions and demands consistent moth sizes. Mask R-CNN was used for leaf instance segmentation [8]. However, the absence of data augmentation or model optimization resulted in interference and overlapping predictions among neighboring objects. An apple orchard rust detection method utilizing Mask R-CNN was introduced [9]. Three different backbone networks were compared, and the conclusion was that ResNet50 performed the best in detecting smaller rust lesions. However, the neural networks used in such methods come with high training costs, difficulties in hyperparameter tuning, high model complexity, and poor interpretability. Due to minimal inter-class distinctions and significant intra-class variations among pests, segmentation results for adhesive pests may still exhibit instances of under-segmentation and mis-segmentation. Moreover, utilizing the same accurate neural network for adhesive pest segmentation followed by recognition yields limited improvement in pest identification accuracy.
Currently, segmentation methods based on two-dimensional color images have reached a high level of maturity in research. However, in recent years, the transformation of 2D images into 3D data and the utilization of point cloud-based segmentation methods have garnered increasing attention in segmentation tasks. ITAKURA et al. [10] reconstructed three-dimensional point clouds of leaves using complete and dense depth information of the leaves and performed segmentation of each leaf using the watershed method. The Fast Euclidean Clustering (FEC) algorithm was introduced for efficient point cloud segmentation [11]. It classifies points individually, eliminating the need for nested loop traversal. The Taylor Gaussian mixture model (GMM) network known as TGNet was created [12]. Its purpose is the classification and segmentation of point cloud data by learning local geometric features and representation compositions. An algorithm based on the adaptive polar grid Gaussian mixture model was introduced for the segmentation of roadside LiDAR data [13]. This method is designed to segment both the foreground and background and to cluster vehicles and pedestrians, thereby reducing computational complexity and processing time.
Simultaneously, the separability of data is equally crucial in data segmentation tasks. Transforming segmented data into point cloud data offers advantages due to the unique characteristics of point cloud data, including three-dimensional information, spatial density, and curvature, in contrast with image data. This makes it particularly suitable for image tasks requiring more precise object segmentation and reconstruction. The continuous progress and refinement of point cloud reconstruction techniques play a vital role in mapping 2D image data into three-dimensional point cloud space for more accurate pest identification and segmentation. Multispectral images can generate multispectral point clouds using both spatial and spectral information. Excellent multispectral image reconstruction performance is achieved with reflectance correction, band alignment, feature extraction, and feature matching algorithms guided by NDVI [14]. Tong proposed a deep learning-based 3D reconstruction network, which overcomes the limitations of traditional methods in complex background images by encoding, decoding, and densifying operations with the most similar input images and retrieved point clouds [15]. It exhibits outstanding 3D reconstruction performance but requires significant computational resources and large-scale data and faces challenges related to model complexity, robustness, privacy, and security.
This research addresses the segmentation of adhesive pests, proposing an image segmentation method based on GMM-DC. The approach uses point cloud reconstruction techniques to map samples into high-dimensional space, enhancing spatial structural features, including spatial density and curvature, to improve data separability. Subsequently, based on the spatial density and curvature features of the point cloud, a GMM-DC segmentation model is constructed to segment adhesive pests. In the task of pest classification and recognition, the introduction of the residual network ResNeXt and the CBAM module to Mask R-CNN enhances the model’s feature extraction and classification recognition capabilities. Additionally, in conjunction with GMM-DC, the segmentation-first and recognition-later classification mode is used for adhesive pest classification and recognition.

2. Data Collection and Processing

2.1. Image Acquisition

The field pest image acquisition device is shown in Figure 1. Its overall structure mainly consists of a lure lamp, an industrial camera, a pest-killing unit, and a receptacle plate. During operation, the lure lamp and sex attractant attract pests, which then enter the box through the insect-receiving funnel. Under the action of the pest-killing unit, the pests settle on the receptacle plate. An industrial camera is used to capture images of the trapped pests. These captured pests are typically at the adult stage, exhibiting distinct shapes and colors that provide essential information for identification and classification. The industrial camera used is an MV-CE120-10GC CMOS camera, with technical specifications shown in Table 1.
To ensure the diversity of collected samples for the experiment, pest monitoring equipment was distributed across 18 test sites within Shandong Province. The collected pest images from these test sites are shown in Figure 2.

2.2. Background Separation Processing

Background impurities in images are interference factors. They typically include shed scales from insects and external dust, as well as glare and shadows. As shown in Figure 3, shed scales and external dust may blur or erase target edges and textures, while glare and shadows can give rise to artifacts and misalignment that resemble the actual targets.
In this research, the background separation of pest images was conducted within the HSV color model [16,17]. The image’s hue (H) value was adjusted to H = −145 to disperse the HSV thresholds between the foreground and background. Subsequently, an inversion process was applied to attenuate the threshold crossover points, differentiating between the foreground and background. Simultaneously, 150 pest images were selected as sample images, and they underwent hue adjustment and inversion processing individually. Manual extraction of the foreground and background regions was carried out to calculate their respective HSV channel values and histograms, as depicted in Figure 3.
As indicated by the histogram in Figure 3, it can be observed that following hue adjustment and inversion processing of the images, the threshold crossover points between the foreground and background are attenuated. Consequently, by establishing HSV thresholds, foreground–background segmentation is achieved. This leads to the generation of the target mask, as depicted in Figure 4. After conducting multiple sets of experiments, the threshold ranges for background separation were determined to be H [50, 360], S [12, 360], and V [110, 360]. Finally, the target contour search function was used, with a specified contour area threshold St0 used to filter target contours. The definition of St0 is as follows:
S_{t0} = \frac{1}{3} S_{\min}
where Smin represents minimum pest body area. Finally, all target contours meeting the condition Sti > St0 were extracted. Subsequently, the bitwise_or algorithm was applied to process the contour-labeled regions with the pest image, yielding the background-separated pest image, as shown in Figure 4.
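A minimal OpenCV sketch of the background-separation pipeline described above is given below. The HSV threshold ranges and the one-third minimum-body-area rule follow the text; the mapping of the H = −145 hue shift onto OpenCV's 0–179 hue scale, the function name, and the example minimum body area are illustrative assumptions rather than the exact implementation used in this research, and bitwise_and with a region mask is used here as the usual OpenCV idiom for the masking step the text describes with bitwise_or.

```python
import cv2
import numpy as np

def separate_background(bgr_img, s_min_px=1500):
    """Sketch: hue shift + inversion, HSV thresholding, contour filtering.

    s_min_px is an assumed minimum pest body area (pixels);
    the contour threshold follows St0 = Smin / 3.
    """
    hsv = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)

    # Shift the hue channel (assumed mapping of H = -145 onto OpenCV's 0-179 scale),
    # then invert to attenuate foreground/background threshold crossover points.
    h, s, v = cv2.split(hsv)
    h = ((h.astype(np.int32) - 72) % 180).astype(np.uint8)
    inverted = cv2.bitwise_not(cv2.merge([h, s, v]))

    # Threshold ranges from the text, rescaled to OpenCV's 0-179 / 0-255 ranges.
    lower = np.array([25, 12, 110])
    upper = np.array([179, 255, 255])
    mask = cv2.inRange(inverted, lower, upper)

    # Keep only contours whose area exceeds St0 = Smin / 3 (OpenCV 4.x API).
    st0 = s_min_px / 3.0
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    kept = np.zeros_like(mask)
    for c in contours:
        if cv2.contourArea(c) > st0:
            cv2.drawContours(kept, [c], -1, 255, -1)

    # Extract the retained regions from the original image.
    return cv2.bitwise_and(bgr_img, bgr_img, mask=kept)
```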

2.3. Analysis of Pest Image Adhesion Regions

2.3.1. Shape Factor

This research analyzes a method for determining the adhesion regions of a pest image based on shape parameters derived from the structural characteristics of pests. As shown in Figure 5, the outer contours of several major orchard pests, such as the mugwort looper, Ostrinia furnacalis, and cotton bollworm, are inscribed within a triangle. Additionally, the triangle formed by connecting the head and the roots of the two wings is itself inscribed within the contour of the pest.
Therefore, a shape factor for determining the adhesion region of pests and describing the complexity of the target boundary is constructed. It is based on the morphological characteristics of pests and follows the principle that the boundary contour of adhering pests is more intricate than that of individual pests. The shape factor Pd is defined as follows:
P_d = \frac{36 S}{\sqrt{3}\, C^2}
where S represents the connected region area in pixels and C represents the connected region boundary perimeter in pixels. It should be noted that in the areas where pests adhere and overlap, there may be some external disturbances leading to incompleteness and holes in the middle of the pest image. Therefore, the term ‘region boundary perimeter’ used here specifically refers to the outer perimeter.

2.3.2. Adhesion Determination Analysis

To validate the feasibility and effectiveness of the shape factor Pd, experiments were conducted in order to identify adhesive regions within pest images. Additionally, the impact of the shape factor Pd on the determination of adhesive regions and the corresponding group structures within the images was also analyzed.
This research randomly selected 100 pest images and conducted separate counts of the number of adhering pest targets and their corresponding group structures, as shown in Figure 6. From the figure, it can be observed that the occurrence frequency of 2 adhesive pests and 3 adhesive pests is relatively high, accounting for 47.52% and 31.82%, respectively. The data are concentrated around the mean, indicating that adhering targets consisting of 2 or 3 pests are commonly found in the images. In comparison, the occurrence frequency of 4 adhesive pests and 5 adhesive pests is relatively low, accounting for 15.29% and 5.37%, respectively. The data are primarily distributed toward the edges of the normal curve. This distribution pattern suggests that the chances of observing 4 or 5 pests adhering together in the images are relatively rare. Such instances constitute a minority among the adhering targets.
Therefore, the adhesion regions of 2 to 5 pests in the pest images were manually extracted, and the shape factor Pd was introduced as the criterion for determination. The shape factor values of adhering pests with different group structures were separately calculated, resulting in 100 sets of shape factor values for each group structure. The mean, standard deviation, maximum, and minimum values of these shape factor values were calculated. The shape factor values are shown in Figure 7, and the specific statistical data are shown in Table 2.
According to the data in Table 2, it can be observed that with an increase in the number of pests in the adhesion region, the mean and standard deviation of the shape factor gradually decrease. This indicates that the complexity of the adhering targets’ boundary contour is enhanced, while the boundary contour of the targets tends to stabilize. Adhering pest targets with different group structures exhibit significant differences in the maximum and minimum values of the shape factor. Specifically, as the number of adhering pests decreases in the adhesion region, the discrepancy between the corresponding maximum and minimum values of the shape factor increases. This increase indicates a more pronounced variability.
On the other hand, the statistical data for individual pests show variability mainly caused by morphological differences within non-interbreeding pest species. Figure 7 displays 100 sets of shape factor values for adhering pests based on different group structures. This indicates a certain degree of discriminability among their corresponding shape factors.
After conducting multiple sets of comparative experiments, the threshold boundaries for the shape factors of adhesive pests were ultimately determined. In the pest images, when a certain region satisfies Pd > 0.90, it indicates the absence of adhering pests; when a region satisfies Pd ∈ [0.60,0.90], it indicates the presence of 2 adhesive pests; when a region satisfies Pd ∈ (0.43,0.60], it indicates the presence of 3 adhesive pests; when a region satisfies Pd ∈ (0.37,0.43], it indicates the presence of 4 adhesive pests; and when a region satisfies Pd ∈ [0.31,0.37], it indicates the presence of 5 adhesive pests. The results are shown in Figure 8.
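The sketch below computes the shape factor from a connected-region contour and maps it to an estimated adhesion group size using the thresholds listed above. It is a minimal sketch under two assumptions: the normalization constant follows the shape-factor formula as reconstructed above, and the boundary value Pd = 0.60 is assigned to the two-pest range as listed first in the text; the function names are illustrative.

```python
import math
import cv2

def shape_factor(contour):
    """Pd = 36 * S / (sqrt(3) * C^2), using the outer contour only
    (S: connected-region area in pixels, C: outer boundary perimeter in pixels)."""
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, True)
    if perimeter == 0:
        return 0.0
    return 36.0 * area / (math.sqrt(3.0) * perimeter ** 2)

def adhesion_count(pd):
    """Map Pd to the estimated number of adhering pests (thresholds from Section 2.3.2)."""
    if pd > 0.90:
        return 1            # no adhesion
    if 0.60 <= pd <= 0.90:
        return 2
    if 0.43 < pd <= 0.60:
        return 3
    if 0.37 < pd <= 0.43:
        return 4
    return 5                # Pd in [0.31, 0.37]
```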

3. Image Segmentation Based on GMM-DC for Pest Adhesion

3.1. Construction of the GMM-DC Segmentation Model

3.1.1. Gaussian Mixture Model

The Gaussian mixture model (GMM) can be considered as a model composed of K individual Gaussian components. Let xi represent the attribute value of the ith data point in the three-dimensional attribute space, where xi belongs to the Gaussian mixture distribution P. The Gaussian mixture distribution P consists of K components, each representing a different membership class. Each membership class follows a Gaussian distribution, and the linear combination of these K components forms the probability density function of the Gaussian mixture model. The parameter set θk = (μk, ∑k) determines the Gaussian mixture distribution [18,19]. It provides the parameter sets for all classes, which enables the calculation of the probability density function of each data point composed of different classes.
P(x) = \sum_{k=1}^{K} \varphi_k N(x \mid \theta_k), \qquad \sum_{k=1}^{K} \varphi_k = 1, \qquad 0 \le \varphi_k < 1
where φk represents the weight coefficient. It indicates the prior probability that a certain data point is generated by the k-th Gaussian distribution, i.e., the probability of the data point belonging to that specific distribution. Here, k represents the index of the Gaussian distribution. In the Gaussian distribution N (X|μk,∑k), μk represents the mean of the k-th Gaussian distribution, and ∑k represents the covariance matrix. The probability density function is expressed as follows:
N(X \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} \lvert \Sigma_k \rvert^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_k)^{T} \Sigma_k^{-1} (x - \mu_k) \right)
Upon observation, it is evident that the Gaussian mixture distribution model is entirely determined by the parameters φk, μk, and ∑k. Therefore, to accomplish the task of data point allocation, it is essential to determine these parameter values. Let θ collectively denote the parameters φk, μk, and ∑k, and let X represent the dataset. The likelihood function can then be expressed as follows:
P(x \mid \theta) = \prod_{i=1}^{N} P(x_i \mid \theta)
To avoid potential issues of floating-point underflow during multiplication operations in computers, the likelihood function in the above equation is transformed into a summation form by taking its logarithm:
\log P(x \mid \theta) = \sum_{i=1}^{N} \log P(x_i \mid \theta) = \sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} \varphi_k N(x_i \mid \mu_k, \Sigma_k) \right)

3.1.2. Density-Based Spatial Neighborhood Information Function

In the three-dimensional attribute space, the spatial density between each data point and its neighboring points reflects the spatial distribution characteristics of the data points [20,21]. This includes attributes like clustering, trend, and boundary features, thereby describing the spatial correlations among the data points. Data points with closer spatial densities are more likely to possess similar attribute values. Therefore, in spatial clustering with Gaussian mixture models, it is advisable to use spatial density constraints for optimizing the clustering algorithm.
In the three-dimensional attribute space, data points are denoted by “pi”, and k data points are defined as neighboring data points within their proximity as shown in Figure 9.
In the neighborhood space, the maximum distance from point p to the k nearest neighbors is calculated to measure the spatial density of point p. This calculation is used to construct the spatial neighborhood density information function. The formula for calculating the maximum spatial neighbor distance is as follows:
p \cdot d_{k\text{-}N} = \max\left( \mathrm{Dist}(p, p_i) : p_i \in NN_k(p) \right)
where p·dk-N represents the maximum distance between the data points and the point p in the neighborhood space. Dist represents the Euclidean geometric distance, and NNk (p) represents the set of neighboring data points in the neighborhood space of point p.
Spatial density can be represented as the ratio of the number of spatial neighbors (k) to the maximum neighbor distance p·dk-N of point p, and the spatial density expression is as follows:
\mathrm{Density}_p = \frac{k}{d_{k\text{-}N}}
When the spatial neighbor count (k) is specified, Densityp decreases as dk-N increases and increases as dk-N decreases. This behavior is consistent with the notion of spatial density in computational geometry and is directly applicable to calculating and partitioning point spatial density in three-dimensional space. To compare the local densities of two neighboring points p and q, a density variation rate Dvp(p,q) is defined as follows:
D_{vp}(p, q) = \frac{\lvert p \cdot d_{k\text{-}N} - q \cdot d_{k\text{-}N} \rvert}{\left( p \cdot d_{k\text{-}N} + q \cdot d_{k\text{-}N} \right) / 2} = \frac{2 \lvert p \cdot d_{k\text{-}N} - q \cdot d_{k\text{-}N} \rvert}{p \cdot d_{k\text{-}N} + q \cdot d_{k\text{-}N}}
The clustering and partitioning of three-dimensional data points in space are achieved utilizing the rate of distance change. Given a non-negative threshold ε, when Dvp (p,q) ≤ ε, it indicates that points p and q possess similar local densities, increasing the likelihood that data points belong to the same category. Neighborhood data points of this type are referred to as neighborhood homogenous data points, denoted as Densityp = Densityq.
Conversely, when Dvp (p,q) > ε, it indicates that points p and q have dissimilar local densities, decreasing the probability that data points belong to the same category. Neighborhood data points of this kind are termed neighborhood heterogeneous data points, represented as DensitypDensityq.
The spatial neighborhood density information function is constructed using the aforementioned spatial density calculation formula:
\psi_{ik} = \mathrm{middle}\{ \gamma(x, k) \}, \qquad x \in NB(x_i)
where NB (xi) represents the set of neighboring spatial points for the central data point xi, γ (x,k) represents the probability that the spatial point x belongs to the k-th class, and middle (·) represents the median value of the probability distribution for x.
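The sketch below computes the maximum neighbor distance p·dk-N, the spatial density Densityp, and the density variation rate Dvp for a point cloud using a k-d tree. It is a minimal sketch, assuming SciPy is available and using illustrative function names; the neighborhood size k = 30 mirrors the value later adopted in Section 4.2.

```python
import numpy as np
from scipy.spatial import cKDTree

def spatial_density(points, k=30):
    """For each point, return the maximum distance to its k nearest neighbors (p.dk-N)
    and the spatial density Density_p = k / dk-N."""
    tree = cKDTree(points)
    # query returns distances to the k+1 nearest points; the first is the point itself.
    dists, _ = tree.query(points, k=k + 1)
    dk_n = dists[:, -1]                       # maximum neighbor distance in the k-neighborhood
    density = k / np.maximum(dk_n, 1e-12)     # guard against degenerate zero distances
    return dk_n, density

def density_variation_rate(dk_n_p, dk_n_q):
    """Dvp(p, q) = 2 |p.dk-N - q.dk-N| / (p.dk-N + q.dk-N)."""
    return 2.0 * abs(dk_n_p - dk_n_q) / (dk_n_p + dk_n_q)

# Points p and q are treated as neighborhood-homogeneous when Dvp(p, q) <= eps.
```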

3.1.3. Curvature-Based Spatial Neighborhood Information Function

In the three-dimensional attribute space, the curvature of each data point with respect to its neighboring points reflects local spatial shape variations and characteristics of the data point. Consequently, this description captures the spatial correlations among individual data points. Data points that have closer spatial curvatures are more likely to possess similar attribute values. Therefore, when applying Gaussian mixture models for spatial clustering, it is advisable to incorporate relevant spatial curvature constraints to optimize the clustering algorithm [22,23,24].
Within the neighborhood space centered at pi, a quadratic surface is fitted using the least squares regression method.
f_i = a x^2 + b y^2 + c x y + d x + e y + f
where x and y represent the coordinates of pi relative to point pj in the local coordinate system, and a, b, c, d, e, and f are the parameters to be fitted. The fitting of the surface can be solved using the Singular Value Decomposition (SVD) method. Once the surface calculations for each data point’s neighborhood are completed, the curvature curve for point pi is calculated using the following formula:
k_i(t) = \frac{(1 - t)^2 k_i + 2 t (1 - t) k_{\min} + t^2 k_{\max}}{(1 - t)^2 + 2 t (1 - t) + t^2}
where ki represents the mean curvature of point pi, while kmin and kmax represent the minimum and maximum curvature values within the neighborhood. The parameter ‘t’ represents the position on the curvature curve, with a range of values between 0 and 1. The curvature curve illustrates the variations in curvature within the local neighborhood around point pi. It can be used to describe the geometric shape features in the vicinity of this point.
Taking the ki value of pi as the center point of the curvature curve, a specific curvature range [μ − σ, μ + σ] is chosen, and the curvature variation rate is calculated using the following formula:
\Delta k_i = \frac{1}{N} \sum_{j=1}^{N} \left\lvert k_j - k_i \right\rvert
where N represents the number of points within the curvature range, and kj represents the curvature of the j-th point. If ∆ki is smaller, it indicates that pi and its neighboring points are more similar in terms of curvature, thereby increasing the probability of belonging to the same class.
The clustering and partitioning of three-dimensional spatial data points are achieved using curvature variation rates. Given a non-negative threshold ε, when ∆kiε, it indicates that the curvature of points pi and pj is close, thereby increasing the probability that data points belong to the same class. Neighborhood data points of this type are referred to as neighborhood homogenous data points.
Conversely, when ∆ki > ε, it indicates that the curvature difference between points pi and pj is significant, reducing the probability of data points belonging to the same class. Neighborhood data points of this type are termed neighborhood heterogeneous data points.
The spatial neighborhood curvature information function is constructed using the aforementioned spatial curvature calculation formula:
\eta_{ik} = \mathrm{middle}\{ \gamma(x, k) \}, \qquad x \in NB(x_i)
where NB (xi) represents the set of neighboring spatial points for the central data point xi, γ (x,k) represents the probability that spatial point x belongs to the k-th class, and middle (·) represents the median value of the probability distribution for x.
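The sketch below fits the quadratic surface f(x, y) = ax² + by² + cxy + dx + ey + f to a point's neighborhood by least squares and evaluates a mean curvature at that point. The mean-curvature expression uses the standard Monge-patch formula and is an assumption about how ki is obtained, since the paper does not spell out this step; function names are illustrative.

```python
import numpy as np

def fit_quadric(neighbors, center):
    """Least-squares fit of f(x, y) = a x^2 + b y^2 + c x y + d x + e y + f
    to neighborhood points expressed relative to the center point."""
    rel = neighbors - center
    x, y, z = rel[:, 0], rel[:, 1], rel[:, 2]
    A = np.column_stack([x**2, y**2, x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs  # a, b, c, d, e, f

def mean_curvature(coeffs):
    """Mean curvature of the fitted patch at the center point (x = y = 0),
    via the standard Monge-patch formula (an assumption, not stated in the paper)."""
    a, b, c, d, e, _ = coeffs
    fx, fy, fxx, fyy, fxy = d, e, 2 * a, 2 * b, c
    num = (1 + fy**2) * fxx - 2 * fx * fy * fxy + (1 + fx**2) * fyy
    den = 2 * (1 + fx**2 + fy**2) ** 1.5
    return num / den
```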

3.1.4. Density-Curvature Weighted Gaussian Mixture Model

When performing clustering analysis on data points in the three-dimensional attribute space, it is essential to consider the continuity and spatial autocorrelation in the data point distribution in the attribute field. The homogenous data points in the neighborhood space not only exhibit correlations in their attributes but also demonstrate spatial associations. As a result, the probability of a particular data point belonging to the k-th class is influenced by the probabilities of the homogenous data points in its neighborhood space belonging to the k-th class.
Therefore, this research incorporates the relationship between spatial density and curvature of data points into the clustering analysis of the Gaussian mixture model (GMM). By combining the constructed spatial neighborhood density and curvature information function with a GMM, this research introduces a novel segmentation model based on spatial neighborhood density and curvature information weighting. It is named GMM-DC for calculating the posterior probability of a data point belonging to the k-th class.
\gamma_{ASI}(k \mid x_i, \theta) = \frac{\varphi_k \left( a \psi_{ik} + b \eta_{ik} \right) N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \varphi_j \left( a \psi_{ij} + b \eta_{ij} \right) N(x_i \mid \mu_j, \Sigma_j)}
The principles and derivation process of the EM algorithm determine the posterior probability formula designed in this research. This formula calculates the probability γASI (k|xi, θ) that the i-th data point belongs to the k-th class. The probability formula must satisfy the criteria of normalization and spatial autocorrelation.
(1)
Adhering to the normalization criterion, Equation (16) can be obtained by summing over the entire space:
\sum_{k=1}^{K} \gamma_{ASI}(k \mid x_i, \theta) = \frac{\sum_{k=1}^{K} \varphi_k \left( a \psi_{ik} + b \eta_{ik} \right) N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \varphi_j \left( a \psi_{ij} + b \eta_{ij} \right) N(x_i \mid \mu_j, \Sigma_j)} = 1
(2)
Adhering to the spatial autocorrelation criterion, the probability that a data point belongs to the k-th class in the spatial domain is influenced by its homogenous neighborhood data points. The class membership probability of the current data point is positively correlated with the class membership probabilities of its homogenous neighborhood data points. In other words, the higher the probability that homogenous neighborhood data points belong to the k-th class, the higher the probability that the current data point also belongs to the k-th class, and vice versa. Consequently, it can be demonstrated that the posterior probability γASI (k|xi,θ), which integrates the weighted spatial density and curvature information, monotonically increases. This increase is with respect to the spatial neighborhood density information function Ψik and the curvature information function ηik. The monotonic increase in the posterior probability γASI (k|xi,θ) is verified by taking derivatives with respect to Ψik and ηik, resulting in the following expressions:
\frac{d \gamma_{ASI}(k \mid x_i, \theta)}{d \psi_{ik}} = \frac{a \varphi_k N(x_i \mid \mu_k, \Sigma_k) \left[ \sum_{j=1}^{K} \varphi_j \left( a \psi_{ij} + b \eta_{ij} \right) N(x_i \mid \mu_j, \Sigma_j) - \varphi_k \left( a \psi_{ik} + b \eta_{ik} \right) N(x_i \mid \mu_k, \Sigma_k) \right]}{\left( \sum_{j=1}^{K} \varphi_j \left( a \psi_{ij} + b \eta_{ij} \right) N(x_i \mid \mu_j, \Sigma_j) \right)^2} > 0
\frac{d \gamma_{ASI}(k \mid x_i, \theta)}{d \eta_{ik}} = \frac{b \varphi_k N(x_i \mid \mu_k, \Sigma_k) \left[ \sum_{j=1}^{K} \varphi_j \left( a \psi_{ij} + b \eta_{ij} \right) N(x_i \mid \mu_j, \Sigma_j) - \varphi_k \left( a \psi_{ik} + b \eta_{ik} \right) N(x_i \mid \mu_k, \Sigma_k) \right]}{\left( \sum_{j=1}^{K} \varphi_j \left( a \psi_{ij} + b \eta_{ij} \right) N(x_i \mid \mu_j, \Sigma_j) \right)^2} > 0
The values of the derivative formulas are consistently greater than 0. Thus, this substantiates that the weighted probability formula γASI (k|xi,θ), which integrates the neighborhood spatial density and curvature information, monotonically increases with respect to Ψik and ηik.
In conclusion, the weighted probabilities devised in this research adhere thoroughly to both of these criteria. The improved algorithm incorporates these weighted probabilities. This incorporation ensures that during each iterative calculation, the class membership probabilities of the current data point are influenced. The influence is by the spatial density and curvature information of homogenous neighborhood data points. This incorporation effectively introduces spatial neighborhood density and curvature constraints into the posterior probabilities, thereby enabling a weighted representation. Consequently, the clustering process is subject to the combined constraints of spatial and non-spatial information.

3.1.5. EM Solution Algorithm for GMM-DC

The iterative process of solving GMM using the EM algorithm consists of two parts: the E-step (expectation step) and the M-step (maximization of expectations step). In this research, an approach is introduced where spatial density and curvature information are incorporated into the EM algorithm [25]. This incorporation ensures that both the E-step and M-step iterations are constrained by spatial density and curvature information. This constraint leads to an improvement in parameter estimation for the logarithmic likelihood function.
(1)
The E-step of the EM algorithm (expectation step) involves iterative calculations that rely on the parameter set θk and the prior probabilities Pk of the Gaussian mixture model. In each iteration, the calculated posterior probabilities of three-dimensional spatial data points belonging to the k-th Gaussian distribution are subjected to spatial constraints, resulting in weighted posterior probabilities:
\gamma_{ASI}(k \mid x_i, \theta^{(t)}) = \frac{\varphi_k^{(t)} \left( a \psi_{ik} + b \eta_{ik} \right)^{(t)} N(x_i \mid \theta_k^{(t)})}{\sum_{j=1}^{K} \varphi_j^{(t)} \left( a \psi_{ij} + b \eta_{ij} \right)^{(t)} N(x_i \mid \theta_j^{(t)})}
(2)
The M-step (maximization of expectations step) of the EM algorithm culminates in the iterative calculation of parameters for the Gaussian mixture model (GMM). The calculation enables the clustering partition of three-dimensional spatial data points. Through t + 1 iterations, the M-step uses maximum likelihood estimation to iteratively compute and update parameter values, which are obtained as the final model parameters:
\mu_k^{(t+1)} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ASI}(k \mid x_i, \theta^{(t)}) \, x_i
\Sigma_k^{(t+1)} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ASI}(k \mid x_i, \theta^{(t)}) \left( x_i - \mu_k^{(t+1)} \right) \left( x_i - \mu_k^{(t+1)} \right)^{T}
\varphi_k^{(t+1)} = \frac{N_k}{N}
A threshold ε (sufficiently small) is set as the convergence criterion. By comparing the GMM clustering results between the (t + 1)-th iteration and the t-th iteration, if the error is less than ε, the process terminates. This termination indicates that the algorithm’s calculation has satisfied the convergence condition. Conversely, if the error is greater than ε, the algorithm returns to the EM algorithm and continues the calculation. The process continues until convergence is achieved, upon which the algorithm concludes.
The improved EM algorithm was used to estimate the Gaussian mixture model (GMM). This enabled clustering analysis of three-dimensional spatial data points based on the correlation among attribute values, spatial density, and spatial curvature. In other words, the clustering analysis of data points in three-dimensional space was achieved with the weighted posterior probability values designed in this research.
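A condensed sketch of this weighted EM iteration is given below. It assumes the neighborhood information functions ψik and ηik have been precomputed per point and per component (for example, from the density and curvature functions above), uses a = 0.57 and b = 0.43 as later selected in Section 4.2, and adopts a log-likelihood convergence test as an illustrative choice; it is a simplified sketch of the GMM-DC EM, not the exact implementation used in this research.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_dc_em(X, psi, eta, K, a=0.57, b=0.43, max_iter=100, tol=1e-6, seed=None):
    """Sketch of EM for the density/curvature-weighted GMM (GMM-DC).

    X        : (N, 3) point cloud
    psi, eta : (N, K) neighborhood density / curvature information functions
    a, b     : weighting coefficients (a + b = 1)
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = X[rng.choice(N, K, replace=False)]                 # random initial means
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * K)  # shared initial covariances
    phi = np.full(K, 1.0 / K)
    prev_ll = -np.inf

    for _ in range(max_iter):
        # E-step: weighted responsibilities gamma_ASI(k | x_i)
        dens = np.column_stack([
            multivariate_normal.pdf(X, mean=mu[k], cov=sigma[k]) for k in range(K)
        ])
        weighted = phi * (a * psi + b * eta) * dens
        gamma = weighted / weighted.sum(axis=1, keepdims=True)

        # M-step: update means, covariances, and mixing coefficients
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        phi = Nk / N

        # Illustrative convergence test on the weighted log-likelihood
        ll = np.log(weighted.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return mu, sigma, phi, gamma
```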

3.2. Reconstruction and Analysis of Pest Adhesion Region Point Clouds

A point cloud is a geometric data structure consisting of a collection of points in three-dimensional space. It provides richer three-dimensional information, including spatial position, shape, and size, and can be used to represent the spatial information of objects. In this research, the adhesion regions in pest images serve as the source for pest three-dimensional reconstruction, with some sample images shown in Figure 10.
The two-dimensional pixel coordinates, denoted as x and y, from the image are utilized as the X and Y coordinates for the pest’s three-dimensional reconstruction. Additionally, a grayscale value g, obtained with an efficient edge detection-based color image grayscale algorithm, serves as the depth information for the three-dimensional reconstruction [26]. The fundamental principle of the grayscale algorithm is as follows:
A linear combination of color channels is utilized to express the grayscale output value:
g = w_r I_r + w_g I_g + w_b I_b
where Ir, Ig, and Ib represent the values of the red (R), green (G), and blue (B) channels of the input image, respectively. The parameters wr, wg, and wb represent the optimization parameters associated with the red, green, and blue channels. The parameter set w, defined as, w = {wr, wg, wb} is referred to as the weight space.
Subsequently, a positive constraint and an energy conservation constraint are further imposed on the weights. The two constraints are defined as follows:
w_r \ge 0, \qquad w_g \ge 0, \qquad w_b \ge 0
w_r + w_g + w_b = 1
Finally, a pixel difference network is used for robust and accurate edge detection. The optimal grayscale value g corresponding to the optimal weights is used as the depth information for three-dimensional reconstruction. This approach is applied to reconstruct the three-dimensional structure of pest adhesion images, as shown in Figure 11.
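The sketch below illustrates the reconstruction step: each foreground pixel is mapped to a 3D point (x, y, g), with the weighted grayscale value used as depth. The fixed weights shown are standard luminance weights and stand in for the optimized weights produced by the edge-detection-guided grayscale algorithm of [26]; they satisfy the non-negativity and sum-to-one constraints above, and the function name is illustrative.

```python
import numpy as np

def image_to_point_cloud(bgr_img, mask, weights=(0.30, 0.59, 0.11)):
    """Map foreground pixels to 3D points (x, y, g), using grayscale value g as depth.

    weights = (wr, wg, wb) are placeholder luminance weights; the paper instead
    uses weights optimized under the constraints wr, wg, wb >= 0 and wr + wg + wb = 1.
    """
    wr, wg, wb = weights
    b, g, r = bgr_img[..., 0], bgr_img[..., 1], bgr_img[..., 2]
    gray = wr * r + wg * g + wb * b

    ys, xs = np.nonzero(mask)          # foreground pixel coordinates
    depth = gray[ys, xs]
    return np.column_stack([xs, ys, depth]).astype(np.float32)
```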
To investigate the intrinsic similarities and interspecies uniqueness in morphology and structure of the reconstructed pest point cloud data, point cloud information for different pests such as the cotton bollworm, fall armyworm, mugwort looper, and peach pyralid moth was collected. Spatial density and spatial curvature were calculated for each of these pests, and their corresponding spatial density histograms and spatial curvature heatmaps were constructed. These were used for comparative analysis of their geometric morphological features. Some sample images are shown in Figure 12.
The spatial density histograms in Figure 12 reveal that the point cloud data for the cotton bollworm are denser. In contrast, for the mugwort looper, there are more distinct boundaries between clustered and sparse regions. On the other hand, the point clouds for the fall armyworm and peach pyralid moth exhibit an intermingled distribution of clustered and sparse regions. These observations indicate significant differences among different pests in terms of spatial density and spatial distribution uniformity.
In the spatial curvature heatmaps shown in Figure 12, different colors represent regions with different curvature values. It can be observed that the surface morphology of the point clouds for the mugwort looper and the cotton bollworm is smoother. In contrast, the point cloud surfaces for the fall armyworm and the peach pyralid moth exhibit regular undulations. This suggests that the point cloud data for different pests exhibit varying surface morphology changes, with notable differences in flat regions, boundary areas, and transitional zones.

3.3. Point Cloud Training with Anisotropic Clustering Segmentation

To achieve precise segmentation of adhesive point clouds of pests for the extraction of individual pests, this study first conducts point cloud training on the GMM-DC segmentation model. This training process is aimed at adjusting the model component weights to accommodate various types of point cloud data distributions. The training procedure is described as follows:
The mean vectors μ, covariance matrices ∑, and mixture coefficients φ for each Gaussian component of the model are randomly initialized. The number of clusters K is specified to partition the point cloud into K clusters. In the three-dimensional attribute field, each data point in the point cloud is denoted by pi. The posterior probability of each data point pi belonging to each Gaussian distribution k is calculated.
P(k \mid x_i, \theta^{(t)}) = \frac{\varphi_k^{(t)} \left( a \psi_{ik} + b \eta_{ik} \right)^{(t)} N(x_i \mid \theta_k^{(t)})}{\sum_{j=1}^{K} \varphi_j^{(t)} \left( a \psi_{ij} + b \eta_{ij} \right)^{(t)} N(x_i \mid \theta_j^{(t)})}
where φ represents the mixing coefficient and N represents the probability density function of the multivariate Gaussian distribution. The GMM-DC parameters are updated based on the derived posterior probability formula (Equation (29)), and the parameter expressions are as follows:
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} P(k \mid x_i, \theta^{(t)}) \, x_i
where Nk represents the number of points belonging to the k-th Gaussian component.
Subsequently, the posterior probabilities of data point pi belonging to each Gaussian distribution k are computed alternately, and the parameters of the GMM-DC model are updated. This iterative process is repeated until the parameters converge or reach the maximum iteration limit. This completion signifies the end of point cloud training and the optimization of component weights. The training procedure is shown in Figure 13.
Simultaneously, it should be noted that the point cloud data exhibit a clustering structure in the horizontal direction. However, they lack clear distribution characteristics in the vertical direction, as shown in Figure 14. The GMM-DC segmentation model uses anisotropic clustering to adjust the clustering effects in various directions [27]. This adjustment aims to control the performance of clustering results in different dimensions and further optimize the clustering effect of point cloud data.
Next, the covariance matrix of data point pi in the spatial neighborhood is calculated. The covariance matrix describes the extent of variation in the data across its dimensions and is defined as follows:
C = \frac{1}{N} \sum_{i=1}^{N} (p_i - \mu)(p_i - \mu)^{T}
where pi represents the data point of the point cloud, μ represents the mean of the data, and N represents the specified number of data points within the neighborhood space.
Utilizing a diagonal matrix to scale the variance in the corresponding dimension within the covariance matrix can enhance or weaken the clustering effect in that specific dimension. To enhance the clustering effect in the X- and Y-axis directions, the variances along the X- and Y-axes are amplified. Conversely, to reduce the clustering effect in the Z-axis direction, the variance along the Z-axis is diminished.
C' = \mathrm{diag}([S_x, S_y, S_z]) \; C \; \mathrm{diag}([S_x, S_y, S_z])
where diag ([Sx, Sy, Sz]) represents a diagonal matrix. “Sx”, “Sy”, and “Sz” are the scaling factors for the corresponding dimensions. These factors are distributed along the diagonal of the matrix.
The application of anisotropic clustering aids in better detecting and extracting pest populations present in point cloud data. This is particularly beneficial for those pest populations exhibiting pronounced clustering characteristics in the horizontal direction. This further enhances the accuracy and applicability of the GMM-DC segmentation model.
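A minimal sketch of this anisotropic scaling is given below: the neighborhood covariance matrix is computed and its X/Y variances amplified while the Z variance is shrunk. The specific scaling factors and the function name are illustrative assumptions; the paper does not report the values it uses.

```python
import numpy as np

def anisotropic_covariance(points, sx=2.0, sy=2.0, sz=0.5):
    """Scale a neighborhood covariance matrix to strengthen horizontal (X, Y)
    clustering and weaken vertical (Z) clustering. Scaling factors are examples."""
    mu = points.mean(axis=0)
    centered = points - mu
    cov = centered.T @ centered / len(points)   # neighborhood covariance matrix C
    S = np.diag([sx, sy, sz])                   # diag([Sx, Sy, Sz])
    return S @ cov @ S                          # C' = S C S
```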

4. Experimental Results and Discussion

4.1. Experimental Design

To evaluate the accuracy and applicability of the method proposed in this research for pest segmentation and recognition tasks, adhesion images were selected from pest collection images for experimentation. Initially, an orthogonal experimental design was conducted involving three factors: the spatial neighborhood size (k), the density weighting coefficient (a), and the number of adhesive pests (kpd). With the optimization and configuration of these three factors, their impact on model performance was analyzed to identify the optimal parameter combination. Furthermore, comparative experiments were conducted between the proposed method and other traditional segmentation methods to validate its superiority in pest segmentation. Finally, comparative experiments were carried out between the improved Mask R-CNN embedded with the proposed method and the original Mask R-CNN. By comparing the recognition performance of the two models, the improvement achieved with the proposed method in adhesive pest recognition was assessed.

4.2. Optimization of GMM-DC Model Parameters

The optimization of the GMM-DC model involves three crucial parameters: the spatial neighborhood size (k), the weighting coefficient (a, with a + b = 1), and the number of adhering pests (kpd). To determine the optimal parameter configuration and thereby enhance the model's performance, accuracy, and robustness, an orthogonal experimental method based on the Box–Behnken Design (BBD) principle was used.
In this experimental design, the average accurate segmentation rate (ACC) was used as the assessment criterion, and the factors were encoded, as shown in Table 3.
The experimental design was devised following the Response Surface Methodology within Design-Expert 8.0.6 Trial software. A total of 17 experimental points were selected, utilizing 200 pest adhesion image regions as the testing subjects. The average accurate segmentation rate (ACC) was used as the evaluative metric. The detailed experimental design scheme and corresponding response values are shown in Table 4. Notably, in the table, x1, x2, and x3 represent the encoded factor values.
Multivariate regression analysis was conducted on the experimental data acquired from Table 4 and Table 5. The optimal segmentation regression equation was derived as follows:
Y = 92.13 - 5.40 x_1 - 1.50 x_2 - 0.76 x_3 + 0.67 x_1 x_2 - 0.29 x_1 x_3 - 0.34 x_2 x_3 - 0.89 x_1^2 - 3.25 x_2^2 - 0.57 x_3^2
where the absolute values of the coefficients of each factor represent their respective influence on the predictive outcome of the model. As seen in Equation (29), the order of influence on the accurate segmentation rate Y, from greatest to least, is x1, x2, and x3.
From Table 5, it can be observed that the response model’s p-value is less than 0.001, indicating that the regression model is highly significant. The lack-of-fit terms are all above 0.05, and the coefficient of determination R2 = 0.9830 suggests a high level of variance explained and a well-fitted model with low error and no significant lack of fit. Thus, this regression model can be used to predict experimental results and perform analyses based on the predicted results.
Based on the regression model, a response surface is constructed to illustrate the relationship between model segmentation accuracy and the three factors. The shape of the response surface reflects the strength of interactions among factors. This aids in optimizing parameter configuration to achieve the optimal model segmentation accuracy.
As shown in Figure 15, the model’s accurate segmentation rate exhibits an inverse relationship with the spatial neighborhood size (k), increasing as the k value decreases. Considering the model’s processing speed, the minimum spatial neighborhood size (kmin) was set to 30 to reduce computational complexity and enhance the model processing speed. The correct segmentation rate does not show a clear linear relationship with the weighting function coefficient (a).
Instead, it exhibits a peak, which is attributed to the varying importance of the weighted information on spatial density and spatial curvature information at different values of “a”. When a = 0.57 (corresponding to spatial curvature weighting information b = 0.43), the model achieves its peak accuracy. The influence of the number of adhesive pests (kpd) on the model’s segmentation accuracy is not significant. For the model’s segmentation accuracy, kpd is not a major influencing factor and has a limited impact on the segmentation results.

4.3. Segmentation Effects of Different Methods on Adhesive Pests

To further validate the superiority of GMM-DC with the Pd-defined cluster count in pest adhesion image segmentation, 200 pest adhesion images were used as segmentation samples. The pest adhesion images were segmented using four different methods: the approach presented in this research, a GMM with the Pd-defined cluster count (GMM-Pd), a BIC-based GMM (GMM-BIC), and the watershed method. The results are shown in Figure 16.
From Figure 16, it can be observed that the proposed method accurately segments the adhesion point clouds of pests, thereby obtaining individual pest masks. In contrast, the segmentation results obtained using GMM-Pd, GMM-BIC, and the watershed method exhibit a significant amount of over-segmentation and under-segmentation. To quantitatively evaluate the segmentation performance of adhesion point clouds, three metrics, namely, the accurate segmentation rate (ACC), over-segmentation rate (OVER), and under-segmentation rate (UNDER), are introduced to assess the segmentation results.
ACC = \frac{Num_{ACC}}{Num_{ALL}} \times 100\%
OVER = \frac{Num_{OVER}}{Num_{ALL}} \times 100\%
UNDER = \frac{Num_{UNDER}}{Num_{ALL}} \times 100\%
where NumALL represents the total number of pests, NumACC represents the number of correctly segmented pests, NumOVER represents the number of over-segmented pests, and NumUNDER represents the number of under-segmented pests. The accurate segmentation rate (ACC) metric reflects the ratio of correctly segmented pests to the total number of pests. A larger value indicates higher segmentation accuracy. The over-segmentation rate (OVER) and under-segmentation rate (UNDER) metrics represent the ratios of over-segmented and under-segmented pests to the total number of pests, respectively. A smaller value for these ratios indicates better segmentation accuracy.
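The three metrics reduce to simple ratios over pest counts, as in the short sketch below; the example numbers in the comment are illustrative only and are not results from this research.

```python
def segmentation_metrics(num_all, num_acc, num_over, num_under):
    """ACC / OVER / UNDER rates (%) from pest counts, as defined above."""
    acc = num_acc / num_all * 100.0
    over = num_over / num_all * 100.0
    under = num_under / num_all * 100.0
    return acc, over, under

# Illustrative example: 200 pests, 190 correctly segmented, 6 over-, 3 under-segmented
# -> ACC = 95.0 %, OVER = 3.0 %, UNDER = 1.5 %
```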
The accurate segmentation rate (ACC), over-segmentation rate (OVER), and under-segmentation rate (UNDER) of the pest adhesion images obtained using different methods were individually calculated. These results are summarized in Table 6, along with their respective averages.
From the data in Table 6, it is evident that in terms of the accurate segmentation rate, the average accurate segmentation rate of the proposed method significantly surpasses that of GMM-Pd, GMM-BIC, and the watershed algorithm. The average accurate segmentation rate of the proposed method exceeds 94%. This displays fewer erroneous and invalid segmentations, making it notably more accurate compared with the other three methods.
Analyzing over-segmentation rates, the remaining three segmentation algorithms exhibit over 12 times the average over-segmentation rate of the proposed method. There are numerous over-segmentation errors in the segmentation results, including edge over-segmentation, texture over-segmentation, and aggregation over-segmentation.
Analyzing under-segmentation rates, the proposed method demonstrates a low average under-segmentation rate of only 1.42%. Under-segmentation is mainly caused by incorrect cluster number definitions. This prevents the model from capturing all pest instances within the image, resulting in under-segmentation. The intricate characteristics of adhesive pests significantly affect the accuracy of traditional segmentation methods. These characteristics encompass fuzzy and complex boundaries, as well as irregular changes in pest shapes. These factors contribute to under-segmentation. The descending order of under-segmentation rates is as follows: watershed algorithm, GMM-Pd, and GMM-BIC.

4.4. Improved Mask R-CNN for Pest Classification

4.4.1. Construction and Improvement in Mask R-CNN

The Mask R-CNN recognition model extends the architecture of Faster R-CNN. It uses a more accurate Region of Interest Align layer (RoI Align) for region feature extraction. Additionally, it introduces a mask branch based on Fully Convolutional Networks (FCNs) for pixel-level segmentation. The model architecture, as shown in Figure 17, consists of four main components: the backbone for feature extraction, the region proposal network (RPN), the RoI Align network for region alignment, and the object detection and segmentation module.
The improved Mask R-CNN introduces the ResNeXt residual network as a replacement for ResNet in the backbone feature extraction network. ResNeXt is an extended structure proposed on the basis of ResNet, using grouped convolutions within residual blocks [28], as shown in Figure 18. This approach uses layered convolutional operations to enhance network width and representational capability. It aims to balance network complexity and performance. As a result, it effectively captures and represents intricate relationships between various directions, scales, and features.
Simultaneously, to enhance the representation capability during the feature extraction stage, broaden the receptive field, and incorporate contextual information, an attention module is introduced. This module is known as the Convolutional Block Attention Module (CBAM) [29], as shown in Figure 19, and it serves as an enhancement module. The CBAM module consists primarily of a Channel Attention Module (CAM) and a Spatial Attention Module (SAM). These modules adaptively adjust the channel weights and spatial distribution of feature maps. This is achieved with element-wise multiplication with the channel attention map and pixel-wise addition. These operations facilitate feature map weighting and fusion. As a result, enriched feature representations are generated.

4.4.2. Model Testing Experiments and Segmented Image Recognition Experiments

In this research, the training and testing of the recognition model were performed on a computer equipped with an Intel(R) Xeon(R) Silver 4210R CPU running at 2.39 GHz, 64 GB of RAM, an NVIDIA GeForce RTX 2080Ti GPU for acceleration, and the Windows 10 (64-bit) operating system. The recognition model in this research was deployed using the open-source frameworks TensorFlow (1.13.2) and Keras. The deployment involved configuring a Python 3.7 environment, the CUDA 10 computing architecture, and the cuDNN 7.4.1.5 acceleration library.
The model dataset consists of pest images collected from 18 experimental sites within Shandong Province. These images encompass a diverse range of pest species and quantities from various regions within the province. The pests were annotated using the polygon annotation tool in LabelMe, generating corresponding JSON label descriptions. These descriptions were then transformed into mask files, label visualization files, and label names files. The pest dataset was partitioned into training and testing sets in a 7:3 ratio, with 756 images for training and 324 images for testing. This division ensured ample data for model training and parameter optimization.
Under the same experimental conditions, the optimal model training parameters were established. This was performed by contrasting the performance of models with varying parameter configurations on the test set. The hyperparameter settings for the improved Mask R-CNN are shown in Table 7.
To quantitatively evaluate the model’s recognition performance, the mean Average Precision (mAP) is introduced to assess the recognition results. The mAP represents the average of the Average Precision (AP) for each class of pests, and AP represents the accuracy of recognizing individual pest categories. Each class’s AP is calculated by plotting the Precision–Recall (P-R) curve and calculating the area under the curve.
P = \frac{TP}{TP + FP} \times 100\%
R = \frac{TP}{TP + FN} \times 100\%
AP = \int_0^1 P(R) \, dR \times 100\%
mAP = \frac{1}{N} \sum_{i=1}^{N} \int_0^1 P_i(R) \, dR \times 100\%
where TP represents the true positive samples where the model accurately identifies pests, FP represents the false positive samples where the model inaccurately identifies non-pests as pests, FN represents the false negative samples where the model fails to identify pests, TN represents the true negative samples where the model accurately predicts non-pests, and N represents the number of pest categories.
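A short sketch of the evaluation step follows: AP is the area under a class's precision–recall curve, and mAP is the mean of the per-class AP values. Trapezoidal integration is used here as a simple stand-in for the interpolated P-R integration applied by common detection evaluation protocols; the function names are illustrative.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under a P-R curve (AP) via trapezoidal integration.
    recall must be sorted in ascending order; this approximates the
    interpolated AP used by standard detection benchmarks."""
    return float(np.trapz(precision, recall))

def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class AP values."""
    return float(np.mean(ap_per_class))
```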
To validate the model performance of the improved Mask R-CNN for the recognition task, experiments were conducted on the proposed test dataset. The aim was to compare the recognition results between the improved Mask R-CNN and the original Mask R-CNN. The mean Average Precision of recognition (mAP) was separately calculated for each approach to quantitatively evaluate the experimental results. The results of these experiments are shown in Table 8.
According to the data in Table 8, the mean Average Precision of the improved Mask R-CNN described in this research reaches 96.41%. This value is 4.1% higher than the mean Average Precision of the original Mask R-CNN. The recognition performance of the improved Mask R-CNN is superior to the original model. This indicates that the proposed design of the improved Mask R-CNN for recognition tasks is well-founded and exhibits remarkable performance in pest recognition tasks.
The segmentation method proposed in this research was integrated into the improved Mask R-CNN framework, followed by an evaluation of its advantages in pest recognition performance. The original Mask R-CNN was used as the reference model for comparison. A dataset of 200 pest adhesion images was selected as sample images. These images were subjected to the segmentation process using the method proposed in this research, resulting in individual pest images for the experimental group. Simultaneously, the original pest adhesion images were retained for the control group. Both the improved Mask R-CNN and the original Mask R-CNN object detection algorithms were applied to recognize these images.
To quantitatively evaluate the recognition results of pest adhesion images processed using different recognition models, the mean Average Precision of the experimental and control groups was separately calculated. The recognition performance is shown in Figure 20, and the detection data are shown in Table 9.
According to the data in Table 9, in the experimental group the mean Average Precision of the improved Mask R-CNN with the embedded segmentation method reaches 96.75%, which is 4.32 percentage points higher than that of the original Mask R-CNN with the embedded method (92.43%). Both models in the experimental group achieved a mean Average Precision above 90%.
Conversely, in the control group, where the unprocessed pest adhesion images were input into the improved Mask R-CNN and the original Mask R-CNN, the mean Average Precision was below 85%, decreasing by 12.18 and 13.01 percentage points, respectively, compared with the experimental group. These results indicate that recognition of adhesion pest images in the experimental group surpasses that of the control group, and that the improved Mask R-CNN combined with the embedded segmentation method shows a clear advantage in recognition performance, yielding the best pest recognition results.
The adhesion of pests leads to a fusion of their shape, texture, and color. This fusion increases as the level of adhesion deepens, causing a reduction in model recognition accuracy. Consequently, the recognition model struggles to precisely detect occluded pests in adhesion images. This struggle often results in the misidentification of multiple pests as one or even generating false positives. In contrast, within the experimental group, the recognition model initially segments pest adhesion images, transforming adhesive pests into individual pests. This transition from adhesive recognition to individual recognition simplifies the complexity of pest recognition, thereby improving recognition accuracy.
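Conceptually, the experimental-group pipeline reduces to the sketch below; segment_adhesive_pests and detect_pests are hypothetical placeholders standing in for the GMM-DC segmentation step and the (improved) Mask R-CNN detector, respectively.

```python
# Conceptual sketch of the segmentation-first, recognition-later pipeline.
# segment_adhesive_pests and detect_pests are hypothetical placeholders, not
# functions defined in the paper.
def recognize_adhesion_image(image, segment_adhesive_pests, detect_pests):
    # 1. Split the adhesion image into single-pest sub-images with GMM-DC.
    individual_pests = segment_adhesive_pests(image)
    # 2. Recognize each individual pest instead of the whole adhesion image,
    #    turning adhesive recognition into individual recognition.
    return [detect_pests(pest_image) for pest_image in individual_pests]
```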

5. Conclusions

This research used point cloud reconstruction to map samples into a higher-dimensional space, enhancing spatial structural features of the data such as spatial density and spatial curvature and thereby improving data separability. The results indicate that the GMM-DC segmentation model constructed in this research demonstrates high accuracy and robustness and enables the accurate separation of adhesive pests in point clouds. In the pest classification and recognition task, combining GMM-DC with the improved Mask R-CNN in a segmentation-first, recognition-later approach significantly improved the classification and recognition accuracy of adhesive pests. The main conclusions are as follows:
(1) To address the challenge of recognizing adhesive pests in apple orchards, we propose a GMM-DC clustering and segmentation method for adhesion images. The method uses the shape factor Pd to determine the adhesion regions and the number of adhering pests, thereby defining the cluster number k for the model. It then combines pest color and spatial distribution information to reconstruct a point cloud of the adhesive pests that captures their adhesion characteristics, and this point cloud is segmented by anisotropic clustering with GMM-DC to extract individual pests.
(2) Experimental analysis was conducted on the collected pest adhesion images. The results indicate that GMM-DC achieves an average accurate segmentation rate of 95.75%, with average over-segmentation and under-segmentation rates of 2.83% and 1.42%, respectively. This approach outperforms traditional methods, whose average accurate segmentation rates fall below 50% and which fail to accurately segment individual pests in adhesion images.
(3) To evaluate the benefit of embedding the proposed segmentation method into the recognition model, pest adhesion images segmented with the proposed method and unsegmented images were input into both the original and improved Mask R-CNN models. The mean Average Precision for adhesive pests in the original and improved Mask R-CNN models integrated with the proposed method reaches 92.43% and 96.75%, respectively, an increase of 13.01 and 12.18 percentage points over the unsegmented images. These improvements effectively raise the accuracy of adhesive pest recognition and offer a theoretical and methodological foundation for precise orchard pest identification.
In future research, we plan to further explore pest image processing, in particular by developing pest image completion algorithms to restore incomplete pest images after segmentation. This will make pest features more complete and richer and further advance pest classification and recognition.

Author Contributions

Conceptualization, Y.W. and S.L.; methodology, Y.W. and L.S.; validation, B.M.; formal analysis, Z.R. and Y.W.; investigation, J.M.; resources, L.S.; writing—original draft, Y.W.; writing—review and editing, L.S. and S.L.; visualization, S.L.; supervision, H.Z.; project administration, S.L. and H.Z.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China (32071908) and the China Agriculture Research System of MOF and MARA (CARS-27).

Data Availability Statement

All the data mentioned in the paper are available from the corresponding author.

Acknowledgments

The authors would like to acknowledge the valuable comments by the editors and reviewers, which have greatly improved the quality of this work.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Insect trapping device. a, Insect attraction lamp. b, Impact plate. c, Pheromone lure core. d, Insect-receiving funnel. e, Insect-killing unit. f, Electronic control flip plate. g, First camera. h, Receptacle plate. i, Drive mechanism. j, Second camera. k, Collection unit.
Figure 2. Images of pest samples.
Figure 3. Image hue and inversion processing: (a) original image, (b) hue adjustment, (c) inversion processing, (d) histogram of HSV mean values for the background, (e) histogram of HSV means in the foreground, (f) histogram of the HSV mean values for the hue-adjusted foreground, and (g) histogram of HSV mean values for the inverted foreground.
Figure 4. Image background separation: (a) target mask with a segmentation threshold of [70, 30, 120], (b) target contour, (c) background separation, and (d–f) histograms of the three HSV channels with a segmentation threshold of [70, 30, 120].
Figure 5. Pest body geometry: (a) mugwort looper, (b) Ostrinia furnacalis, and (c) cotton bollworm.
Figure 6. Statistics of adhesion targets for different group structures: (a) data distribution and (b) data frequency.
Figure 7. Shape factor.
Figure 8. The shape factor of the target area. Note: The shape factors of figures (a–o) are as follows: 1.0360, 1.0032, 0.9768, 0.7115, 0.6710, 0.8631, 0.4695, 0.4457, 0.5292, 0.3695, 0.3878, 0.3846, 0.3551, 0.3224, and 0.3672.
Figure 9. Neighborhood points diagram.
Figure 10. Adhesive pests.
Figure 11. Adhesive pest point cloud reconstruction. Note: (a–d) depict the reconstructed point clouds of adhesive pests, along with their point cloud top views and point cloud color registration.
Figure 12. Point cloud spatial density. Note: (a–d) correspond to the cotton bollworm, fall armyworm, mugwort looper, and peach pyralid moth, with their respective spatial density histograms and spatial curvature heatmaps.
Figure 13. Point cloud training.
Figure 14. Point cloud distribution.
Figure 15. The effect of three factors on the accurate segmentation rate of the model: (a) the impact of parameters k and a on the model’s accuracy in segmentation, (b) the impact of parameters a and kpd on the model’s accuracy in segmentation, and (c) the impact of parameters k and kpd on the model’s accuracy in segmentation.
Figure 16. Pest adhesion image segmentation results: (a) original image; (b–e) segmentation results of our method, GMM-Pd, GMM-BIC, and the watershed algorithm, respectively.
Figure 17. Improved Mask R-CNN model architecture.
Figure 18. ResNeXt structure: (a) grouped convolution input and output diagram, (b) a block of ResNeXt with cardinality = 32, with roughly the same complexity.
Figure 19. Attention mechanism CBAM overall network architecture.
Figure 20. Pest adhesion recognition effectiveness: (a–c) classification recognition results of the improved Mask R-CNN; (d–f) classification recognition results of the original Mask R-CNN.
Table 1. Technical parameters.
Equipment Name | Parameters | Numerical/Formal
MV-CE120-10GC CMOS camera | Maximum resolution | 4024 × 3036
 | Sensor model | Sony IMX226
 | Effective pixels | 12 Megapixels
 | Dynamic range | 70.5 dB
 | Exposure time | 34 μs~2 s
 | Maximum frame rate | 9.6 fps
 | Signal-to-noise ratio | 40.5 dB
 | Gain | 0~20 dB
Table 2. The statistical values of the shape factor for adhesion pest targets.
Statistical Variables | Individual Pests | Adhesion of 2 Pests | Adhesion of 3 Pests | Adhesion of 4 Pests | Adhesion of 5 Pests
Mean | 0.999690 | 0.761705 | 0.513228 | 0.400592 | 0.336708
Maximum | 1.037982 | 0.901134 | 0.594163 | 0.431322 | 0.361901
Minimum | 0.922179 | 0.597632 | 0.436363 | 0.367075 | 0.316565
Standard deviation | 0.023571 | 0.096015 | 0.044865 | 0.019089 | 0.012746
Table 3. Test factor code.
Encoding Value | Spatial Neighborhood Size k | Weighting Coefficient a | Number of Adhesive Pests kpd
−1 | 30 | 0.5 | 2
0 | 50 | 0.6 | 3
1 | 70 | 0.7 | 4
Table 4. Experimental design scheme and the impact of segmentation-related factors on responses.
Index | Spatial Neighborhood Size x1 | Weighting Coefficient x2 | Number of Adhesive Pests x3 | Accurate Segmentation Rate/%
1 | 0 | 1 | −1 | 87.56
2 | 0 | 0 | 0 | 92.13
3 | −1 | 0 | 1 | 95.25
4 | 0 | −1 | 1 | 89.75
5 | 1 | 0 | −1 | 85.67
6 | 0 | 0 | 0 | 92.13
7 | −1 | −1 | 0 | 94.56
8 | 0 | 1 | 1 | 84.71
9 | −1 | 0 | −1 | 95.55
10 | 0 | 0 | 0 | 92.13
11 | 1 | 1 | 0 | 82.77
12 | 0 | 0 | −1 | 91.23
13 | 0 | 0 | 0 | 92.13
14 | 0 | 0 | 0 | 92.13
15 | −1 | 1 | 0 | 91.57
16 | 1 | 0 | 1 | 84.22
17 | 1 | −1 | 0 | 83.07
Note: This table presents the impacts of various factors on the accurate segmentation rate. Spatial neighborhood size: −1, 0, and 1 represent spatial neighborhood sizes of 30, 50, and 70; weighting coefficient: −1, 0, and 1 represent weighting coefficients of 0.5, 0.6, and 0.7; number of adhesive pests: −1, 0, and 1 represent 2, 3, and 4 adhering pests.
Table 5. Analysis of variance for accurate segmentation rate.
Source | Sum of Squares | Degrees of Freedom | Mean Square | F | p
Model | 310.39 | 9 | 34.49 | 45.02 | <0.0001
x1 | 233.28 | 1 | 233.28 | 304.53 | <0.0001
x2 | 18.00 | 1 | 18.00 | 23.50 | 0.0019
x3 | 4.62 | 1 | 4.62 | 6.03 | 0.0437
x1x2 | 1.81 | 1 | 1.81 | 2.36 | 0.1682
x1x3 | 0.33 | 1 | 0.33 | 0.43 | 0.5322
x2x3 | 0.47 | 1 | 0.47 | 0.61 | 0.4595
x1² | 3.33 | 1 | 3.33 | 4.34 | 0.0757
x2² | 44.44 | 1 | 44.44 | 58.01 | 0.0001
x3² | 1.36 | 1 | 1.36 | 1.78 | 0.2241
Residual | 5.36 | 7 | 0.77 | |
Lack-of-fit | 5.36 | 3 | 1.79 | |
Pure error | 0.000 | 4 | 0.000 | |
Total | 315.75 | 16 | | |
R² = 0.9830; Adjusted R² = 0.9612
Table 6. Comparison of segmentation performance among different methods.
Segmentation Method | ACC (Average) | OVER (Average) | UNDER (Average)
The proposed method in this research | 95.75% | 2.83% | 1.42%
Pd-based cluster-defined GMM | 44.56% | 31.24% | 24.22%
BIC-based GMM | 23.76% | 60.36% | 15.88%
The watershed algorithm | 33.85% | 25.32% | 40.83%
Table 7. Model hyperparameter settings.
Parameters | Value
Initial learning rate | 0.00001
Batch size | 4
Number of epochs | 200
RPN anchors | (16, 32, 64, 128, 256)
Mask resolution | 56 × 56
Momentum factor | 0.9
Table 8. Recognition results of the improved and original Mask R-CNN models.
Model | Mean Average Precision (mAP)/%
Improved Mask R-CNN | 96.41%
Original Mask R-CNN | 92.31%
Table 9. Recognition results of pest adhesion images with and without the proposed segmentation method.
Original Mask R-CNN | Improved Mask R-CNN | Proposed Segmentation Method (GMM-DC) | Mean Average Precision (mAP)/%
✓ | | | 79.42%
✓ | | ✓ | 92.43%
 | ✓ | | 84.57%
 | ✓ | ✓ | 96.75%