Article

An Unsupervised Crop Classification Method Based on Principal Components Isometric Binning

1 College of Land Science and Technology, China Agricultural University, Beijing 100083, China
2 Key Laboratory of Remote Sensing for Agri-Hazards, Ministry of Agriculture and Rural Affairs, Beijing 100083, China
3 Key Laboratory for Agricultural Land Quality, Ministry of Natural Resources of the People’s Republic of China, Beijing 100083, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2020, 9(11), 648; https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi9110648
Submission received: 24 September 2020 / Revised: 20 October 2020 / Accepted: 26 October 2020 / Published: 29 October 2020
(This article belongs to the Special Issue Geospatial Artificial Intelligence)

Abstract

Accurate and timely information on the spatial distribution of crops is of great importance for agricultural production management. Although widely used, supervised classification mapping requires a large number of field samples and is consequently costly in time and money. To reduce the required sample size, this paper proposes an unsupervised classification method based on principal components isometric binning (PCIB). Principal component analysis (PCA) is first applied to reduce the dimensionality of the classification features, the top k principal components are then divided into equidistant bins, and bins of the same category are finally merged into a class label. Multitemporal Gaofen 1 satellite (GF-1) remote sensing images were collected over the southwest of Hulin City and Luobei County of Hegang City, Heilongjiang Province, China, to map crop types in 2016 and 2017. The proposed method was compared with commonly used classifiers (random forest, K-means and the Iterative Self-Organizing Data Analysis Techniques Algorithm (ISODATA)). PCIB and random forest achieved the highest classification accuracies, both reaching 82% in the southwest of Hulin City in 2016. In Luobei County in 2016, the accuracies of PCIB and random forest were 81% and 82%, respectively. The overall accuracy of the proposed method therefore meets the basic requirements of classification accuracy. Although slightly less accurate than random forest, PCIB does not require a large field sample size, which makes it more suitable for large-scale crop mapping.

1. Introduction

Food security is crucial for the livelihood and economic development of the global population. Timely and accurate information on crop planting area and its spatial distribution is of great significance for estimating crop yield, managing adjustments to crop planting structure and formulating relevant regulatory policies [1,2,3,4]. Most current remote sensing-based crop classification methods are supervised, including machine learning [5,6,7,8,9,10,11,12] and deep learning [13,14] approaches. Such methods require ground sample data to train the classification model, and the accuracy and spatial distribution of the samples collected in the field also affect classification accuracy. The recent increase in demand for large-scale crop classification has led to a dramatic rise in ground sampling [8,15], so annual field sampling consumes extensive manpower, material and financial resources [16]. Furthermore, most current crop classification techniques require current-year samples, and because sample collection, data processing and the other steps are highly time consuming, the classification results lag behind the season. This prevents their effective use within the current agricultural season.
Current research focuses on reducing the dependence on large sample datasets while ensuring accurate and timely crop classification for the current season. Three approaches currently exist for classifying crops directly with limited or no sample data from the current season: (i) supervised classification of crops based on historical samples; (ii) semi-supervised classification; and (iii) unsupervised classification.
Historical-based classification typically relies on auxiliary information such as the stability of the crop planting structure, or uses crop pixels that have remained unchanged in past years as “training samples” for the current season. For example, Hao [17] used historical Cropland Data Layer (CDL) data for Kansas, United States, to extract hypothetical samples, which were then screened with the Artificial Antibody Network (ABNet) method to obtain “training samples” for classifying crops in the current season, achieving 90% accuracy. Zhang [18] integrated historical information with the current crop planting structure and spectral data to develop new sample generation and screening methods based on cluster analysis; the resulting in-season crop classification model attained an accuracy of 80.44% for samples in Hulin City, China. Although crop classification based on historical data can be highly accurate, such methods are of limited use in areas lacking historical samples.
Semi-supervised learning methods can leverage large amounts of relatively inexpensive unlabeled data together with a limited amount of labeled data [19,20,21,22]. For example, Liu [20] proposed a semi-supervised support vector machine approach using a self-training algorithm for land cover classification in Yanji City, Jilin Province, China, and obtained the highest accuracy with a labeled-to-unlabeled sample ratio of 1:3. Hu [23] applied a co-training method based on random forest and matrix completion collaborative representation (MCCR) classifiers to map land cover from Landsat time-series (LTS) images of Wuhan City, China, between 2000 and 2015. The MCCR classifier performed comparably to the random forest classifier in the presence of data noise, and the co-training method improved classification through iterative learning from unlabeled data. However, such semi-supervised models are not suitable for classifying remote sensing images with very few labeled and many unlabeled samples, because using a greater number of kernels entails high computational complexity [24]. Ratle [25] proposed a semi-supervised remote sensing image classification framework based on neural networks that improved the classification accuracy and scalability of previous techniques. Solano [26] combined hierarchical correlation clustering (HCC) and an artificial neural network (ANN) into a semi-supervised classification method (HHC-ANN) based on the Normalized Difference Vegetation Index (NDVI) and phenological characteristics; its classification accuracy (84%) surpassed that of both K-means and ANN. Neural networks offer multiple benefits, including adaptivity, speed, fault tolerance and optimality [27], so robust semi-supervised models of this kind can be chosen for efficient crop classification.
Unsupervised classification methods have been applied to efficiently process large numbers of unlabeled samples in remote sensing images. Such methods require no sample data and rely only on spectral or texture information to extract and partition image features according to their statistical characteristics. For example, Gumma [28] and Xiong [29] used K-means and ISODATA to cluster 250 m, 16-day Moderate-resolution Imaging Spectroradiometer (MODIS) NDVI time series, producing a rice distribution map of South Asia and a crop distribution map of Africa, respectively. Hao [30] and Cai [31] combined MODIS and Landsat images with field data to derive NDVI time series, from which the spatial distribution of key crops in northeast China and Hubei Province was extracted using the ISODATA algorithm and spectral coupling technology. Wang [32] used K-means and a Gaussian mixture model (GMM) for unsupervised crop classification in the midwestern United States, with CDL data as ground truth, and found that classification accuracy varied with vegetation density. Iounousse [33] proposed an unsupervised approach based on a Probabilistic Neural Network (PNN) combined with a cluster validity technique, and applied it to a sequence of seven NDVI time-series images acquired by Landsat and Système Probatoire d’Observation de la Terre (SPOT) to build a land use map; compared with actual land use, the map had an overall accuracy of 96.56%. Venkata [34] combined K-means and a probabilistic neural network into a new classification method with improved accuracy.
Unsupervised classification mapping does not require a large number of ground samples, and the traditional K-means and ISODATA methods in particular are widely used in land cover and crop classification [28,32,35]. However, these algorithms are highly sensitive to outliers, high dimensionality and noise, so the dimensionality of remote sensing images should be reduced before classification to improve accuracy. Principal component analysis (PCA) retains as much of the original information as possible while removing noise, and has been applied to classify land cover from satellite image time series [36,37,38,39,40]. For example, Abedini [39] classified land in the Ulu Kinta Catchment in the state of Perak and across Peninsular Malaysia from Landsat Thematic Mapper (TM) images, using PCA to determine the optimal band combination followed by ISODATA clustering, with an overall accuracy of 80%. Dharani [41] applied PCA dimensionality reduction and morphological operations to Landsat-8 images and retained the first three principal components for K-means clustering to detect changes in land use and land cover (LULC), with promising results. Therefore, to obtain high-precision crop distribution maps, PCA can likewise be used to produce low-dimensional, high-quality feature data before classification.
Clustering algorithms such as K-means and ISODATA randomly select K0 samples from the dataset as the initial cluster centers, and their results are sensitive to this initialization; multiple runs are therefore needed to avoid local optima [42,43,44]. Data binning, commonly used in statistics, is often used to merge similar pixels in image processing, and we attempted to integrate it with PCA for classification. Although PCA and data binning have each been widely applied in different fields, to our knowledge combining them for crop classification in remote sensing is new. The principal components, regarded as continuous variables, are divided into a number of bins. Unlike K-means and ISODATA, binning does not require multiple iterations per pixel, which makes it suitable for crop classification from high-resolution remote sensing images with large numbers of pixels.
Therefore, in the current paper, we propose a crop classification method based on principal components isometric binning (PCIB), taking the southwest of Hulin City and Luobei County of Hegang City, Heilongjiang Province, China, as the study areas. We explore the feasibility of the proposed method through comparisons with the traditional random forest, K-means and ISODATA approaches, and demonstrate that PCIB is a promising approach for crop classification mapping that does not require current-season sample data.

2. Materials

2.1. Study Area

The southwest of Hulin City and Luobei County in Heilongjiang Province, with a total area of 3000 km², were selected as the study areas (Figure 1). The planting calendar is relatively stable, with small inter-annual differences. The growing season of the key crops generally runs from March to October. For example, maize is sown from late April to early May, matures in September and is harvested around October, whereas rice is planted in April and matures in September.
The two study areas are located in the main grain-producing region of Heilongjiang Province, where the fields contain few other ground objects. Using two areas also allows regional differences to be compared and the feasibility of the proposed method to be verified.

2.2. Data Sources

2.2.1. Multitemporal GF-1 Data

The GF-1 wide field view (WFV) camera has a revisit cycle of 4 days and a spatial resolution of 16 m, combining a large coverage area with high spatial and temporal resolution. We selected WFV images acquired during the main growing seasons of maize and rice in 2016 and 2017 (April–September) with cloud cover below 10% (Table 1). The data were obtained from the China Centre for Resource Satellite Data and Application (CRESDA) as level-one (L1) products [18].

2.2.2. Field Sample Data

Field samples were used to verify the accuracy of the proposed classification approach. For each sample, latitude and longitude were recorded with a handheld global positioning system (GPS) receiver, the vegetation type was noted manually and photos were taken. Because field plots in Heilongjiang are large, most samples were taken along field roadsides. In addition to maize and rice, we also collected samples of wheat plots, greenhouses, woodland, water bodies and other land cover types. As these types were not the focus of the study, their sampling was limited, and they were combined into an “others” category during classification.
A total of 322 and 455 samples were collected in the southwest of Hulin City in 2016 and 2017, respectively, and 383 (2016) and 406 (2017) samples in Luobei County, Hegang City (Table 2). For the supervised classifier, the samples of each year were randomly divided into training samples (2/3 of the total) and verification samples (1/3 of the total). The unsupervised methods were evaluated with the same verification samples as the supervised classification, and one-third of the training samples, drawn at random, were used to determine the category labels of the unsupervised classification results.
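The sample partition above can be sketched as follows. This is a minimal illustration with NumPy; the rounding of the 2/3 and 1/3 fractions to whole samples, the random seed and the function name `split_samples` are our assumptions, not details given in the paper.

```python
import numpy as np

def split_samples(n_samples, seed=0):
    """Hypothetical helper: randomly split sample indices into
    training (2/3), verification (1/3) and, for the unsupervised
    methods, a labelling subset (1/3 of the training samples)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = round(n_samples * 2 / 3)        # rounding rule is an assumption
    train, verify = order[:n_train], order[n_train:]
    n_label = round(len(train) / 3)
    label = rng.choice(train, size=n_label, replace=False)
    return train, verify, label

# southwest of Hulin City, 2016: 322 samples
train, verify, label = split_samples(322)
print(len(train), len(verify), len(label))  # 215 107 72
```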

3. Methods

The workflow of the proposed method comprises five components: data preprocessing, feature selection, comparison of classification methods, category label determination and accuracy evaluation (Figure 2). In brief, the GF-1 WFV automatic processing and sharing platform developed by our team was used to preprocess the image and sample data. Near-infrared (NIR) reflectance and the vegetation indices NDVI and Normalized Difference Water Index (NDWI) were then selected as the classification features. The images were classified using the random forest, K-means, ISODATA and PCIB techniques, category labels were determined for the K-means, ISODATA and PCIB results, and finally the results were verified and compared across methods.

3.1. Data Preprocessing

The GF-1 images and field sample data were stored using the Raster Dataset Clean and Reconstitution Multi-Grid (RDCRMG) grid system developed by China Agricultural University. Radiometric calibration, orthorectification and image registration were performed for all data using C# and the Geospatial Data Abstraction Library (GDAL) [45,46]. Radiometric calibration removes errors generated by the sensor and converts the dimensionless digital number (DN) recorded by the sensor into radiance or top-of-atmosphere reflectance. Using the official radiometric calibration coefficients updated by the China Resources Satellite Application Center, the GF-1 data were radiometrically calibrated as follows:
$$L_e(\lambda_e) = \mathrm{Gain} \cdot \mathrm{DN} + \mathrm{Offset}$$
where $L_e(\lambda_e)$ is the radiance after conversion, DN is the digital number recorded by the satellite payload, Gain is the calibration slope and Offset is the absolute calibration offset.
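A minimal sketch of this linear calibration, assuming the image is held as a NumPy array of shape (bands, rows, cols). The gain and offset values below are placeholders for illustration only, not the official GF-1 WFV coefficients, which must be taken from CRESDA for the specific sensor and year.

```python
import numpy as np

def dn_to_radiance(dn, gain, offset):
    """Apply L_e = Gain * DN + Offset band by band.

    dn:     array of shape (bands, rows, cols) with raw digital numbers
    gain:   one calibration slope per band
    offset: one absolute calibration offset per band
    """
    dn = np.asarray(dn, dtype=np.float64)
    gain = np.asarray(gain, dtype=np.float64).reshape(-1, 1, 1)      # broadcast over pixels
    offset = np.asarray(offset, dtype=np.float64).reshape(-1, 1, 1)
    return gain * dn + offset

# Placeholder coefficients for a 4-band WFV scene: NOT official values.
gain = [0.2, 0.17, 0.15, 0.18]
offset = [0.0, 0.0, 0.0, 0.0]
dn = np.full((4, 2, 2), 100, dtype=np.uint16)
radiance = dn_to_radiance(dn, gain, offset)
```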

3.2. Feature Selection for Classification

The selection of features for remote sensing image classification depends on the classification objectives and required accuracy, as well as on the spatial and temporal scales. Potential features for crop classification include spectral, spatial, temporal and polarization characteristics, a digital elevation model (DEM) and other auxiliary information [5,11]. To identify maize and rice within the study areas, we selected the NIR spectral signal, the vegetation indices NDVI and NDWI, and temporal characteristics as the classification feature variables (Table 3).
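As an illustration of how per-pixel feature vectors could be assembled, the sketch below stacks NIR, NDVI and NDWI for each acquisition date. It assumes reflectance arrays for the green, red and NIR bands. NDVI = (NIR − Red)/(NIR + Red) is standard; because GF-1 WFV has no shortwave-infrared band, we assume the green/NIR form of NDWI (McFeeters), (Green − NIR)/(Green + NIR). The exact feature list used in the paper is the one in Table 3, and the function names here are hypothetical.

```python
import numpy as np

def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0 where the denominator is 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def build_features(series):
    """series: list of (green, red, nir) reflectance arrays, one tuple per date.
    Returns an (m pixels, 3 * n_dates) matrix: NIR, NDVI, NDWI for each date."""
    columns = []
    for green, red, nir in series:
        columns.append(np.ravel(nir))
        columns.append(np.ravel(normalized_difference(nir, red)))    # NDVI
        columns.append(np.ravel(normalized_difference(green, nir)))  # NDWI (assumed McFeeters form)
    return np.column_stack(columns)

# two dates of a 2 x 2 pixel tile with constant illustrative reflectances
g, r, n = np.full((2, 2), 0.1), np.full((2, 2), 0.05), np.full((2, 2), 0.4)
F = build_features([(g, r, n), (g, r, n)])
```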
Figure 3 presents the NDVI and NDWI time-series curves of rice and maize over the growth period in Luobei County, Hegang City, in 2016. Because the growth periods of rice and maize are very close, their NDVI and NDWI curves are highly similar: the two crops differ greatly in the early growth stage but only slightly in the middle and late stages.

3.3. Random Forest Classification

We performed supervised classification of the target crops by training a random forest on the features extracted from the GF-1 time series. Random forest (RF), proposed by Breiman [47] in 2001, is a classification method based on multiple decision trees: multiple classification and regression trees (CART) are constructed by randomly resampling the data and feature variables, and the data are then classified by voting among the trees [48].
We used the Python library scikit-learn [49] to implement the random forest classifier with 100 trees. The number of features considered at each split equals the square root of the number of original features, and each tree is trained on a bootstrap sample of the same size as the training set.
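The random forest configuration described above could be expressed in scikit-learn roughly as follows. The data here are synthetic stand-ins (random features and labels), so the sketch shows only the parameter settings, not the paper's data or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-ins for the per-pixel feature matrix and crop labels
X_train = rng.normal(size=(300, 12))
y_train = rng.integers(0, 3, size=300)      # 0 = maize, 1 = rice, 2 = others
X_verify = rng.normal(size=(100, 12))

rf = RandomForestClassifier(
    n_estimators=100,       # 100 trees
    max_features="sqrt",    # sqrt(n_features) candidate features at each split
    bootstrap=True,         # each tree is trained on a bootstrap sample ...
    max_samples=None,       # ... of the same size as the training set
    random_state=0,
)
rf.fit(X_train, y_train)
pred = rf.predict(X_verify)
```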

3.4. Unsupervised Classification

We performed K-means and ISODATA unsupervised learning for the clustering of feature data into crop types using ENVI (ver. 5.1, ESRI). K-means and ISODATA are the most basic and commonly used unsupervised classification algorithms. They are simple in principle and easy to implement, and are widely employed in the remote sensing field [28,29,30,35,39].

3.4.1. K-Means

K-means partitions m samples into k clusters by alternately assigning samples to the nearest cluster centroid, as measured by Euclidean distance. The cluster centroids are updated using the mean of the samples assigned to the cluster.
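To make the alternating assignment and update steps concrete, here is a minimal NumPy sketch of K-means (Lloyd's algorithm). The paper used ENVI's implementation; this code is an independent illustration, and its random initialization, iteration cap and convergence test are our assumptions.

```python
import numpy as np

def nearest_centroid(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every sample."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Partition the m rows of X into k clusters with Lloyd's algorithm."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # random initial centroids drawn from the samples (assumption of this sketch)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(X, centroids)       # assignment step
        new = centroids.copy()
        for j in range(k):                            # update step: cluster means
            members = X[labels == j]
            if len(members):                          # keep an emptied centroid in place
                new[j] = members.mean(axis=0)
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return nearest_centroid(X, centroids), centroids

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, 2)
```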

3.4.2. ISODATA

ISODATA first computes class means evenly distributed in the data space and then iteratively clusters the remaining pixels using the minimum-distance rule. In each iteration, the class means are recalculated and pixels are reclassified according to the new means; unlike K-means, ISODATA can also split classes with a large spread and merge classes whose means are close. The process continues until the change in the number of pixels in each class falls below the selected threshold or the maximum number of iterations is reached. The parameters required for ISODATA classification include the initial cluster centers and the number of classes [50].
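The sketch below illustrates the ISODATA loop in simplified form. It is not ENVI's implementation: the thresholds `min_size`, `max_std` and `min_dist` are hypothetical parameters, splitting and merging are performed in every iteration rather than on alternate iterations, and, for simplicity, the loop stops when the class means stop moving instead of applying a pixel-change threshold.

```python
import numpy as np

def assign(X, centers):
    """Minimum-distance rule: index of the nearest class mean for every pixel."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def isodata(X, k_init=2, n_iter=20, min_size=2, max_std=2.0, min_dist=1.5):
    """Simplified ISODATA: assign, update means, discard small classes,
    split spread-out classes, merge close classes; stop when means are stable."""
    X = np.asarray(X, dtype=np.float64)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # initial means spaced evenly along the diagonal of the data's bounding box
    centers = lo + np.linspace(0.0, 1.0, k_init)[:, None] * (hi - lo)
    labels = assign(X, centers)
    prev = None
    for _ in range(n_iter):
        # 1. keep only classes with at least min_size pixels
        sizes = np.bincount(labels, minlength=len(centers))
        keep = np.flatnonzero(sizes >= min_size)
        if len(keep) == 0:
            keep = np.array([sizes.argmax()])
        # 2. recompute means; split a class whose largest per-band std exceeds max_std
        new_centers = []
        for j in keep:
            members = X[labels == j]
            mean, std = members.mean(axis=0), members.std(axis=0)
            band = std.argmax()
            if std[band] > max_std:
                step = np.zeros_like(mean)
                step[band] = std[band]
                new_centers += [mean - step, mean + step]
            else:
                new_centers.append(mean)
        centers = np.array(new_centers)
        # 3. merge the closest pair of means while they are nearer than min_dist
        while len(centers) > 1:
            d = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
            np.fill_diagonal(d, np.inf)
            a, b = np.unravel_index(d.argmin(), d.shape)
            if d[a, b] >= min_dist:
                break
            merged = (centers[a] + centers[b]) / 2.0
            centers = np.vstack([np.delete(centers, [a, b], axis=0), merged])
        # 4. reclassify every pixel with the new means
        labels = assign(X, centers)
        if prev is not None and prev.shape == centers.shape and np.allclose(prev, centers):
            break
        prev = centers
    return labels, centers

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels, centers = isodata(X, k_init=1)   # one initial class is split into two
```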

3.5. Principal Components Isometric Binning Classification

The PCIB proposed in this paper is based on PCA dimensionality reduction and principal components isometric binning. Figure 4 presents the classification process.

3.5.1. PCA Dimensionality Reduction

Key applications of principal component analysis (PCA) include compressing data and reducing its dimensionality, and converting multiple correlated numerical variables into a few uncorrelated composite indicators, the principal components of the original variables. Each principal component is a linear combination of the original variables, and the principal components are mutually uncorrelated.
Let m denote the number of pixels in the target area and n the number of feature variables. The data can then be represented by the $m \times n$ matrix X:
$$X = \begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{bmatrix}$$
where $x_{mn}$ is the value of the n-th feature for the m-th pixel. The $n \times n$ correlation coefficient matrix R of the features is then calculated as:
$$R = \begin{bmatrix} r_{11} & \cdots & r_{1n} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & r_{nn} \end{bmatrix}$$
where $r_{ij}$ $(i, j = 1, 2, \ldots, n)$ is the correlation coefficient between features $x_i$ and $x_j$ (the i-th and j-th columns of X). The characteristic equation $|\lambda E - R| = 0$, where E is the $n \times n$ identity matrix, is solved for the eigenvalues $\lambda_j$ $(j = 1, 2, \ldots, n)$, ordered so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$; $e_j$ $(j = 1, 2, \ldots, n)$ is the eigenvector corresponding to $\lambda_j$. When the cumulative contribution rate of the leading principal components obtained by this eigenvector transformation is sufficiently high, the required dimensionality reduction is achieved.
In this study, the cumulative contribution rate is used to determine k: once the cumulative contribution rate of the first k principal components exceeds 70%, these k components are retained, reducing the data to k dimensions. The cumulative contribution rate of the first k principal components is defined as
$$\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} \lambda_i},$$
i.e., the proportion of the information (total variance) of the original variables that is captured by the first k principal components.
PCA can be used for dimensionality reduction, classification and feature extraction [51,52].
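The PCA step and the 70% rule can be sketched in NumPy as follows, assuming the features are standardized so that the eigen-decomposition is performed on the correlation matrix R, as described above. The function name is hypothetical, and features with zero variance would need to be removed beforehand.

```python
import numpy as np

def pca_by_contribution(X, threshold=0.70):
    """PCA on the correlation matrix of X (m pixels x n features).

    Keeps the first k components whose cumulative contribution rate
    sum(lambda_1..lambda_k) / sum(lambda_1..lambda_n) reaches `threshold`.
    Returns the m x k score matrix Y, k, and the cumulative contribution rates.
    """
    X = np.asarray(X, dtype=np.float64)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each feature
    R = np.corrcoef(X, rowvar=False)                 # n x n correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)             # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                # reorder so lambda_1 >= ... >= lambda_n
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(contribution, threshold)) + 1, len(eigvals))
    Y = Z @ eigvecs[:, :k]                           # principal component scores
    return Y, k, contribution

# three strongly correlated synthetic features plus one independent feature
rng = np.random.default_rng(1)
t = rng.normal(size=500)
X = np.column_stack([t,
                     2 * t + 0.01 * rng.normal(size=500),
                     -t + 0.01 * rng.normal(size=500),
                     rng.normal(size=500)])
Y, k, contribution = pca_by_contribution(X)
```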

3.5.2. Principal Components Isometric Binning

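We reduce the original dataset to k dimensions to obtain the matrix Y:
$$Y = \begin{bmatrix} r_{11} & \cdots & r_{1k} \\ \vdots & \ddots & \vdots \\ r_{m1} & \cdots & r_{mk} \end{bmatrix}$$
where k is the number of principal components retained after dimensionality reduction, m is the number of pixels in the study area, and the entry $r_{ij}$ of Y here denotes the score of the i-th pixel on the j-th principal component (not the correlation coefficient defined above). Matrix Y is divided into bins of equal width, giving $k_1$ bins in total. The detailed steps are as follows: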
(i) The first principal component $(r_{11}, \ldots, r_{m1})^{\mathrm{T}}$ is divided into $k_{11}$ intervals, and all pixels falling into each interval are gathered together. The bin width is $h_1 = \frac{r_{j1} - r_{i1}}{k_{11}}$, where $r_{j1}$ and $r_{i1}$ are the maximum and minimum values of the column.
(ii) The second principal component $(r_{12}, \ldots, r_{m2})^{\mathrm{T}}$ is divided into the $k_{11}$ groups given by the binning of the first principal component, and each group is further divided into $k_{12}$ sub-intervals. All pixels in each sub-interval form one bin, with bin width $h_2 = \frac{r_{j2} - r_{i2}}{k_{12}}$, where $r_{j2}$ and $r_{i2}$ are the maximum and minimum values of the column.
(iii) The procedure continues up to the k-th principal component $(r_{1k}, \ldots, r_{mk})^{\mathrm{T}}$, giving $k_1 = k_{11} \times k_{12} \times \cdots \times k_{1k}$ bins in total. A frequency distribution histogram is drawn to visualize the binning.
Inspection of the first binning showed that bins containing a large number of pixels tend to be confused, i.e., they mix several land cover types. To improve classification accuracy, these bins are binned again: each is divided into $k_2$ $(k_2 = k_{21} \times k_{22} \times \cdots \times k_{2k})$ bins, following the same steps as the first binning.
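A minimal NumPy sketch of the two binning rounds is given below. It assumes that each bin width uses the global minimum and maximum of its column, as in steps (i) to (iii), which makes the nested procedure equivalent to a regular k-dimensional grid; for the second binning, it assumes that the value range is recomputed within each confusion bin. Function names are hypothetical.

```python
import numpy as np

def isometric_bins(Y, counts):
    """First binning: equal-width bins on each retained component.

    Y:      (m, k) matrix of principal component scores
    counts: (k_11, ..., k_1k), number of intervals per component
    Returns one bin id per pixel in the range [0, k_11 * ... * k_1k).
    """
    Y = np.asarray(Y, dtype=np.float64)
    counts = np.asarray(counts, dtype=np.int64)
    lo, hi = Y.min(axis=0), Y.max(axis=0)
    width = (hi - lo) / counts                    # h_j = (max - min) / k_1j
    width[width == 0] = 1.0                       # constant component: everything in bin 0
    idx = np.floor((Y - lo) / width).astype(np.int64)
    idx = np.clip(idx, 0, counts - 1)             # the column maximum goes in the last bin
    return np.ravel_multi_index(idx.T, tuple(int(c) for c in counts))

def rebin(Y, bin_ids, confusion_bins, counts2):
    """Second binning: split each confusion bin into k_21 * ... * k_2k sub-bins.
    Assumption: the value range is recomputed inside each confusion bin."""
    bin_ids = np.asarray(bin_ids, dtype=np.int64)
    n_sub = int(np.prod(counts2))
    ids = bin_ids * n_sub                         # non-confusion bins keep sub-bin 0
    for b in confusion_bins:
        mask = bin_ids == b
        if mask.any():
            ids[mask] += isometric_bins(Y[mask], counts2)
    return ids

Y = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
ids = isometric_bins(Y, (2, 2))
ids2 = rebin(Y, ids, [0], (2, 1))
```

For example, `isometric_bins(Y, (12, 4))` would produce the 48 bins of the $k_1 = 12 \times 4$ case, and `np.bincount(ids, minlength=48)` gives the pixel counts behind a frequency histogram.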

3.6. Determination of Category Labels

The bins themselves do not correspond to crop types, so crop labels must be assigned to them. Visual interpretation was combined with a small amount of ground auxiliary data (the auxiliary identification samples described in Section 2.2.2) and carried out in ArcMap (ver. 10.2, ESRI) using a band 4-3-2 false-color composite and high-resolution Google Earth images.
In the first binning, bins of water bodies, grasslands, woodlands, bare land and towns (together denoted “other classes”) are discriminated visually, whereas rice and maize bins are named according to the auxiliary ground samples. A bin containing samples of a single type is denoted a single class; a bin containing samples of multiple types is denoted a confusion class. Because the accuracy of the first binning must be evaluated to determine the value of $k_1$, each confusion bin is also given a provisional label: the ground-object type that makes up the largest proportion of the bin.
We labelled the second binning result using both the aforementioned method and the auxiliary ground samples.
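The sample-based part of this labelling rule can be sketched in plain Python: each bin receives the majority type among the auxiliary samples it contains, and bins holding more than one type are flagged as confusion bins. The visual interpretation step cannot be automated this way, so bins without samples are simply left unlabelled here; the function names are hypothetical.

```python
from collections import Counter

def label_bins(sample_bins, sample_labels):
    """Label bins from the labelled auxiliary samples that fall inside them.

    Returns (labels, confusion): labels maps bin -> majority sample type
    (ties go to the type seen first), and confusion is the set of bins
    that contain more than one sample type.
    """
    counts = {}
    for b, lab in zip(sample_bins, sample_labels):
        counts.setdefault(b, Counter())[lab] += 1
    labels = {b: c.most_common(1)[0][0] for b, c in counts.items()}
    confusion = {b for b, c in counts.items() if len(c) > 1}
    return labels, confusion

def merge_bins(pixel_bins, labels, unlabelled="unlabelled"):
    """Merge bins that share a label: map every pixel's bin id to its class."""
    return [labels.get(b, unlabelled) for b in pixel_bins]

labels, confusion = label_bins([3, 3, 3, 5, 5, 7],
                               ["maize", "maize", "rice", "rice", "rice", "others"])
classes = merge_bins([3, 5, 9], labels)
```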

3.7. Accuracy Assessment

One-third of the samples collected in the field were used as verification samples, and the confusion matrix was used to evaluate the accuracy of the classification results. The evaluation indicators are the overall accuracy (OA), producer’s accuracy (PA), user’s accuracy (UA) and Kappa coefficient (K):
$$\mathrm{OA} = \frac{TP + TN}{N},$$
$$\mathrm{PA} = \frac{TP}{TP + FN},$$
$$\mathrm{UA} = \frac{TP}{TP + FP},$$
$$K = \frac{\mathrm{OA} - P_e}{1 - P_e},$$
$$P_e = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{N^2},$$
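where TP and FN are the numbers of samples whose true class is positive and that the model predicts as positive and negative, respectively; TN and FP are the numbers of samples whose true class is negative and that the model predicts as negative and positive, respectively; and N is the total number of verification samples. For more than two classes, OA is the sum of the diagonal of the confusion matrix divided by N, PA and UA are computed for each class from its row and column totals, and $P_e$ is the sum over classes of the products of the corresponding row and column totals, divided by $N^2$.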
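These indicators generalize directly to a multi-class confusion matrix. The NumPy sketch below assumes that rows hold the true classes and columns the predicted classes; for a 2 × 2 matrix [[TP, FN], [FP, TN]] it reduces exactly to the formulas for OA, PA, UA, $P_e$ and K given above. The function name is hypothetical.

```python
import numpy as np

def accuracy_report(cm):
    """OA, per-class PA and UA, and Kappa from a confusion matrix.

    cm[i, j] = number of verification samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=np.float64)
    n = cm.sum()
    row, col = cm.sum(axis=1), cm.sum(axis=0)   # true-class and predicted-class totals
    oa = np.trace(cm) / n                        # overall accuracy
    pa = np.diag(cm) / row                       # producer's accuracy per class
    ua = np.diag(cm) / col                       # user's accuracy per class
    pe = (row * col).sum() / n ** 2              # chance agreement P_e
    kappa = (oa - pe) / (1 - pe)
    return oa, pa, ua, kappa

# binary example: TP = 40, FN = 10, FP = 5, TN = 45
oa, pa, ua, kappa = accuracy_report([[40, 10], [5, 45]])
```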

4. Results

4.1. Effect of Parameter Selection

The selection of parameters is a critical step in our proposed method as it has a direct impact on the binning results. In the following, we detail the selection of two key user-defined parameters.
(1) Number of bins in the first binning, $k_1$
The choice of $k_1$ is critical. If $k_1$ is too small, the classification results will be highly inaccurate; if it is too large, visual recognition of the bins becomes difficult. We therefore set an initial range of $5 \le k_1 \le 50$ and used the number of principal components k to narrow it further. Based on the dimensionality-reduced dataset, only two cases are discussed here (k = 1, 2), taking the southwest of Hulin City as an example.
For k = 1, the data are divided into 5, 10, 15, …, 50 bins (a step of 5), and the classification accuracy is calculated for each value of $k_1$ (Figure 5, left). Accuracy increases with $k_1$ until it peaks at $k_1 = 35$; we therefore set $30 \le k_1 \le 40$.
For k = 2, the data are divided into $k_1 = k_{11} \times k_{12}$ bins. Because the contribution rate of the first principal component is greater than that of the second, we require $k_{11} > k_{12}$ together with $5 \le k_{11} \times k_{12} \le 50$; 51 combinations of $k_{11}$ and $k_{12}$ satisfy these conditions (Table 4). From these, 15 values of $k_1$ were selected at roughly uniform intervals, and Figure 5 (right) shows the classification accuracy for each. Accuracy peaks for $k_{12} = 3$ and $k_{12} = 4$, so the candidate values of $k_1$ are:
$$k_1 \in \begin{cases} \{12 \times 3, \ldots, 16 \times 3\}, & 12 \le k_{11} \le 16,\ k_{12} = 3 \\ \{5 \times 4, \ldots, 12 \times 4\}, & 5 \le k_{11} \le 12,\ k_{12} = 4 \end{cases}$$
The value of k1 that corresponds to the highest classification accuracy is selected as the final value.
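The count of 51 combinations in Table 4 can be checked by enumeration. The lower bound $k_{12} \ge 2$ is our assumption, since the text does not state it; it reproduces the 51 combinations, whereas also allowing $k_{12} = 1$ would give 97. The function name is hypothetical.

```python
def k1_combinations(lo=5, hi=50, k12_min=2):
    """All (k11, k12) with k11 > k12 >= k12_min and lo <= k11 * k12 <= hi."""
    return [(k11, k12)
            for k12 in range(k12_min, hi + 1)
            for k11 in range(k12 + 1, hi + 1)
            if lo <= k11 * k12 <= hi]

combos = k1_combinations()
print(len(combos))  # 51

# the candidate k_1 values retained after the accuracy sweep (k_12 = 3 or 4)
final = [(k11, 3) for k11 in range(12, 17)] + [(k11, 4) for k11 in range(5, 13)]
```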
(2) Number of bins in the second binning, $k_2$
The value range of $k_2$ is derived in the same way, again taking the southwest of Hulin City as an example. We initially set $3 \le k_2 \le 20$ and then determine $k_2$ according to the number of principal components k:
(i) When k = 1: $11 \le k_2 \le 16$
(ii) When k = 2:
$$k_2 \in \begin{cases} \{4 \times 3, \ldots, 6 \times 3\}, & 4 \le k_{21} \le 6,\ k_{22} = 3 \\ \{5 \times 4\}, & k_{21} = 5,\ k_{22} = 4 \end{cases}$$
The confusion bins obtained with the best $k_1$ (the value giving the highest classification accuracy) are divided again into $k_2$ bins, the accuracy of the resulting classification is evaluated, and the value of $k_2$ with the highest classification accuracy is selected.

4.2. Comparison of PCIB Results

Taking the southwest of Hulin City in 2016 as an example, we compared the results of the two rounds of binning.
After PCA of the multidimensional feature data, the cumulative contribution rate of the first two principal components reached 70%, so k = 2. The two-dimensional data were divided into $k_1$ bins; the corresponding frequency distribution histograms are shown in Figure 6.
Table 5 reports the number of pixels falling into each bin for $k_1 = 12 \times 4$.
A crop type label was assigned to each bin according to the method described in Section 3.6, and bins with the same label were merged to obtain the classification map for each value of $k_1$. The overall accuracies are reported in Table 6; the maximum, 75.23%, is obtained for $k_1 = 12 \times 4$.
The confusion bins for $k_1 = 12 \times 4$ were then binned a second time. Table 7 reports the overall classification accuracy for each value of $k_2$; it peaks at 81.65% for $k_2 = 5 \times 4$.
Figure 7 presents the corresponding 2016 crop distribution maps. Both Figure 7a and Figure 7b separate water from forest. However, in Figure 7a, roads and some rice were misclassified as maize. In Figure 7b, this misclassification is greatly reduced and the outlines of the rice plots are clear, indicating that the second binning improves crop classification accuracy.

4.3. Comparison of Classification Methods

We applied the random forest (RF), K-means, ISODATA and PCIB classifiers to the preprocessed images and used the confusion matrix computed from the verification samples to evaluate the classification results in terms of overall accuracy, user’s accuracy, producer’s accuracy and Kappa coefficient. In both the southwest of Hulin City and Luobei County of Hegang City, PCIB was generally slightly less accurate than random forest but consistently more accurate than K-means and ISODATA (Table 8). In 2016, PCIB achieved its highest accuracy in the southwest of Hulin City, 82%, equal to that of random forest and higher than those of K-means (76%) and ISODATA (78%). In Luobei County, Hegang City, the accuracy of PCIB dropped to 76% in 2017, and the other three methods were also less accurate there, at 79%, 74% and 75% for random forest, K-means and ISODATA, respectively. We attribute this to the limited image coverage for the region in 2017, which spans only early April to late June. Images from July, when rice is at the jointing stage and growing vigorously, are key to distinguishing maize from rice, so the absence of images from July to September reduced the final classification accuracy.
Although PCIB is thus generally slightly less accurate than random forest, it is consistently more accurate than K-means and ISODATA, and in years where ground and historical samples are insufficient its accuracy can potentially meet classification requirements.
Confusion matrices are commonly used to evaluate the accuracy of classification algorithms. However, because our verification sample is small, the confusion matrix alone is insufficient to evaluate classification accuracy, so we also examined the crop distribution maps to assess the quality of the results. Figure 8 shows the 2016 crop distribution maps for Luobei County superimposed on false-color images of the area; Figure 8a–d correspond to the random forest, K-means, ISODATA and PCIB results, respectively. All four methods distinguish water and forest from maize and rice, but the magnified area shows that roads were misclassified as maize more often by K-means, ISODATA and random forest, and that the rice plot outlines are unclear for K-means and ISODATA. PCIB produces cleaner results, with clearly identified rice plot contours, roads and buildings. We attribute this to the PCA dimensionality reduction performed at the start of PCIB, which reduces the influence of noise while retaining as much of the original information as possible, and thus yields a better crop distribution map than the other methods.

5. Discussion

5.1. Advantages, Deficiencies and Improvements of PCIB

We propose a classification method based on principal components isometric (equidistant) binning, denoted PCIB, for extracting crop planting structure. PCIB first applies PCA dimensionality reduction to the classification features and then divides the top k principal components into equidistant bins. The category of each bin is identified visually using a small number of field samples, and bins of the same category are merged into one class label. PCIB thus combines PCA and data binning. PCA has a low computational cost and involves no tuning parameters other than the number of retained components. As a dimensionality reduction technique, it can effectively remove redundancy and noise while retaining as much of the original information as possible, and it is therefore widely used in remote sensing [53,54,55]. Data binning is a common technique in statistics and is used in image processing to merge similar pixels. In statistics, nonparametric estimation methods [56], including histograms, kernel density estimation and nonparametric regression, are mainly used to bin datasets whose probability density is unknown. In this study, we chose the simplest of these, the histogram, which uses a constant bin width, bins data rapidly and efficiently, and is suitable for large data volumes.
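The PCIB pipeline described above can be sketched in Python with NumPy as follows. Several details are assumptions rather than the paper's specification: PCA is computed on mean-centered features via SVD, the bins form a regular k11 × k12 grid over the top two components with edges spanning each component's range, and the hypothetical helper `label_bins` replaces the visual interpretation step with a majority vote of the field samples in each bin. The refinement of confusion bins (k2) is not implemented.

```python
import numpy as np

def pca_scores(X, k):
    """Project rows of X (n_pixels x n_features) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                          # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T                             # n_pixels x k scores

def isometric_bins(scores, n_bins):
    """Equal-width binning of each component; returns one flat bin id per pixel."""
    idx = []
    for j, nb in enumerate(n_bins):
        col = scores[:, j]
        edges = np.linspace(col.min(), col.max(), nb + 1)
        idx.append(np.digitize(col, edges[1:-1]))    # bin index 0..nb-1
    return np.ravel_multi_index(tuple(idx), tuple(n_bins))

def label_bins(bin_ids, sample_idx, sample_labels, n_total_bins):
    """Hypothetical helper: give each bin the majority label of the field samples
    inside it (a stand-in for visual interpretation). Bins with no sample get -1."""
    sample_labels = np.asarray(sample_labels)
    bin_label = np.full(n_total_bins, -1)
    sample_bins = bin_ids[sample_idx]
    for b in np.unique(sample_bins):
        labels, counts = np.unique(sample_labels[sample_bins == b], return_counts=True)
        bin_label[b] = labels[np.argmax(counts)]
    return bin_label[bin_ids]                        # merged class map, one label per pixel

# Usage on synthetic data: two well-separated groups of "pixels" with 6 features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (200, 6)), rng.normal(3.0, 0.1, (200, 6))])
scores = pca_scores(X, k=2)
bins = isometric_bins(scores, n_bins=(12, 4))       # 12 x 4 = 48 bins, as in Table 5
samples = np.array([0, 1, 2, 200, 201, 202])         # indices of "field samples"
labels = np.array([0, 0, 0, 1, 1, 1])
class_map = label_bins(bins, samples, labels, n_total_bins=48)
```

Because each pixel is assigned to a bin by comparing it with fixed edges, the binning step scales linearly with the number of pixels. Bins without field samples stay unlabeled (-1) in this sketch and would still need interpretation in practice.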
In recent years, soft clustering methods such as the Gaussian mixture model (GMM) and the probabilistic neural network (PNN) have gradually been applied to land cover and crop type mapping in remote sensing, with promising classification results. GMM has a long run time because it includes the search for the optimal components [32,57]. PNN is a special form of radial basis function neural network (RBFNN) and provides satisfactory results if the initial target classes are defined correctly. Determining the basis function centers and their appropriate number is therefore an important step toward suitable classification. However, this process is demanding in both time and space complexity, so computational optimization is essential [33,34,58,59,60]. Compared with these clustering methods, PCIB is computationally efficient but somewhat less accurate. To address this, future research will explore adaptive methods such as kernel density estimation and nonparametric regression to partition the dataset and determine the bins automatically.
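As one possible realization of the future work mentioned above (an illustration under stated assumptions, not part of PCIB as evaluated in this paper), bin edges along a principal component could be placed at the valleys of a Gaussian kernel density estimate instead of at equal intervals. The sketch assumes Silverman's rule-of-thumb bandwidth and a 512-point evaluation grid; both choices are arbitrary.

```python
import numpy as np

def kde_bin_edges(values, grid_size=512, bandwidth=None):
    """Place bin edges at local minima of a Gaussian kernel density estimate."""
    values = np.asarray(values, dtype=float)
    if bandwidth is None:
        # Silverman's rule of thumb (an assumption; other selectors exist)
        bandwidth = 1.06 * values.std() * len(values) ** (-1 / 5)
    grid = np.linspace(values.min(), values.max(), grid_size)
    # Density on the grid: average of Gaussian kernels centered on each value
    z = (grid[:, None] - values[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1) / (len(values) * bandwidth * np.sqrt(2 * np.pi))
    # Interior grid points lower than both neighbours are valleys between modes
    is_min = (density[1:-1] < density[:-2]) & (density[1:-1] < density[2:])
    return grid[1:-1][is_min]

# Usage: a bimodal synthetic "first principal component"
rng = np.random.default_rng(0)
pc1 = np.concatenate([rng.normal(-3, 0.5, 500), rng.normal(3, 0.5, 500)])
edges = kde_bin_edges(pc1)
bin_ids = np.digitize(pc1, edges)    # adaptive bins delimited by density valleys
```

The dense kernel evaluation costs O(grid size × number of values), so for full scenes with millions of pixels one would subsample or use a binned KDE first.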

5.2. Analysis of Sources of the Errors

In this study, the classification accuracy for Luobei County, Hegang City in 2017 was low, which we attribute to two factors. First, the available images for the region in 2017 cover only early April to late June. In July, rice is at the jointing stage with vigorous growth, so images from this period may help distinguish maize from rice; the absence of images from July to September therefore affected the classification accuracy. Second, the planting structure of Luobei County was adjusted in 2017: the maize planting area was reduced, and small areas of soybeans, grains, fruits and vegetables were planted instead. This increased crop diversity in the region and had some impact on the classification results [32].
Therefore, we will consider fusing GF-1, Sentinel-2 and Landsat-8 medium-resolution images, since multiple data sources can alleviate the problem of missing images. In addition, both Sentinel-2 and Landsat-8 provide short-wave infrared bands, which perform well in crop classification [9].

5.3. Comparison of Computational Complexity of Three Clustering Algorithms

Computational complexity is an important evaluation indicator for an algorithm. We therefore compared the clustering times of the PCIB, K-means and ISODATA algorithms. To make the comparison more complete, we used a common clustering benchmark, the Iris dataset. The experiments were run on Windows 10 Professional with a 1.60 GHz Intel Core i5-8250U processor. The clustering times of the three methods are shown in Table 9.
K-means clusters faster than ISODATA because the latter automatically merges and splits classes during clustering, which makes the algorithm more complex. The PCIB clustering time is far shorter than those of K-means and ISODATA (0.006 s versus 0.029 s and 0.059 s; Table 9), because PCIB does not require multiple iterations per pixel and can therefore classify rapidly.
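To illustrate why binning avoids the per-iteration cost of distance-based clustering, the sketch below times a plain Lloyd's K-means (up to 20 iterations, mirroring the maximum in Table 9) against single-pass equal-width binning on synthetic two-dimensional data. It is an illustration under assumptions: it does not use the Iris dataset, it omits ISODATA, and absolute timings depend on the machine.

```python
import time
import numpy as np

def kmeans(X, k, max_iter=20, seed=0):
    """Plain Lloyd's algorithm: every iteration computes n x k Euclidean distances."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # n x k
        labels = d.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def equal_width_bins(X, n_bins):
    """One pass over the pixels: each value is mapped to its bin arithmetically."""
    n_bins = np.asarray(n_bins)
    lo, hi = X.min(axis=0), X.max(axis=0)
    width = (hi - lo) / n_bins
    idx = np.minimum(((X - lo) / width).astype(int), n_bins - 1)   # clip the maximum value
    return np.ravel_multi_index(tuple(idx.T), tuple(n_bins))

rng = np.random.default_rng(1)
X = rng.normal(size=(100_000, 2))      # synthetic "principal component scores"

t0 = time.perf_counter()
km_labels = kmeans(X, k=3)
t_kmeans = time.perf_counter() - t0

t0 = time.perf_counter()
bins = equal_width_bins(X, (12, 4))
t_bins = time.perf_counter() - t0
print(f"K-means: {t_kmeans:.4f} s  equal-width binning: {t_bins:.4f} s")
```

Each K-means iteration evaluates n × k distances and repeats until convergence, whereas the binning step touches every value once, which is the source of the linear-time behaviour noted for PCIB.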

5.4. Additional Experiments

We chose Luobei County and the southwest of Hulin City in Heilongjiang Province as the study areas. Both regions have a temperate monsoon climate and a single cropping season per year. Their planting structure is relatively simple, and the main crops are maize and rice. However, because crop characteristics vary across planting regions, we also selected the northwest of Wuwei City, Gansu Province (Figure 9) as a study area for additional experiments. This region has a temperate continental arid climate, characterized by a large diurnal temperature range, abundant sunshine, drought and little rainfall. With its highly favorable geographical, soil, water and climatic resources, the region provides a good planting environment for crops such as maize, pear, wheat and grape.
We again selected GF-1 WFV images with cloud cover below 10% acquired during the main growing season of the main crops in 2018 (April–September) as the remote sensing data source. The acquisition dates are listed in Table 10.
The samples for this region were also obtained from field surveys. Sampling points are regularly distributed within the area, and the collected data include crop type, growth status, geographical coordinates and field photographs. The numbers of sample points are shown in Table 11. The "Others" class includes buildings, water and small amounts of pepper, soybeans and greenhouses.
The region was classified with the PCIB method described above. The classification results are shown in Figure 10, and the classification accuracies are shown in Table 12; the overall accuracy reached 84%. The region is rich in crop types, including maize, spring wheat, grapes and pears. Because crop planting is concentrated, good classification results were obtained. This supports the feasibility of the PCIB method in areas with more diverse crop types.

6. Conclusions

In this paper, we propose a classification method, PCIB, that uses GF-1 remote sensing images acquired during the growing season as the data source to extract planting information for the key crops in the southwest of Hulin City and Luobei County, Heilongjiang Province, China. The motivation of this work is to reduce the dependence of crop distribution mapping on large numbers of field samples and to obtain the spatial distribution of crops in a timely manner. We compared PCIB with the traditional random forest, K-means and ISODATA classifiers. The key results can be summarized as follows:
(1)
The overall accuracy of PCIB in the southwest of Hulin City in 2016 reached 82%, exceeding those of K-means (76%) and ISODATA (78%). Luobei County of Hegang City in 2017 yielded the lowest PCIB classification accuracy (76%), and the other three methods were also less accurate there (79%, 74% and 75% for random forest, K-means and ISODATA, respectively). Although the overall accuracy of PCIB is slightly lower than that of random forest, it meets mapping accuracy requirements in years when large numbers of field samples are unavailable.
(2)
PCIB performs isometric binning of the top k principal components directly after PCA dimensionality reduction. It does not require multiple iterations per pixel, and its time complexity is linear in the number of pixels. This improves computational efficiency compared with the Euclidean distance-based K-means and ISODATA classifiers.
(3)
PCIB reduces the dependence of classification on large numbers of field samples and determines the spatial distribution of crops in a timely and accurate manner. The proposed method can potentially be applied to crop classification mapping.
The PCIB method proposed in this paper can obtain the spatial distribution of crops accurately and in a timely manner, which is of great significance for guiding agricultural production. The classification results can be used, for example, to estimate crop evapotranspiration and planting area.

Author Contributions

Conceptualization, Shaoming Li; Formal analysis, Zhe Ma and Lin Zhang; Investigation, Zhe Ma and Lin Zhang; Methodology, Zhe Ma; Data curation, Diyou Liu; Project administration, Zhe Liu; Resources, Yuanyuan Zhao; Software, Diyou Liu; Validation, Zhe Ma; Visualization, Tianwei Ren; Writing – original draft, Zhe Ma; Writing – review and editing, Zhe Liu and Yuanyuan Zhao; Funding acquisition, Xiaodong Zhang. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 41771104.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yang, B.J. Remote Sensing Monitoring of Agricultural Conditions, 1st ed.; China Agricultural Press: Beijing, China, 2005.
2. Song, X.; Potapov, P.V.; Krylov, A.; King, L.; Di Bella, C.M.; Hudson, A.; Khan, A.; Adusei, B.; Stehman, S.V.; Hansen, M.C. National-scale soybean mapping and area estimation in the United States using medium resolution satellite imagery and field survey. Remote Sens. Environ. 2017, 190, 383–395.
3. Guo, W.; Zhao, C.Y.; Gu, X.H.; Huang, W.J.; Ma, Z.H. Remote sensing monitoring of maize planting area at town level. Trans. CSAE 2011, 27, 69–74.
4. Li, Y.; Zhu, Y.; Dai, T.; Tian, Y.; Cao, W. Quantitative relationships between leaf area index and canopy reflectance spectra of wheat. Chin. J. Appl. Ecol. 2006, 17, 1443–1447.
5. Wardlow, B.D.; Egbert, S.L.; Kastens, J.H. Analysis of time-series MODIS 250 m vegetation index data for crop classification in the U.S. Central Great Plains. Remote Sens. Environ. 2007, 108, 290–310.
6. Yang, Y.J.; Zhan, Y.L.; Tian, Q.J.; Gu, X.F.; Yu, T.; Wang, L. Crop classification based on GF-1/WFV NDVI time series. Trans. CSAE 2015, 31, 155–161.
7. Hao, P.; Wang, L.; Zhan, Y.; Niu, Z. Using Moderate-Resolution Temporal NDVI Profiles for High-Resolution Crop Mapping in Years of Absent Ground Reference Data: A Case Study of Bole and Manas Counties in Xinjiang, China. ISPRS Int. J. Geo-Inf. 2016, 5, 67.
8. Yang, N.; Liu, D.; Feng, Q.; Xiong, Q.; Zhang, L.; Ren, T.; Zhao, Y.; Zhu, D.; Huang, J. Large-Scale Crop Mapping Based on Machine Learning and Parallel Computation with Grids. Remote Sens. 2019, 11, 1500.
9. Cai, Y.; Guan, K.; Peng, J.; Wang, S.; Seifert, C.; Wardlow, B.; Li, Z. A high-performance and in-season classification system of field-level crop types using time-series Landsat data and a machine learning approach. Remote Sens. Environ. 2018, 210, 35–47.
10. Zhang, L.; Liu, Z.; Ren, T.; Liu, D.; Ma, Z.; Tong, L.; Zhang, C.; Zhou, T.; Zhang, X.; Li, S. Identification of Seed Maize Fields with High Spatial Resolution and Multiple Spectral Remote Sensing Using Random Forest Classifier. Remote Sens. 2020, 12, 362.
11. Li, H.; Zhang, C.; Zhang, S.; Peter, M. Crop classification from full-year fully-polarimetric L-band UAVSAR time-series using the Random Forest algorithm. Int. J. Appl. Earth Obs. 2020, 87, 102032.
12. Mmamokoma Grace, M.; Adriaan van, N.; Zama Eric, M. Pre-harvest classification of crop types using a Sentinel-2 time-series and machine learning. Comput. Electron. Agric. 2020, 169, 105164.
13. Zhong, L.H.; Hu, L.N.; Zhou, H. Deep learning based multi-temporal crop classification. Remote Sens. Environ. 2019, 221, 430–443.
14. Garnot, V.S.F.; Landrieu, L.; Giordano, S.; Chehata, N. Time-Space Tradeoff in Deep Learning Models for Crop Classification on Satellite Multi-Spectral Image Time Series. In Proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2019), Yokohama, Japan, 28 July–2 August 2019; pp. 6247–6250.
15. Hu, Q.; Wu, W.; Song, Q.; Lu, M. How do temporal and spectral features matter in crop classification in Heilongjiang Province, China? JIA 2017, 16, 324–336.
16. Gallego, J.; Craig, M.; Michaelsen, J. Best Practices for Crop Area Estimation with Remote Sensing; GEOSS: Ispra, Italy, 2008.
17. Hao, P.; Wang, L.; Zhan, Y.; Wang, C.; Niu, Z.; Wu, M. Crop classification using crop knowledge of the previous-year: Case study in Southwest Kansas, USA. Eur. J. Remote Sens. 2016, 49, 1061–1077.
18. Zhang, L.; Liu, Z.; Liu, D.; Xiong, Q.; Yang, N. Crop Mapping Based on Historical Samples and New Training Samples Generation in Heilongjiang Province, China. Sustainability 2019, 11, 5052.
19. Lorenzo, B.; Luis, G.; Gustavo, C.; Javier, C. Mean Map Kernel Methods for Semi-supervised Cloud Classification. IEEE Trans. Geosci. Remote Sens. 2010, 48, 207–220.
20. Liu, Y.; Zhang, B.; Wang, L.M.; Wang, N. A self-trained semi-supervised SVM approach to the remote sensing land cover classification. Comput. Geosci. 2013, 59, 98–107.
21. Ghoggali, N.; Melgani, F. Genetic SVM Approach to Semi-supervised Multi-temporal Classification. IEEE Geosci. Remote Sens. Lett. 2008, 5, 212–216.
22. Bruzzone, L.; Chi, M.; Marconcini, M. A Novel Transductive SVM for Semi-supervised Classification of Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens. 2006, 44, 3363–3373.
23. Hu, T.; Huang, X.; Li, J.Y.; Zhang, L.F. A novel co-training approach for urban land cover mapping with unclear Landsat time series imagery. Remote Sens. Environ. 2018, 217, 144–157.
24. Neeta, S.; Saroj, K. Semi-supervised classification of remote sensing images using efficient neighborhood learning method. Eng. Appl. Artif. Intell. 2020, 90, 103520.
25. Ratle, F.; Camps, G.; Weston, J. Semi-supervised neural networks for efficient hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2010, 48, 2271–2282.
26. Solano, Y.T.; Bovolo, F.; Bruzzone, L. A Semi-Supervised Crop-Type Classification Based on Sentinel-2 NDVI Satellite Image Time Series and Phenological Parameters. In Proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2019), Yokohama, Japan, 28 July–2 August 2019; pp. 457–460.
27. Wan, L.; Zhang, H.; Lin, G. A small-patched convolutional neural network for mangrove mapping at species level using high-resolution remote-sensing image. Ann. GIS 2019, 25, 45–55.
28. Gumma, M.K.; Thenkabail, P.; Teluguntla, P. Mapping Rice Fallow Areas for Short Season Grain Legumes Intensification in South Asia using MODIS 250m Time-Series Data. Int. J. Digit. Earth 2016, 9, 981–1003.
29. Xiong, J.; Prasad, S.; Murali, K. Automated cropland mapping of continental Africa using Google Earth Engine cloud computing. ISPRS J. Photogram. Remote Sens. 2017, 126, 225–244.
30. Hao, W.P.; Mei, X.R.; Cai, X.L. Crop planting extraction based on multi-temporal remote sensing data in Northeast China. Trans. CSAE 2011, 27, 201–207.
31. Cai, X.L.; Cui, Y.L. Crop planting structure extraction in irrigated areas from multi-sensor and multi-temporal remote sensing data. Trans. CSAE 2009, 25, 124–130.
32. Sherrie, W.; George, A.; David, B. Crop type mapping without field-level labels: Random forest transfer and unsupervised clustering techniques. Remote Sens. Environ. 2019, 222, 303–317.
33. Iounousse, J.; Er-Raki, S.; Elmotassadeq, A. Using an unsupervised approach of Probabilistic Neural Network (PNN) for land use classification from multi-temporal satellite images. Appl. Soft Comput. 2015, 30, 1–13.
34. Venkata Subramanian, N.; Saravanan, N.; Bhuvaneswari, S. K-means based probabilistic neural network (KPNN) for designing physical machine-classifier. IJITEE 2019, 9, 800–804.
35. Loveland, T.R.; Reed, B.C.; Brown, J.F.; Ohlen, D.O.; Zhu, Z. Development of a global land cover characteristics database and IGBP DISCover from 1 km AVHRR data. Int. J. Remote Sens. 2000, 21, 1303–1330.
36. Zhai, Y.G.; Qu, Z.Y. Crop classification based on nonlinear dimensionality reduction using time series remote sensing images. Trans. CSAE 2018, 34, 177–183.
37. Yan, L.; Roy, D.P. Improved time series land cover classification by missing-observation-adaptive nonlinear dimensionality reduction. Remote Sens. Environ. 2015, 158, 478–491.
38. Paul, S.; Kumar, D.N. Evaluation of Feature Selection and Feature Extraction Techniques on Multi-Temporal Landsat-8 Images for Crop Classification. Remote Sens. Earth Syst. Sci. 2019, 197–207.
39. Abedini, M.; Fauziah, A. Clustering Approach on Land Use Land Cover Classification of Landsat TM over Ulu Kinta Catchment. WAS J. 2012, 17, 809–817.
40. Senthilnath, J.; Omkar, S.N.; Mani, N. Crop Stage Classification of Hyperspectral Data Using Unsupervised Techniques. IEEE J.-STARS 2013, 6, 861–866.
41. Dharani, M.; Sreenivasulu, G. Land use and land cover change detection by using principal component analysis and morphological operations in remote sensing applications. IJCA 2019, 1–10.
42. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer Series in Statistics; Springer: New York, NY, USA, 2009.
43. Tan, P.N.; Steinbach, M.; Kumar, V. Introduction to Data Mining, 1st ed.; Addison-Wesley Longman Publishing Co., Inc.: Boston, MA, USA, 2005.
44. Hu, L.Y.; Chen, Y.L.; Xu, Y. A 30 m land cover mapping of China with an efficient clustering algorithm CBEST. Sci. China Earth Sci. 2014, 57, 2293–2304.
45. Ye, S.; Liu, D.; Yao, X. RDCRMG: A Raster Dataset Clean & Reconstitution Multi-Grid Architecture for Remote Sensing Monitoring of Vegetation Dryness. Remote Sens. 2018, 10, 1376.
46. Xiong, Q.; Wang, Y.; Liu, D.; Ye, S.; Du, Z.; Liu, W.; Huang, J.; Su, W.; Zhu, D.; Yao, X.; et al. A Cloud Detection Approach Based on Hybrid Multispectral Features with Dynamic Thresholds for GF-1 Remote Sensing Images. Remote Sens. 2020, 12, 450.
47. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
48. Liu, J.; Feng, Q.; Gong, J. Winter wheat mapping using a random forest classifier combined with multi-temporal and multi-sensor data. Int. J. Digit. Earth 2018, 11, 783–802.
49. Swami, A.; Jain, R. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2013, 12, 2825–2830.
50. Kulkarni, N.M. Crop Identification Using Unsuperviesd ISODATA and K-Means from Multispectral Remote Sensing Imagery. IJERA 2017, 7, 45–49.
51. Jolliffe, I.T. Principal Components in Regression Analysis. In Principal Component Analysis; Springer: New York, NY, USA, 1986; pp. 129–155.
52. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459.
53. Eastman, J.R.; Fulk, M. Long sequence time series evaluation using standardized principal components. Photogramm. Eng. Remote Sens. 1993, 59, 991–996.
54. Hirosawa, Y.; Marsh, S.E.; Kliman, D.H. Application of standardized principal component analysis of land-cover characterization using multitemporal AVHRR data. Remote Sens. Environ. 1996, 58, 267–281.
55. Bellón, B.; Bégué, A.; Lo Seen, D.; De Almeida, C.A.; Simões, M. A Remote Sensing Approach for Regional-Scale Mapping of Agricultural Land-Use Systems Based on NDVI Time Series. Remote Sens. 2017, 9, 600.
56. Härdle, W.K. Nonparametric and Semiparametric Models; Springer Science & Business Media: New York, NY, USA, 2012.
57. Hu, Q.W.; Shu, N. A study of a Gaussian mixture model for urban land-cover mapping based on VHR remote sensing imagery. Int. J. Remote Sens. 2016, 37, 1–13.
58. Qu, Y.R.; Cai, H. Product-based neural networks for user response prediction. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016.
59. Zhang, W.N.; Du, T.M. Deep Learning over Multi-Field Categorical Data; ECIR Springer: New York, NY, USA, 2016.
60. Oliveira, A.L.I.; Costa, F.R.G. Novelty detection with constructive probabilistic neural networks. Neurocomputing 2008, 71, 1046–1053.
Figure 1. Study areas: Luobei County and the southwest of Hulin City, both in Heilongjiang Province, China.
Figure 2. Workflow of the proposed approach, including five key steps: (1) data preprocessing; (2) feature value selection; (3) comparison of classification methods; (4) category label determination; (5) accuracy evaluation.
Figure 3. Seasonal Normalized Difference Vegetation Index (NDVI) and Normalized Difference Water Index (NDWI) profiles during the rice and maize growth period.
Figure 4. Workflow of the principal components isometric binning (PCIB) method.
Figure 5. Classification accuracy with different values of k1.
Figure 6. Frequency distribution histogram of k1 (k = 2).
Figure 7. PCIB classification mapping results for southwest of Hulin City in 2016.
Figure 8. Classification mapping results for Luobei County in 2016.
Figure 9. Study area, the northwest of Wuwei City, Gansu Province, China.
Figure 10. Classification mapping results for the northwest of Wuwei City in 2018.
Table 1. Specifications of GF-1 wide field view (WFV) imagery used for classification.
Satellite type | Band | Wavelength (μm)
GF-1 WFV | Band 1 (Blue) | 0.45–0.52
GF-1 WFV | Band 2 (Green) | 0.52–0.59
GF-1 WFV | Band 3 (Red) | 0.63–0.69
GF-1 WFV | Band 4 (NIR) | 0.77–0.89
Image dates (month/day) | 2016 | 2017
Southwest of Hulin City | 4/25, 5/18, 7/16, 9/30 | 5/2, 5/10, 6/16, 7/23
Luobei County of Hegang City | 4/19, 5/18, 8/8, 8/17, 8/20, 9/30 | 4/3, 4/7, 5/5, 5/9, 5/10, 6/16
Table 2. Details of the ground reference samples.
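Area | Year | Maize | Rice | Others | Total | No. of auxiliary samples | No. of verification samples
Southwest of Hulin City | 2016 | 165 | 103 | 54 | 322 | 71 | 109
Southwest of Hulin City | 2017 | 111 | 194 | 150 | 455 | 100 | 152
Luobei County | 2016 | 234 | 91 | 58 | 383 | 84 | 131
Luobei County | 2017 | 137 | 99 | 170 | 406 | 90 | 136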
Table 3. Feature value selection. R, G and NIR are the reflectance of red, green and near-infrared bands, respectively.
Feature | Formula | Application
Near-infrared reflectance | NIR | Canopy structure
Normalized Difference Vegetation Index (NDVI) | NDVI = (NIR − R)/(NIR + R) | Vegetation status, canopy structure
Normalized Difference Water Index (NDWI) | NDWI = (G − NIR)/(G + NIR) | Canopy structure, water content
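The features in Table 3 are simple band ratios. The sketch below computes them from reflectance arrays and stacks them across acquisition dates into a pixel-by-feature matrix. The stacking layout (NIR, NDVI and NDWI per date, dates in acquisition order) and the function names are assumptions about how such features could be arranged before PCA, not a description of the authors' preprocessing.

```python
import numpy as np

def spectral_features(green, red, nir):
    """Per-pixel features listed in Table 3, from surface reflectance arrays."""
    green, red, nir = (np.asarray(b, dtype=float) for b in (green, red, nir))
    with np.errstate(divide="ignore", invalid="ignore"):   # zero-sum pixels become nan/inf
        ndvi = (nir - red) / (nir + red)
        ndwi = (green - nir) / (green + nir)
    return nir, ndvi, ndwi

def feature_stack(scenes):
    """Hypothetical helper: stack NIR, NDVI and NDWI from every acquisition date
    into an (n_pixels, 3 * n_dates) matrix.
    `scenes` is a list of (green, red, nir) 2-D reflectance arrays."""
    cols = []
    for green, red, nir in scenes:
        for feature in spectral_features(green, red, nir):
            cols.append(feature.ravel())
    return np.column_stack(cols)

# Two hypothetical acquisition dates, each a 2 x 2 scene of reflectances
scene = (np.full((2, 2), 0.10), np.full((2, 2), 0.05), np.full((2, 2), 0.40))
features = feature_stack([scene, scene])
```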
Table 4. Different values of k1.
k1 = k11 × k12 | k12 = 2 | k12 = 3 | k12 = 4 | k12 = 5 | k12 = 6
k11 = 3 | 3 × 2 | \ | \ | \ | \
k11 = 4 | 4 × 2 | 4 × 3 | \ | \ | \
k11 = 5 | 5 × 2 | 5 × 3 | 5 × 4 | \ | \
k11 = 6 | 6 × 2 | 6 × 3 | 6 × 4 | 6 × 5 | \
k11 = 7 | 7 × 2 | 7 × 3 | 7 × 4 | 7 × 5 | 7 × 6
k11 = 8 | 8 × 2 | 8 × 3 | 8 × 4 | 8 × 5 | 8 × 6
k11 = 9 | 9 × 2 | 9 × 3 | 9 × 4 | 9 × 5 | \
k11 = 10 | 10 × 2 | 10 × 3 | 10 × 4 | 10 × 5 | \
k11 = 11 | 11 × 2 | 11 × 3 | 11 × 4 | \ | \
k11 = 12 | 12 × 2 | 12 × 3 | 12 × 4 | \ | \
k11 = 13 | 13 × 2 | 13 × 3 | \ | \ | \
k11 = 14 | 14 × 2 | 14 × 3 | \ | \ | \
k11 = 15 | 15 × 2 | 15 × 3 | \ | \ | \
k11 = 16 | 16 × 2 | 16 × 3 | \ | \ | \
k11 = 17 | 17 × 2 | \ | \ | \ | \
k11 = 18 | 18 × 2 | \ | \ | \ | \
k11 = 19 | 19 × 2 | \ | \ | \ | \
k11 = 20 | 20 × 2 | \ | \ | \ | \
k11 = 21 | 21 × 2 | \ | \ | \ | \
k11 = 22 | 22 × 2 | \ | \ | \ | \
k11 = 23 | 23 × 2 | \ | \ | \ | \
k11 = 24 | 24 × 2 | \ | \ | \ | \
k11 = 25 | 25 × 2 | \ | \ | \ | \
Note: The bold values indicate the 15 values selected for k1.
Table 5. Number of pixels per bin for k1 = 12 × 4. k11 refers to the division of the first-dimension data into k11 bins, and k12 refers to the division of the second-dimension data into k12 bins. Number is the number of pixels falling into each bin (the study area contains 11,718,750 pixels).
k11 bin | Number | k12 bin 1 | k12 bin 2 | k12 bin 3 | k12 bin 4
1 | 3667 | 1554 | 1539 | 323 | 251
2 | 75,943 | 1437 | 19,095 | 51,150 | 4261
3 | 566,651 | 1682 | 116,532 | 408,409 | 40,028
4 | 810,072 | 25,130 | 299,158 | 454,909 | 30,875
5 | 2,454,312 | 550,465 | 1,751,244 | 147,883 | 4720
6 | 6,562,114 | 2,466,921 | 3,799,395 | 290,144 | 5654
7 | 979,072 | 237,690 | 671,312 | 69,849 | 221
8 | 78,262 | 4197 | 52,829 | 20,716 | 520
9 | 30,113 | 650 | 9634 | 19,027 | 802
10 | 37,935 | 99 | 2001 | 27,854 | 7981
11 | 96,883 | 10 | 491 | 79,420 | 16,962
12 | 23,726 | 7 | 15,165 | 8274 | 280
Table 6. The overall accuracy for the k1 values in 2016, where k1 refers to the division of the 2D data after dimension reduction into k1 bins.
k1 | Overall accuracy | k1 | Overall accuracy
12 × 3 | 68.81% | 5 × 4 | 68.81%
13 × 3 | 68.81% | 6 × 4 | 64.22%
14 × 3 | 69.72% | 7 × 4 | 69.72%
15 × 3 | 67.89% | 8 × 4 | 68.81%
16 × 3 | 71.56% | 9 × 4 | 67.89%
 | | 10 × 4 | 74.31%
 | | 11 × 4 | 66.01%
 | | 12 × 4 | 75.23%
Table 7. The overall accuracy for the k2 values in 2016, where k2 refers to the division of the confusion bins that existed following the initial binning into k2 bins.
k2 | Overall accuracy | k2 | Overall accuracy
4 × 3 | 79.82% | 5 × 4 | 81.65%
5 × 3 | 78.90% | |
6 × 3 | 77.98% | |
Table 8. Accuracy analysis of different classification methods.
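Study area | Year | Method | OA | Kappa | PA Other (%) | PA Maize (%) | PA Rice (%) | UA Other (%) | UA Maize (%) | UA Rice (%)
Southwest of Hulin City | 2016 | RF | 82% | 68.52% | 75 | 78 | 91 | 50 | 89 | 86
Southwest of Hulin City | 2016 | K-means | 76% | 59.40% | 89 | 78 | 71 | 44 | 82 | 83
Southwest of Hulin City | 2016 | ISODATA | 78% | 62.39% | 90 | 79 | 74 | 50 | 86 | 80
Southwest of Hulin City | 2016 | PCIB | 82% | 69.84% | 67 | 88 | 79 | 56 | 82 | 94
Southwest of Hulin City | 2017 | RF | 82% | 71.74% | 76 | 68 | 94 | 70 | 76 | 94
Southwest of Hulin City | 2017 | K-means | 77% | 64.39% | 73 | 62 | 88 | 74 | 57 | 91
Southwest of Hulin City | 2017 | ISODATA | 76% | 63.28% | 70 | 58 | 90 | 76 | 49 | 92
Southwest of Hulin City | 2017 | PCIB | 80% | 68.27% | 73 | 71 | 88 | 82 | 54 | 92
Luobei County | 2016 | RF | 82% | 65.24% | 68 | 83 | 88 | 65 | 90 | 71
Luobei County | 2016 | K-means | 78% | 58.60% | 55 | 81 | 88 | 60 | 86 | 68
Luobei County | 2016 | ISODATA | 80% | 60.01% | 82 | 79 | 87 | 45 | 94 | 68
Luobei County | 2016 | PCIB | 81% | 62.25% | 83 | 80 | 85 | 50 | 93 | 71
Luobei County | 2017 | RF | 79% | 67.96% | 74 | 78 | 93 | 85 | 77 | 74
Luobei County | 2017 | K-means | 74% | 60.47% | 76 | 69 | 79 | 72 | 74 | 76
Luobei County | 2017 | ISODATA | 75% | 61.46% | 73 | 73 | 81 | 74 | 74 | 76
Luobei County | 2017 | PCIB | 76% | 62.26% | 72 | 72 | 92 | 76 | 81 | 68
PA: producer accuracy; UA: user accuracy; OA: overall accuracy.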
Table 9. Clustering time comparison of K-means, ISODATA and PCIB.
Clustering method | Number of clusters | Maximum iterations | Clustering time (s)
K-means | 3 | 20 | 0.029
ISODATA | 3 | 20 | 0.059
PCIB | 3 | / | 0.006
Table 10. Description of GF-1 data used in the northwest of Wuwei City.
Satellite type | Band | Wavelength (μm)
GF-1 WFV | Band 1 (Blue) | 0.45–0.52
GF-1 WFV | Band 2 (Green) | 0.52–0.59
GF-1 WFV | Band 3 (Red) | 0.63–0.69
GF-1 WFV | Band 4 (NIR) | 0.77–0.89
Image dates in 2018 (month/day): 4/16, 4/20, 5/27, 7/17, 8/13
Table 11. Details of the ground reference samples.
Area | Year | Maize | Spring wheat | Grape | Pear | Forest | Others | Total | No. of auxiliary samples | No. of verification samples
Northwest of Wuwei City | 2018 | 73 | 25 | 36 | 65 | 36 | 82 | 317 | 106 | 71
Table 12. Accuracy analysis of PCIB method for the northwest of Wuwei City.
Study area | Year | Method | OA | Kappa
Northwest of Wuwei City | 2018 | PCIB | 84% | 76.5%
Class | Producer accuracy (%) | User accuracy (%)
Others | 93 | 88
Maize | 81 | 88
Spring wheat | 100 | 44
Grape | 100 | 100
Pear | 76 | 89
Forests | 100 | 75
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ma, Z.; Liu, Z.; Zhao, Y.; Zhang, L.; Liu, D.; Ren, T.; Zhang, X.; Li, S. An Unsupervised Crop Classification Method Based on Principal Components Isometric Binning. ISPRS Int. J. Geo-Inf. 2020, 9, 648. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi9110648

AMA Style

Ma Z, Liu Z, Zhao Y, Zhang L, Liu D, Ren T, Zhang X, Li S. An Unsupervised Crop Classification Method Based on Principal Components Isometric Binning. ISPRS International Journal of Geo-Information. 2020; 9(11):648. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi9110648

Chicago/Turabian Style

Ma, Zhe, Zhe Liu, Yuanyuan Zhao, Lin Zhang, Diyou Liu, Tianwei Ren, Xiaodong Zhang, and Shaoming Li. 2020. "An Unsupervised Crop Classification Method Based on Principal Components Isometric Binning" ISPRS International Journal of Geo-Information 9, no. 11: 648. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi9110648

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.