Mapping Agricultural Landuse Patterns from Time Series of Landsat 8 Using Random Forest Based Hierarchial Approach

Pareeth, Sajid; Karimi, Poolad; Shafiei, Mojtaba; De Fraiture, Charlotte

doi:10.3390/rs11050601

Open AccessArticle

Mapping Agricultural Landuse Patterns from Time Series of Landsat 8 Using Random Forest Based Hierarchial Approach

by

Sajid Pareeth

^1,*,

Poolad Karimi

¹

,

Mojtaba Shafiei

² and

Charlotte De Fraiture

^1,3

¹

Water Science and Engineering Department, IHE Delft Institute for Water Education, 2611AX Delft, The Netherlands

²

Hydroinformatics Department, East Water and Environmental Research Institute, Mashhad 9188737176, Iran

³

Department of Environmental Sciences, Wageningen University and Research Center, 6700HB Wageningen, The Netherlands

^*

Author to whom correspondence should be addressed.

Remote Sens. 2019, 11(5), 601; https://0-doi-org.brum.beds.ac.uk/10.3390/rs11050601

Submission received: 12 February 2019 / Revised: 3 March 2019 / Accepted: 6 March 2019 / Published: 12 March 2019

(This article belongs to the Special Issue Multitemporal Land Cover and Land Use Mapping)

Download

Browse Figures

Versions Notes

Abstract

:

Increase in irrigated area, driven by demand for more food production, in the semi-arid regions of Asia and Africa is putting pressure on the already strained available water resources. To cope and manage this situation, monitoring spatial and temporal dynamics of the irrigated area land use at basin level is needed to ensure proper allocation of water. Publicly available satellite data at high spatial resolution and advances in remote sensing techniques offer a viable opportunity. In this study, we developed a new approach using time series of Landsat 8 (L8) data and Random Forest (RF) machine learning algorithm by introducing a hierarchical post-processing scheme to extract key Land Use Land Cover (LULC) types. We implemented this approach for Mashhad basin in Iran to develop a LULC map at 15 m spatial resolution with nine classes for the crop year 2015/2016. In addition, five irrigated land use types were extracted for three crop years—2013/2014, 2014/2015, and 2015/2016—using the RF models. The total irrigated area was estimated at 1796.16 km², 1581.7 km² and 1578.26 km² for the cropping years 2013/2014, 2014/2015 and 2015/2016, respectively. The overall accuracy of the final LULC map was 87.2% with a kappa coefficient of 0.85. The methodology was implemented using open data and open source libraries. The ability of the RF models to extract key LULC types at basin level shows the usability of such approaches for operational near real time monitoring.

Keywords:

irrigated area; Mashhad; agriculture; landuse; remote sensing; Random Forest; Landsat 8

1. Introduction

Globally, there is an increase in population vulnerable to water scarcity and food security [1,2]. In the Asia and Middle East region, water scarcity is mainly driven by changing climate and unsustainable water use [3,4]. Many studies have reported rapid decrease of the ground water table in the semi-arid regions owing to the increasing use in irrigation [5,6,7]. United Nation’s Sustainable Development Goals (SDG) on “Zero hunger” (Goal 2) and “Clean water and sanitation” (Goal 6) target achieving food and water security by increasing agricultural productivity and efficient water use to reduce people susceptible to water scarcity and to feed the increasing population [8]. To improve land and water use efficiency, it is critical to monitor these at different scales, most importantly at a basin scale where water allocation to different sectors takes place [9,10]. The biggest share of fresh water (around 70%) allocation is for irrigation [5,11]. Hence, monitoring the spatial extent and temporal dynamics of the irrigated land use would enable managers and policy makers to make timely decisions to achieve higher crop water productivity [12].

The majority of the irrigation systems around the globe, especially in the Asian and African countries, lack systems in place for periodic monitoring. The only source of agriculture related data is through national census by responsible ministries of respective countries. These census data, while providing some information on irrigated area, are often found to be less useful in monitoring due to inherent limitations of survey methods [13]. They also lack the ability to capture inter-annual changes in the extent of irrigated agriculture that are often considerable in semi-arid areas where lands are left fallow or uncultivated periodically. Since the 1980s, remote sensing has been used in different studies to map land cover and agricultural areas at various scales [14]. It is a viable alternative where many countries use these technologies to complement ground surveys [10,15]. Over the past decades, several national and international agencies have developed remote sensing based land cover maps that report on the irrigated areas with global or regional coverages. Moderate Resolution Imaging Spectroradiometer (MODIS) sensor based land cover maps (MCD12Q1), CORINE land cover, Global Irrigated Area Mapping (GIAM), among others are widely used by researchers to understand the land cover land use changes at larger scales, including changes in agricultural systems [16,17,18,19,20]. Similarly, several agricultural monitoring systems have been developed at global and national scale. Examples are Global Information and Early Warning System (GIEWS), Famine EarlyWarning Systems NETwork (FEWS NET), MARS crop yield forecasting system (MCYFS), CropWatch, United States Department of Agriculture-Foreign Agricultural Service (USDA-FAS), GEOGLAM, World Food Programme Seasonal Monitor, and Anomaly hot Spots of Agricultural Production (ASAP) [21,22,23,24,25,26]. However, a recent review [26] comparing the aforementioned monitoring systems identifies the lack of ability to operationalize the new methods at a basin level as one of the gaps. These systems provide information at coarser scale meant for global or national assessments, but not always suitable at a basin level monitoring. Besides, there are lack of systems which enable continuous monitoring of irrigated land use dynamics over time at regional scale, mainly due to: (i) intensive data processing and storage requirement; and (ii) the need for locally calibrated models and regional expertise [26,27,28]. With emerging new techniques in remote sensing, increasing availability of satellite data at high spatial and temporal resolution, and open data policies adopted by space agencies, there are more opportunities to establish remote sensing based monitoring systems at regional scales [26].

The rapid increase in availability of high resolution satellite imagery by various space agencies [29,30] together with the advances in algorithms and computational capacity has opened new possibilities for creating frequent dynamic land cover maps [31]. Newer satellite missions in orbit are providing weekly observations at the fine resolution of 10–30 m, which are ideal for regional scale monitoring purposes [29]. These images can provide for the application of advanced algorithms that allows pixel/object based feature extraction for land use mapping [29]. Additionally, machine learning and decision tree based approaches in remote sensing has triggered a plethora of studies looking at its usability in extracting agricultural land use, urban mapping, crop type mapping, etc. [32]. The RF based methods result in higher accuracy compared to classical distance based approaches [31,33,34,35,36]. The resulting time series maps enable us to capture the land use dynamics, especially relevant to rainfed and irrigated agricultural landscapes. Identifying these land use classes within irrigated area allows for understanding and locating different existing farming systems and help in formulating specific policy interventions based on the production system [7,37].

This study focused on developing a RF based irrigated land use monitoring work flow using time series of L8 optical data. The workflow was designed in such a way that the decision tree-based prediction mechanism can be automatically applied to upcoming acquisitions. This feature is a key element in setting up a high resolution near real time agricultural monitoring system. The developed workflow was tested in Mashhad basin in Iran, which, besides being intensively cultivated, is characterized by high variability of irrigated area due to changing water availability and rainfall. We estimated the irrigated area land use dynamics for three crop years starting from November 2013 to October 2016. For deriving LULC map of the basin including irrigated area for the year 2015/2016, RF models were developed for each major LULC type from four candidate L8 data. The RF model was then used to generate irrigated area maps for other years to capture the inter-annual changes in cropped areas. Further, we estimated irrigated area with double crop, summer crop and winter crops using seasonal aggregation of irrigated areas derived from time series of L8 data.

2. Study Area

Mashhad basin, located in the north east of Iran (Figure 1) covers an area of 16,750 km². The basin has a semi-arid climate with an average annual rainfall of 236 mm. The elevation of the terrain varies significantly from 391 to 3249 m, with flat and fertile areas concentrated in the central plains. Despite having limited water resources and low rainfall, the basin has nearly 140,000 ha of irrigated lands to primarily feed its population of 3.5 million and provide for the booming food processing industry that aim at export. Surface water resources has long fallen short in supplying the growing demands and much of the recent economic and agricultural expansions have happened at the expense of depleting groundwater resources. Estimates show groundwater levels are dropping at a whopping rate of 0.8 m per year, making groundwater growingly unavailable to some areas and more expensive to pump up in others [38,39]. Hence, the situation calls for immediate action to curb water consumption in agriculture. To this end, targeted field level interventions are required to increase water productivity and water use efficiency. Up to date, reliable and verifiable information on the irrigated agriculture with high spatial and temporal resolution is, thus, needed to identify priority locations for improvements and inform strategic decisions in the basin.

There are two distinct cropping season in this basin. Winter crops, dominated by wheat and barley, are planted in November/December and harvested by June in the following year. Summer crops, mainly vegetables, are planted in July/August and harvested by October. Each crop year is defined by 12 months starting from October to September the next year, constituting both winter and summer cropping periods.

3. Methods

The proposed methodology has two major steps: (i) developing single class RF model for each major class (see Table 1) and preparing LULC map of Mashhad basin for the year 2015/2016 at 15 m spatial resolution; and (ii) developing irrigated land use maps of Mashhad basin for the crop years 2013/2014, 2014/2015 and 2015/2016 using the RF models. The entire methodology and data acquisition were implemented using open source libraries. The pre-processing of L8 time series data, which includes data fusion to resample spectral data using HPFA technique, was implemented in GRASS GIS version 7.4 [40] and the RF model training and classification was implemented in Orfeo Toolbox version 5.8.0 [41]. The following subsections explain the implemented approach in details.

3.1. Satellite Data

Data acquired from the Operational Land Imager (OLI) instrument aboard L8 satellite launched in February 2013 were used in this study. L8 has a temporal cycle of 16 days, which ensures a maximum of two images per month over the study area. All L8 acquisitions over the study area in the time period from November 2013 to October 2016 were processed. The entire study area can be covered by five tiles of L8, as shown in Figure 2. The L8 data with more than 30% of cloud cover were discarded from further analysis. The images available in these tiles for the study period are listed in Figure 3. To automate the acquisition and pre-processing of the L8 data using scripts, we used Amazon public bucket (https://aws.amazon.com/public-datasets/landsat/) and Amazon Web Service Command Line Interface(AWS CLI), which facilitates bulk downloading without pre-ordering. The L8 data from tile 158/035 (Path = 158, Row = 35) and 159/034 were used only for developing LULC map of the year 2015/2016. As the basin area covered exclusively in these tiles were negligible (2% of total basin area), we discarded them from irrigated area land use estimates.

3.2. Pre-Processing Landsat 8 Data

The acquired L8 data were pre-processed to create a time series of reflectance bands from all the valid cloud-free pixels. The pre-processing step included conversion from Digital Number (DN) to Top Of Atmosphere (TOA) reflectances (Radiometric calibration), cloud removal using the Quality Assessment (QA) band, resampling the spectral bands (2–7) from 30 to 15 m spatial resolution using panchromatic band (band 8) and converting the reflectances to integers.

The DN values were converted into TOA reflectances using Equations (1) and (2) [42]. The required parameters to apply the calibration is provided in the metadata file of each L8 data.

ρ_{λ}^{'} = M_{i} * D N + A_{i}

(1)

where

ρ_{λ}^{'}

is the TOA reflectance without solar elevation angle correction, DN is digital number (pixel value), and

M_{i}

and

A_{i}

are multiplicative and additive reflectance scaling factors, respectively, for the spectral band i. However,

ρ_{λ}^{'}

is not corrected for the solar elevation angle, and this correction is applied using Equation (2).

M_{i}

and

A_{i}

are provided as REFLECTANCEW_MULT_BAND_n and REFLECTANCE_ADD_BAND_N, respectively, in the metadata file.

ρ_{λ} = ρ_{λ}^{'} / s i n θ_{s}

(2)

where

ρ_{λ}

is TOA reflectance and

θ_{s}

is the solar elevation angle given in the metadata file. After the calibration, the gridded spectral bands had a spatial resolution of 30 m and the panchromatic band with a resolution of 15 m.

To remove the cloud pixels, we used the QA band provided with the L8 data to develop cloud masks. The L8 QA band contains 16 bit integers representing certain atmospheric conditions, as shown in Figure 4. All the bit combinations showing mid to high confidence clouds were used to create cloud masks. If the masked area was greater than 30% of the total tile area, then that scene was discarded from further analysis.

The L8 panchromatic band was used to resample the spectral bands (2–7) from 30 to 15 m. We used a High Pass Filter Additive (HPFA) fusion technique, as explained in [43], to resample the spectral data. All the spectral bands were then multiplied by 10,000 and converted to integer data type to optimize the performance of RF model. The L8 spectral bands 2–7 were stacked together for each image and used for classification.

3.3. Training and Validation Samples

The spatial and temporal samples representing the major LULC types (hereafter, also referred as classes) in the study area are the major components of any supervised classification approach [44]. Table 1 lists the major LULC types and irrigated sub-classes extracted for Mashhad basin in this study.

Four candidate cloud free L8 images acquired on 26 February 2016, 19 July 2016, 23 October 2016 and 16 March 2017 (Figure 3) for tile 159/035 were selected to delineate the training samples for RF models. Tile 159/035 was selected because more than 80% of the Mashhad basin area is covered by this tile (Figure 2). The urban classes (8–9) urban cropland and urban vegetation were derived from impervious/urban area class (Table 1), as explained in the next section. Hence, training samples for seven major LULC types (1–7 in Table 1) were delineated from the candidate images. Samples for each major class were digitized from candidate L8 images with the support of very high resolution spatial and temporal images available in Google Earth. A total of 28 different multi-polygon sets (7 LULC types X 4 candidate images) were developed for training the RF models. Each of these polygon set has two attributes—“LULC type” and “other LULC types”. These samples were used only for RF model training and validation (not for LULC map validation).

Further field survey data were collected from the basin by a regional expert team. A pre-designed database with Kobo toolbox (http://www.kobotoolbox.org/) as front end was used by the team to collect data from the field. The Kobo toolbox is available as an Android application and supports offline cache while in areas with no internet coverage. The data were stored in Amazon cloud services, and were then extracted and mapped. In total, 52 samples were collected from the study area, which were used only for the validation of LULC map. The field survey points covered only five major LULC types except water bodies and fallow cropland, and had very few samples per class. Therefore, to do a comprehensive accuracy assessment, 221 random stratified points were extracted and assigned classes using images provided in Google Earth for validating the final LULC map. Figure 5 shows the validation points collected from the field and stratified random samples. These validation samples were generated independent of the training samples for RF models and used only for LULC map validation. Accuracy assessment was performed separately for these two sets of training samples and also by combining them.

3.4. Classification Using Random Forest (RF)

Random Forest is an ensemble classifier that uses multiple decision trees followed by majority voting to predict a class. The RF classifier uses bootstrap samples to develop each tree and the samples that are not used to construct the tree are called Out Of Bag (OOB) samples. Internally, RF model prediction is validated using these samples and the estimated error rate is referred to as OOB error [45]. The two steps of the classification process were: (i) train RF classifier; and (ii) classify using RF model. The first step “train RF classifier” was to train and develop RF models for each of the major LULC types (except for urban classes 8 and 9 in Table 1) from the resampled spectral bands (2–7) of four candidate L8 scenes (Figure 3) for tile 159/035. Training and validation sample ratio of 0.5 was used, meaning half of the samples were used to check the accuracy of model prediction and estimate OOB error. A maximum number of 50 decision trees with a maximum depth of 25 were used for training RF model. Minimum number of samples at each node of the decision tree was set to 25. In the second part of the classification, all valid pixels in the pre-processed L8 data (see Figure 3) were assigned classes by the RF models.

The classification approach has following key steps, as depicted in Figure 6:

Developing seven single class RF models from the four candidate L8 data.
Applying those single class RF models to individual scenes of all the required tiles to extract single class maps for each acquisition and tile.
Applying majority fusion to aggregate over multiple acquisitions to single class maps for each tile.
For irrigated area class, applying a single occurrence selection method to aggregate the single class maps for each tile.
Mosaicking the single class maps for each tile to a single class maps for entire Mashhad basin.
Conditionally patching all the single class maps for basin to a multi-class LULC map for the year.
Seasonally aggregating to derive land use types for irrigated area.

For all the major classes except the irrigated area, the majority fusion technique was applied to aggregate the single class map from individual L8 scenes. Each output pixel was assigned the value of most frequent class in the input single class maps, which was either the “LULC type” or “other LULC type”. In the absence of unique majority, the pixel was labeled as “undecided”. The single class maps, developed for the entire basin (seven maps), were patched together to produce a multi class LULC map for the year 2015/2016. The patching was performed conditionally using RF model kappa estimate as decisive factor. Thus, those classes with higher kappa estimates were prioritized while patching. The gaps mainly due to pixels with “undecided” labels were filled using spatial filter with “maximum” method over 5 by 5 moving window. After extracting the seven major classes for the entire basin, the urban classes (8 and 9) were derived by reclassifying irrigated area class to urban cropland and orchards/parks to urban vegetation class within the Mashhad city boundary.

Irrigated area change dynamically in both space and time over a crop year. A single occurrence selection method was used to develop yearly irrigated area maps for crop years 2013/2014, 2014/2015 and 2015/2016. If a pixel belongs to irrigated area at least once over multiple L8 acquisitions, then it was assigned to this class. Subsequently, the irrigated area class was divided into five sub-classes representing agricultural land use regimes for the three crop years: double crop, single summer crop, single winter crop, summer crop and winter crop. All irrigated area pixels derived from the L8 data acquired in the time period from January to June were labeled as winter crops (see Figure 3), while all the irrigated area pixels derived from the L8 data acquired in the time period from July to November were labeled as summer crops. Those pixels that were classified as irrigated area in both seasons were labeled as double crop. If a pixel was irrigated area only in one of the two seasons, then it was labeled as single summer or single winter crop area, accordingly. Finally, the fallow cropland class was updated by removing those pixels that were irrigated at least once in the three crop years.

4. Results

4.1. Resampling L8 Spectral Bands

According to the correlation estimates, resampled bands could retain around 98% of the spectral details while improving the spatial resolution to 15 m. Table 2 lists the performance indicators of L8 data acquired on 26 February 2016 before and after resampling. Within the candidate images, an average correlation coefficient of 0.98 was reported. The mean and standard deviation computed from each spectral band, before and after the resampling were in the same range (Table 2).

Figure 7a1–a3,b1–b3 shows the L8 data before and after resampling respectively. Visual inspection shows considerable increase in the sharpness of edges due to the high pass filter applied and higher spatial resolution.

4.2. RF Model Performance

Table 3 lists the model validation indicators for each of the seven major LULC classes. The lowest OOB error of 0.005 and the highest kappa estimate of 0.99 was observed for water bodies class. The highest OOB error of 0.08 and lowest kappa estimate of 0.85 was observed for sparse vegetation class. These indicators do not represent the actual accuracy of the LULC map, whereas they indicate how well the model predicted the OOB samples. Figure 7c1–c3 shows the extracted irrigated area by the RF model from the L8 data.

4.3. Classification and Accuracy Assessment

A LULC map at a spatial resolution of 15 m was created for Mashhad basin for the year 2015/2016. Figure 8 shows the LULC map with nine classes as listed in Table 1. Table 4 lists the area in km² of each class in 2015/2016 computed from the LULC map. Around 75% of the total area was covered with sparse vegetation, followed by irrigated area and cropland/rainfed class. The impervious/urban area was reported to be 230.2 km² with 8.2 km² of urban cropland and 43.2 km² of urban vegetation.

The accuracy assessment using combined field survey and stratified random points reported an overall accuracy of 87.2% with an estimated kappa of 0.85 over all the classes. The highest user’s accuracies of 100% and 95.8% were reported for water bodies and impervious/urban area, respectively, while lowest user’s accuracy of 72.7% was reported for cropland/rainfed class. Table 4 lists the user’s accuracy and estimated kappa for major classes. The overall accuracy reported were 84.5% and 91.2% using stratified random and field survey points, respectively.

4.4. Spatial and Temporal Crop Patterns

Five irrigated land use types for three crop years were extracted. Figure 9 shows variation in surface area of fallow, irrigated and five irrigated land use types over three crop years. Total irrigated area in the Mashhad basin was reported to be 1796.16 km², 1578.48 km², and 1578.33 km² for the crop years 2013/2014, 2014/2015, and 2015/2016, respectively. Fallow crop area was reported to be 668.75 km², 886.43 km² and 886.86 km², respectively. Irrigated area with winter crops was 1447.37 km², 1324.84 km², and 1451.80 km², and summer crops was 1155.37 km², 698.78 km² and 1164.41 km² for the crop years 2013/2014, 2014/2015, and 2015/2016, respectively. An average area of 764.68 km² was irrigated twice during summer and winter over the three crop years. Around one third of the summer and winter irrigated area was only irrigated once over the three years.

5. Discussion

Machine learning and decision tree based approaches to classify time series of satellite data have shown great promise in recent years with higher level of accuracy (>85%) compared to classical approaches [31]. Recent studies explore the usability of machine learning algorithms in setting up an operational monitoring system using high resolution satellite data [33,34]. Being an ensemble classifier, RF provides more accurate classification output and is more robust in handling noise compared to other classifiers [31,34,35]. The new method to monitor irrigated area from time series of L8 data implemented in this study reiterates the usability of RF models for such applications. The novelty of this approach is that single class RF models were used to extract respective classes followed by a post processing scheme to create a LULC map of the basin. The advantage is being able to extract single LULC type of interest depending on the application, for example as shown in this study to monitor the irrigated area dynamics in the basin. In an operational monitoring system, a key step would be to extract the extent of irrigated area from a newly acquired L8 data and compare it with the extent extracted from the previous L8 acquisition. Hence, this approach can be used for continuous monitoring of a LULC type, although there needs to be a system in place to periodically validate the results and subsequent fine tuning of the models. The irrigated area estimates could not be validated due to the lack of official statistics at basin scale, however the higher accuracy obtained for the LULC map demonstrates the usability of this method. The new approach is “hierarchical” because the patching of single class maps was performed by prioritizing them based on model prediction indicators. Seasonal aggregation of predicted irrigated area maps gives winter and summer crop area extent. We obtained an average OOB error of 0.04 and observed accuracy of 96% for the single class models demonstrating good prediction capability of the RF models. Accuracy assessment of final LULC map reported an overall accuracy of 87.2%, which is in line with similar studies [34,35]. The lowest user’s accuracy of 72.7% with 0.68 kappa was reported for the class cropland/rainfed class. Separating rainfed cropland from irrigated area is challenging, if a single date imagery is used. In this study, we used training samples from different time periods over a crop year corresponding to the L8 acquisition dates, thus RF model could differentiate both classes at an acceptable accuracy.

Although L8 data are available every sixteen days, cloud coverage and snow reduce the availability of valid observations over the study area (see Figure 3). This may affect the seasonal aggregation of irrigated land use types due to lack of valid data to represent all the irrigated pixels in a season. However, an irrigation cycle covers multiple months, thus the model would be able to capture the standing crop if there is a cloud free image once in 1–2 months during the growing seasons. Due to lack of official statistics for this basin over the study period, we could not validate our results on extent of irrigated land use types. Parameterization of RF model is an important and tedious step that determines the model performance [33]. These parameters often depend on the input data used, number of variables and geographical area, especially for classification [31]. Here, we determined them by training the model with multiple combination of input parameters and subsequently checking the performance using prediction indicators and the classification results. The prediction indicators of the RF models must be taken in to account with the caveat that lower OOB and higher Kappa do not ensure higher accuracy in classification. Instead, it shows how well the model can predict random validation samples (OOB samples). Although the OOB error is unbiased because it is computed from randomly selected OOB, lower estimate could be attributed to over fitting of the classes due to unbalanced sampling sizes [36]. The resampling of L8 data to 15 m preserved the spectral properties of the other bands (Table 2), although it did not improve the spectral quality of the data. The impact of this step on accuracy of LULC map was not covered in this study.

One of the major advantages of using RF is that it can run on large datasets efficiently in less time. For Mashhad, to run a RF model on tile 159/035 which cover more than 80% of the basin, it took 2.4 min on a Linux machine with Intel^® Core™ i7-3770 Quad-Core processor and 32 GB RAM. The use of open source software ensured reproducibility and extendibility, although not investigated in this study. The advantage of using GRASS GIS and OrfeoToolbox is that their functionalities can be scripted together using the command line interfaces or the Python API’s enabling complete automation of the method established [40,41]. The usability of the model to predict “classes” from future acquisitions of satellite data is key in establishing an operational monitoring system. Future work can look into ways to add Sentinel 1 Synthetic Aperture Radar (SAR) and Sentinel 2A/B optical data in order to maximize the number of valid observations from space. It would also be interesting to see how well the model developed for Mashhad basin predicts classes for other semi-arid regions with similar climate and topographical features. This would open up a lot of opportunities in building workflows for periodic land use monitoring at larger scale using one comprehensive unsupervised model for similar regions.

6. Conclusions

The study developed a new approach to extract irrigated land use types from time series of L8 data and using Random Forest machine learning algorithm. This approach was used to develop a LULC map of 2015/2016 with nine major classes for Mashhad basin in north east of Iran, and to extract five irrigated land use types for three crop years—2013/2014, 2014/2015, and 2015/2016. We used HPF based data fusion technique to develop LULC map at spatial resolution of 15 m. The correlation between spectral bands before and after the resampling were reported to be 0.98 averaged over all the spectral bands. Major steps included: (i) developing RF models to extract single classes; (ii) applying a majority filter over time to develop single class map over a year; (iii) hierarchical aggregation based on prediction indicators to derive multi-class LULC map; and (iv) seasonal aggregation to derive five irrigated land use types. An overall accuracy of 87.2% was reported for the LULC map with an estimated kappa of 0.85. The RF models reported an average OOB error of 0.04 and kappa of 0.92. Total irrigated area in the Mashhad basin was estimated to be 1796.16 km², 1578.48 km², and 1578.33 km² for the crop years 2013/2014, 2014/2015, and 2015/2016, respectively.

Irrigated area dynamics over multiple seasons can give critical information about pattern of agricultural land use and corresponding consumptive water use. Agriculture is the dominant water use of both surface and subsurface resources, thus spatially and temporally distributed data on land and water use are essential to identify location specific issues, and then finding solutions applicable to improve water use efficiency and water productivity. The method presented in this study can be adapted to an operational monitoring system using remotely sensed satellite data and machine learning based classification approach. The approach uses open source libraries that provide vast opportunities in optimizing the performance and allows extending the method with limited cost. Finally, such a system has the potential to support water and agricultural officials, managers, planners, and decision makers to monitor and plan resources for sustainable use. Information flow can be customized depending on the level of expertise, be it technical, decisive, spatial, infographics or simple text, catering to a wider audience.

Author Contributions

S.P. and P.K. conceived the research idea; S.P. processed the data; and S.P., P.K. and M.S. analyzed the results. All the authors contributed to the manuscript.

Funding

This research received funding from the Dutch Ministerie van Infrastructuur en Milieu (IenM). Additional funding was also received from European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement No. 606838, with additional support from IHE Delft.

Acknowledgments

We acknowledge the funding received from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement No. 606838, with additional support from IHE Delft. We are grateful to the Dutch Ministerie van Infrastructuur en Milieu (IenM) for the financial support to carry out this research in Mashhad. We also want to thank the field team in Mashhad for their support in data collection. We thank the anonymous reviewers for their critical reading and suggestions which substantially helped to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Rijsberman, F.R. Water scarcity: Fact or fiction? Agric. Water Manag. 2006, 80, 5–22. [Google Scholar] [CrossRef]
Gosling, S.N.; Arnell, N.W. A global assessment of the impact of climate change on water scarcity. Clim. Chang. 2016, 134, 371–385. [Google Scholar] [CrossRef]
Vörösmarty, C.J.; Green, P.; Salisbury, J.; Lammers, R.B. Global Water Resources: Vulnerability from Climate Change and Population Growth. Science 2000, 289, 284–288. [Google Scholar] [CrossRef] [PubMed]
Mukherji, A.; Facon, T.; de Fraiture, C.; Molden, D.; Chartres, C. Growing more food with less water: How can revitalizing Asia’s irrigation help? Water Policy 2012, 14, 430–446. [Google Scholar] [CrossRef]
Seckler, D.; Barker, R.; Amarasinghe, U. Water Scarcity in the Twenty-first Century. Int. J. Water Resour. Dev. 1999, 15, 29–42. [Google Scholar] [CrossRef]
Cosgrove, W.J.; Rijsberman, F.R. World Water Vision: Making Water Everybody’s Business; Earthscan Publications: London, UK, 2000. [Google Scholar]
Karimi, P.; Molden, D.; Notenbaert, A.; Peden, D. Nile basin farming systems and productivity. In The Nile River Basin; Water, Agriculture, Governance and Livelihoods; Routledge: Abingdon, UK, 2012; pp. 133–153. [Google Scholar]
United Nations (UN). General Assembly, Transforming Our World: The 2030 Agenda for Sustainable Development; Resolution Adopted by the United Nations General Assembly on 25 September 2015; United Nations: New York, NY, USA, 2015. [Google Scholar]
Levidow, L.; Zaccaria, D.; Maia, R.; Vivas, E.; Todorovic, M.; Scardigno, A. Improving water-efficient irrigation: Prospects and difficulties of innovative practices. Agric. Water Manag. 2014, 146, 84–94. [Google Scholar] [CrossRef]
Ozdogan, M.; Yang, Y.; Allez, G.; Cervantes, C. Remote Sensing of Irrigated Agriculture: Opportunities and Challenges. Remote Sens. 2010, 2, 2274–2304. [Google Scholar] [CrossRef]
Grafton, R.Q.; Williams, J.; Perry, C.J.; Molle, F.; Ringler, C.; Steduto, P.; Udall, B.; Wheeler, S.A.; Wang, Y.; Garrick, D.; et al. The paradox of irrigation efficiency. Science 2018, 361, 748–750. [Google Scholar] [CrossRef]
Bastiaanssen, W.G.M.; Karimi, P.; Rebelo, L.M.; Duan, Z.; Senay, G.; Muthuwatte, L.; Smakhtin, V. Earth Observation Based Assessment of the Water Production and Water Consumption of Nile Basin Agro-Ecosystems. Remote Sens. 2014, 6, 10306–10334. [Google Scholar] [CrossRef]
Ajaz, A.; Karimi, P.; de Fraiture, C.; Xueliang, C. National and global censuses or satellite-based estimates? Asia’s irrigated areas: In a muddle. In Proceedings of the 2nd World Irrigation Forum, Chiang Mai, Thailand, 6–8 November 2016. [Google Scholar]
Bastiaanssen, W.G.M. Remote Sensing in Water Resources Management: The State of the Art; IWMI Research Report; International Water Management Institute: Colombo, Sri Lanka, 1998; p. 118. [Google Scholar]
Bastiaanssen, W.G.M.; Molden, D.J.; Makin, I.W. Remote sensing for irrigated agriculture: Examples from research and possible applications. Agric. Water Manag. 2000, 46, 137–155. [Google Scholar] [CrossRef]
Thenkabail, P.S.; Biradar, C.M.; Noojipady, P.; Dheeravath, V.; Li, Y.; Velpuri, M.; Gumma, M.; Gangalakunta, O.R.P.; Turral, H.; Cai, X.; et al. Global irrigated area map (GIAM), derived from remote sensing, for the end of the last millennium. Int. J. Remote Sens. 2009, 30, 3679–3733. [Google Scholar] [CrossRef]
Biradar, C.M.; Thenkabail, P.S.; Noojipady, P.; Li, Y.; Dheeravath, V.; Turral, H.; Velpuri, M.; Gumma, M.K.; Gangalakunta, O.R.P.; Cai, X.L.; et al. A global map of rainfed cropland areas (GMRCA) at the end of last millennium using remote sensing. Int. J. Appl. Earth Obs. Geoinf. 2009, 11, 114–129. [Google Scholar] [CrossRef]
Zhang, H.K.; Roy, D.P. Using the 500m MODIS land cover product to derive a consistent continental scale 30m Landsat land cover classification. Remote Sens. Environ. 2017, 197, 15–34. [Google Scholar] [CrossRef]
Cieślak, I.; Szuniewicz, K.; Pawlewicz, K.; Czyża, S. Land Use Changes Monitoring with CORINE Land Cover Data. IOP Conf. Ser. Mater. Sci. Eng. 2017, 245, 052049. [Google Scholar] [CrossRef]
Ambika, A.K.; Wardlow, B.; Mishra, V. Remotely sensed high resolution irrigated area mapping in India for 2000 to 2015. Sci. Data 2016, 3, 160118. [Google Scholar] [CrossRef] [PubMed]
Funk, C.; Verdin, J.P. Real-Time Decision Support Systems: The Famine Early Warning System Network. In Satellite Rainfall Applications for Surface Hydrology; Gebremichael, M., Hossain, F., Eds.; Springer: Dordrecht, The Netherlands, 2010; pp. 295–320. [Google Scholar] [CrossRef]
López-Lozano, R.; Duveiller, G.; Seguini, L.; Meroni, M.; García-Condado, S.; Hooker, J.; Leo, O.; Baruth, B. Towards regional grain yield forecasting with 1km-resolution EO biophysical products: Strengths and limitations at pan-European level. Agric. For. Meteorol. 2015, 206, 12–32. [Google Scholar] [CrossRef]
Wu, B.; Meng, J.; Li, Q.; Yan, N.; Du, X.; Zhang, M. Remote sensing-based global crop monitoring: Experiences with China’s CropWatch system. Int. J. Digit. Earth 2014, 7, 113–137. [Google Scholar] [CrossRef]
Becker-Reshef, I.; Justice, C.; Sullivan, M.; Vermote, E.; Tucker, C.; Anyamba, A.; Small, J.; Pak, E.; Masuoka, E.; Schmaltz, J.; et al. Monitoring Global Croplands with Coarse Resolution Earth Observations: The Global Agriculture Monitoring (GLAM) Project. Remote Sens. 2010, 2, 1589–1609. [Google Scholar] [CrossRef]
Rembold, F.; Meroni, M.; Urbano, F.; Lemoine, G.; Kerdiles, H.; Perez-Hoyos, A.; Csak, G. ASAP—Anomaly hot Spots of Agricultural Production, a new early warning decision support system developed by the Joint Research Centre. In Proceedings of the 2017 9th International Workshop on the Analysis of Multitemporal Remote Sensing Images (MultiTemp), Brugge, Belgium, 27–29 June 2017; pp. 1–5. [Google Scholar] [CrossRef]
Fritz, S.; See, L.; Bayas, J.C.L.; Waldner, F.; Jacques, D.; Becker-Reshef, I.; Whitcraft, A.; Baruth, B.; Bonifacio, R.; Crutchfield, J.; et al. A comparison of global agricultural monitoring systems and current gaps. Agric. Syst. 2019, 168, 258–272. [Google Scholar] [CrossRef]
Atzberger, C. Advances in Remote Sensing of Agriculture: Context Description, Existing Operational Monitoring Systems and Major Information Needs. Remote Sens. 2013, 5, 949–981. [Google Scholar] [CrossRef]
Jones, J.W.; Antle, J.M.; Basso, B.; Boote, K.J.; Conant, R.T.; Foster, I.; Godfray, H.C.J.; Herrero, M.; Howitt, R.E.; Janssen, S.; et al. Toward a new generation of agricultural system data, models, and knowledge products: State of agricultural systems science. Agric. Syst. 2017, 155, 269–288. [Google Scholar] [CrossRef] [PubMed]
Gómez, C.; White, J.C.; Wulder, M.A. Optical remotely sensed time series data for land cover classification: A review. ISPRS J. Photogramm. Remote Sens. 2016, 116, 55–72. [Google Scholar] [CrossRef]
Liu, P. A survey of remote-sensing big data. Environ. Inform. 2015, 3, 45. [Google Scholar] [CrossRef]
Inglada, J.; Arias, M.; Tardy, B.; Morin, D.; Valero, S.; Hagolle, O.; Dedieu, G.; Sepulcre, G.; Bontemps, S.; Defourny, P. Benchmarking of algorithms for crop type land-cover maps using Sentinel-2 image time series. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 3993–3996. [Google Scholar] [CrossRef]
Maxwell, A.E.; Warner, T.A.; Fang, F. Implementation of machine-learning classification in remote sensing: An applied review. Int. J. Remote Sens. 2018, 39, 2784–2817. [Google Scholar] [CrossRef]
Inglada, J.; Vincent, A.; Arias, M.; Tardy, B.; Morin, D.; Rodes, I. Operational High Resolution Land Cover Map Production at the Country Scale Using Satellite Image Time Series. Remote Sens. 2017, 9, 95. [Google Scholar] [CrossRef]
Inglada, J.; Arias, M.; Tardy, B.; Hagolle, O.; Valero, S.; Morin, D.; Dedieu, G.; Sepulcre, G.; Bontemps, S.; Defourny, P.; et al. Assessment of an Operational System for Crop Type Map Production Using High Temporal and Spatial Resolution Satellite Optical Imagery. Remote Sens. 2015, 7, 12356–12379. [Google Scholar] [CrossRef]
Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. 2012, 67, 93–104. [Google Scholar] [CrossRef]
Berhane, T.M.; Lane, C.R.; Wu, Q.; Autrey, B.C.; Anenkhonov, O.A.; Chepinoga, V.V.; Liu, H. Decision-Tree, Rule-Based, and Random Forest Classification of High-Resolution Multispectral Imagery for Wetland Mapping and Inventory. Remote Sens. 2018, 10, 580. [Google Scholar] [CrossRef]
Karimi, P.; Molden, D.; Bastiaanssen, W.; Cai, X. Water accounting to assess use and productivity of water: Evolution of a concept and new frontiers. In Water Accounting: International Approaches to Policy and Decision-Making; Edward Elgar Publishing: Cheltenham, UK, 2012; pp. 76–88. [Google Scholar]
Motagh, M.; Djamour, Y.; Walter, T.R.; Wetzel, H.U.; Zschau, J.; Arabi, S. Land subsidence in Mashhad Valley, northeast Iran: Results from InSAR, levelling and GPS. Geophys. J. Int. 2007, 168, 518–526. [Google Scholar] [CrossRef]
Karimi, P.; Qureshi, A.S.; Bahramloo, R.; Molden, D. Reducing carbon emissions through improved irrigation and groundwater management: A case study from Iran. Agric. Water Manag. 2012, 108, 52–60. [Google Scholar] [CrossRef]
Neteler, M.; Bowman, M.H.; Landa, M.; Metz, M. GRASS GIS: A multi-purpose open source GIS. Environ. Model. Softw. 2012, 31, 124–130. [Google Scholar] [CrossRef]
Grizonnet, M.; Michel, J.; Poughon, V.; Inglada, J.; Savinaud, M.; Cresson, R. Orfeo ToolBox: Open source processing of remote sensing images. Open Geospat. Data Softw. Stand. 2017, 2, 15. [Google Scholar] [CrossRef]
United States Geological Survey (USGS). Landsat 8 (L8) Data Users Handbook; Department of the Interior U.S. Geological Survey: Sioux Falls, SD, USA, 2016.
Gangkofner, U.G.; Pradhan, P.S.; Holcomb, D.W. Optimizing the High-Pass Filter Addition Technique for Image Fusion. Photogramm. Eng. Remote Sens. 2007, 73, 1107–1118. [Google Scholar] [CrossRef]
Foody, G.M. Status of land cover classification accuracy assessment. Remote Sens. Environ. 2002, 80, 185–201. [Google Scholar] [CrossRef]
Janitza, S.; Hornung, R. On the overestimation of random forest’s out-of-bag error. PLoS ONE 2018, 13, e0201904. [Google Scholar] [CrossRef]

Figure 1. Study area: Mashhad basin in the northeast of Iran.

Figure 2. Five Landsat tiles covering the study area. The numbers indicate the path and row, for example 160/034 represents path 160 and row 34.

Figure 3. List of L8 data used in this study over five tiles. The dates in black were used for extracting summer crop area and those in red were used for winter crop area. The blue points represent dates used to develop RF models. The dates given in gray area for the year 2016 were used for developing the LULC map. Note that L8 data from path 158 were used only for LULC map and the data acquired on 16 March 2017 were used only for developing RF model.

Figure 4. L8 QA band bits and its explanation (Source: https://landsat.usgs.gov/qualityband).

Figure 5. Validation points collected from field (red) and stratified random points (blue) used for accuracy assessment.

Figure 6. Block diagram showing the key steps of the methodology used in this study. For each Major Class (MC), Steps 1–3 were repeated to develop single class maps, followed by temporal aggregation.

Figure 7. (a) Shows the False Color Composite(FCC) from spectral bands in 30 m; (b) shows the FCC in 15 m; and (c) shows the extracted irrigated area in 15 m for the respective dates.

Figure 8. Classified land cover map of Mashhad basin for 2015/2016.

Figure 9. Bar graph showing annual surface area of fallow, irrigated and five irrigated land use types. Note that the classes “summer crop” and “winter crop” include “double crop” area.

Table 1. Major LULC types and irrigated sub-classes of Mashhad basin extracted in this study.

Class ID	Major LULC Type	Sub-Classes
1	Impervious/urban area	N.A
2	Orchards/parks/dense trees	N.A
3	Water bodies	N.A
4	Sparse vegetation	N.A
5	Cropland/rainfed	N.A
6	Irrigated area	Double crop Single summer crop Single winter crop Summer crop Winter crop
7	Fallow cropland	N.A
8	Urban cropland	N.A
9	Urban vegetation	N.A

Table 2. Correlation analysis results of L8 image dated 26 February 2016. R (correlation coefficient), Intercept and Slope are linear regression outputs, Mean_orig and SD_Orig are mean and standard deviation of original spectral bands, respectively; and Mean_HPFA and SD_HPFA are mean and standard deviation of resampled spectral bands, respectively.

Spectral Band	R	Intercept	Slope	Mean_orig	SD_orig	Mean_HPFA	SD_HPFA
B2	0.98	−7.2	1	0.15	0.03	0.15	0.03
B3	0.99	6.2	1	0.15	0.04	0.16	0.04
B4	0.99	23.5	0.99	0.18	0.05	0.18	0.05
B5	0.98	67.6	0.97	0.24	0.06	0.24	0.06
B6	0.97	40.1	0.98	0.26	0.08	0.26	0.08
B7	0.98	33.8	0.98	0.23	0.08	0.22	0.08

Table 3. RF model prediction indicators. OOB, Out Of Box error.

Class ID	Major Land Cover	Observed Accuracy	OOB	Kappa
1	Impervious/urban area	95%	0.05	0.91
2	Orchards/parks	98%	0.02	0.97
3	Water bodies	100%	0.005	0.99
4	Sparse vegetation	92%	0.08	0.85
5	Cropland/rainfed	97%	0.03	0.93
6	Irrigated area	97%	0.03	0.95
7	Fallow cropland	95%	0.05	0.91

Table 4. Reported user’s accuracy, kappa estimates and surface area of major land cover classes.

Class ID	Land Cover	User’s Accuracy	Kappa	Area (Sq.Km)
1	Impervious/urban area	95.8%	0.95	230.2
2	Orchards/parks	88.6%	0.87	140.6
3	Water bodies	100%	1	4.8
4	Sparse vegetation	88.2%	0.81	12,442.4
5	Cropland/rainfed	72.7%	0.68	1442.5
6	Irrigated area	87.2%	0.85	1578.3
7	Fallow cropland	88.8%	0.87	886.7

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Pareeth, S.; Karimi, P.; Shafiei, M.; De Fraiture, C. Mapping Agricultural Landuse Patterns from Time Series of Landsat 8 Using Random Forest Based Hierarchial Approach. Remote Sens. 2019, 11, 601. https://0-doi-org.brum.beds.ac.uk/10.3390/rs11050601

AMA Style

Pareeth S, Karimi P, Shafiei M, De Fraiture C. Mapping Agricultural Landuse Patterns from Time Series of Landsat 8 Using Random Forest Based Hierarchial Approach. Remote Sensing. 2019; 11(5):601. https://0-doi-org.brum.beds.ac.uk/10.3390/rs11050601

Chicago/Turabian Style

Pareeth, Sajid, Poolad Karimi, Mojtaba Shafiei, and Charlotte De Fraiture. 2019. "Mapping Agricultural Landuse Patterns from Time Series of Landsat 8 Using Random Forest Based Hierarchial Approach" Remote Sensing 11, no. 5: 601. https://0-doi-org.brum.beds.ac.uk/10.3390/rs11050601

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Mapping Agricultural Landuse Patterns from Time Series of Landsat 8 Using Random Forest Based Hierarchial Approach

Abstract

1. Introduction

2. Study Area

3. Methods

3.1. Satellite Data

3.2. Pre-Processing Landsat 8 Data

3.3. Training and Validation Samples

3.4. Classification Using Random Forest (RF)

4. Results

4.1. Resampling L8 Spectral Bands

4.2. RF Model Performance

4.3. Classification and Accuracy Assessment

4.4. Spatial and Temporal Crop Patterns

5. Discussion

6. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI