Article

Virtual Interpretation of Earth Web-Interface Tool (VIEW-IT) for Collecting Land-Use/Land-Cover Reference Data

1 Center for Interdisciplinary Geospatial Analysis, Department of Geography and Global Studies, Sonoma State University, Rohnert Park, CA 94928, USA
2 Department of Biology, University of Puerto Rico, P.O. Box 23360, San Juan, PR 00931, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2011, 3(3), 601-620; https://0-doi-org.brum.beds.ac.uk/10.3390/rs3030601
Submission received: 20 January 2011 / Revised: 18 February 2011 / Accepted: 14 March 2011 / Published: 21 March 2011

Abstract
Web-based applications that integrate geospatial information, or the geoweb, offer exciting opportunities for remote sensing science. One such application is a Web‑based system for automating the collection of reference data for producing and verifying the accuracy of land-use/land-cover (LULC) maps derived from satellite imagery. Here we describe the capabilities and technical components of the Virtual Interpretation of Earth Web-Interface Tool (VIEW-IT), a collaborative browser-based tool for “crowdsourcing” interpretation of reference data from high resolution imagery. The principal component of VIEW-IT is the Google Earth plug-in, which allows users to visually estimate percent cover of seven basic LULC classes within a sample grid. The current system provides a 250 m square sample to match the resolution of MODIS satellite data, although other scales could be easily accommodated. Using VIEW-IT, a team of 23 student and 7 expert interpreters collected over 46,000 reference samples across Latin America and the Caribbean. Samples covered all biomes, avoided spatial autocorrelation, and spanned years 2000 to 2010. By embedding Google Earth within a Web-based application with an intuitive user interface, basic interpretation criteria, distributed Internet access, server-side storage, and automated error-checking, VIEW-IT provides a time and cost efficient means of collecting a large dataset of samples across space and time. When matched with predictor variables from satellite imagery, these data can provide robust mapping algorithm calibration and accuracy assessment. This development is particularly important for regional to global scale LULC mapping efforts, which have traditionally relied on sparse sampling of medium resolution imagery and products for reference data. Our ultimate goal is to make VIEW-IT available to all users to promote rigorous, global land-change monitoring.

Graphical Abstract

1. Introduction

Web-based applications that use geospatial information—part of the “geoweb”—are evolving at a rapid pace, especially with the rise of open-source web mapping services and associated application programming interfaces (APIs), such as the Google Maps API released in 2005 (code.google.com/apis/maps). Many geoweb applications blend geographic and non-spatial information for data visualization and exploration [1,2,3,4]. However, the technology can also allow multiple users to collaborate in collecting geospatial information in a Web-based platform [4,5,6]. Web sites that are open to a large user community can permit more data collection than could be acquired in a more closed system, a phenomenon known as “crowdsourcing” [5,6]. “Volunteered geographic information”, or VGI, is a new term used to describe data collection by general Internet citizens without professional expertise [7]. Examples of Web-based VGI include OpenStreetMap (www.openstreetmap.org), Google Map Maker (mapmaker.google.com), E-Flora (www.geog.ubc.ca/biodiversity/eflora), eBird (ebird.org), and Crowdmap (crowdmap.com).
There is also great potential for geoweb technology in remote sensing science. Geoweb sites allow users to browse and visualize remote sensing products across broad areas with simple user interfaces, and can offer base imagery and other map layers that provide geographic context. These map services are generally cached for quick display with relatively slow Internet connections. As an example, the Web Fire Mapper webpage (firefly.geog.umd.edu/firemap) allows viewing of MODIS fire/hotspot and burned area products overlaid on cloud-free, coarse-resolution satellite imagery along with political boundaries, cities and protected areas layers. Another example is the Rapid Assessment of Land Use Change In and Around Protected Areas (RALUCIAPA) webpage, which allows users to view MODIS Vegetation Continuous Field (VCF) percent forest cover data (2000–2005) with overlays of protected areas and base satellite images [8]. In this case, base imagery is streamed from Google Maps or Earth or Microsoft Bing Maps (www.microsoft.com/maps/developers) services, which cover parts of the world with high resolution images. Most of these images, which outside the United States and Europe generally come from commercial satellites (e.g., Quickbird, IKONOS), would be prohibitively expensive to acquire at regional to global scales.
By allowing easy display of remote sensing products against high resolution imagery, geoweb applications allow a form of visual accuracy assessment. This development alone is an important step toward improving the science of remote sensing. In the past, many remote sensing products remained within the domain of experts and had little scrutiny beyond an initial accuracy assessment, which for large mapping projects is often based on samples from limited field plots or medium-scale satellite imagery (e.g., Landsat). Now a global user community can assess remote sensing products, which could help identify errors and improve product development. An example of this concept is the Geo‑Wiki Project, a new VGI geoweb application that seeks crowdsourced review of areas where three global land-cover maps disagree in terms of forest and agriculture [5]. Volunteers can compare hotspots of disagreement to Google Earth (GE) high resolution imagery (sub-meter to 4-m resolution), upload and view georeferenced photographs, and assess the accuracy of the map products within their pixel extents. Results are stored in a server-side spatial database, which can be downloaded for use in other applications.
One part of the remote sensing process that could greatly benefit from geoweb technology is the collection of large quantities of reference data needed for calibrating mapping algorithms (e.g., classifier training, regression analysis) and validation (e.g., accuracy assessment). Several recent studies have used visual interpretation of high resolution imagery in the GE desktop application to provide low-cost and reasonably accurate reference data for both producing land-cover maps and testing their accuracy [9,10,11,12]. The imagery has a spatial accuracy of <40 m, which is adequate for comparison to coarser-resolution satellite imagery [9,13]. An advantage of GE over other API-based map services accessed within webpages (e.g., Google Maps, Microsoft Bing) is that historical high resolution images are available with their dates, allowing temporal sampling that can span ten or more years [9]. We used GE to manually collect reference data in Argentina, Bolivia and Paraguay [9], but to expand this method to a global scale and automate the crowdsourcing of reference data interpretation, we needed a Web-based system. To fill this need, we designed the Virtual Interpretation of Earth Web-Interface Tool (VIEW-IT), which integrates the GE JavaScript API with other geoweb technologies to form a system that automates interpretation of high resolution imagery. Here we report on the use of VIEW-IT to facilitate our mapping of land change in Latin America and the Caribbean with 250 m MODIS imagery; however, our broader goal is to build a global community of VIEW-IT volunteer interpreters, especially in the developing world, where land change is most rapid.
To this end, we designed VIEW-IT with the following requirements: (1) accessible by Web browsers with slow Internet connections; (2) a simple user interface; (3) interpretation criteria require minimal training and apply across the globe; (4) the system manages and cross-checks basic user interpretations; (5) expert users can correct interpretations; (6) all information is stored in a database that facilitates management, queries and downloading; and, (7) the system is composed of open-source technology and free map services to minimize access costs and allow portability.

2. The Virtual Interpretation of Earth Web-Interface Tool (VIEW-IT)

The Virtual Interpretation of Earth Web-Interface Tool (VIEW-IT) is a collaborative, land-use/land-cover (LULC) reference data collection system that is accessed through a web browser. The system has been designed to provide reference data at the 250 m nominal scale of pixels in the MOD13Q1 MODIS satellite product (actual pixel size is 231.7 m), although other scales could be added in the future. A user visits sample locations centered on MODIS pixels and visually estimates percent cover of LULC classes within a 250 × 250 m interpretation grid from high-resolution images in Google Earth (GE), which, in many parts of the developing world, are patches of scenes from the Quickbird and IKONOS commercial satellites. VIEW-IT’s design, capabilities and protocols are described in detail in the following sub-sections.

2.1. Design Characteristics

VIEW-IT was developed on a Windows operating system, but its components are mainly open‑source software that could be ported to another operating system (Figure 1). Apache is used as the web server and web pages are formed from a combination of JavaScript and PHP scripts. The critical component of VIEW-IT is the Google Earth plug-in and its JavaScript API, which allows a 3D digital globe view, satellite imagery and navigation controls to be embedded within the web page. We used GE plug-in versions 4 and 5 in developing and using VIEW-IT, although version 6 was released on 29 November 2010 and is shown in our figures. The new version allows browsing of historical high‑resolution imagery, while our research versions of the plug-in only allowed a user to look at the most recent images. Other JavaScript APIs integrated into VIEW-IT include (Figure 1): Panoramio (www.panoramio.com) to view georeferenced (i.e., “geo-tagged”) photographs uploaded by a global community; Google Charts for viewing temporal Enhanced Vegetation Index (EVI) data at sample points and data summaries for administrators; and the ArcGIS JavaScript API for displaying biome and ecoregion polygon layers and the VIEW-IT sample point GIS layer from ArcGIS Server 9.3 (ESRI, Inc.) Representational State Transfer (REST) map services.
Figure 1. Technical and user components of the Virtual Interpretation of Earth Web-Interface Tool (VIEW-IT).
Each VIEW-IT sample is bounded by a 250 × 250 m square (62,500 m2) centered on a MODIS pixel with an internal 4 × 5-cell grid. Each internal cell covers 5% of the 250 m square (each grid cell is 62.5 × 50 m = 3,125 m2). Reference grids are developed in the Interrupted Goode Homolosine (IGH) projection, WGS84 datum and then projected to the geographic coordinate system (GCS, i.e., simple cylindrical projection with latitude, longitude) for viewing in GE as a KML overlay (for more information, see [9]). Sample attributes (e.g., biome, ecoregion, country, municipality), vector interpretation grid (KML code), and user-interpreted data are stored in a MySQL database (Figure 1).
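The grid geometry described above can be sketched in Python. This is an illustrative simplification, not VIEW-IT’s actual code: it lays out the 4 × 5 cell offsets in meters and converts them to latitude/longitude with a local spherical approximation, rather than working in the Interrupted Goode Homolosine projection the system itself uses before reprojecting to geographic coordinates.

```python
import math

SAMPLE_M = 250.0           # edge length of the sample square (nominal MODIS pixel)
N_COLS, N_ROWS = 4, 5      # internal grid: 4 x 5 = 20 cells, each 5% of the sample
CELL_W = SAMPLE_M / N_COLS # 62.5 m
CELL_H = SAMPLE_M / N_ROWS # 50.0 m

def grid_cells(center_lat, center_lon):
    """Return the 20 cell bounding boxes (lat_min, lon_min, lat_max, lon_max)
    for a 250 x 250 m sample centered on (center_lat, center_lon).

    Offsets in meters are converted to degrees with a local spherical
    approximation; VIEW-IT instead builds grids in the Goode Homolosine
    projection and reprojects them for display in Google Earth."""
    m_per_deg_lat = 111_320.0
    m_per_deg_lon = 111_320.0 * math.cos(math.radians(center_lat))
    cells = []
    for row in range(N_ROWS):
        for col in range(N_COLS):
            x0 = -SAMPLE_M / 2 + col * CELL_W   # meters east of center
            y0 = -SAMPLE_M / 2 + row * CELL_H   # meters north of center
            cells.append((
                center_lat + y0 / m_per_deg_lat,
                center_lon + x0 / m_per_deg_lon,
                center_lat + (y0 + CELL_H) / m_per_deg_lat,
                center_lon + (x0 + CELL_W) / m_per_deg_lon,
            ))
    return cells
```

Each cell covers 62.5 × 50 m = 3,125 m², i.e., exactly 5% of the 62,500 m² sample, which is what makes the grid usable for visual percent-cover estimates in 5% increments.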
Server-side checks and automated processes are carried out with a combination of common gateway interface (CGI) and Python scripts. For example, a check is made to insure that no two samples are within 1,000 m from each other, reducing spatial autocorrelation among samples. Automated processes include acquiring attributes for new samples from spatial overlays with existing GIS layers, adding new samples to a GIS point layer, and restarting an ArcGIS Server map service to display the updated layer. Raster stacks of EVI data for the whole study area, extracted from the MOD13 product for each MODIS tile, are stored in binary files and web-server CGI and Python programs provide an EVI temporal profile for the sample pixel (Figure 1).
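The minimum-spacing check can be illustrated with a short Python sketch. The paper does not show the actual CGI/Python scripts, so this is only a stand-in: a great-circle distance test that rejects any candidate sample within 1,000 m of an existing one.

```python
import math

MIN_SPACING_M = 1_000.0  # minimum allowed distance between any two samples

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def far_enough(new_pt, existing_pts):
    """Accept a new sample only if it is at least 1,000 m from every
    existing sample, reducing spatial autocorrelation."""
    return all(
        haversine_m(new_pt[0], new_pt[1], lat, lon) >= MIN_SPACING_M
        for lat, lon in existing_pts
    )
```

In production, a brute-force scan over all samples would be slow; a spatial index (e.g., a database spatial query) would do the same test efficiently, but the acceptance rule is the same.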

2.2. Human Interpretation Protocol

For each 250 × 250 m sample, human interpreters use VIEW-IT to estimate percent cover of LULC classes within the grid overlaid on high-resolution GE imagery (Figure 2). In the sample interpretation window, the user has the option to either continue with interpretation (i.e., imagery is “usable”), or reject the sample if it does not have high resolution imagery, has no visible date, or has too many mixed classes (i.e., a “mixed pixel”). Using pop-up menus, VIEW-IT users estimate percent cover for classes within the sample grid to the nearest 10% and record the year of the GE high resolution image. The VIEW-IT visual interpretation protocol includes seven primary classes with criteria described in Table 1. These classes were chosen because they could be reliably identified using visual features by interpreters that had little a priori understanding of the landscape [9]. Each interpreter is allowed to consider the larger landscape context around the sample by navigating within GE (e.g., zoom, rotate, tilt) or by clicking on fixed view heights. Other options that can aid interpretation include the ability to view (Figure 2): attribute information on the sample’s biome, ecoregion, country, municipality; interpretation grid and GE imagery overlaid on 3D terrain; GIS layers of biome, ecoregion and existing VIEW-IT sample locations; icons and associated pop-up windows for available Panoramio geo-tagged photographs in the vicinity; and an interactive Google Chart of EVI data from 2001 to 2009. However, the final percent cover estimate is confined to the LULC within the sample grid. A VIEW-IT user inputs the year of the high resolution image being sampled by reading the date information from GE (Figure 2).
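The protocol implies some simple server-side sanity checks on each submitted interpretation. The sketch below is our reading of those rules, not VIEW-IT’s actual validation code: cover values are restricted to the seven primary classes in multiples of 10% summing to 100%, and the image year is bounded by the 2000–2010 span of GE high-resolution imagery reported in Section 3 (the year bounds in particular are an assumption).

```python
# The seven primary classes from Table 1 (MixedWoody is assigned in
# post-processing, not interpreted directly).
CLASSES = {"Built", "Water", "Bare", "Ag", "Plant", "Herb", "Woody"}

def validate_interpretation(cover, image_year):
    """Sanity-check one user's estimate: only the seven primary classes,
    each value a multiple of 10 between 0 and 100, totals of 100%, and a
    plausible image year. Returns True if the record is acceptable."""
    if not set(cover) <= CLASSES:
        return False
    if any(v % 10 != 0 or not 0 <= v <= 100 for v in cover.values()):
        return False
    if sum(cover.values()) != 100:
        return False
    return 2000 <= image_year <= 2010  # assumed bound on usable GE imagery
```

Catching malformed records at submission time, rather than at download, keeps the database clean for the automated cross-checking described in Section 2.3.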
Figure 2. The VIEW-IT interpretation window with light-blue sample and its internal 4 × 5 grid (each cell covers 5% of the sample’s total area), overlaid on Google Earth imagery. The entire square sample is 250 × 250 m, the nominal size of a MODIS pixel, and its location and orientation are centered on a MODIS pixel projected to geographic coordinates (latitude, longitude) from the Goode Homolosine projection. The sample is in a sugar cane field near Juan Bautista Alberdi, Argentina. The high resolution image is from the Quickbird satellite, as indicated by the DigitalGlobe copyright (bottom middle of Google Earth window), and the image was acquired on 16 May 2008 (bottom left corner of Google Earth window). The EVI profile shows an annual cycle of sugar cane growth and harvest, with a value of 0.59 on 8 May 2008 (dot on EVI graph, data displayed top right of graph) coinciding with the ground conditions seen in the Google Earth imagery. Note that the field was harvested shortly after this image was acquired, as indicated by a lower 0.24 EVI on 9 June 2008. The blue icon in the top left of the image on the road represents a clickable Panoramio photograph. See the online supplemental materials for a higher resolution version of this figure.
Table 1. Visual criteria used for estimating percent cover of land-use/land-cover classes in Virtual Interpretation of Earth Web-Interface Tool (VIEW-IT) (Section 2.2).
Class | Abbreviation | Visual Criteria
Built-up areas | Built | Urban and industrial buildings, infrastructure and associated roads
Water | Water | Lakes and large rivers
Bare areas | Bare | In addition to areas of bare soil, which could be common in deserts, this class also includes ice, snow, sand dunes, rock, salt flats, and dry riverbeds. Open-pit mines with exposed soil/rock are included in this class.
Agriculture | Ag | Agricultural fields with annual crops (e.g., sugar cane, corn, wheat, soybean, rice). Perennial crops (e.g., citrus plantations) are included in the plantation class. Crops can usually be detected by plow lines, rectilinear shapes, and nearby roads and infrastructure. Bare soil in this context was classified as agriculture, but fallow agricultural land was classified as herbaceous or woody vegetation.
Plantations | Plant | The major characteristics of plantations are perennial vegetation and the regular spacing of the plants. Common examples in the Chaco are pine and eucalyptus plantations, citrus and olive orchards, and vineyards. Roads, bare ground, or grass within the plantation were considered part of the plantation.
Herbaceous vegetation | Herb | This class is usually dominated by native or planted grasses and herbs. The most common land use in this class is cattle pasture, which can be distinguished by trails and watering holes. This class can be confused with agriculture but is usually more heterogeneous in color (green, gray, brown) and texture.
Woody vegetation | Woody | Trees and shrubs are the major components of this class. Although most areas in this class are natural, woody vegetation can also occur within agricultural and urban regions.
Mixed woody vegetation | MixedWoody | Samples with a mix of Woody, Herb and Bare percent cover. Not interpreted directly in VIEW-IT, but assigned in post-processing of samples (see Section 2.5).

2.3. Cross-Checking of Interpretations

Two basic-level VIEW-IT users, hereafter referred to as “users”, visit each sample without knowledge of the other user’s interpretation. If the majority cover class and date for the two interpretations are the same, then the sample is labeled as “classified” and percent cover data are averaged for the sample’s final estimate. Conflicts in majority cover class or image year are labeled as “conflict” by VIEW-IT for later review by an expert-level user, hereafter referred to as “expert”, who then inputs data that supersedes that from the users. When reviewing a sample, the expert has the ability to observe the percent cover, date and user names from the two user interpretations. This provides the additional benefit of allowing the expert to identify users that consistently have problems in their interpretations for subsequent training and iterative improvement in accuracy. When an expert visits a sample without user data, then the expert’s data are accepted as “classified”, without further verification.
As implemented, our system does not filter two interpretations that have the same majority class but large differences in estimated cover. For example, two users could interpret a sample to have 50% and 100% Woody cover, respectively. In this case, upon download VIEW-IT would average the user interpretations, creating a final sample with 75% Woody cover. However, the original individual interpretations are stored in the VIEW-IT database, can be downloaded, and could be filtered with post-processing (e.g., in a spreadsheet) using custom criteria as needed.
There are situations when a user’s percent cover estimates have no clear majority class (e.g., 50% cover of two classes, or 40%, 40%, 20% cover of three classes). If one user has a clear majority in a class, and this class is one of the tied classes from the other user, then the sample is labeled as classified, with no conflict. For example, the first user’s estimate could have 40% Ag and 60% Herb, and the second user could have 50% Ag and 50% Herb—a tie. In this example, the majority class for the sample will be Herb. Similarly, one user may have 60% Ag and 40% Herb, and the other user 40% Ag, 40% Herb and 20% Bare (another tie). In this case, the majority class would be Ag as it was the majority class in the first user’s estimate and one of the dominant tied classes in the second user’s estimate.
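The agreement rule, including the tie cases above, can be expressed compactly. The sketch below is our reading of the rule as described in this section, not the system’s actual code; it resolves only the majority class (the date comparison and percent-cover averaging that VIEW-IT also performs are omitted).

```python
def majority_classes(cover):
    """Classes tied for the highest percent cover in one interpretation."""
    top = max(cover.values())
    return {cls for cls, pct in cover.items() if pct == top}

def resolve(cover1, cover2):
    """Apply the VIEW-IT agreement rule to two users' interpretations.

    Returns ('classified', majority_class) when one user has a single
    clear majority class that also appears among the other user's
    (possibly tied) top classes; otherwise ('conflict', None), flagging
    the sample for expert review."""
    m1, m2 = majority_classes(cover1), majority_classes(cover2)
    if len(m1) == 1 and m1 <= m2:
        return "classified", next(iter(m1))
    if len(m2) == 1 and m2 <= m1:
        return "classified", next(iter(m2))
    return "conflict", None
```

Running the two worked examples from the text through this rule gives Herb for the 40/60 vs. 50/50 case and Ag for the 60/40 vs. 40/40/20 case, while two single but different majorities yield a conflict.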
Another situation that can cause conflicts is when substantial time passes between the sample’s first and second interpretations, and Google updates high resolution imagery for the location in the interim. There is potential for two conflicts that would need to be resolved by an expert. One is a date conflict if the image years differ. Also, the updated imagery may include a change in majority class (e.g., from deforestation), which can cause a class conflict. The new historical imagery features of the API can now be used to ensure image dates are the same for the two user estimates.

2.4. Sampling Protocol

There are currently two methods for placing sample points in VIEW-IT. One option is for an administrator to pre-load samples into VIEW-IT. In our initial work, these points were randomly generated within ecoregions [14] using ArcGIS Desktop (ESRI, Inc.) and converted to a GE KML file for upload to the system using a custom-designed tool. However, we found that too many samples landed in areas without high resolution imagery, or had other issues such as cloud cover, no date or a heavy mixture of classes, and thus we wasted labor in rejecting samples. Furthermore, this method mainly sampled woody vegetation, and we did not generate enough samples in less prevalent classes, such as built-up areas [9]. To resolve these issues, we tried a stratified sampling approach [9], where, in the GE application, we digitized large polygons over homogenous patches of each class and then generated random samples within those patches, which were then uploaded to VIEW-IT. Although this method was preferable from a statistical perspective, it became apparent that collecting samples with the stratified sampling method would take too long to cover all of LAC. We next developed an alternate and preferred sampling method, which is to have a user select a sample location manually within VIEW-IT. In this case, a user has the flexibility to navigate GE, zoom in on high resolution imagery, click a desired sample location, and observe the pixel-centered interpretation grid and EVI profile before accepting the location for sampling. Once a location is selected for sampling, the user or expert then follows the interpretation protocols (Section 2.2). If the first interpretation is from a user, then the point is available for interpretation by a second user.
Users can browse existing samples, either pre-loaded by an administrator or manually placed by another user, based on search criteria of biome, ecoregion, country, and municipality. Experts can also browse samples by name of user, samples with percent cover and/or date issues, and by sample identification number.

2.5. Additional Capabilities

All users have access to a “Charts” page that provides graphical summaries of samples in the VIEW-IT database (Figure 3(A)). Users can first filter the samples in the database by biome, ecoregion, country, municipality, class type, status (e.g., classified, one user), issue type (e.g., class conflict, date conflict) and user name. The selected set can then be viewed as a pie chart, bar chart or map using the Google Charts API. A key component of the filter is the ability to identify those samples with majority cover over a specified threshold. Our current mapping method labels samples by their class with ≥80% cover, and samples with woody vegetation mixed with bare, herbaceous vegetation or agriculture (all <80%) are labeled as MixedWoody [9]. For example, with an 80% cover threshold, a sample with 80% woody vegetation is a Woody sample, and a sample with 75% woody vegetation and 25% herbaceous vegetation is a MixedWoody sample. This threshold can be altered in the Charts page, and a server process then populates a table that counts those samples in the seven basic classes and the MixedWoody class. The Charts page is thus a powerful tool for any user to graphically explore the database by geography (e.g., biomes, countries) and class thresholds (e.g., Built, MixedWoody at an 80% cover threshold). By using the filter criteria employed by the map-making process, Charts help users identify areas and classes that need more sampling effort, thereby improving the efficiency with which the group’s labor is allocated.
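The threshold labeling used by the Charts page can be sketched as follows. This is an illustrative reading of the rule stated above, not VIEW-IT’s server code; in particular, the treatment of mixtures that involve neither a dominant class nor woody vegetation (left unlabeled here) is an assumption.

```python
def label_sample(cover, threshold=80):
    """Label a sample the way the Charts page counts it: a single class
    at or above the cover threshold wins outright; otherwise, woody
    vegetation mixed only with bare, herbaceous or agricultural cover
    becomes MixedWoody; anything else is left unlabeled (assumption)."""
    for cls, pct in cover.items():
        if pct >= threshold:
            return cls
    mix_partners = {"Bare", "Herb", "Ag"}
    if cover.get("Woody", 0) > 0 and set(cover) - {"Woody"} <= mix_partners:
        return "MixedWoody"
    return None
```

With the default 80% threshold this reproduces the worked example in the text: 80% woody vegetation yields Woody, while 75% woody with 25% herbaceous yields MixedWoody. Because the threshold is a parameter, re-running the counts at a different cutoff is a single pass over the database.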
Administrator-level users have access to a “Statistics” page that provides a tabular count of samples by their status and issues and additional links to sort and access individual samples for review (Figure 3(B)). In the principal table, information on “Classification Status” includes count of samples with none, one or two user interpretations, expert review, as well as samples that have been discarded. Samples with “Issues” are tallied by class or date conflicts, or reasons for discarding the point (no high resolution imagery, no date, mixed pixel). An administrator can then click an item in the statistics table and view a table with detailed data for those samples. For example, if an administrator clicks the “Classified” sample count (46,279 in Figure 3(B)), then a table displays all sample interpretation data for classified samples. Attributes include the user names, year, percent cover, and other geographic information (e.g., ecoregion) for the samples, with two user and/or expert interpretations for the sample displayed in the same row (Figure 3(C)). In these data tables, color codes indicate if the sample is classified (green) or has an issue (red). Light green indicates a classified sample with two users in agreement, while dark green indicates that an expert reviewed the sample. Light red indicates a class or date conflict issue, while dark red indicates the sample was discarded. The administrator can click on any point in these tables and view and edit the sample’s data on the GE interpretation page.
Administrators also have access to pages that allow uploading and downloading of samples to the VIEW-IT database. In the upload section, an administrator can upload sample locations and their interpretation grids as a KMZ file. Data from the database can be downloaded as a table after passing through selection criteria (e.g., by biome or country). In the output, the percent cover data for each sample are either the averages of data from two users, if there was no class conflict, or from the expert if there was a conflict.
Figure 3. (A) Example pie graph from the Charts page using all available VIEW-IT samples. (B) Tabular view of samples available in the Statistics page. (C) Example of detailed attribute data for samples.

3. Latin America and the Caribbean Reference Dataset

Here we present results from our use of VIEW-IT to collect a spatially and temporally robust reference dataset for mapping LULC across Latin America and the Caribbean (LAC) using MODIS satellite data. Our goal was to sample the range of countries and biomes [14] found in LAC (Figure 4). We will then use reference data collected in VIEW-IT for both tree-based classifier training and accuracy assessment, following methods in [9]. There are 40 MODIS tiles that cover LAC (Figure 4) and we processed nine full years (2001 to 2009) of EVI data for display in VIEW-IT’s graphs. Interpretations were allowed on Quickbird and IKONOS high resolution imagery found in GE, with no restriction on date.
Figure 4. Biomes [14] and MODIS tiles covering Latin America and the Caribbean. See Table 2 for full names of biome abbreviations.
There were 23 user interpreters, who were mostly undergraduate and graduate students from the authors’ host departments, working part-time over a span of 18 months (March 2009 to August 2010). All users received training using an example dataset and were tested on their estimates of percent cover before using VIEW-IT. There were seven experts, comprising the authors, post-doctoral researchers, and graduate students with extensive experience. User interpretation and expert review occurred simultaneously, and experts worked closely with users to resolve recurring errors. Not all users worked through the whole time period (e.g., typically a single semester). The team collected an average of 2,567 samples per month (min. = 248, max. = 6,357) and the median user collected 1,297 samples (min. = 175, max. = 20,823). We do not have comprehensive data on sampling efficiency among users for the whole team, but we did note that it took users longer to locate and then interpret a sample relative to just interpreting a sample already in the database. A user’s productivity also tended to increase with time. At Sonoma State, 10 users dedicated to VIEW-IT sampling worked a total of 752 hours (min. = 103 h, max. = 190 h, avg. = 83.6 h) and made 20,379 separate interpretations (27 interpretations/hour); however, note that an interpretation could also include selecting sample locations.

3.1. Spatial and Temporal Distribution of Samples

There were 46,207 total samples collected as part of our LAC sampling campaign, with 37% of the samples from random sampling (pre-uploaded locations) and 63% of the samples from manual placement—functionality that was not added until August 2009. Although most high resolution imagery available in GE is concentrated around urban areas, roads, rivers and areas of economic interest (e.g., mining exploration), there was more than adequate imagery to sample LAC (Figure 5). The areas with the sparsest coverage of high resolution imagery were remote areas of the Amazon Basin, eastern Nicaragua, and southern Argentina. About a third of our samples were collected in the moist broadleaf forests (TSMBF; Table 2), which cover 44% of LAC area and include the Amazon basin, Atlantic forests of Brazil, Choco region of Colombia, and the Caribbean side of Central America and Mexico. This large area translated into relatively lower sample density among biomes (Table 2). However, the ability to stratify the random sampling or select sample locations manually meant that we were able to adequately sample the other biomes. For example, mangrove forests only cover 0.6% of LAC area, yet had a relatively high sample density (Table 2).
Figure 5. Sample density for Latin America and the Caribbean.
Table 2. VIEW-IT sample count and percent of samples with class conflicts between users for all biomes in Latin America and the Caribbean.
Biome Name                                              | Abbreviation | Area (km2)  | Total Samples | Sample Density (per 1,000 km2) | % Class Conflict
Tropical & Subtropical Moist Broadleaf Forests          | TSMBF        | 9,275,418   | 15,694        | 1.7                            | 7
Tropical & Subtropical Dry Broadleaf Forests            | TSDBF        | 1,195,043   | 6,404         | 5.4                            | 11
Tropical & Subtropical Coniferous Forests               | TSCF         | 605,946     | 2,997         | 4.9                            | 9
Temperate Broadleaf & Mixed Forests                     | TBMF         | 412,774     | 2,024         | 4.9                            | 8
Tropical & Subtropical Grasslands, Savannas, Shrublands | TSGSS        | 4,064,179   | 6,522         | 1.6                            | 8
Temperate Grasslands, Savannas & Shrublands             | TGSS         | 1,617,789   | 2,939         | 1.8                            | 16
Flooded Grasslands & Savannas                           | FGS          | 250,877     | 1,413         | 5.6                            | 8
Montane Grasslands & Shrublands                         | MGS          | 874,690     | 844           | 1.0                            | 20
Mediterranean Forests, Woodlands, and Scrub             | MFWS         | 187,597     | 881           | 4.7                            | 14
Deserts & Xeric Shrublands                              | DXS          | 2,346,672   | 5,395         | 2.3                            | 15
Mangrove Forests                                        | MF           | 125,147     | 1,068         | 8.5                            | 9
Lakes, Rock and Ice                                     | LRI          | 33,492      | 24            | 0.7                            | 0
ALL                                                     |              | 20,989,623  | 46,207        | 2.2                            | 10
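The sample-density column in Table 2 follows directly from the area and count columns. A minimal sketch of the calculation, using values copied from the table:

```python
# Sample density per 1,000 km^2, recomputed from Table 2.
# Biome areas (km^2) and sample counts are copied from the table above.
biomes = {
    "TSMBF": (9_275_418, 15_694),
    "MF": (125_147, 1_068),
}

def density_per_1000_km2(area_km2, n_samples):
    """Samples per 1,000 km^2 of biome area."""
    return n_samples / (area_km2 / 1000.0)

for abbrev, (area, n) in biomes.items():
    print(f"{abbrev}: {density_per_1000_km2(area, n):.1f} samples per 1,000 km^2")
# TSMBF: 1.7, MF: 8.5 -- matching the table
```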
The high resolution images sampled in GE came from years 2000 to 2010, with most images from 2003 to 2007 (Figure 6). From 2000 to 2003 there was a steady increase in sampling, partly explained by the growing availability of high resolution imagery: the IKONOS and QuickBird satellites that acquired many of these early images were launched in September 1999 and October 2001, respectively. This pattern of temporal sampling was similar across biomes (data not shown). The lower frequency of samples from 2008 to 2010 can partially be explained by the ending of our sampling campaign in September 2010 and the lag time between when commercial images are acquired by these satellites and when they appear as updated images (without clouds) in GE. Although global demand for commercial imagery has been growing, it is also possible that the global financial crisis, beginning in 2007, decreased the demand for commercial satellite imagery in Latin America, thus reducing the flow of free images into GE.
Figure 6. Distribution of sampled Google Earth images by year within Latin America and the Caribbean.

3.2. Class Interpretation and Accuracy

Of the 46,207 samples, 8% were interpreted by a single expert, 82% by two users, and 10% by two users followed by expert review due to a class or date conflict (Table 2). The classes with the highest disagreement among users were Ag, Herb, and MixedWoody (Figure 7). As explained in [9], Ag and Herb were difficult to distinguish in GE in the absence of clear features, such as plow lines for agriculture and watering holes for cattle pasture, and MixedWoody, a composite of several cover types (Woody, Ag, Herb, Bare, all < 80%), was sensitive to slight differences in interpretation of percent cover near the 80% threshold.
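The cross-check that routes 10% of samples to expert review can be sketched as follows. This is an illustrative reconstruction, not VIEW-IT’s actual code; the function and field names are hypothetical:

```python
# Hypothetical sketch of the VIEW-IT cross-check described above: a sample
# interpreted by two independent users is flagged for expert review when
# their majority classes or their chosen image dates disagree.
def needs_expert_review(interp_a, interp_b):
    """Return True when two independent interpretations conflict."""
    class_conflict = interp_a["majority_class"] != interp_b["majority_class"]
    date_conflict = interp_a["image_date"] != interp_b["image_date"]
    return class_conflict or date_conflict

a = {"majority_class": "Woody", "image_date": "2005-10"}
b = {"majority_class": "Ag", "image_date": "2005-10"}
needs_expert_review(a, b)  # → True: class conflict, send to an expert
```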
Figure 7. Class distribution of all samples collected from Latin America and the Caribbean (see Section 2.2). Samples were interpreted by a single expert or two users with no class conflict (green colors), or were resolved by an expert after the two users had a majority‑class or date conflict (red colors).
Forest biomes (TSMBF, TSDBF, TSCF, TBMF, MF) had relatively high percentages of Woody samples; however, other biomes with extensive scrub and shrublands (TSGSS, MFWS, DXS) also had many Woody samples, since all types of perennial woody vegetation were included in this class (Figure 8). Biomes with grasslands (TSGSS, TGSS, FGS, MGS) had the highest percentages of herbaceous vegetation samples. Proportions of agriculture samples were high in the drier biomes (TSDBF, TSGSS, TGSS, MFWS, DXS). Plantations, which include oil palms, timber species, and vineyards, were proportionally high in flooded grasslands/savannas (FGS) and in Mediterranean forests/woodlands/scrub (MFWS), such as in Chile.
Figure 8. Class distribution of samples within each of the eleven biomes in Latin America and the Caribbean. See Table 2 for full names of biome abbreviations.

4. Discussion and Conclusions

Web-based geospatial technology (i.e., the geoweb) has the potential to transform the way remote sensing products are produced and assessed, with great benefits in efficiency, accuracy and capability. Here we described VIEW-IT, a geoweb tool for managing the human interpretation of reference samples from high resolution images. We purposely designed VIEW-IT with a simple, intuitive user interface and basic interpretation criteria that speed sampling, reduce errors, and allow scaling to a global crowdsourcing system. Cross-checking of independent user interpretations, combined with expert review tools, improves the accuracy of reference samples, especially when paired with iterative training of users by experts.
The linchpin of VIEW-IT is its integration with the Google Earth browser plug-in through the JavaScript API, which was only released in 2008. Google Earth’s high resolution images offer many important features for LULC mapping: (1) they are free for non-profit use; (2) they stream quickly to the web browser from Google’s servers, even over relatively slow Internet connections; (3) they have sufficient spatial and color detail to distinguish basic LULC classes at multiple spatial scales (e.g., 30 m Landsat to 500 m MODIS); (4) they are georeferenced with sufficient accuracy for LULC mapping [9,13]; (5) they are distributed across the globe and cover a wide range of LULC classes, including those occupying relatively small areas of the landscape; and (6) they range in date from 2000 to the present, allowing temporal sampling. Regional to global-scale mapping with low resolution imagery has typically relied on interpretation of Landsat or other medium resolution (e.g., 30 m) products for reference data [15,16], since high resolution imagery and field campaigns are too expensive and time consuming over such large geographic areas. These medium resolution data are often limited in time and spatial extent, and researchers often restrict sampling to large homogeneous areas with persistent cover, such as undisturbed forest or long-established urban and agricultural areas. This type of sampling does not necessarily capture the full range of spatio-temporal variation across a landscape, and low spatial resolution precludes clear discrimination of areas with mixed classes. Google Earth is an important development for the science community because it opens a trove of free, georeferenced, high resolution images that continue to increase in spatial and temporal depth. When harnessed by a geoweb application such as VIEW-IT, visual interpretation of GE imagery can be highly efficient in terms of labor, cost and accuracy when sampling large areas.
However, there are certainly errors in interpretation [9], and many classes cannot be discriminated with the available imagery (e.g., different annual crops); the technology is thus neither a panacea nor a substitute for the deeper understanding and accuracy gained from field work. There are also important legal restrictions on the free use of the Google Maps/Earth APIs (code.google.com/apis/maps/terms.html) in geoweb applications. For one, the image stream from Google cannot be manipulated or stored for further image processing, and the logos, trade names and trademarks must remain in the map window. The terms also explicitly require that a Google Maps/Earth webpage “be generally accessible to users without charge”, although the webpage can require a login provided there is no cost for an account.
Using VIEW-IT, our team of 23 user and seven expert interpreters collected over 46,000 reference samples across LAC in 18 months of part-time work. Samples covered all biomes in the region and spanned years 2000 to 2010, with a peak in the middle of the decade. As opposed to sampling many pixels within a large polygon, which can miss sample variability and bias accuracy assessments due to spatial autocorrelation, VIEW-IT samples are a minimum of 1,000 m apart and associated with individual MODIS pixels. These samples thus represent a large spatial and temporal dataset that, when matched with predictor variables from satellite imagery, can provide robust reference data for training map classifiers, modeling percent cover, and assessing map accuracy. For example, we used the 80% cover threshold to assign reference data to classes for mapping LAC land cover from MODIS imagery (years 2001 to 2009) with a tree-based classifier, following the methods in [9]. These class reference data could also be used for mapping with medium-resolution imagery from existing satellite archives, such as Landsat 5 and 7 and CBERS, if pixels within a VIEW-IT sample were used (although these pixels would be spatially autocorrelated). The percent cover reference data (i.e., with no assignment to a single class) could also be used to produce a VCF-type map based on regression trees (sensu [12,16]). We anticipate that Google will continue to add high resolution imagery to GE, allowing interpretation of images that match dates from future earth-observing satellites, such as NASA’s Visible/Infrared Imager Radiometer Suite (VIIRS), Landsat Data Continuity Mission (LDCM) and Hyperspectral Infrared Imager (HyspIRI).
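The 80% majority-cover rule for turning percent cover estimates into class labels can be sketched as below. This is an illustrative simplification under our own assumptions: the full scheme in [9] has additional classes and rules, and here any sample lacking an 80% majority simply falls into the composite MixedWoody class:

```python
# Simplified sketch of the 80% majority-cover rule: a sample is assigned a
# single class when one cover type reaches the threshold; otherwise it falls
# into the composite MixedWoody class (the real scheme in [9] is richer).
def assign_class(cover, threshold=80):
    """cover: dict mapping class name -> interpreted percent cover."""
    majority_class, majority_pct = max(cover.items(), key=lambda kv: kv[1])
    if majority_pct >= threshold:
        return majority_class
    return "MixedWoody"

assign_class({"Woody": 90, "Herb": 10})            # → "Woody"
assign_class({"Woody": 60, "Ag": 30, "Bare": 10})  # → "MixedWoody"
```

Small interpretation differences near the threshold (e.g., 78% vs. 82% Woody) flip the assigned class, which is why MixedWoody showed high user disagreement.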
In all cases of linking VIEW-IT reference data to imagery, one should consider the georeferencing mismatch between the area interpreted in GE and the ground instantaneous field of view (GIFOV) of the sensor. Empirical results indicate that GE high resolution imagery has a spatial accuracy of <40 m [9,13], while MODIS average accuracy is 18 ± 38 m and 4 ± 40 m in the across-track and along-scan directions, respectively, with error increasing with scan angle [17]. Since GE samples were 250 × 250 m, 18.3 m larger in each dimension (~9 m per side) than the MOD13 raster data (231.7 m pixels), they accommodated some of the spatial mismatch between the two datasets. Georeferencing error should have the least impact for samples located in areas mainly covered by one class (e.g., samples with 90–100% majority cover), and such samples can be emphasized in VIEW-IT by training users to select sample locations away from LULC patch edges. The sensitivity of map product accuracy to georeferencing error, and to the mismatch between interpretation scale and sensor GIFOV, is an important topic for future research.
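The size margin between the sample grid and the MODIS pixel is simple arithmetic, shown here as a back-of-envelope check of the figures quoted above:

```python
# Back-of-envelope check of the spatial tolerance discussed above: a 250 m
# VIEW-IT sample centered on a 231.7 m MOD13 pixel is 18.3 m larger in each
# dimension, i.e. ~9.2 m of slack per side, absorbing part of the <40 m
# positional error reported for GE imagery.
SAMPLE_SIDE_M = 250.0
MOD13_PIXEL_M = 231.7

margin_total = SAMPLE_SIDE_M - MOD13_PIXEL_M  # 18.3 m per dimension
margin_per_side = margin_total / 2.0          # ~9.15 m per side
print(round(margin_total, 1), round(margin_per_side, 2))
```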
A new capability of the GE plug-in (v6) is access to historical imagery with a time slider (Figure 9). This feature was not available when VIEW-IT was developed, but its implications for the tool are important. Since a user can pan through multiple high-resolution images, more temporal sampling is possible at any given location. In some cases, the imagery records changes in land cover (e.g., seasonal leaf phenology, natural disturbance) or land use (e.g., forest conversion, degradation or recovery from abandonment). For example, Figure 9(A) shows a sample with leaf-on Dry Chaco forest in Argentina during the wet summer (December 2002), while Figure 9(B) shows the sample after conversion to agriculture (October 2005). With user interpretations of percent cover, VIEW-IT could be programmed to automatically identify samples that show a change in majority land cover over time, such as in this example, perhaps aided by time-series analysis of the EVI profile (e.g., Figure 9(C)). Such change samples could then be used to help develop and test the accuracy of land-change map products.
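The proposed automatic change flag could be as simple as comparing majority classes across dates at one location. A minimal sketch, with hypothetical helper and variable names (this is not an implemented VIEW-IT feature):

```python
# Hypothetical sketch of the proposed automatic change detection: repeated
# interpretations of the same location at different image dates count as a
# land-change sample when their majority classes differ.
def is_change_sample(interps):
    """interps: list of (image_date, majority_class) tuples for one location."""
    ordered = sorted(interps)                # chronological order by date string
    classes = [cls for _, cls in ordered]
    return len(set(classes)) > 1             # any class transition -> change

# The Dry Chaco example from Figure 9: forest in 2002, agriculture by 2005.
chaco = [("2002-12", "Woody"), ("2005-10", "Ag")]
is_change_sample(chaco)  # → True: dry forest converted to agriculture
```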
An exciting direction for VIEW-IT is its connection with cloud-based image processing. In the past, LULC maps were developed by relatively small groups of scientists and staff. Reference data generally came from field visits, were interpreted from limited archives of aerial photographs, or were sampled from medium-resolution satellite imagery (e.g., Landsat). Image processing was typically done on local computer clusters with relatively few processors and little data storage. Combined, these factors greatly limited the spatio-temporal extents and accuracy of map products. The recent expansion of cloud computing promises to overcome the hurdle of processing large image datasets. For example, Google Earth Engine (earthengine.googlelabs.com), which opened for beta-testing in December 2010, is a Web-based portal to a vast archive of free NASA satellite imagery and derived products that overlay on Google Maps and Earth imagery. A new API, still in development and currently limited to forest monitoring partnerships, will allow fast, cloud-based processing of massive amounts of image data. VIEW-IT can play a vital role in such a system by providing a large global reference dataset going back to year 2000.
Geoweb and allied cloud-based technology provide a new frontier for remote sensing of the earth. Here we focused on integrating this emerging technology for land-use/land-cover mapping. Web-based tools such as VIEW-IT tap global data and labor resources for collecting reference data: archives of high resolution imagery and a distributed network of human interpreters. When leveraged with satellite image data and processing power in a cloud-based system, there will be fewer economic and technical barriers to producing maps with consistent spatial and temporal properties (e.g., resolution, values and accuracy) that permit rigorous, global land-change monitoring.
Figure 9. Example of the historical imagery time slider within VIEW-IT’s Google Earth window for a sample near Pampa del Infierno, Argentina (panels copied from VIEW-IT’s interface). (A) Image in December, 2002 shows the sample as dry forest. (B) Image in October, 2005 shows the sample area converted to agriculture (likely soybeans). (C) The VIEW-IT Enhanced Vegetation Index (EVI) profile for the sample shows the seasonal cycle of dry forest up until year 2004, followed by an agricultural cycle with low EVI values from exposed bare soil after harvest in the dry season. Red arrows indicate the image dates in images (A) and (B).
We developed VIEW-IT to support our land-change research in LAC, which requires a large reference dataset for both classifier training and accuracy assessment. Here we present VIEW-IT as a “proof of concept”, demonstrating how new geoweb and open-source technologies can be effectively integrated into a tool that supports Earth remote sensing science. As such, VIEW-IT is a work in progress that we plan to develop with future funding. Future development goals are to: expand VIEW-IT to a global scale; allow crowdsourcing from a larger user community; permit users to organize their own data collection and sample subsets in workspaces; allow users to add LULC classes within a hierarchical scheme (e.g., crop types); allow more than two interpretations per sample to increase accuracy; allow multiple samples in time for the same location and automatically detect samples with large LULC change; and permit sample grids of different sizes corresponding to the GIFOVs of current and new sensors. Since VIEW-IT is built mainly with open-source components, software costs are minimized and new functionality can be readily integrated and tested as geoweb and cloud-based technology evolves. For example, we envision eventually integrating VIEW-IT within an automated, cloud-based mapping system. To explore the tool’s current capabilities and participate in image interpretation, please contact the lead author by email.

Acknowledgments

VIEW-IT development was funded by a Dynamics of Coupled Natural and Human Systems grant from the US National Science Foundation (NSF # 0709645 and 0709598). Programming expertise was provided by Alberto Estrada, Joseph Muller, Pedro Pastrana, Hector Rodriguez, Zhahai Stewart and David Turover. We thank George Riner and the students at Sonoma State University and the University of Puerto Rico who helped interpret Google Earth imagery in VIEW-IT. We thank the anonymous reviewers for helping us improve this paper.

References

  1. Butler, D. Mashups mix data into global service. Nature 2006, 439, 6–7. [Google Scholar] [CrossRef] [PubMed]
  2. Beaudette, D.E.; O’Geen, A.T. Soil-Web: An online soil survey for California, Arizona, and Nevada. Comput. Geosci. 2009, 35, 2119–2128. [Google Scholar] [CrossRef]
  3. Stensgaard, A.-S.; Saarnak, C.F.L.; Utzinger, J.; Vounatsou, P.; Simoonga, C.; Mushinge, G.; Rahbek, C.; Møhlenberg, F.; Kristensen, T.K. Virtual globes and geospatial health: The potential of new tools in the management and control of vector-borne diseases. Geospatial Health 2008, 3, 127–141. [Google Scholar] [CrossRef] [PubMed]
  4. Boulos, M.N.K.; Scotch, M.; Cheung, K.-H.; Burden, D. Web GIS in practice VI: A demo playlist of geo-mashups for public health neogeographers. Int. J. Health Geogr. 2008, 7, 1–16. [Google Scholar] [CrossRef] [PubMed]
  5. Fritz, S.; McCallum, I.; Schill, C.; Perger, C.; Grillmayer, R.; Achard, F.; Kraxner, F.; Obersteiner, M. Geo-Wiki.Org: The use of crowdsourcing to improve global land cover. Remote Sens. 2009, 1, 345–354. [Google Scholar] [CrossRef]
  6. Hudson-Smith, A.; Batty, M.; Crooks, A.; Milton, R. Mapping for the masses: Accessing web 2.0 through crowdsourcing. Soc. Sci. Comput. Rev. 2009, 27, 524–538. [Google Scholar] [CrossRef]
  7. Goodchild, M.F. Citizens as sensors: The world of volunteered geography. GeoJournal 2007, 69, 211–221. [Google Scholar] [CrossRef]
  8. Mulligan, M. RALUCIAPA: Rapid assessment of land use change in and around protected areas (2000–2005), Version 1.0, 2008. Available online: http://www.unep-wcmc.org/protected_areas/raluciapa/ (accessed on 30 December 2010).
  9. Clark, M.L.; Aide, T.M.; Grau, H.R.; Riner, G. A scalable approach to mapping annual land-cover at 250 m using MODIS time-series data: A case study in the Dry Chaco ecoregion of South America. Remote Sens. Environ. 2010, 114, 2816–2832. [Google Scholar] [CrossRef]
  10. Thenkabail, P.S.; Biradar, C.M.; Noojipady, P.; Cai, X.; Dheeravath, V.; Li, Y.; Velpuri, M.; Gumma, M.; Pandey, S. Sub-pixel area calculation methods for estimating irrigated areas. Sensors 2007, 7, 2519–2538. [Google Scholar] [CrossRef] [Green Version]
  11. Helmer, E.H.; Lefsky, M.A.; Roberts, D.A. Biomass accumulation rates of Amazonian secondary forest and biomass of old-growth forests from Landsat time series and the Geoscience Laser Altimeter System. J. Appl. Remote Sens. 2009, 3, 033505. [Google Scholar]
  12. Hansen, M.C.; Egorov, A.; Roy, D.P.; Potapov, P.; Ju, J.; Turubanova, S.; Kommareddy, I.; Loveland, T. Continuous fields of land cover for the conterminous United States using Landsat data: First results from the Web-Enabled Landsat Data (WELD) project. Remote Sens. Lett. 2011, 2, 279–288. [Google Scholar] [CrossRef]
  13. Potere, D. Horizontal positional accuracy of Google Earth’s high-resolution imagery archive. Sensors 2008, 8, 7973–7981. [Google Scholar] [CrossRef]
  14. Olson, D.M.; Dinerstein, E.; Wikramanayake, E.D.; Burgess, N.D.; Powell, G.V.N.; Underwood, E.C.; D’amico, J.A.; Itoua, I.; Strand, H.E.; Morrison, J.C.; Loucks, C.J.; Allnutt, T.F.; Ricketts, T.H.; Kura, Y.; Lamoreux, J.F.; Wettengel, W.W.; Hedao, P.; Kassem, K.R. Terrestrial ecoregions of the world: A new map of life on earth. Bioscience 2001, 51, 933–938. [Google Scholar] [CrossRef]
  15. DeFries, R.S.; Hansen, M.C.; Townshend, J.R.G.; Sohlberg, R.S. Global land cover classifications at 8 km spatial resolution: The use of training data derived from Landsat imagery in decision tree classifers. Int. J. Remote Sens. 1998, 19, 3141–3168. [Google Scholar] [CrossRef]
  16. Hansen, M.C.; DeFries, R.S.; Townshend, J.R.G.; Carroll, M.; Dimiceli, C.; Sohlberg, R.A. Global percent tree cover at a spatial resolution of 500 meters: First results of the MODIS Vegetation Continuous Fields algorithm. Earth Interactions 2003, 7, 1–15. [Google Scholar] [CrossRef]
  17. Wolfe, R.E.; Nishihama, M.; Fleig, A.J.; Kuyper, J.A.; Roy, D.P.; Storey, J.C.; Patt, F.S. Achieving sub-pixel geolocation accuracy in support of MODIS land science. Remote Sens. Environ. 2002, 83, 31–49. [Google Scholar] [CrossRef]
