The current trend is to deploy and organize Geospatial Information (GI) into what are known as Spatial Data Infrastructures (SDIs) [1]. Geospatial content is managed by means of regulated and standardized service types. In the Geographic Information System (GIS) field, these services (known as geospatial services) have particularities that distinguish them from general-purpose services [2]. The Open Geospatial Consortium (OGC) has carried out extensive work proposing standards for each of these service categories, including the Web Map Service (WMS) [3], Web Feature Service (WFS) [4], Sensor Observation Service (SOS) [5] and Catalogue Services for the Web (CSW) [6], among others. Other specifications, such as the Web Processing Service (WPS) [7], provide an interface to perform any type of geoprocessing.
The initiative to create a European SDI is legally established as the Infrastructure for Spatial Information in Europe (INSPIRE) directive [8]. At the European level, INSPIRE proposes a general SDI framework for the purposes of European environmental policies. It should provide environmental data related to 34 themes, including transport networks, land cover and hydrography, among others. Through the Copernicus programme (http://www.copernicus.eu/), INSPIRE provides important parts of the European contribution to the Global Earth Observation System of Systems (GEOSS). The purpose of GEOSS is to achieve comprehensive and coordinated observations of the Earth in order to improve monitoring and enhance prediction of the behaviour of the Earth. INSPIRE regulates different service categories depending on their functionality: discovery, view, download, transformation and invocation. This imposes a life cycle on geospatial content in distributed environments, which can be described in four steps, as illustrated in Figure 1. First, content must be available in a distributed system. Second, users discover the content, for example by means of keywords. Third, they access the content. Finally, users process it and generate new content, which should be integrated and published in the distributed system to close the cycle.
The INSPIRE directive mandates the creation and maintenance of metadata that are published in discovery services [10]. Metadata describe data that contain geospatial information, either explicit or implicit, and they can be kept in a database-like storage called a catalogue. Metadata also aim to facilitate interoperability in distributed environments such as SDIs. According to the INSPIRE directive, metadata must exist for each published dataset, and they must be published and correctly linked to the dataset. The metadata must also be validated and accurately described [11].
On the Internet, some geospatial data are published without associated metadata, which makes data discovery more difficult [11]. There are two main reasons why metadata creation is difficult: the lack of mechanisms to create metadata automatically, and the fact that metadata are not created at the same time as the geospatial data are published, which is the right moment according to the authors of [13]. Another difficulty is keeping the metadata up to date when the geospatial data are refreshed.
Currently, there is no simple automated mechanism for metadata creation, so expert users are required to assist in the process [14]. This work aims to bridge the gap between the publication of geographical data and metadata generation through an interoperable and scalable component. To achieve this, we present the GEOSS Service Factory (GSF) [9], a generic publication service that assists both experts in environmental domains and non-expert users in publishing geospatial data into standard services.
In this paper, we explain the new features of GSF. Previously, GSF only offered geospatial data publication in the different services [16]. We have improved the generation and publication of metadata for each type of geographic data published, and we have extended GSF to support more geospatial data types, among them sensor data. The main improvements of GSF are: (a) it offers an interoperable standard service to publish different kinds of geospatial data; (b) it links data publication to metadata generation and publication, so that the metadata are updated together with their geospatial data; and (c) it generates standardized metadata using transformation templates.
The rest of the paper is organized as follows. Section 2 reviews the literature on the generation and publication of metadata. Section 3 describes GSF as part of SDI architectures and how it is extended to publish sensor data. Section 4 explains the new features and the components involved in data publication and in metadata generation and publication. Section 5 presents the experimentation carried out. Finally, Section 6 presents the conclusions, with a discussion and an outlook on future work.
2. Related Work
Several works in the literature address this issue with tools for automatic metadata generation [17]. Metadata can be generated manually, semi-automatically or automatically [20]. Beard [23] describes five categories: manual creation; extending the stored data with values obtained by consultation; automatic measurements and observations; data extraction and calculation; and inference from other elements. Other authors classify automatic generation into two classes: metadata extraction and inference [24].
Tools such as CatMDEdit [13] are used for the automatic extraction of metadata from different formats, as reflected in the work of [25]. The amount of information that can be extracted depends on the abstraction model used for storage, on the representation model and on the file format. Thus, some elements can only be extracted from certain types of data and files, while others, such as the size of the data, can be obtained in all circumstances.
Some works describe how metadata generation can be integrated into a GIS. These use ad hoc mechanisms that process the data to extract information [18], for example the gvSIG metadata editor [24]. Generation techniques can also be classified by the nature of the tools that provide metadata generation functionality. This functionality is normally included in catalogue services, for example GeoNetwork (http://geonetwork-opensource.org/) or ESRI ArcCatalog (http://www.esri.com/software/arcgis) (included in ArcGIS). These tools enable the automatic creation of basic fields and keep data and metadata synchronized when they are updated.
In our work, the metadata generation and publication component, based on extraction techniques, is provided as an interoperable service that can be combined with other services for data generation. The aim is to join two processes, publishing data and generating metadata, in the same workflow as a standard SDI service.
Creating metadata at publication time improves the linkage between data and metadata and thus facilitates data visibility and accessibility. This approach is used in a publication service, GSF [9]. To implement this service, the OGC Web Processing Service (WPS) standard [7] has been chosen. WPS allows any functionality to be deployed on the Internet: the standard describes how any computation (process) is offered and how requests and responses to the service are made.
Other authors follow the same approach, linking metadata creation to the moment when the data are published. We use this approach in this work, as detailed in [9]. However, our work goes a step further, offering a standard interface based on the WPS standard and supporting more kinds of geospatial data.
Another improvement is that GSF offers a solution for publishing geospatial content aimed at increasing data accessibility and improving data sharing. It provides a tool for end users and improves the publication stage through a simple and automatic publication process.
3. GSF in a Nutshell
The GSF is designed to be implemented as a service component with a standard interface so that it can be reused in different scenarios. WPS [7] has been selected to implement the GSF because it guarantees interoperability. GSF is able to publish geospatial data in the different geospatial services described in Section 1. Figure 2 shows where the GSF is deployed in the SDI architecture: in the service layer, as a publication service. The ‘application’ layer includes end-user applications, ranging from complex environmental decision support systems to simple clients on mobile devices. The ‘geospatial content’ layer includes data, metadata and models.
Figure 3 shows the main architecture of the GSF and its components. As detailed below, GSF exposes a WPS interface that encapsulates all the components. The left part of the figure shows the inputs and outputs of the GSF. The inputs are the dataset to publish and the keywords that describe it; in the case of sensor data, the input is a sensor observation. The outputs are the links where the data and metadata have been published.
When GSF is invoked, the Service Publication Profile (SPP) specifies, for each kind of data, the geospatial services in which the dataset will be published. Each geospatial service is defined as a factory, and each factory is responsible for publishing the data into the service instances. The GSF contains two other modules: the metadata generator and the transformation service. The metadata generator produces a small discovery-oriented metadata record for the published content; this component is detailed in Section 4. The transformation service deals with data format transformation: it converts data formats and encodings into formats that the existing services can handle as data storage. However, this module is beyond the scope of this work.
GSF’s internal design uses the factory software design pattern [26]. The factory pattern is a creational pattern that provides a scalable mechanism to create new entities according to particular criteria. In this case, the factory pattern is used to deploy the different kinds of SDI services [9]. This allows the GSF to publish new content (data and metadata) by adding new entries to different service types (according to the factory type): view, download and discovery.
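The factory arrangement described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not GSF's implementation: the class names (`ServiceFactory`, `ViewFactory`, `DownloadFactory`, `DiscoveryFactory`), the `create_factory` helper and the returned links are hypothetical, and each `publish` method only builds a link instead of contacting a real server.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class ServiceFactory(ABC):
    """Hypothetical base class: one concrete factory per SDI service type."""

    @abstractmethod
    def publish(self, content: str, instance_url: str) -> str:
        """Publish content into one service instance and return its link."""


class ViewFactory(ServiceFactory):
    def publish(self, content: str, instance_url: str) -> str:
        # A real factory would upload `content` to the WMS server here;
        # this sketch only returns the link a client would use.
        return f"{instance_url}?service=WMS&request=GetCapabilities"


class DownloadFactory(ServiceFactory):
    def publish(self, content: str, instance_url: str) -> str:
        return f"{instance_url}?service=WFS&request=GetCapabilities"


class DiscoveryFactory(ServiceFactory):
    def publish(self, content: str, instance_url: str) -> str:
        return f"{instance_url}?service=CSW&request=GetCapabilities"


# Supporting a new service type only requires a new subclass and an entry here.
FACTORIES: Dict[str, Type[ServiceFactory]] = {
    "view": ViewFactory,
    "download": DownloadFactory,
    "discovery": DiscoveryFactory,
}


def create_factory(service_type: str) -> ServiceFactory:
    """Creational step: choose the concrete factory from the service type."""
    try:
        return FACTORIES[service_type]()
    except KeyError:
        raise ValueError(f"unknown service type: {service_type}") from None
```

In GSF the download factory also covers SOS; the sketch keeps one service per factory for brevity. The benefit of the pattern is that the calling code only ever sees `ServiceFactory`, so new service types do not change it.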
GSF is able to publish raster and vector data. Shapefiles (vector data) can be published in the WFS and WMS services, while raster data, such as Georeferenced Tagged Image File Format (GeoTIFF) files, and other formats, such as Keyhole Markup Language (KML) and Geography Markup Language (GML), can be published in the WMS service. In addition, this work adds the capability of publishing Observations and Measurements (O&M) in an SOS service.
Figure 4 shows the Unified Modeling Language (UML) class diagram with a simplified view of the GSF interface and its input and output parameters.
A single process, called publish, is offered (see Figure 4). WPS allows inputs and outputs to be encoded in many alternative ways. The publish process accepts the following input parameters:
Content: the only mandatory input parameter. The content can be passed by value or by reference, using a Uniform Resource Locator (URL) to the content. It can be a vector or raster dataset, a workflow description, or a metadata document.
ServicePublicationProfile: eXtensible Markup Language (XML)-encoded parameter that describes the publication policy. It specifies where each data type should be published within the SDI.
MD_URL: indicates that the content is already published in the SDI and that its existing metadata should be reused when updating it. This parameter is optional.
Keywords: the optional ‘keywords’ parameter provides an initial capability for metadata creation.
DiscoveryLink: the only output parameter. It contains the information needed to discover the content published in the system. In SDIs, where content is registered in catalogue services, it contains the endpoint of the metadata record in the catalogue service that describes the content just published; this record in turn contains the endpoints of the data services that serve the content. This parameter is optional.
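As an illustration of how a client might invoke the publish process, the sketch below builds a WPS 1.0.0 key-value-pair (KVP) Execute request with the Content input passed by reference. The process identifier and input names come from the list above; the endpoint, the comma-separated encoding of Keywords, and the omission of ServicePublicationProfile (assuming the server falls back to a default profile) are assumptions. A production client would more likely POST an XML Execute document, especially for content passed by value or an XML-encoded SPP.

```python
from typing import List, Optional
from urllib.parse import quote, urlencode


def build_execute_url(wps_endpoint: str, content_url: str,
                      keywords: Optional[List[str]] = None,
                      md_url: Optional[str] = None) -> str:
    """Build a WPS 1.0.0 KVP Execute request for the 'publish' process."""
    # Each value is percent-encoded so that ';', '=' and '@' inside it
    # cannot be confused with the DataInputs separators.
    inputs = ["Content=@xlink:href=" + quote(content_url, safe="")]
    if md_url:
        inputs.append("MD_URL=" + quote(md_url, safe=""))
    if keywords:
        # Hypothetical encoding: all keywords joined by commas in one input.
        inputs.append("Keywords=" + quote(",".join(keywords), safe=""))
    query = urlencode({"service": "WPS", "version": "1.0.0",
                       "request": "Execute", "identifier": "publish"})
    return f"{wps_endpoint}?{query}&DataInputs={';'.join(inputs)}"
```

For example, `build_execute_url("http://example.org/wps", "http://example.org/data/fires.zip", keywords=["fire"])` yields a GET request whose `DataInputs` carries the referenced content and the keyword.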
Figure 5 shows the workflow that GSF follows to publish geospatial data. When the geoprocess is invoked by the Execute operation, the SPP is consulted to establish the connection for publishing the data (Figure 5). Depending on the data type, the content is published in the relevant services, so the responsible factory is called (Figure 5). Each factory uses the appropriate publish method to send the data to each of those services (Figure 5). Finally, the connection is established with each service instance indicated by the SPP (Figure 5). As a result, we obtain the different links where the data have been published. These links are subsequently needed to generate the metadata.
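A minimal sketch of this workflow, assuming a hypothetical XML encoding of the SPP (the paper only states that the SPP is XML-encoded): the profile is looked up for the data type, each configured service instance is passed to its factory, and the resulting links are collected for the metadata step. Factories are modelled here as plain callables to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# Hypothetical SPP encoding with example endpoints.
SPP_XML = """<ServicePublicationProfile>
  <DataType name="vector">
    <Service factory="view" url="http://example.org/geoserver/wms"/>
    <Service factory="download" url="http://example.org/geoserver/wfs"/>
  </DataType>
  <DataType name="raster">
    <Service factory="view" url="http://example.org/geoserver/wms"/>
  </DataType>
  <DataType name="observation">
    <Service factory="download" url="http://example.org/sos"/>
  </DataType>
</ServicePublicationProfile>"""

Publisher = Callable[[str, str], str]  # (content, instance_url) -> link


def services_for(spp_xml: str, data_type: str) -> List[Tuple[str, str]]:
    """Return the (factory, instance URL) pairs configured for a data type."""
    root = ET.fromstring(spp_xml)
    for entry in root.findall("DataType"):
        if entry.get("name") == data_type:
            return [(s.get("factory"), s.get("url"))
                    for s in entry.findall("Service")]
    return []


def publish(spp_xml: str, data_type: str, content: str,
            factories: Dict[str, Publisher]) -> List[str]:
    """Dispatch content to every configured service and collect the links."""
    links = []
    for factory_name, instance_url in services_for(spp_xml, data_type):
        links.append(factories[factory_name](content, instance_url))
    return links
```

The returned list plays the role of the links that feed metadata generation; an unknown data type simply yields no publication targets in this sketch.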
3.1. Extending GSF to Publish Sensor Data
As mentioned, GSF is able to publish sensor data. Like the rest of geospatial content, sensor data are organized following the GI initiatives. The OGC established a working group called Sensor Web Enablement (SWE) [27], which defines a set of sensor-related specifications, proposing data models and web services that act as a bridge between sensors and users and make sensors accessible and controllable via the web [28]. SWE includes several standards: Sensor Model Language (SensorML) [29], Observations and Measurements (O&M) [30], SOS [5], Transducer Markup Language (TransducerML) [31], Sensor Planning Service (SPS) [32] and Sensor Alert Service (SAS) [33]. However, only the first three specifications are used in this work.
When a new observation arrives from a sensor, GSF is invoked to publish it in a sensor data server (Figure 6-1). Then, as for other formats, the SPP stores the information needed to publish in standard services, in this case the SOS service (Figure 6-2). The factory responsible for publishing is the download factory (Figure 6-3). The Publish method connects to the instance specified in the SPP (Figure 6-5) and uses the InsertObservation operation to publish the observation in SOS. In addition, GSF can also register new sensors using the RegisterSensor operation.
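The InsertObservation call can be illustrated with a skeleton of an SOS 1.0.0 transactional request built with Python's standard library. This is a sketch under assumptions: the sensor identifier is hypothetical, and the O&M observation is passed in as an already-built element (a complete O&M 1.0 observation would carry samplingTime, procedure, observedProperty, featureOfInterest and result properties).

```python
import xml.etree.ElementTree as ET

SOS = "http://www.opengis.net/sos/1.0"
OM = "http://www.opengis.net/om/1.0"
ET.register_namespace("sos", SOS)
ET.register_namespace("om", OM)


def build_insert_observation(sensor_id: str, observation: ET.Element) -> bytes:
    """Wrap an O&M observation in an SOS 1.0.0 InsertObservation request."""
    request = ET.Element(f"{{{SOS}}}InsertObservation",
                         {"service": "SOS", "version": "1.0.0"})
    # The identifier of a sensor registered beforehand, typically the one
    # obtained from the RegisterSensor response.
    ET.SubElement(request, f"{{{SOS}}}AssignedSensorId").text = sensor_id
    request.append(observation)
    return ET.tostring(request, encoding="utf-8")
```

The resulting bytes would be sent as the body of an HTTP POST to the SOS instance named in the SPP.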
4. Extending GSF to Generate and Publish Metadata
The other new functionality presented in this paper is a module that generates and publishes metadata. To create metadata, the location where the content (dataset) is published must be known. Some of the required information can be extracted from the services where the dataset has been published. These services conform to standards used to implement INSPIRE, such as the OGC WMS, WFS, Web Coverage Service (WCS) and SOS.
Recently adopted regulations promote metadata that follow the rules of the International Organization for Standardization (ISO) [34]. Several standards within the ISO 191xx family [35] define geospatial metadata. ISO 19115 defines how to describe GI and its associated services, including the spatio-temporal content, data quality, access and rights of use. The standard defines more than 400 items. A metadata catalogue stores the geospatial metadata and allows searches by name, geographic area, coordinates, subject, category or data type; through the catalogue, the content described in the metadata can be accessed. The OGC service that defines how this publication and search process is carried out is Catalogue Services for the Web (CSW). Three of the best-known catalogue implementations are GeoNetwork (http://geonetwork-opensource.org/), Deegree (http://www.deegree.org/) and ESRI ArcCatalog.
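For example, a keyword search against a CSW 2.0.2 catalogue can be expressed as a KVP GetRecords request with a CQL constraint on the AnyText queryable. The sketch below is illustrative: the endpoint is hypothetical, and catalogue implementations differ in which queryables and constraint languages they accept.

```python
from urllib.parse import urlencode

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"


def build_getrecords_url(csw_endpoint: str, text: str, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 KVP GetRecords request for a full-text search."""
    # Single quotes inside a CQL string literal are escaped by doubling them.
    escaped = text.replace("'", "''")
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "namespace": f"xmlns(csw={CSW_NS})",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "brief",
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{escaped}%'",
    }
    return f"{csw_endpoint}?{urlencode(params)}"
```

A spatial search would replace the constraint with a bounding-box predicate; the text search is shown because keywords are the discovery mechanism emphasized in this paper.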
In Figure 7, the Discovery Factory passes the execution to the metadata generator (Figure 7-1). The next step is to query the WMS and WFS services through a GetCapabilities request to obtain the information needed to create the metadata (Figure 7-2,3). After obtaining this information, the process generates the metadata and publishes them in the catalogue (CSW) (Figure 7-4,5). Finally, the WPS returns a URL with the location where the metadata have been published (Figure 7).
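The GetCapabilities requests mentioned above are plain KVP URLs. A minimal sketch, assuming WMS 1.3.0 and WFS 1.1.0 (the paper does not state which versions GSF requests) and hypothetical Geoserver endpoints:

```python
from typing import Dict
from urllib.parse import urlencode

# Assumed protocol versions; the paper does not state which ones GSF uses.
VERSIONS = {"WMS": "1.3.0", "WFS": "1.1.0"}


def capabilities_url(endpoint: str, service: str) -> str:
    """Return the KVP GetCapabilities URL for one OGC service endpoint."""
    query = urlencode({"service": service, "version": VERSIONS[service],
                       "request": "GetCapabilities"})
    # Keep any vendor parameters already present in the endpoint URL.
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{query}"


def capabilities_urls(published: Dict[str, str]) -> Dict[str, str]:
    """Map each service type where the data were published to its request."""
    return {service: capabilities_url(url, service)
            for service, url in published.items()}
```

For instance, `capabilities_urls({"WMS": "http://example.org/geoserver/wms"})` maps the WMS link returned by the publication step to the request that retrieves its capabilities document.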
To further explain the metadata generation process, Figure 8 shows the inputs and outputs of the metadata generator module. Its inputs are the service links (WMS and WFS), which are used to connect to the content and retrieve information about the dataset itself. It can also receive keywords provided by users, which are used to enrich the metadata. Its output is the link where the metadata have been published.
Figure 9 shows the steps involved. In the first step (Figure 9(1)), GSF performs a GetCapabilities request to the different services where the content has been published. For example, vector content is published for visualization (WMS) and download (WFS). Each request returns an XML document describing the characteristics that are needed for metadata creation.
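As an illustration of what this step extracts, the following sketch reads the title, abstract, keywords and geographic bounding box of one layer from a WMS 1.3.0 capabilities document using ElementTree. The layer name and the choice of fields are assumptions; the fields GSF actually generates are those listed in Table 1.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

WMS_NS = {"wms": "http://www.opengis.net/wms"}
EDGES = ("westBoundLongitude", "southBoundLatitude",
         "eastBoundLongitude", "northBoundLatitude")


def layer_info(capabilities_xml: str, layer_name: str) -> Optional[Dict[str, object]]:
    """Extract discovery-relevant fields of one layer from WMS 1.3.0 capabilities."""
    root = ET.fromstring(capabilities_xml)
    # Layers can be nested, so search the whole tree rather than one level.
    for layer in root.iter("{http://www.opengis.net/wms}Layer"):
        if layer.findtext("wms:Name", namespaces=WMS_NS) != layer_name:
            continue
        box = layer.find("wms:EX_GeographicBoundingBox", WMS_NS)
        bbox = None  # stays None if the box is only inherited from a parent
        if box is not None:
            # Returned as (west, south, east, north) in decimal degrees.
            bbox = tuple(float(box.findtext(f"wms:{edge}", namespaces=WMS_NS))
                         for edge in EDGES)
        return {
            "title": layer.findtext("wms:Title", default="", namespaces=WMS_NS),
            "abstract": layer.findtext("wms:Abstract", default="", namespaces=WMS_NS),
            "keywords": [k.text for k in
                         layer.findall("wms:KeywordList/wms:Keyword", WMS_NS)],
            "bbox": bbox,
        }
    return None
```

A WFS capabilities document would be read the same way, with the WFS namespace and its own element names.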
The following step (Figure 9(2)) applies eXtensible Stylesheet Language Transformations (XSLT) to generate the metadata. XSLT is a World Wide Web Consortium (W3C) standard for transforming XML documents into other XML documents or other types of documents.
The transformation takes the XML obtained from the previous request as input. The result of the transformation is another XML document (the metadata) that complies with the ISO 19139 and INSPIRE standards.
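GSF performs this mapping with an XSLT template. Because the Python standard library has no XSLT processor, the sketch below swaps in a direct ElementTree mapping that produces the same kind of output: a skeleton ISO 19139 record with a file identifier, title, abstract and geographic bounding box. It is not a complete or schema-valid record (mandatory elements such as contact, dateStamp and language are omitted), and the function name is hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET
from typing import Optional

NS = {"gmd": "http://www.isotc211.org/2005/gmd",
      "gco": "http://www.isotc211.org/2005/gco"}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # serialize with gmd:/gco: prefixes


def add_path(parent: ET.Element, path: str, text: Optional[str] = None) -> ET.Element:
    """Append a chain like 'gmd:abstract/gco:CharacterString'; return the leaf."""
    for step in path.split("/"):
        prefix, local = step.split(":")
        parent = ET.SubElement(parent, f"{{{NS[prefix]}}}{local}")
    if text is not None:
        parent.text = text
    return parent


def capabilities_to_iso(title: str, abstract: str, west: float, east: float,
                        south: float, north: float) -> ET.Element:
    """Map capabilities fields to a skeleton ISO 19139 MD_Metadata record."""
    record = ET.Element(f"{{{NS['gmd']}}}MD_Metadata")
    add_path(record, "gmd:fileIdentifier/gco:CharacterString", str(uuid.uuid4()))
    ident = add_path(record, "gmd:identificationInfo/gmd:MD_DataIdentification")
    add_path(ident, "gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString", title)
    add_path(ident, "gmd:abstract/gco:CharacterString", abstract)
    box = add_path(ident, "gmd:extent/gmd:EX_Extent/gmd:geographicElement/"
                          "gmd:EX_GeographicBoundingBox")
    # ISO 19139 orders the bounding box edges west, east, south, north.
    for edge, value in (("westBoundLongitude", west), ("eastBoundLongitude", east),
                        ("southBoundLatitude", south), ("northBoundLatitude", north)):
        add_path(box, f"gmd:{edge}/gco:Decimal", str(value))
    return record
```

An XSLT template expresses the same field-by-field mapping declaratively, which makes it easier to maintain several metadata profiles without changing code.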
The third step, after the transformation, consists of parsing the metadata and completing the missing fields (Figure 9(3)). In this step, we add the keywords provided through the WPS inputs, together with the URLs of the services where the dataset has been published. After this step, the metadata are considered complete.
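The completion step can be sketched as follows, assuming a record shaped like ISO 19139 with an MD_DataIdentification element. Keywords are inserted as gmd:descriptiveKeywords before gmd:extent to respect the element order of the schema, and each service URL becomes a gmd:CI_OnlineResource; protocol labels such as `OGC:WMS` follow a common catalogue convention and are an assumption here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

NS = {"gmd": "http://www.isotc211.org/2005/gmd",
      "gco": "http://www.isotc211.org/2005/gco"}


def q(step: str) -> str:
    """Turn 'gmd:keyword' into ElementTree's '{namespace}keyword' form."""
    prefix, local = step.split(":")
    return f"{{{NS[prefix]}}}{local}"


def add_path(parent: ET.Element, path: str, text: Optional[str] = None) -> ET.Element:
    """Append a chain of nested elements and return the innermost one."""
    for step in path.split("/"):
        parent = ET.SubElement(parent, q(step))
    if text is not None:
        parent.text = text
    return parent


def complete_metadata(record: ET.Element, keywords: List[str],
                      service_urls: Dict[str, str]) -> ET.Element:
    """Add user keywords and service links to a skeleton ISO 19139 record."""
    ident = record.find("gmd:identificationInfo/gmd:MD_DataIdentification", NS)
    if ident is None:
        raise ValueError("record has no gmd:MD_DataIdentification")
    if keywords:
        block = ET.Element(q("gmd:descriptiveKeywords"))
        md_keywords = ET.SubElement(block, q("gmd:MD_Keywords"))
        for word in keywords:
            add_path(md_keywords, "gmd:keyword/gco:CharacterString", word)
        # The schema places descriptiveKeywords before extent.
        extent = ident.find("gmd:extent", NS)
        children = list(ident)
        ident.insert(children.index(extent) if extent is not None else len(children), block)
    if service_urls:
        # Appended at the end, which keeps schema order only for a skeleton
        # record without sections after identificationInfo.
        options = add_path(record, "gmd:distributionInfo/gmd:MD_Distribution/"
                                   "gmd:transferOptions/gmd:MD_DigitalTransferOptions")
        for protocol, url in service_urls.items():
            resource = add_path(options, "gmd:onLine/gmd:CI_OnlineResource")
            add_path(resource, "gmd:linkage/gmd:URL", url)
            add_path(resource, "gmd:protocol/gco:CharacterString", protocol)
    return record
```

Linking the service URLs into the record is what lets a catalogue user go from the metadata directly to the WMS and WFS endpoints that serve the data.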
Then, the metadata are published in a catalogue defined in the SPP. This step (Figure 9(4)) is similar to data publication, but the content is published in a metadata catalogue. In our work, the transactional profile (CSW-T) of the OpenGIS Catalogue Services specification has been used. The system therefore implements all the operations needed to comply with the CSW-T protocol: GetCapabilities (generate an XML document with the service metadata about the server), GetRecords (execute a query and generate a response document with the resulting records), GetRecordById (query the repository with the specified identifier and generate a response), GetDomain (determine a value domain and generate a response document), DescribeRecord (generate a record schema) and Transaction (execute a transaction and generate a response document).
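A CSW-T Transaction with an Insert action can be sketched as below: the completed record is wrapped in csw:Transaction/csw:Insert, POSTed to the catalogue, and the csw:totalInserted count is read from the TransactionResponse. The endpoint is hypothetical, and real catalogues such as GeoNetwork usually require authentication before accepting transactions, which this sketch omits.

```python
import urllib.request
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
ET.register_namespace("csw", CSW)


def build_insert_transaction(record: ET.Element) -> bytes:
    """Wrap a metadata record in a CSW 2.0.2 Transaction/Insert request."""
    transaction = ET.Element(f"{{{CSW}}}Transaction",
                             {"service": "CSW", "version": "2.0.2"})
    ET.SubElement(transaction, f"{{{CSW}}}Insert").append(record)
    return ET.tostring(transaction, encoding="utf-8")


def post_transaction(csw_url: str, body: bytes, timeout: float = 30.0) -> bytes:
    """POST the request and return the raw TransactionResponse (no auth)."""
    request = urllib.request.Request(csw_url, data=body, method="POST",
                                     headers={"Content-Type": "application/xml"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def total_inserted(response_xml: bytes) -> int:
    """Read csw:TransactionSummary/csw:totalInserted from the response."""
    root = ET.fromstring(response_xml)
    summary = f"{{{CSW}}}TransactionSummary/{{{CSW}}}totalInserted"
    return int(root.findtext(summary, default="0"))
```

Checking `total_inserted(...) == 1` before returning the metadata URL is one simple way to confirm that the catalogue actually stored the record.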
The last step returns the URL (Figure 9(5)) that identifies the published metadata in the catalogue. Once the metadata URL has been obtained, the process is finished.
Note that the generated metadata do not contain all the fields that the standard offers, because some fields cannot be obtained automatically. GSF uses ISO 19139:2007, which provides the XML implementation schema for ISO 19115. The fields have been selected to comply with the ISO 19115 core (ISO 19115 has since been withdrawn and replaced by ISO 19115-1, and ISO 19139 is under review as ISO 19139-1). These fields are listed in Table 1, and all of them are created automatically.
Currently, GSF is able to publish different raster and vector formats. To show how these data become available in different SDI services, the experiment uses a shapefile as an example of vector data and a GeoTIFF file as an example of raster data. To facilitate the publication of data and metadata, we provide a web-based client (Figure 10).
In particular, the vector data file describes the fire-burned areas in the province of Castellón, Spain, in 2005. The file contains 158 points, with the cause of each fire and the burned area in hectares. Making these data available in different SDI services improves their accessibility and interoperability, and allows other users to reuse them, for example to analyse the environmental impact of such events. Furthermore, to improve visibility, the data have to be published not only in a data service, but also for visualization and download. The metadata also have to be published in a metadata catalogue, offering information about the data, their location and the keywords. The XML file [36] contains the executed request used to publish this content.
Geoserver, an implementation of OGC-based SDI services that offers WMS and WFS, has been chosen to test the publication of geospatial data. To publish the metadata using CSW, we have chosen GeoNetwork.
After the execution, the content has been published to the geospatial data server (Geoserver) (Figure 11 and Figure 12), and the GetCapabilities documents are accessible through the WMS and WFS services. Figure 11 shows all published layers; the first one corresponds to the fires in Castellón. Figure 12 shows all the properties (bounding box, spatial reference system, etc.) of the layer with the fire-burned areas in the province of Castellón. The file [37] shows the layer information. The metadata have been published in the metadata server (GeoNetwork) (Figure 13).
A full execution of the GSF, comprising data publication and metadata generation and publication, takes 18 s (Intel(R) Xeon(R) CPU 5160 @ 3.00GHz, 16 GB DDR2 FB-DIMM synchronous RAM at 667 MHz). This delay is mainly caused by the XSLT transformation, because this kind of transformation is slow.
Finally, some operations (registering a sensor and inserting an observation) have been executed to publish into an SOS server. A sensor has been published using the RegisterSensor operation; the file [38] shows an example of sensor registration.
The WPS execution for publishing sensors or observations takes 2 s on the same server.
In this work, an automatic mechanism for geospatial data publication and for metadata generation and publication has been presented, extending previous work [9]. We also offer the possibility to publish new kinds of geospatial data, such as sensor data: GSF can publish sensors and observations in an SOS service.
We have provided a possible tool to solve the consistency problem between data and metadata. It also offers a metadata link at the same time as the geospatial content is published. The system improves the visibility, discoverability and accessibility of the data.
The proposed solution offers a mechanism to facilitate the automatic production of ISO-compliant metadata. It hides metadata generation within the user’s workflows and links data and metadata together. This makes it more reliable to link data with correct metadata and facilitates the discovery of resources. GSF also ensures that when content is updated, the related metadata are automatically updated at the same time.
Another improvement is the reduction of the costs associated with metadata production, avoiding duplicated efforts to create the same metadata and adding the possibility of sharing and reusing metadata.
The GSF has the ability to (a) publish geographic data as standard services through a standardized interface such as WPS; (b) use this component to publish different geospatial data types; and (c) facilitate metadata generation, which has traditionally been considered a difficult task [9]. Furthermore, GSF offers both expert and non-expert users the possibility to publish geospatial data and to create and publish metadata automatically. In addition, GSF improves the availability of data in the SDI environment, as well as data and metadata maintenance. The generation and publication of metadata make it easier to know where the data can be found.
Several extensions are envisaged as future work. The first is to continue the current work of increasing the number of metadata elements generated, using ontologies. Ontologies would allow the keywords to be obtained automatically from the data content itself, so that users would not have to provide them. Another extension could be to support additional metadata standard profiles, such as Federal Geographic Data Committee (FGDC) or INSPIRE. Finally, the system can publish sensor data, but it cannot create the associated metadata; we propose this feature as future work.