Article

Geospatial Information Categories Mapping in a Cross-lingual Environment: A Case Study of “Surface Water” Categories in Chinese and American Topographic Maps

1 School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China
2 Tianjin Institute of Surveying and Mapping, Tianjin 300381, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2016, 5(6), 90; https://doi.org/10.3390/ijgi5060090
Submission received: 4 April 2016 / Revised: 23 May 2016 / Accepted: 6 June 2016 / Published: 14 June 2016
(This article belongs to the Special Issue Geospatial Semantics and Semantic Web)

Abstract

The need to integrate geospatial information (GI) data from various heterogeneous sources has become increasingly important for geographic information system (GIS) interoperability. Using domain ontologies to clarify and integrate the semantics of data is considered a crucial step for successful semantic integration in the GI domain. Nevertheless, mechanisms are still needed to facilitate semantic mapping between GI ontologies described in different natural languages. This research establishes a formal ontology model for cross-lingual geospatial information ontology mapping. Semantic primitives are first extracted from the free-text definitions of categories in two GI classification standards written in different natural languages. An ontology-driven approach is then used, and a formal ontology model is established to represent these semantic primitives as semantic statements, in which the spatial-related properties and relations are considered crucial for the representation and identification of the semantics of the GI categories. Then, an algorithm is proposed to compare these semantic statements in a cross-lingual environment. We further design a similarity calculation algorithm based on the proposed formal ontology model to measure the semantic similarities and identify the mapping relationships between categories. In particular, we work with two GI classification standards for Chinese and American topographic maps. The experimental results demonstrate the feasibility and reliability of the proposed model for cross-lingual geospatial information ontology mapping.

1. Introduction

The vision of a “Digital Earth” articulated by US Vice President Al Gore [1,2,3] has contributed significantly to the growth in global geospatial information (GI) on physical and social environments. However, how to query, retrieve, and manipulate those data from heterogeneous sources has challenged the GI community [2,3,4,5]. Thus, an approach to integrating GI data from various heterogeneous sources has found increased importance [6].
A data integration process is not as simple as joining several systems because any effort at information sharing runs into the problem of semantic heterogeneity [7]. Semantic heterogeneity occurs when enabling interoperability across geographic information systems (GIS) [8,9,10,11] because GIS are often designed to address data from highly distributed, multidisciplinary, and cross-lingual data sources with different application demands [12]. Clarifying the semantics of data is therefore a crucial step toward successful data integration [13]. To achieve this, domain ontologies are built as a mediator to exchange information in such a way that the precise meaning of the data (i.e., semantics) is readily retrievable beyond simple keyword matching via knowledge representation languages and reasoning [7,13,14,15]. Thus, ontology engineering has been regarded as an effective means of providing seamless connection between component GIS at the semantic level [8,12,16].
While the GI community widely acknowledges the utility of ontology technologies, two main problems need to be solved for GI ontology engineering and sharing: (1) traditional ontology research and technologies, which focus on terminology and schema, cannot answer the question of how to engineer GI ontologies and integrate them with GIS or Spatial Data Infrastructures (SDI) [6]; and (2) mechanisms still need to be explored for GI ontology mapping in cross-lingual environments to facilitate semantic integration between GI ontologies described in different natural languages [17,18,19,20].
The reason for the first problem is that GI features and categories are a product of spatial cognition and social convention. Ontology engineering in the GI domain therefore differs from that in other domains, because location, topology, mereology, and other spatial relations play a major role in the identification and representation of GI semantics [14]. For example, from a feature-driven ontology perspective, the geographic categories “river” and “bank” are specified as different classes, and the spatial relation “adjacent-to” between these two categories is normally missing. Moreover, geographic and non-geographic entities are ontologically distinct in a number of ways [21]. To enhance semantic expressiveness and overcome semantic heterogeneity during the GI ontology engineering process, the spatial-related characteristics of GI categories must be considered to enrich the spatial-related semantics of the given ontology.
Although a majority of current GI ontologies have been developed in English with English vocabularies, the amount of multilingual content on the Semantic Web, and thus the number of vocabularies/ontologies in multiple languages, continues to grow [22]. Methods for matching vocabularies across languages have therefore become increasingly important for making data in multiple languages accessible to end users [23]. As a motivating scenario, suppose that a user wants to query water level data along the Mekong River, one of the world's longest rivers, which flows through six countries (Cambodia, Laos, Myanmar, Thailand, Vietnam, and China) whose official languages all differ. Several data providers offer the related GI data via their national GIS in their native natural languages. This situation poses a substantial challenge to integrating highly heterogeneous GI data across natural language barriers.
The purpose of this study is to establish a formal ontology model for cross-lingual geospatial information ontology mapping. The starting point is two GI classification standards written in different natural languages, Chinese and English (for the sake of simplicity and clarity, this study is restricted to the “surface water” categories from these two standards). A set of semantic primitives is extracted from the free-text definitions of the categories in the standards by applying Natural Language Processing (NLP) techniques. Then, an ontology-driven approach is used, and the formal ontology model is established to represent these semantic primitives as semantic statements, in which the spatial-related properties and relations are considered crucial for the representation and identification of the semantics of the GI categories. To overcome the natural language barrier, the statements in Chinese are translated into English using machine translation tools, and the mapping relationships between statements are determined within an English context; these relationships then serve as the basis for the similarity calculation between categories in different GI ontologies. Finally, a similarity calculation algorithm is designed to measure the semantic similarity between GI categories in different ontologies, and the final mapping relationships between pairs of categories are determined based on the calculated similarity values. The contributions of the proposed approach include (1) the construction of spatial-related semantic properties and relations to support the representation and identification of the spatial characteristics of the GI categories and (2) algorithms for GI ontology mapping in a cross-lingual environment based on formally represented and comparable semantic statements.
The remainder of this paper is organized as follows. Section 2 presents the related works in the literature. Then, the main procedure of our methodology is presented in Section 3. Next, a case study demonstrating the application of our method is shown in Section 4. Finally, conclusions are drawn, and future works are noted.

2. Related Works

2.1. Semantic Interpretation

Knowledge acquisition (KA) is a broad field that encompasses the processes of extracting, creating, and structuring knowledge from heterogeneous resources [24]. Semantic interpretation (SI) for KA is defined as the composition of two sub-processes: the extraction of semantic primitives from the free-text definition in ontologies and semantic enrichment based on the extracted semantic primitives.
The research on semantic primitive extraction builds on a large body of work within the field of Natural Language Processing (NLP) [25]. NLP and text mining are research fields aimed at exploiting rich knowledge resources with the goal of understanding, extracting, and retrieving semantic information from unstructured written text. Knowledge resources that have been used for these purposes include the entire range of terminologies, including lexicons, controlled vocabularies, thesauri, and ontologies [26,27]. Although numerous methods and algorithms have been developed recently (such as symbolic, statistical, and hybrid approaches) [26], a fully automated algorithm for semantic extraction using NLP techniques does not yet seem achievable, and some manual assistance is normally unavoidable.
For semantic enrichment, authors in [28,29] proposed a systematic methodology to explore and identify semantic information provided by categories in geographic ontologies, in which the semantic representations of categories are enriched with a set of semantic properties and relations to reveal similarities and heterogeneities between these categories. Authors in [30] presented an axiomatic formalization of a theory of top-level relations (parthood relations, sub-universal relations, and cross-categorical relations) between three categories of geospatial-related entities, namely, individuals, universals, and collections. In addition, they demonstrated how a more exact understanding helps to overcome the semantic heterogeneous problems in the information integration process. In [13], the semantics of a concept in GI ontologies were presented using an extendable and structural definition framework composed of a number of RDF triple statements, and a comparison algorithm was designed to determine the semantic relationships of concepts between different domains. The primary objective of these studies was to extract and represent the semantics of concepts/entities based on structural common vocabularies, which make the semantics of the concepts/entities comparable. However, the structural common vocabularies in these works are determined by domain experts manually; thus, the objectivity and automation of the algorithms (avoiding ad hoc manual procedures and subjective experts’ knowledge) remain quite limited.

2.2. Ontology Mapping

Mapping relationship discovery for ontologies has attracted considerable attention in recent years. Various approaches based on processes to find similarities between different but related ontologies have emerged [31]. With respect to the literature specifically oriented toward geospatial information (GI) ontologies, authors in [32] performed an analysis of the different models of semantic similarity measurement and evaluated these models with respect to the particular requirements of geospatial data. Authors in [7,33] systematically surveyed several of the most recent and often-referenced works on integrating GI and GI ontology mapping by applying comparison criteria, such as logical inference, mapping approaches, degree of automation, and geospatial relativity. In addition, a general conclusion is proposed that, for the ontology mapping task, the use of formal ontologies and, consequently, the use of reasoners should be mandatory.
In recent years, Volunteered Geographic Information (VGI) has been proposed for GI ontology mapping in a web environment. Authors in [34,35] devised a mechanism for computing the semantic similarity of OpenStreetMap (OSM) geographic classes using volunteered lexical definitions to alleviate the semantic gap between different VGI producers. Another set of studies introduced an artificial neural network approach to simulate human perception and measure the semantic similarity between spatial entities, with the aim of making the ontology mapping process more automatic [36,37].
All these proposals, combining the use of different models of semantic similarity measurement, have emerged to provide solutions to existing GI ontology mapping problems in English environments. However, the semantic web and ontology engineering have experienced significant advancements in standards and techniques, and increasingly more domain ontologies and localization content in the semantic web are described using native natural languages [23]. There is a pressing need for cross-lingual ontology mapping mechanisms in the GI community that are designed to reconcile semantics of different ontologies in multilingual environments and to improve the accessibility of various GI ontologies across language barriers [38].
The author in [39] groups the existing cross-lingual ontology mapping (CLOM) algorithms into the following categories: manual processing [40,41,42], corpus-based approaches [43], linguistic enrichment [44], indirect alignment composition [45], and translation-based approaches [39,46]. Among these CLOM approaches, translation-based CLOM is currently very popular and has been adopted by several researchers [47,48,49,50]; it is enabled by translations obtained from machine translation (MT) tools, bilingual/multilingual thesauri, dictionaries, etc. Typically, these approaches rely only on string-based lexical comparisons of entity names and descriptions [51,52,53,54], while comparisons of the semantic interpretation of entities, e.g., their model-theoretic semantics, are missing.

3. Methodologies

The main procedure for our methodologies is divided into two sub-processes, as shown in Figure 1. In the semantic interpretation process, two GI formal ontologies, namely, OA and OB, are established from the free-text definition of the corresponding classification standards with different natural languages. In the ontology mapping process, all the category names and semantic statements in OA are translated from LA into LB, and the mapping relationships between category names and semantic statements are determined within the same language context, which then serve as the basis for the similarity calculation between categories in different GI ontologies. Finally, a similarity calculation algorithm is designed, and the final mapping results between pairs of categories in different classification standards are determined.

3.1. Semantic Interpretation

3.1.1. Semantic Primitive Extraction

In geospatial information repositories, free-text definitions are often the primary and only available objective descriptions of categories. Semantic primitives are syntactic and lexical patterns in the free-text definition and can be extracted using NLP tools [55]. NLP research has developed methods and algorithms for information retrieval and extraction from free-text knowledge resources. The methodology adopted here for analyzing definitions and extracting semantic primitives was introduced by [56]. In this research, the lexical patterns of nominal phrases and verb phrases are considered as semantic primitives. An example is illustrated in Figure 2, and the main steps of the process are as follows:
  • One category definition in free-text format is chosen as the input natural language material;
  • Word segmentation is performed to split the whole sentence into individual words;
  • Words are categorized and tagged into their parts-of-speech tag sets (see Table 1 and Table 2) and labeled accordingly;
  • The nominal phrases and verb phrases are chunked, and the sentence structure is analyzed to extract lexical patterns as the semantic primitives.
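The four steps above can be sketched in a few lines of Python. This is only a toy illustration, not the tooling used in the study: the lexicon `TOY_LEXICON`, the tag names, and the chunking rule (adjacent adjectives and nouns form a nominal phrase, verbs are kept as verb phrases) are all assumptions made for this example, and a real pipeline would rely on a trained segmenter, tagger, and chunker.

```python
import re

# Hypothetical hand-made lexicon covering only the words of the example
# definition; a real pipeline would use a trained part-of-speech tagger.
TOY_LEXICON = {
    "manmade": "JJ", "waterway": "NN", "used": "VBN", "by": "IN",
    "watercraft": "NN", "or": "CC", "for": "IN", "drainage": "NN",
    "irrigation": "NN", "mining": "NN", "water": "NN", "power": "NN",
}


def segment(definition):
    """Step 2: split the definition into lowercase word and punctuation tokens."""
    return re.findall(r"[a-z]+|[^\sa-z]", definition.lower())


def tag(tokens):
    """Step 3: attach a part-of-speech tag to each token (unknown words default to NN)."""
    tagged = []
    for token in tokens:
        if not token.isalpha():
            tagged.append((token, "PUNCT"))
        else:
            tagged.append((token, TOY_LEXICON.get(token, "NN")))
    return tagged


def chunk(tagged):
    """Step 4: group adjacent JJ/NN tokens into nominal phrases and keep verbs."""
    primitives, current = [], []
    for word, pos in tagged:
        if pos in ("JJ", "NN"):
            current.append(word)
            continue
        if current:  # any other tag closes the open nominal phrase
            primitives.append(" ".join(current))
            current = []
        if pos.startswith("VB"):
            primitives.append(word)
    if current:
        primitives.append(" ".join(current))
    return primitives


# Step 1: one category definition is chosen as input (the "canal" definition).
DEFINITION = ("manmade waterway used by watercraft or for drainage, "
              "irrigation, mining, or water power")
PRIMITIVES = chunk(tag(segment(DEFINITION)))
```

For the “canal” definition, this toy chunker yields the phrases “manmade waterway”, “used”, “watercraft”, “drainage”, “irrigation”, “mining”, and “water power”.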

3.1.2. Construction of the Formal Ontology Model

According to Wikipedia, an “ontology in information science” is a formal naming and definition of the types, properties, and interrelationships of the concepts that really or fundamentally exist for a particular domain of discourse. It is thus a practical application of philosophical ontology, with a taxonomy. A domain ontology (or domain-specific ontology) represents the concepts that belong to a particular domain. Thus, for a formal representation [57,58], the domain ontology (denoted by ODomain) and the concepts in the domain can be summarized by Equations (1)–(3).
$$O_{Domain} = \{ S(C_{Domain}),\ S(R_C),\ S(H_C),\ S(P_C) \} \tag{1}$$
$$C_{Domain} = \{ T_C,\ D_C \} \tag{2}$$
$$D_C = \{ R_C,\ H_C,\ P_C \}, \quad R_C \in S(R_C),\ H_C \in S(H_C),\ P_C \in S(P_C) \tag{3}$$
In Equation (1), S(CDomain) represents the set of concepts in a domain, and the semantics of each concept in the domain are categorized into different groups, namely, S(HC), S(RC), and S(PC). S(HC) represents the set of hierarchical relations carrying the taxonomic information in ODomain, S(RC) represents the set of other interrelations between these concepts, and S(PC) represents the set of semantic properties belonging to the concepts in this domain.
In Equation (2), the semantics of a concept in the domain are considered as the composition of terminology of this concept (denoted by TC) and structural definition of this concept (denoted by DC). Unlike the free-text format of definition, DC commonly consists of the semantic properties of the concept (PC), the hierarchical relation (HC) and other interrelations (RC) between this concept and other concepts in the domain. Thus, from Equations (2) and (3), a certain concept in the domain, CDomain can be deduced as a function of TC, RC, HC, and PC in Equation (4)
$$C_{Domain} = \{ T_C,\ R_C,\ H_C,\ P_C \}, \quad R_C \in S(R_C),\ H_C \in S(H_C),\ P_C \in S(P_C) \tag{4}$$
in which RC, HC, PC are used to represent the semantics of this concept, and belong to S(RC), S(HC), S(PC), respectively.
Considering the situation in the GI domain, we use the word “category” instead of “concept”. Because the semantic characteristics of the GI category are highly correlated in space and time [59], the spatial- and temporal-related semantic properties and relations should be included in the model as crucial vocabularies for the representation and identification of the semantics of the GI categories. Thus, the GI ontology OGI and the semantics of a certain category CGI in OGI can be represented as Equations (5) and (6):
$$O_{GI} = \{ S(T_C),\ S(R_S),\ S(R_T),\ S(R_C),\ S(H_C),\ S(P_S),\ S(P_T),\ S(P_C) \} \tag{5}$$
$$C_{GI} = \{ (T_C = V_{T_C}) \wedge (R_S = V_{R_S}) \wedge (R_T = V_{R_T}) \wedge (R_C = V_{R_C}) \wedge (H_C = V_{H_C}) \wedge (P_S = V_{P_S}) \wedge (P_T = V_{P_T}) \wedge (P_C = V_{P_C}) \} \tag{6}$$
In Equation (5), S(TC) represents the set of category names in OGI; S(RS) and S(RT) represent the sets of spatial-related and temporal-related semantic relations between categories; S(HC) represents the set of hierarchical relations; S(PS) and S(PT) represent the sets of spatial-related and temporal-related semantic properties belonging to the categories in OGI; and S(RC) and S(PC) represent the sets of other semantic relations and properties in OGI. In Equation (6), Vx represents the value of a certain semantic property/relation of CGI; TC, RS, RT, RC, HC, PS, PT, and PC are used to represent the semantics of CGI and belong to S(TC), S(RS), S(RT), S(RC), S(HC), S(PS), S(PT), and S(PC), respectively.
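As a rough illustration of Equations (5) and (6), the model can be held in a small data structure. The sketch below is an assumption about one possible encoding, not the representation used in the study: the class names `Statement`, `Category`, and `Ontology` and their methods are hypothetical, and the group codes simply mirror the symbols RS, RT, RC, HC, PS, PT, and PC.

```python
from dataclasses import dataclass, field

# Statement groups of Equation (5); the codes mirror the paper's symbols.
GROUPS = ("RS", "RT", "RC", "HC", "PS", "PT", "PC")


@dataclass(frozen=True)
class Statement:
    """One (group = value) term of Equation (6)."""
    group: str   # one of GROUPS
    kind: str    # property/relation type, e.g. "Hypernym" or "Purpose"
    value: str   # the value V_x

    def __post_init__(self):
        if self.group not in GROUPS:
            raise ValueError(f"unknown statement group: {self.group}")


@dataclass
class Category:
    """A GI category C_GI: its name T_C plus its semantic statements."""
    name: str
    statements: list = field(default_factory=list)

    def by_group(self, group):
        return [s for s in self.statements if s.group == group]


@dataclass
class Ontology:
    """A GI ontology O_GI, stored as a mapping from name to category."""
    categories: dict = field(default_factory=dict)

    def add(self, category):
        self.categories[category.name] = category

    def names(self):
        """S(T_C): the set of category names."""
        return set(self.categories)

    def group_set(self, group):
        """S(X) for one group: every (kind, value) pair used in the ontology."""
        return {(s.kind, s.value)
                for c in self.categories.values()
                for s in c.by_group(group)}


# Two statements of the "canal" category, taken from the English definition.
canal = Category("canal", [
    Statement("HC", "Hypernym", "waterway"),
    Statement("PC", "Nature", "Manmade"),
])
ontology = Ontology()
ontology.add(canal)
```

Storing every statement with its group makes the sets S(HC), S(PC), and so on derivable from the categories, so they never have to be maintained separately.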
To address the problems of geographic representation, the authors in [60] distinguished three main theoretical tools required for developing an overall formal theory of spatial representation, namely, mereology, location, and topology. These theoretical tools are selected as the basis for defining spatial-related semantics in our formal ontology model. In addition, geographic entities are essentially dynamic. The authors in [61] pointed out that a good ontology must be capable of accounting for spatial reality both synchronically (as it exists at a time) and diachronically (as it unfolds through time); thus, the “time point” and “time period” properties are used to describe the dynamic characteristics of geographic entities in our model. Moreover, to specify the semantic relations and properties used in geographic definitions, the authors in [28] analyzed several geographic ontologies and identified patterns that were systematically used to express specific semantic relations and properties, including hierarchical relations, part-whole relations, and neighborhood relations, as well as semantic properties such as purpose, nature, material, and size.
Based on previous research and our formal ontology model in Equation (5), the semantic property and relation types in our model are subdivided as shown in Figure 3.

3.1.3. Transformation from Semantic Primitives to Formal Ontology Model

To make the semantic primitives structured and comparable, domain experts analyze these semantic primitives and transform them into the different groups of semantic properties/relations in our geospatial formal ontology model. The well-known Subject-Predicate-Object triple statement and the Web Ontology Language (OWL) are selected as the basis for presenting the semantic properties/relations and their values in a machine-readable manner. The Subject represents a CGI in OGI, and the Predicate is one of the semantic property or semantic relation types illustrated in Figure 3. All of the semantic relations are presented as object properties, and the Object in these semantic relations is another CGI in OGI or an “owl:Class” object type. Most of the semantic properties are also presented as object properties; a few are presented as datatype properties in OWL syntax, in which case the Object is an “rdfs:Literal” datatype. The following rules are adopted to handle the formalization process:
(1) A GI category can be represented by a number of semantic relations/properties; however, the number of semantic relations/properties involved should be minimized to avoid redundancy.
(2) Not every GI category must cover all semantic relations/properties in the model. It cannot be guaranteed that two different categories use the same set of semantic relations/properties to represent their semantics.
(3) The semantic information of a certain category in our model is the combination of different semantic relations/properties and their values. This combination should represent all the semantic information of the category and be able to distinguish the different geospatial categories within and beyond domain ontologies to avoid ambiguity.
(4) The hypernym, hyponym, and synonym relations should be included in the hierarchical relation group. If category A is a hyponym of category B, A must inherit all the semantic properties/relations of B to retain semantic consistency.
According to the above-mentioned rules, the semantic primitives can be assigned to these property/relation types as structured statements for the identification and representation of the GI categories. For example, the free-text definition of the “canal” category in English is “manmade waterway used by watercraft or for drainage, irrigation, mining, or water power”. Applying NLP tools to this definition yields the semantic primitives of the “canal” category, namely, the phrases “manmade waterway”, “used”, “watercraft”, “drainage”, “irrigation”, “mining”, and “water power”. Transforming these semantic primitives into the proposed formal ontology model, the semantics of the category “canal” can be represented as the following set of semantic statements:
$$C_{Canal} = \{ T_C = \text{canal} \wedge H_C = \text{Hypernym}: \text{waterway} \wedge P_C = (\text{Purpose}: \text{watercraft}) \wedge (\text{Purpose}: \text{drainage}) \wedge (\text{Purpose}: \text{irrigation}) \wedge (\text{Purpose}: \text{mining}) \wedge (\text{Purpose}: \text{water power}) \wedge P_C = \text{Nature}: \text{Manmade} \} \tag{7}$$
In addition, the representation in OWL format is illustrated in Figure 4.
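The “canal” statements can also be written out as plain Subject-Predicate-Object triples. The following sketch uses only built-in Python types; the `ex:` prefix, the `has…` predicate names, and the helper functions are hypothetical choices for illustration and are not taken from the OWL file shown in Figure 4.

```python
# Hypothetical namespace prefix; the real ontology IRIs are not given here.
EX = "ex:"

# (property/relation type, value) pairs read off the "canal" statements.
CANAL_STATEMENTS = [
    ("Hypernym", "waterway"),
    ("Purpose", "watercraft"),
    ("Purpose", "drainage"),
    ("Purpose", "irrigation"),
    ("Purpose", "mining"),
    ("Purpose", "water power"),
    ("Nature", "Manmade"),
]


def to_triples(category_name, statements):
    """Turn each statement into a (Subject, Predicate, Object) triple."""
    subject = EX + category_name
    return [(subject, EX + "has" + kind, value) for kind, value in statements]


def objects_of(triples, predicate):
    """Collect the Object of every triple that uses the given Predicate."""
    return [obj for _, pred, obj in triples if pred == predicate]


CANAL_TRIPLES = to_triples("canal", CANAL_STATEMENTS)
PURPOSES = objects_of(CANAL_TRIPLES, EX + "hasPurpose")
```

Because every statement shares the same Subject, comparing two categories reduces to comparing their (Predicate, Object) pairs, which is what the mapping algorithms in Section 3.2 rely on.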

3.2. Ontology Mapping Algorithms

3.2.1. Semantics Translation

Assume that we have two formal ontologies, OA and OB, presented in different natural languages, namely, language A (LA) and language B (LB), respectively. According to the geospatial formal ontology model introduced in Section 3.1.2, the semantics of ontologies OA and OB consist of the category name sets S(CNA) and S(CNB) and the semantic statement sets S(SSA) and S(SSB), labeled in different natural languages. Each semantic statement consists of a semantic property/relation type (as illustrated in Figure 3 in Section 3.1.2) and its corresponding value. To cross the natural language barrier between OA and OB, Algorithm 1 describes the process of semantics translation from LA to LB:
Algorithm 1. Semantics Translation.
Input: Formal ontology OA(S(CNA), S(SSA)) in LA.
Output: Translation candidate result set of the semantics in OA, OA(S(TC(CNA)), S(TC(SSA-object))), in LB.
Symbols:
S(TC(CNA)): translation candidate result set of S(CNA) in LB.
S(TC(SSA-object)): translation candidate result set of the Object parts of S(SSA) in LB.
ssA-object: the Object part of the semantic statement ssA.
1: for each category name cnA in S(CNA), translate cnA from LA into LB by using different Machine Translation (MT) web services (Google Translator API at” http://translate.google.cn/”, Bing Translator API at” http://www.bing.com/translator/?ref=SALL&mkt=zh-CN”, and Baidu Translator API at ”http://fanyi.baidu.com/?aldtype=16047#zh/en/”), collect all of the translation results for cnA into the translation candidate results TC(cnA), and store all of the category name translation candidate results in the translation candidate set S(TC(CNA));
2: for each semantic statement ssA in S(SSA), which, according to the OWL triple statement syntax, can be subdivided into three parts (Subject, Predicate, and Object), translate ssA-object from LA into LB by using different MT web services, collect all of the translation results for ssA-object into the translation candidate results TC(ssA-object), and store all of the semantic statement translation candidate results in the translation candidate set S(TC(SSA-object)).
Take the “运河” (canal) category in Chinese as an example. Applying NLP tools to its definition yields the semantic primitives of the “运河” category, namely, the phrases “跨流域” (inter-basin), “开凿” (excavated), “供调水” (for water transfer), “航运” (shipping), and “人工水道” (artificial waterway). Transforming these semantic primitives into the proposed formal ontology model, the semantics of the category “运河” can be represented as the following set of semantic statements:
$$C_{运河} = \{ T_C = \text{运河} \wedge H_C = \text{Hypernym}: \text{水道} \wedge P_C = (\text{Purpose}: \text{调水}) \wedge (\text{Purpose}: \text{航运}) \wedge P_C = \text{Nature}: \text{人工} \wedge R_S = \text{Topology}: \text{跨流域} \} \tag{8}$$
The semantics translation result of C运河 in English is as follows:
$$C_{运河} = \{ T_C = (\text{Canal}) \wedge H_C = \text{Hypernym}: (\text{Waterway}, \text{Aqueduct}) \wedge P_C = (\text{Purpose}: (\text{Water transfer}, \text{Diversion})) \wedge (\text{Purpose}: (\text{Shipping})) \wedge P_C = \text{Nature}: (\text{Manual}, \text{Artificial}) \wedge R_S = \text{Topology}: (\text{Inter-basin}, \text{Across river basins}) \} \tag{9}$$
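Algorithm 1 can be sketched as follows. The sketch deliberately avoids calling any real MT web service: the two stub translators are hypothetical stand-ins backed by small dictionaries, and the way the candidate translations listed above are divided between `STUB_MT_A` and `STUB_MT_B` is invented for illustration, since the source does not say which service produced which candidate.

```python
# Hypothetical stand-ins for MT web services (assumption: each service
# returns a list of translations for a term). The split of candidates
# between the two stubs is invented for this example.
STUB_MT_A = {"运河": ["Canal"], "水道": ["Waterway"], "调水": ["Water transfer"],
             "航运": ["Shipping"], "人工": ["Manual"], "跨流域": ["Inter-basin"]}
STUB_MT_B = {"运河": ["Canal"], "水道": ["Aqueduct"], "调水": ["Diversion"],
             "航运": ["Shipping"], "人工": ["Artificial"],
             "跨流域": ["Across river basins"]}

TRANSLATORS = [lambda term: STUB_MT_A.get(term, []),
               lambda term: STUB_MT_B.get(term, [])]


def translate_candidates(term, translators):
    """Collect the distinct translations of one term across all services."""
    candidates = []
    for translate in translators:
        for result in translate(term):
            if result not in candidates:
                candidates.append(result)
    return candidates


def translate_ontology(category_names, statements, translators):
    """Step 1 translates category names; step 2 translates statement Objects."""
    tc_names = {cn: translate_candidates(cn, translators)
                for cn in category_names}
    tc_objects = {}
    for statement in statements:
        _, _, obj = statement  # (Subject, Predicate, Object)
        tc_objects[statement] = translate_candidates(obj, translators)
    return tc_names, tc_objects


STATEMENTS_ZH = [
    ("运河", "Hypernym", "水道"),
    ("运河", "Purpose", "调水"),
    ("运河", "Purpose", "航运"),
    ("运河", "Nature", "人工"),
    ("运河", "Topology", "跨流域"),
]
TC_NAMES, TC_OBJECTS = translate_ontology(["运河"], STATEMENTS_ZH, TRANSLATORS)
```

Only the Object of each statement is translated; the Subject is covered by the category name translation, and the Predicate is a property/relation type of the shared model, so it needs no translation.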

3.2.2. Semantic Statement Mapping

To determine the mapping relationships between categories in different GI ontologies, the mapping relationships at the semantic statement level should be determined first because the semantic statement presents the most detailed semantic characteristics of the compared categories. Once their relationships are determined, the similarity between categories can be determined quantitatively. Algorithm 2 shows the comparison process for category names and semantic statements between OA and OB. In addition, all the mapping results M(OA, OB) are stored as the basis for the similarity calculation between the concepts in different GI ontologies.
Algorithm 2. Semantic Statement Mapping.
Input: OA(S(TC(CNA)), S(TC(SSA-object))) in LB; formal ontology OB(S(CNB), S(SSB)) in LB.
Output: Mapping result set M(OA, OB) of category names and semantic statements between OA and OB.
Symbols:
T(ss): the semantic property/relation type of a semantic statement ss.
M(OA, OB): mapping relationships of category names and semantic statements between OA and OB.
1: for each category name cnA in S(CNA), find the translation candidate results of cnA, TC(cnA);
  2: for each translation candidate tc(cnA) in TC(cnA), search S(CNB) in OB and find the matched category name cnB in S(CNB) by applying Equation (10);
    3: if a translation candidate tc(cnA) has the mapping relationship “exact match” with cnB, store the mapping result m(cnA, cnB, ‘exact match’) in M(OA, OB);
    4: else if a translation candidate tc(cnA) has the mapping relationship “close match” with cnB, store the mapping result m(cnA, cnB, ‘close match’) in M(OA, OB);
5: for each semantic statement ssA in S(SSA), find the translation candidate results of ssA-object, TC(ssA-object);
  6: for each translation candidate tc(ssA-object) in TC(ssA-object), search S(SSB-object) in OB and find the matched semantic statement Object ssB-object in S(SSB) by applying Equation (10);
    7: if a translation candidate tc(ssA-object) has the mapping relationship “exact match” with ssB-object, and T(ssA) equals T(ssB), store the mapping result m(ssA, ssB, ‘exact match’) in M(OA, OB);
    8: else if a translation candidate tc(ssA-object) has the mapping relationship “close match” with ssB-object, and T(ssA) equals T(ssB), store the mapping result m(ssA, ssB, ‘close match’) in M(OA, OB).
$$m(A, B) = \begin{cases} \text{exact match}, & A \text{ is the same word as or a synonym of } B \\ \text{close match}, & A \text{ is a near-synonym of } B \\ \text{not match}, & \text{otherwise} \end{cases} \tag{10}$$
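A minimal sketch of the statement-level part of Algorithm 2 (steps 5–8) and Equation (10) is given below. The source does not specify how synonyms and near-synonyms are detected, so the two lookup tables `SYNONYMS` and `NEAR_SYNONYMS` are purely hypothetical, as is the judgment that “shipping” and “watercraft” are near-synonyms; a real implementation would consult a thesaurus or lexical database.

```python
# Hypothetical lookup tables (assumptions for this example only).
SYNONYMS = {frozenset({"artificial", "manmade"})}
NEAR_SYNONYMS = {frozenset({"shipping", "watercraft"})}


def match(a, b):
    """Equation (10): compare two words or phrases in the same language."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return "exact match"
    if frozenset({a, b}) in NEAR_SYNONYMS:
        return "close match"
    return "not match"


def best_match(candidates, target):
    """Best relationship any translation candidate reaches with the target."""
    labels = [match(candidate, target) for candidate in candidates]
    for label in ("exact match", "close match"):
        if label in labels:
            return label
    return "not match"


def map_statements(statements_a, candidates, statements_b):
    """Steps 5-8: statements are (Subject, type, Object) triples."""
    mappings = []
    for ss_a in statements_a:
        for ss_b in statements_b:
            if ss_a[1] != ss_b[1]:  # T(ssA) must equal T(ssB)
                continue
            label = best_match(candidates[ss_a], ss_b[2])
            if label != "not match":
                mappings.append((ss_a, ss_b, label))
    return mappings


STATEMENTS_A = [("运河", "Hypernym", "水道"), ("运河", "Nature", "人工"),
                ("运河", "Purpose", "航运")]
CANDIDATES = {STATEMENTS_A[0]: ["Waterway", "Aqueduct"],
              STATEMENTS_A[1]: ["Manual", "Artificial"],
              STATEMENTS_A[2]: ["Shipping"]}
STATEMENTS_B = [("canal", "Hypernym", "waterway"), ("canal", "Nature", "Manmade"),
                ("canal", "Purpose", "watercraft"), ("canal", "Purpose", "drainage")]
MAPPINGS = map_statements(STATEMENTS_A, CANDIDATES, STATEMENTS_B)
```

Checking the statement type before comparing Objects prevents, for example, a Purpose value from matching a Hypernym value that happens to use the same word.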

3.2.3. Similarity Calculation

Given two categories, Ca and Cb, in the formal ontologies OA and OB, respectively, the semantic similarity between Ca and Cb can be calculated from M(OA, OB) using Algorithm 3.
Algorithm 3. Similarity Calculation.
Input: Categories Ca(CNa, SSa) in OA and Cb(CNb, SSb) in OB, and the mapping relationship set M(OA, OB) of category names and semantic statements between OA and OB.
Output: Semantic similarity value between Ca and Cb, Sim(a, b).
Symbols:
Cot(SSa): the number of semantic statements in SSa.
Cot(SSb): the number of semantic statements in SSb.
m(CNa, CNb): the mapping relationship between CNa and CNb.
m(SSa(i), SSb(j)): the mapping relationship between SSa(i) in Ca and SSb(j) in Cb.
Pt(SSab): the sum of the match point values between SSa and SSb.
Pt(CNab): the match point value between CNa and CNb.
1: For each semantic statement SSa(i) in SSa, find the matched semantic statement SSb(j) in SSb based on the mapping relationship set M(OA, OB);
   if m(SSa(i), SSb(j)) = “exact match”, the match point value between SSa(i) and SSb(j) is assigned 1;
   else if m(SSa(i), SSb(j)) = “close match”, the match point value between SSa(i) and SSb(j) is assigned 0.5;
2: Record the sum of the match point values between SSa and SSb as Pt(SSab) and the number of matched statements between SSa and SSb as Cot(SSab);
3: Find the mapping relationship between CNa and CNb based on M(OA, OB);
   if m(CNa, CNb) = “exact match”, the match point value between CNa and CNb is assigned 1;
   else if m(CNa, CNb) = “close match”, the match point value between CNa and CNb is assigned 0.5;
4: Record the match point value between CNa and CNb as Pt(CNab);
5: Calculate the similarity of categories Ca and Cb using the following equation:
$$
\mathrm{Sim}(a,b)=\begin{cases}
\dfrac{1}{2}\cdot\dfrac{Pt(SS_{ab})}{Cot(SS_a)}+\dfrac{1}{2}\cdot\dfrac{Pt(SS_{ab})}{Cot(SS_b)}, & \text{if } m(CN_a,CN_b)=\text{not match}\\[2ex]
\dfrac{1}{3}\cdot\dfrac{Pt(SS_{ab})}{Cot(SS_a)}+\dfrac{1}{3}\cdot\dfrac{Pt(SS_{ab})}{Cot(SS_b)}+\dfrac{Pt(CN_{ab})}{3}, & \text{if } m(CN_a,CN_b)=\text{exact match or close match}
\end{cases}
$$
In addition, the mapping relationship between the category pair Ca and Cb, namely, MR(a, b), can be determined using the following equation:
$$
\mathrm{MR}(a,b)=\begin{cases}
\text{exact match}, & \text{if } \mathrm{Sim}(a,b)=1\\
\text{close match}, & \text{if } 0.5\le \mathrm{Sim}(a,b)<1\\
\text{related}, & \text{if } 0<\mathrm{Sim}(a,b)<0.5\\
\text{not match}, & \text{if } \mathrm{Sim}(a,b)=0
\end{cases}
$$
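The two piecewise definitions above translate directly into code. The following Python sketch is illustrative only: the function names, the `POINTS` table, and the small floating-point tolerance `tol` are assumptions introduced here, and the sketch presumes that each category has at least one semantic statement so that the divisions are defined.

```python
POINTS = {"exact match": 1.0, "close match": 0.5}  # match point values


def similarity(statement_matches, n_a, n_b, name_match="not match"):
    """Sim(a, b) following the piecewise definition above.

    statement_matches: mapping relationships of the matched statement pairs
    n_a, n_b: Cot(SSa) and Cot(SSb), the statement counts of each category
    name_match: m(CNa, CNb), the relationship between the category names
    """
    pt_ss = sum(POINTS.get(m, 0.0) for m in statement_matches)  # Pt(SSab)
    overlap = pt_ss / n_a + pt_ss / n_b
    if name_match == "not match":
        return overlap / 2
    return (overlap + POINTS[name_match]) / 3  # adds Pt(CNab) / 3


def mapping_relationship(sim, tol=1e-9):
    """MR(a, b): threshold the similarity value."""
    if sim >= 1 - tol:
        return "exact match"
    if sim >= 0.5:
        return "close match"
    if sim > tol:
        return "related"
    return "not match"


# Usage: four statements on each side, all exact matches, names not matched.
sim = similarity(["exact match"] * 4, 4, 4)
print(sim, mapping_relationship(sim))  # 1.0 exact match
```

Note that when the category names cannot be matched, the name term is dropped and the weights are renormalized to 1/2 each, so a pair whose statements all match exactly can still reach a similarity of 1.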

4. A Case Study

4.1. Study Material

To illustrate the methodology, two classification standards written in different natural languages were selected for the mapping process. CSC is based on the national topographic map standards of China (the standards “Cartographic symbols for national fundamental scale maps” and “Specifications for feature classification and codes of fundamental geographic information”). CSA is developed by the U.S. Geological Survey (http://cegis.usgs.gov/ttl/USTopographic.ttl). Both standards are available as digital documents, and their category names and free-text definitions serve as the source information for our experiment. For simplicity and clarity, our study is restricted to the “surface water” categories of the two standards. Table 3 briefly lists the characteristics of the two selected datasets, which are explained in detail as follows:
(1) Both standards have their own classification system for the “surface water” categories. The categories in CSC are organized in a four-level hierarchy with six major categories. By contrast, the categories in CSA are organized in a four-level hierarchy with 81 major categories, so the hierarchical structure of CSA does not closely match that of CSC.
(2) The free-text definitions in both standards are used as category definitions.
(3) CSC contains 74 categories and CSA contains 92; thus, CSA covers more category types than CSC does.
(4) CSC is written in Chinese, whereas CSA is written in English, so there is a natural language barrier between the two GI classification standards.

4.2. Results

The well-defined category definitions in both CSC and CSA serve as the basis for our study. The proposed algorithms were implemented in Java (in Eclipse) using the Web Ontology Language (OWL) API, and the experimental results are as follows.

4.2.1. Semantic Statement Mappings

The semantic primitives are extracted using the Stanford Natural Language Processing Tools (http://nlp.stanford.edu/software/). Domain experts then transform them into the formal ontologies OC and OA, each consisting of a set of category names and semantic statements, and encode these ontologies in OWL using Protégé. Using the semantic statement mapping algorithm introduced in Section 3.2.2, the number of mapping relationships between the statements in OC and OA is recorded; the mapping results for the different semantic property and relation types are shown in Table 4.
The total number of semantic statements is 142 in OC and 181 in OA, and the overall mapping rate of the semantic statements between OC and OA is 28.69%. The details of the mapping relationships between semantic statements of each type can be found in the Appendix (Table A1).
Among the semantic property types, “purpose” has the most matched statements. This type is used to represent manmade categories such as “ditch”, “canal”, and “dam”, whose free-text definitions are very similar in the Chinese and American classification standards; purpose and functionality are considered crucial characteristics of these categories. The type “nature” has the highest mapping rate (100%) because it takes only two values, “natural” and “manmade”, in both OC and OA. For the type “location”, there are seven semantic statements in OC and eight in OA, but the mapping rate is extremely low: only one statement is mapped, giving a rate of 7.14%. The reason is that “location” describes the regional environment in which a geographic category is found, and many categories in OA are bay-related or glacier-related, such as “glacier”, “ice cap”, and “iceberg tongue”, whose “location” values are “mountainous area”, “regions of perennial frost”, and “coast”, respectively; OC has no such categories. Among the relation types, “spatial relation” has both the most matched statements and the highest mapping rate, indicating that spatial relations play a major role in the identification and representation of GI semantics.
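The mapping rates quoted above can be reproduced from the counts in Table 4. The formula in the sketch below is an inference from those reported values, not a definition given in this section: it divides the number of mapped statements by the number of statements in both ontologies with each mapped pair counted once. It matches most rows of Table 4 that were checked, including the total, but not the “Time Period” row (reported as 75%). The function name `mapping_rate` is hypothetical.

```python
def mapping_rate(n_oc, n_oa, n_mapped):
    """Inferred formula: mapped statements over all statements in OC and OA,
    counting each mapped pair once."""
    return n_mapped / (n_oc + n_oa - n_mapped)


# (statements in OC, statements in OA, mapping statements), from Table 4
rows = {
    "Location": (7, 8, 1),
    "Purpose": (24, 22, 14),
    "Nature": (2, 2, 2),
    "Total": (142, 181, 72),
}
for name, counts in rows.items():
    print(f"{name}: {mapping_rate(*counts):.2%}")
```

Under this reading, the rate behaves like a Jaccard-style overlap between the two statement sets, which is why a type such as “nature”, with two statements on each side and both mapped, reaches 100%.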

4.2.2. Similarity Calculation and Category Mappings

The similarities between concepts are calculated using the semantic statement mapping relationships and Algorithm 3 proposed in Section 3.2.3. Three typical examples of the mapping results between categories are chosen for further discussion. Table 5 shows the names and free-text definitions of the compared category pairs. In addition, the corresponding semantic statements, calculated similarity values and final mapping relationships between these category pairs are presented in Table 6.
Example 1: Concept pair of “spillway” in OC and “spillway” in OA
These two concepts are comparable because the mapping relationship between their concept names is “exact match”. Because both their concept names and their semantic statements are matched (four mapping relationships in total; see Table 6, rows 1 and 2), the second condition in Equation (9) is used to calculate the final similarity between “spillway” in OC and “spillway” in OA. The similarity value between these two concepts is 0.78; thus, the mapping relationship between them is “close match”. This example demonstrates the simplest case of calculating the semantic similarity between concepts.
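As a check on this value, the arithmetic can be written out from the second branch of the similarity equation in Section 3.2.3. The sketch assumes the counts readable from Table 6 for this pair: three semantic statements on each side, with two close matches and one exact match among them (so Pt(SSab) = 0.5 + 0.5 + 1 = 2), and an exact match on the concept name (Pt(CNab) = 1).

```latex
\mathrm{Sim} = \frac{1}{3}\cdot\frac{2}{3} + \frac{1}{3}\cdot\frac{2}{3} + \frac{1}{3}
             = \frac{7}{9} \approx 0.78
```

Since 0.5 ≤ 0.78 < 1, the thresholds of MR(a, b) give “close match”, in agreement with the reported result.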
Example 2: Concept pair of “arroyo (dry river)” in OC and “wash” in OA
In this example, no mapping relationship between the concept names “arroyo (dry river)” and “wash” can be determined with the mapping algorithm in Section 3.2.1. Nevertheless, the similarity value between these two concepts is higher than the value in Example 1, because all the semantic statements that represent the meaning of the two concepts are matched with each other (see Table 6, rows 3 and 4), and all of these mapping relationships are “exact match”. The first condition in Equation (9) is used to calculate the final similarity between “arroyo (dry river)” in OC and “wash” in OA. The similarity value is 1.0; thus, the mapping relationship between these two concepts is “exact match”. This example demonstrates a common situation in a cross-lingual environment: two concepts have the same semantic meaning although their names are entirely different. It also illustrates the utility of applying our methodology to the complex task of cross-lingual GI ontology integration.
Example 3: Concept pair of “water system” in OC and “surface water” in OA
At first glance, the semantic statements of the concepts “water system” and “surface water” do not match well, and their concept names cannot be matched either. This is because both concepts are the top concepts of their own taxonomies and are abstract: they do not represent real-world objects with detailed characteristics, such as rivers, lakes, and oceans. Thus, the definitions of such a category in different languages may be very different even when they convey the same meaning. Therefore, the semantic meaning of this type of concept cannot be represented in the same way as in Examples 1 and 2. Instead, the semantic meaning of the hyponym-related concepts should be considered to infer the integrated semantic meaning of the abstract concept. After the implicit semantic statements have been inferred (see Table 6, rows 5 and 6), the first condition in Equation (9) is used to calculate the final similarity between “water system” in OC and “surface water” in OA. The similarity value is 0.92; thus, the mapping relationship between these two concepts is “close match”.

5. Conclusions and Future Work

The presented research focuses on determining semantic mapping relationships between categories in different GI ontologies separated by natural language barriers. The proposed formal ontology model represents and identifies the semantic characteristics of GI categories with OWL-based semantic statements transformed from the free-text definitions of two GI classification standards. A new similarity calculation algorithm based on this formal ontology model is presented to measure the semantic similarities and identify the mapping relationships between categories.
In particular, we work with two classification standards of topographic maps, one in Chinese and one in American English. The experiment indicates that the proposed approach successfully determines the mapping relationships between categories in different GI ontologies and facilitates ontology integration in a cross-lingual environment. Because the NLP tools used in our experiment support multiple languages, the model can readily be applied to determine the mapping relationships between other GI ontologies described in natural languages other than Chinese. However, this model has only been applied to geospatial information (GI) integration at the category level; GI integration at the data level has not yet been studied and will form the basis for future work. In addition, publishing the mapping information in a cross-lingual context as linked data in a semantic web environment should also be considered.

Acknowledgments

This research is supported by the National Administration of Surveying, Mapping and Geoinformation, China, under the Special Fund for Surveying, Mapping and Geographical Information Scientific Research in the Public Interest (No. 201412014), and Specialized Research Fund for the Doctoral Program of Higher Education (No. 20120141110048).

Author Contributions

This research was mainly performed and prepared by Xi Kuai and Lin Li. Xi Kuai and Lin Li contributed ideas and conceived and designed the study. Xi Kuai wrote the paper. Heng Luo, Hang Shen and Yu Liu contributed the tools and analyzed the results of the experiment. Zhijun Zhang reviewed and edited the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix: Detail of the Mapping Statements between OC and OA

Table A1. Detail of the Mapping Statements between OC and OA.
| Type | Semantic Statements in OC (Chinese) | Translation of Semantic Statements in OC (English) | Semantic Statements in OA (English) | Mapping Relations |
| --- | --- | --- | --- | --- |
| Property Types | | | | |
| Material | | water | water | Exact match |
| Material | | stone | stones | Exact match |
| Material | 水蒸气 | water vapor | vapors | Exact match |
| Material | 木桩 | wooden stake | wood | Close match |
| Material | | mud | mud | Exact match |
| Material | 草地 | grassland | grassy | Close match |
| Material | | brick | brick | Exact match |
| Material | 砾石 | gravel | gravel | Exact match |
| Material | | sand | sand | Exact match |
| Material | 礁石 | reef | reef | Exact match |
| Material | 水泥 | cement | concrete | Exact match |
| Nature | 自然的 | natural | natural | Exact match |
| Nature | 人造的 | manmade | manmade | Exact match |
| Status | 流动 | flow | flowing | Exact match |
| Status | 倾泻 | pour | moving outward an downslope | Close match |
| Status | 独立 | stand along | free standing | Exact match |
| Status | 高潮时被水体淹没,低潮时露出 | submerged at high tide the water, exposed at low tide | alternately covered and left bare by the tide | Exact match |
| Status | 有水潮浸 | tide water immersion | washed by waves or tides | Close match |
| Status | 涌出 | emission | issue from the ground | Close match |
| Status | 干涸 | dried up | dry | Exact match |
| Status | 洪水泛滥 | flood | subject to flooding | Exact match |
| Temporality | 长期 | long-term | permanent | Close match |
| Temporality | 降水或融雪后短时间内 | within a short time after rainfall or snowmelt | during or after a local rainstorm or heavy snowmelt | Exact match |
| Temporality | 终年 | all year round | permanent | Exact match |
| Temporality | 季节性 | seasonal | occasionally | Close match |
| Location | 沙地 | sandy | desert | Close match |
| Purpose | 引水 | water diversion | run water | Exact match |
| Purpose | 减缓水流流速 | slow water flow rate | restrain current or tide | Close match |
| Purpose | 输水 | water delivery | conveying water | Exact match |
| Purpose | 保护港口 | protection of harbor | protect harbor | Exact match |
| Purpose | 贮水 | water storage | contain water | Exact match |
| Purpose | 护岸 | bank protection | sustain an embankment | Exact match |
| Purpose | 将水位升高或降低,使船能在不同高低水位的水道间通行 | To raise or lower the water level, at different high and low water level so the ship channel traffic | raise and lower vessels as they pass from one level to another. | Exact match |
| Purpose | 抬高水位 | raising of water level | raise the level of water | Exact match |
| Purpose | 控制流量 | control flow | control the flow of water | Exact match |
| Purpose | 通行船只 | passage vessel | route for watercraft | Exact match |
| Purpose | 灌溉 | irrigation | irrigation | Exact match |
| Purpose | 拦截河流 | blocked rivers | Across the course of a stream | Close match |
| Purpose | 调节水流方向 | adjusting to the flow direction | direct current or tide | Exact match |
| Purpose | 扬水 | pump up water | -Pump | Exact match |
| Morphology | 陡坡 | steep slope | a vertical or near vertical descent | Close match |
| Morphology | 坝式 | dam type | dam | Exact match |
| Morphology | 虹吸式 | siphon | siphon | Exact match |
| Cause | 堆积 | accumulation | accumulate | Exact match |
| Relation Types | | | | |
| Hierarchical Relation | 源头 | source | source | Exact match |
| Hierarchical Relation | 设施 | facilities | facility | Exact match |
| Hierarchical Relation | 河床 | riverbed | channel bottom | Exact match |
| Hierarchical Relation | 构筑物 | structure | construction | Exact match |
| Hierarchical Relation | 区域 | regional | region | Exact match |
| Hierarchical Relation | 通道 | channel | path | Exact match |
| Hierarchical Relation | 地带 | zone | zone | Exact match |
| Hierarchical Relation | 水道 | waterways | waterway | Exact match |
| Hierarchical Relation | 设备 | device | device | Exact match |
| Spatial Relation | 地面上 | on the ground | on the surface of the land | Close match |
| Spatial Relation | 水体平均大潮高潮面与水体最低低潮面之间 | mean high water springs of water and water between the lowest low water | Between high water and low water marks | Exact match |
| Spatial Relation | 水体内 | in body of water | in water | Exact match |
| Spatial Relation | 沿河流 | along the river | alongside a stream | Exact match |
| Spatial Relation | 海域内 | within the sea | in the sea | Exact match |
| Spatial Relation | 水陆间 | between land and water | contact between a body of water and the land | Exact match |
| Spatial Relation | 水下 | underwater | below the surface of water | Exact match |
| Spatial Relation | 洼地内 | in the depressions | surrounded by land | Close match |
| Spatial Relation | 跨流域 | across river basins | across the course of a stream | Exact match |
| Spatial Relation | 陆地上 | on the land | Covered with the earth | Close match |
| Spatial Relation | 跨道路 | cross roads | crossing road or trail | Exact match |
| Spatial Relation | 海岸线与干出线之间 | between the coastline and the dry line | Between high water and low water lines | Exact match |
| Spatial Relation | 海岸边 | coastal | adjacent to the shore | Exact match |
| Spatial Relation | 海岸边 | the coast | offshore | Close match |
| Is-part-of | 网状水系 | network drainage | network of interlacing channels | Exact match |
| Is-part-of | 水库 | reservoir | dam | Close match |
| Is-part-of | 网状水系 | network drainage | a drainage network | Exact match |
| Is-part-of | 河渠 | canal | a river system | Close match |
| Is-part-of | 闸室 | chamber | lock chamber | Exact match |

References

  1. Gore, A. The digital earth: Understanding our planet in the 21st century. Photogramm. Eng. Remote Sens. 1999, 65. [Google Scholar] [CrossRef]
  2. Craglia, M.; Goodchild, M.F.; Annoni, A.; Camara, G.; Gould, M.; Kuhn, W.; Mark, D.; Masser, I.; Maguire, D.; Liang, S.; et al. Next-generation digital earth: A position paper from the vespucci initiative for the advancement of geographic information science. Int. J. Spat. Data Infrastruct. Res. 2008, 3, 146–167. [Google Scholar]
  3. Craglia, M.; de Bie, K.; Jackson, D.; Pesaresi, M.; Remetey-Fülöpp, G.; Wang, C.; Annoni, A.; Bian, L.; Campbell, F.; Ehlers, M.; et al. Digital Earth 2020: Towards the vision for the next decade. Int. J. Digit. Earth 2012, 5, 4–21. [Google Scholar] [CrossRef]
  4. Yue, P.; Di, L.; Yang, W.; Yu, G.; Zhao, P. Semantics-based automatic composition of geospatial Web service chains. Comput. Geosci. 2007, 33, 649–665. [Google Scholar] [CrossRef]
  5. Janowicz, K.; Hitzler, P. The digital earth as knowledge engine. Semant. Web 2012, 3, 213–221. [Google Scholar]
  6. Janowicz, K. Observation-driven geo-ontology engineering. Trans. GIS 2012, 16, 351–374. [Google Scholar] [CrossRef]
  7. Buccella, A.; Cechich, A.; Gendarmi, D.; Lanubile, F.; Semeraro, G.; Colagrossi, A. Building a global normalized ontology for integrating geographic data sources. Comput. Geosci. 2011, 37, 893–916. [Google Scholar] [CrossRef]
  8. Bishr, Y. Overcoming the semantic and other barriers to GIS interoperability. Int. J. Geogr. Inf. Sci. 1998, 12, 299–314. [Google Scholar] [CrossRef]
  9. Lemmens, R.L. Semantic Interoperability of Distributed Geoservices. Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, 2006. [Google Scholar]
  10. Fallahi, G.R.; Frank, A.U.; Mesgari, M.S.; Rajabifard, A. An ontological structure for semantic interoperability of GIS and environmental modeling. Int. J. Appl. Earth Obs. Geoinf. 2008, 10, 342–357. [Google Scholar] [CrossRef]
  11. Ma, X.; Wu, C.; Carranza, E.J.M.; Schetselaar, E.M.; van der Meer, F.D.; Liu, G.; Wange, X.; Zhang, X. Development of a controlled vocabulary for semantic interoperability of mineral exploration geodata for mining projects. Comput. Geosci. 2010, 36, 1512–1522. [Google Scholar]
  12. Kuhn, W. Geospatial semantics: Why, of what, and how? J. Data Semant. III 2005, 3534, 1–24. [Google Scholar]
  13. Hong, J.-H.; Kuo, C.-L. A semi-automatic lightweight ontology bridging for the semantic integration of cross-domain geospatial information. Int. J. Geogr. Inf. Sci. 2015, 29, 1–25. [Google Scholar] [CrossRef]
  14. Fonseca, F.T.; Egenhofer, M.J.; Davis, C.A., Jr.; Borges, K.A.V. Ontologies and knowledge sharing in urban GIS. Comput. Environ. Urban Syst. 2000, 24, 251–272. [Google Scholar] [CrossRef]
  15. Pundt, H.; Bishr, Y. Domain ontologies for data sharing–An example from environmental monitoring using field GIS. Comput. Geosci. 2002, 28, 95–102. [Google Scholar] [CrossRef]
  16. Yang, C.; Raskin, R.; Goodchild, M.; Gahegan, M. Geospatial Cyberinfrastructure: Past, present and future. Comput. Environ. Urban Syst. 2010, 34, 264–277. [Google Scholar] [CrossRef]
  17. Stoimenov, L.; Stanimirovic, A.; Djordjevic-Kajan, S. Discovering mappings between ontologies in semantic integration process. In Proceedings of the 9th AGILE Conference on Geographic Information Science, Visegrád, Hungary, 20–22 April 2006.
  18. Janowicz, K.; Raubal, M.; Kuhn, W. The semantics of similarity in geographic information retrieval. J. Spat. Inf. Sci. 2011, 2, 29–57. [Google Scholar] [CrossRef]
  19. Schwering, A.; Raubal, M. Spatial relations for semantic similarity measurement. In Perspectives in Conceptual Modeling; Springer-Verlag: Heidelberg, Germany, 2005; pp. 259–269. [Google Scholar]
  20. Hakimpour, F. Using Ontologies to Resolve Semantic Heterogeneity for Integrating Spatial Database Schemata; Zurich University: Zurich, Switzerland, 2003. [Google Scholar]
  21. Mark, D.M.; Skupin, A.; Smith, B. Features, objects, and other things: Ontological distinctions in the geographic domain. In Spatial Information Theory; Springer: New York, NY, USA, 2001; pp. 489–502. [Google Scholar]
  22. Stadler, C.; Jens, L.; Konrad, H.; Sören, A. Linkedgeodata: A core for a web of spatial open data. Semantic Web 2012, 3, 333–354. [Google Scholar]
  23. Trojahn, C.; Fu, B.; Zamazal, O.; Ritze, D. State-of-the-Art in Multilingual and Cross-Lingual Ontology Matching; Springer: Heidelberg, Germany, 2014. [Google Scholar]
  24. Liu, K.; Hogan, W.R.; Crowley, R.S. Natural Language Processing methods and systems for biomedical ontology learning. J. Biomed. Inform. 2011, 44, 163–179. [Google Scholar] [CrossRef] [PubMed]
  25. Buitelaar, P.; Cimiano, P.; Magnini, B. Ontology learning from text: Methods, evaluation and applications. Comput. Linguist. 2006, 32, 569–572. [Google Scholar]
  26. Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python; O’Reilly Vlg. GmbH & Co.: Sebastopol, CA, USA, 2009. [Google Scholar]
  27. Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition; Prentice Hall: Upper Saddle River, NJ, USA, 2000. [Google Scholar]
  28. Kavouras, M.; Kokla, M.; Tomai, E. Comparing categories among geographic ontologies. Comput. Geosci. 2005, 31, 145–154. [Google Scholar] [CrossRef]
  29. Kavouras, M.; Kokla, M. Theories of Geographic Concepts: Ontological Approaches to Semantic Integration; CRC Press: Boca Raton, FL, USA, 2008. [Google Scholar]
  30. Bittner, T.; Donnelly, M.; Smith, B. A spatio-temporal ontology for geographic information integration. Int. J. Geogr. Inf. Sci. 2009, 23, 765–798. [Google Scholar] [CrossRef]
  31. Zheng, J.G.; Fu, L.Y.; Ma, X.G.; Fox, P. SEM+: Tool for discovering concept mapping in Earth science related domain. Earth Sci. Inform. 2015, 8, 1–8. [Google Scholar] [CrossRef]
  32. Schwering, A. Approaches to semantic similarity measurement for geo-spatial data: A survey. Trans. GIS 2008, 12, 5–29. [Google Scholar] [CrossRef]
  33. Buccella, A.; Cechich, A.; Fillottrani, P. Ontology-driven geographic information integration: A survey of current approaches. Comput. Geosci. 2009, 35, 710–723. [Google Scholar] [CrossRef]
  34. Ballatore, A.; Bertolotto, M.; Wilson, D.C. Geographic knowledge extraction and semantic similarity in OpenStreetMap. Knowl. Inf. Syst. 2013, 37, 61–81. [Google Scholar] [CrossRef]
  35. Ballatore, A.; Wilson, D.C.; Bertolotto, M. Computing the semantic similarity of geographic terms using volunteered lexical definitions. Int. J. Geogr. Inf. Sci. 2013, 27, 2099–2118. [Google Scholar] [CrossRef]
  36. Li, W.; Raskin, R.; Goodchild, M.F. Semantic similarity measurement based on knowledge mining: An artificial neural net approach. Int. J. Geogr. Inf. Sci. 2012, 26, 1415–1435. [Google Scholar] [CrossRef]
  37. Xu, Y.; Xie, Z.; Chen, Z. Research on semantics of entity space similarity measure based on artificial neural networks. In Proceedings of the 23rd International Conference on Geoinformatics, Wuhan, China, 19–21 June 2015.
  38. Laurini, R. Geographic ontologies, gazetteers and multilingualism. Future Internet 2015, 7, 1–23. [Google Scholar] [CrossRef]
  39. Fu, B.; Brennan, R.; O’Sullivan, D. A configurable translation-based cross-lingual ontology mapping system to adjust mapping outcomes. Web Semant. Sci. Serv. Agents World Wide Web 2012, 15, 15–36. [Google Scholar] [CrossRef]
  40. Sini, A.; Sini, M. Mapping AGROVOC and the Chinese Agricultural Thesaurus: Definitions, tools, procedures. New Rev. Hypermedia Multimed. 2006, 12, 51–62. [Google Scholar]
  41. Wang, S.; Isaac, A.; Schopman, B.; Schlobach, S.; van der Meij, L. Matching multi-lingual subject vocabularies. In Research & Advanced Technology for Digital Libraries; Springer: Berlin, Germany, 2009; pp. 125–137. [Google Scholar]
  42. Meilicke, C.; García-Castrod, R.; Freitas, F.; van Hage, W.R.; Montiel-Ponsoda, E.; de Azevedo, R.R.; Stuckenschmidt, H.; Šváb-Zamazal, O.; Svátek, V.; Tamilin, A.; et al. MultiFarm: A benchmark for multilingual ontology matching. J. Web Semant. 2012, 15, 62–68. [Google Scholar] [CrossRef]
  43. Ngai, G.; Carpuat, M.; Fung, P. Identifying concepts across languages: A First step towards a corpus-based approach to automatic ontology alignment. In Proceedings of the 19th international conference on Computational linguistics, Stroudsburg, PA, USA, August 2002.
  44. Pazienza, M.T.; Stellato, A. Linguistically motivated ontology mapping for the semantic web. In Proceedings of the 2nd Italian Semantic Web Workshop, Trento, Italy, 14–16 December 2005; pp. 14–16.
  45. Jung, J.J.; Håkansson, A.; Hartung, R. Indirect alignment between multilingual ontologies. In Agent and Multi-Agent Systems: Technologies and Applications; Springer: Berlin, Germany, 2009; Volume 5559, pp. 233–241. [Google Scholar]
  46. Trojahn, C.; Quaresma, P.; Vieira, R. A Framework for multilingual ontology mapping. In Proceedings of the International Conference on Language Resources and Evaluation, Marrakech, Morocco, 28–30 May 2008; pp. 1034–1037.
  47. Wang, S.; Englebienne, G.; Schlobach, S. Learning concept mappings from instance similarity. In The Semantic Web—ISWC 2008; Springer: Berlin, Germany, 2008; pp. 339–355. [Google Scholar]
  48. Zheng, Q.; Shao, C.; Li, J.; Wang, Z.; Hu, L. RiMOM2013 results for OAEI 2013. In Proceedings of the 8th International Conference on Workshop on Ontology Matching, Sydney, Australia, 21 October 2013.
  49. Zhang, X.; Zhong, Q.; Shi, F.; Li, J.; Tang, J. RiMOM results for OAEI 2009. In Proceedings of the 4th International Conference on Workshop on Ontology Matching, Washington, DC, USA, 25 October 2009.
  50. Wang, Z.; Zhang, X.; Hou, L.; Zhao, Y.; Li, J.; Qi, Y.; Tang, J. RiMOM results for OAEI 2010. In Proceedings of the 5th International Conference on Ontology Matching, Shanghai, China, 7 November 2010.
  51. Euzenat, J.; Shvaiko, P. Ontology Matching; Springer: Berlin, Germany, 2007. [Google Scholar]
  52. Ehrig, M.; Sure, Y. Ontology mapping–An integrated approach. In The Semantic Web: Research and Applications; Springer: Berlin, Heidelberg, Germany, 2004; pp. 76–91. [Google Scholar]
  53. Kalfoglou, Y.; Schorlemmer, M. Ontology mapping: The state of the art. Knowl. Eng. Rev. 2003, 18, 1–31. [Google Scholar] [CrossRef]
  54. Doan, A.H.; Madhavan, J.; Domingos, P.; Halevy, A. Ontology matching: A machine learning approach. In International Handbooks on Information Systems; Springer: Berlin, Germany, 2004; pp. 397–416. [Google Scholar]
  55. Kantor, P. Foundations of statistical natural language processing. Nat. Lang. Eng. 1999, 26, 91–92. [Google Scholar]
  56. MacCartney, B. The Stanford Natural Language Processing Group. Available online: http://nlp.stanford.edu/ (accessed on 18 December 2015).
  57. Guarino, N. Formal ontology, conceptual analysis and knowledge representation. Int. J. Hum. Comput. Stud. 1995, 43, 625–640. [Google Scholar] [CrossRef]
  58. Herre, H. General Formal Ontology (GFO): A foundational ontology for conceptual modelling. In Theory & Applications of Ontology Computer Applications; Springer Netherlands: Dordrecht, The Netherlands, 2010; pp. 297–345. [Google Scholar]
  59. Frank, A.U. Ontology for spatio-temporal databases. In Spatio-Temporal Databases; Springer: Berlin, Germany, 2003; pp. 9–77. [Google Scholar]
  60. Casati, R.; Smith, B.; Varzi, A.C. Ontological tools for geographic representation. In Formal Ontology in Information Systems; IOS Press: Amsterdam, The Netherlands, 1998; pp. 77–85. [Google Scholar]
  61. Grenon, P.; Smith, B. SNAP and SPAN: Towards dynamic spatial ontology. Spat. Cognit. Comput. 2004, 4, 69–104. [Google Scholar] [CrossRef]
Figure 1. Main procedure for our methodologies.
Figure 2. Extract the semantic primitives from the free-text definitions by applying NLP tools in Chinese and English.
Figure 3. Semantic property and relation groups in the Geospatial Formal Ontology Model.
Figure 4. Representation of the category “canal” in OWL format: (a) The OntoGraf view in Protégé and (b) the semantic statement presentation in turtle file format.
Table 1. Summary of the Penn Treebank Part-of-Speech Tag sets in English.
| Part of Speech | Abbr | Part of Speech | Abbr | Part of Speech | Abbr |
| --- | --- | --- | --- | --- | --- |
| Adjective | JJ | Exclamation | UH | Possessive wh-pronoun | WP$ |
| Adjective comparative | JJR | Existential | EX | Predeterminer | PDT |
| Adjective superlative | JJS | Foreign word | FW | Proper noun plural | NNPS |
| Adverb | RB | Gerund | VBG | Proper noun | NNP |
| Adverb comparative | RBR | List item marker | LS | Symbol | SYM |
| Adverb superlative | RBS | Modal verb | MD | to | TO |
| Article | DT | Participle past | VBN | Verb base form | VB |
| Cardinal number | CD | Particle | RP | Verb present tense | VBP |
| Common noun plural | NNS | Past tense verb | VBD | Verb 3rd person singular | VBZ |
| Common noun singular or mass | NN | Personal pronoun | PRP | Wh-determiner | WDT |
| Conjunction coordinating | CC | Possessive ending | POS | Wh-pronoun | WP |
| Conjunction subordinating | IN | Possessive pronoun | PRP$ | Wh-adverb | WRB |
Table 2. Summary of the Penn Treebank Part-of-Speech Tag sets in Chinese.
| Part of Speech | Abbr | Part of Speech | Abbr | Part of Speech | Abbr |
| --- | --- | --- | --- | --- | --- |
| adverb | AD | determiner | DT | proper noun | NR |
| aspect marker | AS | for words “dengdeng” (“等等”) | ETC | temporal noun | NT |
| in ba-construction | BA | foreign words | FW | ordinal number | OD |
| coordinating conjunction | CC | interjection | IJ | onomatopoeia | ON |
| cardinal number | CD | other noun-modifier | JJ | preposition excl. “bei” (“被”) and “ba” (“把”) | P |
| subordinating conjunction | CS | “bei” (“被”) in long bei-const | LB | pronoun | PN |
| “de” (“的”) in a relative-clause | DEC | localizer | LC | punctuation | PU |
| Associative “de” | DEG | measure word | M | “bei” (“被”) in short bei-const | SB |
| “de” (“得”) in V-de const. and V-de-R | DER | other particle | MSP | sentence-final particle | SP |
| “di” (“地”) before VP | DEV | common noun | NN | predicative adjective | VA |
| “shi” (“是”) | VC | “you” (“有”) as the main verb | VE | other verb | VV |
Table 3. Characteristics of CSC and CSA.
| Characteristic | CSC | CSA |
| --- | --- | --- |
| Number of categories | 74 | 92 |
| Classification system | Taxonomy (without overlap) | Taxonomy (without overlap) |
| Levels of hierarchy | 4 | 4 |
| Number of major categories | 6 | 81 |
| Definition | Free-text, unstructured | Free-text, unstructured |
| Attribute | Id, Category name | Category name, Source of the definition |
| Language | Chinese | English |
Table 4. Condition of the mapping statements between OC and OA.
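| Group | Type | Number of Semantic Statements in OC | Number of Semantic Statements in OA | Number of Mapping Statements | Mapping Rate |
| --- | --- | --- | --- | --- | --- |
| Property Types | | 80 | 111 | 44 | 29.93% |
| Spatial Properties | Location | 7 | 8 | 1 | 7.14% |
| Spatial Properties | Morphology | 4 | 23 | 3 | 12.50% |
| Spatial Properties | Measurement | 2 | 1 | 0 | 0.00% |
| Temporal Properties | Time Period | 2 | 4 | 3 | 75% |
| Temporal Properties | Time Point | 3 | 1 | 1 | 33% |
| Other Semantic Properties | Material Composition | 15 | 22 | 11 | 42.31% |
| Other Semantic Properties | Nature | 2 | 2 | 2 | 100.00% |
| Other Semantic Properties | Status | 19 | 22 | 8 | 24.24% |
| Other Semantic Properties | Cause | 2 | 6 | 1 | 14.29% |
| Other Semantic Properties | Purpose | 24 | 22 | 14 | 43.75% |
| Relation Types | | 62 | 70 | 28 | 26.92% |
| Hierarchical Relations | | 25 | 24 | 9 | 22.50% |
| Spatial Relations | Topology Relations | 27 | 29 | 14 | 33.33% |
| Spatial Relations | Part-Whole Relations | 9 | 17 | 5 | 23.81% |
| Temporal Relations | | 0 | 0 | 0 | 0.00% |
| Other Related Relations | | 1 | 0 | 0 | 0.00% |
| Total | | 142 | 181 | 72 | 28.69% |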
Table 5. Names and free-text definitions of the compared concept pairs.
| Concept Pairs | Concepts | Names | Free-Text Definitions |
| --- | --- | --- | --- |
| Pair 1 | Concept 1 in OC | 溢洪道 | 水库的泄洪水道,用以排泄水库预定蓄水高度以上的洪水。 |
| Pair 1 | Translation of Concept 1 in OC | Spillway | Reservoir spillway channel to drain reservoir reservation head above the flood. |
| Pair 1 | Concept 1 in OA | Spillway | A passage for surplus water to run over or around a dam. |
| Pair 2 | Concept 2 in OC | 干河床(干涸河) | 降水或融雪后短暂时间内有水的河床或河流改道后遗留的河道。 |
| Pair 2 | Translation of Concept 2 in OC | Arroyo (dry river) | Precipitation or snowmelt water within a short time after the river or river diversions left after the river. |
| Pair 2 | Concept 2 in OA | Wash | The usually dry portion of a stream bed that contains water only during or after a local rainstorm or heavy snowmelt. |
| Pair 3 | Concept 3 in OC | 水系 | 江、河、湖、海、井、泉、水库、池塘、沟渠等自然和人工水体及连通体系的总称。 |
| Pair 3 | Translation of Concept 3 in OC | Water System | River, river, lake, sea, wells, springs and reservoirs, ponds, ditches, and other natural and artificial water bodies and the connected system in general. |
| Pair 3 | Concept 3 in OA | Surface Water | The water portion of the Earth’s surface, including the surface of sea and inland waters |
Table 6. Example of categories definitions and similarity calculation.
Concepts | Semantic Statements | Translation of Semantic Statement in OC | Mapping Relationships between Statements | Similarity Values | Mapping Results
Concept 1 in OC | (Hypernym: 水道) ⊓ (Is-Part-Of: 水库) ⊓ (Purpose: 排泄洪水) | (Hypernym: Waterways) ⊓ (Is-Part-Of: Reservoir) ⊓ (Purpose: Drain flood) | “Spillway” Exact match “Spillway” (Concept Name); “Hypernym:Waterways” Close match “Hypernym:Passage”; “Is-Part-Of:Reservoir” Close match “Is-Part-Of:Dam”; “Purpose:Drain flood” Exact match “Purpose:Surplus Water” | 0.78 | Close Match
Concept 1 in OA | (Hypernym: Passage) ⊓ (Is-Part-Of: Dam) ⊓ (Purpose: Surplus Water)
Concept 2 in OC | (Hypernym: 河床) ⊓ (Material: 水) ⊓ (Status: 干涸) ⊓ (Temporality: 降雪或融雪后) | (Hypernym: riverbed) ⊓ (Material: water) ⊓ (Status: dry) ⊓ (Temporality: After the rainfall or snowmelt) | “Hypernym:riverbed” Exact match “Hypernym:Streambed”; “Material:water” Exact match “Material:Water”; “Status:dry” Exact match “Status:Dry”; “Temporality:After the rainfall or snowmelt” Exact match “Temporality:during or after a local rainstorm or heavy snowmelt” | 1.0 | Exact Match
Concept 2 in OA | (Hypernym: Streambed) ⊓ (Material: Water) ⊓ (Status: Dry) ⊓ (Temporality: during or after a local rainstorm or heavy snowmelt)
Concept 3 in OC | (Hyponym: 江) ⊓ (Hyponym: 河) ⊓ (Hyponym: 湖) ⊓ (Hyponym: 海) ⊓ (Hyponym: 井) ⊓ (Hyponym: 泉) ⊓ (Hyponym: 水库) ⊓ (Hyponym: 池塘) ⊓ (Hyponym: 沟渠) ⊓ (Hyponym: 水体) ⊓ (Nature: 自然 ⊔ Nature: 人工) ⊓ (Material: 水) | (Hyponym: river) ⊓ (Hyponym: river) ⊓ (Hyponym: lake) ⊓ (Hyponym: sea) ⊓ (Hyponym: well) ⊓ (Hyponym: spring) ⊓ (Hyponym: reservoir) ⊓ (Hyponym: pond) ⊓ (Hyponym: ditch) ⊓ (Hyponym: body of water) ⊓ (Nature: natural ⊔ Nature: artificial) ⊓ (Material: water) | “Hyponym:river” Exact match “Hyponym:River”; “Hyponym:river” Exact match “Hyponym:Stream”; “Hyponym:lake” Exact match “Hyponym:Lake”; “Hyponym:sea” Exact match “Hyponym:Sea”; “Hyponym:spring” Exact match “Hyponym:Spring”; “Hyponym:reservoir” Exact match “Hyponym:Reservoir”; “Hyponym:pond” Exact match “Pond”; “Hyponym:ditch” Exact match “Ditch”; “Hyponym:body of water” Exact match “Hyponym:Water body”; “Nature:natural” Exact match “Nature:Natural”; “Nature:artificial” Exact match “Nature:Manmade”; “Material:water” Exact match “Material:Water” | 0.92 | Close Match
Concept 3 in OA | (Material: Water) ⊓ (Hyponym: Sea) ⊓ (Hyponym: Inland Water) ⊓ (Is-Part-Of: Earth’s surface) ⊓ [(Hyponym: River) ⊓ (Hyponym: Stream) ⊓ (Hyponym: Lake) ⊓ (Hyponym: Spring) ⊓ (Hyponym: Reservoir) ⊓ (Hyponym: Pond) ⊓ (Hyponym: Ditch) ⊓ (Hyponym: Water body) ⊓ (Nature: Natural) ⊓ (Nature: Manmade)]
Note: The semantic statements in “[…]” were not extracted from the free-text definition. They were inferred from the semantic statements of other concepts that have a hierarchical relation with this concept, and were added to the concept manually by the domain expert.
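The semantic statements in Table 6 are conjunctions of (Property: Value) pairs, which suggests a simple data representation. The sketch below is illustrative only: it parses a plain conjunction into a list of pairs and lists the statement pairs that share a property name across two concepts, which are the candidates one would then compare. It is not the comparison or similarity algorithm of the paper, it computes no similarity values, and the names `parse_statements` and `candidate_pairs` are hypothetical. Disjunctions such as (Nature: natural ⊔ Nature: artificial) are not split by this sketch.

```python
import re

# One "(Property: Value)" statement; whitespace around the colon is optional.
STATEMENT = re.compile(r"\(\s*([^:()]+?)\s*:\s*([^()]+?)\s*\)")


def parse_statements(expr):
    """Split a conjunction like '(A: x) ⊓ (B: y)' into [('A', 'x'), ('B', 'y')]."""
    return [(prop, value) for prop, value in STATEMENT.findall(expr)]


def candidate_pairs(left, right):
    """Pair up statements from two concepts that use the same property name.

    Returns (property, left value, right value) triples. Deciding whether a
    pair is an exact or close match is left to a separate comparison step.
    """
    pairs = []
    for prop, left_value in left:
        for other_prop, right_value in right:
            if prop.lower() == other_prop.lower():
                pairs.append((prop, left_value, right_value))
    return pairs


if __name__ == "__main__":
    # Concept 1 from Table 6 (translated OC statements and OA statements).
    oc = parse_statements(
        "(Hypernym: Waterways) ⊓ (Is-Part-Of:Reservoir) ⊓ (Purpose:Drain flood)"
    )
    oa = parse_statements(
        "(Hypernym: Passage) ⊓ (Is-Part-Of: Dam) ⊓ (Purpose: Surplus Water)"
    )
    for triple in candidate_pairs(oc, oa):
        print(triple)
```

For Concept 1 this yields three candidate pairs (Hypernym, Is-Part-Of, Purpose), the same three statement-level comparisons listed in the "Mapping Relationships" column of Table 6.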


MDPI and ACS Style

Kuai, X.; Li, L.; Luo, H.; Hang, S.; Zhang, Z.; Liu, Y. Geospatial Information Categories Mapping in a Cross-lingual Environment: A Case Study of “Surface Water” Categories in Chinese and American Topographic Maps. ISPRS Int. J. Geo-Inf. 2016, 5, 90. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi5060090


