Article

Assessing Place Type Similarities Based on Functional Signatures Extracted from Social Media Data

1
Department of Geography, University of Georgia, Athens, GA 30602, USA
2
Institute for Health Metrics and Evaluation, University of Washington, Seattle, WA 98195, USA
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2021, 10(9), 626; https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi10090626
Submission received: 10 July 2021 / Revised: 30 August 2021 / Accepted: 13 September 2021 / Published: 17 September 2021
(This article belongs to the Special Issue Applications and Implications in Geosocial Media Monitoring)

Abstract

Place types are often used to query places or retrieve data in gazetteers. Existing gazetteers do not use the same place type classification schemes, and the various typing schemes can cause difficulties in data alignment and matching. Different place types may share some level of similarities. However, previous studies have paid little attention to the place type similarities. This study proposes an analytical approach to measuring similarities between place types in multiple typing schemes based on functional signatures extracted from web-harvested place descriptions. In this study, a functional signature consists of three component signature factors: place affordance, events, and key-descriptors. The proposed approach has been tested in a case study using Twitter data. The case study finds high similarity scores between some pairs of types and summarizes the situations when high similarities could occur. The research makes two innovative contributions: First, it proposes a new analytical approach to measuring place type similarities. Second, it demonstrates the potential and benefits of using location-based social media data to better understand places.

1. Introduction

In gazetteers, place type is a crucial element used to search for places (e.g., schools in Clarke County, GA). The place type scheme used in a gazetteer refers to the classification system of places. However, different schemes exist across gazetteers, which causes challenges in aligning gazetteers at both the structural and semantic levels. Place classifications are used to describe places in different contexts such as nature or urban spaces (e.g., mountains, streams, and population-centric places such as cities). Place classifications are notably different between studies of nature and urban environments. However, the definitions of place categories are often ambiguous within urban environments, because uses vary for places according to the needs of people. Therefore, this study proposes an approach to measuring the similarity between place classifications based on functional signatures extracted from web-harvested place descriptions. In this study, functional signatures are related to people’s activities in terms of three factors: place affordances, events, and key-descriptors. The proposed approach was tested through a case study by matching place types used by local gazetteers using Twitter data.
Place categories in gazetteers are typically not standardized. They are inherently disarranged and continually evolving according to user needs [1,2]. The Geographic Names Information System (GNIS) and GeoNames are excellent examples of how one typing scheme system is adopted from another. Both the GNIS and GeoNames adopted the place classifications developed by the United States National Geospatial-Intelligence Agency (NGA), which contains a variety of place types for place names across the world. However, GNIS was developed only for US domestic place names, resulting in fewer place types at a simple level than the NGA. GeoNames includes place names from various countries using a variety of data sources. Accordingly, GeoNames adopted a basic place classification model from the NGA and introduced some additional classifications.
Another example of a place type scheme includes a feature type thesaurus (FTT), which was developed for the Alexandria Digital Library (ADL) Gazetteer [3]. The FTT contains place types at multiple levels of a hierarchy, which is markedly different from the previous examples in terms of data and structure. Non-traditional data sources such as Google Places provide place categories that focus more on urbanized areas such as supermarkets, restaurants, car dealerships, and real estate agencies. This is because the purpose of Google Places is to provide users with information about local businesses, and it uses collaborative knowledge gathered from users about places. Considering the different purposes of data sources, it might be difficult to establish a single scheme of place categories for all gazetteers. However, the similarity between place types in the same or different schemes can be measured to enrich current gazetteers by comparing thematic attributes extracted from web-harvested place descriptions.
The matching between place types should be built on thematic similarity, considering its original perspective. However, not every concept used in each system is clearly comparable among different gazetteers based on uniform categories. For example, point of interest (POI), a popular place type used in gazetteers, is ambiguous in terms of its definition and functionality. Any place entity can be classified as both a POI and another place type. Likewise, places can be referred to by multiple place types. For example, the Georgia Museum of Art is categorized as “Building” in the GNIS but grouped under “Museum” in Google Places. These examples illustrate that each place category does not reflect all the properties of a place. More importantly, some place types share similar properties.
In general, place types could relate to each other in various ways based on perspective. For example, some place types are related to each other based on their spatial relationship, which is rarely described in existing gazetteers (e.g., spatially, a university contains several buildings). In addition, taxonomies with an “is–a” relation are widely used in different typing schemes. Thus, if a place is categorized at a lower level, that place is also considered an instance of the upper level of the classifications. For example, “Building” is a parent concept of “Courthouse” in the FTT; thus, place entities classified as Courthouse are also classified as Building in the gazetteer system. On the other hand, Building and Courthouse are defined at the same level of the typing scheme used by GeoNames, which means they have separate data sets for each place type. Consequently, different schemes of taxonomies used for place classification may cause less consistent search results between gazetteers. The approach suggested in this article can be used to assess similarities between place types regardless of the hierarchies used by different data sources, such as traditional and non-traditional place databases.
This research contributes to the discourse on digital gazetteers and geographic information retrieval (GIR) by comparing place types using thematic similarities. Descriptions of places online were used to extract people’s activities, which can be specified by three factors to describe each place type: (1) place affordances, (2) events, and (3) key-descriptors. This approach includes natural language processing (NLP) [4], latent Dirichlet allocation (LDA) topic modeling [5], and cosine similarity measurement. The three factors were then combined to determine the overall similarity scores for the comparison between place types.
This study utilized Twitter data from a pilot study area, namely Athens, Georgia. However, the proposed approach can be applied to any user-generated content, such as blogs and articles that describe places. The findings with high similarity scores can be summarized into three cases:
  • Place types representing population-centric places such as administrative regions and classified as Populated Place;
  • Place types describing the relation of spatial containment such as POI and Populated Place;
  • Place types used for similar activities (e.g., weekend activities and relaxing), such as Park and Stream.
Using only the overall similarity score is insufficient to align different place classifications because of the biased data source (e.g., user groups and topic frequency). However, the observations from the case study show the possible relations between place classifications by using people’s knowledge of places that can be added to the existing gazetteer alignment strategy.
The remainder of this article is organized as follows: Section 2 provides an overview of the related works. Section 3 introduces the proposed functional signature conceptual framework and the computational flowchart to assess place type similarities, followed by a case study using the proposed approach in Section 4. The article concludes with a summary and a discussion of future research directions in Section 5.

2. Related Work

Digital gazetteers are dictionaries of place names that are widely used in GIR [6]. In gazetteers, places are identified based on several attributes, such as given names, classifications, and spatial footprints [7]. Among place references used in gazetteers, a list of place classifications such as parks, streams, neighborhoods, and libraries are often labeled by creators. Such place classifications could be relatively subjective in describing places, incorporating human cognition and language [8]. Moreover, these schemes are constantly evolving based on needs [1,2].
For example, illustrating a hierarchical scheme of place classifications, Hill [3] proposed an FTT developed in the ADL project. Subsequently, some studies have applied the FTT as a place type classification scheme for developing their place type categories. To improve gazetteer interoperability and reasoning capability based on feature typing, Janowicz and Keßler [2] proposed a feature-type ontology by utilizing the FTT scheme. The authors demonstrated that a feature-type ontology supports extended query functionality, which addresses the relationships between feature types.
There are other classification systems with specific objectives, such as the North American Industry Classification System (NAICS). The NAICS was developed to identify business establishments by federal agencies with the aims of archiving and analyzing data. Thus, it could be essential to analyze specific types of places in urban areas [9]. In addition, place typing schemes used in location-based social media applications, such as Foursquare and Yelp, are also widely used in POI studies [10,11,12]. However, there is insufficient information about the functional relatedness of the different place types used within the same scheme or across different schemes.
Martins, Manguinhas, and Borbinha [13] proposed a geo-temporal gazetteer web service established in the DIGMAP project for integrating data from multiple sources, such as gazetteers of official toponymic authorities or general online sources with gazetteer data. In terms of the scheme of place classifications, the authors utilized the FTT with the classification schemes from the ECAI Time Period Directory and GeoNames to facilitate data integration from external sources.
Using the instance matching approach, Brauner, Casanova, and Milidiú [14] proposed an instance-based mapping rate between distinct feature type thesauri by pre-processing common instances from two gazetteers, namely the GEOnet Names Server and the ADL Gazetteer. Several examples show that mapping different typing schemes is essential to enhancing the capability of answering queries about place information using multiple gazetteers.
Place is an abstract concept and is difficult to describe or define comprehensively and objectively [15,16,17]. There are many attempts to understand places by using place names and semantics, which are limited to an unsophisticated view of place. Papadakis et al. [18] proposed a composite approach of formalizing places to facilitate a function-based query. Their approach enables a better representation of the context that people assign to a place based on the place functionality. Our approach examines human interactions associated with different place types by focusing on people’s activities. Therefore, we reviewed a few studies focused on the thematic perspective of places, which were used to define relationships between places based on shared common themes [19]. For example, Adams [20] proposed an approach to search places based on extracted thematic topics (e.g., civil war) from web documents using LDA-based approaches.
Other research focused on historical records of places. Exploring narrative documents to detect historical events and places (e.g., 7 January 1859, in Wakulla County, Florida, as the day when the offices of Tax Assessor and Collector and Sheriff were combined) was studied as part of the Perseus Digital Library Project [21]. In addition, Mostern and Johnson [22] proposed an approach to construct a historical event gazetteer using named places and historical events and visualized the links between such events and spatial changes. Therefore, a thematic perspective of places is often used to explore similar places either alone or in combination with spatial and temporal perspectives.
While traditional gazetteers created by authorities typically describe places using a formalized set of semantics, place descriptions with regard to named places made by the general public reflect more varied aspects. A few examples of approaches that utilize place descriptions from web resources for exploring place semantics are discussed next. Purves, Edwardes, and Wood [23] proposed a framework for gathering large collections of place descriptions from two different online communities named Geograph and Flickr. The authors addressed the fact that descriptions of places collected from different environments may vary in terms of sharing spatially referenced photographs with the associated content.
Kim, Vasardani, and Winter [24] addressed the ways in which place descriptions convey human spatial knowledge beyond geographical information systems. Therefore, the same place could be described in multiple ways based on multiple place perspectives. The authors proposed a graph-based matching method for integrating spatial information extracted from various descriptions by using string, linguistic, and spatial similarities. The approaches relied on spatial semantics matching with other types of similarity to find corresponding places. Our approach for matching place categories combines place affordances and event and key-descriptor similarities. Because the proposed method focuses on people’s descriptions of their experiences in different places, the similarity of spatial relations was not used in our matching process but was described in the results of applying the approach.

3. Methods

3.1. The Conceptual Framework

The research proposes to assess the similarity between two place types based on the designed functional signature. The concept of signature has been widely used in the literature. For research on places, the concepts of spatial, temporal, and thematic signatures have been developed and applied to capture spatial, temporal, and non-spatiotemporal characteristics of places, and they are often used to classify places into place types [20,25,26]. In this research, the concept of functional signature is proposed to capture the unique functional characteristics of a place type. As shown in Figure 1, the functional signature consists of three component signature factors: Place Affordance, Events, and Key-descriptors. Following the original definition of affordance, which concerns the opportunities that an environment has to offer [27], the concept in platial research is also approached from the individual perspective, and it is associated with the relationship between places and the action capabilities of individuals [27,28]. Subscribing to this tradition, the place affordance of a place type in this study is represented by human activities at places of that type. The second component factor, Events, refers to the set of events that have taken place at places of the respective place type. In comparison, Place Affordance captures the functional characteristics of a place type from the perspective of individuals who participate in activities at the respective places, while Events captures the functional characteristics from the perspective of planners and practitioners who organize activities at those places. The two factors are not mutually exclusive but extract characteristics from different perspectives, and some terms may contribute to both. For example, “Party” is an Events characteristic when it denotes a specific event held at a particular place and time, and a Place Affordance factor when it refers to an activity such as going out to a gathering, drinking, or talking. The third component factor, Key-descriptors, refers to the major terms that people frequently use to describe places of the type. Examples are School, Food, Downtown, Park, and Garden. Each of the three signature characteristics is represented as a vector. Quantitative definitions of the signature characteristics and examples are presented below:
  • Place affordance: Major types of human activities at places of the respective place type. It can be expressed by a list of activities {A1, …, Ai, …, An}, where each Ai is a type of activity. An example of Place Affordance for the place type Restaurant is {Eating, Drinking, Party, Entertainment, Music activity}.
  • Events: Major types of events organized at places of the respective place type. It can be expressed by a list of events {E1, …, Ei, …, En}, where each Ei is a type of event. An example of Events for the place type Restaurant is {Anniversary, Party, …}.
  • Key-descriptors: Major descriptive terms that characterize the nature of places of the respective place type. Key-descriptors extracted from place descriptions may include more detailed type descriptions (e.g., burger places) than the categories used in gazetteers, and they may also include descriptions other than categories (e.g., downtown). The key-descriptors of a place type can be expressed by a list of key terms {K1, …, Ki, …, Kn}. An example might be {City, Downtown, Building, Garden, Landmark}.
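The three lists above can be held in a small data structure and turned into comparable vectors. The following is a minimal sketch only: the class name `FunctionalSignature`, the helper `to_vector`, and the 0/1 encoding over a shared vocabulary are illustrative assumptions, since the text states that each factor is a vector but does not specify the encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalSignature:
    """Three component factors of one place type (hypothetical container)."""
    place_type: str
    affordances: frozenset
    events: frozenset
    key_descriptors: frozenset


def to_vector(labels, vocabulary):
    """Encode a label set as a 0/1 vector over a shared, ordered vocabulary.

    Binary encoding is an assumption; counts or topic weights could be
    used instead.
    """
    return [1 if term in labels else 0 for term in vocabulary]


restaurant = FunctionalSignature(
    place_type="Restaurant",
    affordances=frozenset(
        {"Eating", "Drinking", "Party", "Entertainment", "Music activity"}
    ),
    # The source list of events is truncated; only the two given items appear.
    events=frozenset({"Anniversary", "Party"}),
    # Generic example from the text, not specific to Restaurant.
    key_descriptors=frozenset(
        {"City", "Downtown", "Building", "Garden", "Landmark"}
    ),
)

# Shared vocabulary across the place types being compared; "Hiring" stands
# in for an activity contributed by some other place type.
affordance_vocab = sorted(restaurant.affordances | {"Hiring"})
restaurant_vec = to_vector(restaurant.affordances, affordance_vocab)
```

Building every vector over the same ordered vocabulary is what makes two place types comparable component by component.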
Figure 1 illustrates the framework of the research design. For any two place types a and b, the three component signatures for each type are extracted from real-world data, for instance, the social media data. Then signatures of the pair of place types are compared to estimate the similarity between them from the perspective of each component. Finally, the three similarities from the component factors are summarized to give the final similarity measure between the two place types. Technical details of the workflow to achieve these objectives are explained in Section 3.2.
Naturally, the signature for a place type may vary by study area and by data source. It is also sensitive to the sample places chosen to represent the place type. The situation is similar to what happens when studying relationships with a statistical method: a study can choose to construct local models or a global model, and the resulting model(s) can be sensitive to the sampling strategies used for data collection.

3.2. The Computational Workflow

Figure 2 shows the data and processing workflow to assess the similarity between any two place types. The assumption is that place descriptions from social media data or other web documents can reflect the actual uses of those places. Twitter data are used in the pilot study. A keyword-based extraction process is needed for indexing and classifying relevant words in the texts. The keywords are a set of place names for each place type. Then a natural language processing (NLP) technique is used to pre-process the harvested place descriptions and generate a document for each place type. Technical details of this process have been introduced in previous studies [4]. A collection of place descriptions for each place type is treated as a document, $d_i = \{t_1^i, t_2^i, \ldots, t_N^i\}$, where the document $d_i$ consists of the place descriptions that contain place names of the $i$th place type.
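The keyword-based extraction step can be sketched as follows. This is an illustration under stated assumptions, not the pre-processing pipeline of [4]: the function names, the tiny stopword list, and plain substring matching of place names are all hypothetical simplifications.

```python
import re
from collections import defaultdict

# Deliberately small stopword list for illustration (an assumption).
STOPWORDS = {"the", "a", "an", "at", "in", "on", "of", "to", "and", "is", "for"}


def tokenize(text):
    """Lowercase, strip URLs and @mentions, keep alphabetic tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS]


def build_documents(tweets, place_names_by_type):
    """Group tweets by place type using place names as keywords.

    Returns {place_type: [tokens_of_tweet_1, tokens_of_tweet_2, ...]},
    mirroring d_i as a collection of place descriptions. A tweet that
    mentions places of several types is added to each of them.
    """
    documents = defaultdict(list)
    for tweet in tweets:
        lowered = tweet.lower()
        for place_type, names in place_names_by_type.items():
            if any(name.lower() in lowered for name in names):
                documents[place_type].append(tokenize(tweet))
    return dict(documents)
```

A call such as `build_documents(tweets, {"Stadium": ["Sanford Stadium"]})` would return one tokenized description per matching tweet. Substring matching will produce false positives for ambiguous names, so a real pipeline would likely need stricter matching.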
In the second step, LDA topic modeling is applied to identify the three component factors of the functional signature. LDA is a generative probabilistic model that automatically derives sets of words from a document to form latent topics. Please refer to [5] for details of the model. The input of the topic model is the document for a specific place type. The document contains a collection of descriptions for all places of that place type. It is represented as a mixture of latent topics in a topic model, where each topic is characterized by a set of words. LDA is conditioned on three parameters: document–topic distribution $\alpha$, topic–word distribution $\beta$, and topic number $K$. For a data set of tweets, the parameters are typically set as $\alpha = 0.1$, $\beta = 0.05$, and $K = 30$, and multiple parameter settings were tested to see if the results showed any difference. The topic coherence score can be used to determine the quality of the learned topics that are automatically generated by the model. For each document $d_i$, the authors labeled topics for the three factors (place affordances, events, and key-descriptors) based on the model results (i.e., the estimated word distribution).
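To make the role of $\alpha$, $\beta$, and $K$ concrete, the sketch below implements a small collapsed Gibbs sampler for LDA using only the Python standard library. It is a teaching sketch under assumptions, not the software used in the study: the function names are hypothetical, symmetric priors are assumed, and no convergence check or coherence score is computed.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.05, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    docs: list of token lists (e.g., the tweets of one place type).
    Returns (topic_word_counts, doc_topic_counts, vocabulary).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Assign a random initial topic to every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = word_id[w], assignments[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional for each candidate topic.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return topic_word, doc_topic, vocab


def top_terms(topic_word, vocab, n=5):
    """Most frequent terms per topic, as input for manual labeling."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]
```

The lists returned by `top_terms` correspond to the word sets that would then be labeled by hand as place affordances, events, or key-descriptors. For data sets of realistic size, an established topic-modeling library would be a more practical choice than this pure-Python loop.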
In the last step, a similarity score is calculated for each of the three component factors using the cosine similarity expressed in Equation (1). The cosine similarity is a vector comparison measure widely used in information retrieval, NLP, and text mining [29,30].
$$\text{Cosine similarity}(a, b) = \frac{\sum_{i=1}^{n} a_i \times b_i}{\sqrt{\sum_{i=1}^{n} (a_i)^2} \times \sqrt{\sum_{i=1}^{n} (b_i)^2}} \qquad (1)$$
In Equation (1), $a$ and $b$ are two $n$-dimensional vectors to be compared; $a_i$ and $b_i$ are components of vectors $a$ and $b$, respectively. For vectors with non-negative components, the output similarity measure is a value between 0 and 1. A higher value means higher similarity between $a$ and $b$. The similarity measure is calculated for place affordances, events, and key-descriptors independently to determine the degree of similarity for each functional signature component. The final similarity is obtained as a weighted sum of the three component similarity measures. This is expressed in Equation (2), where $S_O$ is the final similarity score; $S_A$, $S_E$, and $S_K$ are the three component similarity scores; and $w_A$, $w_E$, and $w_K$ are the three respective weights, which are customizable based on context-contingent considerations. In the case study, for example, equal weights are taken, which means the three weights are one-third each.
$$S_O = w_A \times S_A + w_E \times S_E + w_K \times S_K \qquad (2)$$
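Equations (1) and (2) translate directly into code. The sketch below is illustrative: the function names are hypothetical, and returning 0.0 for an all-zero vector is an assumption, since the text does not say how an empty factor is handled.

```python
import math


def cosine_similarity(a, b):
    """Equation (1): cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a factor with no entries is treated as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def overall_similarity(s_a, s_e, s_k, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Equation (2): weighted sum of the three component similarities.

    The default mirrors the equal weights used in the case study.
    """
    w_a, w_e, w_k = weights
    return w_a * s_a + w_e * s_e + w_k * s_k
```

For example, two place types whose affordance vectors are `[1, 1, 0]` and `[1, 0, 0]` share one of two activities, and `cosine_similarity` gives 1/√2 for that component.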

4. Case Study

Study Area and Data

A pilot study using the proposed approach was conducted for Clarke County, Georgia, which contains Athens, a university town to the east of Atlanta. Figure 3 shows the location of this county in the state of Georgia, USA.
The place types are drawn from typing classifications from three gazetteers, GNIS, GeoNames, and Google Places. The GNIS, developed by the USGS, is the official repository of US domestic geographical name data. The GNIS applies the second level of classifications created by the NGA. GeoNames is a global geographic database that includes over 11 million place names worldwide. Google Places’ service provides a place-searching capability with a list of place categories and detailed information about a specific location. Place types used in Google Places include various types of business, such as clothing stores and home goods stores, which are not covered by the other two gazetteers. For the current study, these store data sets were combined under “Business” to collect sufficient relevant Twitter data for the analysis. In addition, the locality and postal code of the area were combined under “Region” for the same reason. As noted, specific types combined under Business and Region had a finer level of categories than others.
Many place types have shared meanings in different sources. However, a place could be referenced with different categories by various creators. For example, the Athens Regional Library is categorized as Building in the GNIS but as Library in GeoNames. Thus, a place may appear in multiple place types that are similar to some degree. Table 1 summarizes the statistical characteristics of the place categories in the study area using the three gazetteers.
The total number of places per place type may vary, partly due to actual differences and partly because places were classified differently in various gazetteers. Spatial patterns were identified based on the minimum and maximum distances between the places for each place type, along with its standard deviation. The entropy was calculated to measure the spatial relations between place categories. The entropy of the $i$th place type is defined in Equation (3):
$$E_i = -\sum_{i=1}^{n} \frac{n_i}{N} \log\left(\frac{n_i}{N}\right) \qquad (3)$$
where $N$ is the total number of place types in the area and $n_i$ is the number of nearest instances from the $i$th place type [25]. A larger entropy value indicates a greater variety of places around the target place type.
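A hedged sketch of an entropy of this kind is given below. It rests on assumptions the text does not settle: the input is taken to be the list of place types of the nearest neighbors found around instances of the target type, the proportions are normalized by the total number of neighbors so that they sum to one, and the conventional Shannon form with a leading minus sign is used so that larger values mean more variety. The function name is hypothetical.

```python
import math
from collections import Counter


def neighbor_type_entropy(nearest_neighbor_types):
    """Shannon entropy of the place types observed among nearest neighbors.

    nearest_neighbor_types: iterable of place type labels, one per
    nearest-neighbor instance (assumed input format).
    """
    counts = Counter(nearest_neighbor_types)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())
```

Under these assumptions, a target type surrounded only by parks has entropy 0, while one surrounded equally by parks and schools has entropy log 2.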
Twitter data were used as the source of place descriptions for this case study. The data set was collected using the bounding box of the target study area between December 2017 and January 2019. The collected Twitter data were then filtered to retain only texts that described places. Place names were collected from the three gazetteers (i.e., GNIS, GeoNames, and Google Places) by place type and used as a set of keywords to extract tweets. This study assumed that the combined texts categorized by place type contained several topics that could be used as place type references. Therefore, the data sets were processed using a topic-modeling method to identify three factors (i.e., place affordances, events, and key-descriptors) based on the calculated similarity scores. Figure 3 shows the numbers of texts collected in the study area for a sample of place types. Athens is a college town in the USA, so many tweets describe topics related to the place type “School”.

5. Results

It was assumed that some properties were shared between place types based on people’s activities that can be used to measure similarities between place types. Place types identified by more than 50 relevant tweets in the study area were included for the analysis. In total, 91 place types were detected in more than one tweet. Among them, 14 place types had enough tweets and were selected for the similarity measurement.
The parameters used in the LDA model were determined through theoretical fitting. The final parameter sets generally describe the actual state of the data set. For example, the numbers of key terms and topics were increased until the model began to generate unsuitable word sets for the collected data. Using LDA, the top 20 key terms for 50 topics were calculated for the 14 place categories. LDA was run for 2000 iterations using the Gibbs sampling algorithm for each data set. Then latent topics were manually labeled for the three factors: place affordances, events, and key-descriptors. Table 2 shows the similar textual patterns in topics identified for two different place types: ADM2 (second-order administrative division) and Stadium. Some place categories were described by specific features. For example, schools were frequently mentioned with building names, school activities, and sports events. Streams were often mentioned with other place types, such as rivers, creeks, dams, and parks.
Similarity measurement between place types necessitated a comprehensive understanding of people’s activities, considering the three factors simultaneously. For instance, topics related to football games frequently appeared and formed a large proportion of the Twitter data in the study area. This resulted in an increase in associated activities and key-descriptors used for football games for a specific time period. A full list of three factors extracted from Twitter data in the study area is described in Table 3.
Similarity scores between place types were measured using cosine similarity. First, cosine similarity was calculated for each of the three factors: place affordances, events, and key-descriptors. Figure 4 shows the low-to-high similarity scores among the place types for each factor. The similarity scores for the events factor tend to have higher values than the other two factors, because the total number of events identified from the Twitter data was small. Subsequently, the overall similarity scores were calculated using Equation (2). Table 4 shows the overall similarity scores between place types with individual scores based on the three factors.

Interpretation and Discussions

The results for the three factors (i.e., place affordances, events, and key-descriptors) reveal the characteristics of the study area, and these can be expanded for other case study areas in future studies. Place affordances included different kinds of activity, such as economic, social, fun, and service activities. Among 37 place affordances for 14 place types, “hiring”, “weather reporting”, and “game activity” frequently appeared for different types of place. Figure 5 shows the number of place types for which a specific kind of activity appeared. For example, hiring might not be a common place reference in existing place-name databases. However, place types for hiring activities revealed the places where hiring activities took place. Therefore, places were categorized for hiring and linked to associated information.
The kinds of events extracted from Twitter data in the study include sport, social, and political events. Figure 6 displays the number of place types where each kind of event took place. Because football games are of the most significant interest to students and locals, the SEC Championship and Rose Bowl frequently appeared for different place types. However, these topics are related not only to places where football players play games, such as a stadium, but also to places where people watch games, experience tailgating, or associate them with the name of an administrative region (e.g., a football game in Athens). Therefore, the event concept could be used to find place names and associated information either independently or together with other place-related concepts (e.g., to find place names using place type and event instances).
Key-descriptors extracted for different place types included specific types of business that might be grouped into other place types in existing place-name databases (e.g., waffle place into restaurants). Key-descriptors were discovered mainly from places that have relatively large areas. First, places that have several buildings or facilities are good examples (e.g., schools). Secondly, administrative regions are usually referenced for any places in that area. Finally, when man-made features exist near natural environments, they are frequently mentioned together (e.g., dam and river). Therefore, place types that co-appeared more frequently together were more related to each other.
It is evident that the highest similarity scores were found between population-centric place classifications, such as ADM2, Political, Region, and Populated Place. In addition, based on people’s activities extracted from the Twitter data, patterns were discovered between two place categories based on the spatial containment relationship, such as Church and Populated Place, and two distinct place categories, such as Business and School.
(1) High similarity between place groups of population-centric places
The highest similarity scores were observed between pairs of population-centric place types, such as ADM2, Political, Region, Populated Place, and Civil. Although the concepts and definitions are all different, they provide similar functionality with slightly different ranges of application. For example, Clarke County is an instance of ADM2 and Civil in GeoNames and the GNIS gazetteer, respectively. In these two gazetteers, Civil includes a broad range of administrative divisions such as a borough, county, incorporated place, and township. ADM2 covers the second level of administrative division, which is a county-level unit in the US. The definition of Political used in Google Places is not provided, but it also covers county, city, and so forth. The numbers of instances of these five concepts in the three gazetteers (i.e., GNIS, GeoNames, and Google Places) are listed in Table 5.
These similarity patterns can be used in addition to existing alignment methods for place classification based on the highest similarity scores among population-centric place types calculated using people’s activities extracted from Twitter data. The proposed approach for matching different place categories alone may result in lower discriminatory power. However, our approach was not influenced by other alignment techniques for place types that use spatial signatures [25] or an instance-based matching approach [14]. Instead, it reveals thematic patterns and people’s daily use of places.
(2) High similarity between place types with spatial containment, is-a, and part-of relationships
In the place descriptions that people create, they tend to refer to the place within the larger area unit (e.g., the University of Georgia in Athens). This pattern is well described in the similarity scores among some place types and population-centric place types. The typing schemes used in the three gazetteers (i.e., GNIS, GeoNames, and Google Places) do not describe the relation of spatial containment between the types. The limited information on semantics for place classification results in significant challenges for developing ontologies. High similarity scores among place types that show spatial containment relations could be helpful in constructing place type ontologies. Examples include high similarity scores between Populated Place and Church.
Other examples include place types, such as POI or Building, that have some level of ambiguity in their definitions. These place categories possibly share some instances with other place types within the same gazetteer or across gazetteers. For example, POI may include any place that people may consider interesting. It could be residential areas, schools, parks, restaurants, and churches. Thus, the high similarity scores found between POI and other place types such as Populated Place and School are expected. Figure 7 illustrates the distribution of places of three place types, POI, Populated Place, and School, in the study area. The individual places of these three place types are relatively close to one another and overlap significantly in position.
Building is another example that shares common properties with other place types. In the GNIS and GeoNames, Building describes a distinct place type, not including other types such as Church, Hospital, or School. However, buildings are individual structures that can be a part of other place types. For instance, Georgia Center, which was built as a multi-purpose construction serving as a hotel and a conference center, is classified under Building in the gazetteers but is, in fact, also part of a university (place type School). Therefore, individual buildings are often described with other place types (key-descriptors) in people’s descriptions of these places on social media.
(3) High similarity between two distinct place type groups
Although some place types are not strongly related spatially, high similarity scores were identified between those place types according to the three factors based on people’s activities. For example, churches and neighborhoods in the study area are not close neighbors (Figure 8). However, Church and Neighborhood are frequently referred to in similar types of activities such as sport, tours, and weekend activities in the local area. Another example is the high similarity score identified between Business and School. These two types share many everyday activities such as entertainment, sport, political activities, and service activities. Key-descriptors were also observed for place types, such as gardens and schools. High similarity scores between two distinct place type groups that are spatially separated from each other show that the pairing is meaningful in terms of human behavior and place use. This case is not sufficient for direct use in gazetteer alignment because of the lack of spatial semantics. Nevertheless, it can be used to search place names based on similar activities (functional signature) in local gazetteers. This study highlights that there are pairs of place types that share common activities.
Finally, the results allow us to compare different place types across gazetteers or search local gazetteers based on people’s activities. A preliminary mapping based on the case study is illustrated in Figure 9.
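To make the idea of comparing place types across gazetteers concrete, the following is a minimal sketch, not the procedure used to produce Figure 9. It takes the overall similarity scores of the five population-centric types from Table 4 and their gazetteer membership from Table 5, and picks the highest-scoring type in a target gazetteer. The names `GAZETTEER_OF`, `OVERALL_SIMILARITY`, and `best_match` are hypothetical, and selecting the single highest score is an assumption made for illustration.

```python
# Gazetteer membership as listed in Table 5 (population-centric types only).
GAZETTEER_OF = {
    "ADM2": {"GeoNames"},
    "Civil": {"GNIS"},
    "Political": {"Google Places"},
    "Populated Place": {"GNIS", "GeoNames"},
    "Region": {"Google Places"},
}

# Overall similarity scores copied from Table 4 for these five types.
OVERALL_SIMILARITY = {
    ("ADM2", "Political"): 0.821012,
    ("Region", "Political"): 0.708357,
    ("ADM2", "Region"): 0.685138,
    ("ADM2", "Populated Place"): 0.618796,
    ("Region", "Populated Place"): 0.509977,
    ("Political", "Populated Place"): 0.498655,
    ("Civil", "Region"): 0.487637,
    ("Civil", "Political"): 0.427375,
    ("ADM2", "Civil"): 0.417655,
    ("Civil", "Populated Place"): 0.282279,
}


def best_match(place_type, target_gazetteer):
    """Return (type, score) for the most similar type in the target gazetteer.

    Illustrative rule only: take the maximum overall similarity score.
    Returns None when no scored pair involves the given place type.
    """
    candidates = []
    for (type_a, type_b), score in OVERALL_SIMILARITY.items():
        if place_type == type_a:
            other = type_b
        elif place_type == type_b:
            other = type_a
        else:
            continue
        if target_gazetteer in GAZETTEER_OF[other]:
            candidates.append((other, score))
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])


print(best_match("Civil", "GeoNames"))
print(best_match("ADM2", "GNIS"))
```

With these inputs, Civil (GNIS) is paired with ADM2 in GeoNames, whereas ADM2 is paired with Populated Place in GNIS (0.618796) rather than with Civil (0.417655). This asymmetry is in line with the caution above that similarity scores alone may have limited discriminatory power and are better used together with existing alignment methods.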

6. Conclusions

We propose an approach for measuring similarity among place types using place descriptions created by people. Similarity scores are calculated for three factors: place affordances, events, and key-descriptors. The three respective values are combined to obtain the overall similarity score among place types. Such a bottom-up approach using descriptions from social media data and other user-generated content is advantageous as it helps to capture the dynamic situations of how people utilize places. People’s experiences in places are not limited to the planned functions of places. People’s knowledge about places is shared through various channels, and it influences others in the way they interact with places. To demonstrate the proposed approach in a case study, we used Twitter data as a source of place descriptions because of its data availability and because Twitter has become one of the most popular social media platforms for people to share their thoughts and activities. However, the proposed approach is generally applicable to other types of online sources.
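The calculation summarized above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: each component signature is treated as a set of elements (a binary presence vector), for which cosine similarity reduces to |A ∩ B| / sqrt(|A| · |B|), and the overall score is taken as the arithmetic mean of the three factor scores. Both assumptions are consistent with the values in Table 4 (e.g., 0.707107 = 1/√2, and each overall score equals the mean of its row), but this section does not state them explicitly. The function names and the example event sets are hypothetical.

```python
from __future__ import annotations

import math


def cosine_similarity(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary presence vectors, written in set form."""
    if not a or not b:
        return 0.0  # assumption: an empty signature shares nothing with any other
    return len(a & b) / math.sqrt(len(a) * len(b))


def overall_similarity(affordance: float, events: float, key_descriptors: float) -> float:
    """Combine the three factor scores; the equal-weight mean is an assumption."""
    return (affordance + events + key_descriptors) / 3.0


# Hypothetical event signatures for two place types (elements drawn from Table 3).
type_a_events = {"Football game", "SEC championship", "Rose bowl"}
type_b_events = {"Football game", "Election", "Rose bowl"}
events_sim = cosine_similarity(type_a_events, type_b_events)  # 2 shared / sqrt(3 * 3)

# Factor scores for the ADM2-Political pair as reported in Table 4.
adm2_political = overall_similarity(0.707107, 1.0, 0.755929)
print(round(events_sim, 6), round(adm2_political, 6))
```

The last call reproduces the overall score reported for ADM2 and Political in Table 4 (0.821012). A weighted combination would be a straightforward variation if one factor were considered more reliable than the others.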
The case study finds that similarity scores between place types do not show significant differences among the three factors. In other words, place types with a high similarity score based on place affordances usually also showed a high similarity score for events. However, place affordances were more related to daily activities, while events could be seasonal. We think that place affordances and key-descriptors observed for some place types could be affected by a special event in a specific timeframe (e.g., tailgating at a football game at the school). Thus, the overall similarity score is more powerful for comparing different place types than the three individual similarity measures. Overall, high similarity scores were identified among place types with overlapping functionalities, such as population-centric places, those representing spatial containment relations, or two distinct place type groups.
Compared to the existing methods for aligning place categories, our approach has a strong ability to explain the relations between place types based on people’s daily activities. The discovered functional factors and similarity scores help people to better understand part of the complex functionality of place types. Therefore, this study shows great potential for contributing essential variables for place formalization in fields such as urban planning, urban management, and location-based applications.
A major limitation of this study was the nature of the data source. Social media data are usually biased toward younger age groups and specific topics. Only place types with a sufficient amount of data could be analyzed to measure similarities. Thus, some place types may have been omitted due to sampling bias. Another limitation is that place types cannot be entirely quantified through the three factors of place affordance, events, and key-descriptors. Other potential factors can be taken into consideration, such as human mobility [31,32], human dynamics, and human perceptions [33]. Future studies will focus on aligning different typing schemes using our results along with the existing matching strategies. In doing so, we aim to extend typing scheme ontologies to enrich gazetteers.

Author Contributions

Conceptualization, Doori Oh, Xiaobai A. Yao; Data curation, Doori Oh; Formal analysis, Doori Oh; Investigation, Doori Oh; Methodology, Doori Oh; Resources, Doori Oh; Supervision, Xiaobai A. Yao; Visualization, Doori Oh, Xiaobai A. Yao; Writing – original draft, Doori Oh, Xiaobai A. Yao; Writing – review & editing, Xiaobai A. Yao, Doori Oh. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hastings, J.T. Automated conflation of digital gazetteer data. Int. J. Geogr. Inf. Sci. 2008, 22, 1109–1127.
  2. Janowicz, K.; Keßler, C. The role of ontology in improving gazetteer interaction. Int. J. Geogr. Inf. Sci. 2008, 22, 1129–1157.
  3. Hill, L.L. Feature Type Thesaurus. Alexandria Digital Library Project. 2002. Available online: http://legacy.alexandria.ucsb.edu/gazetteer/FeatureTypes/FTT_metadata.htm (accessed on 10 June 2021).
  4. Abilhoa, W.D.; De Castro, L.N. A keyword extraction method from twitter messages represented as graphs. Appl. Math. Comput. 2014, 240, 308–325.
  5. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
  6. Goodchild, M.F.; Hill, L.L. Introduction to digital gazetteer research. Int. J. Geogr. Inf. Sci. 2008, 22, 1039–1044.
  7. Hill, L.L. Georeferencing: The Geographic Associations of Information; MIT Press: Cambridge, MA, USA, 2009.
  8. Mark, D.M.; Turk, A.G. Landscape categories in Yindjibarndi: Ontology, environment, and language. In International Conference on Spatial Information Theory; Springer: Berlin/Heidelberg, Germany, 2003; pp. 28–45.
  9. Jiang, S.; Alves, A.; Rodrigues, F.; Ferreira, J., Jr.; Pereira, F.C. Mining point-of-interest data from social networks for urban land use classification and disaggregation. Comput. Environ. Urban Syst. 2015, 53, 36–46.
  10. Gao, S.; Janowicz, K.; Couclelis, H. Extracting urban functional regions from points of interest and human activities on location-based social networks. Trans. GIS 2017, 21, 446–467.
  11. Liu, X.; Andris, C.; Rahimi, S. Place niche and its regional variability: Measuring spatial context patterns for points of interest with representation learning. Comput. Environ. Urban Syst. 2019, 75, 146–160.
  12. Hao, P.-Y.; Cheang, W.-H.; Chiang, J.-H. Real-Time event embedding for POI recommendation. Neurocomputing 2019, 349, 1–11.
  13. Martins, B.; Manguinhas, H.; Borbinha, J. Extracting and exploring the geo-temporal semantics of textual resources. In Proceedings of the 2008 IEEE International Conference on Semantic Computing, Santa Monica, CA, USA, 4–7 August 2008; pp. 1–9.
  14. Brauner, D.F.; Casanova, M.A.; Milidiú, R.L. Towards Gazetteer Integration through an Instance-based Thesauri Mapping Approach. In Advances in Geoinformatics; Springer: Berlin/Heidelberg, Germany, 2007; pp. 235–245.
  15. Goodchild, M.F. Formalizing Place in Geographic Information Systems. In Communities, Neighborhoods, and Health; Springer: New York, NY, USA, 2010; pp. 21–33.
  16. Tuan, Y.F. Space and Place: The Perspective of Experience; University of Minnesota Press: Minneapolis, MN, USA, 1977.
  17. Winter, S.; Kuhn, W.; Krüger, A. Guest editorial: Does place have a place in geographic information science? Spat. Cogn. Comput. 2009, 9, 171–173.
  18. Papadakis, E.; Resch, B.; Blaschke, T. Composition of place: Towards a compositional view of functional space. Cartogr. Geogr. Inf. Sci. 2019, 47, 28–45.
  19. Adams, B.; Janowicz, K. Thematic signatures for cleansing and enriching place-related linked data. Int. J. Geogr. Inf. Sci. 2015, 29, 556–579.
  20. Adams, B. Finding similar places using the observation-to-generalization place model. J. Geogr. Syst. 2015, 17, 137–156.
  21. Smith, D.A. Detecting and browsing events in unstructured text. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland, 11–15 August 2002; pp. 73–80.
  22. Mostern, R.; Johnson, I. From named place to naming event: Creating gazetteers for history. Int. J. Geogr. Inf. Sci. 2008, 22, 1091–1108.
  23. Purves, R.; Edwardes, A.; Wood, J. Describing place through user generated content. First Monday 2011, 16.
  24. Kim, J.; Vasardani, M.; Winter, S. Similarity matching for integrating spatial information extracted from place descriptions. Int. J. Geogr. Inf. Sci. 2016, 31, 56–80.
  25. Zhu, R.; Hu, Y.; Janowicz, K.; McKenzie, G. Spatial signatures for geographic feature types: Examining gazetteer ontologies using spatial statistics. Trans. GIS 2016, 20, 333–355.
  26. McKenzie, G.; Janowicz, K.; Gao, S.; Gong, L. How where is when? On the regional variability and resolution of geosocial temporal signatures for points of interest. Comput. Environ. Urban Syst. 2015, 54, 336–346.
  27. Bornstein, M.H.; Gibson, J.J. The ecological approach to visual perception. J. Aesthet. Art Crit. 1980, 39, 203.
  28. Raymond, C.M.; Kyttä, M.; Stedman, R. Sense of place, fast and slow: The potential contributions of affordance theory to sense of place. Front. Psychol. 2017, 8, 1674.
  29. Li, B.; Han, L. Distance weighted cosine similarity measure for text classification. In International Conference on Intelligent Data Engineering and Automated Learning; Springer: Berlin/Heidelberg, Germany, 2013; pp. 611–618.
  30. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
  31. McKenzie, G.; Romm, D. Measuring urban regional similarity through mobility signatures. Comput. Environ. Urban Syst. 2021, 89, 101684.
  32. Yuan, J.; Zheng, Y.; Xie, X. Discovering regions of different functions in a city using human mobility and POIs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 186–194.
  33. Kang, Y.; Zhang, F.; Gao, S.; Peng, W.; Ratti, C. Human settlement value assessment from a place perspective: Considering human dynamics and perceptions in house price modeling. Cities 2021, 118, 103333.
Figure 1. The conceptual framework for assessing similarity between two place types.
Figure 2. The overall research design for assessing place type similarities.
Figure 3. The number of tweets collected for each place type in the target study area (Athens, GA, USA, on the left-hand side).
Figure 4. Cosine similarities between place categories by individual factors: place affordances (top left), key-descriptors (top right), and events (bottom).
Figure 5. Multiplicity of place types vs. activity types.
Figure 6. Multiplicity of place types vs. event types.
Figure 7. Distribution of POI, Populated Place, and School locations in Athens, GA.
Figure 8. Distribution of Church and Neighborhood locations in Athens, GA.
Figure 9. Place types matching between GNIS and GeoNames (left) and GeoNames and Google Places (right) based on the thematic characteristics.
Table 1. Summary of statistical features of place categories in Athens, GA, USA.
| Place Type | No. of Places | Mean Dist. | Max Dist. | Std Dev | Entropy |
|---|---|---|---|---|---|
| ADM2 | 1 | NA | NA | NA | 5.56 |
| Building | 80 | 2991.22 | 17,864.03 | 2934.49 | 2150.96 |
| Business | 1372 | 6578.99 | 26,495.57 | 3855.14 | 931,056.33 |
| Church | 185 | 6678.83 | 23,207.27 | 3984.49 | 2722.72 |
| Civil | 6 | 6546.22 | 18,358.74 | 5192.34 | 25.06 |
| Hospital | 33 | 2985.25 | 13,693.41 | 2736.97 | 535.9 |
| Library | 23 | 5089.04 | 17,296.75 | 3857.19 | 434.92 |
| Neighborhood | 18 | 7557.39 | 16,422.30 | 4582.94 | 67.49 |
| Park | 87 | 4943.10 | 20,750.67 | 3379.46 | 1479.89 |
| POI | 579 | 9391.75 | 26,934.18 | 4941.42 | 5298.91 |
| Political | 38 | 10,083.07 | 24,357.24 | 5366.87 | 128.9 |
| Populated Place | 125 | 7963.20 | 21,355.39 | 4341.64 | 726.82 |
| Region | 10 | 7539.41 | 22,198.80 | 5231.84 | 86.02 |
| School | 175 | 6313.82 | 24,739.01 | 3891.63 | 2752.18 |
| Stadium | 1 | NA | NA | NA | 11.87 |
| Stream | 16 | 10,326.31 | 28,503.34 | 6786.17 | 27.47 |
(Distance unit: meter)
Table 2. Examples of word sets forming topics for the place categories ADM2 and Stadium.
| Place Type | Topic | A Collection of Words |
|---|---|---|
| ADM2 | Football game | go dawgs, dawgs on top, Georgia football, game Sanford stadium, … |
| ADM2 | School | university, Sanford stadium, college students, school Athens, … |
| ADM2 | Job marketing | job, G.A. job, job Athens, hiring, recommend job, apply, … |
| ADM2 | Weather reporting | weather summary, forecasts UGA Sanford, sunrise, … |
| ADM2 | Weekend activity | weekend, downtown weekend, getaway, camping, … |
| Stadium | Football game | go dawgs, Georgia football, game Stanford, AU vs. UGA, … |
| Stadium | School | university, university Georgia, UGA, students, … |
| Stadium | Weather reporting | weather hours, raining UGA Sanford, day forecasts, … |
Table 3. Candidate elements in the component functional signatures (identified from Twitter data).
| Place Affordances | Key-Descriptors | Events |
|---|---|---|
| Athletics | Apartment | Anniversary |
| Campaign | Bar | College football playoff |
| City planning | Burger place | Crime |
| Day activity | Coffee shop | Election |
| Drinking | Church | Football game |
| Driving | City | Gun violence |
| Eating | City hall | Homecoming |
| Entertainment | Clothing store | Rose bowl |
| Game activity | Community | SEC championship |
| Healthcare | Creek | |
| Hiring | Dam | |
| News reporting | Department store | |
| Marketing | Downtown | |
| Morning activity | Furniture store | |
| Music activity | Garden | |
| Nightlife | Grocery store | |
| Party | Gym | |
| Physical activity | Landmark | |
| Political activity | Library | |
| Posting | Memorial hall | |
| Purchasing | Museum | |
| Recycling | Park | |
| Relaxing | Pharmacy | |
| Residential activity | Pool | |
| Running | Practice field | |
| Socializing | Residential area | |
| Sport activity | Restaurant | |
| Studying | River | |
| Taking photos | School | |
| Running business | Sheriff’s office | |
| Training | Sports complex | |
| Traveling | Stadium | |
| Weather reporting | Store | |
| Weekday activity | Stream | |
| Weekend activity | Theater | |
| | Waffle place | |
| | Weather station | |
Table 4. Overall similarity scores for distinct place categories, with the individual similarity scores for three factors.
| Type A | Type B | Affordance Similarity | Events Similarity | Key-Descriptors Similarity | Overall Similarity |
|---|---|---|---|---|---|
| ADM2 | Political | 0.707107 | 1 | 0.755929 | 0.821012 |
| Region | Political | 0.353553 | 1 | 0.771517 | 0.708357 |
| ADM2 | Region | 0.375 | 1 | 0.680414 | 0.685138 |
| ADM2 | Populated Place | 0.612372 | 0.57735 | 0.666667 | 0.618796 |
| Region | Stadium | 0.534522 | 1 | 0.288675 | 0.607733 |
| Church | Populated Place | 0.666667 | 0.408248 | 0.707107 | 0.594007 |
| Church | POI | 0.166667 | 1 | 0.534522 | 0.567063 |
| Political | Stadium | 0.377964 | 1 | 0.267261 | 0.548409 |
| ADM2 | Stadium | 0.400892 | 1 | 0.235702 | 0.545531 |
| Region | Populated Place | 0.408248 | 0.57735 | 0.544331 | 0.509977 |
| Political | Populated Place | 0.288675 | 0.57735 | 0.629941 | 0.498655 |
| Populated Place | School | 0.365148 | 0.707107 | 0.408248 | 0.493501 |
| Civil | Region | 0.5 | 0.5 | 0.46291 | 0.487637 |
| Populated Place | Stadium | 0.545545 | 0.57735 | 0.235702 | 0.452866 |
| Business | Populated Place | 0.280056 | 0.707107 | 0.363696 | 0.450286 |
| POI | Populated Place | 0.166667 | 0.408248 | 0.755929 | 0.443615 |
| Church | Neighborhood | 0.51031 | 0.816497 | 0 | 0.442269 |
| Civil | Stadium | 0.534522 | 0.5 | 0.267261 | 0.433928 |
| Civil | Political | 0.353553 | 0.5 | 0.428571 | 0.427375 |
| ADM2 | Civil | 0.375 | 0.5 | 0.377964 | 0.417655 |
| ADM2 | Church | 0.51031 | 0 | 0.707107 | 0.405806 |
| POI | Political | 0.57735 | 0 | 0.571429 | 0.382926 |
| Business | School | 0.383482 | 0.666667 | 0.089087 | 0.379745 |
| Church | Region | 0.408248 | 0 | 0.721688 | 0.376645 |
| POI | School | 0.365148 | 0.288675 | 0.46291 | 0.372245 |
| Business | Region | 0.342997 | 0.408248 | 0.356348 | 0.369198 |
| Region | School | 0.33541 | 0.408248 | 0.333333 | 0.358997 |
| Business | Political | 0.242536 | 0.408248 | 0.412393 | 0.354392 |
| ADM2 | School | 0.223607 | 0.408248 | 0.408248 | 0.346701 |
| ADM2 | POI | 0.408248 | 0 | 0.629941 | 0.346063 |
| Church | School | 0.456435 | 0.288675 | 0.288675 | 0.344595 |
| Business | Church | 0.280056 | 0.288675 | 0.46291 | 0.34388 |
| Neighborhood | School | 0.559017 | 0.471405 | 0 | 0.343474 |
| Business | Stadium | 0.458349 | 0.408248 | 0.154303 | 0.3403 |
| Neighborhood | POI | 0.204124 | 0.816497 | 0 | 0.340207 |
| Business | Civil | 0.342997 | 0.408248 | 0.247436 | 0.332894 |
| Church | Political | 0.144338 | 0 | 0.801784 | 0.315374 |
| ADM2 | Business | 0.171499 | 0.408248 | 0.363696 | 0.314481 |
| School | Stadium | 0.239046 | 0.408248 | 0.288675 | 0.31199 |
| Civil | School | 0.33541 | 0.408248 | 0.154303 | 0.299321 |
| Political | School | 0.158114 | 0.408248 | 0.308607 | 0.291656 |
| Civil | Populated Place | 0.306186 | 0.288675 | 0.251976 | 0.282279 |
| Region | POI | 0.204124 | 0 | 0.617213 | 0.273779 |
| Church | Civil | 0.408248 | 0 | 0.400892 | 0.269713 |
| Business | Neighborhood | 0.171499 | 0.471405 | 0.154303 | 0.265735 |
| Neighborhood | Populated Place | 0.408248 | 0.333333 | 0 | 0.247194 |
| Business | POI | 0.140028 | 0.288675 | 0.247436 | 0.22538 |
| Church | Stadium | 0.327327 | 0 | 0.25 | 0.192442 |
| Region | Park | 0.353553 | 0 | 0.204124 | 0.185893 |
| Civil | Neighborhood | 0.25 | 0.288675 | 0 | 0.179558 |
| ADM2 | Park | 0.176777 | 0 | 0.333333 | 0.170037 |
| Church | Park | 0.144338 | 0 | 0.353553 | 0.165964 |
| Civil | POI | 0.204124 | 0 | 0.285714 | 0.163279 |
| POI | Stadium | 0.218218 | 0 | 0.267261 | 0.161826 |
| Park | Populated Place | 0.144338 | 0 | 0.333333 | 0.159224 |
| Park | Political | 0 | 0 | 0.377964 | 0.125988 |
| Building | Civil | 0 | 0 | 0.341882 | 0.113961 |
| Building | Church | 0 | 0 | 0.319801 | 0.1066 |
| Building | Business | 0 | 0 | 0.263181 | 0.087727 |
| Park | Stream | 0.25 | 0 | 0 | 0.083333 |
| Region | Neighborhood | 0.25 | 0 | 0 | 0.083333 |
| ADM2 | Neighborhood | 0.25 | 0 | 0 | 0.083333 |
| Building | Region | 0 | 0 | 0.246183 | 0.082061 |
| Building | Political | 0 | 0 | 0.227921 | 0.075974 |
| Building | Stadium | 0 | 0 | 0.213201 | 0.071067 |
| Building | Neighborhood | 0 | 0 | 0.213201 | 0.071067 |
| Park | School | 0 | 0 | 0.204124 | 0.068041 |
| Building | Populated Place | 0 | 0 | 0.201008 | 0.067003 |
| ADM2 | Building | 0 | 0 | 0.201008 | 0.067003 |
| Stadium | Stream | 0.188982 | 0 | 0 | 0.062994 |
| Park | Stadium | 0.188982 | 0 | 0 | 0.062994 |
| Park | POI | 0 | 0 | 0.188982 | 0.062994 |
| Neighborhood | Park | 0.176777 | 0 | 0 | 0.058926 |
| Civil | Park | 0.176777 | 0 | 0 | 0.058926 |
| Church | Stream | 0.144338 | 0 | 0 | 0.048113 |
| Neighborhood | Stadium | 0.133631 | 0 | 0 | 0.044544 |
| Building | School | 0 | 0 | 0.123091 | 0.04103 |
| Business | Stream | 0.121268 | 0 | 0 | 0.040423 |
| Building | POI | 0 | 0 | 0.113961 | 0.037987 |
| Business | Park | 0 | 0 | 0.109109 | 0.03637 |
Table 5. The number of instances for selected place categories in Athens, GA, USA.
| Place Type | GNIS | GeoNames | Google Places |
|---|---|---|---|
| ADM2 | NA | 1 | NA |
| Civil | 6 | NA | NA |
| Political | NA | NA | 38 |
| Populated Place | 125 | 123 | NA |
| Region | NA | NA | 10 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Oh, D.; Yao, X.A. Assessing Place Type Similarities Based on Functional Signatures Extracted from Social Media Data. ISPRS Int. J. Geo-Inf. 2021, 10, 626. https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi10090626
