2.1. From the Actor-Network Theory (ANT) to the Concept of Rhizome
The Actor-Network Theory (ANT), elaborated by Bruno Latour [6, 7], can be considered the first approach adopted across sociology, geography and other disciplines that makes networks the focus of attention for the study of spatialities. Geographers have adopted ANT because it goes much further than the traditional analysis of absolute space; through ANT, geographers promote a relational approach, based on networks and relations among humans and non-humans. Bruno Latour claimed that the division between humans and non-humans needs to be overcome; in this perspective, the world presents itself to us in the form of networks, relations and hybrids that cross the artificial boundaries drawn between culture and nature, and between the worlds of people and of things (for a clear analysis of the Actor-Network Theory, see [8, 9, 10, 11, 12]).
ANT is an ideal framework to deal with spatiality, because it clearly directs our attention to the effects produced by the fluidity of the spatial configurations of a variety of actors. From our perspective, ANT is also important for understanding the effects of their connections in movement: it allows us to describe the complex and mutable composition of networks of heterogeneous actors, especially in movement. In this respect, Sheppard argued [9] that networks are non-hierarchical spaces, that is, spaces that are not important for quantitative reasons, such as the area, the population or the metric distance (the largest regions, the most inhabited places, the nearest places to cities and so on), which can be analyzed in a vertical way. In contrast, networks are spaces that are important for their connectivity and networking in a horizontal way. Taken literally, this would mean that all the involved human and non-human actors have the same importance, at the cost of neglecting their internal differentiation. However, relations in a network are not all the same, and their differences produce different spatialities [10].
On the basis of the network perspective introduced by ANT, Jacques Lévy and Michel Lussault [12] proposed a more specific approach that focuses on networked spaces produced by humans and things in their movement. The two French geographers started an important phase in the social sciences of space, based on the importance of movement in the globalized world. In particular, they claimed that the “contemporary urban” assumes a poly-centric and reticular configuration: it is no longer divided into center and periphery; rather, it is viewed as an “osmotic-centered system” of mobility. In fact, the contemporary urban is inserted into a globalized network, where the local scale and the global scale interact by reconfiguring centrality and axes, and the internal and external connections of the city [13]. The creation of networks among the multiple places of the contemporary urban is one of the processes that characterize the mobility of inhabitants and, more generally, of any kind of city user: a new reticular dimension emerges, based on connections activated among places and exploited by individuals in their life experience. Connections can be either real (transportation infrastructures) or virtual (information about places published on the web or on social media, possibly produced by citizens). Such networking, produced by the experience of individuals in urban space, is termed rhizomatic, resuming a concept born in the field of botany and then re-elaborated in the philosophical field: “Compared to centric (even poly-centric) systems, hierarchical communication and predetermined connections, a rhizome is an a-centric, non-hierarchical and not meaningful system” ([3], page 33). The concept of rhizome was refined in spatial terms by Jacques Lévy: “A rhizome is the space of individual action in mobility, but also in the multiform relationship with other individuals” ([14], page 19). A further definition could be the following: “A rhizome is a family of networks, characterized by the absence of identifiable boundaries and a meeting between topological metric inside and topographic metrics outside” ([14], pages 18–19). In other words, a rhizome belongs to the topological metric, that is, to a discontinuous space based on nodes and connections that produce a network without beginning, without end and without well-defined boundaries, because it is the result of the experience of individuals in space.
In order to explain our view of the concept of rhizome, consider three cities denoted as A, B and C. First of all, notice that distance is not only a matter of metric distance, but also a matter of accessibility. Suppose that the distance between the city centers of A and B is 30 km, but there is a fast and frequent train connection: a train leaves every 15 min and takes 20 min to go from A to B, and vice versa. Suppose now that the distance between the centers of A and C is only 10 km, but only a mountain road connects them, without public transportation: people without cars cannot travel between A and C. Thus, the availability of an efficient transportation infrastructure makes cities A and B "closer" than cities A and C. Consequently, transfers between A and B are more frequent than transfers between A and C; thus we expect that, by analyzing data about moving people, cities A and B will be associated with each other more frequently than cities A and C. In short, metric distance is not always a valid parameter, because accessibility can be more important.
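The expectation stated above can be illustrated with a minimal Python sketch. It assumes hypothetical traces, each being the set of cities visited by one person; the data and the function name `pair_counts` are illustrative assumptions and are not taken from the data set analyzed in this paper. The sketch simply counts how often two cities occur in the same trace.

```python
from collections import Counter
from itertools import combinations

# Hypothetical traces (assumption): each set lists the cities that one
# person visited. A and B are well connected, A and C are not.
traces = [
    {"A", "B"},
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
    {"A", "B"},
]


def pair_counts(traces):
    """Count, for every pair of cities, the traces containing both."""
    counts = Counter()
    for visited in traces:
        # Sorting gives each pair a canonical order, e.g. ("A", "B").
        for pair in combinations(sorted(visited), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(traces)
for pair, n in counts.most_common():
    print(pair, n)
```

With these made-up traces, the pair (A, B) co-occurs more often than the pair (A, C), although A and C are metrically closer in the example; this is the kind of signal that a topological representation is meant to expose.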
Figure 1 depicts the situation: on the left-hand side, the topographic representation shows that cities A and B are farther apart than cities A and C (based on their metric distance). On the right-hand side, the topological representation shows that cities A and B are much more connected than cities A and C: the points representing cities A and B are closer than the points representing cities A and C; furthermore, the line connecting cities A and B is thicker than the line connecting cities A and C. Thus, the reader can give a concrete shape to what was argued by Lévy ([14], pages 18–19): the rhizome is topographic outside, i.e., places lived by people are physically located in space; nevertheless, the rhizome is topological inside, because the strength of connections does not depend only on the physical distance between the connected places. If we consider connections among many places, both the topographic and the topological representations assume a reticular shape, where some places can be more attractive than others, but no center clearly emerges; the reader can find examples in Section 3.
Another aspect of interest is that connections depend not only on physical accessibility but also on virtual connectivity. One example concerns promotional activities: suppose that in city A there is a restaurant whose fame is increasing, because some people living in city B are promoting it through social networks; friends of these people (living in city B too) could decide to go to that restaurant as well. This consideration has an impact on the visualization strategy to adopt for visually analyzing connections. Again, it appears that a topographic representation is necessary but not sufficient to highlight the strength of connections; a topological representation in which two strongly connected cities are depicted closer than two loosely connected cities (as in the right-hand side of Figure 1) could provide a very useful visualization perspective, able to help analysts understand networks of connections.
Moving from the above considerations, we can propose the following working definition: “a rhizome is the space with a set of places frequently lived by a single person and by many people, on the basis of material and virtual connections among them”.
2.2. Mining Itemsets and Association Rules
Since the 1990s, a large number of data mining techniques have been developed to address a large variety of problems. One of the most famous is the mining of association rules [4]. Born for market-basket analysis, its original goal was to find frequent associations of (sold) items, i.e., items that frequently appear together in (commercial) transactions. An association rule has the form $\{A, B, C\} \Rightarrow D$, meaning that when a customer buys products A, B and C, he/she tends to also buy product D. Each rule has two numerical weights, called support and confidence. The support is the fraction of transactions that contain all the items of the rule. The confidence is the conditional probability that the whole rule is found in a transaction, given that the body has been found in it (in the sample rule, the body is $\{A, B, C\}$, while D is the head). Since the number of rules could be extremely high, it is necessary to prune the search space, in order to get only rules that could be really meaningful; pruning is done by setting a minimum threshold for support.
Sets of items such as $\{A, B, C\}$, $\{A, B, C, D\}$ and so on are called itemsets; in particular, $\{A, B, C\}$ is called a 3-itemset because it contains three items, while $\{A, B, C, D\}$ is called a 4-itemset because it contains four items. Notice that the rule $\{A, B, C\} \Rightarrow D$ is obtained from itemset $\{A, B, C, D\}$; this means that both the itemset and the rule have the same support. Given a minimum threshold for support, itemsets having support greater than or equal to that threshold are called large itemsets.
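As a hedged illustration of the two weights, the following Python sketch computes support and confidence for the sample rule with body {A, B, C} and head D. The five transactions are invented for the example, and the helper names (`count_containing`, `support`, `confidence`) are our own assumptions rather than part of any specific library.

```python
# Hypothetical market-basket transactions (assumption, for illustration only).
transactions = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "D"},
    {"B", "C"},
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(body, head, transactions):
    """Fraction of transactions containing both body and head of the rule."""
    whole_rule = set(body) | set(head)
    return count_containing(whole_rule, transactions) / len(transactions)


def confidence(body, head, transactions):
    """Fraction of the transactions containing the body that also contain the head."""
    body_count = count_containing(body, transactions)
    if body_count == 0:
        raise ValueError("the body never occurs in the transactions")
    whole_rule = set(body) | set(head)
    return count_containing(whole_rule, transactions) / body_count


rule_support = support({"A", "B", "C"}, {"D"}, transactions)
rule_confidence = confidence({"A", "B", "C"}, {"D"}, transactions)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

In this toy data set, the whole rule appears in two transactions out of five, while the body appears in three; hence the support is 2/5 and the confidence is 2/3.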
The basic step to compute association rules is called itemset mining: an algorithm extracts those sets of items that frequently appear together in transactions (large itemsets). This problem is not easy to solve, particularly when the data set is very large and the number of items per transaction is large too. The reader can refer to [15, 16] for some well-known algorithms developed to efficiently mine large itemsets. In this work, we used the implementation developed within the Hints from the Crowd project [17], which is a main-memory algorithm able to deal with generic items. A reader willing to experiment with the approach can exploit any algorithm available on the Internet for mining itemsets or association rules; a few lines of code written in a procedural programming language will usually be enough to pre-process the data set, so as to transform it into a format that is suitable for the specific implementation of the algorithm.
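The pre-processing step mentioned above could look like the following minimal Python sketch. It assumes, purely for illustration, that the raw data are (group identifier, item) records and that the target implementation expects one group per line with items separated by spaces; both formats, as well as the function names, are assumptions and must be adapted to the tool actually used.

```python
from collections import defaultdict

# Hypothetical raw records (assumption): one (group_id, item) pair per row,
# possibly with duplicates.
records = [
    ("t1", "bread"),
    ("t1", "milk"),
    ("t2", "milk"),
    ("t1", "bread"),
    ("t2", "beer"),
]


def records_to_groups(records):
    """Collect the items of each group into a set (duplicates collapse)."""
    groups = defaultdict(set)
    for group_id, item in records:
        groups[group_id].add(item)
    return dict(groups)


def to_basket_lines(groups):
    """Render one line per group, with items sorted and space-separated."""
    return [" ".join(sorted(items)) for _, items in sorted(groups.items())]


groups = records_to_groups(records)
lines = to_basket_lines(groups)
print("\n".join(lines))
```

Representing each group as a set matches the formulation in which a group is a set of items, so repeated records for the same item do not affect support.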
In order to detach from the context of commercial transactions, we can give the following generic formulation of the problem of itemset mining, which derives from the semantics of the MINE RULE operator introduced for relational databases [18] and for XML databases [19].
Definition 1 (Data set and items). Consider a data set $\mathcal{D}$. This is a set of groups; i.e., $\mathcal{D} = \{g_1, \dots, g_n\}$. Each group $g_j$ is, in turn, a finite set of items, i.e., $g_j = \{x_1, \dots, x_m\}$. Groups are not necessarily disjoint; i.e., they can share items.
As an example, in the case of market-basket analysis, items are products, while groups are single commercial transactions.
Definition 2 (Large itemset).
An itemset $h$ is a finite set of items that appear together in some group in $\mathcal{D}$. The support of $h$ is the number of groups in $\mathcal{D}$ that contain the whole itemset, divided by the total number of groups in $\mathcal{D}$. Formally,
$$\mathit{support}(h) = \frac{|\{\, g \in \mathcal{D} \mid h \subseteq g \,\}|}{|\mathcal{D}|}.$$
Given a minimum threshold $\mathit{minsup}$ such that $0 < \mathit{minsup} \leq 1$, the itemset $h$ is said to be large if its support is greater than or equal to the minimum threshold, i.e., if $\mathit{support}(h) \geq \mathit{minsup}$. After these premises, we can say that the problem of large itemset mining is to compute all large itemsets within a data set $\mathcal{D}$, given a minimum threshold for support denoted as $\mathit{minsup}$.
As an example, consider the data set reported in
Table 1. If we set
, we obtain the large itemsets reported in
Table 2. Notice itemset
, with support
, that appears in three groups, i.e., groups
,
and
. This is the itemset with the largest cardinality (number of items) that we can extract from the data set: itemsets with a larger number of items do not have sufficient support.
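The problem of large itemset mining can be sketched in a few lines of Python. What follows is a simple level-wise (Apriori-style) enumeration written for illustration; it is not the Hints from the Crowd implementation used in this work. The data set is hypothetical and does not reproduce Table 1, and the names `large_itemsets` and `minsup` are assumptions of this sketch.

```python
from itertools import combinations


def large_itemsets(groups, minsup):
    """Return {itemset: support} for all itemsets with support >= minsup."""
    n = len(groups)

    def support(itemset):
        # Fraction of groups that contain the whole itemset.
        return sum(1 for g in groups if itemset <= g) / n

    result = {}
    # Level 1: every single item is a candidate 1-itemset.
    current = [frozenset([item]) for item in sorted(set().union(*groups))]
    while current:
        survivors = []
        for h in current:
            s = support(h)
            if s >= minsup:
                result[h] = s
                survivors.append(h)
        # Level k+1: unions of two large k-itemsets that differ in one item.
        k = len(current[0]) + 1
        current = list({a | b for a, b in combinations(survivors, 2)
                        if len(a | b) == k})
    return result


# Hypothetical data set (assumption): six groups over items a, b, c, d.
groups = [
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a", "b"},
    {"a", "c", "d"},
    {"b", "c"},
    {"a", "b", "c"},
]

found = large_itemsets(groups, minsup=0.5)
for h, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(h), round(s, 2))
```

The sketch relies on the fact that every subset of a large itemset is itself large, so candidates of size k+1 only need to be built from large itemsets of size k. In the toy data set, item d appears in only two groups out of six and is therefore discarded at the first level, while {a, b, c} appears in three groups and is the largest itemset reaching the threshold.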
2.3. Related Work on Analysis of Twitter Messages for Studying Mobility
To complete the background of the paper, we consider its secondary contribution, i.e., how social media can push data-driven approaches to the study of the mobility of people. Notice that we focus on works based on data gathered from Twitter, which are numerous: this is due to the fact that the Twitter API does not pose any obstacle to gathering data, while other social media usually do.
Many studies have been published concerning the analysis of tweets, particularly for studying mobility. Researchers in many human sciences now consider micro-blogs (such as Twitter) a precious source of information for their studies. In fact, as stated in [20], people blog to provide a record of their life to share with followers. Obviously, it is necessary to be aware of doubts concerning the representativeness of data obtained by analyzing traces of Twitter users [21], because they represent only Twitter users, who are a subset of social media users and, in turn, of all moving people.
Nevertheless, traces of Twitter users can complement other sources of information, letting a possibly unexpected perspective on the studied phenomena emerge. Furthermore, traces of Twitter users are often the only source of information that describes the paths of moving people, as in the context of tourism [22, 23]. Many researchers are interested in exploiting social-media sources to study how people move. For example, in [24] the authors studied how cities influence mobility, by defining statistical metrics of centrality that they applied to data produced by Twitter users. In [25], the authors analyzed digital footprints, such as data produced by phone networks, by cross-analyzing them with geo-referenced photos posted on micro-blogs, to study the movement of tourists during their visit to Rome (Italy).
On a world-wide scale, geo-located posts by Twitter users can also be used to study global mobility patterns. In [26], statistical approaches were proposed to study country-to-country patterns, by considering several aspects, such as country-to-country networks, temporal patterns of mobility and so on. Furthermore, mobility patterns could be very useful to analyze migrations as well. In [27], the authors addressed the problem of understanding which countries migrants to the EU actually come from. They adopted a clustering algorithm with the goal of discovering the provenance of migrants, in spite of the (possibly false) countries they declared they came from.
The evolution of a city into a smart city can be significantly fostered by analyzing Big Data (in general) and traces of Twitter users (more specifically). In [28], the authors tried to relate the choices made by tourists in the pre-trip phase with the experiences they shared on social media (post-trip). In [29], the authors tried to create value by analyzing social media: by applying kernel density estimation and latent Dirichlet allocation, they showed that social media can provide a platform to develop smart services for urban tourism. Again through social media analysis, in [30] the authors explained that Big-Data analysis can help improve decision-making processes and create marketing strategies with more personalized offers. Additionally, in [31], the authors explored how the use of intelligent tourism technologies, such as travel-related websites, social media and smartphones, in travel planning can improve traveler satisfaction. Micro-blogs can also be effective in influencing tourists when they form their perceptions of the chosen destination; in [32], the authors tried to demonstrate, using Sina Weibo, a Chinese micro-blogging site, how the choice of a tourist destination is influenced by information published on the social network.
Traces generated by Twitter users have also been used to analyze flows of people. For example, in [33] the authors exploited traces of Twitter users to study flows within a city. They relied on a topographic representation of flows, and applied clustering techniques to identify the most active places, i.e., those where Twitter users mostly posted tweets. However, they did not exploit a topological representation and did not use itemset mining. At a regional scale, traces of Twitter users were used to analyze traffic [34], in order to study critical areas and routes on a spatio-temporal basis, i.e., correlating traffic jams, routes and time. In that work, the authors adopted clustering techniques and topographic representations.
An approach closely related to ours was proposed in [35]. The idea of the authors was to study the mobility of individuals by clustering them instead of locations, in order to discover moving patterns. They adopted a method that clusters individuals having similar visitation rates for each location. Our approach is similar, since we try to associate places and transfers that frequently appear together in traces of Twitter users; however, the adoption of itemset mining provides a different perspective.
Urban characterization was addressed in [36]. The authors partitioned the space by adopting a clustering method that exploits the density of tweets posted in the different areas. Then, numerical features based on the number of tweets in the different areas and on the number of moving users were computed and evaluated over time. That work can be considered complementary to ours, although it focuses on similar themes.
We also want to highlight that many approaches can be adopted for studying traces of Twitter users. One choice could be to develop automatic clustering techniques. However, specific techniques must be developed in this respect, because traces are sequences of visited places. In [37, 38, 39], a clustering technique for traces of Twitter users (or trips) is proposed: the idea is to evaluate different similarity metrics between trips that were previously geo-partitioned on the basis of categorical coding systems (such as the ZIP code) of the area containing the coordinates with which messages are geo-coded. Then, a multi-level fuzzy-clustering algorithm is applied to discover clusters of the most popular trips. In those works, the reticular view of lived space is not considered, whereas that is the goal of this paper.
Itemset mining has been used in other works to analyze micro-blogs. For example, in [40], the authors used this technique to extract patterns from messages, in order to use frequent itemsets for query expansion when users formulate queries to find posts of interest. Similarly, in [41] the authors adopted itemset mining for opinion mining and sentiment analysis of micro-blogs. Specifically, they proposed an opinion-descriptive model, which is the basis for an opinion mining method. In this way, posts in micro-blogs are classified on the basis of their sentiment.
To the best of our knowledge, itemset mining has rarely been used to study how people experience space. We found only [42], where the authors adopted itemset mining to address the problem of geo-social co-location. Suppose that people who are found in the same place in a given time slice are described by features such as the university where they study, the course they attend and so on; the itemset-mining approach is used to find the most frequent associations of personal features that characterize people who are frequently in the same places. For example, if the data describe university students, features are the names of the universities; thus, the results are the sets of university names whose students are often found in the same places. In our work, we adopt the opposite approach: we want to obtain places that are often visited in the same trip by many tourists.
The work in [43] addressed the problem of analyzing micro-blogs for business applications, namely, context-aware service profiling. To do that, the authors introduced the notion of strong generalized flipping itemset, which is able to highlight the existence of outliers in terms of the polarity of a relationship between concepts extracted from messages. The idea is that, given a generalized itemset (obtained, given a taxonomy, at a level higher than the leaf level) and its polarity (positive, negative or neutral), if one or more of its descendant itemsets (i.e., extracted at a lower level) show a different polarity, then an anomaly is occurring. Again, they did not work with the geo-location of messages to study the mobility of users.
Another study that applied itemset mining to micro-blogs was [44]. Specifically, the authors addressed the problem named WTF (who to follow): the idea is to recommend other users to follow, on the basis of the topical users (popular users such as singers or politicians) and the semantic categories that topical users belong to. The authors exploited itemset mining to profile users on the basis of the semantic categories associated with the topical users they follow. The work is quite interesting, but they considered neither posted messages nor geo-location.