Article

Heterogeneous Network Embedding Based on Random Walks of Type and Inner Constraint

1 Research Center of Marine Sciences, Hebei Normal University of Science and Technology, Qinhuangdao 066004, China
2 School of Public Health and Health Sciences, Tianjin University of Traditional Chinese Medicine, Tianjin 301600, China
3 College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
* Authors to whom correspondence should be addressed.
Submission received: 11 June 2022 / Revised: 20 July 2022 / Accepted: 22 July 2022 / Published: 27 July 2022
(This article belongs to the Special Issue Engineering Calculation and Data Modeling)

Abstract

In heterogeneous networks, random walks based on meta-paths require prior knowledge and lack flexibility. On the other hand, random walks based on non-meta-paths only consider the number of node types, not the influence of the schema and the topology between node types in real networks. To solve these problems, this paper proposes a novel model, HNE-RWTIC (Heterogeneous Network Embedding Based on Random Walks of Type and Inner Constraint). Firstly, to realize flexible walks, we design a Type strategy: a node type selection strategy based on the co-occurrence probability of node types. Secondly, to achieve uniform node sampling, we design an Inner strategy: a node selection strategy based on the adjacency relationships between nodes. Together, the Type and Inner strategies can reproduce meta-path-guided random walks, make the walks flexible, and sample node types and nodes uniformly in proportion. Thirdly, based on the above strategies, a transition probability model is constructed; we then obtain the node embeddings from the random walks via Skip-Gram. Finally, we conducted a thorough empirical evaluation of our method on classification and clustering tasks over three real heterogeneous networks. Experimental results show that HNE-RWTIC outperforms state-of-the-art approaches. In the classification task, on DBLP, AMiner-Top, and Yelp, HNE-RWTIC achieves the highest Micro-F1 and Macro-F1 values: 2.25% and 2.43%, 0.85% and 0.99%, and 3.77% and 5.02% higher, respectively, than those of five other algorithms. In the clustering task, on DBLP, AMiner-Top, and Yelp, the NMI value is increased by up to 19.12%, 6.91%, and 0.04%, respectively.

1. Introduction

Many systems in the real world can be modeled as Heterogeneous Information Networks (HINs) [1], such as literature and technology networks, social media networks, and medical information networks. Among them, the DBLP literature and technology network (DBLP network for short) is a classic heterogeneous information network (HIN), as shown in Figure 1a. Compared with homogeneous networks [2], HINs contain multiple types of entities and relationships and thus richer semantic information, and they have been widely used in various fields.
In the big data era, as network scale grows, traditional representations (such as the adjacency matrix) suffer from high dimensionality, sparsity, and coupling, and have become a bottleneck for network analysis and mining tasks [3]. Fortunately, inspired by the NLP model Word2Vec [4], the first network embedding (network representation learning) model, DeepWalk [5], was proposed in 2014 and effectively alleviates these problems. Therefore, as the basis for large-scale network analysis tasks, network embedding has attracted extensive attention from industry and academia.
Because HINs carry more semantic and structural information, homogeneous network embedding methods either cannot be applied to them directly or become far more complex. In contrast, heterogeneous network embedding [6] can preserve the key structural and semantic attributes and mine latent semantic information, which is of great significance for network application tasks such as classification [7,8], clustering [9,10], and link prediction [11,12]. Therefore, heterogeneous network embedding has become a current research hotspot.
At present, heterogeneous network embedding based on random walks is a classic and widely used approach, mostly relying on meta-path-guided walks. For example, Metapath2Vec [1] manually selects “A-P-A” or “A-P-C-P-A” as the meta-path to guide the random walks in literature and technology networks. ESim [13] considers the information of multiple meta-paths and attempts to learn the optimal weight combination to guide the walks. HIN2Vec [14] uses the different types of relations between nodes to combine meta-paths shorter than a certain length to guide the walks. Meta-paths embody the semantics of HINs; the meta-path information of DBLP is shown in Figure 1b. Here, the semantics of “A-P-A” is co-authorship, while “A-P-C-P-A” denotes two authors publishing papers at the same conference, and so on. There are many meta-paths in HINs. Different meta-paths capture different semantic information, but their number grows exponentially with length. Selecting meta-paths either requires domain experts or the optimization of a set of predefined meta-paths, which entails trying many configurations. Moreover, a fixed meta-path limits the flexibility of the random walks. All of this poses significant challenges for the practical application of meta-path-based random walks, so more flexible random walk methods for HINs are urgently needed.
To address some of the problems of meta-paths, JUST (JUmp & STay) was proposed by Hussein et al. [15]. It is the first non-meta-path random walk method for HINs, applying a Jump/Stay (jump to another type/stay on the current node type) strategy when selecting the next node. In Figure 2, the current node type is P, and l = 3 means the walk has stayed in P for three consecutive steps. In this case, the probability of staying in P for the next node is α³; if α³ is greater than the given threshold, the walk remains in P. Otherwise, the Jump strategy is performed. JUST maintains a queue Qhist of size m to memorize the up-to-m most recently visited types; Qhist = {P, A} indicates that, with m = 2, the recently visited types are P and A. JUST then randomly samples one type from {T, V} as the target type, from which the next node is sampled.
Figure 2 reveals some problems in JUST. (1) For the Stay strategy, 0/1 represents the cases in which the walk cannot/must stay in the current node type. In the other cases, αl limits the probability of staying in the current node type, but the network schema is not considered. For example, in DBLP, Figure 1c shows that only P can stay in its own type, while A, T, and C cannot. (2) For the Jump strategy, the node types not in Qhist are given priority, but whether those types actually satisfy the jump requirement is not checked. If the current walk sequence is “···-P-P-P-A-?···” and Qhist = {P, A}, then the next node type would be T or C. In reality, however, A only has edges with P, so the walk can only jump to P, and prioritizing types makes no sense here. (3) JUST only specifies the selection of the node type, not the selection of a node within that type. When these problems arise, JUST shows great limitations.
In summary, in view of the difficulty of choosing meta-paths, the poor flexibility of meta-paths, and the above problems in JUST, we design a novel random walk strategy based on Type and Inner constraints, which considers both the node type and the adjacency relationships between nodes.
The main contributions of this paper can be summarized as follows.
  • We propose a novel model, HNE-RWTIC. It performs random walks based on Type and Inner constraints and adopts Skip-Gram to learn dense, low-dimensional embeddings in HINs.
  • We propose a novel random walk strategy based on Type and Inner constraints. In the Type strategy, node type selection considers the co-occurrence probability of node types. In the Inner strategy, node selection considers the adjacency relationships between nodes. The strategy realizes flexible node type selection in HINs (see Section 5) and uniform proportional sampling of types and nodes (see Section 6.2.4).
  • We build a transition probability model based on the Type and Inner strategies. In the model, parameters control the selection of node types and nodes. Several properties are then derived that relate the parameter values to the selection of node types and nodes.
  • Using DBLP, AMiner-Top, and Yelp, we conduct experiments on classification and clustering tasks. Comparison with five classic network embedding algorithms verifies the correctness and effectiveness of HNE-RWTIC.
The remainder of this paper is organized as follows. Section 2 introduces the related work. Section 3 gives the preliminary knowledge and the problem definition. The random walk strategy, the transition probability model, and the HNE-RWTIC algorithm are described in Section 4. We present the properties and analysis of the transition probability model in Section 5 and the experimental results and analysis in Section 6. The last section concludes the paper and outlines future work.

2. Related Work

With the development of deep learning technology, network embedding has been widely applied in various fields. At present, studies on homogeneous networks are extensive and in-depth, while studies on heterogeneous networks are fewer. However, heterogeneous network embedding is developing rapidly, and some research results have been achieved. Existing methods fall into three main categories: methods based on decomposition [16,17,18,19,20], on deep learning [21,22,23,24,25], and on random walks [1,9,13,14,15,26,27,28,29].
The first set of methods usually decomposes a heterogeneous network into multiple simple networks, learns the embedding of each, and integrates them. For example, EOE [16] transforms the academic network into a word co-occurrence network and an author co-occurrence network and learns the vector representations of node pairs within and between the subnets. HERec [19] transforms a heterogeneous network into multiple homogeneous networks via meta-path extraction and then fuses their vector representations through fusion functions. The advantage of these methods is that existing homogeneous network embedding methods can be reused directly after decomposition. The disadvantage is that the quality of the decomposition and fusion steps directly affects the vector representation of the original network.
The second set of methods uses deep neural network models to obtain the embedding. For example, HeGAN [21] uses generative adversarial networks to distinguish nodes connected through different relations and a generalized generator to sample potential nodes, thereby obtaining node representations. ActiveHNE [22] is a semi-supervised embedding method based on graph convolutional neural networks that adopts different active selection strategies according to uncertainty to make full use of supervisory information. MPDRL [24] uses reinforcement learning to find semantically rich meta-paths of different lengths based on task accuracy and performs node embedding based on the resulting meta-path set. The advantage of these methods is that richer semantic and structural information can be learned. The disadvantage is that, as the number of network layers grows, the number of parameters may run into the millions, which slows training and greatly increases hardware requirements.
The third set of methods usually combines random walks with Skip-Gram for embedding. According to the walk strategy, they can be divided into two categories. (1) Random walks based on meta-paths, including Metapath2Vec [1], ESim [13], HIN2Vec [14], HeteSpaceyWalk [26], etc. HeteSpaceyWalk formalizes meta-path-guided random walks as a higher-order Markov chain process and proposes a heterogeneous personalized spacey random walk. These techniques are designed to optimize meta-paths, but they still require a specific meta-path, and experimental results show that the quality of the node embedding is sensitive to the chosen meta-path. (2) Random walks based on non-meta-paths. For example, JUST [15], which guides random walks with a jump-and-stay strategy, removes the constraint of predefining meta-paths and can balance the sampling distribution of different node types during the walks. TANE-RWCN [9] is based on a novel random walk strategy combining node degree, path length, the user's preference for topics, and high-order proximity, and uses the set pair connection number to improve the accuracy of the vector representation. The advantage of these methods is that the generated walk sequences can be treated as sentences in NLP, so existing NLP models can be used for learning. The disadvantage is that the quality of the random walks affects the performance of the embedding.
As a classical graph analysis model, the random walk is often used to describe the reachability between nodes in networks and is widely used in network embedding. Most existing studies on random-walk-based heterogeneous network embedding rely on meta-paths. However, meta-path selection remains challenging in practice, requiring sufficient domain knowledge, and the meta-path also limits the flexibility of the walk. Therefore, there is an urgent need for random walk methods based on non-meta-paths. Existing non-meta-path methods, however, either have high complexity or only consider the selection of the node type and not how to select nodes within the type. To solve these problems, and inspired by the idea of node type selection in JUST and the priority search strategy in Node2Vec [7], we propose a novel random walk strategy that balances the selection of node types and nodes.

3. Preliminary Knowledge and Problem Definition

In this section, we first define the important terms, followed by a formal definition of the heterogeneous network embedding problem. For ease of presentation, a list of notations is given, as shown in Table 1.
Definition 1.
A heterogeneous information network (HIN) is defined as G = (V, E, A, R), where V is the node set, E is the edge set, A = {A1, A2, …, An, …, AN} (N ≤ |V|) is the node type set, and R is the edge type set. Each node vi ∈ V belongs to a specific node type, denoted by φ(vi) = An ∈ A (1 ≤ n ≤ N), where N = |A| is the number of node types. Each edge ej = (vi, vj) ∈ E belongs to a specific relation type, denoted by ψ(ej) ∈ R, where M = |R| is the number of edge types. It is generally held that heterogeneous information networks satisfy M > 1 or N > 1.
Definition 2.
The network schema [30] is denoted as TG = (A, R). It is a meta-template for a heterogeneous network G = (V, E, A, R) with the object type mapping φ: V → A and the link type mapping ψ: E → R.
Definition 3.
Given a heterogeneous network G = (V, E, A, R), heterogeneous network embedding is to learn a mapping function f: V → X ∈ R^{|V|×d}, with d ≪ |V|, so as to obtain a vector representation of the nodes that captures the structural and semantic relationships between them.
The purpose of this paper is to study an embedding method with random walks based on non-meta-paths in HINs. Firstly, the random walk strategy is determined and described as a transition probability model. Secondly, the walk sequences W are obtained through random walks. Finally, W is combined with the Skip-Gram model to learn the node embeddings.
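As a minimal sketch of this pipeline (our illustrative reading, not the authors' released code): generate_walks stands for the walk strategy developed in Section 4, and gensim's Word2Vec in skip-gram mode (sg = 1) plays the role of the Skip-Gram learner.

# Minimal sketch of the embedding pipeline: walk sequences W -> Skip-Gram.
# `generate_walks` is a placeholder for the Type/Inner strategy of Section 4;
# gensim's Word2Vec with sg=1 is an illustrative stand-in for Skip-Gram.
import networkx as nx
from gensim.models import Word2Vec

def embed(G: nx.Graph, generate_walks, r=10, L=100, d=128, window=5):
    # W: r walks of length L per start node, each a list of node ids
    W = [[str(v) for v in w] for w in generate_walks(G, r, L)]
    model = Word2Vec(W, vector_size=d, window=window, sg=1,
                     min_count=0, workers=4)
    return {v: model.wv[str(v)] for v in G.nodes()}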

4. HNE-RWTIC Model

In this section, the random walks strategy and transition probability model are introduced in detail. Then, the algorithm HNE-RWTIC and its detailed description are given. Finally, the time complexity of the algorithm is analyzed.

4.1. Random Walks Strategy

Due to the characteristics of HINs and the problems existing in meta-paths and JUST, we design a random walk strategy based on Type and Inner constraints. The strategy is divided into three steps:
  • Node type partitioning. According to the network schema and the research purpose, node types are divided into an objective class and a non-objective class.
  • Type strategy. For node type selection, the co-occurrence probability of three consecutive node types in the walk sequence is considered: the previous node type, the current node type, and the next node type. The next node type is selected as the one with the largest probability value.
  • Inner strategy. For node selection, based on the adjacency relationship of three consecutive nodes, the probability values of backtracking, breadth, and depth are calculated, and the next node is selected according to the largest value.
Thus, node type partitioning solves JUST problem (2), which occurs when the preferred type does not meet the jump requirement; the Type strategy solves JUST problem (1), of being confined to the current node type; and the Inner strategy solves JUST problem (3), of not considering how to select the next node. The next step is to build the transition probability model based on this strategy.

4.2. Transition Probability Model

4.2.1. Node Type Partitioning

In this paper, the HINs are unsigned, and the number of node types must satisfy N ≥ 2. To better select the node type and the next node, node type partitioning is necessary in HINs.
Based on the network schema and the application, node types are divided into objective and non-objective classes. The objective class is the type of the entity under study, or the type connected to the most classes in the network, denoted O. The rest are non-objective classes, denoted Ō. Then, in Definition 1, the node type set can also be written as A = O ∪ Ō (O ∩ Ō = ∅), where O = {O1, O2, …, On1} and Ō = {Ō1, Ō2, …, Ōn2}, which are equivalent to O = {A1, A2, …, An} and Ō = {An+1, An+2, …, AN}. So n1 + n2 = N, with n1 = n and n2 = N − n. In this study, n1 ≥ 1 and n2 ≥ 1 are required.
In HINs, for every vi ∈ V, if φ(vi) ∈ O, the type of node vi is the objective class; otherwise, it is the non-objective class, denoted by φ(vi) ∈ Ō. During the random walks, the stay probability of the node type is given by Equation (1).
$$P_{\mathrm{Stay}} = \begin{cases} \alpha, & O \\ 1 - \alpha, & \bar{O} \end{cases} \quad (1)$$
In Equation (1), α ∈ [0, 1] is the probability that the walk stays in O, and 1 − α is the probability that it stays in Ō, as shown in Figure 3.

4.2.2. Transition Probability Model

Given a start node v0 and a walk length L, we carry out the random walks; vi−1 and vi are the (i − 1)-th and i-th nodes in the path. The transition probability of vi+1 is shown in Equation (2).
$$P(v_{i+1} \mid v_i, v_{i-1}) = P_{\mathrm{Type}}\big(\varphi(v_{i+1}) \mid \varphi(v_i), \varphi(v_{i-1})\big) \cdot P_{\mathrm{Inner}}(v_{i+1} \mid v_i, v_{i-1}), \quad (2)$$
where PType is the selection probability of the node type φ(vi+1) and PInner is the selection probability of the node vi+1; vi−1, vi, and vi+1 denote the previous, current, and next nodes, and φ(vi−1), φ(vi), and φ(vi+1) denote their types.
Note: PType, PInner, vi−1, vi, vi+1, φ(vi−1), φ(vi), and φ(vi+1) are used as shorthand below.
1. The probability of selecting the node type
In the random walks, we use the parameters α and k to control the transition probability between node types. Given a network G, when φ(vi) and φ(vi−1) are known, the probability of φ(vi+1) is given by Equation (3).
$$P_{\mathrm{Type}}\big(\varphi(v_{i+1}) \mid \varphi(v_i), \varphi(v_{i-1})\big) = \begin{cases} \alpha^3, & O\,O\,O \\ (1-\alpha)\alpha^2, & O\,O\,\bar{O} \ \text{or} \ \bar{O}\,O\,O \ \text{or} \ O\,\bar{O}\,O \\ k(1-\alpha)^2\alpha, & \bar{O}\,O\,\bar{O} \end{cases} \quad (3)$$
In Equation (3) and Figure 3, O O O is short for φ(vi−1) ∈ O, φ(vi) ∈ O, and φ(vi+1) ∈ O; the other cases are analogous. When O O O, the probability of φ(vi+1) is α³. The probability of φ(vi+1) ∈ Ō is α²(1 − α) when O O Ō; the probability of φ(vi+1) ∈ O is (1 − α)α² when Ō O O; the probability of φ(vi+1) ∈ O is α(1 − α)α when O Ō O; and the probability of φ(vi+1) ∈ Ō is (1 − α)α(1 − α) when Ō O Ō.
In Equation (3) and Figure 3, the probability of φ(vi+1) is divided into five cases. When φ(vi) ∈ O, there are four cases; otherwise, there is only one, because the HINs studied here contain no edges between non-objective types, so when φ(vi) ∈ Ō, φ(vi+1) can only be in O.
In Equation (3), when φ(vi−1) ∈ Ō and φ(vi+1) ∈ Ō, the parameter k adjusts the influence of Ō on the selection of φ(vi+1), as shown in Equation (4).
$$k = \begin{cases} 1, & N = 2 \\ k_1, & N > 2 \ \text{and} \ \varphi(v_{i-1}) = \varphi(v_{i+1}) \\ \frac{1}{k_1}, & N > 2 \ \text{and} \ \varphi(v_{i-1}) \neq \varphi(v_{i+1}) \end{cases} \quad (4)$$
When N = |A| = 2, there are only two types in the HIN, and k = 1. When N > 2, Ō contains multiple node types, so either φ(vi−1) = φ(vi+1) or φ(vi−1) ≠ φ(vi+1) may occur. When φ(vi−1) = φ(vi+1), let k = k1, k1 ∈ (0, +∞); otherwise, k = 1/k1. (A code sketch of both selection probabilities is given after Equation (6) below.)
2. The probability of selecting nodes
After determining the node type, we consider the adjacency relationship between vi+1, vi, and vi−1 and adopt the parameters h, p, and q to control backtracking, breadth, and depth. The transition probability of vi+1 is shown in Equation (5).
$$P_{\mathrm{Inner}}(v_{i+1} \mid v_i, v_{i-1}) = \begin{cases} \frac{h}{p}, & d(v_{i-1}, v_{i+1}) = 0 \\ 1, & d(v_{i-1}, v_{i+1}) = 1 \\ \frac{1}{q}, & d(v_{i-1}, v_{i+1}) = 2 \end{cases} \quad (5)$$
In Equation (5) and Figure 3, d(vi−1, vi+1) is the shortest distance from vi−1 to vi+1. d(vi−1, vi+1) = 2 means that vi+1 is a neighbor of vi but not of vi−1; the probability of vi+1 is then 1/q, where q ∈ (0, +∞) controls breadth versus depth: when q > 1, the walk prefers breadth-first behavior, otherwise depth-first. d(vi−1, vi+1) = 1 means that vi+1 is a common neighbor of vi and vi−1; the probability of vi+1 is then 1. d(vi−1, vi+1) = 0 means that vi+1 is vi−1; the probability of vi+1 is then h/p, where p ∈ (0, +∞) and h ∈ {0, 1} are return parameters that control the probability of returning to vi−1. When p > max(q, 1), the walk tends not to return to vi−1; when p < min(q, 1), it tends to return. h is set as shown in Equation (6): when O O O or Ō O Ō with φ(vi−1) = φ(vi+1), h = 1 and vi+1 may be vi−1; in all other cases, h = 0 and vi+1 cannot be vi−1.
$$h = \begin{cases} 1, & (O\,O\,O \ \text{or} \ \bar{O}\,O\,\bar{O}) \ \text{and} \ \varphi(v_{i-1}) = \varphi(v_{i+1}) \\ 0, & \text{otherwise} \end{cases} \quad (6)$$
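As noted above, both factors of Equation (2) follow directly from Equations (3)–(6). The Python helpers below are a minimal sketch under conventions of our own (node types as strings, the objective class O passed as a set obj, networkx adjacency); they return unnormalized scores and are not the authors' reference implementation.

# Hypothetical helpers for the transition probability model, Equations (3)-(6).
import networkx as nx

def p_type(prev_t, cur_t, next_t, obj, alpha, k1, N):
    """Equation (3): score of the next node type, with k from Equation (4)."""
    if prev_t in obj and cur_t in obj and next_t in obj:          # O O O
        return alpha ** 3
    if prev_t not in obj and cur_t in obj and next_t not in obj:  # Obar O Obar
        if N == 2:
            k = 1.0                                               # Equation (4)
        else:
            k = k1 if prev_t == next_t else 1.0 / k1
        return k * alpha * (1 - alpha) ** 2
    return alpha ** 2 * (1 - alpha)      # O O Obar, Obar O O, or O Obar O

def p_inner(G: nx.Graph, prev_v, x, p, q, h):
    """Equation (5): score of candidate x, a neighbor of the current node."""
    if x == prev_v:                  # d(v_{i-1}, v_{i+1}) = 0: backtrack, gated by h (Eq. (6))
        return h / p
    if G.has_edge(prev_v, x):        # d = 1: common neighbor of v_{i-1} and v_i
        return 1.0
    return 1.0 / q                   # d = 2: breadth vs. depth, controlled by q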

4.3. Algorithm Description of HNE-RWTIC

The description of HNE-RWTIC is given in Algorithm 1.
Algorithm 1 consists of five steps. Step 1 initializes the random walk path set W to be empty (line 1). Step 2 randomly shuffles all nodes in the network (line 2). Step 3 performs, for each starting node, r random walks of length L, yielding the final path set W (lines 3–16). Step 4 feeds W into the Skip-Gram model for training, which yields the vector representation of each node in the HIN (line 17). Step 5 returns Φ (line 18).
Algorithm 1 HNE-RWTIC
Input: G, probability parameter α, control parameter k1, return parameter p, search
            mode parameter q, walk length L, number of walks per start node r,
            node vector dimension d, window size window.
Output: The embedding of nodes Φ ∈ R^{|V|×d}.
1.         W = Ø
2.         V = shuffle(G.nodes())
3.         for i = 1 to r do
4.             for v in V do
5.                 path = [v]
6.                 for walk_iter = 2 to L do
7.                     vi = path[−1]
8.                     if walk_iter == 2 then
9.                         vi+1 = Random(vi.neighbors())
10.                    else
11.                        vi−1 = path[−2]
12.                        φ(vi+1) = next_node_type(G, L, vi, vi−1, α, k1)
13.                        vi+1 = next_node(G, q, p, vi, vi−1, φ(vi+1))
14.                    path = path ∪ vi+1
15.                W = W ∪ path
16.        end for
17.         Φ = SkipGram(W, d, window)
18.         return Φ
Algorithm 1 relies on two key subroutines.
The first is shown in Algorithm 2: according to φ(vi), φ(vi−1), and the parameters α and k, the next node type φ(vi+1) is selected (line 12 of Algorithm 1). Algorithm 2 is divided into three steps. Step 1 is shown in lines 1–5: if φ(vi) and φ(vi−1) are the same and α³ > α²(1 − α), then φ(vi+1) is φ(vi); otherwise, φ(vi+1) differs from φ(vi). Step 2 is shown in lines 6–14: when φ(vi) and φ(vi−1) are different, the maximum of α²(1 − α), k1α(1 − α)², and α(1 − α)²/k1 is chosen. When α²(1 − α) is the maximum, φ(vi+1) is φ(vi); when k1α(1 − α)² is the maximum, φ(vi+1) is φ(vi−1); when α(1 − α)²/k1 is the maximum, φ(vi+1) differs from both φ(vi) and φ(vi−1). Step 3 returns φ(vi+1) (line 15).
Algorithm 2 next_node_type
Input: G, walk length L, vi, vi−1, probability parameter α, control parameter k1.
Output: φ(vi+1).
1.          if φ(vi−1) == φ(vi) then
2.             if α³ > α²(1 − α) then
3.                φ(vi+1) = φ(vi)
4.             else
5.                φ(vi+1) ≠ φ(vi)
6.          else if φ(vi−1) != φ(vi) then
7.             val = max{α²(1 − α), k1α(1 − α)², α(1 − α)²/k1}
8.             if α²(1 − α) == val then
9.                φ(vi+1) = φ(vi)
10.            if α(1 − α)²/k1 == val then
11.               φ(vi+1) ≠ φ(vi−1) and φ(vi+1) ≠ φ(vi)
12.            if k1α(1 − α)² == val then
13.               φ(vi+1) = φ(vi−1)
14.          end if
15.          return φ(vi+1)
The second is shown in Algorithm 3: according to the parameters q, p, and h, the next node vi+1 is selected (line 13 of Algorithm 1). Algorithm 3 is divided into five steps. Step 1 computes h by Equation (6) and the scores p_in1 = h/p, p_in2 = 1, and p_in3 = 1/q (line 1). Step 2 takes the maximum of p_in1, p_in2, and p_in3 (line 2). Step 3 is shown in lines 3–9: when p_in1 is the largest, the candidate set for vi+1 is {vi−1}; when p_in2 is the largest, the candidates are obtained through a breadth-first search over the neighbors of vi; when p_in3 is the largest, through a depth-first search over the neighbors of vi. Step 4 randomly selects a node from the candidate set as vi+1 (line 10). Step 5 returns vi+1 (line 11).
Algorithm 3 next_node
Input: G, search mode parameter q, return parameter p, vi, vi−1, φ(vi+1).
Output: vi+1.
1.          Init h by Equation (6); p_in1 = h/p, p_in2 = 1, p_in3 = 1/q
2.          Max_val = max{p_in1, p_in2, p_in3}
3.          if p_in1 == Max_val then
4.             next_node_candidate = {vi−1}
5.          else if p_in2 == Max_val then
6.             next_node_candidate = BFS(vi.neighbors() of type φ(vi+1))
7.          else if p_in3 == Max_val then
8.             next_node_candidate = DFS(vi.neighbors() of type φ(vi+1))
9.          end if
10.         vi+1 = Random(next_node_candidate)
11.         return vi+1
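Combining the two subroutines, one complete walk can be sketched as follows. This is a hypothetical, runnable reading of Algorithms 1–3: every neighbor of the current node is scored by Equation (2) via the p_type and p_inner helpers of Section 4.2.2, ties are broken uniformly at random, and the first hop is a uniform neighbor choice. It assumes every node carries a "type" attribute and has at least one neighbor.

# A runnable sketch of one Type/Inner-constrained walk (reuses p_type/p_inner).
import random
import networkx as nx

def walk(G: nx.Graph, start, L, obj, alpha, k1, p, q):
    N = len({G.nodes[v]["type"] for v in G})                 # number of node types
    path = [start, random.choice(list(G.neighbors(start)))]  # uniform first hop
    while len(path) < L:
        prev_v, cur_v = path[-2], path[-1]
        tp, tc = G.nodes[prev_v]["type"], G.nodes[cur_v]["type"]
        scored = []
        for x in G.neighbors(cur_v):
            tx = G.nodes[x]["type"]
            # Equation (6): backtracking is allowed only when the current type
            # is objective and the previous and next types coincide
            h = 1 if (tc in obj and tp == tx) else 0
            s = p_type(tp, tc, tx, obj, alpha, k1, N) * p_inner(G, prev_v, x, p, q, h)
            scored.append((s, x))
        best = max(s for s, _ in scored)
        path.append(random.choice([x for s, x in scored if s == best]))
    return path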

4.4. Time Complexity Analysis

The key to HNE-RWTIC is using the Type and Inner strategies to generate the random walks. Assume the network has n nodes, the average node degree is d, each node is selected as a start node r times, and the walk length is L. When selecting φ(vi+1) and vi+1, examining the first-order neighbors costs O(d) and the second-order neighbors costs O(d²) per step, so generating all random walks costs O(r × L × n × d²). Since r, L, and d are constants, the time complexity of the strategy is O(n).

5. Properties and Analysis of the Transition Probability Model

5.1. Properties

After constructing the transition probability model based on the Type and Inner strategies, we obtain the following properties relating the parameters α, k, p, q, and h to the selection of φ(vi+1) and vi+1.
Property 1.
When φ(vi−1) ∈ O ∪ Ō and φ(vi) ∈ O, with the increase of α ∈ [0, 1], the selection of φ(vi+1) tends to change from Ō to O.
From Property 1 and Equation (3), ignoring the influence of parameter k, the selection of φ(vi+1) is governed by α, as shown in Figure 4a.
  • For φ(vi−1) ∈ O, when α < 0.5, φ(vi+1) tends to jump to Ō; when α > 0.5, φ(vi+1) tends to stay in O.
  • For φ(vi−1) ∈ Ō, when α < 0.5, φ(vi+1) tends to jump to Ō; when α > 0.5, φ(vi+1) tends to stay in O.
  • When α = 0.5, φ(vi+1) is selected randomly.
Property 2.
When φ(vi−1)φ(vi)φ(vi+1) is Ō O Ō and N > 2, with the increase of k1 ∈ (0, +∞), φ(vi+1) and φ(vi−1) tend to change from different to the same.
From Property 2 and Equation (4), the relationship between φ(vi+1) and φ(vi−1) is controlled by k, as shown in Figure 4b.
  • When 0 < k1 < 1, φ(vi+1) does not tend to equal φ(vi−1), that is φ(vi+1) ≠ φ(vi−1).
  • When k1 > 1, φ(vi+1) tends to equal φ(vi−1), that is φ(vi+1) = φ(vi−1).
  • When k1 = 1, φ(vi+1) is randomly selected.
Property 3.
With the increase of q ∈ (0, +∞), the selection of vi+1 tends to change from depth to breadth.
Property 4.
With the increase of p ∈ (0, +∞), the selection of vi+1 tends to change from backtracking to non-backtracking.
According to Properties 3 and 4, and Equation (5), ignoring the influence of h, we consider the adjacency relationship of vi−1, vi, and vi+1. Parameter q controls the preference of breadth-first or depth-first; p controls the backtracking. The influence of p and q on vi+1 is shown in Figure 4c,d.
  • When 0 < q < 1, vi+1 tends to depth-first. When q > 1, vi+1 tends to breadth-first.
  • When p > max(q, 1), vi+1 does not tend to return vi−1. When p < min(q, 1), vi+1 tends to return vi−1.
  • When p = q = 1, vi+1 is randomly selected.
Property 5.
When d(vi−1, vi+1) = 0, the parameter h controls whether the walk can return to vi−1: when h = 0, vi+1 cannot be vi−1; when h = 1, vi+1 may be vi−1.

5.2. Analysis

A meta-path is usually used to guide random walks in HINs; for example, in DBLP, “A-P-A” or “A-P-C-P-A” is generally chosen as the meta-path in experiments. The meta-path essentially considers only node types. Our node type selection strategy is described by Equations (3) and (4), and according to Properties 1 and 2, the strategy can realize random walks following a specified meta-path as well as other paths. In this section, DBLP is taken as an example for a comparative analysis with the meta-path “A-P-C-P-A”.
In DBLP, the types in “A-P-C-P-A” are A, P, and C. According to the network schema, O = {P} and Ō = {A, C}. For Equation (4), N = 3 in this case, so k = k1 or k = 1/k1.
According to “A-P-C-P-A”, the starting node type of the random walks can only be A. When k1 < 1, by Property 2 and Equation (4), k1 < 1/k1, so the walk prefers the pattern Ō O Ō with φ(vi−1) ≠ φ(vi+1). Since Ō = {A, C}, this gives Ō O Ō = A P C. Now φ(vi−1) = P and φ(vi) = C; according to the network schema (Figure 1c), φ(vi+1) = P, and the sequence A P C P is obtained. Then φ(vi−1) = C and φ(vi) = P, and by Property 2 and Equation (4), φ(vi+1) = A. In summary, we obtain “A-P-C-P-A”.
Similarly, other meta-paths can be realized by the Type and Inner strategies; this makes the strategy more flexible than a fixed meta-path.
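This derivation can be replayed on the runnable sketch from Section 4. On a hypothetical toy graph following the DBLP schema (all node names here are invented for illustration), k1 < 1 with the other parameters in the ranges above reproduces the type sequence A-P-C-P-A:

# Toy DBLP-style graph: two authors, two papers, one conference.
G = nx.Graph()
G.add_nodes_from([("a1", {"type": "A"}), ("a2", {"type": "A"}),
                  ("p1", {"type": "P"}), ("p2", {"type": "P"}),
                  ("c1", {"type": "C"})])
G.add_edges_from([("a1", "p1"), ("a2", "p2"), ("p1", "c1"), ("p2", "c1")])

print(walk(G, "a1", L=5, obj={"P"}, alpha=0.4, k1=0.5, p=4, q=0.5))
# ['a1', 'p1', 'c1', 'p2', 'a2']  -> type sequence A-P-C-P-A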

6. Experiments

6.1. Experimental Setup

In the experiments, three real HIN datasets are used: DBLP [4], AMiner-Top [27], and Yelp [20]. A detailed description of the datasets is given in Table 2.
Five classic network embedding algorithms are selected for comparison: Node2Vec [7], HIN2Vec [14], Metapath2Vec [1], JUST [15], and HeGAN [21]. For these algorithms, the general parameters are kept consistent with those of DeepWalk: each node is used as a start node r = 10 times, the walk length is L = 100, and the node vector dimension is d = 128.
The unique parameters of the five algorithms are set as follows. In Node2Vec, the bias parameters are p = 1 and q = 1. In HIN2Vec, the meta-path length ranges from 1 to 4. In Metapath2Vec, the DBLP and AMiner-Top networks use the meta-path “A-P-C-P-A” suggested by the authors, and Yelp uses “B-U-B”. In JUST, the stay probability is set to α ∈ [0.2, 0.5], and the number of recently visited node types recorded is m = 1. In HeGAN, the number of iterations is epoch = 20, the discriminator and generator are trained nD = 15 and nG = 5 times, and the Gaussian variance is σ² = 1.

6.2. Experimental Results and Analysis

In this section, we compare HNE-RWTIC with the five classic algorithms on the three HINs in classification and clustering tasks. Then, a parameter sensitivity analysis of HNE-RWTIC is presented. Finally, a sampling uniformity experiment is carried out on DBLP.

6.2.1. Classification

The goal of classification is to predict the most likely label of a node from the labeled nodes. In the experiments, we divided the datasets into training and test sets, with the training set taking 100%, 80%, 60%, 40%, or 20% of the dataset and the rest used for testing. We trained a one-vs-rest logistic regression classifier, a recognized node classification method in heterogeneous network embedding, to predict the labels of the test nodes, and compared the predicted results with the true labels. The 4 categories of authors in DBLP and the 3 categories of businesses in Yelp are classified. AMiner-Top has 8 categories of authors, but because some categories contain too few labeled authors, we chose the 3 categories with the most labels. We repeated the experiment 10 times and report the mean Micro-F1 and Macro-F1 in Table 3. Bold indicates the maximum value and bold & italics the second-largest value.
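A minimal sketch of this evaluation protocol, assuming the learned embeddings and ground-truth labels are available as plain dictionaries (emb: node → vector, labels: node → class; both names are ours):

# Hypothetical classification protocol: one-vs-rest logistic regression on the
# embeddings, scored by Micro-F1 and Macro-F1 on the held-out split.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def classify(emb, labels, train_frac=0.8, seed=0):
    nodes = sorted(labels)
    X = np.array([emb[v] for v in nodes])
    y = np.array([labels[v] for v in nodes])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_frac, random_state=seed, stratify=y)
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
    y_hat = clf.predict(X_te)
    return (f1_score(y_te, y_hat, average="micro"),
            f1_score(y_te, y_hat, average="macro"))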
In Table 3, the results show that:
  • Compared with the other five classic algorithms, HNE-RWTIC has better classification results in the three networks.
  • With the increase of the proportion of the training set, the values of Micro-F1 and Macro-F1 of each algorithm increased significantly. When the training set is 100%, the result is the best.
  • With the 80% training set, HNE-RWTIC performed better than all baseline methods. In DBLP, AMiner-Top, and Yelp, its Micro-F1 and Macro-F1 values are the highest: 2.25% and 2.43%, 0.85% and 0.99%, and 3.77% and 5.02% higher than those of the other algorithms, respectively.
  • Compared with DBLP and AMiner-Top, the Micro-F1 and Macro-F1 value of HNE-RWTIC in Yelp improved more.
JUST and the meta-path-based methods only consider the node type and ignore the balance of the node distribution. Unbalanced node distribution matters less when the network has many node types and edge types, because rich semantic relationships can compensate for it. However, Yelp contains only two node types and one edge type, so the quality of the walk sequences is seriously affected. A detailed analysis is given with the parameters p and q in Section 6.2.3.

6.2.2. Clustering

The goal of clustering is to aggregate similar nodes into the same community. In the experiments, K-means was used to cluster the nodes, and NMI was used to evaluate the results. Because the networks are sparse and the communities are large, the NMI values are very small, which makes it difficult to compare approaches. Therefore, following Metapath2Vec, we only selected the nodes of the two largest communities to calculate NMI for DBLP and Yelp, and the three largest communities for AMiner-Top; their community proportions are 53.2%, 82.01%, and 81.09%, respectively. We repeated the experiment and report the average NMI in Table 4. Bold indicates the maximum value and bold & italics the second-largest value.
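A minimal sketch of this protocol, with the same hypothetical emb and labels dictionaries as above restricted to the nodes of the selected communities:

# Hypothetical clustering protocol: K-means on the embeddings, scored by NMI.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def cluster_nmi(emb, labels, n_clusters, seed=0):
    nodes = sorted(labels)
    X = np.array([emb[v] for v in nodes])
    y_true = [labels[v] for v in nodes]
    y_pred = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X)
    return normalized_mutual_info_score(y_true, y_pred)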
Table 4 shows that HNE-RWTIC has the best clustering results on the three networks: its NMI values exceed those of the other algorithms by up to 19.12%, 6.91%, and 0.04% in DBLP, AMiner-Top, and Yelp, respectively.
Interestingly, the NMI increment of HNE-RWTIC decreases from DBLP to AMiner-Top to Yelp. The network analysis shows that clustering is better when the structural information in the network is more detailed and the connections between nodes are closer. For example, DBLP has four node types (P, C, T, and A) and four edge types (P-P, P-A, P-T, and P-C); its NMI increment is the largest, 19.12%. AMiner-Top has three node types (P, C, and A) and three edge types (P-P, P-A, and P-C); its NMI increment is 6.91%. Yelp has only two node types (B and U) and one edge type (B-U); its NMI increment is the lowest, 0.04%.

6.2.3. Parametric Sensitivity Analysis

Using the classification and clustering tasks on the three datasets, Micro-F1, Macro-F1, and NMI were used to analyze the sensitivity of the six main parameters of HNE-RWTIC: α, k1, p, q, L, and d. The results are shown in Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10. The ordinates represent the Micro-F1 and Macro-F1 values for classification and the NMI values for clustering.
1. Probability parameter α
Experimental results for α are shown in Figure 5. The abscissa represents the range of α, α ∈ [0.1, 0.9], with a step size of 0.1. In Yelp, when α > 0.5 the walk tends to stay in the current node type, but the real network has no B-B or U-U edges, so α ∈ [0.1, 0.5] is used.
Figure 5 shows that when α ∈ [0.1, 0.5], both classification and clustering give good results. In DBLP, AMiner-Top, and Yelp, when α is 0.4, 0.4, and 0.2, the classification results are the best, with Micro-F1 and Macro-F1 values of 0.8863 and 0.8864, 0.8958 and 0.8855, and 0.7437 and 0.7125, respectively. When α is 0.3, 0.2, and 0.4, the clustering results are the best, with NMI values of 0.7752, 0.4436, and 0.0015, respectively.
Smaller values of α give better results: a smaller α makes the walk favor different node types, that is, heterogeneous edges. Conversely, a larger α produces many homogeneous edges, which degrades the embedding quality. The results show that the balance of edge types can be tuned by adjusting α.
2. Control parameter k1
Experimental results of k1 in DBLP and AMiner-Top are shown in Figure 6. The abscissa represents the range of log values for parameter k1, log2(k1) ∈ [−3, 3], and the step size is 1.
Figure 6 shows that when log2(k1) ∈ [−3, 0], both classification and clustering have good results. In DBLP and AMiner-Top, when k1 = 0.5, the classification results are the best, and the Micro-F1 and Macro-F1 values are 0.8863 and 0.8865, 0.8962 and 0.8859, respectively. When k1 is 0.25 and 0.5, the clustering results of nodes are the best, and the NMI values are 0.7826 and 0.4461.
Since Yelp has only two types of nodes, the value of k1 is always 1. So parameter analysis of k1 is not carried out.
Smaller values of k1 give better results: a smaller k1 makes the node types tend to differ, producing more kinds of heterogeneous edges. Conversely, a larger k1 leaves too few heterogeneous edge types, which degrades the embedding quality. The results show that the balance of edge-type numbers can be achieved by adjusting k1.
3. Return parameter p
Experimental results of p are shown in Figure 7. The abscissa represents the range of log values for p, log2(p) ∈ [−3, 3], and the step size is 1.
Figure 7 shows that when log2(p) ∈ [0, 2], in DBLP and Yelp, both classification and clustering have good results. When log2(p) ∈ [−2, 0], in AMiner-Top, both classification and clustering have good results. In DBLP, AMiner-Top and Yelp, when p is 0.25, 4, and 2, the classification effect is the best, and the Micro-F1 and Macro-F1 values are 0.9031 and 0.9030, 0.8960 and 0.8856, 0.7479 and 0.7179, respectively. When p is 0.5, 4, and 2, the clustering effect is the best, and the NMI value is 0.8091, 0.4446, and 0.0017, respectively. A larger p indicates that the next node in the random walks tends to migrate to other nodes, that is, to visit more nodes. On the contrary, when p is smaller, the next node tends to return vi−1, which will affect the distribution of nodes. The results show the balance of node distribution can be achieved by adjusting p.
4. Controlling search mode parameter q
The experimental results of q are shown in Figure 8. The abscissa represents the range of log values for q, log2(q) ∈ [−3, 3], and the step size is 1.
Figure 8 shows that classification results are good when log2(q) ∈ [−2, 0]. For clustering, DBLP and Yelp give good results when log2(q) ∈ [0, 2], as does AMiner-Top. In DBLP, AMiner-Top, and Yelp, when q is 0.5, 0.125, and 1, the classification effect is the best, with Micro-F1 and Macro-F1 values of 0.9157 and 0.9159, 0.9002 and 0.8907, and 0.7479 and 0.7179, respectively. When q is 1, 0.125, and 1, the clustering effect is the best, with NMI values of 0.8091, 0.4689, and 0.0017, respectively.
When q ∈ [0.5, 1], the results of classification are better. When q ∈ [1, 4], the results of clustering are better. q > 1 indicates that nodes in the random walk tend to be breadth-first. q < 1 indicates nodes tend to be depth-first. The results show the distribution of nodes within the type can be balanced by adjusting q.
5. Walk length parameter L
Experimental results for L are shown in Figure 9. The abscissa represents the values of L: 40, 60, 80, 100, 150, and 200.
Figure 9 shows that when the walk length is small, the classification results are poor; they improve as L increases but show a downward trend after L > 100. The clustering results are generally better for L ∈ [80, 100] and keep improving with L in Yelp. In DBLP, AMiner-Top, and Yelp, when L is 100, 150, and 100, the classification results are the best, with Micro-F1 and Macro-F1 values of 0.9157 and 0.9159, 0.9015 and 0.8924, and 0.7479 and 0.7179, respectively. When L is 60, 100, and 200, the clustering results are the best, with NMI values of 0.8167, 0.4689, and 0.0019, respectively. To be consistent with the baseline algorithms, we set L = 100.
6. Dimension parameter d
Experimental results for d are shown in Figure 10. The abscissa represents the values of d: 32, 64, 128, 256, and 512.
Figure 10 shows that when d is small, performance on both tasks is relatively poor; it improves as d increases but levels off after d > 256. In DBLP, AMiner-Top, and Yelp, when d is 512, 512, and 256, the classification effect is the best, with Micro-F1 and Macro-F1 values of 0.9267 and 0.9267, 0.9019 and 0.8923, and 0.7589 and 0.7260, respectively. When d = 128, the clustering effect is the best, with NMI values of 0.8091, 0.4689, and 0.0017, respectively.
Through the sensitivity analysis of the above parameters, the parameter values are set as follows in this paper: L = 100, d = 128, α ∈ [0.1, 0.5], k1 ∈ [0.125, 1], p ∈ [1, 4] and q ∈ [0.5, 1] or p ∈ [0.5, 1], and q ∈ [1, 4].

6.2.4. Uniformity of Sampling

This section analyzes the sampling of node types and of nodes within the same type. Taking DBLP as an example, with α = 0.4, k1 = 0.5, p = 0.25, q = 0.5, L = 100, and r = 10, ten experiments were carried out, and the mean sampling counts of each node type and each node were analyzed.
1. Distribution of node types
The distribution of the four node types in DBLP is shown in Table 5. NUM1 is the number of nodes of types P, T, A, and C: 5237, 5915, 4479, and 18, respectively. Since the number of C nodes is much smaller than the others, non-uniform data processing is required: according to the ratio of the sampling count to the actual count for P, A, and T, the filling value for type C is set to 1671. NUM2 is the average number of samples of P, T, A, and C: 78,400, 314,161, 346,665, and 13,073, respectively.
To make the results more intuitive, we take Log2 for NUM1, NUM2, and NUM2/NUM1, and the comparison results are shown in Figure 11. The trend of Log2(NUM1) and Log2(NUM2) is the same in Figure 11a. In Figure 11b, the value of Log2(NUM2)/Log2(NUM1) is close to 1.5. In other words, the proportion of sampling quantity of four types in the network is the same as the actual quantity. Therefore, the strategy can achieve proportional and uniform sampling of node types.
2. Distribution of nodes of the same type
The sampling distribution of nodes of the same type in DBLP is shown in Figure 12. The abscissa in the figure is the node identification. The ordinate is the Log2 of the average sampling value of the node, and the minimum value is set to 6 for the comparison purpose. In Figure 12a, 90.95% of the sampling values of A-type nodes are distributed between 7 and 10. In Figure 12b, 84.79% of the sampling values of P-type nodes are in 10–12. In Figure 12c, 85.74% of the sampling values of T-type nodes are in 6–10. In Figure 12d, 100% of the sampling values of C-type nodes are in 14–17.
In summary, for A, P, T, and C, roughly 85% or more of the nodes of each type fall within a narrow sampling range. Therefore, the strategy achieves proportional and uniform sampling of nodes within each type.
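This check is straightforward to reproduce on any set of walks. A minimal sketch, assuming the walks are lists of node ids and node_type maps each node to its type (both names are ours):

# Hypothetical uniformity check: per-type Log2 sampling ratio, as in Figure 11b.
from collections import Counter
import math

def type_sampling_ratio(walks, node_type):
    actual = Counter(node_type.values())                       # NUM1 per type
    sampled = Counter(node_type[v] for w in walks for v in w)  # NUM2 per type
    # Roughly constant ratios indicate proportional, uniform type sampling.
    return {t: math.log2(sampled[t]) / math.log2(actual[t]) for t in actual}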

6.2.5. Discussion

To demonstrate the correctness and effectiveness of HNE-RWTIC, we conducted the above experiments on three real HINs. First, the six models were used to learn the vector representations of the nodes; then, a one-vs-rest logistic regression classifier and K-means were used to perform the classification and clustering tasks.
Among the baselines, Node2Vec is a homogeneous network embedding model based on random walks; the vector representations it learns therefore lose heterogeneous semantic information, which hurts classification and clustering. HIN2Vec and Metapath2Vec are models based on meta-paths; due to the limitations of meta-paths, their performance is unstable, and they only perform well on DBLP. JUST is a model based on non-meta-path random walks; although it fixes some shortcomings of meta-paths, it only considers the node type and ignores the uniform distribution of nodes and types, so its performance is mediocre. HeGAN is a model based on generative adversarial networks; it focuses on node sampling but only performs well on AMiner-Top.
Compared with the five classic algorithms, HNE-RWTIC comprehensively considers the co-occurrence probability of node types and the adjacency relationships between nodes; it not only realizes flexible walks among all kinds of nodes in HINs but also achieves uniform proportional sampling of types and nodes. Therefore, HNE-RWTIC achieves the best overall performance.

7. Conclusions

In this paper, HNE-RWTIC is proposed based on a random walk strategy with Type and Inner constraints. HNE-RWTIC realizes flexible random walks by adjusting parameters; it also realizes uniform sampling of nodes and node types, balancing homogeneous and heterogeneous edges as well as the node distribution across types. In addition, on three real networks, the experimental results show that the F1-score and NMI of HNE-RWTIC outperform five other state-of-the-art approaches on classification and clustering tasks. The next step will be to improve the strategy of type and node constraints in more complex HINs and to combine dynamic networks with the ideas of type and node constraints. Another important task ahead is to use heterogeneous graph convolution and deep learning models to automatically learn richer semantic and structural information.

Author Contributions

Conceptualization, X.C. and T.H.; methodology, X.C., T.H. and J.G.; validation, L.H. and M.L.; data curation, T.H.; writing—original draft preparation, X.C. and T.H.; writing—review and editing, J.G., J.C. and L.H.; visualization, L.H. and M.L.; funding acquisition, X.C., J.G. and J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China, grant numbers 62172352 and 61871465; the Fundamental Research Funds for the Hebei Province Universities, grant number 2020JK005; the S&T Program of Hebei, grant numbers 226Z0102G and 21550803D; the Natural Science Foundation of Hebei Province, grant numbers F2017209070 and F2019203157; and the Key Project of Science and Technology Research in Hebei Province, grant number ZD2019004.

Data Availability Statement

DBLP at https://dblp.uni-trier.de/xml/ accessed on 8 June 2020, AMiner-Top at https://www.aminer.cn/knowledge_graph_cn accessed on 8 June 2020, Yelp at https://www.yelp.com/dataset accessed on 8 June 2020.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript.
HINs: Heterogeneous Information Networks/Heterogeneous Networks
HIN: Heterogeneous Information Network/Heterogeneous Network
DBLP: DataBase systems and Logic Programming
NLP: Natural Language Processing
ESim: Embedding for Similarity
HIN2Vec: Heterogeneous Information Networks to Vectors
JUST: JUmp & STay
EOE: Embedding of Embedding
HERec: Heterogeneous information network Embedding for Recommendation
HeGAN: HIN embedding with GAN-based adversarial learning
GAN: Generative Adversarial Networks
ActiveHNE: Active Heterogeneous Network Embedding
MPDRL: Meta-path Discovery with Reinforcement Learning
TANE-RWCN: Topic-Attention Network Embedding based on Random Walk of Connection Number
HNE-RWTIC: Heterogeneous Network Embedding based on Random Walks of Type & Inner Constraint

References

  1. Dong, Y.; Chawla, N.V.; Swami, A. Metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017. [Google Scholar]
  2. Zhang, D.; Yin, J.; Zhu, X.; Zhang, C. Network representation learning: A survey. IEEE Trans. Big Data 2017, 99, 1–25. [Google Scholar] [CrossRef]
  3. Zhang, L.; Guo, J.; Wang, J.; Wang, J.; Li, S.; Zhang, C. Hypergraph and uncertain hypergraph representation learning theory and methods. Mathematics 2022, 10, 1921. [Google Scholar] [CrossRef]
  4. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. In Proceedings of the 1st International Conference on Learning Representations, Scottsdale, AZ, USA, 2–4 May 2013. [Google Scholar]
  5. Perozzi, B.; AlRfou, R.; Skiena, S. Deepwalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014. [Google Scholar]
  6. Yang, C.; Xiao, Y.; Zhang, Y.; Sun, Y.; Han, J. Heterogeneous network representation learning: A unified framework with survey and benchmark. arXiv 2020, arXiv:2004.00216. [Google Scholar] [CrossRef]
  7. Grover, A.; Leskovec, J. Node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD In-ternational Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016. [Google Scholar]
  8. Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; Gao, J. Deep learning based text classification: A comprehensive review. arXiv 2021, arXiv:2004.03705. [Google Scholar] [CrossRef]
  9. Chen, X.; Wang, Y.; Dong, H.; Pan, X.; Li, J. Network representation learning based on random walk of connection number. Int. J. Innov. Comput. Inf. Control 2022, 18, 883–900. [Google Scholar]
  10. Yang, L.; Zhan, X.; Chen, D.; Yan, J.; Loy, C.C.; Lin, D. Learning to Cluster Faces on an Affinity Graph. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  11. Wang, Y.; Lei, X.; Pan, Y. Predicting microbe-disease association based on heterogeneous network and global graph feature learning. Chin. J. Electron. 2022, 31, 345–353. [Google Scholar] [CrossRef]
  12. Rossi, A.; Barbosa, D.; Firmani, D.; Matinata, A.; Merialdo, P. Knowledge graph embedding for link prediction: A comparative analysis. ACM Trans. Knowl. Discov. Data 2021, 15, 1–49. [Google Scholar] [CrossRef]
  13. Shang, J.; Qu, M.; Liu, J.; Kaplan, L.M.; Han, J.; Peng, J. Meta-path guided embedding for similarity search in large-scale heterogeneous information networks. arXiv 2016, arXiv:1610.09769. [Google Scholar]
  14. Fu, T.; Lee, W.C.; Lei, Z. Hin2vec: Explore Meta-Paths in Heterogeneous Information Networks for Representation Learning. In Proceedings of the 26th ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017. [Google Scholar]
  15. Hussein, R.; Yang, D.; Cudré-Mauroux, P. Are Meta-Paths Necessary? Revisiting Heterogeneous Graph Embeddings. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018. [Google Scholar]
  16. Xu, L.; Wei, X.; Cao, J.; Yu, P.S. Embedding of Embedding (EOE) Joint Embedding for Coupled Heterogeneous Networks. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining, New York, NY, USA, 6–10 February 2017. [Google Scholar]
  17. Chen, H.; Yin, H.; Wang, W.; Wang, H.; Nguyen, Q.; Li, X. PME: Projected Metric Embedding on Heterogeneous Networks for Link Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK, 19–23 August 2018. [Google Scholar]
  18. Shi, C.; Lu, Y.; Hu, L.; Liu, Z.; Ma, H. RHINE: Relation structure-aware heterogeneous information network embedding. IEEE Trans. Knowl. Data Eng. 2022, 34, 433–447. [Google Scholar] [CrossRef]
  19. Shi, C.; Hu, B.; Zhao, W.X.; Yu, P.S. Heterogeneous information network embedding for recommendation. IEEE Trans. Knowl. Data Eng. 2019, 31, 357–370. [Google Scholar] [CrossRef] [Green Version]
  20. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24. [Google Scholar] [CrossRef] [Green Version]
  21. Hu, B.; Fang, Y.; Shi, C. Adversarial Learning on Heterogeneous Information Networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019. [Google Scholar]
  22. Chen, X.; Yu, G.; Wang, J.; Domeniconi, C.; Li, Z.; Zhang, X. ActiveHNE: Active heterogeneous network embedding. arXiv 2019, arXiv:1905.05659. [Google Scholar]
  23. Hu, L.; Li, C.; Shi, C.; Yang, C.; Shao, C. Graph neural news recommendation with long-term and short-term interest modeling. Inf. Process. Manag. 2020, 57, 102142. [Google Scholar] [CrossRef] [Green Version]
  24. Wan, G.; Du, B.; Pan, S.; Haffari, G. Reinforcement Learning Based Meta-Path Discovery in Large-Scale Heterogeneous Information Networks. In Proceedings of the 34th Association for the Advancement of Artificial Intelligence, New York, NY, USA, 7–12 February 2020. [Google Scholar]
  25. Zhou, J.; Cui, G.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
  26. He, Y.; Song, Y.; Li, J.; Ji, C.; Peng, J.; Peng, H. Hetespaceywalk: A Heterogeneous Spacey Random Walk for Heterogeneous Information Network Embedding. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019. [Google Scholar]
  27. Lee, S.; Park, C.; Yu, H. BHIN2vec: Balancing the Type of Relation in Heterogeneous Information Network. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019. [Google Scholar]
  28. Yin, Y.; Ji, L.; Huang, R.; Cheng, X. Heterogeneous Network Representation Learning Method Based on Meta-path. In Proceedings of the 4th International Conference on Cloud Computing and Big Data Analytics, Chengdu, China, 12–15 April 2019. [Google Scholar]
  29. Fu, Y.; Xiong, Y.; Yu, P.S.; Tao, T.; Zhu, Y. Metapath Enhanced Graph Attention Encoder for HINs Representation Learning. In Proceedings of the 6th IEEE International Conference on Big Data, Kyoto, Japan, 27 February–2 March 2019. [Google Scholar]
  30. Shi, C.; Li, Y.; Zhang, J.; Sun, Y.; Yu, P.S. A survey of heterogeneous information network analysis. IEEE Trans. Knowl. Data Eng. 2017, 29, 17–37. [Google Scholar] [CrossRef]
Figure 1. Instance of DBLP network. (a) DBLP network; (b) Meta-path; (c) Network schema.
Figure 2. Schematic diagram of random walks strategy of JUST.
Figure 3. Illustration of the random walks model.
Figure 4. Properties analysis. (a) α; (b) k; (c) q; (d) p.
Figure 5. Sensitivity analysis of α. (a) Micro-F1 of classification; (b) Macro-F1 of classification; (c) NMI value of clustering.
Figure 6. Sensitivity analysis of k1. (a) Micro-F1 of classification; (b) Macro-F1 of classification; (c) NMI value of clustering.
Figure 7. Sensitivity analysis of p. (a) Micro-F1 of classification; (b) Macro-F1 of classification; (c) NMI value of clustering.
Figure 8. Sensitivity analysis of q. (a) Micro-F1 of classification; (b) Macro-F1 of classification; (c) NMI value of clustering.
Figure 9. Sensitivity analysis of L. (a) Micro-F1 of classification; (b) Macro-F1 of classification; (c) NMI value of clustering.
Figure 10. Sensitivity analysis of d. (a) Micro-F1 of classification; (b) Macro-F1 of classification; (c) NMI value of clustering.
Figure 11. Uniformity of node types. (a) Log2 of NUM1 and NUM2; (b) Log2 ratio of NUM2 to NUM1.
Figure 12. Distribution uniformity of nodes of the same type. (a) Sampling times for A; (b) Sampling times for P; (c) Sampling times for T; (d) Sampling times for C.
Table 1. A summary of notations.

Notation         Description
G                Heterogeneous information network
V                Set of nodes in HIN
E                Set of edges in HIN
A                Set of node types in HIN
R                Set of edge types in HIN
|V|              Number of nodes
|E|              Number of edges
|A| / N          Number of node types
|R| / M          Number of edge types
T_G              Network schema
d                Dimension of learned vertex representations
X ∈ R^(|V|×d)    The vector representation matrix
W                The sequence of random walks
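To make the notation concrete, the following is a minimal Python sketch of a HIN container with the node set V, edge set E, and node-type set A from Table 1. The class name and fields are illustrative assumptions for exposition, not the paper's implementation.

```python
from collections import defaultdict

# Minimal sketch of a HIN G = (V, E), following the notation in Table 1.
# Class and field names are illustrative, not taken from the paper's code.
class HIN:
    def __init__(self):
        self.node_type = {}          # node id -> type (an element of A)
        self.adj = defaultdict(set)  # node id -> neighbouring node ids (E)

    def add_edge(self, u, u_type, v, v_type):
        self.node_type[u] = u_type
        self.node_type[v] = v_type
        self.adj[u].add(v)
        self.adj[v].add(u)           # edges are undirected, as in the DBLP example

    @property
    def A(self):
        """Set of node types; |A| = N in Table 1."""
        return set(self.node_type.values())

# Example: a tiny DBLP-style network with the four node types A, P, C, T.
g = HIN()
g.add_edge("p1", "P", "a1", "A")     # paper written by author
g.add_edge("p1", "P", "c1", "C")     # paper published at conference
g.add_edge("p1", "P", "t1", "T")     # paper mentions term
print(len(g.node_type), len(g.A))    # |V| = 4, |A| = 4
```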
Table 2. Description of datasets.

Dataset       Number of Nodes   Number of Edges   Node Types        Edge Types
DBLP          15,649            51,377            4 (A, P, C, T)    4 (P-A, P-P, P-C, P-T)
AMiner-Top    45,811            108,771           3 (A, P, C)       3 (P-A, P-C, P-P)
Yelp          3898              30,838            2 (B, U)          1 (B-U)
Table 3. Classification results compared with different algorithms.

Dataset      Metric    Train Set (%)   Node2Vec   HIN2Vec   Metapath2Vec++   JUST     HeGAN    HNE-RWTIC
DBLP         Micro-F1  100             0.9099     0.9120    0.8916           0.9125   0.9026   0.9157
                       80              0.8555     0.8508    0.8545           0.8461   0.8414   0.8639
                       60              0.8340     0.8420    0.8492           0.8393   0.8378   0.8589
                       40              0.8085     0.8267    0.8455           0.8006   0.8057   0.8272
                       20              0.7908     0.7907    0.8304           0.7872   0.7793   0.7978
             Macro-F1  100             0.9097     0.9123    0.8919           0.9126   0.9028   0.9159
                       80              0.8546     0.8510    0.8552           0.8453   0.8399   0.8642
                       60              0.8336     0.8417    0.8497           0.8393   0.8371   0.8585
                       40              0.8084     0.8270    0.8460           0.8004   0.8057   0.8273
                       20              0.7912     0.7913    0.8308           0.7876   0.7794   0.7988
AMiner-Top   Micro-F1  100             0.9058     0.9030    0.8993           0.8797   0.9036   0.9000
                       80              0.8971     0.8965    0.8959           0.8894   0.8977   0.8979
                       60              0.8957     0.8957    0.8954           0.8777   0.8959   0.8961
                       40              0.8949     0.8936    0.8941           0.8789   0.8924   0.8945
                       20              0.8899     0.8907    0.8912           0.8705   0.8807   0.8910
             Macro-F1  100             0.8969     0.8940    0.8896           0.8682   0.8943   0.8907
                       80              0.8862     0.8876    0.8854           0.8789   0.8879   0.8888
                       60              0.8849     0.8852    0.8851           0.8664   0.8856   0.8865
                       40              0.8857     0.8845    0.8837           0.8681   0.8819   0.8846
                       20              0.8795     0.8809    0.8806           0.8582   0.8696   0.8810
Yelp         Micro-F1  100             0.7410     0.7441    0.7490           0.7184   0.7330   0.7460
                       80              0.6805     0.6989    0.6987           0.6743   0.6973   0.7121
                       60              0.6715     0.6936    0.6944           0.6672   0.6890   0.7107
                       40              0.6637     0.6813    0.6762           0.6368   0.6803   0.6939
                       20              0.6257     0.6500    0.6363           0.5941   0.6661   0.6800
             Macro-F1  100             0.7099     0.7180    0.7187           0.6785   0.6950   0.7157
                       80              0.6385     0.6625    0.6596           0.6197   0.6521   0.6698
                       60              0.6319     0.6560    0.6546           0.6150   0.6475   0.6677
                       40              0.6203     0.6456    0.6382           0.5845   0.6326   0.6481
                       20              0.5834     0.6096    0.5959           0.5425   0.6159   0.6356
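For readers reproducing Table 3, the Micro-F1 and Macro-F1 scores can be computed with scikit-learn once node embeddings are learned. The sketch below uses random stand-in embeddings and a logistic-regression classifier, both of which are our assumptions for illustration rather than the authors' exact evaluation pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 128))   # stand-in for learned node embeddings (d = 128)
y = rng.integers(0, 4, size=1000)  # stand-in for node labels (e.g., 4 research areas)

for frac in (0.2, 0.4, 0.6, 0.8):  # train-set ratios as in Table 3
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=frac, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(frac,
          f1_score(y_te, pred, average="micro"),   # Micro-F1
          f1_score(y_te, pred, average="macro"))   # Macro-F1
```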
Table 4. Clustering results (NMI) compared with different algorithms.

Dataset      Node2Vec   HIN2Vec   Metapath2Vec++   JUST     HeGAN    HNE-RWTIC
DBLP         0.7974     0.6179    0.7600           0.7784   0.7666   0.8091
AMiner-Top   0.4586     0.4206    0.4519           0.3999   0.4495   0.4689
Yelp         0.0014     0.0016    0.0015           0.0014   0.0015   0.0018
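Similarly, the NMI values in Table 4 follow the usual protocol of clustering the learned embeddings and comparing the cluster assignments with ground-truth labels. The sketch below assumes K-means clustering and random stand-in data; the cluster count and variable names are ours.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 128))      # stand-in for learned node embeddings
labels = rng.integers(0, 4, size=500)  # stand-in for ground-truth labels

# Cluster the embeddings, then score agreement with the labels (NMI, as in Table 4).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(emb)
print(normalized_mutual_info_score(labels, kmeans.labels_))
```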
Table 5. Distribution of node types.

Node Type    P           A           T          C     C (Filling)
NUM1         5237        5915        4479       18    1671
NUM2         78,400      314,161     346,665    –     130,073
NUM2/NUM1    149.70407   51.42199    77.39793   –     92.84133
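As a toy illustration of how NUM1, NUM2, and their ratio are obtained (cf. Figure 11), the sketch below counts node types in a network and in a set of sampled walks. The walks and the node-type map here are invented for the example.

```python
import math
from collections import Counter

# Toy version of the uniformity statistics behind Table 5 and Figure 11:
# NUM1 counts each node type in the network, NUM2 counts how often the type
# is visited in the sampled walks; the walks below are made up.
walks = [["a1", "p1", "c1", "p2", "a2"], ["a2", "p2", "t1", "p1", "c1"]]
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "c1": "C", "t1": "T"}

num1 = Counter(node_type.values())                      # nodes per type
num2 = Counter(node_type[v] for w in walks for v in w)  # visits per type

for t in sorted(num1):
    ratio = num2[t] / num1[t]
    print(t, num1[t], num2[t], round(math.log2(ratio), 3))  # log2 ratio, cf. Figure 11b
```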