Next Article in Journal
D2D Assisted Cellular Networks in Licensed and Unlicensed Spectrum: Matching-Iteration-Based Joint User Access and Resource Allocation
Next Article in Special Issue
Chaos and Stability in a New Iterative Family for Solving Nonlinear Equations
Previous Article in Journal
An Exploratory Landscape Analysis-Based Benchmark Suite
Previous Article in Special Issue
Online Facility Location in Evolving Metrics
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

An Improved Greedy Heuristic for the Minimum Positive Influence Dominating Set Problem in Social Networks

1
Mechatronics Laboratory (LMETR)—E1764200, Department of Computer Science, Ferhat Abbas University Sétif 1, Sétif 19000, Algeria
2
Artificial Intelligence Research Institute (IIIA-CSIC), Campus of the UAB, 08193 Bellaterra, Spain
*
Author to whom correspondence should be addressed.
Submission received: 4 February 2021 / Revised: 24 February 2021 / Accepted: 25 February 2021 / Published: 28 February 2021
(This article belongs to the Special Issue 2021 Selected Papers from Algorithms Editorial Board Members)

Abstract

:
This paper presents a performance comparison of greedy heuristics for a recent variant of the dominating set problem known as the minimum positive influence dominating set (MPIDS) problem. This APX-hard combinatorial optimization problem has applications in social networks. Its aim is to identify a small subset of key influential individuals in order to facilitate the spread of positive influence in the whole network. In this paper, we focus on the development of a fast and effective greedy heuristic for the MPIDS problem, because greedy heuristics are an essential component of more sophisticated metaheuristics. Thus, the development of well-working greedy heuristics supports the development of efficient metaheuristics. Extensive experiments conducted on a wide range of social networks and complex networks confirm the overall superiority of our greedy algorithm over its competitors, especially when the problem size becomes large. Moreover, we compare our algorithm with the integer linear programming solver CPLEX. While the performance of CPLEX is very strong for small and medium-sized networks, it reaches its limits when being applied to the largest networks. However, even in the context of small and medium-sized networks, our greedy algorithm is only 2.53% worse than CPLEX.

Graphical Abstract

1. Introduction

Dominating set problems have recently attracted much attention due to their potential application in a variety of real-life settings. Apart from the standard minimum dominating set problem [1,2], examples include the minimum connected dominating set problem [3], the minimum total dominating set problem [4] and the minimum vertex weight dominating set problem [5].

1.1. Problem Background and Motivation

A problem variant that has been studied especially in the context of online social networks is the minimum positive influence dominating set (MPIDS) problem in which a social network is modeled by a simple, connected undirected graph where vertices represent a group of individuals (people) and edges indicate relationships and interactions between them. The problem was first introduced by Wang et al. [6], based on the following motivation. With the explosive growth of online social networks, the need for social network analysis tools in order to study the social influences and interactions between individuals within groups and organizations has become a primary concern. As an example, one of the most popular online social networks worldwide is Facebook. In January 2021, Facebook had approximately 2.74 billion users [7], more than any other social network. In addition, ideas and information propagated in social networks can have a significant impact on society (negative or positive) and on various aspects of the life of people. Social norms theory has shown that the behavior of individuals can be affected by perceptions of others’ thoughts and behaviors [8]. Thus, exploiting the relationships among people in social networks can provide great benefits to both economy and society. The aim of the MPIDS problem is to identify a small subset of key influential individuals to speed up the spread of positive influence. It can be applied in viral marketing, which is an advertising strategy that utilizes social relationships, such as friendships, professional interactions and families to spread the awareness about a promoted product among individuals in a given social network [9,10]. The idea is to identify a limited number of customers that can quickly lead to the entire network being influenced to adopt the product. Other applications of the MPIDS problem can be found in e-learning software [11], online business [12], drinking, smoking, and drug related problems [6].

1.2. Problem Description and Existing Work

In technical terms, the MPIDS problem can be described as follows. Given a simple, connected undirected graph G = ( V , E ) it requires to find a dominating set of minimum cardinality such that at least half of the neighbors of each vertex form part of the dominating set. However, the problem was shown to be APX-hard [13]. Note that a problem is said to be APX-hard if there is a polynomial time reduction scheme from every problem in APX to that problem. Moreover, if a problem is APX-hard, it is also NP-hard. Therefore, most of the past and current research efforts concerning the MPIDS problem are focused on greedy heuristics and on some evolutionary approaches. Greedy heuristics [14] are procedures that generate a solution step by step, making a locally optimal choice at each stage based on a so-called greedy function. We briefly review the existing approaches in the following. The first greedy algorithm for the MPIDS problem, referred to as Wang’s greedy algorithm [13], is a H ( Δ ) -approximation algorithm with O ( n 3 ) time complexity, where n = | V | , Δ is the maximum vertex degree, and H is the harmonic function. Another greedy algorithm, referred to as Raei’s greedy, was published in [15]. This algorithm requires O ( n 2 ) time. It differs from the previous one in the way in which the next vertex at each construction step is chosen. That is, they differ with respect to the used greedy function. An improved version of Wang’s greedy, referred to as Fei’s greedy, was proposed in [16]. It incorporates a tie-breaking strategy based on Raei’s greedy. More recently, Pan et al. [17] presented a fast greedy heuristic with a complexity of O ( n lg n + m ) that outperforms all previous greedy approaches both in terms of solution quality and computational time. Therefore, Pan’s greedy is considered as the currently best-performing greedy algorithm for the MPIDS problem.
To the best of our knowledge, there are only two studies that have attempted to solve the MPIDS problem using metaheuristic approaches. Metaheuristics are approximate techniques for solving hard optimization problems of different types. They are among the most popular algorithms in the context of problem instances that are too large (or too complex) to be solved by exact techniques. Many metaheuristics are built upon subordinate algorithmic components such as greedy heuristics and local search algorithms. Examples of such metaheuristics include simulated annealing, tabu search, iterated local search, and greedy randomized adaptive search procedures. However, the family of metaheuristics also includes a whole range of bio-inspired techniques such as ant colony optimization, genetic and evolutionary algorithms, and particle swarm optimization. Especially in combinatorial optimization, metaheuristics have had considerable success in application areas such as scheduling, routing, bioinformatics, medical research, passenger and freight terminal operations, and data classification. We refer the interested reader to [18,19] for further information. The MPIDS problem was first tackled by a memetic algorithm [20], called ILPMA, which uses tabu search for improving solutions. The second one is a hybrid approach, referred to as HSIA, that combines a genetic algorithm with particle swarm optimization. Both ILPMA and HSIA share two common features: they incorporate a greedy randomized adaptive algorithm similar to GRASP [21] to seed the initial population with good solutions and their performances are compared with those of the greedy algorithms.

1.3. Motivation and Contribution

Wang’s greedy, Raei’s greedy and Fei’s greedy suffer from a common drawback that they are time-consuming and become highly ineffective with an increasing graph size. This is mainly due to the vertex selection strategy employed at each construction step of the solution construction process. In particular, the choice of the next vertex to be included in the current partial solution requires the evaluation of the corresponding greedy function for all vertices that do not form part of the current partial solution, which requires O ( n ) of time. This time complexity is reduced to O ( Δ ) time in the case of both Pan’s greedy and our proposal by considering only the neighbors of a particular vertex at each construction step. We expect our algorithm to outperform the currently best greedy algorithm (Pan’s algorithm) due to the following reasons. First, we develop a graph pruning procedure that identifies vertices that must form part of an optimal solution. Second, the vertex selection strategy of our algorithm benefits from the exploitation of two greedy functions (cover-degree and need-degree), which is in contrast to Pan’s greedy which only makes use of cover-degree. Finally, in a post-processing step we remove redundant vertices.
Our motivation for the development of a fast and effective greedy algorithm is as follows. The development of high-performing metaheuristics for the MPIDS problem is still a challenge. However, the performance of metaheuristics depends largely on the performance of their main components. Greedy heuristics are a key component of many metaheuristics. They are used for generating initial solutions or for reconstructing partial solutions. The contribution of this paper is to review the available literature on greedy heuristics for the MPIDS problem and make a performance comparison among them both in terms of solution quality and computation time. Furthermore, designing an efficient greedy approach is still a relevant challenge despite many prior attempts. Therefore, we also propose an improved greedy algorithm which outperforms the existing greedy approaches both on benchmark instances that have already been considered in the literature, but also on larger complex networks with millions of nodes. Our approach can be considered a further improvement of the basic idea of Pan’s work [17].

1.4. Structure of the Paper

The rest of this paper is organized as follows. In Section 2 we give a technical description of the MPIDS problem. In Section 3, we review the available literature on greedy heuristics applied to the MPIDS problem. The proposed greedy algorithm is described in Section 4. In Section 5, we present and discuss the experimental results. Finally, Section 6 summarizes the work and offers directions for future work.

2. The Minimum Positive Influence Dominating Set Problem

We describe the MPIDS problem by starting with basic notions and underlying definitions which are used throughout this paper. Let G = ( V , E ) be an undirected graph that is both simple and connected. (Note that an undirected graph is called simple if it does not contain multiple edges between pairs of vertices and loops.) Hereby, V is a set of n vertices—representing, for example, people—and E V × V is a set of m edges modeling, for example, relationships between those people. Let v and u be two distinct vertices from G. Now, v and u are said to be adjacent (neighbors) if they are connected by an edge. Further, the open neighborhood of v is defined as N ( v ) : = { u V ( v , u ) E } . N ( v ) is often simply called the neighborhood of v in G. The closed neighborhood of v is defined as N [ v ] : = N ( v ) { v } , i.e., N [ v ] includes all vertices adjacent to v, including v itself. The degree of v, denoted by d e g ( v ) , is the number of v’s neighbors, that is, d e g ( v ) = | N ( v ) | . A vertex of degree one—that is | N ( v ) | = 1 —is called a pendant vertex.
Definition 1
(Dominating set). A dominating set (DS) of G is a subset D V such that each vertex v V \ D is adjacent to at least one vertex in D.
Definition 2
(Positive influence dominating set). A positive influence dominating set (PIDS) of G is a dominating set D V such that each vertex v V has at least half of its open neighbors in D, that is, at least d e g ( v ) / 2 vertices of N ( v ) must belong to D for each v V .
Figure 1b shows an example of a DS of the graph shown in Figure 1a, where black vertices represent those that are in the DS. Figure 1c provides an example of a PIDS (black vertices). Vertex 6, for example, has degree 3 and 3 / 2 = 2 of its neighbors (vertex 4 and vertex 2) are part of the PIDS.
Definition 3
(Minimum dominating set problem). Given an undirected simple graph G = ( V , E ) , the minimum dominating set (MDS) problem asks to find a DS of G of minimum cardinality.
This problem can be easily formulated in terms of an integer linear program (ILP). For convenience, we assume that V is enumerated as v 1 , v 2 , , v n . A binary variable x i { 0 , 1 } is associated to each vertex v i V such that x i = 1 if and only if v i is part of the optimal solution, and x i = 0 otherwise. The ILP model can then be stated as follows.
Minimize i = 1 n x i
Subject   to v j N [ v i ] x j 1 v i V
x i { 0 , 1 }
Equation (2) ensures that the generated solution is a dominating set. Remember that N [ v i ] is the closed neighborhood of v i .
Definition 4
(Minimum positive influence dominating set problem). Given an undirected simple graph G = ( V , E ) , the minimum positive influence dominating set (MPIDS) problem asks to find a PIDS of G of minimum cardinality.
Based on the ILP model of the MDS problem from above, the MPIDS problem can also easily be stated in terms of an ILP as follow.
Minimize i = 1 n x i
Subject   to v j N ( v i ) x j d e g ( v i ) 2 v i V
x i { 0 , 1 }
Equation (5) ensures that a feasible solution contains at least half of the neighbors of each vertex v i V .

3. Greedy Heuristics

Greedy algorithms are very common techniques for constructing solutions to combinatorial optimization problems from scratch, in a step-by-step manner. They are either used as standalone algorithms, or as subordinate algorithmic components of more sophisticated metaheuristics. Algorithm 1 shows the pseudo-code of a basic greedy algorithm for the MPIDS problem. It takes as input a simple, connected undirected graph G = ( V , E ) representing an instance of the MPIDS problem and provides as output a subset of vertices S V corresponding to a positive influence dominating set. This algorithm builds a solution step by step, starting from an empty solution S = . At each construction step, it adds one feasible vertex v * S ¯ to S until a valid solution is obtained. S ¯ = V \ S denotes the set of vertices of V not belonging to S. Note that this set initially contains all the vertices of the graph.
Algorithm 1 MPIDS_Greedy()
Input: a simple, connected undirected graph G = ( V , E )
Output: a positive influence dominating set S
  1: S
  2: while (S is not a valid PIDS solution) do
  3:   v * argmax{greedy_function ( v ) | v S ¯ }
  4:   S S { v * }
  5: end while
  6: return S
At each construction step, the choice of the solution component to be added to the current partial solution is made deterministically based on a so-called greedy function which plays an important role for the performance of the algorithm. It evaluates each solution component by measuring the local improvement obtained by adding the corresponding component to the incumbent partial solution. Most of the existing greedy algorithms for the MPIDS problem are based on at least one of the two greedy functions that are known as cover-degree and need-degree. To define them, let S V be the incumbent partial solution which is not yet a PIDS and h S ( v ) = d e g ( v ) 2 | N S ( v ) | where N S ( v ) = N ( v ) S denotes the set of neighbors of v V belonging to S. Then, we say that v is covered if h S ( v ) 0 and not covered otherwise. Consequently, S is a valid solution (i.e., a PIDS) if and only if all vertices of V are covered. Now, for a given v S ¯ , cover-degree and need-degree are calculated as in Equations (7) and (8), respectively.
cover degree ( v ) = | { u N ( v ) : h S ( u ) > 0 } |
need degree ( v ) = u N ( v ) max ( h S ( u ) , 0 )
The first one represents the number of uncovered neighbors of v with respect to S, whereas the second represents the total need of the uncovered neighbors of v. Table 1 indicates which ones of these two greedy functions were used in the previous works from the literature. In the case in which an algorithm makes use of both functions, one of them was used as a secondary criterion for breaking ties in those cases in which the principal criterion provides the same value for several vertices. In the following, we recall the most recent and best-performing greedy algorithm, that is, Pan’s greedy [17]. Note that, even though Pan’s greedy uses the same greedy function (cover-degree) as Wang’s greedy, its time complexity is much lower. This is for the following reason. While Wang’s algorithm considers at each construction step any so-far unselected vertex for inclusion into the current partial solution, Pan’s algorithm—after choosing the next unselected vertex v in ascending order with respect to the degree—only considers so-far unselected neighbors of v in subsequent construction steps until v is covered.
The detailed pseudo-code for Pan’s algorithm is provided in Algorithm 2. This pseudo-code is the same as in the original publication, but with using our notation and after removing some unnecessary details in order to improve the readability of the algorithm.
Algorithm 2 Pan’s greedy algorithm [17]
Input: a simple, connected undirected graph G = ( V , E )
Output: a positive influence dominating set S
  1: Rename the vertices from V such that { v 1 , v 2 , , v n } are the vertices in ascending
   order of the degree
  2: S
  3: S ¯ V \ S
  4: for i = 1 to n do
  5:  if h S ( v i ) > 0 : v i is an uncovered vertex then
  6:   ρ h S ( v i )
  7:  for j = 1 to ρ do
  8:    u * argmax { cover degree ( u ) u N S ¯ ( v i ) }
  9:    S S { u * }
10:    S ¯ V \ S
11:  end for
12:  end if
13: end for
14: return S

4. The Proposed Algorithm

In the following, we describe our improved greedy algorithm—henceforth referred to as IGA-PIDS—which is based on Pan’s algorithm for the MPIDS problem.

4.1. The Greedy Procedure

The pseudo-code of IGA-PIDS is given in Algorithm 3. It receives as input a problem instance that consists of a simple, connected undirected graph G = ( V , E ) with n vertices, and works as follows. First, the given graph is pruned by applying procedure GraphPruning(G) (see Algorithm 4) which returns a partial solution S. This procedure identifies vertices that must form part of the optimal solution and adds them to S. This is done in order to speed up the process of solution construction of the greedy algorithm. Suppose that v V is a pendant vertex, i.e., d e g ( v ) = 1 , and let u be its unique neighbor. First, u is then added to S, because v must be covered and can only be covered by u. Second, if the degree of u is two—that is, d e g ( u ) = 2 , let w v be the second neighbor of u. As u must covered by exactly one vertex, and as no other vertex benefits from choosing v for covering u, we can savely choose w for covering u. Therefore, w is added to S.
At this point, let C be the set of all so-far uncovered vertices. Next, the algorithm picks yet uncovered vertices from C in (increasing) order of their degrees. The chosen vertex v i is then covered by choosing neighbors of v i from N S ¯ ( v i ) until v i is covered. In particular, at each step the vertex with the largest cover-degree value of all vertices in N S ¯ ( v i ) is chosen. If tie breaking is necessary, it is done with the need-degree function (see lines 8 and 9 of Algorithm 3). Finally, the algorithm is terminated when all vertices are covered, that is, when S corresponds to a valid PIDS solution.
Algorithm 3 IGA-PIDS: Improved greedy algorithm for the MPIDS problem
Input: a simple, connected undirected graph G = ( V , E )
Output: a positive influence dominating set S
  1: S GraphPruning(G)
  2: C set of all so-far uncovered vertices
  3: Rename the vertices from C such that { v 1 , v 2 , , v | C | } are the vertices in ascending
   order of the degree
  4: S ¯ V \ S
  5: while S is not a valid PIDS solution do
  6:  Let v i be an un-covered vertex with the smallest sub-index in C
  7:   ρ h S ( v i )
  8:  for j = 1 to ρ do
  9:     c d m a x max { cover degree ( u ) u N S ¯ ( v i ) }
10:     u * argmax { need degree ( u ) u N S ¯ ( v i ) cover degree ( u ) = c d m a x }
11:     S S { u * }
12:     S ¯ V \ S
13:  end for
14:   C C \ { v i }
15: end while
16: Reduce(S)
17: return S
Algorithm 4 Function GraphPruning(G)
Input: a simple, connected undirected graph G = ( V , E )
Output: a partial solution S
  1:  S
  2: for each pendant vertex v V do
  3:  Let u be the unique neighbor of v
  4:  if u S then
  5:   S S { u }
  6:  end if
  7:  if d e g ( u ) = 2 and v S then
  8:  Let w v be the second neighbor of u
  9:  if w S then
10:      S S { w }
11:  end if
12:  end if
13: end for
14: return S

4.2. Removing Redundant Vertices

The results produced by our greedy algorithm may contain redundant vertices. A redundant vertex is a vertex that can be removed from a solution without making the solution invalid. Formally, a vertex v S is redundant if each vertex u from its closed neighborhood N ( v ) has strictly more than half of its neighbors in S. In other words, v S is redundant if and only if u N ( v ) : h S ( u ) < 0 . That is, each vertex in S is checked in sequence to find whether it is redundant. If it is the case, this vertex may safely be removed from S and all values h S ( . ) of its neighbors will be decreased by one. A pseudo-code of the complete removal-function is presented in Algorithm 5. This function has a time complexity of O ( n + m ) . Note that our algorithm, by default, makes use of this function. However, we will also test our algorithm without removing redundant vertices. The resulting algorithm variant is henceforth labelled IGA-PIDS .
Algorithm 5 Function Reduce(S)
Input: a valid solution S that may contain redundant vertices
Output: a valid solution S without redundant vertices
  1: for each vertex v S do
  2:  if h S ( u ) < 0 for all u N ( v ) then
  3:    S S \ { v }
  4:   for all u N ( v ) do
  5:     h S ( u ) h S ( u ) 1
  6:   end for
  7  end if
  8: end for
  9: return S
Another way to implement this procedure, which is not considered here, would be to adopt the same removal strategy as presented in [23] for the minimum weight dominating set problem. For the latter, all redundant vertices are initially grouped in a unique set called S r . Then, a vertex from S r is iteratively chosen to be removed according to a particular greedy function. In the case of the MPIDS problems, we could select the vertex with the smallest degree, for example. This iterative process stops once S r becomes empty and all redundant vertices are removed.

4.3. Complexity

Here, we describe the time complexity of IGA-PIDS presented in Algorithm 3. We assume that the problem instance G = ( V , E ) is represented by an adjacency list and the solution S as a list. It obliviously that function GraphPruning ( · ) has complexity O ( n ) . Line 3 is done in O ( n lg n ) time as | E | | V | and n = | V | is the number of vertices in G. Since the size of S does not exceed n, all other lines, excluding lines 9 and 10, can be done in O ( n ) time. The running time of lines 9 and 10 is only proportional to the sum of degrees of all vertices in V. The hand shaking lemma [24] states that v V d e g ( v ) = 2 m , where m = | E | is the number of edges in G. Therefore, these two lines require O ( m ) time. In the light of the above, we can conclude that the time complexity of IGA-PIDS takes at most O ( n lg n + m ) time.

5. Experimental Evaluation

The proposed greedy algorithm (IGA-PIDS) is compared with four existing greedy algorithms, namely Wang’s greedy [13], Raei’s greedy [15], Fei’s greedy [16] and Pan’s greedy [17].

5.1. Computational Setting

In order to conduct a fair comparison with these greedy algorithms, all of them were implemented in ANSI C++ using Cygwin GCC 4.4 for compiling the software and carried out on the same computing platform. The experiments were performed on a laptop equipped with a 64-bit 2.5-GHz Intel® Core™ i5-7200U processor and 8 GB of RAM. Moreover, in order to get an impression about the solution quality provided by IGA-PIDS, we applied the ILP solver ILOG CPLEX 12.10 in single-threaded mode—with a time limit of 2 CPU hours per instance—to all problem instances. Note that the default optimality gaps of CPLEX were used. These default values indicate CPLEX to stop when an integer feasible solution is reached that is within 0.01% of optimality. The experiments with CPLEX were performed on a cluster of machines with two Intel® Xeon® Silver 4210 CPUs with 10 cores of 2.20 GHz and 92 Gbytes of RAM.

5.2. Problem Instances

We use three sets of benchmark instances for the experimental evaluation. The first two have already been used by other studies on the MPIDS problem, while the last one (SNAP networks) consists of large-scale complex networks that are studied for the first time in the context of the MPIDS problem.
  • Small social networks: this class of instances contains four well-known real and synthetic networks namely American College Football (Football) [25], Zachary’s Karate Club (Karate) [26], the Dolphins Network (Dolphins) [27] and the Jazz Network (Jazz) [28]. Characteristics such as the number of vertices and the number of edges of these networks are provided in Table 2. The values of the optimal solutions for each instance (except for the last one) were taken from [20] in which the authors also made use of CPLEX.
  • Large-scale social networks: this class of instances contains 13 real social networks which were provided by the authors of [20] and some of them were originally downloaded from the Network Data Repository [29]. Their characteristics, together with a brief description, are given in Table 3.
  • SNAP social networks: this class of instances contains 22 real complex networks with sizes ranging from 10 4 vertices to 3 × 10 7 . It was download from the Stanford Large Network Dataset Collection [30]. All these instances were originally directed graphs and they are transformed to undirected graphs by neglecting arc orientations and considering parallel edges as one edge. Table 4 gives a brief description of the SNAP networks used in our experiments after preprocessing.
In general, note that the benchmark instances are assumed to be simple connected graphs without isolated vertices. For this purpose, we employed a preprocessing step for removing isolated vertices, self-loops and parallel edges.

5.3. Results and Discussion

The numerical results for the first two benchmark sets—-that is, for social networks —are presented in Table 5 which has the following structure. The first column indicates the name of the social network. For each of the fives greedy algorithms the table contains two columns. The first one, labeled Val, provides the objective function values of the solutions found for the corresponding problem instances. The other one, labeled Time(s), shows the corresponding running times in seconds. In this context, note that a computation time of 0.0 means that the time was below 0.01 s. The best result per table row is highlighted in bold font. Moreover, note that provenly optimal results—see Table 6 for the full CPLEX results—are underlined.
The experimental results allow us to make the following observations. IGA-PIDS is able to obtain the best solution for 15 out of 17 problem instances. In contrast, all competitors together only return the best solution in 7 out of 17 cases. In addition, we can observe that both Pan’s greedy and IGA-PIDS are significantly faster than the other three competitors (approximately two orders of magnitude). Furthermore, IGA-PIDS is only marginally slower than Pan’s greedy, which means that the improved solution quality obtained by IGA-PIDS is not achieved at the expense of a significantly increased computation time. Moreover, without taking our greedy approach into consideration, it is worth to note that Pan’s greedy outperforms the other greedy algorithms both in term of solution quality and computational efficiency. Finally, while the performance of each of the four greedy algorithms starts to degrade in the context of the largest problem instances, this is not the case for IGA-PIDS.
In order to study the effect of the proposed function to eliminate redundant vertices —see function Reduce ( · ) described in Section 4.2Table 6 reports the performance of our greedy algorithm before (IGA-PIDS) and after (IGA-PIDS ) removing redundant vertices. From these results it can be concluded that most solutions suffer from the existence of redundant vertices. Thus, removing them can provide more accurate results which comes, however, at the cost of additional computation time. Having said that, the increase of the average computation time is very moderate: from 0.014 s to only 0.017 s. In Table 6, we additionally present the results obtained from the ILP solver CPLEX. The obtained results show that the newest CPLEX version (12.10) is very efficient for the MPIDS problem. In fact, only in two cases (social networks socfb-nips-ego and Dolphins) our greedy algorithm is able to match the results of CPLEX. Moreover, CPLEX provides provenly optimal solutions in 11 out of 17 cases (within 2 h of computation time). Provenly optimal results are marked by an asterisk. Nevertheless, the results provided by IGA-PIDS are, on average, only 2.53% worse than those of CPLEX (see last table column labeled Gap (%)).
Finally, the results for the large-scale SNAP networks are presented in Table 7, in the same format as outlined above. The results of CPLEX are also included. Moreover, note that no result is provided in those cases in which the computation time of the respective algorithm exceeded 7200 s (two hours). The results on these larger networks confirm our findings from the first two benchmark sets. However, the differences between the algorithms are now even more pronounced. Observe, for example, that—on average—IGA-PIDS produces solutions with more than 2000 vertices less per instance when compared to Pan’s greedy. Moreover, the limits of CPLEX are clearly reached in the application to problem instances of the size and the complexity found in the Amazon* instances and cit-Patents. The results provided by CPLEX in these cases are far worse than those of IGA-PIDS, as indicated by the gaps. In fact, the results of IGA-PIDS improve by approx. 55% over the results of CPLEX in these cases. Therefore, IGA-PIDS is overall the best algorithm, even outperforming CPLEX with an average gap of approx. 42% between the IGA-PIDS solutions and the CPLEX solutions.
Finally, we decided to provide a statistical assessment of the obtained results by means of critical difference (CD) plots [31]. In order to produce the average rank of each greedy algorithm considering the complete set of 39 problem instances, the scmamp R package [31] first applies the Friedman test to compare the five approaches simultaneously. In this way, the hypothesis that the five techniques perform equally was rejected. Then, the package performs pairwise comparisons using the Nemenyi post-hoc test [32]. As mentioned above, the results are shown graphically by means of CD plots. Note that, in such a plot, each considered algorithm is placed on the horizontal axis according to its average ranking. The performances of those algorithms that are below the critical difference threshold (computed with a significance level of 0.05) are considered as statistically equivalent. This is indicated by bold horizontal bars joining the markers of the respective algorithm variants. Figure 2 shows two CD plots. The first one (Figure 2a) concerns the obtained solution quality. IGA-PIDS is clearly the best-performing algorithm, with statistical significance. Concerning computation times (Figure 2b) it can be observed that Pan’s greedy is slightly faster than IGA-PIDS. However, no statistical difference can be detected between the running times of Pan’s algorithm and those of IGA-PIDS.

6. Conclusions

In this paper, we have studied an APX-hard combinatorial optimization problem in graphs and networks, the so-called minimum positive influence dominating set (MPIDS) problem. For tackling this problem, we present a simple and a fast improved greedy algorithm (IGA-PIDS) which is based on an effective exploitation of information provided by problem specific knowledge. The performance of IGA-PIDS is evaluated on 17 social networks of different sizes, ranging from 34 to 23,628 vertices. Moreover, we have applied our own greedy algorithm, as well as the competitors from the literature, to large-scale complex networks from the SNAP data set. The previously existing greedy heuristics for the MPIDS problem were re-implemented by ourselves for this purpose. Numerical results show that our greedy algorithm is able to outperform the other greedy algorithms from the literature, providing solutions with a significantly higher quality, especially in the context of the larger SNAP networks. We were also able to show that CPLEX, even though performing very strongly on small and medium-sized problem instances, reaches its limits when tackling large-scale complex networks.
Concerning limitations of this work, it would certainly be interesting to relate network characteristics with problem difficulty. That is, in the future it would be very interesting to identify those network characteristics that make the problem difficult, both for CPLEX and for our greedy algorithm. Note that such characteristics are not necessarily the same for the two types of algorithms. Furthermore, the design of well-working metaheuristics for this problem seems very challenging. Considering the results of CPLEX from this work, it is clear that the two existing metaheuristics for the MPIDS problem can not compete with CPLEX for most of the 17 social networks that were tackled in earlier works. The design of novel metaheuristics for the MPIDS problem will benefit from our improved greedy heuristic. Moreover, given the results of CPLEX, it might be a good idea to develop hybrid algorithms that combine metaheuristic elements with those of exact solvers such as CPLEX.

Author Contributions

Methodology, S.B.; Programming, S.B.; Writing—original draft, S.B.; Writing—review and editing, S.B. and C.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by project CI-SUSTAIN funded by the Spanish Ministry of Science and Innovation (PID2019-104156GB-I00).

Acknowledgments

S. Bouamama is grateful to DGRSDT-MESRS (Algeria), for financial support.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cai, S.; Hou, W.; Wang, Y.; Luo, C.; Lin, Q. Two-goal Local Search and Inference Rules for Minimum Dominating Set. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Yokohama, Japan, 11–17 July 2020; pp. 1467–1473. [Google Scholar]
  2. Li, J.; Potru, R.; Shahrokhi, F. A Performance Study of Some Approximation Algorithms for Computing a Small Dominating Set in a Graph. Algorithms 2020, 13, 339. [Google Scholar] [CrossRef]
  3. Li, R.; Hu, S.; Liu, H.; Li, R.; Ouyang, D.; Yin, M. Multi-Start Local Search Algorithm for the Minimum Connected Dominating Set Problems. Mathematics 2019, 7, 1173. [Google Scholar] [CrossRef] [Green Version]
  4. Yuan, F.; Li, C.; Gao, X.; Yin, M.; Wang, Y. A novel hybrid algorithm for minimum total dominating set problem. Mathematics 2019, 7, 222. [Google Scholar] [CrossRef] [Green Version]
  5. Zhou, Y.; Li, J.; Liu, Y.; Lv, S.; Lai, Y.; Wang, J. Improved Memetic Algorithm for Solving the Minimum Weight Vertex Independent Dominating Set. Mathematics 2020, 8, 1155. [Google Scholar] [CrossRef]
  6. Wang, F.; Camacho, E.; Xu, K. Positive influence dominating set in online social networks. In International Conference on Combinatorial Optimization and Applications; Springer: Berlin/Heidelberg, Germany, 2009; pp. 313–321. [Google Scholar]
  7. Tankovska, H. Global Social Networks Ranked by Number of Users 2021. Available online: https://0-www-statista-com.brum.beds.ac.uk/statistics/272014/global-social-networks-ranked-by-number-of-users/ (accessed on 22 February 2021).
  8. Fournier, A.K.; Hall, E.; Ricke, P.; Storey, B. Alcohol and the social network: Online social networking sites and college students’ perceived drinking norms. Psychol. Pop. Media Cult. 2013, 2, 86. [Google Scholar] [CrossRef]
  9. Long, C.; Wong, R.C.W. Minimizing seed set for viral marketing. In Proceedings of the 2011 11th IEEE International Conference on Data Mining, Vancouver, BC, Canada, 11–14 December 2011; pp. 427–436. [Google Scholar]
  10. Günneç, D.; Raghavan, S.; Zhang, R. Least-cost influence maximization on social networks. Informs J. Comput. 2020, 32, 289–302. [Google Scholar] [CrossRef]
  11. Wang, G. Domination Problems in Social Networks. Ph.D. Thesis, University of Southern Queensland, Queensland, Australia, 2014. [Google Scholar]
  12. Rad, A.A.; Benyoucef, M. Towards detecting influential users in social networks. In E-Technologies: Transformation in a Connected World; Springer: Berlin/Heidelberg, Germany, 2011; pp. 227–240. [Google Scholar]
  13. Wang, F.; Du, H.; Camacho, E.; Xu, K.; Lee, W.; Shi, Y.; Shan, S. On positive influence dominating sets in social networks. Theor. Comput. Sci. 2011, 412, 265–269. [Google Scholar] [CrossRef] [Green Version]
  14. Jungnickel, D. Graphs, Networks and Algorithms; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  15. Raei, H.; Yazdani, N.; Asadpour, M. A new algorithm for positive influence dominating set in social networks. In Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Istanbul, Turkey, 26–29 August 2012; pp. 253–257. [Google Scholar]
  16. Fei, M.; Weidong, C. An improved algorithm for finding minimum positive influence dominating sets in social networks. J. South China Norm. Univ. 2016, 48, 59–63. [Google Scholar]
  17. Pan, J.; Bu, T.M. A Fast Greedy Algorithm for Finding Minimum Positive Influence Dominating Sets in Social Networks. In Proceedings of the IEEE INFOCOM 2019-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Paris, France, 29 April–2 May 2019; pp. 360–364. [Google Scholar]
  18. Blum, C.; Roli, A. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Comput. Surv. 2003, 35, 268–308. [Google Scholar] [CrossRef]
  19. Hussain, K.; Salleh, M.N.M.; Cheng, S.; Shi, Y. Metaheuristic research: A comprehensive survey. Artif. Intell. Rev. 2019, 52, 2191–2233. [Google Scholar] [CrossRef] [Green Version]
  20. Lin, G.; Guan, J.; Feng, H. An ILP based memetic algorithm for finding minimum positive influence dominating sets in social networks. Phys. Stat. Mech. Appl. 2018, 500, 199–209. [Google Scholar] [CrossRef]
  21. Feo, T.A.; Resende, M.G. Greedy randomized adaptive search procedures. J. Glob. Optim. 1995, 6, 109–133. [Google Scholar] [CrossRef] [Green Version]
  22. Lin, G.; Luo, J.; Xu, H.; Xu, M. A Hybrid Swarm Intelligence-Based Algorithm for Finding Minimum Positive Influence Dominating Sets. In The International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery; Springer: Cham, Switzerland, 2019; pp. 506–511. [Google Scholar]
  23. Bouamama, S.; Blum, C. A hybrid algorithmic model for the minimum weight dominating set problem. Simul. Model. Pract. Theory 2016, 64, 57–68. [Google Scholar] [CrossRef]
  24. Biggs, N.; Lloyd, E.K.; Wilson, R.J. Graph Theory, 1736–1936; Oxford University Press: Oxford, UK, 1986. [Google Scholar]
  25. Girvan, M.; Newman, M.E. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826. [Google Scholar] [CrossRef] [Green Version]
  26. Zachary, W.W. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 1977, 33, 452–473. [Google Scholar] [CrossRef] [Green Version]
  27. Lusseau, D.; Schneider, K.; Boisseau, O.J.; Haase, P.; Slooten, E.; Dawson, S.M. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 2003, 54, 396–405. [Google Scholar] [CrossRef]
  28. Gleiser, P.M.; Danon, L. Community structure in jazz. Adv. Complex Syst. 2003, 6, 565–573. [Google Scholar] [CrossRef] [Green Version]
  29. Rossi, R.A.; Ahmed, N.K. The Network Data Repository with Interactive Graph Analytics and Visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015. [Google Scholar]
  30. Leskovec, J.; Krevl, A. SNAP Datasets: Stanford Large Network Dataset Collection. 2014. Available online: https://snap.stanford.edu/data/ (accessed on 26 February 2021).
  31. Calvo, B.; Santafé, G. scmamp: Statistical Comparison of Multiple Algorithms in Multiple Problems. R. J. 2016, 8, 248–256. [Google Scholar] [CrossRef] [Green Version]
  32. García, S.; Herrera, F. An Extension on “Statistical Comparisons of Classifiers over Multiple Data Sets” for all Pairwise Comparisons. J. Mach. Learn. Res. 2008, 9, 2677–2694. [Google Scholar]
Figure 1. An illustrative example of the MPIDS problem. Black vertices form part of the solutions.
Figure 1. An illustrative example of the MPIDS problem. Black vertices form part of the solutions.
Algorithms 14 00079 g001
Figure 2. Critical difference plots.
Figure 2. Critical difference plots.
Algorithms 14 00079 g002
Table 1. Greedy functions used in the existing literature.
Table 1. Greedy functions used in the existing literature.
Algorithm NameAlgorithm TypeGreedy FunctionComplexityYearRef
Wang’s algorithmGreedycover-degree O ( n 3 ) 2011[13]
Raei’s algorithmGreedyneed-degree O ( n 2 ) 2012[15]
Fei’s algorithmGreedyboth-2016[16]
Pan’s algorithmGreedycover-degree O ( n lg n + m ) 2019[17]
ILPMAA hybrid metaheuristic
(GA + TS)
cover-degree-2018[20]
HSIAA hybrid metaheuristic
(GA + PSO)
cover-degree-2019[22]
Table 2. Small real social networks used in experiments.
Table 2. Small real social networks used in experiments.
NetworknmOpt [20]Description
Karate347815Social network of
friendships in a Karate club
Dolphins6215930Dolphin social network
Football11561363Network of American
college football teams
Jazz1982742-Collaboration network
between Jazz Musicians
Table 3. Large-scale real social networks used in experiments.
Table 3. Large-scale real social networks used in experiments.
NetworknmDescription
CA-GrQc524114,484Collaboration network of Arxiv General Relativity
CA-HepTh987525,973Collaboration network of Arxiv High Energy Physics Theory
CA-HepPh12,006118,489Collaboration network of Arxiv High Energy Physics
CA-AstroPh18,771198,050Collaboration network of Arxiv Astro Physics
CA-CondMat23,13393,439Collaboration network of Arxiv Condensed Matter
Email-Enron36,692183,831Email communication network from Enron
ncsrrlwg2639615,872Collaboration network between by scientists
actors-data10,042145,682Collaboration network between by actors
ego-facebook403988,234Social circles from Facebook
socfb-nips-ego28882981Social friendship network extracted from Facebook
socfb-Mich67374881,903Social friendship network extracted from Facebook
socfb-Brandeis993898137,567Social friendship network extracted from Facebook
soc-gplus23,62839,194Social network extracted from Google+
Table 4. SNAP networks used in experiments.
Table 4. SNAP networks used in experiments.
NetworknmDescription
amazon0302262,111899,792Amazon product co-purchasing
network from 2 March 2003
amazon0312400,7272,349,869Amazon product co-purchasing
network from 12 March 2003
amazon0505410,2362,439,437Amazon product co-purchasing
network from 5 May 2003
amazon0601403,3942,443,408Amazon product co-purchasing
network from 1 June 2003
cit-HepPh34,546420,877Arxiv High Energy Physics paper
citation network
cit-HepTh2777352,285Arxiv High Energy Physics
paper citation
email-EuAll265,214364,481Email network from a EU
research institution
p2p-Gnutella0410,87639,994Gnutella peer to peer network from
4 August 2002
p2p-Gnutella2426,51865,369Gnutella peer to peer network from
24 August 2002
p2p-Gnutella2522,68754,705Gnutella peer to peer network from
25 August 2002
p2p-Gnutella3036,68288,328Gnutella peer to peer network from
30 August 2002
p2p-Gnutella3162,586147,892Gnutella peer to peer network from
31 August 2002
soc-Slashdot08117736469,180Slashdot social network from
November 2008
soc-Slashdot092282,168504,230Slashdot social network from
February 2009
soc-Epinions175,879405,740Who-trusts-whom network of
Epinions.com
wiki-Vote7115100,762Wikipedia who-votes-on-whom
network
web-NotreDame325,7291,090,108Web graph of Notre Dame
web-Stanford281,9031,992,636Web graph of Stanford.edu
wiki-Talk2,394,3854,659,565Wikipedia talk
(communication) network
web-BerkStan685,2306,649,470Web graph of Berkeley and Stanford
web-Google875,7134,322,051Web graph from Google
cit-Patents3,774,76816,518,947Citation network among US Patents
Table 5. Numerical results for the small and large-scale real social networks.
Table 5. Numerical results for the small and large-scale real social networks.
NetworkWang’s GreedyRaei’s GreedyFei’s GreedyPan’s GreedyIGA-PIDS
ValTime (s)ValTime (s)ValTime (s)ValTime (s)ValTime (s)
CA-GrQc26260.2826230.3026220.3626120.026070.0
CA-HepTh45980.9546020.9445821.1745650.045440.0
CA-HepPh48872.9148862.9148763.2748570.01548170.015
CA-AstroPh70817.9270857.9170628.5970300.03169530.031
CA-CondMat98697.5898537.6498378.1198160.097480.015
Email-Enron12,01513.5212,18415.7511,95815.2311,9520.04711,8430.031
ncstrlwg230340.3930250.3930230.4430260.030100.015
actors-data31992.3332052.4131872.4432150.01531740.016
ego-facebook19760.7519750.8419750.8419780.06219750.078
socfb-nips-ego13980.0213980.0513980.0513980.013980.016
socfb-Mich6714810.5614780.6314730.5514580.01614270.015
socfb-Brandeis9915350.9715391.6415290.9815220.03115020.032
soc-gplus83411.9782472.2282672.5683510.03182890.031
Karate310.0310.0300.0320.0310.0
Dolphins150.0150.0150.0150.0150.0
Football680.015680.0690.0690.0680.0
Jazz810.015820.0810.0830.0810.0
Avg3660.882.3633664.472.5653646.122.6233645.820.0153616.590.017
Table 6. Results before and after removing redundant vertices, and the comparison to CPLEX.
Table 6. Results before and after removing redundant vertices, and the comparison to CPLEX.
NetworkBeforeAfterCPLEX
ValTime (s)ValTime (s)ValGap (%)
CA-GrQc26100.026070.02587 *0.77
CA-HepTh45590.045440.04471 *1.63
CA-HepPh48550.048170.01547182.1
CA-AstroPh70340.03169530.03167403.16
CA-CondMat98040.01597480.01595841.71
Email-Enron11,9140.04711,8430.03111,682 *1.38
ncstrlwg230140.030100.0152994 *0.53
actors-data32140.01531740.01630922.65
ego-facebook19770.06219750.0781973 *0.1
socfb-nips-ego13980.01513980.0161398 *0
socfb-Mich6714520.014270.01513297.37
socfb-Brandeis9915170.01615020.03214007.29
soc-gplus82940.03182890.0318244 *0.55
Karate310.0310.030 *3.33
Dolphins150.0150.015 *0
Football700.0680.063 *7.93
Jazz820.0810.079 *2.53
Avg3637.650.0143616.590.0173552.882.53
Table 7. Numerical results for the SNAP networks. Red color indicates those cases in which CPLEX fails to find good solutions.
Table 7. Numerical results for the SNAP networks. Red color indicates those cases in which CPLEX fails to find good solutions.
NetworkWang’s GreedyRaei’s GreedyFei’s GreedyPan’s GreedyIGA-PIDSCPLEX
ValTime (s)ValTime (s)ValTime (s)ValTime (s)ValTime (s)ValGap (%)
Amazon0302136,4481680.11136,1771565.20135,5021619.34136,7230.19134,5690.23262,111−48.66
Amazon0312186,7725862.28188,1945676.23186,0095777.75183,1080.56180,8530.67400,727−54.87
Amazon0505189,3926152.06----185,3070.56183,1140.64410,236−55.63
Amazon0601184,8926833.42186,1266077.13--182,2910.63179,9640.66403,394−55.39
Cit-HepPh13,34047.0813,39439.1113,31637.4013,3400.07813,1110.0812,3506.16
Cit-HepTh11,54924.8111,67125.3111,53125.7711,5440.07811,3990.0810,7406.14
Email-EuAll106,178521.58105,691676.42105,8151038.92106,2200.89105,9061.25105,659 *0.23
p2p-Gnutella0443101.4142971.4242941.6842430.041700.039954.38
p2p-Gnutella2488126.6187946.6387766.8487500.01586650.01584572.46
p2p-Gnutella2576824.6976534.7076594.8676350.01675550.073702.51
p2p-Gnutella3012,32116.4112,31416.2712,28517.1912,2540.01612,1250.01511,8592.24
p2p-Gnutella3120,61461.0820,60460.4220,54163.5620,4480.03220,2680.01619,876 *1.97
Slashdot081119,11598.3419,567101.8019,126137.4818,5710.04718,5150.03218,419 *0.52
Slashdot090221,417107.2021,85678.9221,40385.1920,8570.06320,7820.03120,629 *0.74
soc-Epinions121,22752.8621,49488.5921,24182.8921,0150.04620,9860.03120,960 *0.12
Wiki-Vote15700.7515930.7715640.8315060.01614990.0151461 *2.60
web-NotreDame144,6541172.89144,6961223.31144,3911366.73145,5640.55144,3850.84143,7420.45
web-Stanford139,9702944.28140,5773635.30139,8123405.31140,630114.64139,346139.45137,1751.58
Wiki-Talk------499,39246.125490,13342.13480,063 *2.10
web-BerkStan------339,452209.27337,388259.81335,4930.56
web-Google------396,8363.49394,8063.67389,0791.47
cit-Patents------1,599,4176.01,576,0915.733,774,768−57.63
Avg------184,322.8617.42182,074.0920.70317,207.41−42.60
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Bouamama, S.; Blum, C. An Improved Greedy Heuristic for the Minimum Positive Influence Dominating Set Problem in Social Networks. Algorithms 2021, 14, 79. https://0-doi-org.brum.beds.ac.uk/10.3390/a14030079

AMA Style

Bouamama S, Blum C. An Improved Greedy Heuristic for the Minimum Positive Influence Dominating Set Problem in Social Networks. Algorithms. 2021; 14(3):79. https://0-doi-org.brum.beds.ac.uk/10.3390/a14030079

Chicago/Turabian Style

Bouamama, Salim, and Christian Blum. 2021. "An Improved Greedy Heuristic for the Minimum Positive Influence Dominating Set Problem in Social Networks" Algorithms 14, no. 3: 79. https://0-doi-org.brum.beds.ac.uk/10.3390/a14030079

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop