Optimization through consensus in the network
The paradigm we propose for large-scale optimization tasks is based on consensus in networked structures. In opinion formation models, there is a population of agents, each with a (discrete or continuous) opinion value representing its information about a subject (Deffuant et al. 2001; Gandica et al. 2010; Weisbuch 2004; Urbig et al. 2008). The term opinion is not easy to define in reality; however, it can be considered a discrete or continuous value expressing an individual's degree of desire or preference. The opinion is often represented as a real number when the model is unimodal, or as a vector of real numbers when the model is multimodal. In this paper we aim at optimizing an objective function, and therefore each agent has an opinion value containing all the input parameters of the desired objective function, i.e., a multimodal model.
Opinion formation in multi-agent systems
The agents update their opinions as a result of interactions with their neighboring agents. Consider two neighboring agents i and j with opinions x_i and x_j, respectively. Their opinions at time n + 1 will be a function of their previous opinions, i.e., x_i(n + 1) = f_1(x_i(n), x_j(n)) and x_j(n + 1) = f_2(x_i(n), x_j(n)). If certain conditions are met, after a number of updates the agents can reach a consensus in their opinions (Kozma & Barrat 2008; Carletti et al. 2006). The collective behavior of agents over complex networks largely depends on the structural properties of the networks (Amblard & Deffuant 2004), and minor modifications in the structure of the network can have drastic effects on the behavior of opinion formation (Nardini et al. 2008).
There are a number of rules for modelling opinion formation in complex networks. For example, considering discrete opinions, in the voter model randomly selected agents replace their opinions with that of one of their neighbours (Krapivsky & Redner 2003). Agents might also influence their neighbouring agents to change their opinions, depending on their strength and the neighbours' thresholds (Leskovec et al. 2006). In the evolution of continuous opinions on a network, the opinions of two connected agents are updated only if their difference is less than a threshold, i.e., a bounded confidence model (Deffuant et al. 2001; Amblard & Deffuant 2004; Lorenz 2007; Kurmyshev et al. 2011; Hegselmann & Krause 2002; Guo & Cai 2009).
In this work we considered a specific form of the continuous bounded confidence model in which each agent has an opinion in the range [−1, 1], denoted the opinion space, and updates its opinion based on a specific rule (Deffuant et al. 2001; Fortunato et al. 2005). First, each agent takes a random value from the opinion space. Then, at each subsequent step, each agent finds its best-matching neighbour, i.e., the one that gives the best value of the objective function among its neighbours, and then updates its opinion value toward this best-matching neighbour. The update rule for agent i is
$\begin{array}{ll} x_i(n+1) = x_i(n) + \mu\left[x_j(n) - x_i(n)\right] \quad \text{if}\ f(x_i(n)) > f(x_j(n)), & i = 1, 2, \dots, N, \\ j = \arg\min_{k} f(x_k(n)), & k \in N_i \end{array}$
(1)
where f is the cost (or objective) function to be optimized, N is the network size and N_i is the set of neighbours of agent i. μ is the convergence (or influence) parameter, which often takes a value between 0 and 1. This parameter controls the speed of convergence: small values of μ correspond to slow but smooth convergence, while large values of μ correspond to faster but oscillatory convergence.
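For a scalar opinion, one step of the rule in equation (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `update_opinion` and the neighbor dictionary are names introduced here, and a minimization objective is assumed.

```python
def update_opinion(x, f, neighbors, i, mu=0.5):
    """One step of the update rule in equation (1) for agent i (minimization).

    x         : list/array of scalar opinions, one per agent
    f         : objective function to be minimized
    neighbors : mapping from each agent to the list of its neighbors
    mu        : convergence (influence) parameter, usually in (0, 1)
    """
    # Best-matching neighbor: the one with the smallest objective value.
    j = min(neighbors[i], key=lambda k: f(x[k]))
    # Move toward the best-matching neighbor only if it is strictly better.
    if f(x[i]) > f(x[j]):
        x[i] = x[i] + mu * (x[j] - x[i])
    return x
```

For example, with f(v) = v² and opinions [1.0, 0.2, 3.0] on a fully connected triple, agent 0 moves from 1.0 to 0.6 when μ = 0.5.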
To some extent, the above model for opinion formation imitates the behavior of agents in real social networks. A person may know many individuals in society; however, he or she is influenced mainly by his or her closest friends (i.e., neighbors in the network). In many cases, individuals receive the strongest influence from their best (closest) friends and try to make themselves similar to them, i.e., to bring their opinions closer to those of their closest friends. People often behave like their best friends in order to establish and maintain their friendships, and are influenced by them more than by others in their life (Barry & Wentzel 2006). They also project their own attitudes and habits onto their friends. Furthermore, research showed that, in general, the influence of the very best friend is approximately equal or comparable to the influence of multiple friends (Berndt & Murphy 2003).
In our model, each agent finds the neighboring agent with the best value of the objective function, denoted the best-matching neighbor. For example, if the objective function is an energy function, the neighbor with the minimal energy is selected. The agents then update their opinions using equation (1). It has been shown that choosing proper connection weights can enhance the consensus properties of a network, i.e., the network reaches consensus in a shorter time (Jalili 2013a; Jalili 2013b; Yang et al. 2009; Brunetti et al. 2012). Therefore, we also used proper weights while updating the opinions. The update equations read
$\begin{array}{ll} x_i(n+1) = x_i(n) + \mu\,\dfrac{f(x_j(n)) + \epsilon}{f(x_i(n)) + \epsilon}\left[x_j(n) - x_i(n)\right] \quad \text{if}\ f(x_i(n)) > f(x_j(n)), & i = 1, 2, \dots, N, \\ j = \arg\min_{k} f(x_k(n)), & k \in N_i \end{array}$
(2)
where ϵ is a small value (making the denominator nonzero). The above weighted update rule can be justified as follows. Suppose an objective function is to be minimized. Once the best-matching neighbor of an agent is found, it influences the agent according to its fitness, i.e., its value of the objective function. To this end, the weight in the update rule of an agent is the fitness value of the best-matching neighbor divided by the fitness value of the agent itself, often resulting in a value in the range 0–1 (note that the opinions are updated only when the fitness of the best-matching neighbor is better than that of the agent). It is worth mentioning that in some cases the opinions are multidimensional, i.e., x is a vector, in which case the best-matching agent is found separately for each dimension.
The method largely depends on the diffusion of good opinions (i.e., those that are good in terms of the objective function) through the network. Agents with opinion values close to the optimum of the objective function disseminate their opinions by communicating with their neighbors, i.e., reaching consensus with them. Moreover, opinions closer to the optimal value of the objective function have a better chance of being selected as best-matching neighbors.
The above rule for opinion formation is loosely inspired by communication in human societies. Our friends influence our behavior in daily life; however, we are usually affected only when our friends do better than us. Here, similarly, for each agent the best-matching neighbor is found first, and its opinion is then updated (using equation (2)) only if the fitness of the best-matching neighbor is better (i.e., it gives a lower value of the objective function) than that of the agent.
It is worth mentioning that the consensus (or synchronization) properties of dynamical networks largely depend on their structure, and some topologies are favored for fast consensus (Belykh et al. 2005; Ajdari Rad et al. 2008). Network topology also plays an important role in the evolution of other dynamical phenomena over complex networks, such as the evolution of cooperative behavior among interacting agents (Perc & Szolnoki 2010; Perc 2009).
The pseudo-code of the proposed consensus-based optimization algorithm is given below.
Pseudo-code for the proposed consensus-based optimization method
Function CBO
N: number of agents in the population (network size)
M: number of attributes of the opinion vector
Boundaries: the range of the opinions
F: the objective function to be optimized (minimized in this case)
Begin
  Initialize the N × M matrix X with random opinion values drawn within Boundaries;
  net = Create a structured network;
  Repeat
    for each agent i in the population do
      for each attribute a do
        neighbors_opinion = mask all other attributes of the opinions x of the neighbors of agent i in network net with a dummy value;
        self_opinion = mask all other attributes of the opinion x of agent i;
        j = the neighbor whose neighbors_opinion yields the best value of F;
        if neighbors_opinion of agent j optimizes F better than self_opinion then
          weight = (F(neighbors_opinion_j) + ϵ) / (F(self_opinion) + ϵ);
          x[i, a] = x[i, a] + μ · weight · (x[j, a] − x[i, a]);
          x[i, a] = mod(x[i, a], Boundaries);
        end if
      end for
    end for
  Until stopping condition(s) have been met
End
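The pseudo-code above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the attribute masking is interpreted here as evaluating an agent's own opinion with a single attribute swapped in from a neighbor, and `cbo` and `neighbors` are names introduced for this sketch.

```python
import numpy as np

def cbo(f, n_agents, dim, bounds, neighbors, mu=0.5, eps=1e-9, n_iter=100, seed=0):
    """Minimal sketch of the consensus-based optimization (CBO) pseudo-code.

    f         : objective function on a vector of length `dim` (minimized)
    bounds    : (low, high) range of the opinion space
    neighbors : dict mapping agent index -> list of neighbor indices
    """
    rng = np.random.default_rng(seed)
    low, high = bounds
    x = rng.uniform(low, high, size=(n_agents, dim))  # random initial opinions

    for _ in range(n_iter):
        for i in range(n_agents):
            for a in range(dim):
                # Interpret the attribute masking as: evaluate agent i's own
                # opinion with attribute `a` swapped in from a neighbor.
                def masked(k):
                    trial = x[i].copy()
                    trial[a] = x[k, a]
                    return f(trial)

                j = min(neighbors[i], key=masked)  # best-matching neighbor
                if f(x[i]) > masked(j):
                    weight = (masked(j) + eps) / (f(x[i]) + eps)
                    x[i, a] += mu * weight * (x[j, a] - x[i, a])
                    # Wrap the updated attribute back into the opinion space,
                    # mirroring mod(x[i, a], Boundaries) in the pseudo-code.
                    x[i, a] = low + (x[i, a] - low) % (high - low)

    best = min(range(n_agents), key=lambda i: f(x[i]))
    return x[best], f(x[best])
```

For example, running this sketch on a sphere function with a ring of agents drives the best objective value at or below the best value in the initial random population.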
At the beginning of the process, the agents are initialized with random values within the range acceptable for their opinions. As indicated by Watts and Dodds (2007), there is often "a minority of individuals who influence an exceptional number of their peers", mainly due to their specific position in the network. The influentials hypothesis states that initiating a cascade from influential individuals differs markedly from initiating it from non-influential ones, in both the size and the likelihood of the cascade (Watts & Dodds 2007). This means that the initial opinions of influential agents would probably bias the result of the consensus. This phenomenon does not happen in the proposed method, since CBO is not based on the bounded confidence model: every agent selects its best-matching neighbor regardless of that neighbor's social power or degree.
Consensus of opinion values
In this section, we provide a mathematical proof that the update rule expressed in equation (2) makes the opinions converge. To this end, let us rewrite it as follows (Hegselmann & Krause 2002):
$x(t+1) = A(t, x(t)) \cdot x(t),$
(3)
where x(t) = [x_1(t), x_2(t), …, x_N(t)] is the opinion vector at time t and A is a time-dependent state transition matrix that also depends on the opinion vector. We would like to verify whether, starting from initial opinion values x(0), all opinions converge to a single value, that is, lim_{t → ∞} x_i(t) = x^{∗} for i = 1, 2, …, N. Let us define the diameter d of the opinions as
$d(x) = \max_{1 \le i,j \le N} \left(x_i - x_j\right),$
(4)
Lemma 1 (Krause 2000): Consider a stochastic matrix A (i.e., a nonnegative matrix whose row sums equal 1); then one has
$\max_{i,j} \left|x_i(t+1) - x_j(t+1)\right| \le d(A) \cdot \max_{i,j} \left|x_i(t) - x_j(t)\right|,$
(5)
or equivalently,
$d(x(t+1)) = d\left(A(t, x(t)) \cdot x(t)\right) \le \left(1 - \min_{1 \le i,j \le N} \sum_{k=1}^{N} \min\{a_{ik}, a_{jk}\}\right) d(x(t)),$
(6)
The above lemma was proved in (Seneta 1981); here we give another proof using a simpler method.
Proof: Expression (6) can be derived as
$\begin{aligned}
d(Ax) &= \max_{1 \le i,j \le N} \left(A_i x - A_j x\right) = \max_{1 \le i,j \le N} \sum_{k=1}^{N} \left(a_{ik} x_k - a_{jk} x_k\right) \\
&= \max_{1 \le i,j \le N} \sum_{k=1}^{N} \left(a_{ik} - \min\{a_{ik}, a_{jk}\} + \min\{a_{ik}, a_{jk}\} - a_{jk}\right) x_k \\
&= \max_{1 \le i,j \le N} \sum_{k=1}^{N} \left(a_{ik} - \min\{a_{ik}, a_{jk}\}\right) x_k - \min_{1 \le i,j \le N} \sum_{k=1}^{N} \left(a_{jk} - \min\{a_{ik}, a_{jk}\}\right) x_k \\
&\le \left(1 - \min_{1 \le i,j \le N} \sum_{k=1}^{N} \min\{a_{ik}, a_{jk}\}\right) \left(\max_{1 \le i \le N} x_i - \min_{1 \le j \le N} x_j\right) \\
&= \left(1 - \min_{1 \le i,j \le N} \sum_{k=1}^{N} \min\{a_{ik}, a_{jk}\}\right) \max_{1 \le i,j \le N} \left(x_i - x_j\right) \\
&= \left(1 - \min_{1 \le i,j \le N} \sum_{k=1}^{N} \min\{a_{ik}, a_{jk}\}\right) d(x)
\end{aligned}$
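The contraction bound in equation (6) is easy to check numerically; the snippet below is an illustration on a random row-stochastic matrix, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.random((n, n))
A /= A.sum(axis=1, keepdims=True)      # make A row-stochastic
x = rng.random(n)

def d(v):
    """Diameter of an opinion vector, as in equation (4)."""
    return v.max() - v.min()

# Contraction factor: 1 - min over pairs (i, j) of sum_k min{a_ik, a_jk}.
c = 1 - min(np.minimum(A[i], A[j]).sum() for i in range(n) for j in range(n))
assert d(A @ x) <= c * d(x) + 1e-12    # the bound in equation (6)
```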
Our proposed weighted update rule for opinion formation, as expressed by equation (2), can be rewritten as
$x_i(t+1) = x_i(t) + \mu\, w_j \left(x_j(t) - x_i(t)\right) = \left(1 - \mu\, w_j\right) x_i(t) + \mu\, w_j\, x_j(t) = A \cdot x(t),$
(7)
It is clear that in the above representation, matrix A is a stochastic matrix.
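As a quick numerical illustration of why the matrix in equation (7) is stochastic, one can assemble the row of a single updating agent; the values of μ, w_j and the network size below are hypothetical.

```python
import numpy as np

# Row i of the transition matrix in equation (7): agent i keeps a fraction
# (1 - mu * w_j) of its own opinion and takes mu * w_j from its best-matching
# neighbor j; all other agents keep their opinions. Values are illustrative.
mu, w_j = 0.5, 0.8
N, i, j = 4, 0, 2

A = np.eye(N)                # non-updating agents keep their opinion
A[i, i] = 1 - mu * w_j
A[i, j] = mu * w_j

# Nonnegative entries with unit row sums: A is row-stochastic.
assert np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0)
```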
Theorem 1: The product of two stochastic matrices is a stochastic matrix.
Proof: Let A and B be two stochastic matrices and C = A·B their product. Since A and B are stochastic, their entries are nonnegative, and thus the entries of C are also nonnegative. The row sums of C (C_i for i = 1, 2, …, N) can be obtained as
$C_i = \sum_{j=1}^{N} C_{ij} = \sum_{j=1}^{N} \sum_{k=1}^{N} A_{ik} B_{kj} = \sum_{k=1}^{N} A_{ik} \sum_{j=1}^{N} B_{kj} = \sum_{k=1}^{N} A_{ik} \cdot 1 = 1,$
(8)
Therefore, C is a matrix with nonnegative entries and row sums equal to 1, and thus it is a stochastic matrix.
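Theorem 1 is easy to check numerically; the sketch below builds two random row-stochastic matrices and verifies that their product is again stochastic (`random_stochastic` is a helper name introduced here).

```python
import numpy as np

rng = np.random.default_rng(1)

def random_stochastic(n):
    """A random row-stochastic matrix: nonnegative entries, row sums of 1."""
    m = rng.random((n, n))
    return m / m.sum(axis=1, keepdims=True)

A = random_stochastic(5)
B = random_stochastic(5)
C = A @ B

# Theorem 1: the product inherits nonnegativity and unit row sums.
assert np.all(C >= 0) and np.allclose(C.sum(axis=1), 1.0)
```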
Let t_1 and t_2 represent time steps (t_1 < t_2) and B(t_1, t_2) = A(t_1 + 1) A(t_1 + 2) A(t_1 + 3) ⋯ A(t_2), which models the accumulated weights between times t_1 and t_2 (Hegselmann & Krause 2002). It can be simply shown that for any r ≥ 0, 1 − r ≤ e^{−r}.
Theorem 2 (Convergence Theorem): Consider opinion update rule (2) and suppose B(t_1, t_2) = [b_{ij}(t_1, t_2)] is a stochastic matrix modelling the accumulated weights, where b_{ij} is an element of B. Let the sequences 0 = t_0 < t_1 < t_2 < … ≤ T and δ_1, δ_2, …, δ_m, … be such that 0 ≤ δ_t ≤ 1 and $\sum_{t=0}^{\infty} \delta_t = \infty$. If $\sum_{k=1}^{N} \min\left\{b_{ik}(t_{m-1}, t_m),\, b_{jk}(t_{m-1}, t_m)\right\} \ge \delta_m$ for all m ≥ 1 and 1 ≤ i, j ≤ N, then for any initial condition a consensus is reached, i.e., lim_{t → ∞} x_i(t) = x* for i = 1, …, N.
Proof: see the Appendix section.
Optimization tasks
We applied our optimization procedure to a number of benchmark problems and compared its performance with several well-known methods, including genetic algorithms (GA), particle swarm optimization (PSO), differential evolution (DE) and the distributed dual averaging (DDA) algorithm. GA has been successfully applied to many optimization problems and was used as a baseline in this work. It starts with a population of random solutions, denoted chromosomes. Therefore, the first step is to encode the initial solutions from phenotype to genotype. The objective function is then used for ranking the chromosomes. GA works iteratively and, in each step, applies operators such as parent selection, recombination (crossover) and mutation (Holland 1975). A number of parameters should be tuned in order for a GA to work well, including the crossover probability, the mutation probability, the population model and the parent selection model. The crossover probability P_c indicates the probability of creating a new chromosome from two parents. The mutation probability P_m indicates the portion of the population that undergoes mutation in each iteration of the algorithm.
DE is one of the best-performing evolutionary algorithms frequently used for optimization tasks, and often reaches the optimal solution in fewer steps than other optimization algorithms. DE takes the difference of a randomly selected pair of chromosomes, which reflects the diversity of the population, and adds it to one of the chromosomes in the population. It then uses crossover operators such as binomial and exponential crossover to combine the chromosomes (Storn & Price 1997). The parameters of the algorithm are as follows: β is a real value that scales the difference between the two selected chromosomes and controls the amplification of differential variation; P_r indicates the probability of using the mutant (trial) vector; N_v is an integer indicating the number of chromosome pairs used in calculating the mutant vector.
We also compared our algorithm with PSO, a well-known optimization algorithm based on swarm intelligence (Kennedy & Eberhart 1995). In this algorithm, there is a population of agents, called particles, interacting with other agents, as in our algorithm. PSO has two components: a cognitive component and a social component. The cognitive component is the experience of each particle, while the social component is the experience of the community the particles belong to. PSO has shown a high degree of flexibility and acceptable speed in solving many optimization problems. Here we used one of the best extensions of PSO, namely PSO with inertia weights (Eberhart & Shi 2000). The inertia weight plays an important role in balancing exploration and exploitation and in making the algorithm more stable. PSO has a number of control parameters; let us denote the parameters controlling the cognitive and social power of the algorithm as c_1 and c_2, respectively.
The distributed dual averaging (DDA) algorithm, inspired by Nesterov's dual averaging algorithm (Xiao 2010; Nesterov 2009), has been proposed for optimizing convex functions (Duchi & Wainwright 2012). Similar to the CBO algorithm, DDA is a network-based optimization method in which each node computes a subdifferential of a local function while receiving information from its neighboring nodes. A weight matrix models the weighting process of the method. In each iteration, each node updates its solution vector by multiplying the stochastic weight matrix by the sum of its neighbors' parameters and the subgradient of the objective function. DDA is computationally efficient, and its convergence time depends on the properties of the objective function and the underlying network topology. Expander graphs have been proposed as an efficient connection topology for the DDA algorithm (Duchi & Wainwright 2012). The alternating direction method of multipliers (ADMM) is another optimization method, which uses properties of dual decomposition and augmented Lagrangian methods simultaneously (Boyd & Vandenberghe 2004). The Lagrange dual function is obtained via the convex conjugate, and the dual problem is solved using gradient ascent.
Benchmark problems
We evaluated the performance of the proposed optimization strategy on a number of benchmark problems. As the first problem, we considered the following cost function
$F_1(x) = -e^{-2\ln(2)\left(\frac{x - 0.1}{0.8}\right)^{2}} \sin^{6}(5\pi x),$
(9)
which is a function with many local optima. The optimal point, at which the minimum −1 is achieved, is x^{*} = 0.1.
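A direct transcription of F_1, taken here with a leading minus sign so that the minimum −1 at x = 0.1 is consistent with minimization, can be sanity-checked as follows.

```python
import numpy as np

def f1(x):
    # Equation (9), with a leading minus sign so that the global minimum
    # is -1 at x = 0.1 (a minimization convention assumed here).
    return -np.exp(-2 * np.log(2) * ((x - 0.1) / 0.8) ** 2) * np.sin(5 * np.pi * x) ** 6

assert np.isclose(f1(0.1), -1.0)   # exp(0) = 1 and sin(pi/2)^6 = 1
```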
We used a number of competitive functions that have been introduced as benchmarks in optimization problems (Tang et al. 2009). The first function of this type is Shifted Rastrigin's function, defined as
$F_2(x) = F_{\mathit{rastrigin}}(x) = \sum_{i=1}^{D}\left[x_i^2 - 10\cos(2\pi x_i) + 10\right],$
(10)
which is a multimodal, shifted, separable and scalable function. The other function of this type considered here is Shifted Ackley’s function, which is defined as
$F_3(x) = F_{\mathit{ackley}}(x) = -20\exp\left(-0.2\sqrt{\frac{1}{D}\sum_{i=1}^{D} x_i^2}\right) - \exp\left(\frac{1}{D}\sum_{i=1}^{D}\cos(2\pi x_i)\right) + 20 + e.$
(11)
We also considered Shifted Schwefel’s function, defined as
$F_4(x) = F_{\mathit{schwefel}}(x) = \sum_{i=1}^{D}\left(\sum_{j=1}^{i} x_j\right)^{2},$
(12)
and Shifted Elliptic function, defined as
$F_5(x) = F_{\mathit{elliptic}}(x) = \sum_{i=1}^{D} \left(10^6\right)^{\frac{i-1}{D-1}} x_i^2.$
(13)
In all the above functions except F_1, x ∈ [−5, 5]^D. Furthermore, the global optimum, F_2^* = F_3^* = F_4^* = F_5^* = 0, is achieved at x^*, which is a random vector of real numbers that differs in each run.
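For reference, unshifted versions of F_2 to F_5 can be written compactly; this is a sketch following the equations above (the shifted benchmark variants additionally subtract a random offset x* from x before evaluation).

```python
import numpy as np

# Unshifted F2-F5; global minimum 0 at the origin in each case (D >= 2 assumed).

def rastrigin(x):
    x = np.asarray(x, dtype=float)
    return float(np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10))

def ackley(x):
    x = np.asarray(x, dtype=float)
    return float(-20 * np.exp(-0.2 * np.sqrt(np.mean(x**2)))
                 - np.exp(np.mean(np.cos(2 * np.pi * x))) + 20 + np.e)

def schwefel_1_2(x):
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.cumsum(x) ** 2))   # the inner sums are cumulative sums

def elliptic(x):
    x = np.asarray(x, dtype=float)
    D = x.size
    return float(np.sum((1e6) ** (np.arange(D) / (D - 1)) * x**2))
```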
Network structures
One of the key ingredients of the proposed optimization algorithm is the graph structure connecting the agents, which is kept unchanged during the optimization process; in other words, the set of neighbours of each agent does not change. In this work, we used a number of well-known graph structures, including Erdős–Rényi random, Watts–Strogatz small-world and Barabási–Albert scale-free networks.
We used the model introduced by Erdős and Rényi for constructing pure random networks (Erdős & Rényi 1960). In this model, N nodes are considered and each pair is connected with probability P. Research has shown that real networks are neither random nor regular but somewhere in between: they are small-world. In order to construct small-world networks, we used the original model proposed by Watts and Strogatz (Watts & Strogatz 1998). Starting with a regular ring graph in which each node is connected to its k nearest neighbours, each edge is rewired with probability P, with self-loops and duplicate edges prohibited. Watts and Strogatz showed that for intermediate values of the rewiring probability P, one obtains a network with a low characteristic path length, comparable to that of random networks, and a clustering coefficient (i.e., transitivity) much higher than that of corresponding random networks.
The Erdős–Rényi and Watts–Strogatz models result in networks with almost homogeneous degree distributions. However, many real networks have been shown to have heterogeneous degree distributions: there are many low-degree nodes, while a few hub nodes have high degrees (Albert & Barabasi 2002; Barabasi & Albert 1999; Barabási 2009). Barabási and Albert proposed a preferential attachment growth model for constructing such networks, which is used in this work (Barabasi & Albert 1999). The model starts with k + 1 all-to-all connected nodes. In each step, a new node with k links is added to the network. This node attaches to the existing nodes with probability proportional to their degree, i.e., the higher the degree of an existing node, the higher the probability of the new node connecting to it. The model results in scale-free networks whose degree distribution obeys a power law (Barabasi & Albert 1999).
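The three topologies can be generated with the NetworkX library, for example as below; the parameter values are illustrative, not those used in the paper.

```python
import networkx as nx

N = 100  # number of agents (illustrative)

# Erdos-Renyi random graph: every pair of nodes connected with probability p.
er = nx.erdos_renyi_graph(N, p=0.1, seed=42)

# Watts-Strogatz small-world: ring over k nearest neighbours, rewired with p.
ws = nx.watts_strogatz_graph(N, k=10, p=0.1, seed=42)

# Barabasi-Albert scale-free: growth with preferential attachment, m links
# added per new node.
ba = nx.barabasi_albert_graph(N, m=5, seed=42)

# Fixed neighbour sets, as required by the optimization algorithm.
neighbors = {i: list(ws.neighbors(i)) for i in ws.nodes}
```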