2.1. Generate Good Quality Difference Image by NSCT Feature Extraction and IntraScale Fusion
As mentioned in the introduction, image fusion can produce better quality images by combining information from multiple source images. Image fusion in transform domains [7,13,28] produces better results because wavelet/contourlet transforms can represent frequencies in both time and space. The Nonsubsampled Contourlet Transform (NSCT) possesses advantages such as shift and scale invariance [29], properties well suited to change detection studies. In addition, being multidirectional, the NSCT is efficient in capturing 2D singularities [30], deriving edge and line features of an image, and hence the geographical structures in it, whereas wavelets can capture only point singularities. The transform contains a pair of filter banks: the nonsubsampled pyramidal filter bank (NSPFB) generates subband pyramids with $j+1$ redundancy at the jth scale, and the nonsubsampled directional filter bank (NSDFB) produces ${2}^{{l}_{j}}$ directional subbands at level ${l}_{j}$ and the jth scale [29], resulting in subbands of multiple coarseness and directions. The structure of the NSCT is illustrated in Figure 2. A large number of coefficients are zeroes [30]; hence, nonsubsampled contourlets are efficient in suppressing background and various noise models, and can project changed pixels more prominently. In the present study, an NDI and an NRI carrying complementary information are fused at the NSCT feature level, owing to their unique properties described earlier, producing fused NSCT coefficients that contain information from both DIs. The steps of the method are summarized in Algorithm 1.
Algorithm 1 NSCT Fusion
Input: Source images ${I}_{1}$ at time ${T}_{1}$ and ${I}_{2}$ at time ${T}_{2}$
Output: Fused NSCT coefficients
1: Generate NDI using Equations (1) and (2).
2: Generate NRI using Equation (3).
3: Apply NSCT on the NDI and NRI up to sufficient scales J and obtain approximation coefficients and directional features.
4: Denoise features using an appropriate threshold ($\lambda $).
5: Fuse the approximation coefficients by averaging, and the high-frequency coefficients by the minimum energy rule.

The DI is generated by band-wise subtraction as
$D{I}_{b}=\left|{I}_{2,b}-{I}_{1,b}\right|,\quad b=1,2,\dots ,B,\qquad (1)$
and normalized, giving the Normalized Difference Image (NDI), as
$ND{I}_{b}=\frac{D{I}_{b}-D{I}_{b,min}}{D{I}_{b,max}-D{I}_{b,min}},\qquad (2)$
where $D{I}_{b,max}$ and $D{I}_{b,min}$ are the maximum and minimum pixel values of band b. The second DI, the ratio image, is generated as
$R{I}_{b}=\frac{{I}_{1,b}}{{I}_{2,b}}.\qquad (3)$
In the ratio image, an unchanged pixel has a value close to 1; we therefore subtract the value from 1 and take the absolute value to map the values into the range $[0,1]$ (to match the NDI). We call the result the Normalized Ratio Image (NRI). NSCT features are extracted from these DIs (NDI and NRI) by applying J-scale decompositions (where J is chosen using the entropy of the decomposed images). The subband coefficients are denoised using the Donoho–Johnstone threshold $\lambda =\sigma \sqrt{2\log \left(MN\right)}$ (here, $\sigma $ is estimated suitably and MN is the size of the image) [48]. A large number of background pixels are removed by thresholding, and hence the foreground containing the changed pixels is enhanced. The resultant subbands contain different land cover information from both the NDI and NRI features. The denoised approximation coefficients of the NDI and NRI are fused by averaging to produce a resultant subband comprising the details of both DIs equally [7]. The minimum local energy fusion rule is applied on the higher frequency subbands [7] so as to preserve the edge and salient features, using
$D{H}_{j,b}^{q,f}=\begin{cases}D{H}_{j,b}^{q,nor}, & {E}_{j,b}^{q,nor}\le {E}_{j,b}^{q,r}\\ D{H}_{j,b}^{q,r}, & \text{otherwise,}\end{cases}\qquad (4)$
where $D{H}_{j,b}^{q,f}$, $D{H}_{j,b}^{q,nor}$ and $D{H}_{j,b}^{q,r}$ are the high frequency coefficients in direction q at the jth scale of the fused image, NDI and NRI, respectively, and ${E}_{j,b}^{q,nor}$ and ${E}_{j,b}^{q,r}$ represent the local energy of the coefficient in direction q of band b at the jth scale over a $3\times 3$ window of the NDI and NRI, respectively.
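The difference-image generation, thresholding and fusion steps above can be sketched in NumPy. Plain 2-D arrays stand in for the NSCT subbands here (a real implementation would apply an NSCT decomposition first), and all function names are illustrative, not from the paper; the median-based estimate of $\sigma$ and the use of soft thresholding are common choices assumed for the sketch.

```python
import numpy as np

def make_ndi(i1, i2):
    """Normalized Difference Image: band-wise absolute difference scaled to [0, 1]."""
    di = np.abs(i2.astype(float) - i1.astype(float))
    return (di - di.min()) / (di.max() - di.min() + 1e-12)

def make_nri(i1, i2):
    """Normalized Ratio Image: |1 - ratio|; unchanged pixels (ratio near 1) map near 0."""
    ratio = (i1.astype(float) + 1e-12) / (i2.astype(float) + 1e-12)
    return np.clip(np.abs(1.0 - ratio), 0.0, 1.0)

def universal_threshold(coeffs):
    """Donoho-Johnstone threshold: lambda = sigma * sqrt(2 * log(MN)),
    with sigma estimated by the common median rule (an assumed choice)."""
    sigma = np.median(np.abs(coeffs)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(coeffs.size))

def soft_threshold(coeffs, lam):
    """Shrink coefficients toward zero by lam (soft thresholding)."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam, 0.0)

def local_energy(coeffs, win=3):
    """Sum of squared coefficients over a win x win neighbourhood."""
    pad = win // 2
    padded = np.pad(coeffs ** 2, pad, mode="edge")
    out = np.zeros_like(coeffs, dtype=float)
    for dy in range(win):
        for dx in range(win):
            out += padded[dy:dy + coeffs.shape[0], dx:dx + coeffs.shape[1]]
    return out

def fuse(approx_ndi, approx_nri, det_ndi, det_nri):
    """Average the approximation subbands; select detail coefficients
    by the minimum local energy rule."""
    fused_approx = 0.5 * (approx_ndi + approx_nri)
    pick_ndi = local_energy(det_ndi) <= local_energy(det_nri)
    fused_detail = np.where(pick_ndi, det_ndi, det_nri)
    return fused_approx, fused_detail
```

Given the two co-registered bands, `make_ndi`/`make_nri` produce the complementary DIs, each subband is shrunk with `soft_threshold(universal_threshold(...))`, and `fuse` combines the corresponding subbands of the two DIs.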
From each spectral band, S NSCT features (subband images) of size $M\times N$ each are obtained after intrascale fusion. These are mapped to an S-dimensional vector for each pixel and segmented by employing the proposed LIPSO algorithm.
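The mapping from S fused subband images to per-pixel feature vectors can be illustrated with a short reshape; the function name is hypothetical.

```python
import numpy as np

def subbands_to_features(subbands):
    """Stack S subband images of size M x N into an (M*N) x S matrix so that
    each row is the S-dimensional feature vector of one pixel."""
    stacked = np.stack(subbands, axis=-1)   # shape (M, N, S)
    m, n, s = stacked.shape
    return stacked.reshape(m * n, s)
```

The resulting rows are the data points that the clustering stage below operates on.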
2.2. Generate Binary Change Map Using Leader Intelligence PSO
As our task is to produce two mutually exclusive clusters of changed and unchanged pixels, the inter-cluster distance is to be maximized and the intra-cluster distance of the pixels in a group is to be minimized. Therefore, this task can be formulated as an optimization problem that minimizes the following error function:
$F=\sum _{k=1}^{K}\frac{1}{{n}_{k}}\sum _{{x}_{j}\in {\mathcal{C}}_{k}}d({x}_{j},{v}_{k}),\qquad (5)$
where K is the number of clusters (regions), ${x}_{j}$ is a data point belonging to the kth cluster (${\mathcal{C}}_{k}$), ${v}_{k}$ is the centre of the kth cluster, $d({x}_{j},{v}_{k})$ is the Euclidean norm and ${n}_{k}$ is the number of data points belonging to the kth cluster (region). The Particle Swarm Optimization (PSO) algorithm proposed by Kennedy and Eberhart [39,49] is a population-based algorithm that imitates the best characteristics of social insects in solving complex optimization problems [42,43,50]. It is a typical optimization algorithm suitable for realistic problems like clustering, and we choose it for its many virtues: it is a widely accepted optimization method with a well-defined mathematical foundation, especially when combined with Lévy flights [42,43]. Interested readers may refer to Reference [50] for the theoretical details of PSO, and to References [42,43] and Appendix A for Lévy flight. PSO exploits a metaheuristic underlying in nature, swarm intelligence, for fast convergence, which we use to generate the changed and unchanged clusters in our experiment. Its logic is easy to understand, it requires only a few parameter settings, and its computational overhead is low compared to many other algorithms.
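The clustering objective of Equation (5) can be sketched as a fitness function for a particle encoding the cluster centres; the per-cluster normalisation shown here follows the definitions in the text, but the exact form used in the paper may differ.

```python
import numpy as np

def cluster_error(X, centres):
    """Equation (5)-style error: for each cluster, the mean distance of its
    assigned points to its centre, summed over all K clusters.

    X: (n_points, d) feature vectors; centres: (K, d) cluster centres."""
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)  # (n, K)
    labels = d.argmin(axis=1)          # nearest-centre assignment
    err = 0.0
    for k in range(centres.shape[0]):
        mask = labels == k
        if mask.any():
            err += d[mask, k].sum() / mask.sum()
    return err
```

A particle holding good centres for the changed/unchanged clusters yields a small value, which is what the swarm minimizes.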
Although several variants of PSO have been reported in the literature, the local optima problem has not been eliminated completely. Moreover, they often fail to arrive at a consistent solution. Of the several improved versions of PSO, the Lévy flight version is found promising. We consider two variants of it: the LFPSO [42], in which the Lévy flight is performed only when a particle exceeds a limit value, and the PSOLF [43], where the particles fly according to a probability $p>0.5$. Although they perform well on benchmark problems [43], they fail to achieve good results on real-life problems like image segmentation, resulting in bad clusters or not generating clusters at all. This is probably due to the inability of the particles to attain sufficient velocity, or their failure to maintain a velocity within the search space. A second reason may be the loss of diversity, which leads to stagnation of particles. Therefore, a modified PSO with the concept of a Leader and followers [46] performing Lévy flight intelligently (LIPSO) is proposed. The steps involved in LIPSO can be followed easily in Algorithm 2 and the flowchart in Appendix B.
Algorithm 2 Leader Intelligence Particle Swarm Optimization
Input: Fused coefficients
Output: Cluster centres
1: Initialize particles randomly selected from the search space, as ${X}_{i}=\{{X}_{i1},{X}_{i2},\dots ,{X}_{iD}\}$.
2: Initialize the parameters ${c}_{1}$, ${c}_{2}$, $Xmax$, $Xmin$, $Vmax$ and $Vmin$.
3: Evaluate fitness by Equation (5).
4: Assign $pBes{t}_{i,D}$ = initial positions of particles, $gBes{t}_{D}$ = best particle position of the swarm, $pBestfi{t}_{i}$ = fitness value of the ith particle, and $gBestfit$ = best fitness of the swarm.
5: Set the particle with the best fitness value as the Leader.
6: If $rand>0.5$, update the position of the Leader by Lévy flight using Equation (9); else update the velocity and position of the Leader by Equations (6) and (7), respectively.
7: Update the followers: if the distance between the leader and a follower crosses the threshold, update the follower with Equation (13); otherwise update the velocity and position by Equations (6) and (7), respectively.
8: Evaluate the fitness function, Equation (5), and update $pBes{t}_{i,D}$, $gBes{t}_{D}$, $pBestfi{t}_{i}$ and $gBestfit$ if better fitness values are found.
9: Repeat Steps 5 through 8 until the convergence criterion is attained.

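The Lévy flight of the Leader in Step 6 can be sketched as follows. Mantegna's algorithm is a common way to draw Lévy-distributed steps; the paper's own step equation (A2) and the exact form of the intelligent-parameter scaling are not reproduced here, so both `levy_step` and `leader_levy_move` are illustrative stand-ins.

```python
import math
import numpy as np

def levy_step(dim, beta=1.5, rng=None):
    """One Levy-distributed step via Mantegna's algorithm (a common choice;
    the paper's step-size equation (A2) may differ)."""
    rng = np.random.default_rng() if rng is None else rng
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def leader_levy_move(x_leader, eta, rng=None):
    """Hypothetical leader move: a Levy step scaled by the intelligent
    parameter eta (sketching the role of Equations (9)-(12))."""
    return x_leader + eta * levy_step(x_leader.size, rng=rng)
```

The heavy tail of the Lévy distribution occasionally yields very large steps, which is what lets the leader escape a locality where no better solution remains.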
The proposed algorithm, LIPSO, differs from PSOLF [50] in the following aspects: (i) an intelligent parameter that produces a reasonable step size replaces the inertia weight term in the PSO equation with Lévy flight; (ii) the Leader-follower concept of Reference [46] is modified by performing the Lévy flight intelligently; and (iii) the structure of the PSO algorithm is modified by incorporating cooperative competition of the individual swarms, that is, each follower updates its position with respect to the current position of the leader.
Initially, as in the standard PSO, the particles are chosen randomly from the search space, that is, from the vectors of NSCT coefficients generated in the first part. Each particle represents the cluster centres of changed and unchanged pixels; hence, the dimension of a particle is $2d=D\left(say\right)$. The fitness of the particles is evaluated using Equation (5) and the one with the minimum value is designated as the best particle. The $pBes{t}_{i,D}$, $gBes{t}_{D}$, $pBestfi{t}_{i}$ and $gBestfit$ are assigned the initial particle positions, the best position among the particles, the fitness value of the ith particle and the fitness of the best particle in the swarm, respectively. The best particle of the swarm is designated as the leader and the rest are termed followers. Over the iterations, the concept of Lévy flight is implemented with the assumption that, in reality, the leader of a swarm occasionally takes a Lévy walk while foraging, and only after the local search is finished and the food (here, the solution of the problem) is no longer available in the locality. Therefore, the Lévy flight is first employed on the leader on a random probability basis, for example, half of the time or two thirds of the time; otherwise, the position is updated using Equations (6) and (7) as in SPSO [50]:
${V}_{i}^{t+1}=\omega {V}_{i}^{t}+{c}_{1}{r}_{1}\oplus (pBes{t}_{i,D}-{X}_{i}^{t})+{c}_{2}{r}_{2}\oplus (gBes{t}_{D}-{X}_{i}^{t}),\qquad (6)$
${X}_{i}^{t+1}={X}_{i}^{t}+{V}_{i}^{t+1},\qquad (7)$
where ${V}_{i}^{t+1}$ and ${V}_{i}^{t}$ are the velocities of the ith particle at iterations $t+1$ and t, respectively, ${c}_{1}$ and ${c}_{2}$ are acceleration parameters often called the cognitive and social weighting factors, ${r}_{1}$ and ${r}_{2}$ are random vectors in $[0,1]$, $\omega $ is the inertia weight, ⊕ is the element-level multiplication of matrices, and ${X}_{i}^{t+1}$ and ${X}_{i}^{t}$ are the positions of the ith particle at iterations $t+1$ and t, respectively. In the present work, $\omega $ is calculated using Equation (8), where itr is the current iteration and Maxitr is the maximum iteration.
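The standard velocity and position updates of Equations (6) and (7) can be sketched as below. Since the constants of the inertia-weight schedule in Equation (8) are not shown here, `inertia` assumes the common linearly decreasing form with bounds 0.9 and 0.4; those bounds, and the default values of $c_1$ and $c_2$, are assumptions, not the paper's settings.

```python
import numpy as np

def inertia(itr, max_itr, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight over the iterations; the bounds
    w_max/w_min are assumed common defaults, not the paper's Equation (8)."""
    return w_max - (w_max - w_min) * itr / max_itr

def spso_update(x, v, pbest, gbest, w, c1=2.0, c2=2.0, rng=None):
    """Standard PSO update: Equation (6) for velocity, Equation (7) for position.
    r1 and r2 are fresh random vectors in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(x.size), rng.random(x.size)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new
```

A particle sitting exactly at both its personal best and the global best with zero velocity does not move, which is the stagnation risk the Lévy flight is meant to counter.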
With Lévy flight, the position of the leader is updated as
${X}_{L}^{t+1}={X}_{L}^{t}+{V}_{L}^{t+1},\qquad (9)$
where ${X}_{L}^{t+1}$ is the position of the leader at the current iteration $t+1$, ${X}_{L}^{t}$ is the position of the leader at iteration t, and ${V}_{L}^{t+1}$ is the velocity of the particle at the current iteration, calculated using Equation (10), in which $ILevy\left({X}_{L}^{t+1}\right)$ is the Lévy flight of the intelligent leader at the current iteration, given by Equation (11), where s is the step size calculated using Equation (A2) and $\eta $ is the intelligent parameter calculated using Equation (12), in which A is the intelligent constant, an integer value chosen based on the scale of the problem. $\eta $ can be fine-tuned by changing the value of A according to the problem at hand; in this experiment, A performed best in the range $[5,10]$. The intelligent parameter $\eta $ controls the step length during the Lévy flight, on the assumption that the intelligence of the leader decides how fast the swarm should move towards the best position. With this Lévy equation, a larger step size is obtained for the leader. Once the position of the Leader is updated, the followers are updated by Equation (13) if their distance from the leader crosses a threshold value, and otherwise by Equation (7).
The follower update with Lévy flight is
${X}_{i}^{t+1}={X}_{i}^{t}+fLevy\left({X}_{i}^{t+1}\right),\qquad (13)$
where $fLevy\left({X}_{i}^{t+1}\right)$ is the Lévy flight of the ith follower at the current iteration, which serves as the velocity of the follower. With Lévy flight, particles explore the search space for better positions; otherwise, the local search continues for a better solution. Thus, the algorithm maintains a trade-off between exploration and exploitation. In the present work, the threshold value lies in the range $[0,1]$ and is chosen randomly; it can also be set to a fixed value, depending on the problem. After updating the followers, the fitness is calculated and $pBest$ and $gBest$, $pBestfi{t}_{i}$ and $gBestfit$ are updated if better values are found. The iterations are repeated until convergence. The convergence criterion could be set in two different ways: (1) until the difference between the current gBest and that in memory is less than a very small epsilon value; or (2) until the maximum iteration is completed. As the fitness value may show no improvement over a few consecutive iterations, the first method is not suitable for segmentation problems. Therefore, the second method was adopted in this experiment. A suitable number of iterations (Maxitr) that generates robust clusters can be fixed empirically.
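The distance-gated follower rule can be sketched as follows; the form of the heavy-tailed jump standing in for Equation (13) is an assumption (a Cauchy step relative to the leader), as is the `0.01` scale factor, while the far/near branching mirrors Step 7 of Algorithm 2.

```python
import numpy as np

def follower_update(x, v, pbest, gbest, x_leader, w, threshold,
                    c1=2.0, c2=2.0, rng=None):
    """If a follower has drifted beyond `threshold` from the leader, relocate
    it with a heavy-tailed jump relative to the leader (standing in for
    Equation (13)); otherwise apply the standard updates of Equations (6)
    and (7)."""
    rng = np.random.default_rng() if rng is None else rng
    if np.linalg.norm(x - x_leader) > threshold:
        jump = rng.standard_cauchy(x.size)      # heavy-tailed step
        return x + 0.01 * jump * (x_leader - x), v
    r1, r2 = rng.random(x.size), rng.random(x.size)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new
```

Nearby followers refine the local search; distant ones take exploratory jumps, which is the exploration/exploitation trade-off described above.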
Once the algorithm has converged, $gBes{t}_{D}$ contains the cluster centres of the changed and unchanged pixels, from which the clusters are generated; from the cluster labels (0 = changed, 1 = unchanged), the change map is prepared.
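The final step, turning the converged centres into a binary change map, amounts to nearest-centre labelling; the function name and the centre ordering convention are illustrative.

```python
import numpy as np

def change_map(features, centres, shape):
    """Label each pixel by its nearest cluster centre and reshape to the
    image grid; here row 0 of `centres` is taken as the 'changed' centre
    (label 0) and row 1 as 'unchanged' (label 1), matching the text."""
    d = np.linalg.norm(features[:, None, :] - centres[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    return labels.reshape(shape)
```

Applied to the per-pixel feature matrix from the fusion stage and the two centres stored in $gBest_D$, this yields the binary change map directly.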