Article

Distributed Batch Learning of Growing Neural Gas for Quick and Efficient Clustering

Graduate School of Systems Design, Tokyo Metropolitan University, Hino-shi 191-0065, Tokyo, Japan
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Submission received: 15 May 2024 / Revised: 14 June 2024 / Accepted: 16 June 2024 / Published: 20 June 2024
(This article belongs to the Special Issue Advanced Machine Vision with Mathematics)

Abstract

Growing neural gas (GNG) has been widely used in topological mapping, clustering and unsupervised tasks. It starts from two random nodes and grows until it forms a topological network covering all data. The time required for growth depends on the total amount of data and the current network nodes. To accelerate growth, we introduce a novel distributed batch processing method to extract the rough distribution called Distributed Batch Learning Growing Neural Gas (DBL-GNG). First, instead of using a for loop in standard GNG, we adopt a batch learning approach to accelerate learning. To do this, we replace most of the standard equations with matrix calculations. Next, instead of starting with two random nodes, we start with multiple nodes in different distribution areas. Furthermore, we also propose to add multiple nodes to the network instead of adding them one by one. Finally, we introduce an edge cutting method to reduce unimportant links between nodes to obtain a better cluster network. We demonstrate DBL-GNG on multiple benchmark datasets. From the results, DBL-GNG performs faster than other GNG methods by at least 10 times. We also demonstrate the scalability of DBL-GNG by implementing a multi-scale batch learning process in it, named MS-DBL-GNG, which successfully obtains fast convergence results. In addition, we also demonstrate the dynamic data adaptation of DBL-GNG to 3D point cloud data. It is capable of processing and mapping topological nodes on point cloud objects in real time.

1. Introduction

Recently, various types of unsupervised learning methods [1,2,3] have been applied to data mining tasks with the rapid progress of Society 5.0, Digital Twin, and cyber–physical systems [4,5,6]. For example, we need an online real-time processing system to deal with the huge amount of 3D distance data measured by LiDAR in the case of self-driving cars. Furthermore, with the recent addition of depth sensors in the latest smartphones, we can use them to form color point clouds for object detection or activity recognition. The processing efficiency will be improved greatly if we can extract topological features from the 3D distance data or RGB point clouds. The main objectives of unsupervised learning are the feature extraction, clustering, and topological mapping of a dataset to find important information efficiently [7]. Many methods have been proposed, and their effectiveness in achieving these objectives has been shown [8,9]. However, unsupervised learning still has many problems, such as the computational time on large-scale datasets or large-scale networks. In general, more sample data and more network nodes can form a better cluster and topological map. Therefore, in this paper, we focus on one of the main goals of unsupervised learning, which is to reduce the computational cost in the presence of large-scale datasets and network nodes.
Growing Neural Gas (GNG) proposed by Fritzke [10] is based on competitive topological learning and can dynamically change the topological structure based on the adjacent relation (i.e., an edge), referring to the activation frequency of the adjacent node according to the error index. GNG can obtain a topological structure by connecting an edge between the first and second nearest nodes to a sample datum. This is an important advantage of GNG in the research on clustering methods. However, standard GNG learns from a set of data one by one and takes time to process. If the given point cloud data contain millions of points, standard GNG takes a long time to obtain the topological map. One example is supporting the real-time tracking of mobile support robots [11]. In addition, standard GNG has unstable learning [12], where the topological map generated in the current epoch is different from that in the previous epoch even on the same dataset. For example, if the tracking of the mobile support robot is unstable, it can lead to incorrect predictions of motion and cause collisions. To speed up this process and solve unstable learning, we can learn in batches.
Batch learning is widely used in deep learning. It divides the dataset into multiple mini-batches for learning. Batch learning has also recently been applied to GNG to obtain topological maps from datasets [3,12,13,14]. Multiple past studies have indeed shown that batch learning performs better than standard GNG in terms of learning stability and speed. However, most of the works do not make full use of matrix calculations as can be seen from the algorithms they provide; they still use for loops to input data one by one from each batch, resulting in slow computation time. In addition, GPU processors are becoming more and more popular, and GPUs are good at matrix calculations [15]. Therefore, we are motivated to implement matrix computation on batch learning GNG.
Previous GNG methods all start from one area and then gradually expand until all the data are covered. Therefore, it takes time to move from one side to the other. To solve this problem, we can implement multiple starting points. However, if a starting point is randomly selected from the data, there may be bias, that is, multiple starting points are located in the same area. Therefore, we intend to implement a distributed method to uniformly initialize multiple starting points. We first randomly select a starting point from the data, and then sort the data from the nearest to the farthest based on this starting point. After that, we remove the nearest batch and then select a point from the farthest batch as the next starting point. By repeating this process, the starting points can be evenly distributed throughout the data.
In addition, the strategy of adding nodes in previous GNGs is to add one node per interval or per epoch. In GNGs, nodes are added next to the node with the highest error. So, other nodes with similarly high errors exist at the same time, but they have to wait their turn. Because of this strategy, the previous GNGs cannot add multiple nodes at the same time, resulting in slow growth. Therefore, we propose to implement an add-multiple-nodes strategy in GNG, which precalculates how many nodes need to be added and then adds them all at once. This can both speed up the growth and distribute the growth in the network without biasing it towards the area with the largest error.
Last, GNG suffers from an edge connection problem due to noisy data [16]. It connects two different clusters into one because of a single noise point between them. Based on this issue, we are motivated to implement an edge cutting method to cut those edges that are unimportant due to noisy data. In order to know which edges are not important, we implement an edge strength attribute in the connection to calculate how strong the connection between two nodes is. If the connection between two nodes is weak, we can cut it off in the post-processing part.
Therefore, in this paper, we contribute a novel model named Distributed Batch Learning Growing Neural Gas (DBL-GNG). First, it uses matrix calculations in batch learning, which reduces the computation time and runs faster than other GNG methods. Second, we introduce a distributed initialization method to form starting points in different areas so that GNG can grow and learn faster in different areas simultaneously. Third, to speed up the growth, we propose to add multiple nodes at a time instead of adding them one by one. Finally, we propose an edge cutting method to identify those unimportant edges in the network by calculating the edge strength. Through the combination of distributed initialization, matrix calculations for batch learning, distributed growing and edge cutting methods, we can reduce the computation time by at least 20 times on large-scale datasets. In addition, we also demonstrate the importance of the edge cutting method in clustering tasks and outperform the k-means and DBSCAN clustering methods.
This paper is organized as follows. Section 2 explains the previous works related to this paper and discusses their advantages and disadvantages in detail. Section 3 proposes a novel DBL-GNG method. Section 4 explains the setup and implementation. Section 5 shows several comparison results using benchmark tests and discusses the effectiveness of the proposed method. Finally, we summarize this paper, and discuss the essence of the proposed method and the future research directions.

2. Related Work

One of the most well-known methods of unsupervised learning is the k-means method. The k-means method has k prototypes (i.e., cluster centers), and their positions are iteratively determined by calculating the center of gravity of the data assigned to each prototype. The k-means method is used in many data mining applications since it is easy to implement and the prototype positions converge in a finite number of iterations. However, k-means has some problems, such as robustness to imbalanced and overlapping data distributions [17]. In the field of soft clustering, many probabilistic methods [18,19] and fuzzy c-means (FCM)-type algorithms have been proposed [20,21]. In many cases, soft clustering methods can improve the global convergence property compared to k-means-type clustering, since soft clustering allows each input datum to have some probability or grade in each cluster. However, these methods require calculating the distance from each datum to each node in the network. In particular, FCM requires calculating the membership value of each cluster and takes a long time on large-scale networks. Moreover, these methods require the data distribution to be static, which implies an offline update.
To solve these problems, many clustering methods based on topological mapping have been proposed. Basically, the learning algorithm is based on online incremental unsupervised learning, such as Growing Cell Structure (GCS) [22] and GNG. GCS and GNG can dynamically change the topological structure based on the adjacent relation, referring to the ignition frequency of the adjacent node according to an error index. However, GCS does not delete nodes and edges, while GNG can delete nodes and edges from the topological structure by utilizing the age of each edge [10,22]. The Self-Organizing Incremental Neural Network (SOINN) [23,24] was proposed by Shen and Hasegawa to improve the learning capability of GNG; SOINN is one of the most effective learning methods in the field of online incremental unsupervised learning. SOINN has high tolerance to noise and strong clustering capability. In particular, Load-Balancing SOINN (LB-SOINN) significantly improves learning stability and clustering performance [25] by using a load that reflects the learning time of each node. However, all these methods require input signals to be presented one by one and therefore require more time to process and grow on large-scale datasets. Moreover, the order of input also determines the generated results, leading to unstable learning [12].
The Adaptive Resonance Theory (ART)-based clustering method is another successful approach in online incremental unsupervised learning. Specifically, the ART-based clustering method with a topological structure (TCA) shows superior clustering performance compared with GNG-based algorithms while avoiding catastrophic forgetting [7,26]. One common drawback of ART-based clustering methods is the specification of a similarity threshold (i.e., a vigilance parameter). In most cases, the similarity threshold has a significant impact on the clustering performance, while its optimal value depends on the dataset. Moreover, unlike GNG-based algorithms, it is difficult for an ART-based clustering method to approximate the entire distribution of a dataset because the method tends to generate many disconnected clusters at the early learning stage.
Batch learning is proposed to solve the problem of learning stability and to speed up learning. In the work of Toda et al. [12], it is proposed to update network nodes by membership values between connected nodes obtained from the FCM method (FCM-BL-GNG). This method can solve the learning stability issue, but it takes time to calculate the membership values. In the work of Iwasa et al. [14], it is proposed to split the dataset into multiple mini-batches and then use these mini-batches to train GNG (MS-BL-GNG). It differs from previous methods in that it stores the delta movement of nodes and updates it after each batch. It does speed up learning, but it uses more data samples than standard GNG. Moreover, both methods add a node in each epoch or batch, thus reducing the learning speed. To accelerate network growth, Fernando et al. [3] propose using an “add-if-silent” strategy on top of the MS-BL-GNG method to insert nodes into the network in each iteration (FastMS-BL-GNG). With this implementation, FastMS-BL-GNG is able to operate on dynamic or non-stationary data distributions. Dynamic data, such as continuous 3D point cloud data, change over time. But these three methods still input the input signals one by one. In fact, if we use an adjacency matrix, we can perform the calculation in one go.
From the above discussion, we focus on using matrix calculations to improve the speed of batch GNG learning. At the same time, we also focus on the distributed growth of the network, rather than one-to-one growth or random growth. To highlight the differences between the proposed method and other methods, we compare them in the following aspects: (1) Is it incremental learning? (2) Does it support non-stationary data? (3) Does it require the total number of clusters to be defined in advance? (4) Does it need to learn from the input data one by one? These differences are shown in Table 1.

3. Distributed Batch Learning Growing Neural Gas

In this section, we first explain standard GNG and then explain how to implement it as batch learning by replacing it with matrix calculation. Later, we introduce a distributed growth method instead of slowly growing by starting from one area.

3.1. Growing Neural Gas

In this section, we explain the learning process of standard GNG. First, we randomly input two data points into the network W = { w_1, ..., w_i, ..., w_m }. Then, at each epoch, each datum is fed into the network for learning. For each input datum x_t, we calculate the Euclidean distance between each node w_i in the network and x_t as follows:
d_i \leftarrow (x_t - w_i)^2, \quad \forall i \in W \quad (1)
where d_i is the distance between the t-th datum x_t and the i-th node w_i.
Next, based on the obtained distance, the winner node (closest node) and the second winner node (second closest node) are obtained as follows:
s_1 \leftarrow \operatorname{argmin}_{i \in W}(d_i), \qquad s_2 \leftarrow \operatorname{argmin}_{i \in W,\, i \neq s_1}(d_i) \quad (2)
where s 1 and s 2 are the indices of the first and second winner nodes, respectively.
If node w s 1 and node w s 2 are not connected, then we create an edge to connect them as follows:
c_{s_1, s_2} \leftarrow 1 \quad (3)
where c is the edge of the network. Next, we set the age of their edge ( s 1 and s 2 ) to 0 as follows:
a_{s_1, s_2} \leftarrow 0 \quad (4)
where a is the age of the edge. Afterward, increase the age of all edges connected to the winning node as follows:
a_{s_1, i} \leftarrow a_{s_1, i} + 1, \quad \forall i \in N(w_{s_1}) \quad (5)
where N(w_{s_1}) returns the neighbor nodes of the winner node. Next, increase the error value of the winning node by the distance multiplied by the learning rate α as follows:
E_{s_1} \leftarrow E_{s_1} + d_{s_1} \times \alpha \quad (6)
where E is the cumulative error of the node. We also move the winning node with the same learning rate as follows:
w_{s_1} \leftarrow w_{s_1} + (x_t - w_{s_1}) \times \alpha \quad (7)
At the same time, the neighbor nodes of the winning node also move but with a smaller learning rate β as follows:
w_i \leftarrow w_i + (x_t - w_i) \times \beta, \quad \forall i \in N(w_{s_1}) \quad (8)
Two deletions are performed. First, we delete those edges whose age exceeds the threshold θ as follows:
c_{i,j} \leftarrow \begin{cases} 1, & a_{i,j} < \theta \wedge c_{i,j} = 1 \\ 0, & a_{i,j} \geq \theta \wedge c_{i,j} = 1 \end{cases} \quad (9)
Secondly, we delete those isolated nodes according to the node index obtained by the following formula:
\chi \leftarrow \{\, i \mid \sum_{j=1}^{N(w_i)} c_{i,j} = 0 \,\}, \quad \forall i \in W \quad (10)
where χ stores the index of the node to be removed from the network. We also remove those errors, edges, and age variables related to the χ node index.
Then, the cumulative error across all nodes is reduced using the discount factor δ as follows:
E_i \leftarrow E_i \times \delta, \quad \forall i \in W \quad (11)
Finally, as data for each interval γ are fed into the network for learning, we insert a new node into the network. To prevent the network from growing indefinitely, we predefine a maximum number of nodes M for the network, and once reached, no new nodes are created. The conditional equation is as follows:
t \bmod \gamma = 0 \ \wedge \ m < M \quad (12)
where mod is the modulo operation, and m is the total number of nodes in the network. The new node is inserted between the highest error node and its highest error neighbor node. First, we obtain the highest error node and its highest error neighbor node as follows:
q_1 \leftarrow \operatorname{argmax}_{i \in W}(E_i) \quad (13)
q_2 \leftarrow \operatorname{argmax}_{i \in N(w_{q_1})}(E_i) \quad (14)
Next, insert a new node between them as follows:
w_{q_3} \leftarrow 0.5 \times (w_{q_1} + w_{q_2}) \quad (15)
Then, delete the edge between q 1 and q 2 . At the same time, create two edges connecting q 3 to q 1 and q 2 , respectively, as follows:
c_{i,j} \leftarrow 0, \quad i,j \in \{\{q_1, q_2\}, \{q_2, q_1\}\}; \qquad c_{i,j} \leftarrow 1, \quad i,j \in \{\{q_1, q_3\}, \{q_2, q_3\}, \{q_3, q_1\}, \{q_3, q_2\}\} \quad (16)
The ages of these two edges are also set to 0 as follows:
a_{i,j} \leftarrow 0, \quad i,j \in \{\{q_1, q_3\}, \{q_2, q_3\}, \{q_3, q_1\}, \{q_3, q_2\}\} \quad (17)
We then reduce the errors of nodes q 1 and q 2 by a ratio ρ and assign the q 1 and q 2 node errors to the q 3 node error as follows:
E_{q_1} \leftarrow E_{q_1} \times \rho, \qquad E_{q_2} \leftarrow E_{q_2} \times \rho, \qquad E_{q_3} \leftarrow (E_{q_1} + E_{q_2}) \times 0.5 \quad (18)
The learning process of standard GNG proceeds epoch by epoch. In each epoch, the data sequence is shuffled and the data are fed into the network one by one for learning. The stopping condition of standard GNG can be that the number of network nodes reaches the maximum number of nodes at the end of an epoch.
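To make the per-sample procedure concrete, the following NumPy sketch implements one update step of Equations (1)–(9) under our own data layout (a node array `nodes`, adjacency matrix `edges`, age matrix `ages`, and error vector `errors`); it is an illustrative reading of the standard algorithm, not the authors' reference implementation.

```python
import numpy as np

def gng_step(x_t, nodes, edges, ages, errors, alpha=0.5, beta=0.01, theta=50):
    """One standard GNG update for a single input x_t (Equations (1)-(9), sketch)."""
    # Equation (1): squared Euclidean distance from x_t to every node.
    d = np.sum((nodes - x_t) ** 2, axis=1)
    # Equation (2): first and second winner nodes.
    s1, s2 = np.argsort(d)[:2]
    # Equations (3)-(4): connect the winners and reset their edge age.
    edges[s1, s2] = edges[s2, s1] = 1
    ages[s1, s2] = ages[s2, s1] = 0
    # Equation (5): age every edge emanating from the winner.
    neighbors = np.where(edges[s1] == 1)[0]
    ages[s1, neighbors] += 1
    ages[neighbors, s1] += 1
    # Equation (6): accumulate the winner's error.
    errors[s1] += d[s1] * alpha
    # Equations (7)-(8): move the winner and its neighbors toward x_t.
    nodes[s1] += (x_t - nodes[s1]) * alpha
    nodes[neighbors] += (x_t - nodes[neighbors]) * beta
    # Equation (9): drop edges whose age exceeds the threshold.
    edges[ages >= theta] = 0
    return nodes, edges, ages, errors
```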

3.2. Shortage of Standard GNG

In the previous section, we described the procedure of standard GNG. From the procedure shown, there are several steps that can be optimized. The first one is that it starts randomly from two or three nodes. If these nodes start from one data area, it will take time to explore to the next data area. For example, in Figure 1a, the first starting point is at the upper left corner. To explore another area, an epoch is required as shown in Figure 1b. Moreover, the nodes on the path are still retained, which is useless as shown in Figure 1c.
The second problem is that the data are entered one by one. Therefore, once there are a lot of data, it takes time to complete an epoch. 3D point clouds, for example, can range from thousands to billions or even trillions of points. We plot the average execution time of each epoch to find the winner node for each datum as shown in Figure 2. As can be seen from the line chart, by using the for loop in standard GNG, the execution time increases exponentially once the amount of data increases.
The third problem is the node removal strategy as in Equation (10). It removes only those isolated nodes. Standard GNG has the advantage of setting an age for the edge and removing the edge when the age exceeds the limit. This deletion can distinguish different data clusters and be used to delete unimportant nodes. However, in some cases, this deletion policy does not delete the path between the two data regions as shown in Figure 1c.
The fourth problem is that standard GNG can only update one node at a time based on one input datum, as shown in Figure 3. At the current time t, it moves the node closest to the input datum, as in Figure 3a. If the next input datum is on the opposite side and the winner node is still the same node, as in Figure 3b, the previous move becomes meaningless. Therefore, standard GNG is very dependent on the input sequence. This problem shows that the learned output of standard GNG is unstable.
The fifth problem is that standard GNG cannot distinguish clusters in some cases if there is very little noise between two clusters, as shown in Figure 4. This problem can be solved by using the age of the edges, but only after enough iterations have elapsed. Therefore, we have to come up with a solution to eliminate these edges before the end of training to form better clusters.

3.3. Batch Learning Growing Neural Gas

In the previous section, we discussed the shortcomings of standard GNGs. One drawback is that it requires data to be entered one by one. Therefore, we introduce batch learning with matrix computation to accelerate learning. There are various GNG batch learning methods [3,12,13]. One method is to use FCM to calculate the membership of each network node to each batch data [12]. By doing this, it can improve the learning stability of standard GNG. Another approach is to accumulate delta moves of mini batches of data and then use these delta moves to update the network nodes [13]. Doing so ensures stable learning and is faster than using FCM to calculate membership values. However, although their name is batch learning, based on their algorithm, batches of data are still input into the network one by one using a for loop. Therefore, in order to perform batch learning without using any for loops, we introduce matrix calculations in the process. The overall process of batch learning on GNG is shown in Figure 5.
As in standard GNG, we first randomly input two data points into the network W. Then, we reset the batch's temporary variables. The temporary variables of the batch are the delta movements ΔW^*, the node activation counts A^*, and the edge strength S.
The next step is to find the winner node for each datum. Therefore, the key equation is the one that obtains the distance between each datum and each node. Given the Euclidean distance in Equation (1), we can derive the formula for each datum and each node as follows:
d_{t,i} \leftarrow (X - W)^2 = X^2 - 2XW + W^2 = X^2 - 2(X \cdot W^T) + W^2 \quad (19)
where X = { x_1, ..., x_t, ... } is the batch data, W contains all network nodes, W^T is the transpose of the matrix, and d_{t,i} is the distance matrix between the t-th datum and the i-th node. X · W^T uses the dot product to sum the products of X and W. As can be seen from the equation, there is no for loop involved, but all distances from each datum to each node can be obtained. Next, we use the resulting distance matrix to find the index of the minimum value for each row (each datum) as follows:
s_1^t \leftarrow \operatorname{argmin}(d_t), \quad \forall t \in d \quad (20)
where s_1^t is the winner node index of the t-th datum x, and argmin(·) returns the index of the first minimum value in the list. Let d' be a clone of the distance matrix d; we can then easily obtain the second winner node by replacing each winner entry with the maximum value as follows:
d'_{t, s_1^t} = \max(d), \ \forall t \in d'; \qquad s_2^t \leftarrow \operatorname{argmin}(d'_t), \quad \forall t \in d' \quad (21)
where s_2^t is the second winner node index of the t-th datum x. Although Equations (20) and (21) look like a for loop searching for the smallest index, in some specific libraries (such as the NumPy library), hardware-specific tricks are used to speed up vectorized operations on the CPU. Therefore, using argmin(·) is faster than a for loop. The performance results are significantly better, as shown in Figure 2: obtaining the winner node index via a matrix calculation of the distances is faster than a for loop.
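As a minimal illustration of Equations (19)–(21), the following NumPy sketch computes the full distance matrix and both winner indices without a Python loop; the function name and argument layout are our own.

```python
import numpy as np

def batch_winners(X, W):
    """Distance matrix and first/second winners for a whole batch (Equations (19)-(21), sketch)."""
    # Equation (19): squared distances via the expansion X^2 - 2 X.W^T + W^2.
    d = np.sum(X ** 2, axis=1, keepdims=True) - 2.0 * (X @ W.T) + np.sum(W ** 2, axis=1)
    # Equation (20): index of the nearest node for every datum (row-wise argmin).
    s1 = np.argmin(d, axis=1)
    # Equation (21): mask each winner in a copy of d, then take the argmin again.
    d2 = d.copy()
    d2[np.arange(len(X)), s1] = d.max()
    s2 = np.argmin(d2, axis=1)
    return d, s1, s2
```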
Next, we use the identity matrix I m to calculate the error for each node. The identity matrix is a square matrix with ones on the main diagonal and zeroes elsewhere. The equation is calculated as follows:
E \leftarrow E + \sum_{t=1}^{n} \left( (I_m)_{s_1} \times d \right) \times \alpha \quad (22)
where (I_m)_{s_1} returns the one-hot node matrix of s_1 with shape [n × m], and n is the total number of batch data. After multiplying by the distance, we add up the values of each datum for each node and then multiply by the learning rate α.
To address learning stability, we accumulate delta movements instead of updating the network immediately. We also use the identity matrix to calculate the delta movement of each node. The delta movement is an [m × F] matrix. The calculation is different from the previous equation because it involves the number of features F of the node. The equation is derived as follows:
\Delta W^1 \leftarrow \Delta W^1 + \sum (X - W_{s_1}) \times \alpha \leftarrow \Delta W^1 + \left( (I_m)_{s_1}^T X - W^T \times \sum_{t=1}^{n} (I_m)_{s_1}^T \right) \times \alpha \quad (23)
where Δ W 1 is the delta matrix based on winner nodes s 1 , initialized to 0 by default.
Afterwards, the neighbor nodes of the winner nodes s_1 need to be updated. This equation is the same as the last equation but uses an adjacency matrix instead of an identity matrix. The equation is derived as follows:
\Delta W^2 \leftarrow \Delta W^2 + \sum (X - W_{N(s_1)}) \times \beta \leftarrow \Delta W^2 + \left( c_{s_1}^T X - W^T \times \sum_{t=1}^{n} c_{s_1}^T \right) \times \beta \quad (24)
where c s 1 returns the connected node matrix of s 1 as [ n × m ] , and Δ W 2 is the delta matrix based on the neighbor nodes of s 1 .
The delta movement is accumulated from each datum; therefore, we have to average it before applying it to the network nodes, as otherwise the value is too large. Therefore, we need to count the number of times a node is selected for update in each batch. First, we count the number based on the winner node s_1 as follows:
A^1 \leftarrow A^1 + \sum_{t=1}^{n} (I_m)_{s_1} \quad (25)
where A 1 is the activation count of each node based on s 1 . Next, we count the number of times a node is selected based on the neighbor nodes of s 1 as follows:
A^2 \leftarrow A^2 + \sum_{t=1}^{n} c_{s_1} \quad (26)
where A 2 is the activation count of each node based on the neighbor nodes of s 1 .
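The following sketch accumulates the error, delta movements, and activation counts of Equations (22)–(26) for one batch; we interpret the W^T × Σ(I_m)_{s_1}^T term as scaling each node by the number of times it was selected, which matches the dimensions of the derivation, and all variable names are our own.

```python
import numpy as np

def accumulate_batch(X, W, C, d, s1, E, dW1, dW2, A1, A2, alpha=0.5, beta=0.01):
    """Accumulate errors, delta movements, and activation counts (Equations (22)-(26), sketch)."""
    m = len(W)
    one_hot = np.eye(m)[s1]                      # (I_m)_{s1}, shape [n, m]
    # Equation (22): add the winner distances, scaled by alpha, to the node errors.
    E += (one_hot * d).sum(axis=0) * alpha
    # Equation (23): delta movement of the winner nodes, shape [m, F].
    counts1 = one_hot.sum(axis=0)                # how many times each node won
    dW1 += (one_hot.T @ X - W * counts1[:, None]) * alpha
    # Equation (24): the same accumulation through the adjacency rows of the winners.
    c_s1 = C[s1]                                 # neighbors of each winner, shape [n, m]
    counts2 = c_s1.sum(axis=0)
    dW2 += (c_s1.T @ X - W * counts2[:, None]) * beta
    # Equations (25)-(26): activation counts, used later to average the deltas.
    A1 += counts1
    A2 += counts2
    return E, dW1, dW2, A1, A2
```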
After that, the next step is to create an edge connecting the first and second winner nodes. However, we not only connect two nodes but we also calculate the strength of the connection as follows:
\hat{c}_{s_1,s_2} \leftarrow 1, \quad \hat{c}_{s_2,s_1} \leftarrow 1, \quad \tilde{c} \leftarrow (I_m)_{s_1} + (I_m)_{s_2}, \quad \tilde{c} \leftarrow \tilde{c}^T \cdot \tilde{c}, \quad S \leftarrow S + \hat{c} \times \tilde{c} \quad (27)
where S is the cumulative edge strength of the epoch, the initial value of the matrix is 0, and the matrix size is [m × m]. First, we form the connection of s_1 and s_2 as ĉ. We do this twice to ensure that the adjacency matrix is symmetric. We then sum the identity matrices of s_1 and s_2 to obtain c̃. The total activation of the nodes and edges can be obtained by taking the dot product of the matrix c̃. Finally, by multiplying the total activation c̃ and the default connection ĉ, we obtain the strength of each connection, which is then accumulated into S. Edge strength can help us decide edge cuts at the end of training so we can obtain better clustering. We do not use edge age in batch learning, the reason being that batch learning performs all node updates in one batch; therefore, the edge age is not important since the connection of s_1 and s_2 is continuously updated in every epoch.
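A sketch of the edge-strength accumulation in Equation (27) follows; `m` is the current number of nodes, and the construction of ĉ and c̃ reflects our reading of the equation.

```python
import numpy as np

def accumulate_edge_strength(S, s1, s2, m):
    """Accumulate the edge strength of a batch (Equation (27), sketch)."""
    # c_hat: symmetric 0/1 matrix of the (s1, s2) pairs observed in this batch.
    c_hat = np.zeros((m, m))
    c_hat[s1, s2] = 1.0
    c_hat[s2, s1] = 1.0
    # c_tilde: per-datum activation of the two winners, shape [n, m] ...
    c_tilde = np.eye(m)[s1] + np.eye(m)[s2]
    # ... turned into pairwise co-activation counts, shape [m, m].
    c_tilde = c_tilde.T @ c_tilde
    # Keep only the counts of edges that actually exist and accumulate them into S.
    S += c_hat * c_tilde
    return S
```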
From Equations (19)–(27), we have the learning process in batch learning, just like obtaining the loss value in deep learning. This can be performed by dividing the data into small batches, such as mini-batches in deep learning. The process of batch learning using matrix calculation is shown in Algorithm 1.
Algorithm 1 Batch learning algorithms using matrix calculation.
  • Input:  X ,   Δ W 1 ,   Δ W 2   , A 1 ,   A 2 ,   S ▹ They are batch of data, delta movements, node activation
  •  counts, and edge strength.
  • Output:  Δ W 1 ,   Δ W 2 ,   A 1 ,   A 2 ,   S
  • d Equation (19)           ▹ Calculate the distance of each datum to each node.
  • s 1 Equation (20)                ▹ Get the winner node for each datum.
  • s 2 Equation (21)               ▹ Get the second winner node for each datum.
  • E Equation (22)                     ▹ Update the error node of s 1 .
  • Δ W 1 Equation (23)              ▹ Update delta movement based on s 1 .
  • Δ W 2 Equation (24)     ▹ Update delta movement based on the connected nodes of s 1 .
  • A 1 Equation (25)            ▹ Update node activation count based on s 1 .
  • A 2 Equation (26)  ▹ Update node activation count based on the connected nodes of s 1 .
  • S Equation (27)                ▹ Update edge strength based on s 1 and s 2 .
Next, we use the obtained values to update the network. We first update the node position as follows:
W \leftarrow W + (\Delta W^1)^T \times \left( \frac{1}{A^1 + \epsilon} \right)^T + (\Delta W^2)^T \times \left( \frac{1}{A^2 + \epsilon} \right)^T \quad (28)
where ϵ is an epsilon value to prevent division by zero, which we set to 1 × 10^{-4}. We then replace the network edges with edge strengths as follows:
c_{i,j} \leftarrow \begin{cases} 1, & S_{i,j} > 0 \\ 0, & S_{i,j} = 0 \end{cases}, \quad \forall i,j \in S \quad (29)
This is also why we do not use edge age in batch learning: the connections are rebuilt batch by batch. Next, we use Equation (10) to remove the isolated nodes and delete the related variables as follows:
W \leftarrow \{ w_i \mid i \notin \chi \}, \forall i \in W; \quad E \leftarrow \{ E_i \mid i \notin \chi \}, \forall i \in E; \quad A^1 \leftarrow \{ A^1_i \mid i \notin \chi \}, \forall i \in A^1;
S \leftarrow \{ S_{i,j} \mid i \notin \chi \wedge j \notin \chi \}, \forall i,j \in S; \quad c \leftarrow \{ c_{i,j} \mid i \notin \chi \wedge j \notin \chi \}, \forall i,j \in c \quad (30)
where χ stores the indexes of the isolated nodes. Last, we reduce the error according to Equation (11).
However, we find that some nodes are updated slowly or not at all within an epoch, such as the problem in Figure 1c. Therefore, we introduce the deletion of those inactive nodes as follows:
\chi \leftarrow \{ i \mid A^1_i = 0 \}, \quad \forall i \in A^1 \quad (31)
where χ stores the indexes of the inactive nodes, and we also remove those related variables as Equation (30). However, this deletion of inactive nodes cannot be performed frequently because some nodes may be active in the next epoch. Therefore, we set a probability of 10% to perform inactive node deletion. The process of the network update is shown in Algorithm 2.
Algorithm 2 Network update on every epoch.
  • W Equation (28)           ▹ Update network nodes based on delta movement.
  • c Equation (29)            ▹ Update network edges based on edge strength.
  • χ 1 Equation (10)                  ▹ Get those isolated node indexes.
  • Equation (30) ( χ 1 )         ▹ Delete those related variables based on node indexes.
  • m ← m − |χ1|                     ▹ Update the number of network nodes.
  • E Equation (11)                ▹ Perform error discounting on all nodes.
  • if  U ( [ 0 , 1 ] ) > 0.9 then          ▹ Gives 10% chance to perform the following action.
  •      χ 2 Equation (31)                 ▹ Get those inactive node indexes.
  •     Equation (30) ( χ 2 )        ▹ Delete those related variables based on node indexes.
  •      m ← m − |χ2|                    ▹ Update the number of network nodes.
  • end if
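A minimal sketch of the epoch-end update in Algorithm 2 (Equations (28)–(31) together with Equations (10) and (11)) is given below; the array shapes and the in-place masking with `np.ix_` are our own implementation choices.

```python
import numpy as np

def update_network(W, C, S, E, dW1, dW2, A1, A2, delta=0.5, eps=1e-4, rng=np.random.default_rng()):
    """Epoch-end network update (Algorithm 2, sketch)."""
    # Equation (28): apply the delta movements averaged over the activation counts.
    W = W + dW1 / (A1[:, None] + eps) + dW2 / (A2[:, None] + eps)
    # Equation (29): rebuild the edges from the accumulated edge strength.
    C = (S > 0).astype(float)
    # Equation (10): keep only non-isolated nodes; Equation (31): with 10% probability,
    # also drop nodes that were never activated in this epoch.
    keep = C.sum(axis=1) > 0
    if rng.random() > 0.9:
        keep &= A1 > 0
    # Equation (30): remove the related variables of the deleted nodes.
    W, E, A1 = W[keep], E[keep], A1[keep]
    S, C = S[np.ix_(keep, keep)], C[np.ix_(keep, keep)]
    # Equation (11): discount the accumulated node errors.
    E = E * delta
    return W, C, S, E
```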
After updating the network nodes, we perform the same thing as standard GNG. We insert a new node between the highest error node and its highest error neighbor node. However, we insert the new node on every epoch. We perform Equations (13)–(16) and (18) for this add new node operation. In addition to this, we also expand the edge strength matrix as follows:
S_{i,j} \leftarrow 0, \quad i \in \{q_1, q_2\}, j \in \{q_2, q_1\}; \qquad S_{i,j} \leftarrow 1, \quad i \in \{q_1, q_2, q_3, q_3\}, j \in \{q_3, q_3, q_1, q_2\} \quad (32)
The process of adding a new node is shown in Algorithm 3.
Algorithm 3 Add a new node on every epoch.
  • if m < M then    ▹ The network does not exceed the maximum number of nodes.
  •      q 1 Equation (13)               ▹ Get the maximum error node.
  •      q 2 Equation (14)          ▹ Get the maximum error neighbor node of q 1 .
  •      w q 3 Equation (15)             ▹ Create new node to the network.
  •      c Equation (16)             ▹ Delete the edge between w q 1 and w q 2 and connect both to w q 3 .
  •      E Equation (18)            ▹ Update the node errors of w q 1 , w q 2 , and w q 3 .
  •      S Equation (32)              ▹ Extend the edge strength matrix.
  •      m m + 1                  ▹ Update the number of network nodes.
  • end if
BL-GNG has the same problem as standard GNG, that is, it forms a connection between two clusters if there is noise between them, as shown in Figure 4. Therefore, this paper proposes an edge cutting method that utilizes edge strength. If an edge lies within a cluster, it is activated frequently and has high strength; if it lies between clusters, it receives fewer activations and has lower strength. Therefore, we store the edge strength in Equation (27) so that we can use it to cut those unimportant edges. We use the percentile function to determine whether a specific edge needs to be removed as follows:
\dot{S} \leftarrow S_{>0}, \qquad c_{i,j} \leftarrow \{ c_{i,j} \mid S_{i,j} > P_{\dot{S}}(p) \}, \quad \forall i,j \in S \quad (33)
where Ṡ keeps only the values of S greater than 0, P is the percentile function, and p is the filtering probability. The filtering probability depends heavily on the dataset, but we set it to 0.15 by default. After deleting those unimportant edges, we apply Equations (10) and (30) to delete the resulting isolated nodes. The process of edge cutting is shown in Algorithm 4.
Algorithm 4 Edge cutting process before end of training.
  • χ 2 Equation (31)              ▹ Get those inactive node indexes.
  • Equation (30) ( χ 2 )        ▹ Delete those related variables based on node indexes.
  • m ← m − |χ2|     ▹ Update the number of network nodes. |χ2| is the variable count.
  • c Equation (33)          ▹ Remove unimportant edges from the network.
  • χ 1 Equation (10)              ▹ Get those isolated node indexes.
  • Equation (30) ( χ 1 )        ▹ Delete those related variables based on node indexes.
  • m ← m − |χ1|     ▹ Update the number of network nodes. |χ1| is the variable count.
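As a sketch of the edge cutting in Equation (33), the core of Algorithm 4, the following reads the filtering probability p as a quantile of the positive edge strengths, which is one plausible interpretation of the percentile function P_{Ṡ}(p).

```python
import numpy as np

def cut_weak_edges(C, S, p=0.15):
    """Remove low-strength edges before the end of training (Equation (33), sketch)."""
    positive = S[S > 0]                               # S_dot: strengths of existing edges only
    threshold = np.percentile(positive, p * 100.0)    # P_{S_dot}(p), with p read as a quantile
    return np.where(S > threshold, C, 0.0)            # keep an edge only above the cut-off
```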

3.4. Distributed Batch Learning Growing Neural Gas

We find that BL-GNG grows slowly because it inserts only one new node into the network after each batch. Therefore, to speed up learning, we insert multiple nodes at once. The new-node strategy is to choose the node with the highest error, so we use the percentile function to determine how many extra nodes should be added for the current batch as follows:
g \leftarrow \left| E > P_E(p) \right| \quad (34)
where g is the number of nodes that should be grown in the current batch, P is the percentile function, and p is the filter probability value. We set p to 0.85 by default. By doing this, we ensure that network growth is distributed.
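Under the same reading of the percentile function, Equation (34) reduces to a one-line count, sketched below:

```python
import numpy as np

def growth_count(E, p=0.85):
    """Number of nodes to add in the current epoch (Equation (34), sketch)."""
    # Count the nodes whose accumulated error lies above the p-quantile of all node errors.
    return int(np.sum(E > np.percentile(E, p * 100.0)))
```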
However, it still does not solve the problem of Figure 1a. When the starting point is at a corner edge of the data, it will definitely take time to grow to another corner edge. The starting point here refers to the first two nodes in the network. Therefore, we can have multiple starting points instead of just one starting point. However, we cannot randomly choose another starting point; otherwise, randomness will make the two starting points close to each other. Therefore, we introduce a method to evenly distribute the starting points over all data.
Given this, if we know the starting point of the area, then the next starting point should be generated from the farthest area to ensure that the two starting points are not close to each other. By repeating this process without going back to the previous area, we can evenly distribute the generation of starting points over all data. First, we have to define the total number of starting points η . Next, we calculate the batch size as follows:
B \leftarrow \frac{D}{\eta} \quad (35)
where B is the batch size and D is the data size. We then randomly select a data point from a subset of the data, where the subset is the batch at the end of the data, as follows:
w_m \leftarrow R\left( \{ x_i \mid i > D - B \} \right), \quad \forall i \in X \quad (36)
where R randomly selects a datum from the subset. Then, we calculate the distance from the current node to all data, and sort the data in order of distance from nearest to farthest as follows:
d \leftarrow (w_m - x_i)^2, \ \forall i \in X \leftarrow X^2 - 2(w_m \cdot X^T) + w_m^2 \leftarrow X^2 - 2(w_m \cdot X^T) \quad (37)
X \leftarrow \{ x_i \}, \quad \forall i \in \operatorname{argsort}(d) \quad (38)
where argsort ( d ) returns the indices of distance in ascending order. The distance formula is modified here because we only need to know the data in the sorted index. Next, we select the third closest node as the neighbor node of the current node and connect them with an edge as shown below:
w_{m+1} \leftarrow x_3 \quad (39)
c_{i,j} \leftarrow 1, \quad i \in \{m, m+1\}, j \in \{m+1, m\} \quad (40)
We choose the third closest node because we want to leave some space between the current nodes. Next, we remove the nearest batch from X as follows:
X \leftarrow \{ x_i \mid i > B \}, \quad \forall i \in X \quad (41)
Finally, repeat Equations (36)–(41) η times. The distributed node initialization process is shown in Algorithm 5. The complete process of DBL-GNG is shown in Algorithm 6, and the flowchart is shown in Figure 6.
Algorithm 5 The distributed node initialization process.
  • Input:  X , η         ▹ Data, and the total number of starting points respectively.
  • Output:  W , c , m         ▹ Nodes, edges, and number of network nodes respectively.
  • B Equation (35)                      ▹ Get the batch size.
  • for  p o i n t = 1 , , η   do
  •      w m Equation (36)         ▹ Randomly select a datum from the end batch.
  •      d Equation (37)    ▹ Calculate the distance between the selected node and all data.
  •      X Equation (38)                  ▹ Sort the data based on distance.
  •      w m + 1 Equation (39)         ▹ Select the third nearest datum as the neighbor node.
  •      c Equation (40)    ▹ Create an edge connecting the selected node to the neighbor node.
  •      X Equation (41)        ▹ Remove the nearest batch from the data.
  •      m m + 2                 ▹ Update the number of network nodes.
  • end for
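The following NumPy sketch mirrors Algorithm 5 (Equations (35)–(41)); returning the starting nodes as a stacked array and the simple edge bookkeeping are illustrative simplifications of our own.

```python
import numpy as np

def distributed_init(X, eta, rng=np.random.default_rng()):
    """Distributed starting-point initialization (Algorithm 5, sketch)."""
    D = len(X)
    B = D // eta                                   # Equation (35): batch size
    data = X.copy()
    W, edges = [], []
    for _ in range(eta):
        # Equation (36): pick a random datum from the farthest (end) batch.
        w = data[rng.integers(len(data) - B, len(data))]
        # Equations (37)-(38): sort the remaining data by distance to the new starting point
        # (the constant w^2 term is dropped, as it does not change the ordering).
        d = np.sum(data ** 2, axis=1) - 2.0 * (data @ w)
        data = data[np.argsort(d)]
        # Equations (39)-(40): the third-closest datum becomes the neighbor node, linked by an edge.
        W.extend([w, data[2]])
        edges.append((len(W) - 2, len(W) - 1))
        # Equation (41): discard the nearest batch before choosing the next starting point.
        data = data[B:]
    return np.asarray(W), edges
```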
Algorithm 6 The complete process of DBL-GNG.
  • Input: X                                ▹ Data.
  • Parameter:  η                    ▹ Total number of starting points.
  • Parameter: M                 ▹ Maximum number of network nodes.
  • W , c , m Algorithm 5              ▹ Perform node distribution initialization.
  • while  m < M   do
  •      Δ W 1 ,   Δ W 2 ,   A 1 ,   A 2 ,   S = 0             ▹ Reset temporary variables.
  •      Δ W 1 ,   Δ W 2 ,   A 1 ,   A 2 ,   S Algorithm 1   ▹ Perform batch learning on the input data.
  •      W , c Algorithm 2                       ▹ Update the network.
  •      g Equation (34)                    ▹ Get the required growth count.
  •     for  i = 1 , , g  do
  •          W , c Algorithm 3                ▹ Add new node to the network.
  •     end for
  • end while
  • W , c Algorithm 4                     ▹ Cut the unimportant edges.

4. Implementation

4.1. Dataset

The benchmark datasets used in this paper are aggregation (Agg) [27], D31 [28], R15 [28], path-based (Path) [29], spiral (Sp) [29], and H72D [30]. The clustering patterns of the datasets are shown in Figure 7, including the number of clusters and points for each dataset.
The aggregation dataset is the only one with unbalanced data, with a standard deviation of approximately 81. It is important to evaluate whether the model is able to cluster imbalanced data. The D31 dataset has balanced data but with more clusters, and the clusters are close to each other. This dataset is used to evaluate whether the model is able to cluster multiple clusters in dense environments. The R15 dataset is used to evaluate whether the model can work in environments of different density. As can be seen, in the center the clusters are close to each other, while on the outside they are further apart. This is also useful for evaluating the residual-path problem depicted in Figure 1c. The Path-based dataset has almost balanced data but different clustering patterns, where the first cluster is line-shaped and the other two are groups. This is to evaluate whether the model can support different clustering patterns. The spiral dataset consists of three spiral patterns and is used to evaluate the model’s ability to cluster multiple complex curves. The H72D contains 72 clusters and is built on top of D31. The significance of this dataset is to test the speed of the GNG framework on large-scale datasets.

4.2. Model Parameters

In DBL-GNG, it uses similar parameter settings to standard GNG as shown in Table 2. In standard GNG, there are some parameters to consider; the first is the learning rate of the winning node ( α ) and neighbor nodes ( β ). If the learning rate is set too high, the nodes will keep moving and fail to converge; if it is set too low, it will take more time to learn. Through experiments, we find that the α and β parameters have little impact on the performance of DBL-GNG. One reason is because for each batch of learning, it averages the delta movements Δ W * across activations A * and resets for the next batch. But the neighbor node’s learning rate must be lower than the winning node’s learning rate. Therefore, we set the learning rate of the winner node to 0.5 and the learning rate of the neighbor nodes to 0.01 to obtain stable performance. Another parameter in standard GNG is the frequency of node creation ( γ ) and edge deletion (setting the maximum edge age θ ). It creates nodes every γ interval and removes edges when the maximum age θ is exceeded. In DBL-GNG, we do not need these two parameters because DBL-GNG creates nodes on every epoch, the number of nodes is determined by the node error exceeding the 0.85 percentile, and edges are reset every epoch. Through experiments, we find that the ρ and δ parameters have little impact in either GNG or DBL-GNG. Therefore, we set both to 0.5 on DBL-GNG. We also compare the parameter settings with other models. As can be seen from the table, DBL-GNG only needs to set two parameters: the first is node initialization, and the second is the maximum number of nodes, which is the same as FCM-BL-GNG. Although MS-BL-GNG only needs to customize one parameter in the table, it also needs to customize the batch scale. For small datasets, this batch scale is difficult to define.
To standardize the evaluation between DBL-GNG and other models, we set the maximum node settings for each dataset as shown in Table 3. The initialization node (initial starting point) of DBL-GNG is the total number of clusters in the dataset.

4.3. Evaluation Metrics

Two measurements are taken for evaluation. The first one is the quantization error [31], which is the average error (distance) between each datum and its nearest node, given as follows:
Err \leftarrow \frac{\sum_{t=1}^{|X|} \sqrt{(x_t - w_{s_1})^2}}{D} \quad (42)
where X is the data of the dataset, D is the size of the dataset, and w_{s_1} is obtained through Equations (1) and (2).
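For reference, a short sketch of the quantization error in Equation (42); whether the per-datum term is the distance or its square follows our reading of the equation (the text describes it as an average distance).

```python
import numpy as np

def quantization_error(X, W):
    """Average distance from each datum to its nearest node (Equation (42), sketch)."""
    d = np.sum(X ** 2, axis=1, keepdims=True) - 2.0 * (X @ W.T) + np.sum(W ** 2, axis=1)
    nearest = np.maximum(d.min(axis=1), 0.0)       # clip tiny negative values from round-off
    return float(np.sqrt(nearest).mean())
```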
Another metric is the total computation time in seconds. For each model, we first learn until the maximum number of nodes is reached and then train for an additional 10 epochs. All experiments were performed on a computer equipped with an Intel Xeon(R) E-2286M CPU 2.40 GHz x16 processor and 64 GB RAM. The programming environment for this study was performed in Ubuntu 22.04.3 LTS and Python 3.9.18. Also, all results in the following experiments are the average result over 10 runs.

5. Experiments

5.1. Topological Map

A topological map is a simplified diagram that retains only important information and removes unnecessary details, which is one of the main purposes of GNG. This topological map is important for the robot to make path decisions [25,32]. We plot the DBL-GNG topological map for each dataset in Figure 8, including initialization, when reaching the maximum node, after an additional 10 epochs of training, and after the edge cutting method. From the results, our initialization method is very useful; we can see in Figure 8u that we initialized 72 points, and these points cover all clustering areas. This is much better than just initializing by random selection. Next, it can be seen from the topological map results that once the number of nodes reaches the maximum, most of the data area has been completely covered. But with an additional 10 epochs, it has successfully smoothed and refined the topological map. Finally, using the edge cutting method, the unimportant edges are successfully removed. We can see it clearly in Figure 8l. Before cutting, there are few connected clusters in the central area because there is some small noise between them. But with our edge cutting method, it successfully removes those unimportant edges.
However, we can see that there are still some edges that were not successfully removed. Therefore, our edge cutting method cannot cut paths if there is a lot of noise between clusters. When we look closely at a topological map (e.g., the Agg dataset), we can see that nodes are always inside clusters rather than covering clusters along cluster edges. The reason is that our optimization objective is to place a node in a set of data where the center position is the best position. Therefore, we should update the optimization objective and no longer use distance as the error metric but should use the coverage area of a few nodes as the error metric. We consider this study to be our future work.

5.2. Comparison of DBL-GNG with Other GNGs

We compare the performance of DBL-GNG with standard GNG, GNG-U [33], FCM-BL-GNG [12], MS-BL-GNG [14], and FastMS-BL-GNG [3]. The quantization error and total calculation time are shown in Table 4 and Table 5, respectively. In terms of computational speed, Table 5 clearly shows that DBL-GNG performs faster than other GNG methods. It is at least 10× faster on smaller datasets and at least 20× faster on large datasets such as the H72D dataset. Therefore, DBL-GNG is well suited for large-scale datasets, such as performing topological mapping on point clouds [3].
In terms of the quantization error shown in Table 4, DBL-GNG also outperforms other GNGs (except the Spiral dataset). We plot the DBL-GNG topological map of the Spiral dataset in Figure 8t. The figure shows that DBL-GNG indeed covers all data perfectly. The reason why the effect is not as good as for FCM-BL-GNG is that they add nodes one by one (slow convergence rate) to evenly cover all data, while DBL-GNG grows multiple nodes in one epoch. To test it, we modified the filter probability, the p value in Equation (34), from the default value of 0.85 to 0.99 to slow down the network growth rate. The results show that the slowly growing DBL-GNG does outperform FCM-BL-GNG on the Spiral dataset, with a quantization error of 0.442 ± 0.003. However, the computation time is increased to 0.160 ± 0.005.
In summary, DBL-GNG can reduce the computation time by using matrix calculation and distributed initialization and growth. DBL-GNG also outperforms other methods in terms of quantization error. By adjusting the filter probability in Equation (34), better quantization results can be obtained.

5.3. Discussion on Convergence Rate

The convergence rate is an important discussion point of GNG, which determines the growth of the GNG network. If the growth is faster, it means that the network can cover all data faster, and the remaining epochs are for fine-tuning the nodes instead of exploration. But if the growth is too fast, slightly higher quantization errors will occur, as discussed in the previous section. Moreover, it also introduces the problem of unbalanced topology nodes across clusters, as shown in Figure 9. The results shown are produced by a standard GNG with a high node-addition frequency. As can be seen from the figure, although all clusters have the same amount of data, some clusters have more nodes and some clusters have fewer nodes. Such unbalanced nodes will lead to higher quantization errors (because a single node needs to cover more data) and lower clustering results (because it is difficult to cut the edges between nodes). However, although the number of nodes in DBL-GNG increases rapidly, the imbalance problem is lessened as shown in Figure 8k. The reason is that this paper introduces a distributed initialization function, which initializes the nodes in each cluster at the beginning, so imbalance problems rarely occur.
The convergence rate can be determined by the number of nodes and the reduction in error. We plot the growth and error reduction in DBL-GNG compared to other batch learning methods on the H72D dataset in Figure 10. As can be seen from the figure, DBL-GNG converges faster than other batch learning methods (except FastMS-BL-GNG). It takes about eight epochs to reach the maximum node, while other batch learning methods are still growing one by one. The reason why FastMS-BL-GNG converges faster is that it divides the first epoch into multiple scale batches and uses “add-if-silent” to add nodes to the network at each iteration. However, compared to the computation time, DBL-GNG performs 10 times faster than FastMS-BL-GNG.
To understand the growth and error reduction during the convergence process, Figure 11 and Figure 12 plot the growth and error reduction for the Spiral and Path datasets, respectively. From the figures, we can see that FastMS-BL-GNG converges faster than other batch learning methods in terms of node growth and error reduction. In addition, we can observe that the error decreases as the number of nodes increases. However, as shown in Figure 11, FastMS-BL-GNG and DBL-GNG reach the maximum number of nodes in the same epoch, but DBL-GNG is still in the learning stage, and its error is higher than that of FastMS-BL-GNG. A similar phenomenon is also shown in Figure 12, where DBL-GNG has reached the maximum number of nodes, but its error is similar to that of MS-BL-GNG, while the number of nodes of MS-BL-GNG is only half of that of DBL-GNG. The reason for the slow error convergence is that DBL-GNG is full batch learning, while FastMS-BL-GNG and MS-BL-GNG are multi-scale batch learning. In terms of error learning, full batch learning takes the average of the accumulated errors of all data, while multi-scale batch learning takes the average of the accumulated errors of small batches. Therefore, full batch learning will learn more slowly because the error is divided by a larger number. To speed up the convergence of DBL-GNG, in the next section, we implement the multi-scale process in DBL-GNG.

5.4. Multi-Scale DBL-GNG

To demonstrate that DBL-GNG is scalable, we implement the multi-scale scheme of FastMS-BL-GNG in our DBL-GNG, named MS-DBL-GNG. For the first epoch, we divide the dataset into five scale batches (five learning phases). The batch size of each phase is \lambda_L = \{ \frac{D}{32}, \frac{D}{16}, \frac{D}{8}, \frac{D}{4}, \frac{D}{2} \}, where D is the size of the dataset, and the starting point of the data is as follows:
\text{data start point} = \min(L - 1, 1) \times \lambda_L \quad (43)
where L is the learning phase of λ L and starts from 1. Meanwhile, for each phase, we set the percentile probability of Equation (34) as follows:
p \leftarrow 1.0 - 0.15 \, (L_{\max} - L - 1) \quad (44)
where the p value is the filtering probability, which is used to calculate the number of nodes to be inserted based on the outlier error nodes.
After the first epoch, we perform full-batch learning like DBL-GNG until the maximum number of nodes is reached, and then train for an additional 10 epochs. We plot the growth and error reduction in MS-DBL-GNG and FastMS-BL-GNG on the H72D, Path, and Spiral datasets in Figure 13, Figure 14 and Figure 15, respectively.
From Figure 13, it can be seen that in terms of the number of nodes, MS-DBL-GNG converges in the first epoch, while FastMS-BL-GNG converges in the second epoch, but in terms of error, both methods converge in the first epoch. This indicates that multi-scale batch learning indeed helps DBL-GNG to boost learning in the first epoch. A similar phenomenon also appears on the Path dataset in Figure 14. Both methods converge in the number of nodes and error in the first epoch, but we can see that MS-DBL-GNG is constantly reducing the error, while FastMS-BL-GNG has stopped learning. This indicates that in the first epoch, the multi-scale in MS-DBL-GNG promotes the growth in nodes and then continuously fine-tunes the network in the following epochs. However, different results appear on the Spiral dataset in Figure 15, where FastMS-BL-GNG converges around the fifth epoch, while MS-DBL-GNG is still adding nodes and reducing the error. We can see that in the first epoch, the multi-scale in MS-DBL-GNG does help promote network growth, but the effect is not significant. The reason is that MS-DBL-GNG adopts a distributed growth method, and its growth is based on the current network node size (the total number of outlier error nodes), while FastMS-BL-GNG adopts an “add-if-silent” strategy in each iteration on the mini-batch. However, we plot the topological maps of the two methods in Figure 16, from which we can see that MS-DBL-GNG matches the data better than FastMS-BL-GNG.
In summary, DBL-GNG is scalable. Implementing multi-scale batch learning in MS-DBL-GNG can indeed improve the network convergence speed in the first epoch. In addition, in terms of error quantization, MS-DBL-GNG outperforms FastMS-BL-GNG (as shown in Table 4). In terms of computation time, MS-DBL-GNG outperforms other methods on large-scale datasets. However, the quantization error of MS-DBL-GNG is higher than that of DBL-GNG, which is a trade-off between speed and performance.

5.5. Learning Stability

We did mention the learning stability issue of standard GNG in the previous section as shown in Figure 3. It is very important to discuss the issue of learning stability, and this analysis is important to ensure that GNG systems are more reliable and stable, such as the work of Chinnamuniyandi et al., which ensures that neural networks are more reliable and stable [34]. GNG generates a topological map of data. If the same time series data are used to train the GNG model, the position of its topological map will change in each epoch, resulting in unstable output clustering results and feature extraction. Time series data can be images taken by surveillance cameras in a static environment. To understand this issue, we plot the total movement and quantization error per epoch using standard GNG on the H72D, Path, and Spiral datasets in Figure 17, Figure 18 and Figure 19, respectively. The total movement is the total amount of Manhattan distance updated by each node in Equations (7) and (8). We also plot the DBL-GNG total movement in the figures, where the total movement is the delta movement in Equation (28).
From all the line graphs, we can see that although the error of the standard GNG has converged in the second epoch, the movement of its nodes has not decreased and remains unchanged, that is, the nodes in the topological map are constantly moving. However, the results of DBL-GNG are different, and the total movement of the nodes gradually decreases until it is almost stationary. Although the standard GNG nodes are constantly moving, their errors do not decrease. It can be clearly seen from Figure 17 that the error of the standard GNG is constant, while the error of the DBL-GNG is constantly decreasing. The reason why standard GNG has this problem is that each update is based on the input sequence, while DBL-GNG is not affected by the input sequence because DBL-GNG averages the movements over each input batch.
In summary, DBL-GNG has better learning stability compared to standard GNG. Although standard GNG converges faster than DBL-GNG, the computation time and quantization error in Table 4 and Table 5 show that DBL-GNG runs faster than standard GNG and has the lowest quantization error. In addition, since DBL-GNG can produce stable outputs in each epoch, we can use its topological map for feature extraction or clustering. We present an overall comparison of various GNG methods based on the experimental results in Table 6.

5.6. Comparison with Other Clustering Methods

In this section, we try to compare three different types of clustering methods, which are k-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and a hierarchical clustering method named agglomerative clustering. All three models are available from sklearn [35]. These three models are all unsupervised machine learning techniques that divide the population into several clusters so that the data points in the same cluster are similar, while the data points in different clusters are dissimilar.
However, all three models require some parameters to be defined in advance: k-means needs the total number of clusters, DBSCAN needs two parameters, namely, eps and minPts, and agglomerative clustering needs a distance threshold. eps is the maximum distance between two samples for one to be considered a neighbor of the other, and minPts is the number of samples (or total weight) in a neighborhood for a point to be considered a core point. The distance threshold is used to determine whether clusters are merged in a hierarchical relationship. All of this information requires prior knowledge of the data distribution and overall clustering. In DBL-GNG, we do not need to predefine the number of clusters or the distance between samples, which is the advantage of GNG over other clustering methods. The parameters fine-tuned for each model on each dataset are shown in Table 7.
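For reference, a minimal sklearn sketch of how the three baselines are configured; the parameter values shown here are placeholders, and the tuned per-dataset values are those listed in Table 7.

```python
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

# Placeholder parameter values; the per-dataset values actually used are listed in Table 7.
kmeans = KMeans(n_clusters=31)                       # requires the number of clusters in advance
dbscan = DBSCAN(eps=0.5, min_samples=10)             # requires eps and minPts
agglomerative = AgglomerativeClustering(n_clusters=None, distance_threshold=2.0)

# labels = kmeans.fit_predict(X)  # X: [n_samples, n_features] data matrix
```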
In DBL-GNG, we can reuse the default parameter settings to obtain the topological map. However, the topological map may contain links from one cluster to another. Therefore, we introduce an edge cutting method in this paper. Edge cutting is a post-processing step that removes unimportant edges from the network. We can adjust the filtering probability p in Equation (33) to improve the clustering results without retraining the entire model. The filtering probability p used in this experiment is also shown in Table 7. We show the clustering results of each method in Table 8, including the DBL-GNG results with the unadjusted filtering probability. Each method is run 10 times, and the average clustering accuracy is reported.
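A minimal sketch of this post-processing step is given below. It assumes that each edge carries an activation count collected during training, drops edges whose normalized activation falls below the filtering probability p (a simplification of Equation (33)), and labels clusters as connected components of the remaining graph; the names of the model attributes are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def cut_edges_and_cluster(num_nodes, edges, activations, p):
    """Remove weakly activated edges and label the remaining components.

    edges:       list of (i, j) node index pairs in the topological map.
    activations: per-edge activation counts gathered during training.
    p:           filtering probability; edges whose normalized activation is
                 below p are cut (a simplification of Equation (33)).
    Returns one cluster label per node (its connected-component id).
    """
    activations = np.asarray(activations, dtype=float)
    norm = activations / max(activations.max(), 1e-12)
    keep = norm >= p
    rows = [i for (i, j), k in zip(edges, keep) if k]
    cols = [j for (i, j), k in zip(edges, keep) if k]
    adjacency = csr_matrix((np.ones(len(rows)), (rows, cols)),
                           shape=(num_nodes, num_nodes))
    _, labels = connected_components(adjacency, directed=False)
    return labels

# Hypothetical usage with a trained model `gng`:
# labels = cut_edges_and_cluster(len(gng.nodes), gng.edges,
#                                gng.edge_activations, p=0.45)
```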
From the results, we can see that without adjusting the filtering probability, DBL-GNG does not outperform the other clustering models. In some cases, however, such as the Spiral dataset, DBL-GNG performs better than k-means and agglomerative clustering. The reason is that GNG grows from one node to the next, whereas k-means and agglomerative clustering are based on regional grouping, so those two models perform well on region-based datasets such as R15, D31, and H72D. DBSCAN works better on the Spiral dataset because it groups data points that lie within the eps distance and have at least the minimum number of neighbors. Note, however, that DBL-GNG knows neither the number of clusters nor the inter-sample distance in advance. After adjusting the filtering probability, it successfully cuts the bridging edges and obtains better clustering results. In terms of computation time on the H72D dataset, DBSCAN is the fastest at 0.070 s, followed by k-means at 0.087 s, then DBL-GNG at about 0.852 s, and the slowest is agglomerative clustering at 3.237 s.
To illustrate the performance of DBL-GNG, we plot the clustering results on the Sp dataset in Figure 20. Even though k-means knows the total number of clusters in advance, it cannot separate the spiral data. Agglomerative clustering gives the worst result, with more than three clusters shown in the figure, because it cannot merge the hierarchical relationships along the neighboring data on each spiral. DBSCAN handles this neighborhood structure and clusters the data into three spirals. DBL-GNG is also able to distinguish the three spirals; however, in the upper-right part, the tail is clustered into a separate group. This is caused by the edge cutting method: at a filtering probability of 0.15, it cuts the tail edges because they have fewer activations. By reducing the filtering probability, we obtain a perfect spiral clustering result.
While agglomerative clustering performs poorly on the spiral dataset, it performs well on region-based datasets such as H72D, because its distance threshold is tuned to fit that dataset. In the real world, however, most datasets are imbalanced, such as the Agg dataset, or contain different types of shapes, such as the Spiral dataset. To analyze the difference between agglomerative clustering and adjusted DBL-GNG clustering, we plot the results of both methods in Figure 21 and highlight two areas with circles. Agglomerative clustering successfully distinguishes all the clusters, but the cluster size is effectively fixed: it separates the data in circle 1 into two groups and merges the two sets of data in circle 2. DBL-GNG does the opposite: it merges the data in circle 1 but splits the data in circle 2. In other words, DBL-GNG groups two sets of data of the same density into one cluster, as in circle 1, and splits data according to density, as in circle 2. This is the key difference between DBL-GNG and the agglomerative clustering model.

5.7. Experiment DBL-GNG in Real Environment

We also test DBL-GNG in a real-world environment to demonstrate its effectiveness and adaptability. This experiment uses point cloud data from a depth camera (Orbbec Femto Mega) with more than 100,000 points per frame. First, following most previous works, we use a voxel grid filter to reduce the point cloud before processing. In addition, following Ardilla et al. [13], we also avoid feeding all the data into learning by selecting each point with a probability of 0.1%. This setup differs slightly from the previous experiments: we set the number of starting nodes to 10 and the maximum number of nodes to 500. We do not perform additional training, but we perform edge cutting in every frame. Two experiments are conducted: the first involves moving a hand in front of the depth camera, and the second involves moving a pack of snacks on a table. The results are shown in Figure 22 and Figure 23, respectively. The processing speed is about 40 FPS, which demonstrates that DBL-GNG can adapt to dynamic data.
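The two data reduction strategies can be sketched as follows; the function names and the voxel size are illustrative assumptions rather than values taken from our implementation.

```python
import numpy as np

def random_subsample(points: np.ndarray, prob: float = 0.001) -> np.ndarray:
    """Keep each point independently with the given probability (0.1%)."""
    mask = np.random.rand(len(points)) < prob
    return points[mask]

def voxel_grid_filter(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Keep one representative point per occupied voxel."""
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    # return_index gives the first point falling into each occupied voxel.
    _, first = np.unique(voxel_idx, axis=0, return_index=True)
    return points[np.sort(first)]

# points: an (N, 3) array from the depth camera with N > 100,000.
# reduced = voxel_grid_filter(points, voxel_size=0.02)  # voxel size is illustrative
# sampled = random_subsample(points, prob=0.001)
```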
The experiments show that DBL-GNG adapts to dynamic environments. As Figure 22 shows, the topological map follows the hand as the gesture changes. Another observation is that learning from randomly selected data produces a network with more nodes on the hand than the voxel filter does, but it is less suitable for dynamically changing environments: the hand in column 6 contains more noise than with the voxel filter. Because the data are selected at random, the learning becomes unstable, similar to the standard GNG behavior in Figure 3.
Figure 23, column 6, shows one limitation of DBL-GNG: nodes need time to move from the old object location (the snacks) to the new object location (the hand). Before the hand appears, all nodes are created on the snacks, at which point the maximum number of nodes is reached. Once the hand enters the scene, it takes time to move nodes from the snacks to the hand. To solve this problem, we would need to remove nodes in advance or temporarily increase the maximum number of nodes when a new object is detected. We leave this to our future work.

6. Conclusions

This paper introduces a novel GNG method called Distributed Batch Learning Growing Neural Gas (DBL-GNG). First, it performs distributed initialization, placing multiple starting nodes across the dataset. Second, it introduces matrix calculation into the batch learning procedure to accelerate learning. Third, it adds multiple nodes to the network at once and removes inactive ones. Finally, an edge cutting method removes unimportant edges from the network to form better clusters. Experimental results show that DBL-GNG runs faster than other GNG algorithms, especially on large-scale datasets, where it is at least 10 times faster, and it also achieves smaller quantization errors. To accelerate convergence, we implement multi-scale processing in DBL-GNG, called MS-DBL-GNG, which is twice as fast as DBL-GNG and demonstrates that DBL-GNG is scalable. In addition, we compare the clustering results with other unsupervised learning methods and discuss the advantages and disadvantages of DBL-GNG. We also demonstrate DBL-GNG in a real environment by building topological maps on 3D point clouds of moving objects; the results show that DBL-GNG adapts to dynamic data environments. One remaining problem is that, when the network reaches the maximum number of nodes, it takes time to remove nodes from the previous object and create nodes on the new object. We believe this problem also exists in most previous works and leave its investigation and resolution to our future work.

Author Contributions

Conceptualization, C.Z.S. and N.K.; methodology, C.Z.S. and A.A.S.; software, C.Z.S. and T.O.; validation, C.Z.S., A.A.S. and T.O.; formal analysis, C.Z.S.; investigation, C.Z.S.; resources, C.Z.S.; data curation, C.Z.S.; writing—original draft preparation, C.Z.S.; writing—review and editing, C.Z.S.; visualization, C.Z.S.; supervision, A.A.S. and T.O.; project administration, N.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by Japan Science and Technology Agency (JST), Moonshot R&D, under grant number JPMJMS2034, and Tokyo Metropolitan University (TMU) Local 5G research support. The authors greatly appreciate the scholarship support from the Japan Ministry of Education, Culture, Sports, Science, and Technology (MEXT).

Data Availability Statement

The data that support the findings of this study are openly available in GitHub at https://github.com/CornerSiow/DBL-GNG (accessed on 20 May 2024).

Acknowledgments

The authors thank Fernando Ardilla and Adnan Rachmat Anom Besari for their careful review and advice.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GNG: Growing Neural Gas
BL: Batch Learning
FCM: Fuzzy c-means
MS-BL-GNG: Multi-Scale BL GNG
DBL-GNG: Distributed Batch Learning GNG
GNG-U: GNG with Utility

References

1. Uykan, Z. Fusion of centroid-based clustering with graph clustering: An expectation-maximization-based hybrid clustering. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 4068–4082.
2. Shi, D.; Zhu, L.; Li, Y.; Li, J.; Nie, X. Robust structured graph clustering. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 4424–4436.
3. Ardilla, F.; Saputra, A.A.; Kubota, N. Multi-Scale Batch-Learning Growing Neural Gas Efficiently for Dynamic Data Distributions. Int. J. Autom. Technol. 2023, 17, 206–216.
4. Narvaez Rojas, C.; Alomia Peñafiel, G.A.; Loaiza Buitrago, D.F.; Tavera Romero, C.A. Society 5.0: A Japanese concept for a superintelligent society. Sustainability 2021, 13, 6567.
5. Greer, C.; Burns, M.; Wollman, D.; Griffor, E. Cyber-Physical Systems and Internet of Things; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2019.
6. Oshio, K.; Kaneko, K.; Kubota, N. Multi-scopic simulation for human-robot interactions based on multi-objective behavior coordination. In Proceedings of the International Workshop on Advanced Computational Intelligence and Intelligent Informatics, Beijing, China, 31 October–3 November 2021; pp. 3–8.
7. Masuyama, N.; Loo, C.K.; Ishibuchi, H.; Kubota, N.; Nojima, Y.; Liu, Y. Topological clustering via adaptive resonance theory with information theoretic learning. IEEE Access 2019, 7, 76920–76936.
8. Liu, Y.; Ishibuchi, H.; Masuyama, N.; Nojima, Y. Adapting reference vectors and scalarizing functions by growing neural gas to handle irregular Pareto fronts. IEEE Trans. Evol. Comput. 2019, 24, 439–453.
9. Masuyama, N.; Nojima, Y.; Ishibuchi, H.; Liu, Z. Adaptive Resonance Theory-based Clustering for Handling Mixed Data. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–8.
10. Fritzke, B. A growing neural gas network learns topologies. In Proceedings of the Advances in Neural Information Processing Systems 7, Denver, CO, USA, 28 November–1 December 1994.
11. Doteguchi, N.; Kubota, N. Topological Tracking for Mobility Support Robots Based on Multi-scale Batch Learning Growing Neural Gas. In Proceedings of the Mobile Wireless Middleware, Operating Systems and Applications: 10th International Conference on Mobile Wireless Middleware, Operating Systems and Applications (MOBILWARE 2021); Springer: Cham, Switzerland, 2022; pp. 17–31.
12. Toda, Y.; Matsuno, T.; Minami, M. Multilayer batch learning growing neural gas for learning multiscale topologies. J. Adv. Comput. Intell. Intell. Inform. 2021, 25, 1011–1023.
13. Ardilla, F.; Saputra, A.A.; Kubota, N. Batch Learning Growing Neural Gas for Sequential Point Cloud Processing. In Proceedings of the 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Prague, Czech Republic, 9–12 October 2022; pp. 1766–1771.
14. Iwasa, M.; Kubota, N.; Toda, Y. Multi-scale batch-learning growing neural gas for topological feature extraction in navigation of mobility support robots. In Proceedings of the 7th International Workshop on Advanced Computational Intelligence and Intelligent Informatics (IWACIII 2021), Beijing, China, 31 October–3 November 2021.
15. Li, F.; Ye, Y.; Tian, Z.; Zhang, X. CPU versus GPU: Which can perform matrix computation faster—Performance comparison for basic linear algebra subprograms. Neural Comput. Appl. 2019, 31, 4353–4365.
16. Saroya, M.; Best, G.; Hollinger, G.A. Roadmap learning for probabilistic occupancy maps with topology-informed growing neural gas. IEEE Robot. Autom. Lett. 2021, 6, 4805–4812.
17. Liang, J.; Bai, L.; Dang, C.; Cao, F. The K-means-type algorithms versus imbalanced data distributions. IEEE Trans. Fuzzy Syst. 2012, 20, 728–745.
18. Nigam, K.; McCallum, A.K.; Thrun, S.; Mitchell, T. Text classification from labeled and unlabeled documents using EM. Mach. Learn. 2000, 39, 103–134.
19. Goldwater, S.; Griffiths, T. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Stroudsburg, PA, USA, 25–27 June 2007; pp. 744–751.
20. Baraldi, A.; Blonda, P. A survey of fuzzy clustering algorithms for pattern recognition. I. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 1999, 29, 778–785.
21. Anderson, D.T.; Bezdek, J.C.; Popescu, M.; Keller, J.M. Comparing fuzzy, probabilistic, and possibilistic partitions. IEEE Trans. Fuzzy Syst. 2010, 18, 906–918.
22. Fritzke, B. Growing cell structures—A self-organizing network for unsupervised and supervised learning. Neural Netw. 1994, 7, 1441–1460.
23. Furao, S.; Hasegawa, O. An incremental network for on-line unsupervised classification and topology learning. Neural Netw. 2006, 19, 90–106.
24. Furao, S.; Ogura, T.; Hasegawa, O. An enhanced self-organizing incremental neural network for online unsupervised learning. Neural Netw. 2007, 20, 893–903.
25. Saputra, A.A.; Wada, K.; Masuda, S.; Kubota, N. Multi-scopic neuro-cognitive adaptation for legged locomotion robots. Sci. Rep. 2022, 12, 16222.
26. Masuyama, N.; Amako, N.; Nojima, Y.; Liu, Y.; Loo, C.K.; Ishibuchi, H. Fast topological adaptive resonance theory based on correntropy induced metric. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019; pp. 2215–2221.
27. Gionis, A.; Mannila, H.; Tsaparas, P. Clustering aggregation. ACM Trans. Knowl. Discov. Data (TKDD) 2007, 1, 4-es.
28. Veenman, C.J.; Reinders, M.J.T.; Backer, E. A maximum variance cluster algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 1273–1280.
29. Chang, H.; Yeung, D.Y. Robust path-based spectral clustering. Pattern Recognit. 2008, 41, 191–203.
30. Siow, C.Z.; Obo, T.; Kubota, N. Top-Down Multi-Layer Learning for Growing Neural Gas. 2024; unpublished.
31. Toda, Y.; Wada, A.; Miyase, H.; Ozasa, K.; Matsuno, T.; Minami, M. Growing neural gas with different topologies for 3D space perception. Appl. Sci. 2022, 12, 1705.
32. Saputra, A.A.; Hong, C.W.; Yani, M.; Ardilla, F.; Besari, A.R.A.; Toda, Y.; Kubota, N. Topological based Environmental Reconstruction for Efficient Multi-Level Control of Robot Locomotion. In Proceedings of the 2022 International Electronics Symposium (IES), Surabaya, Indonesia, 9–11 August 2022; pp. 491–496.
33. Fritzke, B. A self-organizing network that can follow non-stationary distributions. In Proceedings of the Artificial Neural Networks—ICANN'97: 7th International Conference, Lausanne, Switzerland, 8–10 October 1997; Proceedings 7; Springer: Berlin/Heidelberg, Germany, 1997; pp. 613–618.
34. Chinnamuniyandi, M.; Chandran, S.; Changjin, X. Fractional order uncertain BAM neural networks with mixed time delays: An existence and Quasi-uniform stability analysis. J. Intell. Fuzzy Syst. 2024, 46, 4291–4313.
35. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Figure 1. The standard GNG exploration from one data area to another. The starting point is at the top right, and at least one epoch is needed for the network to grow to the next data area. After the exploration, the nodes on the path will remain, which are useless and cannot be deleted according to the standard GNG algorithm. (a) Nodes start from the data area on one side. (b) Exploring the next data area takes at least one epoch. (c) After exploring the next data area, the path remains on the network.
Figure 2. The average execution time (in seconds) per epoch to find the winner node for each datum using the standard GNG method or the batch learning GNG method.
Figure 3. The network update problem if data are input one by one for learning. Network nodes will be constantly moving and primarily depend on the input data sequence. (a) The winner node w_{s1} at time t is updating its position toward the input data sample. (b) The winner node w_{s1} at time t + 1 is updating its position toward the input data sample.
Figure 4. Standard GNG cannot remove edges when there is small noise between two clusters. (a) Perfect topological map of two clusters. (b) Imperfect topological map of two clusters.
Figure 5. The overall process of batch learning on GNG.
Figure 6. The flowchart of DBL-GNG.
Figure 7. The clustering pattern for each dataset. C means cluster. P means point. (a) Agg—7 C—788 P (b) D31—31 C—3100 P (c) R15—15 C—600 P (d) Path—3 C—300 P (e) Sp—3 C—312 P (f) H72D—72 C—14,400 P.
Figure 8. The topological map of DBL-GNG on Agg, D31, R15, Path, Sp, and H72D dataset. A = Distributed initialization; B = Maximum number of nodes reached; C = Train for an additional 10 epochs; D = After edge cutting.
Figure 9. The problem of the fast convergence rate of standard GNG nodes. It can be seen that although all clusters have the same amount of data, some clusters have more nodes while other clusters have fewer nodes.
Figure 10. The number of nodes and quantization error per epoch for FCM-BL-GNG, MS-BL-GNG, FastMS-BL-GNG, and DBL-GNG on the H72D dataset.
Figure 11. The number of nodes and quantization error per epoch for FCM-BL-GNG, MS-BL-GNG, FastMS-BL-GNG, and DBL-GNG on the Spiral dataset.
Figure 12. The number of nodes and quantization error per epoch for FCM-BL-GNG, MS-BL-GNG, FastMS-BL-GNG, and DBL-GNG on the Path dataset.
Figure 13. The number of nodes and quantization error per epoch for FastMS-BL-GNG and MS-DBL-GNG on the H72D dataset.
Figure 14. The number of nodes and quantization error per epoch for FastMS-BL-GNG and MS-DBL-GNG on the Path dataset.
Figure 15. The number of nodes and quantization error per epoch for FastMS-BL-GNG and MS-DBL-GNG on the Spiral dataset.
Figure 16. The clustering pattern for each dataset. C means cluster. P means point.
Figure 17. The total movement and quantization error per epoch for standard GNG and DBL-GNG on the H72D dataset.
Figure 18. The total movement and quantization error per epoch for standard GNG and DBL-GNG on the Path dataset.
Figure 19. The total movement and quantization error per epoch for standard GNG and DBL-GNG on the Spiral dataset.
Figure 20. The clustering results of different methods on the Spiral dataset. One color represents one cluster. DBL-GNG* denotes DBL-GNG with the adjusted filtering probability.
Figure 21. The clustering results of agglomerative and DBL-GNG methods on the H72D dataset. One color represents one cluster.
Figure 22. DBL-GNG topology from a point cloud where a hand is moving in front of the depth camera. The first row is the original image, the second row uses randomly selected data for learning, and the last row uses voxel filters.
Figure 23. DBL-GNG topology from a point cloud where a pack of snacks is on a table in front of the depth camera. The first row is the original image, the second row uses randomly selected data for learning, and the last row uses voxel filters. The snacks are placed in the first four columns and moved in the last four columns.
Table 1. The differences between the proposed method (DBL-GNG) and other methods.
| Method | Incremental Learning | Support Non-Stationary Data | No Need to Predefine the Total Number of Clusters | No Need to Learn Data One by One |
|---|---|---|---|---|
| k-means | No | No | No | Yes |
| FCM | No | No | No | Yes |
| GCS | Yes | No | Yes | No |
| SOINN | Yes | Yes | Yes | No |
| LB-SOINN | Yes | Yes | Yes | No |
| TCA | Yes | Yes | Yes | No |
| GNG | Yes | Yes | Yes | No |
| FCM-BL-GNG | Yes | No | Yes | No |
| MS-BL-GNG | Yes | No | Yes | No |
| FastMS-BL-GNG | Yes | Yes | Yes | No |
| DBL-GNG | Yes | Yes | Yes | Yes |
Table 2. Parameter settings for each different GNG method. Custom means the user has to define it beforehand. L is the learning phase of FastMS-BL-GNG.
| Parameter | GNG | GNG-U | FCM-BL-GNG | MS-BL-GNG | FastMS-BL-GNG | DBL-GNG |
|---|---|---|---|---|---|---|
| Node Initialization | 2 | 2 | Custom | 3 | 3 | Custom |
| Winner Node Learning | Custom | Custom | N/A | 1.0 | 1.0 α(L_max − L) | 0.5 |
| Neighbor Node Learning | Custom | Custom | N/A | 0.05 | α(L_max − L) | 0.01 |
| Node Growth Frequency | 1 node per γ interval | 1 node per γ interval | 1 node per epoch | 1 node per mini-batch | add-if-silent per iteration and 1 node per phase | Auto-calculate g nodes per epoch |
| Remove Edge Frequency | Max edge age θ | Max edge age θ | Reset edges every epoch | Reset edges every epoch | Reset edges every phase | Reset edges every epoch |
| New Node Error Discount ρ | Custom | Custom | 0.5 | 0.5 | Interpolation of q1 and q2 | 0.5 |
| All Node Error Discount δ | Custom | Custom | Reset to 0 | 0.05 | N/A | 0.5 |
| Node Deletion | Isolated nodes | Isolated nodes and use of utility value | N/A | Inactive nodes | Isolated nodes and selected inactive nodes | Isolated nodes and inactive nodes |
| Maximum Number of Nodes | Custom | Custom | Custom | Custom | Custom | Custom |
Table 3. The maximum node and initial starting point for each dataset. The initial starting point is the same as the total number of clusters for each dataset.
| Dataset | Maximum Nodes | Initial Starting Points |
|---|---|---|
| Agg | 68 | 7 |
| D31 | 93 | 31 |
| R15 | 45 | 15 |
| Path | 30 | 3 |
| Sp | 96 | 3 |
| H72D | 360 | 72 |
Table 4. The quantization error for each dataset.
| Dataset | GNG | GNG-U | FCM-BL-GNG | MS-BL-GNG | FastMS-BL-GNG | DBL-GNG | MS-DBL-GNG |
|---|---|---|---|---|---|---|---|
| Agg | 1.058 ± 0.015 | 1.058 ± 0.015 | 1.033 ± 0.005 | 0.992 ± 0.005 | 1.009 ± 0.006 | 0.981 ± 0.003 | 0.989 ± 0.007 |
| D31 | 0.666 ± 0.007 | 0.665 ± 0.008 | 0.665 ± 0.007 | 0.629 ± 0.002 | 0.649 ± 0.007 | 0.627 ± 0.003 | 0.627 ± 0.004 |
| R15 | 0.272 ± 0.006 | 0.267 ± 0.007 | 0.267 ± 0.004 | 0.247 ± 0.002 | 0.252 ± 0.004 | 0.246 ± 0.003 | 0.248 ± 0.003 |
| Path | 1.389 ± 0.034 | 1.375 ± 0.028 | 1.336 ± 0.031 | 1.328 ± 0.011 | 1.408 ± 0.062 | 1.288 ± 0.023 | 1.312 ± 0.024 |
| Sp | 0.491 ± 0.009 | 0.485 ± 0.010 | 0.448 ± 0.004 | 0.497 ± 0.018 | 0.476 ± 0.008 | 0.449 ± 0.005 | 0.450 ± 0.005 |
| H72D | 0.520 ± 0.004 | 0.523 ± 0.005 | 0.516 ± 0.002 | 0.489 ± 0.002 | 0.511 ± 0.004 | 0.485 ± 0.001 | 0.489 ± 0.001 |
Bold text indicates the lowest quantization error for each dataset among all methods.
Table 5. The total computation time in seconds for each dataset.
| Dataset | GNG | GNG-U | FCM-BL-GNG | MS-BL-GNG | FastMS-BL-GNG | DBL-GNG | MS-DBL-GNG |
|---|---|---|---|---|---|---|---|
| Agg | 1.035 ± 0.024 | 1.038 ± 0.020 | 5.564 ± 0.062 | 0.709 ± 0.015 | 0.413 ± 0.010 | 0.053 ± 0.001 | 0.053 ± 0.010 |
| D31 | 3.283 ± 0.028 | 3.298 ± 0.030 | 22.085 ± 0.288 | 3.428 ± 0.055 | 1.607 ± 0.044 | 0.054 ± 0.003 | 0.065 ± 0.004 |
| R15 | 0.692 ± 0.012 | 0.735 ± 0.022 | 2.052 ± 0.050 | 0.411 ± 0.030 | 0.323 ± 0.020 | 0.023 ± 0.001 | 0.047 ± 0.005 |
| Path | 0.299 ± 0.005 | 2.141 ± 0.835 | 1.099 ± 0.026 | 0.151 ± 0.010 | 0.144 ± 0.002 | 0.029 ± 0.001 | 0.028 ± 0.002 |
| Sp | 0.334 ± 0.011 | 0.917 ± 0.112 | 2.894 ± 0.022 | 1.314 ± 0.317 | 0.255 ± 0.011 | 0.076 ± 0.003 | 0.070 ± 0.002 |
| H72D | 20.874 ± 0.415 | 20.582 ± 0.221 | 487.846 ± 6.111 | 59.151 ± 0.609 | 12.743 ± 0.075 | 0.852 ± 0.018 | 0.680 ± 0.023 |
Bold text indicates the shortest computation time for each dataset among all methods.
Table 6. The overall comparison of various GNG methods.
| Method | Node Convergence Speed | Error Convergence Speed | Learning Stability | Quantization Error | Computational Time |
|---|---|---|---|---|---|
| GNG | Fast | Fast | No | High | Slow |
| GNG-U | Fast | Fast | No | High | Slow |
| FCM-BL-GNG | Slow | Slow | Yes | Medium | Slowest |
| MS-BL-GNG | Fast | Fast | Yes | Low | Medium |
| FastMS-BL-GNG | Fastest | Fastest | Yes | Medium | Fast |
| DBL-GNG | Medium | Medium | Yes | Lowest | Faster |
| MS-DBL-GNG | Fast | Fast | Yes | Lower | Fastest |
Table 7. The parameter settings for each clustering model on each dataset.
| Dataset | k-Means | DBSCAN | Agglomerative | DBL-GNG |
|---|---|---|---|---|
| Agg | k = 7 | eps = 3.0, minPts = 1 | distance = 75 | p = 0.45 |
| D31 | k = 31 | eps = 0.5, minPts = 5 | distance = 15 | p = 0.65 |
| R15 | k = 15 | eps = 0.5, minPts = 5 | distance = 10 | p = 0.65 |
| Path | k = 3 | eps = 1.0, minPts = 5 | distance = 45 | p = 0.55 |
| Sp | k = 3 | eps = 3.0, minPts = 5 | distance = 25 | p = 0.10 |
| H72D | k = 72 | eps = 0.5, minPts = 5 | distance = 40 | p = 0.45 |
Table 8. The clustering results of clustering models on each dataset.
| Dataset | k-Means | DBSCAN | Agglomerative | DBL-GNG with p = 0.15 | DBL-GNG |
|---|---|---|---|---|---|
| Agg | 0.746 ± 0.089 | 0.792 ± 0.001 | 0.957 ± 0.001 | 0.808 ± 0.049 | 0.968 ± 0.041 |
| D31 | 0.897 ± 0.027 | 0.843 ± 0.001 | 0.959 ± 0.001 | 0.158 ± 0.054 | 0.963 ± 0.010 |
| R15 | 0.950 ± 0.046 | 0.728 ± 0.001 | 0.992 ± 0.001 | 0.730 ± 0.082 | 0.994 ± 0.002 |
| Path | 0.455 ± 0.004 | 0.720 ± 0.001 | 0.543 ± 0.001 | 0.367 ± 0.001 | 0.805 ± 0.094 |
| Sp | 0.235 ± 0.003 | 1.000 ± 0.001 | 0.253 ± 0.001 | 0.998 ± 0.005 | 1.000 ± 0.001 |
| H72D | 0.903 ± 0.013 | 0.821 ± 0.001 | 0.979 ± 0.001 | 0.436 ± 0.038 | 0.901 ± 0.017 |
Bold text indicates the best clustering result for each dataset among all methods.
