1. Introduction
The Internet of Things (IoT) [1] is the next generation of the Internet, which promises to endow potentially every physical object with sensors and/or actuators, allowing integration among these physical objects and with other physical and virtual elements. Nowadays, the IoT has emerged from its infancy and evolved into a broad view encompassing the interconnection of sensors, actuators, the most varied intelligent objects and wireless sensor networks (WSNs). According to some authors, IoT is the backbone of a smart city [2] and WSNs are the sensing-actuation arm of the IoT. WSNs allow the seamless integration of sensing and communication elements into urban infrastructure, forming a digital skin over it [3]. The information generated by multiple WSNs in the IoT will be shared between different platforms, systems and applications, enabling an integrated and common view of the city. Such a view will provide inputs for well-grounded, timely and holistic decisions, in order to optimize processes and services, ultimately contributing to building cities as greener and more sustainable ecosystems.
The devices that make up the IoT have wireless communication capability and are equipped either with embedded sensors, as is the case with smartphones, or with dedicated sensors, as in the case of WSNs [4]. Many IoT devices, mainly in the context of WSNs, are powered by batteries. The wireless communication capabilities and the small dimensions of the devices in a WSN reduce the costs and complexity of implementing these networks and increase the flexibility and the resilience of the system as a whole. The low cost of the sensors means that, in general, there is redundancy in their deployment, so that, if some devices fail, the monitoring area will continue to be covered by the remaining devices. The downside of these devices is the dependency they create between the lifetime of the system and the duration of their batteries [5].
Average battery life is a relevant metric when evaluating an IoT system because it determines how long the system can run without intervention in the environment, which can be undesirable or costly, depending on the purpose of the device and the application. In IoT, the system’s functionalities are distributed among the devices that collect the data, those that transmit it and those that apply algorithms to process and analyze the data. A widely adopted technique for increasing the lifetime of an IoT system is to use data aggregation and/or fusion algorithms on the devices that process and analyze the data.
Algorithms for data aggregation and/or fusion executed within the network avoid unnecessary transmissions, thereby reducing the volume of data transmitted and energy consumption, making the network more efficient [6]. Data fusion techniques combine data from several sensors, aiming to achieve higher accuracy and more specific inferences than those obtained using a single sensor [6]. The outcome of a fusion operation can be a data synthesis or a transformation of the data from a state closer to the raw data to a level closer to a decision-making layer. In IoT, fusion algorithms aim at synthesizing the data, so that the volume of data transmitted is reduced, thus saving energy in the system [7].
There are already works in the literature proposing the application of data fusion techniques in the context of WSN and IoT systems [6,8,9,10,11]. However, until now, all works consider that the application requirements are known in advance (at design time), and the devices, algorithms and network behavior are designed specifically for the requirements of the target application (or applications). Such requirements vary from the nature of the sensed phenomenon, the sensing intervals and the data range to the events of interest. Such a feature was suitable for WSN systems, mainly in their first generation, which, as mentioned, can be seen as intranets of sensing devices. However, with the IoT evolution, in current scenarios, it is not realistic to consider that such requirements are known a priori, particularly in the context of IoT for smart cities.
In a smart city, it is envisaged that the IoT will provide a sensing and communication infrastructure to be shared by multiple stakeholders, ranging from end users to corporate systems and government agencies. Applications that will make use of this infrastructure may not even exist at the time of its creation. In addition, to reach the full potential of an IoT ecosystem for smart cities, one must take advantage of the opportunistic nature of interactions, which makes devices, services and applications appear and disappear at any moment, creating service opportunities in an ad hoc manner. Therefore, when designing a data fusion algorithm in this context, it should not be assumed that the information about the data nature, intervals or events that will be monitored is known, because the objective is to reuse the same infrastructure for different applications, in diverse contexts [7,12].
Contributions
One of the first attempts to accommodate the discussed IoT requirements in terms of data fusion is described in [7]. In that paper, the authors propose Hephaestus, a data fusion algorithm that operates without the need to be aware of the application requirements. Instead, based on statistical properties of the data (kurtosis, asymmetry and mean), it separates the datasets into monomodal subsets. These subsets have the potential to map the events taking place in the monitored environment, without the need for prior knowledge about them. Therefore, Hephaestus is able to identify and differentiate several events occurring in a monitored environment. However, it analyzes the data based on descriptive statistics, and its limitation is the fact that different datasets may present the same asymmetry and, therefore, not be classified as different. In other words, the classification of events provided by Hephaestus is not perfectly accurate and may lead to the loss of relevant information or the misinterpretation of phenomena occurring in the environment. By phenomena, we mean any occurrence of data masses (sets of samples with a recognizable data peak) that may or may not correlate to events in the monitored environment. Our goal is to identify those phenomena and use them as an analysis basis to provide better input for an application’s inferences.
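For illustration, the recursive, statistics-driven splitting performed by Hephaestus can be sketched as follows (the skewness threshold and minimum subset size are assumptions of the sketch, not published parameters). Note how a perfectly symmetric multimodal dataset yields zero skewness and is never split, which is precisely the limitation discussed above:

```python
import statistics

def hephaestus_split(samples, skew_threshold=0.5, min_size=10):
    """Recursively split a dataset into monomodal subsets, in the spirit of
    Hephaestus: if a subset's skewness suggests more than one mode, split it
    at the mean and recurse. Thresholds here are illustrative assumptions."""
    if len(samples) < min_size:
        return [samples]
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    if stdev == 0:
        return [samples]
    # Sample skewness (third standardized moment).
    skew = sum(((x - mean) / stdev) ** 3 for x in samples) / len(samples)
    if abs(skew) < skew_threshold:
        return [samples]  # roughly symmetric: treated as a single mode
    left = [x for x in samples if x < mean]
    right = [x for x in samples if x >= mean]
    if not left or not right:
        return [samples]
    return (hephaestus_split(left, skew_threshold, min_size)
            + hephaestus_split(right, skew_threshold, min_size))
```

An asymmetric bimodal input such as `[10]*30 + [50]*10` is split into two subsets, while the symmetric bimodal `[10]*20 + [50]*20` has zero skewness and is left untouched, illustrating the misclassification risk.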
In this work, we present a new algorithm called Heracles, which aims to overcome the limitations observed in algorithms based on descriptive statistics, such as Hephaestus. To improve the accuracy of event identification based on the analysis of the frequency concentration of the sensing data, we changed the perspective of analysis: instead of using descriptive statistics, our approach adopts local maxima and minima. The objective of such an approach is to prevent decisions regarding the grouping of data into subsets that denote events of relevance to any application from being made from just one number, as in Hephaestus (the outcome of its statistical analysis). Instead, mathematical functions are used with additional information, such as the intervals of increase and decrease in the values, in order to identify the local maxima and minima.
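A minimal sketch of this local-extrema perspective, assuming a simple fixed-width histogram (the bin width is a tuning parameter of the sketch, not a published value), is the following: the dataset is binned, the valleys between frequency peaks are located, and the data are cut at each valley.

```python
from collections import Counter

def split_at_valleys(samples, bin_width=5):
    """Illustrative sketch of the local-extrema idea: bin the samples,
    locate local minima (valleys) between local maxima (peaks) in the
    histogram and cut the dataset at each valley."""
    bins = Counter(int(x // bin_width) for x in samples)
    lo, hi = min(bins), max(bins)
    counts = [bins.get(b, 0) for b in range(lo, hi + 1)]
    # A valley is lower than its left neighbor and no higher than its right
    # neighbor (non-strict on the right, so flat valleys are caught at
    # their left edge).
    valleys = [lo + i for i in range(1, len(counts) - 1)
               if counts[i] < counts[i - 1] and counts[i] <= counts[i + 1]]
    cut_points = [(v + 1) * bin_width for v in valleys]
    subsets, rest = [], sorted(samples)
    for cut in cut_points:
        group = [x for x in rest if x < cut]
        rest = [x for x in rest if x >= cut]
        if group:
            subsets.append(group)
    subsets.append(rest)
    return subsets
```

Unlike a purely skewness-based test, this sketch separates the symmetric bimodal input `[10]*20 + [50]*20` into two subsets, because the valley between the two frequency peaks is detected regardless of the overall symmetry.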
Heracles does consider the context of the application, adapting to the dataset to perform the analysis. By context, we mean application features in terms of data thresholds, sensor interfaces and data density (the number of samples in a given phenomenon). Heracles does not need to know such application requirements in advance, but its performance is directly linked to how well adapted it is to the context and how well adjusted its parameters are to the configuration of the monitored environment. For this reason, Heracles does not depend on the requirements of the applications, but on the monitored environment and its characteristics.
The final goal of our proposal is to contribute to providing an infrastructure for collecting and analyzing IoT data for smart cities, where multiple applications benefit, possibly in an ad hoc way, from the data generated and the information produced. The proposed fusion algorithm aims, on the one hand, to minimize data transmission so as to save the energy of the sensing devices. On the other hand, it aims to generate value-added information of a higher quality than the raw data, which will serve as input to information systems and decision-making processes. By receiving a dataset and organizing it into different phenomena, Heracles is able to provide better input for inferences at higher decision levels. Such information is generated agnostically to the requirements of specific applications, in order to explore, in a creative and opportunistic way, the wealth of data made available in IoT systems.
The remainder of this paper is organized as follows. In Section 2, we discuss related work. Section 3 presents our proposal in detail, and Section 4 describes the experiments carried out to evaluate it. Section 5 provides final remarks and future research directions.
2. Related Work
The challenge of taking advantage of the massive amount of data generated by sensor networks in IoT systems to produce value-added information and knowledge has driven the publication of several works in recent years. In this section, we analyze some of these works, which are based on the use of different data analysis techniques. For each related work, we describe its general characteristics, its purpose and its similarities and differences in comparison to Heracles.
Examples of proposals with a similar purpose to Heracles are [6,7,9,10,11,13,14,15].
In [9], the authors proposed a new average-based sensor fusion function that intends to fuse data from multiple applications simultaneously while handling uncertainty in the data samples. The authors argue that the sensor output should include a reading-related uncertainty value (Δy), and that the sensor observation should be an interval (y − Δy, y + Δy). The reading interval is estimated by adding two tolerance values, respectively, to the left and to the right of the pure value. The tolerance values may or may not be the same.
The following proposals deal with the challenge of executing data fusion algorithms for various applications in an IoT environment composed of large-scale wireless sensor networks.
The authors in [11] modified the moving average filter (MAF) technique to consider the importance of the sensed data for the application. The work of [11] presents an enhanced moving average filter (EMAF), whose main idea is to weight the dataset to express the requirements (data ranges, data rates and states) of different applications. The disadvantage of this approach is the need for prior knowledge about the requirements of the application. As discussed, this premise is not always true in IoT scenarios, particularly in the smart cities domain, where new applications can be deployed in the ecosystem in an opportunistic or ad hoc way. In scenarios where the set of applications changes rapidly or frequently, EMAF is rendered unfeasible, needing to be constantly reconfigured. As EMAF [11] uses unprocessed data (readings with a low level of abstraction), data with a high level of abstraction (such as decisions) are not properly handled. However, in later works, the same authors of EMAF presented high-abstraction fusion methods [6]. In [6], the authors adapted some existing fusion methods to perform information fusion on data for multiple applications in the context of shared sensor networks. The authors proposed the following methods of information fusion: (i) enhanced Bayesian inference (EBI), (ii) enhanced Dempster–Shafer inference (EDSI) and (iii) enhanced fault tolerance average (EFTA).
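For illustration, the weighting idea behind EMAF can be sketched as follows (the weight value and application ranges here are hypothetical; EMAF derives them from the registered application requirements):

```python
def weighted_moving_average(samples, app_ranges, in_range_weight=2.0):
    """Sketch of an EMAF-style weighted average: samples that fall inside a
    registered application's data range receive a higher weight, so the
    fused value leans toward application-relevant readings. The weight and
    ranges are illustrative assumptions."""
    def weight(x):
        return in_range_weight if any(lo <= x <= hi for lo, hi in app_ranges) else 1.0
    total = sum(weight(x) * x for x in samples)
    norm = sum(weight(x) for x in samples)
    return total / norm
```

For samples `[10, 10, 100]` with a registered range `(90, 110)`, the weighted result (55.0) leans toward the in-range reading, whereas the plain mean would be 40.0. The sketch also makes the drawback discussed above concrete: `app_ranges` must be known and kept up to date for every deployed application.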
EBI formalizes the combination of evidence according to the rules of Bayesian probability theory for each application [16]. It represents the hypothesis that application “Y” will behave in a determined way given the result of application “X”, considering the set of states of each isolated application. In EDSI, each application has its own set of hypotheses, which represents the application’s behavior. EDSI infers through the Dempster–Shafer inference method on the conditions of the isolated applications and considers the premise that both applications move to a given state simultaneously. EFTA calculates the data intervals according to the requirements of each application, applying the traditional fault tolerance interval method. Then, EFTA produces the combination of the calculated intervals.
Despite dealing with information for several applications, the solutions proposed in [6,11] present a significant drawback, as they need to be pre-configured with the requirements of each application in the IoT network. While EMAF [11] needs to know the application requirements to properly weight the dataset, EBI, EDSI and EFTA [6] need to infer the states of each application to present the decision that integrates the various applications. Therefore, again, the proposals share the limitation of not operating efficiently in scenarios where the set of applications in execution constantly changes. In contrast, with Heracles, we introduce a new method of information fusion that does not depend on the requirements of the applications, as long as the monitored environment is evaluated and used to calibrate the algorithm parameters.
In [13], the authors presented a novel decision-level data fusion technique for multiapplication WSNs (MWSNs), based on data density, which is able to handle decision conflicts. The proposed technique divides the dataset into groups based on application requirements and evaluates them separately. Each data group is known as an abstract sensor. The advantage of dividing the dataset into abstract sensors is to avoid interference among distinct data and to better represent phenomena in the MWSN. Then, decisions are extracted from each abstract sensor and an analysis process is performed to verify whether there are conflicts among the decisions. After that, a tableau is used to solve the conflicts, excluding any undesired behavior. Differently from [13], our work does not need to know any application requirement a priori. Moreover, our proposal has a different goal, since it is not a decision system but, instead, can be used jointly with a decision system to improve the quality of the decision outcome.
The authors in [14] proposed a distributed algorithm to detect phenomena, such as fires, oil spills (or spills of toxic gases) and others, in a monitored environment where the sensors are mobile devices and the phenomenon is dynamic (moving, growing or shrinking). The authors assume that the monitored environment does not have a central server to collect and aggregate data from the sensors. In the algorithm proposed in [14], sensors organize themselves into non-overlapping groups by electing group leaders and assigning each sensor as a member of the group of its closest leader. The sensors of each group send their data to the leader, which aggregates the collected data and detects the local phenomenon (in the area covered by the sensors of the group). Then, based on the order of the leaders’ identifiers, each leader sends information on the detected local phenomenon to the next leader in that order. The receiving leader aggregates the information and sends it to the next leader, and so on. The last leader in the chain aggregates the information on all (locally) detected phenomena to discover the global phenomenon. In addition, the authors in [14] proposed two algorithms for electing leaders: one based on information from the global event and one based on information from local phenomena. They adopt an optimization technique to reduce the energy costs of sending local information. The approach reduces the amount of data transmitted between leaders, since the information on local phenomena sent between them is summarized by a tuple of values representing the data limits, peaks and the volume of the set.
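This tuple-based summarization of a local phenomenon can be sketched as follows (the exact tuple layout in [14] may differ; this is an illustrative reduction of a sample set to its limits, peak and volume):

```python
from collections import Counter

def summarize_phenomenon(samples):
    """Summarize a locally detected phenomenon as a compact tuple
    (min, max, peak_value, sample_count), in the spirit of the
    leader-to-leader summaries in [14]. The peak is taken here as the
    most frequent reading, an assumption of the sketch."""
    peak = Counter(samples).most_common(1)[0][0]
    return (min(samples), max(samples), peak, len(samples))
```

Transmitting such a four-value tuple instead of the full sample set is what keeps the inter-leader traffic small, regardless of how many readings the group collected.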
The proposal described in [14] has some similarity of purpose to our work, as it is a distributed algorithm for event detection in a monitored environment. However, differently from our work, the authors in [14] assume the following premises: (i) the network sensors are homogeneous in terms of processing power, battery time, storage and communication range; and (ii) the sensors have knowledge of the “standard range of values” being monitored. The premise of homogeneous sensors is inadequate for an IoT environment, which is characterized by a high degree of device heterogeneity. Our proposal has no such restriction. In Heracles, it is not necessary to specify the characteristics of the sensors or of the monitored data at system initialization time. Our algorithm aims to adapt to the features of the monitored environment and network.
In [7], a distributed algorithm is presented that uses information fusion to aggregate sensor data. The proposal uses descriptive statistics to perform a peak analysis and thus identify changes in the monitored environment. The proposed algorithm, called Hephaestus, calculates the mean, kurtosis and asymmetry of the input dataset. From these measurements, it decides whether or not to separate the set in two, using the average as the parameter for dividing the sets. Then, it applies the same procedure to the subsets. Hephaestus shares several features with our proposal, and its limitations served as motivation for the design of Heracles. In our current work, there is a better identification of the phenomena, since we deal with the asymmetry to find the information peaks, which leads to a better characterization of the datasets in comparison with Hephaestus.
3. Heracles
In this section, we present Heracles, a context-based data fusion algorithm that does not rely on prior knowledge of the application requirements. The section is divided into: (i) an overview of the algorithm; (ii) the adopted system model; (iii) the adopted phenomena model; and, finally, (iv) the algorithm description.
3.2. System Model
The IoT network considered in this work includes several gateways and a set of heterogeneous sensor and actuator nodes that can belong to multiple physical networks. The gateway acts as a facade for gathering the latest information about the network execution context and the status of the sensor nodes. This information includes, for each sensor node, its sensing capabilities, current residual energy, operation mode and geographical location, and is acquired through messages sent by the sensor nodes whenever requested by any gateway. Regarding the sensor nodes, they can play two roles: (i) collector node and (ii) fusion node. The collector node is responsible for collecting data and applying statistical methods to filter and describe the data, performing the description and data filtering steps of the algorithm (steps 1 and 2, executed sequentially). The fusion node is responsible for collecting data from the sensor nodes, performing the data clustering (step 3) to identify the different phenomena and sending the output to the gateway. These roles are determined a priori, at deployment time, and depend on the device capabilities: collector nodes are devices with less computational power than fusion nodes.
We model an IoT network as a forest F that comprises one or more physical networks. We model each physical network as an undirected graph G = (V, E), where V = (v1, v2, …, vn) represents the set of sensor nodes and E = (e1, e2, …, em) represents the set of all possible communication links among the sensor nodes in the same network.
For any given sensor node vi in V, i denotes the index of the node within the network. A sensor node vi can perform one or more sensing tasks, depending on its capabilities to collect/sample different types of data such as, for instance, temperature, light, smoke and movement. Sensors can detect all events of interest occurring within their sensing range, provided that they have the required sensing unit. We assume that all the sensors in a physical network Gi have a valid communication path to reach at least one gateway. We also assume that the sensors in the same physical network are synchronized.
Each sensor may produce a data sample Si. Let each sample of data be represented as a symbol. Each symbol is a tuple containing four values: sensorID, type, measurement and timestamp. SensorID is the sensor identification; it can be any unique identifier, for example, the sensor’s MAC address or its ID in the network. Type is represented by an integer, where each integer denotes a given sensing unit type, such as temperature or humidity. Measurement is a numerical value representing the data obtained by the sensor (the data sample). Finally, timestamp records the moment when the data were collected. In this sense, S2.measurement in our algorithm represents a measurement produced by sensor 2.
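For illustration, the symbol tuple can be represented as a small record type (the field names follow the text; the concrete type codes and the on-device encoding are assumptions of the sketch):

```python
from dataclasses import dataclass

# Example type codes; the integer-to-unit mapping is an assumption.
TEMPERATURE = 1
HUMIDITY = 2

@dataclass(frozen=True)
class Sample:
    sensorID: str      # unique identifier, e.g., MAC address or network ID
    type: int          # sensing unit type code
    measurement: float # the data sample itself
    timestamp: float   # moment when the data were collected

# A sample produced by sensor 2; s2.measurement mirrors the
# S2.measurement notation used in the text.
s2 = Sample(sensorID="node-2", type=TEMPERATURE, measurement=63.7,
            timestamp=1712.0)
```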
4. Experiments
In this section, we describe the experiments performed for assessing Heracles regarding different aspects. We defined two main goals for the evaluations. The first goal is to analyze Heracles’ overhead in terms of communication and energy consumption. The second goal is to analyze its accuracy in terms of success in correctly inferring the phenomena used in our case study.
To assess Heracles’ overhead in terms of communication and energy consumption of the IoT network, we compare it with: (i) naïve Bayes (NB), a well-known decision-level data fusion algorithm; (ii) the Hephaestus algorithm used as input to NB; and (iii) the density-based data fusion algorithm presented in [13].
To assess accuracy (in terms of success in correctly inferring the phenomena), we compare Heracles with naïve Bayes (NB) for multiple applications, but without being aware of the requirements of the applications. We also compare Heracles with the proposal described in [13], which is a multiapplication algorithm, and with Hephaestus, also without being aware of the requirements of the applications.
In this section, we first describe the adopted scenario, implementation details and the metrics used in our experiments. Thereafter, we discuss the results obtained in the performed experiments. All experiments lasted one hour and were repeated 30 times, and the results are presented with a 95% confidence interval.
4.3. Metrics
The metrics used in the experiments for evaluating resource consumption were the memory and energy consumption. Memory consumption is defined as the amount of memory (RAM and ROM) used by our algorithm on the nodes, and energy consumption is defined as the amount of energy consumed by the network, i.e., the total amount of energy consumed by each node when executing the steps of the algorithm. To evaluate the energy consumed by the execution of Heracles, we used the energy model adopted in [7]. Overall, the energy consumption of a sensor node during time t is calculated as shown in Equation (2):

E = Ec + Es, (2)

where Ec represents the energy consumption of the communication module and Es represents the energy consumption of the sensing module. For ease of analysis, we assume that the data exchange between two neighboring sensor nodes (within one-hop communication range) belonging to the same network is performed through direct communication. The energy consumption of transmitting an l-bit message over distance d is defined as Etx(l, d) [6], computed as shown in Equation (3):

Etx(l, d) = l · Eelec + l · εamp · d², (3)

where Eelec and εamp are hardware-related parameters [6]. We also assume that the receiver does not consume energy in the data exchange process. For any two distant sensors (outside one-hop communication range, but still belonging to the same network), the data are transferred using a shortest-path-based multihop routing protocol (please note that the routing process is out of the scope of our work). The energy consumption of transmitting an l-bit message from the source (src) to the destination (des) is defined as Etx(l, src, des), as shown in Equation (4):

Etx(l, src, des) = Σ_{i=1}^{k} Etx(l, di), (4)

where d is replaced from hop to hop, i is an iterator and k is the minimum hop count that the data travel from source to destination. The energy consumption of the sensing module is calculated by the linear Equation (5):

Es = Σ_i ERsi · tsi, (5)

where ERsi represents the energy consumption of service i in one time unit and tsi represents the time period for performing service i.
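Equations (2)–(5) can be combined in a short sketch (the hardware parameter values below are illustrative placeholders, not the values used in our experiments):

```python
E_ELEC = 50e-9     # J/bit, electronics cost Eelec (illustrative value)
EPS_AMP = 100e-12  # J/bit/m^2, amplifier cost εamp (illustrative value)

def e_tx(l_bits, d):
    """Equation (3): energy to transmit an l-bit message over distance d."""
    return l_bits * E_ELEC + l_bits * EPS_AMP * d ** 2

def e_tx_multihop(l_bits, hop_distances):
    """Equation (4): multihop cost as the sum of per-hop transmissions,
    with d replaced from hop to hop."""
    return sum(e_tx(l_bits, d) for d in hop_distances)

def e_sensing(services):
    """Equation (5): sensing cost, services = [(ER_si, t_si), ...]."""
    return sum(er * t for er, t in services)

def e_node(l_bits, hop_distances, services):
    """Equation (2): total node cost E = Ec + Es."""
    return e_tx_multihop(l_bits, hop_distances) + e_sensing(services)
```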
The metric used for accuracy was the Matthews correlation coefficient (MCC). Given the true positive rate TP (percentage of occurrences of a given event correctly classified as true), true negative rate TN (the event did not occur and this fact was correctly identified), false positive rate FP (percentage of instances classified as true but actually false) and false negative rate FN (percentage of instances classified as false but actually true), the MCC is computed as shown in Equation (6):

MCC = (TP · TN − FP · FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)), (6)

MCC reaches its maximum score (+1) when both the true positive and true negative rates are 100% and its minimum score (−1) when the false positive and false negative rates are 100%. MCC is thus a good measure to strike a balance between the accuracies of multiple classes.
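A minimal implementation of Equation (6) follows (returning 0 when any marginal sum is zero is a common convention that we assume here for the sketch):

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, Equation (6)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # degenerate case: an empty row or column in the confusion matrix
    return (tp * tn - fp * fn) / denom
```

A perfect classifier (all TP and TN) scores +1, while a classifier that is always wrong (all FP and FN) scores −1.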
Regarding FP, FN, TP and TN, we must remember that the outputs of Heracles are mapped phenomena, as stated in the previous sections. The goal of the evaluation in terms of accuracy is to assess whether, by using Heracles, we can improve the decision-making process. A decision-making process is only as good as the quality and scope of its inputs, and in this case, the inputs are the potential phenomena occurring in the environment. Therefore, what is actually evaluated is whether the combination Heracles + naïve Bayes is better than using only naïve Bayes or using Hephaestus + naïve Bayes. We claim that, by providing the identified phenomena as inputs to the naïve Bayes, we will achieve a more accurate decision when detecting that an application changed its state (from a healthy state to a failure), since the phenomena will map those states in the fusion window. An example of a mapped phenomenon considered in our scenario, specified according to the adopted phenomena model (see Section 3.3), is (Temperature, 55, 72, 23, 63.7). This tuple means that the temperature samples varied from 55 to 72, there were 23 samples in the window and the average value was 63.7 °C, which could later be mapped by a decision system as fit for the OPLM application.
In our case, for the combination Heracles + naïve Bayes, FP indicates when a mapped phenomenon makes the decision system believe (and act upon such a decision) that an application changed its state when it should not have changed, for example, an OPLM application that detects a transmission failure when there is no transmission failure. FN indicates when a mapped phenomenon makes the decision system believe that an application has not changed state when it should have changed, for instance, an OPLM application that does not detect a transmission failure when the temperature reaches a given threshold that denotes a failure. TP indicates a situation in which a mapped phenomenon makes the decision system believe, correctly, that an application has changed state (a transmission failure is detected and really occurred, for example), and TN indicates when a phenomenon correctly indicates to the decision system that an application has maintained its state (an OPLM application that does not detect a transmission failure when there is no transmission failure, for example). Considering the example phenomenon mentioned above, (Temperature, 55, 72, 23, 63.7), if it is used as input for a naïve Bayes classification (what is the probability that the overhead power line is faulty given that the temperature mean is 63.7?), the answer will be not faulty. In this sense, Heracles can clean and reduce the data sent to decision systems. On the other hand, if we consider the phenomenon (Temperature, 75, 102, 16, 93.7), the result will be that the overhead power line is faulty.
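This decision step can be sketched with a Gaussian likelihood comparison over the phenomenon mean (the class statistics below are invented for illustration; the experiments use naïve Bayes proper, which also accounts for priors and additional features):

```python
from math import exp, pi, sqrt

def gaussian(x, mu, sigma):
    """Gaussian probability density at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Hypothetical per-state models for the OPLM temperature feature:
# (mean, stdev) of the phenomenon average under each state.
CLASS_MODELS = {"healthy": (63.0, 5.0), "faulty": (95.0, 5.0)}

def classify_phenomenon(phenomenon):
    """Classify a mapped phenomenon tuple (type, min, max, count, mean) by
    the most likely state, assuming equal priors. A stand-in for the naïve
    Bayes step; the class statistics are illustrative assumptions."""
    _, _, _, _, mean = phenomenon
    return max(CLASS_MODELS, key=lambda c: gaussian(mean, *CLASS_MODELS[c]))
```

With these hypothetical models, the first example phenomenon above is classified as healthy and the second as faulty, mirroring the outcomes described in the text.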
4.6. Evaluating Accuracy
This section describes the simulated experiments for evaluating the accuracy achieved using Heracles compared to Hephaestus, [13] and naïve Bayes, in terms of their ability to correctly identify the different events of interest for the smart city applications described in our case study.
Since neither Hephaestus nor Heracles is a decision-level algorithm, we use their discovered phenomena as input for the naïve Bayes (NB) decision algorithm. We performed four simulated experiments: (i) the first uses Heracles, (ii) the second uses Hephaestus, (iii) the third uses only naïve Bayes and, finally, (iv) the fourth uses the density-based decision-making data fusion algorithm presented in [13]. In all simulated experiments, it was not necessary to be aware of the requirements of the deployed applications. The size of the fusion window for Heracles, Hephaestus, the algorithm proposed in [13] and naïve Bayes was set to 200 samples. This set of experiments intends to show how effectively Heracles can improve decision systems by providing better decision inputs.
During these experiments, there were four time slots (T1, T2, T3 and T4). For each time slot, Heracles, Hephaestus, [13] and NB were tested to compare the achieved results in terms of MCC values.
Table 2 shows the achieved results. The first row represents Heracles’ MCC score, the second row shows Hephaestus’ performance and the third row presents NB’s performance.
For the T1 time slot, both applications are in an ideal condition. In this case, given the unequal data ranges, the composition of both applications generated a highly skewed dataset. Hephaestus and [13] divided the dataset into two distinct groups (through different strategies), identifying the different datasets and applying NB to each set. NB, on the other hand, grouped all the data into a single group. Heracles was also able to divide the data into distinct groups (our phenomena), which led to a better performance than NB. Heracles, Hephaestus and [13] had similar results in this time slot (since they divided the data into similar groups). Applying NB directly to the dataset biased the fusion result. Such bias is responsible for the worse accuracy of NB (94.32%) in comparison to Hephaestus and the density-based algorithm [13] (99.88%) and to Heracles’ accuracy (99.66%).
For the T2 time slot, the OPLM app was in an unhealthy condition, and both applications overlapped, which produced an almost symmetric dataset. Heracles, [13] and Hephaestus recognized this dataset as a single dataset, but with the mean slightly shifted toward the more concentrated data range. The behavior of this mass of data indicates a concentration over the data generated by the overhead power line, which is overloaded. The shift of the mean toward the overhead power line-generated data (which were mapped into a single phenomenon) was responsible for the greater accuracy of Heracles and [13] (99.58%) and Hephaestus (99.26%) over NB (94.6%).
For the T3 time slot, the BM app was in an unhealthy condition, and a highly asymmetric dataset was formed by the composition of the different applications. Due to the high asymmetry in the dataset, Heracles and Hephaestus were capable of identifying that there were different applications in the monitored area. However, Heracles and [13] were better at representing and separating the phenomena, thus leading to a better result (there were clearly three phenomena mapped, as exemplified in Figure 5, providing a more accurate representation of the environment). NB weighted all data equally, leading to the worst performance (90.1%).
By comparing the accuracy achieved by Heracles and Hephaestus at the T4 time slot, where both applications were in unhealthy conditions, with the accuracy at the other time slots, we can notice a drop in Hephaestus’ result, while Heracles and [13] still maintain an advantage. The dataset obtained at the T4 time slot is an almost symmetric multimodal dataset. Due to this symmetry, depending on the samples Hephaestus receives as input, the dataset is recognized as a symmetric dataset that does not need to be divided, producing a single dataset. Meanwhile, Heracles searches for the asymmetries and peaks that make the phenomena more identifiable. As T4 presents an almost symmetric dataset, the accuracy of Hephaestus is lower (84.64%) than the accuracy obtained at the other time slots (99.66% for T1, 99.26% for T2 and 95.88% for T3), while Heracles is almost unaffected. We can see that, by consistently mapping the phenomena using the peak and asymmetry strategy, Heracles was able to deliver a better overall performance in all experiments.
Effect of the Fusion Window Size on T4
Considering that the T4 time slot is the most challenging one due to the symmetry of its data, we decided to further investigate how much the sample size interferes with the accuracy of Heracles. For this experiment, we ran Heracles on the T4 time slot analyzing 200, 400, 600 and 800 samples.
The results in Table 3 show that, as Heracles analyzes more samples, its accuracy increases. This result is expected, since, as the sample size increases, the observation of the real world is more complete and the associated uncertainty is lower. This result is important because it presents a solution for the situation in which Heracles has a distinctly better performance than Hephaestus, since the latter is unable to correctly recognize a dataset due to a particular symmetric configuration.
In Figure 5, we can see that, while Hephaestus only separates the phenomena based on the calculated average, Heracles benefits from looking specifically for peaks when separating events. This case exemplifies how the separation of the main event (the peak centered at 65) can be advantageous in Heracles, while Hephaestus characterizes the two separate regions as sufficiently symmetrical.