First, the results of the agglomerative clustering algorithm are illustrated using the calculated dendrogram. The results for a low number of features are benchmarked against the available synthetic load profiles in Flanders. We subsequently highlight how differences in feature behavior lead to the emergence of distinct and compact clusters, and argue how this knowledge can be leveraged in demand response programs or peak shaving initiatives. Second, the distributions of the features at the same time levels are analysed. On the one hand, the Shannon entropy is used to characterize the variability of each type of feature. On the other hand, the Wasserstein-1 distance is used for an in-depth analysis of the stochastic nature of the peak demands, by comparing the distributions describing the household consumption and the peak demand behavior.
3.1. Clustering Result
The dendrogram visualizing the hierarchical clustering process using Ward’s linkage method on the proposed feature set is shown in
Figure 7. Two horizontal cuts are included in the figure. The black line at
denotes the height where three clusters are obtained. This can serve as an initial benchmark, as there are three synthetic load profiles available for low-voltage consumers in Flanders: residential with and without electric heating, and non-residential. The red line was chosen such that 10 disjoint clusters emerge, leading to the color threshold of the highlighted clusters in the dendrogram. This threshold of 10 clusters was chosen based on two independent studies stating that for practical considerations, the total number of clusters should not exceed 10 [
13,
30]. This argument is based on the opinions of industrial experts, as these clusters are often used for tariffing or marketing purposes.
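The two horizontal cuts described above can be sketched in a few lines. This is a minimal illustration, assuming SciPy's hierarchical clustering routines and a synthetic stand-in for the feature matrix; the helper name `cut_ward_dendrogram` and the data are hypothetical and not taken from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def cut_ward_dendrogram(features, n_clusters):
    """Build a Ward-linkage tree and cut it so that at most n_clusters remain."""
    tree = linkage(features, method="ward")  # Ward's method on Euclidean distances
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels


# Assumption: synthetic data standing in for the real consumers
# (60 consumers x 14 features, drawn around three well-separated centers).
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 14)) * 5.0
demo_features = np.vstack(
    [center + rng.normal(scale=0.3, size=(20, 14)) for center in centers]
)

# Coarse cut (benchmark against the three SLPs) and fine cut (10 clusters).
_, labels_3 = cut_ward_dendrogram(demo_features, 3)
_, labels_10 = cut_ward_dendrogram(demo_features, 10)
```

Because both label vectors come from cuts of the same tree, every cluster of the fine cut lies entirely within one branch of the coarse cut, which is what makes tracking the 10 clusters back to the three branches possible.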
First, the clustering result must be benchmarked against the available residential SLPs in Flanders. As the color threshold and the further discussion in this section are based on 10 clusters, the analysis is performed on the 10 highlighted clusters. By tracking how these clusters merge into the three branches of the dendrogram at the cut
, a benchmark can be performed.
Figure 8 and
Figure 9, which display, respectively, the distributions of the 14 untransformed features for the individual clusters and the distributions of the yearly consumption of the consumers assigned to each cluster, allow for an interpretation of the obtained clusters based on consumer properties.
The first branch separates into clusters 1–3, the second into clusters 4–5, while the final branch leads to clusters 6–10. The clusters originating from these three branches are partitioned by dashed lines in
Figure 8 for an easier comparison.
The following discussion on the benchmarking of the results is based on the observed feature distributions in
Figure 8. The first branch groups consumers with a high fraction of consumption and peaks in the evening, which is typical for regular households. The second branch, containing clusters 4–5, groups consumers with a high fraction of the consumption and peaks at night. This is encouraging, as it could indicate the presence of electric heating, which defines one of the two major categories of residential consumers.
The interpretation of the third branch is less straightforward, as the properties of the clusters composing this branch are more diffuse: (i) clusters 6–7 group consumers with a disproportionate number of peaks during the weekend, (ii) cluster 8 collects the consumers with a significant number of peaks during the early morning, whereas (iii) clusters 9–10 exhibit a large number of peaks during the morning and afternoon.
Within the same branch of the dendrogram, the fractions of total consumption in each time period differ only slightly between clusters. Rather, the temporal behavior of the peak demands is the driving force that further separates the clusters within each of the three major branches of the dendrogram. Furthermore, the clustering process yields compact clusters that are readily interpretable.
This illustrates the usefulness of a feature set that includes the temporal properties of peak demands, especially with the advent of capacity-based tariff schemes for low-voltage consumers. With the introduction of capacity-based tariffs, it is no longer sufficient to know when consumption occurs. Additional knowledge about when peak demands tend to happen is vital to offer consumers the most suitable techno-economic solution.
As a post-hoc validation of the performance of the proposed feature set in determining customer categories, the clusters of the different consumer types in the dataset as introduced in
Section 2.1 are determined and given in
Table 4. Clusters 4 and 5 are predominantly populated by households with electric heating, while cluster 10 groups households with high daytime consumption as well as the majority of the SMEs. However, not all profiles with electric heating are categorised inside clusters 4–5. This is further investigated in
Figure 9, which displays the density plots of the yearly consumption for each individual cluster compared to the density plot of the full dataset. Matched against the density plot of the complete dataset, clusters 4, 6, 8 and 10 are skewed towards households with lower to average yearly consumption in the Eurostat classification. This distribution for cluster 4 is expected and can clarify the diffusion of households with electric heating over different clusters. As the demand profiles of these households can be considered an aggregation of the profile of a regular household with the load profile of an electric heating appliance, the features connected to the peak demands are intrinsically linked to the behavior of that load profile and to the timing of the peak demands without the electric heating. For households with otherwise relatively low yearly consumption, the heating load profile dominates the aggregated load profile, and these households consequently encounter the majority of their consumption peaks during the night, consistent with the behavior of cluster 4. For households with electric heating in, e.g., cluster 3, the consumption and peak demands during the evening outweigh those during the night.
It can be concluded that the proposed feature set is able to capture the known consumer categories from existing SLPs, and thus passes our self-imposed benchmark test. Three clusters can be attributed to known differences in behavior for low-voltage consumers: the presence of electric heating is captured in clusters 4–5, while the high daytime consumption of SME profiles is present in cluster 10. Deviations from these two clusters for electric heating can be traced back to differing contributions of the electric heating load to the total yearly consumption of the households.
3.2. Stochastic Nature of Peak Demands
The variability of the daily and weekly consumption and peak patterns is described by the entropy of their probability distributions, where the individual fractions are normalized with respect to the length of the considered time period. A uniform distribution with maximum uncertainty leads to a maximal value of the entropy, while the absence of uncertainty leads to an entropy value of 0.
For example, a situation where all peak demands occur during the night due to electric heating would lead to zero entropy at the daily level for the peak probability distribution. The obtained distributions of the entropy at the daily and weekly level for the consumption and peak probability distributions of the full dataset are given in
Figure 10. At the daily level, the peak demands exhibit a much larger variability than the consumption. This is unsurprising, given the continuous nature of the consumption. At the weekly level, this difference is less pronounced.
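The two limiting cases mentioned above (uniform distribution versus all mass in one period) can be checked with a short sketch. It assumes the natural logarithm, five hypothetical daily periods, and one possible reading of the length normalization (dividing each fraction by the duration of its period before renormalizing); none of these choices is confirmed by the text, and `shannon_entropy` is an illustrative helper.

```python
import numpy as np


def shannon_entropy(fractions, period_lengths=None):
    """Shannon entropy (natural log) of fractions over time periods.

    If period_lengths is given, each fraction is first divided by the
    duration of its period (an assumed reading of the normalization).
    """
    p = np.asarray(fractions, dtype=float)
    if period_lengths is not None:
        p = p / np.asarray(period_lengths, dtype=float)
    p = p / p.sum()          # renormalize to a probability distribution
    nonzero = p[p > 0]       # convention: 0 * log(0) = 0
    return float(-(nonzero * np.log(nonzero)).sum())


# Assumption: five daily periods, used purely for illustration.
uniform_peaks = [0.2, 0.2, 0.2, 0.2, 0.2]
night_only_peaks = [1.0, 0.0, 0.0, 0.0, 0.0]

h_uniform = shannon_entropy(uniform_peaks)        # maximal entropy, log(5)
h_night_only = shannon_entropy(night_only_peaks)  # no uncertainty, 0
```

With the optional `period_lengths` argument, a consumer whose fractions are exactly proportional to the period durations also reaches the maximal entropy, which is the behavior one would expect from a length-normalized measure.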
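A beta distribution was successfully fitted to each individual density histogram. The 2-parameter beta probability density function, defined on the interval [0,1], is defined as follows, with $a > 0$ and $b > 0$:
$$f(x; a, b) = \frac{x^{a-1}(1-x)^{b-1}}{B(a, b)}, \qquad B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,\mathrm{d}t \tag{14}$$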
The beta distribution offers several properties that make it suitable to describe the obtained distributions. First, it has a finite support: the regular 2-parameter beta distribution in Equation (
14) has a [0,1] support. As the entropy can vary from 0 to a maximum of
for the daily level and
for the weekly level, the finite support of a rescaled and shifted beta distribution is appropriate. Second, as can be observed in
Figure 10, the shapes of the daily and weekly behaviors differ significantly. The two shape parameters
a and
b in the definition of the beta probability density function allow us to describe the four distributions with the same formula. For the distributions shown in
Figure 10, it merely means that
for the distributions at the daily level, while
for those at the weekly level.
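A fit of a rescaled beta distribution with fixed support can be sketched as follows. This is an illustration only: it assumes SciPy's `scipy.stats.beta.fit` with the location and scale held fixed, a hypothetical upper bound `h_max = log(5)`, and synthetic entropy values; the actual fitting procedure used for Figure 10 is not specified in the text.

```python
import numpy as np
from scipy import stats

# Assumption: entropy values bounded by [0, h_max]; log(5) is a hypothetical bound.
h_max = np.log(5)

# Synthetic "entropy" sample drawn from a known beta shape, rescaled to [0, h_max].
rng = np.random.default_rng(1)
true_a, true_b = 5.0, 2.0
entropy_sample = h_max * rng.beta(true_a, true_b, size=2000)

# Fix the support (loc=0, scale=h_max) and estimate only the shape parameters.
a_hat, b_hat, loc, scale = stats.beta.fit(entropy_sample, floc=0, fscale=h_max)

# Density of the fitted, rescaled beta distribution on a grid over the support.
grid = np.linspace(0.0, h_max, 200)
fitted_density = stats.beta.pdf(grid, a_hat, b_hat, loc=loc, scale=scale)
```

Fixing the support leaves only the two shape parameters free, so the difference between the daily and weekly histograms is expressed entirely through the relative size of the two estimates.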
The relation between the entropy and the clusters obtained in
Section 3.1 is investigated in
Figure 11, which displays the mean values of the entropy for each individual cluster. The significantly lower entropy of the probability distribution describing the peak demands can be traced back to the clustering results. The overwhelming presence of peak demands during the night period results in a low entropy for cluster 4, while cluster 10 exhibits a majority of its peaks during daytime. Similarly, half of the peak demands for cluster 1 occur during the evening. On a weekly basis, clusters 6–7 show a significant number of peak demands during the weekend, leading to a lower entropy for this period. A low entropy of the probability distribution describing the peak demands can be taken as an indicator of a large number of peaks in a certain time period, which can be leveraged to target demand response programs or peak shaving via an energy storage system. Furthermore, a clear relation can be observed between the clusters obtained on the introduced feature set and the entropy values of the peak demands. The lower entropy values in certain clusters can be traced back to differences in consumer behavior between clusters at the daily or weekly level.
However, the stochastic nature of these peak demands remains an open question. According to the entropy, the probability distributions of the peak demands tend to be significantly more variable than those of the consumption behavior. Even so, the entropy as a single variable does not reveal whether the number of peaks in a certain time period is disproportionate relative to the consumption in that time period.
Therefore, the Wasserstein-1 distance is used to quantify the difference between the probability distributions of the consumption and peak demands at the daily and weekly level for each individual consumer. A larger distance corresponds to a stronger deviation of the peak distribution from the distribution of the consumption, which indicates that the peaks are more deterministic.
Figure 12 and
Figure 13 show the distributions of the Wasserstein-1 distances at the daily and weekly level, separated by individual cluster. Analogous to
Figure 9, the distribution of the Wasserstein-1 distance calculated for each profile in the full dataset is included for comparison to cluster-specific behavior.
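For a single consumer, the comparison between the consumption and peak distributions could be computed along the following lines. The sketch assumes SciPy's `scipy.stats.wasserstein_distance`, five hypothetical daily periods, and a unit distance between consecutive periods; the ground metric used in the study is not stated here, and `consumption_peak_distance` is an illustrative helper.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def consumption_peak_distance(consumption_fractions, peak_fractions):
    """Wasserstein-1 distance between two distributions over the same periods.

    Assumption: periods are placed at positions 0, 1, 2, ... so that moving
    mass to a neighbouring period costs one unit.
    """
    positions = np.arange(len(consumption_fractions))
    return wasserstein_distance(
        positions, positions,
        u_weights=consumption_fractions,
        v_weights=peak_fractions,
    )


# Hypothetical consumer: consumption spread evenly, all peaks in the first period.
even_consumption = [0.2, 0.2, 0.2, 0.2, 0.2]
concentrated_peaks = [1.0, 0.0, 0.0, 0.0, 0.0]

d_same = consumption_peak_distance(even_consumption, even_consumption)
d_concentrated = consumption_peak_distance(even_consumption, concentrated_peaks)
```

Peaks that follow the consumption exactly give a distance of zero, while concentrating all peaks in one period increases the distance in proportion to how far the mass has to be moved.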
The distributions of the Wasserstein-1 distances further confirm the findings concerning the behavior of the consumers constituting each cluster. At the weekly level, clusters 6 and 7 show a major deviation from the dataset behavior, due to the disproportionate number of peak demands in the weekend. Similarly, at the daily level, cluster 4 displays a large Wasserstein-1 distance, pointing to the electric heating, which pushes nearly all peak demands to nighttime.
Clusters 1 and 2 exhibited similar behavior for their consumption at the daily level in
Figure 8. However, households in cluster 1 are characterized by an even higher number of peak demands in the evening than those in cluster 2, translating to a higher than average Wasserstein-1 distance for cluster 1 at the daily level. This variability and the disproportionate number of peaks in a certain time interval offer insight into possibilities for targeted demand response initiatives or peak shaving via a residential energy storage system. While clusters 6–7 and 8–9 have a similar consumption pattern, the time of occurrence of the peak demands is significantly different, which leads to distinct solutions.
As peak demands are typically generated by the simultaneous use of individual appliances, targeted demand response initiatives can be effective for clusters 6 and 7, where the majority of peaks occur in the weekend. Inducing behavioral changes, such as scattering the use of individual appliances over different days or being mindful of their simultaneous use in the weekend, can reduce the number of peak demands. However, this requires a trigger for the behavioral changes and requires these appliances to be available in different time periods. If this is not an option, investing in an energy storage system that applies a peak shaving algorithm during weekends while, e.g., maximising the PV self-consumption during weekdays could offer an alternative, although the economic viability depends on the local tariff structure and the investment cost. In contrast, cluster 8 is characterized by peak demands in the early morning and during the daytime, while households in cluster 9 exhibit peaks during the whole day. Consequently, for these households, a PV installation combined with a storage system can already offer a solution to reduce the demand from the grid while maintaining a high self-consumption.
As a final check on the stochastic nature of peak demands, the relationship between the consumption in a time period and the presence of peak demands is investigated.
Figure 14 displays the relations between the (untransformed) fractions of the consumption and peak demands at the daily level, with an ordinary least-squares (OLS) regression fit overlaid given the observed linear relation. The coefficients obtained in the OLS regression for
, with
and
the fraction of respectively the peak demands and consumption in that time period, are given in
Table 5. As the presence of electric heating heavily skewed previous results for the consumption and peak demands at night, consumers with and without electric heating are treated separately for this analysis.
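An OLS fit of the peak fraction on the consumption fraction, together with a measure of the scatter around the fitted line, can be sketched as below. The sketch assumes SciPy's `scipy.stats.linregress` and synthetic fractions; the slope, intercept, and noise level are hypothetical and do not reproduce the coefficients in Table 5.

```python
import numpy as np
from scipy.stats import linregress


def fit_peak_vs_consumption(consumption_fraction, peak_fraction):
    """OLS fit of peak fraction on consumption fraction, plus residual spread."""
    x = np.asarray(consumption_fraction, dtype=float)
    y = np.asarray(peak_fraction, dtype=float)
    fit = linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)
    spread = residuals.std(ddof=2)  # residual standard deviation (2 fitted parameters)
    return fit.slope, fit.intercept, spread


# Assumption: synthetic households with a noisy linear relation.
rng = np.random.default_rng(2)
consumption = rng.uniform(0.1, 0.5, size=500)
peaks = 1.2 * consumption + 0.05 + rng.normal(scale=0.08, size=500)

slope, intercept, spread = fit_peak_vs_consumption(consumption, peaks)
```

Reporting the residual spread next to the coefficients makes explicit how much the peak fraction can vary for a given consumption fraction, which the slope and intercept alone do not convey.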
A correlation between the fraction of the consumption and that of the peak demands is present in
Figure 14 and
Figure 15. As the presence of consumption in a certain time period is a prerequisite for a peak demand, some relation between the two types of parameters was expected. At first sight, the linear relation could be interpreted as an indication of the predictability of peak demands in a certain time period. However, it is the spread around this relation that indicates the stochasticity of the peak demands. For example, if 30% of a household’s total consumption occurs during the evenings, the results shown in
Figure 14 suggest that 30–60% of the peak demands can occur in this same time period. This large uncertainty, which is present for each of the considered time periods, severely limits the usability of the linear relation observed for the full dataset.
However, the knowledge of the introduced clusters can partly alleviate this uncertainty. This is illustrated in
Figure 16 for clusters 1–3, which group households with a large fraction of their consumption during the evening and a high number of peak demands occurring in the same time period. While we should be cautious in drawing conclusions based on clusters that include only a limited number of households, it appears that the spread of the fraction of peak demands for the individual clusters is smaller than that in
Figure 14 for the full dataset, while the linear correlation that was observed before is nearly non-existent in some relations.