1. Introduction
As climate change issues such as global warming emerge, social demands for energy conservation and efficient energy use are increasing. In response to these demands, the South Korean government plans to reduce greenhouse gas emissions by 37% by 2030 [
1]. Accordingly, it is shifting its policy from supply-side management to demand-side management. Internet of Things (IoT) devices are used to manage energy efficiently for energy consumers [
2]. Advanced metering infrastructure is also being expanded and distributed in South Korea by the Korea Electric Power Corporation; this will lay the foundation for the shift to energy policies that are centered on demand management. Thus, the energy consumption of customers can be analyzed in detail by obtaining load profiles for real-time demand [
3]. Identifying customers’ energy consumption durations, the stability of daily energy consumption patterns over time, and actual energy use can provide customers with insight into their energy consumption behavior patterns [
4]. Analysis of customer load profiles can inform the marketing and design of load management programs. For example, a customer with high energy consumption may purchase highly energy-efficient devices, and customers with constant daily energy consumption patterns may respond to messages that energy needs to be saved [
5].
Furthermore, various energy consumption patterns can be clustered through clustering algorithms, and the results utilized for consulting on electricity rates considering the characteristics of each group. By offering different time-of-use tariffs to customers in the different groups, the complementarity among different customers maximizes profit for the concerned retailer [
6]. In addition, the clustering of time-sharing peak–valley pricing can be used to meet the various needs of customers, resulting in load shifting and energy efficiency [
7,
8]. Therefore, an effective analysis based on clustering of residential customer energy consumption patterns is important for demand-side management.
Several studies have proposed clustering methods that use the historical load profiles of customers. Bidoki et al. analyzed load profiles of customers with several clustering methods such as K-means, weighted fuzzy mean K-means, and self-organizing map (SOM). The comparison of clustering results obtained by these methods shows that the best clustering method for historical load profiles depends on the purpose of clustering [
9]. Zhou et al. proposed a five-stage clustering model based on the fuzzy C-means (FCM) clustering algorithm to efficiently group smart meter data with massive, dynamic, high-dimensional, and heterogeneous characteristics [
10]. Xu et al. proposed the two-stage K-means algorithm using standard K-means and integral K-means to cluster customers with energy consumption patterns. In Reference [
11], customers are clustered using their energy consumption and peak time information. Choi et al. proposed a method to reflect calendar-day features in energy consumption patterns using time domain mapping [
12]. Density-based spatial clustering of applications with noise (DBSCAN) is used in Reference [
12] to group customers with similar load profiles.
Customer clustering analysis based on load profiles utilizes 15 min or 1 h energy consumption data. However, it is difficult to analyze energy consumption features with such data because they are high-dimensional. Therefore, various feature selection studies have been proposed to extract relevant features of customers’ energy consumption. Abubaker analyzed energy consumption data in the Tulkarm region of Palestine and proposed a new customer segmentation method that clusters customers with similar energy consumption behavior patterns. Two features are selected from the energy consumption data using principal component analysis (PCA), and the K-means algorithm is used to cluster customers and obtain new customer segments [
13]. Haben et al. analyzed smart meter data to identify customer behavior that affects the peak and variation of energy demand. The smart meter data are used to analyze energy consumption in the time periods that affect the peak. A finite mixture model is used to identify energy consumption behavior by seasonal and temporal characteristics [
14]. Chico et al. proposed an algorithm that clusters energy consumption patterns effectively by applying a probabilistic approach to the support vector machine (SVM) to select features for outliers in the energy consumption pattern. The proposed method can help suppliers devise demand response options relevant to customers by adding new clusters that differentiate outliers [
15]. Wang et al. proposed a clustering method that measures Kullback–Leibler distances to obtain various features of energy consumption behavior and uses clustering by fast search and find of density peaks (CFSFDP) to cluster customers [
16]. Reference [
16] introduces a method to determine the appropriate demand response according to the type of energy consumption behavior of the customer. However, it is difficult to understand how the features obtained by PCA are physically related to energy consumption characteristics.
In this paper, the extreme points of energy consumption patterns are used in combination with demographic characteristics to obtain effective features of a typical residential customer while maintaining the physical meaning of those features. In this study, the extreme points are designed to reflect the difference in energy consumption between two time periods. In addition, the energy consumption of residential customers is correlated with the demographic characteristics. This paper presents an effective feature selection method for clustering the load profiles of residential customers using extreme points and demographic characteristics.
Additionally, previous clustering studies of energy customers use load profiles from specific regions, and even those that use load profiles from as many as 18 cities cover no more than 600 customers [9,10,11,12,13,14,15,16]. In this paper, however, clustering analysis of energy consumption customers is carried out using load profiles from 2041 households in seven regions with various demographic characteristics, such as area, number of household members, and salary level.
The remainder of this paper is organized as follows. Section 2 analyzes the features of residential customers’ energy consumption patterns. Section 3 describes the method for clustering energy consumption patterns. Finally, Section 4, Section 5, and Section 6 present the results, discussion, and conclusions, respectively.
2. Selection of Extreme Points of the Energy Consumption Pattern with Demographic Characteristics
In this paper, a clustering analysis method utilizing important features of the residential customer energy consumption pattern is proposed. The load profile of residential customers is obtained using data collected between 1 June 2018 and 31 May 2019. The Z-score ($Z_{n,t}$) is computed for the load profile over a period of time in order to detect outliers based on the 95% confidence interval ($-1.96 < Z_{n,t} < 1.96$). Once an outlier is detected, the load profile of the customer is replaced with the average energy consumption of the same time period in the same season. The Z-score is calculated as follows [17]:
$$Z_{n,t} = \frac{x_{n,t} - \mu_t}{\sigma_t}$$
where $x_{n,t}$ is the data of customer $n$ at time $t$, $\mu_t$ is the total customer average data at time $t$, and $\sigma_t$ is the total customer standard deviation data at time $t$.
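The outlier step can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names `zscores_at_time` and `replace_outliers` are hypothetical, the data layout (one list of readings per customer) is assumed, and the replacement value is simplified to the mean of the non-outlier readings at the same time step, whereas the paper uses the average of the same time period in the same season.

```python
from statistics import mean, pstdev


def zscores_at_time(values):
    """Z-score of each customer's reading at one time step, across all customers."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All readings are identical, so none can be an outlier.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def replace_outliers(profiles, threshold=1.96):
    """profiles[n][t] is the reading of customer n at time step t.

    Readings with |Z| >= threshold are treated as outliers and replaced.
    Assumption for illustration: the replacement is the mean of the
    non-outlier readings at the same time step.
    """
    n_customers = len(profiles)
    n_steps = len(profiles[0])
    cleaned = [list(row) for row in profiles]  # leave the input untouched
    for t in range(n_steps):
        column = [profiles[n][t] for n in range(n_customers)]
        z = zscores_at_time(column)
        inliers = [v for v, zi in zip(column, z) if abs(zi) < threshold]
        fallback = mean(inliers) if inliers else mean(column)
        for n in range(n_customers):
            if abs(z[n]) >= threshold:
                cleaned[n][t] = fallback
    return cleaned
```

With real data, the seasonal grouping described in the text would be applied before computing the replacement value.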
In addition, the features of energy consumption patterns are compared among customers by normalizing the load profile to effectively cluster residential customers. The data are normalized to a maximum value of 1 and minimum value of 0 using minimum–maximum normalization. The load profile is normalized as follows [18]:
$$\hat{x}_{n,t} = \frac{x_{n,t} - \min(X_n)}{\max(X_n) - \min(X_n)}$$
where $\hat{x}_{n,t}$ is the normalized data of customer $n$ at time step $t$, $x_{n,t}$ is the data of customer $n$ at time $t$, and $X_n$ is the load profile of customer $n$.
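As a small worked illustration of the minimum–maximum normalization, the sketch below scales one customer's load profile to the range 0 to 1. The function name `min_max_normalize` is hypothetical, and returning zeros for a constant profile is an assumption made here to avoid division by zero; the paper does not discuss that case.

```python
def min_max_normalize(profile):
    """Scale one customer's load profile so its minimum is 0 and its maximum is 1."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        # Assumption: a flat profile carries no shape information, so map it to zeros.
        return [0.0] * len(profile)
    return [(x - lo) / (hi - lo) for x in profile]


# Example: three readings in kWh.
normalized = min_max_normalize([2.0, 4.0, 6.0])  # -> [0.0, 0.5, 1.0]
```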
2.1. Load Profile Analysis
Figure 1 shows the normalized average annual energy consumption pattern of the residential customers. The extreme point of the time-series data is defined as follows [
19]:
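Local maximum point: $x_{t-1} < x_t$ and $x_t > x_{t+1}$.
Local minimum point: $x_{t-1} > x_t$ and $x_t < x_{t+1}$.
Here, $x_t$ is the value of the time series at time $t$.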
There are four extreme points in the load profile: the daybreak local minimum point, the morning local maximum point, the afternoon local minimum point, and the evening local maximum point. In addition, the difference between the morning local maximum point and the evening local maximum point is used to differentiate the time when the customer’s energy consumption is concentrated. Therefore, the extreme points represent the energy consumption of residential customers by time period, and these can be compared to each other. Thus, the extreme points are expected to be effective in clustering the energy consumption patterns of residential customers.
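The extreme points of a daily profile can be located with a simple neighbor comparison. The sketch below is illustrative only: the function name `local_extrema`, the strict-inequality test, and the synthetic 24-value hourly profile are assumptions, and it does not reproduce how the paper assigns extreme points to the daybreak, morning, afternoon, and evening periods.

```python
def local_extrema(profile):
    """Return (maxima, minima) as lists of indices of strict local extreme points."""
    maxima, minima = [], []
    for t in range(1, len(profile) - 1):
        prev, cur, nxt = profile[t - 1], profile[t], profile[t + 1]
        if prev < cur > nxt:
            maxima.append(t)
        elif prev > cur < nxt:
            minima.append(t)
    return maxima, minima


# Synthetic normalized hourly profile (hours 0..23), made up for illustration.
hourly = [0.4, 0.3, 0.2, 0.1, 0.0, 0.2, 0.4, 0.6, 0.7, 0.6, 0.5, 0.45,
          0.4, 0.35, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9, 1.0, 0.9, 0.7, 0.5]

maxima, minima = local_extrema(hourly)

# Difference between the morning and evening local maxima; a negative value
# means consumption is concentrated in the evening.
peak_difference = hourly[maxima[0]] - hourly[maxima[-1]]
```

For this synthetic profile the minima fall at hours 4 and 14 and the maxima at hours 8 and 20, which mirrors the four-point shape described in the text.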
2.2. Feature Selection
Residential customers are clustered by selecting features that can effectively analyze the characteristics of each cluster. Feature selection is undertaken by analyzing the correlation between monthly total energy consumption and customer demographic data: area, number of household members, salary level, age, and education level.
Figure 2 shows the distribution of customers by regions and demographic data.
Correlation analysis is carried out by computing Spearman’s correlation coefficient because demographic data are ranking data, and monthly total energy consumption is continuous data. Spearman’s correlation coefficient $\rho$ is calculated according to the following equation [20]:
$$\rho = 1 - \frac{6 \sum_{n=1}^{N} d_n^2}{N (N^2 - 1)}, \qquad d_n = r^{E}_n - r^{D}_n$$
where $d_n$ denotes the rank data difference of customer $n$, $r^{E}_n$ is the rank of the total monthly energy consumption of customer $n$, $r^{D}_n$ is the rank of the demographic data of customer $n$, and $N$ is the total number of customers. Depending on its magnitude, $\rho$ indicates weak, moderate, or strong correlation [21].
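The rank-difference form of Spearman's coefficient can be sketched as follows. The function names `average_ranks` and `spearman_rho` are hypothetical. Tied values receive average ranks here, which is an assumption; note that the rank-difference formula is exact only when there are no ties, so with heavily tied demographic categories a tie-corrected routine would be preferable.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's coefficient from squared rank differences (exact without ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Made-up example: monthly consumption (kWh) against household size.
rho = spearman_rho([210.0, 250.0, 300.0, 340.0, 400.0], [2, 1, 4, 3, 5])
```

In this made-up example the squared rank differences sum to 4, so the coefficient is 1 - 24/120 = 0.8.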
Table 1 shows the correlation between the monthly total energy consumption and demographic data. The monthly total energy consumption shows a moderate correlation with the area and number of household members, and weak correlation with salary level and household head information (age and education level). Therefore, the features are selected by utilizing the data of area and number of household members.
Table 2 and Table 3 show the average extreme value and range of each attribute. The range is relatively large for three of the attributes. Therefore, these attributes were selected as features for clustering residential customers.
3. Clustering Method for Residential Customer Energy Profile Using Extreme Points and Demographic Characteristics
For the load profiles of residential customers, the difference in energy consumption between customers is small and the range of energy consumption is wide. In this study, therefore, the simple and effective K-means algorithm is adopted to group the load profiles of residential customers by computing their distances from the center point of each group [13]. It computes the Euclidean distance between each load profile $x$ in the load profile set $I$ and the center point ($\mu_y$) of each group $S_y$ in the group set $S = \{S_1, \dots, S_k\}$. The load profiles of residential customers are clustered by minimizing the sum of the squared Euclidean distances within the group set [22]. The clustering function taken from Reference [22] follows:
$$\underset{S}{\arg\min} \sum_{y=1}^{k} \sum_{x \in S_y} \lVert x - \mu_y \rVert^2$$
Figure 3 shows the procedure of the proposed clustering method incorporating extreme points and demographic characteristics. Initially, the center point of each group is arbitrarily selected. Subsequently, the Euclidean distance between each load profile in the set $I$ and the initial center point of each group is computed. Next, each input datum is assigned to the cluster whose center point is closest to it. After all input data have been distributed among the cluster sets, each cluster center point is updated with the average value of the data in its cluster set. This process iterates a set number of times.
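The procedure described above (arbitrary initial centers, assignment to the nearest center, center update, fixed number of iterations) can be sketched in plain Python. This is a generic K-means illustration under stated assumptions and not the authors' code: the names `squared_distance` and `kmeans`, the seeded random initialization from the data points, and the toy two-dimensional feature vectors are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(data, k, iterations=100, seed=0):
    """Basic K-means: returns (centers, labels) after a fixed number of iterations."""
    rng = random.Random(seed)
    # Arbitrary initial centers: k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment step: each point goes to the nearest center.
        labels = [min(range(k), key=lambda y: squared_distance(x, centers[y]))
                  for x in data]
        # Update step: each center becomes the mean of its members.
        for y in range(k):
            members = [x for x, lab in zip(data, labels) if lab == y]
            if members:  # keep the old center if a cluster is empty
                centers[y] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Toy feature vectors forming two well-separated groups.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centers, labels = kmeans(points, k=2)
```

In practice each feature vector would hold the selected extreme-point and demographic features of one customer rather than two made-up coordinates.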
Applying the K-means algorithm requires determining the optimal number of groups $k$. The elbow method based on distance is used to determine the optimal value of $k$ [22]. Other methods for determining the optimal value of $k$ include the silhouette score and the Davies–Bouldin index, which use cohesion and separation [23,24]. However, there is little separation between residential customer load profiles. In this paper, therefore, the elbow method, which relies on the distance from the cluster center point to each residential customer load profile, is adopted to determine the optimal value of $k$. The elbow method calculates the sum of squared errors (SSE) for each candidate $k$. The SSE is calculated as follows [22]:
$$\mathrm{SSE} = \sum_{y=1}^{k} \sum_{x \in S_y} \lVert x - \mu_y \rVert^2$$
where $k$ is the number of clusters, $S_y$ is the $y$-th cluster, $x$ is the data in $S_y$, and $\mu_y$ is the center point of the $y$-th cluster.
As $k$ increases, the cohesion increases because the number of data in each cluster decreases. Thus, the SSE decreases. When $k$ nears the optimal value, the rate of decrease of the SSE is reduced. Therefore, the SSE graph flattens after reaching the optimal $k$. Since the graph is elbow-shaped, this method is called the elbow method.
This study calculates the SSE for $k$ from 2 to 30 clusters. Figure 3 shows the SSE graph. As a result of using the elbow method, the optimal $k$ for residential customer clustering is determined to be six.
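The SSE computation and a simple way to locate the bend in the SSE curve can be sketched as follows. The function names `sse` and `elbow_k` are hypothetical, and choosing the elbow as the point with the largest second difference of the SSE is an assumption made for illustration; the paper reads the elbow from the SSE graph and does not state a numerical rule. The SSE values in the example are made up.

```python
def sse(data, centers, labels):
    """Sum of squared distances from each point to the center of its cluster."""
    total = 0.0
    for x, y in zip(data, labels):
        total += sum((p - q) ** 2 for p, q in zip(x, centers[y]))
    return total


def elbow_k(sse_by_k):
    """Pick the k where the SSE curve bends most sharply.

    sse_by_k maps consecutive candidate k values to their SSE. The bend at k
    is measured as the drop into k minus the drop out of k (second difference).
    """
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = (sse_by_k[prev_k] - sse_by_k[k]) - (sse_by_k[k] - sse_by_k[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Made-up SSE values: large drops up to k = 3, then the curve flattens.
example_curve = {1: 100.0, 2: 60.0, 3: 20.0, 4: 15.0, 5: 12.0}
chosen_k = elbow_k(example_curve)
```

A numerical rule like this is sensitive to noise in the curve, so inspecting the plotted SSE, as done in the paper, remains a useful check.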
4. Results
As a result of clustering, the energy consumption of residential customers shows six patterns. The energy consumption pattern for each cluster is shown in
Figure 4.
Figure 5a shows the cluster that represents a typical energy consumption pattern for residential customers.
Figure 5b shows the energy saving cluster, which saves energy in the morning.
Figure 5c shows the M-pattern cluster with similar energy consumption in the morning and the evening. Most residential customers in the M-pattern cluster are office workers. It is recognized in the M-pattern cluster that energy consumption increases before leaving for work. Energy consumption of residential customers within M-pattern cluster is relatively low since residential customers within the M-pattern cluster do not normally stay at home during daytime.
Figure 5d shows the morning concentrated cluster with energy consumption concentrated in the morning. In general, residential customers within the morning concentrated cluster tend to use high energy consumption appliances such as washing machines and vacuum cleaners in the morning, resulting in high energy consumption in the morning.
Figure 5e shows the evening concentrated cluster with energy consumption concentrated in the evening. In contrast, residential customers within the evening concentrated cluster tend to use high energy consumption appliances in the evening, resulting in high energy consumption in the evening.
Figure 5f shows the owl cluster with relatively large energy consumption at daybreak compared to the other clusters. Customers in the owl cluster consistently use energy until daybreak, like owls, and have very low energy consumption during the daytime.
Table 4 shows the results of the analysis of the characteristics of each cluster: monthly total energy consumption, area, and number of household members. The energy-consuming cluster and the energy-saving cluster have the same area of 100 m² to 132 m². However, the number of household members differs: the energy-consuming cluster has two members, and the energy-saving cluster has four members. The M-pattern cluster, the morning concentrated cluster, and the evening concentrated cluster show similar characteristics in area and number of household members. However, differences in energy consumption behavior patterns result in different monthly total energy consumption. The owl cluster tends to have an area of less than 66 m² and single-person households. Their monthly total energy consumption is the smallest at 230.24 kWh because the area and the number of household members are smaller than those of the other clusters.
5. Discussion
In this paper, extreme points of load profiles of residential customers are used to analyze the differences in energy consumption between two time periods. The load profiles of residential customers are grouped using extreme points and demographic data. The demographic characteristics of customers in each cluster are also analyzed.
The typical energy consumption pattern of a residential customer has four local extreme points, which appear at daybreak, in the morning, in the afternoon, and in the evening. Thus, the local extreme points are effective in analyzing the differences between customers in energy consumption over a period of time.
Among the demographic data of residential customers, the number of household members and the area are found to be the most closely correlated with energy consumption. In this paper, therefore, the number of household members and the area information are used to determine the extreme points that best represent the demographic characteristics of residential customers. The K-means algorithm is applied to group the load profiles of residential customers. In this study, six patterns of load profiles of residential customers are obtained by the proposed method. The results show differences in monthly energy consumption and demographic characteristics according to the energy consumption pattern of each cluster. For instance, unlike the other groups, the owl group, which shows relatively large energy consumption at daybreak, has an area of less than 66 m² and one household member, the demographic characteristics of a single-person household. Therefore, customer clustering analysis based on extreme points and demographic data effectively analyzes energy consumption patterns and demographic properties.
6. Conclusions
This paper proposes a clustering method for residential customers using extreme points in their energy consumption patterns and demographic data. The energy consumption patterns show extreme points at daybreak and in the morning, afternoon, and evening. This feature enables effective clustering. In addition, demographic data are used to select effective extreme points for the clustering of residential customers using the K-means method.
The residential customers show six types of energy consumption patterns, and the total monthly energy consumption, area, and number of household members are analyzed in each cluster. Furthermore, energy consumption patterns can be correlated with the characteristics of residential customers. Therefore, the extreme points are effective in clustering the energy consumption patterns of residential customers.
In addition, the clustering results obtained by the proposed method can be used to study the billing and load impact. Further research needs to be directed toward the development of personalized residential electricity rates for different demographic characteristics of customers.