3.1. Process Flow of the Integrated Scheme
(1) Offline Stage
The successful application of a data-driven scheme depends on an efficient database that can supply abundant empirical data for training the scheme. The database is composed of the operating data of a massive number of OPs and the corresponding OSMs of the system. With the operating data as input and the corresponding OSM as output, the RBF can be trained to build an accurate mapping between the OSM and the system operating variables. In this scheme, the operating data consist of steady-state operating variables, such as bus voltage magnitudes, bus voltage phase angles, and branch power flows.
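As a minimal sketch of this input-output training, the Python snippet below fits a Gaussian RBF network by regularized least squares. The class name `RBFNetwork`, the random choice of centers from the training samples, the single shared width, and the ridge term are illustrative assumptions; the text does not specify how the RBF is configured or trained.

```python
import numpy as np


class RBFNetwork:
    """Gaussian RBF regressor (illustrative sketch, not the paper's exact model)."""

    def __init__(self, n_centers=20, width=1.0, ridge=1e-6, seed=0):
        self.n_centers = n_centers
        self.width = width      # assumed: one shared width for all hidden units
        self.ridge = ridge      # assumed: small ridge term for numerical stability
        self.seed = seed

    def _design(self, X):
        # Squared distance from every sample to every center.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        n = min(self.n_centers, len(X))
        # Assumed: centers are a random subset of the training samples.
        self.centers = X[rng.choice(len(X), size=n, replace=False)]
        phi = self._design(X)
        gram = phi.T @ phi + self.ridge * np.eye(phi.shape[1])
        self.weights = np.linalg.solve(gram, phi.T @ y)
        return self

    def predict(self, X):
        return self._design(X) @ self.weights
```

In this sketch, each row of `X` would hold the steady-state operating variables of one OP, and `y` would hold the corresponding OSM values.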
During the practical operation of the systems, the oscillatory stability is closely related to the variation trend and the composition of system generators and loads [13]. Accordingly, different OPs can be obtained based on the variation of the system generators/loads. With the help of the commercial software PSS/E, the characteristic matrix A can be conveniently obtained, and then the OSMs corresponding to different OPs can be calculated by modal analysis. In this study, the generation of the database is conducted according to the flow chart shown in Figure 4, following the procedure described below.
Randomly initialize the parameters of the system loads and shunts in their normal ranges by introducing reasonable perturbations in the corresponding parameters.
Iteratively change the system load level. Loads in different areas are varied at different rates based on their initial values while keeping a constant power factor. Concurrently, the load variations are balanced mainly by the generators in the same area.
Increase capacitors and decrease reactors with the increase in loads to simulate the practical operating condition of the systems.
Consider various factors influencing the operation of the system during database creation, including variations in system topology, the distribution among generators and loads, and the peak and minimum load. Contingencies, scheduled maintenance, and economic dispatch can lead to topology changes. Optimal power flow considerations may change the distribution among generators and loads. The peak and minimum load values tend to differ between seasons, especially between winter and summer. In practice, the system operating condition hardly stays the same because of such influencing factors, and large condition variations may result in an unacceptable decrease in the assessment accuracy of data-driven methods [22]. To accommodate new operating conditions, retraining with new samples corresponding to the new conditions is usually considered necessary [23]. Nevertheless, retraining is time-consuming and may not meet the requirements for seamless estimation of the OSM. Usually, a credible list of possible system operating conditions can be acquired from historical operating information collected and stored by utility companies. Thus, a recommended solution is to prepare an abundant database that includes multiple sample sets corresponding to potential system operating conditions on the basis of the credible list, and then use the prepared sample sets to train a series of RBF candidates beforehand in the offline stage.
In general, the more possible operating conditions and the corresponding trained RBFs are contained in the database, the lower the probability of encountering an unseen condition and the greater the likelihood of realizing seamless OSM estimation.
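The load-related steps of the procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the uniform perturbation, the linear area-wise growth rule, and the equal sharing of each area's load increase among its generators are hypothetical choices, and losses, shunt switching, the PSS/E power flow, and the modal analysis are left out.

```python
import random


def perturb_loads(base_loads, spread, rng):
    """Randomly initialize loads within +/- spread of their base values.

    base_loads maps bus -> (area, P in MW, Q in MVAr).
    """
    perturbed = {}
    for bus, (area, p, q) in base_loads.items():
        p_new = p * (1.0 + rng.uniform(-spread, spread))
        q_new = q * (1.0 + rng.uniform(-spread, spread))
        perturbed[bus] = (area, p_new, q_new)
    return perturbed


def scale_operating_point(init_loads, area_rates, gens_by_area, level):
    """Scale loads per area at constant power factor and balance within the area."""
    loads = {}
    delta_p = {area: 0.0 for area in gens_by_area}
    for bus, (area, p, q) in init_loads.items():
        # Assumed growth rule: each area grows linearly with the load level.
        scale = 1.0 + area_rates[area] * level
        # P and Q share the same factor, so the power factor is unchanged.
        loads[bus] = (area, p * scale, q * scale)
        delta_p[area] += p * scale - p
    dispatch = {}
    for area, gens in gens_by_area.items():
        for gen in gens:
            # Assumed: generators of an area share its load increase equally.
            dispatch[gen] = delta_p[area] / len(gens)
    return loads, dispatch
```

Each generated load and dispatch pattern would then be passed to the power flow and modal analysis tools to obtain the operating data and the OSM for one sample.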
(2) Update Stage
The update stage is essential for improving the robustness of the integrated scheme to complex operating conditions and its generalization ability. As shown in Figure 3, the detection of operating condition variations is used to trigger the update of the integrated scheme.
In the online application of the integrated scheme, when a changed operating condition is encountered due to the influencing factors mentioned above, the following strategy is executed.
If the changed operating condition has previously been recorded in the database and the corresponding RBF candidate has been trained in the offline stage, the prepared candidate is immediately selected to replace the original one.
If a match cannot be found, the estimation errors of the available trained RBFs are checked using the new operating condition. If the errors of some candidates are acceptable, the RBF with the highest accuracy among these candidates is used to conduct OSA for the changed operating condition.
If none of the available RBFs can provide acceptable accuracy for the changed operating condition, retraining is required, and a new RBF is trained using the sample set of the changed operating condition. Finally, the changed operating condition and the corresponding RBF are recorded and added to the database.
With the ongoing application of the integrated scheme, progressively fewer unseen operating conditions will be encountered. In this way, not only can the estimation accuracy be guaranteed, but seamless online OSA can also be achieved.
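The three-branch update strategy described above can be sketched in Python as follows. The function name `choose_rbf`, the dictionary used as the database, the mean absolute error as the accuracy check, and the `train_rbf` callback are hypothetical; the text does not state which error measure or tolerance is used.

```python
def choose_rbf(condition_id, database, samples, max_error, train_rbf):
    """Pick or build the RBF for a changed operating condition.

    database maps condition id -> model, where a model is a callable
    taking a feature tuple and returning an estimated OSM.
    samples is a list of (features, osm) pairs from the changed condition.
    """
    # Branch 1: the condition is already recorded with a trained candidate.
    if condition_id in database:
        return database[condition_id], "matched"

    # Branch 2: check the existing candidates on the new condition.
    def mean_abs_error(model):
        return sum(abs(model(x) - y) for x, y in samples) / len(samples)

    errors = {cid: mean_abs_error(model) for cid, model in database.items()}
    if errors:
        best = min(errors, key=errors.get)
        if errors[best] <= max_error:
            return database[best], "reused"

    # Branch 3: no acceptable candidate, so retrain and record the result.
    model = train_rbf(samples)
    database[condition_id] = model
    return model, "retrained"
```

Because every retrained model is stored under its condition, later occurrences of the same condition fall into the first branch, which mirrors the way unseen conditions become progressively rarer.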
(3) Online Stage
As shown in Figure 3, online OSA is conducted using the real-time system operating data. With the development of PMU and WAMS, the collection of system operating data has become more convenient and rapid. Once the real-time PMU measurements are obtained for a new OP, the data of the input features are immediately delivered to the corresponding RBF, and the online estimate of the OSM for this OP is then provided to the system operators.
Furthermore, when the system operators acquire the OSM value, a threshold can be established to distinguish whether the assessed OP is stable or unstable. By checking the OSM against this threshold, any unstable OP is detected immediately, and the possible event is reported to the system operators. Simultaneously, the corresponding preventive control strategies are executed.
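A minimal Python sketch of this online check is given below. It assumes that a larger OSM indicates a more stable OP, so that an OP is flagged when its estimated OSM falls below the threshold; the function names and the threshold value used in any call are hypothetical.

```python
def assess_operating_point(features, rbf, threshold):
    """Estimate the OSM of one OP and compare it with the threshold."""
    osm = rbf(features)
    # Assumption: a larger OSM means a more stable operating point.
    return osm, osm >= threshold


def monitor(snapshots, rbf, threshold):
    """Return (timestamp, osm) for every snapshot flagged as unstable."""
    alerts = []
    for timestamp, features in snapshots:
        osm, stable = assess_operating_point(features, rbf, threshold)
        if not stable:
            alerts.append((timestamp, osm))
    return alerts
```

Here `snapshots` stands for a stream of time-stamped feature vectors built from PMU measurements, and `rbf` is any callable that maps a feature vector to an estimated OSM.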
3.2. Compositive Feature Selection Unit
As introduced above, an abundant database containing massive samples can be created. However, two issues remain: first, the features considered in the database may include many variables that are only weakly related to the OSM; second, features that are all strongly related to the OSM may be highly redundant with one another. Regarding the first issue, as the scale of power system operation increases, the feature dimensionality of the database and the number of weakly correlated variables increase rapidly. Training RBFs on the raw database with such weakly correlated variables is not conducive to improving the estimation accuracy and seriously degrades the computational efficiency [24]. Regarding the second issue, the selected features often exhibit high redundancy, meaning that some of them are strongly correlated with each other. Training RBFs with such strongly correlated features may yield redundant relationships between the OSM and the system operating variables, which wastes computational resources.
To overcome the issues discussed above, a compositive feature selection unit is designed for use in the integrated scheme to achieve efficient feature selection, decrease the feature dimensionality, and alleviate feature redundancy. The compositive feature selection unit consists of three steps, elaborated as follows.
Step 1: The flow chart of the first step is shown in Figure 5. This step aims to split the initial input feature set into multiple feature subsets while ensuring that the pairwise correlations between features from different subsets are relatively weak, whereas the pairwise correlations between features in the same subset are relatively stronger.
Let $\rho_{ij}$ represent the absolute value of the Pearson correlation coefficient (PCC) between variable $x_i$ and variable $x_j$, where $x_i$ and $x_j$ are two initial input features. The different feature subsets after partitioning are denoted by $P_a$ and $P_b$ ($a, b = 1, 2, \ldots, M$; $a \neq b$), where M is the user-defined number of subsets. As shown in Figure 5, $\rho(P_a, P_b)$ is used to measure the pairwise correlation between different feature subsets.
In accordance with Figure 5, M subsets are acquired and used as the input to the next step.
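One possible way to obtain such a partition is sketched below in Python. It is not the procedure of Figure 5, which is not reproduced in the text. As assumptions, the sketch takes the largest absolute PCC between two subsets as their inter-subset correlation and greedily merges the two most correlated subsets until M remain. The function name `partition_features` is hypothetical, and constant features, for which the PCC is undefined, are not handled.

```python
import numpy as np


def partition_features(X, n_subsets):
    """Group the columns of X into n_subsets lists of column indices."""
    # Absolute PCC between every pair of features (columns of X).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    subsets = [[j] for j in range(X.shape[1])]

    def inter_subset_corr(a, b):
        # Assumed measure: strongest pairwise |PCC| across the two subsets.
        return corr[np.ix_(a, b)].max()

    while len(subsets) > n_subsets:
        best_pair, best_value = None, -1.0
        for i in range(len(subsets)):
            for k in range(i + 1, len(subsets)):
                value = inter_subset_corr(subsets[i], subsets[k])
                if value > best_value:
                    best_pair, best_value = (i, k), value
        i, k = best_pair
        # Merge the two most correlated subsets (k > i, so index i stays valid).
        subsets[i] = subsets[i] + subsets.pop(k)
    return [sorted(s) for s in subsets]
```

Merging the most correlated subsets first keeps strongly correlated features together, so the correlations that remain between different subsets are the weaker ones, which is the property this step asks for.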
Step 2: The flow chart of the second step is shown in Figure 6. This step aims to remove the redundant features in each subset, which are significantly related to other features in the same subset. Through this step, the feature redundancy in each subset can be reduced, and thus, the total number of features and the database dimensionality are decreased.
As shown in Figure 6, an evaluation function $J(f)$ is defined for the features in a given subset. The calculation of $J(f)$ is shown in (10).
$$J(f) = I(f; \mathrm{OSM}) - \beta \sum_{s \in S} I(f; s) \qquad (10)$$
where S is a newly generated feature set consisting of features selected from P, $f$ denotes a feature in subset P from the previous step, $I(f; \mathrm{OSM})$ represents the MI between the feature and the OSM, $s$ is a feature selected from P for inclusion in S, $I(f; s)$ is the MI between $f$ and $s$, and $\beta$ is a user-defined parameter for adjusting the number of features that are finally selected. Based on empirical experience, a value of $\beta$ in the range of [0.5, 1] is recommended [25].
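A greedy selection driven by such an evaluation function can be sketched in Python as follows. The sketch assumes that the function takes the common form of relevance minus a weighted redundancy penalty, that is, the MI between a candidate and the OSM minus the parameter times the summed MI between the candidate and the features already selected. The histogram MI estimator, the stopping rule (stop once the best score is no longer above `min_score`), and the function names are hypothetical and may differ from Figure 6.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Histogram estimate of the MI (in nats) between two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))


def select_from_subset(X, y, beta=0.7, bins=8, min_score=0.0):
    """Greedily select columns of X that are relevant to y and not redundant."""
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y, bins) for j in range(n_features)]
    selected, remaining = [], list(range(n_features))
    while remaining:
        scores = []
        for j in remaining:
            redundancy = sum(
                mutual_information(X[:, j], X[:, s], bins) for s in selected
            )
            scores.append(relevance[j] - beta * redundancy)
        best = int(np.argmax(scores))
        # Assumed stopping rule: no candidate scores above min_score.
        if scores[best] <= min_score:
            break
        selected.append(remaining.pop(best))
    return selected
```

Under this assumed stopping rule, a larger penalty weight rejects more candidates, which is consistent with the parameter controlling how many features are finally kept. The function would be applied to each subset independently.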
It should be noted that each feature subset is processed independently in this step. After all subsets are processed, an intermediate feature set is created by combining the processed subsets. The features in this intermediate set have low redundancy among them and are significantly related to the OSM. Finally, the intermediate feature set is delivered to the final step for further feature selection based on correlation detection.
Step 3: This step aims to choose the pivotal features from the intermediate feature set on the basis of MICe. The different correlations between the features and the OSM can be detected, and each feature is then assigned a score equal to its MICe value. Based on the score ranking, the highly ranked features are finally selected to establish the pivotal feature set.
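The ranking part of this step can be sketched in Python as follows. The sketch assumes that the MICe value of each feature has already been computed by an external estimator and only shows the score-based selection; the function name and the `n_keep` cut-off are hypothetical, since the text does not state how many highly ranked features are retained.

```python
def select_pivotal_features(scores, n_keep):
    """Return the n_keep feature names with the highest scores.

    scores maps feature name -> precomputed MICe value with respect to the OSM.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n_keep]]
```

A fixed number of retained features is only one possible rule; a minimum score could be used instead without changing the ranking logic.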
Generally, this unit is applied as a data preprocessing step before RBF training. In the offline stage, it performs efficient feature selection on the created database to obtain the pivotal feature set. The input for training the RBF consists of the obtained pivotal feature set and the corresponding OSM. In the update stage, new RBFs may need to be trained to accommodate unseen operating conditions, in which case the compositive feature selection unit is used in a similar way. In the online stage, the real-time data of the features selected by this unit are sent to the corresponding RBF to obtain the online OSA results.