Adaptive Neural Network Q-Learning-Based Full Recurrent Adaptive NeuroFuzzy Nonlinear Control Paradigms for Bidirectional-Interlinking Converter in a Grid-Connected Hybrid AC-DC Microgrid

Awais, Muhammad; Khan, Laiq; Khan, Said Ghani; Awais, Qasim; Jamil, Mohsin

doi:10.3390/en16041902


¹ Department of Electrical and Computer Engineering, COMSATS University Islamabad, Abbottabad Campus, Abbottabad 22060, Pakistan

² Department of Electrical and Computer Engineering, COMSATS University Islamabad, Islamabad 45550, Pakistan

³ Department of Mechanical Engineering, College of Engineering, University of Bahrain, Isa Town 32038, Bahrain

⁴ Department of Electronics and Computer Science, Fatima Jinnah Women University, Rawalpindi, Old Presidency, Rawalpindi 46000, Pakistan

⁵ Department of Electrical and Computer Engineering, Faculty of Engineering and Applied Sciences, Memorial University of Newfoundland, St. John’s, NL A1B 3X5, Canada

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Energies 2023, 16(4), 1902; https://doi.org/10.3390/en16041902

Submission received: 31 December 2022 / Revised: 5 February 2023 / Accepted: 8 February 2023 / Published: 14 February 2023

(This article belongs to the Topic Power Electronics Converters)


Abstract:

The stability of a hybrid AC-DC microgrid depends mainly on the bidirectional interlinking converter (BIC), which is responsible for power transfer, power balance, voltage stability, frequency stability, and transient behavior. Varying generation from renewable resources, fluctuating loads, and bidirectional power flow from the utility grid, charging station, super-capacitor, and batteries cause various stability issues in hybrid microgrids, such as net active-reactive power flow on the AC-bus, frequency oscillations, total harmonic distortion (THD), and voltage variations. Therefore, the control of the BIC between the AC and DC buses in grid-connected hybrid microgrid power systems is of great importance for smooth power flow, power sharing, and the stability of the whole power system. Various control schemes have been suggested in the literature, such as conventional droop control, communication-based control, and model predictive control, each addressing different stability issues of hybrid AC-DC microgrids. However, model dependence, single-point-failure (SPF), communication vulnerability, complex computations, and complicated multilayer structures motivated the authors to develop online adaptive neural network (NN) Q-learning-based full recurrent adaptive neurofuzzy nonlinear control paradigms for the BIC in a grid-connected hybrid AC-DC microgrid. The proposed strategies ensure the following: (i) frequency stabilization, (ii) THD reduction, (iii) voltage normalization and (iv) negligible net active-reactive power flow on the AC-bus. Three novel adaptive NN Q-learning-based full recurrent adaptive neurofuzzy nonlinear control paradigms are proposed for PQ-control of the BIC. The control schemes are based on NN Q-learning and full recurrent adaptive neurofuzzy identifiers. Hybrid adaptive full recurrent Legendre wavelet-based Neural Network Q-learning-based full recurrent adaptive NeuroFuzzy control, Hybrid adaptive full recurrent Mexican hat wavelet-based Neural Network Q-learning-based full recurrent adaptive NeuroFuzzy control, and Hybrid adaptive full recurrent Morlet wavelet-based Neural Network Q-learning-based full recurrent adaptive NeuroFuzzy control are modeled and tested for the control of the BIC. The controllers differ from each other in the variants used in the antecedent part (Gaussian membership function and B-Spline membership function) and the consequent part (Legendre wavelet, Mexican hat wavelet, and Morlet wavelet) of the full recurrent adaptive neurofuzzy identifiers. The performance of the proposed control schemes was validated for various quality and stability parameters, using a simulation testbench in MATLAB/Simulink. The simulation results were benchmarked against an aPID controller and against each other over a simulation time of one complete solar day.

Keywords:

hybrid AC-DC microgrid; interlinking converter; Legendre wavelet; Mexican hat wavelet; Morlet wavelet; neurofuzzy; Q-learning; recurrent

1. Introduction

Renewable resource-based distributed generators (DGs), such as wind turbines (WTs), photovoltaic (PV) generators, and micro-turbines (MTs), are increasingly used to reduce greenhouse emissions and to improve the efficiency of the electric power system. Integrating DGs, energy storage systems (ESS), and loads divides a large power system into many small microgrids, making it more controllable, flexible, and effective [1,2,3,4].

The DC microgrid has no frequency issues, produces low power loss due to the absence of reactive power, and has simple dynamic variations and easy power conversions. These advantages make the DC microgrid superior to the AC microgrid. However, hybrid AC-DC microgrids are gaining considerable attention because they combine the benefits of both AC and DC networks [1,2,5,6,7,8]. The AC and DC microgrids are connected through a bidirectional AC-DC interlinking converter, forming a hybrid AC-DC microgrid that exploits the advantages of both sides and can serve multiple types of loads. However, new technological challenges arise for the smooth and reliable operation of hybrid AC-DC microgrids [1,2,9,10,11,12,13].

AC and DC subgrids are interconnected via bidirectional interlinking converters (BIC). The BIC is responsible for smooth bidirectional power flow, power balance, microgrid stability, voltage stability, frequency stability, and transient behavior. If the BIC does not perform correctly, the microgrid becomes unstable and power is lost. The unpredictable nature of renewable DGs, nonlinear uncertain loads, switching harmonics, and bidirectional power flow between the utility grid and storage devices produce a variety of stability issues, such as net active-reactive power flow on the AC-bus, increased THD level, voltage fluctuation, and frequency instability. Therefore, the key challenge in a hybrid AC-DC microgrid is the control problem [2,14,15,16,17,18,19].

Generally, the whole network and, particularly, the BIC requires intelligent adaptive control to ensure the coordination of DC and AC subsystems and appropriate power flow across the hybrid microgrid [1,2,20,21]. The problem is non-trivial and becomes more difficult, due to the difference in response time of AC and DC networks. The faster response of the DC network over the AC network, and interactions between corresponding sides, may lead to destabilization of the power system [22,23]. Thus, the key issue in the hybrid AC-DC microgrid is the control of BIC.

Several control schemes are reported in the literature for the control of BIC in a hybrid AC-DC microgrid. Reference [24] presented coordination control and dual closed-loop control for the interlinking converter. However, the study lacked real time simulation results and presented two cases with specific variation in the generation and loads. Reference [2] designed a uniform control scheme for the bidirectional interlinking converter for economical and resilient objectives of hybrid AC-DC microgrids. However, the study lacked a real time simulation testbed and was studied for specific power flow cases. Reference [25] presented the voltage regulation of the DC-bus by designing a model-based controller for the bi-directional interlinking converter. However, the study required proper modeling of the system and the PID control used might not work for large complex power systems. Reference [26] presented the application of a passivity framework to AC-DC grids. However, a lossless line was assumed in the study and no real time simulation results were presented. Reference [27] presented virtual inertia support for BIC in the hybrid AC-DC microgrid. However, use of PI control and only the regulation of active power reduced its impact for large power systems with various operating points. Many other articles are found in the literature but the lack of real time systems and a small number of DGs reduce their productivity.

This article presents three novel control schemes for a bidirectional interlinking converter in a hybrid AC-DC microgrid. The control schemes are hybrid adaptive NN Q-learning-based full recurrent adaptive NeuroFuzzy (FRNF) control schemes. The three proposed control schemes differ in the variants used in the antecedent and consequent parts of the adaptive NeuroFuzzy architecture. The variants in the antecedent part are the Gaussian membership function and the B-Spline membership function, while the variants used in the consequent part are the Mexican hat wavelet, Legendre wavelet, and Morlet wavelet. The real-time simulation testbed consists of multiple DGs and a utility grid connected to DC- and AC-buses via converters and transformers. The real-time solar and wind profiles, along with maximum power point tracking, used in this article were obtained from our previously published articles [28,29,30,31,32]. The use of multiple DGs, real-time solar and wind profiles, and a complete one-day simulation are the key features of this study. The performance of the proposed control schemes was verified through various quality and stability parameters, using MATLAB/Simulink 2015a. The simulation results were compared with an aPID controller and against each other. Table 1 gives the features and contribution of the proposed solution, compared to previously reported work.

The article is organized as follows. Section 2 describes the testbed, Section 3 presents the supervisory control of the microgrid, Sections 4 to 13 discuss the detailed modeling of the proposed control schemes, Section 14 discusses the simulation results, Section 15 concludes the article, and Section 16 suggests future work.

Remark

This work extends the authors’ previous work reported in [28,29,30,31,32], which focused on maximum power point tracking of the variable speed wind energy conversion system and PV-farm, swift response of the solid oxide fuel cell, and charging/discharging scheduling of plug-in hybrid electric vehicles (PHEVs) in a grid-connected hybrid AC-DC microgrid. Voltage, frequency and THD regulation were considered in those studies. However, the control of the interlinking converter in this study regulates these quantities (voltage, frequency, THD) more effectively. Hence, the new control topologies were developed to increase the stability and reliability of the hybrid microgrid.

Table 1. Main contributions, features and limitations of the existing and proposed work.

Ref. No.	Year	Frequency Regulation	Voltage Regulation	THD Reduction	Limitations/Features
[33]	2016	✗	✓	✓	☞	Independent of communication and model parameters. However, general hybrid AC-DC model is assumed with negligence of frequency regulation.
[34]	2017	✗	✓	✗	☞	Applicable only on low voltage hybrid grid. Frequency regulation and THD reduction are neglected.
[35]	2018	✗	✓	✗	☞	Leader follower consensus and particle swarm optimization are prone to SPF ¹. Only voltage regulation is considered.
[36]	2019	✗	✗	✗	☞	Synchronization control increases the risk of SPF. Only voltage regulation is considered.
[37]	2020	✓	✓	✗	☞	Hierarchical control strategy has drawback of SPF. THD reduction is neglected.
[18]	2021	✗	✓	✓	☞	Model predictive control algorithm requires model and parameters. It is complex and consumes more time. Moreover, frequency regulation is neglected.
[38]	2021	✗	✗	✓	☞	Plug-in-hybrid vehicles and other renewable resources are not taken into account. THD reduction is neglected.
[39]	2021	✗	✓	✗	☞	PI controller cannot operate under highly nonlinear conditions and prone to SPF. Only voltage regulation is considered.
[40]	2022	✓	✓	✗	☞	Coordinated control scheme requires communication network. THD regulation is neglected.
[41]	2022	✗	✓	✓	☞	Multiple control schemes are integrated. SPF of any control scheme results in power instability that may lead to blackout. Only wind power, PV and BSS are considered. Frequency regulation is neglected.
Proposed	2022	✓	✓	✓	☞	Model free control strategy is used. No communication required. Advantages of Q-learning and full recurrent neurofuzzy are combined to avoid SPF. Complete microgrid with multiple DGs is considered. Real time solar irradiance, temperature and wind speed are used. Optimal active-reactive power flow, voltage-frequency regulation, and THD reduction are considered.

¹ SPF = Single-Point-Failure.

2. System Overview and Model Description

Figure 1 sketches the proposed microgrid. It consists of AC and DC-buses which are connected to each other via a main interlinking inverter. The DC-bus has multiple renewable energy resources connected to it through various AC-DC converters. The 165 F super-capacitor bank, 260 kW Photovoltaic (PV) array, 260 kW solid oxide fuel cell (SOFC), 150 kW electrolyzer, 200 Ah batteries, and 100 kW wind turbine (WT) are connected to the DC-bus through AC-DC and DC-DC converters. The AC-bus is connected with 200 kVA micro-turbine, 11 kV grid, bidirectional smart charging station (CS), and the AC-load through transformers, as well as AC-DC and AC-DC-AC converters. The DC-bus is interlinked with the AC-bus through the main inverter. The technical details of the proposed microgrid and its components are given in Appendix A [28,29,30,31,32].

Modeling and Description of Interlinking Inverter

AC and DC sub-grids are linked with each other via an interlinking power converter in the smart microgrid hybrid power system (SMG-HPS). The role of the interlinking converter is important in terms of bidirectional power transfer and stability of the whole SMG-HPS. Uncertain load and generation conditions force the interlinking converter to operate in various modes. If no power is transferred to either side, the operational mode is known as stop mode. If power is transferred from the DC subgrid side to the AC subgrid side, the operational mode is known as inverter mode. If power is transferred from the AC subgrid side to the DC subgrid side, the operational mode is known as rectifier mode.

Smooth power transfer between AC and DC subgrids is guaranteed by the interlinking converter. During grid-connected mode, the power balance is ensured by the utility grid, while the DC-bus voltage stability is ensured by an interlinking converter. The increasing use of electronic devices in the power sector, at the household and industrial levels, dramatically increases the complexity of the power system and reduces efficiency. The increase in THD level, and fluctuations in the frequency and voltage profiles, reduce the life of many electric components, as well as decrease the reliability and quality of power.

The specifications of the interlinking inverter used in this research work are given in Table A9, Appendix A.9.

The interlinking or main inverter acts as a bridge between AC and DC sub-grids. It controls the active and reactive power and, thus, ensures the voltage stability. The terminal voltages of this voltage source converter are as follows:

\begin{matrix} v_{a n} = L_{11} \frac{d i_{a}}{d t} + R_{11} i_{a} + v_{a n - A C} \end{matrix}

(1)

\begin{matrix} v_{b n} = L_{11} \frac{d i_{b}}{d t} + R_{11} i_{b} + v_{b n - A C} \end{matrix}

(2)

\begin{matrix} v_{c n} = L_{11} \frac{d i_{c}}{d t} + R_{11} i_{c} + v_{c n - A C} \end{matrix}

(3)

where $L_{11}$ and $R_{11}$ denote the inductance and resistance, respectively. Applying Park’s transformation gives the current dynamics in the dq-reference frame:

\begin{matrix} \frac{d i_{d}}{d t} = ω i_{q} - \frac{R_{11} i_{d}}{L_{11}} + \frac{1}{L_{11}} (v_{d} - v_{d}^{*}) \end{matrix}

(4)

\begin{matrix} \frac{d i_{q}}{d t} = ω i_{d} - \frac{R_{11} i_{q}}{L_{11}} + \frac{1}{L_{11}} (v_{q} - v_{q}^{*}) \end{matrix}

(5)

Voltages in the dq-reference frame are calculated using the following equations:

\begin{matrix} v_{d} = L_{11} \frac{d i_{d}}{d t} - ω L_{11} i_{q} + v_{d}^{*} + R_{11} i_{d} \end{matrix}

(6)

\begin{matrix} v_{q} = L_{11} \frac{d i_{q}}{d t} - ω L_{11} i_{d} + v_{q}^{*} + R_{11} i_{q} \end{matrix}

(7)

In the dq-reference frame the power is calculated using the following relation:

\begin{matrix} P_{d q} = \frac{3}{2} (v_{d} i_{d} + v_{q} i_{q}) \end{matrix}

(8)
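As a quick numerical check of Equation (8), the following Python sketch transforms a balanced three-phase voltage and current set into the dq frame and compares $P_{dq}$ with the instantaneous three-phase power. The amplitude-invariant form of Park’s transformation, the function names, and the operating point are assumptions made for illustration; the article does not state which scaling convention it uses.

```python
import math

TWO_PI_3 = 2.0 * math.pi / 3.0

def park(a, b, c, theta):
    """Amplitude-invariant Park transform (assumed convention): abc -> (d, q)."""
    d = (2.0 / 3.0) * (a * math.cos(theta)
                       + b * math.cos(theta - TWO_PI_3)
                       + c * math.cos(theta + TWO_PI_3))
    q = -(2.0 / 3.0) * (a * math.sin(theta)
                        + b * math.sin(theta - TWO_PI_3)
                        + c * math.sin(theta + TWO_PI_3))
    return d, q

def dq_power(v_d, v_q, i_d, i_q):
    """Equation (8): P_dq = 3/2 (v_d i_d + v_q i_q)."""
    return 1.5 * (v_d * i_d + v_q * i_q)

def balanced_set(amplitude, theta, lag=0.0):
    """Balanced three-phase cosine set; `lag` shifts it behind the reference angle."""
    return tuple(amplitude * math.cos(theta - lag + shift)
                 for shift in (0.0, -TWO_PI_3, TWO_PI_3))

# Illustrative operating point (hypothetical values, not from the article).
theta = 0.7
v_abc = balanced_set(100.0, theta)
i_abc = balanced_set(10.0, theta, lag=math.pi / 6)
v_d, v_q = park(*v_abc, theta)
i_d, i_q = park(*i_abc, theta)
p_dq = dq_power(v_d, v_q, i_d, i_q)
p_abc = sum(v * i for v, i in zip(v_abc, i_abc))  # instantaneous three-phase power
```

With this scaling, `p_dq` and `p_abc` agree for any angle `theta`, which is why the factor 3/2 appears in Equation (8). A power-invariant transform would remove that factor.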

3. Supervisory Control of Microgrid and Operation Strategy

The principal purpose of supervisory control is to provide continuous and reliable power to all the connected loads. The loads used in this research work are the residential load ($P_{L}$) and the CS load ($P_{CS}$). The bulk of the power is transmitted to these main loads. However, the excess power during off-peak hours is transmitted to the BSS and SC for storage and later use. The electrolyzer also acts as a load and obtains power from the DC-bus of the SMG-HPS. The supervisory control prioritizes renewable energy to serve the loads and to store electrical power in the various storage devices. However, in the case of a power deficit, the extra power is obtained from the BSS, SC, and CS (PHEVs).

The modes of operation of supervisory control depend on the balance of available and required power.

3.1. Modes of Operation of Supervisory Control System

The modes are based on the deficit and excess power. The modes used in this research work are described below.

3.1.1. Mode of Power Deficit

The operational modes of supervisory control systems with deficit power are described below.

Mode 1: WT, PV and BSS Fulfill the Load Demand

In this mode, only the RES (WT and PV) and the BSS provide power to fulfil the load demand. The WT and PV provide their maximum output powers, while the BSS discharges only while its state-of-charge (SOC) is at least 20%. The mathematical representation of this mode is:

\begin{matrix} P_{L o a d} \geq P_{r e n} + P_{B S S} \end{matrix}

(9)

where $P_{Load} = P_{CS} + P_{L} = V_{L} \times I_{L}$ is the load demand, $P_{ren} = P_{WT} + P_{PV}$ is the cumulative renewable power, and $P_{BSS}$ is the BSS power for $SOC_{BSS} \geq 20\%$. The BSS discharges in this mode of operation.

Mode 2: WT, PV, BSS and SC Fulfill the Load Demand

If the load demand is higher than the power produced in mode 1, then deficit power is obtained from SC to fulfil the load demand. The mathematical representation of this mode is:

\begin{matrix} P_{L o a d} \geq P_{r e n} + P_{B S S} + P_{S C} \end{matrix}

(10)

where $P_{SC}$ is the power obtained from the SC for $SOC_{SC} \geq 20\%$. All other sources are kept OFF and power is obtained only through the RES, BSS, and SC during this mode of operation.

Mode 3: WT, PV, BSS, SC and SOFC Fulfill the Load Demand

If the power demand of load is not fulfilled from RES and the discharging of BSS and SC, then the deficit power is obtained from SOFC to satisfy the load demand. The SOFC tracks a power reference that depends on the deficit power. The mathematical representation of this mode is:

\begin{matrix} P_{L o a d} \geq P_{r e n} + P_{B S S} + P_{S C} + P_{S O F C} \end{matrix}

(11)

where $P_{SOFC}$ is the power obtained from the SOFC.

Mode 4: WT, PV, BSS, SC, SOFC and Grid Fulfill the Load Demand

If the power required by the load increases from mode 3, then the extra power deficit is obtained from the utility grid to satisfy the load requirements. The mathematical expression for this mode is:

\begin{matrix} P_{L o a d} \geq P_{r e n} + P_{B S S} + P_{S C} + P_{S O F C} + P_{G r i d} \end{matrix}

(12)

where $P_{Grid}$ is the power obtained from the utility grid.

Mode 5: WT, PV, BSS, SC, SOFC, Grid and MT Fulfill the Load Demand

If the power delivered from mode 4 cannot satisfy the load demands during peak hours, then the deficit power is obtained from the MT. The mathematical relation for this mode is:

\begin{matrix} P_{L o a d} \geq P_{r e n} + P_{B S S} + P_{S C} + P_{S O F C} + P_{G r i d} + P_{M T} \end{matrix}

(13)

where $P_{MT}$ is the power obtained from the MT to satisfy the load demand.
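Modes 1 to 5 can be read as a priority stack: each mode engages one more source than the previous one. The Python sketch below illustrates that reading under stated assumptions. The function name, the `capacity` dictionary, and all numerical values are hypothetical, and the rule that a storage device below 20% SOC contributes nothing is an interpretation of the SOC conditions above, not the authors’ implementation.

```python
def deficit_dispatch(p_load, p_ren, capacity, soc_bss, soc_sc, soc_min=0.20):
    """Stack sources in the order of Modes 1-5 until the load is covered.

    `capacity` maps each source name to the power it can currently supply (kW).
    Returns (mode, dispatch, unserved); mode 0 means renewables alone suffice.
    """
    order = ["BSS", "SC", "SOFC", "Grid", "MT"]  # engaged in Modes 1, 2, 3, 4, 5
    available = dict(capacity)
    # Storage may only discharge while its state of charge is at least 20 %.
    if soc_bss < soc_min:
        available["BSS"] = 0.0
    if soc_sc < soc_min:
        available["SC"] = 0.0

    remaining = p_load - p_ren  # deficit left after WT and PV output
    dispatch = {name: 0.0 for name in order}
    mode = 0
    for index, name in enumerate(order, start=1):
        if remaining <= 0.0:
            break
        dispatch[name] = min(remaining, available.get(name, 0.0))
        remaining -= dispatch[name]
        mode = index
    return mode, dispatch, max(remaining, 0.0)

# Hypothetical capacities in kW, chosen only to exercise the logic.
capacity = {"BSS": 100.0, "SC": 50.0, "SOFC": 260.0, "Grid": 1000.0, "MT": 200.0}
mode, dispatch, unserved = deficit_dispatch(500.0, 200.0, capacity,
                                            soc_bss=0.6, soc_sc=0.8)
```

In this example the 300 kW deficit is covered by the BSS (100 kW), the SC (50 kW) and the SOFC (150 kW), so the sketch reports Mode 3 with no unserved load.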

3.1.2. Modes of Excess Power

The modes of supervisory control systems with excess power are described below.

Mode 6: Excess Power Given to Electrolyzer

If the power obtained from RES and BSS is more than the power required by the load, then the extra energy is absorbed by the electrolyzer. This mode can be mathematically represented as:

\begin{matrix} P_{L o a d} & \leq & P_{r e n} + P_{B S S} \end{matrix}

(14)

\begin{matrix} P_{e x c e s s} & = & P_{L o a d} - P_{r e n} - P_{B S S} \end{matrix}

(15)

\begin{matrix} P_{e l e c t} & = & P_{e x c e s s} \end{matrix}

(16)

where $P_{excess}$ is the extra amount of power and $P_{elect}$ is the electrolyzer power. The BSS is in discharge mode.

Mode 7: Excess Power Given to SC and Electrolyzer

If the power generated from RES and BSS is greater than the load demand, while the SOC of the SC is less than 90%, then the excess energy is used to charge the SC. If further surplus power remains, the electrolyzer is supplied as well. The mathematical representation of this mode is:

\begin{matrix} P_{L o a d} & \leq & P_{r e n} + P_{B S S} \end{matrix}

(17)

\begin{matrix} P_{e x c e s s} & = & P_{L o a d} - P_{r e n} - P_{B S S} \end{matrix}

(18)

\begin{matrix} P_{S C} & = & P_{e x c e s s} \end{matrix}

(19)

\begin{matrix} P_{e l e c t} & = & P_{e x c e s s} - P_{S C} \end{matrix}

(20)

where $P_{SC}$ is the power absorbed by the SC in charging mode. In this mode, the RES and BSS provide power, while the load, SC and electrolyzer absorb power.

Mode 8: Excess Power Given to SC, Grid, and Electrolyzer

In this mode of operation, the power generated from the RES is greater than the load demand. Therefore, the excess energy is given to SC for its charging. If SMG-HPS still has surplus power generated from RES then the excess energy is delivered to the utility grid during its peak hours and the remaining surplus power is given to the electrolyzer for the production of hydrogen gas. The mathematical expression for this mode is:

\begin{matrix} P_{L o a d} & \leq & P_{r e n} + P_{B S S} \end{matrix}

(21)

\begin{matrix} P_{e x c e s s} & = & P_{L o a d} - P_{r e n} - P_{B S S} \end{matrix}

(22)

\begin{matrix} P_{S C} & = & P_{e x c e s s} \end{matrix}

(23)

\begin{matrix} P_{G r i d} & = & P_{e x c e s s} - P_{S C} \end{matrix}

(24)

\begin{matrix} P_{e l e c t} & = & P_{e x c e s s} - P_{S C} - P_{G r i d} \end{matrix}

(25)

where $P_{Grid}$ is the power absorbed by the utility grid during peak hours.

Mode 9: Excess Power Given to SC, Grid, and Electrolyzer, while BSS Is Disconnected

This mode of operation is similar to mode 8. However, in this mode, the BSS is disconnected, i.e., $P_{BSS} = 0$. The excess power generated from RES is delivered, in that order, to the SC for charging, to the utility grid during peak hours, and to the electrolyzer for the production of hydrogen gas. The mathematical expression is:

\begin{matrix} P_{L o a d} & \leq & P_{r e n} \end{matrix}

(26)

\begin{matrix} P_{e x c e s s} & = & P_{L o a d} - P_{r e n} \end{matrix}

(27)

\begin{matrix} P_{S C} & = & P_{e x c e s s} \end{matrix}

(28)

\begin{matrix} P_{G r i d} & = & P_{e x c e s s} - P_{S C} \end{matrix}

(29)

\begin{matrix} P_{e l e c t} & = & P_{e x c e s s} - P_{S C} - P_{G r i d} \end{matrix}

(30)

The supervisory control flow chart is given in Figure 2.
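The surplus modes (Modes 6 to 9) share one allocation order: SC first, then the utility grid during its peak hours, then the electrolyzer. The following Python sketch illustrates that order. It is a minimal sketch under assumptions: surplus is taken as the positive quantity $P_{ren} + P_{BSS} - P_{Load}$, which is the opposite sign to Equation (15), and the function name, the limits `sc_limit` and `grid_limit`, and the numerical values are hypothetical.

```python
def allocate_surplus(p_load, p_ren, p_bss, soc_sc, grid_peak_hours,
                     sc_limit, grid_limit, soc_max=0.90):
    """Split surplus power among SC, utility grid and electrolyzer (Modes 6-9).

    Surplus is taken as positive here (assumed sign convention). Pass p_bss=0.0
    for Mode 9, where the BSS is disconnected.
    """
    surplus = p_ren + p_bss - p_load
    if surplus <= 0.0:
        return {"SC": 0.0, "Grid": 0.0, "Electrolyzer": 0.0}
    # The SC charges first, but only while its state of charge is below 90 %.
    p_sc = min(surplus, sc_limit) if soc_sc < soc_max else 0.0
    # The grid absorbs power only during its peak hours.
    p_grid = min(surplus - p_sc, grid_limit) if grid_peak_hours else 0.0
    # Whatever remains is used by the electrolyzer to produce hydrogen.
    p_elect = surplus - p_sc - p_grid
    return {"SC": p_sc, "Grid": p_grid, "Electrolyzer": p_elect}

# Mode 8 style example with hypothetical numbers (kW).
shares = allocate_surplus(p_load=150.0, p_ren=300.0, p_bss=50.0, soc_sc=0.5,
                          grid_peak_hours=True, sc_limit=60.0, grid_limit=100.0)
```

Here the 200 kW surplus is split into 60 kW for the SC, 100 kW for the grid and 40 kW for the electrolyzer. The sketch does not cap the electrolyzer at its rating, which a fuller model would need to do.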

4. Description and Modeling of Proposed Control Schemes

This section describes the three proposed hybrid adaptive NN Q-learning-based full recurrent adaptive NeuroFuzzy (NNQLNF) control schemes and their mathematical modeling. The proposed control schemes consist of two parts: NeuroFuzzy parameter tuning (NFPT) and an optimal action-value function $Q^{*} (x, u)$ estimator network (QEN). $Q^{*} (x, u)$ is estimated using a backpropagation (BP) neural network (NN). The NeuroFuzzy systems used are discussed in Section 6 and Section 7. The NeuroFuzzy parameters are updated using $Q^{*} (x, u)$. The action exploration modifier (AEM) guarantees the trial of all possible actions. Because the learning algorithm and fuzzy parameters adapt online, the NNQLNF control technique does not depend on prior information about future driving conditions, which makes it an advantageous control paradigm.

5. Neural Network Q-Learning-Based Full Recurrent Adaptive NeuroFuzzy Control

In control problems, Q-learning and actor-critic learning are two major types of reinforcement learning. Actor-critic learning estimates the state-value function and chooses an optimal action for every state. In Q-learning, by contrast, the system approximates the action-value function for all state-action pairs and chooses the optimal control action based on it [42,43].

The schematic of NNQLNF is given in Figure 3. It consists of two parts: the QEN, which estimates $Q^{*} (x, u)$, and the FRNF architecture, which tunes the parameters. The QEN comprises a BP NN for estimation, while the FRNF is an intelligent identifier discussed in Section 6 and Section 7. Three different FRNF architectures are used as NFPT, and, thus, three distinct hybrid adaptive FQL embedded FRNF control paradigms are derived, which are used in this research work.

5.1. Back Propagation NN for Estimating $Q^{*} (x, u)$

If the initial state is $x$ and the initial action is $u$, then the expected discounted sum of rewards gives the action-value function $Q (x, u)$, as given below [42]:

\begin{matrix} Q (x, u) = E (\sum_{k = 1}^{\infty} γ^{k} r_{t + k + a} | x_{t} = x, u_{t} = u) \end{matrix}

(31)

where $E$ is the expectation operator, $γ$ is the discount factor, and $u$ is the action. The optimal action-value function $Q^{*} (x, u)$ is given as:

\begin{matrix} Q^{*} (x, u) = E (r (x_{t + 1}) + γ \max_{u^{'}} Q^{*} (x_{t + 1}, u^{'}) | x_{t} = x, u_{t} = u) \end{matrix}

(32)
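To make the fixed-point character of Equation (32) concrete, the following Python sketch applies its right-hand side repeatedly to a toy problem with two states and two actions. The toy environment, the reward rule, and the discount factor are hypothetical and unrelated to the microgrid; the transitions are deterministic, so the expectation reduces to a single term.

```python
GAMMA = 0.5  # discount factor (illustrative value)

def next_state(x, u):
    """Action 0 keeps the state; action 1 switches to the other state."""
    return x if u == 0 else 1 - x

def reward(x_next):
    """Reward depends only on the state reached, as r(x_{t+1}) in Equation (32)."""
    return 1.0 if x_next == 1 else 0.0

def bellman_backup(q):
    """Apply the right-hand side of Equation (32) to every state-action pair."""
    new_q = [[0.0, 0.0], [0.0, 0.0]]
    for x in (0, 1):
        for u in (0, 1):
            x_next = next_state(x, u)
            new_q[x][u] = reward(x_next) + GAMMA * max(q[x_next])
    return new_q

# Start from zero and iterate; the backup is a contraction with factor GAMMA.
q_star = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(60):
    q_star = bellman_backup(q_star)
```

The iteration settles at `q_star[x][u]` equal to 2 for the actions that lead to state 1 and 1 for the others. In the article the table is replaced by the QEN, because the state and action spaces of the converter are continuous.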

The QEN approximates the optimal action-value function $Q^{*} (x, u)$ for different inputs and control output states. The approximation property of the BP NN is used for the estimation of $Q^{*} (x, u)$.

The internal architecture of the QEN is given in Figure 4. The network has a three-layer structure: an input layer, a hidden layer, and an output layer. The inputs are the state variables and the control action, and the output is the estimated action-value function, i.e., $Q (x, u)$.

$Q (x, u)$ can be represented mathematically as:

\begin{matrix} Q (x, u) = f (V) \end{matrix}

(33)

where, V gives the weighted sum input of the output node and is given as:

\begin{matrix} V = \sum_{k = 1}^{10} w_{(40 + k)} \times y_{k} \end{matrix}

(34)

where $w_{(40 + k)}$ is the weight between the $k$th hidden node and the output node, and $y_{k}$ is the output of the $k$th hidden node, given as:

\begin{matrix} y_{k} = f (a_{k}) \end{matrix}

(35)

where $a_{k}$ is the net input of the $k$th hidden node, given as:

\begin{matrix} a_{k} = \sum_{i = 1}^{4} x_{i} \times w_{(i - 1, j)} \end{matrix}

(36)

where $w_{(i - 1, j)}$ is the weight between the input and hidden nodes, $x = x_{i}$ is the QEN’s input, and $f$ is the activation function of the node. The sigmoid function is used as the activation function and is given as:

\begin{matrix} f (x) = \frac{1}{1 + e^{- x}} \end{matrix}

(37)
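Equations (33) to (37) describe a forward pass through a small feed-forward network. The Python sketch below is one way to write it, assuming four inputs (taken here as three state variables plus the control action) and ten hidden nodes, as suggested by the summation limits. The nested-list weight layout, the random initial weights, and the input values are hypothetical choices for illustration.

```python
import math
import random

N_INPUTS, N_HIDDEN = 4, 10  # sizes taken from the limits in Equations (34) and (36)

def sigmoid(z):
    """Equation (37)."""
    return 1.0 / (1.0 + math.exp(-z))

def qen_forward(inputs, w_hidden, w_out):
    """Return Q(x, u) and the hidden outputs y_k for one input vector.

    w_hidden[i][k] is the weight from input i to hidden node k (assumed layout);
    w_out[k] plays the role of w_(40+k).
    """
    a = [sum(inputs[i] * w_hidden[i][k] for i in range(N_INPUTS))
         for k in range(N_HIDDEN)]                       # Equation (36)
    y = [sigmoid(a_k) for a_k in a]                      # Equation (35)
    v = sum(w_out[k] * y[k] for k in range(N_HIDDEN))    # Equation (34)
    return sigmoid(v), y                                 # Equation (33)

rng = random.Random(0)  # fixed seed so the example is repeatable
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
            for _ in range(N_INPUTS)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
state_and_action = [0.2, -0.1, 0.4, 0.05]  # assumed: three states, then the action
q_value, hidden = qen_forward(state_and_action, w_hidden, w_out)
```

Because the output node also uses the sigmoid, the estimate `q_value` always lies strictly between 0 and 1, so rewards would need to be scaled accordingly.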

Generalized policy iteration is used for tuning the parameters of the QEN. The optimal action-value function is approximated by continuously reducing the temporal difference (TD) error, $δ_{t}$, with the help of the NN. The TD error is given as follows:

\begin{matrix} δ_{t} = r_{t + 1} + γ \max_{u^{'}} Q^{*} (x_{t + 1}, u^{'}) - Q (x_{t}, u_{t}) \end{matrix}

(38)

The cost function is the mean square error and is given as:

\begin{matrix} E = \frac{1}{2} δ_{t}^{2} \end{matrix}

(39)

The gradient descent method is used for fast convergence. The rule for weight update is given as:

\begin{matrix} w (t + 1) & = & w (t) - ζ \frac{\partial E}{\partial w} \end{matrix}

(40)

\begin{matrix} \frac{\partial E}{\partial w} & = & δ_{t} \frac{\partial δ_{t}}{\partial w} = - δ_{t} \frac{\partial Q (x_{t}, u_{t})}{\partial w} \end{matrix}

(41)

$\frac{\partial Q (x_{t}, u_{t})}{\partial w}$ can be obtained by using the chain rule for $w_{(40 + k)}$ and $w_{(i - 1, j)}$ as shown below:

\begin{matrix} \frac{\partial Q (x_{t}, u_{t})}{\partial w_{(40 + k)}} = \frac{\partial Q (x_{t}, u_{t})}{\partial V} \times \frac{\partial V}{\partial w_{(40 + k)}} \\ = f^{'} (V) \times y_{k} \end{matrix}

(42)

\begin{matrix} = y_{k} \times Q (x_{t}, u_{t}) \times (1 - Q (x_{t}, u_{t})) \\ k = 1, \dots, 10 \end{matrix}

(43)

\begin{matrix} \frac{\partial Q (x_{t}, u_{t})}{\partial w_{(i - 1, j)}} & = & \frac{\partial Q (x_{t}, u_{t})}{\partial V} \times \frac{\partial V}{\partial y_{j}} \times \frac{\partial y_{j}}{\partial a_{j}} \times \\ \frac{\partial a_{j}}{\partial w_{(i - 1, j)}} \end{matrix}

(44)

\begin{matrix} = & f^{'} (V) \times w_{(40 + j)} \times f^{'} (a_{j}) \times x_{i} \end{matrix}

(45)

\begin{matrix} = & w_{(40 + j)} \times x_{i} \times Q (x_{t}, u_{t}) \times [1 - Q (x_{t}, u_{t})] \times y_{j} \times [1 - y_{j}] \\ i = 1, \dots, 4; \quad j = 1, \dots, 10 \end{matrix}

(46)
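The gradients in Equations (43) and (46) and the update rule of Equations (40) and (41) can be sketched in a few lines of Python. This is an illustration under assumptions rather than the authors’ implementation: the TD target is held fixed during the step, the weights are stored as nested lists, and the learning rate `zeta`, the target value, and the inputs are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hid, w_out):
    """QEN forward pass: returns Q and the hidden outputs y_j."""
    y = [sigmoid(sum(x[i] * w_hid[i][j] for i in range(len(x))))
         for j in range(len(w_out))]
    return sigmoid(sum(w * y_j for w, y_j in zip(w_out, y))), y

def q_gradients(x, w_hid, w_out):
    """Analytic gradients of Q, following Equations (43) and (46)."""
    q, y = forward(x, w_hid, w_out)
    dq_dv = q * (1.0 - q)                                   # f'(V) for the sigmoid
    g_out = [dq_dv * y_j for y_j in y]                      # Equation (43)
    g_hid = [[dq_dv * w_out[j] * y[j] * (1.0 - y[j]) * x_i  # Equation (46)
              for j in range(len(w_out))] for x_i in x]
    return q, g_out, g_hid

def td_update(x, target, w_hid, w_out, zeta=0.1):
    """One gradient-descent step on E = 0.5 * delta^2 (Equations (39)-(41))."""
    q, g_out, g_hid = q_gradients(x, w_hid, w_out)
    delta = target - q  # TD error with the target held fixed (assumption)
    # w <- w - zeta * dE/dw, and dE/dw = -delta * dQ/dw, hence the plus sign.
    new_out = [w + zeta * delta * g for w, g in zip(w_out, g_out)]
    new_hid = [[w + zeta * delta * g for w, g in zip(row, g_row)]
               for row, g_row in zip(w_hid, g_hid)]
    return new_hid, new_out, delta

rng = random.Random(1)
w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(10)] for _ in range(4)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(10)]
x = [0.2, -0.1, 0.4, 0.05]
target = 0.9  # stands in for r + gamma * max Q at the next state (hypothetical)
w_hid2, w_out2, delta_before = td_update(x, target, w_hid, w_out)
delta_after = target - forward(x, w_hid2, w_out2)[0]
```

After one small step the TD error for the same sample shrinks, which is the behavior the gradient-descent rule is meant to produce. Comparing `q_gradients` against finite differences is a simple way to validate the chain-rule expressions.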

$\frac{\partial Q (x_{t}, u_{t})}{\partial u}$ can also be obtained using the chain rule as given below:

\begin{matrix} \frac{\partial Q (x_{t}, u_{t})}{\partial u} & = & \frac{\partial Q (x_{t}, u_{t})}{\partial V} \times \sum_{j = 1}^{10} (\frac{\partial V}{\partial y_{j}} \times \frac{\partial y_{j}}{\partial w_{j}} \times \frac{\partial w_{j}}{\partial u}) \end{matrix}

(47)

\begin{matrix} = & f^{'} (V) \times \sum_{j = 1}^{10} (w_{j}^{(1)} f^{'} (w_{j}) \times w_{j, 4}^{(2)}) \end{matrix}

(48)

\begin{matrix} = & Q (x_{t}, u_{t}) \times (1 - Q (x_{t}, u_{t})) \times \sum_{j = 1}^{10} (w_{j}^{(1)} \times w_{j, 4}^{(2)} \times y_{j} \times [1 - y_{j}]) \end{matrix}

(49)

Equivalently, in the weight notation of Equations (34) and (36):

\begin{matrix} \frac{\partial Q (x_{t}, u_{t})}{\partial u} & = & \frac{\partial Q (x_{t}, u_{t})}{\partial V} \times \sum_{j = 1}^{10} (\frac{\partial V}{\partial y_{j}} \times \frac{\partial y_{j}}{\partial a_{j}} \times \frac{\partial a_{j}}{\partial u}) \end{matrix}

(50)

\begin{matrix} = & f^{'} (V) \times \sum_{j = 1}^{10} (w_{(40 + j)} \times f^{'} (a_{j}) \times w_{(30 + j)}) \end{matrix}

(51)

\begin{matrix} = & Q (x_{t}, u_{t}) \times (1 - Q (x_{t}, u_{t})) \times \sum_{j = 1}^{10} (w_{(40 + j)} \times w_{(30 + j)} \times y_{j} \times [1 - y_{j}]) \end{matrix}

(52)
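The sensitivity of $Q$ to the control action in Equation (52) can be checked numerically. The Python sketch below assumes that the control action is the fourth network input, so that the last row of the input-to-hidden weights plays the role of $w_{(30 + j)}$. To keep it short it uses three hidden nodes instead of ten, and all weights and inputs are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def q_and_dq_du(state, u, w_hid, w_out):
    """Return Q(x, u) and dQ/du from Equation (52).

    The control action u is fed in as the last (fourth) network input, so
    w_hid[-1][j] plays the role of w_(30+j) and w_out[j] that of w_(40+j).
    """
    inputs = list(state) + [u]
    y = [sigmoid(sum(inputs[i] * w_hid[i][j] for i in range(len(inputs))))
         for j in range(len(w_out))]
    q = sigmoid(sum(w_out[j] * y[j] for j in range(len(w_out))))
    dq_du = q * (1.0 - q) * sum(w_out[j] * w_hid[-1][j] * y[j] * (1.0 - y[j])
                                for j in range(len(w_out)))
    return q, dq_du

# Small hand-picked network with 3 hidden nodes (hypothetical weights).
w_hid = [[0.3, -0.2, 0.1],
         [0.0, 0.4, -0.3],
         [0.2, 0.1, 0.5],
         [-0.4, 0.6, 0.2]]   # last row: weights attached to the action input
w_out = [0.7, -0.5, 0.3]
state, u = [0.2, -0.1, 0.4], 0.05
q, dq_du = q_and_dq_du(state, u, w_hid, w_out)
```

The sign of `dq_du` indicates in which direction a change of the control action would raise the estimated action value, which is the kind of information a tuning rule for the NeuroFuzzy parameters can use.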

It must be noted that the control output of FRNF is an input to the NN too.

6. Full Recurrent Adaptive NeuroFuzzy Architectures

A variety of FRNF identifiers are used to identify the nonlinear functions $\hat{f} (x)$ and $\hat{g} (x)$ for different sub-systems in the SMG-HPS. The seven-layered FRNF system uses the NeuroFuzzy concept for estimation.

Fuzzy logic uses IF-THEN rules to approximate unknown functions with a standard fuzzy model. The unknown functions, $\hat{f} (x)$ and $\hat{g} (x)$, can be identified by the standard fuzzy model using a set of rules:

$R^{m}$: IF $x_{1}$ is $A_{1}^{j}$ $\dots$ and $x_{n}$ is $A_{n}^{j}$ THEN $y$ is $β_{l}^{j}$

Let the fuzzy logic controller have $q$ inputs, $ρ_{1}, ρ_{2}, \dots, ρ_{q}$. The output of the NeuroFuzzy system is given as:

\begin{matrix} Υ & = & \frac{\sum_{l = 1}^{m} \prod_{j = 1}^{q} μ_{F_{j}^{l}} (ρ_{j}) β_{l}}{\sum_{l = 1}^{m} \prod_{j = 1}^{q} μ_{F_{j}^{l}} (ρ_{j})} \end{matrix}

(53)

where

μ_{F_{j}^{l}}

is the membership function,

ρ_{j}

and

β_{l}

are adjustable parameters. It is the point in R (set of rules) at which

μ_{β_{j}}

achieves its maximum value. m is the number of fuzzy rules used to construct the identifier,

F_{j}^{l}

is the jth fuzzy set corresponding to the lth fuzzy rule, and

β_{l}

is centroid of the lth fuzzy set corresponding to identifier output,

\hat{f} (x)

and

\hat{g} (x)

. Equation (53) can be written for

\hat{f} (x)

and

\hat{g} (x)

using fuzzy-basis function vector

ξ (x)

, as:

\begin{matrix} \hat{f} (x) = β_{f}^{T} ξ (x) \end{matrix}

(54)

and

\begin{matrix} \hat{g} (x) = β_{g}^{T} ξ (x) \end{matrix}

(55)

where

\begin{matrix} β_{f} = {[β_{f 1} β_{f 2} \dots β_{f m}]}^{T} \end{matrix}

(56)

and

\begin{matrix} β_{g} = {[β_{g 1} β_{g 2} \dots β_{g m}]}^{T} \end{matrix}

(57)

and

ξ (x)

is given as

\begin{matrix} ξ = {[ξ_{1} ξ_{2} \dots ξ_{m}]}^{T} = [\frac{\prod_{j = 1}^{q} μ_{F_{j}^{1}} (ρ_{j})}{\sum_{l = 1}^{m} \prod_{j = 1}^{q} μ_{F_{j}^{l}} (ρ_{j})} \\ \dots \frac{\prod_{j = 1}^{q} μ_{F_{j}^{m}} (ρ_{j})}{\sum_{l = 1}^{m} \prod_{j = 1}^{q} μ_{F_{j}^{l}} (ρ_{j})}] \end{matrix}

(58)
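As a minimal illustration of Equations (53) and (58), the Python sketch below computes the normalized fuzzy-basis vector and the resulting identifier output as a weighted sum of rule centroids. It assumes Gaussian membership functions centred at a mean value, and all names (`gaussian_mf`, `fuzzy_basis`, `identifier_output`) are hypothetical helpers, not part of the article.

```python
import math

def gaussian_mf(x, mean, width):
    """Gaussian membership degree of x (assumed form, centred at mean)."""
    return math.exp(-((x - mean) / width) ** 2)

def fuzzy_basis(rho, means, widths):
    """Normalized firing strengths, one entry per rule.

    rho    : list of q inputs
    means  : means[l][j] is the centre of the jth fuzzy set of rule l
    widths : widths[l][j] is the corresponding width
    """
    strengths = [
        math.prod(gaussian_mf(r, m, s) for r, m, s in zip(rho, rule_m, rule_s))
        for rule_m, rule_s in zip(means, widths)
    ]
    total = sum(strengths)
    return [s / total for s in strengths]

def identifier_output(beta, xi):
    """Inner product of the centroid vector beta and the basis vector xi."""
    return sum(b * x for b, x in zip(beta, xi))
```

Because the basis vector sums to one, the output is a convex combination of the rule centroids, so it always lies between the smallest and largest entry of `beta`.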

Several mathematical relations and functions are available for designing a fast and robust NeuroFuzzy identifier. The following variants were used to design antecedent and the consequent part of the fuzzy logic system for this research work.

6.1. Variants of Antecedent Part

Fuzzification is the transformation of continuous input variables into linguistic variables. A membership function is always required for this transformation. The membership function matters because of its shape, which carries the information about the plant (uncertainties and nonlinearities) into the fuzzy inference system.

6.1.1. Gaussian Membership Function

The Gaussian membership function has the following properties:

local and nonlinear nature
smooth output

Gradient-based techniques are highly suitable for use, because the Gaussian membership function is continuously differentiable. It is expressed as:

\begin{matrix} μ_{j}^{r} (x_{i}) & = \exp [- {(\frac{x_{i} (k) - m_{i j}}{σ_{i j}})}^{2}] \end{matrix}

(59)

where

m_{i j}

and

σ_{i j}

are the mean and variance of the ith input and jth membership function.

6.1.2. B-Spline Membership Function

The B-Spline membership function (locally controllable membership function formed by polynomial pieces) is used as a variant of the antecedent part. The B-Spline membership function is defined as [44]:

ϱ_{i j} (x_{i}) = O_{i}^{(2)} = \begin{matrix} \sum_{k = 0}^{n} Υ_{k} Δ_{k, p} (x_{i}) \end{matrix} \begin{matrix} 1 \leq p \leq n . \end{matrix}

(60)

where,

ϱ_{i j}

represents the degree of B-Spline membership function,

Υ_{k}

is the control point with

k = 0, 1, 2, 3, \dots, n

with

n + 1

total control points. p is the order of B-Spline basis function.

Δ_{k, p} (x_{i}) = Δ (x_{i} \ ℘_{1}, ℘_{2}, \dots, ℘_{n + p})

is the kth B-Spline basis function and is given by the following Cox-de Boor recursion formula [44]:

\begin{matrix} Δ_{k, p} (x) = \begin{cases} 1, & p = 1, \; x \in [℘_{k}, ℘_{k + 1}) \\ 0, & p = 1, \; x \notin [℘_{k}, ℘_{k + 1}) \\ (\frac{x - ℘_{k}}{℘_{k + p - 1} - ℘_{k}}) Δ_{k, p - 1} (x) + (\frac{℘_{k + p} - x}{℘_{k + p} - ℘_{k + 1}}) Δ_{k + 1, p - 1} (x), & p > 1, \; x \in [℘_{k}, ℘_{k + p}) \end{cases} \end{matrix}

(61)

where,

℘ = [℘_{1}, ℘_{2}, \dots, ℘_{n + p}] \in R

is the knot vector such that

℘_{i + 1} - ℘_{i} \geq 0

.

This research work employed a second-order B-Spline membership function with nine control points and thirteen-knot vectors.
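The Cox-de Boor recursion can be sketched in a few lines of Python. The example below is illustrative only: it uses zero-based indices, a uniform knot vector chosen for convenience, and the common convention that the knot vector has as many entries as the number of control points plus the order p. These choices are assumptions and do not reproduce the exact knot vector used in this work.

```python
def bspline_basis(k, p, x, knots):
    """kth B-Spline basis function of order p (degree p - 1) at x."""
    if p == 1:
        # Order-1 basis: indicator of the half-open knot span
        return 1.0 if knots[k] <= x < knots[k + 1] else 0.0
    left_den = knots[k + p - 1] - knots[k]
    right_den = knots[k + p] - knots[k + 1]
    # Terms with a zero denominator (repeated knots) are taken as zero
    left = 0.0 if left_den == 0 else (
        (x - knots[k]) / left_den * bspline_basis(k, p - 1, x, knots))
    right = 0.0 if right_den == 0 else (
        (knots[k + p] - x) / right_den * bspline_basis(k + 1, p - 1, x, knots))
    return left + right

def bspline_membership(x, control_points, p, knots):
    """Weighted sum of basis functions, in the spirit of Equation (60)."""
    return sum(c * bspline_basis(k, p, x, knots)
               for k, c in enumerate(control_points))
```

For a second-order function with nine control points and the knots 0, 1, ..., 10, the basis functions are triangular and sum to one on the interior of the knot range, so setting all control points to one gives a membership value of one there.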

6.2. Variants of Consequent Part

The consequent part generates weights based on different mathematical functions, like the Fourier series function, wavelet networks, and polynomial NN. The operation of the consequent part takes place in parallel to the antecedent part and produces the final output of the identifier at the defuzzification layer. The variants of the consequent part used in this research work are given below.

6.2.1. Fuzzy Wavelet Neural Networks (NNs)

For a better estimation of nonlinear functions, wavelet NNs were proposed as a substitute for feedforward NNs. Due to their numerous neurons, NNs may get stuck in local minima, which results in slower convergence of the network. To avoid this, wavelet functions can be used in the structure. Wavelets are waves having a limited duration and zero mean value. The localization characteristics of wavelets, and the fast learning abilities of NNs, result in better outcomes for complex nonlinear system modeling. The schematic diagram of the wavelet NN is given in Figure 5.

Following are the wavelet activation functions used in this research work.

Mexican hat wavelet (MHW) is a negative normalized, non-orthogonal second derivative of Gaussian function. MHW function is expressed as;

$\begin{matrix} Ψ_{i} (x_{i}) & = & {|d_{i j}|}^{- \frac{1}{2}} ψ (z_{i}) \end{matrix}$

(62)

where, $Ψ_{i} (x_{i})$ is the family of wavelets obtained by single $ψ (x_{i})$ function, $z_{i j} = \frac{x_{i} (k) + H_{i j} F_{i j} - t_{i j}}{d_{i j}}$ , $H_{i j}$ represents the output of the consequent part, $F_{i j}$ is adaptive recurrent feedback weight, $t_{i j}$ is the translation, and $d_{i j} \neq 0,$ for $i = 1, 2, \dots, n$ is dilation respectively.
Morlet wavelet (Mor-W) is given as [44]:

$Ψ_{i j} (z_{i j}) = \exp [- 0.5 {(z_{i j})}^{2}] \cos (5 z_{i j})$

(63)

where;

$z_{i j} = \frac{x_{i} (k) + H_{i j} F_{i j} - t_{i j}}{d_{i j}}$

(64)

where $H_{i j}$ represents the output of the consequent part, $F_{i j}$ is adaptive recurrent feedback weight, $t_{i j}$ is the translation, and $d_{i j}$ is dilation of Mor-W.
Legendre wavelets (Leg-W) are also known as spherical harmonic wavelets. They are based on Legendre polynomial, compactly supported, and orthonormal wavelets. They can be expressed as [45]:

$\begin{matrix} Ψ_{p q}^{k} (x) = \begin{cases} 2^{p + 1} \sqrt{2 k + 1} \times L_{k} (2^{p} x - q - 0.5), & \forall \frac{q}{2^{p}} \leq x \leq \frac{q + 1}{2^{p}} \\ 0, & \text{otherwise} \end{cases} \end{matrix}$

(65)

where, $p = 1, 2, \dots, m$ and $q = 0, 1, \dots, 2^{p} - 1$ shows decomposition level and integer translation, respectively. $L_{k}$ shows the Legendre polynomial as given in (66), with k being the degree of the polynomial [45].

$\begin{matrix} L_{p} (x) = \frac{1}{2^{p} p!} \frac{d^{p}}{d x^{p}} {(x^{2} - 1)}^{p} \end{matrix}$

(66)

where, $p \geq 0, x \in [- 1 1]$ . The first four Legendre polynomials for $p = 0, \dots, 3$ used in this work are given below [45]:

$\begin{matrix} L_{0} (x) & = & 1, \\ L_{1} (x) & = & x, \\ L_{2} (x) & = & 0.5 (3 x^{2} - 1), \\ L_{3} (x) & = & 0.5 (5 x^{3} - 3 x) . \end{matrix}$

(67)

Six Leg-W basis functions were used in this research work for $p = 1$ and $q = 0, \dots, 2$ defined on $[0 1]$ .
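The wavelet building blocks above can be sketched in Python as follows. The Morlet form used here places the cosine outside the exponential, which is the usual definition and is also what the derivative term in the Mor-W update of Section 8.3 implies. The Legendre polynomials are generated with Bonnet's recursion rather than the derivative formula in (66). Function names are illustrative assumptions.

```python
import math

def wavelet_argument(x, h, f, t, d):
    """Shifted and scaled argument z = (x + H F - t) / d, as in Equation (64)."""
    return (x + h * f - t) / d

def morlet(z):
    """Morlet wavelet: Gaussian envelope times cos(5 z)."""
    return math.exp(-0.5 * z * z) * math.cos(5.0 * z)

def legendre(k, x):
    """Legendre polynomial of degree k via Bonnet's recursion:
    (n + 1) L_{n+1}(x) = (2 n + 1) x L_n(x) - n L_{n-1}(x)."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, x
    for n in range(1, k):
        prev, cur = cur, ((2 * n + 1) * x * cur - n * prev) / (n + 1)
    return cur
```

The recursion reproduces the closed forms listed in (67), for example `legendre(2, x)` equals 0.5 (3 x^2 - 1), which gives a simple consistency check on any implementation of the Leg-W consequent.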

7. Proposed Full Recurrent Adaptive NeuroFuzzy Identifier

The full recurrent adaptive NeuroFuzzy (FRNF) has seven layers as shown in Figure 6. The antecedent part consists of the first three layers, whereas the remaining four layers are consequent part layers. The number of nodes, m, in the first layer equals the number of input signals, n, and these nodes are used for input distribution.

Let

I_{i}^{k}

and

O_{i}^{k}

represent the input and output of the ith node in the kth layer. The operation of the nodes and the signal propagation in each layer of the FRNF are given below.

Layer 1

The input is received at this layer and is passed to the next layer. The temporal relationship in the network is shown by the feedback connection of this layer.

The output is:

O_{i}^{(1)} = x_{i}

.

Layer 2

One of the variants described in Section 6.1 is used as the membership function in this layer. This layer estimates the membership degree and fuzzy sets.

The output of this layer, in the case of the B-Spline membership function, is:

O_{i}^{(2)} = ϱ_{i j} (k) + υ ϱ_{i j} (k - 1)

(68)

where,

ϱ

represents one of the variants given in (60),

υ

is the closed-loop adjustable feedback gain of the antecedent part in case of B-Spline membership function.

The output of this layer in the case of Gaussian membership function is:

\begin{matrix} O_{i}^{(2)} & = & \exp [- {(\frac{x_{i} (k) + O_{i}^{(2)} (k - 1) θ_{i j} - m_{i j}}{σ_{i j}})}^{2}] \end{matrix}

(69)

where,

O_{i}

represents the output of the ith node, superscript (2) indicates layer number,

i j

subscript shows the jth term of the ith input,

x_{i}

is the input, and

θ_{i j}

is the recurrent weight,

σ_{i j}

and

m_{i j}

are the variance and mean of the ith input and jth membership function. It must be noted that the recurrent weight of the Gaussian membership function in antecedent part is adaptive.

Layer 3

In the rule layer, the product of the membership function is calculated. The number of rules in this layer determines the number of nodes.

The output, in the case of B-Spline membership function, is:

\begin{matrix} O_{i}^{(3)} & = & \overset{n}{\prod_{i = 1}} O_{i}^{(2)} \end{matrix}

(70)

For the Gaussian membership function, the output of this layer is:

\begin{matrix} O_{i}^{(3)} & = & \overset{n}{\prod_{i = 1}} \exp [- {(\frac{x_{i} + O_{i}^{(2)} (k - 1) θ_{i j} - m_{i j}}{σ_{i j}})}^{2}] \end{matrix}

(71)

Layer 4

This layer approximates the weighted firing strength and shows the THEN-part of fuzzy rules. The inputs to this layer are the error signal and own weighted feedback signal.

The output of this layer for Leg-W given in Section 6.2 is:

\begin{matrix} O_{i}^{(4)} = H_{i} (k) + δ_{i} H_{i} (k - 1) \end{matrix}

(72)

where, for Leg-W using (65),

H_{i} = \sum_{p = 1}^{N} \sum_{q = 0}^{2^{p} - 1} \sum_{k = 0}^{K} w_{n m}^{k} Ψ_{p q}^{k}

,

w_{n m}

represents Leg-W coefficients. The feedback weight is a closed-loop fixed gain for this layer.

In the case of MHW and Mor-W, the output of this layer is:

\begin{matrix} O_{i}^{(4)} = w_{i} \underset{i = 1}{\sum^{n}} Ψ_{i j}^{k} (Z_{i j}) \end{matrix}

(73)

where,

w_{i}

is the weight and

Ψ_{i j}^{k} (Z_{i j})

is given in Equations (62) and (63) for MHW and Mor-W, respectively.

It must be kept in mind that the feedback gain for MHW and Mor-W is adaptive, while the feedback gain for all other variants used is conventional feedback adjustable fixed gain.

Layer 5

In the first defuzzification layer, the sum of products of antecedent and consequent parts of (70) and (72) is calculated.

The output is:

\begin{matrix} O_{i}^{(5)} & = & \underset{i = 1}{\sum^{n}} O_{i}^{(4)} O_{i}^{(3)} \end{matrix}

(74)

Layer 6

In the second defuzzification layer, the sum of all the rules from (70) is calculated.

The output is:

\begin{matrix} O_{i}^{(6)} & = & \underset{i = 1}{\sum^{n}} O_{i}^{(3)} \end{matrix}

(75)

Layer 7

The required nonlinear functions are approximated in the output layer.

The output is:

\begin{matrix} \hat{u_{f}} = O_{f}^{(7)} = \hat{f} (x) = \frac{O_{f i}^{(5)}}{O_{f i}^{(6)}} \end{matrix}

(76)

\begin{matrix} \hat{u_{g}} = O_{g}^{(7)} = \hat{g} (x) = \frac{O_{g i}^{(5)}}{O_{g i}^{(6)}} \end{matrix}

(77)
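The layer-by-layer description can be condensed into a short forward pass. The Python sketch below is a simplified illustration for the Gaussian antecedent only: it follows the sign convention of Equation (71), treats the Layer 4 outputs as given numbers, and assumes one membership function per input and rule. The function name and data layout are hypothetical.

```python
import math

def frnf_forward(x, prev_mu, m, sigma, theta, consequent):
    """One forward pass through Layers 2 to 7 of a simplified FRNF.

    x          : list of n inputs (Layer 1 passes them through unchanged)
    prev_mu    : prev_mu[j][i], Layer 2 outputs from the previous time step
    m, sigma   : m[j][i], sigma[j][i], Gaussian centre and width per rule and input
    theta      : theta[j][i], recurrent feedback weights of the antecedent part
    consequent : consequent[j], Layer 4 output of rule j (assumed precomputed)
    Returns the identifier output and the new Layer 2 outputs.
    """
    # Layer 2: recurrent Gaussian membership degrees
    mu = [[math.exp(-((x[i] + prev_mu[j][i] * theta[j][i] - m[j][i])
                      / sigma[j][i]) ** 2)
           for i in range(len(x))]
          for j in range(len(m))]
    # Layer 3: rule firing strengths as products of membership degrees
    firing = [math.prod(row) for row in mu]
    # Layer 5: sum of products of consequent outputs and firing strengths
    numerator = sum(h * f for h, f in zip(consequent, firing))
    # Layer 6: sum of all firing strengths
    denominator = sum(firing)
    # Layer 7: normalized output
    return numerator / denominator, mu
```

The returned membership degrees would be fed back as `prev_mu` at the next time step, which is what makes the antecedent part recurrent. If all rules share the same consequent value, the output equals that value regardless of the input.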

7.1. Optimization Algorithm

The FRNF is trained to fit given input-output pairs, or a given function, by fine-tuning the network parameters. Mean square error is used as the cost function for training, given as [46,47,48,49,50,51]:

\begin{matrix} E & = & \frac{1}{2} {(\hat{y} - y)}^{2}, \end{matrix}

(78)

where E is the identification error,

\hat{y}

is the approximated output of the subsystem, and y is the actual output of the subsystem, respectively. The gradient descent method is used for fast cost function reduction and convergence [46,47,48,49,50,51]. The general equation is given as follows:

\begin{matrix} Ω (k + 1) & = & Ω (k) - γ g_{k} \end{matrix}

(79)

where,

g_{k}

is the gradient of cost function at kth iteration,

γ > 0

is the learning rate and k is the iteration index.
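A minimal Python sketch of the update in Equation (79), applied to the cost in Equation (78), is given below. The one-parameter model ŷ = Ω x is a toy example chosen only to make the gradient explicit; it is an assumption for illustration and is not the FRNF itself.

```python
def gradient_step(omega, gradient, learning_rate):
    """Single gradient descent update: omega - learning_rate * gradient."""
    return omega - learning_rate * gradient

def fit_gain(x, y, omega0=0.0, learning_rate=0.1, iterations=50):
    """Fit y_hat = omega * x to one sample by minimizing E = 0.5 (y_hat - y)^2."""
    omega = omega0
    for _ in range(iterations):
        y_hat = omega * x
        # dE/d(omega) = (y_hat - y) * x for the toy model
        gradient = (y_hat - y) * x
        omega = gradient_step(omega, gradient, learning_rate)
    return omega
```

For x = 2 and y = 6 the iteration contracts towards Ω = 3 by a factor of 0.6 per step with the default learning rate, which illustrates how the learning rate controls the convergence speed.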

8. Full Recurrent Adaptive NeuroFuzzy Identifiers

The parameters of FRNF were tuned, based on the estimation of

Q (x, u)

, as discussed in Section 5.1. The optimization of FRNF output was achieved by updating the parameters of FRNF and maximizing the action-value function

Q (x, u)

w.r.t. control output u. The parameters of FRNF were tuned using the gradient descent method, described in Section 7.1. The following three different combinations of FRNF algorithms were used in this research work for tuning of parameters.

8.1. FRNF-HBs-LegW Identifier

In this FRNF, the antecedent part was the B-Spline membership function, described in Section 6.1.2, while the consequent part was the Legendre wavelet given in Section 6.2.1. The proposed identifier was a seven-layered scheme, discussed in Section 7. The schematic diagram of the identifier is shown in Figure 7. Both the antecedent and consequent parts were conventional recurrent.

The update equations of all the parameters of the antecedent part and the consequent part, according to (79), used the following chain rules:

\begin{matrix} \frac{\partial E}{\partial ζ_{i j}} = \frac{\partial E}{\partial ξ_{i}} \frac{\partial ξ_{i}}{\partial ς_{i}} \frac{\partial ς_{i}}{\partial ζ_{i j}} \end{matrix}

(80)

where,

ζ_{i j}

denotes the variants (adjustable parameters) of the B-Spline membership function defined in (60). The update equation for the B-Spline membership function in the antecedent part is:

\begin{matrix} μ_{i j} (p + 1) = μ_{i j} (p) + γ e_{q} \frac{ξ_{j} (β_{i} - \hat{u})}{μ_{i j} \sum_{j = 1}^{n} ξ_{j}} \end{matrix}

(81)

where,

\hat{u}

is the output of identifier given in Equation (76),

e_{q} = (e \frac{\partial y}{\partial ζ})

,

ξ_{j}

is output of Equation (70),

β

is output of Equation (72), update parameters are rules and weights

[μ]

.

The chain rule for consequent part is given below:

\begin{matrix} \frac{\partial E}{\partial ϑ_{i j}^{(p)}} = ψ_{i j}^{(p)} \frac{ξ_{j} (x_{i})}{\sum_{j = 1}^{n} ξ_{j} (x_{i})} \end{matrix}

(82)

where

ψ_{i}

represents Leg-W as given in (65) and

ϑ

shows the Leg-W variants. The update equations are:

\begin{matrix} ϑ_{i j}^{p} (p + 1) & = & ϑ_{i j}^{p} (p) + γ e_{q} \frac{ξ_{j}}{\sum_{j = 1}^{n} ξ_{j}} \times ψ_{i j}^{(p)} \end{matrix}

(83)

where,

e_{q} = (e \frac{\partial y}{\partial ϑ})

,

ξ_{j}

is output of Equation (70), and update parameters are polynomials of Leg-P

[ϑ_{i j}^{p}]

given in Equation (67).

8.2. FRNF-MHW Identifier

In this FRNF, the identifier output is given by Equation (76). The antecedent part in this control scheme is the Gaussian membership function, described in Section 6.1.1, while the consequent part is the MHW described in Section 6.2.1. The FRNF-MHW identifier has a seven-layered architecture, given in Section 7. The schematic diagram of the identifier is shown in Figure 8. The recurrent part in this identifier is adaptive for both antecedent and consequent parts.

The update equations of all the parameters of the antecedent part, according to (79), use the following chain rules:

\begin{matrix} \frac{\partial E}{\partial ζ_{i}} = \frac{\partial E}{\partial ξ_{i}} \frac{\partial ξ_{i}}{\partial μ_{i}} \frac{\partial μ_{i}}{\partial ζ_{i}} \end{matrix}

(84)

where,

ζ_{i}

denotes the variants of the Gaussian membership function, such as the mean, variance, and feedback weight.

\begin{matrix} m_{i} (k + 1) = m_{i} (k) + γ e_{q} [\frac{(β_{i} - \hat{u}) ξ_{i}}{\sum_{j = 1}^{n} ξ_{j}}] \times \\ [\frac{x_{i} (k) + μ_{i} (k - 1) θ_{i} - m_{i}}{{(σ_{i})}^{2}}] \end{matrix}

(85)

\begin{matrix} σ_{i} (k + 1) = σ_{i} (k) + γ e_{q} [\frac{(β_{i} - \hat{u}) ξ_{i}}{\sum_{j = 1}^{n} ξ_{j}}] \times \\ [\frac{(x_{i} (k) + μ_{i} (k - 1) θ_{i} - m_{i})}{{(σ_{i})}^{3}}] \end{matrix}

(86)

\begin{matrix} θ_{i} (k + 1) = θ_{i} (k) - γ e_{q} [\frac{(β_{i} - \hat{u}) ξ_{i}}{\sum_{j = 1}^{n} ξ_{j}}] \times \\ [\frac{x_{i} (k) + μ_{i} (k - 1) θ_{i} - m_{i}}{{(σ_{i})}^{2}}] \end{matrix}

(87)

where,

\hat{u}

is the output of identifier,

e_{q} = (e \frac{\partial y}{\partial ζ})

,

ξ_{j}

is output of Equation (70),

β

is output of Equation (72), update parameters are mean, variance, and feedback weight

[m σ θ]

.

The chain rule for the consequent part is given below:

\begin{matrix} \frac{\partial E}{\partial ϑ_{i}} = \frac{\partial E}{\partial ξ_{i}} \frac{\partial ξ_{i}}{\partial β_{i}} \frac{\partial β_{i}}{\partial Ψ_{i}} \frac{\partial Ψ_{i}}{\partial z_{i}} \frac{\partial z_{i}}{\partial ϑ_{i}} \end{matrix}

(88)

where

Ψ_{i}

represents MHW given in Equation (62),

z_{i}

defined below is an intermediate state variable and

ϑ

shows the variants of MHW, such as translation, dilation, and feedback weight.

\begin{matrix} z_{i} = (\frac{x_{i} (k) + H_{i} F_{i} - t_{i}}{d_{i}}) \end{matrix}

(89)

\begin{matrix} t_{i} (k + 1) = t_{i} (k) - γ e_{q} (\frac{ξ_{i} w_{i}}{\sum_{j = 1}^{n} ξ_{i}}) \times \\ [e^{- 0.5 {(z_{i})}^{2}} \frac{[\frac{0.5}{z_{i}} - 3.5 z_{i} + {(z_{i})}^{3}]}{| d_{i} |^{3 / 2}}] \end{matrix}

(90)

\begin{matrix} d_{i} (k + 1) = d_{i} (k) - γ e_{q} (\frac{ξ_{i} w_{i}}{\sum_{j = 1}^{n} ξ_{i}}) \times \\ [e^{- 0.5 {(z_{i})}^{2}} \frac{[0.5 - 3.5 z_{i}^{2} + {(z_{i})}^{4}]}{| d_{i} |^{3 / 2}}] \end{matrix}

(91)

\begin{matrix} F_{i} (k + 1) = F_{i} (k) + γ e_{q} (\frac{ξ_{i} w_{i}}{\sum_{j = 1}^{n} ξ_{i}}) \times \\ [e^{- 0.5 {(z_{i})}^{2}} \frac{[\frac{0.5}{z_{i}} - 3.5 z_{i} + {(z_{i})}^{3}]}{| d_{i} |^{3 / 2}}] H_{i} \end{matrix}

(92)

where,

e_{q} = (e \frac{\partial y}{\partial ϑ})

,

ξ_{j}

is output of Equation (70), and update parameters are translation, dilation, and feedback weight

[t d F]

.

The weight of the consequent part is updated according to the following chain rule:

\begin{matrix} \frac{\partial E}{\partial w_{i}} = \frac{\partial E}{\partial ξ_{i}} \frac{\partial ξ_{i}}{\partial β_{i}} \frac{\partial β_{i}}{\partial w_{i}} \end{matrix}

(93)

where

w_{i}

represents the weight of the consequent layer.

\begin{matrix} w_{i} (k + 1) = w_{i} (k) + γ e_{q} (\frac{ξ_{i}}{\sum_{j = 1}^{n} ξ_{i}}) [ψ_{i + 1} + ψ_{i}] \end{matrix}

(94)

where,

ψ

shows the MHW given in Equation (62).

8.3. FRNF-Mor-W Identifier

In this FRNF, the identifier output is given by Equation (76). The antecedent part in this control scheme is the Gaussian membership function, described in Section 6.1.1, while the consequent part is the Morlet wavelet given in Section 6.2.1. The FRNF-Mor-W identifier has a seven-layered architecture, given in Section 7. The schematic diagram of the identifier is shown in Figure 9. The recurrent part in this identifier is adaptive for both antecedent and consequent parts.

The update equations of all the parameters of the antecedent part, according to (79), use the following chain rules:

\begin{matrix} \frac{\partial E}{\partial ζ_{i}} = \frac{\partial E}{\partial ξ_{i}} \frac{\partial ξ_{i}}{\partial μ_{i}} \frac{\partial μ_{i}}{\partial ζ_{i}} \end{matrix}

(95)

where,

ζ_{i}

denotes the variants of the Gaussian membership function, such as the mean, variance, and feedback weight.

\begin{matrix} m_{i} (k + 1) = m_{i} (k) + γ e_{q} [\frac{(β_{i} - \hat{u}) ξ_{i}}{\sum_{j = 1}^{n} ξ_{j}}] \times \\ [\frac{x_{i} (k) + μ_{i} (k - 1) θ_{i} - m_{i}}{{(σ_{i})}^{2}}] \end{matrix}

(96)

\begin{matrix} σ_{i} (k + 1) = σ_{i} (k) + γ e_{q} [\frac{(β_{i} - \hat{u}) ξ_{i}}{\sum_{j = 1}^{n} ξ_{j}}] \times \\ [\frac{(x_{i} (k) + μ_{i} (k - 1) θ_{i} - m_{i})}{{(σ_{i})}^{3}}] \end{matrix}

(97)

\begin{matrix} θ_{i} (k + 1) = θ_{i} (k) - γ e_{q} [\frac{(β_{i} - \hat{u}) ξ_{i}}{\sum_{j = 1}^{n} ξ_{j}}] \times \\ [\frac{x_{i} (k) + μ_{i} (k - 1) θ_{i} - m_{i}}{{(σ_{i})}^{2}}] \end{matrix}

(98)

where,

\hat{u}

is the output of identifier given in Equation (76),

e_{q} = (e \frac{\partial y}{\partial ζ})

,

ξ_{j}

is output of Equation (70),

β

is output of Equation (72), update parameters are mean, variance, and feedback weight

[m σ θ]

.

The chain rule for the consequent part is given below:

\begin{matrix} \frac{\partial E}{\partial ϑ_{i}} = \frac{\partial E}{\partial ξ_{i}} \frac{\partial ξ_{i}}{\partial ς_{i}} \frac{\partial ς_{i}}{\partial Ψ_{i}} \frac{\partial Ψ_{i}}{\partial ϑ_{i}} \end{matrix}

(99)

where

Ψ_{i}

represents Mor-W as given in (63) and

ϑ

shows the variants of Mor-W, such as translation and dilation, as given in (63). The update equations are:

\begin{matrix} t_{i j} (p + 1) & = & t_{i j} (p) + γ e_{q} ℧ \end{matrix}

(100)

\begin{matrix} d_{i j} (p + 1) & = & d_{i j} (p) + γ e_{q} ℧ z_{i j} \end{matrix}

(101)

\begin{matrix} w_{i j} (p + 1) & = & w_{i j} (p) + γ e_{q} \frac{ξ_{j}}{\sum_{j = 1}^{n} ξ_{j}} \end{matrix}

(102)

where,

e_{q} = (e \frac{\partial y}{\partial ϑ})

,

ξ_{j}

is output of Equation (70),

℧ = \frac{ξ_{j} w_{j}^{(5)}}{\sum_{j = 1}^{n} ξ_{j}} \times \frac{(c o s (5 z_{i j}) e^{- 0.5 z_{i j}^{2}} z_{i j} + 5 s i n (5 z_{i j}) e^{- 0.5 z_{i j}^{2}})}{d_{i j}^{(5)}}

, and update parameters are rules, translation, dilation, and weight

[μ t d w]

.

9. Exploration Policy and Action Modifier

If all the actions from all the states are tried, then

Q (x, u)

converges to

Q^{*} (x, u)

, with a probability of 1 [52]. In this research work, an exploration policy was implemented that guaranteed the trial of all possible actions for the control output

\hat{u}

given by the FRNF. The AEM generated a control output

u_{c}

. The

u_{c}

was the sum of a disturbed action

u_{d}

and

\hat{u}

. The standard deviation of

u_{d}

was

σ_{Q (t)}

, which was recommended by FRNF, and its mean was zero. This AEM solved the exploration problem in reinforcement learning [42].

\begin{matrix} u_{c} & = & \hat{u} + u_{d}, \quad u_{d} \sim N (0, σ_{Q (t)}) \end{matrix}

(103)

where,

\begin{matrix} σ_{Q (t)} & = & \frac{k}{[1 + 2 \exp (\max Q (x, u))]} \end{matrix}

(104)

where, k is a variable that shrinks or expands

u_{d}

.
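The exploration rule in Equations (103) and (104) can be sketched in Python as below. The function names and the use of a seeded `random.Random` generator are illustrative assumptions; the article does not specify how the noise is drawn in practice.

```python
import math
import random

def exploration_sigma(k, q_max):
    """Standard deviation of the disturbance, k / (1 + 2 exp(max Q))."""
    return k / (1.0 + 2.0 * math.exp(q_max))

def explore(u_hat, k, q_max, rng):
    """Return u_c = u_hat + u_d, with u_d drawn from N(0, sigma)."""
    sigma = exploration_sigma(k, q_max)
    u_d = rng.gauss(0.0, sigma)
    return u_hat + u_d

# Example: a larger estimated Q gives a smaller disturbance.
rng = random.Random(0)
u_c = explore(0.7, 1.0, 0.5, rng)
```

The design intent visible in (104) is that the disturbance shrinks as the estimated action value grows, so the controller explores widely when the current action looks poor and settles as it improves.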

10. Proposed Hybrid Adaptive Neural Network Q-Learning-Based Full Recurrent Adaptive NeuroFuzzy Control Paradigms

Using different combinations of the FRNF identifier algorithms, discussed in Section 8, the following three hybrid adaptive Q-learning control paradigms were proposed:

Hybrid adaptive full recurrent Legendre wavelet-based Neural Network Q-learning control
In this control scheme, the NNQLNF was embedded with an FRNF-Leg wavelet identifier, discussed in Section 5.1 and Section 8.1.
Hybrid adaptive full recurrent Mexican hat wavelet-based Neural Network Q-learning control
In this control scheme, the NNQLNF was embedded with the FRNF-MHW identifier, discussed in Section 5.1 and Section 8.2.
Hybrid adaptive full recurrent Morlet wavelet-based Neural Network Q-learning-based control
In this control scheme, the NNQLNF was embedded with the FRNF-Mor-W identifier, discussed in Section 5.1 and Section 8.3.

11. Implementation Procedure of Hybrid Adaptive NNQLNF Control Paradigms

The following steps took place for the implementation of the proposed hybrid adaptive NNQLNF control paradigms [42].

1. In the first step, $Q (x_{t}, u_{t})$ , the parameters of FRNF, and the weights $w_{1} - w_{40}$ and $w_{41} - w_{50}$ of QEN were initialized.
2. Control output $u_{t}$ was obtained from the FRNF identifier.
3. $u_{t}$ was then processed by AEM, according to Equation (103).
4. $u_{c}$ was the actual control output fed to the system.
5. The estimated $Q (x_{t + 1}, u_{t + 1})$ was obtained from QEN, depending on control action, previous and current states.
6. The performance of controller r, $Q (x_{t}, u_{t})$ and $Q (x_{t + 1}, u_{t + 1})$ were used to estimate the TD error from Equation (38).
7. QEN was updated, based on Equations (43) and (46).
8. Parameters of FRNF were updated.
9. $Q (x_{t}, u_{t})$ was updated to $Q (x_{t + 1}, u_{t + 1})$ .
10. If the parameters of QEN and FRNF were not updated for a specific interval of time, then the learning procedure terminated; otherwise, the algorithm was repeated from step 2.
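The procedure above can be outlined as a generic loop. The Python skeleton below is only a structural sketch: every callable passed in (`actor`, `explore`, `plant`, `critic`, `reward`, `update_critic`, `update_actor`) is a hypothetical placeholder for the FRNF, AEM, SMG-HPS, and QEN components, the TD error is written in the common form r + discount × Q(next) − Q(current), which is assumed here rather than copied from Equation (38), and a fixed number of steps replaces the termination test.

```python
def run_learning_loop(actor, explore, plant, critic, reward,
                      update_critic, update_actor, x0, steps, discount=0.9):
    """Run a fixed number of learning steps and return the TD errors."""
    x = x0
    u = explore(actor(x))            # control from the actor, then exploration
    q = critic(x, u)                 # initial action-value estimate
    td_errors = []
    for _ in range(steps):
        x_next = plant(x, u)                     # apply the control to the system
        u_next = explore(actor(x_next))          # next control action
        q_next = critic(x_next, u_next)          # estimated Q for the next pair
        # Assumed standard temporal-difference error
        td = reward(x_next, u) + discount * q_next - q
        update_critic(td, x, u)                  # critic (QEN) update
        update_actor(x, u)                       # actor (FRNF) update
        x, u, q = x_next, u_next, q_next         # shift to the next time step
        td_errors.append(td)
    return td_errors
```

Passing the components as callables keeps the loop independent of any particular identifier, so the same skeleton could in principle host each of the three FRNF variants.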

12. PQ Control of Interlinking Inverter Using Hybrid Adaptive NN Based Q-Learning Full Recurrent Adaptive NeuroFuzzy Control Paradigms

This section focuses on the PQ control of the interlinking inverter of SMG-HPS using the adaptive hybrid NNQLNF control paradigms, discussed in Section 5. The main aim of this research was to minimize the THD, frequency, and voltage fluctuations, which arise due to the use of multiple renewable energy sources and non-renewable energy sources along with converters and nonlinear loads. The objective was achieved by PQ control using the three proposed adaptive hybrid NNQLNF-based control schemes on interlinking inverters. The results were compared with an aPID control scheme.

13. Formulation of Control Problem

The active and reactive powers were obtained from the AC-bus. The difference between actual and reference power generated the error signal, e, which was used for tuning the parameters of the proposed control schemes.

\begin{matrix} e_{p} = P_{r e f} - P_{A C} \end{matrix}

(105)

\begin{matrix} e_{q} = Q_{r e f} - Q_{A C} \end{matrix}

(106)

where,

e_{p}

is the real power error,

P_{r e f}

is the real reference power,

P_{A C}

is the real AC bus power,

e_{q}

is the reactive power error,

Q_{r e f}

is the reactive reference power and

Q_{A C}

is the reactive AC-bus power.

The error signals were input to the proposed control schemes. The control output obtained from Equation (103) of both real and reactive power was transformed from dqo to abc and was fed to the PWM generator. The control objective was achieved by the convergence of the following signals:

\begin{matrix} lim_{t \to + \infty} \{\begin{matrix} P_{A C} (t) \to P_{r e f} (t) \\ Q_{A C} (t) \to Q_{r e f} (t) \end{matrix} \end{matrix}

(107)

The proposed control schemes achieved the objective and, thus, produced minimum THD and negligible fluctuations in frequency and voltage profiles.
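The error signals in Equations (105) and (106) and the dq0-to-abc step can be illustrated with a short Python sketch. The inverse transform below uses one common amplitude-invariant convention with the d-axis aligned to phase a; the article does not state which convention was used in the simulation, so this choice and the function names are assumptions.

```python
import math

def power_errors(p_ref, p_ac, q_ref, q_ac):
    """Real and reactive power errors, reference minus measured AC-bus value."""
    return p_ref - p_ac, q_ref - q_ac

def dq0_to_abc(d, q, zero, theta):
    """Inverse Park transform (assumed convention: d-axis aligned with phase a)."""
    shift = 2.0 * math.pi / 3.0
    a = d * math.cos(theta) - q * math.sin(theta) + zero
    b = d * math.cos(theta - shift) - q * math.sin(theta - shift) + zero
    c = d * math.cos(theta + shift) - q * math.sin(theta + shift) + zero
    return a, b, c
```

With a zero-sequence component of zero the three phase signals sum to zero at every angle, which is a convenient sanity check before the signals are passed to a PWM generator.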

14. Results and Discussion

The SMG-HPS testbed was prepared and simulated in Matlab/Simulink R2015a. The technical details of the SMG-HPS are discussed in Section 2. The real-time environmental data (wind speed, ambient temperature, and solar irradiance) were obtained from the Pakistan Meteorological Department (PMD) for a complete solar day at the Islamabad station. The case study taken was the Defense Housing Authority (DHA), Islamabad, Pakistan. The irradiation varied with the appearance of the sun. The average irradiation level during day time was about 1000 W/m$^{2}$, while the average temperature level was 20 °C, having a maximum peak of about 42.6 °C during the day time. Figure 10 shows the irradiance profile on the left y-axis and the temperature profile on the right y-axis, while Figure 11 shows the wind speed profile used for this case study.

In this study, three intelligent hybrid adaptive NNQLNF control schemes were implemented on an interlinking converter in the proposed SMG-HPS. The performances of all the control schemes were compared with an aPID control scheme. Based on the simulation results, the best performing controller was identified.

For stability of the power system, the net power on the AC-bus should be equal to zero. Figure 12 shows the net active power profiles of the AC-bus for three NNQLNF-based controllers and an aPID control scheme. The results show that the net power under the aPID was of higher magnitude than under the proposed control schemes. The negligible variations in the net real power on the AC-bus were due to the use of power electronic devices, converters, and nonlinear loads. The least magnitude was observed for the hybrid adaptive FRNF-MHW-based NNQLNF control scheme.

Figure 13 shows the net reactive power profile of the AC-bus for the proposed control schemes, compared with the aPID. The plots show that the variation due to the aPID was of greater magnitude than that of the proposed control schemes. The variation in net reactive power of the AC-bus due to the proposed control schemes was of negligible magnitude. Little variation in the results was due to the use of multiple RESs that produced varying outputs, depending on weather conditions. Other sources of variation in the net active and reactive power profiles were the power electronic converters, CS, and nonlinear loads.

For the stability of the power system and life of various loads, the frequency of the AC-bus must remain close to 50 Hz. Figure 14 shows the frequency of the AC-bus maintained by the interlinking converter during various modes of operation in SMG-HPS. The results revealed that the frequency of the AC-bus for an aPID control scheme had greater variations, which could lead to instability of the power system. Such large frequency deviations at the AC-bus reduce the life of power electronic devices and other sensitive loads. Frequency variations also cause imperfect charging/discharging of power storage devices and thus reduce their efficiency. It must be noted that the frequency maintained by using the proposed intelligent NNQLNF-based control schemes for interlinking converters was nearly 50 Hz all the time. The negligible variations found in the simulation results were because of continuous switching of power between the AC sub-grid and the DC sub-grid. Other factors included the use of weather-dependent RES, power electronic converters, and nonlinear loads.

Table 2 shows the average values of the $ΔP_{AC-bus}$ and $ΔQ_{AC-bus}$ variations.

Figure 15 shows the percentage change in load current frequency for the various control schemes. The results revealed that the proposed control schemes performed better than the aPID control scheme. The negligible variation in the results for the proposed control schemes was due to sudden load changes, inverters, regulators, and charging/discharging operation of multiple devices. The variations were also because of power transfer to the utility grid during peak hours and bidirectional power transfer between the charging station and micro-grid. However, the results obtained from the proposed hybrid adaptive FRNF-MHW-based NNQLNF control scheme were more satisfactory and reliable, compared to the other control schemes.

Figure 16 shows the RMS voltage profiles for the various control schemes. The results showed that the RMS voltage for the aPID control scheme was not stable for any specific time interval and was continuously varying. This reduced the life of power electronic devices and other machines connected to the SMG-HPS. However, the RMS voltage for the proposed control schemes showed stable magnitude, compared with an aPID control scheme. The variations were because of switching of power between the AC sub-grid and the DC sub-grid, along with the varying output power of RES due to weather conditions.

Figure 17 shows the comparison of the percentage change in THD for the load current. The %ΔTHD in load current due to the proposed control schemes was of lower magnitude, as compared to an aPID control scheme. The %ΔTHD complied with IEEE standard 1547 [53]. However, the least magnitude was observed for the proposed adaptive FRNF-MHW-based NNQLNF control scheme, which proved its performance over other control schemes.

Table 3 shows the average values of %$Δf_{Load}$ and %$ΔTHD_{Load}$ in load current.

Figure 18 shows the maximum output power obtained for the best performing controller given in the previously published article [28], with the same SMG-HPS. The interlinking converter in the research work referred to above was controlled by an aPID controller. However, in this research work, the maximum power output of the PV array was obtained with different proposed controllers for the interlinking inverter. It is clear from the figure that the overall maximum power was obtained for the proposed controllers of the interlinking inverter. However, the maximum power was seen for the proposed adaptive FRNF-MHW-based NNQLNF control scheme, which proved its performance over other control schemes.

Figure 19 shows the maximum output power obtained with the best-performing controller from a previously published article [29], using the same SMG-HPS. In that work, the interlinking converter was controlled by an aPID controller. In this research work, the maximum power output of the wind turbine was obtained with the different proposed controllers for the interlinking inverter. The figure shows that the overall maximum power was obtained with the proposed controllers of the interlinking inverter, and the highest was observed for the proposed adaptive FRNF-MHW-based NNQLNF control scheme, which demonstrates its advantage over the other control schemes.

Figure 20 shows a spider chart for comparable parameters of the interlinking converter controllers. The values were scaled ($\Delta P_{AC}$ by 100, $\Delta Q_{AC}$ by 20, $\%\Delta f$ by 10, and $\%\Delta\mathrm{THD}$ by 15) to give a clear picture. The observations and analysis revealed that the hybrid adaptive FRNF-MHW-based NNQLNF control scheme provided superior performance over all other control schemes.

15. Conclusions

In this article, the mathematical modeling of the proposed hybrid adaptive FRNF, based on an NN Q-learning control scheme, was discussed in detail. The proposed control schemes used a BP NN, FRNF, and AEM for estimation of the optimal action-value function, parameter updates, and action exploration modification. An intelligent supervisory control system with nine modes of operation was also described in detail. The supervisory control was responsible for optimal power flow and ensured the power balance between generation and load.

This research work also focused on three adaptive FRNF-based NN Q-learning control schemes for the control of a bidirectional interlinking converter between the hybrid AC and DC sub-grids in the proposed SMG-HPS. The results were compared with an aPID control scheme. The use of intelligent controllers ensured the following: (i) power system stability, (ii) power quality, and (iii) reliability. These benefits resulted from reducing (a) $\%\Delta\mathrm{THD}$ and (b) $\%\Delta f$ in the load current for various modes of power system operation. The proposed control schemes also improved stability by reducing the net real and reactive powers on the AC-buses. The overall best performance was observed for the proposed adaptive FRNF-MHW-based NN Q-learning control scheme.

16. Future Work

Recommended directions for future work include intelligent adaptive supervisory control based on NN/fuzzy logic, advanced control schemes to schedule BSS/PHEV charging/discharging for profit gain, and the inclusion of priority-based sensitive loads such as data centers and hospitals.

Author Contributions

M.A. and L.K. designed the idea. M.A. worked on the mathematical analysis of the control scheme. M.A. did background research and implemented the model in the software. M.A. conducted the simulations. M.A., S.G.K., Q.A. and M.J. compiled and analyzed the results under the supervision of L.K. M.A. formulated the draft. L.K., S.G.K., Q.A. and M.J. tailored the article. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations were used in this manuscript:

AEM	Action exploration modifier
aPID	Adaptive PID
BP	Back propagation
BIC	Bidirectional interlinking converter
BSS	Battery storage system
CS	Charging station
DG	Distributed generator
ESS	Energy storage system
FPT	Fuzzy parameter tuning
FQL	Fuzzy Q-learning
FRNF	Full recurrent adaptive Neurofuzzy
Leg-W	Legendre wavelet
MHW	Mexican hat wavelet
Mor-W	Morlet wavelet
MT	Micro-turbine
NFPT	Neurofuzzy parameter tuning
NN	Neural network
NNQLNF	Neural network Q-learning based full recurrent adaptive Neurofuzzy
PHEV	Plug-in hybrid electric vehicle
PQ	Real power (P) and reactive power (Q)
PV	Photovoltaic
QEN	Q-estimator network
RES	Renewable energy resource
SC	Supercapacitor
SMG-HPS	Smart microgrid hybrid power system
SOFC	Solid oxide fuel cell
SOC	State of charge
SPF	Single point failure
TD	Temporal difference
THD	Total harmonic distortion
WT	Wind Turbine

Appendix A. Entire System Parameters

Appendix A.1. Parameters of VS-WECS

Table A1 below gives the parameters of VS-WECS (wind turbine) used for this research work:

Table A1. Parameters of VS-WECS.

Type	nED-100
Base wind speed	10 m/s
Rotor speed	3367 rpm
Drive train	2-mass model
Pitch angle	$0^{\circ}$
Rated power	100 kW

Appendix A.2. Parameters of SOFC

Table A2 below gives the parameters of SOFC used for this research work.

Table A2. Parameters of SOFC.

Type	Bloom Energy USA ES-5700
Number of cells in series	768
SOFC stack	4 kW
SOFC array	$5 \times 10 = 50$
SOFC array power rating	50 × 4 kW = 200 kW

Appendix A.3. Parameters of PV

Table A3 below gives the parameters of PV subsystem used for this research work.

Table A3. Parameters of PV-farm.

Type	SunPower SPR-305-WHT
Module unit	305 W @ 1 kW/m$^{2}$, 25 $^{\circ}$C
Number of series string/module	13
Number of parallel string/module	66
Power rating	$305 \times 13 \times 66 \approx 262$ kW

Appendix A.4. Parameters of Charging Station

Table A4 below gives the parameters of PHEVs and BSS installed at CS used in this research work.

Table A4. Ratings of PHEV’s batteries.

Vehicle Company	Battery Type	Battery Capacity (kWh)	Rated Voltage (V)
Nissan	Li-ion	24.0	360
Renault	Li-ion	22.0	300
Mitsubishi	Li-ion	16.0	20
Toyota	Li-ion	6.7	300
Honda	Li-ion	4.4	201

Appendix A.5. Modeling and Parameters of Battery

BSS is an integral part of an SMG-HPS and provides high energy density. It stores electrical power from the SMG-HPS during off-peak hours and thus helps utilize the RESs to their maximum. The stored electrical energy is returned to the SMG-HPS during peak hours, making it more flexible and reliable. The important parameters of a BSS are the state of charge (SOC) and the battery voltage. The battery voltage is given by [54,55,56,57]:

$$V_{bat} = V_{oc} - r_{i} \cdot I_{bat} \tag{A1}$$

where $V_{bat}$, $V_{oc}$, $r_{i}$, and $I_{bat}$ are the terminal voltage, open-circuit voltage, internal resistance, and output current of the battery, respectively. The output current $I_{bat}$ of the battery can be estimated as:

$$I_{bat} = \frac{V_{oc} - \sqrt{V_{oc}^{2} - 4 \cdot r_{i} \cdot P}}{2 r_{i}} \tag{A2}$$

The following equation shows the Coulomb counting method for the estimation of the SOC of a battery:

$$SOC_{bat} = SOC_{bat}^{ini} - \int \frac{\eta \cdot I_{bat}}{q} \, dt \tag{A3}$$

where $SOC_{bat}$, $SOC_{bat}^{ini}$, $\eta$, and $q$ represent the battery SOC, battery initial SOC, charge/discharge mode, and battery capacity (ampere-hour), respectively.
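Equations (A1)–(A3) can be exercised numerically. The sketch below is illustrative only and rests on several assumptions not stated in the text: $P$ is read as the battery terminal power, since (A2) is the smaller root of $r_{i} I_{bat}^{2} - V_{oc} I_{bat} + P = 0$, which follows from substituting (A1) into $P = V_{bat} I_{bat}$; $\eta$ is treated as a dimensionless factor; SOC is a fraction between 0 and 1; and the integral is approximated by a rectangular sum with time converted from seconds to hours to match a capacity in ampere-hours. All numerical values and function names are hypothetical.

```python
import math

def battery_current(v_oc, r_i, p):
    """Eq. (A2): current (A) for terminal power p (W)."""
    disc = v_oc ** 2 - 4.0 * r_i * p
    if disc < 0:
        raise ValueError("requested power exceeds what this battery model can deliver")
    return (v_oc - math.sqrt(disc)) / (2.0 * r_i)

def terminal_voltage(v_oc, r_i, i_bat):
    """Eq. (A1): terminal voltage (V)."""
    return v_oc - r_i * i_bat

def soc_step(soc, i_bat, dt_s, q_ah, eta=1.0):
    """One rectangular-rule step of Eq. (A3); dt_s in seconds, q_ah in ampere-hours."""
    return soc - eta * i_bat * (dt_s / 3600.0) / q_ah

# Hypothetical operating point (not taken from Table A5).
v_oc, r_i, q_ah = 400.0, 0.05, 200.0
p = 20_000.0                                   # 20 kW discharge
i_bat = battery_current(v_oc, r_i, p)
v_bat = terminal_voltage(v_oc, r_i, i_bat)

soc = 0.8
for _ in range(3600):                          # one hour in 1 s steps
    soc = soc_step(soc, i_bat, 1.0, q_ah)
```

With these sign conventions a positive current discharges the battery, so the SOC falls; a negative $P$ would give a negative current and a rising SOC.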

Table A5 below gives the parameters of batteries used in this research work:

Table A5. Parameters of batteries.

Type	CINCO FM/BB12100T
Capacity	200 Ah
Voltage/string	12 V
Number of parallel strings	3
Number of series strings	34
Rated voltage	12 × 34 ≈ 400 V

Appendix A.6. Parameters of Electrolyzer

Table A6 below gives the parameters of electrolyzer used in this research work.

Table A6. Parameters of electrolyzer.

Type	QualeanQL-85000
Rated power	150 kW
Rated voltage	380 V
Number of cells in the stack	30
Number of electrolyzers	6

Appendix A.7. Parameters of Microturbine

Table A7 below gives the parameters of the microturbine used in this research work:

Table A7. Parameters of microturbine.

Type	Ingersoll Rand MT250
Rated power	200 kVA, 160 kW
Rated voltage	440 V
Rated frequency	50 Hz

Appendix A.8. Parameters of Utility Grid

Table A8 below gives the parameters of UG used in this research work.

Table A8. Parameters of UG.

Parameter	Rating
Phase Voltage	11 kV
Frequency	50 Hz
Rated power	10 MVA

Appendix A.9. Parameters of Interlinking Inverter

Table A9 below gives the parameters of interlinking inverter used in this research work:

Table A9. Parameters of Interlinking Inverter.

Type	Zhejiang, China CHZIRI-2VF
Rated power	400 kW
Rated voltage	200/540 V
Inductance L-filter	2.1 $\mu$H

Appendix A.10. Adaptive PID Control System

The aPID control law, used for various comparisons and control in this research work, is given below:

$$u_{aPID}(j) = K_{P-aPID}(j)\, e(j) + K_{I-aPID}(j) \int e(j)\, dt + K_{D-aPID}(j) \frac{d\, e(j)}{dt} \tag{A4}$$

where $K_{P-aPID}$, $K_{I-aPID}$, and $K_{D-aPID}$ are the proportional, integral, and derivative gains, and $e(j)$ is the error signal used as the cost function. The update equations are:

$$K_{P-aPID}(j+1) = K_{P-aPID}(j) + \alpha\, e^{2}(j) \tag{A5}$$

$$K_{I-aPID}(j+1) = K_{I-aPID}(j) + \alpha\, e(j) \int e(j)\, dt \tag{A6}$$

$$K_{D-aPID}(j+1) = K_{D-aPID}(j) + \alpha\, e(j) \frac{d\, e(j)}{dt} \tag{A7}$$
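A discrete-time reading of (A4)–(A7) can be sketched as follows. This is not the authors' implementation: the class name `AdaptivePID`, the sampling period `dt`, the rectangular integral, the backward-difference derivative, and all numerical values are assumptions made for illustration.

```python
class AdaptivePID:
    """Discrete-time sketch of the aPID law (A4) with gain updates (A5)-(A7)."""

    def __init__(self, kp, ki, kd, alpha, dt):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.alpha = alpha        # adaptation rate, alpha in (A5)-(A7)
        self.dt = dt              # sampling period (an assumption of this sketch)
        self.integral = 0.0       # running approximation of the integral of e
        self.prev_error = 0.0

    def step(self, error):
        """Return u(j) for the current error, then adapt the gains for sample j+1."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # Control law (A4), evaluated with the gains at sample j.
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Gain updates (A5)-(A7).
        self.kp += self.alpha * error ** 2
        self.ki += self.alpha * error * self.integral
        self.kd += self.alpha * error * derivative
        return u

# Hypothetical single step with an error of 0.5.
ctrl = AdaptivePID(kp=2.0, ki=1.0, kd=0.1, alpha=0.01, dt=0.1)
u_first = ctrl.step(0.5)
```

Note that, as (A5) is written, the proportional gain can only grow when $\alpha > 0$, because $e^{2}(j)$ is never negative. Whether the gains are bounded in the authors' simulation is not stated.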

References

1. Wang, P.; Jin, C.; Zhu, D.; Tang, Y.; Loh, P.C.; Choo, F.H. Distributed Control for Autonomous Operation of a Three-Port AC/DC/DS Hybrid Microgrid. IEEE Trans. Ind. Electron. 2015, 62, 1279–1290.
2. Jin, C.; Dong, C.; Wang, J.; Wang, P. Uniform Control Scheme for the Interlinking Converter Enhancing the Economy and Resilience of Hybrid AC/DC Microgrids. In Proceedings of the 2018 IEEE Innovative Smart Grid Technologies–Asia (ISGT Asia), Singapore, 22–25 May 2018.
3. Kebede, A.A.; Kalogiannis, T.; Van Mierlo, J.; Berecibar, M. A comprehensive review of stationary energy storage devices for large scale renewable energy sources grid integration. Renew. Sustain. Energy Rev. 2022, 159, 112213.
4. Wang, G.; Sadiq, M.; Bashir, T.; Jain, V.; Ali, S.A.; Shabbir, M.S. The dynamic association between different strategies of renewable energy sources and sustainable economic growth under SDGs. Energy Strategy Rev. 2022, 42, 100886.
5. Pires, V.F.; Cordeiro, A.; Roncero-Clemente, C.; Rivera, S.; Dragičević, T. DC-DC Converters for Bipolar Microgrid Voltage Balancing: A Comprehensive Review of Architectures and Topologies. IEEE J. Emerg. Sel. Top. Power Electron. 2022, 11, 981–998.
6. Ali, M.; Hossain, M.I.; Shafiullah, M. Fuzzy logic for energy management in hybrid energy storage systems integrated DC microgrid. In Proceedings of the 2022 International Conference on Power Energy Systems and Applications (ICoPESA), Singapore, 25–27 February 2022; pp. 424–429.
7. Jovcic, D.; Ahmed, K. High-Voltage Direct-Current Transmission; John Wiley & Sons, Ltd.: New York, NY, USA, 2015.
8. Alzahrani, A.; Ramu, S.K.; Devarajan, G.; Vairavasundaram, I.; Vairavasundaram, S. A review on hydrogen-based hybrid microgrid system: Topologies for hydrogen energy storage, integration, and energy management with solar and wind energy. Energies 2022, 15, 7979.
9. Sarwar, S.; Kirli, D.; Merlin, M.; Kiprakis, A.E. Major Challenges towards Energy Management and Power Sharing in a Hybrid AC/DC Microgrid: A Review. Energies 2022, 15, 8851.
10. Wang, H.; Li, W.; Yue, Y.; Zhao, H. Distributed economic control for AC/DC hybrid microgrid. Electronics 2022, 11, 13.
11. Ullah, Z.; Wang, S.; Lai, J.; Azam, M.; Badshah, F.; Wu, G.; Elkadeem, M.R. Implementation of various control methods for the efficient energy management in hybrid microgrid system. Ain Shams Eng. J. 2022, 101961.
12. Boche, A.; Foucher, C.; Villa, L.F.L. Understanding Microgrid Sustainability: A Systemic and Comprehensive Review. Energies 2022, 15, 2906.
13. Fotopoulou, M.; Rakopoulos, D.; Stergiopoulos, F.; Voutetakis, S. A Review on the Driving Forces, Challenges, and Applications of AC/DC Hybrid Smart Microgrids. In Smart Grids Technology and Applications; IntechOpen: London, UK, 2022.
14. Prasad, T.N.; Devakirubakaran, S.; Muthubalaji, S.; Srinivasan, S.; Karthikeyan, B.; Palanisamy, R.; Bajaj, M.; Zawbaa, H.M.; Kamel, S. Power management in hybrid ANFIS PID based AC-DC microgrids with EHO based cost optimized droop control strategy. Energy Rep. 2022, 8, 15081–15094.
15. Sajid, A.; Sabzehgar, R.; Rasouli, M.; Fajri, P. Control of Interlinking Bidirectional Converter in AC/DC Hybrid Microgrid Operating in Stand-Alone Mode. In Proceedings of the 2019 IEEE Milan PowerTech, Milan, Italy, 23–27 June 2019.
16. Unamuno, E.; Barrena, J.A. Hybrid AC/DC microgrids—Part II: Review and classification of control strategies. Renew. Sustain. Energy Rev. 2015, 52, 1123–1134.
17. Shen, X.; Tan, D.; Shuai, Z.; Luo, A. Control Techniques for Bidirectional Interlinking Converters in Hybrid Microgrids: Leveraging the advantages of both AC and DC. IEEE Power Electron. Mag. 2019, 6, 39–47.
18. Ali, S.U.; Aamir, M.; Jafri, A.R.; Subramaniam, U.; Haroon, F.; Waqar, A.; Yaseen, M. Model predictive control—Based distributed control algorithm for bidirectional interlinking converter in hybrid microgrids. Int. Trans. Electr. Energy Syst. 2021, 31, e12817.
19. Guo, Y.; Guo, Y.; Sun, H.; Xi, J.; Hao, Y. Overview of Improved Droop Control Methods of Hybrid AC/DC Microgrid Interlinking Converter. In Proceedings of the 2nd International Conference on Computer Engineering, Information Science & Application Technology (ICCIA 2017), Beijing, China, 8–11 September 2017.
20. Malik, S.M.; Sun, Y.; Huang, W.; Ai, X.; Shuai, Z. A Generalized Droop Strategy for Interlinking Converter in a Standalone Hybrid Microgrid. Appl. Energy 2018, 226, 1056–1063.
21. Wang, C.; Deng, C.; Pan, X. Line impedance compensation control strategy for multiple interlinking converters in hybrid AC/DC microgrid. IET Gener. Transm. Distrib. 2022.
22. Rault, P.; Guillaud, X.; Colas, F.; Nguefeu, S. Investigation on interactions between AC and DC grids. In Proceedings of the 2013 IEEE Grenoble Conference, Grenoble, France, 16–20 June 2013.
23. Martinez, S.; Torres, F.; Roa, C.; Lopez, E. Interaction between AC Grids and MTDC Systems Based on Droop Controllers. In Proceedings of the 2018 IEEE International Conference on Automation/XXIII Congress of the Chilean Association of Automatic Control (ICA-ACCA), Concepcion, Chile, 17–19 October 2018.
24. Zhang, J.; Guo, D.; Wang, F.; Zuo, Y.; Zhang, H. Control strategy of interlinking converter in hybrid AC/DC microgrid. In Proceedings of the 2013 International Conference on Renewable Energy Research and Applications (ICRERA), Madrid, Spain, 20–23 October 2013.
25. Adi, F.S.; Song, H.; Kim, J.S. Interlink Converter Controller Design based on System Identification of DC Sub-Grid Model in Hybrid AC/DC Microgrid. IFAC-PapersOnLine 2019, 52, 45–50.
26. Watson, J.D.; Lestas, I. Control of Interlinking Converters in Hybrid AC/DC Grids: Network Stability and Scalability. IEEE Trans. Power Syst. 2021, 36, 769–780.
27. Zhang, Z.; Fang, J.; Dong, C.; Jin, C.; Tang, Y. Enhanced Grid Frequency and DC-link Voltage Regulation in Hybrid AC/DC Microgrids through Bidirectional Virtual Inertia Support. IEEE Trans. Ind. Electron. 2022, 1–10.
28. Awais, M.; Khan, L.; Ahmad, S.; Mumtaz, S.; Badar, R. Nonlinear adaptive NeuroFuzzy feedback linearization based MPPT control schemes for photovoltaic system in microgrid. PLoS ONE 2020, 15, e0234992.
29. Awais, M.; Khan, L.; Badar, R.; Ahmad, S.; Mumtaz, S.; Ullah, S. NeuroFuzzy Full-Recurrent Hybrid B-Spline Wavelet Based Feedback Linearization Control for PMSG-WECS in a Grid-connected Hybrid Power System. In Proceedings of the 2021 International Bhurban Conference on Applied Sciences and Technologies (IBCAST), Islamabad, Pakistan, 12–16 January 2021.
30. Awais, M.; Khan, L.; Ahmad, S.; Mumtaz, S.; Badar, R.; Ullah, S. Legendre-wavelet embedded NeuroFuzzy feedback linearization based control scheme for PHEVs charging station in a microgrid. Turk. J. Electr. Eng. Comput. Sci. 2021, 29, 2046–2066.
31. Awais, M.; Khan, L.; Ahmad, S.; Jamil, M. Feedback-Linearization-Based Fuel-Cell Adaptive-Control Paradigm in a Microgrid Using a Wavelet-Entrenched NeuroFuzzy Framework. Energies 2021, 14, 1850.
32. Awais, M.; Khan, L.; Badar, R.; Mumtaz, S.; Ahmad, S.; Ullah, S. Wavelet-Hybridized NeuroFuzzy Feedback Linearization based Control Strategy for PHEVs Charging Station in a Smart Microgrid. In Proceedings of the 2020 International Symposium on Recent Advances in Electrical Engineering & Computer Sciences (RAEE & CS), Islamabad, Pakistan, 20–22 October 2020.
33. Dehkordi, N.M.; Sadati, N.; Hamzeh, M. Robust backstepping control of an interlink converter in a hybrid AC/DC microgrid based on feedback linearisation method. Int. J. Control 2016, 90, 1990–2004.
34. Zhang, B.; Cao, G.; Ren, C.; Wang, J.; Han, X. Autonomous control strategy of bidirectional AC/DC converter in low voltage hybrid microgrid. In Proceedings of the 2017 12th IEEE Conference on Industrial Electronics and Applications (ICIEA), Siem Reap, Cambodia, 18–20 June 2017.
35. Saad, N.H.; El-Sattar, A.A.; Mansour, A.E.A.M. A novel control strategy for grid connected hybrid renewable energy systems using improved particle swarm optimization. Ain Shams Eng. J. 2018, 9, 2195–2214.
36. Jiao, J.; Meng, R.; Guan, Z.; Ren, C.; Wang, L.; Zhang, B. Grid-connected Control Strategy for Bidirectional AC-DC Interlinking Converter in AC-DC Hybrid Microgrid. In Proceedings of the 2019 IEEE 10th International Symposium on Power Electronics for Distributed Generation Systems (PEDG), Xi’an, China, 3–6 June 2019.
37. Yu, H.; Niu, S.; Zhang, Y.; Jian, L. An integrated and reconfigurable hybrid AC/DC microgrid architecture with autonomous power flow control for nearly/net zero energy buildings. Appl. Energy 2020, 263, 114610.
38. Sinha, S.; Ghosh, S.; Bajpai, P. Power sharing through interlinking converters in adaptive droop controlled multiple microgrid system. Int. J. Electr. Power Energy Syst. 2021, 128, 106649.
39. Jayalakshmi, N.S.; Nempu, P.B. Performance Enhancement of a Hybrid AC-DC Microgrid Operating with Alternative Energy Sources Using Supercapacitor. Int. J. Electr. Comput. Eng. Syst. 2021, 12, 67–76.
40. Ahmed, M.; Meegahapola, L.; Datta, M.; Vahidnia, A. A Novel Hybrid AC/DC Microgrid Architecture with a Central Energy Storage System. IEEE Trans. Power Deliv. 2022, 37, 2060–2070.
41. Antalem, D.T.; Muneer, V.; Bhattacharya, A. Decentralized control of islanding/grid-connected hybrid DC/AC microgrid using interlinking converters. Sci. Technol. Energy Transit. 2022, 77, 22.
42. Hu, Y.; Li, W.; Xu, H.; Xu, G. An online learning control strategy for hybrid electric vehicle based on fuzzy Q-learning. Energies 2015, 8, 11167–11186.
43. Dai, X.; Li, C.K.; Rad, A.B. An approach to tune fuzzy controllers based on reinforcement learning for autonomous vehicle control. IEEE Trans. Intell. Transp. Syst. 2005, 6, 285–293.
44. Khan, L.; Badar, R. Hybrid adaptive neuro-fuzzy B-spline-based SSSC damping control paradigm using online system identification. Turk. J. Electr. Eng. Comput. Sci. 2015, 23, 395–420.
45. Badar, R.; Khan, L. Online adaptive Legendre wavelet embedded neurofuzzy damping control algorithm. In Proceedings of the INMIC, Lahore, Pakistan, 19–20 December 2013; pp. 7–12.
46. Werbos, P.J. Generalization of backpropagation with application to a recurrent gas market model. Neural Netw. 1988, 1, 339–356.
47. Tutschku, K. Recurrent Multilayer Perceptrons for Identification and Control: The Road to Applications. 1995. Available online: https://www.researchgate.net/profile/Kurt-Tutschku/publication/229087124_Recurrent_multilayer_perceptrons_for_identification_and_control_The_road_to_applications/links/0912f50bdf57c64002000000/Recurrent-multilayer-perceptrons-for-identification-and-control-The-road-to-applications.pdf (accessed on 30 December 2022).
48. Sun, W.; Wang, Y. A recurrent fuzzy neural network based adaptive control and its application on robotic tracking control. Neural Inf. Process.-Lett. Rev. 2004, 5, 19–26.
49. Song, J.; Shi, H. Dynamic system modeling based on wavelet recurrent fuzzy neural network. In Proceedings of the 2011 Seventh International Conference on Natural Computation, Shanghai, China, 26–28 July 2011; Volume 2, pp. 766–770.
50. Liu, Y.; Lin, Y.; Wu, S.; Chuang, C.; Lin, C. Brain Dynamics in Predicting Driving Fatigue Using a Recurrent Self-Evolving Fuzzy Neural Network. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 347–360.
51. Pratama, M.; Lu, J.; Lughofer, E.; Zhang, G.; Er, M.J. An Incremental Learning of Concept Drifts Using Evolving Type-2 Recurrent Fuzzy Neural Networks. IEEE Trans. Fuzzy Syst. 2017, 25, 1175–1192.
52. Watkins, C.J.C.H. Learning from Delayed Rewards. Ph.D. Thesis, University of Cambridge, Cambridge, UK, 1989.
53. IEEE Std 1547-2018 (Revision of IEEE Std 1547-2003); IEEE Standard for Interconnection and Interoperability of Distributed Energy Resources with Associated Electric Power Systems Interfaces. IEEE: Piscataway, NJ, USA, 2018; pp. 1–138.
54. Huang, B.; Wang, J. Deep-reinforcement-learning-based capacity scheduling for PV-battery storage system. IEEE Trans. Smart Grid 2020, 12, 2272–2283.
55. Mumtaz, S.; Ali, S.; Ahmad, S.; Khan, L.; Hassan, S.Z.; Kamal, T. Energy management and control of plug-in hybrid electric vehicle charging stations in a grid-connected hybrid power system. Energies 2017, 10, 1923.
56. Sharma, V.; Haque, M.H.; Aziz, S.M. Energy cost minimization for net zero energy homes through optimal sizing of battery storage system. Renew. Energy 2019, 141, 278–286.
57. Braeuer, F.; Rominger, J.; McKenna, R.; Fichtner, W. Battery storage systems: An economic model-based analysis of parallel revenue streams and general implications for industry. Appl. Energy 2019, 239, 1424–1440.

Figure 1. General sketch of the proposed microgrid.

Figure 2. Flow chart for supervisory control system. * shows the new/required power.

Figure 3. Neural Network Q-learning-based full recurrent adaptive NeuroFuzzy internal architecture.

Figure 4. QEN internal architecture.

Figure 5. Wavelet Neural Network.

Figure 6. Architecture of Full Recurrent Adaptive NeuroFuzzy System.

Figure 7. B-Spline membership function and Legendre wavelet.

Figure 8. Gaussian membership function with Mexican hat wavelet.

Figure 9. Gaussian membership function with Morlet wavelet.

Figure 10. Solar Irradiance Level (W/m$^{2}$) (left), Ambient Temperature Level ($^{\circ}$C) (right).

Figure 11. Wind speed (m/s).

Figure 12. $\Delta P_{AC}$ evolution.

Figure 13. $\Delta Q_{AC}$ evolution.

Figure 14. AC bus frequency evolution.

Figure 15. $\%\Delta f$ in load current evolution.

Figure 16. $V_{rms}$ load evolution.

Figure 17. $\%\Delta\mathrm{THD}$ in load current evolution.

Figure 18. Maximum power point tracking of PV array evolution.

Figure 19. Maximum power point tracking of wind turbine evolution.

Figure 20. Spider chart of comparable parameters for interlinking converter controllers.

Table 2. Average values of $\Delta P_{AC-bus}$ and $\Delta Q_{AC-bus}$ variations for all controllers.

Control Scheme	$P_{AC-bus}$ ($\mu$W)	$Q_{AC-bus}$ ($\mu$VAR)
aPID	1.0700	0.191100
FRNF-Leg wavelet based NNQLNF control	0.008600	0.008863
FRNF-Mor-W based NNQLNF control	0.008642	0.008794
FRNF-MHW based NNQLNF control	0.008472	0.008698

Table 3. Average values of $\%\Delta f_{Load}$ and $\%\Delta\mathrm{THD}_{Load}$ variation in load current for all controllers.

Control Scheme	$\%\Delta f_{Load}$	$\%\Delta\mathrm{THD}_{Load}$
aPID	0.0025480	1.29900
FRNF-Leg wavelet based NNQLNF control	0.0006112	0.06519
FRNF-Mor-W based NNQLNF control	0.0006130	0.06551
FRNF-MHW based NNQLNF control	0.0005992	0.06423
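To put the averages in Tables 2 and 3 on a common footing, the relative reduction of each metric with respect to the aPID baseline can be computed directly from the tabulated values. The short Python sketch below does only that arithmetic; the dictionary keys, metric labels, and function name are shorthand introduced here and do not appear in the article.

```python
# Averages copied from Tables 2 and 3, in the order:
# (P_AC-bus, Q_AC-bus, %Δf_Load, %ΔTHD_Load)
results = {
    "aPID":       (1.0700,   0.191100, 0.0025480, 1.29900),
    "FRNF-Leg-W": (0.008600, 0.008863, 0.0006112, 0.06519),
    "FRNF-Mor-W": (0.008642, 0.008794, 0.0006130, 0.06551),
    "FRNF-MHW":   (0.008472, 0.008698, 0.0005992, 0.06423),
}
metrics = ("P_AC", "Q_AC", "df_load", "dTHD_load")

def reduction_vs_baseline(table, baseline="aPID"):
    """Percentage reduction of every metric relative to the baseline scheme."""
    base = table[baseline]
    return {
        name: tuple(100.0 * (b - v) / b for b, v in zip(base, values))
        for name, values in table.items()
        if name != baseline
    }

reductions = reduction_vs_baseline(results)

# Scheme with the smallest average for each metric.
best = {m: min(results, key=lambda n: results[n][i]) for i, m in enumerate(metrics)}

for name, vals in reductions.items():
    print(name, ", ".join(f"{m}: {v:.2f}%" for m, v in zip(metrics, vals)))
```

For the FRNF-MHW scheme this gives reductions of roughly 99% in $P_{AC-bus}$, 95% in $Q_{AC-bus}$, 76% in $\%\Delta f_{Load}$, and 95% in $\%\Delta\mathrm{THD}_{Load}$ relative to aPID, and it has the smallest average in every column. The three proposed schemes differ from one another by only a few percent.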

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

