6.1. Basic Methods of Structural Decomposition in Design with Luts and Embs
Field-programmable gate arrays are widely used for implementing circuits of various digital systems [
12,
15,
69,
92]. To implement an FSM circuit, the following internal resources of FPGA chip can be used: look-up table (LUT) elements, embedded memory blocks (EMBs), programmable flip-flops, programmable interconnections, input-output blocks, and block of synchronization. LUTs and flip-flops form configurable logic blocks (CLBs). The “island-style” architecture is used in the majority of FPGAs [
17,
93,
94].
A LUT is a block having
inputs and a single output [
95,
96,
97,
98]. If a Boolean function depends on up to
arguments [
67], then the corresponding circuit only includes a single LUT. However, the number of LUT inputs is very limited [
95,
96,
97]. Due to it, the methods of functional decomposition are used to implement the FPGA-based FSM circuits [
99,
100,
101,
102,
103]. As a result, the FSM circuits have a lot of logic levels and a complex systems of interconnections [
29]. Such circuits resemble programs that are based on intensive use of “go-to” operators [
104]. Using terminology from programming, we can say that the functional decomposition produces the “spaghetti-type” LUT-based FSM circuits.
Modern FPGAs include a lot of configurable embedded memory blocks [
95,
96]. These CLBs allow for implementing systems of regular functions [
28]. If at least a part of the FSM circuit is implemented using EMBs, then the characteristics of this circuit can be significantly improved [
16]. Because of it, there are a lot of design methods targeting EMB-based FSMs [
16,
105,
106,
107,
108,
109,
110,
111,
112,
113,
114,
115]. In [
28], there is the survey of various methods of EMB-based FSM design. However, very often, practically all available EMBs are used for implementing the operational blocks of digital systems. Accordingly, the EMB-based FSM design methods can only be applied if a designer has some “free” EMBs.
An EMB can be characterized by a pair
, where
is a number of address inputs and
is a number of memory cell outputs. A single EMB can keep a truth table of an SBF including up to
Boolean functions depended on up to
arguments [
116]. A pair
defines a configuration of an EMB with the constant total number of bits (size of EMB):
The parameters
and
could be defined by a designer [
66]. It means that EMBs are configurable memory blocks [
67]. The following configurations exist for modern EMBs [
95,
96]:
. Accordingly, modern EMBs are very flexible and can be tuned to meet characteristics of a particular FSM. This explains the existence of a wide spectrum of EMB-based design methods [
16,
105,
106,
107,
108,
109,
110,
111,
112,
113,
114,
115].
If the condition
holds, then a single EMB implements an FSM circuit [
28]. If (
39) is violated, then an FSM circuit could be implemented as: (1) a homogenous network of EMBs or (2) a heterogeneous network where LUTs and EMBs are used together [
16,
114].
There are three approaches for implementing combinational parts of CLB-based FSMs. They are the following: (1) using only LUTs; (2) using only EMBs; and, (3) using the heterogeneous approach, when both LUTs and EMBs are applied [
28].
One of the most crucial steps in the CLB-based design flow is the technology mapping [
29,
117,
118]. The outcome of the technology mapping is a network of interconnected CLBs representing an FSM circuit. This step largely determines the resulting characteristics of an FSM circuit. These characteristics are strongly interrelated.
A chip area occupied by a CLB-based FSM circuit is mostly determined by the number of CLBs and the system of their interconnections. Obviously, to reduce the area, it is necessary to reduce the CLB count in an FSM circuit. As follows from [
119], the more LUTs are included into an FSM circuit, the more power it consumes. Now, “process technology has scaled considerably …with current design activity at 14 and 7 nm. Due to it, interconnection delay now dominates logic delay” [
18]. As noted in [
120], the interconnections are responsible for the consume up to 70% of power. Accordingly, it is very important to reduce the amount of interconnections to improve the characteristics of FSM circuits. All of this can be done using methods of structural decomposition.
As follows from (
39), an FSM circuit can be implemented by a single EMB if the following conditions hold for a configuration
:
As a rule, the modern EMBs are synchronous blocks. Hence, there is no need in an additional register to keep FSM state codes [
28].
Figure 18 shows a trivial EMB-based circuit of Mealy FSM.
To design such a circuit, it is necessary to [
28]: (1) execute the state assignment; (2) construct a DST on the base of an STT; and, (3) create the truth table corresponding to the DST. This truth table has
columns containing an address of a particular cell. Each cell has
bits. Transitions from any state
are represented by
rows of the truth table [
28]:
The following parameters can be found for
(
Table 2): the number of inputs
, and the number of state variables
. Accordingly, using (
42) gives
. If an input
is insignificant for transitions from a state
, then there are the same values of IMFs and outputs for cells with addresses having either
or
. This rule is illustrated by
Table 9 with the transitions from state
from
Table 2.
In
Table 9, the number of a cell is shown in the column
q. The column
h is added to compare
Table 2 and
Table 9. The even rows of
Table 9 correspond to
, and the odd rows correspond to
.
The transition from LUTs to EMBs is similar to the transition from gates to large scale integration circuits. This transition improves all the characteristics of an FSM circuit, namely, the chip area that is occupied by FSM circuit, the FSM performance and power consumption. If conditions (
40) and (
41) are violated, then methods of structural decomposition can be used [
21]. In this case, an FSM circuit is represented as a network of EMBs and LUTs.
The analysis of numerous literature has shown that the following methods of structural decomposition are used in EMB-based FSM design:
The replacement of inputs
by additional variables
leading to
FSMs [
16,
107,
108,
109,
110,
111,
112,
113].
The maximum encoding of collections of outputs leading to
FSMs [
28].
Mixed encoding of outputs leading to
FSMs [
121].
The encoding of product terms leading to
FSMs [
122].
Following the notation of [
21], we denote, as LUTer, a block consisting of LUTs and. as EMBer, a block consisting of EMBs. The structural diagram of
Mealy FSM is shown in
Figure 19.
In
FSM, the LUTerP implements SBF (
23), the EMBer contains a truth table of SBFs (
25) and (
26). As follows from
Figure 18 and
Figure 19, the outputs
are synchronized. This is necessary to stabilize FSM outputs [
42]. The
Mealy FSM can be used if the following condition holds:
Clearly, the
FSM (
Figure 19) uses an idea of the two-level MCU (
Figure 4) in an FPGA environment. The state variables create the address part of microinstructions. The number of EMBs in EMBer is determined as
To diminish the value of
, the maximum encoding of COs
can be used [
21]. The replacement of inputs can be used together with this approach. This results in the
Mealy FSM (
Figure 20).
In
FSM, the EMBer implements SBFs (
23) and (
31). The LUTerO transforms codes
into outputs
. To do it, SBF (
17) is implemented by LUTerO. Now, the number of EMBs in EMBer is determined as
The value of
is determined by (
15).
If the condition
holds, then a single-level circuit of LUTerO includes up to
N LUTs. If (
46) is violated, then a mixed encoding of outputs [
121] can be used. The idea of this approach is the following.
Let it be
,
, and
. The analysis of these values shows that the condition (
46) is violated. Let the set of COs include COs
and
. If we eliminate
from
, then
. Now, there is
. The eliminated outputs form a set
. The set of outputs is represented as
, where
. This leads to
Mealy FSM (
Figure 21).
In
FSM, the outputs
are represented by SBF (
26). The outputs
are represented by (
17). The outputs
are represented by one-hot codes, the outputs
by maximum binary codes. Because of that, this is a mixed encoding of outputs.
In [
121], there is proposed a method allowing to create such a partition of the set
O. It allows for eliminating the minimum possible number elements of
O to create the set
.
This approach can be used to diminish the number of CLBs in the circuit of LUTerO. For example, there is
for LUTs of Virtex 7 [
96]. If
, then the number of LUTs in the circuit of LUTerO is equal to
N. However, the CLB can be organized as two LUTs having five shared inputs. If the mixed encoding of outputs gives the set
with
, then the number of LUTs in LUTerO is determined as
. The closer the values of
N and
are, the greater the saving in the number of CLBs.
Two approaches are possible for implementing EMB-based Mealy FSMs [
122]. In both cases, the binary codes
encode the terms
. These codes have
bits. The variables
are used for encoding of terms, where
. The value of
is determined by (
32). The system
represents the block of terms [
122]. This system can be implemented as either the network of LUTs (
Figure 22a) or the network of EMBs (
Figure 22b).
Both methods should be used. Finally, the method leading to the minimum hardware should be selected [
122].
6.2. Structural Decomposition in Lut-Based Design
As mentioned in [
12], EMBs are widely used for implementing various blocks of digital systems. Accordingly, it is quite possible that only LUTs can be used for implementing FSM circuits. The methods of structural decomposition may be used in LUT-based FSMs [
21]. They are used to improve LUT counts (and other characteristics) of LUT-based
P Mealy FSMs (
Figure 23).
In
P FSMs, the LUTerD implements SBF (
5) and the LUTerO implements SBF (
6). Each function
is represented by a SOP having
literals. In the best case, there are
LUTs in the circuit of LUTerD and
N LUTs in the circuit of LUTerO. The following relation determines this case:
If (
47) is violated, then a
P FSM is represented by a multi-level circuit. To improve LUT count of such circuits, the model of
FSM can be used.
This approach is proposed in [
123]. It leads to a three-level circuit that is shown in
Figure 24.
In
FSM, the LUTerP implements system (
23). It generates additional variables
replacing inputs
. The LUTerD generates input memory functions that are represented by (
25). The LUTerZ generates variables
used for encoding of collections of outputs. This block implements SBF (
31). The LUTerO implements outputs
that are represented by SBF (
17).
The method of synthesis of LUT-based
FSM includes the following steps [
123]:
Executing the replacement of inputs.
Executing the state assignment optimizing (
23).
Deriving collections of outputs from the STT.
Executing the encoding of COs.
Creating the DST of FSM.
Deriving SBFs (
25) and (
31) from the DST.
Implementing FSM circuit using particular LUTs.
In [
123], the results of experiments conducted to compare the characteristics of various models of LUT-based FSMs are shown. The standard benchmarks [
124] were used for investigation. These benchmarks are Mealy FSMs; they are represented in KISS2 format.
Table 10 contains the characteristics of these benchmark FSMs.
To conduct experiments [
123], the CAD tool Vivado (ver. 2019.1) [
125] was used with the target chip XC7VX690T2FFG1761 (Xilinx Virtex 7) [
126]. There is
for LUTs of Virtex 7 family.
Four other methods were compared with
FSMs. They were Auto of Vivado, one-hot of Vivado, JEDI [
39,
127], and DEMAIN [
128]. The benchmarks were divided by five categories. To do it, the values of
and
were used. If
, then benchmarks belong to category 0; if
, it is the category 1; if
, then it defines the category 2; if
, then benchmarks belong to category 3; finally, the relation
, determines category 4.
Table 11 (the LUT counts) and
Table 12 (the maximum operating frequency) represent the results of investigations [
123]. As follows from
Table 11, MPY-based FSMs have minimum number of LUTs. As follows from
Table 12, MPY-based FSMs are the slowest. However, this disadvantage is reduced with the increase in the number of category.
6.3. New Methods of Structural Decomposition
In all thw discussed methods, only maximum state codes are used when the value of
is determined by (
4). In [
129,
130,
131], there is a method of twofold state assignment proposed. In this case, any state
has two codes. The code
determines the state as an element of the set
S. The code
defines the state as an element of some partition class.
To use the method [
129,
130], it is necessary to construct a partition
of the set of states
S. For each class
, the following condition holds:
In (
48), the symbol
denotes the length (the number of bits) of a code
for states
; the symbol
defines the number of inputs
determining the transitions from states
.
Each class
determines a DST
with transitions from states
. This table includes inputs from the set
, outputs from the set
, and IMFs that are equal to 1 for transitions from states
. These IMFs form a set
. A DST
determines the SBFs
The variables encode states as elements of the set .
This approach determines
Mealy FSMs. The logic circuits of
FSMs include three levels of logic blocks.
Figure 25 showsn the structural diagram of
FSM.
In
Mealy FSM, the LUTerk (
) implements SBF (
49) and (
50). The LUTerTO implements the following SBFs:
The LUTer
transform state codes
into state codes
. To do it, the following SBF is implemented:
The structural diagram (
Figure 25) determines a case of the one-hot encoding of outputs [
130]. In [
129], there was a method proposed combining the twofold state assignment with the maximum encoding of COs. This leads to
Mealy FSM, as shown in
Figure 26.
In
FSM, the SBFs (
50) and (
52) are replaced by SBFs:
Because of (
48), each function (
49), (
50), and (
54) are implemented as a single-level circuit; moreover, each function is implemented by a circuit having exactly one LUT. If there is
then it is enough a single LUT to implement a circuit for each determined by (
52) and (
54). If there is
then the circuit of the LUTer
is a single-level one. If the condition (
46) holds, then there are up to
N LUTs in the circuit of LUTerO.
In the best case, the conditions (
46), (
48), (
56), and (
57) are true. This best case determines the three-level LUT-based circuits of both
and
Mealy FSMs. Logic circuits of
FSMs consume fewer LUTs than equivalent
FSMs, as shown in [
129]. The experimental results [
130] show that the logic circuits of
FSMs consume fewer LUTs than this is for the equivalent
P Mealy FSMs.
Using the twofold state assignment improves the characteristics of EMB-based FSMs, as shown in [
122]. In [
122], this method is used to improve LUT count in
Mealy FSMs (
Figure 22b). The method is based on finding a partition
of the set of terms
F. For each class of this partition, the following condition holds:
The value of can be found as , where is a number of elements in the set .
The binary codes
encode the classes
. These codes have
bits, where
The code of a term
is represented as
In (
60),
is a code of a term as an element of the set
, * is a sign of concatenation. To encode terms, the variables
are used. To use free outputs of EMB, the set
D is represented as
and the set
O is represented as
. The classes of
are encoded using variables
. Now, the
FSM is represented, as shown in
Figure 27.
In [
122], the results of experiments are shown. The following models were compared:
P FSMs (
Figure 23),
FSMs (
Figure 19),
FSMs (
Figure 22b), and the proposed approach (
Figure 27).
Table 13 (LUT counts),
Table 14 (the maximum operating frequency), and
Table 15 (the consumed power) show the results of experiments for some benchmarks [
124].
The experiments have been conducted for the benchmarks [
124], the evolution board with chip XC7VX690TFFG1761-2 [
126] and CAD tool Vivado [
125]. It is enough a single EMB of Virtex 7 to implement the logic circuits for any from 33 benchmarks [
124], as shown in [
122]. A network of LUTs and EMBs is used to implement circuits for other benchmarks.
It is possible to improve the characteristics of LUT-based FSM circuits using the transformation of objects [
21]. For example, there is a structural diagram of
Mealy FSM shown in
Figure 28 [
132].
In
FSM, the LUTerZV implements SBFs (
35) and
The LUTerT generates the functions from the SBF (
37) and the LUTerO implements SBF (
17). This approach is used to: (1) improve the operating frequency of multi-level
FSMs and (2) reduce the LUT count as compared with
P FSMs if the condition (
47) is violated.
If condition (
47) is violated for functions
, then the LUTerZV is represented by a multi-level circuit. To improve the characteristics of
FSMs, the following approach is proposed in [
132].
The set
S is divided by classes
, such that the condition (
48) holds for each class of
. Next, states
are encoded by codes
having the minimum possible number of bits. The following SBFs should be implemented [
132]: (
54), (
55), (
17), (
37), and
This approach leads to
FSMs. The circuit of
FSM includes three levels of LUTs (
Figure 29).
In
FSM, the LUTerk (
) implements SBFs (
54) and (
62). The LUTerZV generates functions
and
. They are represented by SBFs (
55) and (
63). The LUTerO implements SBF (
17), the LUTer
generates functions (
37).
There are experimental results in [
132] that are obtained using the CAD tool Vivado [
125] and the evolution board with Virtex 7 FPGA chip [
126]. The following characteristics have been compared: the LUT counts (
Table 16), maximum operating frequency (
Table 17), and area-time products (
Table 18).
As follows from
Table 16, the
FSMs require fewer LUTs than other investigated methods. The
FSMs consume more LUTs (8.84%) when compared to
FSMs. However, other FSMs are based on functional decomposition. Their circuits require more LUTs than for
FSMs. The gain increases along with the growth of the category number.
As follows from
Table 17, the
-based FSMs have the highest operating frequency as compared to other investigated methods. The following can be found from
Table 18: the
-based FSMs produce circuits with better area-time products then it is for other investigated methods. Starting from average FSMs,
-based circuits have better area-time products.
Hence, using the methods of structural decomposition allows for improving characteristics of FPGA-based FSMs. Three-level circuits improve the LUT count and two-level circuits improve the performance. These methods can be applied together with other optimization methods used in FSM design [
21].