Article

Decision Rules Derived from Optimal Decision Trees with Hypotheses

Mohammad Azad 1, Igor Chikalov 2, Shahid Hussain 3, Mikhail Moshkov 4 and Beata Zielosko 5
1 Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka 72441, Saudi Arabia
2 Intel Corporation, 5000 W Chandler Blvd, Chandler, AZ 85226, USA
3 Department of Computer Science, School of Mathematics and Computer Science, Institute of Business Administration, University Road, Karachi 75270, Pakistan
4 Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
5 Institute of Computer Science, Faculty of Science and Technology, University of Silesia in Katowice, Będzińska 39, 41-200 Sosnowiec, Poland
* Author to whom correspondence should be addressed.
Submission received: 18 September 2021 / Revised: 14 October 2021 / Accepted: 2 December 2021 / Published: 7 December 2021
(This article belongs to the Special Issue Rough Set Theory and Entropy in Information Science)

Abstract

Conventional decision trees use queries, each of which is based on a single attribute. In this study, we also examine decision trees that additionally use queries based on hypotheses. This kind of query is similar to the equivalence queries considered in exact learning. Earlier, we designed dynamic programming algorithms for computing the minimum depth and the minimum number of internal nodes of decision trees with hypotheses. The modifications of these algorithms considered in the present paper allow us to build decision trees with hypotheses that are optimal relative to the depth or to the number of internal nodes. To decide which trees are preferable as a tool for the representation of information, we compare the length and coverage of decision rules extracted from optimal decision trees with hypotheses with those of decision rules extracted from optimal conventional decision trees. To this end, we conduct computer experiments on various decision tables from the UCI Machine Learning Repository and on decision tables for randomly generated Boolean functions. The collected results show that the decision rules derived from decision trees with hypotheses are in many cases better than the rules extracted from conventional decision trees.

1. Introduction

Decision trees are commonly used as classifiers, as an algorithmic tool for solving various problems, and as a means of representing information [1,2,3]. They form a part of statistical learning, which refers to a vast set of tools for understanding data [4]. Conventional decision trees studied in test theory [5], rough set theory [6,7,8], and many other areas of computer science use queries based on a single attribute. In [9,10,11,12], we considered decision trees that also use queries based on hypotheses. Such decision trees are analogous to the tools analyzed in exact learning [13,14,15], where both membership and equivalence queries are used.
In the present paper, we analyze decision trees with hypotheses as a means for the representation of information. We design dynamic programming algorithms that optimize such trees relative to two cost functions. For various decision tables, we build optimal decision trees and analyze the length and coverage of the decision rules extracted from the constructed trees to determine which kinds of decision trees are more suitable for the representation of information.
Let T be a decision table with n attributes. Decision trees for this table can use two types of queries. A query of the first type asks for the value of an attribute; the answer is this value. A query of the second type chooses an n-tuple of possible attribute values and formulates the hypothesis that it is the actual tuple of values of the considered attributes; the answer is either a confirmation of the hypothesis or a counterexample. We call a hypothesis proper if the considered n-tuple is a row of the table T.
We studied the following five types of decision trees (referred to as types 1–5 below):
  1. Using attributes only.
  2. Using hypotheses only.
  3. Using both attributes and hypotheses.
  4. Using proper hypotheses only.
  5. Using both attributes and proper hypotheses.
We analyzed four different cost functions for the decision trees: the depth, the number of realizable nodes, the number of realizable leaf nodes, and the number of internal nodes. We define a node as realizable relative to a given decision table if a computation can pass through this node for at least one row of the considered decision table.
Previously, in [12], we proposed a dynamic programming algorithm for each of these four cost functions. Given a decision table and a decision tree type, the algorithm returns the minimum cost of a decision tree of that type for the table. The results of the computer experiments show that decision trees with hypotheses can have lower complexity than conventional decision trees. This means that they can be used as a means for the representation of information.
The present paper has two aims. The first aim is to construct optimal decision trees with hypotheses. We know that such trees can be used for the representation of information (especially decision trees of type 3). However, the algorithms from [12] were designed only to find the complexity of optimal trees. The algorithms for the two cost functions (the depth and the number of internal nodes) can be modified to build optimal decision trees. Unfortunately, we cannot use a similar approach to build optimal decision trees of types 2 and 3 relative to the number of realizable nodes and optimal decision trees of type 2 relative to the number of realizable leaf nodes.
The second aim is to study the length and coverage of decision rules extracted from the optimal decision trees. Decision rules can be considered one of the simplest and most understandable models for the representation of information. Deriving decision rules from decision trees is a well-known approach. We want to confirm that the decision rules derived from decision trees with hypotheses can be better than the rules derived from conventional decision trees.
For the computer experiments, we chose eight decision tables from the UCI ML Repository [16] as well as 100 randomly generated Boolean functions with n variables (n = 3, …, 6). We constructed optimal (relative to the depth or to the number of internal nodes) decision trees of the five types for these tables. Then we analyzed the length and coverage of the decision rules extracted from these trees. For a decision tree with hypotheses, some rows of the considered decision table may be covered by more than one derived decision rule. In this case, for each such row we chose the best rule. The results of the computer experiments show that the decision rules derived from decision trees with hypotheses are, in many cases, better than the ones derived from conventional decision trees.
The novelty of the paper is directly related to its two main contributions: (i) the modification of dynamic programming algorithms described in [12] such that the modified algorithms can now construct optimal decision trees of five types relative to two cost functions and (ii) the experimental confirmation that the decision rules derived from the decision trees with hypotheses can be more suitable for the representation of information than the decision rules derived from conventional decision trees.
To make the paper more understandable, we add to it slightly modified definitions and one algorithm from [12].
We present the remaining parts of the paper as follows: important notions in Section 2 and Section 3, the decision tree optimization based on dynamic programming algorithms in Section 4, Section 5 and Section 6, experimental results in Section 7, and short conclusions in Section 8.

2. Decision Tables

We define a decision table T as follows:
  • It is a rectangular table with n (n ≥ 1) columns.
  • Its columns are tagged with conditional attributes f_1, …, f_n.
  • The values in its columns belong to the set ω = {0, 1, 2, …}.
  • Its rows are pairwise distinct.
  • Its rows are tagged with numbers from ω interpreted as decisions.
  • Its rows are considered as tuples of values of the conditional attributes.
When a decision table does not have any rows, then we call it an empty table. We define a degenerate table as a decision table which is either empty or has all of its rows tagged by the same decision.
Furthermore, we consider the following notation for T:
  • F(T) is the set of conditional attributes of T, i.e., F(T) = {f_1, …, f_n}.
  • D(T) is the set of decisions attached to the rows of T.
  • E(T, f_i) is the set of values of the attribute f_i ∈ F(T) in the table T.
  • E(T) is the set of conditional attributes f_i of T for which |E(T, f_i)| ≥ 2.
Let S = {f_{i_1} = δ_1, …, f_{i_m} = δ_m} be a system of equations, where m ∈ ω, f_{i_1}, …, f_{i_m} ∈ F(T), and δ_1 ∈ E(T, f_{i_1}), …, δ_m ∈ E(T, f_{i_m}) (S is empty when m = 0). We denote by TS the subtable of T consisting of all rows of T that have the values δ_1, …, δ_m at the intersections with the columns f_{i_1}, …, f_{i_m}. Such subtables are called separable subtables of T.
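To make these notions concrete, the following minimal Python sketch (our illustration, not code from the paper) represents a decision table by its rows and decisions and computes the separable subtable TS for a system of equations S given as a dictionary {attribute index: value}.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, ...]

class DecisionTable:
    """A decision table T: unique rows over attributes f_1, ..., f_n plus decisions."""

    def __init__(self, rows: List[Row], decisions: List[int]):
        assert len(rows) == len(set(rows)), "rows of a decision table must be unique"
        self.rows = rows
        self.decisions = decisions

    def E(self, i: int) -> set:
        """E(T, f_i): the set of values of attribute f_i in the table."""
        return {row[i - 1] for row in self.rows}

    def is_degenerate(self) -> bool:
        """Empty table or all rows tagged with the same decision."""
        return len(set(self.decisions)) <= 1

    def subtable(self, S: Dict[int, int]) -> "DecisionTable":
        """Separable subtable TS for the equation system S = {f_i = delta}."""
        kept = [(r, d) for r, d in zip(self.rows, self.decisions)
                if all(r[i - 1] == delta for i, delta in S.items())]
        return DecisionTable([r for r, _ in kept], [d for _, d in kept])

# Toy usage with two attributes f_1, f_2:
T = DecisionTable(rows=[(0, 0), (0, 1), (1, 0)], decisions=[1, 2, 2])
print(T.subtable({1: 0}).rows)   # rows with f_1 = 0 -> [(0, 0), (0, 1)]
```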

3. Decision Trees and Rules

In this section, we define the notions of decision trees and rules related to a given nonempty decision table T with n conditional attributes f_1, …, f_n. We consider decision trees that use two types of queries. A query of the first type asks for the value of an attribute f_i ∈ F(T) = {f_1, …, f_n}. The answer to this query belongs to the set A(f_i) = {{f_i = δ} : δ ∈ E(T, f_i)}. A query of the second type asks about a hypothesis over T of the form H = {f_1 = δ_1, …, f_n = δ_n}, where δ_1 ∈ E(T, f_1), …, δ_n ∈ E(T, f_n). The answer to this query belongs to the set A(H) = {H, {f_1 = σ_1}, …, {f_n = σ_n} : σ_1 ∈ E(T, f_1) ∖ {δ_1}, …, σ_n ∈ E(T, f_n) ∖ {δ_n}}. If the answer is H, then the hypothesis is true; all other answers are counterexamples. Note that H is a proper hypothesis for T if (δ_1, …, δ_n) is a row of the table T.
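As an illustration (a sketch under our own representation, not the authors' code), the answer sets A(f_i) and A(H) can be enumerated as follows; an equation system is encoded as a frozenset of (attribute index, value) pairs.

```python
def A_attribute(i, E_T_i):
    """A(f_i): one singleton system {f_i = delta} for each delta in E(T, f_i)."""
    return [frozenset({(i, delta)}) for delta in sorted(E_T_i)]

def A_hypothesis(H, E_T):
    """A(H) for H = {f_1 = delta_1, ..., f_n = delta_n};
    E_T maps the attribute index i to E(T, f_i)."""
    answers = [frozenset(H.items())]                 # confirmation of the hypothesis
    for i, delta_i in H.items():
        for sigma in sorted(E_T[i] - {delta_i}):     # counterexamples {f_i = sigma}
            answers.append(frozenset({(i, sigma)}))
    return answers

E_T = {1: {0, 1}, 2: {0, 1, 2}}
H = {1: 0, 2: 1}
print(len(A_attribute(2, E_T[2])))   # 3 possible answers to the query f_2
print(len(A_hypothesis(H, E_T)))     # 1 confirmation + 3 counterexamples = 4
```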
A decision tree over T is a tagged finite directed rooted tree, where the following hold:
  • Each leaf node is tagged with a number from the set D(T) ∪ {0}.
  • Each internal node is tagged with an attribute from the set F(T) or with a hypothesis over T. In both cases, exactly one edge leaves this node for each answer, either from the set A(f_i) in the case of an attribute query or from the set A(H) in the case of a hypothesis query, and no other edges leave this node.
Let us consider a decision tree Γ over T. If v is a node of Γ, then we define a system of equations S(Γ, v) over T corresponding to the node v. Denote by ξ the directed path from the root of Γ to the node v. If ξ has no internal nodes, then S(Γ, v) is the empty system. Otherwise, S(Γ, v) is the union of the systems of equations attached to the edges of ξ.
We call a decision tree Γ over T a decision tree for T if, for any node v of Γ, the following hold:
  • The node v is a leaf node if and only if the subtable TS(Γ, v) is degenerate.
  • If v is a leaf node and the subtable TS(Γ, v) is empty, then v is tagged with the decision 0.
  • If v is a leaf node and the subtable TS(Γ, v) is nonempty, then v is tagged with the decision attached to all rows of TS(Γ, v).
An arbitrary directed path ξ from the root to a leaf node v in Γ is called a complete path in Γ . Denote T ( ξ ) = T S ( Γ , v ) .
The depth of a decision tree Γ, denoted h(Γ), is defined as the maximum number of internal nodes in a complete path in the tree; it is analogous to the time complexity of Γ. Similarly, the number of internal nodes in a decision tree Γ, denoted L_w(Γ), is analogous to its space complexity.
Let Γ be a decision tree for T, let ξ be a complete path in Γ such that T(ξ) is a nonempty table, and let the leaf node of the path ξ be tagged with the decision d. We now define a system of equations S(ξ). If ξ has no internal nodes, then S(ξ) is the empty system. Let us assume now that ξ contains at least one internal node. We transform the systems of equations attached to the edges leaving the internal nodes of ξ. If an edge is tagged with an equation system containing exactly one equation, then we do not change this system. Let an edge e leaving an internal node v be tagged with an equation system containing more than one equation. Then v is tagged with a hypothesis H and e is tagged with the equation system H. (Note that if such a node exists, then it is the last internal node in the complete path ξ.) In this case, we remove from the equation system H attached to e all equations of the form f_j = σ such that f_j ∉ E(TS(Γ, v)). Then we obtain S(ξ) as the union of the new equation systems corresponding to the edges of the path ξ. One can show that T(ξ) = TS(ξ).
We associate with the complete path ξ the decision rule
⋀_{(f_i = δ) ∈ S(ξ)} (f_i = δ) → d.
We denote this rule by rule(ξ). The number of equations in the equation system S(ξ) is called the length of the rule rule(ξ) and is denoted l(rule(ξ)). The number of rows in the subtable TS(ξ) is called the coverage of the rule rule(ξ) and is denoted c(rule(ξ)).
Denote by Ξ(T, Γ) the set of complete paths ξ in Γ such that the table T(ξ) is nonempty and by Rows(T) the set of rows of the decision table T. For a row r ∈ Rows(T), we denote by l(r, T, Γ) the minimum length of a rule rule(ξ) such that ξ ∈ Ξ(T, Γ) and r is a row of the subtable TS(ξ), and by c(r, T, Γ) the maximum coverage of a rule rule(ξ) such that ξ ∈ Ξ(T, Γ) and r is a row of the subtable TS(ξ).
We use the following notation:
l(T, Γ) = (Σ_{r ∈ Rows(T)} l(r, T, Γ)) / |Rows(T)|,   c(T, Γ) = (Σ_{r ∈ Rows(T)} c(r, T, Γ)) / |Rows(T)|.
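A small Python sketch of these averages follows (our illustration; it assumes that the rules derived from the complete paths of a tree for T are already available as pairs (S, d) with S a dictionary {attribute index: value}, so that every row of T is covered by at least one rule).

```python
def row_matches(row, S):
    """True if the row satisfies every equation f_i = value in the system S."""
    return all(row[i - 1] == value for i, value in S.items())

def average_length_and_coverage(rows, rules):
    """Compute l(T, Gamma) and c(T, Gamma) from the derived rules."""
    total_l, total_c = 0, 0
    for r in rows:
        matching = [S for S, _ in rules if row_matches(r, S)]
        total_l += min(len(S) for S in matching)               # l(r, T, Gamma)
        total_c += max(sum(row_matches(q, S) for q in rows)    # c(r, T, Gamma)
                       for S in matching)
    return total_l / len(rows), total_c / len(rows)

rows = [(0, 0), (0, 1), (1, 0)]
rules = [({1: 0}, 1), ({1: 1}, 2)]        # two rules: f_1 = 0 -> 1 and f_1 = 1 -> 2
print(average_length_and_coverage(rows, rules))   # (1.0, 1.666...)
```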

4. Construction of the Directed Acyclic Graph Δ(T)

Let us consider a nonempty decision table T with n conditional attributes f_1, …, f_n. Algorithm 1 (A_DAG) constructs a directed acyclic graph (DAG) Δ(T), which is then used for the construction of optimal decision trees. The nodes of this DAG are some separable subtables of the table T. The algorithm processes one node per iteration: it begins with a graph consisting of the single unprocessed node T and ends when all nodes of the graph have been processed. This algorithm was described and used in [9,10,12]. It is a special version of a more general algorithm considered in [17].
Algorithm 1 A_DAG (construction of the DAG Δ(T)).
Input: A nonempty decision table T with n conditional attributes f_1, …, f_n.
Output: Directed acyclic graph Δ(T).
  1. Build the graph consisting of the single node T, which is not tagged as processed.
  2. If all nodes of the graph have been processed, then the algorithm halts and returns the resulting graph as Δ(T). Otherwise, select a node (table) Θ that has not yet been processed.
  3. Check whether the node Θ is degenerate.
     (a) If yes, then tag the node Θ as processed and return to step 2.
     (b) If no, then, for each f_i ∈ E(Θ), draw a bundle of edges from the node Θ. Let E(Θ, f_i) = {a_1, …, a_k}. Draw k edges from Θ and attach to these edges the systems of equations {f_i = a_1}, …, {f_i = a_k}. These edges enter the nodes Θ{f_i = a_1}, …, Θ{f_i = a_k}, respectively. If some of the nodes Θ{f_i = a_1}, …, Θ{f_i = a_k} are not yet in the graph, then add them to the graph. Tag the node Θ as processed and return to step 2.
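The following compact Python sketch mirrors Algorithm 1 under the simplifying assumption (ours, not the paper's) that a subtable is represented by the frozenset of its row indices in T and that edges are stored in a dictionary keyed by (attribute index, value).

```python
def build_dag(rows, decisions):
    """Build the DAG of separable subtables reachable from T (Algorithm 1 sketch)."""
    n = len(rows[0])
    full = frozenset(range(len(rows)))                 # the table T itself

    def degenerate(node):
        return len({decisions[j] for j in node}) <= 1

    dag = {full: {}}                                   # node -> {(i, delta): child}
    unprocessed = [full]
    while unprocessed:                                 # step 2: pick an unprocessed node
        theta = unprocessed.pop()
        if degenerate(theta):                          # step 3(a): nothing to expand
            continue
        for i in range(1, n + 1):                      # step 3(b): bundle for each f_i in E(Theta)
            values = {rows[j][i - 1] for j in theta}   # E(Theta, f_i)
            if len(values) < 2:
                continue
            for delta in values:
                child = frozenset(j for j in theta if rows[j][i - 1] == delta)
                if child not in dag:
                    dag[child] = {}
                    unprocessed.append(child)
                dag[theta][(i, delta)] = child
    return dag

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
decisions = [0, 1, 1, 1]                               # the Boolean function f_1 OR f_2
dag = build_dag(rows, decisions)
print(len(dag))                                        # number of separable subtables stored
```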

5. Construction of Decision Trees with Minimum Depth

Let us consider a nonempty decision table T with n conditional attributes f_1, …, f_n and a number k ∈ {1, …, 5}. We can use the DAG Δ(T) to construct a decision tree Γ^(k)(T) of the type k with the minimum depth for the decision table T. For this purpose, we construct, for each node Θ of Δ(T), a decision tree Γ^(k)(Θ) of the type k with the minimum depth for the table Θ. It is necessary to consider not only the subtables corresponding to the nodes of Δ(T) but also the empty subtable Λ of T and the subtables Tr containing only one row r of T, which are not nodes of Δ(T). The idea is to start with these special subtables and with the leaf nodes of Δ(T), which are degenerate separable subtables of T. In this way, we move stepwise, in a bottom-up fashion, toward the table T.
Let us consider the case when Θ is a leaf node of Δ(T), or Θ = Tr for a row r of the table T, or Θ = Λ. If Θ is nonempty, then Γ^(k)(Θ) has only one node, which is tagged with the decision attached to all rows of Θ. Otherwise, this node is tagged with 0.
Let us now consider the case when Θ is an internal node of Δ(T) and the decision tree Γ^(k)(Θ′) has already been constructed for each child Θ′ of Θ. Based on these trees, we can construct a decision tree with the minimum depth for Θ that uses decision trees of the type k for the subtables corresponding to the children of the root. In this tree, the root can be tagged as follows:
  • By an attribute from F(T) (such a decision tree is denoted Γ_a^(k)(Θ)).
  • By a hypothesis over T (such a decision tree is denoted Γ_h^(k)(Θ)).
  • By a proper hypothesis over T (such a decision tree is denoted Γ_p^(k)(Θ)).
The set E(Θ) is nonempty because Θ is nondegenerate. We now describe three procedures for the construction of the trees Γ_a^(k)(Θ), Γ_h^(k)(Θ), and Γ_p^(k)(Θ).
We first consider a decision tree Γ(f_i) for the node Θ whose root is tagged with an attribute f_i ∈ E(Θ). For each δ ∈ E(T, f_i), there is an edge that leaves the root and enters the root of the decision tree Γ^(k)(Θ{f_i = δ}). We tag this edge with the equation system {f_i = δ}. It is obvious that
h(Γ(f_i)) = 1 + max{h(Γ^(k)(Θ{f_i = δ})) : δ ∈ E(T, f_i)}.     (1)
Using (1), one can easily prove that Γ(f_i) is a decision tree with the minimum depth for Θ such that the root of this tree is tagged with the attribute f_i and it uses decision trees of the type k for the subtables corresponding to the children of the root.
It is clear that we should not consider attributes f_i ∈ F(T) ∖ E(Θ): for such an attribute f_i, we can find δ ∈ E(T, f_i) with Θ{f_i = δ} = Θ, so we cannot construct an optimal tree for Θ based on f_i.
Construction of the tree Γ_a^(k)(Θ). We build the set E(Θ). For each f_i ∈ E(Θ), we construct the decision tree Γ(f_i) and choose among these trees a tree with the minimum depth. We return this tree as Γ_a^(k)(Θ).
Let us consider a hypothesis H = {f_1 = δ_1, …, f_n = δ_n} over T. We call this hypothesis admissible for Θ and an attribute f_i ∈ F(T) = {f_1, …, f_n} if Θ{f_i = σ} ≠ Θ for any σ ∈ E(T, f_i) ∖ {δ_i}. This hypothesis is not admissible for Θ and an attribute f_i ∈ F(T) if and only if |E(Θ, f_i)| = 1 and δ_i ∉ E(Θ, f_i). We call H admissible for Θ if H is admissible for Θ and any attribute f_i ∈ F(T).
We now describe a decision tree Γ(H) for Θ. The root of this tree is tagged with an admissible hypothesis H = {f_1 = δ_1, …, f_n = δ_n} for Θ. For any equation system S ∈ A(H), there is an edge that leaves the root of Γ(H) and enters the root of the tree Γ^(k)(ΘS). This edge is tagged with the equation system S.
It is obvious that
h(Γ(H)) = 1 + max{h(Γ^(k)(ΘS)) : S ∈ A(H)}.     (2)
Using (2), one can easily prove that Γ(H) is a decision tree with the minimum depth for Θ such that the root of this tree is tagged with the hypothesis H and it uses decision trees of the type k for the subtables corresponding to the children of the root.
It is clear that we should not consider hypotheses H that are not admissible for Θ: for such H, we can find an equation system S ∈ A(H) with ΘS = Θ, so we cannot construct an optimal decision tree for Θ based on H.
Construction of the tree Γ_h^(k)(Θ). We construct a hypothesis H_Θ = {f_1 = δ_1, …, f_n = δ_n} for Θ as follows. If f_i ∈ F(T) ∖ E(Θ), then δ_i is the only value in E(Θ, f_i). If f_i ∈ E(Θ), then δ_i is the minimum number from E(Θ, f_i) for which h(Γ^(k)(Θ{f_i = δ_i})) = max{h(Γ^(k)(Θ{f_i = σ})) : σ ∈ E(Θ, f_i)}. Obviously, H_Θ is admissible for Θ. We return the tree Γ(H_Θ) as Γ_h^(k)(Θ). Using (2), one can prove the correctness of this procedure.
Construction of the tree Γ_p^(k)(Θ). For each row r = (δ_1, …, δ_n) of the decision table T, we consider the proper hypothesis H_r = {f_1 = δ_1, …, f_n = δ_n} and check whether H_r is admissible for Θ. If yes, then we construct the decision tree Γ(H_r). Among the constructed trees, we choose a tree with the minimum depth and return it as Γ_p^(k)(Θ).
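As an illustration of recurrences (1) and (2), the following memoized Python sketch (our own, not the authors' implementation) computes only the minimum depth of a type 3 tree for a toy table; the hypothesis H_Θ is chosen exactly as in the construction of Γ_h^(k)(Θ) above.

```python
from functools import lru_cache

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
decisions = [0, 1, 1, 0]                 # the Boolean function f_1 XOR f_2
n = len(rows[0])
E_T = {i: {r[i - 1] for r in rows} for i in range(1, n + 1)}   # E(T, f_i)

def sub(theta, i, delta):
    """Subtable Theta{f_i = delta} as a frozenset of row indices."""
    return frozenset(j for j in theta if rows[j][i - 1] == delta)

@lru_cache(maxsize=None)
def h3(theta):
    """Minimum depth of a type 3 tree (attributes and hypotheses) for Theta."""
    if len({decisions[j] for j in theta}) <= 1:          # degenerate subtable
        return 0
    E_theta = {i: {rows[j][i - 1] for j in theta} for i in range(1, n + 1)}
    # Recurrence (1): best attribute f_i from E(Theta)
    best_attr = min(1 + max(h3(sub(theta, i, d)) for d in E_T[i])
                    for i in range(1, n + 1) if len(E_theta[i]) >= 2)
    # Recurrence (2) for the hypothesis H_Theta: delta_i is the smallest value
    # of E(Theta, f_i) attaining the maximum child depth.
    H = {i: min(v for v in sorted(E_theta[i])
                if h3(sub(theta, i, v)) == max(h3(sub(theta, i, s)) for s in E_theta[i]))
         for i in range(1, n + 1)}
    children = [frozenset(j for j in theta
                          if all(rows[j][i - 1] == H[i] for i in range(1, n + 1)))]
    children += [sub(theta, i, s) for i in range(1, n + 1) for s in E_T[i] - {H[i]}]
    best_hyp = 1 + max(h3(c) for c in children)
    return min(best_attr, best_hyp)

print(h3(frozenset(range(len(rows)))))   # minimum depth of a type 3 tree for this table
```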
Given a decision table T and k ∈ {1, …, 5} as input, the following Algorithm 2 (C_h) builds, for each node Θ of the DAG Δ(T), a decision tree Γ^(k)(Θ) of the type k with the minimum depth for the table Θ.
Algorithm 2 C_h (construction of the tree Γ^(k)(T)).
Input: T (a nonempty decision table), Δ(T) (the directed acyclic graph for T), and k (a natural number between 1 and 5).
Output: A decision tree Γ^(k)(T).
  1. Check whether a decision tree is attached to each node of the DAG Δ(T). If yes, then return the tree attached to the node T as Γ^(k)(T) and halt the algorithm. Otherwise, select a node Θ of the graph Δ(T) that does not have an attached tree; it is either a leaf node of Δ(T) or an internal node of Δ(T) all of whose children already have attached trees.
  2. If Θ is a leaf node, then attach to it the decision tree Γ^(k)(Θ) that has only a single node tagged with the decision attached to all rows of Θ. Return to step 1.
  3. If Θ is not a leaf node, then do the following, depending on the value of k:
     • If k = 1, construct the tree Γ_a^(1)(Θ) and attach it to Θ as the tree Γ^(1)(Θ).
     • If k = 2, construct the tree Γ_h^(2)(Θ) and attach it to Θ as the tree Γ^(2)(Θ).
     • If k = 3, construct the trees Γ_a^(3)(Θ) and Γ_h^(3)(Θ), choose between them a tree with the minimum depth, and attach it to Θ as the tree Γ^(3)(Θ).
     • If k = 4, construct the tree Γ_p^(4)(Θ) and attach it to Θ as the tree Γ^(4)(Θ).
     • If k = 5, construct the trees Γ_a^(5)(Θ) and Γ_p^(5)(Θ), choose between them a tree with the minimum depth, and attach it to Θ as the tree Γ^(5)(Θ).
     Return to step 1.
Let T be a decision table and k ∈ {1, …, 5}. We use the following notation: l_h^(k)(T) = l(T, Γ^(k)(T)) and c_h^(k)(T) = c(T, Γ^(k)(T)).

6. Construction of Decision Trees Containing Minimum Number of Internal Nodes

Let us consider a nonempty decision table T with n conditional attributes f_1, …, f_n and a number k ∈ {1, …, 5}. We can use the DAG Δ(T) to construct a decision tree G^(k)(T) of the type k with the minimum number of internal nodes for the decision table T. To construct the tree G^(k)(T), for each node Θ of the DAG Δ(T), we construct a decision tree G^(k)(Θ) of the type k with the minimum number of internal nodes for the table Θ. It is necessary to consider not only the subtables corresponding to the nodes of Δ(T) but also the empty subtable Λ of T and the subtables Tr containing only one row r of T, which are not nodes of Δ(T). The idea is to start with these special subtables and with the leaf nodes of Δ(T), which are degenerate separable subtables of T. In this way, we move stepwise, in a bottom-up fashion, toward the table T.
Let us consider the case when Θ is a leaf node of Δ(T), or Θ = Tr for a row r of the table T, or Θ = Λ. If Θ is nonempty, then the decision tree G^(k)(Θ) has only one node, which is tagged with the decision attached to all rows of Θ. Otherwise, this node is tagged with 0.
Let us now consider the case when Θ is an internal node of Δ(T) such that the decision tree G^(k)(Θ′) has already been constructed for each child Θ′ of Θ. Based on these trees, we can construct a decision tree with the minimum number of internal nodes for Θ that uses decision trees of the type k for the subtables corresponding to the children of the root. In this tree, the root can be tagged as follows:
  • By an attribute from F(T) (such a decision tree is denoted G_a^(k)(Θ)).
  • By a hypothesis over T (such a decision tree is denoted G_h^(k)(Θ)).
  • By a proper hypothesis over T (such a decision tree is denoted G_p^(k)(Θ)).
The set E(Θ) is nonempty because Θ is nondegenerate. We now describe three procedures for the construction of the trees G_a^(k)(Θ), G_h^(k)(Θ), and G_p^(k)(Θ).
We first consider a decision tree G(f_i) for the node Θ whose root is tagged with an attribute f_i ∈ E(Θ). For each δ ∈ E(T, f_i), there is an edge that leaves the root and enters the root of the decision tree G^(k)(Θ{f_i = δ}). We tag this edge with the equation system {f_i = δ}. It is obvious that
L_w(G(f_i)) = 1 + Σ_{δ ∈ E(T, f_i)} L_w(G^(k)(Θ{f_i = δ})).     (3)
Using (3), one can easily prove that G(f_i) is a decision tree with the minimum number of internal nodes for Θ such that the root of the tree is tagged with the attribute f_i and it uses decision trees of the type k for the subtables corresponding to the children of the root.
It is clear that we should not consider attributes f_i ∈ F(T) ∖ E(Θ): for such an attribute f_i, we can find δ ∈ E(T, f_i) with Θ{f_i = δ} = Θ, so we cannot construct an optimal tree for Θ based on f_i.
Construction of the tree G_a^(k)(Θ). We build the set of attributes E(Θ). For each f_i ∈ E(Θ), we construct the decision tree G(f_i) and choose among these trees a tree with the minimum number of internal nodes. We return this tree as G_a^(k)(Θ).
We now describe a decision tree G(H) for Θ. The root of this tree is tagged with an admissible hypothesis H = {f_1 = δ_1, …, f_n = δ_n} for Θ. For any equation system S ∈ A(H), there is an edge that leaves the root of G(H) and enters the root of the tree G^(k)(ΘS). This edge is tagged with the equation system S. It is obvious that
L_w(G(H)) = 1 + Σ_{S ∈ A(H)} L_w(G^(k)(ΘS)).     (4)
Using (4), one can easily prove that G(H) is a decision tree with the minimum number of internal nodes for Θ such that the root of the tree is tagged with the hypothesis H and it uses decision trees of the type k for the subtables corresponding to the children of the root.
It is clear that we should not consider hypotheses H that are not admissible for Θ: for such H, we can find an equation system S ∈ A(H) with ΘS = Θ, so we cannot construct an optimal decision tree for Θ based on such H.
Construction of the tree G_h^(k)(Θ). We construct a hypothesis H_Θ = {f_1 = δ_1, …, f_n = δ_n} for Θ as follows. If f_i ∈ F(T) ∖ E(Θ), then δ_i is the only value in E(Θ, f_i). Let f_i ∈ E(Θ). Then δ_i is the minimum number from E(Θ, f_i) such that L_w(G^(k)(Θ{f_i = δ_i})) = max{L_w(G^(k)(Θ{f_i = σ})) : σ ∈ E(Θ, f_i)}. Obviously, H_Θ is admissible for Θ. We return the tree G(H_Θ) as G_h^(k)(Θ). Using (4), one can prove the correctness of this procedure.
Construction of the tree G_p^(k)(Θ). For each row r = (δ_1, …, δ_n) of the decision table T, we consider the proper hypothesis H_r = {f_1 = δ_1, …, f_n = δ_n} and check whether H_r is admissible for Θ. If yes, then we construct the decision tree G(H_r). Among the constructed trees, we choose a tree with the minimum number of internal nodes and return it as G_p^(k)(Θ).
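The only difference between the recurrences of Sections 5 and 6 is how the costs of the children are combined, as the following minimal sketch (our illustration) makes explicit: the depth takes one query plus the worst child (formulas (1) and (2)), while the number of internal nodes takes one internal node plus the sum over all children (formulas (3) and (4)).

```python
def depth_cost(child_costs):
    """h(Gamma(f_i)) or h(Gamma(H)): one more query on the worst branch."""
    return 1 + max(child_costs)

def internal_nodes_cost(child_costs):
    """L_w(G(f_i)) or L_w(G(H)): the root plus all internal nodes of the children."""
    return 1 + sum(child_costs)

# child_costs holds the already-computed optimal costs of the subtables
# Theta{f_i = delta} (or Theta S for S in A(H)) attached to the children.
print(depth_cost([0, 2, 1]), internal_nodes_cost([0, 2, 1]))   # 3 4
```

Everything else in the bottom-up procedure over Δ(T) stays the same, which is why Algorithm 3 below repeats the structure of Algorithm 2.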
Given a decision table T and k ∈ {1, …, 5} as input, the following Algorithm 3 (C_Lw) builds, for each node Θ of the DAG Δ(T), a decision tree G^(k)(Θ) of the type k with the minimum number of internal nodes for the table Θ.
Algorithm 3 C_Lw (construction of the tree G^(k)(T)).
Input: T (a nonempty decision table), Δ(T) (the directed acyclic graph for T), and k (a natural number between 1 and 5).
Output: A decision tree G^(k)(T).
  1. Check whether a decision tree is attached to each node of the DAG Δ(T). If yes, then return the tree attached to the node T as G^(k)(T) and halt the algorithm. Otherwise, select a node Θ of the graph Δ(T) that does not have an attached tree; it is either a leaf node of Δ(T) or an internal node of Δ(T) all of whose children already have attached trees.
  2. If Θ is a leaf node, then attach to it the decision tree G^(k)(Θ) that has only a single node tagged with the decision attached to all rows of Θ. Return to step 1.
  3. If Θ is not a leaf node, then do the following, depending on the value of k:
     • If k = 1, construct the tree G_a^(1)(Θ) and attach it to Θ as the tree G^(1)(Θ).
     • If k = 2, construct the tree G_h^(2)(Θ) and attach it to Θ as the tree G^(2)(Θ).
     • If k = 3, construct the trees G_a^(3)(Θ) and G_h^(3)(Θ), choose between them a tree with the minimum number of internal nodes, and attach it to Θ as the tree G^(3)(Θ).
     • If k = 4, construct the tree G_p^(4)(Θ) and attach it to Θ as the tree G^(4)(Θ).
     • If k = 5, construct the trees G_a^(5)(Θ) and G_p^(5)(Θ), choose between them a tree with the minimum number of internal nodes, and attach it to Θ as the tree G^(5)(Θ).
     Return to step 1.
Let T be a decision table and k ∈ {1, …, 5}. We use the following notation: l_Lw^(k)(T) = l(T, G^(k)(T)) and c_Lw^(k)(T) = c(T, G^(k)(T)).

7. Experimental Results and Discussion

In this section, we describe the results of the experiments. First, we performed experiments on eight decision tables from the UCI ML Repository [16]. These tables are described in Table 1, which shows the name of each table (Name), its number of rows (#Rows), and its number of attributes (#Attrs). The decision tables in Table 1 are ordered by the number of rows. For each of these tables, we built an optimal decision tree of each of the five types for each of the two cost functions. From these trees, we derived decision rules and studied their length and coverage.
Next, we experimented with 100 randomly generated Boolean functions with n variables (n = 3, …, 6). Let f be such a Boolean function with n variables x_1, …, x_n. We map it to a decision table T_f with n attributes x_1, …, x_n. This table has 2^n rows corresponding to all possible n-tuples of variable values. Each row is labeled with the decision equal to the value of the function f on this row. Decision trees for the table T_f can be interpreted as decision trees computing the function f.
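A short sketch of this mapping follows (an illustration of the setup as we understand it, not the authors' experimental code): each of the 2^n rows is a tuple of variable values, and its decision is the value of a randomly generated Boolean function on that tuple.

```python
import itertools
import random

def random_boolean_table(n, seed=0):
    """Decision table T_f for a random Boolean function f of n variables."""
    rng = random.Random(seed)
    rows = list(itertools.product((0, 1), repeat=n))   # all n-tuples of variable values
    decisions = [rng.randint(0, 1) for _ in rows]      # f's value on each row
    return rows, decisions

rows, decisions = random_boolean_table(3)
print(len(rows), rows[0], decisions[:4])               # 8 rows for n = 3
```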
For each of the tables representing the generated Boolean functions, we built an optimal decision tree of each of the five types for each of the two cost functions. From these trees, we derived decision rules and studied their length and coverage.

7.1. Decision Trees with Minimum Depth

The results of the experiments on the eight decision tables from [16] with decision trees optimal relative to the depth are presented in Table 2 and Table 3. The first column of Table 2 contains the name of the considered decision table T. The last five columns contain the values l_h^(1)(T), …, l_h^(5)(T) (minimum values for each decision table are in bold).
The first column of Table 3 contains the name of the considered decision table T. The last five columns contain the values c_h^(1)(T), …, c_h^(5)(T) (maximum values for each decision table are in bold).
The results of the experiments on Boolean functions with decision trees optimal relative to the depth are presented in Table 4 and Table 5. The first column of Table 4 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values l_h^(1), …, l_h^(5) in the format min / Avg / max (minimum values of Avg for each n are in bold).
The first column of Table 5 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values c_h^(1), …, c_h^(5) in the format min / Avg / max (maximum values of Avg for each n are in bold).

7.2. Decision Trees Containing Minimum Number of Internal Nodes

The results for the decision tables from [16] with decision trees optimal relative to the number of internal nodes are presented in Table 6 and Table 7. The first column of Table 6 contains the name of the considered decision table T. The last five columns contain the values l_Lw^(1)(T), …, l_Lw^(5)(T) (minimum values for each decision table are in bold).
The first column of Table 7 contains the name of the considered decision table T. The last five columns contain the values c_Lw^(1)(T), …, c_Lw^(5)(T) (maximum values for each decision table are in bold).
The results of the experiments on Boolean functions with decision trees optimal relative to the number of internal nodes are presented in Table 8 and Table 9. The first column of Table 8 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values l_Lw^(1), …, l_Lw^(5) in the format min / Avg / max (minimum values of Avg for each n are in bold).
The first column of Table 9 contains the number n of variables in the considered Boolean functions. The last five columns contain information about the values c_Lw^(1), …, c_Lw^(5) in the format min / Avg / max (maximum values of Avg for each n are in bold).

7.3. Analysis of Experimental Results

The experimental results show that the decision rules derived from decision trees with hypotheses are in many cases better than the ones derived from conventional decision trees. In particular, in the case of decision trees with the minimum depth, for each row in Table 2, Table 3, Table 4 and Table 5, the results for type 2 decision trees are better than the results for type 1 decision trees. In the case of decision trees with the minimum number of internal nodes, for each row of Table 6, Table 7, Table 8 and Table 9 (with the exception of the row zoo-data in Table 7), there is a number k ∈ {2, …, 5} such that the results for type k decision trees are superior to the results for type 1 decision trees.
Note that, for the decision trees with the minimum depth, for each decision table from [16] considered in this paper, the best results with respect to the length and the coverage among decision trees of types 2–5 are close to the optimal ones obtained in [18] with the help of dynamic programming algorithms for the construction of optimal decision rules. The results for decision trees of type 1 are, generally, far from the optimal ones.
From the obtained experimental results, it follows that the decision rules derived from optimal decision trees with hypotheses are preferable, as a tool for the representation of information, to the decision rules derived from optimal conventional decision trees.

8. Conclusions

In this paper, we studied modified decision trees that use two types of queries. We constructed optimal trees relative to two cost functions for a number of known datasets from the UCI Machine Learning Repository and randomly generated Boolean functions, and compared the length as well as coverage of decision rules extracted from the constructed decision trees. The experimental results confirmed that the decision rules derived from the decision trees with hypotheses in many cases are better than the ones derived from conventional decision trees.

Author Contributions

Conceptualization, all authors; methodology, all authors; software, I.C.; validation, I.C., M.A., S.H. and B.Z.; formal analysis, all authors; investigation, B.Z.; resources, all authors; data curation, M.A., S.H. and B.Z.; writing—original draft preparation, M.M.; writing—review and editing, all authors; visualization, B.Z.; supervision, I.C. and M.M.; project administration, M.M.; funding acquisition, M.M. All authors have read and agreed to the published version of the manuscript.

Funding

Research funded by King Abdullah University of Science and Technology.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analyzed in this study. These data can be found here: http://archive.ics.uci.edu/ml (accessed on 12 April 2017).

Acknowledgments

Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST). The authors are greatly indebted to the anonymous reviewers for useful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Chapman and Hall/CRC: Boca Raton, FL, USA, 1984.
  2. Moshkov, M. Time complexity of decision trees. In Trans. Rough Sets III; Peters, J.F., Skowron, A., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3400, pp. 244–459.
  3. Rokach, L.; Maimon, O. Data Mining with Decision Trees—Theory and Applications. In Series in Machine Perception and Artificial Intelligence; World Scientific: Singapore, 2007; Volume 69.
  4. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer: New York, NY, USA, 2021.
  5. Chegis, I.A.; Yablonskii, S.V. Logical methods of control of work of electric schemes. Trudy Mat. Inst. Steklov 1958, 51, 270–360. (In Russian)
  6. Pawlak, Z. Rough sets. Int. J. Parallel Program. 1982, 11, 341–356.
  7. Pawlak, Z. Rough Sets—Theoretical Aspects of Reasoning about Data. In Theory and Decision Library: Series D; Kluwer: Dordrecht, The Netherlands, 1991; Volume 9.
  8. Pawlak, Z.; Skowron, A. Rudiments of rough sets. Inf. Sci. 2007, 177, 3–27.
  9. Azad, M.; Chikalov, I.; Hussain, S.; Moshkov, M. Minimizing depth of decision trees with hypotheses. In Lecture Notes in Computer Science, Proceedings of the Rough Sets—International Joint Conference, IJCRS 2021, Bratislava, Slovakia, 19–24 September 2021; Ramanna, S., Cornelis, C., Ciucci, D., Eds.; Springer: Cham, Switzerland, 2021; Volume 12872, pp. 123–133.
  10. Azad, M.; Chikalov, I.; Hussain, S.; Moshkov, M. Minimizing number of nodes in decision trees with hypotheses. In Procedia Computer Science, Proceedings of the 25th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, KES 2021, Szczecin, Poland, 8–10 September 2021; Watrobski, J., Salabun, W., Toro, C., Zanni-Merk, C., Howlett, R.J., Jain, L.C., Eds.; Elsevier: Amsterdam, The Netherlands, 2021; Volume 192, pp. 232–240.
  11. Azad, M.; Chikalov, I.; Hussain, S.; Moshkov, M. Entropy-based greedy algorithm for decision trees using hypotheses. Entropy 2021, 23, 808.
  12. Azad, M.; Chikalov, I.; Hussain, S.; Moshkov, M. Optimization of decision trees with hypotheses for knowledge representation. Electronics 2021, 10, 1580.
  13. Angluin, D. Learning regular sets from queries and counterexamples. Inf. Comput. 1987, 75, 87–106.
  14. Angluin, D. Queries and concept learning. Mach. Learn. 1988, 2, 319–342.
  15. Angluin, D. Queries revisited. Theor. Comput. Sci. 2004, 313, 175–194.
  16. Dua, D.; Graff, C. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, 2017. Available online: http://archive.ics.uci.edu/ml (accessed on 12 April 2017).
  17. AbouEisha, H.; Amin, T.; Chikalov, I.; Hussain, S.; Moshkov, M. Extensions of Dynamic Programming for Combinatorial Optimization and Data Mining. In Intelligent Systems Reference Library; Springer: Cham, Switzerland, 2019; Volume 146.
  18. Amin, T.; Chikalov, I.; Moshkov, M.; Zielosko, B. Dynamic programming approach for exact decision rule optimization. In Rough Sets and Intelligent Systems—Professor Zdzisław Pawlak in Memoriam—Volume 1; Skowron, A., Suraj, Z., Eds.; Intelligent Systems Reference Library; Springer: Berlin/Heidelberg, Germany, 2013; Volume 42, pp. 211–228.
Table 1. Description of decision tables from [16] which were used in experiments.

Name              #Rows    #Attrs
soybean-small     47       36
zoo-data          59       17
hayes-roth-data   69       5
breast-cancer     266      10
balance-scale     625      5
tic-tac-toe       958      10
cars              1728     7
nursery           12,960   9
Table 2. Results for decision tables from [16]: length of decision rules derived from decision trees with minimum depth.

Decision Table T    l_h^(1)(T)   l_h^(2)(T)   l_h^(3)(T)   l_h^(4)(T)   l_h^(5)(T)
soybean-small       1.89         1.00         1.89         1.55         1.89
zoo-data            3.69         1.56         2.42         2.17         3.24
hayes-roth-data     2.83         2.22         2.16         2.32         2.35
breast-cancer       3.61         2.68         2.71         2.70         2.78
balance-scale       3.60         3.20         3.20         3.21         3.20
tic-tac-toe         5.09         3.04         3.40         3.24         3.14
cars                3.72         2.44         2.48         3.07         3.02
nursery             5.78         3.16         4.53         3.12         4.50
Average             3.78         2.41         2.85         2.67         3.01
Table 3. Results for decision tables from [16]: coverage of decision rules derived from decision trees with minimum depth.

Decision Table T    c_h^(1)(T)   c_h^(2)(T)   c_h^(3)(T)   c_h^(4)(T)   c_h^(5)(T)
soybean-small       3.47         12.53        3.47         10.62        3.47
zoo-data            7.88         10.78        9.80         10.86        6.46
hayes-roth-data     3.46         6.20         6.49         5.48         5.45
breast-cancer       4.98         9.30         8.36         9.38         6.90
balance-scale       2.60         4.19         4.18         4.16         4.19
tic-tac-toe         8.38         66.01        27.23        56.64        61.45
cars                197.60       332.76       330.35       97.20        99.42
nursery             29.33        1524.04      304.71       1530.50      246.14
Average             32.21        245.73       86.82        215.60       54.18
Table 4. Results for Boolean functions: length of decision rules derived from decision trees with minimum depth.

n    l_h^(1)               l_h^(2)               l_h^(3)               l_h^(4)               l_h^(5)
3    1.50 / 2.20 / 2.75    1.25 / 2.05 / 2.63    1.25 / 1.99 / 2.50    1.25 / 2.08 / 2.63    1.25 / 2.01 / 2.50
4    1.88 / 3.18 / 3.75    1.63 / 2.92 / 3.50    1.63 / 2.87 / 3.50    1.63 / 2.91 / 3.50    1.63 / 2.94 / 3.50
5    3.44 / 4.09 / 4.63    3.00 / 3.64 / 4.22    2.97 / 3.60 / 4.06    3.13 / 3.66 / 4.13    3.09 / 3.70 / 4.19
6    4.78 / 5.14 / 5.47    3.98 / 4.36 / 4.77    3.98 / 4.41 / 4.75    3.97 / 4.35 / 4.70    4.03 / 4.46 / 4.78

(n = number of variables; each entry is given in the format min / Avg / max.)
Table 5. Results for Boolean functions: coverage of decision rules derived from decision trees with minimum depth.

n    c_h^(1)               c_h^(2)               c_h^(3)               c_h^(4)               c_h^(5)
3    1.25 / 1.94 / 3.00    1.38 / 2.22 / 3.63    1.50 / 2.21 / 3.63    1.38 / 2.14 / 3.63    1.50 / 2.17 / 3.63
4    1.25 / 1.99 / 5.38    1.50 / 2.57 / 6.44    1.50 / 2.52 / 6.44    1.50 / 2.53 / 6.44    1.50 / 2.36 / 6.44
5    1.38 / 2.10 / 3.69    2.03 / 3.03 / 4.56    2.06 / 2.98 / 4.97    2.03 / 2.93 / 4.69    1.81 / 2.76 / 4.66
6    1.59 / 2.03 / 2.84    2.69 / 3.55 / 4.81    2.58 / 3.37 / 4.64    2.72 / 3.53 / 4.70    2.53 / 3.24 / 4.69

(n = number of variables; each entry is given in the format min / Avg / max.)
Table 6. Results for decision tables from [16]: length of decision rules derived from decision trees with minimum number of internal nodes.

Decision Table T    l_Lw^(1)(T)   l_Lw^(2)(T)   l_Lw^(3)(T)   l_Lw^(4)(T)   l_Lw^(5)(T)
soybean-small       1.34          1.00          1.34          1.51          1.34
zoo-data            3.05          1.69          3.05          2.39          3.05
hayes-roth-data     2.64          2.22          2.61          2.23          2.61
breast-cancer       4.98          2.72          5.30          2.73          5.27
balance-scale       3.55          3.20          3.53          3.20          3.53
tic-tac-toe         4.45          3.35          4.41          3.15          4.45
cars                2.97          2.49          2.96          2.49          2.96
nursery             3.77          3.19          3.77          3.19          3.77
Average             3.34          2.48          3.37          2.61          3.37
Table 7. Results for decision tables from [16]: coverage of decision rules derived from decision trees with minimum number of internal nodes.

Decision Table T    c_Lw^(1)(T)   c_Lw^(2)(T)   c_Lw^(3)(T)   c_Lw^(4)(T)   c_Lw^(5)(T)
soybean-small       11.51         12.53         11.51         9.81          11.51
zoo-data            10.69         10.68         10.69         10.63         10.69
hayes-roth-data     3.84          6.20          3.87          6.20          3.87
breast-cancer       2.73          8.96          3.05          9.06          3.15
balance-scale       2.79          4.21          2.88          4.21          2.88
tic-tac-toe         22.49         30.19         23.50         56.50         22.69
cars                237.33        332.46        237.37        332.46        237.37
nursery             1471.45       1527.95       1471.47       1527.95       1471.47
Average             220.35        241.65        220.54        244.60        220.45
Table 8. Results for Boolean functions: length of decision rules derived from decision trees with minimum number of internal nodes.

n    l_Lw^(1)              l_Lw^(2)              l_Lw^(3)              l_Lw^(4)              l_Lw^(5)
3    1.50 / 2.07 / 2.75    1.25 / 2.06 / 2.63    1.25 / 1.94 / 2.50    1.25 / 2.06 / 2.63    1.25 / 1.94 / 2.50
4    1.88 / 2.90 / 3.50    1.63 / 2.94 / 3.50    1.81 / 2.79 / 3.50    1.63 / 2.94 / 3.50    1.81 / 2.79 / 3.50
5    3.13 / 3.75 / 4.19    3.00 / 3.77 / 4.31    3.13 / 3.69 / 4.25    3.00 / 3.77 / 4.31    3.13 / 3.69 / 4.25
6    4.28 / 4.69 / 5.06    4.13 / 4.69 / 5.19    4.25 / 4.61 / 4.98    4.13 / 4.69 / 5.19    4.25 / 4.61 / 4.98

(n = number of variables; each entry is given in the format min / Avg / max.)
Table 9. Results for Boolean functions: coverage of decision rules derived from decision trees with minimum number of internal nodes.

n    c_Lw^(1)              c_Lw^(2)              c_Lw^(3)              c_Lw^(4)              c_Lw^(5)
3    1.25 / 2.14 / 3.00    1.38 / 2.22 / 3.63    1.50 / 2.27 / 3.63    1.38 / 2.22 / 3.63    1.50 / 2.27 / 3.63
4    1.63 / 2.51 / 5.38    1.50 / 2.56 / 6.44    1.50 / 2.63 / 5.44    1.50 / 2.56 / 6.44    1.50 / 2.63 / 5.44
5    1.88 / 2.79 / 4.19    1.94 / 2.91 / 4.59    1.75 / 2.82 / 4.34    1.94 / 2.91 / 4.59    1.75 / 2.82 / 4.34
6    2.16 / 2.88 / 4.09    2.09 / 3.16 / 4.58    2.27 / 2.99 / 4.11    2.09 / 3.16 / 4.58    2.27 / 2.99 / 4.11

(n = number of variables; each entry is given in the format min / Avg / max.)
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

