
Why Use a Fuzzy Partition in F-Transform?

by Vladik Kreinovich 1,*, Olga Kosheleva 2 and Songsak Sriboonchitta 3

1 Department of Computer Science, University of Texas at El Paso, El Paso, TX 79968, USA
2 Department of Teacher Education, University of Texas at El Paso, El Paso, TX 79968, USA
3 Faculty of Economics, Chiang Mai University, Chiang Mai 50200, Thailand
* Author to whom correspondence should be addressed.
Submission received: 21 April 2019 / Revised: 31 July 2019 / Accepted: 1 August 2019 / Published: 2 August 2019
(This article belongs to the Special Issue Fuzzy Transforms and Their Applications)

Abstract:
In many application problems, F-transform algorithms are very efficient. In F-transform techniques, we replace the original signal or image with a finite number of weighted averages. The use of a weighted average can be naturally explained, e.g., by the fact that this is what we get anyway when we measure the signal. However, most successful applications of F-transform have an additional not-so-easy-to-explain feature: the fuzzy partition requirement that the sum of all the related weighting functions is a constant. In this paper, we show that this seemingly difficult-to-explain requirement can also be naturally explained in signal-measurement terms: namely, this requirement can be derived from the natural desire to have all the signal values at different moments of time estimated with the same accuracy. This explanation is the main contribution of this paper.

1. Formulation of the Problem

1.1. F-Transform: A Brief Reminder

In many practical applications, it turns out to be beneficial to replace the original continuous signal $x(t)$ defined on some time interval with a finite number of "averaged" values

$$x_i = \int A_i(t) \cdot x(t)\,dt, \quad i = 0, \ldots, n, \qquad (1)$$

where $A_i(t) \ge 0$ are appropriate functions; see, e.g., [1,2,3,4,5,6].
In many applications, a very specific form of these functions is used: namely, $A_i(t) = A(t - t_i)$ for some function $A(t)$ and for $t_i = t_0 + i \cdot h$, where $t_0$ and $h > 0$ are numbers for which $A(t)$ is equal to 0 outside the interval $[-h, h]$. However, more general families of functions $A_i(t)$ are also sometimes used efficiently.
The transition from the original function $x(t)$ to the tuple of values $x_0, x_1, \ldots, x_n$ is known as the F-transform [1,2,3,4,5,6].
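For concreteness, here is a minimal Python sketch of Formula (1) (our own illustration, not code from the cited literature): it approximates the integrals by Riemann sums, using the triangular functions $A_i(t) = A(t - t_i)$ mentioned above; all function names, grids, and the sample signal are our own choices.

```python
import numpy as np

def triangular(t, t_i, h):
    """A(t - t_i): equal to 1 at t_i and to 0 outside [t_i - h, t_i + h]."""
    return np.maximum(0.0, 1.0 - np.abs(t - t_i) / h)

def f_transform(x_vals, t, nodes, h):
    """Components x_i = integral of A_i(t) * x(t) dt, via a Riemann sum."""
    dt = t[1] - t[0]
    return np.array([np.sum(triangular(t, t_i, h) * x_vals) * dt
                     for t_i in nodes])

t = np.linspace(0.0, 1.0, 1001)      # dense time grid on [0, 1]
nodes = np.linspace(0.0, 1.0, 6)     # t_i = t_0 + i * h
h = nodes[1] - nodes[0]
x_vals = np.sin(2 * np.pi * t)       # a sample signal x(t)
print(f_transform(x_vals, t, nodes, h))
```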
A similar 2-D transformation is very useful in many image processing problems.

1.2. The General Idea behind F-Transform Is Very Reasonable

From the general measurement viewpoint, the F-transform makes perfect sense, because it corresponds to the results of measuring the signal. Indeed, in practice, a measuring instrument cannot measure the exact value $x(t)$ of the signal at a given moment t. No matter how fast the processes within the measuring instrument, it always has some inertia. As a result, the value $m_i$ measured in each measurement depends not only on the value $x(t)$ of the signal at the given moment of time, but also on the values at nearby moments of time; see, e.g., [7].
The signal is usually weak, so the values $x(t)$ are small. Thus, we can expand the dependence of $m_i$ on $x(t)$ in a Taylor series and safely ignore terms which are quadratic or of higher order in $x(t)$. Then, we get a model in which the value $m_i$ is a linear function of the values $x(t)$; this is the usual technique in applications; see, e.g., [8]. The general form of a linear dependence is

$$m_i = m_i^{(0)} + \int A_i(t) \cdot x(t)\,dt \qquad (2)$$

for some coefficients $A_i(t)$.
A measuring instrument is usually calibrated in such a way that in the absence of a signal, i.e., when $x(t) = 0$, the measurement result is 0. After such a calibration, we get $m_i^{(0)} = 0$, and thus the expression (2) takes the simplified form

$$m_i = \int A_i(t) \cdot x(t)\,dt. \qquad (3)$$

This is exactly the form used in the F-transform. Thus, the F-transform is indeed a very natural procedure: it replaces the original signal $x(t)$ with the simulated results of measuring this signal, and the results of measuring the signal are exactly what we have in real life.

1.3. However, Why a Fuzzy Partition?

So far, everything has been good and natural, but there is one aspect of successful applications of the F-transform that cannot be explained so easily: namely, in most such applications, the corresponding functions $A_i(t)$ form a fuzzy partition, in the sense that

$$\sum_{i=1}^{n} A_i(t) = 1 \qquad (4)$$

for all moments t from the corresponding time interval.

1.3.1. Mathematical Comment

Sometimes, the corresponding requirement takes a slightly different form: $\sum_{i=1}^{n} A_i(t) = c$ for some constant c. This case can be naturally reduced to case (4) if we consider the re-scaled functions $A_i'(t) = c^{-1} \cdot A_i(t)$ and the corresponding re-scaled values $x_i' = c^{-1} \cdot x_i$. In view of this equivalent re-scaling, the question is why it is natural to require that $\sum_{i=1}^{n} A_i(t) = \text{const}$.

1.3.2. Application-Related Comment

It is worth mentioning that fuzzy partitions are successfully used in other applications of fuzzy techniques. For example, fuzzy sets that form a fuzzy partition are used:
  • in fuzzy control; see, e.g., an application to control of telerobots in space medicine [9];
  • in information accessing systems such as information retrieval systems, filtering systems, recommender systems, and web quality evaluation tools; see, e.g., [10] and references therein, etc.
In fuzzy clustering, the important frequently used requirement is also that the fuzzy sets corresponding to different clusters form a fuzzy partition. The resulting clustering techniques have been very successful in many applications; see, e.g., a recent application to the analysis of earthquake data [11].
On the other hand, in some other applications, it turns out to be more efficient to use fuzzy sets which do not form a fuzzy partition; an example related to face and pose detection is given in [12].

1.4. It Is Desirable to Explain the Efficiency of a Fuzzy Partition Requirement

We strongly believe that every time there is an unexplained empirical fact about data processing algorithms, it is desirable to come up with a theoretical explanation. Such an explanation makes the resulting algorithms more reliable, thus decreasing the possibility that these algorithms will fail and, correspondingly, increasing the chances that these efficient algorithms will be used by practitioners, even in potentially high-risk situations. Sometimes, the corresponding analysis reveals the conditions under which these methods work efficiently, and even helps develop more efficient techniques.
In applications of fuzzy logic, there are many such interesting empirical facts, e.g., higher efficiency of certain membership functions, of certain “and” and “or”-operations, of certain defuzzification procedures, etc. Finding an explanation for such facts has been one of our main research directions; see, e.g., [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. Similar research results help understand the somewhat unexpected efficiency of other intelligent techniques such as deep neural networks; see, e.g., [28,29,30,31,32,33]. This paper can be viewed as a natural continuation of this research direction.

1.5. What We Do in This Paper

In this paper, we show that the fuzzy partition requirement (4) can be naturally explained in the measurement interpretation of F-transform.
To be more precise, we show that what naturally appears is a 1-parametric family of similar requirements, of which the fuzzy partition requirement is a particular case, and then we explain why, in the fuzzy case, it is indeed reasonable to use the fuzzy partition requirement.
The resulting explanation of the fuzzy partition requirement is the main contribution of this paper.

1.6. The Structure of This Paper

The structure of our paper is as follows. The main idea behind our explanation is presented in Section 2, for a very general (not necessarily fuzzy) type of uncertainty. In Section 3, we analyze the case of probabilistic uncertainty. In Section 4, this analysis is generalized to general uncertainty. In Section 5, the analysis performed in the previous sections is used to explain which functions $A_i(t)$ we should choose in a general uncertainty situation and, in particular, in the case of fuzzy uncertainty. All these sections contain original research results. The final Section 6 contains conclusions and possible future research directions.

Comment

In the above text, we assumed that the actual signal $x(t)$ is defined for all possible moments of time. In some practical situations, however, it makes sense to only consider discrete-time values $x(t_1), x(t_2), \ldots$ In such discrete-time situations, it makes sense to apply the following discrete F-transform:

$$x_i = c \cdot \sum_k A_i(t_k) \cdot x(t_k) \qquad (5)$$

for some constant c, where the values $A_i(t_k)$ satisfy the same fuzzy partition requirement: that for each k, we have $\sum_i A_i(t_k) = 1$.
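As an illustration of the discrete case (again a sketch of ours, with made-up grids and signal), the following Python fragment computes Formula (5) for a uniform triangular partition and verifies the requirement $\sum_i A_i(t_k) = 1$ at every sample $t_k$:

```python
import numpy as np

def discrete_f_transform(x_vals, t_vals, nodes, h, c=1.0):
    """x_i = c * sum_k A_i(t_k) * x(t_k), Formula (5), for triangular A_i."""
    A = np.maximum(0.0, 1.0 - np.abs(t_vals[None, :] - nodes[:, None]) / h)
    # fuzzy partition check: at every sample t_k the weights add up to 1
    assert np.allclose(A.sum(axis=0), 1.0)
    return c * (A @ x_vals)

t_vals = np.linspace(0.0, 1.0, 101)
nodes = np.linspace(0.0, 1.0, 6)
h = nodes[1] - nodes[0]
print(discrete_f_transform(np.sin(2 * np.pi * t_vals), t_vals, nodes, h))
```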
In the following text, we use the continuous case to explain the fuzzy partition requirement; however, as one can easily check, the same explanation holds for the discrete F-transform as well.

2. Main Idea

2.1. What if We Can Make Exact Measurements of Instantaneous Values?

In the idealized case, when the inertia of measuring instruments is so small that it can be safely ignored, we can measure the exact values $x(t_1), x(t_2), \ldots$ of the signal $x(t)$ at different moments of time.
In this case, we get perfect information about the values of the signal at these moments of time $t_1, t_2, \ldots$, but practically no information about the values of the signal $x(t)$ at any other moment of time. In other words:
  • we reconstruct the values $x(t_1), x(t_2), \ldots$ with perfect accuracy (0 measurement error), while
  • the values $x(t)$ corresponding to all other moments of time t are reconstructed with no accuracy at all (the only bound on the measurement error is infinity).
Even if we take into account that measurements are never 100% accurate, and we only measure the values $x(t_i)$ with some accuracy, we will still get a difference between our knowledge of the values $x(t)$ corresponding to different moments of time:
  • we know the values $x(t_i)$ with finite accuracy, but
  • for all other moments of time t, we know nothing (i.e., the only bound on the measurement error is infinity).
This difference does not fit well with the fact that we want to get a good representation of the whole signal $x(t)$, i.e., a good representation of its values at all moments of time. Thus, we arrive at the following idea.

2.2. Main Idea

To adequately represent the original signal $x(t)$, it is desirable to select the measurement procedures in such a way that, based on these measurements, we reconstruct each value $x(t)$ with the same accuracy.

Comment

At this moment, we have presented this idea informally. In the following sections, we show how to formalize this idea, and we also show that this idea leads to the fuzzy partition requirement.
To be more precise, this idea leads to a general formula that includes the fuzzy partition requirement as a particular case. We also explain why it is precisely the fuzzy partition requirement that should be selected in the fuzzy case.

3. Case of Probabilistic Uncertainty

3.1. Description of the Case

Let us start with the most well-studied type of uncertainty: probabilistic uncertainty. In this case, we have probabilistic information about the measurement error $\Delta m_i \stackrel{\text{def}}{=} \tilde{m}_i - m_i$ of each measurement, where $\tilde{m}_i$ denotes the result of measuring the quantity $m_i$.
We will consider the usual way measurement uncertainties are treated in this approach (see, e.g., [7]): namely, we will assume:
  • that each measurement error $\Delta m_i$ is normally distributed with 0 mean and known standard deviation σ, and
  • that the measurement errors $\Delta m_i$ and $\Delta m_j$ corresponding to different measurements $i \ne j$ are independent.

3.2. How Accurately Can We Estimate x(t) Based on Each Measurement

Based on each measurement, we know the value $m_i = \int A_i(t) \cdot x(t)\,dt$ with accuracy σ. The integral is, in effect, a large sum, so we have

$$m_i = \sum_t A_i(t) \cdot x(t) \cdot \Delta t.$$

Thus, for each moment t, we have

$$A_i(t) \cdot x(t) \cdot \Delta t = m_i - \sum_{s \ne t} A_i(s) \cdot x(s) \cdot \Delta s,$$

and therefore,

$$x(t) = \frac{1}{A_i(t) \cdot \Delta t} \cdot m_i - \frac{1}{A_i(t) \cdot \Delta t} \cdot \sum_{s \ne t} A_i(s) \cdot x(s) \cdot \Delta s. \qquad (6)$$

The measurement result $\tilde{m}_i$ is an estimate for the quantity $m_i$, whose error has mean 0 and standard deviation σ. Thus, if we know all the values $x(s)$ corresponding to $s \ne t$, then, based on the result $\tilde{m}_i$ of the i-th measurement, we can estimate the remaining value $x(t)$ as

$$x(t) \approx \tilde{x}_i(t) \stackrel{\text{def}}{=} \frac{1}{A_i(t) \cdot \Delta t} \cdot \tilde{m}_i - \frac{1}{A_i(t) \cdot \Delta t} \cdot \sum_{s \ne t} A_i(s) \cdot x(s) \cdot \Delta s. \qquad (7)$$

By comparing Formulas (6) and (7), we can conclude that the approximation error $\Delta x_i(t) \stackrel{\text{def}}{=} \tilde{x}_i(t) - x(t)$ of this estimate is equal to

$$\Delta x_i(t) = \frac{1}{A_i(t) \cdot \Delta t} \cdot \Delta m_i. \qquad (8)$$

Since the measurement error $\Delta m_i$ is normally distributed, with 0 mean and standard deviation σ, the approximation error $\Delta x_i(t)$ is also normally distributed, with 0 mean and standard deviation

$$\sigma_i(t) = \frac{\sigma}{A_i(t) \cdot \Delta t}. \qquad (9)$$
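Formula (9) is easy to check numerically. In the following sketch (ours; all numeric values are made up for illustration), we draw many measurement errors $\Delta m_i \sim N(0, \sigma)$, propagate them through Formula (8), and compare the empirical standard deviation with $\sigma / (A_i(t) \cdot \Delta t)$:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, A_it, dt = 0.05, 0.7, 0.01   # sigma, A_i(t), grid step: made-up values

dm = rng.normal(0.0, sigma, size=100_000)   # measurement errors Delta m_i
dx = dm / (A_it * dt)                       # propagated via Formula (8)
print(dx.std(), sigma / (A_it * dt))        # the two values nearly coincide
```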

3.3. How Accurately Can We Estimate x(t) Based on All the Measurements

For each moment t, based on each measurement i, we get an estimate $\tilde{x}_i(t) \approx x(t)$ with the accuracy $\sigma_i(t)$ described by Formula (9):

$$x(t) \approx \tilde{x}_0(t), \quad x(t) \approx \tilde{x}_1(t), \quad \ldots, \quad x(t) \approx \tilde{x}_n(t). \qquad (10)$$

For each estimate, since the distribution of the measurement error is normal, the corresponding probability density function has the form

$$\rho_i(\tilde{x}_i(t)) = \frac{1}{\sqrt{2\pi} \cdot \sigma_i(t)} \cdot \exp\left(-\frac{(\tilde{x}_i(t) - x(t))^2}{2 (\sigma_i(t))^2}\right). \qquad (11)$$

Since the measurement errors $\Delta m_i$ of different measurements are independent, the resulting estimation errors $\Delta x_i(t) = \tilde{x}_i(t) - x(t)$ are also independent. Thus, the joint probability density corresponding to all the measurements is equal to the product of the densities (11) corresponding to the individual measurements:

$$\rho(\tilde{x}_0(t), \ldots, \tilde{x}_n(t)) = \frac{1}{(\sqrt{2\pi})^{n+1} \cdot \prod_{i=0}^{n} \sigma_i(t)} \cdot \exp\left(-\sum_{i=0}^{n} \frac{(\tilde{x}_i(t) - x(t))^2}{2 (\sigma_i(t))^2}\right). \qquad (12)$$

As a combined estimate $\tilde{x}(t)$ for $x(t)$, it is reasonable to select the value for which the corresponding probability density (12) is the largest possible. This is known as the Maximum Likelihood Method; see, e.g., [34].
To find such a maximum, it is convenient to take the negative logarithm of expression (12) and use the fact that $-\ln(z)$ is a decreasing function, so the original expression is the largest if and only if its negative logarithm is the smallest. Thus, we arrive at the need to minimize the sum

$$\sum_{i=0}^{n} \frac{(\tilde{x}_i(t) - x(t))^2}{2 (\sigma_i(t))^2}; \qquad (13)$$
this minimization is known as the Least Squares approach.
Differentiating expression (13) with respect to the unknown $x(t)$ and equating the derivative to 0, we conclude that

$$\sum_{i=0}^{n} \tilde{x}_i(t) \cdot (\sigma_i(t))^{-2} = \tilde{x}(t) \cdot \sum_{i=0}^{n} (\sigma_i(t))^{-2}, \qquad (14)$$

and thus that

$$\tilde{x}(t) = \frac{\sum_{i=0}^{n} \tilde{x}_i(t) \cdot (\sigma_i(t))^{-2}}{\sum_{i=0}^{n} (\sigma_i(t))^{-2}}. \qquad (15)$$

The accuracy $\tilde{\sigma}(t)$ of this estimate can be determined if we rewrite the expression (12) in the form

$$\frac{1}{\sqrt{2\pi} \cdot \tilde{\sigma}(t)} \cdot \exp\left(-\frac{(x(t) - \tilde{x}(t))^2}{2 (\tilde{\sigma}(t))^2}\right). \qquad (16)$$

By comparing the coefficients at $(x(t))^2$ under the exponent in Formulas (12) and (16), we conclude that

$$\frac{1}{2 (\tilde{\sigma}(t))^2} = \sum_{i=0}^{n} \frac{1}{2 (\sigma_i(t))^2}, \qquad (17)$$

i.e., equivalently, that

$$(\tilde{\sigma}(t))^{-2} = \sum_{i=0}^{n} (\sigma_i(t))^{-2}. \qquad (18)$$

In particular, if all the estimation errors were equal, i.e., if we had $\sigma_i(t) = \sigma(t)$ for all i, then, from (18), we would conclude that

$$\tilde{\sigma}(t) = \frac{\sigma(t)}{\sqrt{N}}, \qquad (19)$$

where $N \stackrel{\text{def}}{=} n + 1$ is the overall number of combined measurements.
Substituting expression (9) for $\sigma_i(t)$ into Formula (18), we conclude that

$$(\tilde{\sigma}(t))^{-2} = \frac{(\Delta t)^2}{\sigma^2} \cdot \sum_{i=0}^{n} (A_i(t))^2. \qquad (20)$$

Thus, the requirement that we get the same accuracy for all moments of time t, i.e., that $\tilde{\sigma}(t) = \text{const}$, means that we need to have

$$\sum_{i=0}^{n} (A_i(t))^2 = \text{const}. \qquad (21)$$
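The fusion Formulas (15) and (18), and the resulting requirement (21), can be illustrated by the following sketch (ours; the partition and numbers are for illustration only). Note that for a uniform triangular partition, the sum of the $A_i(t)$ is constant while the sum of their squares is not, a point to which we return in the discussion below:

```python
import numpy as np

def combine_gaussian(estimates, sigmas):
    """Least-squares fusion of independent estimates: Formulas (15) and (18)."""
    w = 1.0 / np.asarray(sigmas) ** 2              # inverse-variance weights
    return np.sum(w * np.asarray(estimates)) / np.sum(w), np.sum(w) ** -0.5

print(combine_gaussian([1.0, 1.2], [0.1, 0.2]))    # pulled toward the better estimate

# For a uniform triangular partition, sum_i A_i(t) is constant (= 1),
# but sum_i A_i(t)^2 is not, so such partitions do not satisfy (21):
t = np.linspace(0.0, 1.0, 1001)
nodes = np.linspace(0.0, 1.0, 6)
h = nodes[1] - nodes[0]
A = np.maximum(0.0, 1.0 - np.abs(t[None, :] - nodes[:, None]) / h)
print(np.ptp(A.sum(axis=0)), np.ptp((A ** 2).sum(axis=0)))   # ~0 versus ~0.5
```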

3.4. Discussion

Formula (21) is somewhat similar to the fuzzy partition requirement, but it is different:
  • in the fuzzy partition requirement, we demand that the sum of the functions $A_i(t)$ be constant, but
  • here, we have the sum of the squares.
Formula (21) is based on probabilistic uncertainty, for which the measurement error decreases with repeated measurements as $1/\sqrt{N}$. However, e.g., for interval uncertainty (see, e.g., [7,35,36,37]), when we only know an upper bound on the measurement errors, the measurement error resulting from N repeated measurements decreases as $1/N$; see, e.g., [38].
So maybe, by considering different types of uncertainty, we can get the fuzzy partition formula? To answer this question, let us consider, in general terms, how uncertainties can be combined in different approaches.

4. How Uncertainties Can Be Combined in Different Approaches

4.1. Towards a General Formulation of the Problem

In the general case, be it the probabilistic or the interval or any other approach, we can always describe the corresponding uncertainty in the same units as the measured quantity.
In the interval approach, a natural measure of uncertainty is the largest possible value Δ of the absolute value $|\Delta x|$ of the approximation error $\Delta x = \tilde{x} - x$, where x is the actual value of the corresponding quantity and $\tilde{x}$ is the measurement result. This value Δ is clearly measured in the same units as the quantity x itself.
In the probabilistic approach, we can use the variance of $\Delta x$, which is described in different units than x, but we can also take the square root of this variance and consider the standard deviation σ, which is already described in the same units.
In the general case, let us denote the corresponding measure of accuracy by Δ. The situation when we have no information about the desired quantity corresponds to $\Delta = \infty$. The idealized situation when we know the exact value of this quantity corresponds to $\Delta = 0$.
If Δ′ and Δ″ are the corresponding measures of accuracy for two different measurements, then what is the accuracy of the resulting combined estimate? Let us denote this combined accuracy by $\Delta' * \Delta''$.
In these terms, to describe the combination, we need to describe the corresponding function $a * b$ of two variables. What are the natural properties of this function?

4.2. Commutativity

The result of combining two estimates should not depend on which of the two estimates is listed first, so we should have $a * b = b * a$. In other words, the corresponding combination operation must be commutative.

4.3. Associativity

If we have three estimates, then:
  • we can first combine the first and the second ones, and then combine the result with the third one,
  • or we can first combine the second and the third ones, and then combine the result with the first one.
The result should not depend on the order, so we should have $(a * b) * c = a * (b * c)$. In other words, the corresponding operation should be associative.

4.4. Monotonicity

Any additional information can only improve the accuracy. Thus, the accuracy of the combined estimate cannot be worse than the accuracy of each of the estimates used in this combination. Therefore, we get $a * b \le a$.
Similarly, if we increase the accuracy of each measurement, the accuracy of the resulting measurement will increase too: if $a \le a'$ and $b \le b'$, then we should have $a * b \le a' * b'$.

4.5. Non-Degenerate Case

If we start with measurements of finite accuracy, we should never get the exact value, i.e., if $a > 0$ and $b > 0$, we should get $a * b > 0$.

4.6. Scale-Invariance

In real life, we deal with the actual quantities, but in computations, we need to describe these quantities by their numerical values. To get a numerical value, we need to select a measuring unit: e.g., to describe distance in numerical terms, we need to select a unit of distance.
This selection is usually arbitrary. For example, for distance, we could consider meters, we could consider centimeters, and we could consider inches or feet. It is reasonable to require that the combination operation remains the same if we keep the same quantities but change the measuring unit. Let us describe this requirement in precise terms.
If we replace the original measuring unit with a new one which is λ times smaller, then all the numerical values are multiplied by λ. For example, if we replace meters by centimeters, then all the numerical values are multiplied by 100. The corresponding transformation $x \mapsto \lambda \cdot x$ is known as scaling.
Suppose that in the original units, we had accuracies a and b, and the combined accuracy was $a * b$. Then, in the new units, since accuracies are described in the same units as the quantity itself, the original accuracies become $\lambda \cdot a$ and $\lambda \cdot b$, and the combined accuracy is thus $(\lambda \cdot a) * (\lambda \cdot b)$. This should be the same as what we get when we transform the old-units accuracy $c = a * b$ into the new units, namely $\lambda \cdot (a * b)$: $(\lambda \cdot a) * (\lambda \cdot b) = \lambda \cdot (a * b)$. This invariance under scaling is known as scale-invariance.

4.7. Discussion

Now, we are ready to formulate the main result. To formulate it, we list all the above reasonable properties of a combination operation in the form of the following definition:
Definition 1.
By a combination operation, we mean a function $a * b$ that transforms two non-negative numbers a and b into a new non-negative number and for which the following properties hold:
  • for all a and b, we have $a * b = b * a$ (commutativity);
  • for all a, b, and c, we have $(a * b) * c = a * (b * c)$ (associativity);
  • for all a and b, we have $a * b \le a$ (first monotonicity requirement);
  • for all a, b, a′, and b′, if $a \le a'$ and $b \le b'$, then $a * b \le a' * b'$ (second monotonicity requirement);
  • if $a > 0$ and $b > 0$, then $a * b > 0$ (non-degeneracy); and
  • for all a, b, and $\lambda > 0$, we have $(\lambda \cdot a) * (\lambda \cdot b) = \lambda \cdot (a * b)$ (scale-invariance).

Comment

This definition is similar to the definitions presented in [39] for quantum systems and in [30] for neural networks. However, because of the different application domains, the above definition is somewhat different: e.g., in our case, we have a non-degeneracy requirement, which is natural when combining uncertainty but not in the above two domains.
Proposition 1.
Every combination operation has either the form $a * b = \min(a, b)$ or the form $a * b = (a^{-\beta} + b^{-\beta})^{-1/\beta}$ for some β > 0.
The proof of this result is, in effect, described in [39] (see also [30]).

Comment

The proof shows that if we do not impose the non-degeneracy condition, the only other alternative is $a * b = 0$. Thus, the non-degeneracy condition can be weakened: instead of requiring that $a * b > 0$ for all pairs of positive numbers a and b, it is sufficient to require that $a * b > 0$ for at least one such pair.
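The two forms given by Proposition 1 are easy to spot-check numerically. Here is a short sketch (ours, with arbitrary test values) that implements the generic form and verifies the properties of Definition 1 on sample inputs, including the fact that min(a, b) is recovered as β grows:

```python
import numpy as np

def combine(a, b, beta):
    """Generic combination operation a * b = (a^-beta + b^-beta)^(-1/beta)."""
    return (a ** -beta + b ** -beta) ** (-1.0 / beta)

a, b, c, lam, beta = 0.3, 0.8, 1.7, 2.5, 2.0
assert np.isclose(combine(a, b, beta), combine(b, a, beta))       # commutativity
assert np.isclose(combine(combine(a, b, beta), c, beta),
                  combine(a, combine(b, c, beta), beta))          # associativity
assert combine(a, b, beta) <= min(a, b)                           # monotonicity
assert np.isclose(combine(lam * a, lam * b, beta),
                  lam * combine(a, b, beta))                      # scale-invariance
assert np.isclose(combine(a, b, 200.0), min(a, b), atol=1e-3)     # min as beta grows
```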

4.8. Discussion

The form $\min(a, b)$ is the limit case of the second form as $\beta \to \infty$.
In the generic case $\beta < \infty$, the relation $a * b = c$ is equivalent to

$$a^{-\beta} + b^{-\beta} = c^{-\beta}. \qquad (22)$$

Thus, the probabilistic case corresponds to β = 2.
In the situation when we have N measurement results with the same accuracy $\Delta_1 = \ldots = \Delta_N = \Delta$, the combined accuracy $\tilde{\Delta}$ can be determined from the formula

$$\tilde{\Delta}^{-\beta} = N \cdot \Delta^{-\beta}. \qquad (23)$$

Thus, we have

$$\tilde{\Delta} = \frac{\Delta}{N^{1/\beta}}. \qquad (24)$$

In the probabilistic case, we indeed have this formula, with β = 2. The above-mentioned interval-case formula $\tilde{\Delta} \sim 1/N$ (derived in [38]) corresponds to the case β = 1; thus, β = 1 is the value of the parameter β corresponding to interval uncertainty.
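Formula (24) can likewise be verified by folding the combination operation over N equally accurate measurements (a sketch of ours, reusing combine from the previous fragment; the values of Δ and N are arbitrary):

```python
from functools import reduce

def combine(a, b, beta):
    return (a ** -beta + b ** -beta) ** (-1.0 / beta)

Delta, N = 0.1, 16
for beta in (1.0, 2.0):
    folded = reduce(lambda acc, _: combine(acc, Delta, beta), range(N - 1), Delta)
    print(beta, folded, Delta / N ** (1.0 / beta))   # the last two values coincide
```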

5. Which Functions A_i(t) Should We Choose: General Uncertainty Situation and Case of Fuzzy Uncertainty

5.1. Analysis of the Problem

If we measure $m_i$ with accuracy Δ, then, due to Formula (8) (and similarly to the case of probabilistic uncertainty), the estimate $\tilde{x}_i(t)$ is known with accuracy

$$\Delta_i(t) = \frac{\Delta}{A_i(t) \cdot \Delta t}. \qquad (25)$$

For the case of the min combination formula, the combined accuracy is equal to

$$\Delta(t) = \min_i \Delta_i(t) = \frac{\Delta}{\Delta t} \cdot \frac{1}{\max_i A_i(t)}. \qquad (26)$$

Thus, the requirement that we estimate all the values $x(t)$ with the same accuracy means that

$$\max_i A_i(t) = \text{const}. \qquad (27)$$

For the generic case $\beta < \infty$, from Formula (22), we conclude that

$$(\Delta(t))^{-\beta} = \frac{(\Delta t)^{\beta}}{\Delta^{\beta}} \cdot \sum_{i=1}^{n} (A_i(t))^{\beta}. \qquad (28)$$

Thus, the requirement that we get the same accuracy for all moments of time t means that we need to have

$$\sum_{i=1}^{n} (A_i(t))^{\beta} = \text{const}. \qquad (29)$$
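The following sketch (ours, with an illustrative uniform triangular partition) evaluates $\Delta(t)$ via Formula (28) and confirms that, for this partition, the accuracy is (near-)constant for β = 1 but not for β = 2:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
nodes = np.linspace(0.0, 1.0, 6)
h = nodes[1] - nodes[0]
A = np.maximum(0.0, 1.0 - np.abs(t[None, :] - nodes[:, None]) / h)
Delta, dt = 0.05, t[1] - t[0]

for beta in (1.0, 2.0):
    # Formula (28): Delta(t)^(-beta) = ((dt / Delta)^beta) * sum_i A_i(t)^beta
    Delta_t = (((dt / Delta) ** beta) * (A ** beta).sum(axis=0)) ** (-1.0 / beta)
    print(beta, np.ptp(Delta_t / Delta_t.mean()))  # relative spread: ~0 only for beta = 1
```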

5.2. General Conclusion

The requirement that we get the same accuracy for reconstructing the value of the signal at each moment of time t leads either to condition (27) or to condition (29). In particular, for β = 1, we get the fuzzy partition property.

5.3. Which Value β Should We Use in the Case of Fuzzy Uncertainty

In the fuzzy case (see, e.g., [40,41,42,43,44,45,46,47,48]), the usual way of propagating uncertainty, Zadeh's extension principle, is equivalent to applying interval computations to each α-cut. Thus, for analyzing fuzzy data, it makes sense to use the value of β corresponding to interval uncertainty, which, as we mentioned at the end of the previous section, is β = 1. For β = 1, Formula (29) becomes the fuzzy partition property. Thus, when analyzing fuzzy data, the use of the fuzzy partition property is indeed justified.
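To illustrate the α-cut reduction, here is a toy sketch of ours, for a triangular fuzzy number and a monotone map (where applying the map to the interval endpoints is valid):

```python
def alpha_cut(a, m, b, alpha):
    """Alpha-cut of a triangular fuzzy number (a, m, b)."""
    return (m - (1 - alpha) * (m - a), m + (1 - alpha) * (b - m))

def propagate(f, interval):
    """Interval computation for a monotone increasing f: map the endpoints."""
    lo, hi = interval
    return (f(lo), f(hi))

f = lambda z: 2 * z + 1                      # a monotone map, for illustration
for alpha in (0.0, 0.5, 1.0):
    # Zadeh's extension principle, applied alpha-cut by alpha-cut:
    print(alpha, propagate(f, alpha_cut(1.0, 2.0, 3.0, alpha)))
```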

6. Conclusions and Future Work

6.1. Conclusions

In many applications of fuzzy techniques, including applications of F-transforms, we use fuzzy sets $A_1(t), \ldots, A_n(t)$ that form a fuzzy partition, in the sense that for each t, the corresponding degrees $A_i(t)$ add up to 1 (or to a constant): $\sum_i A_i(t) = 1$. Empirically, in many applications, the fuzzy partition requirement indeed helps; but why does it help? Until now, this remained a mystery.
In this paper, we provide a theoretical justification for this requirement. Specifically, we show that the fuzzy partition requirement naturally follows from the desire to have the signal values at different moments of time estimated with the same accuracy.

6.2. Possible Directions of Future Research

While our main objective was to explain the ubiquity of the fuzzy partition requirement in fuzzy logic, our analysis started on a more general note, by considering general uncertainty, of which fuzzy is a particular case. In addition to the case of fuzzy uncertainty, we also explicitly analyzed another important particular type of uncertainty: probabilistic uncertainty.
It is desirable to extend this analysis to other types of uncertainty, e.g.:
  • to different imprecise probability situations, and
  • to situations when different functions A i ( t ) correspond to different types of uncertainty.
It is also desirable to analyze the situations (like the situation mentioned in Section 1) when, empirically, fuzzy sets that do not form a fuzzy partition work better. Maybe in this case, a more general scheme with $\beta \ne 1$ will help?

Author Contributions

The authors contributed equally to this work.

Funding

This work was supported by the Center of Excellence in Econometrics, Faculty of Economics, Chiang Mai University, Thailand, and by the US National Science Foundation via grants 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science) and HRD-1242122 (Cyber-ShARE Center of Excellence).

Acknowledgments

The authors are thankful to Irina Perfilieva for encouragement and valuable discussions, and to the anonymous referees for very useful suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Novak, V.; Perfilieva, I.; Holcapek, M.; Kreinovich, V. Filtering out high frequencies in time series using F-transform. Inf. Sci. 2014, 274, 192–209.
  2. Novak, V.; Perfilieva, I.; Kreinovich, V. F-transform in the analysis of periodic signals. In Proceedings of the 15th Czech-Japan Seminar on Data Analysis and Decision Making under Uncertainty CJS'2012, Osaka, Japan, 24–27 September 2012.
  3. Perfilieva, I. Fuzzy transforms: Theory and applications. Fuzzy Sets Syst. 2006, 157, 993–1023.
  4. Perfilieva, I. F-transform. In Springer Handbook of Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2015; pp. 113–130.
  5. Perfilieva, I.; Danková, M.; Bede, B. Towards a higher degree F-transform. Fuzzy Sets Syst. 2011, 180, 3–19.
  6. Perfilieva, I.; Kreinovich, V.; Novak, V. F-transform in view of trend extraction. In Proceedings of the 15th Czech-Japan Seminar on Data Analysis and Decision Making under Uncertainty CJS'2012, Osaka, Japan, 24–27 September 2012.
  7. Rabinovich, S.G. Measurement Errors and Uncertainties: Theory and Practice; Springer: New York, NY, USA, 2005.
  8. Feynman, R.; Leighton, R.; Sands, M. The Feynman Lectures on Physics; Addison Wesley: Boston, MA, USA, 2005.
  9. Haidegger, T.; Kovácz, L.; Precup, R.-E.; Benýo, B.; Benyó, Z.; Preitl, S. Simulation and control for telerobots in space medicine. Acta Astronaut. 2012, 81, 390–402.
  10. Herrera-Viedma, E.; López-Herrera, A.G. A review on information accessing systems based on fuzzy linguistic modeling. Int. J. Comput. Intell. Syst. 2010, 3, 420–437.
  11. Di Martino, F.; Pedrycz, W.; Sessa, S. Spatiotemporal extended fuzzy C-means clustering algorithm for hotspots detection and prediction. Fuzzy Sets Syst. 2018, 340, 109–126.
  12. Moallem, P.; Mousavi, B.S.; Naghibzadeh, S.S. Fuzzy inference system optimized by genetic algorithm for robust face and pose detection. Int. J. Artif. Intell. 2015, 13, 73–88.
  13. Kosheleva, O.; Kreinovich, V. Why Ragin's fuzzy techniques lead to successful social science applications: An explanation. J. Innov. Technol. Educ. 2016, 3, 185–192.
  14. Kosheleva, O.; Kreinovich, V. Why product "and"-operation is often efficient: One more argument. J. Innov. Technol. Educ. 2017, 4, 25–28.
  15. Kosheleva, O.; Kreinovich, V. Why Bellman-Zadeh approach to fuzzy optimization. Appl. Math. Sci. 2018, 12, 517–522.
  16. Kosheleva, O.; Kreinovich, V.; Ngamsantivong, T. Why complex-valued fuzzy? Why complex values in general? A computational explanation. In Proceedings of the Joint World Congress of the International Fuzzy Systems Association and Annual Conference of the North American Fuzzy Information Processing Society IFSA/NAFIPS'2013, Edmonton, AB, Canada, 24–28 June 2013; pp. 1233–1236.
  17. Kreinovich, V. Why intervals? Why fuzzy numbers? Towards a new justification. In Proceedings of the 2007 IEEE Symposium on Foundations of Computational Intelligence, Honolulu, HI, USA, 1–5 April 2007.
  18. Kreinovich, V.; Mouzouris, G.; Nguyen, H.T. Fuzzy rule based modeling as a universal approximation tool. In Fuzzy Systems: Modeling and Control; Nguyen, H.T., Sugeno, M., Eds.; Kluwer: Boston, MA, USA, 1998; pp. 135–195.
  19. Kreinovich, V.; Perfilieva, I.; Novak, V. Why inverse F-transform? A compression-based explanation. In Proceedings of the 2013 International Conference on Fuzzy Systems FUZZ-IEEE'2013, Hyderabad, India, 7–10 July 2013; pp. 1378–1384.
  20. Kreinovich, V.; Stylios, C. Why fuzzy cognitive maps are efficient. Int. J. Comput. Commun. Control 2015, 10, 825–833.
  21. Kreinovich, V.; Stylios, C. When should we switch from interval-valued fuzzy to full type-2 fuzzy (e.g., Gaussian)? Crit. Rev. 2015, XI, 57–66.
  22. Nguyen, H.T.; Koshelev, M.; Kosheleva, O.; Kreinovich, V.; Mesiar, R. Computational complexity and feasibility of fuzzy data processing: Why fuzzy numbers, which fuzzy numbers, which operations with fuzzy numbers. In Proceedings of the International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'98), Paris, France, 6–10 July 1998; pp. 273–280.
  23. Nguyen, H.T.; Kreinovich, V. Applications of Continuous Mathematics to Computer Science; Kluwer: Dordrecht, The Netherlands, 1997.
  24. Nguyen, H.T.; Kreinovich, V. Methodology of fuzzy control: An introduction. In Fuzzy Systems: Modeling and Control; Nguyen, H.T., Sugeno, M., Eds.; Kluwer: Boston, MA, USA, 1998; pp. 19–62.
  25. Nguyen, H.T.; Kreinovich, V.; Lorkowski, J.; Abu, S. Why Sugeno lambda-measures. In Proceedings of the IEEE International Conference on Fuzzy Systems FUZZ-IEEE'2015, Istanbul, Turkey, 1–5 August 2015.
  26. Ouncharoen, R.; Kreinovich, V.; Nguyen, H.T. Why lattice-valued fuzzy values? A mathematical justification. J. Intell. Fuzzy Syst. 2015, 29, 1421–1425.
  27. Perfilieva, I.; Kreinovich, V. Why fuzzy transform is efficient in large-scale prediction problems: A theoretical explanation. Adv. Fuzzy Syst. 2011, 2011.
  28. Baral, C.; Fuentes, O.; Kreinovich, V. Why deep neural networks: A possible theoretical explanation. In Constraint Programming and Decision Making: Theory and Applications; Ceberio, M., Kreinovich, V., Eds.; Springer: Berlin/Heidelberg, Germany, 2018; pp. 1–6.
  29. Farhan, A.; Kosheleva, O.; Kreinovich, V. Why max and average poolings are optimal in convolutional neural networks. In Proceedings of the 7th International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making IUKM'2019, Nara, Japan, 27–29 March 2019.
  30. Gholamy, A.; Parra, J.; Kreinovich, V.; Fuentes, O.; Anthony, E. How to best apply deep neural networks in geosciences: Towards optimal 'averaging' in dropout training. In Smart Unconventional Modelling, Simulation and Optimization for Geosciences and Petroleum Engineering; Watada, J., Tan, S.C., Vasant, P., Padmanabhan, E., Jain, L.C., Eds.; Springer: Amsterdam, The Netherlands, 2019; pp. 15–26.
  31. Kosheleva, O.; Kreinovich, V. Why deep learning methods use KL divergence instead of least squares: A possible pedagogical explanation. Math. Struct. Simul. 2018, 46, 102–106.
  32. Kreinovich, V. From traditional neural networks to deep learning: Towards mathematical foundations of empirical successes. In Proceedings of the World Conference on Soft Computing, Baku, Azerbaijan, 29–31 May 2018.
  33. Nava, J.; Kreinovich, V. Why a model produced by training a neural network is often more computationally efficient than a nonlinear regression model: A theoretical explanation. J. Uncertain Syst. 2014, 8, 193–204.
  34. Sheskin, D.J. Handbook of Parametric and Nonparametric Statistical Procedures; Chapman and Hall/CRC: Boca Raton, FL, USA, 2011.
  35. Jaulin, L.; Kieffer, M.; Didrit, O.; Walter, E. Applied Interval Analysis, with Examples in Parameter and State Estimation, Robust Control, and Robotics; Springer: London, UK, 2001.
  36. Mayer, G. Interval Analysis and Automatic Result Verification; De Gruyter: Berlin, Germany, 2017.
  37. Moore, R.E.; Kearfott, R.B.; Cloud, M.J. Introduction to Interval Analysis; SIAM: Philadelphia, PA, USA, 2009.
  38. Walster, G.W.; Kreinovich, V. For unknown-but-bounded errors, interval estimates are often better than averaging. ACM SIGNUM Newsl. 1996, 31, 6–19.
  39. Autchariyapanitkul, A.; Kosheleva, O.; Kreinovich, V.; Sriboonchitta, S. Quantum econometrics: How to explain its quantitative successes and how the resulting formulas are related to scale invariance, entropy, and fuzziness. In Proceedings of the International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making IUKM'2018, Hanoi, Vietnam, 13–15 March 2018; Huynh, V.-N., Inuiguchi, M., Tran, D.-H., Denoeux, T., Eds.; Springer: Cham, Switzerland, 2018.
  40. Belohlavek, R.; Dauben, J.W.; Klir, G.J. Fuzzy Logic and Mathematics: A Historical Perspective; Oxford University Press: New York, NY, USA, 2017.
  41. Klir, G.; Yuan, B. Fuzzy Sets and Fuzzy Logic; Prentice Hall: Upper Saddle River, NJ, USA, 1995.
  42. Mendel, J.M. Uncertain Rule-Based Fuzzy Systems: Introduction and New Directions; Springer: Cham, Switzerland, 2017.
  43. Nguyen, H.T.; Kreinovich, V. Nested intervals and sets: Concepts, relations to fuzzy sets, and applications. In Applications of Interval Computations; Kearfott, R.B., Kreinovich, V., Eds.; Kluwer: Dordrecht, The Netherlands, 1996; pp. 245–290.
  44. Nguyen, H.T.; Walker, C.; Walker, E.A. A First Course in Fuzzy Logic; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019.
  45. Novák, V.; Perfilieva, I.; Močkoř, J. Mathematical Principles of Fuzzy Logic; Kluwer: Boston, MA, USA; Dordrecht, The Netherlands, 1999.
  46. Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning—I. Inf. Sci. 1975, 8, 199–249.
  47. Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning—II. Inf. Sci. 1975, 8, 301–357.
  48. Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning—III. Inf. Sci. 1975, 9, 43–80.
