Article

On the Robustness and Sensitivity of Several Nonparametric Estimators via the Influence Curve Measure: A Brief Study

Department of Mathematics and Statistics, University of North Carolina, Wilmington, NC 28403, USA
*
Author to whom correspondence should be addressed.
Submission received: 29 June 2022 / Revised: 18 August 2022 / Accepted: 23 August 2022 / Published: 29 August 2022
(This article belongs to the Special Issue Nonparametric Statistical Methods and Their Applications)

Abstract

The use of the influence curve as a measure of sensitivity is not new in the literature, but to the best of our knowledge it has not been properly explored. In particular, the mathematical derivation of the influence function for several popular nonparametric estimators (such as the trimmed mean, the α-winsorized mean, and the Pearson product moment correlation coefficient, among other notable ones) is not given in adequate detail. Moreover, the summary of the final expressions given in some sporadic cases does not appear to be correct. In this article, we aim to examine and summarize the derivation of the influence curve for various well-known estimators of the location of a population, many of which belong to the nonparametric paradigm.

1. Introduction

In the context of assessing the robustness (or, equivalently, the "sensitivity") of an estimator, there are several methods that are well documented in the literature; a non-exhaustive list of references is as follows. Refs. [1,2] developed and studied a wide collection of robust estimation methods designed to reduce the influence of outliers in the data on the estimates. An outlying observation, or 'outlier', is one that appears to deviate markedly from the other members of the sample in which it occurs; see, for example, [3]. It may arise because it is generated by a different mechanism or under a different assumption, according to [4,5]. Once an observation (or a set of observations) is deemed to be an outlier, the estimation procedure may fail to produce an efficient as well as robust estimator. A natural remedy against this malady is to remove the contaminated observation(s) from the sample or to replace them with the correct observation(s). Another strategy is to consider estimators that are not sensitive to outliers, such as the median, or estimators based on sample quantiles. However, one major limitation of such measures is that they discard a significant number of observations in the process, which is not always desirable. Consequently, in statistical data analysis, the rejection of outliers from the data may have serious consequences for further analysis of the reduced sample. If the outliers are rejected, the data are no longer complete but censored. In practice, replacing the rejected outliers by statistical equivalents, i.e., by simulated random observations from the assumed underlying distribution, may have similar consequences. Outliers are observations that markedly deviate from the prevailing tendency. For example, in parametric motion estimation, the presence of outliers due to noisy measurements or poorly defined support regions leads to inaccurate model estimates; in global motion estimation, foreground moving objects also correspond to outliers. In order to reduce the impact of outliers, robust estimation has been proposed; see [1] and the references cited therein. One indicator of the performance of a robust estimator is its breakdown point, roughly defined as the highest percentage of outliers that the robust estimator can tolerate. Three classes of robust estimators may be mentioned in this regard:
  • M-estimators: M-estimators are a generalization of maximum likelihood estimators. They involve the minimization of a function of the form
    \sum_j \Delta(r_j), where r_j is the residual error between a sample observation and its fitted value, and \Delta is a symmetric, positive-definite function with a unique minimum at x = 0. For a robust estimator, the function \Delta should increase less rapidly than the squared residual for large values of x; a small numerical sketch of such an estimator follows this list.
  • L-estimators: L-estimators are a linear combination of order statistics. Two examples are the median and the α-trimmed mean.
  • R-estimators: These estimators are based on rank tests.
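To make the M-estimation idea above concrete, the following minimal sketch (our own illustration, not taken from the paper) computes a location M-estimate with the Huber ρ-function; the tuning constant c = 1.345 and the use of scipy's scalar minimizer are illustrative choices, not specifications from the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber_rho(r, c=1.345):
    """Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r**2, c * a - 0.5 * c**2)

def huber_location(x, c=1.345):
    # minimize theta -> sum_j rho(x_j - theta) over a bounded interval
    objective = lambda theta: huber_rho(x - theta, c).sum()
    return minimize_scalar(objective, bounds=(x.min(), x.max()), method="bounded").x

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 95), rng.normal(8, 1, 5)])  # 5% outliers
print(np.mean(data), np.median(data), huber_location(data))
```

The sample mean is pulled toward the outliers, whereas the Huber estimate stays close to the median; this is the kind of behavior that the influence curve formalizes in the sections that follow.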
In this scenario of assessing the robustness (or, equivalently, the sensitivity in some sense) of an estimator, several statistical procedures have been developed in the literature which do not directly examine the outliers but rather seek to accommodate them in such a way that their influence on the estimation procedure becomes less serious. The robust methods usually used in this situation to characterize the underlying distribution are known as "Winsorization" and "trimming". The main purpose of this paper is to discuss one robustness property, which we evaluate via the influence curve (in short, IC, henceforth), for several location estimators, such as the regular mean, the trimmed mean, the winsorized mean, and the directional mean, that are also used in the nonparametric paradigm. The influence function of an estimator measures the amount of change in the estimator that can be induced by a change in an individual observation. In his seminal work, ref. [1] introduced the idea of the influence function, or influence curve (IC). Ref. [6] extended the use of influence curves to statistics of a more general type; the authors noted that the condition of Fisher consistency is necessary in Hampel's theory. We will provide the definition of the influence curve due to [6] later on and will cite several results from their work that are quite useful in this context.
It is well known that robust estimation provides an alternative approach to classical methods that is not unduly affected by the presence of outliers. Recently, such robust estimators have also been considered for models with functional data. In this paper, we try to summarize a compendium of influence curves for various nonparametric as well as parametric estimators of the location of a population. In this work, we survey results that are already in the literature but that, in some cases, appear without the necessary details. In addition, we particularly focus on identifying the point (or set of points) that has the largest influence on the estimator. Along with this, we also examine (as appropriate) the discontinuity of the influence curve (as is the case with the Winsorized mean). In what follows, we provide a non-exhaustive list of references on the notion of the robustness of estimators and the associated IC measures for assessing sensitivity in the presence of outliers. The first instance of a statistician referring to the robustness of estimators was [7]. Shortly after, ref. [8] began to realize something interesting about the standard statistical procedures of the time: they were optimized for the Gaussian distribution but volatile for certain contaminated distributions. This realization became part of the foundation for the development of the influence curve by Huber and Hampel in the mid-to-late 1960s; see [9] for more details.
An influence function, or influence curve, is in many ways considered "the first derivative of an estimator" at a certain distribution. This function can be useful for detecting various attributes of estimators. In this paper, we focus on the advantageous features of influence functions for assessing the impact of outlier(s) on these estimators. A robust estimator is unaffected, or only minimally affected, by deviations from an assumed or idealized distribution; see [9]. Influence functions thus provide a useful tool for nonparametric statistical inference. For a population characterized by the distribution function F indexed by the parameter θ (i.e., F_θ), a common estimator of F is given by
\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x),
the empirical cdf, where I(\cdot) is the indicator function and \theta \in \Theta, the parameter space. Nonparametric statistics estimates statistical functionals expressed as
\theta = T(F).
Using this functional, one can derive an influence curve that allows us to better understand how deviations in the data and in the distribution affect various estimators. The assessment of robustness (alias sensitivity) for several nonparametric estimators of scale and shape parameters on the basis of the influence curve will be the subject matter of a separate article. There are other pertinent references in the context of computing the influence curve under the Bayesian paradigm; for example, ref. [10] developed a general framework of Bayesian influence analysis for various perturbation schemes of the data, the prior, and the sampling distribution for a class of statistical models. The remainder of the paper is organized as follows. In Section 2, we provide the basic preliminaries related to the influence curve, following [6], together with several useful results on the influence curve for one-sample estimators (several of them nonparametric) as well as for two-sample estimators. In Section 3, we provide the mathematical details of the influence curve for several estimators, starting with the sample mean and continuing with several other mean-type estimators, such as the trimmed mean, the directional mean, etc. In Section 4, we discuss the influence curve for the Pearson product moment correlation coefficient and highlight the error in the computation of the IC in [11]. Section 5 deals with the computation of the IC for certain special types of distributions and for the convolution of two distribution functions. A small simulation study is reported in Section 6, and some concluding remarks are presented in Section 7.
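Before proceeding, here is a small illustration (our own example, not part of the original text) of the plug-in idea θ = T(F) mentioned above: the empirical cdf F̂_n is substituted for F, so that T(F̂_n) becomes the corresponding sample statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=1000)

# Empirical cdf: F_n(t) = (1/n) * sum_i I(X_i <= t)
def F_n(t):
    return np.mean(x <= t)

# Plug-in estimates T(F_n) of two functionals T(F)
mean_plugin = np.mean(x)              # T(F) = integral of t dF(t)
median_plugin = np.quantile(x, 0.5)   # T(F) = F^{-1}(1/2)

print(round(F_n(2.0), 3), round(mean_plugin, 3), round(median_plugin, 3))
```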

2. Basic Preliminaries on Influence Curve

In this section, we provide the definition and some useful preliminaries related to the influence curve. We start with the definition in the one-sample situation. This section heavily draws on [6].
  • Influence curve for the one-sample scenario
    Let us consider the triplet (\Omega, \mathcal{F}, P), where the sample space \Omega is a subset of the real line \mathbb{R}, \mathcal{F} is the associated Borel \sigma-field of events, and P is a probability measure defined on this sample space. Furthermore, assume that the parameter space, denoted by \Theta, is a convex subset of \mathbb{R}. The fixed model consists of a family of probability measures characterized by F_\theta, identifiable with the associated cumulative distribution function (cdf). We note that one might consider \theta as either a location or a scale parameter, but in the context of our discussion, unless otherwise stated, \theta will be treated as a location parameter. Next, we define a sequence of statistics T_n(X_1, X_2, \ldots, X_n). Suppose there exists a functional T: \eta(\Omega) \to \mathbb{R}, where \eta(\Omega) is the space of all signed measures with mass 1 on \Omega, such that T_n(X_1, X_2, \ldots, X_n) converges in probability to T(H) as n \to \infty whenever the observations are independently and identically distributed (i.i.d.) according to the true underlying distribution H of the population. When T is Fisher-consistent, i.e., T(F_\theta) = \theta for all \theta \in \Theta, the associated influence curve is the one defined in [1]. Next, we provide some more pertinent details in this regard. The influence curve is essentially the derivative of a statistical functional T(F) with respect to the function F itself. To take a derivative in this way, we invoke the definition of the generalized directional (Gâteaux) derivative. Given a functional T(F), where F is itself a function, the derivative of T with respect to F in the direction of G is expressed as
    L(T; G) = \lim_{\epsilon \to 0} \frac{T\{(1-\epsilon)F + \epsilon G\} - T(F)}{\epsilon}.
    Statistically speaking, using this definition of a derivative helps us to identify and measure “the rate of change in a statistical functional” while considering some amount of contamination, ϵ from another distribution which has the total probability concentrated on a point mass.
    In this case, we consider the contamination of F by \Delta_x, which can be written as
    F_{\epsilon, x} = (1-\epsilon)F + \epsilon\,\Delta_x, \qquad (1)
    where 0 < \epsilon < 1, and \Delta_x represents the distribution function with the entire probability concentrated at the mass point x. Equivalently, we can write
    \Delta_x(y) = \begin{cases} 0, & y < x, \\ 1, & y \ge x. \end{cases}
    Consequently, we can express the influence curve of a function T ( F ) as
    IC(T, F; x) = \lim_{\epsilon \to 0} \frac{T(F_{\epsilon,x}) - T(F)}{\epsilon} = \left.\frac{d}{d\epsilon} T(F_{\epsilon,x})\right|_{\epsilon = 0}.
    However, according to [6], in the testing of statistical hypotheses several test statistics converge to functionals that are not Fisher-consistent. As a remedial measure, ref. [6] developed the following alternative definition of the IC. The influence curve of the functional T at F_{\tilde\theta} is defined by
    IC(T(F); F_{\tilde\theta}; x) = \lim_{\epsilon \to 0} \frac{U(F_{\tilde\theta, \epsilon, x}) - U(F_{\tilde\theta})}{\epsilon},
    for all x \in \Omega where the limit (possibly +\infty or -\infty) exists, and where F_{\tilde\theta, \epsilon, x} = (1-\epsilon)F_{\tilde\theta} + \epsilon\,\Delta_x. In this case, U(H) is defined as \xi^{-1}[T(H)]; this functional provides the parameter value that the true underlying distribution H would have if it belonged to the model. This U is clearly Fisher-consistent, since U(F_\theta) = \xi^{-1}[T(F_\theta)] = \theta. Hampel's influence curve is computed based on U. Observe that in his original paper, ref. [1] defined the IC as a right-hand limit; however, refs. [2,12,13] have replaced it with a two-sided limit for mathematical convenience. The influence curve thus describes the influence of outliers in the sample on the value of the statistic; a bounded IC consequently indicates finite sensitivity to outliers. A numerical illustration of this definition is given at the end of this section.
  • Influence surface for the two-sample case
    The definition of a two-sample IC has been independently discussed and derived in [6]. However, we provide here some useful preliminaries as an extension of the one-sample IC given earlier, mimicking the ideas described in [6]. Let us assume that \Omega_1 = \Omega_2 are subsets of \mathbb{R} with the associated \sigma-algebra. For each n, we partition the entire sample in such a way that we select m_1(n) and m_2(n) points with the constraint m_1(n) + m_2(n) = n. Specifically, m_1(n) points are taken from \Omega_1 and m_2(n) points are taken from \Omega_2, respectively, where m_1(n) and m_2(n) go to infinity as n \to \infty. In the context of estimating the location of a population, the "robust model" (i.e., a model without any contamination effect) states that the sample from \Omega_1 is distributed according to the distribution function H, the sample from \Omega_2 is distributed according to the distribution function F, and the relation H(x) = F(x - \theta) holds for all x, with \theta \in \Theta, the parameter space. We want either to estimate \theta or to conduct a test of hypothesis for \theta, say to test the null hypothesis H_0: \theta = \theta_0. Regardless of the nature of the inferential problem, one may consider a statistic of the form T_n(x_1, x_2, \ldots, x_{m_1}; y_1, y_2, \ldots, y_{m_2}). Next, suppose that the functional T: \xi_1(\Omega_1) \times \xi_2(\Omega_2) \to \mathbb{R} (here \times represents the Cartesian product) is such that T_n(x_1, \ldots, x_{m_1}; y_1, \ldots, y_{m_2}) tends to T(W_1, W_2) as n \to \infty, where W_1 and W_2 denote the cdfs according to which the observations in the two samples are i.i.d. We assume that T_n is invariant with respect to an identical shift of both samples. We want to evaluate the IC at a pair (H, F) with H(x) = F(x - \tilde\theta). It may be noted that the straight line going through H and F is completely determined. On this straight line, the expected value of T_n, when the samples originate from the two distributions H and F differing by the location parameter \theta, depends on \theta alone. Let us denote this expectation by \eta_n(\theta), and further assume that \eta_n(\theta) converges to \eta(\theta) (the expected value of T) as n \to \infty; in addition, we assume that the inverse of \eta exists and denote it by \eta^{-1}. Next, we define
    W(W_1, W_2) := \eta^{-1}\big[T(W_1, W_2)\big],
    valid for all W 1 , W 2 .
    We note that outliers may occur in the first sample, the second sample, or both; therefore, the influence curve (more precisely, the influence surface) needs to be defined appropriately. For a formal definition of the influence surface, the reader is referred to Definition 2.2 of [6].
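To connect the one-sample definition above with something computable, the following sketch (our own illustration, not part of the original text) uses the finite-sample sensitivity curve SC_n(x) = (n + 1)[T_{n+1}(x_1, …, x_n, x) − T_n(x_1, …, x_n)] as a stand-in for the limit defining the IC, and compares it with the closed-form influence curves of the mean and the median (derived in the next section) when F is the standard normal distribution.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n = 100_000
sample = rng.normal(size=n)   # observations from F = N(0, 1)

def sensitivity_curve(stat, data, x):
    """Finite-sample stand-in for IC(T, F; x)."""
    return (len(data) + 1) * (stat(np.append(data, x)) - stat(data))

xs = np.linspace(-3, 3, 7)
sc_mean = [sensitivity_curve(np.mean, sample, x) for x in xs]
sc_median = [sensitivity_curve(np.median, sample, x) for x in xs]

ic_mean = xs                                  # x - mu(F), with mu(F) = 0
ic_median = np.sign(xs) / (2 * norm.pdf(0))   # sign(x - m) / (2 f(m)), m = 0

print(np.round(sc_mean, 2), np.round(ic_mean, 2))
print(np.round(sc_median, 2), np.round(ic_median, 2))
```

The mean's curve grows without bound in x, while the median's stays bounded, previewing the robustness comparison made in the next section.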
In the next section, we discuss the computation of the influence curve for several location parameters.

3. Influence Curve of Several Location Estimators

Throughout this paper, we will be using (1) to derive and assess influence curves for the estimators, in an attempt to better understand the extent to which individual observations influence the value of the estimator itself. We begin our discussion with the sample mean.
  • Influence curve for the sample mean
    The mean of the data averages all observations. Therefore, it takes into account every observation and is highly affected by outliers. Because of this, the mean is not considered a robust statistic as it will be swayed by any contamination of the data. Using the idea of the influence curve, we can better understand this non-robustness characterization of the mean estimator. Let μ ( F ) denote the mean of an absolutely continuous distribution F. We note that similar derivations can be made when F is discrete. Given that
    T(F) = \mu(F) = \int x\, dF(x),
    by plugging into Equation (1), the mean of the contaminated model becomes
    T(F_{\epsilon,x}) = \mu(F_{\epsilon,x}) = \int x\, dF_{\epsilon,x}(x) = \int x\, d\big[(1-\epsilon)F + \epsilon\,\Delta_x\big](x) = (1-\epsilon)\mu(F) + \epsilon x.
    The influence curve of the mean estimator is obtained in the following proposition.
    Proposition 1.
    The influence function of the mean estimator, denoted by IC(\mu, F; x), is given by x - \mu(F).
    Proof. 
    IC(T(F), F; x) = \lim_{\epsilon \to 0} \frac{T(F_{\epsilon,x}) - T(F)}{\epsilon} = \lim_{\epsilon \to 0} \frac{(1-\epsilon)\mu(F) + \epsilon x - \mu(F)}{\epsilon} = \lim_{\epsilon \to 0} \frac{-\epsilon\,\mu(F) + \epsilon x}{\epsilon} = x - \mu(F).
    As we can see, each observation x has an influence on the overall estimate. This demonstrates that if x is contaminated in any way, or if x is an outlier, the mean will be heavily skewed, as this bad observation holds a substantial amount of influence over the estimate. Therefore, because the mean does not hold up under a contaminated model, it is not considered a robust statistic. Next, we consider the influence function of the median, a robust statistic.    □
  • Influence curve for Median
    We begin the discussion by evaluating the influence curve for the following functional
    T(F) = F^{-1}(p), \qquad 0 < p < 1.
    Obviously, the median is obtained by substituting p = \tfrac{1}{2}. A general expression for the influence curve is given by
    IC(T(F), F; x) = \frac{p - I(x < F^{-1}(p))}{f(F^{-1}(p))}, \qquad x \ne F^{-1}(p),
    assuming that F is differentiable at F^{-1}(p).
    In fact, it has been shown that when calculating the median, up to half of the data could be corrupted before seeing any effects on the estimation, see [14]. We can observe the median’s robustness from its influence curve. First, let us consider the statistical functional for this estimator given by
    T(F) = \frac{1}{2}(M_1 + M_2),
    where
    M_1 = \sup\{x : F(x) \le \tfrac{1}{2}\}, \qquad M_2 = \inf\{x : F(x) > \tfrac{1}{2}\}.
    Along with this, the functional for the median can be thought of as
    T(F) = F^{-1}\big(\tfrac{1}{2}\big).
    By using this functional along with the contaminated model we can ascertain the influence function of the median given by
    IC(T(F), F; x) = \frac{\operatorname{sign}(x - T(F))}{2 f(T(F))}.
    We can see that the median is far less impacted by a single observation, which indicates its robustness as an estimator. Next, we delve into the robustness and influence function of a more efficient robust mean estimator.
    Note: the sign test statistic has the same IC expression as the median estimator. Furthermore, for the one-sample Wilcoxon test statistic, the associated IC is equal to that of the Hodges–Lehmann estimator.
  • Influence function for trimmed Mean
    The trimmed mean is a "smooth" L-estimator with the following functional
    T(F) = \frac{1}{1-2\alpha} \int_{\alpha}^{1-\alpha} F^{-1}(p)\, dp.
    Next, using an equivalent form of the definition of the influence curve together with the IC of the quantile functional given above, we may write
    \frac{\partial}{\partial \lambda} T\big((1-\lambda)F + \lambda\,\Delta_x\big)\Big|_{\lambda=0} = \frac{1}{1-2\alpha} \int_{\alpha}^{1-\alpha} \frac{p - I(x < F^{-1}(p))}{f(F^{-1}(p))}\, dp = \frac{1}{1-2\alpha} \int_{\alpha}^{1-\alpha} \frac{p - I(F(x) < p)}{f(F^{-1}(p))}\, dp.
    For F(x) < \alpha, using integration by parts (\int u\, dv = uv - \int v\, du) together with dF^{-1}(p) = dp / f(F^{-1}(p)), we can rewrite the above as
    \frac{1}{1-2\alpha} \int_{\alpha}^{1-\alpha} \frac{p - 1}{f(F^{-1}(p))}\, dp = \frac{1}{1-2\alpha} \int_{\alpha}^{1-\alpha} (p-1)\, dF^{-1}(p) = \frac{1}{1-2\alpha} \Big\{ F^{-1}(\alpha) - \int_{\alpha}^{1-\alpha} F^{-1}(p)\, dp - \alpha\big[ F^{-1}(\alpha) + F^{-1}(1-\alpha) \big] \Big\}.
    For \alpha \le F(x) \le 1-\alpha, we get
    \frac{1}{1-2\alpha} \left\{ \int_{\alpha}^{F(x)} \frac{p}{f(F^{-1}(p))}\, dp + \int_{F(x)}^{1-\alpha} \frac{p - 1}{f(F^{-1}(p))}\, dp \right\} = \frac{1}{1-2\alpha} \Big\{ x - \alpha\big[ F^{-1}(\alpha) + F^{-1}(1-\alpha) \big] - \int_{\alpha}^{1-\alpha} F^{-1}(p)\, dp \Big\}.
    Again, for F(x) > 1-\alpha, we get
    \frac{1}{1-2\alpha} \Big\{ F^{-1}(1-\alpha) - \alpha\big[ F^{-1}(\alpha) + F^{-1}(1-\alpha) \big] - \int_{\alpha}^{1-\alpha} F^{-1}(p)\, dp \Big\}.
    Observe that T(F_n) = \frac{1}{1-2\alpha} \int_{F_n^{-1}(\alpha)}^{F_n^{-1}(1-\alpha)} x\, dF_n(x) is (approximately) the average of the middle n(1-2\alpha) observations; a numerical check of the resulting influence curve appears in the sketch following this list.
  • Influence curve for the Winsorized Mean
    The winsorized mean W = W(F_n) is obtained by replacing the \alpha n smallest observations by x_{(n\alpha+1)} and the \alpha n largest observations by x_{(n(1-\alpha))}, and taking the mean of the modified sample (if n\alpha is not an integer, the nearest integer is used). Consequently, the Winsorized mean functional is given by
    W(F) = (1-2\alpha)\, T(F) + \alpha F^{-1}(\alpha) + \alpha F^{-1}(1-\alpha).
    Next, we derive the influence curve for this estimator and also show that it is discontinuous at F^{-1}(\alpha) and F^{-1}(1-\alpha).
    Here, since W(F) is a linear combination of the trimmed-mean functional and the two quantile functionals, its influence curve is the corresponding combination of the influence curves obtained above:
    IC\big(W(F), F; x\big) = (1-2\alpha)\, IC\big(T(F), F; x\big) + \alpha\, \frac{\alpha - I(x < F^{-1}(\alpha))}{f(F^{-1}(\alpha))} + \alpha\, \frac{(1-\alpha) - I(x < F^{-1}(1-\alpha))}{f(F^{-1}(1-\alpha))}.
    This is linear between F^{-1}(\alpha) and F^{-1}(1-\alpha), with a jump of magnitude \alpha / f(F^{-1}(\alpha)) at F^{-1}(\alpha) and a jump of magnitude \alpha / f(F^{-1}(1-\alpha)) at F^{-1}(1-\alpha).
    Therefore, writing q_\alpha = F^{-1}(\alpha) and q_{1-\alpha} = F^{-1}(1-\alpha), we can write
    IC\big(W(F), F; x\big) = \begin{cases} q_\alpha - W(F) - \dfrac{\alpha(1-\alpha)}{f(q_\alpha)} - \dfrac{\alpha^2}{f(q_{1-\alpha})}, & x < q_\alpha, \\ x - W(F) + \dfrac{\alpha^2}{f(q_\alpha)} - \dfrac{\alpha^2}{f(q_{1-\alpha})}, & q_\alpha \le x < q_{1-\alpha}, \\ q_{1-\alpha} - W(F) + \dfrac{\alpha^2}{f(q_\alpha)} + \dfrac{\alpha(1-\alpha)}{f(q_{1-\alpha})}, & x \ge q_{1-\alpha}. \end{cases}
  • Influence curve for α -Winsorized Mean
    An α -winsorized mean is calculated by first determining a proportion, α , from both sides of the data (for details, see [15] and the references cited therein). This proportion of data points is then replaced with the next closest observation. The mean is then taken over this altered data set. Whereas the regular mean estimator loses its robustness when an alteration to the data occurs, the α -winsorized mean estimator is far more robust and less likely to be impacted by small deviations from the original data. To better observe this, we can attempt to derive the influence function for the α -winsorized mean estimator. Let us start by considering our functional
    T_w(F) = \alpha\, x_\alpha + \int_{x_\alpha}^{x_{1-\alpha}} x\, dF(x) + \alpha\, x_{1-\alpha},
    where x_\alpha = F^{-1}(\alpha) and x_{1-\alpha} = F^{-1}(1-\alpha).
    Next, the contaminated model is
    T_w(F_{\epsilon,x}) = \alpha\, x_\alpha + \int_{x_\alpha}^{x_{1-\alpha}} t\, d\big[(1-\epsilon)F(t) + \epsilon\,\Delta_x(t)\big] + \alpha\, x_{1-\alpha} = \alpha\, x_\alpha + (1-\epsilon)\int_{x_\alpha}^{x_{1-\alpha}} t\, dF(t) + \epsilon\,(x_{1-\alpha} - x_\alpha) + \alpha\, x_{1-\alpha}.
    Here, it is important to note that F(x_\alpha) = \alpha, since x_\alpha is the \alpha-th quantile. Using the two equations above, we can now derive the influence function for the \alpha-winsorized mean by considering a small contamination \epsilon. This will be given by
    IC\big(T_w(F), F; x\big) = \lim_{\epsilon \to 0} \frac{T_w(F_{\epsilon,x}) - T_w(F)}{\epsilon} = \lim_{\epsilon \to 0} \frac{1}{\epsilon}\Big[ -\epsilon \int_{x_\alpha}^{x_{1-\alpha}} t\, dF(t) + \epsilon\,(x_{1-\alpha} - x_\alpha) \Big] = (x_{1-\alpha} - x_\alpha) - \int_{x_\alpha}^{x_{1-\alpha}} t\, dF(t) = \Big[ x_{1-\alpha} - \int_{-\infty}^{x_{1-\alpha}} t\, dF(t) \Big] - \Big[ x_\alpha - \int_{-\infty}^{x_\alpha} t\, dF(t) \Big].
    It appears that further numerical evaluation is required to obtain and simplify the influence function for the α -winsorized mean.
  • Influence curve for the directional mean
    This problem appeared in [16]. Let X_1, X_2, \ldots, X_n be a set of independent and identically distributed observations on the unit circle, so that x_i^T x_i = 1, with probability density function f(x; \mu), where \mu, with \mu^T \mu = 1, denotes the mean direction. We define the directional mean of F as T(F) = C(F)/\|C(F)\|, where C(F) = \big[\int y\, dF_1(y), \int y\, dF_2(y)\big]^T and F_1 and F_2 are the marginal distributions of the two coordinates of x.
    Then, it can be shown that
    • the influence curve of T at F, with C(F) \ne 0, is
      IC\big(T(F), F; x\big) = \frac{x\,\|C(F)\|^2 - C(F)\,\big(x^T C(F)\big)}{\|C(F)\|^3}.
    • The functional T is robust.
    • \big\|IC\big(T(F), F; x\big)\big\| = \dfrac{\big(1 - (x^T T(F))^2\big)^{1/2}}{\|C(F)\|}.
      We begin our discussion with the following
      C\big((1-\lambda)F + \lambda\,\Delta_x\big) = C(F) + \lambda\,\big(x - C(F)\big). Consequently, we can write
      IC\big(T(F), F; x\big) = \frac{\big(x - C(F)\big)\|C(F)\|^2 - C(F)\big(x^T C(F) - \|C(F)\|^2\big)}{\|C(F)\|^3} = \frac{x\,\|C(F)\|^2 - C(F)\,\big(x^T C(F)\big)}{\|C(F)\|^3}.
      Observe that
      \sup_{x^T x = 1} \big\|IC\big(T(F), F; x\big)\big\| = \sup_{\delta} \frac{|\sin(\delta - \delta_0)|}{\|C(F)\|},
      where \delta is the direction of x and \delta_0 is the direction of T(F). The supremum attains its largest value, \|C(F)\|^{-1}, the gross error sensitivity (GES, in short), when x is orthogonal to T(F). The GES is bounded since C(F) \ne 0, and therefore T is robust. The most influential direction is that of an observation located at 90° from the mean direction.
      Next, consider the following
      \big\|IC\big(T(F), F; x\big)\big\|^2 = \frac{x^T x\, \|C(F)\|^4 - 2\,\big(x^T C(F)\big)^2 \|C(F)\|^2 + \big(C(F)^T C(F)\big)\big(x^T C(F)\big)^2}{\|C(F)\|^6} = \frac{1 - \big(x^T T(F)\big)^2}{\|C(F)\|^2},
      after some algebraic simplification. Our result follows immediately by taking the positive square root of both sides of the last display.
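As a check on the trimmed-mean expressions above (referenced earlier in this list), the sketch below (our own, not from the paper) compares the finite-sample sensitivity curve with the closed form implied by the case analysis: at a distribution symmetric about 0, such as the standard normal, the pieces reduce to IC(x) = min(max(x, F^{-1}(α)), F^{-1}(1−α))/(1 − 2α). The trimming proportion α = 0.10 and the sample size are illustrative choices.

```python
import numpy as np
from scipy.stats import norm, trim_mean

rng = np.random.default_rng(7)
alpha = 0.10
n = 200_000
sample = rng.normal(size=n)   # F = N(0, 1)

def sensitivity_curve(stat, data, x):
    # finite-sample stand-in for IC(T, F; x)
    return (len(data) + 1) * (stat(np.append(data, x)) - stat(data))

q_lo, q_hi = norm.ppf(alpha), norm.ppf(1 - alpha)
xs = np.linspace(-3, 3, 7)

numeric = [sensitivity_curve(lambda d: trim_mean(d, alpha), sample, x) for x in xs]
analytic = np.clip(xs, q_lo, q_hi) / (1 - 2 * alpha)   # [g(x) - W(F)]/(1-2a), W(F)=0

print(np.round(numeric, 3))
print(np.round(analytic, 3))
```

The numerical and analytic values agree approximately, and the curve is bounded by the trimming quantiles, in contrast to the unbounded influence curve of the mean.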
In the next section, we consider the influence function for a well-known measure of dependence, popularly known as the product moment correlation coefficient.

4. Influence Function for the Pearson Product Moment Correlation Coefficient (ρ)

The bivariate correlation coefficient (Pearson's product moment correlation coefficient) is an estimator that is incredibly useful in multivariate analysis, see [11]. We examine its robustness by determining the influence curve for the Pearson product moment correlation coefficient, denoted by ρ. Assuming that the population is absolutely continuous, the associated functional is given by
\rho = T(F) = \frac{\mathrm{Cov}(X_1, X_2)}{\sqrt{\mathrm{Var}(X_1)\,\mathrm{Var}(X_2)}} = \frac{\int x_1 x_2\, dF(x_1, x_2) - \big[\int x_1\, dF(x_1, \infty)\big]\big[\int x_2\, dF(\infty, x_2)\big]}{\sqrt{\big[\int x_1^2\, dF(x_1, \infty) - \big(\int x_1\, dF(x_1, \infty)\big)^2\big]\big[\int x_2^2\, dF(\infty, x_2) - \big(\int x_2\, dF(\infty, x_2)\big)^2\big]}},
where
  • E(X_1) = \int x_1\, dF(x_1) = \int x_1\, dF(x_1, \infty).
  • E(X_2) = \int x_2\, dF(x_2) = \int x_2\, dF(\infty, x_2).
  • E(X_1^2) = \int x_1^2\, dF(x_1) = \int x_1^2\, dF(x_1, \infty).
  • E(X_2^2) = \int x_2^2\, dF(x_2) = \int x_2^2\, dF(\infty, x_2).
Next, using our familiar contaminated model (1), we can express it for the bivariate situation. That is,
F_{\epsilon,x} = (1-\epsilon)F + \epsilon\,\Delta_x,
where \Delta_x represents the distribution placing point mass 1 at the bivariate observation x = (x_1, x_2). This is necessary as we need to consider contamination by a mixture of two distributions: a bivariate observation is sampled from the distribution F = F(x_1, x_2) with probability (1 - \epsilon); otherwise, the observed value is x, with probability \epsilon.
Next, in terms of this contaminated model, one may get
T(F_{\epsilon,x}) = \frac{\int x_1 x_2\, dF_{\epsilon,x}(x_1, x_2) - \big[\int x_1\, dF_{\epsilon,x}(x_1, \infty)\big]\big[\int x_2\, dF_{\epsilon,x}(\infty, x_2)\big]}{\sqrt{\big[\int x_1^2\, dF_{\epsilon,x}(x_1, \infty) - \big(\int x_1\, dF_{\epsilon,x}(x_1, \infty)\big)^2\big]\big[\int x_2^2\, dF_{\epsilon,x}(\infty, x_2) - \big(\int x_2\, dF_{\epsilon,x}(\infty, x_2)\big)^2\big]}} = \frac{A_1}{A_2},
where
A_1 = \int x_1 x_2\, d\big[(1-\epsilon)F(x_1, x_2) + \epsilon\,\Delta_{(x_1, x_2)}\big] - \Big[\int x_1\, d\big[(1-\epsilon)F(x_1, \infty) + \epsilon\,\Delta_{x_1}\big]\Big]\Big[\int x_2\, d\big[(1-\epsilon)F(\infty, x_2) + \epsilon\,\Delta_{x_2}\big]\Big] = (1-\epsilon)\int x_1 x_2\, dF(x_1, x_2) + \epsilon\, x_1 x_2 - \Big[(1-\epsilon)\int x_1\, dF(x_1, \infty) + \epsilon\, x_1\Big]\Big[(1-\epsilon)\int x_2\, dF(\infty, x_2) + \epsilon\, x_2\Big].
Let us define Y_1 = (X_1 - \mu_1)/\sigma_1 and Y_2 = (X_2 - \mu_2)/\sigma_2, and let G be the joint distribution function of the random vector (Y_1, Y_2). Since the transformation from (X_1, X_2) to (Y_1, Y_2) does not affect the value of the correlation coefficient, the following relationship holds: IC(T(F), F; x) = IC(T(G), G; y).
Noticeably, E(Y_1) = E(Y_2) = 0 and Var(Y_1) = Var(Y_2) = 1. Consequently, the numerator A_1 reduces to
A_1 = (1-\epsilon)\int y_1 y_2\, dG(y_1, y_2) + \epsilon\, y_1 y_2 - [\epsilon\, y_1][\epsilon\, y_2] = (1-\epsilon)\rho + \epsilon\, y_1 y_2 - \epsilon^2 y_1 y_2.
Next, the denominator A_2 will be
A_2 = \sqrt{\Big[\int y_1^2\, d\big[(1-\epsilon)G(y_1, \infty) + \epsilon\,\Delta_{y_1}\big] - \Big(\int y_1\, d\big[(1-\epsilon)G(y_1, \infty) + \epsilon\,\Delta_{y_1}\big]\Big)^2\Big]\Big[\int y_2^2\, d\big[(1-\epsilon)G(\infty, y_2) + \epsilon\,\Delta_{y_2}\big] - \Big(\int y_2\, d\big[(1-\epsilon)G(\infty, y_2) + \epsilon\,\Delta_{y_2}\big]\Big)^2\Big]} = \sqrt{\Big[(1-\epsilon)\int y_1^2\, dG(y_1, \infty) + \epsilon\, y_1^2 - \Big((1-\epsilon)\int y_1\, dG(y_1, \infty) + \epsilon\, y_1\Big)^2\Big]\Big[(1-\epsilon)\int y_2^2\, dG(\infty, y_2) + \epsilon\, y_2^2 - \Big((1-\epsilon)\int y_2\, dG(\infty, y_2) + \epsilon\, y_2\Big)^2\Big]}.
Consequently,
A_2 = \sqrt{\Big[(1-\epsilon)\mathrm{Var}(Y_1) + \epsilon\, y_1^2 - \big((1-\epsilon)E(Y_1) + \epsilon\, y_1\big)^2\Big]\Big[(1-\epsilon)\mathrm{Var}(Y_2) + \epsilon\, y_2^2 - \big((1-\epsilon)E(Y_2) + \epsilon\, y_2\big)^2\Big]} = \sqrt{\big[(1-\epsilon) + \epsilon\, y_1^2 - \epsilon^2 y_1^2\big]\big[(1-\epsilon) + \epsilon\, y_2^2 - \epsilon^2 y_2^2\big]},
since E(Y_1) = \int y_1\, dG(y_1, \infty) = 0 = E(Y_2) = \int y_2\, dG(\infty, y_2) by the definition of Y_1 and Y_2.
Next, combining all the above, the influence function for the bivariate correlation coefficient will be given in the next proposition.
Proposition 2.
The influence function for the Pearson correlation coefficient is IC\big(T(G), G; y\big) = y_1 y_2 - \rho\, \dfrac{y_1^2 + y_2^2}{2}.
Proof. 
Write h(\epsilon) = \sqrt{\big[(1-\epsilon) + \epsilon y_1^2 - \epsilon^2 y_1^2\big]\big[(1-\epsilon) + \epsilon y_2^2 - \epsilon^2 y_2^2\big]}, so that T(G_{\epsilon,y}) = A_1 / h(\epsilon). Then
IC(T(G), G; y) = \lim_{\epsilon \to 0} \frac{T(G_{\epsilon,y}) - T(G)}{\epsilon} = \lim_{\epsilon \to 0} \frac{1}{\epsilon}\left[ \frac{(1-\epsilon)\rho + \epsilon y_1 y_2 - \epsilon^2 y_1 y_2}{h(\epsilon)} - \rho \right] = \lim_{\epsilon \to 0} \frac{y_1 y_2}{h(\epsilon)} - \lim_{\epsilon \to 0} \frac{\epsilon\, y_1 y_2}{h(\epsilon)} + \rho \lim_{\epsilon \to 0} \frac{(1-\epsilon) - h(\epsilon)}{\epsilon\, h(\epsilon)}.
Observe that
\lim_{\epsilon \to 0} h(\epsilon) = 1, \qquad \lim_{\epsilon \to 0} \epsilon\, h(\epsilon) = 0, \qquad \lim_{\epsilon \to 0} \big[(1-\epsilon) - h(\epsilon)\big] = 0,
so the first limit equals y_1 y_2, the second equals 0, and the last limit is of the indeterminate form 0/0. Applying L'Hospital's rule,
\lim_{\epsilon \to 0} \frac{(1-\epsilon) - h(\epsilon)}{\epsilon\, h(\epsilon)} = \lim_{\epsilon \to 0} \frac{-1 - h'(\epsilon)}{h(\epsilon) + \epsilon\, h'(\epsilon)} = \frac{-1 - h'(0)}{1}.
Differentiating h(\epsilon)^2 = \big[(1-\epsilon) + \epsilon y_1^2 - \epsilon^2 y_1^2\big]\big[(1-\epsilon) + \epsilon y_2^2 - \epsilon^2 y_2^2\big] at \epsilon = 0 gives 2\, h(0)\, h'(0) = (y_1^2 - 1) + (y_2^2 - 1), so that h'(0) = \frac{y_1^2 + y_2^2}{2} - 1 and hence -1 - h'(0) = -\frac{y_1^2 + y_2^2}{2}. Combining the results above, we conclude that the influence function is
IC(T(G), G; y) = y_1 y_2 - \rho\, \frac{y_1^2 + y_2^2}{2}. \qquad \square
A similar derivation has been independently given by [11]; however, that proof introduces a bivariate indicator function which, based on the derivation above, is not necessary. In this paper, we have provided an alternative proof.
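A quick numerical sanity check of Proposition 2 (our own sketch, not from the paper) again uses the finite-sample sensitivity curve (n + 1)[T_{n+1} − T_n] in place of the limit, with a standardized bivariate normal playing the role of G.

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.5
n = 200_000
data = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

def corr(d):
    return np.corrcoef(d[:, 0], d[:, 1])[0, 1]

r = corr(data)
for y1, y2 in [(0.0, 0.0), (2.0, 2.0), (3.0, -3.0)]:
    numeric = (n + 1) * (corr(np.vstack([data, [y1, y2]])) - r)   # sensitivity curve
    analytic = y1 * y2 - rho * (y1**2 + y2**2) / 2                # Proposition 2
    print((y1, y2), round(numeric, 3), round(analytic, 3))
```

The two columns agree approximately, and the unboundedness of y1y2 − ρ(y1² + y2²)/2 in (y1, y2) is precisely what is reflected in the qualitative non-robustness of the correlation coefficient noted in Section 6.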
In the next section, we discuss evaluating the influence curve for parameters related to certain types of probability distributions that are different from univariate probability models and associated location parameters in their usual sense.

5. Influence Curve for Certain Types of Probability Distributions

We begin this section by computing the influence curve for the convolution of two absolutely continuous distribution functions. We conjecture at this point that a similar development can be made in the discrete domain, albeit with additional computational complexity.
  • Influence curve for the convolution of two distribution functions
    Let T(F) = \int F^{-1}(t)\, dG(t), where G is a cdf whose mass is concentrated on [\xi, 1-\xi], with 0 \le \xi \le \tfrac{1}{2} and \xi taken to be the largest value for which this holds. Then the associated influence curve of T will be
    IC\big(T(F), F; x\big) = \int \frac{u}{f(F^{-1}(u))}\, dG(u) - \int_{F(x)}^{1} \frac{dG(u)}{f(F^{-1}(u))}.
    Proof. 
    We begin by noting that the IC of the quantile functional F^{-1}(u) is \dfrac{u - I(x < F^{-1}(u))}{f(F^{-1}(u))}. Since T(F) is a linear functional of F^{-1}, the IC of T(F) here will be
    IC\big(T(F), F; x\big) = \int \frac{u - I(x < F^{-1}(u))}{f(F^{-1}(u))}\, dG(u) = \int \frac{u}{f(F^{-1}(u))}\, dG(u) - \int_{F(x)}^{1} \frac{dG(u)}{f(F^{-1}(u))}.
    Hence, the proof. □
  • Next, we consider the influence curve for the von Mises distribution. The associated pdf will be
    f(x; \mu, \kappa) = \big[2\pi I_0(\kappa)\big]^{-1} \exp\big(\kappa\, \mu^T x\big),
    where I_0(\kappa) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \exp(\kappa \cos t)\, dt = \sum_{j=0}^{\infty} \frac{(\kappa/2)^{2j}}{(j!)^2} is the modified Bessel function of the first kind and order zero, and \mu^T \mu = x^T x = 1. Next, we derive that the influence curve for estimating the concentration parameter \kappa, viewed as the functional \kappa(F) = B^{-1}(\|C(F)\|) with B(\cdot) defined below, will be
    IC(x; \kappa) = \frac{x^T T(F) - \|C(F)\|}{1 - \|C(F)\|^2 - \|C(F)\|/\kappa(F)},
    where T(F) = C(F)/\|C(F)\| is the directional mean of the previous section.
    Proof. 
    At first, we make a note of the following:
    • If we write x and \mu in terms of angles t and \theta, such that x = (\cos t, \sin t)^T and \mu = (\cos\theta, \sin\theta)^T, respectively, then the reparametrized family is a location family when \kappa is known.
    • In this case, the directional mean is the maximum likelihood estimate of \mu (irrespective of whether \kappa is known or not).
    Next, we write, for simplicity in deriving the main result
    B(x) = \frac{d}{dx} \log I_0(x).
    It can be verified that
    \frac{d}{dx} B(x) = 1 - B^2(x) - \frac{B(x)}{x},
    and
    \frac{d}{dx} B^{-1}(x) = \Big[ 1 - x^2 - \frac{x}{B^{-1}(x)} \Big]^{-1};
    the first of these identities is checked numerically in the sketch following this list.
    Next, by the chain rule, the influence curve for \kappa can be written as
    IC(x; \kappa) = \Big[\frac{d}{ds} B^{-1}(s)\Big]_{s = \|C(F)\|} \cdot IC\big(x; \|C(F)\|\big).
    Next,
    IC\big(x; \|C(F)\|\big) = \frac{\big(x - C(F)\big)^T C(F)}{\|C(F)\|} = x^T T(F) - \|C(F)\|.
    Our result follows immediately by combining the last three displays. □
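The identity d/dx B(x) = 1 − B²(x) − B(x)/x used above can be checked numerically (a small sketch of ours, relying on scipy's modified Bessel functions i0 and i1, since B(x) = I_1(x)/I_0(x)):

```python
import numpy as np
from scipy.special import i0, i1

def B(x):
    # B(x) = d/dx log I_0(x) = I_1(x) / I_0(x)
    return i1(x) / i0(x)

x = np.linspace(0.5, 5.0, 10)
h = 1e-6
numerical = (B(x + h) - B(x - h)) / (2 * h)   # central finite difference
analytic = 1 - B(x)**2 - B(x) / x             # identity from the text
print(np.max(np.abs(numerical - analytic)))   # should be of the order 1e-9
```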

6. Simulation Study

We performed a simulation study to confirm some aspects of our theoretical findings. For illustrative purposes, we generate a random sample of size 500 from an exponential distribution (with rate λ = 5) and compute the numerical values of the influence curves for each of the population functionals discussed earlier at varying contamination levels. Precisely, the sample was contaminated by observations drawn from a Cauchy distribution at the levels ϵ = 0.02, 0.04, and 0.1, respectively. The findings are summarized in Table 1. The expressions of the influence curve for the directional mean and the correlation coefficient are not reported here, as we feel that a separate study is required. Regarding the gross error sensitivity of the Pearson correlation coefficient based on a random sample drawn from a standard normal population, see [1]. A sketch of one possible implementation of this simulation is given below.
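The exact recipe behind Table 1 is not spelled out above, so the following is only a sketch of one plausible reading of the set-up: contaminate the exponential sample with Cauchy draws at level ϵ and record the scaled change |T(contaminated) − T(clean)|/ϵ for each estimator. The estimators and the trimming/winsorizing proportions shown are illustrative assumptions and do not reproduce the table exactly.

```python
import numpy as np
from scipy.stats import trim_mean
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(2022)
n = 500
clean = rng.exponential(scale=1 / 5, size=n)   # Exp(rate = 5)

estimators = {
    "mean": np.mean,
    "median": np.median,
    "5% trimmed mean": lambda d: trim_mean(d, 0.05),
    "10% trimmed mean": lambda d: trim_mean(d, 0.10),
    "10% winsorized mean": lambda d: winsorize(d, limits=(0.10, 0.10)).mean(),
}

for eps in (0.02, 0.04, 0.10):
    flags = rng.random(n) < eps                         # points to contaminate
    dirty = np.where(flags, rng.standard_cauchy(n), clean)
    row = {k: round(abs(f(dirty) - f(clean)) / eps, 3) for k, f in estimators.items()}
    print(eps, row)
```

Under any such scheme, the mean's change explodes with heavy-tailed contamination while the median, trimmed, and winsorized means remain stable, which is the qualitative message of Table 1.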
In studying the table, we find that all of the robustness concepts discussed above are meaningful and necessary. Furthermore, the following observations can be made on the basis of this small simulation study:
  • The mean is sensitive to outliers while the median is not (as expected).
  • With an increase in the contamination level ( ϵ ) the influence due to “outliers/extreme observations” increases as well.
  • While the sample mean and the product moment correlation coefficient do not enjoy qualitative robustness (which one might identify with weak continuity), the rest of the estimators (precisely, the location estimators) do enjoy this property.

7. Conclusions

The ability to derive an influence curve using the statistical functionals of estimators is an incredibly advantageous aspect of nonparametric statistical inference, as it allows us to better understand the robustness and efficiency of various estimators. In this article, we discussed a wide range of popular, well-known location estimators and the associated expressions of the influence curve. We conducted a small simulation study by drawing samples from a probability distribution and contaminating them at varying levels to examine the behavior of the influence curve; the outcome of this simulation study is promising. We have also discussed the influence curve for certain types of probability distributions along the way. A full-scale study on this topic, for example, the influence curve for scale estimators, is the subject of a separate report. It is fascinating to be able to characterize the effects that deviations from an ideal distribution can have on these estimators. With more research into deriving these influence curves, we can hope to attain a more comprehensive understanding of robust measures in nonparametric statistical inference.

Author Contributions

Conceptualization, I.G.; Formal analysis, K.F.; Investigation, K.F.; Methodology, I.G. and K.F.; Supervision, I.G.; Writing—original draft, I.G. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by the Open Access Publishing Fund given by the William Madison Randall Library, UNCW.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hampel, F.R. The influence curve and its role in robust estimation. J. Am. Stat. Assoc. 1974, 69, 383–393. [Google Scholar] [CrossRef]
  2. Huber, P.J. Robust Statistical Procedures; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1996. [Google Scholar]
  3. Grubbs, F.E. Procedures for detecting outlying observations in samples. Technometrics 1969, 11, 1–21. [Google Scholar] [CrossRef]
  4. Hawkins, D.M. Identification of Outliers; Chapman and Hall: London, UK, 1980; Volume 11. [Google Scholar]
  5. Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis; Prentice Hall: Upper Saddle River, NJ, USA, 2002; Volume 5. [Google Scholar]
  6. Rousseeuw, P.J.; Ronchetti, E. Influence curves of general statistics. J. Comput. Appl. Math. 1981, 7, 161–166. [Google Scholar] [CrossRef]
  7. Box, G.E. Non-normality and tests on variances. Biometrika 1953, 40, 318–335. [Google Scholar] [CrossRef]
  8. Tukey, J.W. A survey of sampling from contaminated distributions. In Contributions to Probability and Statistics; Olkin, I., Ed.; Stanford University Press: Stanford, CA, USA, 1960; pp. 448–485. [Google Scholar]
  9. Huber, P.J. Robust Statistics. In International Encyclopedia of Statistical Science; Springer: Berlin/Heidelberg, Germany, 2011; pp. 1248–1251. [Google Scholar]
  10. Zhu, H.; Ibrahim, J.G.; Tang, N. Bayesian influence analysis: A geometric approach. Biometrika 2011, 98, 307–323. [Google Scholar] [CrossRef] [PubMed]
  11. Chernick, M.R. The influence function and its application to data validation. Am. J. Math. Manag. Sci. 1982, 2, 263–288. [Google Scholar] [CrossRef]
  12. Hettmansperger, T.P.; Utts, J.M. Robustness properties for a simple class of rank estimates. Commun. Stat.-Theory Methods 1977, 6, 855–868. [Google Scholar] [CrossRef]
  13. Reeds, J.A. On the Definition of von Mises Functionals. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1976. [Google Scholar]
  14. Genton, M.G.; Ruiz-Gazen, A. Visualizing influential observations in dependent data. J. Comput. Graph. Stat. 2010, 19, 808–825. [Google Scholar] [CrossRef]
  15. Wu, M.; Zuo, Y. Trimmed and Winsorized standard deviations based on a scaled deviation. J. Nonparametric Stat. 2008, 20, 319–335. [Google Scholar] [CrossRef]
  16. Ko, D.; Guttorp, P. Robustness of estimators for directional data. Ann. Stat. 1988, 16, 609–618. [Google Scholar] [CrossRef]
Table 1. Numerical computation of the robustness of several estimators via the influence curve.

| Contamination Level (ϵ) | Estimator | IC(T(F), F; x) |
| --- | --- | --- |
| 0.02 | Mean | |
| 0.02 | Median | 0.1386 |
| 0.02 | 5% Trimmed mean | 0.743 |
| 0.02 | 10% Trimmed mean | 0.682 |
| 0.02 | α = 0.5-Winsorized mean | 1.294 |
| 0.02 | α = 0.75-Winsorized mean | 1.487 |
| 0.04 | Mean | |
| 0.04 | Median | 0.1387 |
| 0.04 | 5% Trimmed mean | 0.894 |
| 0.04 | 10% Trimmed mean | 0.686 |
| 0.04 | α = 0.5-Winsorized mean | 1.488 |
| 0.04 | α = 0.75-Winsorized mean | 1.363 |
| 0.1 | Mean | |
| 0.1 | Median | 0.1386 |
| 0.1 | 5% Trimmed mean | 1.233 |
| 0.1 | 10% Trimmed mean | 0.785 |
| 0.1 | α = 0.5-Winsorized mean | 1.678 |
| 0.1 | α = 0.75-Winsorized mean | 1.416 |

