Efficient Difference and Ratio-Type Imputation Methods under Ranked Set Sampling

Bhushan, Shashi; Kumar, Anoop; Zaman, Tolga; Al Mutairi, Aned

doi:10.3390/axioms12060558

¹ Department of Statistics, University of Lucknow, Lucknow 226007, India

² Department of Statistics, Amity University, Lucknow 226028, India

³ Department of Statistics, Faculty of Science, Çankiri Karatekin University, Çankiri 18100, Turkey

⁴ Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Axioms 2023, 12(6), 558; https://0-doi-org.brum.beds.ac.uk/10.3390/axioms12060558

Submission received: 10 May 2023 / Revised: 1 June 2023 / Accepted: 2 June 2023 / Published: 5 June 2023

Abstract:

It is well known that ranked set sampling (RSS) is more efficient than simple random sampling (SRS). Furthermore, the presence of missing data vitiates the conventional results. Only a minuscule amount of work has been conducted under RSS with missing data. This paper makes a modest attempt to provide some efficient difference- and ratio-type imputation methods in the presence of missing values under RSS. The envisaged imputation methods are demonstrated to provide better results than the existing imputation methods. The theoretical results are enhanced by a computational analysis using real and hypothetically generated symmetric (Normal) and asymmetric (Gamma and Weibull) populations. The computational results show that the proposed imputation method outperforms the existing imputation methods in terms of its higher percent relative efficiency. Additionally, the impact of skewness and kurtosis on the efficiency of the suggested imputation methods has also been calculated.

Keywords:

bias; mean square error; missing data; imputation; ranked set sampling

MSC:

2020: 62D04

1. Introduction

The most common problem reported by a survey statistician in their daily life is making inferences from data containing missing values. Such problems of missing values in survey sampling may be tackled through the technique of imputation. A wide range of imputation methods have been suggested by various authors. The authors of [1] discussed three noteworthy concepts on missing values as missing at random (MAR), observed at random (OAR), and parameter distribution (PD). Subsequently, [2,3,4,5,6] suggested different types of imputation methods. The authors of [7] showed that missing at random and missing completely at random (MCAR) are totally different approaches. Many renowned authors [8,9,10,11,12] assumed the MCAR approach in their studies for the imputation of missing values. The authors of [13] introduced some imputation methods which outperformed the imputation methods suggested by [14]. The authors of [15] developed logarithmic-type imputation methods under SRS. The authors of [16] utilized robust measures and suggested compromised imputation-based mean estimators.

In real-life applications, situations may arise where measuring the study variable is difficult or expensive, but the units can be ranked visually or by a cost-free measure. For such situations, ref. [17] envisaged the concept of ranked set sampling (RSS) but did not provide any rigorous mathematical support. The authors of [18] explored the idea of [17] and furnished the essential mathematical foundation for the theory of RSS. In sample surveys, when each group has very few observations, each observation becomes essential for making an effective prediction. Furthermore, using datasets that contain missing values may alter the final conclusion and decrease the efficiency of the estimation procedure. To deal with such problems, refs. [19,20,21,22] introduced an analytical comparison of imputation methods under RSS.

This paper conducted a search for efficient imputation procedures. We adapted some efficient difference- and ratio-type imputation methods under RSS based on [11,12], which are more efficient compared to the mean imputation method and the imputation methods suggested by [21,22] under RSS.

The paper is organised as follows: Section 2 describes the detailed methodology of RSS as well as the notations utilized throughout the paper. In Section 3, we consider a concise recap of some imputation methods under RSS, whereas in Section 4, we consider the proposed methods of imputation. In Section 5, we provide the efficiency conditions. Section 6 is devoted to the computational analysis and finally, Section 7 considers the conclusions of this study.

2. Sampling Methodology and Notations

The methodology of RSS was initiated by [17] and is based on drawing m simple random samples, each of size m, from the parent population. The m units within each set are then ranked with respect to the auxiliary variable. The rank-1 unit is chosen from the first set for the measurement of the auxiliary variable X along with the associated study variable Y. The rank-2 unit is chosen from the second ranked set for the measurement of X along with the associated Y, and the process continues until the rank-m unit is chosen from the last set. The above process is referred to as a cycle. The whole procedure is repeated k times, providing a ranked set sample of $n = m k$ units.
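The selection procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `ranked_set_sample`, the NumPy-based implementation, and the toy population are assumptions introduced here and are not taken from the paper. Ranking is done on the auxiliary variable X, and the associated Y value is carried along with the selected unit.

```python
import numpy as np

def ranked_set_sample(x, y, m, k, rng):
    """Draw a ranked set sample of n = m*k units, ranking each set on x.

    x, y : 1-D arrays with the population values (hypothetical inputs).
    Returns (xs, ys), each of shape (k, m); column i holds the rank-(i+1) units.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xs = np.empty((k, m))
    ys = np.empty((k, m))
    for j in range(k):                    # one pass of this loop is one cycle
        for i in range(m):                # i-th set of the cycle
            idx = rng.choice(x.size, size=m, replace=False)  # simple random sample of m units
            order = idx[np.argsort(x[idx])]                  # rank the set on X
            chosen = order[i]                                # keep the unit of rank i+1
            xs[j, i] = x[chosen]
            ys[j, i] = y[chosen]          # Y is taken along with its ranked X
    return xs, ys

# Toy population, invented for illustration only.
rng = np.random.default_rng(0)
x_pop = rng.normal(10.0, 2.0, size=1000)
y_pop = 3.0 + 0.8 * x_pop + rng.normal(0.0, 1.0, size=1000)
xs, ys = ranked_set_sample(x_pop, y_pop, m=3, k=4, rng=rng)
```

With m = 3 and k = 4 the call returns 12 units arranged as 4 cycles by 3 ranks, matching the sample size used later in the simulation study.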

In the presence of missing values in a dataset, the aforesaid methodology is modified to estimate the population mean of the study variable using the available auxiliary information. To facilitate ranking, m bivariate random samples, each consisting of m units, are drawn from the parent population. These m units are ranked within each set with respect to the auxiliary variable, since the study variable is assumed to have some missing values. Now, from the first sample, the smallest ranked unit of X along with the associated Y is selected. From the second sample, the second smallest ranked unit of X along with the associated Y is selected. The procedure continues in the same manner until, from the mth sample, the highest ranked unit of X along with the associated Y is selected. For the study variable, only $m^{'}$ of the m units selected in the first cycle provide a response, so that $m > m^{'}$. The whole procedure is repeated k times, until responses from $n^{'}$ units out of the n selected units are obtained, where $n > n^{'}$.

Notations

Let $μ_{y} = N^{- 1} \sum_{i = 1}^{N} Y_{i}$ be the mean of the finite population $Ω$ of N identifiable units with values $Y_{i}$, $i \in Ω$. Let a ranked set sample s of size $n = m k$ be drawn from $Ω$ to estimate the population mean $μ_{y}$. Let $m^{'}$ be the number of responding units out of the sampled m units. Let P be the probability that the ith respondent belongs to the responding group A and $(1 - P)$ be the probability that the ith respondent belongs to the non-responding group $\bar{A}$, such that $s = A \cup \bar{A}$. The value $Y_{i}$ is observed for every unit $i \in A$, but for the units $i \in \bar{A}$ the values are missing and must be imputed to complete the data before a valid conclusion can be drawn. The auxiliary variable X assists in the imputation of missing values. Let $X_{i}$ be the value of X for unit i, which is positive and known for all $i \in s$, such that $X_{s} = X_{i}; i \in s$ are known. Let ${\bar{X}}_{r, r s s} = \sum_{i = 1}^{m^{'}} \sum_{j = 1}^{k} X_{(i : i) j} / m k P$ and ${\bar{Y}}_{r, r s s} = \sum_{i = 1}^{m^{'}} \sum_{j = 1}^{k} Y_{[i : i] j} / m k P$ be the unbiased estimators of the population means $μ_{x}$ and $μ_{y}$, respectively. Here, $X_{(i : i) j}$ and $Y_{[i : i] j}$ are the ith order statistic and the ith judgement order, respectively, in the ith sample of size m in cycle j for variables X and Y. For the sake of simplicity, we denote $X_{(i : i) j}$ and $Y_{[i : i] j}$ by $X_{(i)}$ and $Y_{[i]}$, respectively. Let P be the probability of determining the response, then $E (r^{- j}) = {E (r)}^{- j}$, which provides the variance as

\begin{matrix} E {V ({\bar{Y}}_{r, r s s})} & = (\frac{σ_{y}^{2}}{m k P} - \frac{1}{m^{2} k P} \sum_{i = 1}^{m} τ_{y_{[i]}}^{2}) \end{matrix}

(1)

\begin{matrix} then E (j^{- 1}) & = {E (j)}^{- 1} = n P \end{matrix}

(2)

The proof of (1) and (2) can be viewed in [20].

To obtain the bias and mean square error (MSE), the following notations and results are used throughout this paper. Let ${\bar{Y}}_{r, r s s} = μ_{y} (1 + ϵ_{0})$, ${\bar{X}}_{r, r s s} = μ_{x} (1 + ϵ_{1})$, and ${\bar{X}}_{n, r s s} = μ_{x} (1 + ϵ_{2})$, where $ϵ_{0}$, $ϵ_{1}$, and $ϵ_{2}$ are the error terms, such that $E (ϵ_{0}) = E (ϵ_{1}) = E (ϵ_{2}) = 0$ and

\begin{matrix} E (ϵ_{0}^{2}) & = (\frac{C_{y}^{2}}{m k P} - \frac{1}{m^{2} k P} \sum_{i = 1}^{m} \frac{τ_{y_{[i]}}^{2}}{μ_{y}^{2}}) = (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}) \\ E (ϵ_{1}^{2}) & = (\frac{C_{x}^{2}}{m k P} - \frac{1}{m^{2} k P} \sum_{i = 1}^{m} \frac{τ_{x_{(i)}}^{2}}{μ_{x}^{2}}) = (γ^{*} C_{x}^{2} - W_{x}^{2^{*}}) \\ E (ϵ_{2}^{2}) & = (\frac{C_{x}^{2}}{m k} - \frac{1}{m^{2} k} \sum_{i = 1}^{m} \frac{τ_{x_{(i)}}^{2}}{μ_{x}^{2}}) = (γ C_{x}^{2} - W_{x}^{2}) \\ E (ϵ_{0}, ϵ_{1}) & = (\frac{ρ_{x y} C_{x} C_{y}}{m k P} - \frac{1}{m^{2} k P} \sum_{i = 1}^{m} \frac{τ_{{x y}_{[i]}}}{μ_{x} μ_{y}}) = (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*}) \\ E (ϵ_{0}, ϵ_{2}) & = (\frac{ρ_{x y} C_{x} C_{y}}{m k} - \frac{1}{m^{2} k} \sum_{i = 1}^{m} \frac{τ_{{x y}_{[i]}}}{μ_{x} μ_{y}}) = (γ ρ_{x y} C_{x} C_{y} - W_{x y}) \\ E (ϵ_{1}, ϵ_{2}) & = (\frac{C_{x}^{2}}{m k} - \frac{1}{m^{2} k} \sum_{i = 1}^{m} \frac{τ_{x_{(i)}}^{2}}{μ_{x}^{2}}) = (γ C_{x}^{2} - W_{x}^{2}) \end{matrix}

where $γ^{*} = 1 / (m k P)$, $γ = 1 / (m k)$, $W_{y}^{2^{*}} = \frac{1}{m^{2} k P μ_{y}^{2}} \sum_{i = 1}^{m} τ_{y_{[i]}}^{2}$, $W_{x}^{2} = \frac{1}{m^{2} k μ_{x}^{2}} \sum_{i = 1}^{m} τ_{x_{(i)}}^{2}$, $W_{x}^{2^{*}} = \frac{1}{m^{2} k P μ_{x}^{2}} \sum_{i = 1}^{m} τ_{x_{(i)}}^{2}$, $W_{x y} = \frac{1}{m^{2} k μ_{x} μ_{y}} \sum_{i = 1}^{m} τ_{x y_{[i]}}$, $W_{x y}^{*} = \frac{1}{m^{2} k P μ_{x} μ_{y}} \sum_{i = 1}^{m} τ_{x y_{[i]}}$, $τ_{y_{[i]}} = (μ_{y_{[i]}} - μ_{y})$, $τ_{x_{(i)}} = (μ_{x_{(i)}} - μ_{x})$, $τ_{x y_{[i]}} = (μ_{x_{(i)}} - μ_{x}) (μ_{y_{[i]}} - μ_{y})$, $C_{x} = S_{x} / μ_{x}$, $C_{y} = S_{y} / μ_{y}$, $μ_{y} = E (Y)$, $μ_{x} = E (X)$, $μ_{y_{[i]}} = E (Y_{[i]})$, and $μ_{x_{(i)}} = E (X_{(i)})$.

Here, $S_{x}$ and $S_{y}$ are the population standard deviations of the auxiliary variable X and study variable Y, respectively, $C_{x}$ and $C_{y}$ are the population coefficients of variation of the auxiliary variable X and study variable Y, respectively, and $ρ_{x y}$ is the population correlation coefficient between the auxiliary variable X and study variable Y. Moreover, we note that the quantities $μ_{x_{(i)}}$ and $μ_{y_{[i]}}$ depend on the order statistics of the particular distribution and can be easily determined from [23].
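To make the notation concrete, the second-order moments of the error terms can be evaluated numerically once the rank-wise means are available. The following is a minimal sketch under the definitions given above; the function name `error_moments`, the dictionary keys, and the numerical inputs are hypothetical and chosen only for illustration. The starred quantities are the unstarred ones divided by P, which the code uses directly.

```python
import numpy as np

def error_moments(m, k, P, mu_x, mu_y, S_x, S_y, rho_xy, mu_x_rank, mu_y_rank):
    """Second-order moments of (eps0, eps1, eps2) as defined in the text.

    mu_x_rank[i], mu_y_rank[i] : E(X_(i)) and E(Y_[i]) for rank i+1 (length m).
    """
    mu_x_rank = np.asarray(mu_x_rank, dtype=float)
    mu_y_rank = np.asarray(mu_y_rank, dtype=float)
    C_x, C_y = S_x / mu_x, S_y / mu_y
    tau_x = mu_x_rank - mu_x
    tau_y = mu_y_rank - mu_y
    gamma = 1.0 / (m * k)                 # gamma
    gamma_star = 1.0 / (m * k * P)        # gamma*
    W_x2 = np.sum(tau_x**2) / (m**2 * k * mu_x**2)
    W_y2 = np.sum(tau_y**2) / (m**2 * k * mu_y**2)
    W_xy = np.sum(tau_x * tau_y) / (m**2 * k * mu_x * mu_y)
    return {
        "e0e0": gamma_star * C_y**2 - W_y2 / P,                 # E(eps0^2)
        "e1e1": gamma_star * C_x**2 - W_x2 / P,                 # E(eps1^2)
        "e2e2": gamma * C_x**2 - W_x2,                          # E(eps2^2)
        "e0e1": gamma_star * rho_xy * C_x * C_y - W_xy / P,     # E(eps0 eps1)
        "e0e2": gamma * rho_xy * C_x * C_y - W_xy,              # E(eps0 eps2)
        "e1e2": gamma * C_x**2 - W_x2,                          # E(eps1 eps2)
    }

# Invented inputs: set size 3, 4 cycles, response probability 0.5.
moments = error_moments(m=3, k=4, P=0.5, mu_x=10.0, mu_y=20.0, S_x=2.0, S_y=4.0,
                        rho_xy=0.8, mu_x_rank=[8.0, 10.0, 12.0],
                        mu_y_rank=[17.0, 20.0, 23.0])
```

In practice the rank-wise means would come from tables of expected order statistics such as those referred to in [23], not from made-up values as here.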

3. Review of Imputation Methods under RSS

3.1. Mean Imputation Method

The method of imputation is

y_{. i} = \{\begin{matrix} Y_{i} & for i \in A \\ {\bar{Y}}_{r, r s s} & for i \in \bar{A} \end{matrix}

The consequent estimator is

t_{m} = {\bar{Y}}_{r, r s s}

(3)

The imputation methods are categorized into three strategies under the consideration of the availability of auxiliary information.

Strategy I: When $μ_{x}$ is known and ${\bar{X}}_{n, r s s}$ is used.

Strategy II: When $μ_{x}$ is known and ${\bar{X}}_{r, r s s}$ is used.

Strategy III: When $μ_{x}$ is unknown and ${\bar{X}}_{n, r s s}$ and ${\bar{X}}_{r, r s s}$ are used.

3.2. The Al-Omari and Bouza Imputation Method

To improve the efficiency of the estimators in the presence of missing data, refs. [9,21] suggested some regression-cum-ratio-type estimators under RSS as

Strategy I

\begin{matrix} {\bar{y}}_{K C_{1}} & = \frac{{\bar{Y}}_{r, r s s} + b (μ_{x} - {\bar{X}}_{n, r s s})}{{\bar{X}}_{n, r s s}} μ_{x} \end{matrix}

(4)

Strategy II

\begin{matrix} {\bar{y}}_{K C_{2}} & = \frac{{\bar{Y}}_{r, r s s} + b (μ_{x} - {\bar{X}}_{r, r s s})}{{\bar{X}}_{r, r s s}} μ_{x} \end{matrix}

(5)

Strategy III

\begin{matrix} {\bar{y}}_{K C_{3}} & = \frac{{\bar{Y}}_{r, r s s} + b ({\bar{X}}_{n, r s s} - {\bar{X}}_{r, r s s})}{{\bar{X}}_{r, r s s}} {\bar{X}}_{n, r s s} \end{matrix}

(6)

where

b = S_{x y} / S_{x}^{2}

is the regression coefficient of Y on X.

3.3. The Sohail, Shabbir and Ahmed Imputation Methods

Following [20,21,22], we examined the ratio-type estimators of [8] using RSS for the imputation of missing values. These imputation methods are

Strategy I

\begin{matrix} y_{{. i s}_{1}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} \{n {\bar{Y}}_{r, r s s} {(\frac{μ_{x}}{{\bar{X}}_{n, r s s}})}^{β_{1}} - r {\bar{Y}}_{r, r s s}\} & for i \in \bar{A} \end{matrix} \\ y_{{. i s}_{4}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} [n {\bar{Y}}_{r, r s s} \{\frac{μ_{x}}{β_{4} {\bar{X}}_{n, r s s} + (1 - β_{4}) μ_{x}}\} - r {\bar{Y}}_{r, r s s}] & for i \in \bar{A} \end{matrix} \end{matrix}

Strategy II

\begin{matrix} y_{{. i s}_{2}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} \{n {\bar{Y}}_{r, r s s} {(\frac{μ_{x}}{{\bar{X}}_{r, r s s}})}^{β_{2}} - r {\bar{Y}}_{r, r s s}\} & for i \in \bar{A} \end{matrix} \\ y_{{. i s}_{5}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} [n {\bar{Y}}_{r, r s s} \{\frac{μ_{x}}{β_{5} {\bar{X}}_{r, r s s} + (1 - β_{5}) μ_{x}}\} - r {\bar{Y}}_{r, r s s}] & for i \in \bar{A} \end{matrix} \end{matrix}

Strategy III

\begin{matrix} y_{{. i s}_{3}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} \{n {\bar{Y}}_{r, r s s} {(\frac{{\bar{X}}_{n, r s s}}{{\bar{X}}_{r, r s s}})}^{β_{3}} - r {\bar{Y}}_{r, r s s}\} & for i \in \bar{A} \end{matrix} \end{matrix}

\begin{matrix} y_{{. i s}_{6}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} [n {\bar{Y}}_{r, r s s} \{\frac{{\bar{X}}_{n, r s s}}{β_{6} {\bar{X}}_{r, r s s} + (1 - β_{6}) {\bar{X}}_{n, r s s}}\} - r {\bar{Y}}_{r, r s s}] & for i \in \bar{A} \end{matrix} \end{matrix}

The consequent estimators are

t_{s_{1}} = {\bar{y}}_{r, r s s} {(\frac{μ_{x}}{{\bar{X}}_{n, r s s}})}^{β_{1}}

(7)

t_{s_{2}} = {\bar{y}}_{r, r s s} {(\frac{μ_{x}}{{\bar{X}}_{r, r s s}})}^{β_{2}}

(8)

t_{s_{3}} = {\bar{y}}_{r, r s s} {(\frac{{\bar{X}}_{n, r s s}}{{\bar{X}}_{r, r s s}})}^{β_{3}}

(9)

t_{s_{4}} = {\bar{y}}_{r, r s s} \{\frac{μ_{x}}{β_{4} {\bar{X}}_{n, r s s} + (1 - β_{4}) μ_{x}}\}

(10)

t_{s_{5}} = {\bar{y}}_{r, r s s} \{\frac{μ_{x}}{β_{5} {\bar{X}}_{r, r s s} + (1 - β_{5}) μ_{x}}\}

(11)

t_{s_{6}} = {\bar{y}}_{r, r s s} \{\frac{{\bar{X}}_{n, r s s}}{β_{6} {\bar{X}}_{r, r s s} + (1 - β_{6}) {\bar{X}}_{n, r s s}}\}

(12)

where $β_{i}$; $i = 1, 2, \dots, 6$ are appropriately chosen optimizing scalars.

The MSE values of the consequent estimators consisting of different imputation methods are given in Appendix A for quick reference and further analytical comparison.

4. The Proposed Imputation Methods

The crux of the present article is:

To provide efficient imputation methods for mean estimation.
To assess the impact of the skewness and kurtosis coefficients on the choice of imputation procedures.

Motivated by the works of [11,12], we propose nine new imputation methods under the three strategies discussed earlier, defined as

Strategy I

\begin{matrix} y_{{. i}_{1}} & = \{\begin{matrix} α_{1} Y_{i} & for i \in A \\ α_{1} {\bar{Y}}_{r, r s s} + \frac{n θ_{1}}{n - r} ({\bar{X}}_{n, r s s} - μ_{x}) & for i \in \bar{A} \end{matrix} \\ y_{{. i}_{4}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} \{n α_{4} {\bar{Y}}_{r, r s s} {(\frac{μ_{x}}{{\bar{X}}_{n, r s s}})}^{θ_{4}} - r {\bar{Y}}_{r, r s s}\} & for i \in \bar{A} \end{matrix} \\ y_{{. i}_{7}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} \{n α_{7} {\bar{Y}}_{r, r s s} (\frac{μ_{x}}{μ_{x} + θ_{7} ({\bar{X}}_{n, r s s} - μ_{x})}) - r {\bar{Y}}_{r, r s s}\} & for i \in \bar{A} \end{matrix} \end{matrix}

Strategy II

\begin{matrix} y_{{. i}_{2}} & = \{\begin{matrix} α_{2} Y_{i} & for i \in A \\ α_{2} {\bar{Y}}_{r, r s s} + \frac{n θ_{2}}{n - r} ({\bar{X}}_{r, r s s} - μ_{x}) & for i \in \bar{A} \end{matrix} \\ y_{{. i}_{5}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} \{n α_{5} {\bar{Y}}_{r, r s s} {(\frac{μ_{x}}{{\bar{X}}_{r, r s s}})}^{θ_{5}} - r {\bar{Y}}_{r, r s s}\} & for i \in \bar{A} \end{matrix} \\ y_{{. i}_{8}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} [n α_{8} {\bar{Y}}_{r, r s s} \{\frac{μ_{x}}{μ_{x} + θ_{8} ({\bar{X}}_{r, r s s} - μ_{x})}\} - r {\bar{Y}}_{r, r s s}] & for i \in \bar{A} \end{matrix} \end{matrix}

Strategy III

\begin{matrix} y_{{. i}_{3}} & = \{\begin{matrix} α_{3} Y_{i} & for i \in A \\ α_{3} {\bar{Y}}_{r, r s s} + \frac{n θ_{3}}{n - r} ({\bar{X}}_{r, r s s} - {\bar{X}}_{n, r s s}) & for i \in \bar{A} \end{matrix} \\ y_{{. i}_{6}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} \{n α_{6} {\bar{Y}}_{r, r s s} {(\frac{{\bar{X}}_{n, r s s}}{{\bar{X}}_{r, r s s}})}^{θ_{6}} - r {\bar{Y}}_{r, r s s}\} & for i \in \bar{A} \end{matrix} \\ y_{{. i}_{9}} & = \{\begin{matrix} Y_{i} & for i \in A \\ \frac{1}{n - r} [n α_{9} {\bar{Y}}_{r, r s s} \{\frac{{\bar{X}}_{n, r s s}}{{\bar{X}}_{n, r s s} + θ_{9} ({\bar{X}}_{r, r s s} - {\bar{X}}_{n, r s s})}\} - r {\bar{Y}}_{r, r s s}] & for i \in \bar{A} \end{matrix} \end{matrix}

Under the above strategies, the consequent estimators are

T_{1} = α_{1} {\bar{Y}}_{r, r s s} + θ_{1} ({\bar{X}}_{n, r s s} - μ_{x})

(13)

T_{2} = α_{2} {\bar{Y}}_{r, r s s} + θ_{2} ({\bar{X}}_{r, r s s} - μ_{x})

(14)

T_{3} = α_{3} {\bar{Y}}_{r, r s s} + θ_{3} ({\bar{X}}_{r, r s s} - {\bar{X}}_{n, r s s})

(15)

T_{4} = α_{4} {\bar{Y}}_{r, r s s} {(\frac{μ_{x}}{{\bar{X}}_{n, r s s}})}^{θ_{4}}

(16)

T_{5} = α_{5} {\bar{Y}}_{r, r s s} {(\frac{μ_{x}}{{\bar{X}}_{r, r s s}})}^{θ_{5}}

(17)

T_{6} = α_{6} {\bar{Y}}_{r, r s s} {(\frac{{\bar{X}}_{n, r s s}}{{\bar{X}}_{r, r s s}})}^{θ_{6}}

(18)

T_{7} = α_{7} {\bar{Y}}_{r, r s s} \{\frac{μ_{x}}{μ_{x} + θ_{7} ({\bar{X}}_{n, r s s} - μ_{x})}\}

(19)

T_{8} = α_{8} {\bar{Y}}_{r, r s s} \{\frac{μ_{x}}{μ_{x} + θ_{8} ({\bar{X}}_{r, r s s} - μ_{x})}\}

(20)

T_{9} = α_{9} {\bar{Y}}_{r, r s s} \{\frac{{\bar{X}}_{n, r s s}}{{\bar{X}}_{n, r s s} + θ_{9} ({\bar{X}}_{r, r s s} - {\bar{X}}_{n, r s s})}\}

(21)

where $α_{i}$ and $θ_{i}$; $i = 1, 2, \dots, 9$ are suitably chosen scalars.
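The link between an imputation method and its consequent estimator can be checked numerically: averaging the completed data over all n sampled units reproduces the point estimator. The sketch below does this for the Strategy I difference-type method $y_{. i_{1}}$ and $T_{1}$. It assumes that ${\bar{Y}}_{r, r s s}$ is computed as the plain mean of the r responding units and that ${\bar{X}}_{n, r s s}$ is the mean of all n sampled X values; the function name, the data, and the values of $α_{1}$ and $θ_{1}$ are invented for illustration.

```python
import numpy as np

def impute_difference_strategy1(y, x, responded, mu_x, alpha, theta):
    """Complete the study variable with the Strategy I difference-type rule.

    Respondents receive alpha * Y_i; each non-respondent receives
    alpha * ybar_r + n * theta / (n - r) * (xbar_n - mu_x).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    n, r = y.size, int(responded.sum())
    if r == 0 or r == n:
        raise ValueError("need at least one respondent and one non-respondent")
    y_bar_r = y[responded].mean()         # mean of responding units (assumed form of Ybar_r)
    x_bar_n = x.mean()                    # mean of X over the whole sample
    fill = alpha * y_bar_r + n * theta / (n - r) * (x_bar_n - mu_x)
    return np.where(responded, alpha * y, fill)

# Invented sample of n = 6 units with two missing Y values.
y = np.array([4.0, 6.0, np.nan, 8.0, np.nan, 2.0])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
responded = ~np.isnan(y)
completed = impute_difference_strategy1(y, x, responded, mu_x=3.0, alpha=0.9, theta=0.5)

# Point estimator T_1 computed directly from its closed form.
T1 = 0.9 * y[responded].mean() + 0.5 * (x.mean() - 3.0)
```

Here `completed.mean()` and `T1` agree, because the factor n/(n - r) in the imputed value cancels when the n - r imputed values are averaged over n units.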

Theorem 1.

The MSEs of the consequent estimators comprising the suggested imputation methods are

\begin{matrix} M S E (T_{1}) & = \{\begin{matrix} {(α_{1} - 1)}^{2} μ_{y}^{2} + α_{1}^{2} μ_{y}^{2} (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}) + θ_{1}^{2} μ_{x}^{2} (γ C_{x}^{2} - W_{x}^{2}) \\ + 2 α_{1} θ_{1} μ_{x} μ_{y} (γ ρ_{x y} C_{x} C_{y} - W_{x y}) \end{matrix}\} \end{matrix}

(22)

\begin{matrix} M S E (T_{2}) & = \{\begin{matrix} {(α_{2} - 1)}^{2} μ_{y}^{2} + α_{2}^{2} μ_{y}^{2} (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}) + θ_{2}^{2} μ_{x}^{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}}) \\ + 2 α_{2} θ_{2} μ_{x} μ_{y} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*}) \end{matrix}\} \end{matrix}

(23)

\begin{matrix} M S E (T_{3}) & = \{\begin{matrix} {(α_{3} - 1)}^{2} μ_{y}^{2} + α_{3}^{2} μ_{y}^{2} (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}) + θ_{3}^{2} μ_{x}^{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2}) \\ + 2 α_{3} θ_{3} μ_{x} μ_{y} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y}) \end{matrix}\} \end{matrix}

(24)

\begin{matrix} M S E (T_{4}) & = μ_{y}^{2} [\begin{matrix} 1 + α_{4}^{2} \{\begin{matrix} 1 + γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + θ_{4} (2 θ_{4} + 1) (γ C_{x}^{2} - W_{x}^{2}) \\ - 4 θ_{4} (γ ρ_{x y} C_{x} C_{y} - W_{x y}) \end{matrix}\} \\ - 2 α_{4} \{1 - θ_{4} (γ ρ_{x y} C_{x} C_{y} - W_{x y}) + \frac{θ_{4} (θ_{4} + 1)}{2} (γ C_{x}^{2} - W_{x}^{2})\} \end{matrix}] \end{matrix}

(25)

\begin{matrix} M S E (T_{5}) & = μ_{y}^{2} [\begin{matrix} 1 + α_{5}^{2} \{\begin{matrix} 1 + γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + θ_{5} (2 θ_{5} + 1) (γ^{*} C_{x}^{2} - W_{x}^{2^{*}}) \\ - 4 θ_{5} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*}) \end{matrix}\} \\ - 2 α_{5} \{1 - θ_{5} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*}) + \frac{θ_{5} (θ_{5} + 1)}{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}})\} \end{matrix}] \end{matrix}

(26)

\begin{matrix} M S E (T_{6}) & = μ_{y}^{2} [\begin{matrix} 1 + α_{6}^{2} \{\begin{matrix} 1 + γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + θ_{6} (2 θ_{6} + 1) (γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2}) \\ - 4 θ_{6} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y}) \end{matrix}\} \\ - 2 α_{6} \{\begin{matrix} 1 - θ_{6} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y}) \\ + \frac{θ_{6} (θ_{6} + 1)}{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2}) \end{matrix}\} \end{matrix}] \end{matrix}

(27)

\begin{matrix} M S E (T_{7}) & = μ_{y}^{2} [\begin{matrix} 1 + α_{7}^{2} \{\begin{matrix} 1 + γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + 3 θ_{7}^{2} (γ C_{x}^{2} - W_{x}^{2}) - 4 θ_{7} (γ ρ_{x y} C_{x} C_{y} - W_{x y}) \end{matrix}\} \\ - 2 α_{7} \{1 + θ_{7}^{2} (γ C_{x}^{2} - W_{x}^{2}) - θ_{7} (γ ρ_{x y} C_{x} C_{y} - W_{x y})\} \end{matrix}] \end{matrix}

(28)

\begin{matrix} M S E (T_{8}) & = μ_{y}^{2} [\begin{matrix} 1 + α_{8}^{2} \{\begin{matrix} 1 + γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + 3 θ_{8}^{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}}) - 4 θ_{8} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*}) \end{matrix}\} \\ - 2 α_{8} \{1 + θ_{8}^{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}}) - θ_{8} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*})\} \end{matrix}] \end{matrix}

(29)

\begin{matrix} M S E (T_{9}) & = μ_{y}^{2} [\begin{matrix} 1 + α_{9}^{2} \{\begin{matrix} 1 + γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + 3 θ_{9}^{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2}) \\ - 4 θ_{9} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y}) \end{matrix}\} \\ - 2 α_{9} \{\begin{matrix} 1 + θ_{9}^{2} \begin{matrix} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2}) \end{matrix} \\ - θ_{9} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y}) \end{matrix}\} \end{matrix}] \end{matrix}

(30)

Proof.

A précis of the derivations is given in Appendix B for quick reference. □

Corollary 1.

The minimum MSEs of the consequent estimators comprising the suggested imputation methods are given by

\begin{matrix} m i n M S E (T_{i}) & = μ_{y}^{2} (1 - α_{i (o p t)}); i = 1, 2, 3, 7, 8, 9 \end{matrix}

(31)

\begin{matrix} m i n M S E (T_{j}) & = μ_{y}^{2} (1 - α_{j (o p t)} A_{j}); j = 4, 5, 6 \end{matrix}

(32)

Proof.

A summary of the derivations and the definition of the parametric function $A_{j}$ are given in Appendix B. □
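As an illustration of Corollary 1 for $T_{1}$, the MSE in (22) can be minimized directly. Writing $A = E (ϵ_{0}^{2})$, $B = E (ϵ_{2}^{2})$ and $C = E (ϵ_{0}, ϵ_{2})$ (shorthand introduced here, not the paper's notation) and setting the partial derivatives of (22) to zero gives $θ_{1} = - α_{1} μ_{y} C / (μ_{x} B)$ and $α_{1} = 1 / (1 + A - C^{2} / B)$. These expressions are derived here from (22) alone and may be written differently in Appendix B, which is not reproduced in this section. The sketch checks numerically that the resulting minimum equals $μ_{y}^{2} (1 - α_{1 (o p t)})$, in line with (31); all numerical inputs are invented.

```python
def mse_T1(alpha, theta, mu_x, mu_y, A, B, C):
    """MSE of T_1 as in (22), with A = E(eps0^2), B = E(eps2^2), C = E(eps0 eps2)."""
    return ((alpha - 1.0) ** 2 * mu_y**2
            + alpha**2 * mu_y**2 * A
            + theta**2 * mu_x**2 * B
            + 2.0 * alpha * theta * mu_x * mu_y * C)

def optimum_T1(mu_x, mu_y, A, B, C):
    """Stationary point of (22) in (alpha, theta), obtained by direct differentiation."""
    alpha = 1.0 / (1.0 + A - C**2 / B)
    theta = -alpha * mu_y * C / (mu_x * B)
    return alpha, theta

# Invented moment values satisfying A * B >= C**2.
mu_x, mu_y, A, B, C = 10.0, 20.0, 0.01, 0.004, 0.005
alpha_opt, theta_opt = optimum_T1(mu_x, mu_y, A, B, C)
min_mse = mse_T1(alpha_opt, theta_opt, mu_x, mu_y, A, B, C)
closed_form = mu_y**2 * (1.0 - alpha_opt)   # right-hand side of (31)
```

The two quantities `min_mse` and `closed_form` coincide up to rounding, and perturbing either scalar away from its optimum increases the MSE.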

5. Efficiency Conditions

By successively comparing the MSEs of the suggested imputation methods $y_{. i_{1}}$ to $y_{. i_{9}}$ with those of the other existing imputation methods proposed by [21,22], we obtain the following efficiency conditions.

(i).: From (31) and (A1)

$α_{i (o p t)} > 1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}}; i = 1, 2, 3, 7, 8, 9$

(33)
(ii).: From (32) and (A1)

$α_{j (o p t)} > \frac{1}{A_{j}} (1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}}); j = 4, 5, 6$

(34)
(iii).: From (31) and (A2)

$α_{i (o p t)} > 1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}} - \{1 - {(\frac{B}{R})}^{2}\} (γ C_{x}^{2} - W_{x}^{2}); i = 1, 2, 3, 7, 8, 9$

(35)
(iv).: From (32) and (A2)

$α_{j (o p t)} > \frac{1}{A_{j}} [1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}} - \{1 - {(\frac{B}{R})}^{2}\} (γ C_{x}^{2} - W_{x}^{2})]; j = 4, 5, 6$

(36)
(v).: From (31) and (A3)

$α_{i (o p t)} > 1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}} - γ^{*} C_{x}^{2} + W_{x}^{2^{*}} + (\frac{B}{R}) (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*}); i = 1, 2, 3, 7, 8, 9$

(37)
(vi).: From (32) and (A3)

$α_{j (o p t)} > \frac{1}{A_{j}} \{\begin{matrix} 1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}} - γ^{*} C_{x}^{2} + W_{x}^{2^{*}} + (\frac{B}{R}) (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*}) \end{matrix}\}; j = 4, 5, 6$

(38)
(vii).: From (31) and (A4)

$α_{i (o p t)} > [\begin{matrix} 1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}} - {\{1 + (\frac{B}{R})\}}^{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}}) \\ + 2 \{1 + (\frac{B}{R})\} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*}) \end{matrix}]; i = 1, 2, 3, 7, 8, 9$

(39)
(viii).: From (32) and (A4)

$α_{j (o p t)} > \frac{1}{A_{j}} [\begin{matrix} 1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}} - {\{1 + (\frac{B}{R})\}}^{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}}) \\ + 2 \{1 + (\frac{B}{R})\} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*}) \end{matrix}]; j = 4, 5, 6$

(40)
(ix).: From (31) and (A8)

$α_{i (o p t)} > 1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}} + \frac{{(γ ρ_{x y} C_{x} C_{y} - W_{x y})}^{2}}{(γ C_{x}^{2} - W_{x}^{2})}; i = 1, 2, 3, 7, 8, 9$

(41)
(x).: From (32) and (A8)

$α_{j (o p t)} > \frac{1}{A_{j}} \{1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}} + \frac{{(γ ρ_{x y} C_{x} C_{y} - W_{x y})}^{2}}{(γ C_{x}^{2} - W_{x}^{2})}\}; j = 4, 5, 6$

(42)
(xi).: From (31) and (A9)

$α_{i (o p t)} > 1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}} + \frac{{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y})}^{2}}{(γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2})}; i = 1, 2, 3, 7, 8, 9$

(43)
(xii).: From (32) and (A9)

$α_{j (o p t)} > \frac{1}{A_{j}} \{1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}} + \frac{{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y})}^{2}}{(γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2})}\}; j = 4, 5, 6$

(44)

Note that the efficiency of the suggested imputation methods is ensured only under these conditions. Furthermore, we observed that these conditions are usually satisfied in various populations.
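As a worked example of how these conditions arise, condition (33) follows in two steps. The sketch assumes that (A1), which is not reproduced in this section, is the MSE of the mean imputation estimator $t_{m}$ and that it equals the variance in (1) written in terms of $γ^{*}$, $C_{y}$ and $W_{y}^{2^{*}}$.

```latex
\begin{aligned}
\mathrm{MSE}(t_{m}) &= μ_{y}^{2} \, (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}), \\
\min \mathrm{MSE}(T_{i}) < \mathrm{MSE}(t_{m})
  &\iff μ_{y}^{2} \, (1 - α_{i(opt)}) < μ_{y}^{2} \, (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}) \\
  &\iff α_{i(opt)} > 1 - γ^{*} C_{y}^{2} + W_{y}^{2^{*}}, \qquad i = 1, 2, 3, 7, 8, 9 .
\end{aligned}
```

The remaining conditions are obtained in the same way, with the right-hand side replaced by the MSE of the competing estimator and, for $j = 4, 5, 6$, with the additional factor $1 / A_{j}$ coming from (32).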

6. Computational Study

To enhance the soundness of the efficiency conditions obtained in the previous section, a computational study was designed in three subsections, namely, a numerical analysis based on a real population, a simulation analysis based on an artificially generated population, and a discussion of the computational findings.

6.1. Numerical Study

In this subsection, a numerical study is performed and the performance of the proposed imputation methods is compared with existing imputation methods. The numerical analysis was accomplished on four real datasets. Population 1 was taken from [24], where the level of apple production was taken as the study variable and the number of apple trees taken as the auxiliary variable in 69 villages of the South Anatolia region of Turkey in 1999. Population 2 was taken from [25], where the population (in millions) in 1983 was considered as the study variable and the export (in millions of U.S. dollars) was considered the auxiliary variable. Population 3 was taken from [26], where the amount (in U.S. dollars) of real estate farm loans in different states during 1997 was considered as the study variable and the amount (in U.S. dollars) of non-real estate farm loans in different states during 1997 was considered the auxiliary variable. Population 4 was taken from [25], where the total number of seats in the municipal council in 1982 was considered the study variable and the number of conservative seats in the municipal council in 1982 was considered the auxiliary variable. The necessary values of the parameters for all four populations are reported in Table 1.

The percent relative efficiency (PRE) of the proposed imputation methods regarding the conventional imputation methods was calculated using the following formula:

\begin{matrix} P R E = \frac{M S E (t_{m})}{M S E (T)} \times 100 \end{matrix}

(45)

where $T = t_{m}$, $t_{s_{i}}, i = 1, 2, \dots, 6$, and $T_{i}, i = 1, 2, \dots, 9$. The results of the numerical analysis are summarized in Table 2 and depicted in Figure 1 under strategies I, II and III for each population.

6.2. Simulation Analysis

To assess the performance of the suggested imputation methods, following [27], simulation experiments were conducted over three parent populations, namely, Normal, Gamma, and Weibull, of size $N = 1000$ units with variables X and Y, expressed by

\begin{matrix} Y & = 2.9 + \sqrt{(1 - ρ_{x y}^{2})} Y^{*} + ρ_{x y} (\frac{S_{y}}{S_{x}}) X^{*} \\ X & = 2.5 + X^{*} \end{matrix}

where $X^{*}$ and $Y^{*}$ are independent variables of the corresponding parent population. The sampling methodology of Section 2 was used to draw a ranked set sample of 12 units with set size 3 from each parent population. Using 20,000 iterations, the PRE of the consequent estimators relative to the conventional mean estimator was computed as

P R E = \frac{\frac{1}{20, 000} \sum_{i = 1}^{20, 000} {(t_{m} - μ_{y})}^{2}}{\frac{1}{20, 000} \sum_{i = 1}^{20, 000} {(T - μ_{y})}^{2}} \times 100

The outcomes of the simulation experiments are reported in Tables 3–9 as PRE values for each chosen value of the response probability $P = 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8$ with the corresponding correlation coefficient $ρ_{x y} = 0.6, 0.7, 0.8, 0.9$.
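The simulation design can be sketched as follows for the Normal parent population. Several choices here are assumptions made for illustration and are not taken from the paper: $X^{*}$ and $Y^{*}$ are standard Normal, $S_{x}$ and $S_{y}$ are their standard deviations, units within a set are drawn with replacement so that the code can be vectorized, non-response is generated independently with probability 1 - P, replicates without any respondent are discarded, and ${\bar{Y}}_{r, r s s}$ is computed as the mean of the responding units. The competitor to $t_{m}$ is $T_{1}$ with the fixed, non-optimal choice $α_{1} = 1$ and $θ_{1} = - S_{x y} / S_{x}^{2}$, not the optimal scalars of Appendix B.

```python
import numpy as np

def simulate_pre(rho, P, m=3, k=4, N=1000, reps=20000, seed=0):
    """Monte Carlo PRE of a difference-type estimator relative to t_m under RSS."""
    rng = np.random.default_rng(seed)
    # Parent population from the model above (Normal case).
    x_star = rng.normal(size=N)
    y_star = rng.normal(size=N)
    s_x, s_y = x_star.std(ddof=1), y_star.std(ddof=1)
    x = 2.5 + x_star
    y = 2.9 + np.sqrt(1.0 - rho**2) * y_star + rho * (s_y / s_x) * x_star
    mu_x, mu_y = x.mean(), y.mean()
    b = np.cov(x, y)[0, 1] / x.var(ddof=1)          # regression coefficient of Y on X

    n = m * k
    # reps x cycles x sets x units-per-set indices; rank each set on X.
    idx = rng.integers(0, N, size=(reps, k, m, m))
    ranked = np.take_along_axis(idx, np.argsort(x[idx], axis=-1), axis=-1)
    # From set i keep the unit of rank i (diagonal over the last two axes).
    chosen = np.diagonal(ranked, axis1=-2, axis2=-1).reshape(reps, n)
    xs, ys = x[chosen], y[chosen]

    resp = rng.random((reps, n)) < P                # True where Y is observed
    r = resp.sum(axis=1)
    keep = r > 0                                    # drop replicates with no respondent
    xs, ys, resp, r = xs[keep], ys[keep], resp[keep], r[keep]

    y_bar_r = (ys * resp).sum(axis=1) / r           # mean of responding units
    t_m = y_bar_r                                   # mean imputation estimator
    T = y_bar_r - b * (xs.mean(axis=1) - mu_x)      # T_1 with alpha = 1, theta = -b
    return 100.0 * np.mean((t_m - mu_y) ** 2) / np.mean((T - mu_y) ** 2)

pre = simulate_pre(rho=0.8, P=0.5)
```

A PRE above 100 indicates that the difference-type estimator has a smaller empirical MSE than $t_{m}$ in this sketch. The values reported in Tables 3–9 are based on the authors' own settings and optimal scalars, so they should not be expected to match the output of this code.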

6.3. Discussion of Computational Findings

After carefully examining the findings reported in Tables 2–9, we note the following points:

(i).: From the findings of Table 2, the proposed imputation methods $y_{. i j}$ , $j = 1, 2, \dots, 9$ outperform the mean imputation methods, ref. [21] imputation methods and ref. [22] imputation methods in each real population. Furthermore, the proposed imputation methods $y_{. i j}$ , $j = 1, 3, 7, 9$ were superior among the proposed imputation methods in population 1, whereas the proposed imputation methods $y_{. i j}$ , $j = 4, 5, 6$ were superior among the proposed imputation methods in populations 2–4. This is easily observed in Figure 1.
(ii).: From the findings of Tables 3–9, the proposed imputation methods $y_{. i j}$ , $j = 1, 2, \dots, 9$ are also better than the mean imputation, ref. [21] imputation methods and ref. [22] imputation methods under both the symmetric and asymmetric populations for different correlation coefficients $ρ_{x y}$ , coefficients of skewness $β_{1}$ and coefficients of kurtosis $β_{2}$ .
(iii).: When the parent population was Normal (symmetric) or Weibull (asymmetric), the proposed ratio-type imputation methods $y_{. i j}$ , $j = 4, 5, 6$ always performed better than the competitors as well as within the proposed class of imputation methods under strategies I, II and III.
(iv).: When the parent population was Gamma (asymmetric), the proposed difference- and ratio-type imputation methods $y_{. i j}$ , $j = 1, 2, 3, 7, 8, 9$ were equally efficient, outperformed the conventional methods, and performed better than the other proposed imputation methods under strategies I, II and III.
(v).: The suggested imputation methods performed better in strategy II compared to strategies I and III in the real and artificially generated populations.
(vi).: It can be easily seen that the PRE decreases with the increase in asymmetry and peakedness for asymmetric distributions such as Gamma and Weibull.
(vii).: Moreover, the numerical analysis is summarized in Table 2 and Figure 1 under strategies I, II, and III for real populations 1–4. The PREs of the consequent estimators for the remaining simulation results in Tables 3–7 exhibit the same pattern and can be easily presented as line diagrams, if required.

7. Conclusions

In this manuscript, we proposed efficient difference- and ratio-type imputation methods for the estimation of the population mean in the presence of missing data. The efficiency conditions were derived and supported by a computational analysis on some real and hypothetically generated symmetric and asymmetric populations. The computational and theoretical results show that the proposed imputation methods $y_{. i j}, j = 1, 2, \dots, 9$ outperformed the mean imputation method $y_{. i}$, the imputation methods ${\bar{y}}_{K C_{i}}$, $i = 1, 2, 3$ of ref. [21], and the imputation methods $y_{. i s_{j}}$, $j = 1, 2, \dots, 6$ of ref. [22].

In the simulation analysis, we considered one family of a symmetric population, namely, Normal, and two families of asymmetric populations, namely, Gamma and Weibull, to ascertain the effect of the correlation coefficient for a symmetric population and the effect of skewness and kurtosis for asymmetric populations.
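The following is a minimal sketch of how the shape of such populations could be inspected, not the authors' simulation code. It draws from the three families and computes moment-based skewness and kurtosis. The parametrizations are assumptions: N(30, 35) is read as mean and variance, Gamma(6.9, 0.9) as shape and scale, and Wb(0.953, 0.99) as shape and scale. The function name `skewness_kurtosis`, the seed, and the population size are hypothetical. The values printed here describe the raw draws only and need not match the "Skewness of Y" and "Kurtosis of Y" rows of Table 3, because the construction of the correlated study variable is not reproduced.

```python
import numpy as np


def skewness_kurtosis(values):
    """Return moment-based skewness m3 / m2**1.5 and kurtosis m4 / m2**2."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    m4 = np.mean(centered ** 4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2


rng = np.random.default_rng(2023)  # arbitrary seed (assumption)
size = 100_000                     # hypothetical population size

# Assumed parametrizations: Normal(mean, variance), Gamma(shape, scale),
# Weibull(shape, scale).
populations = {
    "Normal": rng.normal(loc=30.0, scale=np.sqrt(35.0), size=size),
    "Gamma": rng.gamma(shape=6.9, scale=0.9, size=size),
    "Weibull": 0.99 * rng.weibull(0.953, size=size),
}

for name, draws in populations.items():
    skew, kurt = skewness_kurtosis(draws)
    print(f"{name:8s} skewness = {skew:.4f}, kurtosis = {kurt:.4f}")
```

Under these assumptions the Normal draws should show skewness near 0 and kurtosis near 3, while the Gamma and Weibull draws are increasingly skewed and heavy-tailed.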

It is worth mentioning that, among the asymmetric populations, all imputation methods exhibited a decreasing trend in PRE as the coefficients of skewness and kurtosis increased. Even in such cases, the proposed estimators fared better than their conventional counterparts. These results are in agreement with those of [17,28,29], whose authors considered skewed distributions and reported that the efficiency of the estimators decreased with an increase in skewness and kurtosis. The same holds for imputation.

Lastly, the proposed methods are currently the best available imputation methods for the estimation of a population mean in the presence of missing data.

Furthermore, the proposed imputation strategies can be defined using multi-auxiliary information, which our future research will investigate.

Author Contributions

Supervision, S.B.; conceptualization, S.B. and A.K.; methodology, S.B. and A.K.; software, A.K.; validation, S.B.; writing—original draft preparation, A.K.; writing—review and editing, A.K., S.B., T.Z. and A.A.M.; Funding, T.Z. and A.A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R368), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Data Availability Statement

The article includes all data utilized for this investigation.

Acknowledgments

The authors acknowledge the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R368), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The expressions of the $MSE$, the minimum $MSE$, and the optimum scalar values of the existing resultant estimators are reported below.

\begin{matrix} V (t_{m}) & = μ_{y}^{2} (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}) \end{matrix}

(A1)

\begin{matrix} M S E ({\bar{y}}_{{K C}_{1}}) & ≅ \{μ_{y}^{2} (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}) + (R^{2} - B^{2}) μ_{x}^{2} (γ C_{x}^{2} - W_{x}^{2})\} \end{matrix}

(A2)

\begin{matrix} M S E ({\bar{y}}_{{K C}_{2}}) & = \{μ_{y}^{2} (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}) + μ_{x}^{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}}) - B μ_{x} μ_{y} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*})\} \end{matrix}

(A3)

\begin{matrix} M S E ({\bar{y}}_{{K C}_{3}}) & = \{\begin{matrix} μ_{y}^{2} (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}) + {(R + B)}^{2} μ_{x}^{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}}) \\ - 2 (R + B) μ_{x} μ_{y} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*}) \end{matrix}\} \end{matrix}

(A4)

\begin{matrix} M S E (t_{s_{i}}) = & μ_{y}^{2} \{γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + β_{i}^{2} (γ C_{x}^{2} - W_{x}^{2}) - 2 β_{i} (γ ρ_{x y} C_{x} C_{y} - W_{x y})\}, i = 1, 4 \end{matrix}

(A5)

\begin{matrix} M S E (t_{s_{i}}) = & μ_{y}^{2} \{γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + β_{i}^{2} (γ^{*} C_{x}^{2} - W_{x}^{2^{*}}) - 2 β_{i} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*})\}, i = 2, 5 \end{matrix}

(A6)

\begin{matrix} M S E (t_{s_{i}}) = & μ_{y}^{2} \{\begin{matrix} γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + β_{i}^{2} (\begin{matrix} γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2} \end{matrix}) \\ - 2 β_{i} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y}) \end{matrix}\}, i = 3, 6 \end{matrix}

(A7)

\begin{matrix} m i n M S E (t_{s_{i}}) & = μ_{y}^{2} \{γ^{*} C_{y}^{2} - W_{y}^{2^{*}} - \frac{{(γ ρ_{x y} C_{x} C_{y} - W_{x y})}^{2}}{(γ C_{x}^{2} - W_{x}^{2})}\}; i = 1, 4 \end{matrix}

(A8)

\begin{matrix} m i n M S E (t_{s_{i}}) & = μ_{y}^{2} \{γ^{*} C_{y}^{2} - W_{y}^{2^{*}} - \frac{{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*})}^{2}}{(γ^{*} C_{x}^{2} - W_{x}^{2^{*}})}\}; i = 2, 5 \end{matrix}

(A9)

\begin{matrix} m i n M S E (t_{s_{i}}) & = μ_{y}^{2} \{\begin{matrix} γ^{*} C_{y}^{2} - W_{y}^{2^{*}} - \frac{{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y})}^{2}}{(\begin{matrix} γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2} \end{matrix})} \end{matrix}\}; i = 3, 6 \end{matrix}

(A10)

To obtain the minimum $MSE$s, the optimum scalar values associated with the estimators discussed in Section 3 are given below.

B = \frac{S_{x y}}{S_{x}^{2}},

β_{1 (o p t)} = β_{4 (o p t)} = \frac{(γ ρ_{x y} C_{x} C_{y} - W_{x y})}{(γ C_{x}^{2} - W_{x}^{2})},

β_{2 (o p t)} = β_{5 (o p t)} = \frac{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*})}{(γ^{*} C_{x}^{2} - W_{x}^{2^{*}})},

β_{3 (o p t)} = β_{6 (o p t)} = \frac{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y})}{(γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2})}
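As an illustration of how (A1), (A5), and (A8) fit together, the sketch below evaluates them for strategy I and checks that inserting the optimum scalar $β_{1 (o p t)}$ into (A5) reproduces the minimum $MSE$ in (A8). All numerical inputs are hypothetical placeholders rather than values from the paper, and the function names are illustrative. The PRE is assumed here to be $100 × V (t_{m}) / MSE$, which is consistent with the $t_{m}$ rows of the tables being fixed at 100.

```python
def variance_mean_imputation(mu_y, gamma_star, c_y, w_y2_star):
    """V(t_m) from (A1)."""
    return mu_y ** 2 * (gamma_star * c_y ** 2 - w_y2_star)


def mse_ts_strategy_one(beta, mu_y, gamma_star, gamma, rho_xy,
                        c_x, c_y, w_y2_star, w_x2, w_xy):
    """MSE(t_{s_i}), i = 1, 4, from (A5) for an arbitrary scalar beta."""
    return mu_y ** 2 * (
        gamma_star * c_y ** 2 - w_y2_star
        + beta ** 2 * (gamma * c_x ** 2 - w_x2)
        - 2.0 * beta * (gamma * rho_xy * c_x * c_y - w_xy)
    )


def beta_opt_strategy_one(gamma, rho_xy, c_x, c_y, w_x2, w_xy):
    """Optimum scalar beta_1(opt) = beta_4(opt) from Appendix A."""
    return (gamma * rho_xy * c_x * c_y - w_xy) / (gamma * c_x ** 2 - w_x2)


def min_mse_ts_strategy_one(mu_y, gamma_star, gamma, rho_xy,
                            c_x, c_y, w_y2_star, w_x2, w_xy):
    """Minimum MSE(t_{s_i}), i = 1, 4, from (A8)."""
    cov_term = gamma * rho_xy * c_x * c_y - w_xy
    var_term = gamma * c_x ** 2 - w_x2
    return mu_y ** 2 * (
        gamma_star * c_y ** 2 - w_y2_star - cov_term ** 2 / var_term
    )


# Hypothetical inputs, chosen only to make the arithmetic concrete.
params = dict(mu_y=50.0, gamma_star=0.25, gamma=1.0 / 12.0, rho_xy=0.8,
              c_x=0.5, c_y=0.4, w_y2_star=0.01, w_x2=0.005, w_xy=0.003)

beta_opt = beta_opt_strategy_one(
    params["gamma"], params["rho_xy"], params["c_x"], params["c_y"],
    params["w_x2"], params["w_xy"],
)
v_tm = variance_mean_imputation(
    params["mu_y"], params["gamma_star"], params["c_y"], params["w_y2_star"]
)
mse_at_opt = mse_ts_strategy_one(beta_opt, **params)
min_mse = min_mse_ts_strategy_one(**params)
pre = 100.0 * v_tm / min_mse  # assumed PRE definition

print(f"beta_opt = {beta_opt:.4f}")
print(f"V(t_m) = {v_tm:.4f}, min MSE = {min_mse:.4f}, PRE = {pre:.2f}")
```

The same pattern applies to strategies II and III by swapping in the starred or differenced terms from (A6), (A7), (A9), and (A10).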

Appendix B

In this section, we outline the proof of Theorem 1 and Corollary 1.

Under strategy I, consider the estimator

\begin{matrix} T_{1} & = α_{1} {\bar{Y}}_{r, r s s} + θ_{1} ({\bar{X}}_{n, r s s} - μ_{x}) \end{matrix}

Using the notations discussed in the earlier section, we obtain

T_{1} - μ_{y} = (α_{1} - 1) μ_{y} + α_{1} μ_{y} ϵ_{0} + θ_{1} μ_{x} ϵ_{1}

(A11)

Squaring both sides of (A11) and taking the expectation, we obtain the $MSE$ of the estimator as

\begin{matrix} M S E (T_{1}) & = \{\begin{matrix} {(α_{1} - 1)}^{2} μ_{y}^{2} + α_{1}^{2} μ_{y}^{2} (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}) + θ_{1}^{2} μ_{x}^{2} (γ C_{x}^{2} - W_{x}^{2}) \\ + 2 α_{1} θ_{1} μ_{x} μ_{y} (γ ρ_{x y} C_{x} C_{y} - W_{x y}) \end{matrix}\} \end{matrix}

(A12)

The optimum values of $α_{1}$ and $θ_{1}$ can be obtained by minimizing (A12) with respect to $α_{1}$ and $θ_{1}$ as

\begin{matrix} α_{1 (o p t)} & = \frac{1}{\{1 + γ^{*} C_{y}^{2} - W_{y}^{2^{*}} - \frac{{(γ ρ_{x y} C_{x} C_{y} - W_{x y})}^{2}}{(γ C_{x}^{2} - W_{x}^{2})}\}} = α_{7 (o p t)} \end{matrix}

(A13)

\begin{matrix} \text{and} \quad θ_{1 (o p t)} & = - \frac{μ_{y}}{μ_{x}} \frac{(γ ρ_{x y} C_{x} C_{y} - W_{x y})}{(γ C_{x}^{2} - W_{x}^{2})} α_{1 (o p t)} \end{matrix}

(A14)

Introducing $α_{1 (o p t)}$ and $θ_{1 (o p t)}$ into (A12), we obtain the minimum $MSE$ as

\begin{matrix} m i n M S E (T_{1}) & = μ_{y}^{2} (1 - α_{1 (o p t)}) \end{matrix}

(A15)
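The closed forms (A13) to (A15) can be checked numerically. The sketch below is an illustrative verification under hypothetical inputs, not part of the original proof. It writes $v_{0} = γ^{*} C_{y}^{2} - W_{y}^{2^{*}}$, $v_{1} = γ C_{x}^{2} - W_{x}^{2}$, and $c_{01} = γ ρ_{x y} C_{x} C_{y} - W_{x y}$ (shorthand introduced here), solves the two first-order conditions of (A12) as a linear system, and compares the solution with the closed-form expressions.

```python
import numpy as np


def mse_t1(alpha, theta, mu_y, mu_x, v0, v1, c01):
    """MSE(T_1) from (A12), written with the shorthand v0, v1, c01."""
    return ((alpha - 1.0) ** 2 * mu_y ** 2
            + alpha ** 2 * mu_y ** 2 * v0
            + theta ** 2 * mu_x ** 2 * v1
            + 2.0 * alpha * theta * mu_x * mu_y * c01)


def closed_form_optimum(mu_y, mu_x, v0, v1, c01):
    """alpha_1(opt), theta_1(opt), and min MSE from (A13), (A14), (A15)."""
    alpha = 1.0 / (1.0 + v0 - c01 ** 2 / v1)
    theta = -(mu_y / mu_x) * (c01 / v1) * alpha
    return alpha, theta, mu_y ** 2 * (1.0 - alpha)


def numerical_optimum(mu_y, mu_x, v0, v1, c01):
    """Solve d(MSE)/d(alpha) = 0 and d(MSE)/d(theta) = 0 as a 2x2 system."""
    lhs = np.array([
        [mu_y ** 2 * (1.0 + v0), mu_x * mu_y * c01],
        [mu_x * mu_y * c01, mu_x ** 2 * v1],
    ])
    rhs = np.array([mu_y ** 2, 0.0])
    alpha, theta = np.linalg.solve(lhs, rhs)
    return alpha, theta


# Hypothetical inputs, not taken from the paper.
mu_y, mu_x = 50.0, 20.0
v0, v1, c01 = 0.03, 0.0158, 0.0103

alpha_cf, theta_cf, min_mse_cf = closed_form_optimum(mu_y, mu_x, v0, v1, c01)
alpha_num, theta_num = numerical_optimum(mu_y, mu_x, v0, v1, c01)
min_mse_num = mse_t1(alpha_num, theta_num, mu_y, mu_x, v0, v1, c01)

print(f"alpha: closed form {alpha_cf:.6f}, numerical {alpha_num:.6f}")
print(f"theta: closed form {theta_cf:.6f}, numerical {theta_num:.6f}")
print(f"min MSE: closed form {min_mse_cf:.6f}, numerical {min_mse_num:.6f}")
```

Because (A12) is quadratic in $α_{1}$ and $θ_{1}$, the first-order conditions are linear, so a single linear solve is enough for this check.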

Similarly, we can obtain the optimum values of the constants and the minimum $MSE$s of the other proposed estimators, which are

\begin{matrix} α_{2 (o p t)} & = \frac{1}{\{1 + (γ^{*} C_{y}^{2} - W_{y}^{2^{*}}) - \frac{{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*})}^{2}}{(γ^{*} C_{x}^{2} - W_{x}^{2^{*}})}\}} = α_{8 (o p t)} \end{matrix}

(A16)

\begin{matrix} θ_{2 (o p t)} & = - \frac{μ_{y}}{μ_{x}} \frac{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*})}{(γ^{*} C_{x}^{2} - W_{x}^{2^{*}})} α_{2 (o p t)} \end{matrix}

(A17)

\begin{matrix} α_{3 (o p t)} & = \frac{1}{\{1 + γ^{*} C_{y}^{2} - W_{y}^{2^{*}} - \frac{{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y})}^{2}}{(γ C_{x}^{2} - W_{x}^{2} - γ^{*} C_{x}^{2} + W_{x}^{2^{*}})}\}} = α_{9 (o p t)} \end{matrix}

(A18)

\begin{matrix} θ_{3 (o p t)} & = - \frac{μ_{y}}{μ_{x}} (\frac{γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y}}{γ C_{x}^{2} - W_{x}^{2} - γ^{*} C_{x}^{2} + W_{x}^{2^{*}}}) α_{3 (o p t)} \end{matrix}

(A19)

\begin{matrix} α_{j (o p t)} & = \frac{A_{j}}{B_{j}}; j = 4, 5, 6 \end{matrix}

(A20)

\begin{matrix} θ_{j (o p t)} & = \frac{(γ ρ_{x y} C_{x} C_{y} - W_{x y})}{(γ C_{x}^{2} - W_{x}^{2})}; j = 4, 7 \end{matrix}

(A21)

\begin{matrix} θ_{j (o p t)} & = \frac{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*})}{(γ^{*} C_{x}^{2} - W_{x}^{2^{*}})}; j = 5, 8 \end{matrix}

(A22)

\begin{matrix} θ_{j (o p t)} & = \frac{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y})}{(γ C_{x}^{2} - W_{x}^{2} - γ^{*} C_{x}^{2} + W_{x}^{2^{*}})}; j = 6, 9 \end{matrix}

(A23)

where

\begin{matrix} A_{4} & = 1 + \frac{(γ ρ_{x y} C_{x} C_{y} - W_{x y})}{2} - \frac{{(γ ρ_{x y} C_{x} C_{y} - W_{x y})}^{2}}{2 (γ C_{x}^{2} - W_{x}^{2})}, \\ B_{4} & = 1 + γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + γ ρ_{x y} C_{x} C_{y} - W_{x y} - \frac{2 {(γ ρ_{x y} C_{x} C_{y} - W_{x y})}^{2}}{(γ C_{x}^{2} - W_{x}^{2})}, \\ A_{5} & = 1 + \frac{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*})}{2} - \frac{{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*})}^{2}}{2 (γ^{*} C_{x}^{2} - W_{x}^{2^{*}})}, \\ B_{5} & = 1 + γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - \frac{2 {(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*})}^{2}}{(γ^{*} C_{x}^{2} - W_{x}^{2^{*}})}, \\ A_{6} & = 1 - \frac{1}{2} \frac{{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y})}^{2}}{(γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2})} + \frac{1}{2} (γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y}), \\ B_{6} & = \{\begin{matrix} 1 + γ^{*} C_{y}^{2} - W_{y}^{2^{*}} + γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y} \\ - 2 \frac{{(γ^{*} ρ_{x y} C_{x} C_{y} - W_{x y}^{*} - γ ρ_{x y} C_{x} C_{y} + W_{x y})}^{2}}{(γ^{*} C_{x}^{2} - W_{x}^{2^{*}} - γ C_{x}^{2} + W_{x}^{2})} \end{matrix}\} . \end{matrix}

References

1. Rubin, D.B. Inference and missing data. Biometrika 1976, 63, 581–592.
2. Lee, H.; Rancourt, E.; Sarndal, C.E. Experiments with variance estimation from survey data with imputed values. J. Off. Stat. 1994, 10, 231–243.
3. Singh, S.; Horn, S. Compromised imputation in survey sampling. Metrika 2000, 51, 267–276.
4. Singh, S.; Deo, B. Imputation by power transformation. Stat. Pap. 2003, 44, 555–579.
5. Singh, S. A new method of imputation in survey sampling. Stat. A J. Theor. Appl. Stat. 2009, 43, 499–511.
6. Singh, S.; Valdes, S.R. Optimal method of imputation in survey sampling. Appl. Math. Sci. 2009, 3, 1727–1737.
7. Heitjan, D.F.; Basu, S. Distinguishing ‘Missing at Random’ and ‘Missing Completely at Random’. Am. Stat. 1996, 50, 207–213.
8. Ahmed, M.S.; Al-Titi, O.; Al-Rawi, Z.; Abu-Dayyeh, W. Estimation of a population mean using different imputation methods. Stat. Transit. 2006, 7, 1247–1264.
9. Kadilar, C.; Cingi, H. Estimators for the population mean in the case of missing data. Commun. Stat. Theory Methods 2008, 37, 2226–2236.
10. Diana, G.; Perri, P.F. Improved estimators of the population mean for missing data. Commun. Stat. Theory Methods 2010, 39, 3245–3251.
11. Bhushan, S.; Pandey, A.P. Optimal imputation of missing data for estimation of population mean. J. Stat. Manag. Syst. 2016, 19, 755–769.
12. Bhushan, S.; Pandey, A.P. Optimality of ratio type estimation methods for population mean in presence of missing data. Commun. Stat. Theory Methods 2018, 47, 2576–2589.
13. Bhushan, S.; Pandey, A.P.; Pandey, A. On optimality of imputation methods for estimation of population mean using higher order moments of an auxiliary variable. Commun. Stat. Simul. Comput. 2018, 49, 1–15.
14. Mohamed, C.; Sedory, S.A.; Singh, S. Imputation using higher order moments of an auxiliary variable. Commun. Stat. Simul. Comput. 2016, 46, 6588–6617.
15. Bhushan, S.; Kumar, A.; Pandey, A.P.; Singh, S. Estimation of population mean in presence of missing data under simple random sampling. Commun. Stat. Simul. Comput. 2022, 1–22.
16. Anas, M.M.; Huang, Z.; Shahzad, U.; Zaman, T.; Shahzadi, S. Compromised imputation based mean estimators using robust quantile regression. Commun. Stat. Theory Methods 2022, 1–16.
17. McIntyre, G.A. A method of unbiased selective sampling using ranked set. Aust. J. Agric. Res. 1952, 3, 385–390.
18. Takahasi, K.; Wakimoto, K. On unbiased estimates of the population mean based on the sample stratified by means of ordering. Ann. Inst. Stat. Math. 1968, 20, 1–31.
19. Bouza Herrera, C.N.; Al-Omari, A.I. Ranked set estimation with imputation of the missing observations: The median estimator. Rev. Investig. Oper. 2011, 32, 30–37.
20. Bouza, C.N. Handling Missing Data in Ranked Set Sampling; Springer: Berlin/Heidelberg, Germany, 2013.
21. Al-Omari, A.; Bouza, C. Ratio estimators of the population mean with missing values using ranked set sampling. Environmetrics 2014, 26, 67–76.
22. Sohail, M.U.; Shabbir, J.; Ahmed, S. A class of ratio type estimators for imputing the missing values under rank set sampling. J. Stat. Theory Pract. 2018, 12, 704–717.
23. Arnold, B.C.; Balakrishnan, N.; Nagaraja, H.N. A First Course in Order Statistics; Wiley: New York, NY, USA, 1993.
24. Kadilar, C.; Cingi, H. Ratio estimators in stratified random sampling. Biom. J. 2003, 45, 218–225.
25. Sarndal, C.E.; Swensson, B.; Wretman, J. Model Assisted Survey Sampling; Springer: New York, NY, USA, 2003.
26. Singh, S. Advanced Sampling Theory with Applications: How Michael Selected Amy; Kluwer: Amsterdam, The Netherlands, 2003; Volumes 1 and 2.
27. Singh, S.; Horn, S. An alternative survey in multi-character survey. Metrika 1998, 48, 99–107.
28. Dell, T.R.; Clutter, J.L. Ranked set sampling theory with order statistics background. Biometrics 1972, 28, 545–555.
29. Bhushan, S.; Kumar, A. Novel log type class of estimators under ranked set sampling. Sankhya B 2022, 84, 421–447.

Figure 1. PRE results of the consequent estimators under (a) strategy I, (b) strategy II, and (c) strategy III for the real populations reported in Table 2.

Table 1. Description of the population parameters.

Parameters	Population 1	Population 2	Population 3	Population 4
N	69	124	50	284
n	12	12	12	12
m	3	3	3	3
r	4	4	4	4
P	0.5	0.3	0.7	0.6
$μ_{y}$	71.34	36.65	555.43	47.50
$μ_{x}$	3165.02	14276.03	878.16	9.05
$S_{y}$	110.85	116.80	584.82	11.06
$S_{x}$	3965.24	31431.81	1084677	4.95
$ρ_{x y}$	0.91	0.23	0.80	0.66

Table 2. PRE of the proposed estimators for real populations.

Estimators	Population 1	Population 2	Population 3	Population 4
$t_{m}$	100.00	100.00	100.00	100.00
Strategy I
$T_{1}$	200.23	379.07	213.14	266.61
$T_{4}$	198.93	388.40	216.47	276.07
$T_{7}$	200.23	379.07	213.14	266.61
$t_{s_{i}}$ , i = 1, 4	169.41	102.59	199.16	233.60
${\bar{y}}_{{k c}_{1}}$	111.93	89.60	72.25	78.77
Strategy II
$T_{2}$	584.83	385.66	360.33	2169.67
$T_{5}$	576.85	421.37	369.29	2459.06
$T_{8}$	584.83	385.66	360.33	2169.67
$t_{s_{i}}$ , i = 2, 5	554.02	109.18	346.35	2136.65
${\bar{y}}_{{k c}_{2}}$	118.15	72.99	68.44	69.78
Strategy III
$T_{3}$	220.06	380.63	126.93	167.17
$T_{6}$	218.39	395.85	127.34	169.46
$T_{9}$	223.56	368.69	125.09	160.86
$t_{s_{i}}$ , i = 3, 6	189.24	104.14	112.94	134.15
${\bar{y}}_{{k c}_{3}}$	98.91	79.99	89.12	72.58

Table 3. PRE of the proposed estimators at P = 0.20.

$ρ_{xy}$	0.6	0.7	0.8	0.9
$t_{m}$	100	100	100	100
$X^{*} \sim N (20, 25)$
$Y^{*} \sim N (30, 35)$
Strategy I
$T_{1}$	108.2314	110.1882	113.1969	118.9193
$T_{4}$	108.9253	110.9860	114.1003	119.9397
$t_{s_{i}}$ , i = 1, 4	102.4440	102.7442	102.8370	102.7567
${\bar{y}}_{{k c}_{1}}$	56.3365	62.5280	70.0558	78.6946
Strategy II
$T_{i}$ , i = 2, 8	119.3313	122.8567	126.3605	131.6545
$T_{5}$	123.7161	127.9222	132.0179	137.8729
$t_{s_{i}}$ , i = 2, 5	113.5438	115.4126	116.0005	115.492
${\bar{y}}_{{k c}_{2}}$	20.5860	25.1720	32.1576	43.0168
Strategy III
$T_{i}, i = 3, 9$	116.3368	119.4054	122.7634	128.1837
$T_{6}$	119.6366	123.2108	127.0276	132.9048
$t_{s_{i}}$ , i = 3, 6	110.5493	111.9614	112.4035	112.0212
${\bar{y}}_{{k c}_{3}}$	21.7699	27.3798	35.9864	49.5031
$X^{*} \sim G a m m a (5.6, 0.9)$
$Y^{*} \sim G a m m a (6.9, 0.9)$
Skewness of Y	0.6046	0.6134	0.6571	0.7487
Kurtosis of Y	3.4184	3.4308	3.5393	3.7750
Strategy I
$T_{i}$ , $i = 1, 7$	104.1361	104.0201	103.8752	103.7148
$T_{4}$	103.9699	103.8591	103.7210	103.5712
$t_{s_{i}}$ , i = 1, 4	102.2484	102.1062	101.8782	101.5178
${\bar{y}}_{{k c}_{1}}$	84.4838	84.5867	85.0399	86.0938
Strategy II
$T_{i}$ , i = 2, 8	114.2407	113.4139	112.1508	110.2765
$T_{5}$	113.4229	112.6215	111.3919	109.5685
$t_{s_{i}}$ , i = 2, 5	112.3531	111.5000	110.1539	108.0795
${\bar{y}}_{{k c}_{2}}$	52.5444	52.7365	53.6026	55.6966
Strategy III
$T_{i}, i = 3, 9$	111.5319	110.9071	109.9584	108.5579
$T_{6}$	110.8735	110.2693	109.3479	107.9888
$t_{s_{i}}$ , i = 3, 6	109.6442	108.9931	107.9614	106.3608
${\bar{y}}_{{k c}_{3}}$	50.0729	50.1878	50.9778	52.9983
$X^{*} \sim W b (0.945, 1.0)$
$Y^{*} \sim W b (0.953, 0.99)$
Skewness of Y	1.3561	1.3526	1.4714	1.7607
Kurtosis of Y	5.2276	5.2844	6.0535	7.8079
Strategy I
$T_{i}$ , i = 1, 7	107.3833	107.2255	107.1537	107.1449
$T_{4}$	107.5074	107.3496	107.2772	107.2674
$t_{s_{i}}$ , i = 1, 4	105.6587	105.0803	104.5253	103.9469
${\bar{y}}_{{k c}_{1}}$	80.5933	83.9483	86.6202	88.7951
Strategy II
$T_{i}$ , i = 2, 8	138.2957	134.0250	130.2555	126.6323
$T_{5}$	139.1752	134.8628	131.0549	127.3948
$t_{s_{i}}$ , i = 2, 5	136.5711	131.8799	127.6271	123.4344
${\bar{y}}_{{k c}_{2}}$	46.4315	52.3644	57.8042	62.7748
Strategy III
$T_{i}, i = 3, 9$	128.9875	126.1204	123.5728	121.1061
$T_{6}$	129.6256	126.7374	124.1689	121.6812
$t_{s_{i}}$ , i = 3, 6	127.2628	123.9752	120.9444	117.9082
${\bar{y}}_{{k c}_{3}}$	55.5027	62.5806	68.7953	74.1058