Article

Estimation of Non-Linear Parameters with Data Collected Using Respondent-Driven Sampling

1 Department of Statistics and Operations Research, University of Granada, 18071 Granada, Spain
2 Facultad de Ciencias, Escuela Superior Politécnica de Chimborazo (ESPOCH), 060155 Riobamba, Ecuador
* Author to whom correspondence should be addressed.
Submission received: 23 July 2020 / Revised: 31 July 2020 / Accepted: 5 August 2020 / Published: 7 August 2020
(This article belongs to the Special Issue Probability, Statistics and Their Applications)

Abstract

Respondent-driven sampling (RDS) is a snowball-type sampling method used to survey hidden populations, that is, those that lack a sampling frame. In this work, we consider the problem of regression modeling and association for continuous RDS data. We propose a new sample weight method for estimating non-linear parameters such as the covariance and the correlation coefficient. We also estimate the variances of the proposed estimators. As an illustration, we performed a simulation study and an application to an ethnic example. The proposed estimators are consistent and asymptotically unbiased. We discuss the applicability of the method as well as future research.

1. Introduction

Respondent-driven sampling (RDS) is a refined form of snowball sampling for collecting data from hidden populations that lack a sampling frame. These populations are therefore difficult to reach and cannot be surveyed using traditional sampling techniques. RDS was first introduced by Heckathorn [1] and was later developed by Salganik and Heckathorn [2] and Volz and Heckathorn [3]. Recent work on parameter and variance estimation includes [4,5,6]. Popular applications of RDS include people at risk of HIV, the LGBTI community and injection drug users [7,8,9]. Other examples are given in [10,11].
A long-standing problem in non-probabilistic sampling is regression modeling for RDS data. Several authors have considered this issue from different perspectives. For instance, there has been a surge of studies in machine learning and statistical frameworks addressing similar problems. Wong et al. [12] study the problem of biased standard errors in non-linear transport models. Imani et al. [13,14] approximate a distribution using a Markov chain Monte Carlo (MCMC) algorithm and an approximate MCMC implementation, respectively.
Model-fitting should incorporate sample weights as well as information about the correlation between sample units [3]. Avery et al. [15] review some available methods for dealing with this problem in an RDS framework, using a simulation study with a binary response. One popular method for addressing the correlation structure between recruits and recruiters in a network is clustering, that is, transforming RDS data into clustered data [16,17,18]. Clustering is connected with homophily in the population, that is, the tendency to associate with those with similar characteristics. Beckett et al. [16] use clusters to account for correlation when estimating diabetes prevalence and discrete covariates in an empirical study. Nevertheless, clustering can be problematic because clusters, if they exist, may be difficult to identify. Hubbard et al. [19] point out some of the problems involved in using generalized estimating equations (GEE) when the number of clusters is small, and Rao et al. [20] stress some of the problems that arise when clusters are not properly adjusted for.
Regression methods for analyzing RDS data have not been fully validated [16]. Therefore, as no clear approach for regression modeling in RDS is available, some authors use standard statistical methods without adjusting for RDS data [21,22]. On the other hand, some authors use weighted regression to estimate the prevalence of characteristics of interest in real-life examples [23,24,25,26]. Most of them calculate individual weights (typically with the respondent-driven sampling analytical tool (RDSAT)) and export them to standard statistical software to apply the weighted method. Methods incorporating sample weights tend to perform better when homophily in the population is small, as they typically do not account for the potential dependence between units. While this might be an issue in populations with high homophily, there is no clearly reliable regression method in RDS that accounts for clustering and can be used extensively in applications. Adjusting for clusters requires knowledge of the population; if it is not done well in practice, or if clusters do not actually exist in the population, it might result in biased estimates [20]. Our method addresses the problem of regression modeling and association between continuous variables by proposing a new sample weight estimation method for continuous data. The focus of our work was to propose a method for estimating non-linear parameters such as the covariance and the correlation coefficient. We derived expressions for the estimators that make use of the RDS estimators admitting continuous data and showed that they share properties with them, such as being consistent and asymptotically unbiased. A diagram is given in Figure 1. We also estimated the variances of the proposed estimators.
Our method may fill a gap, as no such approach has been developed in an RDS framework: most studies incorporate the weights using standard statistical software and, unlike our proposal, focus on prevalence estimation.
We begin in the next section by introducing respondent-driven sampling. In Section 3, we propose estimators for the population covariance and for the correlation coefficient. Estimators of the variances of the proposed estimators are considered in Section 4. A simulation study illustrating their performance is described in Section 5. An application to the study of the living conditions of Indigenous, Montubios and Afro-Ecuadorian young people is presented in Section 6. Finally, Section 7 presents concluding remarks.

2. Background

The main idea behind the estimation in RDS [3] is to treat this sampling as a random walk on an undirected network. It is well known from Markov chain theory that the stationary (equilibrium) probability of a node is then proportional to its degree.
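The degree-proportionality of the stationary distribution can be checked numerically. The following is a minimal sketch, not taken from the paper: the toy adjacency matrix and the function name `stationary_distribution` are illustrative assumptions, and the walk is iterated from a uniform start until it settles.

```python
def stationary_distribution(adj, n_iter=2000):
    """Approximate the stationary distribution of the simple random walk
    on an undirected graph by repeatedly applying the transition rule."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(n_iter):
        # Mass at node i moves to each neighbour j with probability 1/degree[i].
        p = [sum(p[i] * adj[i][j] / degree[i] for i in range(n)) for j in range(n)]
    return p


# Hypothetical toy network: connected and containing a triangle (0-1-2),
# so the walk is aperiodic and the iteration converges.
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
degree = [sum(row) for row in adj]
pi_walk = stationary_distribution(adj)
pi_degree = [d / sum(degree) for d in degree]
print(pi_walk)
print(pi_degree)
```

Both printed vectors should agree up to numerical error, which is the property the RDS-II weights rely on.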
We assume the target population consists of $N$ people (nodes) with labels $1, \ldots, N$, connected by a network of mutual relations with $N \times N$ adjacency matrix $Z$. That is, $z_{ij} = z_{ji} = 1$ if $i$ and $j$ are connected and 0 otherwise. We define the nodal degree of person $i$, $\delta_i = \sum_{j} z_{ij}$, as the number of network ties or alters of node $i$.
A small initial sample is selected from the population members accessible to researchers; these members are called the seeds and comprise wave 0 of the sample. Each member of wave $v$ is given a number of uniquely identified coupons to distribute among their alters. Coupon recipients returning their coupons to the study center are subsequently enrolled in the study. The wave number of a respondent is one more than that of their recruiter. This procedure is repeated until the desired sample size, $n$, is attained. The resulting sample is denoted by $s$.
Let the $N$-vector $y$ represent a variable of interest. If $y$ is a binary response with groups $A$ and $B$, the most common estimators in RDS are the RDS-I ratio estimator, the RDS-II estimator [3] and the Gile and Handcock [27] version for sampling with replacement.
The RDS-I estimator of a proportion, for a binary response $y$ with groups $A$ and $B$, is defined as
$$\hat{p}_A = \frac{\hat{C}_{BA}\hat{D}_B}{\hat{C}_{BA}\hat{D}_B + \hat{C}_{AB}\hat{D}_A},$$
with $\hat{C}_{AB} = r_{AB}/(r_{AB} + r_{AA})$ and $\hat{C}_{BA} = r_{BA}/(r_{BA} + r_{BB})$, where $r_{AB}$ is the number of people in $A$ recruiting people in $B$ in the sample, $r_{AA}$ the number of people in $A$ recruiting people in $A$, $r_{BA}$ the number of people in $B$ recruiting people in $A$, and $r_{BB}$ the number of people in $B$ recruiting people in $B$; $n_A$ and $n_B$ are the numbers of sample units belonging to groups $A$ and $B$, respectively, and $\hat{D}_A$ and $\hat{D}_B$ are the average degrees of people in groups $A$ and $B$, respectively.
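As an illustration of the RDS-I formula, the sketch below evaluates it from recruitment counts and group mean degrees. The function name `rds1_proportion` and the numbers in the example are hypothetical and are not data from the paper.

```python
def rds1_proportion(r_AA, r_AB, r_BA, r_BB, mean_deg_A, mean_deg_B):
    """RDS-I estimate of the proportion of group A.

    r_XY is the number of sampled recruitments from group X into group Y;
    mean_deg_X is the average reported degree in group X.
    """
    c_AB = r_AB / (r_AB + r_AA)  # share of A's recruitments that go to B
    c_BA = r_BA / (r_BA + r_BB)  # share of B's recruitments that go to A
    return c_BA * mean_deg_B / (c_BA * mean_deg_B + c_AB * mean_deg_A)


# Hypothetical counts: A recruits mostly within A, B recruits evenly.
p_A = rds1_proportion(r_AA=30, r_AB=10, r_BA=20, r_BB=20,
                      mean_deg_A=5.0, mean_deg_B=5.0)
print(round(p_A, 4))
```

With equal mean degrees the estimate depends only on the cross-recruitment shares, here 0.5 / (0.5 + 0.25).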
The RDS-SS estimator [27] of a proportion is
$$\hat{p}_A = \frac{\sum_{k \in s_A} \hat{\pi}(\delta_k)^{-1}}{\sum_{k \in s} \hat{\pi}(\delta_k)^{-1}},$$
where $s_A$ denotes the sampled units in group $A$ and $\hat{\pi}(\delta_k)$ is the estimated population distribution of degrees through successive sampling.
The RDS-II estimator of the mean $\bar{Y}$ allows continuous variables and takes the form of the Hájek estimator:
$$\hat{\bar{Y}} = \frac{\sum_{k \in s} \delta_k^{-1} y_k}{\sum_{k \in s} \delta_k^{-1}},$$
with $\delta_k$ the degree reported by respondent $k$.
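The RDS-II mean is an inverse-degree weighted average, which the following minimal sketch makes explicit. The function name `rds2_mean` and the small data vectors are illustrative assumptions.

```python
def rds2_mean(y, degree):
    """Hajek-type RDS-II estimate of the mean: weights are 1/degree."""
    w = [1.0 / d for d in degree]
    return sum(wk * yk for wk, yk in zip(w, y)) / sum(w)


# Hypothetical responses and reported degrees.
y = [10.0, 20.0, 30.0]
degree = [1, 2, 4]
print(rds2_mean(y, degree))          # high-degree respondents are down-weighted
print(rds2_mean(y, [3, 3, 3]))       # equal degrees give the plain sample mean
```

When all reported degrees are equal, the weights cancel and the estimator reduces to the unweighted sample mean.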

3. Estimation of Some Non-Linear Parameters

The widespread use of regression based on sample survey data requires a careful assessment of the use of standard techniques. The usual estimators of the parameters involved in regression are not valid under an RDS scheme. In this section, we develop estimators for population variances, covariances and the correlation coefficient.

3.1. Estimation of the Variance and the Covariance

We define the population covariance as:
$$S_{yx} = \frac{1}{N-1} \sum_{U} (y_k - \bar{Y})(x_k - \bar{X}).$$
We can write this parameter as:
$$S_{yx} = \frac{1}{N-1} T_{yx} - \frac{1}{N(N-1)} T_y T_x = \theta = f(\theta_1, \theta_2, \theta_3),$$
where $T_{yx} = \theta_1 = \sum_{U} y_k x_k$, $T_y = \theta_2 = \sum_{U} y_k$ and $T_x = \theta_3 = \sum_{U} x_k$.
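The step from the centered form of the covariance to the form in terms of totals can be spelled out as follows, using only $\bar{Y} = T_y/N$ and $\bar{X} = T_x/N$:

$$\sum_{U} (y_k - \bar{Y})(x_k - \bar{X}) = \sum_{U} y_k x_k - \bar{X} \sum_{U} y_k - \bar{Y} \sum_{U} x_k + N \bar{Y} \bar{X} = T_{yx} - \frac{T_y T_x}{N}.$$

Dividing both sides by $N-1$ gives the expression of $S_{yx}$ as a function of the three totals.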
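Similarly, the finite population variances are defined as
$$S_y^2 = \frac{1}{N-1} T_{yy} - \frac{1}{N(N-1)} T_y^2,$$
and
$$S_x^2 = \frac{1}{N-1} T_{xx} - \frac{1}{N(N-1)} T_x^2.$$
Let us construct estimators for these parameters assuming that $y_k$ and $x_k$ are observed for the units of the RDS sample $s$.
If there exist consistent estimators $\hat{\theta}_1$, $\hat{\theta}_2$ and $\hat{\theta}_3$ of $\theta_1$, $\theta_2$ and $\theta_3$, a consistent estimator of $S_{yx}$ will be
$$\hat{S}_{yx} = \frac{1}{N-1} \hat{\theta}_1 - \frac{1}{N(N-1)} \hat{\theta}_2 \hat{\theta}_3 = \hat{\theta}.$$
We can estimate these totals with the RDS-II estimator:
$$\hat{T}_y^{HH} = \frac{N \hat{\delta}_v}{n} \sum_{s} y_k \delta_k^{-1}, \quad \hat{T}_{yy}^{HH} = \frac{N \hat{\delta}_v}{n} \sum_{s} y_k^2 \delta_k^{-1}, \quad \hat{T}_{yx}^{HH} = \frac{N \hat{\delta}_v}{n} \sum_{s} y_k x_k \delta_k^{-1},$$
where $\hat{\delta}_v = n / \sum_{s} \delta_k^{-1}$ is the estimated average degree; $\hat{T}_x^{HH}$ and $\hat{T}_{xx}^{HH}$ are defined analogously.
Then, the estimator of the covariance is
$$\hat{S}_{yx}^{HH} = \frac{1}{N-1} \hat{T}_{yx}^{HH} - \frac{1}{N(N-1)} \hat{T}_y^{HH} \hat{T}_x^{HH}.$$
If $N$ is large, $\hat{S}_{yx}^{HH}$ can be written in a more straightforward way that does not depend on $N$:
$$\hat{S}_{yx}^{HH} \simeq \frac{\hat{\delta}_v}{n} \sum_{s} y_k x_k \delta_k^{-1} - \frac{\hat{\delta}_v^2}{n^2} \sum_{s} y_k \delta_k^{-1} \sum_{s} x_k \delta_k^{-1}.$$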
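The large-$N$ form of the RDS-II covariance estimator can be sketched in a few lines. This is an illustrative implementation under the assumption that the average degree is estimated by the harmonic mean of the reported degrees in the sample; the function name `rds2_covariance` and the data are hypothetical.

```python
def rds2_covariance(y, x, degree):
    """Large-N RDS-II covariance estimate (does not require N)."""
    n = len(y)
    inv = [1.0 / d for d in degree]
    delta_v = n / sum(inv)  # assumed: harmonic-mean estimate of average degree
    s_yx = sum(yk * xk * w for yk, xk, w in zip(y, x, inv))
    s_y = sum(yk * w for yk, w in zip(y, inv))
    s_x = sum(xk * w for xk, w in zip(x, inv))
    return delta_v / n * s_yx - (delta_v / n) ** 2 * s_y * s_x


# Hypothetical data; with equal degrees the estimate reduces to
# mean(y*x) - mean(y)*mean(x).
y = [1.0, 2.0, 3.0, 4.0]
x = [2.0, 4.0, 6.0, 9.0]
print(rds2_covariance(y, x, [3, 3, 3, 3]))
print(rds2_covariance(y, x, [1, 2, 4, 8]))
```

The first call serves as a sanity check against the unweighted moment formula; the second shows how unequal degrees change the estimate.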
Using the idea of the RDS-SS estimator, we propose to estimate the totals as:
$$\hat{T}_y^{SS} = \sum_{s} y_k \hat{\pi}(\delta_k)^{-1}, \quad \hat{T}_{yx}^{SS} = \sum_{s} y_k x_k \hat{\pi}(\delta_k)^{-1}, \quad \hat{T}_{yy}^{SS} = \sum_{s} y_k^2 \hat{\pi}(\delta_k)^{-1},$$
where $\hat{\pi}(\delta_k)$ is the estimated population distribution of degrees through successive sampling.
Then, the estimator of the covariance is
$$\hat{S}_{yx}^{SS} = \frac{1}{N-1} \hat{T}_{yx}^{SS} - \frac{1}{N(N-1)} \hat{T}_y^{SS} \hat{T}_x^{SS}.$$
If $N$ is unknown, a consistent estimator for $S_{yx}$ is:
$$\hat{S}_{yx}^{SS} = \frac{1}{\hat{N}-1} \hat{T}_{yx}^{SS} - \frac{1}{\hat{N}(\hat{N}-1)} \hat{T}_y^{SS} \hat{T}_x^{SS},$$
with $\hat{N} = \sum_{j \in s} \hat{\pi}(\delta_j)^{-1}$.
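A sketch of the RDS-SS covariance estimator follows. It assumes the values $\hat{\pi}(\delta_k)$ have already been obtained elsewhere (the successive-sampling step itself is not implemented here); the function name `ss_covariance` and the argument `pi` are illustrative.

```python
def ss_covariance(y, x, pi, N=None):
    """Covariance estimate from inverse-probability weighted totals.

    pi holds the estimated values pi_hat(delta_k) for the sampled units
    (assumed to be supplied by a separate successive-sampling routine).
    If N is None, it is replaced by N_hat = sum of 1/pi over the sample.
    """
    w = [1.0 / p for p in pi]
    t_yx = sum(yk * xk * wk for yk, xk, wk in zip(y, x, w))
    t_y = sum(yk * wk for yk, wk in zip(y, w))
    t_x = sum(xk * wk for xk, wk in zip(x, w))
    if N is None:
        N = sum(w)
    return t_yx / (N - 1) - t_y * t_x / (N * (N - 1))


# Sanity check on hypothetical data: if every unit is observed with
# probability 1, the estimator is the usual covariance with divisor n - 1.
y = [1.0, 2.0, 3.0, 4.0]
x = [2.0, 4.0, 6.0, 9.0]
print(ss_covariance(y, x, [1.0, 1.0, 1.0, 1.0]))
```

Passing `N` explicitly gives the known-$N$ version, while omitting it gives the version based on $\hat{N}$.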
RDS-SS and RDS-II estimators of a total are asymptotically unbiased, thus the proposed estimators will be asymptotically unbiased.

3.2. Estimation of the Correlation Coefficient

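In this section, we consider the estimation of the correlation coefficient between two variables, say $y$ and $x$, defined by
$$\rho = \frac{S_{yx}}{S_y S_x}.$$
Two estimators for this parameter can be obtained by using the RDS-II and RDS-SS estimators defined previously:
$$\hat{\rho}^{HH} = \frac{\hat{S}_{yx}^{HH}}{\hat{S}_y^{HH} \hat{S}_x^{HH}},$$
and
$$\hat{\rho}^{SS} = \frac{\hat{S}_{yx}^{SS}}{\hat{S}_y^{SS} \hat{S}_x^{SS}},$$
where
$$\hat{S}_y^{HH} = \sqrt{\frac{1}{N-1} \hat{T}_{yy}^{HH} - \frac{1}{N(N-1)} \left(\hat{T}_y^{HH}\right)^2}, \quad \hat{S}_x^{HH} = \sqrt{\frac{1}{N-1} \hat{T}_{xx}^{HH} - \frac{1}{N(N-1)} \left(\hat{T}_x^{HH}\right)^2},$$
$$\hat{S}_y^{SS} = \sqrt{\frac{1}{N-1} \hat{T}_{yy}^{SS} - \frac{1}{N(N-1)} \left(\hat{T}_y^{SS}\right)^2}, \quad \hat{S}_x^{SS} = \sqrt{\frac{1}{N-1} \hat{T}_{xx}^{SS} - \frac{1}{N(N-1)} \left(\hat{T}_x^{SS}\right)^2}.$$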
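The correlation estimator can be sketched as a ratio of weighted moments. The version below uses the large-$N$ approximation with inverse-degree weights (an assumption made here to avoid needing $N$); the function name `rds2_correlation` and the data are hypothetical.

```python
import math


def rds2_correlation(y, x, degree):
    """Large-N RDS-II correlation estimate from inverse-degree weighted moments."""
    w = [1.0 / d for d in degree]
    total_w = sum(w)

    def wmean(values):
        return sum(wk * v for wk, v in zip(w, values)) / total_w

    m_y, m_x = wmean(y), wmean(x)
    s_yx = wmean([a * b for a, b in zip(y, x)]) - m_y * m_x
    s_yy = wmean([a * a for a in y]) - m_y ** 2
    s_xx = wmean([b * b for b in x]) - m_x ** 2
    return s_yx / math.sqrt(s_yy * s_xx)


# Hypothetical data: an exact linear relation gives correlation 1
# whatever the reported degrees are.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.0 * v + 1.0 for v in y]
print(rds2_correlation(y, x, [2, 5, 3, 1, 4]))
```

Replacing the inverse-degree weights by $\hat{\pi}(\delta_k)^{-1}$ would give the corresponding RDS-SS sketch.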

4. Estimation of the Variances

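We consider variance estimation for the covariance estimator $\hat{S}_{yx}$.
Using a Taylor linearization, we write
$$\hat{\theta} \simeq \hat{\theta}_o = \theta + \sum_{j=1}^{3} w_j (\hat{\theta}_j - \theta_j),$$
with
$$w_j = \left. \frac{\partial f(\hat{\theta}_1(s), \ldots, \hat{\theta}_3(s))}{\partial \hat{\theta}_j} \right|_{\theta_1, \ldots, \theta_3}.$$
Then,
$$V(\hat{\theta}_o) = V\Big(\sum_{j} w_j \hat{\theta}_j\Big) = \sum_{j} w_j^2 V(\hat{\theta}_j) + \sum_{i \neq j} w_i w_j \, cov(\hat{\theta}_i, \hat{\theta}_j),$$
and
$$\hat{V}(\hat{\theta}) \simeq \hat{V}(\hat{\theta}_o) = \sum_{j} \hat{w}_j^2 \hat{V}(\hat{\theta}_j) + \sum_{i \neq j} \hat{w}_i \hat{w}_j \, \widehat{cov}(\hat{\theta}_i, \hat{\theta}_j),$$
where $w_1 = \frac{1}{N-1}$, $w_2 = -\frac{\theta_3}{N(N-1)}$ and $w_3 = -\frac{\theta_2}{N(N-1)}$.
They are estimated by
$$\hat{w}_1 = w_1, \quad \hat{w}_2 = -\frac{\hat{T}_x}{N(N-1)}, \quad \hat{w}_3 = -\frac{\hat{T}_y}{N(N-1)}.$$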
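For reference, the weights follow from differentiating $f(\theta_1, \theta_2, \theta_3) = \frac{1}{N-1}\theta_1 - \frac{1}{N(N-1)}\theta_2\theta_3$ term by term:

$$\frac{\partial f}{\partial \theta_1} = \frac{1}{N-1}, \qquad \frac{\partial f}{\partial \theta_2} = -\frac{\theta_3}{N(N-1)}, \qquad \frac{\partial f}{\partial \theta_3} = -\frac{\theta_2}{N(N-1)}.$$

Written out for the three totals, the linearized variance is

$$V(\hat{\theta}_o) = \sum_{j=1}^{3} w_j^2 V(\hat{\theta}_j) + 2 \sum_{i<j} w_i w_j \, cov(\hat{\theta}_i, \hat{\theta}_j),$$

which is the same quantity as the double sum over $i \neq j$, since each pair appears twice in that sum.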
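Note: A more straightforward computational expression can be derived from formula 5.5.10 in Särndal et al. [28].
We estimate the variances and covariances of the above-mentioned totals for the RDS-II estimator as
$$\hat{V}(\hat{T}_{yx}^{HH}) = \frac{1}{(n-1)n} \sum_{s} \left( N \hat{\delta}_v y_k x_k \delta_k^{-1} - \hat{T}_{yx}^{HH} \right)^2,$$
$$\hat{V}(\hat{T}_y^{HH}) = \frac{1}{(n-1)n} \sum_{s} \left( N \hat{\delta}_v y_k \delta_k^{-1} - \hat{T}_y^{HH} \right)^2,$$
$$\hat{V}(\hat{T}_x^{HH}) = \frac{1}{(n-1)n} \sum_{s} \left( N \hat{\delta}_v x_k \delta_k^{-1} - \hat{T}_x^{HH} \right)^2,$$
$$\widehat{cov}(\hat{T}_{yx}^{HH}, \hat{T}_x^{HH}) = \frac{1}{(n-1)n} \sum_{s} \left( N \hat{\delta}_v y_k x_k \delta_k^{-1} - \hat{T}_{yx}^{HH} \right) \left( N \hat{\delta}_v x_k \delta_k^{-1} - \hat{T}_x^{HH} \right),$$
$$\widehat{cov}(\hat{T}_{yx}^{HH}, \hat{T}_y^{HH}) = \frac{1}{(n-1)n} \sum_{s} \left( N \hat{\delta}_v y_k x_k \delta_k^{-1} - \hat{T}_{yx}^{HH} \right) \left( N \hat{\delta}_v y_k \delta_k^{-1} - \hat{T}_y^{HH} \right),$$
and
$$\widehat{cov}(\hat{T}_y^{HH}, \hat{T}_x^{HH}) = \frac{1}{(n-1)n} \sum_{s} \left( N \hat{\delta}_v y_k \delta_k^{-1} - \hat{T}_y^{HH} \right) \left( N \hat{\delta}_v x_k \delta_k^{-1} - \hat{T}_x^{HH} \right).$$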
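Putting the pieces together, the linearized variance of the RDS-II covariance estimator can be sketched as below. This is an illustrative implementation, assuming $N$ is known and the average degree is estimated by the harmonic mean of sampled degrees; the function name `rds2_covariance_variance` and the data are hypothetical.

```python
def rds2_covariance_variance(y, x, degree, N):
    """Taylor-linearized variance estimate for the RDS-II covariance estimator."""
    n = len(y)
    inv = [1.0 / d for d in degree]
    delta_v = n / sum(inv)  # assumed harmonic-mean estimate of average degree
    scale = N * delta_v
    # Expanded values u_jk; the sample mean of each list is the estimated total.
    u = [
        [scale * yk * xk * w for yk, xk, w in zip(y, x, inv)],  # for T_yx
        [scale * yk * w for yk, w in zip(y, inv)],              # for T_y
        [scale * xk * w for xk, w in zip(x, inv)],              # for T_x
    ]
    totals = [sum(col) / n for col in u]
    # Estimated linearization weights w_1, w_2, w_3.
    wts = [1.0 / (N - 1), -totals[2] / (N * (N - 1)), -totals[1] / (N * (N - 1))]

    def cov_hat(i, j):
        return sum((a - totals[i]) * (b - totals[j])
                   for a, b in zip(u[i], u[j])) / (n * (n - 1))

    # The double loop covers the variance terms (i == j) and both orders
    # of every covariance term (i != j).
    return sum(wts[i] * wts[j] * cov_hat(i, j) for i in range(3) for j in range(3))


# Hypothetical sample.
y = [3.0, 5.0, 2.0, 8.0, 6.0]
x = [1.0, 4.0, 2.0, 7.0, 5.0]
degree = [2, 3, 1, 5, 4]
v_hat = rds2_covariance_variance(y, x, degree, N=1000)
print(v_hat)
```

An equivalent and often more convenient route is to form the linearized variable $z_k = \sum_j \hat{w}_j u_{jk}$ and apply the single-variable variance formula to it.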
The proposed RDS-II estimator is only analogous to the Hansen and Hurwitz estimator [29]; because data are correlated in an RDS framework, the above-mentioned variance estimators can perform poorly. Although Volz and Heckathorn [3] derived a variance estimator for categorical variables that accounts for the MCMC structure of the sample, we cannot use that variance estimator in this context.
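We now estimate the variances and covariances of the totals for the RDS-SS estimator by using the Deville and Särndal [30] method for estimating the variance of the Horvitz–Thompson estimator. The variances are estimated as
$$\hat{V}(\hat{T}_{yx}^{SS}) = \frac{1}{1 - \sum_{k \in s} a_k^2} \sum_{k \in s} (1 - \hat{\pi}(\delta_k)) \left( \frac{y_k x_k}{\hat{\pi}(\delta_k)} - \sum_{l \in s} a_l \frac{y_l x_l}{\hat{\pi}(\delta_l)} \right)^2,$$
$$\hat{V}(\hat{T}_y^{SS}) = \frac{1}{1 - \sum_{k \in s} a_k^2} \sum_{k \in s} (1 - \hat{\pi}(\delta_k)) \left( \frac{y_k}{\hat{\pi}(\delta_k)} - \sum_{l \in s} a_l \frac{y_l}{\hat{\pi}(\delta_l)} \right)^2,$$
and
$$\hat{V}(\hat{T}_x^{SS}) = \frac{1}{1 - \sum_{k \in s} a_k^2} \sum_{k \in s} (1 - \hat{\pi}(\delta_k)) \left( \frac{x_k}{\hat{\pi}(\delta_k)} - \sum_{l \in s} a_l \frac{x_l}{\hat{\pi}(\delta_l)} \right)^2.$$
The covariances are estimated as
$$\widehat{cov}(\hat{T}_{yx}^{SS}, \hat{T}_y^{SS}) = \frac{1}{1 - \sum_{k \in s} a_k^2} \sum_{k \in s} (1 - \hat{\pi}(\delta_k)) \left( \frac{y_k x_k}{\hat{\pi}(\delta_k)} - \sum_{l \in s} a_l \frac{y_l x_l}{\hat{\pi}(\delta_l)} \right) \left( \frac{y_k}{\hat{\pi}(\delta_k)} - \sum_{l \in s} a_l \frac{y_l}{\hat{\pi}(\delta_l)} \right),$$
$$\widehat{cov}(\hat{T}_{yx}^{SS}, \hat{T}_x^{SS}) = \frac{1}{1 - \sum_{k \in s} a_k^2} \sum_{k \in s} (1 - \hat{\pi}(\delta_k)) \left( \frac{y_k x_k}{\hat{\pi}(\delta_k)} - \sum_{l \in s} a_l \frac{y_l x_l}{\hat{\pi}(\delta_l)} \right) \left( \frac{x_k}{\hat{\pi}(\delta_k)} - \sum_{l \in s} a_l \frac{x_l}{\hat{\pi}(\delta_l)} \right),$$
and
$$\widehat{cov}(\hat{T}_y^{SS}, \hat{T}_x^{SS}) = \frac{1}{1 - \sum_{k \in s} a_k^2} \sum_{k \in s} (1 - \hat{\pi}(\delta_k)) \left( \frac{y_k}{\hat{\pi}(\delta_k)} - \sum_{l \in s} a_l \frac{y_l}{\hat{\pi}(\delta_l)} \right) \left( \frac{x_k}{\hat{\pi}(\delta_k)} - \sum_{l \in s} a_l \frac{x_l}{\hat{\pi}(\delta_l)} \right),$$
where $a_k = (1 - \hat{\pi}(\delta_k)) / \sum_{l \in s} (1 - \hat{\pi}(\delta_l))$.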
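The variance formula for a single RDS-SS total can be sketched as follows. The values $\hat{\pi}(\delta_k)$ are assumed to be given (and strictly below 1 for at least two units, so that the denominators are non-zero); the function name `ss_total_variance` and the example values are illustrative.

```python
def ss_total_variance(z, pi):
    """Variance estimate for an inverse-probability weighted total of z.

    z may be y_k, x_k, or the product y_k * x_k; pi holds the estimated
    values pi_hat(delta_k), assumed to be supplied from elsewhere.
    """
    one_minus = [1.0 - p for p in pi]
    denom = sum(one_minus)
    a = [q / denom for q in one_minus]                 # weights a_k
    expanded = [zk / p for zk, p in zip(z, pi)]        # z_k / pi_k
    centre = sum(ak * ek for ak, ek in zip(a, expanded))
    factor = 1.0 / (1.0 - sum(ak ** 2 for ak in a))
    return factor * sum(q * (e - centre) ** 2 for q, e in zip(one_minus, expanded))


# Hypothetical check: with equal probabilities 0.5 (n = 4 from N = 8) the
# result coincides with the simple-random-sampling formula N^2 (1 - n/N) s^2 / n.
print(ss_total_variance([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]))
```

The covariance versions are obtained by replacing the squared deviation with the product of the deviations of the two variables involved.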
As the correlation coefficient estimators are ratio estimators, estimators of their variances can easily be obtained by using Taylor linearization (see, e.g., [31]).

5. Simulation Experiments

In this section, a limited simulation study was carried out to illustrate the performance of the proposed estimators under different scenarios. The main factor of interest was the estimation of the population covariance and the correlation between continuous covariates. We used our own code written in R to compute the proposed estimators. Programming details and code are available from the authors.
The simulated population size was $N = 10000$. An $N \times N$ network connection indicator matrix $C$ was randomly generated, with $c_{ij}$ either 0 or 1 indicating a connection between nodes $i$ and $j$, for $i, j = 1, \ldots, N$. The resulting $c_{ij}$ determine the degrees, as $\sum_{i \in U, i \neq j} c_{ij} = \delta_j$. Ten seeds were selected at random from the network with probability proportional to their degree, with a maximum of three coupons issued to each participant.
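The recruitment design described above can be sketched as follows. This is not the authors' R code: the network generator (independent links with a common probability), the rule that each recruiter passes coupons to randomly chosen alters not yet in the sample, and the assumption that every coupon is redeemed are simplifications introduced here, and the function names `random_network` and `simulate_rds` are hypothetical. A small network is used so the example runs quickly.

```python
import random


def random_network(n_nodes, p, rng):
    """Undirected random graph as a list of neighbour sets (assumed generator)."""
    nbrs = [set() for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < p:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs


def simulate_rds(nbrs, n_sample, n_seeds=10, max_coupons=3, rng=None):
    """Simulate RDS without replacement: seeds drawn proportional to degree,
    each participant recruits up to max_coupons unsampled alters."""
    rng = rng or random.Random()
    degree = [len(s) for s in nbrs]
    candidates = [i for i, d in enumerate(degree) if d > 0]
    weights = [degree[i] for i in candidates]
    sample, wave, recruiter = [], {}, {}
    while len(sample) < n_seeds:
        k = rng.choices(candidates, weights=weights)[0]
        if k not in wave:  # keep seeds distinct
            sample.append(k)
            wave[k] = 0
            recruiter[k] = None
    pos = 0
    while pos < len(sample) and len(sample) < n_sample:
        r = sample[pos]
        pos += 1
        eligible = [j for j in sorted(nbrs[r]) if j not in wave]
        rng.shuffle(eligible)
        for j in eligible[:max_coupons]:
            if len(sample) >= n_sample:
                break
            sample.append(j)
            wave[j] = wave[r] + 1  # one more than the recruiter's wave
            recruiter[j] = r
    return sample, wave, recruiter


rng = random.Random(1)
nbrs = random_network(200, 0.05, rng)
sample, wave, recruiter = simulate_rds(nbrs, n_sample=50, rng=rng)
print(len(sample), max(wave.values()))
```

The returned `wave` and `recruiter` dictionaries record the recruitment tree, and the degrees needed by the estimators are `len(nbrs[k])` for each sampled `k`.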
The values of the variable of interest $y$ were generated from a normal distribution, $y_j \sim N(5000, 500)$, for $j = 1, \ldots, 5000$. Three auxiliary variables were then generated from the values of $y$: $x_1 = (y - e_1)/0.5$ with $e_1 \sim N(500, 500)$, $x_2 = (y - e_2)/0.5$ with $e_2 \sim N(500, 700)$ and $x_3 = (y - e_3)/0.5$ with $e_3 \sim N(500, 300)$. The resulting correlation coefficients were $\rho = 0.7007$ for $x_1$, $\rho = 0.571$ for $x_2$ and $\rho = 0.8579$ for $x_3$. The simulations were also performed for other covariates, and therefore different values of $\rho$, but the results were qualitatively similar and hence are not reported here. The sample size was $n = 500$ and samples were selected using simple random sampling without replacement, just as RDS is usually conducted in practice.
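The data-generating step can be sketched as below. It assumes that the second parameter of each normal distribution is a standard deviation (the text does not say whether it is a standard deviation or a variance), and the helper names `pearson` and `auxiliary` are hypothetical. Under that assumption the theoretical correlation between $y$ and $x$ is $500/\sqrt{500^2 + \sigma_e^2}$.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def auxiliary(y, error_sd, rng):
    """x = (y - e) / 0.5 with e ~ N(500, error_sd); error_sd assumed to be an SD."""
    return [(yk - rng.gauss(500, error_sd)) / 0.5 for yk in y]


rng = random.Random(2020)
y = [rng.gauss(5000, 500) for _ in range(5000)]
rho = {sd: pearson(y, auxiliary(y, sd, rng)) for sd in (500, 700, 300)}
for sd, value in rho.items():
    print(sd, round(value, 3), round(500 / math.hypot(500, sd), 3))
```

The simulated correlations vary with the random seed, so they should be close to, but not identical to, the values reported in the text.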
For each regression model, we computed the two proposed estimators of the population covariance $S_{yx}$ and the correlation coefficient $\rho$. We investigated the percent relative bias
$$rb\% = \frac{E_{MC}(\hat{\theta} - \theta)}{\theta} \times 100,$$
and the percent relative mean squared error
$$rmse\% = \frac{E_{MC}[(\hat{\theta} - \theta)^2]}{\theta^2} \times 100,$$
for each estimator $\hat{S}_{yx}$ and $\hat{\rho}$. Simulation results were based on $B = 1000$ samples, and $E_{MC}$ denotes the average over the Monte Carlo replications.
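The two performance measures are straightforward to compute from a list of Monte Carlo estimates. The sketch below uses hypothetical function names and made-up replicate values purely to show the arithmetic.

```python
def relative_bias_pct(estimates, theta):
    """Percent relative bias: average of (estimate - theta) over theta, times 100."""
    mean_error = sum(e - theta for e in estimates) / len(estimates)
    return 100.0 * mean_error / theta


def relative_mse_pct(estimates, theta):
    """Percent relative mean squared error: average squared error over theta^2, times 100."""
    mse = sum((e - theta) ** 2 for e in estimates) / len(estimates)
    return 100.0 * mse / theta ** 2


# Made-up replicate estimates of a parameter whose true value is 10.
estimates = [9.0, 11.0, 10.0, 12.0]
print(relative_bias_pct(estimates, 10.0))  # average error 0.5 -> 5.0
print(relative_mse_pct(estimates, 10.0))   # average squared error 1.5 -> 1.5
```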
The estimators of the covariance are approximately unbiased, with relative biases between roughly 1.3% and 1.8% in all scenarios considered. Biases are even lower for the correlation coefficient estimates, all of them below 1%, as shown in Table 1 and Table 2. The relative mean squared errors are small and very similar for both estimators, indicating that both are effective in estimating these non-linear parameters.

6. Application to a Real Survey

In this section, the proposed estimators are applied to a real survey on discrimination and the under-representation of young Indigenous, Montubios and Afro-Ecuadorian people in Ecuador. The RDS methodology was applied to a population of young (18 to 29 years old) Indigenous, Montubios and Afro-Ecuadorian people living in the city of Riobamba (Ecuador). These groups have historically suffered from exclusion and under-representation and therefore lack a reliable sampling frame [32,33,34,35]. A total of 814 people were recruited in six waves, using a dual system of incentives to motivate recruitment, and were questioned on their social and economic background and living conditions. The reported household income is the variable of interest and the age of the respondent is the covariate. These are unpublished data intended for publication in a manuscript in preparation [36].
The two proposed estimators show good overall performance for the covariance and the correlation coefficient, with absolute relative biases of roughly 5% to 7% and similarly small values of the relative mean squared error $rmse$, as shown in Table 3 and Table 4.

7. Discussion

RDS has been used extensively to study the prevalence of diseases. As more practitioners incorporate this methodology into their toolbox, model-fitting in an RDS framework has become an important issue. We proposed a new sample weight estimation method for continuous data. Our approach is most appropriate for situations in which homophily is small. While we consider this a novel approach for continuous RDS data, accounting for clustering remains an open question. Extending this methodology to adjust for clusters is a possible direction for future research.
As an illustration of the applicability of the proposed method, we performed a simulation study and an application to an ethnic example. Nevertheless, the focus of our work has been to propose a method for estimating non-linear parameters with new sample weights. We derived expressions for the variances and showed that the proposed estimators have desirable properties. Our simulation study does not show significant differences in terms of bias or relative mean squared error between the two proposed estimators. Furthermore, the computational complexity of the two estimators is similar. There is therefore no objective reason to prefer one over the other.
Taken together, the results about the dependence between continuous variables presented in this paper add to the growing literature on respondent-driven sampling, allowing researchers to obtain better information about key hidden populations.

Author Contributions

The authors contributed equally to this work in conceptualization, methodology, software and original draft preparation. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the Ministerio de Economia, Industria y Competitividad, Spain, under Grant MTM2015-63609-R.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RDS: Respondent-Driven Sampling
MCMC: Markov Chain Monte Carlo

References

  1. Heckathorn, D. Respondent-driven sampling: A new approach to the study of hidden populations. Soc. Probl. 1997, 44, 174–199.
  2. Salganik, M.; Heckathorn, D. Sampling and estimation in hidden populations using respondent-driven sampling. Sociol. Methodol. 2004, 34, 193–240.
  3. Volz, E.; Heckathorn, D. Probability based estimation theory for respondent driven sampling. J. Off. Stat. 2008, 14, 79–97.
  4. Spiller, M.W.; Gile, K.J.; Handcock, M.S.; Mar, C.M.; Wejnert, C. Evaluating Variance Estimators for Respondent-Driven Sampling. J. Surv. Stat. Methodol. 2018, 6, 23–45.
  5. Beaudry, I.S.; Gile, K.J. Correcting for differential recruitment in respondent-driven sampling data using ego-network information. Electron. J. Stat. 2020, 14, 2678–2713.
  6. Rocha, L.E.; Thorson, A.E.; Lambiotte, R.; Liljeros, F. Respondent-driven sampling bias induced by community structure and response rates in social networks. J. R. Stat. Soc. Ser. A Stat. Soc. 2017, 180, 99–118.
  7. Kan, M.; Garfinkel, D.; Samoylova, O.; Gray, R.; Little, K. Social network methods for HIV case-finding among people who inject drugs in Tajikistan. J. Int. AIDS Soc. 2018, 21, 57–64.
  8. Sypsa, V.; Psichogiou, M.; Paraskevis, D. Rapid decline in HIV incidence among persons who inject drugs during a fast-track combination prevention program after an HIV outbreak in Athens. J. Infect. Dis. 2017, 215, 1496–1505.
  9. Card, K.G.; Lachowsky, N.J.; Cui, Z. Exploring the role of sex-seeking apps and websites in the social and sexual lives of gay, bisexual and other men who have sex with men: A cross-sectional study. Sex Health 2017, 14, 229–237.
  10. Bernard, J.; Daňková, H.; Vašát, P. Ties, sites and irregularities: Pitfalls and benefits in using respondent-driven sampling for surveying a homeless population. Int. J. Soc. Res. Methodol. 2018, 21, 603–618.
  11. Hipp, L.; Kohler, U.; Leumann, S. How to Implement Respondent-Driven Sampling in Practice: Insights from Surveying 24-Hour Migrant Home Care Workers. Surv. Methods Insights Field 2019.
  12. Wong, W.; Wang, S.; Liu, H. Bootstrap standard error estimations of non-linear transport models based on linearly projected data. Transp. A 2018, 15, 1–35.
  13. Imani, M.; Ghoreishi, S.F.; Braga-Neto, U. Bayesian Control of Large MDPs with Unknown Dynamics in Data-Poor Environments. Adv. Neural Dyn. 2018, 8146–8156.
  14. Imani, M.; Dougherty, E.R.; Braga-Neto, U. Boolean Kalman Filter and Smoother under Model Uncertainty. Automatica 2020.
  15. Avery, L.; Rotondi, N.; McKnight, C.; Firestone, M.; Smylie, J.; Rotondie, M. Unweighted regression models perform better than weighted regression techniques for respondent-driven sampling data: Results from a simulation study. BMC Med. Res. Methodol. 2019, 19, 202.
  16. Beckett, M.; Firestone, M.A.; McKnight, C.D. A cross-sectional analysis of the relationship between diabetes and health access barriers in an urban First Nations population in Canada. BMJ Open 2017, 8, e018272.
  17. da Silva Lima, F.S.; Merchán-Hamann, E.; Urdaneta, M. Fatores associados à violência contra mulheres profissionais do sexo de dez cidades brasileiras. CAD Saude Publica 2017, 33, 1–15.
  18. Selvaraj, B.; Boopathi, K.; Paranjape, R.; Mehendale, S. A single weighting approach to analyze respondent-driven sampling data. Indian J. Med. Res. 2016, 144, 447–459.
  19. Hubbard, A.E.; Ahern, J.; Fleischer, N.L.; Van der Laan, M.; Lippman, S.A.; Jewell, T.B.; Satariano, W.A. To GEE or not to GEE. Epidemiology 2010, 21, 467–474.
  20. Rao, S.; LaRoque, R.; Jentes, E. Comparison of methods for clustered data analysis in a non-ideal situation: Results from an evaluation of predictors of yellow fever vaccine refusal in the global TravEpiNet (GTEN) consortium. Int. J. Stat. Med. Res. 2014, 3, 215–223.
  21. Lyons, C.E.; Grosso, A.; Drame, F.M. Physical and sexual violence affecting female sex workers in Abidjan, Côte d’Ivoire: Prevalence, and the relationship with the work environment, HIV, and access to health services. J. Acquir. Immune Defic. Syndr. 2017, 75, 9–17.
  22. Schwartz, S.; Papworth, E.; Thiam-Niangoin, M. An urgent need for integration of family planning services into HIV care. J. Acquir. Immune Defic. Syndr. 2015, 68, 91–98.
  23. de Matos, M.A.; da Silva França, D.D.; dos Santos Carneiro, M.A. Viral hepatitis in female sex workers using the respondent-driven sampling. Rev. Saude Publica 2017, 51, 1–11.
  24. Scheim, A.; Bauer, G.; Coleman, T. Sociodemographic differences by survey mode in a respondent-driven sampling study of transgender people in Ontario, Canada. LGBT Health 2016, 3, 391–395.
  25. Pan, X.; Wu, M.; Ma, Q. High prevalence of HIV among men who have sex with men in Zhejiang, China: A respondent-driven sampling survey. BMJ Open 2015, 5, 1–7.
  26. Maragh-Bass, A.C.; Powell, C.; Park, J. Sociodemographic and access related correlates of health-care utilization among African American injection drug users: The BESURE study. J. Ethn. Subst. Abuse 2017, 16, 344–362.
  27. Gile, K.; Handcock, M. Respondent-driven sampling: An assessment of current methodology. Sociol. Methodol. 2010, 40, 285–327.
  28. Särndal, C.E.; Swensson, B.; Wretman, J. Model Assisted Survey Sampling; Springer: Berlin/Heidelberg, Germany, 1992.
  29. Hansen, M.H.; Hurwitz, W.N. On the theory of sampling from finite populations. Ann. Math. Stat. 1943, 14, 333–362.
  30. Deville, J.C.; Särndal, C.E. Calibration estimators in survey sampling. J. Am. Stat. Assoc. 1992, 87, 376–382.
  31. Wolter, K. Introduction to Variance Estimation, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2007.
  32. Larrea, C.; Torres, F.; López, N.; Rueda, M. Pueblos Indígenas, Desarrollo Humano y Discriminación en el Ecuador; Abya Yala: Quito, Ecuador, 2007.
  33. Chisaguano, S. La Población Indígena del Ecuador (Análisis de Estadísticas Socio-Demográficas). INEC. Available online: https://www.acnur.org/fileadmin/Documentos/Publicaciones/2009/7015.pdf (accessed on 5 May 2020).
  34. Araki, H. Movimientos Étnicos y Multiculturalismo en el Ecuador: Pueblos Indígenas, Afrodescendientes y Montubios. Master’s Thesis, University of Kanagawa, Kanagawa, Japan, 2012. Available online: http://klibredb.lib.kanagawa-u.ac.jp/dspace/handle/10487/12164 (accessed on 5 May 2020).
  35. Uquillas, J.; Carrasco, T.; Rees, M. Exclusión Social y Estrategias de vida de los Indígenas Urbanos en Perú; Banco Mundial: Quito, Ecuador, 2003.
  36. Mullo, H.S.; Sánchez-Borrego, I.; Pasadas, S. Respondent-driven sampling for surveying ethnic minority in Ecuador. Manuscript in preparation.
Figure 1. Schematic representation of the proposed method.
Table 1. Percent relative bias ($rb\%$) and relative mean squared error ($rmse\%$) for estimating $S_{yx}$ with estimators $\hat{S}_{yx,\mathrm{RDS\text{-}II}}$ and $\hat{S}_{yx,\mathrm{RDS\text{-}SS}}$ in the three scenarios. RDS: respondent-driven sampling.

| Scenario | RDS-II $rb\%$ | RDS-II $rmse\%$ | RDS-SS $rb\%$ | RDS-SS $rmse\%$ |
|---|---|---|---|---|
| Scenario 1 | 1.4953 | 0.8158 | 1.5055 | 0.8065 |
| Scenario 2 | 1.7857 | 1.0906 | 1.7897 | 1.0782 |
| Scenario 3 | 1.2745 | 0.6516 | 1.2924 | 0.6447 |
Table 2. Percent relative bias ($rb\%$) and relative mean squared error ($rmse\%$) for estimating the correlation coefficient $\rho$ with estimators $\hat{\rho}_{\mathrm{RDS\text{-}II}}$ and $\hat{\rho}_{\mathrm{RDS\text{-}SS}}$ in the three scenarios.

| Scenario | RDS-II $rb\%$ | RDS-II $rmse\%$ | RDS-SS $rb\%$ | RDS-SS $rmse\%$ |
|---|---|---|---|---|
| Scenario 1 | 0.4738 | 0.1262 | 0.4786 | 0.1245 |
| Scenario 2 | 0.8656 | 0.3347 | 0.8628 | 0.3304 |
| Scenario 3 | 0.1889 | 0.0244 | 0.2003 | 0.0242 |
Table 3. Percent relative bias ($rb\%$) and relative mean squared error ($rmse\%$) for estimating $S_{yx}$ with estimators $\hat{S}_{yx,\mathrm{RDS\text{-}II}}$ and $\hat{S}_{yx,\mathrm{RDS\text{-}SS}}$ for the ethnic example.

| RDS-II $rb\%$ | RDS-II $rmse\%$ | RDS-SS $rb\%$ | RDS-SS $rmse\%$ |
|---|---|---|---|
| −5.671021 | 0.8551362 | −7.147357 | 0.5216048 |
Table 4. Percent relative bias ($rb\%$) and relative mean squared error ($rmse\%$) for estimating the correlation coefficient $\rho$ with estimators $\hat{\rho}_{\mathrm{RDS\text{-}II}}$ and $\hat{\rho}_{\mathrm{RDS\text{-}SS}}$ for the ethnic example.

| RDS-II $rb\%$ | RDS-II $rmse\%$ | RDS-SS $rb\%$ | RDS-SS $rmse\%$ |
|---|---|---|---|
| 6.1371 | 0.4460 | 4.9065 | 0.2407 |

Share and Cite

MDPI and ACS Style

Sánchez-Borrego, I.; Rueda, M.d.M.; Mullo, H. Estimation of Non-Linear Parameters with Data Collected Using Respondent-Driven Sampling. Mathematics 2020, 8, 1315. https://0-doi-org.brum.beds.ac.uk/10.3390/math8081315
