1. Introduction
Regression models incorporating change-points have been popular in many application areas, such as finance (Chen [1]), medical research (Kim et al. [2]), biology (Chen and Wang [3], Acitas and Senoglu [4]), and climate (Reeves et al. [5]). In this literature, change-point detection in the normal linear regression model is probably the most popular topic. Chen and Gupta [6] first used the Schwarz Information Criterion (SIC) model selection method to detect mean and variance change-points in the normal linear regression model. Chen and Wang [3] applied the normal mean change-point model to detect DNA copy number variations using the SIC method. Chen et al. [7] compared four methods for estimating single or multiple change-points in a normal regression model when both the error variance and the regression coefficients change simultaneously at the unknown point. Meanwhile, many robust change-point detection models with heavy-tailed distributions have been developed to deal with data containing outliers. Osorio and Galea [8] discussed the regression coefficient change-point model under a Student-t error assumption and designed the SIC method to detect the position of the change-point. Moreover, in the Student-t regression framework, Lin et al. [9] proposed a Bayesian methodology to detect the variance change-point and analyzed U.S. stock market data. Kang et al. [10] extended Lin's work to linear models with a variance change-point by assuming a class of symmetric heavy-tailed errors that includes the Student-t distribution. Lu and Chang [11] developed a robust algorithm to detect regression coefficient change-points by incorporating the Student-t distribution with the expectation-maximization (EM) algorithm and a fuzzy classification procedure. Yang [12] designed a robust mean change-point detection procedure by employing the Laplace regression model via the EM algorithm. Alin et al. [13] proposed a robust F-test for change-point detection in linear regression models with a non-normal error structure.
The quantile regression (QR) model, pioneered by Koenker and Bassett [14], is a powerful tool for comprehensively assessing the effects of explanatory covariates at different quantiles of the response. In contrast to the usual mean regression model, the QR model is less sensitive to outliers in the data; thus, when the data come from a heavy-tailed distribution, inferences based on QR are more credible than those derived from the usual mean regression. Since a set of quantiles often provides a more detailed description of the response distribution than the mean alone, QR offers a practically important alternative to classical mean regression. Koenker [15] provides a detailed introduction to the theory, applications, and computation of QR. In recent years, many authors have studied the change-point problem in the QR framework. Qu [16] proposed several tests for structural change in regression quantiles, such as a fluctuation-type statistic and a Wald-type statistic. Zhang et al. [17] developed a procedure for testing a change-point due to a covariate threshold in regression quantiles. Gabriela [18] considered the estimation problem in a nonlinear quantile model with change-points. Guo and He [19] discussed the quantile autoregression change-point problem from a Bayesian perspective and developed a Markov chain Monte Carlo (MCMC) algorithm. Although MCMC algorithms are widely used in Bayesian inference for complex data, we should mention that MCMC is an iterative sampling scheme; the samples generated from the full conditional distributions are not independent, and convergence must be assessed before inferences are drawn from these samples. Despite several well-known convergence criteria, determining when the Markov chain achieves convergence is still a challenging task in Bayesian analysis.
Tan et al. [20] discovered an equation called the inverse Bayes formula (IBF) and developed it into a sampling algorithm named the IBF sampler, which is highly useful in Bayesian sampling. Compared with classical MCMC sampling, this sampler possesses two main advantages. First, it is non-iterative and thus fast. Second, it can draw independent and identically distributed (i.i.d.) samples exactly or approximately from the target posterior distribution, and these samples can be used for statistical inference immediately without convergence diagnosis. The main idea of the IBF sampler is to use the EM algorithm to obtain an optimal importance sampling density and then use the sampling/importance resampling (SIR) algorithm to obtain i.i.d. samples from the target distribution. For a comprehensive understanding of the IBF sampler, one can refer to Tan et al. [21], which gives a detailed introduction to the IBF, including its extensions and wide applications. In recent years, applications of the IBF algorithm have flourished; to name a few, Yang and Yuan [22,23] applied the idea of the IBF to the quantile regression model and the scale mixture of normals regression model, respectively, and Tsionas [24] used the IBF sampler in finance with stochastic volatility models. As to change-point detection, Tian et al. [25] were the first to apply the non-iterative Bayesian method to the change-point problem, identifying the positions of change-points in a sequence of hemolytic uremic syndrome data using the IBF sampler. Recently, Yang and Yuan [26,27] used the IBF sampler to detect the position of a change-point in the normal mean and in the normal regression coefficients, respectively.
Motivated by Tan et al. [20,21], in this article we discuss a non-iterative Bayesian sampling algorithm to detect the position of a regression coefficient change-point in the QR model. The procedure involves several key steps. First, a mixture representation of the asymmetric Laplace distribution (ALD) makes it possible to transfer the observed likelihood into a missing-data structure by augmenting a latent exponentially distributed variable, which is crucial for implementing the EM algorithm and the IBF sampler. Then, the best posterior importance sampling distribution to overlap the target density is obtained by using the posterior mode from the EM algorithm. Finally, the sampling IBF and the sampling/importance resampling algorithm are combined to obtain approximately i.i.d. samples from the observed posterior distributions. Simulation studies are conducted to evaluate the robustness of the procedure, and real data are analyzed to illustrate the usefulness of the algorithm by comparison with the classical method for detecting the position of a change-point in regression models. The novelty of this paper lies in the fact that, for the first time, the IBF algorithm is applied to a change-point problem based on a quantile regression model, combining the robustness of quantile regression in detecting change-points with the computational effectiveness of the IBF sampler.
2. Quantile Regression (QR) Model
Let $Y$ denote a response variable and $X$ the related explanatory vector with $p$ components, and suppose the observations are $\{(x_i, y_i),\ i = 1, \dots, n\}$. We introduce the linear quantile regression model as
$$y_i = x_i^{\top}\beta_q + \epsilon_i, \quad i = 1, \dots, n, \qquad (1)$$
where $\beta_q \in \mathbb{R}^p$ is an unknown parameter vector of interest, $\epsilon_i$ is the error term whose distribution is restricted to have the $q$-th quantile equal to zero, and $x_i^{\top}$ denotes the transpose of $x_i$. Thus, $\beta_q$ quantifies the $q$-th conditional quantile regression coefficient of $y_i$ given $x_i$, for $q \in (0, 1)$; hereafter, $\beta_q$ is simplified as $\beta$. The estimator $\hat{\beta}$ of $\beta$ is the one minimizing
$$\sum_{i=1}^{n} \rho_q\big(y_i - x_i^{\top}\beta\big),$$
with $\rho_q(\cdot)$ denoting the loss function $\rho_q(u) = u\{q - I(u < 0)\}$, where $I(\cdot)$ is the indicator function. Note that when $q = 0.5$, the loss function is the symmetric absolute loss, which leads to the Laplace regression or median regression. Because $\rho_q(\cdot)$ is not differentiable at zero, $\hat{\beta}$ cannot be obtained in closed form. In the literature, some elaborate methods, such as the interior point algorithm and the Majorization–Minimization algorithm, have been developed to obtain $\hat{\beta}$; see Koenker and Park [28], Hunter and Lange [29]. Many authors have realized that minimizing this objective is equivalent to maximizing the likelihood function of a linear regression model with errors following the asymmetric Laplace distribution ALD$(0, \sigma, q)$, whose density function is given by
$$f(\epsilon \mid \sigma, q) = \frac{q(1-q)}{\sigma}\exp\left\{-\rho_q\left(\frac{\epsilon}{\sigma}\right)\right\}. \qquad (2)$$
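For concreteness, the following Python sketch (our illustration, not code from the paper) evaluates the check loss and the QR objective and minimizes the objective numerically on simulated data. Nelder–Mead is used here only because the loss is not differentiable at zero; the interior-point and MM algorithms cited above are the standard choices in practice.

import numpy as np
from scipy.optimize import minimize

def check_loss(u, q):
    """Quantile check loss rho_q(u) = u * (q - I(u < 0))."""
    return u * (q - (u < 0))

def qr_objective(beta, X, y, q):
    """Objective sum_i rho_q(y_i - x_i' beta) minimized by the q-th QR estimator."""
    return np.sum(check_loss(y - X @ beta, q))

# Illustrative data: intercept + one covariate, heavy-tailed t_3 errors.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=100)

fit = minimize(qr_objective, x0=np.zeros(2), args=(X, y, 0.5), method="Nelder-Mead")
print(fit.x)  # approximate median (q = 0.5) regression coefficients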
In the literature, Koenker and Machado [30] studied likelihood ratio test statistics for quantile regression coefficients under the ALD assumption, and Yu and Moyeed [31] proposed statistical inference for QR from a Bayesian perspective based on the ALD error distribution. Using a novel location-scale mixture representation of the ALD, Reed and Yu [32] and Kozumi and Kobayashi [33] developed an efficient and easily implemented Gibbs sampling method for quantile regression in the Bayesian framework. This representation has been widely used in recent parametric quantile regression studies, such as longitudinal data analysis and variable selection. To name a few, Zhou et al. [34] and Tian et al. [35] proposed EM algorithms for computing the MLE or posterior mode of the regression coefficients in the linear quantile regression model, and Luo et al. [36] studied the quantile regression model for longitudinal data using a Gibbs sampler. Alhamzawi and Yu [37] and Ji et al. [38] studied variable selection problems in quantile regression and in tobit and binary quantile regression. Tian et al. [39] discussed inference problems for a mixture of linear quantile regressions using the EM algorithm. We notice that, recently, Gallardo et al. [40] introduced a new parametric quantile regression model for asymmetric response variables based on the power skew-normal distribution, and Reyes et al. [41] proposed a quantile regression model with a generalization of the Student-t distribution for the response; although these two distributions differ from the asymmetric Laplace distribution used in this paper, the ideas of re-parametrizing these two distributions are similar to the mixture representation of the ALD.
3. Quantile Regression Model with Change-Point
In this paper, we suppose a change-point exists in the $q$-th quantile regression coefficients at an unknown position $k$, such that
$$y_i = \begin{cases} x_i^{\top}\beta_1 + \epsilon_i, & i = 1, \dots, k, \\ x_i^{\top}\beta_2 + \epsilon_i, & i = k+1, \dots, n, \end{cases} \qquad (3)$$
where $\beta_1$ and $\beta_2$ are the $q$-th conditional quantile regression coefficients for the first $k$ observations and the last $n-k$ observations, respectively. For simplicity, the scale parameter $\sigma$ in the ALD error distribution is set to 1. Our principal interest is the determination of the unknown change-point position $k$ and the other parameters from a Bayesian perspective. Denote $\theta = (\beta_1, \beta_2, k)$ and the observation vector $Y_{obs} = (y_1, \dots, y_n)$; then the likelihood function of $\theta$ based on $Y_{obs}$ is
$$L(\theta \mid Y_{obs}) = \prod_{i=1}^{k} q(1-q)\exp\big\{-\rho_q\big(y_i - x_i^{\top}\beta_1\big)\big\} \prod_{i=k+1}^{n} q(1-q)\exp\big\{-\rho_q\big(y_i - x_i^{\top}\beta_2\big)\big\}. \qquad (4)$$
By combining this likelihood with prior distributions, we can obtain the kernel of the posterior distribution, but the complexity of the ALD makes the posterior inference difficult. Therefore, Reed and Yu [32], Kozumi and Kobayashi [33], and Zhou et al. [34] represented the ALD$(0, \sigma, q)$ as a normal-exponential mixture, as stated in Proposition 1.
Proposition 1 (Mixture Representation of the Asymmetric Laplace Distribution). Let $z$ be an exponential variable with mean $\sigma$ and $u$ a standard normal variable. If a random variable $\epsilon$ follows the asymmetric Laplace distribution with density (2), then we can represent $\epsilon$ as a location-scale mixture of normals given by
$$\epsilon = \kappa_1 z + \kappa_2\sqrt{\sigma z}\, u,$$
where
$$\kappa_1 = \frac{1-2q}{q(1-q)}, \qquad \kappa_2^2 = \frac{2}{q(1-q)}.$$
Based on this representation, the quantile regression model with a single change-point (3) becomes
$$y_i = \begin{cases} x_i^{\top}\beta_1 + \kappa_1 z_i + \kappa_2\sqrt{z_i}\, u_i, & i = 1, \dots, k, \\ x_i^{\top}\beta_2 + \kappa_1 z_i + \kappa_2\sqrt{z_i}\, u_i, & i = k+1, \dots, n, \end{cases} \qquad (5)$$
where $z_i \sim \text{Exp}(1)$ and $u_i \sim N(0, 1)$ are mutually independent. By introducing the latent vector $Z = (z_1, \dots, z_n)$, the likelihood function based on the complete data $\{Y_{obs}, Z\}$ appears as
$$L(\theta \mid Y_{obs}, Z) \propto \prod_{i=1}^{k} \frac{1}{\sqrt{z_i}}\exp\left\{-\frac{\big(y_i - x_i^{\top}\beta_1 - \kappa_1 z_i\big)^2}{2\kappa_2^2 z_i} - z_i\right\} \prod_{i=k+1}^{n} \frac{1}{\sqrt{z_i}}\exp\left\{-\frac{\big(y_i - x_i^{\top}\beta_2 - \kappa_1 z_i\big)^2}{2\kappa_2^2 z_i} - z_i\right\}. \qquad (6)$$
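The mixture representation is easy to check by simulation. The following sketch, a minimal illustration assuming $\sigma = 1$, draws $\epsilon$ through the normal-exponential mixture and verifies that its empirical $q$-th quantile is approximately zero, as required of an ALD$(0, 1, q)$ error.

import numpy as np

q = 0.3
kappa1 = (1 - 2 * q) / (q * (1 - q))
kappa2 = np.sqrt(2 / (q * (1 - q)))

rng = np.random.default_rng(1)
z = rng.exponential(scale=1.0, size=200_000)   # z ~ Exp(1), i.e., sigma = 1
u = rng.standard_normal(200_000)               # u ~ N(0, 1)
eps = kappa1 * z + kappa2 * np.sqrt(z) * u     # location-scale mixture of normals

print(np.quantile(eps, q))  # approximately 0: the q-th quantile of ALD(0, 1, q)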
4. Bayesian Inferences and IBF Algorithm
4.1. The Prior and Conditional Distributions
To emphasize the influence of the data on the position of the change-point and the other parameters, we adopt a set of independent non-informative priors as follows:
$$\pi(\beta_1) \propto 1, \qquad \pi(\beta_2) \propto 1, \qquad \pi(k) = \frac{1}{n-1}, \quad k \in \{1, 2, \dots, n-1\}, \qquad (7)$$
where the prior of $k$ is a discrete uniform distribution on $\{1, 2, \dots, n-1\}$. The implementation of the IBF algorithm involves several conditional distributions, which are $\pi(\beta_1 \mid \beta_2, k, Y_{obs}, Z)$, $\pi(\beta_2 \mid \beta_1, k, Y_{obs}, Z)$, $\pi(k \mid Y_{obs}, Z)$, and $f(Z \mid \theta, Y_{obs})$. These distributions are all proportional to the joint conditional distribution of the parameters, that is,
$$\pi(\beta_1, \beta_2, k \mid Y_{obs}, Z) \propto L(\theta \mid Y_{obs}, Z)\, I(1 \le k \le n-1), \qquad (8)$$
with $I(\cdot)$ representing the indicator function. Denote
$$A_1 = \sum_{i=1}^{k} \frac{x_i x_i^{\top}}{\kappa_2^2 z_i} \quad \text{and} \quad a_1 = \sum_{i=1}^{k} \frac{(y_i - \kappa_1 z_i)\, x_i}{\kappa_2^2 z_i};$$
we have
$$\pi(\beta_1 \mid \beta_2, k, Y_{obs}, Z) \propto \exp\left\{-\frac{1}{2}\big(\beta_1 - A_1^{-1}a_1\big)^{\top} A_1 \big(\beta_1 - A_1^{-1}a_1\big)\right\};$$
consequently,
$$\beta_1 \mid \beta_2, k, Y_{obs}, Z \sim N_p\big(A_1^{-1}a_1,\ A_1^{-1}\big). \qquad (9)$$
Likewise, let
$$A_2 = \sum_{i=k+1}^{n} \frac{x_i x_i^{\top}}{\kappa_2^2 z_i} \quad \text{and} \quad a_2 = \sum_{i=k+1}^{n} \frac{(y_i - \kappa_1 z_i)\, x_i}{\kappa_2^2 z_i};$$
we have
$$\pi(\beta_2 \mid \beta_1, k, Y_{obs}, Z) \propto \exp\left\{-\frac{1}{2}\big(\beta_2 - A_2^{-1}a_2\big)^{\top} A_2 \big(\beta_2 - A_2^{-1}a_2\big)\right\};$$
consequently,
$$\beta_2 \mid \beta_1, k, Y_{obs}, Z \sim N_p\big(A_2^{-1}a_2,\ A_2^{-1}\big). \qquad (10)$$
Next, by integrating $\beta_1$ and $\beta_2$ out of the joint distribution in Equation (8), one can obtain the conditional distribution of the position of the change-point $k$ based on the complete data $\{Y_{obs}, Z\}$; that is, for $k = 1, \dots, n-1$,
$$\pi(k \mid Y_{obs}, Z) = \frac{p_k}{\sum_{j=1}^{n-1} p_j}, \qquad (11)$$
with
$$p_k = |A_1|^{-1/2}\,|A_2|^{-1/2}\exp\left\{\frac{1}{2}\Big(a_1^{\top}A_1^{-1}a_1 + a_2^{\top}A_2^{-1}a_2\Big)\right\},$$
where $A_1$, $a_1$, $A_2$, and $a_2$ are evaluated at the candidate value $k$. Finally, we present the conditional distribution of the latent vector $Z$, that is, $f(Z \mid \theta, Y_{obs})$. Explicitly, $z_1, \dots, z_n$ are mutually conditionally independent, and for $i = 1, \dots, k$,
$$z_i \mid \theta, Y_{obs} \sim \text{GIG}\left(\frac{1}{2},\ \frac{\big(y_i - x_i^{\top}\beta_1\big)^2}{\kappa_2^2},\ \frac{\kappa_1^2}{\kappa_2^2} + 2\right); \qquad (12)$$
for $i = k+1, \dots, n$,
$$z_i \mid \theta, Y_{obs} \sim \text{GIG}\left(\frac{1}{2},\ \frac{\big(y_i - x_i^{\top}\beta_2\big)^2}{\kappa_2^2},\ \frac{\kappa_1^2}{\kappa_2^2} + 2\right), \qquad (13)$$
where $\text{GIG}(\lambda, \chi, \psi)$ denotes the generalized inverse Gaussian distribution with density
$$f(x \mid \lambda, \chi, \psi) = \frac{(\psi/\chi)^{\lambda/2}}{2K_{\lambda}(\sqrt{\chi\psi})}\, x^{\lambda-1}\exp\left\{-\frac{1}{2}\left(\frac{\chi}{x} + \psi x\right)\right\}, \quad x > 0,$$
and $K_{\lambda}(\cdot)$ presents the modified Bessel function of the third kind in Barndorff-Nielsen and Shephard [42].
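Sampling the latent variables from (12) and (13) is routine with a GIG generator. Below is a minimal sketch using scipy.stats.geninvgauss, whose density is proportional to x^(p-1) exp(-b(x + 1/x)/2), so that GIG(lambda, chi, psi) corresponds to p = lambda, b = sqrt(chi*psi), and scale sqrt(chi/psi); the function and argument names are our illustrative choices, not code from the paper.

import numpy as np
from scipy.stats import geninvgauss

def sample_z(y, X, beta1, beta2, k, q, rng):
    """Draw the latent vector Z from the GIG conditionals (12)-(13)."""
    kappa1 = (1 - 2 * q) / (q * (1 - q))
    kappa2_sq = 2 / (q * (1 - q))
    # Residuals under the regime in force before/after the change-point.
    resid = np.where(np.arange(len(y)) < k, y - X @ beta1, y - X @ beta2)
    chi = np.maximum(resid ** 2 / kappa2_sq, 1e-12)  # floor avoids b = 0 when a residual is 0
    psi = kappa1 ** 2 / kappa2_sq + 2.0              # common to all observations
    b = np.sqrt(chi * psi)
    scale = np.sqrt(chi / psi)
    return geninvgauss.rvs(p=0.5, b=b, scale=scale, random_state=rng)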
4.2. EM Algorithm to Compute the Posterior Modes
To implement the IBF sampler, we first use the EM algorithm to find the observed posterior mode $\hat\theta$ of $\pi(\theta \mid Y_{obs})$ and obtain the optimal importance sampling density $f(Z \mid \hat\theta, Y_{obs})$. The log-posterior based on the complete data $\{Y_{obs}, Z\}$ is
$$\ell(\theta \mid Y_{obs}, Z) = \text{const} - \sum_{i=1}^{k}\frac{\big(y_i - x_i^{\top}\beta_1 - \kappa_1 z_i\big)^2}{2\kappa_2^2 z_i} - \sum_{i=k+1}^{n}\frac{\big(y_i - x_i^{\top}\beta_2 - \kappa_1 z_i\big)^2}{2\kappa_2^2 z_i}.$$
Based on Zhou et al. [34] and Tian et al. [35], we get
$$E\big(z_i^{-1} \mid \theta, Y_{obs}\big) = \sqrt{\psi/\chi_i}, \qquad \chi_i = \frac{\big(y_i - x_i^{\top}\beta_r\big)^2}{\kappa_2^2}, \quad \psi = \frac{\kappa_1^2}{\kappa_2^2} + 2,$$
where $r = 1$ for $i \le k$ and $r = 2$ for $i > k$, as in (12) and (13). Suppose the initial value of $\theta$ is $\theta^{(0)}$ and the $t$-th iteration value is $\theta^{(t)}$; the EM algorithm at the $(t+1)$-th iteration involves two steps:
E step: Omitting the terms without relation to $\theta$, the $Q$ function in the E step is the conditional expectation of $\ell(\theta \mid Y_{obs}, Z)$ with respect to the conditional distribution $f(Z \mid \theta^{(t)}, Y_{obs})$, which is presented as
$$Q\big(\theta \mid \theta^{(t)}\big) = -\sum_{i=1}^{k}\frac{w_i^{(t)}\big(y_i - x_i^{\top}\beta_1\big)^2 + 2\kappa_1 x_i^{\top}\beta_1}{2\kappa_2^2} - \sum_{i=k+1}^{n}\frac{w_i^{(t)}\big(y_i - x_i^{\top}\beta_2\big)^2 + 2\kappa_1 x_i^{\top}\beta_2}{2\kappa_2^2},$$
with
$$w_i^{(t)} = E\big(z_i^{-1} \mid \theta^{(t)}, Y_{obs}\big) = \sqrt{\psi/\chi_i^{(t)}}.$$
M step: We maximize the $Q$ function with respect to $\theta$. For fixed $k$, let
$$X_1 = (x_1, \dots, x_k)^{\top}, \quad y_{(1)} = (y_1, \dots, y_k)^{\top}, \quad W_1 = \mathrm{diag}\big(w_1^{(t)}, \dots, w_k^{(t)}\big),$$
and
$$X_2 = (x_{k+1}, \dots, x_n)^{\top}, \quad y_{(2)} = (y_{k+1}, \dots, y_n)^{\top}, \quad W_2 = \mathrm{diag}\big(w_{k+1}^{(t)}, \dots, w_n^{(t)}\big);$$
then we get
$$\beta_1^{(t+1)} = \big(X_1^{\top}W_1 X_1\big)^{-1}X_1^{\top}\big(W_1 y_{(1)} - \kappa_1 \mathbf{1}_k\big), \qquad \beta_2^{(t+1)} = \big(X_2^{\top}W_2 X_2\big)^{-1}X_2^{\top}\big(W_2 y_{(2)} - \kappa_1 \mathbf{1}_{n-k}\big),$$
with $\mathbf{1}_k$ indicating the $k$-dimensional vector with each element equal to 1. By denoting the convergence values for fixed $k$ as $\hat\beta_1(k)$ and $\hat\beta_2(k)$, we can obtain
$$\hat{k} = \arg\max_{1 \le k \le n-1}\left\{-\sum_{i=1}^{k}\rho_q\big(y_i - x_i^{\top}\hat\beta_1(k)\big) - \sum_{i=k+1}^{n}\rho_q\big(y_i - x_i^{\top}\hat\beta_2(k)\big)\right\},$$
with the bracketed quantity being the observed-data log-posterior of $\big(\hat\beta_1(k), \hat\beta_2(k), k\big)$ up to an additive constant; then we use the observations $\{(x_i, y_i)\}_{i=1}^{\hat{k}}$ and $\{(x_i, y_i)\}_{i=\hat{k}+1}^{n}$ to obtain the posterior modes $\hat\beta_1$ and $\hat\beta_2$ via another run of the EM algorithm, respectively. After convergence, we get $\hat\theta = (\hat\beta_1, \hat\beta_2, \hat{k})$, the mode of the posterior distribution $\pi(\theta \mid Y_{obs})$.
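For a fixed change-point position k, the E and M steps above reduce to iteratively reweighted least squares on each segment. The following sketch (our illustration, not the authors' implementation) carries out these iterations; an outer profile step would then evaluate the observed-data log-posterior at each candidate k and keep the maximizer.

import numpy as np

def em_fixed_k(X, y, k, q, n_iter=100, eps=1e-12):
    """EM iterations for (beta1, beta2) with the change-point position k held fixed."""
    kappa1 = (1 - 2 * q) / (q * (1 - q))
    kappa2_sq = 2 / (q * (1 - q))
    psi = kappa1 ** 2 / kappa2_sq + 2.0

    def update(Xs, ys, b):
        chi = np.maximum((ys - Xs @ b) ** 2 / kappa2_sq, eps)
        w = np.sqrt(psi / chi)                           # E step: w_i = E(1/z_i | theta, Y_obs)
        A = Xs.T @ (w[:, None] * Xs)                     # M step: weighted least squares,
        return np.linalg.solve(A, Xs.T @ (w * ys - kappa1))  # beta = (X'WX)^{-1} X'(Wy - kappa1*1)

    # Initialize each segment with ordinary least squares.
    beta1 = np.linalg.lstsq(X[:k], y[:k], rcond=None)[0]
    beta2 = np.linalg.lstsq(X[k:], y[k:], rcond=None)[0]
    for _ in range(n_iter):
        beta1 = update(X[:k], y[:k], beta1)
        beta2 = update(X[k:], y[k:], beta2)
    return beta1, beta2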
4.3. IBF Sampler
Based on the following expression,
$$\pi(\theta \mid Y_{obs}) = \int \pi(\theta \mid Y_{obs}, Z)\, f(Z \mid Y_{obs})\, dZ,$$
in order to get samples from $\pi(\theta \mid Y_{obs})$, we first need to draw samples from $f(Z \mid Y_{obs})$. The sampling IBF proposed by Tan et al. [20,21] shows that
$$f(Z \mid Y_{obs}) \propto \frac{f(Z \mid \hat\theta, Y_{obs})}{\pi(\hat\theta \mid Y_{obs}, Z)}, \qquad (14)$$
where $f(Z \mid \hat\theta, Y_{obs})$ is an importance sampling density, or proposal density, in the sampling/importance resampling algorithm, with $\hat\theta$ denoting the posterior mode of $\pi(\theta \mid Y_{obs})$ obtained by the EM algorithm in Section 4.2. The IBF sampling algorithm follows.
Step 1. Based on (12), for $i = 1, \dots, \hat{k}$, draw i.i.d. samples
$$z_i^{(j)} \sim \text{GIG}\left(\frac{1}{2},\ \frac{\big(y_i - x_i^{\top}\hat\beta_1\big)^2}{\kappa_2^2},\ \frac{\kappa_1^2}{\kappa_2^2} + 2\right), \quad j = 1, \dots, J,$$
and based on (13), for $i = \hat{k}+1, \dots, n$, draw i.i.d. samples
$$z_i^{(j)} \sim \text{GIG}\left(\frac{1}{2},\ \frac{\big(y_i - x_i^{\top}\hat\beta_2\big)^2}{\kappa_2^2},\ \frac{\kappa_1^2}{\kappa_2^2} + 2\right), \quad j = 1, \dots, J.$$
Denote $Z^{(j)} = \big(z_1^{(j)}, \dots, z_n^{(j)}\big)$; then $Z^{(1)}, \dots, Z^{(J)}$ are the samples from $f(Z \mid \hat\theta, Y_{obs})$.
Step 2. Calculate the weights
$$\omega_j = \frac{1/\pi\big(\hat\theta \mid Y_{obs}, Z^{(j)}\big)}{\sum_{l=1}^{J} 1/\pi\big(\hat\theta \mid Y_{obs}, Z^{(l)}\big)}, \quad j = 1, \dots, J,$$
where $\pi\big(\hat\theta \mid Y_{obs}, Z^{(j)}\big)$ is obtained from (9), (10) and (11) with $Z$ replaced by $Z^{(j)}$.
Step 3. Choose a subset from $\{Z^{(j)}\}_{j=1}^{J}$ via resampling without replacement from the discrete distribution on $\{Z^{(j)}\}_{j=1}^{J}$ with probabilities $\{\omega_j\}_{j=1}^{J}$, to obtain an i.i.d. sample of size $L$ ($L < J$) approximately from $f(Z \mid Y_{obs})$, denoted by $\{Z^{(l)},\ l = 1, \dots, L\}$.
Step 4. For $l = 1, \dots, L$, let $A_1, a_1, A_2, a_2$ be evaluated at $Z^{(l)}$, and sample $k^{(l)}$ from $\{1, \dots, n-1\}$ with probabilities based on (11).
Step 5. For $l = 1, \dots, L$, based on (9), generate $\beta_1^{(l)} \sim N_p\big(A_1^{-1}a_1, A_1^{-1}\big)$, and based on (10), generate $\beta_2^{(l)} \sim N_p\big(A_2^{-1}a_2, A_2^{-1}\big)$, where $A_1, a_1, A_2, a_2$ are evaluated at $\big(Z^{(l)}, k^{(l)}\big)$. Denote $\theta^{(l)} = \big(\beta_1^{(l)}, \beta_2^{(l)}, k^{(l)}\big)$; then $\theta^{(1)}, \dots, \theta^{(L)}$ are i.i.d. samples approximately from $\pi(\theta \mid Y_{obs})$, which can be used for statistical inference immediately without convergence diagnosis.
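Steps 2 and 3 can be implemented in a few lines once the log value of $\pi(\hat\theta \mid Y_{obs}, Z^{(j)})$ is available from (9)-(11). A minimal sketch, with illustrative names and log-scale weights for numerical stability:

import numpy as np

def sir_resample(log_post_at_mode, L, rng):
    """Steps 2-3 of the IBF sampler: inverse-probability weights plus resampling.

    log_post_at_mode[j] = log pi(theta_hat | Y_obs, Z^(j)), computed from the
    conditional densities (9)-(11) for the j-th proposal draw Z^(j).
    Returns the indices of the L retained draws.
    """
    log_w = -log_post_at_mode        # omega_j proportional to 1 / pi(theta_hat | Y_obs, Z^(j))
    log_w -= log_w.max()             # stabilize before exponentiating
    w = np.exp(log_w)
    w /= w.sum()
    # Resample L < J indices without replacement with probabilities w.
    return rng.choice(len(w), size=L, replace=False, p=w)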
5. Simulation Studies
In this section, we conduct simulations to investigate the performance of the proposed approach for detecting the position of the change-point, comparing it with the normal mean change-point detection method proposed by Chen [1]. We generate $n = 200$ observations with a change-point at position $k$ from the following model:
$$y_i = \begin{cases} x_i^{\top}\beta_1 + \epsilon_i, & i = 1, \dots, k, \\ x_i^{\top}\beta_2 + \epsilon_i, & i = k+1, \dots, n, \end{cases}$$
where $\beta_1$ and $\beta_2$ are the regression coefficients before and after the change-point and $x_i$ is the covariate vector, for $i = 1, \dots, n$. Here $k$ is set as 40, 60, 80, 100, 120, 140, and 160, respectively, and the following four error distributions are considered (a data-generating sketch is given after the list):
(1) normal distribution;
(2) Laplace distribution;
(3) $t$ distribution with three degrees of freedom;
(4) Cauchy distribution.
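For concreteness, a data-generating sketch of this design follows; since the specific coefficient values and covariate distribution of the original experiment are not reproduced here, the design matrix and the values of beta1 and beta2 below are hypothetical placeholders.

import numpy as np

def generate_data(n, k, beta1, beta2, error, rng):
    """Generate n observations with a regression coefficient change at position k.

    'error' is one of the four cases: "normal", "laplace", "t3", "cauchy".
    """
    X = np.column_stack([np.ones(n), rng.normal(size=n)])  # hypothetical design
    eps = {"normal": rng.standard_normal(n),
           "laplace": rng.laplace(size=n),
           "t3": rng.standard_t(df=3, size=n),
           "cauchy": rng.standard_cauchy(n)}[error]
    beta = np.where(np.arange(n)[:, None] < k, beta1, beta2)  # per-observation coefficients
    return X, np.sum(X * beta, axis=1) + eps

# e.g., n = 200 observations, change-point at k = 60, t_3 errors:
rng = np.random.default_rng(2024)
X, y = generate_data(200, 60, beta1=np.array([1.0, 1.0]),
                     beta2=np.array([1.0, 3.0]), error="t3", rng=rng)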
In these distributions, cases 2–4 are heavy-tailed and often used to describe outlier situations; in particular, case 4, the Cauchy distribution, is extremely heavy-tailed. The quantile levels used to estimate the position of the change-point are set as 0.25, 0.5, and 0.75, and the experiment is replicated 500 times. In each replication, firstly, an initial position $k^{(0)}$ and the related least squares estimators $\beta_1^{(0)}$ and $\beta_2^{(0)}$, based on the first 100 observations and the last 100 observations respectively, are used as initial values to implement the EM algorithm, with the aim of obtaining the posterior mode $\hat\theta$, which is then used in the IBF sampling to construct the importance sampling density $f(Z \mid \hat\theta, Y_{obs})$. Secondly, $J$ samples $\{Z^{(j)}\}_{j=1}^{J}$ are generated from $f(Z \mid \hat\theta, Y_{obs})$ and a subset of size $L$ is selected by the SIR algorithm; finally, $L$ i.i.d. posterior samples $\{\theta^{(l)}\}_{l=1}^{L}$ are generated approximately from $\pi(\theta \mid Y_{obs})$. The estimate of the change-point position $k$ in the $i$-th replication is denoted as $\hat{k}_i$, which is the mean of the $L$ posterior samples of $k$, and the final estimate of $k$ is calculated as $\hat{k} = \frac{1}{500}\sum_{i=1}^{500}\hat{k}_i$. For comparison purposes, the absolute deviance (AD), the root mean square error (RMSE), and the mean absolute deviance (MAD) of the estimated change-point position are given as
$$\text{AD} = \big|\hat{k} - k\big|, \qquad \text{RMSE} = \sqrt{\frac{1}{500}\sum_{i=1}^{500}\big(\hat{k}_i - k\big)^2}, \qquad \text{MAD} = \frac{1}{500}\sum_{i=1}^{500}\big|\hat{k}_i - k\big|,$$
and can be computed as in the sketch below. The results based on the normal method proposed by Chen [1] are also recorded.
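These three criteria are straightforward to compute from the 500 replicated estimates (a minimal sketch):

import numpy as np

def change_point_metrics(k_hat, k_true):
    """AD, RMSE and MAD over replications; k_hat holds the per-replication estimates."""
    k_hat = np.asarray(k_hat, dtype=float)
    ad = abs(k_hat.mean() - k_true)                 # absolute deviance of the final estimate
    rmse = np.sqrt(np.mean((k_hat - k_true) ** 2))  # root mean square error
    mad = np.mean(np.abs(k_hat - k_true))           # mean absolute deviance
    return ad, rmse, mad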
Table 1 presents the results. It is not surprising that, when the errors come from the normal distribution, the estimators based on the normal model are more accurate than those obtained via the quantile regression method, with smaller AD, MAD, and RMSE. However, in the heavy-tailed Laplace, $t_3$, and Cauchy cases, the estimators based on the quantile regression approach with the IBF sampler are overwhelmingly superior to those based on the normal method. Specifically, in the extremely heavy-tailed Cauchy situation, the normal method simply returns values around 100 as the estimated position of the change-point whether the true $k$ is small or large; on the contrary, the performance of the quantile regression method is much better. Clearly, the quantile regression method based on the IBF sampler is more robust than the usual normal method for estimating the position of a regression coefficient change-point when the data come from heavy-tailed distributions.
6. American Stock Market Data Analysis
Holbert [43] conducted a Bayesian analysis of a switching linear model and analyzed stock exchange data from the United States, which consist of $n = 35$ monthly observations from January 1967 to November 1969. The response variable $Y$ is the monthly dollar volume of sales (in millions) on the Boston Stock Exchange (BSE), and the regressor $X$ is the combined New York American Stock Exchange (NYAMSE) volume. The scatter plot shows that a suspicious change-point may exist at some time position, so a regression model with a change-point may fit these data better than a simple linear regression model. Chen [1] analyzed these data using the normal change-point model and detected a possible change-point at position 23, with minimal SIC = 358.18; therefore, Chen [1] believed that the regression model change starts at time point 24, December 1968. Osorio and Galea [8] studied these data based on the Student-t model with the SIC technique, finding that the detected position depends on the assumed degrees of freedom and coincides with Chen's when the degrees of freedom are bigger than 4; Osorio and Galea [8] finally drew the same conclusion as Chen [1], namely that the regression model change starts at time point 24. Yang [12] detected change-point position 9 based on the Laplace regression model, with minimal SIC = 353.83. Carefully examining the scatter plot, it can be seen that point 22 at the top right is far away from the other points, and its volume is much bigger than that of the others; thus, point 22 appears to be an outlier. We consider the following robust QR change-point model:
$$y_i = \begin{cases} \beta_{10} + \beta_{11} x_i + \epsilon_i, & i = 1, \dots, k, \\ \beta_{20} + \beta_{21} x_i + \epsilon_i, & i = k+1, \dots, 35, \end{cases}$$
where the error terms have zero $q$-th quantile. Using the above QR change-point model with 3000 generated IBF samples, we find that the change-point possibly occurs at position 9 for quantile levels $q$ inside the interval [0.45, 0.78], and at position 23 for $q$ outside this interval.
Figure 1 presents the scatter plots and estimated regression lines based on the normal model and on the QR change-point model with five quantile levels. Figure 1a presents the result of the normal model with detected change-point position 23; clearly, point 22 has a strong effect on the estimation, as it increases the slope of the fitted black line by dragging the line towards itself. Figure 1b,c,f show the estimated quantile regression lines at two relatively low quantile levels and one high quantile level, respectively; the estimated change-points are all 23, but the fitted regression lines cannot effectively reflect the overall dependence of BSE on NYAMSE. Figure 1d,e give the estimated regression lines at the median and at another quantile level inside the interval [0.45, 0.78], both with change-point 9; these fits are robust to outlier 22 and reasonable in describing the relationship between the response and the regressor. Therefore, we consider time point 9 as the position of the change-point and believe that the regression model change starts at time point 10, which is October 1967. The two quantile regression models with change-point 9 are both reasonable. Taking Figure 1e as an example, the estimated regression line for observations 1–9 is drawn as a black solid line, and the one for observations 10–35 as a red dashed line. This example illustrates that the quantile regression model with the IBF sampler effectively detects the position of a possible change-point. As a by-product, the density histograms of the posterior samples generated by the IBF sampler from the observed posterior distributions of the parameters are presented in Figure 2, which shows graphically how the parameters are distributed given the observed data.