Article

Estimating Distributions of Parameters in Nonlinear State Space Models with Replica Exchange Particle Marginal Metropolis–Hastings Method

1 Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
2 Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan
3 Komaba Institute for Science, The University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan
4 Organization for Advanced and Integrated Research, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
5 Center for Mathematical and Data Sciences, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
* Author to whom correspondence should be addressed.
Submission received: 25 November 2021 / Revised: 29 December 2021 / Accepted: 7 January 2022 / Published: 12 January 2022

Abstract

Extracting latent nonlinear dynamics from observed time-series data is important for understanding the dynamic system underlying the observed data. A state space model is a probabilistic graphical model for time-series data, which describes the probabilistic dependence between latent variables at subsequent times and between latent variables and observations. Since, in many situations, the values of the parameters in a state space model are unknown, estimating the parameters from observations is an important task. The particle marginal Metropolis–Hastings (PMMH) method estimates the marginal posterior distribution of the parameters, obtained by marginalization over the distribution of latent variables in the state space model. Although, in principle, the marginal posterior distribution of the parameters can be estimated by iterating this method indefinitely, in practice the result estimated with a finite number of iterations depends on the initial values. In this paper, we propose a replica exchange particle marginal Metropolis–Hastings (REPMMH) method that mitigates this problem by combining the PMMH method with the replica exchange method. The proposed method simultaneously realizes a global search at high temperatures and a fine local search at low temperatures. We evaluate the proposed method using simulated data obtained from the Izhikevich neuron model and a Lévy-driven stochastic volatility model, and we show that the proposed REPMMH method mitigates the initial value dependence of the PMMH method and realizes efficient sampling of the parameters of state space models compared with existing methods.

1. Introduction

Extracting latent nonlinear dynamics from observed time-series data is important for understanding the dynamic system underlying the observed data. A state space model is a probabilistic graphical model for time-series data that assumes the existence of latent variables that cannot be observed directly [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]. State space models are used in various fields to forecast observation values [7,15,22] and to estimate latent variables [11,20,21,26]. In many cases, however, the model parameters are unknown. Therefore, estimating the model parameters from observations is an important task for state space models.
To estimate the parameters of state space models from observations, a method combining the sequential Monte Carlo (SMC) method [2,4,8,9,10,11,12,14,16,17,18,19,20,23,24,25,27] with the expectation–maximization (EM) algorithm [8,28,29] has been proposed [9,10,11,14,20]. This method is based on the maximum likelihood estimation framework and estimates the parameters by updating them sequentially so that the likelihood increases. Although iterative updating is guaranteed to reach a local optimum, the global optimum may not be found, depending on the initial values of the parameters. Furthermore, since the EM algorithm is a point estimation method, it is impossible to tell whether the converged values are local or global optima.
To estimate the distribution of the parameters of a state space model, two kinds of particle Markov chain Monte Carlo (PMCMC) methods have been proposed: the particle Gibbs (PG) method and the particle marginal Metropolis–Hastings (PMMH) method [12]. Both combine Markov chain Monte Carlo (MCMC) methods with the SMC method and estimate the distribution of the parameters by collecting samples. The PG method combines the SMC method with Gibbs sampling [8,30] and alternately samples the latent variables and the parameters of a state space model from their joint posterior distribution; the SMC method is employed for sampling the latent variables. The PMMH method, on the other hand, combines the SMC method with the Metropolis–Hastings (MH) algorithm [8,25,31,32] and samples the parameters of a state space model from the marginal posterior distribution of the parameters; the SMC method is employed for calculating the likelihood marginalized over the distribution of the latent variables. Both methods have been widely applied (for example, the PG method in [33,34] and the PMMH method in [35,36,37]).
In recent years, extended versions of the PG method have been proposed to improve the sampling efficiency and the initial value dependence, such as the particle Gibbs with ancestor sampling (PGAS) method and the replica exchange particle Gibbs with ancestor sampling (REPGAS) method [16,18,19,24]. Thus far, however, such improvements have been proposed only for the PG method. Since the PMMH method also suffers from initial value dependence, improving it is important for the accurate estimation of parameters.
In this paper, we propose the replica exchange particle marginal Metropolis–Hastings (REPMMH) method, which combines the PMMH method with the replica exchange method [24,38,39,40] in order to mitigate the initial value dependence of the PMMH method. Combining the replica exchange method with the PMMH method makes it possible to estimate the parameters governing the dynamics of highly complex, nonlinear time-series data. We first describe state space models and explain the PMMH method as a conventional method. Then, after explaining the proposed method, we conduct experiments to compare it with the conventional methods, the PMMH method and the REPGAS method.

2. Methods

In this section, we propose our replica exchange particle marginal Metropolis–Hastings (REPMMH) method. First, we describe a state space model for time-series data using a probabilistic graphical model. Next, we describe the conventional particle marginal Metropolis–Hastings (PMMH) method, which estimates the marginal posterior distribution of the parameters obtained by marginalization over the distribution of latent variables in a state space model. We then propose the REPMMH method, which combines the PMMH method with the replica exchange method to mitigate the initial value dependence of the PMMH method.

2.1. State Space Model

We show a state space model as a probabilistic graphical model in Figure 1. There are two types of variables in the state space model: latent variables $z_{1:N} = \{z_1, z_2, \ldots, z_N\}$ and observations $y_{1:N} = \{y_1, y_2, \ldots, y_N\}$ at time steps $1, 2, \ldots, N$. The latent variables $z_{1:N}$ cannot be observed directly; only the observations $y_{1:N}$ are observable. At time step $n$, the state space model is represented as follows:

$$z_n \sim f(z_n \mid z_{n-1}, \theta_f), \tag{1}$$
$$y_n \sim g(y_n \mid z_n, \theta_g), \tag{2}$$

where $f(z_n \mid z_{n-1}, \theta_f)$ and $g(y_n \mid z_n, \theta_g)$ are called the system model and the observation model, respectively. $\theta_f$ are the parameters of the system model, and $\theta_g$ are the parameters of the observation model. The system model $f(z_n \mid z_{n-1}, \theta_f)$ represents the process of updating the latent variables $z_n$ from the previous latent variables $z_{n-1}$. Moreover, the observation model $g(y_n \mid z_n, \theta_g)$ represents the process of obtaining the observations $y_n$ from the latent variables $z_n$.
The goal of this paper is to estimate the posterior distribution of the parameters $p(\Theta \mid y_{1:N})$ for given observations $y_{1:N}$, where $\Theta = \{\theta_f, \theta_g\}$. However, since latent variables exist in state space models, marginalization with respect to the latent variables is necessary to obtain the marginal posterior distribution $p(\Theta \mid y_{1:N})$. Because it is often difficult to calculate this distribution analytically, in this paper we propose a new method for estimating it based on the PMMH method.

2.2. Particle Marginal Metropolis–Hastings Method

The PMMH method combines the sequential Monte Carlo (SMC) method [2,4,8,9,10,11,12,14,16,17,18,19,20,23,24,25,27] with the Metropolis–Hastings (MH) algorithm [8,25,31,32]. It was proposed to estimate the marginal posterior distribution of the parameters $p(\Theta \mid y_{1:N})$ for time-series observations $y_{1:N}$ represented by a state space model [12].
In the PMMH method, the marginal likelihood $p(y_{1:N} \mid \Theta)$ is used to evaluate the appropriateness of the parameters $\Theta$. Here, the SMC method is used to calculate the marginal likelihood $p(y_{1:N} \mid \Theta)$ of the parameters $\Theta$, obtained by marginalization over the distribution of the latent variables $z_{1:N}$. A new sample candidate of parameters $\Theta^* = \{\theta_f^*, \theta_g^*\}$ is proposed from an arbitrary proposal distribution $q(\Theta \mid \Theta^{(k-1)})$ given the previous sample $\Theta^{(k-1)}$, where $k$ is the sample number. Whether to accept or reject the sample candidate $\Theta^*$ is determined based on the following acceptance probability:

$$p_{\mathrm{accept}} = \min\left\{1,\ \frac{p(y_{1:N} \mid \Theta^*)\, p(\Theta^*)}{p(y_{1:N} \mid \Theta^{(k-1)})\, p(\Theta^{(k-1)})}\, \frac{q(\Theta^{(k-1)} \mid \Theta^*)}{q(\Theta^* \mid \Theta^{(k-1)})}\right\}, \tag{3}$$

where $p(\Theta)$ represents the prior distribution of the parameters. The marginal likelihood $p(y_{1:N} \mid \Theta)$ is obtained by marginalization over the distribution of the latent variables $z_{1:N}$ as follows:

$$p(y_{1:N} \mid \Theta) = \int p(y_{1:N}, z_{1:N} \mid \Theta)\, dz_{1:N} = \int g(y_1 \mid z_1, \theta_g)\, p(z_1) \prod_{n=2}^{N} g(y_n \mid z_n, \theta_g)\, f(z_n \mid z_{n-1}, \theta_f)\, dz_{1:N}, \tag{4}$$

where $p(z_1)$ is the distribution of the latent variables $z_1$ at time step 1. Since it is difficult to obtain the marginal likelihood $p(y_{1:N} \mid \Theta)$ analytically, the SMC method is used in the PMMH method to calculate it numerically.
The SMC method estimates the distribution of the latent variables by approximating it with the density of the particles $z_{1:N}^1, z_{1:N}^2, \ldots, z_{1:N}^M$ as follows:

$$p(z_{1:N} \mid y_{1:N}, \Theta) \approx \frac{1}{M} \sum_{m=1}^{M} \delta\!\left(z_{1:N} - z_{1:N}^m\right), \tag{5}$$

where $z_{1:N}^m$ is the $m$-th particle, $M$ is the number of particles, and $\delta(z_{1:N})$ is the Dirac delta distribution.
To obtain the particles $z_n^1, z_n^2, \ldots, z_n^M$ at time step $n$, we sample the $m$-th particle $z_n^m$ from the $m$-th particle $z_{n-1}^m$ at the previous time step $n-1$, for each $m \in \{1, 2, \ldots, M\}$, with the system model as follows:

$$z_n^m \sim f(z_n \mid z_{n-1}^m, \theta_f). \tag{6}$$

Moreover, the obtained particles $z_n^1, z_n^2, \ldots, z_n^M$ are resampled based on the normalized weights $W_n^1, W_n^2, \ldots, W_n^M$ obtained as follows:

$$W_n^m = \frac{w_n^m}{\sum_{l=1}^{M} w_n^l}, \tag{7}$$
$$w_n^m = g(y_n \mid z_n^m, \theta_g). \tag{8}$$

By iterating this procedure for time steps $n \in \{1, 2, \ldots, N\}$, particles that approximate the distribution of the latent variables $z_{1:N}$ are obtained. Here, the marginal likelihood $p(y_{1:N} \mid \Theta)$ can be calculated approximately as follows:

$$p(y_{1:N} \mid \Theta) = \prod_{n=1}^{N} p(y_n \mid y_{1:n-1}, \Theta) \approx \prod_{n=1}^{N} \frac{1}{M} \sum_{m=1}^{M} w_n^m. \tag{9}$$
By calculating the acceptance probability $p_{\mathrm{accept}}$ in Equation (3) with the marginal likelihood $p(y_{1:N} \mid \Theta^*)$ of the sample candidate $\Theta^*$ obtained by Equation (9), it is determined whether to accept or reject the proposed candidate $\Theta^*$. We show the flow of the PMMH method described above in Algorithm 1.
Algorithm 1 Particle Marginal Metropolis–Hastings (PMMH) Method.
1: initialize the parameters $\Theta^{(0)}$
2: for $k = 1, \ldots, K$ ($K$ is the number of samples) do
3:    draw the sample candidate of parameters $\Theta^* \sim q(\Theta^* \mid \Theta^{(k-1)})$
4:    draw the initial particles $z_1^m \sim p(z_1)$ for $m = 1, \ldots, M$ ($m$ is the particle index)
5:    calculate the weights of the particles $w_1^1, w_1^2, \ldots, w_1^M$ with Equation (8)
6:    normalize the weights of the particles $W_1^1, W_1^2, \ldots, W_1^M$ with Equation (7)
7:    resample the particles $z_1^1, z_1^2, \ldots, z_1^M$ according to the normalized weights $W_1^1, W_1^2, \ldots, W_1^M$
8:    for $n = 2, \ldots, N$ do
9:       draw the particles $z_n^1, z_n^2, \ldots, z_n^M$ at time step $n$ with Equation (6)
10:      calculate the weights of the particles $w_n^1, w_n^2, \ldots, w_n^M$ with Equation (8)
11:      normalize the weights of the particles $W_n^1, W_n^2, \ldots, W_n^M$ with Equation (7)
12:      resample the particles $z_n^1, z_n^2, \ldots, z_n^M$ according to the normalized weights $W_n^1, W_n^2, \ldots, W_n^M$
13:   end for
14:   calculate the marginal likelihood $p(y_{1:N} \mid \Theta^*)$ with Equation (9)
15:   calculate the acceptance probability $p_{\mathrm{accept}}$ with Equation (3)
16:   draw a uniform random number $\alpha \sim U(0, 1)$ ($U(a, b)$ is a uniform distribution on $[a, b)$)
17:   if $\alpha \le p_{\mathrm{accept}}$ then
18:      set the sample of parameters $\Theta^{(k)} \leftarrow \Theta^*$
19:   else
20:      set the sample of parameters $\Theta^{(k)} \leftarrow \Theta^{(k-1)}$
21:   end if
22: end for
23: return $\{\Theta^{(k)}\}_{k=1}^{K}$
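As a concrete illustration, the following Python sketch implements Algorithm 1 under stated assumptions: the system-model sampler f_sample, the observation log-density g_logpdf, the initial-state sampler z1_sample, the proposal propose and its log-density log_q are hypothetical user-supplied callables that are not specified in the paper, and the weights are handled in log space for numerical stability. The optional temperature T anticipates the tempered chains of Section 2.3 and reduces to the plain acceptance probability of Equation (3) when T = 1.0.

```python
import numpy as np

rng = np.random.default_rng(0)

def smc_log_marginal_likelihood(y, theta, f_sample, g_logpdf, z1_sample, M=50):
    """Bootstrap particle filter estimate of log p(y_{1:N} | Theta), Eq. (9).

    f_sample(z_prev, theta, rng) draws z_n for each particle (Eq. (6)),
    g_logpdf(y_n, z, theta) returns log g(y_n | z_n, theta_g) per particle (Eq. (8)),
    z1_sample(M, rng) draws an array of M initial particles from p(z_1).
    All three are user-supplied callables; they are not part of the paper.
    """
    z = z1_sample(M, rng)
    log_ml = 0.0
    for n in range(len(y)):
        if n > 0:
            z = f_sample(z, theta, rng)              # prediction, Eq. (6)
        log_w = g_logpdf(y[n], z, theta)             # unnormalized weights, Eq. (8)
        c = log_w.max()                              # log-sum-exp for stability
        w = np.exp(log_w - c)
        log_ml += c + np.log(w.mean())               # one factor of Eq. (9)
        z = z[rng.choice(M, size=M, p=w / w.sum())]  # resampling with Eq. (7)
    return log_ml

def pmmh(y, theta0, log_prior, propose, log_q, f_sample, g_logpdf, z1_sample,
         K=10_000, M=50, T=1.0):
    """Algorithm 1; T = 1.0 gives the acceptance probability of Eq. (3).
    T > 1.0 tempers the marginal likelihood, i.e., one replica of the REPMMH method."""
    theta = theta0
    log_ml = smc_log_marginal_likelihood(y, theta, f_sample, g_logpdf, z1_sample, M)
    samples = []
    for _ in range(K):
        theta_star = propose(theta, rng)
        log_ml_star = smc_log_marginal_likelihood(y, theta_star, f_sample,
                                                  g_logpdf, z1_sample, M)
        # log_q(a, b) is assumed to return log q(a | b)
        log_accept = ((log_ml_star - log_ml) / T
                      + log_prior(theta_star) - log_prior(theta)
                      + log_q(theta, theta_star) - log_q(theta_star, theta))
        if np.log(rng.uniform()) <= min(0.0, log_accept):
            theta, log_ml = theta_star, log_ml_star
        samples.append(theta)
    return samples
```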

2.3. Proposed Method

In our study, we propose the REPMMH method, which combines the PMMH method with the replica exchange method [24,38,39,40] to mitigate the initial value dependence of the PMMH method. By employing the REPMMH method, we estimate the marginal posterior distribution of the parameters from the time-series observations.

2.3.1. Brief Summary of Our Proposed Method

We show the schematic diagram of the REPMMH method in Figure 2. In the proposed REPMMH method, we introduce multiple replicas of the parameters $\Theta = \{\Theta_1, \Theta_2, \ldots, \Theta_r, \ldots, \Theta_R\}$ at temperatures $T = \{T_1, T_2, \ldots, T_r, \ldots, T_R\}$ into the PMMH method. As shown in the middle part of Figure 2, the PMMH method is run in parallel at each temperature. At each temperature $T_r$, we obtain the tempered marginal likelihood $p(y_{1:N} \mid \Theta_r^*)^{1/T_r}$ by employing the SMC method (Figure 2c) with the sample candidate $\Theta_r^*$ proposed in the MH algorithm (Figure 2b).
For each temperature $T_r$, the SMC method and the MH algorithm are conducted as follows. In the SMC method (Figure 2c), the marginal likelihood $p(y_{1:N} \mid \Theta_r^*)^{1/T_r}$ is obtained by iterating prediction, likelihood calculation and resampling: at each time step $n$, the latent variables $z_n$ are predicted, the likelihood is calculated for each particle, and resampling is performed according to the calculated particle likelihoods. In the MH algorithm (Figure 2b), the sample candidate $\Theta_r^*$ is accepted or rejected based on the marginal likelihood $p(y_{1:N} \mid \Theta_r^*)^{1/T_r}$. The target distribution becomes smoother as the temperature becomes higher, which makes it easier to obtain samples from a wide range. Furthermore, exchanges between samples at different temperatures are conducted in order to realize transitions that are difficult to achieve from some initial values with the conventional PMMH method.

2.3.2. Introducing the Replica Exchange Method into the PMMH Method

Here, we propose the REPMMH method to accurately estimate the distribution of the parameters from observed data $y_{1:N}$. In the proposed method, we introduce replicas of the parameters $\Theta = \{\Theta_1, \Theta_2, \ldots, \Theta_r, \ldots, \Theta_R\}$ at different temperatures $T = \{T_1, T_2, \ldots, T_r, \ldots, T_R\}$ and consider the following extended joint marginal posterior distribution:

$$\pi_{\mathrm{EX}}(\Theta \mid y_{1:N}) = \prod_{r=1}^{R} \pi_{T_r}(\Theta_r \mid y_{1:N}), \tag{10}$$

where $\pi_{T_r}(\Theta_r \mid y_{1:N})$ is the marginal posterior distribution at temperature $T_r$, expressed in terms of the original marginal posterior distribution $p(\Theta_r \mid y_{1:N})$ of the parameters $\Theta_r$ at temperature $T_1 = 1.0$ as follows:

$$\pi_{T_r}(\Theta_r \mid y_{1:N}) = \frac{1}{z_{T_r}}\, p(\Theta_r \mid y_{1:N})^{\frac{1}{T_r}} \quad (r = 1, 2, \ldots, R), \tag{11}$$

where $z_{T_r}$ is a partition function. Note that, as expressed in Equation (11), at sufficiently high temperatures, the distribution of the parameters becomes closer to a uniform distribution, independent of the values of $y_{1:N}$. The distribution at $T_1 = 1.0$ corresponds to the original marginal posterior distribution $p(\Theta \mid y_{1:N})$ to be investigated.
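For instance, with a geometric temperature ladder such as the one used in the experiments of Section 3 (common ratio 1.1, an assumption at this point in the text), the tempered log-density can be set up as in the following minimal Python sketch; the partition function $z_{T_r}$ never needs to be evaluated, because the MH and exchange steps below only use ratios of unnormalized densities.

```python
import numpy as np

R = 64                              # number of replicas (the setting of Section 3)
temperatures = 1.1 ** np.arange(R)  # T_1 = 1.0, ..., T_R = 1.1**63

def tempered_log_density(log_posterior, T):
    # log pi_{T_r}(Theta | y) up to the constant -log z_{T_r}, following Eq. (11)
    return log_posterior / T
```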
The marginal posterior distribution at each temperature, $p(\Theta_r \mid y_{1:N})$, is obtained using Bayes' theorem as follows:

$$p(\Theta_r \mid y_{1:N}) = \frac{p(y_{1:N} \mid \Theta_r)\, p(\Theta_r)}{p(y_{1:N})} \quad (r = 1, 2, \ldots, R). \tag{12}$$

Namely, the marginal posterior distribution $p(\Theta_r \mid y_{1:N})$ is proportional to the product of the marginal likelihood $p(y_{1:N} \mid \Theta_r)$ and the prior distribution $p(\Theta_r)$ of the parameters $\Theta_r$. To obtain the marginal likelihood $p(y_{1:N} \mid \Theta_r)$, marginalization of the joint distribution at each temperature should be conducted as follows:

$$p(y_{1:N} \mid \Theta_r) = \int p(y_{1:N}, z_{1:N}^{(r)} \mid \Theta_r)\, dz_{1:N}^{(r)} \quad (r = 1, 2, \ldots, R), \tag{13}$$

where $z_{1:N}^{(r)}$ are the latent variables at temperature $T_r$. By performing the SMC method over all the time steps at each temperature, this marginalization is conducted numerically.
As shown in Figure 2b,c, in the proposed method, the SMC method and the MH algorithm are conducted for each temperature. In the SMC method, the marginal likelihood $p(y_{1:N} \mid \Theta_r^*)$ is obtained by numerical marginalization using the candidate parameters $\Theta_r^*$ proposed in the MH algorithm.
In the MH algorithm, the candidate parameters $\Theta_r^*$ are accepted or rejected at each temperature $T_r$ according to the marginal posterior $\pi_{T_r}(\Theta_r^* \mid y_{1:N})$ (Figure 2b). Here, the acceptance probability $p_{\mathrm{accept}}^{(r)}$ at each temperature is calculated as follows:

$$p_{\mathrm{accept}}^{(r)} = \min\left\{1,\ \left(\frac{p(y_{1:N} \mid \Theta_r^*)}{p(y_{1:N} \mid \Theta_r^{(k-1)})}\right)^{\frac{1}{T_r}} \frac{p(\Theta_r^*)}{p(\Theta_r^{(k-1)})}\, \frac{q(\Theta_r^{(k-1)} \mid \Theta_r^*)}{q(\Theta_r^* \mid \Theta_r^{(k-1)})}\right\}. \tag{14}$$
Moreover, we exchange samples between adjacent temperatures $T_r$ and $T_{r+1}$ according to the following exchange probability:

$$p_{\mathrm{EX}}(\Theta, \Theta^*) = \min\{1,\ R_{\mathrm{EX}}(\Theta, \Theta^*)\}, \tag{15}$$
$$R_{\mathrm{EX}}(\Theta, \Theta^*) = \frac{\pi_{\mathrm{EX}}(\Theta^* \mid y_{1:N})}{\pi_{\mathrm{EX}}(\Theta \mid y_{1:N})}, \tag{16}$$

where $\Theta^*$ is expressed as follows:

$$\Theta^* = \{\Theta_1, \ldots, \Theta_{r+1}, \Theta_r, \ldots, \Theta_R\}. \tag{17}$$

Note that the exchange probability $p_{\mathrm{EX}}(\Theta, \Theta^*)$ corresponds to the Metropolis criterion for the proposal to exchange the samples between the temperatures $T_r$ and $T_{r+1}$. By deciding whether to accept or reject the proposed configuration $\Theta^*$ with the Metropolis criterion of Equation (15), the transition probability $W(\Theta \to \Theta^*)$ for the exchange process satisfies the detailed balance condition as follows:

$$\begin{aligned}
\pi_{\mathrm{EX}}(\Theta \mid y_{1:N})\, W(\Theta \to \Theta^*)
&= \pi_{\mathrm{EX}}(\Theta \mid y_{1:N})\, q(\Theta^* \mid \Theta)\, p_{\mathrm{EX}}(\Theta, \Theta^*) \\
&= \min\left\{\pi_{\mathrm{EX}}(\Theta \mid y_{1:N})\, q(\Theta^* \mid \Theta),\ \pi_{\mathrm{EX}}(\Theta^* \mid y_{1:N})\, q(\Theta^* \mid \Theta)\right\} \\
&= \pi_{\mathrm{EX}}(\Theta^* \mid y_{1:N})\, q(\Theta^* \mid \Theta)\, \min\left\{\frac{\pi_{\mathrm{EX}}(\Theta \mid y_{1:N})}{\pi_{\mathrm{EX}}(\Theta^* \mid y_{1:N})},\ 1\right\} \\
&= \pi_{\mathrm{EX}}(\Theta^* \mid y_{1:N})\, q(\Theta^* \mid \Theta)\, p_{\mathrm{EX}}(\Theta^*, \Theta) \\
&= \pi_{\mathrm{EX}}(\Theta^* \mid y_{1:N})\, q(\Theta \mid \Theta^*)\, p_{\mathrm{EX}}(\Theta^*, \Theta) \\
&= \pi_{\mathrm{EX}}(\Theta^* \mid y_{1:N})\, W(\Theta^* \to \Theta), \tag{18}
\end{aligned}$$

where $q(\Theta^* \mid \Theta)$ is the proposal probability for $\Theta^*$, and the proposal for the exchange process is symmetric, $q(\Theta^* \mid \Theta) = q(\Theta \mid \Theta^*)$. Thus, since the exchange process in the proposed method satisfies the detailed balance condition, the proposed method can sample from the distribution $\pi_{\mathrm{EX}}(\Theta \mid y_{1:N})$.
Through this exchange process, the REPMMH method mitigates the initial value dependence of the PMMH method. The sampled distribution of a replica, $\pi_{T_r}(\Theta_r \mid y_{1:N})$, ideally becomes closer to a uniform distribution at higher temperatures as follows:

$$\lim_{T_r \to \infty} \pi_{T_r}(\Theta_r \mid y_{1:N}) = \lim_{T_r \to \infty} \frac{1}{z_{T_r}}\, p(\Theta_r \mid y_{1:N})^{\frac{1}{T_r}} \propto p(\Theta_r \mid y_{1:N})^{0} = \mathrm{const}.$$

Therefore, in practice, it becomes possible to escape from local optima at sufficiently high temperatures (Figure 2b). Moreover, since each replica is repeatedly exchanged between high and low temperatures, the samples do not stay in one local optimum, and the parameters can be sampled efficiently. We show the flow of our REPMMH method described above in Algorithm 2.

2.3.3. Relations among Particle Markov Chain Monte Carlo Methods

We briefly summarize the differences among the conventional particle Markov chain Monte Carlo (PMCMC) methods and our proposed REPMMH method, all of which can estimate the parameters of a state space model, in Table 1. The particle Gibbs (PG) method is another PMCMC method; it samples the latent variables and the parameters of a state space model alternately by using Gibbs sampling [8,30]. The PMMH method combines the SMC method with the MH algorithm, whereas the PG method combines the SMC method with Gibbs sampling. While the SMC method is employed to calculate the marginal likelihood $p(y_{1:N} \mid \Theta)$ of the parameters $\Theta$ in the PMMH method, it is employed to obtain samples of the latent variables $z_{1:N}$ in the PG method [12]. The PMMH method directly targets the marginal posterior distribution $p(\Theta \mid y_{1:N})$, whereas the PG method targets the joint posterior distribution $p(z_{1:N}, \Theta \mid y_{1:N})$ [12]. Note that the SMC method used in the PG method is called the conditional SMC method and uses the previous sample of the latent variables $z_{1:N}^{(k-1)}$ as one of its particles [12]. Furthermore, advanced versions of the PG method have been proposed, such as the particle Gibbs with ancestor sampling (PGAS) method for improving the sampling efficiency and the replica exchange particle Gibbs with ancestor sampling (REPGAS) method for mitigating the initial value dependence [16,18,19,24]. Samples obtained with the PMMH method suffer from initial value dependence in the same way as those obtained with the PG method, so combining the PMMH method with the replica exchange method is expected to be effective.
Algorithm 2 Replica Exchange Particle Marginal Metropolis–Hastings (REPMMH) Method.
1: initialize the parameters $\Theta^{(0)}$
2: for $k = 1, \ldots, K$ do
3:    for $r = 1, \ldots, R$ do
4:       draw the sample candidate of parameters $\Theta_r^* \sim q(\Theta_r^* \mid \Theta_r^{(k-1)})$
5:       calculate the marginal likelihood $p(y_{1:N} \mid \Theta_r^*)$ by using the SMC method according to Equation (13)
6:       calculate the acceptance probability $p_{\mathrm{accept}}^{(r)}$ with Equation (14)
7:       draw a uniform random number $\alpha \sim U(0, 1)$ ($U(a, b)$ is a uniform distribution on $[a, b)$)
8:       if $\alpha \le p_{\mathrm{accept}}^{(r)}$ then
9:          set the sample of parameters $\Theta_r^{(k)} \leftarrow \Theta_r^*$
10:      else
11:         set the sample of parameters $\Theta_r^{(k)} \leftarrow \Theta_r^{(k-1)}$
12:      end if
13:   end for
14:   choose the starting replica number $r_{\mathrm{EX}} \leftarrow 1$ or $r_{\mathrm{EX}} \leftarrow 2$ for the exchange
15:   repeat
16:      calculate the exchange probability $p_{\mathrm{EX}}(\Theta, \Theta^*)$ with Equation (15) for the replica numbers $r_{\mathrm{EX}}$ and $r_{\mathrm{EX}} + 1$
17:      draw a uniform random number $\alpha_{\mathrm{EX}} \sim U(0, 1)$
18:      if $\alpha_{\mathrm{EX}} \le p_{\mathrm{EX}}(\Theta, \Theta^*)$ then
19:         exchange the replicas $(\Theta_{r_{\mathrm{EX}}}^{(k)}, \Theta_{r_{\mathrm{EX}}+1}^{(k)}) \leftarrow (\Theta_{r_{\mathrm{EX}}+1}^{(k)}, \Theta_{r_{\mathrm{EX}}}^{(k)})$
20:      end if
21:      set the replica number $r_{\mathrm{EX}} \leftarrow r_{\mathrm{EX}} + 2$ for the next exchange
22:   until $r_{\mathrm{EX}} > R - 1$
23: end for
24: return $\{\Theta^{(k)}\}_{k=1}^{K}$
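The exchange step in lines 14–22 of Algorithm 2 can be sketched as follows in Python. The sketch assumes that log_lik[r] holds the SMC estimate of $\log p(y_{1:N} \mid \Theta_r)$ and that only the likelihood is tempered, so the untempered prior factors cancel in $R_{\mathrm{EX}}$; if the prior is tempered as well, as Equation (11) suggests, the log-prior difference must be added, as noted in the comment.

```python
import numpy as np

rng = np.random.default_rng(0)

def exchange_sweep(theta, log_lik, temperatures, start):
    """Exchange process of Algorithm 2 (lines 14-22), swapping adjacent replicas.

    theta[r]   -- current sample of replica r (a plain Python list, so swaps are cheap)
    log_lik[r] -- SMC estimate of log p(y_{1:N} | theta[r])
    start      -- 0 or 1, alternated between iterations as in Algorithm 2
    """
    for r in range(start, len(temperatures) - 1, 2):
        # log R_EX of Eq. (16): only the two swapped factors of pi_EX change.
        # If the prior is tempered as well (Eq. (11)), add the log-prior
        # difference of the two replicas to log_diff.
        log_diff = log_lik[r + 1] - log_lik[r]
        log_ratio = (1.0 / temperatures[r] - 1.0 / temperatures[r + 1]) * log_diff
        if np.log(rng.uniform()) <= min(0.0, log_ratio):     # Eq. (15)
            theta[r], theta[r + 1] = theta[r + 1], theta[r]
            log_lik[r], log_lik[r + 1] = log_lik[r + 1], log_lik[r]
    return theta, log_lik
```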

3. Results

In this section, we show that by employing our proposed replica exchange particle marginal Metropolis–Hastings (REPMMH) method on the Izhikevich neuron model [41,42] and the Lévy-driven stochastic volatility model [12,43,44], the marginal posterior distribution of the parameters $p(\Theta \mid y_{1:N})$ can be estimated from the observations $y_{1:N}$, and we verify whether the REPMMH method can overcome the initial value dependence of the particle marginal Metropolis–Hastings (PMMH) method. Moreover, we compare the sampling efficiency of the REPMMH method with that of the conventional methods, the PMMH method and the replica exchange particle Gibbs with ancestor sampling (REPGAS) method.

3.1. Izhikevich Neuron Model

To verify the effectiveness of the proposed method, we use the Izhikevich neuron model, a computational model of the membrane potential of a neuron [41,42]:

$$\frac{dv}{dt} = 0.04 v^2 + 5v + 140 - u + I_{\mathrm{ext}} + \xi_v(t), \tag{19}$$
$$\frac{du}{dt} = a(bv - u) + \xi_u(t), \tag{20}$$

where $v$ is the membrane potential and $u$ is the membrane recovery variable. $I_{\mathrm{ext}}$ is the input current from outside the neuron, and $a$ and $b$ are parameters of the Izhikevich neuron model that represent the characteristics of the neuron. In Equations (19) and (20), we consider additive white Gaussian noise terms $\xi_v(t)$ and $\xi_u(t)$ ($\langle\xi_v(t)\rangle = \langle\xi_u(t)\rangle = 0$, $\langle\xi_v(t)\xi_v(s)\rangle = \sigma_v^2\,\delta(t-s)$, $\langle\xi_u(t)\xi_u(s)\rangle = \sigma_u^2\,\delta(t-s)$ and $\langle\xi_v(t)\xi_u(s)\rangle = 0$, where $\delta(t)$ is the Dirac delta function). Here, the standard deviations of the membrane potential and the membrane recovery variable are denoted by $\sigma_v$ and $\sigma_u$, respectively. If the membrane potential $v$ exceeds the threshold value $v_{\mathrm{th}} = 30$, the membrane potential $v$ and the membrane recovery variable $u$ are reset to $c$ and $u + d$, respectively:

$$v \leftarrow c, \quad u \leftarrow u + d, \tag{21}$$

where $c$ and $d$ are parameters representing the characteristics of the neuron.
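The following Python sketch shows one way to generate synthetic observations from Equations (19)–(21) with the Euler–Maruyama scheme; the integration step dt, the initial condition and the example input current are assumptions not stated in the paper, and the default noise levels correspond to the values used below ($\sigma_v^2 = 0.25$, $\sigma_u^2 = 10^{-4}$, $\sigma_y^2 = 1$).

```python
import numpy as np

def simulate_izhikevich(a, b, c, d, I_ext, dt=0.5, sigma_v=0.5, sigma_u=0.01,
                        sigma_y=1.0, v_th=30.0, rng=None):
    """Euler-Maruyama simulation of Eqs. (19)-(21) with Gaussian observation noise."""
    rng = np.random.default_rng() if rng is None else rng
    v, u = c, b * c                     # assumed initial condition (near rest)
    y = np.empty(len(I_ext))
    for n, I in enumerate(I_ext):
        dv = 0.04 * v**2 + 5.0 * v + 140.0 - u + I
        du = a * (b * v - u)
        v += dt * dv + np.sqrt(dt) * sigma_v * rng.standard_normal()
        u += dt * du + np.sqrt(dt) * sigma_u * rng.standard_normal()
        if v >= v_th:                   # spike: reset rule of Eq. (21)
            v, u = c, u + d
        y[n] = v + sigma_y * rng.standard_normal()  # noisy observation of v
    return y

# Example: 500 observations with a hypothetical piecewise-constant input current.
I_ext = np.r_[np.zeros(100), 10.0 * np.ones(400)]
y = simulate_izhikevich(0.02, 0.2, -65.0, 6.0, I_ext)
```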
Here, we assume that the observations $y_{1:N}$ are the membrane potentials with Gaussian observation noise, and we estimate the parameters $\Theta = \{a, b, c, d\}$ from only the observations $y_{1:N}$. We use the true parameters $\Theta = \{a, b, c, d\} = \{0.02, 0.2, -65, 6\}$ and the number of data points $N = 5.0 \times 10^2$ to generate the data. In the system model, the means and the variances of the Gaussian noise are $(\mu_v, \sigma_v^2) = (0, 0.25)$ and $(\mu_u, \sigma_u^2) = (0, 10^{-4})$. In the observation model, the mean and the variance of the Gaussian noise are $(\mu_y, \sigma_y^2) = (0, 1)$. We show the data generated from the Izhikevich neuron model in Figure 3, where complex spike activities with different inter-spike intervals and different peaks are seen in response to the external inputs. We assume that only the one-dimensional time series of observed data $y_n$ and the external inputs can be used, while the latent dynamics are governed by a two-dimensional nonlinear dynamical system with four parameters $\Theta = \{a, b, c, d\}$. We employ our REPMMH method and the conventional methods, the PMMH method and the REPGAS method, on the data in Figure 3 to estimate the posterior distribution of the parameters $\Theta = \{a, b, c, d\}$. In all methods, the initial values of the parameters $\Theta^{(0)}$ are $\{a, b, c, d\} = \{0.025, 0.15, -60, 5.5\}$, the number of samples $K$ is $10^6$, the number of burn-in samples $K_{\mathrm{burn\text{-}in}}$ is $10^6$, and the number of particles $M$ is 50. In our REPMMH method and the REPGAS method, the number of temperatures $R$ is 64.
We show in Figure 4 the estimated posterior distribution of the parameters $p(\Theta \mid y_{1:N})$ obtained by employing the PMMH method. In each graph, the vertical axis is the probability density and the horizontal axis is the value of the parameter $a$, $b$, $c$ or $d$. The solid lines represent the true values, the dashed lines represent the initial values, and the histograms represent the estimated posterior distributions of the parameters. From Figure 4, we find that a peak of the estimated posterior distribution of the parameter $d$, $p(d \mid y_{1:N})$, is located around its true value $d = 6.0$. However, the maxima of the estimated posterior distributions of the other three parameters, $a$, $b$ and $c$, remain around their initial values ($a = 0.025$, $b = 0.15$, $c = -60$), which are far from their true values ($a = 0.020$, $b = 0.20$, $c = -65$). Thus, the joint posterior distribution of the four parameters is not adequately estimated. From this result, the samples in the PMMH method are considered to remain in a local optimum because the initial values are far from the true values.
We show in Figure 5 the estimated posterior distribution of the parameters $p(\Theta \mid y_{1:N})$ obtained by employing the REPMMH method. From Figure 5, we find that the maxima of the estimated posterior distributions of the parameters are located around the true values ($a = 0.020$, $b = 0.20$, $c = -65$, $d = 6.0$), even though the initial values of the parameters ($a = 0.025$, $b = 0.15$, $c = -60$, $d = 5.5$) are set far from the true values. This improvement in estimation accuracy is induced by the combination with the replica exchange method: it is easier to obtain samples from a wider range, since the replica exchange method allows samples to pass through high temperatures. From these results, we find that the initial value dependence of the PMMH method is mitigated by employing the proposed method.
Moreover, we show in Figure 6 the estimated posterior distribution of the parameters $p(\Theta \mid y_{1:N})$ obtained by employing the REPGAS method. As shown in this figure, the estimated distributions are almost the same as those obtained with the REPMMH method, indicating that the true values are estimated properly.
In order to investigate the efficiency of parameter sampling in the proposed and existing methods, we show in Figure 7 the autocorrelation functions calculated from the samples of the PMMH method, the REPGAS method and our proposed REPMMH method. For all of the parameters $a$, $b$, $c$ and $d$, the autocorrelation decays faster for the REPMMH samples than for the PMMH and REPGAS samples. The time constant of the autocorrelation function strongly influences the convergence time of a PMCMC method. The time constants of the autocorrelation functions for the REPMMH samples are around 20 for all parameters $a$, $b$, $c$ and $d$, while those for the PMMH samples are more than $10^5$, as shown in Figure 7. Since the computational cost of the exchange process in the REPMMH method is very small compared to that of the SMC method, the computational cost of the REPMMH method is approximately $R = 64$ times that of the PMMH method. Nevertheless, the REPMMH method drastically improves the sampling efficiency relative to this increase in computational cost; the REPMMH method is $R = 64$ times more computationally expensive than the PMMH method, while its effective sample size is around $10^3$ times larger than that of the PMMH method.
When the same numbers of temperatures and particles are used, the REPGAS method is more computationally expensive than the REPMMH method, since the REPGAS method requires the sampling of the latent variables $z_{1:N}$ and the ancestor sampling, which considers sampling of the latent variables $z_{1:N}$ not only in the forward direction but also in the backward direction in the conditional SMC method. Nevertheless, the REPMMH method has higher sampling efficiency than the REPGAS method. Thus, we find that the sampling efficiency of our proposed REPMMH method is higher than that of the conventional methods.
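The autocorrelation functions and effective sample sizes used in this comparison can be computed from a chain of parameter samples as in the following Python sketch; the paper does not specify its estimator, so the truncation of the autocorrelation sum at max_lag is an assumption.

```python
import numpy as np

def autocorrelation(chain, max_lag):
    """Normalized autocorrelation function of a one-dimensional chain of samples."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * var)
                     for k in range(max_lag + 1)])

def effective_sample_size(chain, max_lag=1000):
    """ESS = K / (1 + 2 * sum of autocorrelations), with the sum truncated at max_lag."""
    acf = autocorrelation(chain, max_lag)
    return len(chain) / (1.0 + 2.0 * acf[1:].sum())
```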
Moreover, in order to evaluate the influence of the number of temperatures $R$ and the number of particles $M$ on the estimated results, we compare the results in various settings. We show the estimated results for the numbers of temperatures $R = 1$, 4, 16 and 64 in Table 2, which lists the mode values of the estimated distributions, the standard deviations (Std) of the estimated distributions and the values of the autocorrelation functions (ACF) at lag 30. Note that the number of particles is $M = 50$ in all cases and the maximum temperature is fixed at $T_R = 1.1^{63}$ for $R > 1$.
As mentioned above, for the number of temperatures $R = 1$, we can estimate the parameter $d$ around its true value $d = 6.0$, while the other three parameters $a$, $b$ and $c$ remain at their initial values ($a = 0.025$, $b = 0.15$, $c = -60$). We also find that the samples of the parameters $a$, $b$ and $c$ cannot move sufficiently far from their initial values, since the standard deviations are very small. For the number of temperatures $R = 4$, the samples can escape the local optima thanks to the high temperatures, and the true values of all parameters $a$, $b$, $c$ and $d$ are estimated accurately. However, since the values of the autocorrelation functions are close to 1.0, a large number of samples is needed to estimate the shape of the distribution $p(\Theta \mid y_{1:N})$. On the other hand, the values of the autocorrelation functions are smaller for $R = 16$ and 64.
We show the estimated results for the numbers of particles $M = 10$, 20, 30, 40 and 50 in Table 3. Note that the number of temperatures is $R = 64$ in all cases. For $M = 10$ and 20, the estimated values of the parameters $a$, $b$, $c$ and $d$ are far from the true values ($a = 0.020$, $b = 0.20$, $c = -65$, $d = 6.0$). We consider that these results are due to the low approximation accuracy of the marginal likelihood $p(y_{1:N} \mid \Theta)$ in the SMC method when the number of particles is too small. For $M \ge 30$, the true values of the parameters $a$, $b$, $c$ and $d$ can be estimated. Since there is no significant difference between the mode values, the standard deviations and the values of the autocorrelation functions for $M = 30$, 40 and 50, we consider that a number of particles above 30 is sufficient for this problem.

3.2. Lévy-Driven Stochastic Volatility Model

Next, we also verify the effectiveness of the proposed method using the Lévy-driven stochastic volatility model [12,43,44]. In this model, the dynamics of the logarithm of the asset price $y^*(t)$ are represented by the following stochastic differential equation:

$$dy^*(t) = \left(\mu + \beta\sigma^2(t)\right)dt + \sigma(t)\,dB(t), \tag{22}$$

where $\mu$ is the drift parameter and $\beta$ is the risk premium. $B(t)$ is a Brownian motion and $\sigma^2(t)$ represents the volatility. The dynamics of the volatility $\sigma^2(t)$ are modeled by the following Lévy-driven Ornstein–Uhlenbeck process:

$$d\sigma^2(t) = -\lambda\sigma^2(t)\,dt + dz(\lambda t), \tag{23}$$

where $\lambda$ is a positive constant and $z(t)$ is a non-Gaussian Lévy process with positive increments. The observation $y_n$ at time step $n$ in this model is obtained from the following Gaussian distribution:

$$y_n \sim \mathcal{N}\!\left(\mu\Delta + \beta\sigma_n^2,\ \sigma_n^2\right), \tag{24}$$

where $\Delta$ is the length of the time interval.
Stochastic volatility models are investigated numerically by using discretized dynamical models [12,43,44], and estimation algorithms for the parameters of stochastic volatility models have been studied with such discretizations [12]. The integrated volatility $\sigma_n^2$ at time step $n$ is calculated as follows:

$$\sigma_n^2 = \int_{(n-1)\Delta}^{n\Delta} \sigma^2(u)\,du = \lambda^{-1}\left[z(\lambda n\Delta) - \sigma^2(n\Delta) - z(\lambda(n-1)\Delta) + \sigma^2((n-1)\Delta)\right], \tag{25}$$

where $\sigma^2(n\Delta)$ and $z(\lambda n\Delta)$ are, respectively, represented as follows:

$$\sigma^2(n\Delta) = \exp(-\lambda\Delta)\,\sigma^2((n-1)\Delta) + \eta_{\sigma,n}, \tag{26}$$
$$z(\lambda n\Delta) = z(\lambda(n-1)\Delta) + \eta_{z,n}. \tag{27}$$
Here, we address the case where the volatility $\sigma^2(t)$ follows a tempered stable marginal distribution [44]. Following [2,44], $\eta_{\sigma,n}$ and $\eta_{z,n}$ are obtained as follows:

$$\eta_{\sigma,n} = \sum_{i=1}^{\infty} \min\left\{\left(\frac{a_i\kappa}{A\lambda\Delta}\right)^{-1/\kappa},\ e_i v_i^{1/\kappa}\right\}\exp(-\lambda\Delta r_i) + \sum_{i=1}^{N(\lambda\Delta)} c_i \exp(-\lambda\Delta r_i^*), \tag{28}$$

$$\eta_{z,n} = \sum_{i=1}^{\infty} \min\left\{\left(\frac{a_i\kappa}{A\lambda\Delta}\right)^{-1/\kappa},\ e_i v_i^{1/\kappa}\right\} + \sum_{i=1}^{N(\lambda\Delta)} c_i, \tag{29}$$

where $A = 2^{\kappa}\delta\kappa^2/\Gamma(1-\kappa)$, $a_1 < a_2 < \cdots$ are the arrival times of a Poisson process with intensity 1, $e_1, e_2, \ldots$ are independent and identically distributed exponential random variables with mean $2\gamma^{-1/\kappa}$, and $v_1, v_2, \ldots$, $r_1, r_2, \ldots$ and $r_1^*, r_2^*, \ldots$ are standard uniform random variables. $c_1, c_2, \ldots$ are drawn from a gamma distribution with shape parameter $1 - \kappa$ and scale parameter $2\gamma^{-1/\kappa}$, and $N(\lambda\Delta)$ is drawn from a Poisson distribution with mean $\lambda\Delta\delta\gamma\kappa$. Here, $\kappa$, $\delta$, $\gamma$ and $\lambda$ are the parameters $\Theta = \{\kappa, \delta, \gamma, \lambda\}$ to be estimated.
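As an illustration, the following Python sketch transcribes the discretized model of Equations (24)–(29) with the infinite series truncated at a fixed number of terms. The truncation length, the initial values of $\sigma^2(0)$ and $z(0)$, and the reading of the reconstructed constants ($A$, the exponential mean, and the gamma and Poisson parameters) are assumptions, so this illustrates the simulation scheme rather than reproducing the authors' exact data-generating code.

```python
import numpy as np
from math import gamma as gamma_fn

def sample_eta(kappa, delta, gam, lam, Delta, n_terms=1000, rng=None):
    """One draw of (eta_sigma_n, eta_z_n) from the truncated series of Eqs. (28)-(29)."""
    rng = np.random.default_rng() if rng is None else rng
    A = 2.0**kappa * delta * kappa**2 / gamma_fn(1.0 - kappa)
    a = np.cumsum(rng.exponential(1.0, n_terms))   # Poisson arrival times a_1 < a_2 < ...
    e = rng.exponential(2.0 * gam**(-1.0 / kappa), n_terms)
    v, r = rng.uniform(size=n_terms), rng.uniform(size=n_terms)
    jumps = np.minimum((a * kappa / (A * lam * Delta))**(-1.0 / kappa),
                       e * v**(1.0 / kappa))
    n_cp = rng.poisson(lam * Delta * delta * gam * kappa)   # compound Poisson part
    c = rng.gamma(1.0 - kappa, 2.0 * gam**(-1.0 / kappa), n_cp)
    r_star = rng.uniform(size=n_cp)
    eta_sigma = (np.sum(jumps * np.exp(-lam * Delta * r))
                 + np.sum(c * np.exp(-lam * Delta * r_star)))
    eta_z = jumps.sum() + c.sum()
    return eta_sigma, eta_z

def simulate_sv(kappa, delta, gam, lam, mu, beta, Delta, N, rng=None):
    """Observations y_1, ..., y_N via Eqs. (24)-(27); the initial state is an assumption."""
    rng = np.random.default_rng() if rng is None else rng
    sigma2_spot, z = 1.0, 0.0
    y = np.empty(N)
    for n in range(N):
        eta_sigma, eta_z = sample_eta(kappa, delta, gam, lam, Delta, rng=rng)
        sigma2_new = np.exp(-lam * Delta) * sigma2_spot + eta_sigma   # Eq. (26)
        z_new = z + eta_z                                             # Eq. (27)
        sigma2_n = (z_new - sigma2_new - z + sigma2_spot) / lam       # Eq. (25)
        sigma2_spot, z = sigma2_new, z_new
        y[n] = rng.normal(mu * Delta + beta * sigma2_n, np.sqrt(sigma2_n))  # Eq. (24)
    return y
```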
In this paper, we apply the proposed method and the PMMH method to the stochastic volatility model. The PMMH-based methods, including the proposed method, can be applied to complex models such as the Lévy-driven stochastic volatility model as long as the probability density of the observation model can be calculated. On the other hand, the PG-based methods are difficult to apply to this stochastic volatility model, since they need to evaluate the probability density of the system model [12]. Following [12], we use the true parameters $\Theta = \{\kappa, \delta, \gamma, \lambda\} = \{0.5, 1.41, 2.83, 0.1\}$, the number of data points $N = 4.0 \times 10^2$ and the time interval $\Delta = 1.0$ to generate the data. To estimate the parameters $\Theta$, we use the initial values $\Theta^{(0)} = \{0.25, 7.41, 9.83, 1.5\}$, the number of samples $K = 1.5 \times 10^5$, the number of burn-in samples $K_{\mathrm{burn\text{-}in}} = 10^5$ and the number of particles $M = 200$ in both the proposed REPMMH method and the PMMH method. In the REPMMH method, the number of temperatures $R$ is 64.
We show in Figure 8 the estimated posterior distribution of the parameters $p(\Theta \mid y_{1:N})$ obtained by employing the PMMH method. From Figure 8, we find that a peak of the estimated posterior distribution of the parameter $\lambda$, $p(\lambda \mid y_{1:N})$, is located around its true value $\lambda = 0.1$. However, the maxima of the estimated posterior distributions of the other three parameters, $\kappa$, $\delta$ and $\gamma$, are far from their true values ($\kappa = 0.5$, $\delta = 1.41$, $\gamma = 2.83$). Thus, the joint posterior distribution of the four parameters is not adequately estimated. It is considered that the target distribution is not reached within a small number of samples, because the sampling efficiency of the PMMH method is low.
We show in Figure 9 the estimated posterior distributions of the parameters $p(\Theta \mid y_{1:N})$ obtained by employing the REPMMH method. From Figure 9, we find that the true values of the parameters $\Theta = \{\kappa, \delta, \gamma, \lambda\} = \{0.5, 1.41, 2.83, 0.1\}$ are estimated appropriately using the same number of samples and the same initial values $\Theta^{(0)} = \{0.25, 7.41, 9.83, 1.5\}$ as the PMMH method. The results in Figure 8 and Figure 9 show that our REPMMH method has higher sampling efficiency than the PMMH method.
Moreover, we show in Figure 10 the autocorrelation functions calculated from the samples of the PMMH method and the REPMMH method. For all parameters $\kappa$, $\delta$, $\gamma$ and $\lambda$, the autocorrelation decays faster for the REPMMH samples than for the PMMH samples. As shown in Figure 10, the time constants of the autocorrelation functions for the REPMMH samples are less than 15 for all parameters $\kappa$, $\delta$, $\gamma$ and $\lambda$, while the time constant of the autocorrelation function of the PMMH samples for the parameter $\gamma$ is more than $3.0 \times 10^3$. As mentioned above, since the computational cost of the REPMMH method is approximately $R = 64$ times that of the PMMH method, the REPMMH method improves the sampling efficiency well beyond the increase in computational cost.

4. Concluding Remarks

In this paper, we have proposed the replica exchange particle marginal Metropolis–Hastings (REPMMH) method in order to estimate the marginal posterior distribution of the parameters $p(\Theta \mid y_{1:N})$ of a state space model. The proposed method can be applied to complex models such as the Lévy-driven stochastic volatility model, even if the probability density of the system model cannot be calculated explicitly. In the proposed method, we introduce exchanges between samples of the model parameters $\Theta$ at different temperatures and thereby realize an efficient method for sampling the model parameters governing nonlinear dynamical systems.
Using nonlinear dynamical models, namely the Izhikevich neuron model and the Lévy-driven stochastic volatility model, we showed that our proposed REPMMH method mitigates the initial value dependence of the particle marginal Metropolis–Hastings (PMMH) method. The results have shown that the proposed REPMMH method accurately estimates the marginal posterior distribution of the parameters. Moreover, by comparing the autocorrelation functions of the obtained samples, it has also been shown that the proposed REPMMH method samples more efficiently than the conventional methods. In the replica exchange particle Gibbs with ancestor sampling (REPGAS) method, the next sample of the latent variables is obtained under the strong influence of the current sample of the latent variables. In the REPMMH method, on the other hand, the correlation of the latent variables between the current and next steps is low, since the REPMMH method only calculates the marginal likelihood at the next step, regardless of the latent variables obtained at the current step. Therefore, it is considered that the REPMMH method can sample the parameters more efficiently than the REPGAS method.
Although we conducted the experiments using two specific dynamical models, the Izhikevich neuron model and the Lévy-driven stochastic volatility model, the proposed REPMMH method can be applied to any dynamical system described by ordinary or partial differential equations that can be represented as a state space model. Applications of the proposed method are difficult, however, when the system model of the dynamical system or the observation model is completely unknown. In such cases, we consider that combining the proposed method with non-parametric Bayesian methods would be effective. We leave this for future work.

Author Contributions

Conceptualization, T.O.; Formal analysis, H.I. and T.O.; Investigation, H.I., K.H. and T.O.; Methodology, H.I. and T.O.; Supervision, T.O.; Writing—original draft, H.I.; Writing—review & editing, H.I., K.H. and T.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Grants-in-Aid for Scientific Research for Innovative Areas “Initiative for High-Dimensional Data driven Science through Deepening of Sparse Modeling” (JSPS KAKENHI Grant No. JP25120010) and for Scientific Research (B) (JSPS KAKENHI Grant No. JP17H02923, JP19H04125, JP21H03509), and a Fund for the Promotion of Joint International Research (Fostering Joint International Research) (JSPS KAKENHI Grant No. JP15KK0010) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, and CREST (Nos. JPMJCR1755, JPMJCR1861, JPMJCR1914), Japan Science and Technology Agency, Japan. This work was partially supported by “Program for Promoting Researches on the Supercomputer Fugaku” (DPMSD, Project ID: JPMXP1020200307) from the MEXT, Japan.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
PMMH: Particle Marginal Metropolis–Hastings
REPMMH: Replica Exchange Particle Marginal Metropolis–Hastings
SMC: Sequential Monte Carlo
EM: Expectation–Maximization
MCMC: Markov Chain Monte Carlo
PMCMC: Particle Markov Chain Monte Carlo
MH: Metropolis–Hastings
PG: Particle Gibbs
PGAS: Particle Gibbs with Ancestor Sampling
REPGAS: Replica Exchange Particle Gibbs with Ancestor Sampling

References

1. Netto, M.A.; Gimeno, L.; Mendes, M.J. A new spline algorithm for non-linear filtering of discrete time systems. IFAC Proc. Vol. 1978, 11, 2123–2130.
2. Doucet, A.; Godsill, S.; Andrieu, C. On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 2000, 10, 197–208.
3. Ghahramani, Z.; Hinton, G.E. The EM Algorithm for Mixtures of Factor Analyzers; Technical Report CRG-TR-96-1; Department of Computer Science, University of Toronto: Toronto, ON, USA, 1996.
4. Kitagawa, G. A self-organizing state-space model. J. Am. Stat. Assoc. 1998, 93, 1203–1215.
5. Meyer, R.; Christensen, N. Fast Bayesian reconstruction of chaotic dynamical systems via extended Kalman filtering. Phys. Rev. E 2001, 65, 016206.
6. Doucet, A.; De Freitas, N.; Gordon, N. An introduction to sequential Monte Carlo methods. In Sequential Monte Carlo Methods in Practice; Doucet, A., De Freitas, N., Gordon, N., Eds.; Springer: New York, NY, USA, 2001; pp. 3–14.
7. Hyndman, R.J.; Koehler, A.B.; Snyder, R.D.; Grose, S. A state space framework for automatic forecasting using exponential smoothing methods. Int. J. Forecast. 2002, 18, 439–454.
8. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006.
9. Vogelstein, J.T.; Watson, B.O.; Packer, A.M.; Yuste, R.; Jedynak, B.; Paninski, L. Spike inference from calcium imaging using sequential Monte Carlo methods. Biophys. J. 2009, 97, 636–655.
10. Vogelstein, J.T.; Packer, A.M.; Machado, T.A.; Sippy, T.; Babadi, B.; Yuste, R.; Paninski, L. Fast nonnegative deconvolution for spike train inference from population calcium imaging. J. Neurophysiol. 2010, 104, 3691–3704.
11. Tsunoda, T.; Omori, T.; Miyakawa, H.; Okada, M.; Aonishi, T. Estimation of intracellular calcium ion concentration by nonlinear state space modeling and expectation-maximization algorithm for parameter estimation. J. Phys. Soc. Jpn. 2010, 79, 124801.
12. Andrieu, C.; Doucet, A.; Holenstein, R. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. B 2010, 72, 269–342.
13. Meng, L.; Kramer, M.A.; Eden, U.T. A sequential Monte Carlo approach to estimate biophysical neural models from spikes. J. Neural Eng. 2011, 8, 065006.
14. Paninski, L.; Vidne, M.; DePasquale, B.; Ferreira, D.G. Inferring synaptic inputs given a noisy voltage trace via sequential Monte Carlo methods. J. Comput. Neurosci. 2012, 33, 1–19.
15. Snyder, R.D.; Ord, J.K.; Koehler, A.B.; McLaren, K.R.; Beaumont, A.N. Forecasting compositional time series: A state space approach. Int. J. Forecast. 2017, 33, 502–512.
16. Lindsten, F.; Schön, T.B.; Jordan, M.I. Ancestor sampling for particle Gibbs. In Proceedings of the Advances in Neural Information Processing Systems, Stateline, NV, USA, 3–8 December 2012; pp. 2600–2608.
17. Henriksen, S.; Wills, A.; Schön, T.B.; Ninness, B. Parallel implementation of particle MCMC methods on a GPU. IFAC Proc. Vol. 2012, 45, 1143–1148.
18. Frigola, R.; Lindsten, F.; Schön, T.B.; Rasmussen, C.E. Bayesian inference and learning in Gaussian process state-space models with particle MCMC. In Proceedings of the Advances in Neural Information Processing Systems, Stateline, NV, USA, 5–10 December 2013; pp. 3181–3189.
19. Lindsten, F.; Jordan, M.I.; Schön, T.B. Particle Gibbs with ancestor sampling. J. Mach. Learn. Res. 2014, 15, 2145–2184.
20. Omori, T.; Kuwatani, T.; Okamoto, A.; Hukushima, K. Bayesian inversion analysis of nonlinear dynamics in surface heterogeneous reactions. Phys. Rev. E 2016, 94, 033305.
21. Omori, T.; Sekiguchi, T.; Okada, M. Belief propagation for probabilistic slow feature analysis. J. Phys. Soc. Jpn. 2017, 86, 084802.
22. Rangapuram, S.S.; Seeger, M.W.; Gasthaus, J.; Stella, L.; Wang, Y.; Januschowski, T. Deep state space models for time series forecasting. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 2–8 December 2018; pp. 7785–7794.
23. Wang, P.; Yang, M.; Peng, Y.; Zhu, J.; Ju, R.; Yin, Q. Sensor control in anti-submarine warfare—A digital twin and random finite sets based approach. Entropy 2019, 21, 767.
24. Inoue, H.; Hukushima, K.; Omori, T. Replica exchange particle-Gibbs method with ancestor sampling. J. Phys. Soc. Jpn. 2020, 89, 104801.
25. Shapovalova, Y. "Exact" and approximate methods for Bayesian inference: Stochastic volatility case study. Entropy 2021, 23, 466.
26. Gregory, P. Bayesian Logical Data Analysis for the Physical Sciences; Cambridge University Press: Cambridge, UK, 2005.
27. Andrieu, C.; Doucet, A.; Punskaya, E. Sequential Monte Carlo methods for optimal filtering. In Sequential Monte Carlo Methods in Practice; Doucet, A., De Freitas, N., Gordon, N., Eds.; Springer: New York, NY, USA, 2001; pp. 79–95.
28. Wu, C.J. On the convergence properties of the EM algorithm. Ann. Stat. 1983, 11, 95–103.
29. McLachlan, G.J.; Krishnan, T. The EM Algorithm and Extensions; Wiley: Hoboken, NJ, USA, 1996.
30. Geman, S.; Geman, D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 1984, 6, 721–741.
31. Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953, 21, 1087–1092.
32. Hastings, W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970, 57, 97–109.
33. Cunningham, N.; Griffin, J.E.; Wild, D.L. ParticleMDI: Particle Monte Carlo methods for the cluster analysis of multiple datasets with applications to cancer subtype identification. Adv. Data Anal. Classif. 2020, 14, 463–484.
34. Wang, S.; Wang, L. Particle Gibbs sampling for Bayesian phylogenetic inference. Bioinformatics 2021, 37, 642–649.
35. Jasra, A.; Persing, A.; Beskos, A.; Heine, K.; De Iorio, M. Bayesian inference for duplication–mutation with complementarity network models. J. Comput. Biol. 2015, 22, 1025–1033.
36. Du, D.; Hu, Z.; Du, Y. Model identification and physical exercise control using nonlinear heart rate model and particle filter. In Proceedings of the 2019 IEEE 15th International Conference on Automation Science and Engineering, Vancouver, BC, Canada, 22–26 August 2019; pp. 405–410.
37. Osmundsen, K.K.; Kleppe, T.S.; Liesenfeld, R.; Oglend, A. Estimating the competitive storage model with stochastic trends in commodity prices. Econometrics 2021, 9, 40.
38. Hukushima, K.; Nemoto, K. Exchange Monte Carlo method and application to spin glass simulations. J. Phys. Soc. Jpn. 1996, 65, 1604–1608.
39. Urano, R.; Okamoto, Y. Designed-walk replica-exchange method for simulations of complex systems. Comput. Phys. Commun. 2015, 196, 380–383.
40. Motonaka, K.; Miyoshi, S. Connecting PM and MAP in Bayesian spectral deconvolution by extending exchange Monte Carlo method and using multiple data sets. Neural Netw. 2019, 118, 159–166.
41. Izhikevich, E.M. Simple model of spiking neurons. IEEE Trans. Neural Netw. 2003, 14, 1569–1572.
42. Izhikevich, E.M. Which model to use for cortical spiking neurons? IEEE Trans. Neural Netw. 2004, 15, 1063–1070.
43. Barndorff-Nielsen, O.E.; Shephard, N. Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. J. R. Stat. Soc. B 2001, 63, 167–241.
44. Barndorff-Nielsen, O.E.; Shephard, N. Normal modified stable processes. Theor. Probab. Math. Stat. 2001, 65, 1–19.
Figure 1. Probabilistic graphical model of a state space model. $z_{1:N} = \{z_1, z_2, \ldots, z_N\}$ and $y_{1:N} = \{y_1, y_2, \ldots, y_N\}$, respectively, represent the latent variables and the observations at time steps $n = 1, 2, \ldots, N$. The arrow from the latent variables $z_{n-1}$ at the previous time step $n-1$ to the latent variables $z_n$ at the time step $n$ represents the system model $f(z_n \mid z_{n-1}, \theta_f)$, and the arrow from the latent variables $z_n$ to the observations $y_n$ at the time step $n$ represents the observation model $g(y_n \mid z_n, \theta_g)$. $\Theta = \{\theta_f, \theta_g\}$ are the parameters to be estimated.
Figure 2. Schematic diagrams of the proposed replica exchange particle marginal Metropolis–Hastings (REPMMH) method. (a) The time-series observations $y_{1:N}$ as inputs. (b,c) The REPMMH method, consisting of (b) the Metropolis–Hastings (MH) algorithms and (c) the sequential Monte Carlo (SMC) methods conducted in parallel at multiple temperatures. In the SMC method, the sample candidate $\Theta_r^*$ proposed by the MH algorithm is used to obtain the marginal likelihood $p(y_{1:N} \mid \Theta_r^*)^{1/T_r}$. By the SMC method, the marginalization over the time series of latent variables $z_{1:N}$ is conducted iteratively for time steps $n = 1, 2, \ldots, N$. In the MH algorithm, the marginal likelihood $p(y_{1:N} \mid \Theta_r^*)^{1/T_r}$ is used to determine whether to accept or reject the sample candidate. In the REPMMH method, exchanges between samples at different temperatures are considered in order to achieve transitions that are difficult to achieve with the particle marginal Metropolis–Hastings (PMMH) method; such transitions can be realized by passing through a high temperature, as shown by the red arrows in the MH algorithm. (d) The estimated posterior distributions of the parameters $\Theta$ as the output.
Figure 3. Observations and external inputs used to evaluate the proposed method. The observed membrane potential of the Izhikevich neuron model $y_n$ (top) in response to the input current $I_{\mathrm{ext},n}$ (bottom) is shown.
Figure 4. Estimated posterior distributions obtained by employing the PMMH method in the Izhikevich neuron model. In each graph, the estimated probability density function of the parameter ($a$, $b$, $c$ and $d$) is shown by the blue histogram. The red solid and black dashed lines express the true and initial values, respectively.
Figure 5. Estimated posterior distributions obtained by the REPMMH method for the Izhikevich neuron model. See the caption of Figure 4 for details.
Figure 6. Estimated posterior distributions obtained by the replica exchange particle Gibbs with ancestor sampling (REPGAS) method for the Izhikevich neuron model. See the caption of Figure 4 for details.
Figure 7. Autocorrelation as a function of lag length for the parameters $a$, $b$, $c$, and $d$ in the Izhikevich neuron model. Results for the PMMH method (black dash-dotted line), the REPGAS method (blue dashed line), and the REPMMH method (red solid line) are shown. Each inset shows the same result with the horizontal axis on a logarithmic scale. For the REPGAS and REPMMH methods, samples at $T_1 = 1.0$ were used.
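The autocorrelation curves in Figure 7 can be computed from a chain of parameter samples as sketched below; `chain` is a 1-D array of MCMC draws for one parameter, and the function name is ours, not the paper's.

```python
import numpy as np

def autocorrelation(chain, max_lag):
    """Sample autocorrelation of an MCMC chain for lags 0..max_lag."""
    x = chain - chain.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * var)
                     for k in range(max_lag + 1)])
```

A slowly decaying autocorrelation, as for the PMMH method in Figure 7, indicates strongly correlated consecutive samples and hence inefficient sampling.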
Figure 8. Estimated posterior distributions obtained by the PMMH method for the Lévy-driven stochastic volatility model. Each graph shows the estimated probability density function of a parameter ($\kappa$, $\delta$, $\gamma$, and $\lambda$) as a blue histogram. The red solid and black dashed lines indicate the true and initial values, respectively.
Figure 9. Estimated posterior distributions obtained by the REPMMH method for the Lévy-driven stochastic volatility model. See the caption of Figure 8 for details.
Figure 10. Autocorrelation as a function of lag length for the parameters $\kappa$, $\delta$, $\gamma$, and $\lambda$ in the Lévy-driven stochastic volatility model. Results for the PMMH method (black dash-dotted line) and the REPMMH method at $T_1 = 1.0$ (red solid line) are shown.
Table 1. The PMCMC methods for estimating parameters in a state space model.
| Method | Target Distribution | Overview |
|---|---|---|
| PG | $p(z_{1:N}, \Theta \mid y_{1:N})$ | Samples the parameters $\Theta$ and the latent variables $z_{1:N}$ alternately with Gibbs sampling, targeting the joint posterior distribution $p(z_{1:N}, \Theta \mid y_{1:N})$. The SMC method is used for sampling the latent variables $z_{1:N}$; the SMC method used in the PG method, called the conditional SMC method, uses the previous sample of latent variables $z_{1:N}^{k-1}$ as a particle [12]. |
| PGAS | $p(z_{1:N}, \Theta \mid y_{1:N})$ | Samples the latent variables $z_{1:N}$ not only in the forward direction but also in the backward direction within the PG method [16,18,19]. |
| REPGAS | $p(z_{1:N}, \Theta \mid y_{1:N})$ | Mitigates the initial value dependence of the PGAS method by combining the replica exchange method with the PGAS method [24]. |
| PMMH | $p(\Theta \mid y_{1:N})$ | Samples the parameters $\Theta$ with the MH algorithm, directly targeting the marginal posterior distribution $p(\Theta \mid y_{1:N})$ obtained by marginalization over the distribution of the latent variables $z_{1:N}$. The SMC method is used to calculate the marginal likelihood $p(y_{1:N} \mid \Theta)$ [12]. |
| REPMMH | $p(\Theta \mid y_{1:N})$ | Mitigates the initial value dependence of the PMMH method by combining the replica exchange method with the PMMH method (this work). |
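As noted in the PMMH and REPMMH rows of Table 1, the SMC method supplies the estimate of the marginal likelihood $p(y_{1:N} \mid \Theta)$ that enters the MH acceptance step. Below is a generic bootstrap-particle-filter sketch of that estimator; `init_sample`, `transition_sample`, and `obs_logpdf` stand for the model-specific initial, transition, and observation densities and are placeholder names of ours.

```python
import numpy as np

def smc_log_ml(y, theta, M, init_sample, transition_sample, obs_logpdf, rng):
    """Bootstrap particle filter estimate of log p(y_{1:N} | theta)."""
    z = init_sample(theta, M, rng)                   # M initial particles
    log_ml = 0.0
    for y_n in y:
        z = transition_sample(z, theta, rng)         # propagate particles
        logw = obs_logpdf(y_n, z, theta)             # log observation weights
        m = logw.max()
        w = np.exp(logw - m)                         # stabilized weights
        log_ml += m + np.log(w.mean())               # adds log p(y_n | y_{1:n-1})
        z = z[rng.choice(M, size=M, p=w / w.sum())]  # multinomial resampling
    return log_ml
```

In the PMMH acceptance step this estimate replaces the intractable exact marginal likelihood; in the REPMMH method the same estimate is raised to the power $1/T_r$ (i.e., divided by $T_r$ on the log scale) at each temperature.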
Table 2. The estimated results with the numbers of temperatures $R = 1$, $4$, $16$, and $64$.
| Parameter | | $R = 1$ | $R = 4$ | $R = 16$ | $R = 64$ |
|---|---|---|---|---|---|
| $a = 0.020$ | Mode | 0.0251 | 0.0200 | 0.0205 | 0.0205 |
| | Std | $5.5 \times 10^{-5}$ | $6.9 \times 10^{-4}$ | $6.7 \times 10^{-4}$ | $7.8 \times 10^{-4}$ |
| | ACF | 0.9999 | 0.9914 | 0.5175 | 0.3074 |
| $b = 0.20$ | Mode | 0.155 | 0.200 | 0.200 | 0.200 |
| | Std | $3.0 \times 10^{-4}$ | $7.2 \times 10^{-3}$ | $7.3 \times 10^{-3}$ | $7.0 \times 10^{-3}$ |
| | ACF | 0.9999 | 0.9919 | 0.5773 | 0.3082 |
| $c = -65$ | Mode | $-60.0$ | $-64.75$ | $-65.0$ | $-65.0$ |
| | Std | $2.1 \times 10^{-3}$ | $2.8 \times 10^{-1}$ | $2.5 \times 10^{-1}$ | $2.6 \times 10^{-1}$ |
| | ACF | 0.9999 | 0.9926 | 0.5359 | 0.3117 |
| $d = 6.0$ | Mode | 6.10 | 6.10 | 6.05 | 6.05 |
| | Std | $2.0 \times 10^{-2}$ | $8.9 \times 10^{-2}$ | $9.8 \times 10^{-2}$ | $9.8 \times 10^{-2}$ |
| | ACF | 0.9999 | 0.9928 | 0.5222 | 0.3176 |
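Table 2 compares ladders with $R = 1$, $4$, $16$, and $64$ temperatures (with $R = 1$ reducing to plain PMMH). This section does not specify how the temperatures were spaced; a common choice, shown below purely as an assumption, is a geometric ladder starting from $T_1 = 1.0$, which aims to keep exchange acceptance rates between neighboring replicas roughly uniform.

```python
import numpy as np

def temperature_ladder(R, T_max=1.0e3):
    """Assumed geometric temperature ladder T_1 = 1.0, ..., T_R = T_max."""
    if R == 1:
        return np.array([1.0])
    return T_max ** (np.arange(R) / (R - 1))
```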
Table 3. The estimated results with the numbers of particles $M = 10$, $20$, $30$, $40$, and $50$.
| Parameter | | $M = 10$ | $M = 20$ | $M = 30$ | $M = 40$ | $M = 50$ |
|---|---|---|---|---|---|---|
| $a = 0.020$ | Mode | 0.0220 | 0.0210 | 0.0200 | 0.0200 | 0.0205 |
| | Std | $5.9 \times 10^{-4}$ | $7.5 \times 10^{-4}$ | $7.3 \times 10^{-4}$ | $6.7 \times 10^{-4}$ | $7.8 \times 10^{-4}$ |
| | ACF | 0.5548 | 0.2707 | 0.3072 | 0.3391 | 0.3074 |
| $b = 0.20$ | Mode | 0.195 | 0.190 | 0.200 | 0.195 | 0.200 |
| | Std | $4.9 \times 10^{-3}$ | $8.3 \times 10^{-3}$ | $7.1 \times 10^{-3}$ | $5.9 \times 10^{-3}$ | $7.0 \times 10^{-3}$ |
| | ACF | 0.2738 | 0.2611 | 0.2932 | 0.3352 | 0.3082 |
| $c = -65$ | Mode | $-60.75$ | $-64.50$ | $-65.00$ | $-65.00$ | $-65.00$ |
| | Std | $2.9 \times 10^{-1}$ | $3.7 \times 10^{-1}$ | $2.7 \times 10^{-1}$ | $2.8 \times 10^{-1}$ | $2.6 \times 10^{-1}$ |
| | ACF | 0.9223 | 0.3071 | 0.2793 | 0.3701 | 0.3117 |
| $d = 6.0$ | Mode | 6.50 | 6.10 | 6.10 | 6.05 | 6.05 |
| | Std | $1.4 \times 10^{-1}$ | $8.2 \times 10^{-2}$ | $1.0 \times 10^{-1}$ | $9.3 \times 10^{-2}$ | $9.8 \times 10^{-2}$ |
| | ACF | 0.6096 | 0.2750 | 0.3220 | 0.2985 | 0.3176 |