Peer-Review Record

An Upper Bound of the Bias of Nadaraya-Watson Kernel Regression under Lipschitz Assumptions

by Samuele Tosatto 1,*, Riad Akrour 1 and Jan Peters 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 30 October 2020 / Revised: 12 December 2020 / Accepted: 25 December 2020 / Published: 30 December 2020
(This article belongs to the Section Regression Models)

Round 1

Reviewer 1 Report


The authors propose a new non-asymptotic upper bound on the bias of the Nadaraya-Watson kernel regression estimator using weak Lipschitz assumptions and Gaussian kernels. The obtained results are interesting and important, with many applications, especially in multidimensional data analysis and artificial intelligence, e.g., self-driving cars, where we want to know the prediction error. The authors' bound requires less restrictive assumptions than previous results.
The presented results seem to be correct and are supported by numerical simulations.
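For readers unfamiliar with the estimator, a minimal sketch of Nadaraya-Watson regression with a Gaussian kernel follows (an illustration only, not the authors' implementation; the toy target $m(x) = |x|$ and the noise level $0.05$ mirror the setting discussed later in this exchange):

```python
import numpy as np

def nadaraya_watson(x_query, x_train, y_train, h):
    """Nadaraya-Watson estimate of m(x_query) with a Gaussian kernel of bandwidth h."""
    # Gaussian kernel weights between the query point and every training input
    w = np.exp(-0.5 * ((x_query - x_train) / h) ** 2)
    return np.sum(w * y_train) / np.sum(w)

# Toy data: m(x) = |x| is Lipschitz but not twice differentiable
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
y = np.abs(x) + rng.normal(0.0, 0.05, 500)
print(nadaraya_watson(0.5, x, y, h=0.1))  # should be close to m(0.5) = 0.5
```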
I have some minor remarks:
1. Page 2, Theorem 1: $m(s_i)$ and $s_i$ are undefined.
2. Page 2, Theorem 1: is $f_X$ the density of $y_i$ or of $(x_i, y_i)$?
3. Page 2, Theorem 1: what does $\sigma(\epsilon_i)$ mean? Is it a function of the errors, or a multiplication?
4. Page 4, Theorem 2 and later: you use $\hat{f}_n(x)$, but on the previous pages you use $\hat{m}_n(x)$. This is an inconsistency in the notation.
5. Page 4, Theorem 2: what is the exact definition of the function $\phi(x, y, z)$? Is this the scaled standard normal density?
6. Page 5, line 113: in what sense is the convergence of $L_m$ meant? With respect to $n$?
7. Page 6, Numerical Simulation: what distribution does the error term $\epsilon$ have in the simulations?
8. Page 11: Proposition 1 holds under (A2).
9. Page 11, Proposition 4: $l$ appears without bold face.
10. Page 12, first line of the proof: replace $h_i$ with $h$.

I recommend this paper for publication after minor revision.

Comments for author File: Comments.pdf

Author Response

Dear reviewer,

Thank you for your time! Your suggestions were very valuable in improving the quality of our submission.

In the following, we answer each of your concerns.

  1. You are right. We corrected it.
  2. It is the density of $x_i$. We specified it in the paper.
  3. This error was left over from a previous notation. We corrected it.
  4. You are right. We replaced them with $\hat{m}_n(x)$.
  5. Yes, it was. This quantity has now been replaced with the integral of the kernel function.
  6. We do not consider the number of samples in this work; it is always assumed to be infinite. The limit $L_m \to 0$ is meant to show that the bias goes to $0$ when $L_m$ goes to $0$. We replaced $L_m \to 0$ with $L_m = 0$. Instead, we kept $h \to 0$, since the theorem requires $h > 0$ and, for some particular choices of kernel, the bound can be undefined at $h = 0$.
  7. The noise used in the "uni-dimensional" and "multi-dimensional" analyses was normal with mean $0$ and standard deviation $0.05$. In the new paragraph "Realistic Scenario" we used no noise. We updated this information in the new submission.
  8. Yes, you are right. We inserted this information in the new submission.
  9. Yes, we corrected it.

Thanks again!

Best regards, 

The authors.

Reviewer 2 Report

Report on "An Upper Bound of the Bias of Nadaraya–Watson Kernel Regression under Lipschitz Assumptions" (# stats-1002323), submitted to Stats by Tosatto et al.

Major Contributions:

In this paper, the authors consider the Nadaraya–Watson kernel estimator, whose asymptotic bias was studied by Rosenblatt. The paper proposes an upper bound on the bias which holds for finite bandwidths, using Lipschitz assumptions and Gaussian kernels. The authors conducted simulation studies to show that the proposed approach works well.

Main Comments:

  1. On p. 3, the paper displays the Lipschitz condition. It is worthwhile to explain these conditions. In particular, how can they be checked in the simulations and in real data analysis? It would also be interesting to weaken this Lipschitz condition in the main result of the paper.
  2. The restriction to Gaussian kernels in the paper is strong. It is worthwhile to consider other kernels when verifying an upper bound of the bias.
  3. It is of interest to add a real example illustrating the proposed methods.
  4. It is interesting to compare the computational cost of the proposed method with several competing alternatives in the simulation study.
  5. There are numerous typos and grammatical errors. Please improve the organization.

Minor Comments:

  1. p. 1, line 20: change to "e.g.,".
  2. p. 3, line 76: add "," before "we".
  3. p. 5, line 113: add "." at the end of the equation.
  4. p. 10, line 248: add the volume number.
  5. p. 10, line 255: add the place of publication.
  6. p. 10, line 256: add the volume number.

Author Response

Dear reviewer,

Thank you very much for your feedback.

We built a new derivation that works for a broader family of kernels. All we require is that a few integrals have finite values: $\int_{-\infty}^{\infty} k(x)\,\mathrm{d}x$, $\int_{-\infty}^{\infty} k(x)e^{-xL}\,\mathrm{d}x$, and $\int_{-\infty}^{\infty} k(x)e^{-xL}x\,\mathrm{d}x$, where $L$ is a non-negative constant and $k(x)$ is the kernel function.
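When no closed form is available, these integrals can be evaluated with standard numerical quadrature. The sketch below is our own illustration (the function name `kernel_integrals` is hypothetical, not from the paper); for the Gaussian kernel, the closed forms $1$, $e^{L^2/2}$, and $-L e^{L^2/2}$ serve as a check:

```python
import numpy as np
from scipy.integrate import quad

def kernel_integrals(k, L):
    """Numerically evaluate the three integrals required by the bound for kernel k."""
    i0, _ = quad(k, -np.inf, np.inf)                                    # integral of k(x)
    i1, _ = quad(lambda x: k(x) * np.exp(-x * L), -np.inf, np.inf)      # integral of k(x) e^{-xL}
    i2, _ = quad(lambda x: k(x) * np.exp(-x * L) * x, -np.inf, np.inf)  # integral of k(x) e^{-xL} x
    return i0, i1, i2

gaussian = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
print(kernel_integrals(gaussian, L=1.0))  # ~ (1.0000, 1.6487, -1.6487)
```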

In the main paper, we present numerical analyses with the Gaussian, box, and triangular kernels. In the Appendix, we show more numerical simulations and compute all the integrals necessary for evaluating the bound for each kernel discussed in the main paper.

We will now answer your specific concerns.

    1. We added a short explanation in lines 55-61 and 91-93 of the new submission. Lipschitz continuity is not very restrictive, and it is very common in fields like optimization and statistical machine learning. We think this assumption is reasonable for many different regression functions. In our particular case, we allow selecting the Lipschitz constant on a finite interval of the regression function, which widens the class of admissible functions even further. Furthermore, it admits functions like $f(x) = |x|$, which are not admissible in Rosenblatt's analysis (since it requires a finite $m''$). Besides that, we agree that the Lipschitz constant might be unknown; in those cases, it can be estimated from the data (one simple estimator is sketched after this list). We agree that it would be very interesting to weaken this condition, but it is far from trivial.
    2. We agree with you, and we have been able to relax this assumption. We ran the numerical simulations with three different kernels in total.
    3. We agree. We added the regression of a dynamical system (we chose an inverted pendulum). We estimated the Lipschitz constant from the data and plotted the bias together with our upper bound.
    4. When the mentioned integrals are known, our bound requires negligible computation (i.e., the evaluation of the formulas in Theorems 2 and 3). When the integrals are not known, one needs to use numerical integration. Note that the number of integrals to solve is still limited and grows linearly with the number of dimensions. We included this information in the new submission.
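One simple way to estimate a Lipschitz constant from data is the largest pairwise difference quotient, sketched below under our own assumptions (the paper's exact procedure may differ; `empirical_lipschitz` and `min_gap` are our hypothetical names). With noisy targets the quotient is inflated by the noise, so very close input pairs are discarded:

```python
import numpy as np

def empirical_lipschitz(x, y, min_gap=1e-3):
    """Crude Lipschitz-constant estimate from samples (x_i, y_i): the largest
    pairwise difference quotient |y_i - y_j| / |x_i - x_j|. Pairs closer than
    min_gap are discarded so observation noise does not blow up the ratio."""
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    mask = dx > min_gap
    return float(np.max(dy[mask] / dx[mask]))

x = np.linspace(-1.0, 1.0, 200)
print(empirical_lipschitz(x, np.abs(x)))  # ~ 1, the true Lipschitz constant of |x|
```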


Thanks again for your time,

Best regards,

The authors.

Round 2

Reviewer 2 Report

The new version shows a significant improvement. The paper has addressed my concerns.
