Peer-Review Record

Comparison of Recent Acceleration Techniques for the EM Algorithm in One- and Two-Parameter Logistic IRT Models

by Marie Beisemann, Ortrud Wartlick and Philipp Doebler
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 14 September 2020 / Revised: 29 October 2020 / Accepted: 30 October 2020 / Published: 10 November 2020

Round 1

Reviewer 1 Report

This is a very well written paper. A short list of typos appears below. The authors should be congratulated on an expert contribution in the area of using accelerated EM within IRT work. The findings are valuable to psychometricians in particular, who seek to improve latent variable estimations. Although quite technical (which is to be expected), the narrative flow is logical, coherent and exceptionally well presented. The authors may of course lose a small grouping of practitioner psychologists but the audience to whom this paper is aimed will surely find the contents very informative indeed. I know I did. I recommend publication. 

List of typos/grammar-related concerns:

Line 21: 'While conditional maximum likelihood is a feasible' - delete the word 'a'.

Line 188: 'show that QN methods can b very efficient:' correct 'b' to 'be'. 

Lines 260-261: 'However, since the evaluation of L is often costly less than an evaluation of F'. This could be re-phrased as it's a bit confusing. 

Lines 377-378: The author comment should be deleted. 

Line 416: 'QN even provided deceleration': perhaps 'QN also provided deceleration'. 

Line 459: 'Another limitation is constituted by the failed convergence': delete 'constituted by the'. 

Line 463: 'While we did therewith obtain perfect convergence': delete 'therewith'.

 

Author Response

Overall Response: Thank you very much for your positive reception of our manuscript. We are truly very happy that you liked our work. Thank you also for your pointers and corrections, which we have gladly addressed in our revised manuscript.

Below, we respond to each correction. To allow for easier reading, we have pasted the comments into this text and marked our responses with a bold-face "Response" and then written our responses in italics.

Again, thank you very much for having taken the time to review our manuscript and providing us with feedback.

Line 21: 'While conditional maximum likelihood is a feasible' - delete the word 'a'.

Response: Thank you for pointing out the typo, we have removed the ‘a’.

Line 188: 'show that QN methods can b very efficient:' correct 'b' to 'be'. 

Response: Thank you for pointing out the typo, we have corrected the ‘b’ to ‘be’.

Lines 260-261: 'However, since the evaluation of L is often costly less than an evaluation of F'. This could be re-phrased as it's a bit confusing. 

Response: Thank you for pointing this out; we absolutely agree that this could be confusing. We have now re-phrased that sentence to ‘However, this may not be too great of a disadvantage, as the evaluation of $L$ is often not as costly.’ We hope this is clearer now.

Lines 377-378: The author comment should be deleted. 

Response: Thank you for catching our mistake here, we are very sorry that we missed this in our last read-through and are quite embarrassed that we had left it in. In any case, we have removed the author comment now.

Line 416: 'QN even provided deceleration': perhaps 'QN also provided deceleration'. 

Response: Thank you for the idea, we have adjusted the manuscript accordingly.

Line 459: 'Another limitation is constituted by the failed convergence': delete 'constituted by the'. 

Response: Thank you for the pointer. During the revision of the manuscript, we found the bug in the turboEM package that had caused the first-order SQUAREM method to fail. As the failed convergence is therefore fortunately no longer an issue, we have removed that passage from the manuscript altogether.

Line 463: 'While we did therewith obtain perfect convergence': delete 'therewith'.

Response: Thank you for your correction. As we have explained in the response above, the section to which the correction would have applied has been removed from the manuscript.

Reviewer 2 Report

This paper caught my interest because I implemented the SQUAREM method in OpenMx for EM acceleration. This paper represents an impressive amount of work; this makes it difficult to review. I appreciate the 2-dimensional plots of EM trajectories. I appreciate that the online supplement provides complete R code to reproduce the results.

Line 616: "For global convergence, modify α (section ??);" -- The ?? needs to be filled in.

I don't really see the point of the angle change plots in Figure 5/6. I agree that the oscillation is curious, but what does it mean? Looking back at my old code where I implemented Ramsay (1975), it looks like I interpret oscillations of increasing magnitude as a catastrophic increase in step size. However, this algorithm is probably obsolete compared to the ones you reviewed. In short, I'd like to know how you interpret the oscillation, or just remove the Figures and de-emphasize it. "How successful this has the potential to be for each acceleration method also depends on the shape of the trajectory." -- I'm not sure that change-in-angle is the best way to summarize the difference in trajectory shape.

Line 359: "Studying the trajectories of the standard EM algorithm in our simulation study, we observed that, using starting values provided by mirt, the majority of trajectories found themselves in proximity of the fixed point right from the starting point." -- This is actually bad and confounds EM acceleration performance with the selection of starting points. I'd much rather you consistently set a dumb starting point so the only difference between EM runs is the acceleration method. At a minimum, please clarify if this was the case. If you can re-run all your simulations with the same dumb starting point then that would be great.

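"[NOTE:] Das war meine Ueberlegung als Erklaerung fuer die Overshoots. Ich bin mir nicht sicher, ob das ganz richtig ist, was meinst du?" -- English translation? [Translation: "This was my reasoning as an explanation for the overshoots. I am not sure whether it is entirely correct; what do you think?"]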

"As there is no reason evident to us why first-order SQUAREM should not be working in this condition in conjunction with a constraint on the parameter space, this is likely an error in our implementation." -- This is an invitation for the editor to reject or ask for a major revision. SQUAREM (Varadhan 2008) is the default EM acceleration method for OpenMx; I've never seen it fail.

"Thus, our results are not necessarily comparable to e.g. the runtime of standard EM as implemented and runtime optimized in mirt. This was mostly done to ensure comparability between all acceleration methods, as they are not all available in mirt." -- This seems like a missed opportunity. mirt is open source software. Why didn't you add your algorithms to mirt? Certainly, OpenMx would welcome a similar contribution. At least your algorithms are available in turboEM; that's good.

Lines 493-494: Yes, of course we should ignore absolute CPU times. No need to apologize. Maybe it would be better to report relative CPU times in your tables instead of absolute times.

There are more mysterious ?? in Algorithm 3 pseudocode. Should be fixed.

Ramsay, J. O. (1975). Solving Implicit Equations in Psychometric Data Analysis. Psychometrika, 40(3), 337-360.

Author Response

Overall response: Thank you very much for having taken the time to read and review our work as well as for having provided us with such valuable feedback which we believe helped us improve the quality of our manuscript substantially. 

Below, we respond to each one of your comments. For easier reading, we have pasted them into this document and written our response below. All our responses are marked with a bold-face "Response" and then written in italics. All changes to the manuscript have been highlighted in yellow so that they are easier to track.

Line 616: "For global convergence, modify α (section ??);" -- The ?? needs to be filled in.

Response: Thank you for catching that reference error. The reference was from an older version of this work where we had a separate section on how to modify $\alpha$ for global convergence. However, in order to shorten the manuscript, this is now touched upon in the introduction / theory part of the paper and no longer has a separate section. We missed the fact that there was still a reference to that section in the pseudo-code, for which we apologize. Thank you for pointing it out to us; we have now removed the reference.

I don't really see the point of the angle change plots in Figure 5/6. I agree that the oscillation is curious, but what does it mean? Looking back at my old code where I implemented Ramsay (1975), it looks like I interpret oscillations of increasing magnitude as a catastrophic increase in step size. However, this algorithm is probably obsolete compared to the ones you reviewed. In short, I'd like to know how you interpret the oscillation, or just remove the Figures and de-emphasize it. "How successful this has the potential to be for each acceleration method also depends on the shape of the trajectory." -- I'm not sure that change-in-angle is the best way to summarize the difference in trajectory shape.

Response: Thank you for your comment on the angle change plots. We absolutely agree with you that change-in-angle is not necessarily the best way to summarize the difference in trajectory shape; it is certainly not a perfect summary. However, we have found it difficult to come up with an alternative summary or depiction of the results regarding the trajectories. For example, a mean trajectory was unfortunately not an option, as in each trial the underlying true parameter values for all items were sampled randomly and therefore differed from trial to trial. We also still believe that the observation that these kinds of overshoots occur is interesting and valuable, and we would therefore prefer not to remove these plots altogether. Nonetheless, you make a valid point, and we would like to take this issue into account appropriately. As a compromise that we hope you will find acceptable, we have shortened the description of the angle plots in the results (l. 274-295) as well as their discussion (l. 372-390) so that they are de-emphasized, and we now discuss the drawbacks of our graphical summary in more detail (l. 391-394). We have also included the corresponding plots for the newly added start values (as detailed below in response to your next point) in the appendix.
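The record does not spell out how the change in angle is computed. Purely as an illustration, the following minimal sketch assumes it is the turning angle between consecutive update directions of a two-dimensional EM trajectory (for instance, one item's discrimination and difficulty across iterations). The function name `angle_changes` is hypothetical and not taken from the paper's supplement.

```python
import math


def angle_changes(trajectory):
    """Signed turning angle (in degrees) between consecutive steps of a 2-D trajectory.

    trajectory: list of (x, y) parameter values, one per EM iteration
    (assumption: e.g. an item's discrimination and difficulty).
    Returns len(trajectory) - 2 values, one per pair of consecutive steps.
    """
    # Direction (heading) of each step, in radians
    headings = [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])
    ]
    changes = []
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap the difference into [-pi, pi) so a small turn is not reported as ~360 degrees
        d = (h1 - h0 + math.pi) % (2 * math.pi) - math.pi
        changes.append(math.degrees(d))
    return changes


# A straight path has no turning; a path that doubles back turns by 180 degrees
print(angle_changes([(0, 0), (1, 1), (2, 2)]))
print(angle_changes([(0, 0), (2, 0), (1, 0)]))
```

Under this reading, values near ±180 degrees indicate a step that reverses direction (an overshoot), and an alternating sign pattern indicates a zigzagging trajectory. This also illustrates the reviewer's concern: the measure discards step lengths, so two trajectories with very different shapes can yield similar angle sequences.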

Line 359: "Studying the trajectories of the standard EM algorithm in our simulation study, we observed that, using starting values provided by mirt, the majority of trajectories found themselves in proximity of the fixed point right from the starting point." -- This is actually bad and confounds EM acceleration performance with the selection of starting points. I'd much rather you consistently set a dumb starting point so the only difference between EM runs is the acceleration method. At a minimum, please clarify if this was the case. If you can re-run all your simulations with the same dumb starting point then that would be great.

Response: Thank you for your very good suggestion. To explain our reasoning behind using the mirt start values: this paper is directed towards psychological researchers who wish to apply the acceleration methods we have summarized, explained and compared in this work, methods that may be unfamiliar to many of them. We therefore wanted our simulation settings to reflect the kind of situations in which psychological researchers would actually use these acceleration methods. However, you are of course right that this setting is much too easy to reveal how well the algorithms can perform (if this is of general interest beyond a psychometric setting, the papers in which the methods were originally proposed may be helpful). As we understand your point and value your input, we have extended our simulations: we have now run the six simulation conditions once with the start values provided by mirt and once with less ideal start values, where we set all difficulties to 0 and all discriminations to 1. We have adjusted the description of our simulation procedure (l. 511-514, l. 538-539), our results (l. 297-347), and our discussion (l. 353-355, l. 367-370, l. 413-414, l. 425-427) accordingly (all changes are highlighted in yellow in the manuscript), and we have added two new tables (Tables 3 and 4) presenting the results for the less ideal start values.

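"[NOTE:] Das war meine Ueberlegung als Erklaerung fuer die Overshoots. Ich bin mir nicht sicher, ob das ganz richtig ist, was meinst du?" -- English translation? [Translation: "This was my reasoning as an explanation for the overshoots. I am not sure whether it is entirely correct; what do you think?"]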

Response: We are so sorry and very embarrassed that we left this comment in the manuscript; it was simply one of the authors asking the others for input on the explanation of the results given just above the comment. Clearly, this should not have been part of the manuscript, and it has now been removed. Again, we apologize for having forgotten to remove it before.

"As there is no reason evident to us why first-order SQUAREM should not be working in this condition in conjunction with a constraint on the parameter space, this is likely an error in our implementation." -- This is an invitation for the editor to reject or ask for a major revision. SQUAREM (Varadhan 2008) is the default EM acceleration method for OpenMx; I've never seen it fail.

Response: During the revision of this manuscript, we made another attempt at finding the error that had caused the convergence problems in our original simulations. We had not been able to find it before, but we were now fortunate enough to discover a bug in the turboEM package which caused missing values and thus made the method fail. (This issue only arose in specific situations, so we do not mean to criticize the turboEM package at all; in our opinion, it is an immensely helpful and convenient package. We just wanted to explain to you what the problem was.) Fixing this bug remedied the convergence problems. We apologize that we had not been able to locate it before, and we thank you for prompting us to do so, which has substantially improved the quality of our manuscript. We have re-run our simulations (also with the additional start values explained above in response to another comment) and no longer experienced any convergence issues. We have removed all references to the convergence issues from the manuscript (changes highlighted in yellow).

"Thus, our results are not necessarily comparable to e.g. the runtime of standard EM as implemented and runtime optimized in mirt. This was mostly done to ensure comparability between all acceleration methods, as they are not all available in mirt." -- This seems like a missed opportunity. mirt is open source software. Why didn't you add your algorithms to mirt? Certainly, OpenMx would welcome a similar contribution. At least your algorithms are available in turboEM; that's good.

Response: Thank you very much for your suggestion. We completely agree that implementing all the reviewed algorithms in mirt would be useful. However, we are afraid that this undertaking is beyond the scope of this work, whose aim was to review, discuss and compare these recently proposed and very promising acceleration methods in a psychometric setting. While we understand and value your suggestion, we hope that you can also understand our stance. As the runtime overhead due to the implementation is the same for all accelerators and for standard EM, and especially in conjunction with the newly added relative runtimes (as you suggested below), we think our comparison is helpful to psychological researchers, and possibly also in deciding which accelerators in particular should be implemented in mirt in the future. This is all the more true since, as you also pointed out, all methods are available as implementations in R. Nonetheless, to give this more consideration in our work, we now mention in the discussion (l. 477-481) the possibility of implementing the reviewed acceleration methods in mirt in the future.

Lines 493-494: Yes, of course we should ignore absolute CPU times. No need to apologize. Maybe it would be better to report relative CPU times in your tables instead of absolute times.

Response: Thank you for the great idea. We have added a column with the relative CPU times (relative to the time of the standard EM algorithm) to the tables presenting the results of our simulation. This is really helpful in getting a better grasp of the results by looking at the tables, so thank you very much for helping us improve the quality of our manuscript.
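As a small illustration of what the added column contains, the sketch below divides each method's CPU time by that of standard EM, so that the baseline itself scores 1 and values below 1 mean the method was faster. The function name and the timings in the example are hypothetical and are not results from the paper.

```python
def relative_cpu_times(times, reference="EM"):
    """Return each method's CPU time divided by the reference method's CPU time.

    times: dict mapping a method label to its CPU time (any unit, e.g. seconds).
    reference: label of the baseline method (here assumed to be standard EM).
    A value below 1 means the method ran faster than the baseline.
    """
    base = times[reference]
    if base <= 0:
        raise ValueError("reference CPU time must be positive")
    return {method: t / base for method, t in times.items()}


# Hypothetical timings for illustration only (not results from the paper)
example = {"EM": 4.0, "SQUAREM": 1.0, "QN": 5.0}
print(relative_cpu_times(example))  # {'EM': 1.0, 'SQUAREM': 0.25, 'QN': 1.25}
```

Because the implementation overhead affects all methods alike, such ratios remain interpretable even when the absolute times are not comparable to an optimized implementation such as the one in mirt.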

There are more mysterious ?? in Algorithm 3 pseudocode. Should be fixed.

Response: Thank you for pointing out the broken reference. This was a reference to an equation in the introduction / theory which was originally numbered in an earlier, longer version of the manuscript. Similarly to what we have explained above, this equation was moved into the text and now no longer has a number to be referenced, in order to shorten the manuscript. We apologize for having forgotten to remove the now obsolete reference. We have corrected this mistake now.
