Computational Aspects, Statistical Algorithms and Software in Psychometrics

A special issue of Psych (ISSN 2624-8611). This special issue belongs to the section "Psychometrics and Educational Measurement".

Deadline for manuscript submissions: closed (31 August 2021) | Viewed by 132319

Special Issue Editor


Dr. Alexander Robitzsch
Guest Editor
IPN – Leibniz Institute for Science and Mathematics Education, University of Kiel, Olshausenstraße 62, 24118 Kiel, Germany
Interests: item response models; linking; methodology in large-scale assessments; multilevel models; missing data; cognitive diagnostic models; Bayesian methods and regularization

Special Issue Information

Dear Colleagues,

Statistical software in psychometrics has made tremendous progress in providing open-source solutions (e.g., the software R, Julia, and Python). This Special Issue therefore focuses, on the one hand, on computational aspects and statistical algorithms for psychometric methods: shared experiences with efficient implementations, or with handling vast datasets in psychometric modeling, are of particular interest. On the other hand, articles introducing new software packages are invited. We also invite software reviews that evaluate a single package, review several packages, or compare packages empirically, as well as software tutorials that give applied researchers guidance on estimating recent psychometric models in statistical software. Potential psychometric models include, but are not limited to, item response models, structural equation models, and multilevel models.

Dr. Alexander Robitzsch
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Psych is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1200 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • statistical software
  • estimation algorithms
  • software tutorials
  • software reviews
  • item response models
  • multilevel models
  • structural equation models
  • open-source software

Published Papers (32 papers)

Editorial

5 pages, 225 KiB  
Editorial
Editorial of the Psych Special Issue “Computational Aspects, Statistical Algorithms and Software in Psychometrics”
by Alexander Robitzsch
Psych 2022, 4(1), 114-118; https://0-doi-org.brum.beds.ac.uk/10.3390/psych4010011 - 02 Mar 2022
Cited by 1 | Viewed by 1949
Abstract
Statistical software in psychometrics has made tremendous progress in providing open-source solutions (e.g., the software R, Julia, and Python). [...]

Research

16 pages, 372 KiB  
Article
Evaluating Stan’s Variational Bayes Algorithm for Estimating Multidimensional IRT Models
by Esther Ulitzsch and Steffen Nestler
Psych 2022, 4(1), 73-88; https://0-doi-org.brum.beds.ac.uk/10.3390/psych4010007 - 05 Feb 2022
Cited by 1 | Viewed by 2806
Abstract
Bayesian estimation of multidimensional item response theory (IRT) models in large data sets may come with impractical computational burdens when general-purpose Markov chain Monte Carlo (MCMC) samplers are employed. Variational Bayes (VB)—a method for approximating the posterior distribution—poses a potential remedy. Stan’s general-purpose VB algorithms have drastically improved the accessibility of VB methods for a wide psychometric audience. Using marginal maximum likelihood (MML) and MCMC as benchmarks, the present simulation study investigates the utility of Stan’s built-in VB function for estimating multidimensional IRT models with between-item dimensionality. VB yielded a marked speed-up in comparison to MCMC, but did not generally outperform MML in terms of run time. VB estimates were trustworthy only for item difficulties, while bias in item discriminations depended on the model’s dimensionality. Under realistic conditions of non-zero correlations between dimensions, VB correlation estimates were subject to severe bias. The practical relevance of performance differences is illustrated with data from PISA 2018. We conclude that in its current form, Stan’s built-in VB algorithm does not pose a viable alternative for estimating multidimensional IRT models.
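
For readers who want to try the approach, a minimal sketch of both estimators with the rstan interface follows; the Stan program file (irt_2pl.stan) and the data list (stan_data) are assumptions, not materials from the article.

```r
library(rstan)

# Compile a multidimensional IRT model; "irt_2pl.stan" is a placeholder name.
mod <- stan_model("irt_2pl.stan")

# Stan's built-in variational Bayes (ADVI); mean-field is the default variant.
fit_vb <- vb(mod, data = stan_data, algorithm = "meanfield",
             output_samples = 1000, seed = 1)

# General-purpose MCMC (NUTS) as the benchmark.
fit_mcmc <- sampling(mod, data = stan_data, chains = 4, iter = 2000, seed = 1)
```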

28 pages, 380 KiB  
Article
An Introduction to Factored Regression Models with Blimp
by Brian Tinnell Keller
Psych 2022, 4(1), 10-37; https://0-doi-org.brum.beds.ac.uk/10.3390/psych4010002 - 31 Dec 2021
Cited by 1 | Viewed by 3296
Abstract
In this paper, we provide an introduction to the factored regression framework. This modeling framework applies the rules of probability to break up or “factor” a complex joint distribution into a product of conditional regression models. Using this framework, we can easily specify the complex multivariate models that missing data modeling requires. The article provides a brief conceptual overview of factored regression and describes the functional notation used to conceptualize the models. Furthermore, we present a conceptual overview of how the models are estimated and imputations are obtained. Finally, we discuss how the free software package Blimp can be used to estimate the models in the context of a mediation example.

19 pages, 425 KiB  
Article
Automated Essay Scoring Using Transformer Models
by Sabrina Ludwig, Christian Mayer, Christopher Hansen, Kerstin Eilers and Steffen Brandt
Psych 2021, 3(4), 897-915; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3040056 - 14 Dec 2021
Cited by 12 | Viewed by 5672
Abstract
Automated essay scoring (AES) is gaining increasing attention in the education sector as it significantly reduces the burden of manual scoring and allows ad hoc feedback for learners. Natural language processing based on machine learning has been shown to be particularly suitable for text classification and AES. While many machine-learning approaches for AES still rely on a bag-of-words (BOW) approach, we consider a transformer-based approach in this paper, compare its performance to a logistic regression model based on the BOW approach, and discuss their differences. The analysis is based on 2088 email responses to a problem-solving task that were manually labeled in terms of politeness. Both transformer models considered in the analysis outperformed the regression-based model without any hyperparameter tuning. We argue that, for AES tasks such as politeness classification, the transformer-based approach has significant advantages, while a BOW approach suffers from not taking word order into account and from reducing words to their stems. Further, we show how such models can help increase the accuracy of human raters, and we provide detailed instructions on how to implement transformer-based models for one’s own purposes.

24 pages, 2913 KiB  
Article
Cognitively Diagnostic Analysis Using the G-DINA Model in R
by Qingzhou Shi, Wenchao Ma, Alexander Robitzsch, Miguel A. Sorrel and Kaiwen Man
Psych 2021, 3(4), 812-835; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3040052 - 08 Dec 2021
Cited by 6 | Viewed by 5191
Abstract
Cognitive diagnosis models (CDMs) have increasingly been applied in education and other fields. This article provides an overview of a widely used CDM, namely, the G-DINA model, and demonstrates a hands-on example of using multiple R packages for a series of CDM analyses. This overview involves a step-by-step illustration and explanation of performing Q-matrix evaluation, CDM calibration, model fit evaluation, item diagnosticity investigation, classification reliability examination, and the result presentation and visualization. Some limitations of conducting CDM analysis in R are also discussed.
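
As a flavor of that workflow, here is a minimal sketch using the GDINA package and its bundled simulated data; the article itself walks through several packages and many more options.

```r
library(GDINA)

dat <- sim10GDINA$simdat   # simulated responses shipped with the package
Q   <- sim10GDINA$simQ     # corresponding Q-matrix

est <- GDINA(dat = dat, Q = Q, model = "GDINA")   # G-DINA calibration

Qval(est)         # empirical Q-matrix validation
modelfit(est)     # absolute model fit evaluation
itemfit(est)      # item-level fit
personparm(est)   # attribute-profile (classification) estimates
```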

29 pages, 734 KiB  
Article
Comparing the MCMC Efficiency of JAGS and Stan for the Multi-Level Intercept-Only Model in the Covariance- and Mean-Based and Classic Parametrization
by Martin Hecht, Sebastian Weirich and Steffen Zitzmann
Psych 2021, 3(4), 751-779; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3040048 - 30 Nov 2021
Cited by 8 | Viewed by 3203
Abstract
Bayesian MCMC is a widely used model estimation technique, and software from the BUGS family, such as JAGS, has been popular for over two decades. Recently, Stan entered the market with promises of higher efficiency fueled by advanced and more sophisticated algorithms. With this study, we want to contribute empirical results to the discussion about the sampling efficiency of JAGS and Stan. We conducted three simulation studies in which we varied the number of warmup iterations, the prior informativeness, and the sample size, and employed the multi-level intercept-only model in the covariance- and mean-based and in the classic parametrization. The target outcome was MCMC efficiency measured as effective sample size per second (ESS/s). Based on our specific (and limited) study setup, we found that (1) MCMC efficiency is much higher for the covariance- and mean-based parametrization than for the classic parametrization, (2) Stan clearly outperforms JAGS when the covariance- and mean-based parametrization is used, and (3) JAGS clearly outperforms Stan when the classic parametrization is used.
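
The efficiency criterion itself is easy to reproduce for any rstan fit; a sketch, assuming fit is a stanfit object:

```r
library(rstan)

ess  <- summary(fit)$summary[, "n_eff"]         # effective sample sizes
secs <- sum(get_elapsed_time(fit)[, "sample"])  # post-warmup seconds, all chains

ess_per_sec <- ess / secs                       # ESS/s, the study's outcome
sort(ess_per_sec)                               # least efficient parameters first
```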

23 pages, 541 KiB  
Article
Concepts and Coefficients Based on John L. Holland’s Theory of Vocational Choice—Examining the R Package holland
by Florian G. Hartmann, Jörg-Henrik Heine and Bernhard Ertl
Psych 2021, 3(4), 728-750; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3040047 - 29 Nov 2021
Cited by 4 | Viewed by 10734
Abstract
John L. Holland’s theory of vocational choice is one of the most prominent career theories and is used by both researchers and practitioners around the world. The theory states that people should seek work environments that fit their vocational interests in order to be satisfied and successful. Its application in research and practice requires the determination of coefficients, which quantify its core concepts such as person-environment fit. The recently released R package holland aims at providing a holistic collection of the references, descriptions and calculations of the most important coefficients. The current paper presents the package and examines it in terms of its application for research and practice. For this purpose, the functions of the package are applied and discussed. Furthermore, recommendations are made for cases in which multiple coefficients exist for the same theoretical concept, and features that future releases should include are discussed. The R package holland is a promising computational environment providing multiple coefficients for Holland’s most important theoretical concepts.

14 pages, 493 KiB  
Article
Anonymiced Shareable Data: Using mice to Create and Analyze Multiply Imputed Synthetic Datasets
by Thom Benjamin Volker and Gerko Vink
Psych 2021, 3(4), 703-716; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3040045 - 23 Nov 2021
Cited by 4 | Viewed by 4018
Abstract
Synthetic datasets simultaneously allow for the dissemination of research data while protecting the privacy and confidentiality of respondents. Generating and analyzing synthetic datasets is straightforward, yet a synthetic data analysis pipeline is seldom adopted by applied researchers. We outline a simple procedure for generating and analyzing synthetic datasets with the multiple imputation software mice (Version 3.13.15) in R. We demonstrate through simulations that the analysis results obtained on synthetic data yield unbiased and valid inferences and lead to synthetic records that cannot be distinguished from the true data records. The ease of use when synthesizing data with mice, along with the validity of inferences obtained through this procedure, opens up a wealth of possibilities for data dissemination and further research on initially private data.
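
A minimal sketch of the core idea: flagging every cell for imputation via mice's where argument overimputes the observed data, so the completed data sets are fully synthetic. The variable names in the analysis step are illustrative, and pooling for synthetic data follows different rules than ordinary multiple imputation (see the article).

```r
library(mice)

# Flag every cell for (over)imputation to obtain fully synthetic data sets.
syn <- mice(dat, m = 10, method = "cart",
            where = matrix(TRUE, nrow(dat), ncol(dat)), seed = 1)

# Analyze each synthetic data set; y, x1, x2 are assumed column names.
fits <- with(syn, lm(y ~ x1 + x2))
```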

21 pages, 435 KiB  
Article
Handling Missing Responses in Psychometrics: Methods and Software
by Shenghai Dai
Psych 2021, 3(4), 673-693; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3040043 - 19 Nov 2021
Cited by 11 | Viewed by 3899
Abstract
The presence of missing responses in assessment settings is inevitable and may yield biased parameter estimates in psychometric modeling if ignored or handled improperly. Many methods have been proposed to handle missing responses in assessment data that are often dichotomous or polytomous. Their applications remain nominal, however, partly because (1) the literature offers no sufficient support for an optimal method; (2) many practitioners and researchers are not familiar with these methods; and (3) these methods are usually not implemented in psychometric software, so missing responses need to be handled separately. This article introduces and reviews the commonly used missing response handling methods in psychometrics, along with the literature that examines and compares the performance of these methods. Further, the use of the TestDataImputation package in R is introduced and illustrated with an example data set and a simulation study. Corresponding R codes are provided.
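
One of the classical methods in this literature, two-way imputation, is simple enough to sketch by hand: a missing response is replaced by person mean + item mean - grand mean, rounded for dichotomous items. A base-R sketch, assuming X is a persons-by-items 0/1 matrix with NAs (packages such as TestDataImputation provide ready-made implementations of such methods):

```r
two_way_impute <- function(X) {
  pm <- rowMeans(X, na.rm = TRUE)  # person means
  im <- colMeans(X, na.rm = TRUE)  # item means
  gm <- mean(X, na.rm = TRUE)      # grand mean
  for (i in seq_len(nrow(X))) {
    for (j in seq_len(ncol(X))) {
      if (is.na(X[i, j])) {
        # Two-way estimate, rounded to 0/1 for dichotomous items.
        X[i, j] <- as.numeric(pm[i] + im[j] - gm >= 0.5)
      }
    }
  }
  X
}
```
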
21 pages, 3026 KiB  
Article
An Evaluation of DIF Tests in Multistage Tests for Continuous Covariates
by Rudolf Debelak and Dries Debeer
Psych 2021, 3(4), 618-638; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3040040 - 15 Oct 2021
Cited by 2 | Viewed by 2321
Abstract
Multistage tests are a widely used and efficient type of test presentation that aims to provide accurate ability estimates while keeping the test relatively short. Multistage tests typically rely on the psychometric framework of item response theory. Violations of item response models and other assumptions underlying a multistage test, such as differential item functioning, can lead to inaccurate ability estimates and unfair measurements. There is a practical need for methods to detect problematic model violations to avoid these issues. This study compares and evaluates three methods for the detection of differential item functioning with regard to continuous person covariates in data from multistage tests: a linear logistic regression test and two adaptations of a recently proposed score-based DIF test. While all tests show a satisfactory Type I error rate, the score-based tests show greater power against three types of DIF effects.
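
The logistic regression approach generalizes directly to a continuous covariate: the item response is regressed on the matching score, the covariate, and their interaction, and the covariate terms capture uniform and non-uniform DIF. A base-R sketch for a single item, with assumed vectors resp (0/1 responses), rest (matching rest score), and age (continuous covariate):

```r
m0 <- glm(resp ~ rest,                  family = binomial)  # no DIF
m1 <- glm(resp ~ rest + age,            family = binomial)  # + uniform DIF
m2 <- glm(resp ~ rest + age + rest:age, family = binomial)  # + non-uniform DIF

anova(m0, m2, test = "LRT")  # likelihood-ratio test against any DIF in age
```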

25 pages, 1964 KiB  
Article
The Theoretical and Statistical Ising Model: A Practical Guide in R
by Adam Finnemann, Denny Borsboom, Sacha Epskamp and Han L. J. van der Maas
Psych 2021, 3(4), 593-617; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3040039 - 08 Oct 2021
Cited by 8 | Viewed by 6420
Abstract
The “Ising model” refers to both the statistical and the theoretical use of the same equation. In this article, we introduce both uses and contrast their differences. We accompany the conceptual introduction with a survey of Ising-related software packages in R. Since the model’s different uses are best understood through simulations, we make this process easily accessible with fully reproducible examples. Using simulations, we show how the theoretical Ising model captures local-alignment dynamics. Subsequently, we present it statistically as a likelihood function for estimating empirical network models from binary data. In this process, we give recommendations on when to use traditional frequentist estimators as well as novel Bayesian options.
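
A compact illustration of the two uses with packages covered by the survey: IsingSampler simulates from a theoretical Ising model, and IsingFit estimates the statistical (eLasso) network back from the binary data. Network size and parameter values are illustrative.

```r
library(IsingSampler)
library(IsingFit)

set.seed(1)
k <- 5
W <- matrix(0, k, k)
W[upper.tri(W)] <- rbinom(k * (k - 1) / 2, 1, 0.4) * 0.5
W <- W + t(W)            # symmetric weight matrix, zero diagonal
tau <- rep(-0.5, k)      # thresholds

dat <- IsingSampler(n = 1000, graph = W, thresholds = tau)  # theoretical use

fit <- IsingFit(dat, family = "binomial", plot = FALSE)     # statistical use
fit$weiadj               # estimated weight matrix
```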

17 pages, 3206 KiB  
Article
Bivariate Distributions Underlying Responses to Ordinal Variables
by Laura Kolbe, Frans Oort and Suzanne Jak
Psych 2021, 3(4), 562-578; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3040037 - 01 Oct 2021
Cited by 2 | Viewed by 3235
Abstract
The association between two ordinal variables can be expressed with a polychoric correlation coefficient. This coefficient is conventionally based on the assumption that responses to ordinal variables are generated by two underlying continuous latent variables with a bivariate normal distribution. When the underlying bivariate normality assumption is violated, the estimated polychoric correlation coefficient may be biased. In such a case, we may consider other distributions. In this paper, we aimed to provide an illustration of fitting various bivariate distributions to empirical ordinal data and examining how estimates of the polychoric correlation may vary under different distributional assumptions. Results suggested that the bivariate normal and skew-normal distributions rarely hold in the empirical datasets. In contrast, mixtures of bivariate normal distributions were often not rejected.
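
Under the conventional assumption, the polychoric correlation is a one-liner with the psych package; the alternative underlying distributions examined in the article require custom likelihoods instead.

```r
library(psych)

# dat: assumed data frame of ordinal (e.g., Likert-type) variables.
pc <- polychoric(dat)
pc$rho   # polychoric correlation matrix under bivariate normality
pc$tau   # estimated thresholds
```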

11 pages, 489 KiB  
Article
Robust Chi-Square in Extreme and Boundary Conditions: Comments on Jak et al. (2021)
by Tihomir Asparouhov and Bengt Muthén
Psych 2021, 3(3), 542-551; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030035 - 10 Sep 2021
Cited by 4 | Viewed by 2648
Abstract
In this article we describe a modification of the robust chi-square test of fit that yields more accurate Type I error rates when the estimated model is at the boundary of the admissible space.

21 pages, 3237 KiB  
Article
Modelling Norm Scores with the cNORM Package in R
by Sebastian Gary, Wolfgang Lenhard and Alexandra Lenhard
Psych 2021, 3(3), 501-521; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030033 - 30 Aug 2021
Cited by 9 | Viewed by 3340
Abstract
In this article, we explain and demonstrate how to model norm scores with the cNORM package in R. This package is designed specifically to determine norm scores when the latent ability to be measured covaries with age or other explanatory variables such as grade level. The mathematical method used in this package draws on polynomial regression to model a three-dimensional hyperplane that smoothly and continuously captures the relation between raw scores, norm scores and the explanatory variable. By doing so, it overcomes the typical problems of classical norming methods, such as overly large age intervals, missing norm scores, large amounts of sampling error in the subsamples or huge requirements with regard to the sample size. After a brief introduction to the mathematics of the model, we describe the individual methods of the package. We close the article with a practical example using data from a real reading comprehension test.
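
In its simplest form the package needs only raw scores and a grouping variable. A sketch using the reading-comprehension data shipped with the package; the norm-table call is an assumption about the interface, so consult the package documentation for details.

```r
library(cNORM)

# elfe: reading comprehension data included in the package.
model <- cnorm(raw = elfe$raw, group = elfe$group)

# Norm table for one level of the explanatory variable (assumed interface).
normTable(3, model)
```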

22 pages, 524 KiB  
Article
Estimating the Stability of Psychological Dimensions via Bootstrap Exploratory Graph Analysis: A Monte Carlo Simulation and Tutorial
by Alexander P. Christensen and Hudson Golino
Psych 2021, 3(3), 479-500; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030032 - 27 Aug 2021
Cited by 70 | Viewed by 6760
Abstract
Exploratory Graph Analysis (EGA) has emerged as a popular approach for estimating the dimensionality of multivariate data using psychometric networks. Sampling variability, however, has made reproducibility and generalizability a key issue in network psychometrics. To address this issue, we have developed a novel bootstrap approach called Bootstrap Exploratory Graph Analysis (bootEGA). bootEGA generates a sampling distribution of EGA results from which several statistics can be computed. Descriptive statistics (median, standard error, and dimension frequency) provide researchers with a general sense of the stability of their empirical EGA dimensions. Structural consistency estimates how often dimensions are replicated exactly across the bootstrap replicates. Item stability statistics provide information about whether dimensions are unstable due to misallocation (e.g., item placed in the wrong dimension), multidimensionality (e.g., item belonging to more than one dimension), or item redundancy (e.g., similar semantic content). Using a Monte Carlo simulation, we determine guidelines for acceptable item stability. Afterward, we provide an empirical example that demonstrates how bootEGA can be used to identify structural consistency issues (including a fully reproducible R tutorial). In sum, we demonstrate that bootEGA is a robust approach for identifying the stability and robustness of dimensionality in multivariate data.
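
A condensed sketch of the workflow with the EGAnet package, assuming dat is a data frame of item responses:

```r
library(EGAnet)

ega  <- EGA(dat)                    # empirical EGA dimensions
boot <- bootEGA(dat, iter = 500,    # bootstrap sampling distribution
                type = "parametric")

dimensionStability(boot)            # structural consistency per dimension
itemStability(boot)                 # item-level replication frequencies
```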

25 pages, 3122 KiB  
Article
shinyReCoR: A Shiny Application for Automatically Coding Text Responses Using R
by Nico Andersen and Fabian Zehner
Psych 2021, 3(3), 422-446; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030030 - 16 Aug 2021
Cited by 10 | Viewed by 3926
Abstract
In this paper, we introduce shinyReCoR: a new app that utilizes a cluster-based method for automatically coding open-ended text responses. Reliable coding of text responses from educational or psychological assessments requires substantial organizational and human effort. The coding of natural language in responses to tests depends on the texts’ complexity, corresponding coding guides, and the guides’ quality. Manual coding is thus not only expensive but also error-prone. With shinyReCoR, we provide a more efficient alternative. The use of natural language processing makes texts utilizable for statistical methods. shinyReCoR is a Shiny app deployed as an R-package that allows users with varying technical affinity to create automatic response classifiers through a graphical user interface based on annotated data. The present paper describes the underlying methodology, including machine learning, as well as peculiarities of the processing of language in the assessment context. The app guides users through the workflow with steps like text corpus compilation, semantic space building, preprocessing of the text data, and clustering. Users can adjust each step according to their needs. Finally, users are provided with an automatic response classifier, which can be evaluated and tested within the process.

18 pages, 360 KiB  
Article
Between-Item Multidimensional IRT: How Far Can the Estimation Methods Go?
by Mauricio Garnier-Villarreal, Edgar C. Merkle and Brooke E. Magnus
Psych 2021, 3(3), 404-421; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030029 - 09 Aug 2021
Cited by 7 | Viewed by 3376
Abstract
Multidimensional item response models are known to be difficult to estimate, with a variety of estimation and modeling strategies being proposed to handle the difficulties. While some previous studies have considered the performance of these estimation methods, they typically include only one or two methods, or a small number of factors. In this paper, we report on a large simulation study of between-item multidimensional IRT estimation methods, considering five different methods, a variety of sample sizes, and up to eight factors. This study provides a comprehensive picture of the methods’ relative performance, as well as each individual method’s strengths and weaknesses. The study results lead us to make recommendations for applied research, related to which estimation methods should be used under various scenarios.
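
Several of the estimation methods compared in such studies are exposed through a single argument in the mirt package; a sketch for a between-item three-factor 2PL (the item blocks are illustrative):

```r
library(mirt)

model <- "
  F1 = 1-10
  F2 = 11-20
  F3 = 21-30
  COV = F1*F2*F3
"

fit_em  <- mirt(dat, model, itemtype = "2PL", method = "EM")     # quadrature EM
fit_qmc <- mirt(dat, model, itemtype = "2PL", method = "QMCEM")  # quasi-Monte Carlo EM
fit_mh  <- mirt(dat, model, itemtype = "2PL", method = "MHRM")   # stochastic approximation

coef(fit_qmc, simplify = TRUE)
```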

18 pages, 1961 KiB  
Article
cdcatR: An R Package for Cognitive Diagnostic Computerized Adaptive Testing
by Miguel A. Sorrel, Pablo Nájera and Francisco J. Abad
Psych 2021, 3(3), 386-403; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030028 - 09 Aug 2021
Cited by 4 | Viewed by 2766
Abstract
Cognitive diagnosis models (CDMs) are confirmatory latent class models that provide fine-grained information about skills and cognitive processes. These models have gained attention in the last few years because of their usefulness in educational and psychological settings. Recently, numerous developments have been made to allow for the implementation of cognitive diagnosis computerized adaptive testing (CD-CAT). Despite methodological advances, CD-CAT applications are still scarce. To facilitate research and the emergence of empirical applications in this area, we have developed the cdcatR package for R software. The purpose of this document is to illustrate the different functions included in this package. The package includes functionalities for data generation, model selection based on relative fit information, implementation of several item selection rules (including item exposure control), and CD-CAT performance evaluation in terms of classification accuracy, item exposure, and test length. In conclusion, an R package is made available to researchers and practitioners that allows for an easy implementation of CD-CAT in both simulation and applied studies. Ultimately, this is expected to facilitate the development of empirical applications in this area.

26 pages, 1101 KiB  
Article
Predicting Differences in Model Parameters with Individual Parameter Contribution Regression Using the R Package ipcr
by Manuel Arnold, Andreas M. Brandmaier and Manuel C. Voelkle
Psych 2021, 3(3), 360-385; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030027 - 06 Aug 2021
Cited by 3 | Viewed by 2706
Abstract
Unmodeled differences between individuals or groups can bias parameter estimates and may lead to false-positive or false-negative findings. Such instances of heterogeneity can often be detected and predicted with additional covariates. However, predicting differences with covariates can be challenging or even infeasible, depending on the modeling framework and type of parameter. Here, we demonstrate how the individual parameter contribution (IPC) regression framework, as implemented in the R package ipcr, can be leveraged to predict differences in any parameter across a wide range of parametric models. First and foremost, IPC regression is an exploratory analysis technique to determine if and how the parameters of a fitted model vary as a linear function of covariates. After introducing the theoretical foundation of IPC regression, we use an empirical data set to demonstrate how parameter differences in a structural equation model can be predicted with the ipcr package. Then, we analyze the performance of IPC regression in comparison to alternative methods for modeling parameter heterogeneity in a Monte Carlo simulation.

12 pages, 397 KiB  
Article
Using the Effective Sample Size as the Stopping Criterion in Markov Chain Monte Carlo with the Bayes Module in Mplus
by Steffen Zitzmann, Sebastian Weirich and Martin Hecht
Psych 2021, 3(3), 336-347; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030025 - 30 Jul 2021
Cited by 10 | Viewed by 3154
Abstract
Bayesian modeling using Markov chain Monte Carlo (MCMC) estimation requires researchers to decide not only whether estimation has converged but also whether the Bayesian estimates are well-approximated by summary statistics from the chain. However, in software such as the Bayes module in Mplus, which helps researchers check whether convergence has been achieved by comparing the potential scale reduction (PSR) with a prespecified maximum PSR, the size of the MCMC error or, equivalently, the effective sample size (ESS) is not monitored. Zitzmann and Hecht (2019) proposed a method that can be used to check whether a minimum ESS has been reached in Mplus. In this article, we evaluated this method with a computer simulation. Specifically, we fit a multilevel structural equation model to a large number of simulated data sets and compared different prespecified minimum ESS values with the actual (empirical) ESS values. The empirical values were approximately equal to or larger than the prespecified minimum ones, thus indicating the validity of the method.
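
Once posterior draws are exported, the check itself is a few lines with the coda package; draws is an assumed matrix of post-burn-in draws (parameters in columns):

```r
library(coda)

ess <- effectiveSize(as.mcmc(draws))  # ESS per parameter

min_ess <- 400                        # prespecified minimum ESS
all(ess >= min_ess)                   # FALSE: keep sampling
```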

14 pages, 490 KiB  
Article
Testing and Interpreting Latent Variable Interactions Using the semTools Package
by Alexander M. Schoemann and Terrence D. Jorgensen
Psych 2021, 3(3), 322-335; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030024 - 30 Jul 2021
Cited by 26 | Viewed by 7753
Abstract
Examining interactions among predictors is an important part of a developing research program. Estimating interactions using latent variables provides additional power to detect effects over testing interactions in regression. However, when predictors are modeled as latent variables, estimating and testing interactions requires additional steps beyond the models used for regression. We review methods of estimating and testing latent variable interactions with a focus on product indicator methods. Product indicator methods of examining latent interactions provide an accurate method to estimate and test latent interactions and can be implemented in any latent variable modeling software package. Significant latent interactions require additional steps (plotting and probing) to interpret interaction effects. We demonstrate how these methods can be easily implemented using functions in the semTools package with models fit using the lavaan package in R, and we illustrate how these methods work using an applied example concerning teacher stress and testing.
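
A condensed sketch of the product-indicator workflow with semTools and lavaan; the indicator names (x1-x3, z1-z3, y1-y3) and probing values are illustrative:

```r
library(lavaan)
library(semTools)

# Double-mean-centered, matched product indicators for the interaction.
dat2 <- indProd(dat, var1 = c("x1", "x2", "x3"), var2 = c("z1", "z2", "z3"),
                match = TRUE, doubleMC = TRUE)

model <- "
  X  =~ x1 + x2 + x3
  Z  =~ z1 + z2 + z3
  XZ =~ x1.z1 + x2.z2 + x3.z3
  Y  =~ y1 + y2 + y3
  Y  ~ X + Z + XZ
"
fit <- sem(model, data = dat2)

# Probe and plot the latent interaction at chosen moderator values.
pr <- probe2WayMC(fit, nameX = c("X", "Z", "XZ"), nameY = "Y",
                  modVar = "Z", valProbe = c(-1, 0, 1))
plotProbe(pr, xlim = c(-3, 3))
```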

14 pages, 3345 KiB  
Article
Estimating Explanatory Extensions of Dichotomous and Polytomous Rasch Models: The eirm Package in R
by Okan Bulut, Guher Gorgun and Seyma Nur Yildirim-Erbasli
Psych 2021, 3(3), 308-321; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030023 - 29 Jul 2021
Cited by 12 | Viewed by 3882
Abstract
Explanatory item response modeling (EIRM) enables researchers and practitioners to incorporate item and person properties into item response theory (IRT) models. Unlike traditional IRT models, explanatory IRT models can explain common variability stemming from the shared variance among item clusters and person groups. In this tutorial, we present the R package eirm, which provides a simple and easy-to-use set of tools for preparing data, estimating explanatory IRT models based on the Rasch family, extracting model output, and visualizing model results. We describe how functions in the eirm package can be used for estimating traditional IRT models (e.g., Rasch model, Partial Credit Model, and Rating Scale Model), item-explanatory models (i.e., Linear Logistic Test Model), and person-explanatory models (i.e., latent regression models) for both dichotomous and polytomous responses. In addition to demonstrating the general functionality of the eirm package, we also provide real-data examples with annotated R codes based on the Rosenberg Self-Esteem Scale.
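
A brief sketch of the package's formula interface, using the verbal aggression data from lme4; the exact set of item properties is illustrative:

```r
library(eirm)
data("VerbAgg", package = "lme4")

# Item-explanatory (LLTM-type) model: item properties replace item dummies.
mod <- eirm(formula = "r2 ~ -1 + btype + situ + mode + (1|id)", data = VerbAgg)
print(mod)   # easiness parameters for the item properties
```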

16 pages, 1013 KiB  
Article
RALSA: Design and Implementation
by Plamen Vladkov Mirazchiyski
Psych 2021, 3(2), 233-248; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3020018 - 12 Jun 2021
Cited by 1 | Viewed by 3919
Abstract
International large-scale assessments (ILSAs) provide invaluable information for researchers and policy makers. Analysis of their data, however, requires methods that go beyond the usual analysis techniques assuming simple random sampling. Several software packages that serve this purpose are available. One such package is the newly developed R Analyzer for Large-Scale Assessments (RALSA). The package can work with data from a large number of ILSAs. It was designed with user experience in mind and is suitable for analysts who lack technical expertise and/or familiarity with the R programming language and statistical software. This paper presents the technical aspects of RALSA: the overall design and structure of the package, its internal organization, and the structure of the analysis and data preparation functions. The use of the data.table package for memory efficiency, speed, and embedded computations is explained through examples. The central aspect of the paper is the utilization of code reuse practices to achieve consistency, efficiency, and safety of the computations performed by the analysis functions of the package. The comprehensive output system that produces multi-sheet MS Excel workbooks is presented and its workflow explained. The paper also explains how the graphical user interface is constructed and how it is linked to the data preparation and analysis functions available in the package.

36 pages, 502 KiB  
Article
Evaluating the Observed Log-Likelihood Function in Two-Level Structural Equation Modeling with Missing Data: From Formulas to R Code
by Yves Rosseel
Psych 2021, 3(2), 197-232; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3020017 - 07 Jun 2021
Cited by 5 | Viewed by 3335
Abstract
This paper discusses maximum likelihood estimation for two-level structural equation models when data are missing at random at both levels. Building on existing literature, a computationally efficient expression is derived to evaluate the observed log-likelihood. Unlike previous work, the expression is valid for the special case where the model implied variance–covariance matrix at the between level is singular. Next, the log-likelihood function is translated to R code. A sequence of R scripts is presented, starting from a naive implementation and ending at the final implementation as found in the lavaan package. Along the way, various computational tips and tricks are given.
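
The single-level analogue of the paper's starting point is compact enough to show here: under multivariate normality, the observed log-likelihood sums each case's density over its observed variables only (mu, Sigma, and the data matrix Y are assumed given; the two-level case handled in the paper requires considerably more bookkeeping).

```r
library(mvtnorm)

loglik_case <- function(yi, mu, Sigma) {
  obs <- !is.na(yi)  # observed part of this case
  dmvnorm(yi[obs], mean = mu[obs],
          sigma = Sigma[obs, obs, drop = FALSE], log = TRUE)
}

# Y: n x p data matrix with values missing at random.
ll <- sum(apply(Y, 1, loglik_case, mu = mu, Sigma = Sigma))
```
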
19 pages, 645 KiB  
Article
Evaluating Cluster-Level Factor Models with lavaan and Mplus
by Suzanne Jak, Terrence D. Jorgensen and Yves Rosseel
Psych 2021, 3(2), 134-152; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3020012 - 31 May 2021
Cited by 9 | Viewed by 3944
Abstract
Background: Researchers frequently use the responses of individuals in clusters to measure cluster-level constructs. Examples are the use of student evaluations to measure teaching quality, or the use of employee ratings of organizational climate. In earlier research, Stapleton and Johnson (2019) provided advice for measuring cluster-level constructs based on a simulation study with inadvertently confounded design factors. We extended their simulation study using both Mplus and lavaan to reveal how their conclusions were dependent on their study conditions. Methods: We generated data sets from the so-called configural model and the simultaneous shared-and-configural model, both with and without nonzero residual variances at the cluster level. We fitted models to these data sets using different maximum likelihood estimation algorithms. Results: Stapleton and Johnson’s results were highly contingent on their confounded design factors. Convergence rates could be very different across algorithms, depending on whether between-level residual variances were zero in the population or in the fitted model. We discovered a worrying convergence issue with the default settings in Mplus, resulting in seemingly converged solutions that are actually not. Rejection rates of the normal-theory test statistic were as expected, while rejection rates of the scaled test statistic were seriously inflated in several conditions. Conclusions: The defaults in Mplus carry specific risks that are easily checked but not well advertised. Our results also shine a different light on earlier advice on the use of measurement models for shared factors.
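
For reference, a shared-and-configural measurement model of the kind studied can be specified in lavaan as follows (indicator and cluster names are illustrative); fixing the between-level residual variances to zero corresponds to one of the simulated conditions:

```r
library(lavaan)

model <- "
  level: 1
    fw =~ y1 + y2 + y3      # configural (within-cluster) factor
  level: 2
    fb =~ y1 + y2 + y3      # shared (between-cluster) factor
    y1 ~~ 0*y1              # between-level residual variances fixed to zero
    y2 ~~ 0*y2
    y3 ~~ 0*y3
"

fit <- sem(model, data = dat, cluster = "cluster_id")
summary(fit)
```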

21 pages, 611 KiB  
Article
How to Estimate Absolute-Error Components in Structural Equation Models of Generalizability Theory
by Terrence D. Jorgensen
Psych 2021, 3(2), 113-133; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3020011 - 29 May 2021
Cited by 10 | Viewed by 3073
Abstract
Structural equation modeling (SEM) has been proposed to estimate generalizability theory (GT) variance components, primarily focusing on estimating relative error to calculate generalizability coefficients. Proposals for estimating absolute-error components have given the impression that a separate SEM must be fitted to a transposed data matrix. This paper uses real and simulated data to demonstrate how a single SEM can be specified to estimate absolute error (and thus dependability) by placing appropriate constraints on the mean structure, as well as thresholds (when used for ordinal measures). Using the R packages lavaan and gtheory, different estimators are compared for normal and discrete measurements. Limitations of SEM for GT are demonstrated using multirater data from a planned missing-data design, and an important remaining area for future development is discussed.
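
To give a flavor of the approach, here is a one-facet (persons by raters) sketch in lavaan: unit loadings identify the person factor, equality constraints give the error component, and defined parameters convert the mean structure into a rater variance component and a dependability coefficient. This is an illustrative reduction of the article's models, not a reproduction of them.

```r
library(lavaan)

model <- "
  p =~ 1*r1 + 1*r2 + 1*r3        # person factor with unit loadings
  p ~~ vp*p                      # person variance component
  r1 ~~ ve*r1                    # equal residual (error) variances
  r2 ~~ ve*r2
  r3 ~~ ve*r3
  r1 ~ i1*1                      # rater intercepts (mean structure)
  r2 ~ i2*1
  r3 ~ i3*1
  m   := (i1 + i2 + i3) / 3
  vr  := ((i1 - m)^2 + (i2 - m)^2 + (i3 - m)^2) / 2   # rater component
  phi := vp / (vp + vr/3 + ve/3) # dependability for k = 3 raters
"

fit <- sem(model, data = ratings)  # ratings: wide data with columns r1-r3
summary(fit)
```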

17 pages, 412 KiB  
Article
Automated Test Assembly in R: The eatATA Package
by Benjamin Becker, Dries Debeer, Karoline A. Sachse and Sebastian Weirich
Psych 2021, 3(2), 96-112; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3020010 - 21 May 2021
Cited by 5 | Viewed by 4061
Abstract
Combining items from an item pool into test forms (test assembly) is a frequent task in psychological and educational testing. Although efficient methods for automated test assembly exist, these are often unknown or unavailable to practitioners. In this paper we present the R package eatATA, which allows using several mixed-integer programming solvers for automated test assembly in R. We describe the general functionality and the common workflow of eatATA using a minimal example. We also provide four more elaborate use cases of automated test assembly: (a) the assembly of multiple test forms for a pilot study; (b) the assembly of blocks of items for a multiple matrix booklet design in the context of a large-scale assessment; (c) the assembly of two linear test forms for individual diagnostic purposes; (d) the assembly of multi-stage testing modules for individual diagnostic purposes. All use cases are accompanied by example item pools and commented R code.

44 pages, 1809 KiB  
Article
Comparison of Recent Acceleration Techniques for the EM Algorithm in One- and Two-Parameter Logistic IRT Models
by Marie Beisemann, Ortrud Wartlick and Philipp Doebler
Psych 2020, 2(4), 209-252; https://0-doi-org.brum.beds.ac.uk/10.3390/psych2040018 - 10 Nov 2020
Cited by 3 | Viewed by 2394
Abstract
The expectation–maximization (EM) algorithm is an important numerical method for maximum likelihood estimation in incomplete data problems. However, convergence of the EM algorithm can be slow, and for this reason, many EM acceleration techniques have been proposed. After a review of acceleration techniques in a unified notation with illustrations, three recently proposed EM acceleration techniques are compared in detail: quasi-Newton methods (QN), “squared” iterative methods (SQUAREM), and parabolic EM (PEM). These acceleration techniques are applied to marginal maximum likelihood estimation with the EM algorithm in one- and two-parameter logistic item response theory (IRT) models for binary data, and their performance is compared. QN and SQUAREM methods accelerate convergence of the EM algorithm for the two-parameter logistic model significantly in high-dimensional data problems. Compared to the standard EM, all three methods reduce the number of iterations, but increase the number of total marginal log-likelihood evaluations per iteration. Efficient approximations of the marginal log-likelihood are hence an important part of implementation.
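
The SQUAREM scheme from the comparison needs nothing more than the EM update written as a fixed-point map. A toy illustration, accelerating EM for the mixing weight of a two-component normal mixture with known component densities:

```r
library(SQUAREM)

set.seed(1)
y <- c(rnorm(300, 0, 1), rnorm(700, 3, 1))  # 0.3/0.7 mixture

# One EM update for the mixing weight p of the N(0, 1) component.
em_step <- function(p) {
  post <- p * dnorm(y, 0, 1) /
    (p * dnorm(y, 0, 1) + (1 - p) * dnorm(y, 3, 1))
  mean(post)
}

acc <- squarem(par = 0.5, fixptfn = em_step)  # accelerated EM
acc$par      # estimated mixing weight
acc$fpevals  # number of EM-map evaluations needed
```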

Other

32 pages, 511 KiB  
Tutorial
Reproducible Research in R: A Tutorial on How to Do the Same Thing More Than Once
by Aaron Peikert, Caspar J. van Lissa and Andreas M. Brandmaier
Psych 2021, 3(4), 836-867; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3040053 - 09 Dec 2021
Cited by 9 | Viewed by 5216
Abstract
Computational reproducibility is the ability to obtain identical results from the same data with the same computer code. It is a building block for transparent and cumulative science because it enables the originator and other researchers, on other computers and later in time, to reproduce and thus understand how results came about, while avoiding a variety of errors that may lead to erroneous reporting of statistical and computational results. In this tutorial, we demonstrate how the R package repro supports researchers in creating fully computationally reproducible research projects with tools from the software engineering community. Building upon this notion of fully automated reproducibility, we present several applications including the preregistration of research plans with code (Preregistration as Code, PAC). PAC eschews all ambiguity of traditional preregistration and offers several more advantages. Making technical advancements that serve reproducibility more widely accessible for researchers holds the potential to innovate the research process and to help it become more productive, credible, and reliable.

14 pages, 6930 KiB  
Tutorial
Tutorial on the Use of the regsem Package in R
by Xiaobei Li, Ross Jacobucci and Brooke A. Ammerman
Psych 2021, 3(4), 579-592; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3040038 - 05 Oct 2021
Cited by 7 | Viewed by 3190
Abstract
Sparse estimation through regularization is gaining popularity in psychological research. Such techniques penalize the complexity of the model and can perform variable/path selection automatically, and are thus particularly useful in models that have small parameter-to-sample-size ratios. This paper gives a detailed tutorial of the R package regsem, which implements regularization for structural equation models. Example R code is also provided to highlight the key arguments for implementing regularized structural equation models in this package. The tutorial ends by discussing remedies for some known drawbacks of a popular type of regularization, computational methods supported by the package that can improve the selection result, and some other practical issues such as dealing with missing data and categorical variables.
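
The usual entry point is cv_regsem(), which refits the model over a grid of penalty values and returns the parameter paths; model and dat stand for an assumed lavaan specification and data set:

```r
library(lavaan)
library(regsem)

fit_lav <- sem(model, data = dat)

# Lasso-penalized SEM across 30 penalty values, penalizing regression paths.
fit_cv <- cv_regsem(fit_lav, type = "lasso", pars_pen = "regressions",
                    n.lambda = 30, jump = 0.02)

plot(fit_cv)     # parameter trajectories across the penalty grid
summary(fit_cv)
```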

21 pages, 465 KiB  
Tutorial
Analysis of Categorical Data with the R Package confreq
by Jörg-Henrik Heine and Mark Stemmler
Psych 2021, 3(3), 522-541; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030034 - 07 Sep 2021
Cited by 2 | Viewed by 3116
Abstract
The person-centered approach in categorical data analysis is introduced as a complementary approach to the variable-centered approach. The former classifies persons, animals, or objects on the basis of their combinations of characteristics, which can be displayed in multiway contingency tables. Configural Frequency Analysis (CFA) and log-linear modeling (LLM) are the two most prominent (and related) statistical methods. Both compare observed frequencies (f_o) with expected frequencies (f_e). While LLM primarily uses a model-fitting approach, CFA analyzes residuals of non-fitting models. Residuals with significantly more observed than expected frequencies (f_o > f_e) are called types, while residuals with significantly fewer observed than expected frequencies (f_o < f_e) are called antitypes. The R package confreq is presented and its use is demonstrated with several data examples. Results of contingency table analyses can be displayed in tables but also in graphics representing the size and type of residual. The expected frequencies represent the null hypothesis, and different null hypotheses result in different expected frequencies. Different kinds of CFAs are presented: the first-order CFA based on the null hypothesis of independence, CFA with covariates, and the two-sample CFA. The calculation of the expected frequencies can be controlled through the design matrix, which can be easily handled in confreq.

32 pages, 1412 KiB  
Tutorial
Flexible Item Response Modeling in R with the flexmet Package
by Leah Feuerstahler
Psych 2021, 3(3), 447-478; https://0-doi-org.brum.beds.ac.uk/10.3390/psych3030031 - 16 Aug 2021
Cited by 6 | Viewed by 2040
Abstract
The filtered monotonic polynomial (FMP) model is a semi-parametric item response model that allows flexible response function shapes but also includes traditional item response models as special cases. The flexmet package for R facilitates the routine use of the FMP model in real data analysis and simulation studies. This tutorial provides several code examples illustrating how the flexmet package may be used to simulate FMP model parameters and data (for both dichotomously and polytomously scored items), estimate FMP model parameters, transform traditional item response models to different metrics, and more. This tutorial serves both as an introduction to the unique features of the FMP model and as a practical guide to its implementation in R via the flexmet package.
