Article
Peer-Review Record

Analyzing Indo-European Language Similarities Using Document Vectors

by Samuel R. Schrader and Eren Gultepe *
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 9 August 2023 / Revised: 13 September 2023 / Accepted: 23 September 2023 / Published: 26 September 2023
(This article belongs to the Special Issue Digital Humanities and Visualization)

Round 1

Reviewer 1 Report

This research reconstructs phylogenetic trees of Indo-European languages using document vectors. The authors have done extensive and meticulous work. Overall, I found the article a delight to read, and I recommend acceptance in its present form.

 

Minor point:

I did not understand how the Normalised Mutual Information is computed. How are I(X,Y) and H(X) calculated? The authors are advised to provide more details on this.
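For reference, the quantities in this question can be sketched as follows. This is a minimal illustration of one common definition of normalised mutual information between two label assignments, not the authors' implementation: the function names are hypothetical, and normalising by the arithmetic mean of H(X) and H(Y) is an assumption, since other normalisations (for example the geometric mean or the maximum) are also in use.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(X) of a label sequence, in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Mutual information I(X,Y) estimated from paired label sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    # Sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum(
        (c / n) * log((c * n) / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def normalized_mutual_information(x, y):
    """NMI = 2 * I(X,Y) / (H(X) + H(Y)); assumed normalisation."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0 and hy == 0:
        # Both assignments place everything in one cluster, so they agree.
        return 1.0
    return 2 * mutual_information(x, y) / (hx + hy)
```

Here X would be the cluster labels produced by a method and Y the ground-truth language groups. Two assignments that differ only in how the clusters are named score 1, and statistically independent assignments score 0.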

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors analyzed methods for building phylogenetic trees and clustering languages in the Indo-European family without the use of language-specific information.

The authors have explained the research procedure well, and it is easy to understand. The research is interesting, but only a few communities can make use of it. In the discussion section, the authors need to discuss the practical impact that could be developed from this research.

The authors do not acknowledge the latest studies (2020-2022). In addition, several references need to be updated; for example, the preprints in the reference list should be checked to see whether they have since been published.

The authors state that a version of the Indo-European phylogenetic tree from Serva and Petroni [5] is used as the ground truth. This version needs to be explained, along with the treatment and quality checks that were carried out to make it suitable for use as the ground truth. In addition, the paper does not explain the training and test data used for the performance tests (precision and recall).

The authors must state the limitations and threats to validity for several aspects of this study, such as the use of the Bible (and its translations) and the selection of the ground-truth data.

The authors have not written a conclusion section.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Based on translations of the Bible in 22 languages, this article calculates the similarity between languages through document vectors and thereby derives language families.

This method provides good inspiration and has reference value for subsequent language family calculations.

However, some issues in the paper still need to be addressed.

(1) Why not use a word2vec+BiLSTM approach to learn verse embeddings instead of using doc2vec directly? With limited training data, it is difficult for doc2vec to learn good representations, especially since many verses may appear only once.

(2) Language families are usually calculated based on homologous words, whereas this article uses similarity represented at the chapter level. Is this suitable for calculating language families?

 

(3) What are the three methods used in the paper applicable to?


Author Response

Please see the attachment.

Author Response File: Author Response.pdf
