
Digital Natural Language Text Management in End-User Computing, and Methods for Improving Its Effectiveness

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (15 July 2022) | Viewed by 5972

Special Issue Editor


Dr. Mária Csernoch
Guest Editor
Department of Computer Science, Faculty of Informatics, University of Debrecen, 4032 Debrecen, Hungary
Interests: end-user document handling; developing computational thinking skills; didactics of informatics; computer-aided teaching; subject integration

Special Issue Information

Dear Colleagues,

The entropy of digital texts in connection with their correctness is hardly researched. It is widely accepted that program code—written in artificial languages—should be error-free, and developers use any available tools to reduce errors in these texts. However, this is not the case in end-user computing or in digital natural language texts of any form. In spreadsheet management, errors originating in coding are covered to some extent; the data content, however, is hardly analyzed, and the same holds for the output of coding activities, including webpage development.

In general, end-user computing has flooded the world with erroneous natural language texts, primarily in the form of word-processed texts, presentations, and webpages (e-texts for short), and professional computing with erroneous texts in the form of program outputs, spreadsheet documents, and webpages. These e-texts are mainly created and modified in ignorance and carelessness on the authors’ part, causing serious financial losses in both human and machine resources. Research has already revealed that these discrepancies originate in misconceptions circulating in education and the software industry, concerning both the creation and the application of these pieces of software. Detecting errors is closely related to entropy in information theory, where erroneous texts are compared to their properly edited versions. The error rate is one measure closely connected to entropy (a sketch of this connection follows below). However, various concepts that can be expressed with entropy remain unresearched or under-researched in natural language e-texts: the recognition and sources of errors, their frequencies, consequences (considering human and machine resources, teaching aspects, the sunk cost fallacy, etc.), probability, the proportion of artificial and natural languages, the factor of surprise, how these parameters affect information flow, etc.
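
As a toy illustration of the connection named above, the sketch below treats the per-unit error rate of an e-text as the parameter of a binary source and computes its Shannon entropy. This is our illustrative reading, not a validated measuring system; the function names and the example error counts are assumptions made for the demonstration.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a binary source with error probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a perfectly correct (or uniformly wrong) text carries no surprise
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def error_rate(errors: int, units: int) -> float:
    """Proportion of erroneous units (e.g., paragraphs) in an e-text."""
    return errors / units

# Hypothetical documents: 3 vs. 18 erroneous paragraphs out of 40.
for errors in (3, 18):
    p = error_rate(errors, 40)
    print(f"error rate {p:.3f} -> entropy {binary_entropy(p):.3f} bits/unit")
```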

While publications have focused on the definition of the properly edited text and error classes, further research on entropy, information surprise, and validated measuring systems to detect the error rate of e-texts is missing, and we thus invite works focusing on these topics to advance this field.

Dr. Mária Csernoch
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • end-user computing
  • natural language e-text management
  • error recognition model
  • properly edited and formatted e-text
  • error rate
  • entropy
  • surprise
  • frequency

Published Papers (4 papers)


Research

34 pages, 10979 KiB  
Article
The Entropy of Digital Texts—The Mathematical Background of Correctness
by Mária Csernoch, Keve Nagy and Tímea Nagy
Entropy 2023, 25(2), 302; https://0-doi-org.brum.beds.ac.uk/10.3390/e25020302 - 06 Feb 2023
Cited by 1 | Viewed by 1535
Abstract
Based on Shannon’s communication theory, in the present paper, we provide the theoretical background to finding an objective measurement—the text-entropy—that can describe the quality of digital natural language documents handled with word processors. The text-entropy can be calculated from the formatting, correction, and modification entropy, and based on these values, we can tell how correct or how erroneous digital text-based documents are. To show how the theory can be applied to real-world texts, three erroneous MS Word documents were selected for the present study. With these examples, we demonstrate how to build their correcting, formatting, and modification algorithms and how to calculate the time spent on modification and the entropy of the completed tasks, in both the original erroneous and the corrected documents. In general, it was found that using and modifying properly edited and formatted digital texts requires fewer or an equal number of knowledge-items. In information-theoretical terms, this means that less data must be put on the communication channel than in the case of erroneous documents. The analysis also revealed that in the corrected documents, not only is the quantity of the data smaller, but the quality of the data (knowledge pieces) is higher. As a consequence of these two findings, it is proven that the modification time of erroneous documents is several times that of the correct ones, even in the case of minimal, first-level actions. It is also proven that, to avoid the repetition of time- and resource-consuming actions, we must correct documents before modifying them.
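
The paper derives text-entropy from formatting, correction, and modification entropy. The sketch below shows one minimal way such component entropies might be computed and combined, assuming each component is a Shannon entropy over the empirical distribution of the operations it involves; the operation names, counts, and the additive combination are our illustrative assumptions, not the paper’s exact formulas.

```python
import math
from collections import Counter

def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the empirical distribution given by operation counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

# Hypothetical operation logs for one document (the names are ours, not the paper's).
components = {
    "formatting": Counter({"apply_style": 12, "manual_format": 3}),
    "correction": Counter({"delete_space_run": 5, "remove_manual_break": 2}),
    "modification": Counter({"edit_paragraph": 8, "renumber": 1}),
}

for name, ops in components.items():
    print(f"{name} entropy: {shannon_entropy(ops):.3f} bits")

# Illustrative aggregate: here simply the sum of the component entropies.
text_entropy = sum(shannon_entropy(ops) for ops in components.values())
print(f"text-entropy (sum of components): {text_entropy:.3f} bits")
```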

16 pages, 4328 KiB  
Article
The Interpretation of Graphical Information in Word Processing
by Mária Csernoch, János Máth and Tímea Nagy
Entropy 2022, 24(10), 1492; https://0-doi-org.brum.beds.ac.uk/10.3390/e24101492 - 19 Oct 2022
Cited by 2 | Viewed by 1259
Abstract
Word processing is one of the most popular digital activities. Despite its popularity, it is haunted by false assumptions, misconceptions, and ineffective and inefficient practices leading to erroneous digital text-based documents. The focus of the present paper is automated numbering and distinguishing between manual and automated numbering. In general, one bit of information on the GUI—the position of the cursor—is enough to tell whether a numbering is manual or automated. To decide how much information must be put on the channel—the teaching–learning process—in order to reach end-users, we designed and implemented a method that includes the analysis of teaching, learning, tutorial, and testing sources; the collection and analysis of Word documents shared on the internet or in closed groups; the testing of grade 7–10 students’ knowledge of automated numbering; and the calculation of the entropy of automated numbering. The combination of the test results and the semantics of automated numbering was used to measure its entropy. It was found that to transfer one bit of information on the GUI, at least three bits of information must be transferred during the teaching–learning process. Furthermore, it was revealed that the information connected to numbering is not the pure use of tools but the semantics of this feature put into a real-world context.
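
To make the one-bit-versus-three-bits claim concrete, the toy calculation below computes the entropy of the binary GUI signal and of a hypothetical teaching–learning channel. The eight-state distribution is entirely our assumption, chosen only to show how three bits can arise; it is not the paper’s measurement.

```python
import math

def entropy(probs) -> float:
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The GUI signal: cursor position reveals manual vs. automated numbering.
# With the two cases taken as equally likely, that is one bit.
print(f"GUI signal: {entropy([0.5, 0.5]):.1f} bit")

# Hypothetical teaching-learning channel: if instruction must distinguish
# eight roughly equiprobable knowledge states about automated numbering
# (a distribution we invented for illustration), it carries log2(8) = 3 bits.
print(f"teaching-learning channel: {entropy([1 / 8] * 8):.1f} bits")
```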

17 pages, 631 KiB  
Article
Deep Multilabel Multilingual Document Learning for Cross-Lingual Document Retrieval
by Kai Feng, Lan Huang, Hao Xu, Kangping Wang, Wei Wei and Rui Zhang
Entropy 2022, 24(7), 943; https://0-doi-org.brum.beds.ac.uk/10.3390/e24070943 - 07 Jul 2022
Viewed by 1185
Abstract
Cross-lingual document retrieval, which aims to take a query in one language and retrieve relevant documents in another, has attracted strong research interest in recent decades. Most studies on this task start with cross-lingual comparisons at the word level and then represent documents via word embeddings, which captures insufficient structural information. In this work, cross-lingual comparison at the document level is achieved through a cross-lingual semantic space. Our method, MDL (deep multilabel multilingual document learning), leverages a six-layer fully connected network to project cross-lingual documents into a shared semantic space. Semantic distances can be calculated once the cross-lingual documents are transformed into embeddings in this space. The supervision signals are automatically extracted from the data and then used to construct the semantic space via a linear classifier. The ambiguity of manual labels is thereby avoided, and multilabel supervision signals are acquired instead of a single label. The representation of the semantic space is enriched by the multilabel supervision signals, which improves the discriminative ability of the embeddings. MDL is easy to extend to other fields since it does not depend on specific data. Furthermore, MDL is more efficient than models that train all languages jointly, since each language is trained individually. Experiments on Wikipedia data showed that the proposed method outperforms state-of-the-art cross-lingual document retrieval methods.
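
As a rough illustration of the architecture the abstract describes, the PyTorch sketch below stacks six fully connected layers as the projector and attaches a linear classifier for the multilabel supervision signals. All dimensions, the activation function, and the loss are our assumptions; the paper’s exact hyperparameters and training setup are not reproduced here.

```python
import torch
import torch.nn as nn

class MDLProjector(nn.Module):
    """Six-layer fully connected network projecting a document vector into a
    shared cross-lingual semantic space, with a linear classifier head
    producing multilabel supervision logits."""

    def __init__(self, in_dim=2000, hidden=1024, embed_dim=512, n_labels=100):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(5):                        # five hidden layers ...
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        layers.append(nn.Linear(dim, embed_dim))  # ... plus the output layer = six
        self.projector = nn.Sequential(*layers)
        self.classifier = nn.Linear(embed_dim, n_labels)  # linear multilabel head

    def forward(self, x):
        z = self.projector(x)           # document embedding in the shared space
        return z, self.classifier(z)    # embedding + multilabel logits

# Multilabel training signal: one binary cross-entropy term per label.
model = MDLProjector()
x = torch.randn(4, 2000)                # a batch of 4 bag-of-words document vectors
targets = torch.randint(0, 2, (4, 100)).float()
z, logits = model(x)
loss = nn.BCEWithLogitsLoss()(logits, targets)
loss.backward()
```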

10 pages, 242 KiB  
Article
Using Data Compression to Build a Method for Statistically Verified Attribution of Literary Texts
by Boris Ryabko and Nadezhda Savina
Entropy 2021, 23(10), 1302; https://0-doi-org.brum.beds.ac.uk/10.3390/e23101302 - 03 Oct 2021
Cited by 2 | Viewed by 1287
Abstract
We consider the problem of the authorship of literary texts within the framework of the quantitative study of literature. This article proposes a methodology for the authorship attribution of literary texts based on the use of data compressors. Unlike other methods, the suggested one makes it possible to obtain statistically verified results. The method is used to solve two attribution problems in Russian literature.
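
A common compression-based attribution scheme, consistent with the abstract’s idea though not necessarily the authors’ exact procedure, compares how many extra bytes a compressor needs to encode the disputed text after each candidate author’s corpus. The sketch below uses zlib; the corpora, the compressor, and the decision rule are our assumptions, and the paper’s statistical verification step is omitted.

```python
import zlib

def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed text."""
    return len(zlib.compress(text.encode("utf-8"), level=9))

def attribution_score(known: str, disputed: str) -> int:
    """Extra bytes needed to encode the disputed text after the known corpus:
    a smaller increase suggests the compressor's model of the known author
    also fits the disputed text."""
    return compressed_size(known + disputed) - compressed_size(known)

# Hypothetical corpora; real attribution would use long texts and a
# statistical test over many samples.
author_a = "word usage and rhythm typical of author A ... " * 200
author_b = "quite different vocabulary, typical of author B ... " * 200
disputed = "word usage and rhythm typical of author A, perhaps ... " * 50

scores = {"A": attribution_score(author_a, disputed),
          "B": attribution_score(author_b, disputed)}
print(scores, "-> attributed to", min(scores, key=scores.get))
```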