Coexistence of Complexity Metrics and Machine-Learning Approaches for Understanding Complex Biological Phenomena

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Entropy and Biology".

Deadline for manuscript submissions: closed (31 December 2023) | Viewed by 9725

Special Issue Editors


Guest Editor
Department of Environmental Engineering, Democritus University of Thrace, 671 00 Xanthi, Greece
Interests: complexity; nonlinear systems; Tsallis non-extensive statistics; machine learning; coding DNA; non-coding DNA; biological complexity; complexity metrics; phase space

Guest Editor
Department of Pathology and Laboratory Medicine, The Children’s Hospital of Philadelphia and Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Interests: functional–structural aspects of histocompatibility molecules

Special Issue Information

Dear Colleagues,

The dynamics of complex systems, and the ways in which they influence biological processes, are among the most interesting physical problems: they bring together current developments in the independent fields of physics and biology/genomics, which can then address them more effectively. These dynamics include a hierarchy of complex and self-organized phenomena such as intermittent turbulence, fractal structures, long-range correlations, far-from-equilibrium phase transitions, anomalous diffusion–dissipation and strange kinetics, and the reduction of dimensionality in phase space. At equilibrium, the attracting region of phase space is practically infinite-dimensional, as the system state evolves in all dimensions according to the ergodic theorem of Boltzmann–Gibbs (B–G) statistics. Far from equilibrium, the statistics of the dynamics follow the q-Gaussian generalization of B–G statistics or other, more general statistics. In Tsallis q-statistics, even for the case of q = 1 (corresponding to the Gaussian process), the non-extensive character permits the development of long-range correlations produced by equilibrium phase-transition multi-scale processes.
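As a concrete reference point for the q-statistics mentioned above, the following minimal sketch computes the standard discrete Tsallis entropy, S_q = (1 − Σ p_i^q) / (q − 1) with the constant k set to 1, and falls back to the Shannon (Boltzmann–Gibbs) form at q = 1. The function name and the toy distributions are illustrative assumptions, not material from the Special Issue. The last lines check the pseudo-additivity rule S_q(A+B) = S_q(A) + S_q(B) + (1 − q) S_q(A) S_q(B) for two independent subsystems.

```python
import math

def tsallis_entropy(probs, q):
    """Tsallis entropy S_q (with k = 1) of a discrete probability distribution."""
    probs = [p for p in probs if p > 0]  # zero-probability states contribute nothing
    if abs(q - 1.0) < 1e-12:
        # q -> 1 limit: Shannon / Boltzmann-Gibbs entropy in nats
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

# Toy distributions (hypothetical values chosen for illustration only).
p_a = [0.5, 0.5]
p_b = [0.25, 0.75]
q = 2.0

# Joint distribution of two independent subsystems A and B.
p_joint = [pa * pb for pa in p_a for pb in p_b]

s_a = tsallis_entropy(p_a, q)
s_b = tsallis_entropy(p_b, q)
s_joint = tsallis_entropy(p_joint, q)

# Pseudo-additivity: the joint entropy differs from s_a + s_b by a cross term.
pseudo_additive = s_a + s_b + (1.0 - q) * s_a * s_b
```

With these toy values, `s_joint` and `pseudo_additive` agree, while the plain sum `s_a + s_b` does not.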

Many scientists have used complexity metrics such as generalized entropies, multifractal analysis, the q-triplet of Tsallis statistics, complex networks and fractal dimension to understand the behaviour of complex phenomena in biology/genomics. Projecting the dynamics onto the statistics in phase space yields a complete picture that can be integrated with the variations of the complexity metrics. This picture of the dynamics can be identified with machine-learning tools for clustering, classification and prediction. The merging of complexity theory and machine-learning approaches can provide semantic results, enabling a deeper understanding and promotion of the fundamental laws of complex biological phenomena.
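The idea of feeding a complexity metric into a machine-learning tool can be sketched in a few lines. The example below is only an illustration under stated assumptions: the metric is the Shannon entropy of the 2-mer frequencies of a sequence, the "machine-learning tool" is a tiny one-dimensional two-cluster k-means, and the four DNA strings are made up. None of these choices comes from the Special Issue itself.

```python
import math
from collections import Counter

def kmer_entropy(seq, k=2):
    """Shannon entropy (in bits) of the k-mer frequency distribution of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def two_means_1d(values, iters=20):
    """Tiny 1-D k-means with k = 2, seeded at the smallest and largest value."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to the nearer of the two centroids.
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        # Move each centroid to the mean of its group.
        if group0:
            lo = sum(group0) / len(group0)
        if group1:
            hi = sum(group1) / len(group1)
    return labels

# Made-up sequences: two highly repetitive, two more varied.
sequences = [
    "ATATATATATATATAT",
    "GCGCGCGCGCGCGCGC",
    "ATGCGTACCTGAAGTC",
    "TGCATCGGATACCGTA",
]
features = [kmer_entropy(s) for s in sequences]
labels = two_means_1d(features)
```

Here the repetitive sequences end up in one cluster and the varied ones in the other. In practice the single entropy value would be replaced by a vector of metrics and the clustering step by an established library.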

This Special Issue emphasizes the merging of the complexity metrics and the machine-learning approaches, hoping to attain a deeper understanding of complex biological phenomena. The analysis and study of complex biological phenomena based on the aforementioned statistical approaches fall within the scope of this Special Issue.

Dr. Leonidas P. Karakatsanis

Prof. Dr. Dimitrios S. Monos
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • complexity metrics
  • generalized entropies
  • Tsallis q-triplet
  • Tsallis entropy
  • machine learning
  • phase space
  • biological complexity
  • coding DNA
  • non-coding DNA
  • genomics
  • evolutionary biology

Published Papers (4 papers)

Research

17 pages, 2500 KiB  
Article
Design of DNA Storage Coding with Enhanced Constraints
by Xiangjun Li, Shihua Zhou and Lewang Zou
Entropy 2022, 24(8), 1151; https://0-doi-org.brum.beds.ac.uk/10.3390/e24081151 - 19 Aug 2022
Cited by 4 | Viewed by 2074
Abstract
Traditional storage media have been gradually unable to meet the needs of data storage around the world, and one solution to this problem is DNA storage. However, it is easy to make errors in the subsequent sequencing reading process of DNA storage coding. To reduce error rates, a method to enhance the robustness of the DNA storage coding set is proposed. Firstly, to reduce the likelihood of secondary structure in DNA coding sets, a repeat tandem sequence constraint is proposed. An improved DTW distance constraint is proposed to address the issue that the traditional distance constraint cannot accurately evaluate non-specific hybridization between DNA sequences. Secondly, an algorithm that combines random opposition-based learning and eddy jump strategy with Aquila Optimizer (AO) is proposed in this paper, which is called ROEAO. Finally, the ROEAO algorithm is used to construct the coding sets with traditional constraints and enhanced constraints, respectively. The quality of the two coding sets is evaluated by the test of the number of issuing card structures and the temperature stability of melting; the data show that the coding set constructed with ROEAO under enhanced constraints can obtain a larger lower bound while improving the coding quality.
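For readers unfamiliar with the distance measure named in this abstract, the sketch below shows classic dynamic time warping (DTW) between two DNA strings with a 0/1 mismatch cost. It is not the improved DTW constraint proposed in the paper, whose details are not given here; the cost function and the function names are assumptions made for illustration. A Hamming distance is included for contrast, since it penalises a one-base shift heavily while DTW absorbs it.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two strings with a 0/1 mismatch cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = minimal cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = cost + min(d[i - 1][j],      # a[i-1] also matches b[j-1]'s successor path
                                 d[i][j - 1],      # b[j-1] is matched again to a[i-1]
                                 d[i - 1][j - 1])  # both sequences advance
    return d[n][m]

def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

shifted_dtw = dtw_distance("ACGTA", "AACGT")
shifted_hamming = hamming_distance("ACGTA", "AACGT")
```

For this shifted pair the Hamming distance is 4 while the DTW distance is 1, which is the kind of discrepancy that motivates warping-based measures for sequences that can misalign.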

16 pages, 2420 KiB  
Article
A Novel Method for Colorectal Cancer Screening Based on Circulating Tumor Cells and Machine Learning
by Eleana Hatzidaki, Aggelos Iliopoulos and Ioannis Papasotiriou
Entropy 2021, 23(10), 1248; https://0-doi-org.brum.beds.ac.uk/10.3390/e23101248 - 25 Sep 2021
Cited by 4 | Viewed by 2603
Abstract
Colorectal cancer is one of the most common types of cancer, and it can have a high mortality rate if left untreated or undiagnosed. The fact that CRC becomes symptomatic at advanced stages highlights the importance of early screening. The reference screening method for CRC is colonoscopy, an invasive, time-consuming procedure that requires sedation or anesthesia and is recommended from a certain age and above. The aim of this study was to build a machine learning classifier that can distinguish cancer from non-cancer samples. For this, circulating tumor cells were enumerated using flow cytometry. Their numbers were used as a training set for building an optimized SVM classifier that was subsequently used on a blind set. The SVM classifier’s accuracy on the blind samples was found to be 90.0%, sensitivity was 80.0%, specificity was 100.0%, precision was 100.0% and AUC was 0.98. Finally, in order to test the generalizability of our method, we also compared the performances of different classifiers developed by various machine learning models, using over-sampling datasets generated by the SMOTE algorithm. The results showed that SVM achieved the best performances according to the validation accuracy metric. Overall, our results demonstrate that CTCs enumerated by flow cytometry can provide significant information, which can be used in machine learning algorithms to successfully discriminate between healthy and colorectal cancer patients. The clinical significance of this method could be the development of a simple, fast, non-invasive cancer screening tool based on blood CTC enumeration by flow cytometry and machine learning algorithms.
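The reported metrics relate to one another through the confusion matrix, as the short sketch below shows. The abstract does not state the size of the blind set, so the counts used here (10 cancer and 10 non-cancer samples, with 8 true positives and no false positives) are a hypothetical choice that merely reproduces the reported percentages; they are not the study's data.

```python
def screening_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # true-positive rate (recall)
        "specificity": tn / (tn + fp),   # true-negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }

# Hypothetical counts, assumed only because they match the reported percentages.
metrics = screening_metrics(tp=8, fn=2, tn=10, fp=0)
```

With no false positives, specificity and precision are both 100%, and the two missed cancer samples account for the 80% sensitivity and the 90% accuracy.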

20 pages, 12806 KiB  
Article
Semicovariance Coefficient Analysis of Spike Proteins from SARS-CoV-2 and Other Coronaviruses for Viral Evolution and Characteristics Associated with Fatality
by Jun Steed Huang, Jiamin Moran Huang and Wandong Zhang
Entropy 2021, 23(5), 512; https://0-doi-org.brum.beds.ac.uk/10.3390/e23050512 - 23 Apr 2021
Cited by 1 | Viewed by 2075
Abstract
Complex modeling has received significant attention in recent years and is increasingly used to explain statistical phenomena with increasing and decreasing fluctuations, such as the similarity or difference of spike protein charge patterns of coronaviruses. Different from the existing covariance or correlation coefficient methods in traditional integer dimension construction, this study proposes a simplified novel fractional dimension derivation with the exact Excel tool algorithm. It involves the fractional center moment extension to covariance, which results in a complex covariance coefficient that is better than the Pearson correlation coefficient, in the sense that the nonlinearity relationship can be further depicted. The spike protein sequences of coronaviruses were obtained from the GenBank and GISAID databases, including the coronaviruses from pangolin, bat, canine, swine (three variants), feline, tiger, SARS-CoV-1, MERS, and SARS-CoV-2 (including the strains from Wuhan, Beijing, New York, Germany, and the UK variant B.1.1.7) which were used as the representative examples in this study. By examining the values above and below the average/mean based on the positive and negative charge patterns of the amino acid residues of the spike proteins from coronaviruses, the proposed algorithm provides deep insights into the nonlinear evolving trends of spike proteins for understanding the viral evolution and identifying the protein characteristics associated with viral fatality. The calculation results demonstrate that the complex covariance coefficient analyzed by this algorithm is capable of distinguishing the subtle nonlinear differences in the spike protein charge patterns with reference to Wuhan strain SARS-CoV-2, which the Pearson correlation coefficient may overlook. Our analysis reveals the unique convergent (positive correlative) to divergent (negative correlative) domain center positions of each virus. The convergent or conserved region may be critical to the viral stability or viability, while the divergent region is highly variable between coronaviruses, suggesting high frequency of mutations in this region. The analyses show that the conserved center region of SARS-CoV-1 spike protein is located at amino acid residues 900, but shifted to the amino acid residues 700 in MERS spike protein, and then to amino acid residues 600 in SARS-CoV-2 spike protein, indicating the evolution of the coronaviruses. Interestingly, the conserved center region of the spike protein in SARS-CoV-2 variant B.1.1.7 shifted back to amino acid residues 700, suggesting this variant is more virulent than the original SARS-CoV-2 strain. Another important characteristic our study reveals is that the distance between the divergent mean and the maximal divergent point in each of the viruses (MERS > SARS-CoV-1 > SARS-CoV-2) is proportional to viral fatality rate. This algorithm may help to understand and analyze the evolving trends and critical characteristics of SARS-CoV-2 variants, other coronaviral proteins and viruses.

Review

29 pages, 916 KiB  
Review
Entropy and Fractal Techniques for Monitoring Fish Behaviour and Welfare in Aquacultural Precision Fish Farming—A Review
by Harkaitz Eguiraun and Iciar Martinez
Entropy 2023, 25(4), 559; https://0-doi-org.brum.beds.ac.uk/10.3390/e25040559 - 24 Mar 2023
Cited by 3 | Viewed by 1863
Abstract
In a non-linear system, such as a biological system, the change of the output (e.g., behaviour) is not proportional to the change of the input (e.g., exposure to stressors). In addition, biological systems also change over time, i.e., they are dynamic. Non-linear dynamical analyses of biological systems have revealed hidden structures and patterns of behaviour that are not discernible by classical methods. Entropy analyses can quantify their degree of predictability and the directionality of individual interactions, while fractal dimension (FD) analyses can expose patterns of behaviour within apparently random ones. The incorporation of these techniques into the architecture of precision fish farming (PFF) and intelligent aquaculture (IA) is becoming increasingly necessary to understand and predict the evolution of the status of farmed fish. This review summarizes recent works on the application of entropy and FD techniques to selected individual and collective fish behaviours influenced by the number of fish, tagging, pain, preying/feed search, fear/anxiety (and its modulation) and positive emotional contagion (the social contagion of positive emotions). Furthermore, it presents an investigation of collective and individual interactions in shoals, an exposure of the dynamics of inter-individual relationships and hierarchies, and the identification of individuals in groups. While most of the works have been carried out using model species, we believe that they have clear applications in PFF. The review ends by describing some of the major challenges in the field, two of which are, unsurprisingly, the acquisition of high-quality, reliable raw data and the construction of large, reliable databases of non-linear behavioural data for different species and farming conditions.
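To make the notion of a fractal dimension of a movement track concrete, here is a minimal sketch using the Katz estimator, FD = log10(n) / (log10(n) + log10(d / L)), where n is the number of steps, L the total path length and d the largest distance from the starting point. The abstract does not say which FD estimators the reviewed works use, so the choice of Katz's formula, the function name and the two toy trajectories are assumptions for illustration only.

```python
import math

def katz_fd(points):
    """Katz fractal dimension of a planar trajectory given as (x, y) points."""
    steps = len(points) - 1
    if steps < 2:
        raise ValueError("need at least three points")
    # Total path length L: sum of distances between successive points.
    length = sum(math.dist(points[i], points[i + 1]) for i in range(steps))
    # Extent d: largest distance from the first point to any later point.
    extent = max(math.dist(points[0], p) for p in points[1:])
    if length == 0:
        raise ValueError("trajectory has zero length")
    return math.log10(steps) / (math.log10(steps) + math.log10(extent / length))

# Toy trajectories (made up): a straight swim and a zigzag of equal extent.
straight = [(float(i), 0.0) for i in range(11)]
zigzag = [(float(i), float(i % 2)) for i in range(11)]

fd_straight = katz_fd(straight)
fd_zigzag = katz_fd(zigzag)
```

The straight track gives a dimension of 1, and the zigzag gives a value above 1, reflecting a more convoluted path over the same extent.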
