Principles and Applications of Data Science

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 April 2022) | Viewed by 21425

A printed edition of this Special Issue is available.

Special Issue Editor


Prof. Chuan-Ming Liu
Guest Editor
National Taipei University of Technology (Taipei Tech), Taiwan
Interests: big data management and processing; uncertain data management; data science; spatial data processing; data streams; ad-hoc and sensor networks; location-based services

Special Issue Information

Dear Colleagues,

Data science is an emerging multidisciplinary field at the intersection of computer science, statistics, and mathematics, with diverse applications and close ties to data mining, deep learning, and big data. This Special Issue focuses on the latest developments in the theories, techniques, and applications of data science, and authors are invited to submit original, unpublished work. Potential topics include, but are not limited to:

  • Data Cleansing
  • Data Analytics
  • Data Mining
  • Machine Learning
  • Deep Learning
  • Data Engineering
  • Big Data Management and Processing
  • Uncertain Data Management and Processing
  • Streaming Data Management and Processing
  • Spatial–Temporal Data Management and Processing
  • Data Science of the Internet of Things (IoT)
  • Data Science of Medical Applications and Healthcare

Prof. Chuan-Ming Liu
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (9 papers)


Research

19 pages, 3139 KiB  
Article
Internet of Things-Driven Data Mining for Smart Crop Production Prediction in the Peasant Farming Domain
by Luis Omar Colombo-Mendoza, Mario Andrés Paredes-Valverde, María del Pilar Salas-Zárate and Rafael Valencia-García
Appl. Sci. 2022, 12(4), 1940; https://0-doi-org.brum.beds.ac.uk/10.3390/app12041940 - 12 Feb 2022
Cited by 20 | Viewed by 3098
Abstract
Internet of Things (IoT) technologies can greatly benefit from machine-learning techniques and artificial neural networks for data mining, and vice versa. In the agricultural field, this convergence could result in the development of smart farming systems suitable for use as decision support systems by peasant farmers. This work presents the design of a smart farming system for crop production, which is based on low-cost IoT sensors and popular cloud data storage and data analytics services. Moreover, a new data-mining method exploiting climate data along with crop-production data is proposed for the prediction of production volume from heterogeneous data sources. This method was initially validated using traditional machine-learning techniques and open historical data for the northeast region of the state of Puebla, Mexico, collected from the National Water Commission and the Agri-food Information Service of the Mexican Government.
(This article belongs to the Special Issue Principles and Applications of Data Science)

15 pages, 4352 KiB  
Article
Combination of Transfer Learning Methods for Kidney Glomeruli Image Classification
by Hsi-Chieh Lee and Ahmad Fauzan Aqil
Appl. Sci. 2022, 12(3), 1040; https://0-doi-org.brum.beds.ac.uk/10.3390/app12031040 - 20 Jan 2022
Cited by 8 | Viewed by 1697
Abstract
The rising global incidence of chronic kidney disease necessitates the development of image categorization of renal glomeruli. COVID-19 has been shown to enter the glomerulus, a tissue structure in the kidney. This study observes the differences between focal-segmental, normal and sclerotic renal glomerular tissue diseases. The splitting and combining of allied and multivariate models was accomplished using a combined technique built on existing models. In this study, model combinations are created by using a high-accuracy model to improve other models. This research exhibits excellent accuracy and consistent classification results on the ResNet101V2 combination using a mix of transfer learning methods, with the combined model on ResNet101V2 showing an accuracy of up to 97 percent with an F1-score of 0.97, compared to other models. However, this study found that the expected computation time was higher than that of the models generally employed, which was mitigated by the use of high-performance computing.

14 pages, 51158 KiB  
Article
Deep Learning-Based Water Crystal Classification
by Hien Doan Thi, Frederic Andres, Long Tran Quoc, Hiro Emoto, Michiko Hayashi, Ken Katsumata and Takayuki Oshide
Appl. Sci. 2022, 12(2), 825; https://0-doi-org.brum.beds.ac.uk/10.3390/app12020825 - 14 Jan 2022
Cited by 3 | Viewed by 2877
Abstract
Much of the earth’s surface is covered by water. As was pointed out in the 2020 edition of the World Water Development Report, climate change challenges the sustainability of global water resources, so it is important to monitor the quality of water to preserve sustainable water resources. The quality of water can be related to the structure of water crystals, the solid state of water, so methods to understand water crystals can help to improve water quality. As a first step, a water crystal exploratory analysis has been initiated in cooperation with the Emoto Peace Project (EPP). The 5K EPP dataset has been created as the first world-wide small dataset of water crystals. Our research focused on reducing the inherent limitations when fitting machine learning models to the 5K EPP dataset. One major result is the classification of water crystals and how to split our small dataset into several related groups. Using the 5K EPP dataset of human observations and past research on snow crystal classification, we created a simple set of visual labels to identify water crystal shapes, in 13 categories. A deep learning-based method has been used to perform the classification task automatically on a subset of the labeled dataset. The classification achieved high accuracy when using a fine-tuning technique.

18 pages, 368 KiB  
Article
Data Mining of Students’ Consumption Behaviour Pattern Based on Self-Attention Graph Neural Network
by Fangyao Xu and Shaojie Qu
Appl. Sci. 2021, 11(22), 10784; https://0-doi-org.brum.beds.ac.uk/10.3390/app112210784 - 15 Nov 2021
Viewed by 1811
Abstract
Performance prediction is of significant importance. Previous mining of behaviour data was limited to machine learning models. Corresponding research has not made good use of the information on spatial location changes over time, nor of students’ discriminative behavioural patterns and tendentious behaviour. Thus, we establish students’ behaviour networks, combine temporal and spatial information to mine behavioural patterns that discriminate academic performance, and predict students’ performance. Firstly, we put forward some principles to build graphs with a topological structure based on consumption data; secondly, we propose an improved self-attention mechanism model; thirdly, we perform classification tasks related to academic performance, and determine discriminative learning and life behaviour sequence patterns. Results showed that the accuracy of the two-category classification reached 84.86% and that of the three-category classification reached 79.43%. In addition, students with good academic performance were observed to study in the classroom or library after dinner and lunch. Apart from returning to the dormitory in the evening, they tended to stay focused in the library and other learning venues during the day. Lastly, different nodes have different contributions to the prediction, thereby providing an approach for feature selection. Our research findings provide a method to grasp students’ campus traces.

16 pages, 1660 KiB  
Article
Neuro-Fuzzy Transformation with Minimize Entropy Principle to Create New Features for Particulate Matter Prediction
by Krittakom Srijiranon and Narissara Eiamkanitchat
Appl. Sci. 2021, 11(14), 6590; https://0-doi-org.brum.beds.ac.uk/10.3390/app11146590 - 17 Jul 2021
Cited by 1 | Viewed by 1601
Abstract
Air pollution is a major global issue. In Thailand, as in other countries, this issue continues to worsen every year, especially during the dry season in the northern region. In this period, particulate matter with aerodynamic diameters smaller than 10 and 2.5 micrometers, known as PM10 and PM2.5, are important pollutants, most of which exceed the national standard levels, the so-called Thailand air quality index (T-AQI). Therefore, this study created a prediction model to classify T-AQI calculated from both types of PM. A neuro-fuzzy model with the minimum entropy principle is proposed to transform the original data into new informative features. The processes in this model are able to discover appropriate separation points of the trapezoidal membership function by applying the minimum entropy principle. The membership value of the fuzzy section is then passed to the neural section to create a new data feature, the PM level, for each hour of the day. Finally, as an analytical process to obtain new knowledge, predictive models are created using new data features for better classification results. Various experiments were conducted to find an appropriate structure with high prediction accuracy. The results of the proposed model were favorable for predicting both types of PM up to three hours in advance. The proposed model can help people who are planning short-term outdoor activities.
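As a rough illustration of the trapezoidal membership function mentioned in this abstract, the sketch below evaluates the membership degree of a single reading. The function name and the breakpoints a, b, c, d are hypothetical; in the paper the separation points are found with the minimum entropy principle, which is not reproduced here.

```python
def trapezoid_membership(x: float, a: float, b: float, c: float, d: float) -> float:
    """Degree of membership of x in a trapezoidal fuzzy set.

    The set rises linearly from a to b, is fully satisfied between b and c,
    and falls linearly from c to d. Breakpoints are illustrative assumptions.
    """
    if not a <= b <= c <= d:
        raise ValueError("breakpoints must satisfy a <= b <= c <= d")
    if b <= x <= c:          # plateau (also covers shoulders where a == b or c == d)
        return 1.0
    if x <= a or x >= d:     # outside the support
        return 0.0
    if x < b:                # rising edge, b > a is guaranteed here
        return (x - a) / (b - a)
    return (d - x) / (d - c)  # falling edge, d > c is guaranteed here


# Hypothetical breakpoints for a "moderate" PM level, chosen only for the demo.
print(trapezoid_membership(25.0, 0.0, 50.0, 100.0, 150.0))
```

In a neuro-fuzzy pipeline of the kind described, membership degrees like this one would be computed per fuzzy set and passed on as inputs to the neural part.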

22 pages, 7252 KiB  
Article
A Data Driven Approach for Raw Material Terminology
by Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić and Ljiljana Kolonja
Appl. Sci. 2021, 11(7), 2892; https://0-doi-org.brum.beds.ac.uk/10.3390/app11072892 - 24 Mar 2021
Cited by 2 | Viewed by 2299
Abstract
The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, the data used for compilation of the terminological database, the procedure by which it has been generated and a mobile application for its use. Available (terminological) resources are presented: paper dictionaries and digital resources related to the raw material domain, as well as general lexica morphological dictionaries. Resource preparation started with dictionary (retro)digitisation and corpora enlargement, followed by adding new Serbian terms to general lexica dictionaries, as well as adding bilingual terms. Dictionary development relies on corpus analysis, details of which are also presented. Usage examples, collocations and concordances play an important role in raw material terminology, and have also been included in this research. Some important related issues discussed are collocation extraction methods, the use of domain labels, lexical and semantic relations, definitions and subentries.

20 pages, 919 KiB  
Article
A Provable and Secure Patient Electronic Health Record Fair Exchange Scheme for Health Information Systems
by Ming-Te Chen and Tsung-Hung Lin
Appl. Sci. 2021, 11(5), 2401; https://0-doi-org.brum.beds.ac.uk/10.3390/app11052401 - 08 Mar 2021
Cited by 7 | Viewed by 1630
Abstract
In recent years, several hospitals have begun using health information systems to maintain electronic health records (EHRs) for each patient. Traditionally, when a patient visits a new hospital for the first time, the hospital’s help desk asks them to fill in relevant personal information on a piece of paper and verifies their identity on the spot. The patient will find that many of their personal electronic records are held in the health information systems of hospitals they visited in the past, and that the EHRs in these systems cannot be accessed or shared between hospitals. This is inconvenient because the patient will again have to provide their personal information, which is time-consuming and impractical. Therefore, in this paper, we propose a practical and provable patient EHR fair exchange scheme for each patient. In this scheme, each patient can securely delegate the information system of a current hospital to a hospital certification authority (HCA) to apply for migration evidence that can be used to transfer their EHR to another hospital. The delegated system can also establish a session key with other hospital systems for later data transmission, and each patient can protect their anonymity with the help of the HCA. Additionally, we provide formal security proofs for forward secrecy and functional comparisons with other schemes.

13 pages, 5035 KiB  
Article
Improvement in the Convolutional Neural Network for Computed Tomography Images
by Keisuke Manabe, Yusuke Asami, Tomonari Yamada and Hiroyuki Sugimori
Appl. Sci. 2021, 11(4), 1505; https://0-doi-org.brum.beds.ac.uk/10.3390/app11041505 - 07 Feb 2021
Cited by 11 | Viewed by 2227
Abstract
Background and purpose. This study evaluated a modified specialized convolutional neural network (CNN) to improve the accuracy of medical image classification. Materials and Methods. We defined computed tomography (CT) images as belonging to one of the following 10 classes: head, neck, chest, abdomen, and pelvis with and without contrast media, with 10,000 images per class. We modified the CNN based on the AlexNet with an input size of 512 × 512. We resized the filter sizes of the convolution layer and max pooling. Using these modified CNNs, various models were created and evaluated. The improved CNN was evaluated to classify the presence or absence of the pancreas in the CT images. We compared the overall accuracy, which was calculated from images not used for training, to that of the ResNet. Results. The overall accuracies of the most improved CNN and ResNet in the 10 classes were 94.8% and 89.3%, respectively. The filter sizes of the improved CNN for the convolution layer were (13, 13), (7, 7), (5, 5), (5, 5), and (5, 5) in order from the first layer, and that of max-pooling was (7, 7). The calculation times of the most improved CNN and ResNet were 56 and 120 min, respectively. Regarding the classification of the pancreas, the overall accuracies of the most improved CNN and ResNet were 75.75% and 58.25%, respectively. The calculation times of the most improved CNN and ResNet were 36 and 55 min, respectively. Conclusion. By optimizing the filter size of the convolution layer and max-pooling of 512 × 512 images, we quickly obtained a highly accurate medical image classification model. This improved CNN can be useful for classifying lesions and anatomies for related diagnostic aid applications.
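To give a feel for how filter size interacts with a 512 × 512 input, the following sketch applies the standard output-size formula for a convolution or pooling layer along one axis. The stride and padding values are assumptions made for illustration only; the abstract does not state them, and the helper name is hypothetical.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    span = input_size + 2 * padding - kernel_size
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // stride + 1


# Compare an 11 x 11 first-layer kernel with the 13 x 13 kernel reported in
# the abstract. Stride 4 and zero padding are assumed, not taken from the paper.
for kernel in (11, 13):
    print(kernel, conv_output_size(512, kernel, stride=4))
```

Chaining the helper over successive layers gives the feature-map size reaching the classifier, which is one way to sanity-check a candidate set of filter sizes before training.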

17 pages, 3018 KiB  
Article
Predicting Implicit User Preferences with Multimodal Feature Fusion for Similar User Recommendation in Social Media
by Jenq-Haur Wang, Yen-Tsang Wu and Long Wang
Appl. Sci. 2021, 11(3), 1064; https://0-doi-org.brum.beds.ac.uk/10.3390/app11031064 - 25 Jan 2021
Cited by 4 | Viewed by 2247
Abstract
In social networks, users can easily share information and express their opinions. Given the huge amount of data posted by many users, it is difficult to search for relevant information. In addition to individual posts, it would be useful if we could recommend groups of people with similar interests. Past studies on user preference learning focused on single-modal features such as review contents or demographic information of users. However, such information is usually not easy to obtain in most social media without explicit user feedback. In this paper, we propose a multimodal feature fusion approach to implicit user preference prediction which combines text and image features from user posts for recommending similar users in social media. First, we use the convolutional neural network (CNN) and TextCNN models to extract image and text features, respectively. Then, these features are combined using early and late fusion methods as a representation of user preferences. Lastly, a list of users with the most similar preferences is recommended. The experimental results on real-world Instagram data show that the best performance can be achieved when we apply late fusion of individual classification results for images and texts, with the best average top-k accuracy of 0.491. This validates the effectiveness of utilizing deep learning methods for fusing multimodal features to represent social user preferences. Further investigation is needed to verify the performance in different types of social media.
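One common reading of late fusion is to combine the per-class outputs of separately trained image and text classifiers, for example by a weighted average, and then score the result with top-k accuracy. The sketch below shows that idea in plain Python. The function names, the equal default weighting, and the toy probabilities are all assumptions; the paper's exact fusion rule and evaluation protocol may differ.

```python
from typing import List, Sequence


def late_fusion(image_probs: Sequence[float], text_probs: Sequence[float],
                image_weight: float = 0.5) -> List[float]:
    """Weighted average of two per-class probability vectors."""
    if len(image_probs) != len(text_probs):
        raise ValueError("probability vectors must have the same length")
    w = image_weight
    return [w * p_img + (1.0 - w) * p_txt
            for p_img, p_txt in zip(image_probs, text_probs)]


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k best-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data: two samples, three classes (values invented for the demo).
image_out = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
text_out = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
fused = [late_fusion(i, t) for i, t in zip(image_out, text_out)]
print(top_k_accuracy(fused, labels=[1, 2], k=1))
```

Early fusion, by contrast, would concatenate the image and text feature vectors before a single classifier sees them, so the two approaches differ in where the modalities are merged.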