Research

19 pages, 3139 KiB

Open Access Article

Internet of Things-Driven Data Mining for Smart Crop Production Prediction in the Peasant Farming Domain

by Luis Omar Colombo-Mendoza, Mario Andrés Paredes-Valverde, María del Pilar Salas-Zárate and Rafael Valencia-García

Appl. Sci. 2022, 12(4), 1940; https://0-doi-org.brum.beds.ac.uk/10.3390/app12041940 - 12 Feb 2022

Cited by 20 | Viewed by 3098

Abstract

Internet of Things (IoT) technologies can greatly benefit from machine-learning techniques and artificial neural networks for data mining, and vice versa. In the agricultural field, this convergence could result in smart farming systems suitable for use as decision support systems by peasant farmers. This work presents the design of a smart farming system for crop production based on low-cost IoT sensors and popular cloud data storage and data analytics services. Moreover, a new data-mining method that exploits climate data along with crop-production data is proposed for predicting production volume from heterogeneous data sources. This method was initially validated using traditional machine-learning techniques and open historical data for the northeast region of the state of Puebla, Mexico, collected from the National Water Commission and the Agri-food Information Service of the Mexican Government.

(This article belongs to the Special Issue Principles and Applications of Data Science)

15 pages, 4352 KiB

Open Access Article

Combination of Transfer Learning Methods for Kidney Glomeruli Image Classification

by Hsi-Chieh Lee and Ahmad Fauzan Aqil

Appl. Sci. 2022, 12(3), 1040; https://0-doi-org.brum.beds.ac.uk/10.3390/app12031040 - 20 Jan 2022

Cited by 8 | Viewed by 1697

Abstract

The rising global incidence of chronic kidney disease necessitates the development of image categorization of renal glomeruli. COVID-19 has been shown to enter the glomerulus, a tissue structure in the kidney. This study examines the differences among focal-segmental, normal, and sclerotic renal glomerular tissue. Allied and multivariate models were split and combined using a combination technique built on existing models. In this study, model combinations are created by using a high-accuracy model to improve other models. Using a mix of transfer learning methods, the ResNet101V2 combination achieved excellent accuracy and consistent classification results compared with the other models, with an accuracy of up to 97 percent and an F1-score of 0.97. However, the anticipated time required was higher than for the models generally employed, which was mitigated by the use of high-performance computing.

(This article belongs to the Special Issue Principles and Applications of Data Science)

14 pages, 51158 KiB

Open Access Article

Deep Learning-Based Water Crystal Classification

by Hien Doan Thi, Frederic Andres, Long Tran Quoc, Hiro Emoto, Michiko Hayashi, Ken Katsumata and Takayuki Oshide

Appl. Sci. 2022, 12(2), 825; https://0-doi-org.brum.beds.ac.uk/10.3390/app12020825 - 14 Jan 2022

Cited by 3 | Viewed by 2877

Abstract

Much of the earth’s surface is covered by water. As pointed out in the 2020 edition of the World Water Development Report, climate change challenges the sustainability of global water resources, so it is important to monitor water quality to preserve sustainable water resources. Water quality can be related to the structure of the water crystal, the solid state of water, so methods to understand water crystals can help to improve water quality. As a first step, a water crystal exploratory analysis has been initiated in cooperation with the Emoto Peace Project (EPP). The 5K EPP dataset has been created as the first worldwide small dataset of water crystals. Our research focused on reducing the inherent limitations when fitting machine learning models to the 5K EPP dataset. One major result is the classification of water crystals and a way to split our small dataset into several related groups. Using the 5K EPP dataset of human observations and past research on snow crystal classification, we created a simple set of visual labels to identify water crystal shapes in 13 categories. A deep learning-based method was used to perform the classification task automatically on a subset of the labeled dataset. The classification achieved high accuracy when using a fine-tuning technique.

(This article belongs to the Special Issue Principles and Applications of Data Science)

18 pages, 368 KiB

Open Access Article

Data Mining of Students’ Consumption Behaviour Pattern Based on Self-Attention Graph Neural Network

by Fangyao Xu and Shaojie Qu

Appl. Sci. 2021, 11(22), 10784; https://0-doi-org.brum.beds.ac.uk/10.3390/app112210784 - 15 Nov 2021

Viewed by 1811

Abstract

Performance prediction is of significant importance. Previous mining of behaviour data was limited to machine learning models. Related research has not made good use of information on changes in spatial location over time, nor of students’ discriminative behavioural patterns and behavioural tendencies. Thus, we establish students’ behaviour networks, combine temporal and spatial information to mine behavioural patterns that discriminate academic performance, and predict students’ performance. Firstly, we put forward some principles for building graphs with a topological structure based on consumption data; secondly, we propose an improved self-attention mechanism model; thirdly, we perform classification tasks related to academic performance and determine discriminative learning and life behaviour sequence patterns. Results showed that the accuracy of the two-category classification reached 84.86% and that of the three-category classification reached 79.43%. In addition, students with good academic performance were observed to study in the classroom or library after dinner and lunch. Apart from returning to the dormitory in the evening, they tended to stay focused in the library and other learning venues during the day. Lastly, different nodes contribute differently to the prediction, thereby providing an approach for feature selection. Our research findings provide a method to understand students’ campus traces.

(This article belongs to the Special Issue Principles and Applications of Data Science)

16 pages, 1660 KiB

Open Access Article

Neuro-Fuzzy Transformation with Minimize Entropy Principle to Create New Features for Particulate Matter Prediction

by Krittakom Srijiranon and Narissara Eiamkanitchat

Appl. Sci. 2021, 11(14), 6590; https://0-doi-org.brum.beds.ac.uk/10.3390/app11146590 - 17 Jul 2021

Cited by 1 | Viewed by 1601

Abstract

Air pollution is a major global issue. In Thailand, as in other countries, the problem grows every year, especially during the dry season in the northern region. In this period, particulate matter with aerodynamic diameters smaller than 10 and 2.5 micrometers, known as PM₁₀ and PM₂.₅, are important pollutants, and their levels mostly exceed the national standard, the so-called Thailand air quality index (T-AQI). Therefore, this study created a prediction model to classify the T-AQI calculated from both types of PM. A neuro-fuzzy model with the minimum entropy principle is proposed to transform the original data into new informative features. The processes in this model discover appropriate separation points of the trapezoidal membership function by applying the minimum entropy principle. The membership value of the fuzzy section is then passed to the neural section to create a new data feature, the PM level, for each hour of the day. Finally, as an analytical process to obtain new knowledge, predictive models are created using the new data features for better classification results. Various experiments were conducted to find an appropriate structure with high prediction accuracy. The results of the proposed model were favorable for predicting both types of PM up to three hours in advance. The proposed model can help people who are planning short-term outdoor activities.

(This article belongs to the Special Issue Principles and Applications of Data Science)
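
The abstract above mentions finding separation points by applying a minimum entropy principle. As a rough illustration of that general idea only, and not the authors' implementation, the sketch below scans candidate thresholds over one numeric feature and keeps the one whose two sides have the lowest size-weighted class entropy. The function names, the readings, and the class labels are all hypothetical; how such a point would feed a trapezoidal membership function is not specified in the abstract and is left out.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels in one region."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def min_entropy_split(values, labels):
    """Return (threshold, entropy) minimizing the size-weighted entropy of both sides."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_entropy = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (
            len(left) * class_entropy(left) + len(right) * class_entropy(right)
        ) / len(pairs)
        if weighted < best_entropy:
            best_threshold, best_entropy = threshold, weighted
    return best_threshold, best_entropy


# Hypothetical hourly PM readings with made-up class labels (assumption, not paper data).
readings = [12.0, 18.0, 25.0, 40.0, 55.0, 70.0]
levels = ["good", "good", "good", "unhealthy", "unhealthy", "unhealthy"]
threshold, entropy = min_entropy_split(readings, levels)
```

Applying the same search recursively inside each side would yield further separation points; whether the paper does so is an assumption here.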

22 pages, 7252 KiB

Open Access Article

A Data Driven Approach for Raw Material Terminology

by Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić and Ljiljana Kolonja

Appl. Sci. 2021, 11(7), 2892; https://0-doi-org.brum.beds.ac.uk/10.3390/app11072892 - 24 Mar 2021

Cited by 2 | Viewed by 2299

Abstract

The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both born-digital and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, the data used to compile the terminological database, the procedure by which it was generated, and a mobile application for its use. The available (terminological) resources are presented: paper dictionaries and digital resources related to the raw material domain, as well as general lexica morphological dictionaries. Resource preparation started with dictionary (retro)digitisation and corpora enlargement, followed by adding new Serbian terms to general lexica dictionaries, as well as adding bilingual terms. Dictionary development relies on corpus analysis, details of which are also presented. Usage examples, collocations and concordances play an important role in raw material terminology and have also been included in this research. Some important related issues discussed are collocation extraction methods, the use of domain labels, lexical and semantic relations, definitions and subentries.

(This article belongs to the Special Issue Principles and Applications of Data Science)

20 pages, 919 KiB

Open Access Article

A Provable and Secure Patient Electronic Health Record Fair Exchange Scheme for Health Information Systems

by Ming-Te Chen and Tsung-Hung Lin

Appl. Sci. 2021, 11(5), 2401; https://0-doi-org.brum.beds.ac.uk/10.3390/app11052401 - 08 Mar 2021

Cited by 7 | Viewed by 1630

Abstract

In recent years, several hospitals have begun using health information systems to maintain electronic health records (EHRs) for each patient. Traditionally, when a patient visits a new hospital for the first time, the hospital’s help desk asks them to fill in relevant personal information on a piece of paper and verifies their identity on the spot. The patient will find that many of their personal electronic records are held in the health information systems of hospitals they visited in the past, and that the EHRs in these systems cannot be accessed or shared between the hospitals. This is inconvenient, time-consuming, and impractical because the patient has to provide their personal information again. Therefore, in this paper, we propose a practical and provable patient EHR fair exchange scheme. In this scheme, each patient can securely delegate the information system of their current hospital to apply to a hospital certification authority (HCA) for migration evidence that can be used to transfer their EHR to another hospital. The delegated system can also establish a session key with other hospital systems for later data transmission, and each patient can protect their anonymity with the help of the HCA. We also provide formal security proofs for forward secrecy and functional comparisons with other schemes.

(This article belongs to the Special Issue Principles and Applications of Data Science)

13 pages, 5035 KiB

Open Access Article

Improvement in the Convolutional Neural Network for Computed Tomography Images

by Keisuke Manabe, Yusuke Asami, Tomonari Yamada and Hiroyuki Sugimori

Appl. Sci. 2021, 11(4), 1505; https://0-doi-org.brum.beds.ac.uk/10.3390/app11041505 - 07 Feb 2021

Cited by 11 | Viewed by 2227

Abstract

Background and purpose. This study evaluated a modified, specialized convolutional neural network (CNN) to improve the accuracy of medical image classification. Materials and Methods. We defined computed tomography (CT) images as belonging to one of the following 10 classes: head, neck, chest, abdomen, and pelvis, each with and without contrast media, with 10,000 images per class. We modified a CNN based on AlexNet with an input size of 512 × 512 by changing the filter sizes of the convolution layers and max pooling. Using these modified CNNs, various models were created and evaluated. The improved CNN was also evaluated on classifying the presence or absence of the pancreas in the CT images. We compared the overall accuracy, calculated from images not used for training, with that of ResNet. Results. The overall accuracies of the most improved CNN and ResNet in the 10 classes were 94.8% and 89.3%, respectively. The filter sizes of the improved CNN for the convolution layers were (13, 13), (7, 7), (5, 5), (5, 5), and (5, 5) in order from the first layer, and that of max pooling was (7, 7). The calculation times of the most improved CNN and ResNet were 56 and 120 min, respectively. Regarding the classification of the pancreas, the overall accuracies of the most improved CNN and ResNet were 75.75% and 58.25%, respectively, and the calculation times were 36 and 55 min, respectively. Conclusion. By optimizing the filter sizes of the convolution layers and max pooling for 512 × 512 images, we quickly obtained a highly accurate medical image classification model. This improved CNN can be useful for classifying lesions and anatomies in related diagnostic aid applications.

(This article belongs to the Special Issue Principles and Applications of Data Science)

17 pages, 3018 KiB

Open Access Article

Predicting Implicit User Preferences with Multimodal Feature Fusion for Similar User Recommendation in Social Media

by Jenq-Haur Wang, Yen-Tsang Wu and Long Wang

Appl. Sci. 2021, 11(3), 1064; https://0-doi-org.brum.beds.ac.uk/10.3390/app11031064 - 25 Jan 2021

Cited by 4 | Viewed by 2247

Abstract

In social networks, users can easily share information and express their opinions. Given the huge amount of data posted by many users, it is difficult to search for relevant information. In addition to individual posts, it would be useful to recommend groups of people with similar interests. Past studies on user preference learning focused on single-modal features such as review contents or demographic information of users. However, such information is usually not easy to obtain in most social media without explicit user feedback. In this paper, we propose a multimodal feature fusion approach to implicit user preference prediction, which combines text and image features from user posts to recommend similar users in social media. First, we use the convolutional neural network (CNN) and TextCNN models to extract image and text features, respectively. Then, these features are combined using early and late fusion methods as a representation of user preferences. Lastly, a list of users with the most similar preferences is recommended. The experimental results on real-world Instagram data show that the best performance is achieved when we apply late fusion of the individual classification results for images and texts, with the best average top-k accuracy of 0.491. This validates the effectiveness of utilizing deep learning methods for fusing multimodal features to represent social user preferences. Further investigation is needed to verify the performance in different types of social media.

(This article belongs to the Special Issue Principles and Applications of Data Science)
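
The abstract above reports that late fusion of the individual classification results for images and texts performed best, measured by top-k accuracy. The sketch below illustrates those two notions in their simplest form and is not the authors' implementation: late fusion is assumed here to be a weighted average of per-class scores from two separate classifiers, which the abstract does not confirm, and all names and numbers are hypothetical.

```python
def late_fusion(image_probs, text_probs, image_weight=0.5):
    """Blend two per-class score vectors with a weighted average (assumed fusion rule)."""
    if len(image_probs) != len(text_probs):
        raise ValueError("both modalities must score the same classes")
    text_weight = 1.0 - image_weight
    return [
        image_weight * p_img + text_weight * p_txt
        for p_img, p_txt in zip(image_probs, text_probs)
    ]


def top_k_accuracy(score_rows, true_labels, k):
    """Fraction of samples whose true class index is among the k highest scores."""
    hits = 0
    for scores, label in zip(score_rows, true_labels):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Made-up classifier outputs for two posts over three preference classes.
image_scores = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
text_scores = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
fused = [late_fusion(img, txt) for img, txt in zip(image_scores, text_scores)]
accuracy_at_1 = top_k_accuracy(fused, true_labels=[1, 2], k=1)
```

Early fusion, by contrast, would concatenate the image and text feature vectors before a single classifier sees them; the choice of `image_weight` here is arbitrary and would normally be tuned on held-out data.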
