Data Mining and Machine Learning in Multimedia Databases

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 April 2023) | Viewed by 13777

Special Issue Editor


Prof. Dr. Ilaria Bartolini
Guest Editor
Department of Computer Science and Engineering DISI, University of Bologna, 40126 Bologna, Italy
Interests: multimedia data management; content-based analysis of multimedia databases; efficient query processing in multimedia databases; distributed big data query processing; real-time analysis of massive multimedia streams

Special Issue Information

Dear Colleagues,

Thanks to the worldwide availability of cheap information-sensing devices (such as sensors, cameras, RFID readers, and mobile phones) and the growth of storage capacity, data generation has greatly increased, reaching several exabytes per day. Most of these data are multimedia (MM) data, given the diffusion of inexpensive tools for creating and capturing images, videos, audio, textual documents, and so on.

This MM data avalanche has overwhelmed existing techniques, which were designed to extract knowledge and value from conventional data. Automatic analysis of MM data is still an open research issue because of the very complex nature of such data and the lack of appropriate methodologies for accurately and efficiently characterizing their content and semantics. Possible application contexts include, among others, smart cities, smart mobility, the Internet of Things, public health, aging, public and citizen safety and security, advanced education, smart advertising, and automated industry.

This Special Issue focuses on data mining (DM) and machine learning (ML) techniques in the context of MM databases. Our aim is to collect the most recent evidence of innovation in extracting knowledge and value from MM data. We would like to gather researchers from different disciplines and methodological backgrounds to discuss new ideas, original research, recent results, and future challenges in this exciting area. Potential topics include, but are not limited to, the following:

  • Big data techniques for MM databases;
  • Real-time analysis of massive MM data streams;
  • Pipelines for MM data analysis;
  • Bias in ML for MM data;
  • MM data-driven decision making;
  • Classification of MM data;
  • Clustering of MM data;
  • Prediction of MM data;
  • Recommendation of MM data.

Prof. Dr. Ilaria Bartolini
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Multimedia databases
  • Data mining
  • Machine learning
  • Knowledge extraction
  • Knowledge learning
  • Semantics in machine learning
  • Big data
  • Artificial intelligence
  • Deep learning

Published Papers (7 papers)


Research

22 pages, 3015 KiB  
Article
Natural Language Processing Application on Commit Messages: A Case Study on HEP Software
by Yue Yang, Elisabetta Ronchieri and Marco Canaparo
Appl. Sci. 2022, 12(21), 10773; https://0-doi-org.brum.beds.ac.uk/10.3390/app122110773 - 24 Oct 2022
Cited by 1 | Viewed by 1754
Abstract
Version Control and Source Code Management Systems, such as GitHub, contain a large amount of unstructured historical information about software projects. Recent studies have introduced Natural Language Processing (NLP) to help software engineers retrieve information from a very large collection of unstructured data. In this study, we have extended our previous study by enlarging our datasets and broadening the set of machine learning and clustering techniques. We have followed a complex methodology made up of various steps. Starting from the raw commit messages, we have employed NLP techniques to build a structured database. We have extracted their main features and used them as input to different clustering algorithms. Once each entry was labelled, we applied supervised machine learning techniques to build a prediction and classification model. We have developed a machine learning-based model to automatically classify commit messages of a software project. Our model exploits a ground-truth dataset that includes commit messages obtained from various GitHub projects belonging to the High Energy Physics context. The contribution of this paper is two-fold: it proposes a ground-truth database and it provides a machine learning prediction model that automatically identifies the more change-prone areas of code. Our model has obtained a very high average accuracy (0.9590), precision (0.9448), recall (0.9382), and F1-score (0.9360).
(This article belongs to the Special Issue Data Mining and Machine Learning in Multimedia Databases)

15 pages, 279 KiB  
Article
Individual and School Correlates of DIT-2 Scores Using a Multilevel Modeling and Data Mining Analysis
by Meghan Bankhead, Youn-Jeng Choi, Yogendra Patil and Stephen J. Thoma
Appl. Sci. 2022, 12(9), 4573; https://0-doi-org.brum.beds.ac.uk/10.3390/app12094573 - 30 Apr 2022
Cited by 1 | Viewed by 1341
Abstract
Moral reasoning was investigated with respect to individual characteristics (i.e., education level, political orientation and sex) and school-related (i.e., university/college) factors using multilevel modeling and data mining analysis. We used multilevel modeling to detect school effects on moral reasoning as well as individual effects for 16,334 students representing 79 different higher education institutions across the U.S. The school-related factors, such as the racial composition, student–faculty ratio, average SAT score, institution type, institutions’ geographical region, and frequencies of morally relevant words in college course catalogs, mission statements and value statements, were collected through website searches. Data mining analysis was utilized to extract and calculate the frequencies of morally relevant words from the website content. There were significant effects for the individual characteristic of political orientation. Additionally, all school-related factors were significant. Only main effects were observed for some school-related factors (i.e., average SAT score, institution type, frequency of morally relevant words in mission statements, value statements and course catalogs). For other school-related factors (i.e., the region, student–faculty ratio and racial composition), main effects were also observed; however, these effects were particularly illuminating given their interactions with political orientation. Implications for educational communities are discussed.
(This article belongs to the Special Issue Data Mining and Machine Learning in Multimedia Databases)

16 pages, 7528 KiB  
Article
Text Mining from Free Unstructured Text: An Experiment of Time Series Retrieval for Volcano Monitoring
by Margherita Berardi, Luigi Santamaria Amato, Francesca Cigna, Deodato Tapete and Mario Siciliani de Cumis
Appl. Sci. 2022, 12(7), 3503; https://0-doi-org.brum.beds.ac.uk/10.3390/app12073503 - 30 Mar 2022
Cited by 4 | Viewed by 1993
Abstract
Volcanic activity may influence climate parameters and affect people's safety, and hence monitoring its characteristic indicators and their temporal evolution is crucial. Several databases, communications and literature providing data, information and updates on active volcanoes worldwide are available, and will likely increase in the future. Consequently, information extraction and text mining techniques aiming to efficiently analyze such databases and gather data and parameters of interest on a specific volcano can play an important role in this applied science field. This work presents a natural language processing (NLP) system that we developed to extract geochemical and geophysical data from free unstructured text included in monitoring reports and operational bulletins issued by volcanological observatories in HTML, PDF and MS Word formats. The NLP system enables the extraction of relevant gas parameters (e.g., SO2 and CO2 flux) from the text, and was tested on a series of 2839 daily and weekly bulletins published online between 2015 and 2021 for the Stromboli volcano (Italy). The experiment shows that the system is capable of extracting the time series of a set of user-defined parameters that can later be analyzed and interpreted by specialists in relation to other monitoring and geospatial data. The text mining system can potentially be tuned to extract other target parameters from this and other databases.
(This article belongs to the Special Issue Data Mining and Machine Learning in Multimedia Databases)

23 pages, 12533 KiB  
Article
Coverage Fulfillment Automation in Hardware Functional Verification Using Genetic Algorithms
by Gabriel Mihail Danciu and Alexandru Dinu
Appl. Sci. 2022, 12(3), 1559; https://0-doi-org.brum.beds.ac.uk/10.3390/app12031559 - 31 Jan 2022
Cited by 4 | Viewed by 2333
Abstract
The functional verification process is one of the most expensive steps in integrated circuit manufacturing. Functional coverage is the most important metric in the entire verification process. By running multiple simulations, different situations of DUT functionality can be encountered, and in this way, functional coverage fulfillment can be improved. However, in many cases it is difficult to reach specific functional situations because it is not easy to correlate the required input stimuli with the expected behavior of the digital design. Therefore, both industry and academia seek solutions to automate the generation of stimuli to reach all the functionalities of interest with less human effort and in less time. In this paper, several approaches inspired by genetic algorithms were developed and tested using three different designs. In all situations, the percentage of stimulus sets generated using well-performing genetic algorithm approaches was higher than the values that resulted when random simulations were employed. In addition, in most cases the genetic algorithm approach reached a higher coverage value per test compared to the random simulation outcome. The results confirmed that in many cases genetic algorithms can outperform the constrained random generation of stimuli, which is employed in the classical way of doing verification, considering the coverage fulfillment level per verification test.
(This article belongs to the Special Issue Data Mining and Machine Learning in Multimedia Databases)

19 pages, 3842 KiB  
Article
The Metamorphosis (of RAM3S)
by Ilaria Bartolini and Marco Patella
Appl. Sci. 2021, 11(24), 11584; https://0-doi-org.brum.beds.ac.uk/10.3390/app112411584 - 07 Dec 2021
Cited by 2 | Viewed by 1484
Abstract
The real-time analysis of Big Data streams is a terrific resource for transforming data into value. For this, Big Data technologies for smart processing of massive data streams are available, but the facilities they offer are often too raw to be effectively exploited by analysts. RAM3S (Real-time Analysis of Massive MultiMedia Streams) is a framework that acts as a middleware software layer between multimedia stream analysis techniques and Big Data streaming platforms, so as to facilitate the implementation of the former on top of the latter. RAM3S has been proven helpful in simplifying the deployment of non-parallel techniques to streaming platforms, such as Apache Storm or Apache Flink. In this paper, we show how RAM3S has been updated to incorporate novel stream processing platforms, such as Apache Samza, and to be able to communicate with different message brokers, such as Apache Kafka. Abstracting from the message broker also provides us with the ability to pipeline several RAM3S instances that can, therefore, perform different processing tasks. This represents a richer model for stream analysis with respect to the one already available in the original RAM3S version. The generality of this new RAM3S version is demonstrated through experiments conducted on three different multimedia applications, proving that RAM3S is a formidable asset for enabling efficient and effective Data Mining and Machine Learning on multimedia data streams.
(This article belongs to the Special Issue Data Mining and Machine Learning in Multimedia Databases)

17 pages, 1953 KiB  
Article
Deep Learning-Based Community Detection Approach on Multimedia Social Networks
by Antonino Ferraro, Vincenzo Moscato and Giancarlo Sperlì
Appl. Sci. 2021, 11(23), 11447; https://0-doi-org.brum.beds.ac.uk/10.3390/app112311447 - 02 Dec 2021
Cited by 8 | Viewed by 2036
Abstract
Exploiting multimedia data to analyze social networks has recently become one of the most challenging issues for Social Network Analysis (SNA), leading to the definition of Multimedia Social Networks (MSNs). In particular, these networks consider new ways of interaction and further relationships among users to support various SNA tasks: influence analysis, expert finding, community identification, item recommendation, and so on. In this paper, we present a hypergraph-based data model to represent all the different types of relationships among users within an MSN, often mediated by multimedia data. In particular, by considering only user-to-user paths that exploit particular hyperarcs and are relevant to a given application, we were able to transform the initial hypergraph into a proper adjacency matrix, where each element represents the strength of the link between two users. This matrix was then computed in a novel way through a Convolutional Neural Network (CNN), suitably modified to handle high data sparsity, in order to generate communities among users. Several experiments on standard datasets showed the effectiveness of the proposed methodology compared to other approaches in the literature.
(This article belongs to the Special Issue Data Mining and Machine Learning in Multimedia Databases)

24 pages, 2064 KiB  
Article
SMM: Leveraging Metadata for Contextually Salient Multi-Variate Motif Discovery
by Silvestro R. Poccia, K. Selçuk Candan and Maria Luisa Sapino
Appl. Sci. 2021, 11(22), 10873; https://0-doi-org.brum.beds.ac.uk/10.3390/app112210873 - 17 Nov 2021
Viewed by 1266
Abstract
A common challenge in multimedia data understanding is the unsupervised discovery of recurring patterns, or motifs, in time series data. The discovery of motifs in uni-variate time series is a well-studied problem and, while being a relatively new area of research, there are also several proposals for multi-variate motif discovery. Unfortunately, motif search among multiple variates is an expensive process, as the potential number of sub-spaces in which a pattern can occur increases exponentially with the number of variates. Consequently, many multi-variate motif search algorithms make simplifying assumptions, such as searching for motifs across all variates individually, assuming that the motifs are of the same length, or that they occur on a fixed subset of variates. In this paper, we are interested in addressing a relatively broad form of multi-variate motif detection, which seeks frequently occurring patterns (of possibly differing lengths) in sub-spaces of a multi-variate time series. In particular, we aim to leverage contextual information to help select contextually salient patterns and identify the most frequent patterns among all. Based on these goals, we first introduce the contextually salient multi-variate motif (CS-motif) discovery problem and then propose a salient multi-variate motif (SMM) algorithm that, unlike existing methods, is able to seek a broad range of patterns in multi-variate time series.
(This article belongs to the Special Issue Data Mining and Machine Learning in Multimedia Databases)
