Big Data Analytics and Cloud Data Management

A special issue of Big Data and Cognitive Computing (ISSN 2504-2289).

Deadline for manuscript submissions: closed (15 January 2022) | Viewed by 19198

Special Issue Editor


Dr. Verena Kantere
Guest Editor
National Technical University of Athens, Athens, Greece
University of Ottawa, Ottawa, Canada
Interests: big data management; multi-engine systems; heterogeneous data; distributed data management; cloud data management; approximate querying

Special Issue Information

Dear Colleagues,

Big Data are large, complex, and unprocessed datasets that cannot be handled by traditional applications but can offer knowledge and value if properly analyzed. In recent years, the application-specific capacity of machines to compute information per capita has roughly doubled every 14 months, whereas the world's per-capita storage capacity has taken roughly 40 months to double over the same period. The exponential growth of data production and the diversity of data sources, along with the improvement in the computational capabilities of hardware, have given rise to multifarious data management challenges related to all seven V's used to describe Big Data, i.e., volume, variety, velocity, variability, veracity, visualization, and value. Such challenges concern capturing the raw data and storing them together with pertinent metadata; analyzing the data and producing new knowledge; sharing data and knowledge; and offering services on the data for visualization and exploration. For the storage and processing of Big Data, Cloud Computing seems to be the ideal paradigm, as it offers a flexible processing environment, resources rented from cloud providers, and inherently distributed services. Hence, cloud data management techniques tailored to the processing of Big Data are highly sought after in both research and industry.

Dr. Verena Kantere
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Data visualization
  • Multi-engine systems
  • Approximate query answering
  • Data analytics
  • Edge and fog computing
  • Hybrid clouds
  • Heterogeneous data stores
  • Big data stream management
  • Multi-dimensional data
  • Data services
  • Data pricing
  • Data privacy

Published Papers (3 papers)


Research

26 pages, 1044 KiB  
Article
A Hierarchical Hadoop Framework to Process Geo-Distributed Big Data
by Giuseppe Di Modica and Orazio Tomarchio
Big Data Cogn. Comput. 2022, 6(1), 5; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc6010005 - 06 Jan 2022
Cited by 1 | Viewed by 3395
Abstract
In the past twenty years, we have witnessed an unprecedented production of data worldwide that has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to efficiently and quickly obtain insights from such Big Data. State-of-the-art parallel computing techniques such as MapReduce guarantee high performance in scenarios where the involved computing nodes are equally sized and clustered via broadband network links, and the data are co-located with the cluster of nodes. Unfortunately, these techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are geographically distributed across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that are aware of the constraints imposed in those scenarios (such as the imbalance of the nodes' computing power and of the interconnecting links) and enforce smart task scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the potential of the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we argue for fragmenting the data in a smart way so that the scheduler produces a fairer distribution of the workload among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. Test results are discussed in the last part of the paper.
(This article belongs to the Special Issue Big Data Analytics and Cloud Data Management)
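
As a rough illustration of the workload-balancing idea described in the abstract, the following minimal Java sketch sizes per-site data fragments in proportion to each data center's compute capacity, so that the local Hadoop jobs finish at roughly the same time. The Site and Fragment types, the capacity figures, and the fragment() helper are hypothetical and are not taken from the paper's actual scheduler.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration of capacity-aware data fragmentation across
 * geo-distributed data centers. This is NOT the framework described
 * in the paper; it only sketches the general idea of sizing per-site
 * workloads in proportion to each site's compute capacity.
 */
public class CapacityAwareFragmenter {

    /** Hypothetical description of one data center taking part in the job. */
    record Site(String name, double relativeComputePower) {}

    /** Hypothetical fragment of the input assigned to one site. */
    record Fragment(String site, long bytes) {}

    /** Split totalBytes among sites proportionally to their compute power. */
    static List<Fragment> fragment(long totalBytes, List<Site> sites) {
        double totalPower = sites.stream()
                .mapToDouble(Site::relativeComputePower).sum();
        List<Fragment> plan = new ArrayList<>();
        long assigned = 0;
        for (int i = 0; i < sites.size(); i++) {
            Site s = sites.get(i);
            long share = (i == sites.size() - 1)
                    ? totalBytes - assigned   // give the rounding remainder to the last site
                    : Math.round(totalBytes * s.relativeComputePower() / totalPower);
            assigned += share;
            plan.add(new Fragment(s.name(), share));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Site> sites = List.of(
                new Site("dc-eu", 4.0),   // assumed: 4x baseline capacity
                new Site("dc-us", 2.0),
                new Site("dc-asia", 1.0));
        fragment(700L * 1024 * 1024, sites)
                .forEach(f -> System.out.println(f.site() + " <- " + f.bytes() + " bytes"));
    }
}
```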

16 pages, 2518 KiB  
Article
From Data Processing to Knowledge Processing: Working with Operational Schemas by Autopoietic Machines
by Mark Burgin and Rao Mikkilineni
Big Data Cogn. Comput. 2021, 5(1), 13; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc5010013 - 10 Mar 2021
Cited by 14 | Viewed by 5579
Abstract
Knowledge processing is an important feature of intelligence in general and artificial intelligence in particular. To develop computing systems that work with knowledge, it is necessary to elaborate the means of working with knowledge representations (as opposed to data), because knowledge is an abstract structure. There are different forms of knowledge representations derived from data. One of the basic forms is called a schema, which can belong to one of three classes: operational, descriptive, and representation schemas. The goal of this paper is the development of theoretical and practical tools for processing operational schemas. To achieve this goal, we use schema representations elaborated in the mathematical theory of schemas and employ structural machines as a powerful theoretical tool for modeling parallel and concurrent computational processes. We describe the schema of autopoietic machines as physical realizations of structural machines. An autopoietic machine is a technical system capable of regenerating, reproducing, and maintaining itself through the production, transformation, and destruction of its components and of the networks of processes they contain. We present the theory and practice of designing and implementing autopoietic machines as information processing structures that integrate both symbolic computing and neural networks. Autopoietic machines use knowledge structures capturing the behavioral evolution of the system and its interactions with the environment to maintain stability by counteracting fluctuations.
(This article belongs to the Special Issue Big Data Analytics and Cloud Data Management)
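
The following toy Java sketch illustrates only the self-maintenance notion mentioned in the abstract: a system that observes its own components and produces, repairs, or replaces them to counteract fluctuations. The component names, health scores, and maintenanceCycle() logic are invented for illustration and do not represent the structural-machine formalism developed by the authors.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Toy sketch (not the paper's formalism) of autopoietic self-maintenance:
 * the system keeps its overall state within acceptable bounds by producing,
 * transforming, and destroying its own components in response to disturbances.
 */
public class AutopoieticLoopSketch {

    /** Hypothetical health score per component; 1.0 = fully functional. */
    private final Map<String, Double> components = new ConcurrentHashMap<>();

    AutopoieticLoopSketch(String... names) {
        for (String n : names) components.put(n, 1.0);
    }

    /** Simulated external disturbance degrading one component. */
    void perturb(String name, double damage) {
        components.computeIfPresent(name, (k, v) -> Math.max(0.0, v - damage));
    }

    /** One self-maintenance cycle: repair degraded parts, replace dead ones. */
    void maintenanceCycle() {
        components.replaceAll((name, health) -> {
            if (health <= 0.0) {
                System.out.println("destroying and re-producing " + name);
                return 1.0;                         // re-produce the component
            }
            if (health < 0.8) {
                System.out.println("regenerating " + name);
                return Math.min(1.0, health + 0.2); // partial repair
            }
            return health;                          // stable, leave untouched
        });
    }

    public static void main(String[] args) {
        AutopoieticLoopSketch m = new AutopoieticLoopSketch("parser", "planner", "store");
        m.perturb("planner", 0.5);
        m.perturb("store", 1.0);
        m.maintenanceCycle();
    }
}
```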

18 pages, 1958 KiB  
Article
Processing Big Data with Apache Hadoop in the Current Challenging Era of COVID-19
by Otmane Azeroual and Renaud Fabre
Big Data Cogn. Comput. 2021, 5(1), 12; https://0-doi-org.brum.beds.ac.uk/10.3390/bdcc5010012 - 09 Mar 2021
Cited by 21 | Viewed by 9077
Abstract
Big data have become a global strategic issue, as increasingly large amounts of unstructured data challenge the IT infrastructure of global organizations and threaten their capacity for strategic forecasting. As with previous massive-information challenges, big data technologies such as Hadoop should efficiently tackle the incoming large amounts of data and provide organizations with relevant processed information that was formerly neither visible nor manageable. After briefly recalling the strategic advantages of big data solutions in the introductory remarks, in the first part of this paper we focus on those advantages in the currently difficult time of the COVID-19 pandemic, which we characterize as an endemic, heterogeneous data context; we then outline the advantages of technologies such as Hadoop and their IT suitability in this context. In the second part, we identify two specific advantages of Hadoop solutions, globality combined with flexibility, and observe that both are at work in a "Hadoop Fusion Approach" that we describe as an optimal response to this context. In the third part, we justify the selected qualifications of globality and flexibility by the fact that Hadoop solutions enable comparable returns in the opposite contexts of models of partial submodels and models of final exact systems. In part four, we remark that in both of these opposite contexts, Hadoop solutions allow a large range of needs to be fulfilled, which fits the requirements previously identified for the heterogeneous data structure of COVID-19 information. In the final part, we propose a framework of strategic data processing conditions that, to the best of our knowledge, are the most suitable for overcoming the massive information challenges of COVID-19.
(This article belongs to the Special Issue Big Data Analytics and Cloud Data Management)
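
For readers unfamiliar with Hadoop, the sketch below is a minimal, generic MapReduce job in Java that counts how many records arrive from each source in a heterogeneous text feed. The tab-separated input convention and the SourceCount class are assumptions made for illustration and are unrelated to the "Hadoop Fusion Approach" discussed in the paper.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Minimal Hadoop MapReduce job counting records per source.
 * Input lines are assumed to look like "<source>\t<payload>",
 * e.g. "who_reports\t{...json...}".
 */
public class SourceCount {

    public static class SourceMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text source = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            int tab = line.indexOf('\t');
            if (tab > 0) {                      // skip malformed lines
                source.set(line.substring(0, tab));
                context.write(source, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) sum += v.get();
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "source count");
        job.setJarByClass(SourceCount.class);
        job.setMapperClass(SourceMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```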
