
Data, Volume 6, Issue 2 (February 2021) – 16 articles

Cover Story: In recent years, the platform economy has been recognised by researchers and governments around the world for its potential to contribute to the sustainable development of society. Nonetheless, platform economy cases such as Uber, Airbnb, and Deliveroo have generated major controversy over their socioeconomic impact, while alternative models have been associated with a new form of cooperativism. In parallel, the United Nations is advocating global sustainable development by promoting the Sustainable Development Goals (SDGs), which address elements such as decent work, inclusive and sustainable economic growth, and fostering innovation. This data descriptor draws on the data collections of two 2020 European projects (DECODE and PLUS) and makes it possible to compare different platform economy models and their connections to the SDGs.
Open Access | Data Descriptor
A Long-Term, Real-Life Parkinson Monitoring Database Combining Unscripted Objective and Subjective Recordings
Data 2021, 6(2), 22; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020022 - 23 Feb 2021
Abstract
Accurate real-life monitoring of motor and non-motor symptoms is a challenge in Parkinson’s disease (PD). The unobtrusive capturing of symptoms and their naturalistic fluctuations within or between days can improve evaluation and titration of therapy. First-generation commercial PD motion sensors show promise for augmenting clinical decision-making in general neurological consultation, but concerns remain regarding their short-term validity and long-term real-life usability. In addition, tools for monitoring real-life subjective experiences of motor and non-motor symptoms are lacking. The dataset presented in this paper combines objective kinematic data and subjective experiential data, recorded in parallel in a naturalistic, long-term real-life setting. The objective data consist of accelerometer and gyroscope data, and the subjective data consist of ecological momentary assessments. Twenty PD patients were monitored without daily life restrictions for fourteen consecutive days. The two types of data can be used to address hypotheses on naturalistic motor and/or non-motor symptomatology in PD.
(This article belongs to the Special Issue Data from Smartphone and Wearables)

Open Access | Data Descriptor
An Open GMNS Dataset of a Dynamic Multi-Modal Transportation Network Model of Melbourne, Australia
Data 2021, 6(2), 21; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020021 - 19 Feb 2021
Abstract
Simulation-based dynamic traffic assignment models are increasingly used in urban transportation systems analysis and planning. They replicate traffic dynamics across transportation networks by capturing the complex interactions between travel demand and supply. However, their application, particularly to large-scale networks, has been hindered by the challenges associated with collecting, parsing, developing, and sharing data-intensive inputs. In this paper, we develop and share an open dataset for reproducing a dynamic multi-modal transportation network model of Melbourne, Australia. The dataset is developed consistently with the General Modeling Network Specification (GMNS), enabling software-agnostic human and machine readability. GMNS is a standard readable format for sharing routable transportation network data, designed for use in multimodal static and dynamic transportation operations and planning models.
(This article belongs to the Section Information Systems and Data Management)
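To illustrate what GMNS readability means in practice, here is a minimal Python sketch that loads a GMNS-style node table and link table into a directed adjacency list. It assumes only a small subset of GMNS columns (`node_id`, `x_coord`, `y_coord` for nodes; `link_id`, `from_node_id`, `to_node_id`, `length` for links) and is not taken from the paper or from any official GMNS tooling. The function name `load_gmns_network` and the toy values are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Nodes = Dict[str, Tuple[float, float]]
Adjacency = Dict[str, List[Tuple[str, float]]]


def load_gmns_network(node_rows: Iterable[str], link_rows: Iterable[str]) -> Tuple[Nodes, Adjacency]:
    """Hypothetical loader: build a directed adjacency list from GMNS-style CSV tables.

    Accepts any iterable of CSV lines (an open file or io.StringIO) and reads
    only a minimal subset of GMNS columns; all other columns are ignored.
    """
    nodes: Nodes = {}
    for row in csv.DictReader(node_rows):
        nodes[row["node_id"]] = (float(row["x_coord"]), float(row["y_coord"]))

    adjacency: Adjacency = defaultdict(list)
    for row in csv.DictReader(link_rows):
        src, dst = row["from_node_id"], row["to_node_id"]
        # Reject links whose endpoints are missing from the node table
        if src not in nodes or dst not in nodes:
            raise ValueError(f"link {row['link_id']} references an unknown node")
        # Length is kept in whatever unit the dataset uses
        adjacency[src].append((dst, float(row["length"])))
    return nodes, dict(adjacency)


if __name__ == "__main__":
    # Toy tables, not taken from the Melbourne dataset
    node_csv = io.StringIO("node_id,x_coord,y_coord\n1,0.0,0.0\n2,1.0,0.0\n")
    link_csv = io.StringIO("link_id,from_node_id,to_node_id,length\n10,1,2,1.2\n")
    nodes, adjacency = load_gmns_network(node_csv, link_csv)
    print(adjacency)  # {'1': [('2', 1.2)]}
```

For real files, `open(path, newline="")` can be passed in place of `io.StringIO`; keeping the loader independent of file paths makes it easy to test with in-memory tables.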

Open Access | Data Descriptor
Dataset of Two-Dimensional Gel Electrophoresis Images of Acute Myeloid Leukemia Patients before and after Induction Therapy
Data 2021, 6(2), 20; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020020 - 18 Feb 2021
Abstract
Acute myeloid leukemia (AML) is a malignant disorder of the hematopoietic stem and progenitor cells, which results in the build-up of immature blasts in the bone marrow and eventually in the peripheral blood of affected patients. Accurately assessing a patient’s prognosis is very important for the clinical management of the disease, which is why clinicians take several prognostic factors into account, such as age, performance status at diagnosis, platelet count, serum creatinine and albumin, when deciding the course of treatment. However, proteomic changes related to treatment response in this patient group have not been widely explored. Here, we make available a set of 22 two-dimensional gel electrophoresis (2DGE) images obtained from the peripheral blood samples of 11 patients with AML, taken at the time of diagnosis and after induction therapy (approximately 21–28 days after starting treatment). The same set of 2DGE images is also made available after a preprocessing stage (an additional 22 pre-processed 2DGE images), performed using algorithms developed in Python to improve the visualization of characteristic spots and facilitate proteomic analysis of this type of image.

Open Access | Article
High-to-Low (Regional) Fertility Transitions in a Peripheral European Country: The Contribution of Exploratory Time Series Analysis
Data 2021, 6(2), 19; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020019 - 16 Feb 2021
Abstract
Diachronic variations in demographic rates have frequently reflected social transformations and the (more or less evident) impact of successive economic downturns. By assessing changes over time in the Total Fertility Rate (TFR) at the regional scale in Italy, our study investigates the long-term transition (1952–2019) characteristic of Mediterranean fertility, showing a continuous decline in births since the late 1970s and marked disparities between high- and low-fertility regions along the latitude gradient. Together with a rapid decline in the national TFR, the spatiotemporal evolution of regional fertility in Italy, illustrated through an exploratory statistical time series approach, outlines the marked divide between (wealthier) Northern regions and (economically disadvantaged) Southern regions. Non-linear fertility trends and increasing spatial heterogeneity in more recent times point to the role of individual behaviors behind a generalized decline in marriage and childbearing propensity. Consistent with differential responses of regional fertility to changing socioeconomic contexts, these trends are more evident in Southern Italy than in Northern Italy. The reasons behind these fertility patterns are discussed extensively, focusing, among other factors, on the distinctive contribution of internal and international migration to regional fertility rates. Based on these findings, Southern Italy, an economically disadvantaged, peripheral region in Mediterranean Europe, is taken as a paradigmatic case of demographic shrinkage whose causes and consequences can be generalized to wider contexts in (and outside) Europe.

Open Access | Review
The State of the Art in Methodologies of Course Recommender Systems—A Review of Recent Research
Data 2021, 6(2), 18; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020018 - 11 Feb 2021
Abstract
In recent years, educational institutions have offered a wide range of overlapping courses. This presents significant challenges to students in selecting successful courses that match their current knowledge and personal goals. Although many studies have been conducted on Recommender Systems (RS), the methodologies used in course RS have not yet been adequately reviewed. To fill this gap in the literature, this paper presents the state of the art of methodologies used in course RS, along with a summary of the types of data sources used to evaluate these techniques. This review aims to identify emerging trends in course RS techniques in recent research literature and to provide insights for further research. We describe a systematic review process, followed by research findings on the methodologies implemented in different course RS in selected research journals, such as collaborative, content-based, knowledge-based, Data Mining (DM), hybrid, statistical, and Conversational RS (CRS). This study analyzed publications from 2016 to June 2020 in three repositories: IEEE Xplore, ACM, and Google Scholar. These papers were explored and classified based on the methodology used to recommend courses. This review revealed the growing popularity of hybrid course RS, followed by DM techniques, in recent publications. However, few CRS-based course RS were present in the selected publications. Finally, we discuss future avenues based on the research outcome, which might lead to next-generation course RS.
(This article belongs to the Section Information Systems and Data Management)
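Collaborative filtering is the first of the methodology families the review lists. The sketch below shows the basic user-based variant on a toy table of student ratings. The data layout, the cosine similarity measure, and the `recommend` and `cosine` functions are illustrative assumptions, not a method taken from any of the reviewed systems.

```python
import math
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # student -> {course: rating}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity, treating courses a student has not rated as 0."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(a[c] * b[c] for c in shared)
    return dot / (norm_a * norm_b)


def recommend(ratings: Ratings, student: str, k: int = 3) -> List[str]:
    """Rank courses the student has not taken by the similarity-weighted mean rating of peers."""
    mine = ratings[student]
    weighted_sum: Dict[str, float] = {}
    weight_total: Dict[str, float] = {}
    for other, theirs in ratings.items():
        if other == student:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore students with nothing in common
        for course, rating in theirs.items():
            if course not in mine:
                weighted_sum[course] = weighted_sum.get(course, 0.0) + sim * rating
                weight_total[course] = weight_total.get(course, 0.0) + sim
    ranked = sorted(weighted_sum, key=lambda c: weighted_sum[c] / weight_total[c], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    toy = {
        "alice": {"db": 5, "ml": 4},
        "bob": {"db": 5, "ml": 4, "nlp": 5},
        "carol": {"db": 1, "os": 2},
    }
    print(recommend(toy, "alice"))
```

Content-based, knowledge-based, and hybrid systems replace or combine the similarity step with course attributes or rules; this sketch only covers the rating-based case.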

Open Access | Data Descriptor
Agricultural Crop Change in the Willamette Valley, Oregon, from 2004 to 2017
Data 2021, 6(2), 17; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020017 - 07 Feb 2021
Abstract
The Willamette Valley, bounded to the west by the Coast Range and to the east by the Cascade Mountains, is the largest river valley completely confined to Oregon. The fertile valley soils combined with a temperate, marine climate create ideal agronomic conditions for seed production. Historically, seed cropping systems in the Willamette Valley have focused on the production of grass and forage seeds. In addition to growing over two-thirds of the nation’s cool-season grass seed, cropping systems in the Willamette Valley include a diverse rotation of over 250 commodities for forage, seed, food, and cover cropping applications. Tracking the sequence of crop rotations grown in the Willamette Valley is paramount to answering a broad spectrum of agronomic, environmental, and economic questions. Landsat imagery covering approximately 25,303 km² was used to identify agricultural crops in production from 2004 to 2017. The agricultural crops were distinguished by classifying images acquired primarily by three platforms: Landsat 5 (2003–2013), Landsat 7 (2003–2017), and Landsat 8 (2013–2017). Before conducting maximum likelihood remote sensing classification, the images acquired by Landsat 7 were pre-processed to reduce the impact of the scan line corrector failure. The corrected images were subsequently used to classify 35 different land-use classes and 137 unique two-year-long sequences of 57 classes of non-urban and non-forested land-use categories from 2004 through 2014. Our final data product uses new and previously published results to classify the western Oregon landscape into 61 different land use classes, including four majority-rule-over-time super-classes and 57 regular classes of annually disturbed agricultural crops (19 classes), perennial crops (20 classes), forests (13 classes), and urban developments (5 classes). These publicly available data can be used to inform and support environmental and agricultural land-use studies.
(This article belongs to the Section Spatial Data Science and Digital Earth)
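The two-year sequences mentioned in the abstract can be made concrete with a short sketch. Assuming each pixel already carries one classified land-use label per year (the maximum likelihood classification itself is not shown), counting consecutive year pairs yields crop-rotation transitions of the kind the dataset describes. The function name and the class labels used below are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def two_year_sequences(pixel_series: Dict[str, List[str]]) -> Counter:
    """Count (year t, year t + 1) land-use pairs over all pixels.

    pixel_series maps a pixel identifier to its classified labels,
    ordered by year with no missing years.
    """
    counts: Counter = Counter()
    for labels in pixel_series.values():
        # zip pairs each year's class with the class in the following year
        counts.update(zip(labels, labels[1:]))
    return counts


if __name__ == "__main__":
    series = {
        "px1": ["grass seed", "wheat", "grass seed"],
        "px2": ["grass seed", "wheat"],
    }
    for (first, second), n in two_year_sequences(series).most_common():
        print(f"{first} -> {second}: {n}")
```

Calling `len()` on the returned `Counter` gives the number of distinct two-year sequences present in the input.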

Open Access | Article
Investigating the Adoption of Big Data Management in Healthcare in Jordan
Data 2021, 6(2), 16; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020016 - 06 Feb 2021
Abstract
Software developers and data scientists work with big data to discover useful knowledge and find better solutions for improving healthcare services and patient safety. Big data analytics (BDA) is attracting attention because of its role in decision-making across the healthcare field. Therefore, this article examines the adoption mechanism of big data analytics and management in healthcare organizations in Jordan. It also discusses the characteristics of health big data and the challenges and limitations of health big data analytics and management in Jordan. The article proposes a conceptual framework for utilizing health big data. The proposed framework suggests a way to merge the existing health information system with the National Health Information Exchange (HIE), which might help extract insights from our massive datasets, increase data availability, and reduce wasted resources. When the framework is applied, the collected data are processed to develop knowledge and support decision-making, which helps improve health care quality for both the community and individuals by improving diagnosis, treatment, and other services.
(This article belongs to the Section Information Systems and Data Management)

Open Access | Feature Paper | Article
Repository Approaches to Improving the Quality of Shared Data and Code
Data 2021, 6(2), 15; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020015 - 03 Feb 2021
Abstract
Sharing data and code for reuse has become increasingly important in scientific work over the past decade. However, in practice, shared data and code may be unusable, or published results obtained from them may be irreproducible. Data repository features and services contribute significantly to the quality, longevity, and reusability of datasets. This paper presents a combination of original and secondary data analysis studies focusing on computational reproducibility, data curation, and gamified design elements that can be employed to indicate and improve the quality of shared data and code. The findings of these studies are sorted into three approaches that can be valuable to data repositories, archives, and other research dissemination platforms.
(This article belongs to the Special Issue Data Quality and Data Access for Research)

Open Access | Data Descriptor
Retinal Fundus Multi-Disease Image Dataset (RFMiD): A Dataset for Multi-Disease Detection Research
Data 2021, 6(2), 14; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020014 - 03 Feb 2021
Abstract
The world faces difficulties in eye care, including treatment, quality of prevention, vision rehabilitation services, and a scarcity of trained eye care experts. Early detection and diagnosis of ocular pathologies would help forestall visual impairment. One challenge that limits the adoption of computer-aided diagnosis tools by ophthalmologists is that the many sight-threatening rare pathologies, such as central retinal artery occlusion or anterior ischemic optic neuropathy, are usually ignored. In the past two decades, many publicly available datasets of color fundus images have been collected with a primary focus on diabetic retinopathy, glaucoma, age-related macular degeneration, and a few other frequent pathologies. To enable the development of methods for automatic classification of frequent ocular diseases along with rare pathologies, we have created a new Retinal Fundus Multi-disease Image Dataset (RFMiD). It consists of 3200 fundus images captured using three different fundus cameras, with 46 conditions annotated through the adjudicated consensus of two senior retinal experts. To the best of our knowledge, RFMiD is the only publicly available dataset that covers such a wide variety of diseases appearing in routine clinical settings. This dataset will enable the development of generalizable models for retinal screening.

Open Access | Editorial
Acknowledgment to Reviewers of Data in 2020
Data 2021, 6(2), 13; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020013 - 01 Feb 2021
Abstract
Peer review is the driving force of journal development, and reviewers are gatekeepers who ensure that Data maintains its standards for the high quality of its published papers [...]

Open Access | Review
A Systematic Survey of ML Datasets for Prime CV Research Areas—Media and Metadata
Data 2021, 6(2), 12; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020012 - 22 Jan 2021
Abstract
The ever-growing capabilities of computers have made it possible to pursue Computer Vision through Machine Learning (MLCV). ML tools require large amounts of information to learn from (ML datasets). These are costly to produce but have received little attention regarding standardization. This prevents the cooperative production and exploitation of these resources, impedes countless synergies, and hinders ML research. No global view of the MLCV dataset tissue exists, yet acquiring one is fundamental to enabling standardization. We provide an extensive survey of the evolution and current state of MLCV datasets (1994 to 2019) for a set of specific CV areas, as well as a quantitative and qualitative analysis of the results. Data were gathered from online scientific databases (e.g., Google Scholar, CiteSeerX). We reveal the heterogeneous plethora that comprises the MLCV dataset tissue; their continuous growth in volume and complexity; the specificities of the evolution of their media and metadata components regarding a range of aspects; and that MLCV progress requires the construction of a global standardized (structuring, manipulating, and sharing) MLCV “library”. Accordingly, we formulate a novel interpretation of this dataset collective as a global tissue of synthetic cognitive visual memories and define the immediately necessary steps to advance its standardization and integration.
(This article belongs to the Section Information Systems and Data Management)

Open Access | Article
On Linear and Circular Approach to GPS Data Processing: Analyses of the Horizontal Positioning Deviations Based on the Adriatic Region IGS Observables
Data 2021, 6(2), 9; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020009 - 21 Jan 2021
Abstract
Global and regional positional accuracy assessment is of the highest importance for any satellite navigation system, including the Global Positioning System (GPS). Although positioning error can be expressed as a vector quantity with direction and magnitude, most research focuses on error magnitude only. As shown here, positional accuracy can also be evaluated in terms of navigational quadrants, as a further refinement of the error distribution. This research was conducted in the wider area of the Northern Adriatic Region, employing data and products of the International Global Navigation Satellite Systems (GNSS) Service (IGS). Similarities in positional accuracy and deviation distributions for Single Point Positioning (SPP) were addressed in terms of magnitude. Data were analyzed over an 11-day period. Linear and circular statistical methods were used to quantify regional positional accuracy and error behavior, in terms of both scalar and vector values, with assessment of the underlying probability distributions. A subset analysis of positioning error by navigational quadrant was carried out. Similar positional accuracy and deviation behavior, together with an uneven distribution of positions between quadrants, indicated the directionality of the total positioning error. Latitude and longitude deviations approximately followed normal distributions, while the radius was approximated by the Rayleigh distribution. The Weibull and gamma distributions were also considered. Possible causes of the analyzed positioning deviations were not investigated, but the final positioning products were obtained as in standard single-frequency positioning scenarios.
(This article belongs to the Section Spatial Data Science and Digital Earth)
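A minimal sketch of the kind of linear and circular summaries the abstract describes, assuming the deviations have already been converted to metres in a local north/east frame around the reference station. It computes radial error, azimuth, the circular mean direction with its mean resultant length, a maximum-likelihood Rayleigh scale for the radius, and per-quadrant counts. The function `summarize_deviations` and its return keys are hypothetical, and the paper's own statistical tests are not reproduced.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

QUADRANTS = ("NE", "SE", "SW", "NW")


def summarize_deviations(dev: List[Tuple[float, float]]) -> Dict[str, object]:
    """Linear and circular summary of (north, east) position deviations in metres."""
    n = len(dev)
    if n == 0:
        raise ValueError("need at least one deviation")
    radii = [math.hypot(dn, de) for dn, de in dev]
    # Azimuth measured clockwise from north, wrapped into [0, 360) degrees
    azimuths = [math.degrees(math.atan2(de, dn)) % 360.0 for dn, de in dev]

    # Circular mean: average the unit vectors, then take the direction of the result.
    # Mean resultant length is near 1 for strongly directional error and near 0
    # when directions cancel (the mean direction is then not meaningful).
    c = sum(math.cos(math.radians(a)) for a in azimuths) / n
    s = sum(math.sin(math.radians(a)) for a in azimuths) / n

    # Rayleigh maximum-likelihood scale, appropriate if north and east errors are
    # independent, zero-mean and of equal variance: sigma^2 = sum(r^2) / (2n)
    sigma = math.sqrt(sum(r * r for r in radii) / (2 * n))

    # Navigational quadrants: NE [0, 90), SE [90, 180), SW [180, 270), NW [270, 360)
    quadrants = Counter(QUADRANTS[int(a // 90.0) % 4] for a in azimuths)

    return {
        "mean_radius": sum(radii) / n,
        "mean_direction_deg": math.degrees(math.atan2(s, c)) % 360.0,
        "resultant_length": math.hypot(c, s),
        "rayleigh_sigma": sigma,
        "quadrants": quadrants,
    }
```

Keeping the linear summaries (radius, Rayleigh scale) next to the circular ones (mean direction, resultant length, quadrant counts) mirrors the abstract's point that magnitude alone hides directional structure in the error.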

Open Access | Article
The Effect of Preprocessing Techniques, Applied to Numeric Features, on Classification Algorithms’ Performance
Data 2021, 6(2), 11; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020011 - 21 Jan 2021
Abstract
It is recognized that the performance of any prediction model is a function of several factors, one of the most significant being the preprocessing techniques adopted. In other words, preprocessing is essential for generating an effective and efficient classification model. This paper investigates the impact of the most widely used preprocessing techniques for numerical features on the performance of classification algorithms. The effect of combining various normalization techniques and strategies for handling missing values is assessed on eighteen benchmark datasets using two well-known classification algorithms, with different performance evaluation metrics and statistical significance tests. According to the reported experimental results, the impact of the adopted preprocessing techniques varies from one classification algorithm to another. In addition, a statistically significant difference between the considered data preprocessing techniques is demonstrated.
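To make the idea of combining preprocessing steps concrete, the sketch below applies every pairing of two missing-value strategies (mean and median imputation) with two normalization techniques (min-max scaling and z-score standardization) to a single numeric column. These four techniques are common choices picked for illustration; the paper's exact set of techniques, datasets, and classifiers is not reproduced, and all names here are hypothetical.

```python
from itertools import product
from statistics import mean, median, pstdev
from typing import Callable, Dict, List, Optional

Column = List[Optional[float]]  # None marks a missing value


def fill_with(stat: Callable[[List[float]], float]) -> Callable[[Column], List[float]]:
    """Build an imputer that replaces None with a statistic of the observed values."""
    def impute(col: Column) -> List[float]:
        observed = [v for v in col if v is not None]
        fill = stat(observed)
        return [fill if v is None else v for v in col]
    return impute


def min_max(col: List[float]) -> List[float]:
    """Rescale to [0, 1]; a constant column maps to zeros instead of dividing by zero."""
    lo, hi = min(col), max(col)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in col]


def z_score(col: List[float]) -> List[float]:
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sd = mean(col), pstdev(col)
    return [0.0 if sd == 0 else (v - mu) / sd for v in col]


IMPUTERS = {"mean": fill_with(mean), "median": fill_with(median)}
SCALERS = {"min-max": min_max, "z-score": z_score}


def preprocessing_variants(col: Column) -> Dict[str, List[float]]:
    """Apply every (imputation, normalization) pair to one numeric column."""
    return {
        f"{imp}+{scale}": SCALERS[scale](IMPUTERS[imp](col))
        for imp, scale in product(IMPUTERS, SCALERS)
    }


if __name__ == "__main__":
    for name, values in preprocessing_variants([1.0, None, 3.0, 10.0]).items():
        print(name, [round(v, 3) for v in values])
```

In a real experiment, the imputation and scaling statistics should be computed on the training fold only and then reused on the test fold; otherwise information leaks from the test data into training.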

Open Access | Review
Balancing Plurality and Educational Essence: Higher Education Between Data-Competent Professionals and Data Self-Empowered Citizens
Data 2021, 6(2), 10; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020010 - 21 Jan 2021
Abstract
Data are increasingly important in central facets of modern life: academia, the professions, and society at large. Educating aspiring minds to meet the highest standards in these facets is the mandate of institutions of higher education. This naturally includes preparing students to excel in today’s data-driven world. In recent years, an intensive academic discussion has resulted in a distinction between two modes of data-related education: data science education and data literacy education. As a large number of study programs and offerings are emerging around the world, data literacy in higher education is a particular focus of this paper. These programs, despite sharing the same name, differ substantially in their educational content, i.e., a high plurality can be observed. This paper explores this plurality, comments on the role it might play, and suggests ways to deal with it by maintaining a high degree of adaptiveness and plurality while simultaneously establishing a consistent educational “essence”. It identifies a skill set, data self-empowerment, as a potential part of this essence. Data science and data literacy education are still in flux as emerging fields of study and are additionally stirred up by rapid developments, bringing about a need for flexibility and dialectic.
(This article belongs to the Section Featured Reviews of Data Science Research)

Open Access | Article
Characteristics of Recent Aftershocks Sequences (2014, 2015, 2018) Derived from New Seismological and Geodetic Data on the Ionian Islands, Greece
Data 2021, 6(2), 8; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020008 - 20 Jan 2021
Abstract
Between 2014 and 2018, four strong earthquakes occurred in the Ionian Sea, Greece. These events were followed by rich aftershock sequences. In more detail, according to the manual solutions of the National Observatory of Athens, the first event occurred on 26 January 2014 on Cephalonia Island with magnitude ML = 5.8, followed by another in the same region on 3 February 2014 with magnitude ML = 5.7. The third event (ML = 6.0) occurred on 17 November 2015 on Lefkas Island, and the last (ML = 6.6) on 25 October 2018 on Zakynthos Island. The first three of these earthquakes caused moderate structural damage, mainly to houses, and considerable unrest among the local population. This work determines seismic moment tensors for both large and intermediate-magnitude earthquakes (M > 4.0). Geodetic data from permanent GPS stations were analyzed to investigate the displacement caused by the earthquakes.
(This article belongs to the Section Spatial Data Science and Digital Earth)

Open Access | Data Descriptor
Data for Sustainable Platform Economy: Connections between Platform Models and Sustainable Development Goals
Data 2021, 6(2), 7; https://0-doi-org.brum.beds.ac.uk/10.3390/data6020007 - 20 Jan 2021
Abstract
In recent years, the platform economy has been recognised by researchers and governments around the world for its potential to contribute to the sustainable development of society. Yet, platform economy cases such as Uber, Airbnb, and Deliveroo have generated major controversy over their socioeconomic impact, while alternative models have been associated with a new form of cooperativism. In parallel, the United Nations is advocating global sustainable development by promoting the Sustainable Development Goals (SDGs), which address elements such as decent work, inclusive and sustainable economic growth, and fostering innovation. However, the SDGs have also been criticised for lacking a digital perspective. This dataset draws on the data collections of two 2020 European projects (DECODE and PLUS) and makes it possible to compare different platform economy models and their connections with the SDGs.
(This article belongs to the Special Issue A European Approach to the Establishment of Data Spaces)