
Data, Volume 5, Issue 4 (December 2020) – 33 articles

Cover Story: MRI performed several minutes after the injection of a contrast agent (delayed enhancement MRI, or DE-MRI) is a method of choice for evaluating the extent of myocardial infarction (MI) and, by extension, for assessing viable tissue after an injury. The Emidec dataset comprises a series of exams with DE-MRI images in short-axis orientation covering the left ventricle, from normal cases and from patients with MI, including contours of the myocardium and of diseased areas (if present). In addition, the clinical parameters typically available when a patient is managed by an emergency department are provided for each case. The Emidec dataset thus combines DE-MRI with clinical characteristics of the patient, supporting the development of methods for both exam classification and exam quantification.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.
Data Descriptor
Dataset on the Effects of Different Pre-Harvest Factors on the Metabolomics Profile of Lettuce (Lactuca sativa L.) Leaves
Data 2020, 5(4), 119; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040119 - 15 Dec 2020
Cited by 1 | Viewed by 1013
Abstract
The study of the relationship between cultivated plants and environmental factors can provide information ranging from a deeper understanding of the plant biological system to the development of more effective management strategies for improving yield, quality, and sustainability of the produce. In this article, we present a comprehensive metabolomics dataset of two phytochemically divergent lettuce (Lactuca sativa L.) butterhead varieties under different growing conditions. Plants were cultivated in hydroponics in a growth chamber with ambient control. The pre-harvest factors that were independently investigated were light intensity (two levels), the ionic strength of the nutrient solutions (three levels), and the molar ratio of three macroelements (K, Mg, and Ca) in the nutrient solution (three levels). We used an untargeted, mass-spectrometry-based approach to characterize the metabolomics profiles of leaves harvested 19 days after transplant. The data reveal a substantial impact on both primary and secondary metabolism and show its range of variation. Moreover, our dataset is useful for uncovering the complex effects of the genotype, the environmental factor(s), and their interaction, which may deserve further investigation.
Data Descriptor
A State-Level Socioeconomic Data Collection of the United States for COVID-19 Research
Data 2020, 5(4), 118; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040118 - 11 Dec 2020
Cited by 2 | Viewed by 1480
Abstract
The outbreak of COVID-19 in late 2019 not only threatens human health and lives but also significantly affects public policies, economic activities, and human behavior patterns. In understanding this impact and preparing for future outbreaks, socioeconomic factors play significant roles in (1) determinant analysis with health care, environmental exposure and health behavior; (2) human mobility analyses driven by policies; (3) economic pressure and recovery analyses for decision making; and (4) short- to long-term social impact analysis for equity, justice and diversity. To support these analyses for rapid impact responses, state-level socioeconomic factors for the United States of America (USA) are collected and integrated into topic-based indicators, including (1) the daily quantitative policy stringency index; (2) dynamic economic indices at multiple time frequencies for GDP, international trade, personal income, employment, the housing market, and others; and (3) the socioeconomic determinant baseline of demographics, housing financial situation and medical resources. This paper introduces the measurements and metadata of the socioeconomic data collection, along with the sharing platform, data warehouse framework and quality control strategies. Unlike existing COVID-19-related data products, this collection recognizes geospatial and dynamic factors as essential dimensions of epidemiologic research and refines the spatial resolution of socioeconomic data from country level to state level for the USA, with a standard data format and high quality.
(This article belongs to the Special Issue Data-Driven Modelling of Infectious Diseases)

Data Descriptor
First 1-M Resolution Land Cover Map Labeling the Overlap in the 3rd Dimension: The 2018 Map for Wallonia
Data 2020, 5(4), 117; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040117 - 11 Dec 2020
Cited by 1 | Viewed by 1067
Abstract
Land cover maps contribute to a wide range of geospatial applications, including but not limited to land management, hydrology, land use planning, climate modeling and biodiversity monitoring. In densely populated and highly fragmented landscapes such as those of the Walloon region (Belgium), very high spatial resolution is required to depict all the infrastructure, buildings and most of the structural elements of semi-natural landscapes (such as hedges and small water bodies). At this resolution, the vertical dimension needs explicit handling to avoid discontinuities incompatible with many applications: for example, how should a river flowing under a bridge be mapped? The distinctive feature of our data is a two-digit land cover code that labels all overlapping items. The overlaps were identified by combining remote sensing image analysis with decision rules involving ancillary data. The final product is therefore semantically precise and accurate in terms of land cover description, thanks to the addition of 24 classes on top of the 11 pure land cover classes. The quality of the map was assessed using a state-of-the-art validation scheme. Its overall accuracy is 91.5%, with an average producer’s accuracy of 86% and an average user’s accuracy of 91%.
(This article belongs to the Section Spatial Data Science and Digital Earth)
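
The three accuracy figures quoted in the abstract above are standard confusion-matrix summaries. The following is a minimal sketch of how they are usually computed, assuming a matrix whose rows are reference (ground-truth) classes and whose columns are mapped classes. The function name and the 2 × 2 example counts are hypothetical and are not taken from the Wallonia validation.

```python
def accuracy_summary(cm):
    """Summarise a square confusion matrix (rows: reference, columns: map).

    Returns overall accuracy plus per-class producer's and user's accuracy.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    row_totals = [sum(cm[i]) for i in range(n)]                       # reference totals
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # map totals
    # Producer's accuracy: share of reference samples of a class mapped correctly.
    producers = [cm[i][i] / row_totals[i] for i in range(n)]
    # User's accuracy: share of samples mapped as a class that truly belong to it.
    users = [cm[i][i] / col_totals[i] for i in range(n)]
    return correct / total, producers, users


# Hypothetical two-class example (counts are illustrative only).
overall, producers, users = accuracy_summary([[8, 2], [1, 9]])
print(overall, producers, users)
```

Averaging the per-class lists gives the "average producer's" and "average user's" accuracy. A real validation scheme may also weight classes by area, which this sketch does not do.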

Data Descriptor
Seized Ecstasy Pills: Infrared Spectra and Image Datasets
Data 2020, 5(4), 116; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040116 - 09 Dec 2020
Viewed by 769
Abstract
According to the World Drug Report 2020, cocaine and ecstasy are the most consumed stimulant drugs, with 19 and 27 million estimated users in 2018, respectively. In this context, large efforts are being made to design fast and cost-effective analytical methods to track and monitor the distribution networks of these synthetic drugs. Here, we share two datasets of ecstasy pills seized in the northeast of Switzerland between 2010 and 2011. The first contains 621 forensic-grade images of pills, while the second consists of 486 mid-infrared (mIR) spectra. Although the two sets do not cover the same seizures, both provide high-quality data with orthogonal information for evaluating clustering and dimension reduction methods.
(This article belongs to the Section Chemoinformatics)

Data Descriptor
BLE-GSpeed: A New BLE-Based Dataset to Estimate User Gait Speed
Data 2020, 5(4), 115; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040115 - 07 Dec 2020
Cited by 1 | Viewed by 822
Abstract
Estimating a user’s gait speed can be crucial in many fields, such as health care, since difficulty in walking is a core indicator of health and function in aging and disease. Methods for non-invasive and continuous assessment of gait speed may be key to enabling early detection of cognitive diseases such as dementia or Alzheimer’s disease. Wearable technologies can provide innovative solutions to healthcare problems. Bluetooth Low Energy (BLE) technology is well suited to wearables because it is very energy efficient, secure, and inexpensive. In this paper, the BLE-GSpeed database is presented. The dataset is composed of several BLE RSSI measurements obtained while users were walking at a constant speed along a corridor. In addition, a set of experiments using a baseline algorithm to estimate gait speed is presented to provide baseline results to the research community.
(This article belongs to the Special Issue Data from Smartphones and Wearables)

Data Descriptor
A Compendium of Chemical Class and Use Type Open Access Databases
Data 2020, 5(4), 114; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040114 - 04 Dec 2020
Viewed by 692
Abstract
With an ever-increasing production and registration of chemical substances, obtaining reliable and up-to-date information on their use types (UT) and chemical class (CC) is of crucial importance. We evaluated the current status of open access chemical substance databases (DBs) regarding UT and CC information using the “Meta-analysis of the Global Impact of Chemicals” (MAGIC) graph as a benchmark. A decision tree-based selection process was used to choose the most suitable out of 96 databases. To compare the DB content for 100 weighted, randomly selected chemical substances, an extensive quantitative and qualitative analysis was performed. It was found that four DBs yielded more qualitative and quantitative UT and CC results than the current MAGIC graph: the European Bioinformatics Institute DB, ChemSpider, the English Wikipedia page, and the National Center for Biotechnology Information (NCBI). The NCBI, along with its subsidiary DBs PubChem and Medical Subject Headings (MeSH), showed the best performance according to the defined criteria. For the analysis of large datasets, harmonisation of the available information might be beneficial, as the available DBs mostly aggregate information without harmonising it.
(This article belongs to the Section Chemoinformatics)

Article
Mid-Cycle Observations of CR Boo and Estimation of the System’s Parameters
Data 2020, 5(4), 113; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040113 - 02 Dec 2020
Viewed by 812
Abstract
We present observations (with the NAO Rozhen and AS Vidojevica telescopes) of the AM Canum Venaticorum (AM CVn) binary star CR Bootis (CR Boo) in the UBV bands. The data were obtained on two nights in July 2019, when the V-band brightness was in the range of 16.1–17.0 mag. On both nights, variability with a period of 25 ± 1 min and an amplitude of about 0.2 mag was visible. These brightness variations are most likely indications of “humps”. During our observations, they appeared with a period similar to the CR Boo orbital period. A possible origin is the phase rotation of the bright spot located at the point of contact between the infalling matter and the outer disc edge. We estimated some of the parameters of the binary system on the basis of the observational data.
(This article belongs to the Special Issue Astronomy in the Big Data Era: Perspectives)

Data Descriptor
First Draft Genome Assembly of the Malaysian Stingless Bee, Heterotrigona itama (Apidae, Meliponinae)
Data 2020, 5(4), 112; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040112 - 30 Nov 2020
Viewed by 964
Abstract
The Malaysian stingless bee industry is hugely dependent on wild colonies. Nevertheless, the availability of new queens to establish new colonies is insufficient to meet the growing demand for hives in the industry. Heterotrigona itama is primarily utilized for honey production in the region, and the major source of stingless bee colonies is the wild. To propagate new colonies domestically, a fundamental understanding of the biology of queen development, especially from the genomics aspect, is necessary. The whole genome was sequenced using a paired-end 150 strategy on the Illumina HiSeq X platform. The shotgun sequencing generated approximately 89 million raw paired-end reads with a total output of 13.37 Gb and a GC content of 37.31%. The genome size of the species was estimated to be approximately 272 Mb. Phylogenetic analysis showed that H. itama is much more closely related to the bumble bee (Bombus spp.) than to the modern honey bee (Apis spp.). The genome data provided here are expected to contribute to a better understanding of the genetic aspect of queen differentiation as well as of important molecular pathways that are crucial for stingless bee biology, management and conservation.
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics)

Communication
In Silico Estimation of the Abundance and Phylogenetic Significance of the Composite Oct4-Sox2 Binding Motifs within a Wide Range of Species
Data 2020, 5(4), 111; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040111 - 29 Nov 2020
Viewed by 653
Abstract
High-throughput sequencing technologies have greatly accelerated the progress of genomics, transcriptomics, and metagenomics. Currently, a large amount of genomic data from various organisms is being generated, the volume of which is increasing every year. Therefore, methods that allow the rapid search and analysis of DNA sequences are urgently needed. Here, we present a novel motif-based high-throughput sequence scoring method that generates genome information. We identified Utf1-like, Fgf4-like, and Hoxb1-like motifs, which are cis-regulatory elements for the pluripotency transcription factors Sox2 and Oct4, within the genomes of different eukaryotic organisms. The genome-wide analysis of these motifs was performed to understand the impact of their diversification on mammalian genome evolution. Utf1-like, Fgf4-like, and Hoxb1-like motif diversity was evaluated across genomes from multiple species.

Data Descriptor
Data Employed in the Construction of a Composite Protein Database for Proteogenomic Analyses of Cephalopods Salivary Apparatus
Data 2020, 5(4), 110; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040110 - 27 Nov 2020
Viewed by 792
Abstract
Here, we provide all datasets and details applied in the construction of a composite protein database required for the proteogenomic analyses of the article “Putative Antimicrobial Peptides of the Posterior Salivary Glands from the Cephalopod Octopus vulgaris Revealed by Exploring a Composite Protein Database”. All data, subdivided into six datasets, are deposited at the Mendeley Data repository as follows. Dataset_1 provides our composite database “All_Databases_5950827_sequences.fasta” derived from six smaller databases composed of (i) protein sequences retrieved from public databases related to cephalopods’ salivary glands, (ii) proteins identified with Proteome Discoverer software using our original data obtained by shotgun proteomic analyses of posterior salivary glands (PSGs) from three Octopus vulgaris specimens (provided as Dataset_2) and (iii) a non-redundant antimicrobial peptide (AMP) database. Dataset_3 includes the transcripts obtained by de novo assembly of 16 transcriptomes from cephalopods’ PSGs using CLC Genomics Workbench. Dataset_4 provides the proteins predicted by the TransDecoder tool from the de novo assembly of 16 transcriptomes of cephalopods’ PSGs. Further details about database construction, as well as the scripts and command lines used to construct them, are deposited within Dataset_5 and Dataset_6. The data provided in this article will assist in unravelling the role of cephalopods’ PSGs in feeding strategies and in toxin and AMP production.

Data Descriptor
Long-Term, Gridded Standardized Precipitation Index for Hawai‘i
Data 2020, 5(4), 109; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040109 - 26 Nov 2020
Viewed by 1039
Abstract
Spatially explicit, wall-to-wall rainfall data provide foundational climatic information but alone are inadequate for characterizing meteorological, hydrological, agricultural, or ecological drought. The Standardized Precipitation Index (SPI) is one of the most widely used indicators of drought and defines localized conditions of both drought and excess rainfall based on period-specific (e.g., 1-month, 6-month, 12-month) accumulated precipitation relative to multi-year averages. A 93-year (1920–2012), high-resolution (250 m) gridded dataset of monthly rainfall available for the State of Hawai‘i was used to derive gridded, monthly SPI values for 1-, 3-, 6-, 9-, 12-, 24-, 36-, 48-, and 60-month intervals. Gridded SPI data were validated against independent, station-based calculations of SPI provided by the National Weather Service. The gridded SPI product was also compared with the U.S. Drought Monitor during the overlapping period. This SPI product provides several advantages over currently available drought indices for Hawai‘i in that it has statewide coverage over a long historical period at high spatial resolution to capture fine-scale climatic gradients and monitor changes in local drought severity.
(This article belongs to the Section Spatial Data Science and Digital Earth)
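
To make the idea of a period-specific standardized index concrete, here is a rough sketch for a single grid cell. It is not the authors' procedure: SPI is conventionally computed by fitting a gamma distribution to the accumulated totals, usually separately for each calendar month, whereas this sketch swaps in an empirical (Weibull plotting-position) probability so that it needs only the Python standard library. The function names and the sample series are hypothetical.

```python
from statistics import NormalDist


def rolling_sums(monthly, window):
    """Accumulated precipitation over each trailing `window`-month period."""
    return [sum(monthly[i - window + 1:i + 1])
            for i in range(window - 1, len(monthly))]


def empirical_spi(monthly, window):
    """Nonparametric SPI-like index: rank-based probability -> standard normal.

    Simplifications (assumptions of this sketch): no gamma fit, no split by
    calendar month, and ties are ranked in order of appearance.
    """
    totals = rolling_sums(monthly, window)
    n = len(totals)
    order = sorted(range(n), key=lambda i: totals[i])
    index = [0.0] * n
    for rank, i in enumerate(order, start=1):
        prob = rank / (n + 1)          # Weibull plotting position, never 0 or 1
        index[i] = NormalDist().inv_cdf(prob)
    return index


# Hypothetical monthly rainfall (mm) for one cell, 3-month accumulation.
rain = [120, 80, 30, 10, 5, 60, 150, 200, 90, 40, 20, 70]
print(empirical_spi(rain, window=3))
```

Negative values mark periods drier than is typical for the series and positive values mark wetter ones; longer windows (for example 12 or 24 months) smooth out short dry spells.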

Data Descriptor
Dataset of User Reactions When Filling Out Web Questionnaires
Data 2020, 5(4), 108; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040108 - 25 Nov 2020
Cited by 2 | Viewed by 715
Abstract
This paper presents a dataset of user reactions when filling out questionnaires, together with the results of its analysis. Based on an analysis of 1980 user responses to simple questionnaire questions, patterns in user reactions were revealed. The analysis shows that each user is characterized by reactions, observed across a variety of questions, that reflect individual interface skills, reading speed, and speed of choosing an answer; these can be used to supplement personal verification in information systems. Recording reaction times adds little to the volume of data logged and transferred and involves no confidential information. The data would be of interest for further research by specialists in the fields of psychology, information security, and information systems design.

Data Descriptor
Municipalities in the Czech Republic—Compilation of “a Universal” Dataset
Data 2020, 5(4), 107; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040107 - 24 Nov 2020
Viewed by 598
Abstract
There have been many changes in the spatial composition and formal delimitation of administrative boundaries of Czech municipalities over the past 30 years. Many municipalities have changed their official status: some split into more independent units, others were merged with existing ones, and others formally redrew their boundaries due to advances in mapping technology. Such changes have made it almost impossible to analyze and visualize the temporal development of selected socioeconomic indicators in a way that delivers spatially coherent and time-comparable results. In this data description, we present the development of a unique (geo) dataset comprising the administrative borders of the Czech municipalities. Its uniqueness lies in temporally and topologically justified spatial data that yield a common division of the administrative units at the LAU2 level, valid from 1995 to 2019. Besides the topologically correct spatial representations of municipalities in Czechia, we also provide correspondence tables for each year in this period, which allow tabular statistics to be joined to the spatial data. The dataset is available as a base layer for further temporal and spatial analyses and visualization of various socioeconomic statistical data.
(This article belongs to the Section Spatial Data Science and Digital Earth)

Data Descriptor
On the Stark Broadening of Be II Spectral Lines
Data 2020, 5(4), 106; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040106 - 23 Nov 2020
Viewed by 798
Abstract
Calculated Stark broadening parameters of singly ionized beryllium spectral lines are reported. Three spectral series were studied within semiclassical perturbation theory. The plasma conditions cover temperatures from 2500 to 50,000 K and perturber densities of 10¹¹ cm⁻³ and 10¹³ cm⁻³. The influence of temperature and the role of the perturbers (electrons, protons and He⁺ ions) on the Stark width and shift are discussed. The results could be useful for plasma diagnostics in astrophysical, laboratory, and industrial plasmas.
(This article belongs to the Special Issue Astronomy in the Big Data Era: Perspectives)

Data Descriptor
An Anti-Nucleocapsid Antigen SARS-CoV-2 Total Antibody Assay Finds Comparable Results in EDTA-Anticoagulated Whole Blood Obtained from Capillary and Venous Blood Sampling
Data 2020, 5(4), 105; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040105 - 12 Nov 2020
Viewed by 724
Abstract
Although SARS-CoV-2 antibody assays have been found to provide valid results in EDTA-anticoagulated whole blood, it has not yet been demonstrated that antibody levels in whole blood from capillary samples are comparable to antibody levels measured in blood of venous origin. Here, blood was drawn simultaneously by capillary and venous blood sampling. Antibody titers were determined by an electrochemiluminescence immunoassay (ECLIA) that detects SARS-CoV-2 total immunoglobulins with specificity directed against the nucleocapsid antigen. Six individuals with confirmed COVID-19 and six individuals without COVID-19 were analyzed. Antibody titers in capillary and venous whole blood did not differ significantly and, when corrected for hematocrit, did not differ from the results obtained from serum. In conclusion, capillary-sampled EDTA-anticoagulated whole blood seems to be an attractive alternative matrix for the evaluation of SARS-CoV-2 antibodies when employing ECLIA to detect total antibodies directed against the nucleocapsid antigen.

Article
Distinct Two-Stream Convolutional Networks for Human Action Recognition in Videos Using Segment-Based Temporal Modeling
Data 2020, 5(4), 104; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040104 - 11 Nov 2020
Cited by 1 | Viewed by 821
Abstract
The two-stream convolutional neural network (CNN) has proven highly successful for action recognition in videos. The main idea is to train two CNNs to learn spatial and temporal features separately and to combine their two scores to obtain the final scores. In the literature, we observed that most methods use similar CNNs for the two streams. In this paper, we design a two-stream CNN architecture with different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) are applied to retrieve long-range temporal features and to differentiate similar types of sub-action in videos. Data augmentation techniques are employed to prevent over-fitting. Advanced cross-modal pre-training is discussed and introduced into the proposed architecture to enhance the accuracy of action recognition. The proposed two-stream model is evaluated on two challenging action recognition datasets: HMDB-51 and UCF-101. The results show a significant performance increase, and the proposed architecture outperforms the existing methods.
(This article belongs to the Special Issue Machine Learning in Image Analysis and Pattern Recognition)
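
The abstract above says that the scores of the two streams are combined into final scores but does not state how. One common choice is a weighted average of per-class softmax probabilities (late fusion), sketched below. The function names and the default weights are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw class scores to probabilities (shifted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(spatial_scores, temporal_scores,
                     w_spatial=1.0, w_temporal=1.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    The default weights are illustrative assumptions, not published values.
    """
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    weight_sum = w_spatial + w_temporal
    return [(w_spatial * a + w_temporal * b) / weight_sum
            for a, b in zip(p_spatial, p_temporal)]


def predict(fused):
    """Index of the class with the highest fused probability."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical raw scores for three action classes from each stream.
fused = fuse_two_streams([3.0, 0.0, 0.0], [0.0, 0.0, 1.0])
print(fused, predict(fused))
```

Giving the temporal stream a larger weight is a frequent heuristic in two-stream work, but the right balance depends on the dataset and should be tuned on held-out data.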

Data Descriptor
Comparison of 3D Point Clouds Obtained by Terrestrial Laser Scanning and Personal Laser Scanning on Forest Inventory Sample Plots
Data 2020, 5(4), 103; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040103 - 31 Oct 2020
Cited by 1 | Viewed by 977
Abstract
In forest inventory, trees are usually measured using handheld instruments; among the most relevant are calipers, inclinometers, ultrasonic devices, and laser range finders. Traditional forest inventory has been redesigned since modern laser scanner technology became available. Laser scanners generate massive data in the form of 3D point clouds. We have developed a novel methodology to provide estimates of the tree positions, stem diameters, and tree heights from these 3D point clouds. This dataset was made publicly accessible to test new software routines for the automatic measurement of forest trees using laser scanner data. Benchmark studies with performance tests of different algorithms are welcome. The dataset contains co-registered raw 3D point-cloud data collected on 20 forest inventory sample plots in Austria. The data were collected by two different laser scanning systems: (1) a mobile personal laser scanner (PLS) (ZEB Horizon, GeoSLAM Ltd., Nottingham, UK) and (2) a static terrestrial laser scanner (TLS) (Focus3D X330, Faro Technologies Inc., Lake Mary, FL, USA). The data also contain digital terrain models (DTMs), field measurements as reference data (ground-truth), and the output of recent software routines for the automatic tree detection and the automatic stem diameter measurement.
(This article belongs to the Section Spatial Data Science and Digital Earth)

Data Descriptor
Data for Heuristic Optimization of Electric Vehicles’ Charging Configuration Based on Loading Parameters
Data 2020, 5(4), 102; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040102 - 29 Oct 2020
Cited by 3 | Viewed by 893
Abstract
This dataset includes multiple files related to the optimization of electric vehicle (EV) charging to minimize overloading in low voltage grids by varying the locations available for charging. The data include lognormally sampled, hourly sorted scenarios across 11 charging locations for a stochastics-based Monte Carlo simulation. This simulation runs through 2 million scenarios based on actual probabilities to incorporate most possible situations. It also includes samples from normally distributed household electricity use scenarios based on agent-based modeling. The article includes the test grid parameters for simulation, which were used to create a benchmark grid in DIgSILENT PowerFactory software, as well as intermediate outputs defining worst case scenarios when electric vehicles were charged and results from three different optimization approaches involving a reduction in voltage drops, cable overloading and total line losses. The outputs from the benchmark grid were used to train a machine learning algorithm, the weights and codes for which are also attached. This trained network acted as the grid for subsequent iterative optimization procedures. Outputs are presented as a comparison between pre-optimization and post-optimization scenarios. The above dataset and procedure were repeated while varying the number of EVs between 0 and 100 in increments of 20, data for which are also attached. The data article supports a related submission titled “Minimization of Overloading Caused by Electric Vehicle (EV) Charging in Low Voltage Networks”.

Data Descriptor
Dataset for Assessing the Economic Performance of a Residential PV Plant: The Analysis of a New Policy Proposal
Data 2020, 5(4), 101; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040101 - 28 Oct 2020
Cited by 1 | Viewed by 791
Abstract
This data article describes the data underlying the manuscript entitled “The post COVID-19 green recovery in practice: assessing the profitability of a policy proposal on residential photovoltaic plants”. The definition of a business plan is a complex decision because the choice of the input data significantly influences the economic assessment of a project. An Excel file is used to construct an economic model based on the Discounted Cash Flow (DCF) methodology using Net Present Value (NPV) as an indicator. The input data are chosen on the basis of a literature analysis, and the policy proposals are taken from the Revival Decree adopted by the Italian Government to counter the human and economic shock caused by COVID-19. The aggregation of these data enabled us to obtain both baseline and alternative scenarios to determine whether the realization of a residential photovoltaic (PV) plant is economically feasible. Similar data can be obtained for other countries according to the policy actions adopted, and this work can be easily replicated in different geographical contexts and for varying categories of stakeholders (e.g., consumers, who are called upon to implement a green transition). Full article
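As a rough illustration of the NPV indicator named in the abstract above, the following minimal Python sketch discounts a series of yearly cash flows. It is not the authors' Excel model: the function name `net_present_value`, the 5% discount rate, and all cash-flow figures are hypothetical values chosen only for the example.

```python
from typing import Sequence


def net_present_value(rate: float, cash_flows: Sequence[float]) -> float:
    """Discount cash_flows[t] (t = 0, 1, 2, ...) at `rate` and sum them."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical figures, not taken from the dataset:
# an upfront cost of 6000 followed by 20 equal yearly net inflows of 500.
flows = [-6000.0] + [500.0] * 20
npv = net_present_value(0.05, flows)

# Under the DCF approach, a positive NPV indicates a profitable project.
print(f"NPV = {npv:.2f}", "(profitable)" if npv > 0 else "(not profitable)")
```

Changing the discount rate or the inflows in `flows` shows how strongly the result depends on the input data, which is the sensitivity the abstract points to.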

Article
Essential Variables for Environmental Monitoring: What Are the Possible Contributions of Earth Observation Data Cubes?
Data 2020, 5(4), 100; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040100 - 21 Oct 2020
Cited by 3 | Viewed by 1278
Abstract
Environmental sustainability is now a major global issue that requires efficient and effective responses from governments. Essential variables (EVs) have emerged in different scientific communities as a means to characterize and follow environmental changes through a set of measurements required to support policy evidence. To help track these changes, our planet has been under continuous observation from satellites since 1972. Currently, petabytes of satellite Earth observation (EO) data are freely available. However, the full information potential of EO data has not yet been realized because many big data challenges and complexity barriers hinder their effective use. Consequently, facilitating the production of EVs using the wealth of satellite EO data can be beneficial for environmental monitoring systems. In response to this issue, a comprehensive list of EVs that can take advantage of consistent time-series satellite data has been derived. In addition, a set of use cases that use an Earth Observation Data Cube (EODC) to process large volumes of satellite data has been implemented to demonstrate the practical applicability of EODCs for producing EVs. The proposed approach has been successfully tested, showing that EODCs can facilitate the production of EVs at different scales and can benefit from the spatial and temporal dimensions of satellite EO data for enhanced environmental monitoring. Full article
(This article belongs to the Section Featured Reviews of Data Science Research)

Review
Specialization of Business Process Model and Notation Applications in Medicine—A Review
Data 2020, 5(4), 99; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040099 - 19 Oct 2020
Viewed by 648
Abstract
Process analysis and process modeling are current topics that extend to many areas. This trend of using optimization and modeling techniques in various specific areas has led to the question of how widespread these approaches are overall in medical specializations. We compiled a list of 272 medical disciplines that we used as a search string with the Business Process Model and Notation (BPMN) for a Web of Science database search. This search returned a total of 485 documents, to which we applied the exclusion criteria. We analyzed the remaining 108 articles using bibliometric and content analyses to find answers to three research questions. This systematic review was carried out using the procedure proposed by Kitchenham and following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Due to the broad scope of the medical field, it was no surprise that for almost 85% of the sought-after medical specializations, we could not identify any publications in the given database when applying the BPMN. We analyzed the impact of upgrades to the BPMN on publishing. The keyword analysis showed a diametrical difference between the authors’ keywords and the so-called “Keywords Plus”, and we categorized the publications according to the purpose of applying the BPMN. However, the growing interest in combining BPMN with other approaches brings new challenges in practice. Full article
(This article belongs to the Section Featured Reviews of Data Science Research)

Article
The Role of Administrative and Secondary Data in Estimating the Costs and Effects of School and Workplace Closures due to the COVID-19 Pandemic
Data 2020, 5(4), 98; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040098 - 18 Oct 2020
Viewed by 925
Abstract
As a part of mitigation strategies during the COVID-19 pandemic, the WHO currently recommends social distancing measures through school closures (SC) and work closures (WC) to control the infection spread and reduce the illness attack rate. Focusing on the use of administrative and secondary data, this study aimed to estimate the costs and effects of alternative strategies for mitigating the COVID-19 pandemic in Jakarta, Indonesia, by comparing the baseline (no intervention) with SC + WC for 2, 4, and 8 weeks as respective scenarios. A modified Susceptible-Exposed-Infected-Recovered (SEIR) compartmental model accounting for the spread of infection during the latent period was applied over a 1-year time horizon. To estimate the total pandemic cost of all scenarios, we took into account the cost of healthcare, SC, and productivity loss due to WC and illness. In addition to costs, averted deaths were considered as the effect measure. In comparison with the baseline, the results showed that total savings in scenarios of SC + WC for 2, 4, and 8 weeks would be approximately $24 billion, $25 billion, and $34 billion, respectively. In addition, increasing the duration of SC and WC would increase the number of averted deaths. Scenarios of SC + WC for 2, 4, and 8 weeks would result in approximately 159,075, 173,963, and 250,842 averted deaths, respectively. A sensitivity analysis showed that the wage per day, infectious period, basic reproduction number, incubation period, and case fatality rate were the most influential parameters affecting the savings and number of averted deaths. It can be concluded that all the mitigation scenarios were cost-saving, and increasing the duration of SC and WC would increase both the savings and the number of averted deaths. Full article
(This article belongs to the Special Issue Challenges in Business Intelligence)
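The kind of compartmental model mentioned in the abstract above can be sketched as follows. This is only an illustrative SEIR variant, not the authors' model: it assumes that "spread of infection during the latent period" means exposed individuals transmit at a reduced rate, and the function name `simulate_seir`, the parameter `exposed_infectivity`, the forward-Euler time stepping, and every numeric value are hypothetical.

```python
def simulate_seir(population, beta, sigma, gamma, exposed_infectivity,
                  days, dt=0.1, initial_infected=1.0):
    """Forward-Euler integration of an SEIR model in which exposed
    individuals also transmit, scaled by `exposed_infectivity` (assumption).

    beta: transmission rate, sigma: rate of leaving the latent period,
    gamma: recovery rate (all per day). Returns final (S, E, I, R).
    """
    s = population - initial_infected
    e = 0.0
    i = initial_infected
    r = 0.0
    for _ in range(int(round(days / dt))):
        # Force of infection from infectious and (partly) exposed people.
        force = beta * (i + exposed_infectivity * e) / population
        new_exposed = force * s * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return s, e, i, r


# Hypothetical comparison over a 1-year horizon: a baseline transmission
# rate versus a halved rate standing in for reduced contacts.
baseline = simulate_seir(1_000_000, 0.5, 0.2, 0.1, 0.5, days=365)
reduced = simulate_seir(1_000_000, 0.25, 0.2, 0.1, 0.5, days=365)
print("ever infected, baseline:", round(baseline[3]))
print("ever infected, reduced contacts:", round(reduced[3]))
```

A cost analysis like the one described would then attach healthcare, closure, and productivity costs to the simulated trajectories; that step is omitted here because it depends on data not given in the abstract.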

Data Descriptor
An Eddy Covariance Mesonet For Measuring Greenhouse Gas Fluxes in Coastal South Carolina
Data 2020, 5(4), 97; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040097 - 15 Oct 2020
Cited by 2 | Viewed by 888
Abstract
Coastal ecosystems are vulnerable to climate change and have been identified as sources of uncertainty in the global carbon budget. Here we introduce a recently established mesonet of eddy covariance towers in South Carolina and describe the sensor arrays and data workflow used to produce three site-years of flux observations in coastal ecosystems. The tower sites represent tidal salt marsh (US-HB1), mature longleaf pine forest (US-HB2), and longleaf pine restoration (replanted clearcut; US-HB3). Coastal ecosystems remain less represented in climate studies despite their potential to sequester large amounts of carbon. Our goal in publishing this open access dataset is to contribute observations in understudied coastal ecosystems to facilitate synthesis and modeling analyses that advance carbon cycle science. Full article

Technical Note
ASDToolkit: A Novel MATLAB Processing Toolbox for ASD Field Spectroscopy Data
Data 2020, 5(4), 96; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040096 - 08 Oct 2020
Cited by 3 | Viewed by 1084
Abstract
Over the past 30 years, the use of field spectroscopy has risen in importance in remote sensing studies for the characterization of the surface reflectance of materials in situ within a broad range of applications. Potential uses range from measurements of individual targets of interest (e.g., vegetation, soils, validation targets) to characterizing the contributions of different materials within larger spatially mixed areas as would be representative of the spatial resolution captured by a sensor pixel (UAV to satellite scale). As such, it is essential that a complete and rigorous assessment of both the data acquisition procedures and the suitability of the derived data product be carried out. The measured energy from solar-reflective range spectroradiometers is influenced by the viewing and illumination geometries and the illumination conditions, which vary due to changes in solar position and atmospheric conditions. By applying corrections, the estimated absolute reflectance (Rabs) of targets can be calculated. This property is independent of illumination intensity or conditions, and is the metric commonly suggested for comparing spectra even when data are collected by different sensors or acquired under different conditions. By standardizing the process of estimating Rabs, as is provided in the described toolkit, consistency and repeatability in processing are ensured and the otherwise labor-intensive and error-prone processing steps are streamlined. The resultant end data product (Rabs) represents our current best effort to generate consistent and comparable ground spectra that have been corrected for viewing and illumination geometries as well as other factors such as the individual characteristics of the reference panel used during acquisition. Full article
(This article belongs to the Section Spatial Data Science and Digital Earth)

Data Descriptor
Digital Psychological Platform for Mass Web-Surveys
Data 2020, 5(4), 95; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040095 - 05 Oct 2020
Cited by 5 | Viewed by 937
Abstract
Web-surveys are one of the most popular forms of primary data collection in research. However, mass surveys involve some challenges. They must account for different platforms and browsers, as well as different data transfer rates over connections in different regions of the country. The need to guarantee data delivery under these conditions should determine the choice of technologies for implementing web-surveys. The paper describes a solution that transfers a questionnaire to the client side in the form of an archive. This technological solution ensures independence from the data transfer rate and from the stability of the connection when the survey takes a significant time to fill in. The conducted survey benefited the service of education psychologists under the federal Ministry of Education. School psychologists consciously took part in the survey, realizing the importance of their opinion for organizing and improving their professional activities. Their willingness to answer open-ended questions in detail produced a portion of the dataset in which answers run to several sentences about different aspects of professional activity. An important challenge is the Russian language, for which there are fewer tools than for languages more widespread in the world. The survey involved 20,443 school psychologists from all regions of the Russian Federation, from both urban and rural areas. The answers did not contain spam, evasive answers, and so on, as evidenced by the average response time. For the surveys, the authoring development tool DigitalPsyTools.ru was used. Full article

Article
Visual Analytics Approach to Comprehensive Meteorological Time-Series Analysis
Data 2020, 5(4), 94; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040094 - 30 Sep 2020
Cited by 1 | Viewed by 1002
Abstract
In some domain-specific sectors, such as the climate domain, the provision of publicly available present-day high-resolution meteorological time series is often quite limited or completely lacking. This repeatedly leads to excessive deployment of synthetically generated (historical) meteorological time series (TMY) to support thermal performance assessments at both building and urban scales. These datasets generally misrepresent current weather variability, which may lead to erroneous inferences drawn from modelling results. In this regard, we outline the application potential of a visual analytics approach in the context of data quality assessment and validation of TMYs. For this purpose, we deployed the standalone visual analytics tool Visplore, enriched with interlinked dashboards, customizable visualizations, and intuitive workflows, to support continuous interaction and early visual feedback. Driven by such integrated visual representations and visual interactions to enhance the analytical reasoning process, we were able to detect critical multifaceted discrepancies, on different levels of granularity, between TMY and present-day meteorological time series and to synthesize them into cohesive patterns and insights. These mainly entailed diverging temporal trends and event time lags, under- and overestimation of warming and cooling regimes, respectively, and seasonal discrepancies in particular meteorological parameters, to name a few. Full article
(This article belongs to the Section Information Systems and Data Management)

Data Descriptor
Classification of Actual Sensor Network Deployments in Research Studies from 2013 to 2017
Data 2020, 5(4), 93; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040093 - 30 Sep 2020
Viewed by 754
Abstract
Technologies such as Wireless Sensor Networks (WSN) and the Internet of Things (IoT) have captured the imagination of researchers, businesses, and the general public, due to breakthroughs in embedded system development, sensing technologies, and ubiquitous connectivity in recent years. This has resulted in the emergence of an enormous, difficult-to-navigate body of work related to WSN and IoT. In an ongoing research effort to highlight trends and developments in these technologies and to see whether they are actually deployed rather than being subjects of theoretical research with presumed potential use cases, we gathered and codified a dataset of scientific publications from the five-year period from 2013 to 2017 involving actual sensor network deployments, which will serve as a basis for future in-depth analysis of the field. In the first iteration, 15,010 potentially relevant articles were identified in the SCOPUS and Web of Science databases; after two iterations, 3059 actual sensor network deployments were extracted from those articles and classified in a consistent way according to different categories, such as type of nodes, field of application, communication types, etc. We publish the resulting dataset with the intent that its further analysis may identify prospective research fields and future trends in WSN and IoT. Full article

Data Descriptor
Dataset of Search Results Organized as Learning Paths Recommended by Experts to Support Search as Learning
Data 2020, 5(4), 92; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040092 - 27 Sep 2020
Viewed by 971
Abstract
In this article, we introduce a dataset of curated learning paths (LPs) to support search as learning. LPs were obtained through an online survey delivered to experts in different domains. Data were then analyzed and described in terms of a set of variables. The resulting dataset comprises 83 LPs, each containing three web pages, for an overall collection of 249 documents. The dataset is intended to provide information scientists, education researchers, and industry professionals who provide information services in educational contexts with a valuable resource to (i) investigate patterns in the order of LPs, (ii) improve ranking models and/or re-ranking methods, (iii) explain the structure of the recommended LPs, and (iv) investigate alternative approaches to display search results based on the features of LPs. Full article
(This article belongs to the Special Issue Big Data and E-learning)

Data Descriptor
A Public Dataset of 24-h Multi-Levels Psycho-Physiological Responses in Young Healthy Adults
Data 2020, 5(4), 91; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040091 - 25 Sep 2020
Cited by 4 | Viewed by 1761
Abstract
Wearable devices now make it possible to record large quantities of physiological data, which can be used to obtain a clearer view of a person’s health status and behavior. However, to the best of our knowledge, there are no open datasets in the literature that provide psycho-physiological data. The Multilevel Monitoring of Activity and Sleep in Healthy people (MMASH) dataset presented in this paper provides 24 h of continuous psycho-physiological data, that is, inter-beat intervals data, heart rate data, wrist accelerometry data, sleep quality index, physical activity (i.e., number of steps per second), psychological characteristics (e.g., anxiety status, stressful events, and emotion declaration), and sleep hormone levels for 22 participants. The MMASH dataset will enable the investigation of possible relationships between the physical and psychological characteristics of people in daily life. Data were validated through different analyses that showed their compatibility with the literature. Full article
(This article belongs to the Special Issue Data from Smartphones and Wearables)

Essay
Towards a Contextual Approach to Data Quality
Data 2020, 5(4), 90; https://0-doi-org.brum.beds.ac.uk/10.3390/data5040090 - 25 Sep 2020
Cited by 2 | Viewed by 910
Abstract
In this commentary, I propose a framework for thinking about data quality in the context of scientific research. I start by analyzing conceptualizations of quality as a property of information, evidence and data and reviewing research in the philosophy of information, the philosophy of science and the philosophy of biomedicine. I identify a push for purpose dependency as one of the main results of this review. On this basis, I present a contextual approach to data quality in scientific research, whereby the quality of a dataset is dependent on the context of use of the dataset as much as the dataset itself. I exemplify the approach by discussing current critiques and debates of scientific quality, thus showcasing how data quality can be approached contextually. Full article
(This article belongs to the Special Issue Data Quality and Data Access for Research)