Data | June 2022 - Browse Articles

12 pages, 11062 KiB

Open Access | Data Descriptor

Dataset for Detecting the Electrical Behavior of Photovoltaic Panels from RGB Images

by Juan-Pablo Villegas-Ceballos, Mateo Rico-Garcia and Carlos Andres Ramos-Paja

Data 2022, 7(6), 82; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060082 - 17 Jun 2022

Viewed by 3505

The dynamic reconfiguration and maximum power point tracking in large-scale photovoltaic (PV) systems require a large number of voltage and current sensors. In particular, the reconfiguration process requires a pair of voltage/current sensors for each panel, which increases the cost and size of the installation and reduces its reliability. A suitable way to reduce the number of sensors is to adopt image-based solutions to estimate the electrical characteristics of the PV panels, but the lack of reliable data with a large diversity of irradiance and shading conditions is a major problem in this area. Therefore, this paper presents a dataset correlating RGB images and electrical data of PV panels under different irradiance and shading conditions; moreover, the dataset also provides complementary weather data and additional image characteristics to support the training of estimation models. In particular, the dataset was designed to support the design of image-based estimators of electrical data, which could be used to replace large arrays of sensors. The dataset was captured over 70 days distributed between 2020 and 2021, generating 5211 images and records. The paper also describes the measurement platform used to collect the data, which will help to replicate the experiments in different geographical locations. Full article

15 pages, 2196 KiB

Open Access | Data Descriptor

Indoor Temperature and Relative Humidity Dataset of Controlled and Uncontrolled Environments

by Juan Botero-Valencia, Luis Castano-Londono and David Marquez-Viloria

Data 2022, 7(6), 81; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060081 - 16 Jun 2022

Cited by 2 | Viewed by 3190

Abstract

The large volume of data generated with the increasing development of Internet of Things applications has encouraged the development of a large number of works related to data management, wireless communication technologies, the deployment of sensor networks with limited resources, and energy consumption. Different types of new or well-known algorithms have been used for the processing and analysis of data acquired through sensor networks, with algorithms for compression, filtering, calibration, analysis, or variables being common. In some cases, databases available on the network, public government databases, data generated from sensor networks deployed by the authors themselves, or values generated by simulation are used. When the work is more concerned with the algorithm than with the characteristics of the sensor network, these data source options may have some limitations, such as the availability of databases, the time required for data acquisition, the need to deploy a real sensor network, and the reliability or characteristics of the acquired data. The dataset in this article contains 4,164,267 values of timestamp, indoor temperature, and relative humidity acquired in October and November 2019 with twelve Xiaomi Mijia temperature and humidity sensors at the laboratory of Control Systems and Robotics and the De La Salle Museum of Natural Sciences, both of the Instituto Tecnológico Metropolitano, Medellín, Colombia. The devices were calibrated in a Metrology Laboratory accredited by the National Accreditation Body of Colombia (Organismo Nacional de Acreditación de Colombia, ONAC). The dataset is available in the Mendeley Data repository. Full article

15 pages, 2676 KiB

Open Access | Article

Multi-Resolution Discrete Cosine Transform Fusion Technique Face Recognition Model

by Bader M. AlFawwaz, Atallah AL-Shatnawi, Faisal Al-Saqqar and Mohammad Nusir

Data 2022, 7(6), 80; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060080 - 15 Jun 2022

Cited by 1 | Viewed by 1781

Abstract

This work presents a Fusion Feature-Level Face Recognition Model (FFLFRM) based on a Multi-Resolution Discrete Cosine Transform (MDCT) fusion technique, comprising face detection, feature extraction, feature fusion, and face classification. It detects core facial characteristics as well as local and global features using Local Binary Pattern (LBP) and Principal Component Analysis (PCA) extraction. The MDCT fusion technique was applied, followed by Artificial Neural Network (ANN) classification. Model testing used 10,000 faces derived from the Olivetti Research Laboratory (ORL) library. Model performance was evaluated in comparison with three state-of-the-art models based on Frequency Partition (FP), Laplacian Pyramid (LP) and Covariance Intersection (CI) fusion techniques, in terms of image features (low-resolution issues and occlusion) and facial characteristics (pose, and expression per se and in relation to illumination). The MDCT-based model yielded promising recognition results, with a 97.70% accuracy demonstrating effectiveness and robustness against these challenges. Furthermore, this work showed that the MDCT method used by the proposed FFLFRM is simpler, faster, and more accurate than the Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT) and Discrete Wavelet Transform (DWT), and that it is an effective method for real-life facial recognition applications. Full article

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

14 pages, 13286 KiB

Open Access | Editor’s Choice | Data Descriptor

UNIPD-BPE: Synchronized RGB-D and Inertial Data for Multimodal Body Pose Estimation and Tracking

by Mattia Guidolin, Emanuele Menegatti and Monica Reggiani

Data 2022, 7(6), 79; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060079 - 09 Jun 2022

Cited by 4 | Viewed by 2733

Abstract

The ability to estimate human motion without requiring any external on-body sensor or marker is of paramount importance in a variety of fields, including human–robot interaction, Industry 4.0, surveillance, and telerehabilitation. The recent development of portable, low-cost RGB-D cameras pushed forward the accuracy of markerless motion capture systems. However, despite the widespread use of such sensors, a dataset including complex scenes with multiple interacting people, recorded with a calibrated network of RGB-D cameras and an external system for assessing the pose estimation accuracy, is still missing. This paper presents the University of Padova Body Pose Estimation dataset (UNIPD-BPE), an extensive dataset for multi-sensor body pose estimation containing both single-person and multi-person sequences with up to 4 interacting people. A network of 5 Microsoft Azure Kinect RGB-D cameras is used to record synchronized high-definition RGB and depth data of the scene from multiple viewpoints, as well as to estimate the subjects’ poses using the Azure Kinect Body Tracking SDK. Simultaneously, full-body Xsens MVN Awinda inertial suits provide accurate poses and anatomical joint angles, along with raw data from the 17 IMUs required by each suit. This dataset aims to push forward the development and validation of multi-camera markerless body pose estimation and tracking algorithms, as well as multimodal approaches focused on merging visual and inertial data. Full article

(This article belongs to the Special Issue Computer Vision Datasets for Positioning, Tracking and Wayfinding)

11 pages, 3210 KiB

Open Access | Data Descriptor

Deep Learning Dataset for Estimating Burned Areas: Case Study, Indonesia

by Yudhi Prabowo, Anjar Dimara Sakti, Kuncoro Adi Pradono, Qonita Amriyah, Fadillah Halim Rasyidy, Irwan Bengkulah, Kurnia Ulfa, Danang Surya Candra, Muhammad Thufaili Imdad and Shadiq Ali

Data 2022, 7(6), 78; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060078 - 09 Jun 2022

Cited by 12 | Viewed by 3042

Abstract

Wildland fire is one of the major causes of deforestation, and it has an important impact on atmospheric emissions, notably CO₂. It occurs almost every year in Indonesia, especially during the dry season. Therefore, it is necessary to identify the burned areas from remote sensing images to establish the zoning map of areas prone to wildland fires. Many methods have been developed for mapping burned areas from low-resolution to medium-resolution satellite images. One of the popular approaches for mapping tasks is a deep learning approach using the U-Net architecture. However, it needs a large amount of representative training data to develop the model. In this paper, we present a new dataset of burned areas in Indonesia for training or evaluating the U-Net model. We delineate burned areas manually by visual interpretation of Landsat-8 satellite images. The dataset is collected from several regions in Indonesia, and it consists of 227 images with a size of 512 × 512 pixels. Each image contains one or more burn scars or only background, together with its labeled mask. The dataset can be used to train and evaluate deep learning models for image detection, segmentation, and classification tasks related to burned area mapping. Full article

(This article belongs to the Section Spatial Data Science and Digital Earth)

10 pages, 2955 KiB

Open Access | Data Descriptor

Statistical Dataset and Data Acquisition System for Monitoring the Voltage and Frequency of the Electrical Network in an Environment Based on Python and Grafana

by Javier Fernández-Morales, Juan-José González-de-la Rosa, José-María Sierra-Fernández, Manuel-Jesús Espinosa-Gavira, Olivia Florencias-Oliveros, Agustín Agüera-Pérez, José-Carlos Palomares-Salas and Paula Remigio-Carmona

Data 2022, 7(6), 77; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060077 - 06 Jun 2022

Cited by 4 | Viewed by 2408

Abstract

This article presents a unique dataset of voltage data from a public building, acquired using a hybrid measurement solution that combines Python™ for acquisition and Grafana™ for results representation. This study aims to benefit communities by demonstrating how to achieve more efficient energy management. The study outlines how to obtain a more realistic view of the quality of the supply, oriented to monitoring the state of the network; this should allow for better understanding, which should in turn enable the optimization of the operation and maintenance of power systems. Our work focused on frequency and higher order statistical estimators which, combined with exploratory data analysis techniques, improved the characterization of the shape of the stress signal. These techniques and data, together with the acquisition and monitoring system, present a unique combination of low-cost measurement solutions, which have the underlying benefit of contributing to industrial benchmarking. Our study proposes an effective and versatile system, which can perform acquisition, statistical analysis, database management and results representation in less than a second. The system offers a wide variety of graphs to present the results of the analysis, so that the user can observe them and identify, with relative ease, any anomalies in the supply that could damage the sensitive equipment of the corresponding installation. It is a system, therefore, that not only provides information about the power quality, but also significantly contributes to the safety and maintenance of the installation. This system can be practically realized, subject to the availability of internet access. Full article

14 pages, 1998 KiB

Open Access | Article

The Complete Mitochondrial Genome of a Neglected Breed, the Peruvian Creole Cattle (Bos taurus), and Its Phylogenetic Analysis

by Carlos I. Arbizu, Rubén D. Ferro-Mauricio, Julio C. Chávez-Galarza, Héctor V. Vásquez, Jorge L. Maicelo, Carlos Poemape, Jhony Gonzales, Carlos Quilcate and Flor-Anita Corredor

Data 2022, 7(6), 76; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060076 - 06 Jun 2022

Cited by 5 | Viewed by 4101

Abstract

Cattle spread throughout the American continent during the colonization years, originating creole breeds that adapted to a wide range of climate conditions. The population of creole cattle in Peru is decreasing mainly due to the introduction of more productive breeds in recent years. During the last 15 years, there has been significant progress in cattle genomics. However, little is known about the genetics of the Peruvian creole cattle (PCC) despite its importance to (i) improving productivity in the Andean region, (ii) agricultural labor, and (iii) cultural traditions. In addition, the origin and phylogenetic relationship of the PCC are still unclear. In order to promote the conservation of the PCC, we sequenced the mitochondrial genome of a creole bull, which also possessed exceptional fighting skills and was employed for agricultural tasks, from the highlands of Arequipa for the first time. The total mitochondrial genome sequence is 16,339 bp in length with the base composition of 31.43% A, 28.64% T, 26.81% C, and 13.12% G. It contains 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, and a control region. Among the 37 genes, 28 were positioned on the H-strand and 9 were positioned on the L-strand. The most frequently used codons were CUA (leucine), AUA (isoleucine), AUU (isoleucine), AUC (isoleucine), and ACA (threonine). Maximum likelihood reconstruction using complete mitochondrial genome sequences showed that the PCC is related to native African breeds. The annotated mitochondrial genome of PCC will serve as an important genetic data set for further breeding work and conservation strategies. Full article
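The figures reported in this abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers quoted above; the variable and function names are illustrative and do not come from the article.

```python
# Base composition reported in the abstract, in percent of the 16,339 bp genome.
composition = {"A": 31.43, "T": 28.64, "C": 26.81, "G": 13.12}


def gc_content(comp):
    """Return the combined G + C share (in percent) of a base-composition mapping."""
    return comp["G"] + comp["C"]


total_percent = sum(composition.values())
gc_percent = gc_content(composition)
at_percent = composition["A"] + composition["T"]

# Gene counts reported in the abstract.
genes_by_type = {"protein_coding": 13, "rRNA": 2, "tRNA": 22}
genes_by_strand = {"H": 28, "L": 9}

print(f"Total composition: {total_percent:.2f}%")
print(f"GC content: {gc_percent:.2f}%  AT content: {at_percent:.2f}%")
print(f"Genes by type: {sum(genes_by_type.values())}, by strand: {sum(genes_by_strand.values())}")
```

Both gene tallies give the 37 genes stated in the abstract, and the four base percentages sum to 100%.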

(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)

12 pages, 2347 KiB

Open Access | Data Descriptor

EndoNuke: Nuclei Detection Dataset for Estrogen and Progesterone Stained IHC Endometrium Scans

by Anton Naumov, Egor Ushakov, Andrey Ivanov, Konstantin Midiber, Tatyana Khovanskaya, Alexandra Konyukova, Polina Vishnyakova, Sergei Nora, Liudmila Mikhaleva, Timur Fatkhudinov and Evgeny Karpulevich

Data 2022, 7(6), 75; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060075 - 01 Jun 2022

Cited by 3 | Viewed by 2582

Abstract

We present EndoNuke, an open dataset consisting of tiles from endometrium immunohistochemistry slides with the nuclei annotated as keypoints. Several experts with various experience have annotated the dataset. Apart from gathering the data and creating the annotation, we have performed an agreement study and analyzed the distribution of nuclei staining intensity. Full article

(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)

18 pages, 4351 KiB

Open Access | Article

Handling Dataset with Geophysical and Geological Variables on the Bolivian Andes by the GMT Scripts

by Polina Lemenkova

Data 2022, 7(6), 74; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060074 - 01 Jun 2022

Cited by 12 | Viewed by 3855

Abstract

In this paper, an integrated mapping of georeferenced data is presented using QGIS and the GMT scripting tool set. The study area encompasses the Bolivian Andes, South America, notable for complex geophysical and geological parameters and high seismicity. Data integration was performed for a detailed analysis of the geophysical and geological setting. The data included raster and vector datasets obtained from open sources: the IRIS seismic data (2015 to 2021), geophysical data from satellite-derived gravity grids based on CryoSat, topographic GEBCO data, geoid undulation data from EGM-2008, and georeferenced geological vector data from the USGS. The data processing techniques included quantitative and qualitative evaluation of the seismicity and geophysical setting in Bolivia. The result includes a series of thematic maps of the Bolivian Andes. Based on the data analysis, the western region was identified as the most seismically endangered area in Bolivia, with a high risk of earthquake hazards in Cordillera Occidental, followed by Altiplano and Cordillera Real. The earthquake magnitude here ranges from 1.8 to 7.6. The data analysis shows a tight correlation between the gravity, geophysics, and topography in the Bolivian Andes. The cartographic scripts used for processing data in GMT are openly available in the author’s public GitHub repository via the provided link. Scripted cartographic techniques for geophysical and topographic data processing, combined with GIS spatial evaluation of the geological data, supported automated mapping, which is applicable to risk assessment and geological hazard mapping of the Bolivian Andes, South America. Full article

(This article belongs to the Special Issue 2nd Edition of Data in Astrophysics & Geophysics: Research and Applications)

13 pages, 3197 KiB

Open Access | Data Descriptor

Quality Control Impacts on Total Precipitation Gauge Records for Montane Valley and Ridge Sites in SW Alberta, Canada

by Celeste Barnes and Chris Hopkinson

Data 2022, 7(6), 73; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060073 - 30 May 2022

Viewed by 1978

Abstract

This paper presents adjustment routines for Geonor totalizing precipitation gauge data collected from the headwaters of the Oldman River, within the southwestern Alberta Canadian Rockies. The gauges are situated at mountain valley and alpine ridge locations with varying degrees of canopy cover. These data are prone to sensor noise and environment-induced measurement errors requiring an ordered set of quality control (QC) corrections using nearby weather station data. Sensor noise at valley sites with single-vibrating wire gauges accounted for the removal of 5% to 8% (49–76 mm) of annual precipitation. This was compensated for by an increase of 6% to 8% (50–76 mm) from under-catch. A three-wire ridge gauge did not experience significant sensor noise; however, the under-catch of snow resulted in 42% to 52% (784–1342 mm) increased precipitation. When all QC corrections were applied, the annual cumulative precipitation at the ridge demonstrated increases of 39% to 49% (731–1269 mm), while the valley gauge adjustments were −4% to 1% (−39 mm to 13 mm). Public sector totalizing precipitation gauge records often undergo minimal QC. Care must be exercised to check the corrections applied to such records when used to estimate watershed water balance or precipitation orographic enhancement. Systematic errors at open high-elevation sites may exceed nearby valley or forest sites. Full article

(This article belongs to the Special Issue 2nd Edition of Data in Astrophysics & Geophysics: Research and Applications)

13 pages, 1238 KiB

Open Access | Data Descriptor

Longitudinal RNA Sequencing of Skin and DRG Neurons in Mice with Paclitaxel-Induced Peripheral Neuropathy

by Anthony M. Cirrincione, Cassandra A. Reimonn, Benjamin J. Harrison and Sandra Rieger

Data 2022, 7(6), 72; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060072 - 30 May 2022

Cited by 1 | Viewed by 2186

Abstract

Paclitaxel-induced peripheral neuropathy is a condition of nerve degeneration induced by chemotherapy, which afflicts up to 70% of treated patients. Therapeutic interventions are unavailable due to an incomplete understanding of the underlying mechanisms. We previously discovered that major physiological changes in the skin underlie paclitaxel-induced peripheral neuropathy in zebrafish and rodents. The precise molecular mechanisms are only incompletely understood. For instance, paclitaxel induces the upregulation of MMP-13, which, when inhibited, prevents axon degeneration. To better understand other gene regulatory changes induced by paclitaxel, we induced peripheral neuropathy in mice following intraperitoneal injection either with vehicle or paclitaxel every other day four times total. Skin and dorsal root ganglion neurons were collected based on distinct behavioural responses categorised as “pain onset” (d4), “maximal pain” (d7), “beginning of pain resolution” (d11), and “recovery phase” (d23) for comparative longitudinal RNA sequencing. The generated datasets validate previous discoveries and reveal additional gene expression changes that warrant further validation with the goal to aid in the development of drugs that prevent or reverse paclitaxel-induced peripheral neuropathy. Full article

(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)

15 pages, 4497 KiB

Open Access | Data Descriptor

A Socioeconomic Dataset of the Risk Associated with the 1% and 0.2% Return Period Stillwater Flood Elevation under Sea-Level Rise for the Northern Gulf of Mexico

by Diana Carolina Del Angel, David Yoskowitz, Matthew Vernon Bilskie and Scott C. Hagen

Data 2022, 7(6), 71; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060071 - 26 May 2022

Cited by 2 | Viewed by 2199

Abstract

Storm surge flooding can cause significant damage to coastal communities. In addition, coastal communities face an increased risk of coastal hazards due to sea-level rise (SLR). This research developed a dataset to communicate the socioeconomic consequences of flooding within the 1% and 0.2% Annual Exceedance Probability Floodplain (AEP) under four SLR scenarios for the Northern Gulf of Mexico region. Assessment methods primarily used HAZUS-MH software, a GIS-based modeling tool developed by the Federal Emergency Management Agency in the United States, to estimate natural disasters’ physical, economic, and social impacts. This dataset consists of 29 shapefiles containing seven different measures of storm surge inundation impacts under SLR (including building damage, displaced people and shelter needs, road exposure, essential facilities, wastewater treatment plants, bridges, and vehicle damage). The data is publicly available under the Gulf of Mexico Research Initiative Information and Data Cooperative (GRIIDC). Full article

(This article belongs to the Section Spatial Data Science and Digital Earth)

21 pages, 25944 KiB

Open Access | Article

iKeyCriteria: A Qualitative and Quantitative Analysis Method to Infer Key Criteria since a Systematic Literature Review for the Computing Domain

by Mayra Carrión-Toro, Jose Aguilar, Marco Santórum, María Pérez, Boris Astudillo, Cindy-Pamela Lopez, Marcelo Nieto and Patricia Acosta-Vargas

Data 2022, 7(6), 70; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060070 - 26 May 2022

Viewed by 2231

Abstract

A systematic literature review is a synthesis of the available evidence, in which quantitative and qualitative aspects of primary studies are reviewed to summarize the existing information regarding a particular topic. The researchers extract key criteria from papers collected about their study area, answering research questions and conducting document analysis. Nonetheless, in some cases, these criteria are improperly justified, without knowing their true level of importance in the study subject. Hence, an additional study is necessary to explain the relevance of the criteria in the papers studied using qualitative and quantitative premises. The correct identification of these key criteria is a critical factor in prioritizing and achieving appropriate results in any scientific research work. In our paper, a new method to determine key criteria from a literature review is proposed, composed of three components: input, process, and output. First, the inputs are a set of criteria to evaluate and a set of documents to analyze. Next, the process component examines the document set to indicate whether the criteria to be analyzed are found. The process component produces a Boolean matrix, which is the input of the mathematical logic process that yields, as the output component, the key criteria considered necessary and sufficient. The iKeyCriteria method has been applied in different computing domains, particularly for serious games design and virtual organizations, giving positive results in each context. Finally, we developed an online tool that provides global support to the execution of our method. Full article
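The Boolean matrix mentioned in this abstract can be pictured with a small sketch. The code below only illustrates the general idea of a documents-by-criteria matrix; it assumes simple substring matching, and the sample documents, criteria, and function names are hypothetical. It does not reproduce the logic the authors use to decide which criteria are necessary and sufficient.

```python
from typing import Dict, List


def boolean_matrix(documents: Dict[str, str], criteria: List[str]) -> Dict[str, List[bool]]:
    """Map each document id to a row of flags: is each criterion mentioned in its text?

    Assumption: a criterion counts as "found" if it occurs as a case-insensitive substring.
    """
    return {
        doc_id: [criterion.lower() in text.lower() for criterion in criteria]
        for doc_id, text in documents.items()
    }


def coverage(matrix: Dict[str, List[bool]], criteria: List[str]) -> Dict[str, float]:
    """Return, for each criterion, the fraction of documents in which it was found."""
    n_docs = len(matrix)
    return {
        criterion: sum(row[j] for row in matrix.values()) / n_docs
        for j, criterion in enumerate(criteria)
    }


# Hypothetical toy inputs, not taken from the article.
criteria = ["usability", "feedback", "scoring"]
documents = {
    "paper_a": "Usability and feedback drive the design.",
    "paper_b": "We evaluate usability with a scoring rubric.",
    "paper_c": "Feedback loops and usability testing.",
}

matrix = boolean_matrix(documents, criteria)
shares = coverage(matrix, criteria)

for doc_id, row in matrix.items():
    print(doc_id, row)
print(shares)
```

A per-criterion coverage share is just one simple summary of such a matrix; the published method applies its own logic process to the matrix instead.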

(This article belongs to the Section Information Systems and Data Management)

10 pages, 821 KiB

Open Access | Article

Using Twitter to Detect Hate Crimes and Their Motivations: The HateMotiv Corpus

by Noha Alnazzawi

Data 2022, 7(6), 69; https://0-doi-org.brum.beds.ac.uk/10.3390/data7060069 - 24 May 2022

Cited by 5 | Viewed by 3748

Abstract

With the rapidly increasing use of social media platforms, much of our lives is spent online. Despite the great advantages of using social media, the spread of hate, cyberbullying, harassment, and trolling can unfortunately be very common online. Many extremists use social media platforms to communicate their messages of hatred and spread violence, which may result in serious psychological consequences and even contribute to real-world violence. Thus, the aim of this research was to build the HateMotiv corpus, a freely available dataset that is annotated for types of hate crimes and the motivation behind committing them. The dataset was developed using Twitter as an example of a social media platform and could provide the research community with a unique, novel, and reliable dataset. The dataset is unique as a consequence of its topic-specific nature and its detailed annotation. The corpus was annotated by two expert annotators following unified guidelines, producing a high-standard annotation with agreement F-scores of 0.66 and 0.71 for the type and motivation labels of hate crimes, respectively. Full article
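An F-score can serve as an agreement measure by treating one annotator's labels as the reference and the other's as the prediction. The sketch below shows that general calculation under the assumption of exact-match comparison of (item, label) pairs; the abstract does not state the matching criterion actually used, and the sample annotations and function name are hypothetical.

```python
def agreement_f1(annotations_a, annotations_b):
    """F1 agreement between two annotators' sets of (item_id, label) pairs.

    Annotator A is treated as the reference and annotator B as the prediction.
    With exact matching the score is symmetric in A and B.
    """
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        return 1.0  # Both annotators marked nothing: trivially in agreement.
    matches = len(a & b)
    if matches == 0:
        return 0.0
    precision = matches / len(b)
    recall = matches / len(a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations: (tweet id, hate-crime type label).
annotator_a = [(1, "racism"), (2, "religion"), (3, "gender")]
annotator_b = [(1, "racism"), (2, "religion"), (4, "gender")]

score = agreement_f1(annotator_a, annotator_b)
print(f"Agreement F1: {score:.2f}")
```

In this toy case two of the three pairs match on each side, so precision and recall are both 2/3 and the F1 agreement is 2/3.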

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

Data, Volume 7, Issue 6 (June 2022) – 14 articles
