Machine Learning Classification of Roasted Arabic Coffee: Integrating Color, Chemical Compositions, and Antioxidants

Alamri, Eman S.; Altarawneh, Ghada A.; Bayomy, Hala M.; Hassanat, Ahmad B.

doi:10.3390/su151511561


¹ Food Science and Nutrition Department, University of Tabuk, Tabuk 47512, Saudi Arabia
² Accounting Department, Mutah University, Karak 61711, Jordan
³ Food Science and Technology Department, Damanhour University, Damanhour 22516, Egypt
⁴ Computer Science Department, Mutah University, Karak 61711, Jordan
* Authors to whom correspondence should be addressed.

Sustainability 2023, 15(15), 11561; https://0-doi-org.brum.beds.ac.uk/10.3390/su151511561

Submission received: 3 June 2023 / Revised: 20 July 2023 / Accepted: 23 July 2023 / Published: 26 July 2023

(This article belongs to the Special Issue Latest Applications of Computer Vision and Machine Learning Techniques for Smart Sustainability)


Abstract:

This study investigates the classification of Arabic coffee into three major variations (light, medium, and dark) using simulated data gathered from the actual measurements of color information, antioxidant laboratory testing, and chemical composition tests. The goal is to overcome the restrictions of limited real-world data availability and the high costs involved with laboratory testing. The Monte Carlo approach is used to generate new samples for each type of Arabic coffee using the mean values and standard deviations of publicly available data. Using these simulated data, multiple machine-learning algorithms are used to classify Arabic coffee, while also investigating the importance of features in identifying the key chemical components. The findings emphasize the importance of color information in accurately recognizing Arabic coffee types. However, depending purely on antioxidant information results in poor classification accuracy due to increased data complexity and classifier variability. The chemical composition information, on the other hand, has exceptional discriminatory power, allowing faultless classification on its own. Notably, particular characteristics like crude protein and crude fiber show high relationships and play an important role in coffee type classification. Based on these findings, it is suggested that a mobile application be developed that uses image recognition to examine coffee color while also providing chemical composition information. End users, especially consumers, would be able to make informed judgments regarding their coffee preferences.

Keywords:

smart sustainability; smart food; Arabic coffee; classification; color information; antioxidant information; chemical composition; mobile application

1. Introduction

Arabic coffee is considered an important part of Arab culture and tradition, particularly in Saudi Arabia and the surrounding Arab countries, where it is said to be the most popular hot beverage [1]. Arabica coffee beans are the primary ingredient [2]. An estimated 1.4 billion cups of coffee are consumed daily around the world. The consumption of coffee has grown significantly in Saudi Arabia, where 18,000 tons of coffee are imported each year at a cost of SAR 54 million [3]. According to recent estimates, the typical Saudi adult consumes between 60 and 300 mL of Arabic coffee in one serving, amounting to 1.6 kg of coffee per person yearly [2].

Coffee drinking has certain beneficial effects on human health as a result of its biochemical features [4,5], especially with regard to non-communicable diseases [6]. There is evidence that drinking coffee can improve several functions, such as memory, mood, and cognitive performance [7,8,9].

On the other hand, coffee can raise LDL-C and the body’s total cholesterol, which can increase the risk of cardiovascular disease [10,11]. Additionally, drinking Arabic coffee is linked to a significant osteoporosis increase among Saudi females over the age of 40 [12]. Regardless of whether drinking coffee is harmful or beneficial, it is apparent that there are many people who do so worldwide, especially in Saudi Arabia when it comes to Arabic coffee.

Arabic coffee comes in several types with varying levels of roasting. The roasting temperature and roasting time affect the degree of roasting, and these two factors determine whether the coffee is a light, medium, or dark roast in terms of color [13]. The biological activity and chemical composition of the coffee can be altered significantly during the roasting process. According to Wang et al. [14], certain components, such as natural phenolic compounds, can be lost, while other elements, such as antioxidants, including Maillard reaction products, can be generated. As a result, antioxidants can be either preserved or enhanced.

Hence, it is important to be able to classify coffee into one of its three common roasting degrees (light, medium, or dark) for human consumption since each has a different effect on health. For instance, Alamri et al. [3] recommended medium Arabic coffee as having a high concentration of chemicals with antioxidants and other biologically advantageous effects.

The interesting research of Alamri et al. [3] examined the chemical composition of typically roasted Arabic coffee at three distinct roasting levels: light, medium, and dark. To analyze various facets of coffee, the researchers used a variety of methodologies. They measured the amount of caffeine using a UV-visible spectrophotometer, the amount of acrylamide using a gas chromatograph, and the free radical scavenging power using the 1,1-diphenyl-2-picrylhydrazyl (DPPH) technique. They also used gas chromatography-mass spectrometry (GC-MS) to separate and distinguish the volatile components contained in the coffee samples, and they estimated the browning index. A temperature of 180 ± 10 °C was held for around 6.0 ± 1.0 min for the light roast, 8.0 ± 1.0 min for the medium roast, and 10.0 ± 1.0 min for the dark roast. Their laboratory experiments were conducted on nine Arabic coffee samples, three for each type. According to their findings, roasting Arabic coffee alters its chemical composition, including moisture content, ether extract, crude protein, crude fiber, ash content, nitrogen-free extract (NFE), caffeine content, acrylamide levels, and DPPH activity; Table 1 and Table 2 demonstrate that each roasting degree has unique chemical component values. In addition, they provided distinctive color information for each coffee type using the L* a* b* color system and browning index, as shown in Table 3.

There are several studies in the literature designed to classify Arabic coffee and coffee, in general, using various sources of information and technologies, including coffee bean images [15,16,17,18], infrared spectroscopy [19,20,21,22,23], and electronic nose [24,25,26]. To the best of our knowledge, none of them, however, address the chemical classification of Arabic coffee.

The aim of this paper is to investigate the classification of Arabic coffee using its chemical compounds. For this purpose, we utilize the data obtained by [3] (shown in Table 1, Table 2 and Table 3) to generate new samples for each type of Arabic coffee using the Monte Carlo method, because only a single mean value and its standard deviation (mean ± SD) are available for each feature. Then, using the simulated data, we employ a variety of machine-learning approaches to classify Arabic coffee; feature importance is also explored in this study to determine which chemical components have the most influence on the classification process. It is worth noting that obtaining a sufficiently large sample of real data for each chemical test involves laboratory testing, which comes at a very high cost that we cannot afford at present.

The contributions of this study can be summarized as follows:

According to the literature, the majority of coffee sample classification methods presented are based on shape, size, color, infrared spectroscopy, and/or aroma. To the best of our knowledge, none of them address the chemical component classification of Arabic coffee samples.
Our study made some notable findings after conducting a large number of experiments:
- Color information alone was sufficient to correctly classify Arabic coffee into the three categories. The wide range of CIE color values and their high association with coffee classes both contributed to the flawless classification results.
- It was found that antioxidant information alone was insufficient for accurate coffee classification. Using only antioxidant information resulted in a considerable decrease in classification performance.
- Chemical composition data exhibited significant discrimination power in identifying Arabic coffee types. The use of chemical composition features alone resulted in flawless categorization results, emphasizing their importance and separability among coffee categories. Specific characteristics, such as crude protein and crude fiber, showed high relationships, emphasizing their importance in the classification process.
- With more complex data, as in the antioxidant information, classification performance degraded and the choice of classifier became increasingly important. The overlapping data and complex decision tree rules highlighted the challenges of dealing with complex data and underscored the need for selecting proper classifiers.

These significant findings contribute to a better knowledge of Arabic coffee classification and emphasize the importance of color information, chemical composition, and the issues connected with antioxidant data for achieving correct classification results.

2. Related Work

Several studies in the literature classify Arabic coffee, and coffee in general, using various sources, methods, and technologies, such as computer vision, infrared spectroscopy, and electronic nose.

2.1. Computer Vision

Basic computer vision systems are relatively inexpensive: they consist of a digital camera used to acquire images, a standard lighting system, and software for image processing and analysis. This affordability has allowed many computer vision systems to emerge in the last decade, not only for coffee classification but also for various food issues.

For example, de Oliveira and coworkers [15] developed a basic computer vision system that produces color measurements of 120 samples of green Arabic coffee beans and classifies them in four groups using artificial neural networks (ANNs) and the Bayes classifier. Their system yielded 100% classification accuracy, although the conclusion is not significant, owing to the small amount of data utilized.

Arboleda et al. [16] proposed a computer vision method to classify the species of coffee bean samples automatically. From a collection of 195 training images and 60 testing images, various morphological characteristics of the beans, including area, perimeter, equivalent diameter, and roundness percentage, were retrieved. The coffee beans were automatically classified using two classification algorithms: k-nearest neighbor (KNN) and ANN. ANN achieved the best results, with a classification score of 96.66%.

Arboleda [17] employed image processing techniques and data mining algorithms to classify green coffee beans from three species: Liberica, Robusta, and Excelsa. Four features were retrieved from 255 photos of coffee beans; 85 samples were taken for each species. For the classification of green coffee beans from various species, a range of data mining algorithms were used. Twenty-two classifiers from five classifier families, including decision trees, discriminant analysis, support vector machines, k-nearest neighbor, and ensemble classifiers, were used in total. Among these classifiers, the coarse tree algorithm achieved the highest classification accuracy of 94.1%. Furthermore, when compared to the other classifiers studied, the coarse tree algorithm had the shortest training time.

The morphological attributes of coffee beans, such as area, perimeter, equivalent diameter, and roundness percentage, were retrieved from images using hand-crafted features available in the MATLAB image processing toolkit in both previous papers. To preprocess the coffee image samples and extract these features, computer routine methods were devised. However, recent advances in computer vision have shown that deep learning-based features outperform hand-crafted features in a variety of tasks, including image analysis, where they have demonstrated superior capabilities in capturing complex patterns and representations in images. Deep learning algorithms have been used successfully in a variety of computer vision applications, including object detection and classification [27,28].

Subjectivity and standards problems plague the manual classification process, resulting in potential inconsistencies. To overcome these challenges, Pizzaia et al. [18] proposed a computer vision approach using a multilayer perceptron (MLP) neural network. The MLP used in the study was made up of three layers: an input layer, a hidden layer with 100 sigmoid-type neurons, and a binary output layer. The input layer of the MLP consisted of five inputs obtained from the shape, size, and color parameters of the coffee beans. The inputs included the area and roundness of each coffee bean, as well as the red, green, and blue (RGB) color channel averages. The MLP was trained using the Levenberg–Marquardt algorithm, a commonly used optimization technique for neural network training. The network was trained to classify coffee beans as “good” or “defective”, with a binary output of “1” indicating a good grain and “0” indicating a defective grain. In terms of performance, the MLP technique attained a classification accuracy of 94.10%.

Other recent studies that addressed coffee classification using computer vision approaches include but are not limited to [29,30,31,32].

2.2. Infrared Spectroscopy

Near and mid-infrared spectral studies have evolved as a valid and promising analytical method for objectively assessing coffee quality features over the last two decades [19,20,21,22,23]. These studies demonstrate that near and mid-infrared techniques have enormous potential for rapidly obtaining information about the chemical composition and related aspects of coffee. These studies provide non-destructive and efficient means of assessing coffee quality features by providing valuable information when analyzing light absorption and reflection in the near and mid-infrared regions of the electromagnetic spectrum. Thus, specific information about color, chemical composition, and other features of coffee can be obtained using these techniques, and then machine learning can be used for the classification of coffee samples [19,20,21].

In order to classify Arabica and Robusta coffee species, Calvinia et al. [20] compared two sparse classification approaches, sparse variants of principal component analysis (sPCA) with KNN, and sparse versions of partial least squares with discriminant analysis (sPLS-DA). Green coffee samples were analyzed using near-infrared hyperspectral imaging. The average spectra from each hyperspectral image were used to build training and testing sets. The reported results show that the sparse methods produced similar results to the conventional methods. Sparse techniques, on the other hand, produced more interpretable and parsimonious models. Notably, both sparse classification algorithms converged on using the same spectral regions for feature selection. This convergence shows that those locations are chemically relevant in distinguishing between Arabica and Robusta coffee species. Feature selection is one of the most common methods in data preprocessing. Obtaining the required feature or feature subsets in the literature to meet classification aims has become a critical component of the machine learning process [33].

The goal of Link et al.’s [21] study was to create a neural network using radial basis function (RBF) to classify the geographic and genotypic origin of Arabica coffee. The spectra were collected using Fourier transform infrared (FTIR) technology and subsequently processed using RBFs. The results demonstrated that the modified RBF successfully classified Arabica coffee samples. Geographically, the classification accuracy was 100%; however, in terms of genotypic classification, the classification accuracy was 94.44%.

A soft independent modeling of class analogies (SIMCA) model was built by Mutz et al. [22] employing a portable near-infrared (NIR) spectrometer and a dataset of 182 coffee samples. The goal was to distinguish between distinct coffee qualities. Specialty coffees from two species (C. arabica and C. canephora), Arabica coffees from specific geographical indication regions (GI), and commodity coffee blends were among the samples. The proposed SIMCA model had good classification accuracy for individual Arabica coffees from GI areas, ranging from 76% to 90%. This suggests that the model was able to distinguish various Arabica coffees depending on their geographical origins. Furthermore, for specialized Arabica coffees and Conilon coffees, the classification accuracies were 98% and 95%, respectively.

Okubo and Kurata [23] used classification analysis and NIR to determine the production area of green coffee beans. SIMCA was used for the classification of green coffee bean samples, which were collected from seven different places. The study achieved an overall correct classification rate of over 73% for different types of green coffee beans using NIR.

Other recent studies that addressed coffee classification using infrared spectroscopy approaches include but are not limited to [34,35,36,37].

2.3. Electronic Nose

An electronic nose is a type of electronic sensing equipment that detects scents or flavors by mimicking human senses with sensor arrays and pattern recognition systems. This technology has lately been employed for coffee classification [24,25,26].

Makimori and Bona [24] utilized an electronic nose (E-nose) outfitted with seven metal-oxide-semiconductor (MOS) sensors to examine 53 samples of six distinct commercial instant coffees produced by the same industry. For sample classification, they used chemometric methods, such as common dimension analysis (ComDim) and linear discriminant analysis (LDA). ComDim, an unsupervised multiblock analysis method, was used to minimize the dimensions of the E-nose sensor data. The first derivative of the transitory signal was used to construct each sensor’s block. In the E-nose data, four common dimensions (CDs) were identified, accounting for 99.86% of the total variation. Salience tables revealed links between sensors S1, S3, S5, S6, and S8, inside CD1, but sensors S7 and S9 had a higher influence on CD2. The scores from the first four CDs were used as input for building LDA classifiers. All generated models obtained 100% sensitivity and specificity using leave-one-out cross-validation. This suggests that the models accurately classified the coffee samples studied.

Bona et al. [25] developed an ANN to classify instant coffee using an E-nose fragrance profile. A hybrid algorithm with several components was developed. To begin, the dataset was subjected to a bootstrap resample process. The network parameters were then fine tuned using a factorial design and sequential simplex optimization. Backpropagation was used to train MLP for coffee classification. Finally, knowledge was extracted from the trained ANN using a causal index approach. The proposed approach performed well by correctly identifying 100% of the coffee samples tested.

Tang et al. [26] constructed a classification system comprised of an environmental control system, an E-nose, and a data signal readout system. The system’s goal was to distinguish between different degrees of mold on the coffee beans by assessing the scent of the beans. A standard operating process was designed to collect gas samples from coffee beans in a controlled environment. The E-nose was utilized to capture the changes in the signals after the target gas was introduced. Dimensionality reduction techniques, such as PCA and LDA, were applied to the E-nose data in order to reduce data dimensionality and eliminate noise. The proposed system achieved a classification accuracy of 91.77%.

Other studies that addressed coffee classification issues, regardless of the approach, include but are not limited to [38,39,40,41,42,43,44,45].

Based on the previous literature review, it can be noticed that most classification methods for coffee samples are based on shape, size, color, infrared spectroscopy, and/or aroma. To the best of our knowledge, none of these studies particularly address the classification of Arabic coffee samples based on their chemical components. This study aims to bridge that gap by focusing on the chemical composition of Arabic coffee and its classification potential. Hoping to improve understanding of this key feature of coffee classification, we investigate the specific role of chemical components in classifying Arabic coffee varietals.

3. Materials, Data, and Methods

3.1. Materials

Nine kilograms of fermented and dried beans of a local Arabic coffee cultivar known as Kholani were acquired by [3] from supermarkets in Tabuk City, Saudi Arabia. Drum roasters, which are typically used for coffee roasting, were utilized to roast the coffee beans. The roasting temperature was established at 180 ± 10 °C, which was held for around 6.0 ± 1.0 min to obtain the light roast, 8.0 ± 1.0 min to obtain the medium roast, and 10.0 ± 1.0 min to obtain the dark roast. After roasting, the coffee beans were ground (using the model GVX212 from Krupps, located in Essen, Germany), and then stored in an airtight jar in the refrigerator until further analysis [3]. Analytical-grade chemicals and indicators were used, which were obtained from Sigma-Aldrich, USA. The results of these laboratory analyses were reported by [3] and are shown in Table 1, Table 2 and Table 3.

For a deeper understanding of the coffee’s chemical components analysis, it is important to refer readers to the recent work of Alamri et al. [3], which provides extensive details on the chemical analysis of roasted Arabic coffee, including caffeine determination using a UV-visible spectrophotometer, acrylamide measurement, free radical scavenging capacity assessment using the DPPH technique, browning index calculation, color measurements, and volatile compound estimation using gas chromatography-mass spectrometry.

3.2. Data

Typically, having a sufficiently large dataset increases the performance and generalization of machine learning models. It enables them to capture a broader range of patterns, reduce overfitting, and better handle real-world conditions [46,47,48,49]. Therefore, in order to achieve the primary goal of this study, which is to classify Arabic coffee based on its chemical components, we need a sufficiently large dataset of coffee samples, which is a luxury we do not have because each chemical feature requires laboratory testing, which comes at a very high cost that we cannot afford at the moment.

As a result, we opt for simulation data, which are generated from real data, to train our machine learning models. We utilize the data obtained by [3] (shown in Table 1, Table 2 and Table 3) to generate 1000 new samples for each type of Arabic coffee (light, medium, and dark coffee) using the Monte Carlo method, because only a single mean value and its standard deviation (mean ± SD) are available for each feature.
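The sampling step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes each feature is drawn independently from a normal distribution with the reported mean and standard deviation (the text does not state the sampling distribution explicitly), and it uses only the caffeine percentages from [3] that are quoted in Section 4. The function name `simulate_feature` and the seed value are illustrative choices.

```python
import random
import statistics

# Caffeine percentage (mean, standard deviation) per roast level, as quoted in Section 4.
CAFFEINE = {
    "light": (1.13, 0.02),
    "medium": (1.17, 0.07),
    "dark": (1.08, 0.06),
}


def simulate_feature(mean, std, n_samples, rng):
    """Draw n_samples values from a normal distribution (assumed, not stated in the text)."""
    return [rng.gauss(mean, std) for _ in range(n_samples)]


rng = random.Random(42)  # arbitrary fixed seed so the sketch is reproducible
simulated = {
    roast: simulate_feature(mean, std, 1000, rng)
    for roast, (mean, std) in CAFFEINE.items()
}

# Compare the simulated statistics with the reported ones.
for roast, values in simulated.items():
    print(roast, round(statistics.mean(values), 3), round(statistics.stdev(values), 3))
```

With 1000 draws per class, the sample mean and standard deviation of each simulated feature should land close to the reported values, which is the property the text later checks against Table 4.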

Using simulation data for machine learning is not new; it serves numerous purposes, such as data augmentation, handling imbalanced data, preserving privacy and security, modeling rare events and novel scenarios, and compensating for a lack of labeled data [50,51,52]. The latter is our case because actual laboratory testing for a large number of coffee samples is expensive.

Our simulated data are based on real-world chemical tests on three distinct coffee classes. It is critical to differentiate this simulation approach from oversampling approaches used in machine learning to handle class imbalance. Oversampling by creating new instances can produce unsatisfactory results if the newly formed samples are wrongly thought to be part of the minority class based only on their closeness to existing minority examples. When dealing with imbalanced datasets, it is important to avoid conflating the usage of simulated data with oversampling class-imbalanced data [53,54].

The central limit theorem implies that the sampling distribution of the mean approaches a normal distribution, regardless of the population's original distribution [55]. A visual inspection of Figure 1 indicates that this assumption holds for our dataset. However, without obtaining real tests for a large number of coffee samples, it is difficult to ensure that the simulated data accurately reflect the underlying distribution of the coffee samples and their chemical features in the real population.

Table 4 summarizes the statistical properties of the simulated data utilized in the study, and Figure 1 depicts the distribution of each simulated feature. The table covers each class (light coffee, medium coffee, and dark coffee) and the following features: MoistureContent, EtherExtract, CrudeProtein, CrudeFiber, AshContent, NFE, CaffeineContent, Acrylamide, DPPH, Browning Index, Lcolor, aColor, and bColor. The table also supplies the minimum, maximum, and mean values for each feature across all classes.

The simulated data, as shown in Table 4, have means that nearly match the actual means presented in Table 1, Table 2 and Table 3. This observation shows that the Monte Carlo method is effective in generating samples centered on the true means, with the goal of approximating the features of the real population.

3.3. Feature Importance

A feature selection technique is used to pick a subset of relevant features from the available features. The major goal of this process is to improve the machine learning model's evaluation metrics by omitting irrelevant or redundant data. Two feature selection techniques are used in our experiment to analyze and determine the most influential features in classifying coffee: the correlation coefficient (shown in Figure 2) and feature importance (shown in Figure 3). The feature importance scores are calculated using the mean decrease Gini index with a random forest classifier.
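The mean decrease Gini score is built from the impurity decrease of individual tree splits. The sketch below shows that building block for a single split. It is an illustration only: the toy feature values are hypothetical and not taken from the paper's tables, and the helper names `gini` and `gini_decrease` are introduced here. A random forest would accumulate such decreases over every split that uses a given feature and average them over all trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting the samples at values <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted_child_impurity = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_child_impurity


# Hypothetical, well-separated toy feature (NOT values from the paper).
values = [1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 3.0, 3.1, 3.2]
labels = ["light"] * 3 + ["medium"] * 3 + ["dark"] * 3

parent_impurity = gini(labels)
decrease = gini_decrease(values, labels, threshold=1.5)
print(parent_impurity, decrease)
```

A feature whose splits isolate one class cleanly, as in this toy case, produces a large decrease, which is why well-separated features rank highly under this measure.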

Figure 2 depicts the Pearson correlation analysis results between every feature and the class variable, which represents the three coffee varieties. The correlation values provide essential information about the relationship between the features and coffee type classification. It is critical to consider the strength of the correlation when analyzing the correlation values. A large correlation indicates a substantial relationship between the feature and the coffee types, whilst a moderate correlation shows a less significant relationship. The correlation coefficient takes values from −1 to 1, with values closer to −1 or 1 indicating stronger correlations and values closer to 0 indicating weaker correlations.
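As a rough illustration of how such a coefficient is obtained, the sketch below computes the Pearson correlation between one simulated feature and a numeric class label. Several things here are assumptions: the class coding (light = 0, medium = 1, dark = 2) is hypothetical because the text does not state it, the feature is caffeine percentage simulated from the values quoted in Section 4 under a normal distribution, and the helper name `pearson` is introduced here. The result is not meant to reproduce Figure 2.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical numeric coding of the classes (not stated in the text).
CLASS_CODE = {"light": 0, "medium": 1, "dark": 2}
# Caffeine percentage (mean, standard deviation) per roast level, as quoted in Section 4.
CAFFEINE = {"light": (1.13, 0.02), "medium": (1.17, 0.07), "dark": (1.08, 0.06)}

rng = random.Random(0)  # arbitrary fixed seed
feature, target = [], []
for roast, (mean, std) in CAFFEINE.items():
    for _ in range(1000):
        feature.append(rng.gauss(mean, std))  # normal sampling is an assumption
        target.append(CLASS_CODE[roast])

r = pearson(feature, target)
print(round(r, 2))
```

Because the caffeine means do not change monotonically from light to dark and the class spreads overlap, the coefficient stays well away from −1 and 1 under this coding.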

In Figure 2, we can see that Crude Protein has a strong negative association (−0.91) with the coffee types. A correlation this close to −1 means that the crude protein level changes almost monotonically across the coffee classes, in the direction opposite to the class label. Crude Fiber, on the other hand, has a positive correlation of 0.80, indicating that crude fiber readings move in the same direction as the class label. Similarly, color-related measures, such as a*, Browning Index, and L*, have substantial relationships with coffee types (0.85, 0.84, and −0.81, respectively). These correlations imply that these color-related features play an important role in the categorization process, as they capture the specific color characteristics of each coffee type.

In conclusion, the correlation values in Figure 2 reveal the intensity and direction of the correlation between the features and the coffee types. Strong correlations imply features that are extremely influential in determining the coffee type, whereas lesser correlations exhibit links that are less significant. We may acquire an improved understanding of the importance of the features in the classification process by taking these correlation values into account.

Similarly, Figure 3 shows that Crude Protein, a*, Crude Fiber, Browning Index, and L* are the most important features for classifying coffee types based on the simulated data. Given that each coffee type has distinct color properties when compared to the other two, a high link between color information and the three coffee types is reasonable. Furthermore, the high correlation of Crude Protein and Crude Fiber percentages with coffee classes can be related to their discriminative strength as shown by the separability of their values in the distributions presented in the top-right corner of Figure 1.

3.4. Methods: Machine Learning Classifiers

Several widely used classifiers were studied to determine the best option for the proposed Arabic coffee classification system. The analysis focused on the Weka3 implementations of random forests (RFs), support vector machines (SVMs), k-nearest neighbors (KNNs), naive Bayes (NB), decision tree algorithm C4.5, fuzzy lattice reasoning (FLR) [57], and Ada Boost (AB).

These classifiers were chosen for their prominence in the field of machine learning and their likely suitability for the classification task at hand. The Weka3 [56] implementations used in our investigation are efficient and well established.

In this study, we used the default parameters for each classifier, for example, KNN (K = 1, distance function = Euclidean), SVM (method = libsvm library, kernel = radial basis function), RF (number of trees = 100, number of features = int(log₂(#features + 1))), AB (number of iterations = 10), and FLR (ρ = 0.5).
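To make the random forest default concrete, the short sketch below evaluates the formula exactly as written in this section, int(log₂(#features + 1)), for the feature counts used in the study (three color features, three antioxidant features, six chemical composition features, and the thirteen features listed for Table 4). The helper name is illustrative. Readers should check the formula against the Weka documentation, which may place the "+ 1" outside the logarithm; that variant would give different counts.

```python
import math


def rf_default_num_features(n_features):
    """Features tried per split, using the formula as written in the text."""
    return int(math.log2(n_features + 1))


feature_sets = {
    "color (L*, a*, b*)": 3,
    "antioxidant": 3,
    "chemical composition": 6,
    "all features in Table 4": 13,
}

defaults = {name: rf_default_num_features(n) for name, n in feature_sets.items()}
for name, k in defaults.items():
    print(name, k)
```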

It is worth noting that these default hyperparameters may not be suitable for every dataset or classification task. To discover the ideal configuration for a classification task, it is sometimes recommended to tweak the hyperparameters depending on the specific properties of the data source [58,59]. However, such an investigation is beyond the scope of this study.

4. Results

The chemical tests conducted in the laboratory for the three types of coffee by [3] can be classified into three categories of information as follows:

- Arabic coffee color information: This category includes measurements such as the browning index, which is determined using a UV-Vis spectrophotometer at a wavelength of 420 nm; in addition, it includes color information based on the International Commission of Illumination (CIE) color system, which involves the L*, a*, and b* channels.
- Arabic coffee antioxidant information: This category comprises the caffeine percentage, acrylamide concentration (measured in mg per 100 g), and DPPH.
- Arabic coffee chemical compositions: This category includes the percentage of moisture content, percentage of ether extract, percentage of crude protein, percentage of crude fiber, percentage of ash content, and percentage of NFE.

A series of experiments focusing on the aforementioned types of information were carried out to evaluate the proposed Arabic coffee classification system based on its chemical properties. It is worth noting that all of the tests used machine learning methods, and a 10-fold cross-validation approach was applied with each method’s default parameters. This approach ensured that the evaluation was consistent and unbiased across all experiments.
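The evaluation protocol can be sketched in a few lines. The example below is a pure-Python stand-in, not the Weka setup used in the study: it runs 10-fold cross-validation with a 1-nearest-neighbor classifier (K = 1, Euclidean distance, matching the stated KNN defaults) on a single simulated feature. The assumptions are that caffeine is sampled from a normal distribution using the values quoted later in this section, that 200 samples per class (rather than 1000) are enough for a quick illustration, and that simple interleaved folds are acceptable; the fold assignment Weka uses may differ.

```python
import random

# Caffeine percentage (mean, standard deviation) per roast level, as quoted in this section.
CAFFEINE = {"light": (1.13, 0.02), "medium": (1.17, 0.07), "dark": (1.08, 0.06)}


def make_dataset(n_per_class, rng):
    """Simulate (value, label) pairs and shuffle them; normal sampling is an assumption."""
    data = [
        (rng.gauss(mean, std), roast)
        for roast, (mean, std) in CAFFEINE.items()
        for _ in range(n_per_class)
    ]
    rng.shuffle(data)
    return data


def predict_1nn(train, x):
    """Return the label of the training sample closest to x (1-D Euclidean distance)."""
    return min(train, key=lambda sample: abs(sample[0] - x))[1]


def cross_validate(data, k=10):
    """Mean accuracy over k folds; every sample is used for testing exactly once."""
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        correct = sum(predict_1nn(train, x) == label for x, label in test_fold)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / k


rng = random.Random(1)  # arbitrary fixed seed
dataset = make_dataset(200, rng)
accuracy = cross_validate(dataset)
print(round(accuracy, 3))
```

Averaging over the ten held-out folds means each reported accuracy reflects predictions on samples the classifier did not see during training.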

Table 5 shows the classification results of Arabic coffee based on its color information: L*, a*, and b*. We did not add the browning index to this information category because it can be assessed from the L*, a*, and b* channels according to [60].

According to the data presented in Table 5, it is evident that the color information alone was sufficient to achieve the flawless classification of Arabic coffee into its three types: light, medium, and dark. This observation can be attributed to the fact that the CIE colors of these three classes are easily differentiated. Furthermore, Figure 2 depicts the high correlation between color information and coffee classes, underlining the association between color and coffee classification. Additionally, Figure 3 demonstrates the significant value of color information in discriminating across coffee types.

Table 6 presents the classification results of Arabic coffee based on its antioxidant information: caffeine percentage, acrylamide concentration, and DPPH.

The results presented in Table 6 show that when both the color and chemical composition information were eliminated, leaving only the antioxidant information, the classification results significantly decreased. The performance of the classifiers varied, with the best performer being NB, followed by RF. This means that depending exclusively on antioxidant information is insufficient for accurate Arabic coffee classification. The correlation analysis illustrated in Figure 2 and Figure 3 supports this observation. The correlation values and importance rankings show that variables like caffeine percentage and acrylamide concentration are less important in the classification process and have weaker correlations with the dependent variable (coffee class).

A closer look at Figure 1 reveals that some of the curves depicting these features for the three types of Arabic coffee overlap considerably. This overlap implies that the values of these features are not distinct enough across coffee types to create clear boundaries between the classes. As a result, depending entirely on antioxidant features for classification would almost certainly result in misclassifications and lower classification accuracy. The observed difficulty in identifying Arabic coffee from antioxidant information can be attributed mostly to the close proximity of these variables’ actual values. As shown in Table 1, the caffeine percentages of light, medium, and dark coffee are reported as 1.13 ± 0.02, 1.17 ± 0.07, and 1.08 ± 0.06, respectively.
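To make the overlap concrete, the sketch below expresses the distance between class means in units of the root-mean-square of the two standard deviations, using the caffeine values from Table 1. This separation measure is an illustrative choice made here, not a statistic reported in the study.

```python
import math

# Caffeine % (mean, SD) per roast level, taken from Table 1.
CAFFEINE = {"light": (1.13, 0.02), "medium": (1.17, 0.07), "dark": (1.08, 0.06)}


def separation(a, b):
    """Absolute mean difference divided by the RMS of the two SDs."""
    (mean_a, sd_a), (mean_b, sd_b) = a, b
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


pairs = [("light", "medium"), ("medium", "dark"), ("light", "dark")]
for first, second in pairs:
    d = separation(CAFFEINE[first], CAFFEINE[second])
    print(f"caffeine, {first} vs {second}: {d:.2f} pooled SDs apart")

# For contrast: crude protein, light vs dark, from Table 2.
print(f"crude protein, light vs dark: {separation((13.05, 0.14), (11.10, 0.06)):.2f}")
```

All caffeine pairs lie less than 1.5 pooled standard deviations apart, whereas crude protein for light and dark coffee is separated by more than ten, which is consistent with the difference in classification accuracy between the two feature groups.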

Table 7 presents the classification results of Arabic coffee based on its chemical compositions information: moisture content %, ether extract %, crude protein %, crude fiber %, ash content %, and NFE %.

According to the data in Table 7, all classifiers’ classification results are perfect, with the exception of J48, which earned a classification accuracy of 0.999. However, this classifier achieved perfect training accuracy by obtaining the following rules:

If CrudeProtein ≤ 11.283023, then class = 3 (1001 out of 1001 examples).
If CrudeProtein > 11.283023 and CrudeFiber ≤ 25.642686, then class = 1 (1001 out of 1001 examples).
If CrudeProtein > 11.283023 and CrudeFiber > 25.642686, then class = 2 (1001 out of 1001 examples).
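The three rules above translate directly into a short function. The mapping of class numbers to roast levels (1 = light, 2 = medium, 3 = dark) is not stated in the rules themselves; it is inferred here from the class means in Table 2 and the ranges in Table 4, and should be read as an assumption.

```python
# Assumed mapping, inferred from Tables 2 and 4 (not stated in the rules).
ROAST_BY_CLASS = {1: "light", 2: "medium", 3: "dark"}


def j48_rules(crude_protein, crude_fiber):
    """Apply the three J48 rules and return the class number."""
    if crude_protein <= 11.283023:
        return 3
    if crude_fiber <= 25.642686:
        return 1
    return 2


# Class means of (crude protein %, crude fiber %) from Table 2.
table2_means = {"light": (13.05, 24.24), "medium": (12.36, 28.31), "dark": (11.10, 28.40)}
for roast, (protein, fiber) in table2_means.items():
    predicted = ROAST_BY_CLASS[j48_rules(protein, fiber)]
    print(f"{roast}: predicted {predicted}")
```

Only two of the six chemical composition features are needed by the tree, which matches the later observation that crude protein and crude fiber are sufficient on their own.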

This level of agreement and excellent accuracy across classifier families shows that the training data utilized in the studies were distinct and well separated. Based on these outcomes, it is clear that chemical composition data alone were sufficient to achieve the faultless classification of Arabic coffee into three types: light, medium, and dark. This great precision is most likely due to the distinctive characteristics of the chemical composition information for these classes.

Figure 2 also shows a substantial link between chemical composition information and coffee classes. The percentages of crude protein and crude fiber, in particular, have strong correlation values of −0.91 and 0.80, respectively. This demonstrates the connection and discriminatory power of these variables in classifying coffee types.
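For readers who want to see how such a coefficient is obtained, the sketch below computes Pearson's r between a numeric class code and a feature. It assumes the classes are coded 1, 2, 3 for light, medium, dark and uses only the three class means from Table 2, so it reproduces the signs of the reported correlations but not their exact values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


classes = [1, 2, 3]                   # assumed coding: light, medium, dark
crude_protein = [13.05, 12.36, 11.10]  # class means from Table 2
crude_fiber = [24.24, 28.31, 28.40]    # class means from Table 2

print(f"crude protein vs class: {pearson(classes, crude_protein):.2f}")
print(f"crude fiber vs class:   {pearson(classes, crude_fiber):.2f}")
```

Crude protein decreases with roast level (negative r) and crude fiber increases (positive r), in line with the signs shown in Figure 2.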

Furthermore, Figure 3 shows that, compared with the other features, the percentage of crude protein ranks first in importance and the percentage of crude fiber ranks third. This highlights the discriminative capacity of the chemical composition information, particularly crude protein and crude fiber, which are highly separable, as shown in Figure 4, and can achieve perfect classification on their own without the other variables.

Based on the preceding results, it is evident that the simulated data used for classification are clean and well suited for various types of classifiers, owing to the features’ strong discriminatory power. This is especially true for the categories of color information and chemical composition information.

The standard deviation of the normal distribution for each feature was doubled (2STD) to bring more unpredictability and variability into the simulated data, resulting in a dataset with higher dispersion. By providing more variability in the feature values, this modification attempts to increase diversity and make the classification task more challenging. The classification results of the new 2STD simulated data are shown in Table 8, Table 9, Table 10 and Table 11.
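The effect of doubling the standard deviation can be sketched with the crude protein value for light coffee from Table 2 (13.05 ± 0.14). The function name, the `sd_scale` parameter, and the fixed seed are assumptions introduced for this example; the sketch draws independent normal samples per feature and is not the authors' generator.

```python
import random
import statistics


def simulate(mean, sd, n=1000, sd_scale=1.0, seed=0):
    """Draw n samples from a normal distribution with a scaled SD."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd * sd_scale) for _ in range(n)]


# Crude protein %, light coffee (Table 2): 13.05 +/- 0.14.
one_std = simulate(13.05, 0.14)                # 1STD data
two_std = simulate(13.05, 0.14, sd_scale=2.0)  # 2STD data

for label, samples in [("1STD", one_std), ("2STD", two_std)]:
    print(
        f"{label}: mean={statistics.mean(samples):.3f} "
        f"sd={statistics.stdev(samples):.3f} "
        f"min={min(samples):.2f} max={max(samples):.2f}"
    )
```

The mean stays near 13.05 in both cases, while the spread of the 2STD samples is about twice as wide, which is what pushes neighboring classes toward overlap.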

According to the results obtained from the newly generated data with doubled standard deviation, the classification performance for Arabic coffee based on its 2STD simulated chemical composition information is reduced by 1% to 3%. Similarly, depending on the classifier, the classification results based on its 2STD simulated color information are reduced by 2% to 5%. This decrease in performance might be due to the growing complexity of the data, which causes classifiers to differ in their performance. It is worth mentioning that clean data often yield superior classification accuracy regardless of the classifier used.

In the case of the 2STD simulated antioxidant information, the classification results show a drop of 16% to 20%, depending on the classifier utilized. The KNN classifier, in particular, performs poorly in this scenario due to its reliance on similarity across samples for classification, which becomes less accurate when dealing with complicated data.
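The sensitivity of KNN with K = 1 follows from how it predicts: the label of the single closest training sample is returned, so one noisy neighbor is enough to flip the result. A minimal sketch with Euclidean distance is given below; it uses the Table 1 class means as a toy training set, applies no feature scaling, and uses a hypothetical function name and query point.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """1-NN: return the label of the training point closest to the query."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


# Toy training set: (caffeine %, acrylamide, DPPH) class means from Table 1.
train_x = [(1.13, 0.41, 88.72), (1.17, 0.31, 84.61), (1.08, 0.36, 78.76)]
train_y = ["light", "medium", "dark"]

# Hypothetical query sample, made up for illustration.
query = (1.12, 0.40, 88.0)
print(nearest_neighbor_predict(train_x, train_y, query))
```

Because the features are not rescaled in this sketch, DPPH dominates the distance; with overlapping classes, the nearest sample frequently belongs to a different class, which is one plausible reading of the drop described above.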

This drop in performance can be traced primarily to the nature of the original data shown in Table 1. The antioxidant characteristics in the table demonstrate no significant difference between the three coffee classes (light, medium, and dark). As a result, increasing the standard deviation to generate the 2STD simulated data reduces the distinctiveness and separability of the antioxidant characteristics, resulting in an overall decrease in classification accuracy.

When analyzing the antioxidant classification results, it is critical to recognize these constraints as well as the impact of data characteristics. In such circumstances, additional characteristics or alternative methodologies may be necessary to improve classification performance.

The observed decline in classification performance, as well as the variation in performance among classifiers, emphasizes the need for careful classifier selection when dealing with complicated data. Figure 4 plots crude fiber % against crude protein % for all 1STD simulated data points, showing how the two features are distributed across the coffee types. The data points form clearly separable groups, revealing the discriminative power of crude fiber and crude protein when classifying the coffee types.

On the other hand, Figure 5 depicts the same features (crude fiber% and crude protein) but with 2STD simulated data, which creates greater complexity and overlapping among the data points. In Figure 5, the overlapping data and growing complexity demonstrate the difficulties that come with dealing with complicated data. This underscores the significance of proper classifier selection when dealing with such complex data, as different classifiers’ performance may vary as data complexity increases.

This intricacy is illustrated further in Figure 6 by the decision tree of J48, where the rules become more intricate. However, it is worth mentioning that the increase in data complexity has no significant effect on classification accuracy for color information and chemical composition.

5. Conclusions

In this paper, we evaluated the feasibility of classifying Arabic coffee into three major varieties, light, medium, and dark, using simulated data based on the actual data reported in [3] for color information, antioxidant laboratory tests, and chemical composition tests. We created two types of simulated data using the Monte Carlo approach: the first used the standard deviation of each of the actual measures, and the second used double the standard deviation of each of the actual measures. We classified both simulated datasets using a number of commonly used classifiers.

Based on our large set of experiments, several key findings emerged:

The color information alone proved to be quite successful in accurately classifying Arabic coffee into three types: light, medium, and dark. The different CIE color values, as well as the excellent correlation between color information and coffee classes, were critical in achieving faultless classification outcomes.
When the antioxidant information was considered alone, it became clear that this type of information is insufficient for correct coffee classification. The results showed that relying simply on antioxidant information reduces classification performance significantly and different classifiers perform differently. The KNN classifier, in particular, struggled with the additional data complexity (2STD simulated data).
In identifying Arabic coffee types, the chemical composition information revealed significant discriminatory strength. This information alone produced flawless classification results, demonstrating its significance and separability among the coffee groups. The substantial correlation between certain chemical composition features, such as crude protein and crude fiber, underscores their importance in the classification process even more.
As the data complexity rose, classification performance on the antioxidant information declined and the choice of classifier became more important. The overlapping curves and increasingly elaborate decision tree rules demonstrated the difficulties encountered when dealing with complex data and the importance of proper classifier selection.

Indeed, based on the results reported here, color information alone was able to reliably identify the type of coffee, showing its practical application potential. Using this knowledge, it is possible to create a smartphone application that analyzes the color of a coffee sample using image recognition technology. Such an application could allow end users, such as consumers, to use their smartphone’s camera to take an image of their coffee. The application would then transform the image’s RGB pixel values to the CIE color space, extracting the color information required for classification. After recognizing the coffee type, the application might provide the consumer with useful information on the chemical components of their coffee. Details such as caffeine content, antioxidant levels, moisture content, and other pertinent chemical compositions could be included. By delivering such information, our future mobile application would provide users with a convenient and user-friendly tool for understanding the qualities of their coffee and making informed decisions based on their tastes. Furthermore, it has the potential to increase transparency in the coffee sector, allowing customers to make more educated purchasing decisions.
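The RGB-to-CIE step of such an application could look roughly like the sketch below. It assumes 8-bit sRGB input and a D65 reference white, and it substitutes a simple nearest-class-mean rule (using the L*, a*, b* means from Table 3) for the classifiers evaluated in this paper. All function names are hypothetical, and camera-derived colors would need calibration before they are comparable to laboratory measurements.

```python
import math

# Mean (L*, a*, b*) per roast level, taken from Table 3.
CLASS_MEANS = {
    "light": (58.62, 9.75, 31.74),
    "medium": (48.83, 13.04, 32.21),
    "dark": (41.04, 13.93, 29.80),
}
D65_WHITE = (0.95047, 1.0, 1.08883)  # assumed reference white


def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB values to CIE L*a*b* (D65 white point assumed)."""
    def to_linear(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # Linear sRGB -> CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), D65_WHITE))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def nearest_roast(lab):
    """Stand-in classifier: pick the class whose Table 3 mean is closest."""
    return min(CLASS_MEANS, key=lambda name: math.dist(CLASS_MEANS[name], lab))


print(srgb_to_lab(255, 255, 255))  # approximately (100, 0, 0)
print(nearest_roast((41.04, 13.93, 29.80)))
```

In practice, the average color of a region of the image would be converted with `srgb_to_lab` and passed to the trained classifier; the nearest-mean rule is used here only to keep the example self-contained.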

There are various limitations that could be addressed in future coffee classification studies:

The paper largely concentrated on color, antioxidant, and chemical composition information. Exploring additional features that could improve classification accuracy, such as aroma profiles, geographical origin, or processing methods, would be advantageous. Incorporating a larger variety of characteristics may result in a more comprehensive understanding of the factors determining coffee classification.
According to the findings, the distinctness and separability of certain variables, particularly antioxidant information, were limited, resulting in lower classification accuracy. Introducing more diverse and representative datasets with a broader range of actual (not simulated) antioxidant feature values could help alleviate this restriction and provide a more genuine portrayal of coffee variations.
The study focused on several off-the-shelf classifiers, although different algorithms or ensemble approaches could produce better results [61,62,63,64]. Exploring and comparing different classifier alternatives could provide useful insights into the most successful ways for coffee classification.
This paper focused on Arabic coffee varieties and classification. It would be beneficial to replicate the study using different coffee types from various places throughout the world to improve the generalizability of the findings. This would assist in testing the categorization algorithms’ efficiency across different coffee varieties and broaden the research’s practicality.
In our investigation, we used simulated data instead of real-world samples. We understand that this approach may introduce variations and shortcomings, particularly in terms of accurately capturing the underlying distribution of the coffee samples and their chemical properties. While simulated data can be a viable option when laboratory testing for a large number of coffee samples is prohibitively expensive, we recognize the need to address any biases associated with this technique. We will make an attempt in the future to examine these constraints in greater depth and to investigate techniques to alleviate any deviations or uncertainties generated by using simulated data.

Future studies should look into factors other than color, antioxidants, and chemical composition. Incorporating odor profiles, origin information, and processing methods may increase classification accuracy and provide a more complete understanding of coffee differences.

To address these shortcomings, future studies might use more diverse and representative datasets, experiment with other classifiers, and go beyond Arabic coffee types to improve generalizability. They may also pursue the development of the mobile application for practical use.

Overall, this research adds to the field of coffee classification, provides practical insights, and identifies areas for further research and advancement.

Author Contributions

Conceptualization, E.S.A. and A.B.H.; methodology, A.B.H., E.S.A. and G.A.A.; software, A.B.H. and G.A.A.; validation, E.S.A., H.M.B.; formal analysis, E.S.A., H.M.B.; investigation, H.M.B. and A.B.H.; resources, E.S.A. and H.M.B.; data curation, A.B.H., G.A.A.; writing—original draft preparation, A.B.H., E.S.A., G.A.A. and H.M.B.; writing—review and editing, A.B.H., E.S.A., G.A.A. and H.M.B.; visualization, G.A.A. and A.B.H.; supervision, A.B.H.; project administration, E.S.A. and A.B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

For the purposes of this study, all experiments were performed using simulated data based on the actual data reported in [3]. The simulated data can be obtained from the corresponding author.

Acknowledgments

We truly appreciate the reviewers’ voluntary efforts and thank them for their input.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Al-Mssallem, M.Q.; Brown, J.E. Arabic coffee increases the glycemic index but not insulinemic index of dates. Saudi Med. J. 2013, 34, 923–928.
2. Butt, M.S.; Sultan, M.T. Coffee and its consumption: Benefits and risks. Crit. Rev. Food Sci. Nutr. 2011, 51, 363–373.
3. Alamri, E.; Rozan, M.; Bayomy, H. A study of chemical Composition, Antioxidants, and volatile compounds in roasted Arabic coffee. Saudi J. Biol. Sci. 2022, 29, 3133–3139.
4. Ciaramelli, C.; Palmioli, A.; Airoldi, C. Coffee variety, origin and extraction procedure: Implications for coffee beneficial effects on human health. Food Chem. 2019, 278, 47–55.
5. Messina, G.; Zannella, C.; Monda, V.; Dato, A.; Liccardo, D.; De Blasio, S.; Valenzano, A.; Moscatelli, F.; Messina, A.; Cibelli, G.; et al. The beneficial effects of coffee in human nutrition. Biol. Med. 2015, 7, 1.
6. Poole, R.; Kennedy, O.J.; Roderick, P.; Fallowfield, J.A.; Hayes, P.C.; Parkes, J. Coffee consumption and health: Umbrella review of meta-analyses of multiple health outcomes. BMJ 2017, 359, j5024.
7. Borota, D.; Murray, E.; Keceli, G.; Chang, A.; Watabe, J.M.; Ly, M.; Toscano, J.P.; Yassa, M.A. Post-study caffeine administration enhances memory consolidation in humans. Nat. Neurosci. 2014, 17, 201–203.
8. Olson, C.A.; Thornton, J.A.; Adam, G.E.; Lieberman, H.R. Effects of 2 adenosine antagonists, quercetin and caffeine, on vigilance and mood. J. Clin. Psychopharmacol. 2010, 30, 573–578.
9. Nehlig, A. Is caffeine a cognitive enhancer? J. Alzheimer’s Dis. 2010, 20, S85–S94.
10. Cai, L.; Ma, D.; Zhang, Y.; Liu, Z.; Wang, P. The effect of coffee consumption on serum lipids: A meta-analysis of randomized controlled trials. Eur. J. Clin. Nutr. 2012, 66, 872–877.
11. Zhou, A.; Hyppönen, E. Habitual coffee intake and plasma lipid profile: Evidence from UK Biobank. Clin. Nutr. 2021, 40, 4404–4413.
12. AlQuaiz, A.M.; Kazi, A.; Tayel, S.; Shaikh, S.A.; Al-Sharif, A.; Othman, S.; Habib, F.; Fouda, M.; Sulaimani, R. Prevalence and factors associated with low bone mineral density in Saudi women: A community based survey. BMC Musculoskelet. Disord. 2014, 15, 5.
13. Somporn, C.; Kamtuo, A.; Theerakulpisut, P.; Siriamornpun, S. Effects of roasting degree on radical scavenging activity, phenolics and volatile compounds of Arabica coffee beans (Coffea arabica L. cv. Catimor). Int. J. Food Sci. Technol. 2011, 46, 2287–2296.
14. Wang, H.Y.; Qian, H.; Yao, W.R. Melanoidins produced by the Maillard reaction: Structure and biological activity. Food Chem. 2011, 128, 573–584.
15. de Oliveira, E.M.; Leme, D.S.; Barbosa, B.H.G.; Rodarte, M.P.; Pereira, R.G.F.A. A computer vision system for coffee beans classification based on computational intelligence techniques. J. Food Eng. 2016, 171, 22–27.
16. Arboleda, E.R.; Fajardo, A.C.; Medina, R.P. Classification of coffee bean species using image processing, artificial neural network and K nearest neighbors. In Proceedings of the 2018 IEEE International Conference on Innovative Research and Development (ICIRD), Bangkok, Thailand, 11–12 May 2018; pp. 1–5.
17. Arboleda, E.R. Comparing performances of data mining algorithms for classification of green coffee beans. Int. J. Eng. Adv. Technol. 2019, 8, 1563–1567.
18. Pizzaia, J.P.L.; Salcides, I.R.; de Almeida, G.M.; Contarato, R.; de Almeida, R. Arabica coffee samples classification using a Multilayer Perceptron neural network. In Proceedings of the 2018 13th IEEE International Conference on Industry Applications (INDUSCON), Sao Paulo, Brazil, 12–14 November 2018; pp. 80–84.
19. Barbin, D.F.; Felicio, A.L.d.S.M.; Sun, D.W.; Nixdorf, S.L.; Hirooka, E.Y. Application of infrared spectral techniques on quality and compositional attributes of coffee: An overview. Food Res. Int. 2014, 61, 23–32.
20. Calvini, R.; Ulrici, A.; Amigo, J.M. Practical comparison of sparse methods for classification of Arabica and Robusta coffee species using near infrared hyperspectral imaging. Chemom. Intell. Lab. Syst. 2015, 146, 503–511.
21. Link, J.V.; Lemes, A.L.G.; Marquetti, I.; dos Santos Scholz, M.B.; Bona, E. Geographical and genotypic classification of arabica coffee using Fourier transform infrared spectroscopy and radial-basis function networks. Chemom. Intell. Lab. Syst. 2014, 135, 150–156.
22. Mutz, Y.S.; do Rosario, D.; Galvan, D.; Schwan, R.F.; Bernardes, P.C.; Conte-Junior, C.A. Feasibility of NIR spectroscopy coupled with chemometrics for classification of Brazilian specialty coffee. Food Control 2023, 149, 109696.
23. Okubo, N.; Kurata, Y. Nondestructive classification analysis of green coffee beans by using near-infrared spectroscopy. Foods 2019, 8, 82.
24. Makimori, G.Y.F.; Bona, E. Commercial instant coffee classification using an electronic nose in tandem with the ComDim-LDA approach. Food Anal. Methods 2019, 12, 1067–1076.
25. Bona, E.; da Silva, R.S.d.S.F.; Borsato, D.; Bassoli, D.G. Optimized neural network for instant coffee classification through an electronic nose. Int. J. Food Eng. 2011, 7, 1–19.
26. Tang, C.L.; Chou, T.I.; Yang, S.R.; Lin, Y.J.; Ye, Z.K.; Chiu, S.W.; Lee, S.W.; Tang, K.T. Development of a Nondestructive Moldy Coffee Beans Detection System Based on Electronic Nose. IEEE Sens. Lett. 2023, 7, 1–4.
27. Salmanpour, M.R.; Rezaeijo, S.M.; Hosseinzadeh, M.; Rahmim, A. Deep versus Handcrafted Tensor Radiomics Features: Prediction of Survival in Head and Neck Cancer Using Machine Learning and Fusion Techniques. Diagnostics 2023, 13, 1696.
28. Rezaeijo, S.M.; Nesheli, S.J.; Serj, M.F.; Birgani, M.J.T. Segmentation of the prostate, its zones, anterior fibromuscular stroma, and urethra on the MRIs and multimodality image fusion using U-Net model. Quant. Imaging Med. Surg. 2022, 12, 4786–4804.
29. Alrasyid, M.A.; Rohmatulloh, B.; Damayanti, R.; Al-Riza, D.F.; Hermanto, M.B.; Sandra, S.; Hendrawan, Y. ResNet-50 to classify the types of Indonesian local coffee beans. In AIP Conference Proceedings; AIP Publishing: New York, NY, USA, 2023; Volume 2596.
30. Kesiman, M.W.A.; Sulaiman, I. Semi-automatic Ground Truth Image Construction for Coffee Bean Defects Classification Based on SNI 01-2907-2008. In Proceedings of the 3rd International Conference on Smart and Innovative Agriculture (ICoSIA 2022); Atlantis Press: Paris, France, 2023; pp. 453–463.
31. Valles-Coral, M.A.; Ivan Bernales-del Aguila, C.; Benavides-Cuvas, E.; Cabanillas-Pardo, L. Effectiveness of a cherry coffee sorter prototype with image recognition using machine learning. Braz. J. Agric. Sci./Revista Brasileira Ciências Agrárias 2023, 18, 1–7.
32. Maghfirah, A.; Nasution, I. Application of colour, shape, and texture parameters for classifying the defect of Gayo Arabica green coffee bean using computer vision. IOP Conf. Ser. Earth Environ. Sci. 2022, 951, 012097.
33. Taghizadeh, E.; Heydarheydari, S.; Saberi, A.; JafarpoorNesheli, S.; Rezaeijo, S.M. Breast cancer prediction with transcriptome profiling using feature selection and machine learning methods. BMC Bioinform. 2022, 23, 410.
34. Ruttanadech, N.; Phetpan, K.; Srisang, N.; Srisang, S.; Chungcharoen, T.; Limmun, W.; Youryon, P.; Kongtragoul, P. Rapid and accurate classification of Aspergillus ochraceous contamination in Robusta green coffee bean through near-infrared spectral analysis using machine learning. Food Control 2023, 145, 109446.
35. Dharmawan, A.; Masithoh, R.E.; Amanah, H.Z. Development of PCA-MLP Model Based on Visible and Shortwave Near Infrared Spectroscopy for Authenticating Arabica Coffee Origins. Foods 2023, 12, 2112.
36. Souza, J.C.; Pasquini, C.; Hespanhol, M.C. Feasibility of compact near-infrared spectrophotometers and multivariate data analysis to assess roasted ground coffee traits. Food Control 2022, 138, 109041.
37. Phuangsaijai, N.; Theanjumpol, P.; Kittiwachana, S. Performance Optimization of a Developed Near-Infrared Spectrometer Using Calibration Transfer with a Variety of Transfer Samples for Geographical Origin Identification of Coffee Beans. Molecules 2022, 27, 8208.
38. Luo, S.; Yan, C.; Chen, D. Preliminary study on coffee type identification and coffee mixture analysis by light emitting diode induced fluorescence spectroscopy. Food Control 2022, 138, 109044.
39. Hu, Q.; Sellers, C.; Kwon, J.S.I.; Wu, H.J. Integration of surface-enhanced Raman spectroscopy (SERS) and machine learning tools for coffee beverage classification. Digit. Chem. Eng. 2022, 3, 100020.
40. Belchior, V.; Botelho, B.G.; Franca, A.S. Comparison of spectroscopy-based methods and chemometrics to confirm classification of specialty coffees. Foods 2022, 11, 1655.
41. Tamayo-Monsalve, M.A.; Mercado-Ruiz, E.; Villa-Pulgarin, J.P.; Bravo-Ortíz, M.A.; Arteaga-Arteaga, H.B.; Mora-Rubio, A.; Alzate-Grisales, J.A.; Arias-Garzon, D.; Romero-Cano, V.; Orozco-Arias, S.; et al. Coffee maturity classification using convolutional neural networks and transfer learning. IEEE Access 2022, 10, 42971–42982.
42. Manuel, M.N.B.; da Silva, A.C.; Lopes, G.S.; Ribeiro, L.P.D. One-class classification of special agroforestry Brazilian coffee using NIR spectrometry and chemometric tools. Food Chem. 2022, 366, 130480.
43. Gope, H.L.; Fukai, H. Peaberry and normal coffee bean classification using CNN, SVM, and KNN: Their implementation in and the limitations of Raspberry Pi 3. AIMS Agric. Food 2022, 7, 149–167.
44. Adiwijaya, N.O.; Romadhon, H.I.; Putra, J.A.; Kuswanto, D.P. The quality of coffee bean classification system based on color by using k-nearest neighbor method. J. Phys. Conf. Ser. 2022, 2157, 012034.
45. Pahlawan, M.F.R.; Masithoh, R.E. Vis-NIR Spectroscopy and PLS-Da model for classification of Arabica and robusta roasted coffee bean. Adv. Sci. Technol. 2022, 115, 45–52.
46. Figueroa, R.L.; Zeng-Treitler, Q.; Kandula, S.; Ngo, L.H. Predicting sample size required for classification performance. BMC Med. Inform. Decis. Mak. 2012, 12, 8.
47. Hassanat, A.B.; Prasath, V.S.; Al-kasassbeh, M.; Tarawneh, A.S.; Al-shamailh, A.J. Magnetic energy-based feature extraction for low-quality fingerprint images. Signal Image Video Process. 2018, 12, 1471–1478.
48. Hassanat, A.B. On identifying terrorists using their victory signs. Data Sci. J. 2018, 17, 1–13.
49. Hassanat, A.B.; Btoush, E.; Abbadi, M.A.; Al-Mahadeen, B.M.; Al-Awadi, M.; Mseidein, K.I.; Almseden, A.M.; Tarawneh, A.S.; Alhasanat, M.B.; Prasath, V.S.; et al. Victory sign biometrie for terrorists identification: Preliminary results. In Proceedings of the 2017 8th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 4–6 April 2017; pp. 182–187.
50. Bohn, B.; Garcke, J.; Iza-Teran, R.; Paprotny, A.; Peherstorfer, B.; Schepsmeier, U.; Thole, C.A. Analysis of car crash simulation data with nonlinear machine learning methods. Procedia Comput. Sci. 2013, 18, 621–630.
51. Chen, H.; Kätelhön, E.; Compton, R.G. Machine Learning in Fundamental Electrochemistry: Recent Advances and Future Opportunities. Curr. Opin. Electrochem. 2023, 38, 101214.
52. Koutsoupakis, J.; Seventekidis, P.; Giagopoulos, D. Machine learning based condition monitoring for gear transmission systems using data generated by optimal multibody dynamics models. Mech. Syst. Signal Process. 2023, 190, 110130.
53. Tarawneh, A.S.; Hassanat, A.B.; Altarawneh, G.A.; Almuhaimeed, A. Stop Oversampling for Class Imbalance Learning: A Review. IEEE Access 2022, 10, 47643–47660.
54. Elreedy, D.; Atiya, A.F.; Kamalov, F. A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning. In Machine Learning; Springer: Berlin/Heidelberg, Germany, 2023; pp. 1–21.
55. Hagtvedt, R.; Jones, G.T.; Jones, K. Pedagogical simulation of sampling distributions and the central limit theorem. Teach. Stat. 2007, 29, 94–97.
56. Witten, I.H.; Frank, E. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed.; Morgan Kaufmann: Burlington, MA, USA, 2005.
57. Kaburlasos, V.G.; Athanasiadis, I.N.; Mitkas, P.A. Fuzzy lattice reasoning (FLR) classifier and its application for ambient ozone estimation. Int. J. Approx. Reason. 2007, 45, 152–188.
58. Gupta, S.C.; Goel, N. Predictive Modeling and Analytics for Diabetes using Hyperparameter tuned Machine Learning Techniques. Procedia Comput. Sci. 2023, 218, 1257–1269.
59. Ahamad, G.N.; Shafiullah; Fatima, H.; Imdadullah; Zakariya, S.M.; Abbas, M.; Alqahtani, M.S.; Usman, M. Influence of Optimal Hyperparameters on the Performance of Machine Learning Algorithms for Predicting Heart Disease. Processes 2023, 11, 734.
60. Sikora, M.; Złotek, U.; Kordowska-Wiater, M.; Świeca, M. Spicy Herb Extracts as a Potential Improver of the Antioxidant Properties and Inhibitor of Enzymatic Browning and Endogenous Microbiota Growth in Stored Mung Bean Sprouts. Antioxidants 2021, 10, 425.
61. Kozubal, J.V.; Hassanat, A.; Tarawneh, A.S.; Wróblewski, R.J.; Anysz, H.; Valença, J.; Júlio, E. Automatic strength assessment of the virtually modelled concrete interfaces based on shadow-light images. Constr. Build. Mater. 2022, 359, 129296.
62. Hassanat, A.B.; Tarawneh, A.S.; Abed, S.S.; Altarawneh, G.A.; Alrashidi, M.; Alghamdi, M. Rdpvr: Random data partitioning with voting rule for machine learning from class-imbalanced datasets. Electronics 2022, 11, 228.
63. Hassanat, A.B.; Ali, H.N.; Tarawneh, A.S.; Alrashidi, M.; Alghamdi, M.; Altarawneh, G.A.; Abbadi, M.A. Magnetic Force Classifier: A Novel Method for Big Data Classification. IEEE Access 2022, 10, 12592–12606.
64. Hassanat, A.; Alkafaween, E.; Tarawneh, A.S.; Elmougy, S. Applications Review of Hassanat Distance Metric. In Proceedings of the 2022 International Conference on Emerging Trends in Computing and Engineering Applications (ETCEA), Karak, Jordan, 23–24 November 2022; pp. 1–6.

Figure 1. The distribution of each simulated feature. The x-axis of the graph displays the values associated with each feature, while the y-axis represents the frequency of occurrence for each class. The visualization of the data is generated using Weka 3 [56].

Figure 2. The results of the Pearson correlation analysis between all features and the class variable, which represents the three coffee types.

Figure 3. The importance scores of each feature were determined using a random forest algorithm to classify the three different coffee types.

Figure 4. Crude fiber % as a function of crude protein % for all simulated data points. The visualization was generated using Weka 3 [56].

Figure 5. Crude fiber % as a function of crude protein % for all 2STD simulated data points. The visualization was generated using Weka 3 [56].

Figure 6. J48 decision tree for classifying coffee types based on 2STD simulated chemical composition information. The visualization of the data is generated using Weka 3 [56].

Table 1. Caffeine content, acrylamide, and free radical scavenging capacity (DPPH) in light, medium, and dark coffee, as reported by [3].

Parameter	Light Coffee	Medium Coffee	Dark Coffee
Caffeine content %	1.13 ± 0.02	1.17 ± 0.07	1.08 ± 0.06
Acrylamide (mg/100 g)	0.41 ± 0.086	0.31 ± 0.063	0.36 ± 0.048
DPPH (mg TE/g)	88.72 ± 2.91	84.61 ± 1.76	78.76 ± 2.49

Table 2. Proximate chemical composition (% of dry matter) of light, medium, and dark coffee, as reported by [3].

Parameter	Light Coffee	Medium Coffee	Dark Coffee
Moisture content %	4.80 ± 0.24	4.30 ± 0.17	3.89 ± 0.28
Ether extract %	10.39 ± 0.30	10.47 ± 0.19	10.65 ± 0.22
Crude protein %	13.05 ± 0.14	12.36 ± 0.24	11.10 ± 0.06
Crude fiber %	24.24 ± 0.47	28.31 ± 0.31	28.40 ± 0.42
Ash content %	3.95 ± 0.26	3.89 ± 0.08	4.10 ± 0.17
Nitrogen-free extract (NFE) %	48.37	44.97	45.76
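
The NFE row in Table 2 carries no standard deviation, which suggests it is derived rather than measured. The sketch below checks one possible reading: that NFE is obtained "by difference" as 100 minus the sum of ether extract, crude protein, crude fiber, and ash. This formula is an assumption, not something stated in the table, and the names `TABLE2_MEANS`, `REPORTED_NFE`, and `nfe_by_difference` are hypothetical.

```python
# Mean values (% of dry matter) copied from Table 2.
# Moisture is left out because the table is on a dry-matter basis (assumption).
TABLE2_MEANS = {
    "light": {"ether_extract": 10.39, "crude_protein": 13.05,
              "crude_fiber": 24.24, "ash": 3.95},
    "medium": {"ether_extract": 10.47, "crude_protein": 12.36,
               "crude_fiber": 28.31, "ash": 3.89},
    "dark": {"ether_extract": 10.65, "crude_protein": 11.10,
             "crude_fiber": 28.40, "ash": 4.10},
}

# NFE values as printed in Table 2.
REPORTED_NFE = {"light": 48.37, "medium": 44.97, "dark": 45.76}


def nfe_by_difference(components):
    """Return 100 minus the sum of the given dry-matter percentages."""
    return 100.0 - sum(components.values())


for roast, components in TABLE2_MEANS.items():
    computed = nfe_by_difference(components)
    print(f"{roast}: computed {computed:.2f}, reported {REPORTED_NFE[roast]:.2f}")
```

Under this assumption the computed values agree with the reported ones to within 0.01 percentage points for all three roasts, which is consistent with rounding in the source figures.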

Table 3. Color information for light, medium, and dark coffee, as reported by [3].

Parameter	Light Coffee	Medium Coffee	Dark Coffee
Browning index	0.4540 ± 0.13	0.8600 ± 0.13	1.8400 ± 0.24
L*	58.62 ± 2.73	48.83 ± 1.73	41.04 ± 3.06
a*	9.75 ± 0.44	13.04 ± 0.07	13.93 ± 0.62
b*	31.74 ± 1.21	32.21 ± 0.75	29.80 ± 1.52

Table 4. Statistical characteristics of the simulated data (1000 samples for each class) together with the 3 actual samples.

Feature	Light Min	Light Max	Light Mean	Medium Min	Medium Max	Medium Mean	Dark Min	Dark Max	Dark Mean
MoistureContent	3.98	5.51	4.80	3.67	4.83	4.30	2.89	4.93	3.90
EtherExtract	9.36	11.29	10.39	9.90	11.04	10.48	10.04	11.26	10.65
CrudeProtein	12.58	13.52	13.05	11.64	13.13	12.36	10.91	11.28	11.10
CrudeFiber	22.70	25.64	24.23	27.30	29.55	28.31	27.02	29.63	28.39
AshContent	2.88	4.88	3.95	3.63	4.16	3.89	3.51	4.59	4.10
NFE	45.29	51.44	48.36	41.65	48.17	44.98	42.48	49.27	45.74
CaffeineContent	1.07	1.19	1.13	0.95	1.42	1.17	0.92	1.27	1.08
Acrylamide	0.12	0.66	0.40	0.04	0.52	0.31	0.20	0.50	0.36
DPPH	78.22	98.72	88.80	79.54	90.07	84.68	70.40	86.05	78.64
Browning Index	0.02	0.90	0.44	0.35	1.23	0.86	1.04	2.51	1.84
Lcolor	48.67	66.36	58.44	43.52	53.87	48.81	30.15	52.24	41.10
aColor	8.31	11.25	9.78	12.79	13.24	13.04	12.08	16.01	13.92
bColor	27.56	35.87	31.75	30.00	34.46	32.22	25.18	35.43	29.76
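
The simulated data summarized in Table 4 were generated with a Monte Carlo approach from the reported means and standard deviations. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each feature is drawn independently from a normal distribution, uses only the color statistics from Table 3 for brevity, and introduces the hypothetical names `COLOR_STATS`, `simulate`, and `sd_scale`. Setting `sd_scale=2.0` is one possible reading of the "2STD" data sets, which this section does not define.

```python
import random
import statistics

# (mean, standard deviation) per roast, copied from Table 3.
COLOR_STATS = {
    "light": {"L": (58.62, 2.73), "a": (9.75, 0.44), "b": (31.74, 1.21)},
    "medium": {"L": (48.83, 1.73), "a": (13.04, 0.07), "b": (32.21, 0.75)},
    "dark": {"L": (41.04, 3.06), "a": (13.93, 0.62), "b": (29.80, 1.52)},
}


def simulate(stats, n_per_class=1000, sd_scale=1.0, seed=0):
    """Draw n_per_class samples per roast, one Gaussian draw per feature.

    Independence between features and the Gaussian shape are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rows = []
    for label, features in stats.items():
        for _ in range(n_per_class):
            row = {name: rng.gauss(mean, sd * sd_scale)
                   for name, (mean, sd) in features.items()}
            row["class"] = label
            rows.append(row)
    return rows


samples = simulate(COLOR_STATS)
light_L = [row["L"] for row in samples if row["class"] == "light"]
print(len(samples), "samples; light L* mean:",
      round(statistics.mean(light_L), 2))
```

With 1000 draws per class, the sample mean of each feature should land close to the reported mean, as the Mean columns of Table 4 do. Appending the 3 actual samples would give the 1001 samples per class seen in the confusion matrices.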

Table 5. Classification results of Arabic coffee based on its color information: L*, a* and b*.

Classifier	Precision	Recall	F-Measure
RF	1.000	1.000	1.000
NB	0.998	0.998	0.998
SVM	0.993	0.993	0.993
KNN	0.994	0.994	0.994
J48	0.993	0.993	0.993
FLR	0.889	0.835	0.825
AB	0.984	0.984	0.984

Table 6. Classification results of Arabic coffee based on its antioxidant information: caffeine percentage, acrylamide concentration, and DPPH.

Classifier	Precision	Recall	F-Measure
RF	0.900	0.899	0.900
NB	0.912	0.912	0.912
SVM	0.866	0.856	0.858
KNN	0.868	0.868	0.868
J48	0.893	0.892	0.892
FLR	0.634	0.636	0.632
AB	0.821	0.818	0.819

Table 7. Classification results of Arabic coffee based on its chemical composition information: moisture content %, ether extract %, crude protein %, crude fiber %, ash content %, and NFE %.

Classifier	Precision	Recall	F-Measure
RF	1.000	1.000	1.000
NB	1.000	1.000	1.000
SVM	1.000	1.000	1.000
KNN	1.000	1.000	1.000
J48	0.999	0.999	0.999
FLR	1.000	1.000	1.000
AB	1.000	1.000	1.000

Table 8. Classification results of Arabic coffee based on its color information: L*, a*, and b*, obtained from 2STD simulated data.

Classifier	Precision	Recall	F-Measure
RF	0.968	0.968	0.968
NB	0.972	0.972	0.972
SVM	0.953	0.952	0.952
KNN	0.957	0.956	0.956
J48	0.964	0.964	0.964
FLR	0.803	0.755	0.752
AB	0.829	0.724	0.667

Table 9. Confusion matrices of four classifiers on the 2STD simulated color data.

	Light	Medium	Dark	Light	Medium	Dark
Light	1001	0	0	992	0	9
Medium	0	980	21	0	968	33
Dark	12	53	936	9	94	898
	KNN			RF
Light	991	0	10	993	1	7
Medium	0	959	42	0	969	32
Dark	8	71	922	10	47	944
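
To show how the precision, recall, and F-measure columns relate to a confusion matrix, the sketch below recomputes class-weighted scores from the matrix that appears under the RF label in Table 9. It assumes rows are actual classes and columns are predicted classes, and that the reported scores are averages weighted by class support. Both are assumptions, and `RF_MATRIX` and `weighted_scores` are hypothetical names.

```python
# Lower-right matrix of Table 9 (under the RF label).
# Assumed orientation: rows = actual class, columns = predicted class.
RF_MATRIX = [
    [993, 1, 7],     # actual light
    [0, 969, 32],    # actual medium
    [10, 47, 944],   # actual dark
]


def weighted_scores(matrix):
    """Return support-weighted precision, recall, and F-measure."""
    n = len(matrix)
    support = [sum(row) for row in matrix]  # actual count per class
    predicted = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    total = sum(support)
    precision = recall = f_measure = 0.0
    for k in range(n):
        tp = matrix[k][k]
        p = tp / predicted[k] if predicted[k] else 0.0
        r = tp / support[k] if support[k] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support[k] / total
        precision += weight * p
        recall += weight * r
        f_measure += weight * f
    return precision, recall, f_measure


p, r, f = weighted_scores(RF_MATRIX)
print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```

Under these assumptions all three scores round to 0.968, which agrees with the RF row of Table 8. Because the F-measure is averaged per class, it need not equal the harmonic mean of the averaged precision and recall.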

Table 10. Classification results of Arabic coffee based on its 2STD simulated antioxidant information: caffeine percentage, acrylamide concentration, and DPPH.

Classifier	Precision	Recall	F-Measure
RF	0.701	0.700	0.700
NB	0.740	0.740	0.740
SVM	0.678	0.660	0.662
KNN	0.640	0.641	0.640
J48	0.749	0.703	0.725
FLR	0.427	0.426	0.422
AB	0.548	0.537	0.445

Table 11. Classification results of Arabic coffee based on its 2STD simulated chemical composition information: moisture content %, ether extract %, crude protein %, crude fiber %, ash content %, and NFE %.

Classifier	Precision	Recall	F-Measure
RF	0.991	0.991	0.991
NB	0.993	0.993	0.993
SVM	0.967	0.998	0.982
KNN	0.983	0.983	0.983
J48	0.985	0.985	0.985
FLR	0.897	0.873	0.870
AB	0.985	0.985	0.985

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

