1. Introduction
Arabic coffee is an important part of Arab culture and tradition, particularly in Saudi Arabia and the surrounding Arab countries, where it is said to be the most popular hot beverage [1]. Arabica coffee beans are the primary ingredient [2]. An estimated 1.4 billion cups of coffee are consumed daily around the world. Coffee consumption has grown significantly in Saudi Arabia, where 18,000 tons of coffee are imported each year at a cost of SAR 54 million [3]. According to recent estimates, the typical Saudi adult consumes between 60 and 300 mL of Arabic coffee in one serving, amounting to 1.6 kg of coffee per person yearly [2].
Coffee drinking has certain beneficial effects on human health as a result of its biochemical features [4,5], especially with respect to non-communicable diseases [6]. There is evidence that drinking coffee can improve several functions, such as memory, mood, and cognitive performance [7,8,9].
On the other hand, coffee can raise LDL-C and total cholesterol, which can increase the risk of cardiovascular disease [10,11]. Additionally, drinking Arabic coffee has been linked to a significant increase in osteoporosis among Saudi females over the age of 40 [12]. Regardless of whether drinking coffee is harmful or beneficial, it is widely consumed worldwide, and Arabic coffee in particular is widely consumed in Saudi Arabia.
Arabic coffee comes in several types with varying levels of roasting. The roasting temperature and roasting time determine the degree of roasting, and hence whether the coffee is a light, medium, or dark roast in terms of color [13]. The biological activity and chemical composition of the coffee can be altered significantly during the roasting process. According to Wang et al. [14], certain components, such as natural phenolic compounds, can be lost, while other antioxidant compounds, including Maillard reaction products, can be generated. As a result, antioxidant activity can be either preserved or enhanced.
Hence, it is important to be able to classify coffee into one of its three common roasting degrees (light, medium, or dark), since each has a different effect on health. For instance, Alamri et al. [3] recommended medium-roast Arabic coffee as having a high concentration of compounds with antioxidant and other biologically beneficial effects.
Alamri et al. [3] examined the chemical composition of typically roasted Arabic coffee at three distinct roasting levels (light, medium, and dark) using a variety of methods. They measured the caffeine content using a UV-visible spectrophotometer, the acrylamide content using a gas chromatograph, and the free radical scavenging activity using the 1,1-diphenyl-2-picrylhydrazyl (DPPH) assay. They also used gas chromatography-mass spectrometry (GC-MS) to separate and identify the volatile components of the coffee samples, and they estimated the browning index. A temperature of 180 ± 10 °C was held for around
min for the light roast,
min for the medium roast, and
min for the dark roast. Their laboratory experiments were conducted on nine Arabic coffee samples, three for each roasting level. According to their findings, roasting alters the chemical composition of Arabic coffee, including moisture content, ether extract, crude protein, crude fiber, ash content, nitrogen-free extract (NFE), caffeine content, acrylamide level, and DPPH activity; Table 1 and Table 2 show that each roasting degree has distinct chemical component values. In addition, they provided distinctive color information for each coffee type using the L* a* b* color system and the browning index, as shown in Table 3.
Several studies in the literature classify Arabic coffee, and coffee in general, using various sources of information and technologies, including coffee bean images [15,16,17,18], infrared spectroscopy [19,20,21,22,23], and electronic noses [24,25,26]. To the best of our knowledge, however, none of them addresses the classification of Arabic coffee based on its chemical composition.
The aim of this paper is to investigate the classification of Arabic coffee using its chemical compounds. For this purpose, we use the data reported in [3] (shown in Table 1, Table 2 and Table 3) to generate new samples for each type of Arabic coffee using the Monte Carlo method, because only the mean ± standard deviation of each feature is available. Then, using the simulated data, we employ a variety of machine-learning approaches to classify Arabic coffee; feature importance is also explored to determine which chemical components have the most influence on the classification. It is worth noting that obtaining a sufficiently large sample of real data for each chemical test requires laboratory testing, which comes at a very high cost that we cannot afford at present.
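The sampling step can be illustrated with a minimal sketch. It assumes that each feature is drawn independently from a normal distribution defined by its reported mean and standard deviation; the caffeine values are those quoted from Table 1 later in the text, and 1001 samples per class matches the counts shown with the decision rules in Section 4. The function name `simulate_feature` and the `sd_scale` parameter are illustrative and do not come from the original study.

```python
import random

# Caffeine % (mean, standard deviation) per roast level, as quoted from Table 1.
CAFFEINE = {
    "light": (1.13, 0.02),
    "medium": (1.17, 0.07),
    "dark": (1.08, 0.06),
}


def simulate_feature(stats, n_per_class, sd_scale=1.0, seed=0):
    """Draw n_per_class values per class from N(mean, (sd_scale * sd)^2).

    Assumption: the feature is normally distributed and independent of
    all other features; neither property is verified here.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    samples = []
    for label, (mean, sd) in stats.items():
        for _ in range(n_per_class):
            samples.append((rng.gauss(mean, sd_scale * sd), label))
    return samples


if __name__ == "__main__":
    data = simulate_feature(CAFFEINE, n_per_class=1001)
    print(len(data), "simulated samples; first:", data[0])
```

Repeating the draw for every feature in Table 1, Table 2 and Table 3 would yield one simulated feature vector per sample; correlations between features are ignored under this assumption.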
The contributions of this study can be summarized as follows:
According to the literature, the majority of coffee sample classification methods presented are based on shape, size, color, infrared spectroscopy, and/or aroma. To the best of our knowledge, none of them address the chemical component classification of Arabic coffee samples.
Our study made some notable findings after conducting a large number of experiments:
Color information alone was sufficient to correctly classify Arabic coffee into three categories. The wide range of CIE color values and their high association with the coffee classes both contributed to the flawless classification results.
It was found that antioxidant information alone was insufficient for accurate coffee classification. Using only antioxidant information resulted in a considerable decrease in classification performance.
Chemical composition data exhibited significant discrimination power in identifying Arabic coffee types. The use of chemical composition features alone resulted in flawless categorization results, emphasizing their importance and separability among coffee categories. Specific characteristics, such as crude protein and crude fiber, showed high relationships, emphasizing their importance in the classification process.
As the data complexity increased, classification performance on the antioxidant information declined and the choice of classifier became increasingly important. The overlapping data and complex decision tree rules highlighted the challenges of dealing with complex data and underscored the need for selecting proper classifiers.
These findings contribute to a better understanding of Arabic coffee classification and emphasize the importance of color information and chemical composition, as well as the difficulties associated with antioxidant data, for achieving correct classification results.
4. Results
The chemical tests conducted in the laboratory for the three types of coffee by [
3] can be classified into three categories of information as follows:
Arabic coffee color information: This category includes the browning index, which is determined using a UV-Vis spectrophotometer at a wavelength of 420 nm, and color information based on the International Commission on Illumination (CIE) color system, which involves the L*, a*, and b* channels.
Arabic coffee antioxidant information: This category comprises the caffeine percentage, acrylamide concentration (measured in mg per 100 g), and DPPH.
Arabic coffee chemical compositions: This category includes the percentage of moisture content, percentage of ether extract, percentage of crude protein, percentage of crude fiber, percentage of ash content, and percentage of NFE.
A series of experiments focusing on the aforementioned types of information was carried out to evaluate the proposed Arabic coffee classification system based on its chemical properties. All experiments used machine learning methods with their default parameters and were evaluated with 10-fold cross-validation. Applying the same protocol throughout kept the evaluation consistent across all experiments.
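For readers unfamiliar with the protocol, the following is a minimal sketch of 10-fold cross-validation. The nearest-centroid classifier and the toy two-feature data are stand-ins chosen only to keep the example self-contained; they are not the classifiers or the data used in this study, and all function names are illustrative.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Return the mean accuracy over k train/test splits."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def fit_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroid(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda c: sq_dist(centroids[c]))


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy, well-separated data; the class centers are illustrative only.
    centers = {"light": (10.0, 20.0), "medium": (14.0, 28.0), "dark": (18.0, 24.0)}
    X, y = [], []
    for label, (m1, m2) in centers.items():
        for _ in range(100):
            X.append([rng.gauss(m1, 0.5), rng.gauss(m2, 0.5)])
            y.append(label)
    print("mean 10-fold accuracy:", cross_validate(X, y, fit_centroids, predict_centroid))
```

Each sample is used for testing exactly once and for training in the other nine folds, so the reported accuracy is the average over ten held-out evaluations.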
Table 5 shows the classification results of Arabic coffee based on its color information: L*, a*, and b*. We did not add the browning index to this information category because it can be assessed from the L*, a*, and b* channels according to [
60].
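As an illustration of how a browning index can be derived from the three color channels, the sketch below uses a widely cited colorimetric formula, BI = 100 (x − 0.31) / 0.172 with x = (a* + 1.75 L*) / (5.645 L* + a* − 3.012 b*). Whether this is exactly the expression given in [60] is an assumption, and this colorimetric index is not the same quantity as the absorbance at 420 nm described above.

```python
def browning_index(L, a, b):
    """Colorimetric browning index from CIE L*, a*, b* values.

    Assumed formula (common in the food-colour literature):
        x  = (a + 1.75 L) / (5.645 L + a - 3.012 b)
        BI = 100 * (x - 0.31) / 0.172
    """
    denominator = 5.645 * L + a - 3.012 * b
    if denominator == 0:
        raise ValueError("L*, a*, b* combination gives a zero denominator")
    x = (a + 1.75 * L) / denominator
    return 100.0 * (x - 0.31) / 0.172


if __name__ == "__main__":
    # A neutral gray (a* = b* = 0) gives an index close to zero.
    print(browning_index(50.0, 0.0, 0.0))
    # Shifting toward red/yellow raises the index.
    print(browning_index(50.0, 10.0, 30.0))
```

Because the index is a deterministic function of L*, a*, and b*, it adds no independent information to a classifier that already receives the three channels.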
According to the data presented in Table 5, color information alone was sufficient to classify Arabic coffee flawlessly into its three types: light, medium, and dark. This observation can be attributed to the fact that the CIE colors of these three classes are easily differentiated. Furthermore, Figure 2 depicts the high correlation between color information and coffee classes, underlining the association between color and coffee classification. Additionally, Figure 3 demonstrates the high importance of color information in discriminating between coffee types.
Table 6 presents the classification results of Arabic coffee based on its antioxidant information: caffeine percentage, acrylamide concentration, and DPPH.
The results presented in
Table 6 show that when both the color and chemical composition information were eliminated, leaving only the antioxidant information, the classification results significantly decreased. The performance of the classifiers varied, with the best performer being NB, followed by RF. This means that depending exclusively on antioxidant information is insufficient for accurate Arabic coffee classification. The correlation analysis illustrated in
Figure 2 and
Figure 3 supports this observation. The correlation values and importance rankings show that variables like caffeine percentage and acrylamide concentration are less important in the classification process and have weaker correlations with the dependent variable (coffee class).
A closer look at Figure 1 reveals that some of the curves depicting these features for the three types of Arabic coffee overlap significantly. This overlap implies that the values of these features are not distinct enough to create clear boundaries between the classes. As a result, relying entirely on antioxidant features would almost certainly result in misclassifications and lower classification accuracy. The difficulty of identifying Arabic coffee from antioxidant information can be attributed mostly to the close proximity of these variables' actual values. As shown in Table 1, the caffeine percentages of light, medium, and dark coffee are reported as 1.13 ± 0.02, 1.17 ± 0.07, and 1.08 ± 0.06, respectively.
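The degree of overlap between these caffeine distributions can be quantified with a short sketch. It assumes each class is normally distributed with the reported mean and standard deviation and uses the overlapping coefficient provided by Python's `statistics.NormalDist`; this measure is an illustration and was not part of the original analysis.

```python
from itertools import combinations
from statistics import NormalDist

# Caffeine % per roast level, modelled as normal distributions (assumption).
CAFFEINE = {
    "light": NormalDist(1.13, 0.02),
    "medium": NormalDist(1.17, 0.07),
    "dark": NormalDist(1.08, 0.06),
}


def pairwise_overlap(distributions):
    """Overlapping coefficient (0 = disjoint, 1 = identical) for each class pair."""
    return {
        (first, second): distributions[first].overlap(distributions[second])
        for first, second in combinations(distributions, 2)
    }


if __name__ == "__main__":
    for pair, coefficient in pairwise_overlap(CAFFEINE).items():
        print(pair, round(coefficient, 2))
```

A coefficient well above zero for every pair means that no threshold on caffeine alone can separate the classes cleanly, which is consistent with the lower accuracies in Table 6.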
Table 7 presents the classification results of Arabic coffee based on its chemical composition information: moisture content %, ether extract %, crude protein %, crude fiber %, ash content %, and NFE %.
According to the data in Table 7, the classification results of all classifiers are perfect, with the exception of J48, which achieved a classification accuracy of 0.999. However, this classifier achieved perfect training accuracy by learning the following rules:
If CrudeProtein ≤ 11.283023, then class = 3 (1001 out of 1001 examples).
If CrudeProtein > 11.283023 and CrudeFiber ≤ 25.642686, then class = 1 (1001 out of 1001 examples).
If CrudeProtein > 11.283023 and CrudeFiber > 25.642686, then class = 2 (1001 out of 1001 examples).
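Written as code, the three rules above amount to two threshold tests. The sketch below is a direct transcription; the numeric class labels are kept as reported, since the mapping of 1, 2, and 3 to roast levels is not stated in this section, and the function name is illustrative.

```python
def classify_by_j48_rules(crude_protein, crude_fiber):
    """Apply the three reported J48 rules and return the class label (1, 2, or 3)."""
    if crude_protein <= 11.283023:
        return 3
    if crude_fiber <= 25.642686:
        return 1
    return 2


if __name__ == "__main__":
    # Illustrative inputs only; they are not measurements from the study.
    print(classify_by_j48_rules(11.0, 30.0))
    print(classify_by_j48_rules(12.0, 25.0))
    print(classify_by_j48_rules(12.0, 26.0))
```

Only two of the six chemical composition features appear in the tree, which is why crude protein and crude fiber alone suffice on the 1STD simulated data.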
This level of agreement and high accuracy across classifier families shows that the training data used in the experiments were distinct and well separated. Based on these outcomes, chemical composition data alone were sufficient to classify Arabic coffee flawlessly into three types: light, medium, and dark. This high accuracy is most likely due to the distinctive characteristics of the chemical composition information for these classes.
Figure 2 also shows a substantial link between chemical composition information and coffee classes. The percentages of crude protein and crude fiber, in particular, have strong correlation values of
and 0.80, respectively. This demonstrates the connection and discriminatory power of these variables in classifying coffee types.
Furthermore,
Figure 3 shows that when compared to other features, the percentage of crude protein ranks first, and the percentage of crude fiber is ranked third. This highlights the importance and discriminative capacity of this type of information in the classification of coffee types, particularly, crude protein and crude fiber, which are extremely separable as shown in
Figure 4 and can achieve perfect classification alone, even without the need for the other variables.
Based on the preceding results, it is evident that the simulated data used for classification are clean and well suited for various types of classifiers, owing to the features’ strong discriminatory power. This is especially true for the categories of color information and chemical composition information.
The standard deviation of the normal distribution for each feature was doubled (2STD) to bring more unpredictability and variability into the simulated data, resulting in a dataset with higher dispersion. By providing more variability in the feature values, this modification attempts to increase diversity and make the classification task more challenging. The classification results of the new 2STD simulated data are shown in
Table 8,
Table 9,
Table 10 and
Table 11.
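The effect of doubling the standard deviation can be illustrated on a single feature. The sketch below simulates the caffeine percentage at one and at two standard deviations and scores a simple nearest-class-mean rule; this rule is a stand-in chosen for brevity and is not one of the classifiers evaluated in this study, so the printed accuracies are not comparable with the tables.

```python
import random

# Caffeine % (mean, standard deviation) per roast level, as quoted from Table 1.
CAFFEINE = {"light": (1.13, 0.02), "medium": (1.17, 0.07), "dark": (1.08, 0.06)}


def nearest_mean_accuracy(stats, sd_scale, n_per_class=1001, seed=0):
    """Simulate one feature and score a nearest-class-mean rule on it.

    sd_scale = 1.0 corresponds to the 1STD data, sd_scale = 2.0 to the 2STD data.
    """
    rng = random.Random(seed)
    correct = total = 0
    for label, (mean, sd) in stats.items():
        for _ in range(n_per_class):
            value = rng.gauss(mean, sd_scale * sd)
            # Predict the class whose reported mean is closest to the value.
            guess = min(stats, key=lambda c: abs(value - stats[c][0]))
            correct += guess == label
            total += 1
    return correct / total


if __name__ == "__main__":
    print("1STD accuracy:", nearest_mean_accuracy(CAFFEINE, sd_scale=1.0))
    print("2STD accuracy:", nearest_mean_accuracy(CAFFEINE, sd_scale=2.0))
```

Widening each distribution while keeping the means fixed increases the overlap between classes, so the accuracy of any rule based on this feature falls.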
According to the results obtained from the newly generated data with doubled standard deviation, the classification performance for Arabic coffee based on the 2STD simulated chemical composition information is reduced by 1% to 3%. Similarly, depending on the classifier, the classification results based on the 2STD simulated color information are reduced by 2% to 5%. This decrease in performance might be due to the growing complexity of the data, which causes classifiers to differ in their performance. It is worth mentioning that clean data often yield superior classification accuracy regardless of the classifier used.
In the case of the 2STD simulated antioxidant information, the classification results show a drop of 16% to 20%, depending on the classifier utilized. The KNN classifier, in particular, performs poorly in this scenario due to its reliance on similarity across samples for classification, which becomes less accurate when dealing with complicated data.
This drop in performance can be traced primarily to the nature of the original data shown in
Table 1. The antioxidant characteristics in the table demonstrate no significant difference between the three coffee classes (light, medium, and dark). As a result, increasing the standard deviation to generate the 2STD simulated data reduces the distinctiveness and separability of the antioxidant characteristics, resulting in an overall decrease in classification accuracy.
When analyzing the antioxidant classification results, it is critical to recognize these constraints as well as the impact of data characteristics. In such circumstances, additional features or alternative methodologies may be necessary to improve classification performance.
The observed decline in classification performance, as well as the variation in performance among classifiers, emphasizes the need for careful classifier selection when dealing with complicated data.
Figure 4 plots crude fiber % against crude protein % for all 1STD simulated data points, showing how these two features are distributed across the coffee types. The patterns and separability of the data points in Figure 4 reveal the discriminative power of crude fiber % and crude protein % when classifying the coffee types.
On the other hand, Figure 5 depicts the same features (crude fiber % and crude protein %) for the 2STD simulated data, where the data points are more complex and overlap more. The overlap and growing complexity visible in Figure 5 demonstrate the difficulties of dealing with complicated data. This underscores the significance of proper classifier selection, as the performance of different classifiers may diverge as data complexity increases.
This intricacy is illustrated further in
Figure 6 by the decision tree of J48, where the rules become more intricate. However, it is worth mentioning that the increase in data complexity has no significant effect on classification accuracy for color information and chemical composition.
5. Conclusions
In this paper, we evaluated the feasibility of classifying Arabic coffee into three major varieties (light, medium, and dark) using simulated data based on the actual color, antioxidant, and chemical composition measurements of Alamri et al. [3]. We created two sets of simulated data using the Monte Carlo approach: the first used the reported standard deviation of each measure, and the second used twice that standard deviation. We classified both simulated datasets using a number of commonly used classifiers.
Based on our large set of experiments, several key findings emerged:
The color information alone proved to be quite successful in accurately classifying Arabic coffee into three types: light, medium, and dark. The different CIE color values, as well as the excellent correlation between color information and coffee classes, were critical in achieving faultless classification outcomes.
When the antioxidant information was considered alone, it became clear that this type of information is insufficient for correct coffee classification. The results showed that relying simply on antioxidant information reduces classification performance significantly and different classifiers perform differently. The KNN classifier, in particular, struggled with the additional data complexity (2STD simulated data).
In identifying Arabic coffee types, the chemical composition information revealed significant discriminatory strength. This information alone produced flawless classification results, demonstrating its significance and separability among the coffee groups. The substantial correlation between certain chemical composition features, such as crude protein and crude fiber, underscores their importance in the classification process even more.
Classification performance regarding the antioxidant information and the choice of classifier grew more relevant as the data complexity rose. The overlapping curves and increasingly elaborate decision tree rules demonstrated the difficulties encountered when dealing with complex data and the importance of proper classifier selection.
Based on these results, color information alone was able to reliably identify the type of coffee, which shows its potential for practical application. Using this knowledge, it is possible to create a smartphone application that analyzes the color of a coffee sample using image recognition technology. Such an application could allow end users, such as consumers, to use their smartphone's camera to take an image of their coffee. The application would then transform the image's RGB pixel values to the CIE color space, extracting the color information required for classification. After recognizing the coffee type, the application could provide the consumer with useful information on the chemical components of their coffee, such as caffeine content, antioxidant levels, moisture content, and other pertinent chemical compositions. By delivering such information, our future mobile application would give users a convenient tool for understanding the qualities of their coffee and making informed decisions according to their tastes. Furthermore, it has the potential to increase transparency in the coffee sector, allowing customers to make more educated purchasing decisions.
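The color-space conversion step of such an application could look like the following minimal sketch. It assumes the camera delivers standard sRGB pixel values and uses the D65 reference white; a real application would also need white balancing and calibration against lighting conditions, which are not addressed here. The function name is illustrative.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (D65 reference white)."""

    def linearize(channel):
        # Undo the sRGB transfer curve to obtain linear light in [0, 1].
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB to CIE XYZ (standard D65 matrix).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # Piecewise cube-root function from the CIE L*a*b* definition.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 white point (Xn, Yn, Zn).
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


if __name__ == "__main__":
    # An arbitrary brown pixel, used only as an example input.
    print(srgb_to_lab(111, 78, 55))
```

Averaging the resulting L*, a*, and b* values over the region of the image that contains the coffee would give a three-value feature vector of the same kind as the color information used in this study.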
There are various limitations that could be addressed in future coffee classification studies:
The paper largely concentrated on color, antioxidant, and chemical composition information. Exploring additional features that could improve classification accuracy, such as aroma profiles, geographical origin, or processing methods, would be advantageous. Incorporating a larger variety of characteristics may result in a more comprehensive understanding of the factors determining coffee classification.
According to the findings, the distinctness and separability of certain variables, particularly antioxidant information, were limited, resulting in lower classification accuracy. Introducing more diverse and representative datasets with a broader range of actual (not simulated) antioxidant feature values could help alleviate this restriction and provide a more genuine portrayal of coffee variations.
The study focused on several off-the-shelf classifiers, although different algorithms or ensemble approaches could produce better results [61,62,63,64]. Exploring and comparing different classifier alternatives could provide useful insights into the most effective approaches to coffee classification.
This paper focused on Arabic coffee varieties and their classification. It would be beneficial to replicate the study using different coffee types from various regions of the world to improve the generalizability of the findings. This would help test the effectiveness of the classification algorithms across different coffee varieties and broaden the practical scope of the research.
In our investigation, we used simulated data instead of real-world samples. We understand that this approach may introduce variations and shortcomings, particularly in terms of accurately capturing the underlying distribution of the coffee samples and their chemical properties. While simulated data can be a viable option when laboratory testing for a large number of coffee samples is prohibitively expensive, we recognize the need to address any biases associated with this technique. We will make an attempt in the future to examine these constraints in greater depth and to investigate techniques to alleviate any deviations or uncertainties generated by using simulated data.
Future studies should look into factors other than color, antioxidants, and chemical composition. Incorporating odor profiles, origin information, and processing methods may increase classification accuracy and provide a more complete understanding of coffee differences.
To address these shortcomings, future studies might use more diverse and representative datasets, experiment with other classifiers, and go beyond Arabic coffee types to improve generalizability. Future work may also include the development of the mobile application for practical use.
Overall, this research adds to the field of coffee classification, provides practical insights, and identifies areas for further research and advancement.