Next Article in Journal
Highly Sensitive and Selective Graphene Nanoribbon Based Enzymatic Glucose Screen-Printed Electrochemical Sensor
Next Article in Special Issue
State of the Art on Developments of (Bio)Sensors and Analytical Methods for Rifamycin Antibiotics Determination
Previous Article in Journal
PRAGAN: Progressive Recurrent Attention GAN with Pretrained ViT Discriminator for Single-Image Deraining
Previous Article in Special Issue
A Self-Driven Microfluidic Chip for Ricin and Abrin Detection
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Multivariate Curve Resolution Alternating Least Squares Analysis of In Vivo Skin Raman Spectra

1
Department of Laser and Biotechnical Systems, Samara University, Samara 443086, Russia
2
Department of Oncology, Samara State Medical University, Samara 443099, Russia
3
Department of Visual Localization Tumors, Samara Regional Clinical Oncology Dispensary, Samara 443031, Russia
*
Author to whom correspondence should be addressed.
Submission received: 27 September 2022 / Revised: 2 December 2022 / Accepted: 2 December 2022 / Published: 7 December 2022
(This article belongs to the Special Issue Advances in Biosensor Technologies for Clinical Applications)

Abstract

:
In recent years, Raman spectroscopy has been used to study biological tissues. However, the analysis of experimental Raman spectra is still challenging, since the Raman spectra of most biological tissue components overlap significantly and it is difficult to separate individual components. New methods of analysis are needed that would allow for the decomposition of Raman spectra into components and the evaluation of their contribution. The aim of our work is to study the possibilities of the multivariate curve resolution alternating least squares (MCR-ALS) method for the analysis of skin tissues in vivo. We investigated the Raman spectra of human skin recorded using a portable conventional Raman spectroscopy setup. The MCR-ALS analysis was performed for the Raman spectra of normal skin, keratosis, basal cell carcinoma, malignant melanoma, and pigmented nevus. We obtained spectral profiles corresponding to the contribution of the optical system and skin components: melanin, proteins, lipids, water, etc. The obtained results show that the multivariate curve resolution alternating least squares analysis can provide new information on the biochemical profiles of skin tissues. Such information may be used in medical diagnostics to analyze Raman spectra with a low signal-to-noise ratio, as well as in various fields of science and industry for preprocessing Raman spectra to remove parasitic components.

1. Introduction

In recent years, methods such as Raman spectroscopy have been increasingly used to investigate biological tissues and fluids [1,2,3]. The Raman spectra of various biological tissues are specific: during the development of a neoplasm, some biochemical changes occur, which leads to changes in the measured Raman spectra. Previously we showed that Raman spectroscopy can be used in the diagnosis of skin cancer [4], kidney-failure detection [5], and for blood analysis [6]. These studies demonstrate that Raman spectroscopy can be used to assume diagnostic conclusions, i.e., to determine a disease or tumor type.
The use of Raman spectroscopy in medical diagnostics is facilitated by the development of technical means for recording Raman scattering, such as flexible fiber-optic Raman probes that can be effectively used in clinical conditions and portable spectrometers [7,8]. However, despite some advances in technical equipment, the analysis of experimental Raman spectra still remains challenging. The reason for this is that Raman spectra contain an extremely large amount of information on all the chemical substances that can be found in skin. Many components of skin tissue that are accepted as biomarkers have similar peak positions in Raman spectra and overlap the spectrum of other substances such as the extracellular matrix, etc. Moreover, in addition to differences caused by the disease development, there are also some within-group differences associated with comorbidities and skin phenotypes. It is also important to understand that the quality of the Raman spectra recorded using portable Raman probes is usually slightly lower than the spectra recorded using a Raman microscope [4,7]. This can be explained by the fact that Raman microscopy is usually applied to investigate thin histological sections of tissues, while in the case of thick samples or in vivo studies, in addition to the useful Raman signal, we also reveal the contribution from the surrounding components, which cannot be removed by focusing. The spectra may also contain a signal associated with the optical system.
In this research, we use a portable Raman setup [7]. The methodology for the evaluation of the effectiveness of a portable setup and SNR estimation are described in [7].
Among the methods of Raman spectra analysis, principal component analysis (PCA) can be noted. It is an orthogonal linear transformation, which is commonly used for dimensionality reduction by projecting data onto the new principal components. The analysis of Raman spectra by the PCA was successfully applied in [9]. However, despite the positive experience, a significant limitation of this method in terms of its application to real data is non-physicality, i.e., the impossibility of the physical interpretation of the components, since they “absorb” the variance from several initial variables.
Another approach applied to the analysis of Raman spectra is neural networks. They are used as a classifier of diseases, as for example was undertaken in [10,11,12]. The main disadvantage is that the use of neural networks requires a lot of training and test datasets. For multivariate data discrimination, the method of discriminant analysis with the projection on latent structures (PLS-DA) is also widely used in chemometrics, biology, and medicine. It finds a linear regression model by projecting the predicted variables and the observable variables to a new space. In other words, the PLS-DA is able to construct a regression model based on the significant spectral differences [4,5,6,7,13]. However, these black-box approaches do not essentially analyze the spectra, focusing exclusively on the classification problem, and the classification criteria in many cases remain unknown.
To provide some clear, physically-based rules for discriminating Raman spectra, it is preferable to use white-box approaches when we know and understand how the algorithm of analysis works and in which physical or medical criteria are used to classify spectra. This approach is shown in [14], where a two-step phase-type method based on the analysis of spectral intensity alteration in the tumor tissue in comparison with the healthy tissue near the lesion is used for tumor detection and identification.
However, the white-box approach has limitations, due to the lack of information on the objects under study—in this case, skin tissue and tumors. We do not have complete data on the composition of the skin and the changes in composition that occur with the development of the disease. Using only well-known and understandable criteria to analyze Raman spectra, we risk losing classification accuracy as a result.
In this paper, we propose using a gray-box approach, when a model has a partial theoretical structure and some unknown parts derived from the data. To analyze the low signal-to-noise ratio (SNR) Raman spectra of skin, we propose using the multivariate curve resolution alternating least squares (MCR-ALS) method.
The MCR-ALS method is widely used to reconstruct concentration profiles in chemical analysis [15,16]. The main advantage of MCR analysis in a fully unsupervised mode with the unknown basis is its ability to physically interpret the components into which the original Raman spectra are divided. As a result of this analysis, we obtain the spectra of the substances constituting the sample and their contribution to the Raman spectrum. Their contribution, in fact, is directly related to the real concentrations of the substances.
On the other hand, the MCR-ALS analysis cannot be explicitly called a white box. Therefore, the research problem is to choose the optimal number of the components into which the spectra will be unmixed. It may depend on both the study objective and the characteristics of the Raman spectra. The MCR-ALS analysis combines the white-box and the black-box approaches and thus can be called a ‘gray-box’ approach.
Recently, the MCR-ALS method has found a wide application in biology and medicine to analyze spectral data, when it is necessary to determine the concentrations of components of a mixture from the spectra [16,17]. For example, Xu and Rice [18] used MCR-ALS spectral unmixing in fluorescence imaging. Chen et al. [19] used the Raman spectroscopic detection of keratin with the MCR-ALS analysis for automatic oral cancer diagnosis. Iwasaki et al. [20] investigated the possibilities of discriminating breast-cancer cells from normal mammary epithelial cells by Raman microspectroscopy and MCR-ALS analysis. In [21], samples of biological tissues were studied in situ using a Raman scanning microscope. Using the MCR-ALS method, the authors were able to obtain the spectra of Raman active skin components involved in biochemical changes during the development of various neoplasms. In [22,23], the researchers used MCR-ALS for the quantitative analysis of pharmaceutical and agricultural samples, as well as for vibrational spectroscopic image analysis of biological material.
This work is devoted to the study of the possibilities of the MCR-ALS method for analyzing skin tissues in vivo. In a clinical setting, high-quality recording of Raman spectra can be difficult due to a large patient flow and the limited time available for patient examination. An SNR is determined by multiple factors, including the laser power, the detector cooling temperature, and the signal-accumulation time. It was revealed that there is a linear relationship between SNR and temperature (see [24]). Although it is possible to increase the SNR due to the strong cooling of the detector, the portability will be lost. Raman microscopy is usually used to investigate thin sections of tissues, but if we investigate tissues in vivo, it is expensive and, in most cases, inefficient to use Raman microscopy. Therefore, it can be assumed that the skin Raman spectra subjected to analysis by the MCR-ALS method will have a rather low SNR.
For example, in [4,5,6,7], a portable spectroscopic device able to record Raman spectra, with a signal-to-noise ratio equal to three, is used to record the Raman spectra of the skin. We take into account the fact that the Raman spectra of some of skin-tissue components also have similar peaks and bands, for example, collagen, elastin, and keratin [21,25,26,27]. Nevertheless, we believe that it is possible to distinguish groups of components, for example, proteins, the moisturizing factor, and pigments (melanin), since some of the Raman peaks of the selected groups are located in different spectral bands. Our results show the MCR-ALS analysis of experimental in vivo Raman spectra of normal skin and tumors as well as the spectra which correspond to the contribution of the optical system and skin components. In addition, we investigated the possibilities of using the spectral profiles obtained by MCR-ALS analysis to classify skin tissues.

2. Materials and Methods

2.1. Experimental Data

The Raman spectra were recorded using a portable spectroscopic setup, which includes a thermally stabilized LML-785.0RB-04 diode laser module as an excitation source (785 ± 0.1 nm central wavelength) and a QE 65 Pro spectrometer (OceanOptics, Inc., Dunedin, FL, USA) with a CCD detector operating at −15 °C [4]. Spectra registration was carried out with a spectral resolution of 0.2 nm in the range from 837 to 920 nm, which corresponds to 792–1874 cm−1.
All in vivo spectra measurements were performed with a signal-accumulation time of 60 s. The laser power density on the skin was about 0.3 W/cm2, which does not exceed the ANSI maximum permissible exposure limit of 1.63 W/cm2 and does not cause any damage to the skin or discomfort to patients.
In this case, the SNR of the recorded Raman spectra is 3 in accordance with the methodology for SNR estimation described in [7].
In Section 3.1, we used the recorded raw spectra, and in Section 3.2 and Section 3.3, the spectra were subjected to preprocessing in the following way: the spectra were cut in the range from 860 to 920 nm, which corresponds to 1114–1874 cm−1. The spectra were then preprocessed with baseline removal with asymmetric least squares (lambda = 6, p = 0.1, number of iterations = 10) and smoothing by the Savitzky–Golay method (derivative order = 0, window width = 15, polynomial order = 1) using the baseline package of the RStudio software [28]. An example of the experimental Raman spectra of normal skin before and after preprocessing is shown in Figure 1. It is clearly seen that before preprocessing, in addition to the useful Raman signal, the spectra also contain a fluorescence signal.
It should be noted that preprocessing methods can probably influence the results of the MCR-ALS. A particularly high influence is observed for cases with a fluorescence background and without fluorescence (see Section 3.1 and Section 3.2). In our case, preprocessing increases the number of reconstructed components (see Figure 2). The issue of the influence of preprocessing methods on the results of MCR-ALS analysis of Raman spectra of the skin is subject to separate research.
The in vivo study of skin tumors was performed at Samara Regional Clinical Oncology Dispensary. The study involved 540 patients. The Raman spectra of normal skin and skin neoplasm were recorded for each patient. In total, we used 1000 spectra: 540 of normal skin (NS), 113 of keratosis (K), 122 of basal cell carcinoma (BCC), 67 of malignant melanoma (MM), and 158 of pigmented nevus (PN). All in vivo studies were conducted on patients over 18 years of age. Informed consents were acquired from all the patients before the in vivo study. The studies were approved by the Ethics Committee of Samara State Medical University (Samara, Russia).
In addition to the Raman spectra, we have obtained information on the actual diagnoses for each patient. These diagnoses were made by the doctors of Samara Regional Clinical Oncology Dispensary using histological examination of tumor samples.

2.2. Multivariate Curve Resolution Alternating Least Squares Analysis

In this study, we use the multivariate curve resolution alternating least squares (MCR-ALS) method to analyze the Raman spectra of skin tissue and distinguish the disease markers. MCR is a group of methods which can be used to solve the curve resolution problem in spectroscopy, and ALS is an algorithm that is used for the MCR basic bilinear model.
The MCR problem can be defined as follows. Suppose we have a mixture of A components. Every individual component is usually called a ‘pure component’. Every pure component has a spectrum, which can be represented as a column vector si with size J, which is a number of values in each spectrum (corresponding to the number of wavelengths or wavenumbers).
If we mix the components into one mixture and take a spectrum of this mixture, the spectrum will be just a linear combination of the spectra of the pure components [23]. This can be written as follows:
d = c1sT1 + c2sT2 + ⋯ + cAsTA,
where d is a vector of spectral values representing the spectrum of the mixture (J); c1, c2, …, cA are concentrations of the pure components in the mixture; and s1, s2, …, sA are the spectra of the pure components.
The equation can be written in a more compact form:
D = CST,
where C is a matrix of concentrations of the pure components in the mixture, and S is a matrix of the spectra of the pure components.
The task of the MCR methods is to get C and S by knowing D. This is not a trivial task, as the expression above does not have a unique solution. For example, one of the solutions is what the PCA gives; however, the scores do not correspond to the real concentration values nor do the loadings represent the spectra of the pure components. In fact, it is impossible to obtain C and S precisely; what we obtain is a sort of estimate, and we can rewrite the equation as:
D = C∙ST + E,
where E is the error matrix [23].
The main advantage of MCR analysis is the ability to physically interpret the components into which the original Raman spectra are divided. As a result of the analysis, we obtain the spectra of the substances constituting the sample and their contribution to the Raman spectrum. Its contribution, in fact, is directly related to the real concentrations of the substances. Among the limitations of the MCR method is the difficulty of unmixing substances with similar Raman spectra, when the substance does not contain unique Raman peaks in the spectrum [23]. In addition, the research problem is to choose the optimal number of components into which the spectra will be unmixed. It may depend on both the study objective and the characteristics of the Raman spectra.
It should be specially noted that MCR analysis could be sensitive to noise in the analyzed Raman spectra. In practical application, the efficiency of evaluating the component concentration in a test sample can decrease due to the fact that the spectra contain a noise signal caused by the optical system or the fluorescence signal.
Previously, we studied the possibility of applying the MCR-ALS method to the analysis of Raman spectra of proteinogenic amino acid mixtures [29] and the effect of noise in Raman spectra on the analysis result [30]. Our studies demonstrate the possibility of unmixing several spectrally similar components of one complex mixture under noisy conditions. Using the MCR-ALS method makes it possible to successfully estimate the amino acid concentrations in a mixture, even when the Raman spectra have a low SNR (from 3 to 5).
For unmixing the spectra, we used the MCR-ALS GUI v4c by András Gorzsás [31]. To analyze experimental skin Raman spectra, we utilized the MCR-ALS method, and in this case we do not know the exact number of substances contained in the skin sample. Feng et al. [21] report that eight components captured the skin constituents as measured in in vivo human skin cancers. However, this also depends on the characteristics of the Raman spectra. Therefore, it should be investigated how many components it is necessary to obtain in the original Raman spectra for an optimal result.
This can be undertaken based on a priori knowledge or by analyzing the dataset using singular value decomposition (SVD) [31]. Figure 2 shows the numerical results of the singular value decomposition for set of raw Raman spectra and set of preprocessed Raman spectra. The number of components is usually selected, after which a large drop in eigenvalues is observed or the eigenvalues approximate 0. In both cases, Figure 2 displays a gradual decrease with no abrupt drop in eigenvalues. Thus, at first glance, the number of components for set of raw Raman spectra can reasonably be assumed to be 2 (3 is already close to 0 on the SVD plot). The number of components for a set of preprocessed spectra can reasonably be assumed to be 5–6 (7 is already close to 0 on the SVD plot). That is, in our case, preprocessing increases the number of reconstructed components (see Figure 2). However, it is further shown that the choice of the number of components according to the SVD results does not allow the MCR analysis to reconstruct diagnostically useful spectral profiles (see Section 3.1). Therefore, we chose the number of components based on a priori knowledge and other considerations.
The estimation of the initial values was performed using a SIMPLISMA-based algorithm with a default value of 10% allowed noise [32].
As constraints, we used a default spectrum equal-length closure constraint (this is the most common closure constraint for general spectroscopic data and means an equal spectral-area normalization for the MCR-ALS algorithm to avoid possible scale indeterminacies) and a non-negativity constraint in both the spectrum and the concentration profiles, which is also hard-coded in the GUI with no user control. It is implemented as a mild constraint, using a fast non-negative least squares algorithm [31].
In the case of using the spectrum basis in Section 3.3 (which is described below), we also used equality constraints for spectra profiles. Equality constraints allow the user to fix values in the spectral profiles, such as the spectrum of a known compound present in the dataset [31].
The maximum number of iterations that will be performed, if convergence or divergence does not happen before, was set to 150. The convergence criterion (the percentage of change in the standard deviation of residuals between two consecutive iterations) was set to 0.1 (default setting). In other words, the convergence is achieved when the change in the standard deviation of the residuals is lower or equal to 0.1% between two consecutive iterations.
After selecting the number of components, setting the constraints and the end conditions for the modelling, the MCR-ALS algorithm can be started. After initial estimation is given for C, it is optimized iteratively using the ALS algorithm until the convergence condition is met and the maximum number of iterations is reached [31].
In Section 3.1, we use 250 in vivo Raman spectra: 100 of normal skin, 90 of BCC, and 60 of MM. The Raman spectra were not preprocessed, which means that the signal associated with fluorescence was retained. The spectra were decomposed by the MCR-ALS method without using a basis. The number of the decomposed spectra varied, and for visual analysis of the obtained spectra, the number of the decomposed spectra was chosen to be nine as an optimal solution.
In Section 3.2, the preprocessed Raman spectra were studied: 540 spectra of normal skin, 113 of K, 122 of BCC, 67 of MM, and 158 of PN. The spectra of normal skin were divided into a different number of components from 4 to 30, and the spectra of the diseases into 5 components.
In Section 3.3, we also investigate 540 spectra of normal skin, 113 of K, 122 of BCC, 67 of MM, and 158 of PN. However, there is one important difference between Section 3.2 and Section 3.3. In Section 3.2, only matrix D is known, while matrices C and ST remain undefined. In Section 3.3 we also define matrix ST.
In other words, in Section 3.2. we only know the matrix of the original Raman spectra, that is, the matrix of mixtures, and the matrices of concentrations of the components and the spectra of the components are not known. In Section 3.3, in addition to the matrix of mixtures, we also know the matrix of the spectra of the components; only the matrix of the concentrations of these components remains unknown. In this case, the matrix of the spectra of the components is called a ‘basis’ or ‘basic spectra’.
The fact is that in the case of MCR-ALS analysis, without using a basis we cannot unambiguously determine into which specific components the Raman spectra are divided (for example, to which specific substance each selected component corresponds). Using a basis, we can solve this problem. In this case, the set of components into which the Raman spectra are divided remains fixed and uniquely determined. As a result, we can unambiguously determine what the concentrations of the components in the tissue sample are. The disadvantage of this approach is the fact that we often do not know what components the tissue sample consists of, so it is not possible to set the basis in advance.
We solve this problem in the following way. First, we subject the Raman spectra to MCR-ALS analysis without using a basis (see Section 3.2). In Section 3.3, as a basis, we use several spectra obtained in Section 3.2, that, in our opinion, most accurately represent the contribution of some skin components. In addition to the fixed bases, we set the possibility of using four additional bases, which are identified in advance. In other words, the Raman spectra were unmixed into eight components, four of which were strictly defined.
Spectra equality is used to fix spectral profiles. For all the components that are to be constrained, the row contains numbers (i.e., spectral intensities at each wavenumber); for unconstrained components, the row contains NaN values (see [31]).

2.3. Analysis of Results

As a result of the MCR-ALS method, two matrices were obtained: a matrix of the component concentrations and a matrix of their Raman spectra. It is important to understand that the values of the concentrations and intensities of the spectra were obtained in arbitrary units and vary for each individual analysis. The reason for this is the ambiguity of the bilinear transformation, as well as the fact that the raw Raman spectra are also recorded in arbitrary units. Thus, the Raman spectra were normalized with rescaling (min-max normalization, when the lowest intensity is set to 0, and the highest is set to 1) and analyzed for the presence of Raman bands and peaks as well as their ratio.
In Section 3.3, the results were presented using a box-plot diagrams with a spread of the concentrations and the receiver operating characteristic (ROC) curves AUC with 95% confidence interval (CI). The box-plot diagrams and ROC curves were plotted using the pROC package of the RStudio software [28].
A box plot shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be outliers.
An ROC curve illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity. The false-positive rate is also known as probability and can be calculated as (1 − specificity).
The area under the ROC curve (AUC) provides an aggregate value of how well the model correctly classifies the cases with all possible cutoff values. AUC values range between 0.5 and 1, where an area of 0.5 means that the model predicts which outcomes will be 1 or 0 no better than flipping a coin, and an area of 1 means that the model predicts perfectly.
Following this, the profiles of the obtained component concentrations were subjected to logistic regression. Logistic regression is a form of multiple regression whose general purpose is to analyze the relationship between the multiple independent variables (called predictors) and the dependent variable [33]. In our case, the predictors are the concentration profiles of the components representing their contribution to the Raman spectrum. We applied binary logistic regression for two cases:
  • malignant neoplasms versus benign neoplasms;
  • malignant melanoma (MM) versus pigmented neoplasms (K + PN).
This type of statistical model is often used for classification in medicine and biology. Binary logistic regression estimates the probability of an event occurring, such as melanoma or no melanoma, based on a given dataset of independent variables. In other words, the goal of binary logistic regression is to simply classify observations as one of the two possible outcomes that the model describes.
That is, for these two cases, we obtained the probabilities of the presence of a particular disease in each patient. Besides this, at the first step of logistic regression, the significance of all predictors was calculated, which represents their influence on the regression result, that is, on the predicted value. These probabilities were called predicted values. Since the outcome is a probability, the dependent variable is bounded between 0 and 1. Logistic regression was performed using the SPSS Statistics software [33].
In addition, the profiles of the component concentrations were subjected to partial least square discriminant analysis (PLS-DA) using the pROC package of the RStudio software [27].
Partial least squares discriminant analysis (PLS-DA) is one of the most widely used classification techniques in many fields, and generates a large amount of data, such as in spectroscopy. PLS-DA is a supervised clustering or classification method, used to optimize separation between different groups of samples, which is completed by linking two data matrices: X (i.e., raw data) and Y (i.e., classes or diagnoses). This method aims to maximize the covariance between the independent variables X and the corresponding dependent variable Y of highly multidimensional data by finding a linear subspace of the explanatory variables. The new subspace permits the prediction of the Y variable based on a reduced number of factors (PLS components).
To check the classification models stability, a 10-fold cross-validation was performed. The predicted values were presented using box-plot diagrams with a spread of the concentrations and the ROC curves. The box-plot diagrams and the ROC curves were plotted using the pROC package of RStudio software [28].

3. Results and Discussion

3.1. Unmixing the In Vivo Raman Spectra without Processing

As described in Section 2.2, we conducted the MCR-ALS analysis of 250 raw in vivo Raman spectra: 100 of normal skin, 90 of BCC, and 60 of MM. The spectra were decomposed without using a basis. The number of the decomposed spectra, i.e., the spectra of pure components, varied, and using the visual analysis of the obtained spectra, the number of pure components was chosen to be nine as an optimal solution. The spectra of these components are shown in Figure 3.
As can be seen in Figure 3, most of the unmixed components correspond to the fluorescence signal (see Components 2, 3, 4, 5, 6). This is fully consistent with the known information on the significant contribution of fluorescence to the total signal and indicates the need to remove the fluorescence signal from the spectrum if we need to study the Raman signal. Moreover, in most of the spectra obtained (see Components 3, 4, 5, 6, 7, 9), a sinusoidal component is visible. We believe that this component is related to the contribution of the optical system. The reason may be optical filters in the setup for recording Raman spectra.
As an example of such a sinusoidal component, Figure 4 shows the spectrum of the 785 nm RazorEdge® ultrasteep long-pass edge filter by Semrock [34]. A component representing the contribution of optics (but without a sinusoidal pattern) was obtained in [35], where the authors applied a non-negative matrix factorization (NNMF) algorithm to the in vivo Raman spectra of the upper epidermis. It can be assumed that the source for the sinusoidal pattern is an etaloning effect from the back-thinned CCD of the spectrometer. The reason for this sinusoidal effect and ways to reduce it will be the focus of our further research.
Section 3.2 gives evidence that the preprocessing we use is not able to filter out this component completely.
In addition, in Figure 3, Raman peaks are seen in the spectra of Components 7 and 8. The peaks at 1316 and 1660 cm−1 are included in the bands of amide III and amide I and correspond to proteins [25]. If we talk about the components of the skin, then these peaks can correspond to collagen, elastin, and keratin [21]. The peak at 1450 cm−1 is associated with bending CH2/CH3, scissoring CH2/CH3, and bending CH2. This peak may correspond to the lipids that protect the skin from environmental factors. Those may be triolein or ceramide [21]. Proteins and lipids are the main constituents of the upper layers of skin tissue, so it is not surprising that they contribute so much to the skin spectrum and can be distinguished by MCR-ALS even in the presence of fluorescence.

3.2. Unmixing Preprocessed In Vivo Raman Spectra

As described in Section 2, we preprocessed the in vivo Raman spectra of normal skin and diseases. Further, as discussed in Section 2.2, we conducted the MCR-ALS analysis of 1000 preprocessed in vivo Raman spectra: 540 of normal skin, 113 of K, 122 of BCC, 67 of MM, and 158 of PN. The spectra of normal skin were divided into a different number of components from 4 to 30, and the disease spectra into 5 components. All the component spectra obtained are presented in the Supplementary Section (see Supplementary Figures S1–S3).
Spectra analysis revealed that with a large number of components into which the Raman spectra are unmixed, MCR-ALS reconstructs single peaks, and with a decrease in the components, these peaks are combined into more complex spectra. Below we describe only some of the spectra, which, in our opinion, are most similar to the Raman spectra of the components that make up the skin tissue (Figure 5A–D).
Component 1 (Figure 5A) corresponds to the contribution of the optical system. A similar component representing the contribution of optic filter is shown in Figure 4. The contribution of microscope optics in Raman spectra was highlighted in [35], where the authors applied an NNMF algorithm to the in vivo Raman spectra of upper epidermis. It seems quite logical that in [35] this component was one of the last obtained, and in our study, the first. The reason for this is that we used a spectroscopic setup with a low signal-to-noise ratio instead of a Raman microscope.
The ability to identify the parasitic signal associated with the contribution of the optical system can potentially be used in preprocessing the recorded Raman signals. In the spectrum of Component 2 (Figure 5B), the peaks at 1174 and 1230 cm−1 correspond to the natural moisturizing factor of the skin [35]. The peaks at 1390 and 1520 cm−1 are the linear stretching of the C-C bonds within the rings and the in-plane stretching of the aromatic rings of melanin [21,35,36]. The area from 1650 to 1800 cm−1 is due to C=C and C=O stretching and corresponds to skin lipids. In general, the most intense regions in these spectra are the melanin regions, so the second component can be considered predominantly melanin. A similar shape of the spectrum was obtained in [35], where melanin exhibits broad Raman bands in the 1200–1700 cm−1 range with two local maxima at ≈1370 and 1580 cm−1.
In the spectrum of Component 3 (Figure 5C), one can see four peaks. The peak at 1150 cm−1 corresponds to the contribution of the natural moisturizing factor, and that at 1280 cm−1 to amide III and urocanic acid [35,37]. The most pronounced peak at 1450 cm−1 corresponds to the C–H bending proteins and may be contributed by keratin, collagen, and elastin [21]. In addition, the peak at 1450 cm−1 may also correspond to the contribution of urocanic acid [35]. The peak at 1660 cm−1 is the amide I of proteins and may be contributed by collagen [25,38]. In other words, Component 3 represents the contribution of various proteins of the skin tissue and, possibly, of the natural moisturizing factor, including urocanic acid. Moreover, the contribution of collagen is so high that the peaks at 1450 and 1660 cm−1 are visible even when the raw Raman spectra of skin with fluorescence are decomposed (see Section 3.1). Comparison with the results obtained by Feng et al. shows the similarity of this spectrum with the spectra of collagen, elastin, and keratin in the range from 1400 to 1800 cm−1. Thus, in [21] we can also see the peaks corresponding to amide III (1269 cm−1), to C–H bending proteins (1450 cm−1), and C–O amide I vibration (1665 cm−1). Nevertheless, Feng et al. were able to distinguish the spectra of collagen, elastin, and keratin separately, while we obtained only the spectrum of their mixture.
In the spectrum of Component 4 (Figure 5D), the most intense peak is at 1650 cm−1 and corresponds to water. Comparison with the results by Feng et al. [21] shows that the spectra match almost completely.
The other components reconstructed as a result of the MCR-ALS analysis (see the Supplementary Section) apparently correspond to the contribution not of a specific skin component or related substances, but of a group of components. This may be attributed to superposition of peaks and bands, which prevents the MCR-ALS method from working correctly.
It is important to understand that the reason for the superposition of peaks and bands is not only the nature of these substances. The quality of the equipment used also plays a great role. The overlapping of spectral peaks and bands occurs in the presence of intense noise, i.e., in the Raman spectra with a low signal-to-noise ratio.
Feng et al. used Raman microscopy to record the spectra and identified eight different components [21]. Yakimov et al. also used Raman microscopy but applied an NNMF algorithm instead of MCR-ALS and identified ten different components [35]. In our case, using a portable conventional Raman spectroscopy setup, we managed to explicitly reconstruct four components, whereas the rest of the components represent the contribution of several completely different groups of substances. Nevertheless, we can use the four spectra of components obtained as a basis and evaluate their concentration in the skin Raman spectra.
In Section 3.3, we also investigate 540 spectra of normal skin, 113 of K, 122 of BCC, 67 of MM, and 158 of PN. However, there is one important difference between Section 3.2 and Section 3.3. In Section 3.2, only matrix D is known, while matrices C and ST remain undefined, whereas in Section 3.3 we also define matrix ST.
In other words, in Section 3.2. we only know the matrix of the original Raman spectra, that is, the matrix of mixtures, while the matrices of concentrations of the components and the spectra of the components remain unknown. In Section 3.3, in addition to the matrix of mixtures, we also know the matrix of the spectra of the components; only the matrix of the concentrations of these components remains unknown. In this case, the matrix of the spectra of the components is called a basis or basic spectra.

3.3. The Contribution of Individual Components to the Raman Spectra of Various Diseases

As described in Section 2.2, we also investigated 1000 preprocessed in vivo Raman spectra: 540 of normal skin, 113 of K, 122 of BCC, 67 of MM, and 158 of PN. The only difference is that we used the MCR-ALS method with a basis that contained four fixed spectra (see Figure 5A–D). MCR-ALS is performed with 99.4893% explained variation and 6.711% lack of fit (these indicators are detailed in [31]). As a result, we obtained four additional Raman spectra. These spectra are presented in Figure 4E–H. The concentrations of the components were rendered using box-plot diagrams. They are presented in the Supplementary Section (see Supplementary Figure S4).
As one can see in Supplementary Figure S4 in the Supplementary Section, Components 1, 3, and 4 (proteins, water, and optical contribution; see Figure 5A,C,D) are expressed almost identically in the spectra of all diseases. The most striking differences are seen in the example of Component 2 (Figure 5B). As we mentioned earlier, Component 2 presumably represents the contribution of melanin. The concentrations of Component 2 are high in K, MM, and PN, and low in normal skin and BCC. It is well-known that, on average, the content of melanin in K, MM, and PN is higher than in normal skin [21].
Among the box-plots of additional components (Figure 5E–H), the box-plots of Components 5 and 7 (Figure 5E,G) should be distinguished. As can be seen in Supplementary Figure S4, Component 5 is less pronounced in K and BCC and is more pronounced in PN and MM. Referring to the spectrum of this component (Figure 5E), a band from 1150 to 1320 cm−1 of amide III can be distinguished. This spectrum may correspond to the contribution of lipids and natural moisturizing factor [24,35,38]. Besides this, in the spectra of Components 5 and 6 (Figure 5E,F) we see an intense peak at 1240 cm−1 that corresponds to the contribution of proteins. If we compare Components 3, 5, and 6 with the spectra of collagen, elastin, and keratin, it can be noted that Components 5 and 6 somehow complement the spectrum of Component 3 in the range from 1150 to 1320 cm−1. The contribution of proteins might have been distributed among these components. Component 7 (Figure 5G) is most pronounced only in BCC, and its contribution is extremely insignificant in the Raman spectra of other diseases. We can note peaks at 1144 cm−1 (natural moisturizing factor); 1275 and 1750 cm−1 (lipids); 1355, 1386, and 1559 cm−1 (melanin); and 1694 cm−1 (collagen) [21,35]. In the spectrum of Component 8 (Figure 5H) one can see intense peaks at 1450 and 1650 cm−1, which suggests that this is also part of the total contribution of proteins and water. The spectrum of Component 6 (Figure 5F) shows an intense band at 1700–1850 cm−1, which may correspond to the contribution of lipids [21].
Thus, Components 2–4 represent the contribution of certain substances or groups of substances (e.g., proteins), whereas Components 5–8 represent the contribution of several completely different groups of substances with a set of overlapping peaks, which leads to their ratio redistribution in different diseases.
The limitation of this study lies in the fact that we cannot compare the concentration values of skin components (i.e., the values of the contribution of the reconstructed spectra to the total Raman spectrum) with their true values. We can only indirectly confirm general patterns, noting, for example, an increased average level of melanin in pigmented neoplasms.
Among all box-plot diagrams in Supplementary Figure S4 (see the Supplementary Section), we selected several most successful cases, where we can note the possibility of classifying the Raman spectra by diseases. Figure 6 and Figure 7 show the box-plot diagrams and ROC curves corresponding to those cases.
More advanced discrimination models demonstrate higher ROC AUC values; therefore, it seems plausible to compare the ROC AUCs of the constructed classification models. In the case of BCC (n = 122) vs. normal skin and all diseases (n = 878) neoplasm classification, the highest ROC AUC is 0.772 (0.720–0.823, 95% CI) when discriminated by Component 3 and a slightly lower ROC AUC is 0.685 (0.634–0.736, 95% CI) when discriminated by Component 7. The best separation of BCC (n = 122) vs. MM (n = 67) is possible by Components 5 and 7 demonstrating 0.700 (0.626–0.773, 95% CI) and 0.690 (0.623–0.757, 95% CI) ROC AUCs, respectively. In the case of malignant (MM, BCC; n = 189) vs. benign (K, PN; n = 271) neoplasm classification, the highest ROC AUC is 0.653 (0.602–0.705, 95% CI) when discriminated by Component 3 and a lower ROC AUC is 0.614 (0.570–0.658, 95% CI) when discriminated by Component 7. One can notice that we have better discrimination models using components with a lower number (e.g., Components 3 or 5), that is, the components that are reconstructed first by the MCR-ALS method.
Interestingly, the discrimination models of MM (n = 67) vs. PN (n = 158) demonstrate 0.656 (0.574–0.738, 95% CI) ROC AUC. This is a fairly high result, given the fact that these types of neoplasms have similar external manifestations and a comparable amount of melanin [29,35].
We suppose that using more accurate equipment to record Raman spectra (e.g., a Raman microscope) might lead to better results. However, the use of microscopes makes the in vivo research unnecessarily complicated. Partial application of Raman microscopy could be a compromise solution. For instance, we can record Raman spectra of relatively homogeneous samples in terms of component composition on a microscope (as was performed in [21]) and use these spectra as a basis. Moreover, the Raman spectra to be analyzed should be further recorded with a Raman probe. This approach can improve the quality of spectrum analysis. Returning to the use of microscopy, MCR-ALS can be applied to the analysis of the Raman spectra recorded with a microscope, for example, for the analysis of biological fluids or biopsy material.
Furthermore, taking into account the obtained concentrations (see Supplementary Figure S4), as described in Section 2.3, we applied the binary logistic regression for two cases:
  • malignant neoplasms versus benign neoplasms;
  • malignant melanoma (MM) versus pigmented neoplasms (K + PN).
Using the results of the binary logistic regression we obtained box-plot diagrams and ROC-curves of the predicted values (see Figure 8).
In the case of malignant (MM, BCC; n = 189) vs. benign (K, PN; n = 271) neoplasm classification, the ROC AUC is 0.698 (0.650–0.746, 95% CI), and in the case of MM (n = 67) vs. pigmented neoplasms (n = 271) classification, the ROC AUC is 0.702 (0.629–0.776, 95% CI).
In [4], the data were processed by means of regression analysis using PLS-DA, and the AUCs of similar classification models were 0.75 (0.71–0.79) and 0.61 (0.53–0.69), respectively. Thus, we can conclude that PLS-DA (applied to Raman spectra, as is described in [4]) and MCR-ALS analysis followed by binary logistic regression produce nearly the same results.
The significance values of all predictors (concentrations of the components) are presented in Table 1.
As can be seen from Table 1, in the case of the classification of malignant vs. benign tumors, significant components are 1, 3, 4, 7, and 8. Earlier, we assumed that these components correspond to optical system contribution, proteins, water, lipids, nature moisturizing factor, and melanin. Feng et al. [21] also report about the changes in the concentration of collagen during disease development; for instance, in the case of BCC, the relative concentration of collagen decreases, while in the case of PN, it increases. It should be noted that Component 2 is expected to become less significant due to the presence of melanin in both groups.
In the case of the classification of MM vs. pigmented neoplasms (K + PN), Component 3, which reflects the contribution of proteins, becomes less important. Indeed, Feng et al. [21] reports that with the development of K and MM, the share of collagen changes in approximately the same way. Components 6, 7, and 8 are also become less significant, because they also represent the contribution of proteins.
Among the box-plots of additional components (Figure 5E–H), the box-plots of Components 5 and 7 (Figure 5E,G) should be distinguished. As can be seen in Supplementary Figure S4, Component 5 is less pronounced in K and BCC and is more pronounced in PN and MM. Referring to the spectrum of this component (Figure 5E), the band from 1150 to 1320 cm−1 of amide III can be distinguished. This spectrum may correspond to the contribution of lipids and natural moisturizing factor [25,35,38]. Besides this, in the spectra of components 5 and 6 (Figure 5E,F) we see an intense peak at 1240 cm−1 that corresponds to the contribution of proteins. If we compare Components 3, 5, and 6 with the spectra of collagen, elastin, and keratin, it can be noted that Components 5 and 6 somehow complement the spectrum of Component 3 in the range from 1150 to 1320 cm−1. It is possible that the contribution of proteins was distributed among these components. Component 7 (Figure 5G) is most pronounced only in BCC, and its contribution is extremely insignificant in the Raman spectra of other diseases. We can note peaks at 1144 cm−1 (natural moisturizing factor); 1275 and 1750 cm−1 (lipids); 1355, 1386, and 1559 cm−1 (melanin); and 1694 cm−1 (collagen) [21,35]. In the spectrum of Component 8 (Figure 5H) one can see intense peaks at 1450 and 1650 cm−1, which suggests that this is also part of the total contribution of proteins and water. The spectrum of Component 6 (Figure 5F) shows an intense band at 1700–1850 cm−1, which may correspond to the contribution of lipids [21].
It should be noted that the contribution of Components 1 (optical system contribution) and 7 (proteins, lipids, nature moisturizing factor, melanin) is small for both cases of classification. This is quite logical, because the contribution of the optical system is approximately the same for all recorded Raman spectra, and no significant changes in water during the development of diseases have been reported [21].
As described in Section 2.3, we applied the PLS-DA for two cases:
  • malignant neoplasms versus benign neoplasms;
  • malignant melanoma (MM) versus pigmented neoplasms (K + PN).
The box-plot diagrams and ROC-curves of the predicted values are presented in Figure 9.
In the case of malignant (MM, BCC; n = 189) vs. benign (K, PN; n = 271) neoplasm classification, the ROC AUC is 0.632 (0.581–0.683, 95% CI), whereas in the case of MM (n = 67) vs. pigmented neoplasms (n = 271) classification, the ROC AUC is 0.658 (0.583–0.733, 95% CI). Thus, we can conclude that PLS-DA applied to the profiles of the component concentrations gives approximately the same result as the logistic regression (cf. Figure 8 and Figure 9).

4. Conclusions

The results obtained in this study demonstrate the possibility of unmixing several spectrally similar components using MCR-ALS analysis even under noisy conditions of the recorded Raman spectra. This gives grounds to talk about a possible effective use of the MCR-ALS method in portable devices for diagnosing neoplasms, when the speed and ease of use of the device are more important than its high spectral resolution.
Moreover, in addition to our initial problem of analyzing the Raman spectra in order to classify them, we were also able to use the MCR-ALS method to identify the parasitic signal associated with the contribution of the optical system. In other words, the method can be used as part of the Raman setup software to correct the setup bias. Eliminating the influence of optical equipment can be extremely useful in multicenter and transnational research, when completely different equipment with different technical characteristics is used to record Raman spectra.
Unmixing such spectrally similar components as different proteins or lipids is still a challenge. In this study, we were unable to unmix them into separate components. The use of more localized measurement equipment for recording Raman spectra, for example, a Raman microscope, could make it possible to achieve better results.
Besides this, one advantage of using Raman microscope is that in this case we can yield 2D microscopic imaging (see [21]) and build a direct relation between the skin constituents and the unmixed components by comparing them with the histology.
In addition, the method of finding initial estimates (SIMPLISMA) suffers from an essential drawback, which consists of the need for “pure” variables [32], which cannot be expected because of the strong signal overlapping. Future studies should try other methods of finding initial estimates, for example, independent component analysis (ICA).
Another limitation of this study is the fact that we do not have an approved and confirmed method for determining the composition of the skin in vivo. We can only evaluate the MCR-ALS method using the classification models.
Nevertheless, despite some limitations of the method, we have obtained a satisfactory classification result for some diseases. We plan to devote our future research to supplementing our methodology with other methods for analyzing Raman spectra, such as using neural networks.

Supplementary Materials

The following supporting information can be downloaded at: https://0-www-mdpi-com.brum.beds.ac.uk/article/10.3390/s22249588/s1, Supplementary Figure S1. The result of MCR analysis of normal skin Raman spectra. Supplementary Figure S2. The result of MCR analysis of normal skin Raman spectra. Supplementary Figure S3. The result of MCR analysis of disease Raman spectra (BCC is basal cell carcinoma, MM is malignant melanoma, PN is pigmented nevus). Supplementary Figure S4. Box-plot diagrams of skin component concentrations obtained by MCR analysis using a basis (B is benign neoplasms, M is malignant neoplasms, N is normal skin, BCC is basal cell carcinoma, MM is malignant melanoma, PN is pigmented nevus).

Author Contributions

Conceptualization, I.B.; methodology, I.M., I.B. and V.Z.; validation, I.M. and I.B.; formal analysis, I.M. and Y.K.; investigation, I.M.; resources, S.K. and O.K.; data curation, L.B. and A.M.; writing—original draft preparation, I.M. and I.B.; writing—review and editing, I.B.; visualization, I.M.; supervision, V.Z.; project administration, V.Z.; funding acquisition, I.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Russian Science Foundation, grant number 21-75-10097.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of Samara State Medical University.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Popp, J.; Krafft, C.; Mayerhöfer, T. Modern Raman spectroscopy for biomedical applications: A variety of Raman spectroscopical techniques on the threshold of biomedical applications. Opt. Photonik 2011, 6, 24–28. [Google Scholar] [CrossRef]
  2. Lui, H.; Zhao, J.; McLean, D.; Zeng, H. Real-time Raman Spectroscopy for In Vivo Skin Cancer Diagnosis. Cancer Res. 2012, 72, 2491–2500. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  3. Dunaev, A. Optical Diagnostics in Human Diseases. Diagnostics 2021, 11, 873. [Google Scholar] [CrossRef] [PubMed]
  4. Bratchenko, I.A.; Bratchenko, L.A.; Moryatov, A.A.; Khristoforova, Y.A.; Artemyev, D.N.; Myakinin, O.O.; Orlov, A.E.; Kozlov, S.V.; Zakharov, V.P. In vivo diagnosis of skin cancer with a portable Raman spectroscopic device. Exp. Dermatol. 2021, 30, 652–663. [Google Scholar] [CrossRef]
  5. Bratchenko, L.A.; Bratchenko, I.A.; Khristoforova, Y.A.; Artemyev, D.N.; Konovalova, D.Y.; Lebedev, P.A.; Zakharov, V.P. Raman spectroscopy of human skin for kidney failure detection. J. Biophotonics 2021, 14, e202000360. [Google Scholar] [CrossRef]
  6. Bratchenko, L.A.; Bratchenko, I.A.; Lykina, A.A.; Komarova, M.V.; Artemyev, D.N.; Myakinin, O.O.; Moryatov, A.A.; Davydkin, I.L.; Kozlov, S.V.; Zakharov, V.P. Comparative study of multivariative analysis methods of blood Raman spectra classification. J. Raman Spectrosc. 2020, 51, 279–292. [Google Scholar] [CrossRef]
  7. Khristoforova, Y.A.; Bratchenko, I.A.; Myakinin, O.O.; Artemyev, D.N.; Moryatov, A.A.; Orlov, A.E.; Kozlov, S.V.; Zakharov, V.P. Portable spectroscopic system for in vivo skin neoplasms diagnostics by Raman and autofluorescence analysis. J. Biophotonics 2019, 12, e201800400. [Google Scholar] [CrossRef]
  8. Kandurova, K.; Dremin, V.; Zherebtsov, E.; Potapova, E.; Alyanov, A.; Mamoshin, A.; Ivanov, Y.; Borsukov, A.; Dunaev, A. Fiber-optic system for intraoperative study of abdominal organs during minimally invasive surgical interventions. Appl. Sci. 2019, 9, 217. [Google Scholar] [CrossRef] [Green Version]
  9. Zakharov, V.P.; Bratchenko, L.A.; Khristoforova, Y.A.; Myakinin, O.O.; Artemyev, D.N.; Moryatov, A.A.; Kozlov, S.V.; Borisova, E.G.; Genova, T.I.; Troyanova, P.P.; et al. Multiparametric spectral diagnosis of skin cancer. Tissue Opt. Photonics 2020, 11363, 105–110. [Google Scholar]
  10. Sigurdsson, S.; Philipsen, P.A.; Hansen, L.K.; Larsen, J.; Gniadecka, M.; Wulf, H.C. Detection of skin cancer by classification of Raman spectra. IEEE Trans. Biomed. Eng. 2004, 51, 1784–1793. [Google Scholar] [CrossRef] [Green Version]
  11. Vinokurov, V.; Khristoforova, Y.; Myakinin, O.; Bratchenko, I.; Moryatov, A.; Machikhin, A.; Zakharov, V. Neural network classifier for hyperspectral images of skin pathologies. J. Phys. Conf. Ser. 2021, 2127, 012026. [Google Scholar]
  12. Lihacova, I.; Bondarenko, A.; Chizhov, Y.; Uteshev, D.; Bliznuks, D.; Kiss, N.; Lihachev, A. Multi-Class CNN for Classification of Multispectral and Autofluorescence Skin Lesion Clinical Images. J. Clin. Med. 2022, 11, 2833. [Google Scholar] [CrossRef] [PubMed]
  13. Khristoforova, Y.A.; Bratchenko, I.A.; Bratchenko, L.A.; Myakinin, O.O.; Artemyev, D.N.; Moryatov, A.A.; Kozlov, S.V.; Borisova, E.G.; Genova, T.I.; Troyanova, P.P.; et al. Optical biopsy of skin cancer based on Raman and fluorescence spectroscopy. In Proceedings of the Fourth International Conference on Terahertz and Microwave Radiation: Generation, Detection, and Applications, Tomsk, Russia, 24–26 August 2020; Volume 11582, pp. 325–330. [Google Scholar]
  14. Zakharov, V.P.; Bratchenko, I.A.; Artemyev, D.N.; Myakinin, O.O.; Kornilin, D.V.; Kozlov, S.V.; Moryatov, A.A. Comparative analysis of combined spectral and optical tomography methods for detection of skin and lung cancers. J. Biomed. Opt. 2015, 20, 025003. [Google Scholar] [CrossRef] [PubMed]
  15. Garrido, M.; Rius, F.X.; Larrechi, M.S. Multivariate curve resolution–alternating least squares (MCR-ALS) applied to spectroscopic data from monitoring chemical reactions processes. Anal. Bioanal. Chem. 2008, 390, 2059–2066. [Google Scholar] [CrossRef]
  16. De Juan, A.; Tauler, R. Multivariate Curve Resolution: 50 years addressing the mixture analysis problem—A review. Anal. Chim. Acta 2021, 1145, 59–78. [Google Scholar] [CrossRef]
  17. Noothalapati, H.; Iwasaki, K.; Yamamoto, T. Biological and medical applications of multivariate curve resolution assisted Raman spectroscopy. Anal. Sci. 2017, 33, 15–22. [Google Scholar] [CrossRef] [Green Version]
  18. Xu, H.; Rice, B.W. In-vivo fluorescence imaging with a multivariate curve resolution spectral unmixing technique. J. Biomed. Opt. 2009, 14, 064011. [Google Scholar] [CrossRef]
  19. Chen, P.H.; Shimada, R.; Yabumoto, S.; Okajima, H.; Ando, M.; Chang, C.T.; Lee, L.T.; Wong, Y.K.; Chiou, A.; Hamaguchi, H.O. Automatic and objective oral cancer diagnosis by Raman spectroscopic detection of keratin with multivariate curve resolution analysis. Sci. Rep. 2016, 6, 20097. [Google Scholar] [CrossRef] [Green Version]
  20. Iwasaki, K.; Araki, A.; Krishna, C.M.; Maruyama, R.; Yamamoto, T.; Noothalapati, H. Identification of Molecular Basis for Objective Discrimination of Breast Cancer Cells (MCF-7) from Normal Human Mammary Epithelial Cells by Raman Microspectroscopy and Multivariate Curve Resolution Analysis. Int. J. Mol. Sci. 2021, 22, 800. [Google Scholar] [CrossRef]
  21. Feng, X.; Moy, A.J.; Nguyen, H.T.; Zhang, J.; Fox, M.C.; Sebastian, K.R.; Reichenberg, J.S.; Markey, M.K.; Tunnell, J.W. Raman active components of skin cancer. Biomed. Opt. Express 2017, 8, 2835–2850. [Google Scholar] [CrossRef] [Green Version]
  22. Azzouz, T.; Tauler, R. Application of multivariate curve resolution alternating least squares (MCR-ALS) to the quantitative analysis of pharmaceutical and agricultural samples. Talanta 2008, 74, 1201–1210. [Google Scholar] [CrossRef] [PubMed]
  23. Felten, J.; Hall, H.; Jaumot, J.; Tauler, R.; De Juan, A.; Gorzsás, A. Vibrational spectroscopic image analysis of biological material using multivariate curve resolution–alternating least squares (MCR-ALS). Nat. Protoc. 2015, 10, 217–240. [Google Scholar] [CrossRef] [PubMed]
  24. Desroches, J.; Jermyn, M.; Mok, K.; Lemieux-Leduc, C.; Mercier, J.; St-Arnaud, K.; Urmey, K.; Guiot, M.C.; Marple, E.; Petrecca, K.; et al. Characterization of a Raman spectroscopy probe system for intraoperative brain tissue classification. Biomed. Opt. Express 2015, 6, 2380–2397. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Kuhar, N.; Sil, S.; Verma, T.; Umapathy, S. Challenges in application of Raman spectroscopy to biology and materials. RSC Adv. 2018, 8, 25888–25908. [Google Scholar] [CrossRef] [Green Version]
  26. Khalid, M.; Bora, T.; Ghaithi, A.A.; Thukral, S.; Dutta, J. Raman spectroscopy detects changes in bone mineral quality and collagen cross-linkage in staphylococcus infected human bone. Sci. Rep. 2018, 8, 9417. [Google Scholar] [CrossRef]
  27. Nguyen, H.T.; Zhang, Y.; Moy, A.J.; Feng, X.; Sebastian, K.R.; Reichenberg, J.S.; Fox, M.C.; Markey, M.K.; Tunnell, J.W. Characterization of Ex Vivo Nonmelanoma Skin Tissue Using Raman Spectroscopy. Photonics 2021, 8, 282. [Google Scholar] [CrossRef]
  28. The Comprehensive R Archive Network. Available online: https://cran.r-project.org (accessed on 5 June 2022).
  29. Matveeva, I.A.; Myakinin, O.O.; Khristoforova, Y.A.; Bratchenko, I.A.; Tupikova, E.N.; Zakharov, V.P. Possibilities for decomposing Raman spectra of amino acids mixture by Multivariate Curve Resolution (MCR) analysis. In Proceedings of the Fourth International Conference on Terahertz and Microwave Radiation: Generation, Detection, and Applications, Tomsk, Russia, 24–26 August 2020; Volume 11582, pp. 299–306. [Google Scholar]
  30. Matveeva, I.A.; Bratchenko, L.A.; Myakinin, O.O.; Tupikova, E.N.; Zakharov, V.P. The Effect of Noise in Raman Spectra on the Reconstruction of the Concentration of Amino Acids in the Mixture by Multivariate Curve Resolution (MCR) Analysis. J. Biomed. Photonics Eng. 2021, 7, 020309. [Google Scholar] [CrossRef]
  31. Gorzsás, A. MCR-ALS GUI V4c, Open-Source MATLAB Script from the Vibrational Spectroscopy Core Facility; Umeå University: Umeå, Sweden, 2017. [Google Scholar]
  32. Ruckebusch, C. Resolving Spectral Mixtures: With Applications from Ultrafast Time-Resolved Spectroscopy to Super-Resolution Imaging; Elsevier: Amsterdam, The Netherlands, 2016. [Google Scholar]
  33. Logistic Regression. SPSS Statistics. Available online: https://www.ibm.com/docs/en/spss-statistics/26.0.0?topic=regression-logistic (accessed on 25 September 2021).
  34. 785 nm EdgeBasic™ Best-Value Long-Pass Edge Filter. Available online: https://www.semrock.com/FilterDetails.aspx?id=BLP01-785R-25 (accessed on 25 September 2021).
  35. Yakimov, B.P.; Venets, A.V.; Schleusener, J.; Fadeev, V.V.; Lademann, J.; Shirshin, E.A.; Darvin, M.E. Blind source separation of molecular components of the human skin in vivo: Non-negative matrix factorization of Raman microspectroscopy data. Analyst 2021, 146, 3185–3196. [Google Scholar] [CrossRef]
  36. Huang, Z.; Lui, H.; Chen, M.X.; Alajlan, A.; McLean, D.I.; Zeng, H. Raman spectroscopy of in vivo cutaneous melanin. J. Biomed. Opt. 2004, 9, 1198–1205. [Google Scholar] [CrossRef]
  37. Duraipandian, S.; Mo, J.; Zheng, W.; Huang, Z. Near-infrared Raman spectroscopy for assessing biochemical changes of cervical tissue associated with precarcinogenic transformation. Analyst 2014, 139, 5379–5386. [Google Scholar] [CrossRef]
  38. Bankapur, A.; Zachariah, E.; Chidangil, S.; Valiathan, M.; Mathur, D. Raman tweezers spectroscopy of live, single red and white blood cells. PLoS ONE 2010, 5, e10427. [Google Scholar] [CrossRef] [PubMed]
Figure 1. An example of in vivo Raman spectra before (a) and after (b) preprocessing: MM is malignant melanoma, BCC is basal cell carcinoma, SCC is squamous cell carcinoma. Adapted with permission from Ref. [13]. 2020, Y. Khristoforova.
Figure 1. An example of in vivo Raman spectra before (a) and after (b) preprocessing: MM is malignant melanoma, BCC is basal cell carcinoma, SCC is squamous cell carcinoma. Adapted with permission from Ref. [13]. 2020, Y. Khristoforova.
Sensors 22 09588 g001
Figure 2. Determining the number of components by singular value decomposition: (a) set of raw Raman spectra, (b) set of preprocessed Raman spectra.
Figure 2. Determining the number of components by singular value decomposition: (a) set of raw Raman spectra, (b) set of preprocessed Raman spectra.
Sensors 22 09588 g002
Figure 3. The spectra of skin components (presented in the order they were extracted as a result of MCR-ALS analysis).
Figure 3. The spectra of skin components (presented in the order they were extracted as a result of MCR-ALS analysis).
Sensors 22 09588 g003
Figure 4. 785 nm EdgeBasic™ best-value long-pass edge filter spectrum. Reprinted with permission from Ref. [34]. 2022, Semrock Optical Filters. © IDEX Health & Science, Semrock Optical Filters www.semrock.com (accessed on 11 August 2022).
Figure 4. 785 nm EdgeBasic™ best-value long-pass edge filter spectrum. Reprinted with permission from Ref. [34]. 2022, Semrock Optical Filters. © IDEX Health & Science, Semrock Optical Filters www.semrock.com (accessed on 11 August 2022).
Sensors 22 09588 g004
Figure 5. Raman spectra obtained by MCR-ALS analysis of the in vivo Raman spectra of skin (solid lines denote the spectra obtained in our study, dashed lines, the spectra obtained in [21]): (A) Component 1 (optical system contribution), (B) Component 2 (melanin), (C) Component 3 (proteins), (D) Component 4 (water), (E) Component 5 (proteins, lipids, nature moisturizing factor), (F) Component 6 (proteins, lipids), (G) Component 7 (proteins, lipids, nature moisturizing factor, melanin), (H) Component 8 (proteins, water).
Figure 5. Raman spectra obtained by MCR-ALS analysis of the in vivo Raman spectra of skin (solid lines denote the spectra obtained in our study, dashed lines, the spectra obtained in [21]): (A) Component 1 (optical system contribution), (B) Component 2 (melanin), (C) Component 3 (proteins), (D) Component 4 (water), (E) Component 5 (proteins, lipids, nature moisturizing factor), (F) Component 6 (proteins, lipids), (G) Component 7 (proteins, lipids, nature moisturizing factor, melanin), (H) Component 8 (proteins, water).
Sensors 22 09588 g005
Figure 6. Box-plot diagrams and ROC curves corresponding to several cases of discrimination: (A) basal cell carcinoma (BCC) versus normal skin and all diseases (All) according to the MCR Component 2 (melanin), (B) basal cell carcinoma (BCC) versus normal skin and all diseases (All) according to the MCR Component 7 (proteins, lipids, nature moisturizing factor, melanin), (C) basal cell carcinoma (BCC) versus malignant melanoma (MM) according to the MCR Component 5 (proteins, lipids, nature moisturizing factor), (D) basal cell carcinoma (BCC) versus malignant melanoma (MM) according to the MCR Component 7 (proteins, lipids, nature moisturizing factor, melanin).
Figure 6. Box-plot diagrams and ROC curves corresponding to several cases of discrimination: (A) basal cell carcinoma (BCC) versus normal skin and all diseases (All) according to the MCR Component 2 (melanin), (B) basal cell carcinoma (BCC) versus normal skin and all diseases (All) according to the MCR Component 7 (proteins, lipids, nature moisturizing factor, melanin), (C) basal cell carcinoma (BCC) versus malignant melanoma (MM) according to the MCR Component 5 (proteins, lipids, nature moisturizing factor), (D) basal cell carcinoma (BCC) versus malignant melanoma (MM) according to the MCR Component 7 (proteins, lipids, nature moisturizing factor, melanin).
Sensors 22 09588 g006
Figure 7. Box-plot diagrams and ROC curves corresponding to several cases of discrimination: (A) malignant neoplasms (basal cell carcinoma + malignant melanoma) versus benign neoplasms (keratosis + pigmented nevus) according to the MCR Component 3 (proteins), (B) malignant neoplasms (basal cell carcinoma + malignant melanoma) versus benign neoplasms (keratosis + pigmented nevus) according to the MCR Component 7 (proteins, lipids, nature moisturizing factor, melanin), (C) malignant melanoma (MM) versus pigmented nevus (PN) according to the MCR Component 3 (proteins).
Figure 7. Box-plot diagrams and ROC curves corresponding to several cases of discrimination: (A) malignant neoplasms (basal cell carcinoma + malignant melanoma) versus benign neoplasms (keratosis + pigmented nevus) according to the MCR Component 3 (proteins), (B) malignant neoplasms (basal cell carcinoma + malignant melanoma) versus benign neoplasms (keratosis + pigmented nevus) according to the MCR Component 7 (proteins, lipids, nature moisturizing factor, melanin), (C) malignant melanoma (MM) versus pigmented nevus (PN) according to the MCR Component 3 (proteins).
Sensors 22 09588 g007
Figure 8. Box-plot diagrams and ROC-curves of predicted values obtained as a result of logistic regression: (A) malignant neoplasms (basal cell carcinoma + malignant melanoma) versus benign neoplasms (keratosis + pigmented nevus), (B) malignant melanoma (MM) versus pigmented neoplasms (keratosis + pigmented nevus).
Figure 8. Box-plot diagrams and ROC-curves of predicted values obtained as a result of logistic regression: (A) malignant neoplasms (basal cell carcinoma + malignant melanoma) versus benign neoplasms (keratosis + pigmented nevus), (B) malignant melanoma (MM) versus pigmented neoplasms (keratosis + pigmented nevus).
Sensors 22 09588 g008
Figure 9. Box-plot diagrams and ROC-curves of the predicted values obtained as a result of PLS-DA: (A) malignant neoplasms (basal cell carcinoma + malignant melanoma) versus benign neoplasms (keratosis + pigmented nevus), (B) malignant melanoma (MM) versus pigmented neoplasms (keratosis + pigmented nevus).
Figure 9. Box-plot diagrams and ROC-curves of the predicted values obtained as a result of PLS-DA: (A) malignant neoplasms (basal cell carcinoma + malignant melanoma) versus benign neoplasms (keratosis + pigmented nevus), (B) malignant melanoma (MM) versus pigmented neoplasms (keratosis + pigmented nevus).
Sensors 22 09588 g009
Table 1. The significance values (p-values) for component concentration calculated at the first step of logistic regression. The p-values for the coefficients indicate whether these relationships are statistically significant. If the p-value for a variable is less than the significance level of 0.05, this variable is statistically significant. Bold type in the table indicates cases of binary logistic regression.
Table 1. The significance values (p-values) for component concentration calculated at the first step of logistic regression. The p-values for the coefficients indicate whether these relationships are statistically significant. If the p-value for a variable is less than the significance level of 0.05, this variable is statistically significant. Bold type in the table indicates cases of binary logistic regression.
Malignant vs. BenignMM vs. K + PN
Component 1 (optical system contribution)0.0040.101
Component 2 (melanin)0.4490.243
Component 3 (proteins)0.0340.805
Component 4 (water)0.0050.002
Component 5 (proteins, lipids, nature moisturizing factor)0.7530.710
Component 6 (proteins, lipids)0.1830.499
Component 7 (proteins, lipids, nature moisturizing factor, melanin)0.0430.246
Component 8 (proteins, water)0.0170.568
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Matveeva, I.; Bratchenko, I.; Khristoforova, Y.; Bratchenko, L.; Moryatov, A.; Kozlov, S.; Kaganov, O.; Zakharov, V. Multivariate Curve Resolution Alternating Least Squares Analysis of In Vivo Skin Raman Spectra. Sensors 2022, 22, 9588. https://0-doi-org.brum.beds.ac.uk/10.3390/s22249588

AMA Style

Matveeva I, Bratchenko I, Khristoforova Y, Bratchenko L, Moryatov A, Kozlov S, Kaganov O, Zakharov V. Multivariate Curve Resolution Alternating Least Squares Analysis of In Vivo Skin Raman Spectra. Sensors. 2022; 22(24):9588. https://0-doi-org.brum.beds.ac.uk/10.3390/s22249588

Chicago/Turabian Style

Matveeva, Irina, Ivan Bratchenko, Yulia Khristoforova, Lyudmila Bratchenko, Alexander Moryatov, Sergey Kozlov, Oleg Kaganov, and Valery Zakharov. 2022. "Multivariate Curve Resolution Alternating Least Squares Analysis of In Vivo Skin Raman Spectra" Sensors 22, no. 24: 9588. https://0-doi-org.brum.beds.ac.uk/10.3390/s22249588

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop