1. Introduction
Spectroscopy is one of the primary exploratory tools for studying the micro-world: it probes the physical structure of substances at the atomic and molecular scale and characterizes the properties of novel materials, since structure and properties at this scale cannot be measured or observed directly by any instrument. Through spectroscopic technology, key properties of matter, such as composition and electronic structure, can be determined at the nano-level. Different spectroscopic techniques reveal different properties and characteristics of materials, and several methods, such as emission, absorption, Raman, and backscattering spectroscopy, are by now well developed. In this article, we mainly discuss the data analysis problem of Rutherford backscattering spectroscopy (RBS). RBS is a physical process in which a beam of high-energy ions is directed at a thin solid sample to be analyzed. The backscattering spectrum of the incident particles is recorded; it is essentially a noisy curve of the backscattered particle yield against the energy of the scattered ions or the detector channels. By quantitative analysis of the RBS spectra, the elemental compositions of the substance and their depth profiles are extracted. Conventionally, RBS spectral data analysis has been treated as a numerical fitting problem, assisted by complementary knowledge of advanced physics. A numerical fitting procedure generally starts with an initial guess of the sample structural information and then calculates the theoretical spectral curve from the backscattering physics. The mean squared error between the theoretical spectrum and the measured spectroscopic data is minimized iteratively by a least-squares algorithm: the measured and theoretical spectra are compared recursively, and the structural parameters are adjusted until a convincing match or a pre-set error value is achieved. This conventional numerical method has two major issues: (i) the convergence of the error-minimization process is not guaranteed; and (ii) it relies heavily on the analyst's empirical skills and advanced physics knowledge.
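To make the shape of this fitting loop concrete, the following minimal Python sketch uses a toy stand-in for the real backscattering physics; the function simulate_rbs and all numerical values are illustrative assumptions, not an actual RBS model.

import numpy as np
from scipy.optimize import least_squares

# Toy stand-in for the physics-based forward model (in practice computed by a
# simulator): maps trial structural parameters to a theoretical spectrum
# over the detector channels.
def simulate_rbs(params, channels):
    thickness, concentration = params
    edge = 300.0 - 0.05 * thickness                 # toy surface-edge position
    return concentration * 1000.0 / (1.0 + np.exp(0.1 * (channels - edge)))

rng = np.random.default_rng(0)
channels = np.arange(400, dtype=float)
true_params = np.array([1200.0, 0.45])
measured = simulate_rbs(true_params, channels) + rng.normal(0.0, 5.0, 400)

# Residuals between theoretical and measured spectra; the solver minimizes
# their sum of squares, iteratively adjusting the structural parameters.
def residuals(params):
    return simulate_rbs(params, channels) - measured

# The analyst's initial guess; whether the iteration converges depends on it.
result = least_squares(residuals, x0=np.array([800.0, 0.40]))
print(result.x)  # fitted thickness and concentration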
The emergence of new machine learning algorithms in recent decades has inspired researchers to attempt novel solutions based on optimization algorithms or intelligent techniques, such as simulated annealing, support vector machines, and artificial neural networks. Barradas and co-authors [1] proposed applying the combinatorial simulated annealing (SA) algorithm [2,3,4] to the analysis of RBS spectra. Their analysis method could be designed to run fully automatically, without human intervention for parameter adjustment during the analysis. The proposed method was tested on several complicated physical samples, such as iron-cobalt silicide and SiOF spectra, and the sample structural parameters were determined correctly and quantitatively. In addition to the SA algorithm, neural computing-based methods have also been considered for specific problems in RBS spectral data analysis. Barradas et al. [5,6,7] developed a multilayer perceptron (MLP) model, with the spectrum as input and the sample structural parameters as output, for the quantitative analysis of RBS data. The MLP model was trained on thousands of theoretically generated spectra of samples with known nominal structures. The trained MLP could then interpret the spectrum of a given sample whose physical structure was unknown. Their numerical results showed that the MLP model could predict the sample structure with reasonable accuracy, provided a sufficient amount of training data was available. These investigations represent a substantial advance toward a relatively easy data-driven analysis approach that requires little expert knowledge.
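As a hedged illustration of this kind of model (not the authors' original implementation), the sketch below trains a small MLP regressor that maps a raw spectrum of several hundred channels directly to the structural parameters; the placeholder data and layer sizes are assumptions.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n_samples, n_channels = 1000, 400
X = rng.random((n_samples, n_channels))   # placeholder for simulated spectra
y = rng.random((n_samples, 3))            # thickness + two concentrations

# Training directly on the raw high-dimensional spectra is what makes this
# approach data- and time-hungry, as discussed next.
mlp = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300, random_state=1)
mlp.fit(X, y)
print(mlp.predict(X[:1]))                 # predicted structural parameters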
However, it should be mentioned that in Barradas' neural network method, each RBS spectrum, with up to several hundred data points, was used as a single input without any dimensionality reduction. Such a method inevitably requires a huge amount of training data and a very long training time, which hampers convergence. This is the gap our study fills by applying a data dimensionality reduction technique.
We should emphasize that neural network models are not limited to data analysis problems in the natural sciences. More commonly, neural networks and other machine learning methods have been applied extensively across a wide range of disciplines, such as system identification and control [8], pattern recognition and classification [9], medical diagnosis [10,11,12], finance, and many others. These works demonstrate the success of machine learning methods, with significant improvements in classification and prediction accuracy. As a variation of the single-hidden-layer feed-forward neural network (SLFN), the extreme learning machine (ELM) is a special SLFN in which the input weights and hidden-layer biases are randomly generated and the output weights are determined analytically from the input data [13] (a minimal sketch of this training step follows this paragraph). Owing to its fast learning capability and excellent generalization performance, a large number of ELM applications have appeared over the past fifteen years [14,15,16,17,18]. Several works [19,20,21] apply ELMs to classify food or wine from spectroscopic data or to perform the associated feature selection. Zheng et al. [19,20] presented a study combining spectroscopy with the ELM algorithm for food classification. Four benchmark spectroscopic datasets [22] involving food samples, including coffee, olive oil, meat, and fruit, with corresponding measured near- and mid-infrared spectra, were used in their investigations. They also compared the ELM results with those of other methods, such as back-propagation artificial neural networks (BP-ANN), k-nearest neighbors (KNN), and support vector machines (SVM). Their study shows that ELM classification accuracies reach 100%, 97.78%, 97.35%, and 95.05% for coffee, meat, olive oil, and fruit, respectively, an improvement of about 6% over the other methods, in addition to faster classification.
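The following minimal Python sketch shows the standard ELM training step described above (random, fixed hidden layer; output weights solved analytically via the Moore-Penrose pseudoinverse); the data, the tanh activation, and the sizes are illustrative assumptions.

import numpy as np

def elm_train(X, Y, L, rng):
    W = rng.normal(size=(X.shape[1], L))   # random input weights (fixed)
    b = rng.normal(size=L)                 # random hidden biases (fixed)
    H = np.tanh(X @ W + b)                 # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ Y           # output weights solved analytically
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

rng = np.random.default_rng(0)
X, Y = rng.random((100, 7)), rng.random((100, 3))
W, b, beta = elm_train(X, Y, L=40, rng=rng)
print(elm_predict(X[:2], W, b, beta))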
More recently, ELMs have appeared in deep structures for specific pattern recognition and object classification problems [23,24,25]. Unlike the three-layer architecture of a single ELM, a multi-layered deep ELM can be constructed by stacking a series of standard ELM models, with or without constraints. Khan et al. [23] designed and implemented a deep-ELM-based neural classifier for fabric weave pattern and yarn colour recognition and classification. They reported that the deep ELM classifier significantly reduced classification error rates and achieved recognition accuracies of up to 97.5% [23] for complex weave patterns, whereas the accuracies of other methods on the same problem were between 80% and 84%.
In this study, a universal method that incorporates neural networks and a dimensionality reduction technique is proposed as a new approach to spectral data analysis. Our objectives are, first, to transform the complicated numerical computation problem into a multivariate regression problem solved through a learning process on datasets; second, to reduce the dimensionality of the input space for easier training and assured convergence; and third, to use extreme learning machines to establish a mapping that takes the reduced data components as input and produces accurate predictions of the structural parameters. The contribution of this work is a newly proposed method that recasts a conventional numerical analysis problem as a statistical learning problem over data. The proposed method is a general-purpose approach for any spectra-oriented application. It is demonstrated on a set of RBS data, producing accurate predictions of the physical sample structures. The method greatly reduces the reliance on an initial guess and on user intervention during spectral data analysis, which substantially alleviates the analyst's burden. It may also be applicable to spectral analysis applications involving high volumes of data, and even to automating the analysis of real-time experimental spectra, if a suitable interface between the spectrometer and the application software is implemented.
The organization of this article is as follows: in Section 2, the problem of spectral data analysis is defined and the proposed method is described; Section 3 discusses the computer experiments and results; the conclusions of our study and future work are presented in Section 4.
3. Experiment
The proposed method and theoretical framework for solving the inverse problem in spectral data analysis are verified by numerical experiments. We constructed a set of training and test data using a rigorous RBS spectrum simulation software package, SIMNRA 7.02 [39]. SIMNRA is the de facto standard simulation software for generating RBS spectra. It not only treats the basic scattering phenomena but also accounts for the subtle spectral features that arise from complex interactions, such as energy straggling, plural and multiple scattering, resonance, and surface roughness [39,40]. As an illustrative example, we consider a typical physical sample consisting of a Sn_x S_{0.87−x} O_{0.13} film on a silicon substrate, where the structural parameters are the film thickness and the concentrations of Sn and S. The ions in the simulated RBS spectra are alpha particles with an incident energy of 2 MeV and a backscattering angle of 165°, a typical experimental setting in RBS analysis. For training purposes, the thickness and concentrations are randomly generated within the ranges of 400 × 10^15 to 4000 × 10^15 atoms/cm^2 and 0.32 to 0.55, respectively. The full dataset used for the computer experiments is composed of a series of spectral curves (input vectors) and the corresponding structural parameters (output vectors). To make the synthetic spectra close to realistic ones, Gaussian noise was added to the simulated smooth spectra.
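The sketch below illustrates how such a dataset might be assembled in Python once the simulated spectra are available as arrays; the placeholder spectra and the noise scale are assumptions, since the actual curves come from SIMNRA.

import numpy as np

rng = np.random.default_rng(42)
n_spectra, n_channels = 482, 400

# Placeholder for the SIMNRA-simulated smooth spectra (yield per channel).
clean = rng.random((n_spectra, n_channels)) * 5000.0

# Structural parameters drawn uniformly within the stated ranges.
thickness = rng.uniform(400.0, 4000.0, n_spectra)   # in 10^15 atoms/cm^2
conc_sn = rng.uniform(0.32, 0.55, n_spectra)        # Sn concentration x

# Gaussian noise makes the synthetic curves resemble measured spectra;
# the noise level used here is an assumption.
noisy = clean + rng.normal(0.0, 25.0, clean.shape)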
Figure 3a,b illustrates two typical simulated spectral curves and their corresponding structural parameters. Within the specified ranges of sample thickness and concentrations, a total of 482 spectra are generated by running SIMNRA 7.02; 75% of the full dataset is used for training, and the remainder for testing. It must be mentioned that pre-processing of the spectroscopic data is necessary, since the values vary over different orders of magnitude. The pre-processing applies a simple transformation, z_i = log_10(z_i + 10), which normalizes the spectral yield to between 1 and 5. The spectral data are further compressed by the PCA method described in Section 2.3, which produces a significant dimensionality reduction. In this application, a spectral curve with 400 points, after dimensionality reduction by PCA, is represented by seven principal components (PCs), which account for 99% of the variance in the dataset. These principal components are therefore selected as the new input variables of the constructed ELM network.
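A minimal sketch of this pre-processing and reduction step, assuming the noisy spectra are stored in a (482 × 400) array, might read as follows; the input data here are placeholders.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
spectra = rng.random((482, 400)) * 5000.0   # placeholder for the noisy spectra

# Log transform: maps spectral yields into roughly [1, 5].
Z = np.log10(spectra + 10.0)

# Keep seven principal components as the new ELM inputs; on the real
# dataset these account for about 99% of the variance.
pca = PCA(n_components=7)
X_reduced = pca.fit_transform(Z)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())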
The numerical experiments are conducted on the MATLAB platform. The number of hidden-layer nodes L is an adjustable parameter in the experiments, and several trials with different values of L have been run. Figure 4a,b shows how the specified metric, the mean squared error (MSE), varies with L. The MSE, starting from a large initial value, decreases quickly as the number of hidden-layer nodes increases and stabilizes at a non-zero minimum once L exceeds about 40, indicating good convergence. The optimal value of L could also be obtained via an optimization method such as particle swarm optimization (PSO), with the MSE as the cost function.
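The following sketch reproduces the spirit of this hidden-node sweep with a minimal NumPy ELM on placeholder data (the actual experiments used MATLAB); on the real dataset, the MSE curve flattens near L ≈ 40.

import numpy as np

rng = np.random.default_rng(0)
X_tr, Y_tr = rng.random((360, 7)), rng.random((360, 3))   # placeholder data
X_te, Y_te = rng.random((122, 7)), rng.random((122, 3))

for L in (5, 10, 20, 40, 80):
    W = rng.normal(size=(7, L))            # random input weights
    b = rng.normal(size=L)                 # random hidden biases
    beta = np.linalg.pinv(np.tanh(X_tr @ W + b)) @ Y_tr
    pred = np.tanh(X_te @ W + b) @ beta
    print(L, np.mean((pred - Y_te) ** 2))  # test MSE vs. hidden-node count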
To further check the accuracy and performance of the trained ELM network, it is instructive to examine a linear regression analysis of the output predictions on the test dataset. As shown in Figure 5 and Figure 6, the ELM predictions for the selected cases accurately match the expected target values: Figure 5 shows that the ELM produces accurate predictions of the thickness variable, while Figure 6 shows highly accurate predictions of the concentration variables.
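Such a regression check can be expressed compactly; the sketch below, on placeholder predictions, fits a line to the prediction-target scatter, where a slope and an R^2 both close to 1 indicate accurate predictions.

import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
targets = rng.uniform(400.0, 4000.0, 120)           # placeholder test targets
preds = targets + rng.normal(0.0, 20.0, 120)        # stand-in for ELM output

slope, intercept = np.polyfit(targets, preds, 1)    # linear regression fit
print(slope, intercept, r2_score(targets, preds))   # expect ~1, ~0, ~1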
After training and testing, the constructed ELM network can be used, through a generalization procedure, to analyze spectral data from samples with unknown structural parameters. We select a few spectral curves to analyze and compare the predictions of the proposed ELM method with the results calculated by a three-layer MLP network. The predicted outputs of both methods are summarized in Table 1 and Table 2. The ELM produces correct analysis results for all cases with small errors: across the four sets of spectra, the maximum errors for the thickness and the two concentrations are 0.94%, 0.79%, and 0.97%, respectively, in excellent agreement with the exact nominal values of the structural parameters. The maximum errors of the MLP method on the same test cases are 1.53%, 2.06%, and 2.54%, respectively; the MLP errors are thus consistently larger than the ELM errors. For most numerical techniques, the analysis errors are typically around 5% [28], so ELM appears to be the better option for spectral data analysis applications. Unlike deep neural networks (for example, deep belief networks and deep Boltzmann machines), the original ELM has a shallow architecture; nevertheless, it remains a practical and valid method for regression, classification, and recognition problems because of its architectural simplicity, high accuracy, and easy training [13]. The webpage of reference [41] lists benchmark examples comparing the performance of ELM and deep neural networks (DNN) on the MNIST OCR, 3D shape classification [42], and traffic sign recognition [43] datasets. The training accuracies of ELM were better than or equal to those of DNN, whereas the training time was shortened dramatically, by a few orders of magnitude.