1. Introduction
The chemical compound space (CCS) is the theoretical space consisting of every possible compound known (and unknown) to us [1, 2]. Even some of our largest databases consisting of approximately
known substances are a mere drop in the ocean compared with an estimated
substances that possibly make up the CCS [3, 4]. Needless to say, the next big discovery of a compound that can revolutionize energy storage devices of the future is far from trivial.
The techniques used to discover novel materials for enhancing battery technologies have progressed from expensive and time-consuming empirical trial-and-error methods to the more recent first principles approach of using quantum mechanics (QM) [5, 6, 7, 8, 9], Monte Carlo simulations and molecular dynamics (MD) [10, 11, 12, 13, 14]. QM calculations evaluate electron–electron interactions by solving the complex Schrödinger equation, thereby enabling accurate results for a wide variety of properties. However, the computational cost is a bottleneck for molecules larger than a couple hundred atoms. Hence, for multi-component or multi-layer structures such as the solid electrolyte interface layer, QM is not a feasible approach. Additionally, many battery components including ionic and polymer electrolytes, crystal structures and electrode–electrolyte interactions [11, 15, 16, 17, 18] are better analyzed on larger length and time scales that are inaccessible with QM. MD simulations simplify particle–particle interactions to five main types, namely non-bonded, bonded, angle, dihedral and improper interactions. These interactions, which can be evaluated using simple algebraic equations, reduce the computational cost significantly and are applicable to systems almost
times larger. To analyze ion migration in a perovskite nickelate with 200 atoms, QM techniques, even when using the density functional theory (DFT) approximation to reduce computational costs, require about
core-hours of computational time in a picosecond-range simulation. On the other hand, MD simulations with 10^5 atoms required only
core-hours of computational time [19]. Thus, MD simulations enable the analysis of a wide variety of properties and behavior of materials at the atomic scale, such as the crystal structure, thermal properties and mechanical properties, which are often too complex to model using QM calculations. In a recent review, Sun et al. [20] presented the use of MD simulations to optimize lithium metal batteries, investigating the transport structure of Li ions, the electrochemical process at the electronic, atomic or molecular level, the Li+ transport mechanism and the Li deposition behavior in detail.
Though MD simulations are widely used to investigate the properties of materials at the atomic level, these simulations rely on experimentally derived interatomic potential parameters that determine the forces between particles [21]. This dependence on prior experimental data poses a challenge in using MD to design novel materials. To address this issue, Lanjan et al. [22] recently proposed a computational framework that couples QM calculations with MD simulations. The framework generates a wide range of crystal structures by varying a single system parameter (e.g., bond length) while keeping other parameters relaxed at their minimum energy level. The QM calculations are then used to evaluate the system’s energy as a function of these changes, and the resulting data points are used to fit the interaction equations to estimate the potential parameters for each type of particle–particle interaction. Employing this framework enables the study of crystal structures with the accuracy of QM calculations but at the speed and system sizes permissible by MD techniques. While this framework enhances nano-based computational methods, the QM calculations still need massive amounts of computational power, which can be significantly reduced with the AI-based technique proposed in this work.
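The scan-and-fit step described above can be illustrated with a minimal sketch. It assumes a harmonic bond form, E(r) = E0 + 0.5*k*(r - r0)^2, which is a common choice but is not taken from the cited framework. The function names and the scan values are hypothetical placeholders standing in for QM single-point energies.

```python
# Sketch: recover harmonic bond parameters from an energy-versus-bond-length scan.
# Assumed functional form (not taken from the paper): E(r) = E0 + 0.5*k*(r - r0)**2.


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c via the 3x3 normal equations."""
    # Power sums S[p] = sum(x**p) and moment sums T[p] = sum(y * x**p).
    S = [sum(x ** p for x in xs) for p in range(5)]
    T = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    # Augmented normal-equation matrix for the unknowns (a, b, c).
    M = [
        [S[4], S[3], S[2], T[2]],
        [S[3], S[2], S[1], T[1]],
        [S[2], S[1], S[0], T[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(3):
            if row != col:
                factor = M[row][col] / M[col][col]
                M[row] = [v - factor * p for v, p in zip(M[row], M[col])]
    return tuple(M[i][3] / M[i][i] for i in range(3))


def harmonic_parameters(bond_lengths, energies):
    """Return (k, r0, E0) of the harmonic form fitted to the scan."""
    # Center the bond lengths to keep the normal equations well conditioned.
    center = sum(bond_lengths) / len(bond_lengths)
    shifted = [r - center for r in bond_lengths]
    a, b, c = fit_quadratic(shifted, energies)
    k = 2.0 * a                  # force constant
    r0 = center - b / (2.0 * a)  # equilibrium bond length
    e0 = c - b * b / (4.0 * a)   # energy at the minimum
    return k, r0, e0


# Synthetic scan standing in for QM energies (placeholder values only).
scan_r = [0.86 + 0.025 * i for i in range(9)]
scan_e = [-76.0 + 0.5 * 450.0 * (r - 0.96) ** 2 for r in scan_r]
k, r0, e0 = harmonic_parameters(scan_r, scan_e)
print(f"k = {k:.3f}, r0 = {r0:.4f}, E0 = {e0:.4f}")
```

In practice the scan energies would come from QM calculations, and the fitted form would be whichever interaction equation the MD force field uses for that interaction type.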
The emergence of machine learning (ML), deep learning (DL) and artificial intelligence (AI) has helped alleviate the bottlenecks posed by QM and MD simulations and has made it possible to expand the scope of our search for novel materials in the CCS. ML and DL algorithms are orders of magnitude faster than ab initio techniques. Unlike QM-based simulations, which can take days to complete, ML algorithms can produce results within seconds. The use of AI has brought a paradigm shift in research related to improving battery technology as well as molecular property prediction and material discovery in general. For example, Sandhu et al. [23] used DL to examine the optimal crystal structures of doped cathode materials in lithium manganese oxide (LMO) batteries. Data from failed syntheses were used to predict the reaction success rate for the crystallization of templated vanadium selenites [24]. Using QM and ML techniques, Lu et al. [25] developed a method to predict undiscovered hybrid organic-inorganic perovskites (HOIPs) for photovoltaics. Their screening technique was able to shortlist six HOIPs with ideal band gaps and thermal stabilities from 5158 unexplored candidates. To identify material compositions with suitable properties, Meredig et al. [26] built an ML model trained on thousands of ground state crystal structures and used this model to scan roughly 1.6 million candidate compositions of novel ternary compounds to produce a ranked list of 4500 stable ternary compositions that would possibly represent undiscovered materials.
The broad approach employed when using AI-based property prediction models consists of three overarching components: a reference database consisting of relevant quantum mechanical data which is used to fit the AI model; a mathematical representation that not only uniquely describes the attributes of the reference materials but also enables effective model training; and finally a suitable AI model that can accomplish the learning task itself. In the ensuing sections, we describe these components in further detail.
1.1. Database
The fundamental premise of AI is the ability to draw inferences from patterns in data and enable an accurate prediction in unknown domains. Hence, the data, which make up the training examples for our learning task, become a critical aspect for successful prediction. With the introduction of the Materials Genome Initiative in 2011 [27], the United States signaled the importance of unifying the infrastructure for material innovation and harnessing the power of material data. In pursuit of the same goal, various materials databases have emerged, such as the Inorganic Crystal Structure Database (ICSD) [28], the Open Quantum Materials Database (OQMD) [29], the Cambridge Structural Database [30], the Harvard Clean Energy Project [31], the Materials Project [32] and AFLOWLIB [33]. Specifically, the number of training examples, the diversity of the dataset and the degrees of freedom all contribute to how effective the learning task for a specific objective can be [34]. In predicting properties such as the band gap energy and glass-forming ability for crystalline and amorphous materials, Ward et al. [35] methodically selected a chemically diverse set of attributes taken from the OQMD. Similarly, for electronic-structure problems, Schütt et al. [36] noted that the density of states at the Fermi energy is the critical property of concern. To predict this property, they used around 7000 crystal structures from the ICSD and observed a higher predicted variance for certain configurations, which indicated the need to extend the training set in these specific areas. The process of material discovery is complex and diverse, and it is not surprising that there is no one-size-fits-all database that supports accurate prediction of the properties of all materials. The physical and chemical characteristics of materials vary widely, requiring different methods and techniques for precise analysis and prediction. Moreover, the current methodologies rely on the availability of well-curated data or the ability to manually generate such data, which is a daunting and often infeasible task, especially for new and unexplored materials. Thus, there is a need to develop generalizable and adaptable approaches that can efficiently handle a diverse range of materials, properties and configurations without the need for extensive data generation or curation.
1.2. Molecular Representation
ML algorithms draw inferences from data to establish a relationship between the atomic structure and the properties of a system. To enable the best possible structure-property approximation, a good representation of the material (also referred to as the ‘fingerprint’ or ‘descriptor’) is crucial. The first Hohenberg–Kohn theorem of DFT proves that the electron density of a system contains all the information needed to describe its ground state properties, and it is a ‘universal descriptor’ that can be used to predict these properties without knowledge of the details of the interactions between the electrons [37]. Crucially, for ML, a good molecular representation is invariant to rotation and translation of the system as well as permutation of atomic indices [38]. Because the electron density does not satisfy these invariances in general, it is unfortunately not a universally suitable representation of a system. Additionally, a good descriptor must be unique, continuous, compact and computationally cheap [38]. Often, there are multiple molecular geometries that possess similar values for a property. Hence, there is no single universal representation for all properties, which has led to hundreds of molecular descriptors that are suitable only for a small subset of the CCS and a small subset of properties [39]. A commonly used molecular representation that satisfies the above-mentioned criteria of a good representation is the ‘Coulomb matrix’. It uses the same parameters that constitute the Hamiltonian for any given system, namely the set of Cartesian coordinates R_I and nuclear charges Z_I [40]. While the Coulomb matrix representation has shown tremendous success for property prediction in finite systems, it is unable to do the same for infinite periodic crystal structures [36]. Hansen et al. [41] proposed a new descriptor called ‘bag-of-bonds’ that performed better due to incorporating the many-body interactions of a system. In fact, the use of different descriptors in an ML endeavor for material property prediction is so common that there are open-source software packages that provide implementations for a myriad of different descriptors [38]. Unfortunately, a lack of clarity on the right descriptor makes the use of AI inaccessible to researchers who possess domain expertise but lack the needed knowledge of AI. Additionally, the lack of generalizability of a chosen descriptor makes the current AI-based techniques inaccurate and narrow in scope. To overcome these challenges, the novel technique proposed in this work makes material discovery and property prediction easier and more accessible without the time-consuming process of selecting a suitable descriptor. Specifically, our approach leverages a two-stage process combining AI with MD simulations.
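To make the Coulomb matrix descriptor mentioned above concrete, the following minimal sketch builds it from nuclear charges Z_I and Cartesian coordinates R_I. It uses the form commonly given in the literature, with diagonal entries 0.5*Z_I^2.4 and off-diagonal entries Z_I*Z_J/|R_I - R_J|, and assumes atomic units. The function name and the example geometry are illustrative assumptions, not taken from this work.

```python
import math


def coulomb_matrix(charges, coords):
    """Build the Coulomb matrix from nuclear charges and Cartesian coordinates.

    Assumes atomic units (distances in bohr). Diagonal: 0.5 * Z_I**2.4.
    Off-diagonal: Z_I * Z_J / |R_I - R_J|.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Illustrative diatomic example (H2 with an assumed bond length of 1.4 bohr).
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
for row in h2:
    print(row)
```

Because only interatomic distances enter the off-diagonal terms, the matrix is unchanged by translating or rotating the molecule. It still depends on the ordering of the atoms, which is usually handled by sorting rows and columns or by using the eigenvalue spectrum.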
1.3. AI Model
In addition to an appropriate database and the precise molecular representation, a critical aspect in the material property prediction process is the choice of the AI algorithm. AI algorithms can be categorized into supervised learning, unsupervised learning and reinforcement learning. Supervised learning uses a standard fitting procedure that attempts to determine a mapping function between the known input features and the corresponding output labels. The goal is to make accurate predictions for new, unseen data. In contrast, unsupervised learning does not have prior knowledge of the desired output, and the goal is to find patterns and structures in this unlabeled data. Reinforcement learning uses an iterative trial-and-error process where the actions are determined based on reinforcement in the form of a reward-penalty system. The goal here is to maximize the cumulative reward over time. Supervised learning is the most widespread category of learning used in materials research. Different models may be better suited for certain types of materials or properties, and the choice of model often depends on the available data and the specific goals of the prediction task. Akbarpour et al. [42] found that artificial neural networks (ANNs) performed better in predicting the synthesis conditions of nano-porous anodic aluminum oxide at the interpore distance in comparison with both multiple linear regression and experimental studies. On the other hand, for the modeling of zeolite synthesis, Manuel Serra et al. [43] found that support vector regression (SVR) outperformed ANNs and decision trees. Fang et al. [44] proposed a novel hybrid methodology for forecasting the atmospheric corrosion of metallic materials where the optimal hyperparameters for an SVR model were automatically determined using a genetic algorithm. These examples highlight the need for AI expertise when choosing the right algorithm for a given application, which can be a barrier to making AI methods accessible for materials-based research.
In this work, we present an ML model to predict the non-bonded potential parameters for conventional elements in the periodic table. We propose a novel approach that uses ML to learn a common empirical non-bonded interatomic potential, the Buckingham potential [45], and we successfully demonstrate the ability of this machine-learned potential to predict a wide range of properties when used as an input to classical MD simulations. We also demonstrate a marked improvement in the time taken to determine such properties compared with a traditional first principles approach.
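For reference, the Buckingham potential named above is commonly written as V(r) = A*exp(-r/rho) - C/r^6, with a short-range exponential repulsion and a dispersion attraction. The sketch below evaluates this form and the corresponding radial force. The (A, rho, C) parameterization is an assumption here, since some references write the exponent as -B*r instead, and the numerical values are hypothetical placeholders in arbitrary consistent units.

```python
import math


def buckingham_energy(r, a, rho, c):
    """Buckingham pair energy V(r) = A*exp(-r/rho) - C/r**6."""
    return a * math.exp(-r / rho) - c / r ** 6


def buckingham_force(r, a, rho, c):
    """Radial force F(r) = -dV/dr = (A/rho)*exp(-r/rho) - 6*C/r**7."""
    return (a / rho) * math.exp(-r / rho) - 6.0 * c / r ** 7


# Hypothetical parameters, chosen only to show the call pattern.
A, RHO, C = 1000.0, 0.3, 20.0
for separation in (2.5, 3.0, 3.5):
    energy = buckingham_energy(separation, A, RHO, C)
    force = buckingham_force(separation, A, RHO, C)
    print(f"r = {separation:.1f}: V = {energy:.5f}, F = {force:.5f}")
```

In an MD engine this non-bonded term is typically evaluated together with a Coulomb term built from the partial charges, and a short-range cutoff is applied.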
3. Results and Discussion
To evaluate the effectiveness of our method, we selected four molecules with different levels of complexity: (1) H2O, a simple molecule, (2) (CH2O)2CO (ethylene carbonate, EC), a relatively complex molecule with a ring section, (3) C2H5OH (ethanol), a short-length hydrocarbon, and (4) C8H18 (octane), a long-chain molecule. Firstly, we used the partial charges from the literature [22] for all possible unique similar atom pair combinations for each molecule to predict the corresponding Buckingham potential parameters using our trained ML model. We then computed the Buckingham potential parameters for the dissimilar atom pair combinations using the mixing rules outlined in Equations (2)–(4). The accuracy of the predicted potential parameters is provided in Table 4. The comparison of the predicted values with the experimental values is shown in Figure 4. Next, we used these predicted potential parameters as inputs for the MD simulations to predict the density of these molecules.
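Equations (2)–(4) are not reproduced in this section, so the following sketch only illustrates the general idea of deriving dissimilar-pair parameters from like-pair parameters. It assumes one commonly used set of combining rules: a geometric mean for A and C, and an arithmetic mean for rho. These rules, the function name and the numerical values are assumptions for illustration and may differ from the rules actually used in this work.

```python
import math


def mix_buckingham(params_i, params_j):
    """Combine like-pair Buckingham parameters (A, rho, C) for an unlike pair.

    Assumed rules (illustrative only): geometric mean for A and C,
    arithmetic mean for rho.
    """
    a_i, rho_i, c_i = params_i
    a_j, rho_j, c_j = params_j
    a_ij = math.sqrt(a_i * a_j)
    rho_ij = 0.5 * (rho_i + rho_j)
    c_ij = math.sqrt(c_i * c_j)
    return a_ij, rho_ij, c_ij


# Hypothetical like-pair parameters for two atom types.
type_1 = (100.0, 0.2, 4.0)
type_2 = (400.0, 0.4, 16.0)
print(mix_buckingham(type_1, type_2))
```

With rules of this kind, the ML model only needs to predict parameters for the similar atom pairs, and the number of unlike-pair parameter sets that follow grows quadratically with the number of atom types.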
Density is an important property of molecules as it can provide information about their packing and intermolecular forces. An accurate prediction of density requires an accurate modeling of interatomic forces and interactions, including both bonded and non-bonded interactions. Non-bonded interactions are sensitive to temperature and pressure changes and have a significant impact on the density of a molecule. As such, calculating the density precisely is a good indicator that the proposed ML-based technique can be employed to determine other molecular properties such as the mechanical properties, thermal properties and electrochemical properties, which are influenced by similar interatomic interactions. Furthermore, density is a thermodynamic property that can be measured experimentally and accurately calculated using QM techniques. Hence, comparing the predicted densities of materials with the experimental values is an effective approach to assessing the accuracy and reliability of our ML-based method. This comparison is summarized in Table 5, where our predicted densities are shown to have an accuracy greater than 93% with respect to the experimental data. Also, the densities obtained with MD simulations (specifications in Table 2) using our ML-predicted potential parameters are shown as a function of time in Figure 5. The density results in our MD simulations align closely with the expected values for ethylene carbonate (EC) and octane, with slight deviations well within the permissible range for computational models. The dynamic density fluctuations observed in the H2O and ethanol simulations are characteristic of the inherent complexities of molecular dynamics. Such variations are anticipated in MD simulations, reflecting the system’s responsiveness to changing conditions and interactions, while the overall trends remained consistent with the experimental expectations, demonstrating the reliability of our computational approach.
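As a worked illustration of the density comparison above, the sketch below converts the contents of a simulation box into a mass density and scores it against a reference value. The accuracy definition, 100*(1 - |predicted - reference|/reference), is an assumption, since this section does not state the formula used. The box size and molecule count are hypothetical and are not taken from Table 2.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
CM3_PER_ANGSTROM3 = 1.0e-24   # 1 cubic angstrom expressed in cubic centimeters


def density_g_per_cm3(n_molecules, molar_mass_g_per_mol, box_volume_angstrom3):
    """Mass density of a periodic box holding n identical molecules."""
    mass_g = n_molecules * molar_mass_g_per_mol / AVOGADRO
    volume_cm3 = box_volume_angstrom3 * CM3_PER_ANGSTROM3
    return mass_g / volume_cm3


def accuracy_percent(predicted, reference):
    """Assumed accuracy metric: 100 * (1 - relative absolute error)."""
    return 100.0 * (1.0 - abs(predicted - reference) / reference)


# Hypothetical example: 1000 water molecules in a cubic box of side 31.04 angstrom.
rho_md = density_g_per_cm3(1000, 18.015, 31.04 ** 3)
print(f"density = {rho_md:.4f} g/cm^3")
print(f"accuracy vs. 1.000 g/cm^3 = {accuracy_percent(rho_md, 1.0):.2f}%")
```

In a constant-pressure simulation the box volume fluctuates from step to step, so the density is usually averaged over the equilibrated part of the trajectory before it is compared with experiment.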
4. Conclusions
This work presents a novel ML-based technique that can learn the interatomic potential parameters for various particle–particle interactions with the accuracy of conventional computational techniques like QM. When used as input to MD simulations, these learned potential parameters can predict a diverse range of properties, enabling the rapid screening and comparison of large databases of material properties for battery applications.
In this study, we demonstrate the efficacy and validity of our proposed technique by learning a non-bonded interatomic potential: the Buckingham potential. We used the non-bonded potential parameters predicted in this work in conjunction with the potential parameters obtained from the literature for other types of interactions to predict the densities of four molecules of differing complexity. The obtained values were in close agreement with the experimental values for all four molecules, establishing the accuracy and efficacy of our proposed technique for the nanoscale evaluation of novel materials. Our technique can help quickly eliminate materials that are unlikely to meet the desired criteria, narrowing down the list of potential candidates for further evaluation. By identifying the most promising battery compositions and materials for further testing and development, this technique can accelerate the discovery of novel materials and the improvement of existing battery technologies.
In conclusion, the proposed ML-based technique provides a promising path toward discovering and developing novel materials with enhanced properties for applications such as next-generation batteries with superior electrochemical performance. Our technique can accelerate the search for new materials with desirable properties, allowing for the rapid screening and comparison of large databases of material properties for such applications.