3.1. Approaches
Machine learning methods used in decision-making algorithms are conventionally divided into two categories [3]: content-based filtering and collaborative filtering.
In content-based filtering, the system recommends items that are similar to the ones a user has liked or interacted with in the past. The system uses features or attributes of the items to determine their similarity, such as text, images, or audio. This method is based on the assumption that, if a user likes one item, they will also like items that are similar to it. Content-based filtering is useful when there is a small number of users and a large number of items, making it difficult to gather collaborative data. It is also useful when the items being recommended have distinct and measurable features, such as books, movies, or songs. Content-based filtering can lead to better accuracy and diversity in recommendations, as it is not influenced by the preferences of other users.
In collaborative filtering, the system recommends items based on the preferences of other users who have tastes and behaviors that are similar to those of the active user. The system uses a collaborative matrix to calculate the similarity between users and items, taking into account the ratings or interactions of all users. This method is based on the assumption that, if a user likes an item, other users with similar preferences will also like it. Collaborative filtering is useful when there is a large number of users and a small number of items, making it easier to gather collaborative data. It is also useful when the items being recommended have subjective features, such as movies, books, or music, where the preferences of other users can provide valuable information. Collaborative filtering can lead to better accuracy and personalization in recommendations, as it takes into account the preferences of other users with tastes similar to those of the active user.
In summary, content-based filtering is more useful when there are distinct and measurable features of the items being recommended, while collaborative filtering is more useful when there are a large number of users and subjective features of the items being recommended. Both methods have their strengths and weaknesses, and hybrid approaches that combine both methods can often achieve the best results.
Recommender systems can combine both methods to improve recommendation quality: content-based methods rely on the similarity of item attributes, while collaborative methods compute the similarity of users or items from the matrix of their interactions.
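To make the distinction concrete, the following minimal Python sketch scores items both ways on hypothetical toy data: content-based scoring compares item feature vectors, while collaborative scoring weights other users' rating rows by their similarity to the active user.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# --- Content-based: items described by feature vectors (toy data) ---
item_features = {
    "item_a": np.array([1.0, 0.0, 1.0]),
    "item_b": np.array([0.9, 0.1, 0.8]),   # similar to item_a
    "item_c": np.array([0.0, 1.0, 0.0]),
}
# A user who liked item_a gets items ranked by feature similarity to it.
liked = item_features["item_a"]
content_scores = {name: cosine(liked, feats)
                  for name, feats in item_features.items() if name != "item_a"}

# --- Collaborative: users described only by their rating rows ---
ratings = np.array([  # rows = users, columns = items a, b, c
    [5, 4, 1],        # active user
    [5, 5, 1],        # similar taste
    [1, 1, 5],        # dissimilar taste
])
# Weight other users' ratings by their similarity to the active user.
sims = np.array([cosine(ratings[0], ratings[u]) for u in (1, 2)])
collab_scores = sims @ ratings[1:] / sims.sum()

print(content_scores)   # item_b scores highest: closest features
print(collab_scores)    # predictions driven by the similar user's ratings
```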
Within a mathematical model of information security, a content-based method can use a set of N attacks and a set of M defense strategies that can be recommended to an information security administrator (Table 4). The corresponding cells contain numerical values denoting the degree of applicability of a defense strategy to a particular attack on a five-point scale, where 5 means that Strategy_M is optimal for countering Attack_N and 1 means that Strategy_M is not suitable for protecting against Attack_N.
A content-based learning model trained on retrospective data predicts a suitable strategy for each type of attack.
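A minimal sketch of such a content-based model, assuming each attack is described by a numeric feature vector and labeled with the strategy that scored highest on the five-point scale; all feature values and labels here are hypothetical:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical retrospective data: feature vectors of past attacks
# (e.g., normalized protocol, port, and payload statistics).
X_train = np.array([
    [0.9, 0.1, 0.0],   # e.g., brute-force-like profile
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],   # e.g., injection-like profile
    [0.0, 0.8, 0.3],
])
# Label = index of the defense strategy rated 5 for that attack (Table 4).
y_train = np.array([0, 0, 1, 1])

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# For a newly observed attack, predict the most applicable strategy.
new_attack = np.array([[0.85, 0.15, 0.05]])
print("recommended strategy index:", model.predict(new_attack)[0])
```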
Collaborative methods work with an interaction or rating matrix [4]. The task of machine learning is to determine a function that predicts the importance of an information protection strategy for each attack or for each family of attacks. Such a matrix is usually very large and sparse, with most values missing [5].
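Such a sparse interaction matrix can be stored efficiently in a compressed sparse format; a brief sketch with hypothetical dimensions and ratings:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical sparse interaction matrix: rows = attacks, cols = strategies.
# Only a few (attack, strategy) pairs have observed applicability ratings.
rows = np.array([0, 0, 1, 3, 4])       # attack indices
cols = np.array([1, 3, 2, 0, 3])       # strategy indices
vals = np.array([5, 2, 4, 1, 5])       # ratings on the five-point scale

W = csr_matrix((vals, (rows, cols)), shape=(1000, 50))
print(f"density: {W.nnz / (W.shape[0] * W.shape[1]):.2%}")  # mostly missing
```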
In the approach presented here, the base model could not account for zero-day exploits [5], which are understood to be methods used by attackers to attack systems with previously undetected vulnerabilities.
Considering zero-day exploits, the recommender system algorithm could be implemented as follows (a sketch of these steps follows the list):
1. Each type of attack/anomaly is represented as a vector of a certain dimensionality.
2. When an attack that is not in the knowledge base is detected at the input of the model, its vector is calculated according to step 1.
3. Using the cosine proximity method [6], the cyberattack from the knowledge base that is closest to the newly detected one is selected.
4. A protection strategy is implemented according to the information obtained in step 3.
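A minimal sketch of steps 1-4, assuming attacks are already encoded as numeric vectors; the vectors and strategy names here are hypothetical:

```python
import numpy as np

# Knowledge base: known attack vectors and their recommended strategies.
known_attacks = np.array([
    [0.9, 0.1, 0.0, 0.3],
    [0.1, 0.8, 0.4, 0.0],
    [0.0, 0.2, 0.9, 0.5],
])
strategies = ["strategy_block_ip", "strategy_waf_rule", "strategy_isolate_host"]

def recommend(new_vector: np.ndarray) -> str:
    """Steps 3-4: find the closest known attack by cosine proximity
    and return its associated protection strategy."""
    norms = np.linalg.norm(known_attacks, axis=1) * np.linalg.norm(new_vector)
    sims = known_attacks @ new_vector / norms
    return strategies[int(np.argmax(sims))]

# Step 2: a detected attack absent from the knowledge base,
# encoded as a vector according to step 1.
unknown = np.array([0.85, 0.15, 0.05, 0.25])
print(recommend(unknown))  # closest match drives the defense choice
```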
The simplest algorithm calculates the cosine or correlation similarity of rows (users) or columns (items) and recommends the items selected by the k-nearest-neighbors (KNN) method [4].
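A sketch of this neighborhood approach over the attack-strategy rating matrix (hypothetical values), estimating a missing rating from the k most similar rows:

```python
import numpy as np

# Hypothetical attack x strategy rating matrix; 0 marks a missing rating.
W = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [5, 4, 2, 1],
], dtype=float)

def predict(row: int, col: int, k: int = 2) -> float:
    """Predict W[row, col] from the k most similar rows that rated col."""
    target = W[row]
    sims = []
    for r in range(W.shape[0]):
        if r == row or W[r, col] == 0:
            continue  # skip self and rows with no rating for this column
        s = target @ W[r] / (np.linalg.norm(target) * np.linalg.norm(W[r]))
        sims.append((s, W[r, col]))
    top = sorted(sims, reverse=True)[:k]  # k nearest neighbors
    return sum(s * v for s, v in top) / sum(s for s, _ in top)

print(predict(1, 1))  # estimate the missing rating for attack 1, strategy 1
```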
Zero-day exploits pose several challenges to information security:
Unknown vulnerabilities: Zero-day exploits target previously unknown vulnerabilities in software, operating systems, or applications. These vulnerabilities are not yet known to the software vendors or the security community, making it difficult to defend against them.
No patches or fixes: Since the vulnerabilities are unknown, there are no patches or fixes available to address them. This means that organizations and individuals must find alternative solutions to mitigate the risk posed by these exploits.
Limited visibility: Zero-day exploits can be difficult to detect and analyze, as they often use novel techniques and codes that have not been seen before. This limited visibility makes it challenging to determine the extent of the attack and the damage that has been done.
High risk: Zero-day exploits can be used to gain unauthorized access to systems and data, to steal sensitive information, or to disrupt critical infrastructure.
Difficult to mitigate: Zero-day exploits can be difficult to mitigate, as they often rely on novel techniques and codes that are not yet understood. This can make it challenging to develop effective defenses against them.
Targeted attacks: Zero-day exploits are often used in targeted attacks, where the attacker specifically targets a particular organization or individual. These attacks can be highly sophisticated and difficult to detect.
Since zero-day exploits can combine elements of already known attacks, and since the recommender system solves an unsupervised learning problem, the proposed approach can identify the common behavior patterns of a new vulnerability. Rather than ignoring a new attack, as existing methods that classify a limited set of attacks do, it surfaces the known attacks most similar to it and, consequently, the corresponding defense strategies.
Methods based on matrix factorization [7] reduce the dimensionality of the interaction matrix W of size n × v and approximate it by two or more smaller matrices with k latent components, where n is the number of attacks and v is the number of defense strategies (Figure 1).
The matrix factorization method can be described by the following equation [8]:

$$E = \sum_{(n,v) \in W} \left( w_{nv} - \hat{w}_{nv} \right)^2, \qquad \hat{w}_{nv} = \sum_{k} p_{nk} q_{kv},$$

where
E is the learning error function of the model;
W is the initial matrix of the interaction between protection strategies and types of attacks;
P and Q are matrices of the rating of the applicability of current information protection strategies for a particular information security event;
p_nk and q_kv are elements of matrices P and Q;
ŵ_nv is the correlation coefficient between the elements of matrices P and Q.
The features of the matrix factorization method [16] can be formulated as follows:

$$\rho_{nv} = \begin{cases} 1, & w_{nv} > \zeta, \\ 0, & w_{nv} \le \zeta, \end{cases}$$

where ζ is the parameter of the protection strategy preference function.
Matrix factorization is a technique commonly used in recommender systems to reduce the dimensionality of large user–item interaction datasets and to identify latent factors that can be used to make personalized recommendations. In the context of information security, matrix factorization can be applied to various problems, such as anomaly detection, intrusion detection, malware detection, risk assessment, and personalized security.
The most popular learning algorithm is stochastic gradient descent, which minimizes the loss through gradient updates of the columns and rows of matrices P and Q according to the following rule [18]:

$$p_{nk} \leftarrow p_{nk} + \lambda \, e_{nv} \, q_{kv}, \qquad q_{kv} \leftarrow q_{kv} + \lambda \, e_{nv} \, p_{nk}, \qquad e_{nv} = w_{nv} - \hat{w}_{nv},$$

where λ is a constant determining the step of parameter change (the learning rate). As an alternative to the stochastic gradient descent algorithm, the method of alternating least squares can be used, which iteratively optimizes the matrices P and Q [19].
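A compact sketch of this factorization with stochastic gradient descent updates; the ratings, the learning rate λ, and the number of latent components k are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical attack x strategy rating matrix; 0 marks a missing rating.
W = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

n, v = W.shape
k = 2                                    # latent components
lam = 0.01                               # learning rate (lambda)
P = rng.normal(scale=0.1, size=(n, k))   # attack factors
Q = rng.normal(scale=0.1, size=(k, v))   # strategy factors

observed = [(i, j) for i in range(n) for j in range(v) if W[i, j] > 0]
for _ in range(2000):
    i, j = observed[rng.integers(len(observed))]
    e = W[i, j] - P[i] @ Q[:, j]         # prediction error e_nv
    P[i], Q[:, j] = P[i] + lam * e * Q[:, j], Q[:, j] + lam * e * P[i]

print(np.round(P @ Q, 1))  # reconstructed ratings, including missing cells
```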
Figure 2 shows a matrix of relationships between different events, in which "clumps" of similar information security events are marked [20].
Association rules [21] mined from the interaction matrix can be used in a recommendation decision support system. Items that are often grouped together are connected by an edge in the graph. Let us denote the following:
I as the set of objects;
D as the base of transactions;
S_min as the minimum level of decision support;
A_min as the minimum confidence threshold.
Rules extracted from the interaction matrix must have at least a minimum level of decision support and a minimum confidence threshold [22] (Figure 3). Support is related to the frequency of occurrence [23]; high confidence means that rules are violated infrequently.
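For a rule X ⇒ Y over a transaction base D, support and confidence can be computed directly; a small sketch with hypothetical transactions of co-occurring security events:

```python
# Hypothetical transaction base D: sets of co-occurring security events.
D = [
    {"port_scan", "brute_force"},
    {"port_scan", "brute_force", "priv_escalation"},
    {"port_scan", "phishing"},
    {"brute_force", "priv_escalation"},
]

def support(itemset: set) -> float:
    """Fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in D) / len(D)

def confidence(x: set, y: set) -> float:
    """Confidence of the rule x => y: support(x union y) / support(x)."""
    return support(x | y) / support(x)

S_min, A_min = 0.3, 0.6  # minimum support and confidence thresholds
rule = ({"port_scan"}, {"brute_force"})
ok = support(rule[0] | rule[1]) >= S_min and confidence(*rule) >= A_min
print(f"support={support(rule[0] | rule[1]):.2f}, "
      f"confidence={confidence(*rule):.2f}, accepted={ok}")
```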
The scalability of the rule search can be improved with the Apriori algorithm [19] (Figure 3), which examines the state space of possible frequent itemsets and prunes branches of the search space that cannot contain frequent itemsets [24].
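A minimal Apriori-style sketch on hypothetical transactions: candidate itemsets of size k + 1 are generated only from frequent itemsets of size k, which prunes the search space, since no superset of an infrequent itemset can be frequent:

```python
from itertools import combinations

D = [  # hypothetical transaction base of co-occurring security events
    {"port_scan", "brute_force"},
    {"port_scan", "brute_force", "priv_escalation"},
    {"port_scan", "phishing"},
    {"brute_force", "priv_escalation"},
]
S_min = 0.5  # minimum support threshold

def support(itemset):
    return sum(itemset <= t for t in D) / len(D)

items = sorted({i for t in D for i in t})
frequent = [frozenset([i]) for i in items if support({i}) >= S_min]
all_frequent = list(frequent)

while frequent:
    # Join frequent size-k itemsets into size k+1 candidates;
    # infrequent branches are never extended (Apriori pruning).
    candidates = {a | b for a, b in combinations(frequent, 2)
                  if len(a | b) == len(a) + 1}
    frequent = [c for c in candidates if support(c) >= S_min]
    all_frequent.extend(frequent)

print(all_frequent)  # all frequent itemsets above the support threshold
```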
The main metric for recommendation quality is the normalized discounted cumulative gain (nDCG) [25]. The advantage of this metric is its boundedness: it takes values in the range [0; 1]. The closer its value is to 1, the better the ranking of protection strategies for a particular event [26]:
$$nDCG@K = \frac{DCG@K}{IDCG@K}, \qquad DCG@K = \sum_{k=1}^{K} \frac{rel_k}{\log_2 (k+1)},$$

where DCG@K takes into account the order of items in the list by multiplying the relevance rel_k of an item by a weight equal to the inverse logarithm of the item number [27], IDCG@K is the ideal value of the DCG metric, and k is the order number in the ranked list.
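A short sketch computing nDCG@K for a ranked list of protection strategies with hypothetical relevance scores:

```python
import math

def dcg_at_k(relevances, k):
    """DCG@K: relevance discounted by the inverse log of the position."""
    return sum(rel / math.log2(i + 2)          # positions are 1-based
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """nDCG@K = DCG@K / IDCG@K, where IDCG uses the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance of recommended strategies, in recommended order.
recommended = [5, 3, 4, 1, 2]
print(f"nDCG@3 = {ndcg_at_k(recommended, 3):.3f}")
```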
The normalized discounted cumulative gain metric is widely used to evaluate the quality of recommendations in information retrieval and recommendation systems. It measures the usefulness, or gain, of a recommendation based on user feedback, such as clicks or purchases. The DCG metric is calculated as the sum of the gains of all recommendations, where the gain of each recommendation is discounted by a decaying function of its position in the ranked list. The discounting factor allows the metric to prioritize items placed near the top of the list, as they are the most likely to be seen and acted on by the user.
The DCG metric is normalized to ensure that the scores are on the same scale regardless of the number of recommendations made. Normalization is done by dividing the DCG score by the maximum possible score (IDCG), which is obtained when the recommendations are ideally ordered by relevance. To assess the quality of recommendations using the nDCG metric, the score can be compared to a threshold value: if the score is above the threshold, the ranking is considered to be of high quality and likely relevant to the user; if it is below the threshold, the ranking is of low quality and may not be relevant.
Thus, a decision support system for information security tasks can be built using two methods [28]: content-based filtering and collaborative filtering. Both methods assume that the knowledge base contains retrospective data about the administrator's actions when an information security event occurs [29]. However, the collaborative filtering method is resistant to zero-day vulnerabilities.