Algorithms for Games AI

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Algorithms for Multidisciplinary Applications".

Deadline for manuscript submissions: 20 June 2024 | Viewed by 28856

Special Issue Editors


Guest Editor
Institute for Artificial Intelligence, Peking University, Beijing 100871, China
Interests: game AI; reinforcement learning applications; game design

Assistant Guest Editor
Institute of Automation, University of Chinese Academy of Sciences, Beijing 100190, China
Interests: multi-agent reinforcement learning; computational advertising; game agent; agent confrontation platform

Special Issue Information

Dear Colleagues,

We invite you to submit your latest research in the area of game AI algorithms to this Special Issue, Algorithms for Games AI. We are looking for new and innovative approaches to solving game AI problems, whether theoretical or empirical. Submissions are welcome both for traditional game AI algorithms (planning, tree search, etc.) and for newer algorithms (deep reinforcement learning, etc.). Potential topics include, but are not limited to: the history of game AI; the development of Monte Carlo tree search and other tree search algorithms; and the theoretical analysis of reinforcement learning algorithms or their application to specific games.

Prof. Dr. Wenxin Li
Dr. Haifeng Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (9 papers)


Research


15 pages, 2977 KiB  
Article
Learning State-Specific Action Masks for Reinforcement Learning
by Ziyi Wang, Xinran Li, Luoyang Sun, Haifeng Zhang, Hualin Liu and Jun Wang
Algorithms 2024, 17(2), 60; https://doi.org/10.3390/a17020060 - 30 Jan 2024
Viewed by 1277
Abstract
Efficient yet sufficient exploration remains a critical challenge in reinforcement learning (RL), especially for Markov Decision Processes (MDPs) with vast action spaces. Previous approaches have commonly projected the original action space into a latent space or employed environmental action masks to reduce the number of action possibilities. These methods, however, often lack interpretability or rely on expert knowledge. In this study, we introduce a novel method for automatically reducing the action space in environments with discrete action spaces while preserving interpretability. The proposed approach learns state-specific masks with a dual purpose: (1) eliminating actions with minimal influence on the MDP and (2) aggregating actions with identical behavioral consequences within the MDP. Specifically, we introduce a novel concept, Bisimulation Metrics on Actions by States (BMAS), to quantify the behavioral consequences of actions within the MDP, and we design a dedicated mask model to ensure the binary nature of the masks. Crucially, we present a practical procedure for training the mask model from transition data collected by any RL policy. The method is plug-and-play and adaptable to all RL policies; to validate its effectiveness, we integrate it into two prominent RL algorithms, DQN and PPO. Experimental results obtained from Maze, Atari, and μRTS2 reveal that the introduced approach substantially accelerates RL training and yields noteworthy performance improvements.
(This article belongs to the Special Issue Algorithms for Games AI)
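As a concrete illustration of how a learned state-specific mask plugs into value-based action selection, the sketch below masks pruned actions out of greedy selection by setting their Q-values to negative infinity. This is generic action masking under assumed array shapes, not the paper's BMAS metric or mask model:

```python
import numpy as np

def masked_greedy_action(q_values, mask):
    """Greedy action restricted to the actions a state-specific mask keeps.

    q_values: (n_actions,) array of Q-value estimates for the current state
    mask:     (n_actions,) binary array; 1 keeps an action, 0 prunes it
    """
    # Pruned actions get -inf so they can never win the argmax.
    masked_q = np.where(mask.astype(bool), q_values, -np.inf)
    return int(np.argmax(masked_q))

q = np.array([0.2, 1.5, -0.3, 0.9])
m = np.array([1, 0, 1, 1])  # suppose the learned mask prunes action 1
print(masked_greedy_action(q, m))  # action 3: best among the unmasked actions
```

Aggregating behaviorally identical actions would additionally map several action indices onto one representative before the argmax.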

22 pages, 3492 KiB  
Article
Optimizing Reinforcement Learning Using a Generative Action-Translator Transformer
by Jiaming Li, Ning Xie and Tingting Zhao
Algorithms 2024, 17(1), 37; https://doi.org/10.3390/a17010037 - 16 Jan 2024
Viewed by 1731
Abstract
In recent years, with the rapid advancement of Natural Language Processing (NLP) technologies, large models have become widespread. Traditional reinforcement learning algorithms have also begun experimenting with language models to optimize training. However, they still fundamentally rely on the Markov Decision Process (MDP) and do not fully exploit the advantages of language models in handling long sequences. The Decision Transformer (DT), introduced in 2021, was the first attempt to recast the reinforcement learning problem entirely as a challenge within the NLP domain, using text generation techniques to produce reinforcement learning trajectories and thereby address the search for optimal trajectories. However, DT feeds the reinforcement learning training trajectories directly into a basic language model and aims to predict the entire trajectory, including state and reward information. This deviates from the reinforcement learning objective of finding the optimal action, and the redundant information in the output impairs the agent's final training effectiveness. This paper proposes a more suitable network structure, the Action-Translator Transformer (ATT), which predicts only the agent's next action, making the language model more interpretable for the reinforcement learning problem. We test our model in simulated gaming scenarios and compare it with current mainstream methods in offline reinforcement learning. The experimental results show that our model achieves superior performance. We hope that this model will inspire new ideas for combining language models and reinforcement learning and provide fresh perspectives for offline reinforcement learning research.
(This article belongs to the Special Issue Algorithms for Games AI)
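The contrast the abstract draws, predicting the whole trajectory versus only the next action, comes down to where the training loss is applied. A schematic sketch (the interleaved token layout follows the DT convention; the per-token losses are stand-in numbers, and this is not the ATT architecture itself):

```python
import numpy as np

# A DT-style input interleaves return-to-go, state, and action tokens:
# (R_0, s_0, a_0, R_1, s_1, a_1, ...).  Training the model to predict
# every position makes it reproduce states and rewards as well; an
# ATT-style objective keeps the loss only at the action positions.
seq_len = 9                                        # three (R, s, a) triples
per_token_loss = np.linspace(1.0, 2.0, seq_len)    # stand-in per-token losses
is_action = np.arange(seq_len) % 3 == 2            # positions 2, 5, 8

dt_loss = per_token_loss.mean()                    # supervise the whole trajectory
att_loss = per_token_loss[is_action].mean()        # supervise next actions only
```

Supervising only action positions keeps the training target aligned with what the agent actually needs to output at decision time.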

18 pages, 967 KiB  
Article
Reducing Q-Value Estimation Bias via Mutual Estimation and Softmax Operation in MADRL
by Zheng Li, Xinkai Chen, Jiaqing Fu, Ning Xie and Tingting Zhao
Algorithms 2024, 17(1), 36; https://doi.org/10.3390/a17010036 - 16 Jan 2024
Viewed by 1187
Abstract
As electronic game technology has developed, game content has come to feature larger numbers of units, richer unit attributes, more complex game mechanisms, and more diverse team strategies. Multi-agent deep reinforcement learning shines in this type of team-based electronic game, achieving results that surpass professional human players. However, reinforcement learning algorithms based on Q-value estimation often suffer from Q-value overestimation, which can seriously degrade AI performance in multi-agent scenarios. We propose a multi-agent mutual evaluation method and a multi-agent softmax method to reduce the estimation bias of Q-values in multi-agent scenarios, and we test them both in the particle multi-agent environment and in a multi-agent tank environment that we constructed. Our tank environment strikes a good balance between experimental verification efficiency and the fidelity of multi-agent game tasks, and it can easily be extended to different multi-agent cooperation or competition tasks. We hope it will be adopted in multi-agent deep reinforcement learning research.
(This article belongs to the Special Issue Algorithms for Games AI)
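The softmax side of the proposal can be pictured in a single-agent setting: replacing the max operator with a softmax-weighted average produces a value target no larger than max(Q), which damps the overestimation that the max operator amplifies when Q estimates are noisy. A minimal sketch (the temperature and the operator placement are assumptions, not the paper's exact formulation):

```python
import numpy as np

def softmax_value(q_values, tau=1.0):
    """Softmax-weighted value estimate: a softer alternative to max(Q).

    tau is the temperature: as tau -> 0 this recovers the max operator,
    while larger tau pulls the estimate toward the mean of the Q-values.
    """
    z = q_values / tau
    z = z - z.max()                        # shift for numerical stability
    w = np.exp(z) / np.exp(z).sum()        # softmax weights over actions
    return float((w * q_values).sum())

q = np.array([1.0, 2.0, 3.0])
assert q.mean() <= softmax_value(q) <= q.max()  # always between mean and max
```

In a target-network update, this estimate would replace max(Q) when bootstrapping the next-state value.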

20 pages, 1201 KiB  
Article
Hierarchical Reinforcement Learning for Crude Oil Supply Chain Scheduling
by Nan Ma, Ziyi Wang, Zeyu Ba, Xinran Li, Ning Yang, Xinyi Yang and Haifeng Zhang
Algorithms 2023, 16(7), 354; https://doi.org/10.3390/a16070354 - 24 Jul 2023
Cited by 1 | Viewed by 1294
Abstract
Crude oil resource scheduling is one of the critical issues upstream in the crude oil industry chain. It aims to reduce transportation and inventory costs, and to avoid alerts of inventory-limit violations, by formulating reasonable crude oil transportation and inventory strategies. Two main difficulties coexist in this problem: the large problem scale and uncertain supply and demand. Traditional operations research (OR) methods, which rely on forecasting supply and demand, face significant challenges when applied to the complicated and uncertain short-term operation of the crude oil supply chain. To address these challenges, this paper presents a novel hierarchical optimization framework and a well-designed hierarchical reinforcement learning (HRL) algorithm. Specifically, reinforcement learning (RL) acts as an upper-level agent that selects operational operators, each combining various sub-goals and solving orders, while the lower-level agent finds a feasible solution for the chosen operator and provides penalty feedback to the upper level. Additionally, we deploy a simulator based on real-world data and run comprehensive experiments. In terms of the number of alerts, the maximum alert penalty, and the overall transportation cost, our HRL method outperforms existing OR and two RL algorithms in the majority of time steps.
(This article belongs to the Special Issue Algorithms for Games AI)
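The division of labor between the two levels can be sketched as a toy loop: the upper level chooses an operator, the lower level solves and reports a penalty, and the upper level updates its preference. Everything below (the operator names, the penalty model, the bandit-style update) is hypothetical scaffolding, not the paper's algorithm:

```python
import random

random.seed(0)
operators = ["inventory_first", "transport_first", "balanced"]
value = {op: 0.0 for op in operators}   # running estimate of each operator's penalty
counts = {op: 0 for op in operators}

def lower_level_solve(op):
    """Stand-in lower-level solver: returns a noisy penalty (lower is better)."""
    base = {"inventory_first": 6.0, "transport_first": 3.0, "balanced": 5.0}
    return base[op] + random.gauss(0, 0.5)

for t in range(300):
    if random.random() < 0.1:            # epsilon-greedy exploration
        op = random.choice(operators)
    else:                                # exploit the best-looking operator
        op = min(value, key=value.get)
    penalty = lower_level_solve(op)
    counts[op] += 1
    value[op] += (penalty - value[op]) / counts[op]   # running-average update

best = min(value, key=value.get)
```

The upper level settles on the operator whose lower-level solutions incur the smallest penalty, which is the feedback structure the framework describes.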

27 pages, 893 KiB  
Article
Official International Mahjong: A New Playground for AI Research
by Yunlong Lu, Wenxin Li and Wenlong Li
Algorithms 2023, 16(5), 235; https://doi.org/10.3390/a16050235 - 28 Apr 2023
Cited by 3 | Viewed by 2207
Abstract
Games have long been benchmarks and testbeds for AI research. In recent years, with the development of new algorithms and the boost in computational power, many popular games played by humans have been solved by AI systems. Mahjong is one of the most popular games in China and has spread worldwide. It presents challenges for AI research due to its multi-agent nature, rich hidden information, and complex scoring rules, yet it has been somewhat overlooked by the game AI research community. In 2020 and 2022, we held two AI competitions of Official International Mahjong, the standard variant of Mahjong rules, in conjunction with IJCAI, a top-tier AI conference. We were the first to adopt the duplicate format for evaluating Mahjong AI agents, mitigating the high variance of this game. By comparing the algorithms and performance of the AI agents in the competitions, we conclude that supervised learning and reinforcement learning are the current state-of-the-art methods for this game and perform much better than heuristic methods based on human knowledge. We also held a human-versus-AI competition and found that the top AI agent still could not beat professional human players. We argue that this game can serve as a new benchmark for AI research due to its complexity and popularity.
(This article belongs to the Special Issue Algorithms for Games AI)
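Why the duplicate format mitigates variance can be shown with a toy model: when both agents play the identical wall, the deal-luck component cancels out of their score difference. The `play` function below is a hypothetical stand-in, not a Mahjong simulator:

```python
import random
import statistics

def play(seed, skill):
    """Stand-in for one deal: the score mixes deal luck (driven only by
    the wall seed) with the player's skill."""
    rng = random.Random(seed)
    luck = rng.gauss(0, 10)    # high-variance component shared per wall
    return luck + skill

seeds = range(50)
a_skill, b_skill = 1.0, 0.0

# Independent deals: luck dominates the score difference.
indep = [play(s, a_skill) - play(s + 1000, b_skill) for s in seeds]
# Duplicate format: both agents face the identical wall, so luck cancels.
dup = [play(s, a_skill) - play(s, b_skill) for s in seeds]

assert statistics.stdev(dup) < statistics.stdev(indep)
```

With luck cancelled, far fewer deals are needed to detect the same skill gap, which is exactly the point of the duplicate format.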

22 pages, 1927 KiB  
Article
Measuring the Non-Transitivity in Chess
by Ricky Sanjaya, Jun Wang and Yaodong Yang
Algorithms 2022, 15(5), 152; https://doi.org/10.3390/a15050152 - 28 Apr 2022
Cited by 7 | Viewed by 2817
Abstract
In this paper, we quantify the non-transitivity in chess using human game data. Specifically, we perform non-transitivity quantification in two ways, Nash clustering and counting the number of rock–paper–scissors cycles, on over one billion matches from the Lichess and FICS databases. Our findings indicate that the strategy space of real-world chess strategies has a spinning-top geometry and that there exists a strong connection between the degree of non-transitivity and the progression of a chess player's rating. In particular, high degrees of non-transitivity tend to prevent human players from making progress in their Elo ratings. We also investigate the implications of non-transitivity for population-based training methods. By considering fixed-memory fictitious play as a proxy, we conclude that maintaining large and diverse populations of strategies is imperative to training effective AI agents for solving chess.
(This article belongs to the Special Issue Algorithms for Games AI)
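The cycle-counting side of the quantification is easy to sketch: given a pairwise win relation over strategies, count the triples that beat one another in a ring. This is a minimal illustration on a 0/1 win matrix; the paper's analysis over the Lichess and FICS data is, of course, far larger:

```python
from itertools import combinations

def count_rps_cycles(beats):
    """Count rock-paper-scissors 3-cycles in a pairwise win matrix.

    beats[i][j] == 1 means strategy i beats strategy j.  A triple
    (i, j, k) is a cycle when the three strategies beat one another
    in a ring, the elementary unit of non-transitivity.
    """
    n = len(beats)
    cycles = 0
    for i, j, k in combinations(range(n), 3):
        forward = beats[i][j] and beats[j][k] and beats[k][i]
        backward = beats[j][i] and beats[k][j] and beats[i][k]
        if forward or backward:
            cycles += 1
    return cycles

# rock, paper, scissors: paper beats rock, scissors beats paper, rock beats scissors
beats = [[0, 0, 1],
         [1, 0, 0],
         [0, 1, 0]]
print(count_rps_cycles(beats))  # 1
```

A fully transitive relation (a strict ranking) contains zero such cycles, so the count directly measures how far the strategy space departs from a total order.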

23 pages, 591 KiB  
Article
Research and Challenges of Reinforcement Learning in Cyber Defense Decision-Making for Intranet Security
by Wenhao Wang, Dingyuanhao Sun, Feng Jiang, Xingguo Chen and Cheng Zhu
Algorithms 2022, 15(4), 134; https://doi.org/10.3390/a15040134 - 18 Apr 2022
Cited by 5 | Viewed by 4525
Abstract
In recent years, cyber attacks have shown diversified, purposeful, and organized characteristics, which pose significant challenges to cyber defense decision-making on internal networks. Because attackers and defenders are in continuous confrontation, purely data-driven statistical or supervised learning methods cannot cope with increasingly severe security threats. It is urgent to rethink network defense from the perspective of decision-making and to prepare for every possible situation. Reinforcement learning has made great breakthroughs in addressing complicated decision-making problems. We propose a framework that defines four modules based on the life cycle of threats: pentest, design, response, and recovery. Our aims are to clarify the boundary of network defense decision-making problems, to study the problem characteristics in different contexts, to compare the strengths and weaknesses of existing research, and to identify promising challenges for future work. Our work provides a systematic view for understanding and solving decision-making problems in the application of reinforcement learning to cyber defense.
(This article belongs to the Special Issue Algorithms for Games AI)

Review


27 pages, 363 KiB  
Review
Techniques and Paradigms in Modern Game AI Systems
by Yunlong Lu and Wenxin Li
Algorithms 2022, 15(8), 282; https://doi.org/10.3390/a15080282 - 12 Aug 2022
Cited by 5 | Viewed by 5651
Abstract
Games have long been benchmarks and test-beds for AI algorithms. With the development of AI techniques and the boost in computational power, modern game AI systems have achieved superhuman performance in many games played by humans. These games have various features and present different challenges to AI research, so the algorithms used in each of these AI systems vary. This survey aims to give a systematic review of the techniques and paradigms used in modern game AI systems. By decomposing each of the recent milestones into basic components and comparing them based on the features of the games, we summarize the common paradigms used to build game AI systems, together with their scope and limitations. We claim that deep reinforcement learning is the most general methodology and is poised to become the mainstream method for games of higher complexity. We hope this survey can both provide a review of game AI algorithms and bring inspiration to the game AI community regarding future directions.
(This article belongs to the Special Issue Algorithms for Games AI)

43 pages, 565 KiB  
Review
A Review: Machine Learning for Combinatorial Optimization Problems in Energy Areas
by Xinyi Yang, Ziyi Wang, Hengxi Zhang, Nan Ma, Ning Yang, Hualin Liu, Haifeng Zhang and Lei Yang
Algorithms 2022, 15(6), 205; https://doi.org/10.3390/a15060205 - 13 Jun 2022
Cited by 11 | Viewed by 5735
Abstract
Combinatorial optimization problems (COPs) are a class of NP-hard problems of great practical significance. Traditional approaches to COPs suffer from high computational time and reliance on expert knowledge, and machine learning (ML) methods, as powerful tools, have been used to overcome these problems. This review investigates COPs in energy areas addressed with a series of modern ML approaches, i.e., the intersection of COPs, ML, and energy areas. Recent works on solving COPs with ML are organized first by method, including supervised learning (SL), deep learning (DL), reinforcement learning (RL), and recently proposed game-theoretic methods, and then by problem, laying out the timeline of improvements for several fundamental COPs. Practical applications of ML methods in energy areas, including the petroleum supply chain, steel-making, electric power systems, and wind power, are summarized for the first time, and the challenges in this field are analyzed.
(This article belongs to the Special Issue Algorithms for Games AI)
