The distribution process in business-to-business materials trading is among the most complex and least transparent processes in logistics. The highly volatile environment requires continuous adaptation by the responsible decision-makers, who face a substantial number of potential improvement actions with conflicting goals, such as maintaining a high service level while keeping costs low. Simulation-optimisation approaches have been proposed in this context, for example based on evolutionary algorithms. However, at the scale of real-world systems, they suffer from impractically long computation times. This paper addresses this challenge in two principal streams. First, reinforcement learning is investigated to reduce the response time of the system in a concrete decision situation. Second, domain-specific information and the definition of equivalent solutions are exploited to support a metaheuristic algorithm. For both approaches, we have developed suitable implementations and evaluated them on subsets of real-world data. The results demonstrate that reinforcement learning exploits the idle time between decision situations to learn which decisions might be most promising, thus adding computation time overall but significantly reducing the response time. Using domain-specific information reduces the number of required simulation runs and guides the search towards promising actions. In our experiments, defining equivalent solutions decreased the number of required simulation runs by up to 15%.
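As a rough illustration of why defining equivalent solutions can save simulation runs, the following minimal Python sketch caches simulation results under a canonical key shared by all equivalent solutions. It is not the implementation evaluated in this paper: the equivalence rule (action sequences that differ only in order are treated as equivalent), the action names, and the `toy_simulate` function are hypothetical assumptions made for the example.

```python
from itertools import permutations


def canonical_key(actions):
    """Map an action sequence to a key shared by all equivalent sequences.

    Assumption for this sketch: sequences that contain the same actions
    in a different order lead to the same simulated outcome.
    """
    return tuple(sorted(actions))


class CachedEvaluator:
    """Runs the simulation at most once per class of equivalent solutions."""

    def __init__(self, simulate):
        self.simulate = simulate
        self.cache = {}
        self.runs = 0  # number of actual simulation runs performed

    def evaluate(self, actions):
        key = canonical_key(actions)
        if key not in self.cache:
            self.runs += 1
            self.cache[key] = self.simulate(key)
        return self.cache[key]


def toy_simulate(actions):
    # Hypothetical stand-in for an expensive simulation run.
    return sum(len(name) for name in actions)


evaluator = CachedEvaluator(toy_simulate)
# Hypothetical action names; all orderings are equivalent under the assumption.
candidates = list(
    permutations(["raise_safety_stock", "relocate_item", "change_supplier"])
)
costs = [evaluator.evaluate(c) for c in candidates]
print(len(candidates), "candidates evaluated with", evaluator.runs, "simulation run(s)")
```

In this toy setting every candidate falls into the same equivalence class, so the saving is far larger than in realistic searches, where only some of the candidates generated by a metaheuristic coincide with already simulated ones.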
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.