Production Cell Operation Optimization by Reinforcement Learning
DOI: https://doi.org/10.33927//hjic-2026-21
Keywords: reinforcement learning, manufacturing operation optimization, action masking, production cell control
Abstract
Machine learning, particularly reinforcement learning, plays an increasingly important role in optimizing complex industrial processes. One such challenge arises in production systems, where processing products often involves nontrivial scheduling and routing problems. This paper presents a reinforcement learning (RL)-based method to optimize a specific production cell, in which two material-moving units and several machining units must cooperate to manufacture items that require both processing and occasional cleaning. The proposed methodology models the environment as a Markov Decision Process and employs RL algorithms to maximize throughput. Several popular RL algorithms were compared, and Maskable Proximal Policy Optimization (Maskable PPO) was found to deliver the best performance, because action masking ensures valid, agent-specific and differentiated behavior for both the material-handling and the machining units. Among the various masking strategies tested, a distinct masking approach proved to be the most effective.
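
For readers unfamiliar with action masking, the general idea can be illustrated with a minimal sketch. It is not the implementation used in the paper: the action names, logits and mask values below are hypothetical, and the function only shows how invalid actions can be excluded from a policy's probability distribution before sampling.

```python
import math


def masked_action_probabilities(logits, valid_mask):
    """Softmax over logits in which invalid actions receive probability 0."""
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions get a logit of -inf, so exp() maps them to exactly 0.
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid_mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    weights = [math.exp(l - peak) for l in masked]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical action set for one material-moving unit (assumed, not from the paper).
ACTIONS = ["wait", "load_machine", "unload_machine", "move_to_cleaning"]
logits = [0.2, 1.5, -0.3, 0.9]          # illustrative policy outputs
valid_mask = [True, False, True, True]  # e.g. loading is impossible right now

probs = masked_action_probabilities(logits, valid_mask)
```

In this sketch the masked action keeps the highest raw logit but can never be sampled, while the remaining probabilities are renormalized over the valid actions only.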

