다종 장사정포 공격에 대한 강화학습 기반의 동적 무기할당
North Korea continues to upgrade and display its long-range rocket launchers to emphasize its military strength. Recently Republic of Korea kicked off the development of anti-artillery interception system similar to Israel’s “Iron Dome”, designed to protect against North Korea’s arsenal of long-range rockets. The system may not work smoothly without the function assigning interceptors to incoming various-caliber artillery rockets. We view the assignment task as a dynamic weapon target assignment (DWTA) problem. DWTA is a multistage decision process in which decision in a stage affects decision processes and its results in the subsequent stages. We represent the DWTA problem as a Markov decision process (MDP). Distance from Seoul to North Korea’s multiple rocket launchers positioned near the border, limits the processing time of the model solver within only a few second. It is impossible to compute the exact optimal solution within the allowed time interval due to the curse of dimensionality inherently in MDP model of practical DWTA problem. We apply two reinforcement-based algorithms to get the approximate solution of the MDP model within the time limit. To check the quality of the approximate solution, we adopt Shoot-Shoot-Look(SSL) policy as a baseline. Simulation results showed that both algorithms provide better solution than the solution from the baseline strategy.