PURPOSES : In this study, we investigated whether an optimal pattern exists among the transition methods applied when traffic signal timing plans change, and aimed to develop this pattern into an artificial-intelligence reinforcement-learning model to assess its effectiveness. METHODS : Various traffic signal transition scenarios were developed, and 19 traffic signal transition situations applicable to these scenarios were considered; a simulation analysis was then performed to identify patterns through statistical analysis. Subsequently, a reinforcement-learning model was developed to select an optimal transition-time model suited to various traffic conditions. This model was tested by simulating a virtual experimental-center environment and conducting performance comparisons on a daily basis. RESULTS : The results indicated that when the traffic signal cycle length changed by less than 50% in the negative direction, the subtraction method was efficient. When the change was less than 15% in the positive direction, the proposed center method for traffic signal transition was advantageous. By applying the proposed optimal transition model selection, the transition time decreased by approximately 70%. CONCLUSIONS : The findings of this study provide guidance for the next level of traffic signal transitions. The importance of traffic signal transition will increase in future AI-based traffic signal control methods, requiring ongoing research in this field.
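As a rough illustration, the selection rule suggested by these results can be encoded as follows. The thresholds come from the reported findings, while the method names and the fallback branch are placeholders, since the abstract does not state what the model selects in other cases.

```python
# Illustrative encoding of the selection rule suggested by the results above: the
# subtraction method when the cycle length shrinks by less than 50%, the proposed
# center method when it grows by less than 15%. The fallback branch is a placeholder.

def select_transition_method(old_cycle_s, new_cycle_s):
    """Pick a signal-timing transition method from the relative cycle-length change."""
    change = (new_cycle_s - old_cycle_s) / old_cycle_s
    if -0.5 < change < 0.0:
        return "subtraction"
    if 0.0 < change < 0.15:
        return "center"
    return "other (e.g., addition / dwell)"   # not specified in the abstract

print(select_transition_method(150, 120))     # -20% -> subtraction
print(select_transition_method(150, 165))     # +10% -> center
```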
In the manufacturing industry, dispatching systems play a crucial role in enhancing production efficiency and optimizing production volume. However, in dynamic production environments, conventional static dispatching methods struggle to adapt to changing conditions and constraints, leading to problems such as reduced production volume, delays, and resource wastage. There is therefore a need for dynamic dispatching methods that can quickly adapt to changes in the environment. In this study, we develop an agent-based model that handles dynamic situations through interaction between agents. We also use the Q-learning algorithm, a temporal-difference (TD) learning method, to update automatically and adapt to dynamic situations: by responding sensitively to changes in the state space, Q-learning can select optimal dispatching rules for the current conditions. The state space includes information such as inventory and work-in-process levels, order fulfilment status, and machine status, which are used to select the optimal dispatching rules. Furthermore, we aim to minimize total tardiness and the number of setup changes using reinforcement learning. Finally, we develop a dynamic dispatching system using Q-learning and compare its performance with conventional static dispatching methods.
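A minimal sketch of the kind of Q-learning loop described here, assuming a tabular agent that picks one of several candidate dispatching rules and is penalized for added tardiness and setup changes; the state features, candidate rules, and reward weights are illustrative assumptions, not the study's exact design.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch for dispatching-rule selection.
# State features (inventory band, WIP band, order status, machine status) and the
# candidate rules below are illustrative placeholders, not the paper's exact design.

DISPATCH_RULES = ["SPT", "EDD", "FIFO", "SameSetupFirst"]  # assumed candidate rules

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
Q = defaultdict(lambda: {a: 0.0 for a in DISPATCH_RULES})

def choose_rule(state):
    """Epsilon-greedy selection of a dispatching rule for the current shop state."""
    if random.random() < EPSILON:
        return random.choice(DISPATCH_RULES)
    return max(Q[state], key=Q[state].get)

def reward(delta_tardiness, setup_changed, w_tardy=1.0, w_setup=0.5):
    """Penalize added tardiness and setup changes (weights are assumptions)."""
    return -(w_tardy * delta_tardiness + w_setup * (1 if setup_changed else 0))

def td_update(state, rule, r, next_state):
    """Standard one-step temporal-difference (TD) update of the Q-table."""
    best_next = max(Q[next_state].values())
    Q[state][rule] += ALPHA * (r + GAMMA * best_next - Q[state][rule])
```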
With the advancement of multi-agent reinforcement learning, research on applying reinforcement learning to level design in games has continued. Although the shape of the platform is an important element of level design, studies to date have applied reinforcement learning with a focus on player metrics such as skill level and skill composition. This paper therefore studies the effect of platforms on the play experience, considering the visibility available to visual sensors and the complexity of structures, so that platform shape can be used in level design. To this end, a 2vs2 competitive shooting game environment was developed based on the Unity ML-Agents Toolkit, the MA-POCA algorithm, and self-play, and various platform shapes were constructed. The analysis confirmed that differences in visibility and complexity according to platform shape do not significantly affect the win-rate balance, but do have a significant effect on the total number of episodes, the draw rate, and the growth of Elo ratings.
Because the existing built-in StarCraft II AI follows predefined behavior patterns, users can easily read its strategy, making it difficult to keep them interested for long. To address this, many studies on reinforcement-learning-based StarCraft II AI have been conducted. However, existing reinforcement-learning AIs train agents with a focus only on win rate, relying on a small set of units or stereotyped strategies, so there is still a limit to how much fun users can have. In this paper, to improve the fun of the game, we use reinforcement learning to propose an AI that plays similarly to a real player. The agent is trained on the StarCraft II unit counter relationships and is rewarded based on scouted information so that it changes its strategy flexibly. Experimental results show that the proposed agent received higher ratings than an agent using a fixed strategy in terms of perceived fun, difficulty, and similarity to a human player.
This paper proposes an algorithm for the Unrelated Parallel Machine Scheduling Problem (UPMSP) without setup times, aiming to minimize total tardiness. Because the UPMSP is NP-hard, it is difficult to obtain an optimal solution, and practical scenarios are therefore solved by relying on operators' experience or simple heuristics. The proposed algorithm combines two components: a Transformer-based policy network that computes the correlation between individual jobs and machines, and a training procedure based on the REINFORCE with Baseline reinforcement-learning algorithm. The proposed algorithm was evaluated on randomly generated problems, and the results were compared with those obtained using CPLEX as well as three scheduling algorithms. The test results confirm that the proposed algorithm outperforms the comparison algorithms.
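A self-contained toy sketch of REINFORCE with a baseline for assigning jobs to unrelated parallel machines; the per-job softmax policy stands in for the paper's Transformer policy network, and the problem data, the moving-average baseline, and the learning rate are illustrative assumptions.

```python
import numpy as np

# Toy REINFORCE-with-baseline sketch for the UPMSP: sample a job-to-machine assignment,
# compute total tardiness, and push the policy toward lower-cost assignments.
# Problem sizes, data, and the moving-average baseline are illustrative assumptions.

rng = np.random.default_rng(0)
n_jobs, n_machines = 6, 2
proc = rng.integers(1, 10, size=(n_jobs, n_machines))   # machine-dependent processing times
due = rng.integers(5, 25, size=n_jobs)                   # job due dates

theta = np.zeros((n_jobs, n_machines))                   # per-job logits (toy "policy network")
baseline, beta, lr = 0.0, 0.9, 0.05

def rollout():
    """Sample a machine for every job; return (log-prob gradients, total tardiness)."""
    grads, assign = [], []
    for j in range(n_jobs):
        p = np.exp(theta[j]) / np.exp(theta[j]).sum()
        m = rng.choice(n_machines, p=p)
        grads.append((j, np.eye(n_machines)[m] - p))      # d log pi / d theta[j]
        assign.append(m)
    load = np.zeros(n_machines)
    tardiness = 0.0
    for j, m in enumerate(assign):                        # completion times in job order
        load[m] += proc[j, m]
        tardiness += max(0.0, load[m] - due[j])
    return grads, tardiness

for episode in range(2000):
    grads, tardiness = rollout()
    baseline = beta * baseline + (1 - beta) * tardiness   # baseline reduces gradient variance
    advantage = tardiness - baseline
    for j, g in grads:
        theta[j] -= lr * advantage * g                    # descend: minimize expected tardiness
```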
A Nuclear Material Accountancy (NMA) system quantitatively evaluates whether nuclear material has been diverted. In such a system, the material balance is evaluated from nuclear material measurements, and these processes are based on statistical techniques; therefore, the system's performance can be evaluated with modeling and simulation from the development stage. In a performance evaluation, several diversion scenarios are established, nuclear material diversion is attempted in a virtual simulation environment according to these scenarios, and the detection probability is evaluated. It is therefore important to derive vulnerable diversion scenarios in advance. However, in actual facilities it is not easy to derive weak scenarios manually because numerous factors affect detection performance. In this study, reinforcement learning was applied to automatically derive vulnerable diversion scenarios from a virtual NMA system. Reinforcement learning trains agents to take optimal actions in a virtual environment, and on this basis it is possible to develop an agent that attempts to divert nuclear materials according to the weakest scenario in the NMA system. A fairly simple NMA system model was considered to confirm the applicability of reinforcement learning in this study. The simple model performs 10 consecutive material balance evaluations per year, and its MUF uncertainty increases with the balance period. The expected vulnerable diversion scenario is one in which the amount of diverted nuclear material increases in proportion to the size of the MUF uncertainty, and the total amount of diverted nuclear material was assumed to be 8 kg, which corresponds to one significant quantity of plutonium. To apply reinforcement learning, the virtual NMA system model (environment) and a divertor (agent) attempting to divert nuclear material were modeled. The agent receives a negative reward if a diversion attempt is detected by the NMA system. Reinforcement learning automatically trains the agent to maximize its reward, and through this the weakest diversion scenario can be derived. As a result of the study, it was confirmed that the agent learned to divert nuclear material in the direction of low detection probability in this system model. These results show that weak scenarios can be derived with reinforcement learning. The technique considered in this study can be used to derive, and then address, weak diversion scenarios in an NMA system in advance. However, issues remain before this technology can be applied smoothly, and further research will be needed.
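The following sketch illustrates, under stated assumptions, the kind of divertor-versus-NMA environment described above: 10 balance periods, an 8 kg diversion budget, a MUF uncertainty that grows with the period, and a negative reward when a diversion is detected. The 3-sigma alarm, the reward values, and the uncertainty profile are placeholders rather than the study's actual statistical test.

```python
import numpy as np

# Illustrative stand-in for the virtual NMA environment: 10 balance periods per year,
# a diversion budget of 8 kg (one significant quantity of Pu), and MUF uncertainty
# growing with the balance period. Detection model and rewards are assumptions.

rng = np.random.default_rng(1)

N_PERIODS = 10
SQ = 8.0                                        # total amount to divert [kg]
sigma = 0.2 + 0.05 * np.arange(N_PERIODS)       # assumed growing MUF uncertainty per period

def episode(diversion_plan):
    """Run one year of balance evaluations; return total reward for the divertor agent."""
    reward = 0.0
    for t, amount in enumerate(diversion_plan):
        muf = amount + rng.normal(0.0, sigma[t])         # measured MUF = diversion + noise
        detected = abs(muf) > 3.0 * sigma[t]             # assumed 3-sigma alarm threshold
        reward += -10.0 if detected else amount / SQ     # penalty when detected, else progress
    return reward

# The expected weak scenario: divert in proportion to the period's uncertainty.
proportional_plan = SQ * sigma / sigma.sum()
uniform_plan = np.full(N_PERIODS, SQ / N_PERIODS)
print(episode(proportional_plan), episode(uniform_plan))
```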
Reinforcement learning (RL) is widely applied in various engineering fields. In particular, RL has shown successful performance in control problems, such as vehicles, robotics, and active structural control systems. However, little research on applying RL to optimal structural design has been conducted to date. In this study, the applicability of RL to the structural design of reinforced concrete (RC) beams was investigated. The RC beam structural design problem introduced in a previous study was used for a comparative study. Deep Q-network (DQN) is a well-known RL algorithm that performs well in discrete action spaces, and it was therefore used in this study. The actions of the DQN agent must represent the design variables of the RC beam, but there are too many design variables to represent with the action of a conventional DQN. To solve this problem, a multi-agent DQN was used. For a more effective reinforcement learning process, DDQN (double DQN, based on double Q-learning), an advanced version of the conventional DQN, was employed. The multi-agent DDQN was trained for optimal structural design of RC beams satisfying the American Concrete Institute code (ACI 318) without any hand-labeled dataset. The five DDQN agents provide actions for beam width, beam depth, main rebar size, number of main rebars, and shear stirrup size, respectively. The five agents were trained for 10,000 episodes, and the performance of the multi-agent DDQN was evaluated on 100 test design cases. This study shows that the multi-agent DDQN algorithm can successfully provide structural design results for RC beams.
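A brief sketch of the double-DQN target that such agents would be trained with; the network sizes, state dimension, and number of discretized actions are assumptions, and in the study each of the five agents outputs one design variable (e.g., beam width) rather than sharing a single action head.

```python
import torch
import torch.nn as nn

# Sketch of the double-DQN (DDQN) target used to train a design-variable agent.
# Network sizes, state features, and the discretized action set are illustrative.

STATE_DIM, N_ACTIONS, GAMMA = 8, 12, 0.99   # assumed dimensions / discretization

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

online, target = make_net(), make_net()
target.load_state_dict(online.state_dict())

def ddqn_loss(batch_s, batch_a, batch_r, batch_s2, batch_done):
    """Double DQN: the online net picks the next action, the target net evaluates it."""
    q = online(batch_s).gather(1, batch_a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_a = online(batch_s2).argmax(dim=1, keepdim=True)        # action selection
        next_q = target(batch_s2).gather(1, next_a).squeeze(1)       # action evaluation
        y = batch_r + GAMMA * (1.0 - batch_done) * next_q
    return nn.functional.mse_loss(q, y)
```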
North Korea continues to upgrade and display its long-range rocket launchers to emphasize its military strength. Recently, the Republic of Korea kicked off the development of an anti-artillery interception system, similar to Israel's "Iron Dome", designed to protect against North Korea's arsenal of long-range rockets. The system cannot work smoothly without a function that assigns interceptors to incoming artillery rockets of various calibers. We view this assignment task as a dynamic weapon target assignment (DWTA) problem. DWTA is a multistage decision process in which a decision at one stage affects the decision processes and their results at subsequent stages. We represent the DWTA problem as a Markov decision process (MDP). The distance from Seoul to North Korea's multiple rocket launchers positioned near the border limits the processing time of the model solver to only a few seconds. It is impossible to compute the exact optimal solution within the allowed time due to the curse of dimensionality inherent in the MDP model of a practical DWTA problem. We apply two reinforcement-learning-based algorithms to obtain an approximate solution of the MDP model within the time limit. To check the quality of the approximate solution, we adopt the Shoot-Shoot-Look (SSL) policy as a baseline. Simulation results show that both algorithms provide better solutions than the baseline strategy.
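A toy sketch of the DWTA problem cast as an MDP, together with the Shoot-Shoot-Look baseline used for comparison; the kill probability, threat values, and interceptor stock are illustrative assumptions.

```python
import random
from dataclasses import dataclass, field

# Toy DWTA-as-MDP sketch: at each stage the defender assigns interceptors to incoming
# rockets, engagement outcomes are stochastic, and survivors carry over to the next
# stage. Kill probability, threat values, and interceptor stock are assumptions.

P_KILL = 0.7                                    # assumed single-shot kill probability

@dataclass
class State:
    rockets: list = field(default_factory=lambda: [10.0, 7.0, 5.0])  # threat values
    interceptors: int = 6
    stage: int = 0

def step(state, assignment):
    """assignment[i] = number of interceptors fired at rocket i this stage."""
    survivors, destroyed_value = [], 0.0
    for value, shots in zip(state.rockets, assignment):
        if any(random.random() < P_KILL for _ in range(shots)):
            destroyed_value += value
        else:
            survivors.append(value)
    fired = sum(assignment)
    return State(survivors, state.interceptors - fired, state.stage + 1), destroyed_value

def ssl_policy(state):
    """Shoot-Shoot-Look baseline: fire two interceptors per rocket while stock lasts."""
    left, assignment = state.interceptors, []
    for _ in state.rockets:
        shots = min(2, left)
        assignment.append(shots)
        left -= shots
    return assignment
```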
Recently, machine learning has been widely used to solve optimization problems in various engineering fields. In this study, machine learning is applied to the development of a control algorithm for a smart control device for the reduction of seismic responses. For this purpose, the Deep Q-network (DQN) reinforcement learning algorithm was employed to develop the control algorithm. A single-degree-of-freedom (SDOF) structure with a smart tuned mass damper (TMD) was used as an example structure. The smart TMD system was composed of an MR (magnetorheological) damper instead of a passive damper. The reward design mainly determines the control performance of the smart TMD. Various hyperparameters were investigated to optimize the control performance of the DQN-based control algorithm. Usually, decreasing the time step of a numerical simulation is desirable to increase the accuracy of the results. However, the numerical simulation results showed that decreasing the time step for reward calculation might reduce the control performance of the DQN-based control algorithm. Therefore, a proper time step for reward calculation should be selected in the DQN training process.
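A minimal sketch of the distinction highlighted here between the simulation time step and the reward-calculation time step; the step sizes, the synthetic displacement signal, and the reward form are assumptions for illustration only.

```python
import numpy as np

# Sketch of a reward time step that differs from the simulation time step when training
# a DQN controller for a smart TMD. The response signal here is synthetic; in the study
# the reward would come from the simulated displacement of the SDOF structure.

DT_SIM = 0.001        # numerical integration step [s] (assumed)
DT_REWARD = 0.02      # reward is computed once per this interval [s] (assumed)
STEPS_PER_REWARD = int(DT_REWARD / DT_SIM)

def reward_from_window(displacement_window, uncontrolled_peak=0.05):
    """Reward the agent for keeping the peak displacement below an uncontrolled reference."""
    peak = np.max(np.abs(displacement_window))
    return 1.0 - peak / uncontrolled_peak      # > 0 when the response is reduced (assumed form)

# Example: accumulate simulation samples, then emit one reward per DT_REWARD window.
displacement = 0.03 * np.sin(2 * np.pi * 1.0 * np.arange(0, 1.0, DT_SIM))
rewards = [reward_from_window(displacement[i:i + STEPS_PER_REWARD])
           for i in range(0, len(displacement), STEPS_PER_REWARD)]
```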
The current design process for civil structures such as bridges generally produces the final deliverable by repeating a cycle in which a preliminary design is followed by a structural review and, if the design does not satisfy the criteria, a redesign. This iteration lengthens the design period and consumes highly skilled engineering manpower, which should be devoted to higher-quality design, on mechanical, repetitive work. Such problems can be solved by automating the design process, but the analysis programs used during design have been the biggest obstacle to this automation. In this study, to replace the repetitive work in the existing design process, an AI-based automation system for the bridge design process was built, including an interface that can control a reinforcement learning algorithm and an external analysis program together. A prototype of the system was produced for a two-span RC rahmen (rigid-frame) bridge. The developed interface framework is expected to serve as a base technology for linking state-of-the-art AI with other types of bridge design in the future.
A mid-story isolation system was proposed for seismic response reduction of high-rise buildings and has shown good control performance. The control performance of a mid-story isolation system can be enhanced by introducing semi-active control devices into the isolation system, and the seismic response reduction capacity of a semi-active mid-story isolation system depends mainly on the control algorithm. In this study, an AI (artificial intelligence)-based control algorithm was developed for the control of a semi-active mid-story isolation system. An actual structure, the Shiodome Sumitomo building in Japan, which has a mid-story isolation system, was used as the example structure. An MR (magnetorheological) damper was used to make the isolation system in the example model semi-active. In the numerical simulation, a seismic response prediction model was generated with a supervised learning model, an RNN (recurrent neural network), and the Deep Q-network (DQN) reinforcement learning algorithm was employed to develop the control algorithm. The numerical simulation results showed that the DQN algorithm can effectively control a semi-active mid-story isolation system, resulting in successful reduction of seismic responses.
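A compact sketch, under stated assumptions, of the two components described above: an RNN surrogate that predicts seismic responses and a small Q-network that selects a discrete MR damper voltage. Layer sizes, input features, and the number of voltage levels are illustrative.

```python
import torch
import torch.nn as nn

# Sketch of an RNN response-prediction surrogate plus a DQN that picks a discrete MR
# damper voltage. Layer sizes, inputs, and the number of voltage levels are assumptions.

class ResponsePredictor(nn.Module):
    """RNN surrogate: (ground acceleration, damper voltage) history -> response quantities."""
    def __init__(self, n_inputs=2, hidden=32, n_outputs=2):
        super().__init__()
        self.rnn = nn.LSTM(n_inputs, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)   # e.g., isolation-layer drift, top accel.

    def forward(self, x):                           # x: (batch, time, n_inputs)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])                # response at the latest step

N_VOLTAGE_LEVELS = 4                                # assumed discrete MR damper commands
q_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, N_VOLTAGE_LEVELS))

def select_voltage(state_vec):
    """Greedy DQN action for a single state vector: index of the best voltage level."""
    with torch.no_grad():
        return int(q_net(state_vec).argmax())
```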