Search Results

29 results found

1.
2023.12 | KCI-indexed | Free for subscribing institutions and individual members
With advances in multi-agent reinforcement learning, research continues on applying reinforcement learning to level design in games. Although platform geometry is an important element of level design, prior studies have applied reinforcement learning with a focus on player metrics such as skill level or skill composition. This paper therefore studies how platforms affect the play experience, considering the visibility of visual sensors and the complexity of structures, so that platform geometry can be used in level design. To this end, a 2-vs-2 competitive shooting game environment was developed based on the Unity ML-Agents Toolkit, the MA-POCA algorithm, and self-play, and various platform shapes were constructed. The analysis confirmed that differences in visibility and complexity across platform shapes have little effect on win-rate balance but significantly affect the total number of episodes, the draw ratio, and the growth of Elo ratings.
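As a rough illustration of the match-level metrics compared across platform shapes (win-rate balance, draw ratio, Elo growth), the Python sketch below shows a standard Elo update with draw-rate bookkeeping; the K-factor, initial ratings, and result encoding are assumptions, not values from the paper.

```python
# Minimal sketch of match-level metrics (win rate, draw ratio, Elo gain) for one
# platform layout. K-factor and result encoding are illustrative assumptions.

def elo_update(rating_a, rating_b, score_a, k=16.0):
    """Standard Elo update; score_a is 1.0 (A wins), 0.5 (draw), 0.0 (A loses)."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

def summarize(match_results):
    """match_results: list of 1.0 / 0.5 / 0.0 outcomes for team A."""
    rating_a, rating_b = 1200.0, 1200.0
    draws = 0
    for score_a in match_results:
        rating_a, rating_b = elo_update(rating_a, rating_b, score_a)
        draws += score_a == 0.5
    n = len(match_results)
    return {
        "win_rate_a": sum(s == 1.0 for s in match_results) / n,
        "draw_ratio": draws / n,
        "elo_gain_a": rating_a - 1200.0,
    }
```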
2.
2023.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
Because the built-in StarCraft II AI follows predefined behavior patterns, users can easily figure out its strategy, which makes it hard to hold their interest for long. To address this, many studies on reinforcement-learning-based StarCraft II AI have been conducted. However, existing reinforcement-learning AIs are trained with a focus on win rate alone, so they rely on a small set of units or on stereotyped strategies, and users still struggle to find the game fun. To improve the fun of the game, this paper proposes an AI that plays like a real player by using reinforcement learning. The agent is trained on the StarCraft II unit matchup (counter) table and is rewarded based on scouted information, so it changes its strategy flexibly. In experiments, the proposed agent received higher ratings than an agent using a fixed strategy in terms of perceived fun, difficulty, and similarity to a human player.
KRW 4,000
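To make the reward-shaping idea above concrete, here is a hedged Python sketch in which planned unit production earns a small bonus when it counters what scouting has seen. The table entries, unit names, and weight are illustrative assumptions, not the paper's actual values.

```python
# Hedged sketch of reward shaping from a unit matchup ("counter") table and scouted
# enemy composition. Table contents and the shaping weight are assumptions.

COUNTER_TABLE = {
    # attacker -> {enemy unit it counters: bonus score}
    "immortal": {"roach": 1.0, "stalker": 0.6},
    "phoenix":  {"mutalisk": 1.0, "banshee": 0.8},
    "zealot":   {"zergling": 0.7, "marine": 0.5},
}

def shaping_reward(scouted_enemy_counts, planned_production, weight=0.05):
    """Small positive reward when planned production counters scouted enemy units."""
    score = 0.0
    for unit, count in planned_production.items():
        for enemy_unit, bonus in COUNTER_TABLE.get(unit, {}).items():
            score += bonus * min(count, scouted_enemy_counts.get(enemy_unit, 0))
    return weight * score

# Example: scouting saw 8 roaches, the agent plans 3 immortals -> small shaping bonus.
print(shaping_reward({"roach": 8}, {"immortal": 3}))
```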
3.
2023.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
This paper proposes an algorithm for the Unrelated Parallel Machine Scheduling Problem (UPMSP) without setup times, aiming to minimize total tardiness. As an NP-hard problem, the UPMSP is difficult to solve optimally, so practical instances are usually handled with operators' experience or simple heuristics. The proposed algorithm combines two components: a Transformer-based policy network that computes the correlations between individual jobs and machines, and a training method based on the REINFORCE with Baseline reinforcement learning algorithm. The algorithm was evaluated on randomly generated problems, and the results were compared with those obtained using CPLEX as well as three scheduling algorithms. The test results confirm that the proposed algorithm outperforms the comparison algorithms.
KRW 4,000
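The update rule named above (REINFORCE with a learned baseline) can be sketched in a few lines of PyTorch. The tiny network below is only a stand-in for the Transformer policy, and the layer sizes and toy episode data are assumptions.

```python
# Minimal PyTorch sketch of a REINFORCE-with-baseline update for a scheduling policy.
# The network is a stand-in for the Transformer; sizes and toy data are assumptions.
import torch
import torch.nn as nn

class TinyPolicy(nn.Module):
    """Scores each candidate (job, machine) choice and predicts a baseline value."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.baseline = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state):
        dist = torch.distributions.Categorical(logits=self.actor(state))
        return dist, self.baseline(state).squeeze(-1)

def reinforce_with_baseline_loss(policy, states, actions, returns):
    """returns: e.g. negative total tardiness of the episode, repeated per decision step."""
    dist, baseline = policy(states)
    log_prob = dist.log_prob(actions)
    advantage = returns - baseline.detach()          # baseline reduces gradient variance
    policy_loss = -(advantage * log_prob).mean()
    baseline_loss = nn.functional.mse_loss(baseline, returns)
    return policy_loss + 0.5 * baseline_loss

# Toy usage with random episode data
policy = TinyPolicy(state_dim=16, n_actions=5)
states, actions, returns = torch.randn(32, 16), torch.randint(0, 5, (32,)), torch.randn(32)
loss = reinforce_with_baseline_loss(policy, states, actions, returns)
loss.backward()
```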
7.
2023.11 | Free for subscribing institutions and individual members
A Nuclear Material Accountancy (NMA) system quantitatively evaluates whether nuclear material has been diverted. Under this system, the material balance is evaluated from nuclear material measurements, and the evaluation relies on statistical techniques, so performance can be assessed with modeling and simulation from the development stage. In such a performance evaluation, several diversion scenarios are established, nuclear material diversion is attempted in a virtual simulation environment according to these scenarios, and the detection probability is evaluated. It is therefore important to identify vulnerable diversion scenarios in advance. In actual facilities, however, deriving weak scenarios manually is not easy because numerous factors affect detection performance. In this study, reinforcement learning was applied to automatically derive vulnerable diversion scenarios from a virtual NMA system. Reinforcement learning trains an agent to take optimal actions in a virtual environment, and on this basis an agent can be developed that attempts to divert nuclear material according to the weakest scenario of the NMA system. A rather simple NMA system model was considered to confirm the applicability of reinforcement learning in this study. The simple model performs 10 consecutive material balance evaluations per year, with MUF uncertainty increasing with the balance period. The expected vulnerable scenario is one in which the amount of diverted material increases in proportion to the MUF uncertainty, and the total amount of diverted material was assumed to be 8 kg, corresponding to one significant quantity of plutonium. A virtual NMA system model (environment) and a divertor (agent) attempting to divert nuclear material were modeled for reinforcement learning. The agent receives a negative reward when a diversion attempt is detected by the NMA system. Reinforcement learning automatically trains the agent to maximize its reward, and through this the weakest diversion scenario can be derived. The results confirmed that the agent learned to divert nuclear material in directions with low detection probability in this system model, showing that weak scenarios can indeed be derived with reinforcement learning. The technique considered in this study suggests a way to identify and remedy weak diversion scenarios in an NMA system in advance. However, issues remain before this technology can be applied smoothly, and further research will be needed.
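A minimal environment sketch of the set-up described above (10 balance periods, MUF uncertainty growing with the period index, a penalty when a diversion attempt is detected, and an 8 kg total goal) is given below. All numerical values, the detection model, and the reward magnitudes are illustrative assumptions, not the study's actual model.

```python
# Hedged sketch of a simplified NMA diversion environment for an RL divertor agent.
# Uncertainty growth, detection rule, and reward values are illustrative assumptions.
import random

class SimpleNMAEnv:
    N_PERIODS = 10
    TOTAL_GOAL_KG = 8.0          # one significant quantity of plutonium

    def reset(self):
        self.period = 0
        self.remaining = self.TOTAL_GOAL_KG
        return (self.period, self.remaining)

    def muf_sigma(self, period):
        return 0.2 + 0.05 * period   # assumed growth of MUF uncertainty per period

    def step(self, divert_kg):
        divert_kg = min(divert_kg, self.remaining)
        sigma = self.muf_sigma(self.period)
        # crude stand-in for a statistical MUF test: detection is more likely when the
        # diverted amount is large relative to the balance-period uncertainty
        p_detect = min(1.0, max(0.0, divert_kg / (3.0 * sigma)))
        detected = random.random() < p_detect
        reward = -10.0 if detected else divert_kg
        self.remaining -= divert_kg
        self.period += 1
        done = detected or self.period == self.N_PERIODS or self.remaining <= 0.0
        return (self.period, self.remaining), reward, done

env = SimpleNMAEnv()
state = env.reset()
state, r, done = env.step(0.5)    # attempt to divert 0.5 kg in the first period
```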
9.
2023.06 | KCI-indexed | Free for subscribing institutions; fee for individual members
Reinforcement learning (RL) is widely applied in various engineering fields. In particular, RL has shown successful performance in control problems such as vehicles, robotics, and active structural control systems. However, little research on applying RL to optimal structural design has been conducted to date. In this study, the applicability of RL to the structural design of reinforced concrete (RC) beams was investigated. The RC beam design problem introduced in a previous study was used for comparison. The deep Q-network (DQN), a well-known RL algorithm with good performance in discrete action spaces, was used. The actions of the DQN agent must represent the design variables of the RC beam, but there are too many design variables to be represented by the actions of a single conventional DQN. To solve this problem, a multi-agent DQN was used. For a more effective learning process, DDQN (double Q-learning), an advanced version of the conventional DQN, was employed. The multi-agent DDQN was trained to produce optimal structural designs of RC beams satisfying ACI 318 (American Concrete Institute) without any hand-labeled dataset. Five DDQN agents provide actions for the beam width, beam depth, main rebar size, number of main rebars, and shear stirrup size, respectively. The five agents were trained for 10,000 episodes, and the performance of the multi-agent DDQN was evaluated on 100 test design cases. This study shows that the multi-agent DDQN algorithm can successfully provide structural design results for RC beams.
KRW 4,000
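The DDQN target that distinguishes double Q-learning from a plain DQN, plus the idea of one discrete agent per design variable, can be sketched as follows. The state dimension, network sizes, and the candidate counts per design variable are assumptions for illustration.

```python
# Minimal PyTorch sketch of the Double DQN (DDQN) target and of one Q-network per
# design variable. Sizes and candidate counts are illustrative assumptions.
import torch
import torch.nn as nn

def ddqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double Q-learning: the online net picks the argmax action, the target net evaluates it."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q

# One agent per design variable (beam width, depth, main-rebar size, rebar count,
# stirrup size); the shared reward would come from ACI 318 checks plus a cost objective.
design_vars = {"width": 10, "depth": 12, "bar_size": 6, "bar_count": 8, "stirrup_size": 4}
agents = {name: nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, n))
          for name, n in design_vars.items()}
```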
11.
2022.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
North Korea continues to upgrade and display its long-range rocket launchers to emphasize its military strength. Recently, the Republic of Korea kicked off the development of an anti-artillery interception system, similar to Israel's "Iron Dome", designed to protect against North Korea's arsenal of long-range rockets. The system cannot work smoothly without a function that assigns interceptors to incoming artillery rockets of various calibers. We view this assignment task as a dynamic weapon target assignment (DWTA) problem: a multistage decision process in which a decision at one stage affects the decision processes and outcomes of subsequent stages. We represent the DWTA problem as a Markov decision process (MDP). The distance from Seoul to North Korea's multiple rocket launchers positioned near the border limits the processing time of the model solver to only a few seconds. Computing the exact optimal solution within the allowed time is impossible because of the curse of dimensionality inherent in the MDP model of a practical DWTA problem. We apply two reinforcement-learning-based algorithms to obtain approximate solutions of the MDP model within the time limit. To check the quality of the approximate solutions, we adopt the Shoot-Shoot-Look (SSL) policy as a baseline. Simulation results showed that both algorithms provide better solutions than the baseline strategy.
KRW 4,200
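For intuition about the baseline mentioned above, here is a hedged sketch of a Shoot-Shoot-Look pass: two interceptors are committed per engageable rocket, then the outcome is observed before any re-engagement. The kill probability and inventory numbers are assumptions, not the paper's data.

```python
# Hedged sketch of a Shoot-Shoot-Look (SSL) baseline for interceptor assignment.
# Kill probability and inventory values are illustrative assumptions.
import random

def ssl_policy(incoming_rockets, interceptors_left, p_kill=0.7):
    """Returns surviving rockets (leakers) and remaining interceptors after one SSL pass."""
    leakers = []
    for rocket in incoming_rockets:
        shots = min(2, interceptors_left)            # "shoot, shoot"
        interceptors_left -= shots
        killed = any(random.random() < p_kill for _ in range(shots))
        if not killed:                               # "look": confirm before re-engaging
            leakers.append(rocket)
    return leakers, interceptors_left

leakers, left = ssl_policy(incoming_rockets=list(range(12)), interceptors_left=20)
print(len(leakers), left)
```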
14.
2022.06 | KCI-indexed | Free for subscribing institutions; fee for individual members
Recently, machine learning has been widely used to solve optimization problems in various engineering fields. In this study, machine learning is applied to the development of a control algorithm for a smart control device for seismic response reduction. For this purpose, the deep Q-network (DQN), one of the reinforcement learning algorithms, was employed to develop the control algorithm. A single-degree-of-freedom (SDOF) structure with a smart tuned mass damper (TMD) was used as the example structure; the smart TMD was composed of an MR (magnetorheological) damper instead of a passive damper. Because the reward design mainly determines the control performance of the smart TMD, various hyperparameters were investigated to optimize the performance of the DQN-based control algorithm. Usually, decreasing the time step of a numerical simulation is desirable to increase the accuracy of the results. However, the numerical simulations showed that decreasing the time step for reward calculation may degrade the control performance of the DQN-based algorithm. Therefore, a proper time step for reward calculation should be selected in the DQN training process.
KRW 4,000
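The distinction drawn above between the simulation step and the reward step can be sketched as follows: responses are integrated on a fine grid, while the DQN reward is computed once per coarser reward window. The step sizes, response stand-in, and normalization are assumptions for illustration.

```python
# Sketch of decoupling the numerical-integration step from the reward-calculation step
# for a DQN-controlled smart TMD. Step sizes and the response stand-in are assumptions.
import numpy as np

SIM_DT = 0.001      # numerical-integration step (s)
REWARD_DT = 0.02    # reward/decision step (s); choosing this too small can hurt learning

def reward_from_window(displacements, uncontrolled_peak):
    """Negative peak displacement over one reward window, normalized by the uncontrolled peak."""
    return -np.max(np.abs(displacements)) / uncontrolled_peak

steps_per_reward = int(REWARD_DT / SIM_DT)
disp_history = np.random.randn(steps_per_reward) * 0.01   # stand-in for simulated responses
r = reward_from_window(disp_history, uncontrolled_peak=0.05)
```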
15.
2021.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
In the current design process for civil structures such as bridges, the final deliverable is generally produced by repeating a cycle of preliminary design, structural review, and redesign whenever the design fails to meet the criteria. This repetition lengthens the design period and consumes highly skilled engineering manpower, which should be devoted to higher-quality design, on mechanical, repetitive work. The problem can be solved by automating the design process, but the analysis programs used during design have been the biggest obstacle to such automation. In this study, an AI-based automation system for the bridge design process was built to replace the repetitive work of the existing process, including an interface through which a reinforcement learning algorithm and an external analysis program can be controlled together. A prototype of the system was built for a two-span RC rigid-frame (rahmen) bridge. The developed interface framework is expected to serve as a base technology for linking state-of-the-art AI with the design of other bridge types in the future.
KRW 4,000
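A minimal sketch of the design-analysis loop described above is given below: the agent proposes a candidate design, an external analysis program evaluates it, and the code check determines the reward. The executable name, file formats, result fields, and pass/fail parsing are hypothetical placeholders, not the system's actual interface.

```python
# Hedged sketch of coupling an RL agent to an external structural-analysis program.
# "external_solver", the JSON files, and the "dcr"/"material_cost" fields are hypothetical.
import json
import subprocess

def evaluate_design(design: dict) -> float:
    with open("candidate.json", "w") as f:
        json.dump(design, f)                      # hand the candidate design to the solver
    subprocess.run(["external_solver", "candidate.json", "-o", "result.json"], check=True)
    with open("result.json") as f:
        result = json.load(f)                     # e.g., demand/capacity ratios per check
    if any(dcr > 1.0 for dcr in result["dcr"].values()):
        return -1.0                               # fails the design code: penalize
    return -result["material_cost"]               # passing designs: cheaper is better

# An agent (e.g., a DQN over discretized section dimensions) would call evaluate_design()
# inside its environment step, replacing the manual design-review-redesign loop.
```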
16.
2021.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
A mid-story isolation system was proposed for seismic response reduction of high-rise buildings and has shown good control performance. The performance of a mid-story isolation system can be enhanced by introducing semi-active control devices into the isolation system, and the seismic response reduction capacity of such a semi-active system depends mainly on the control algorithm. In this study, an AI (artificial intelligence)-based control algorithm was developed for a semi-active mid-story isolation system. An actual structure, the Shiodome Sumitomo building in Japan, which has a mid-story isolation system, was used as the example structure, and an MR (magnetorheological) damper was used to make the isolation system semi-active in the example model. In the numerical simulation, a seismic response prediction model was generated with a supervised learning model, an RNN (recurrent neural network), and the deep Q-network (DQN), one of the reinforcement learning algorithms, was employed to develop the control algorithm. The numerical simulation results showed that the DQN algorithm can effectively control the semi-active mid-story isolation system, successfully reducing seismic responses.
KRW 4,000
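The two pieces named above, an RNN surrogate that predicts responses and a DQN that selects an MR-damper command voltage from a discrete set, can be sketched as follows. Layer sizes, input dimensions, and the voltage levels are illustrative assumptions.

```python
# Minimal PyTorch sketch: an RNN response surrogate plus a DQN that picks a command
# voltage for the MR damper. Sizes and voltage levels are illustrative assumptions.
import torch
import torch.nn as nn

VOLTAGE_LEVELS = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # candidate commands (V)

class ResponsePredictor(nn.Module):
    """Supervised surrogate: ground-motion and control history -> structural responses."""
    def __init__(self, in_dim=4, hidden=32, out_dim=2):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, seq):
        out, _ = self.rnn(seq)
        return self.head(out[:, -1])               # predicted responses at the last step

q_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, len(VOLTAGE_LEVELS)))

def select_voltage(predicted_state, eps=0.1):
    """Epsilon-greedy choice of the MR-damper command voltage."""
    if torch.rand(()) < eps:
        idx = torch.randint(len(VOLTAGE_LEVELS), ())
    else:
        idx = q_net(predicted_state).argmax()
    return VOLTAGE_LEVELS[idx]
```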
18.
2021.06 | KCI-indexed | Free for subscribing institutions; fee for individual members
In this study, we investigated whether a toy such as a game piece can serve as an augmented reality (AR) object, and proposed a system model that extends it into a game element using a controllable module and wireless communication technology such as Bluetooth. The result is an online ship-type game that uses AR and wireless communication, and the existing game elements are extended through a smartphone app control module. The existing play style allows the game to be played only with limited functions in a shared physical space. This study extends it to an AR-based game by matching game objects with content through AR technology and by drawing on the variety of items that physical play alone cannot provide. We therefore standardized the size of game objects so that they can be used three-dimensionally anywhere on the screen, respecting spatial constraints such as overlap prevention, distance, and height, and AR technology allows the game to be played through smartphone manipulation. In addition, we propose a system-framework-based model applicable to various games and a framework capable of implementing various AR environments. The AR-based battle game proposed in this study combines a knowledge-based AR system that can be extended into game elements by modularizing the toy's functions, through a context-aware agent based on context information and an intelligent DB based on domain knowledge.
KRW 4,300
19.
2021.06 | KCI-indexed | Free for subscribing institutions; fee for individual members
A smart tuned mass damper (TMD) is widely studied for seismic response reduction of various structures, and the control algorithm is the most important factor in its control performance. This study used the Deep Deterministic Policy Gradient (DDPG), one of the reinforcement learning techniques, to develop a control algorithm for a smart TMD. An MR (magnetorheological) damper was used to build the smart TMD, and a single-mass model with the smart TMD was used as the reinforcement learning environment. Time-history analyses of the example structure subjected to artificial seismic loads were performed during the reinforcement learning process. An actor (policy network) and a critic (value network) were constructed for the DDPG agent. The action of the DDPG agent is the command voltage sent to the MR damper, and the reward is calculated from the displacement and velocity responses of the main mass. A groundhook control algorithm was used for comparison. After training the DDPG agent for 10,000 episodes with proper hyperparameters, a semi-active control algorithm for reducing the seismic responses of the example structure with the smart TMD was obtained. The simulation results showed that the developed DDPG model can provide effective control algorithms for smart TMDs for seismic response reduction.
KRW 4,000
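Two of the elements described above, a DDPG actor that maps the structural state to a bounded command voltage and a reward built from the main-mass displacement and velocity, can be sketched as follows. The network size, maximum voltage, and reward weights are assumptions for illustration.

```python
# Hedged sketch of a DDPG actor producing a bounded MR-damper command voltage and a
# displacement/velocity-based reward. Sizes, V_MAX, and weights are assumptions.
import torch
import torch.nn as nn

V_MAX = 5.0   # assumed maximum command voltage for the MR damper

class Actor(nn.Module):
    def __init__(self, state_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, state):
        return V_MAX * self.net(state)              # command voltage in [0, V_MAX]

def reward(displacement, velocity, w_d=1.0, w_v=0.1):
    """Penalize large main-mass displacement and velocity responses."""
    return -(w_d * abs(displacement) + w_v * abs(velocity))
```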
20.
2020.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
Modern reinforcement learning, used to solve problems in virtual environments such as games as well as in the real world, relies on artificial neural networks as function approximators. However, because these are statistics-based, they require large amounts of data, which makes them difficult to use and apply when no simulator is available. For this reason, artificial neural networks are still rarely encountered in everyday settings: for most environments it is hard to build a simulator, or data and rewards are sparse. We therefore built a model that uses a memory structure to learn quickly in environments with little data and sparse rewards. In the experiments, we tackled the OpenAI CartPole problem using a conventional policy gradient combined with the memory, implementing the advantage function, which evaluates the gain, with a modified memory structure. The model showed high variance during training and therefore poor average performance. However, a comparison of learning speed with other algorithms showed that, within small budgets of fewer than 100 episodes, its top-10 and top-5 scores were higher than those of the other algorithms. In conclusion, this study found that using a memory structure can be effective with little data, and future work should investigate techniques for reducing the variance of learning.
KRW 4,000
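One way to read the memory-based advantage idea above is to keep an episodic memory of returns keyed by a discretized state and use its average as the baseline in the advantage estimate. The sketch below illustrates that reading; the discretization and the memory layout are assumptions, not the paper's exact construction.

```python
# Hedged sketch of a memory-based baseline for the advantage estimate in a policy
# gradient on CartPole-like observations. Discretization scheme is an assumption.
from collections import defaultdict

memory_sum = defaultdict(float)
memory_cnt = defaultdict(int)

def state_key(obs, bins=10):
    """Coarse discretization of a CartPole-style observation vector."""
    return tuple(int(x * bins) for x in obs)

def remember(obs, episode_return):
    k = state_key(obs)
    memory_sum[k] += episode_return
    memory_cnt[k] += 1

def advantage(obs, episode_return):
    """Return minus the memorized average return for similar states (the baseline)."""
    k = state_key(obs)
    baseline = memory_sum[k] / memory_cnt[k] if memory_cnt[k] else 0.0
    return episode_return - baseline
```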