Search Results

29 results found

1.
2023.12 | KCI-indexed | Free for subscribing institutions and individual members
With advances in multi-agent reinforcement learning, research continues on applying reinforcement learning to level design in games. Although platform geometry is an important element of level design, prior studies have applied reinforcement learning with a focus on player metrics such as skill level or skill composition. This paper therefore studies how platforms affect the play experience, considering the visibility of visual sensors and the complexity of structures, so that platform geometry can be used in level design. To this end, a 2-vs-2 competitive shooting game environment was developed based on the Unity ML-Agents Toolkit, the MA-POCA algorithm, and self-play, and various platform shapes were constructed. The analysis confirmed that differences in visibility and complexity across platform shapes have little effect on win-rate balance but significantly affect the total number of episodes, the draw ratio, and the growth of Elo ratings.
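As a rough illustration of the match-level metrics compared across platform shapes (win-rate balance, draw ratio, Elo growth), the Python sketch below shows a standard Elo update with draw-rate bookkeeping; the K-factor, initial ratings, and result encoding are assumptions, not values from the paper.

```python
# Minimal sketch of match-level metrics (win rate, draw ratio, Elo gain) for one
# platform layout. K-factor and result encoding are illustrative assumptions.

def elo_update(rating_a, rating_b, score_a, k=16.0):
    """Standard Elo update; score_a is 1.0 (A wins), 0.5 (draw), 0.0 (A loses)."""
    expected_a = 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

def summarize(match_results):
    """match_results: list of 1.0 / 0.5 / 0.0 outcomes for team A."""
    rating_a, rating_b = 1200.0, 1200.0
    draws = 0
    for score_a in match_results:
        rating_a, rating_b = elo_update(rating_a, rating_b, score_a)
        draws += score_a == 0.5
    n = len(match_results)
    return {
        "win_rate_a": sum(s == 1.0 for s in match_results) / n,
        "draw_ratio": draws / n,
        "elo_gain_a": rating_a - 1200.0,
    }
```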
2.
2023.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
Because the built-in StarCraft II AI follows predefined behavior patterns, users can easily figure out its strategy, which makes it hard to hold their interest for long. To address this, many studies on reinforcement-learning-based StarCraft II AI have been conducted. However, existing reinforcement-learning AIs are trained with a focus on win rate alone, so they rely on a small set of units or on stereotyped strategies, and users still struggle to find the game fun. To improve the fun of the game, this paper proposes an AI that plays like a real player by using reinforcement learning. The agent is trained on the StarCraft II unit matchup (counter) table and is rewarded based on scouted information, so it changes its strategy flexibly. In experiments, the proposed agent received higher ratings than an agent using a fixed strategy in terms of perceived fun, difficulty, and similarity to a human player.
KRW 4,000
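To make the reward-shaping idea above concrete, here is a hedged Python sketch in which planned unit production earns a small bonus when it counters what scouting has seen. The table entries, unit names, and weight are illustrative assumptions, not the paper's actual values.

```python
# Hedged sketch of reward shaping from a unit matchup ("counter") table and scouted
# enemy composition. Table contents and the shaping weight are assumptions.

COUNTER_TABLE = {
    # attacker -> {enemy unit it counters: bonus score}
    "immortal": {"roach": 1.0, "stalker": 0.6},
    "phoenix":  {"mutalisk": 1.0, "banshee": 0.8},
    "zealot":   {"zergling": 0.7, "marine": 0.5},
}

def shaping_reward(scouted_enemy_counts, planned_production, weight=0.05):
    """Small positive reward when planned production counters scouted enemy units."""
    score = 0.0
    for unit, count in planned_production.items():
        for enemy_unit, bonus in COUNTER_TABLE.get(unit, {}).items():
            score += bonus * min(count, scouted_enemy_counts.get(enemy_unit, 0))
    return weight * score

# Example: scouting saw 8 roaches, the agent plans 3 immortals -> small shaping bonus.
print(shaping_reward({"roach": 8}, {"immortal": 3}))
```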
3.
2023.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
This paper proposes an algorithm for the Unrelated Parallel Machine Scheduling Problem (UPMSP) without setup times, aiming to minimize total tardiness. As an NP-hard problem, the UPMSP is difficult to solve optimally, so practical instances are usually handled with operators' experience or simple heuristics. The proposed algorithm combines two components: a Transformer-based policy network that computes the correlations between individual jobs and machines, and a training method based on the REINFORCE with Baseline reinforcement learning algorithm. The algorithm was evaluated on randomly generated problems, and the results were compared with those obtained using CPLEX as well as three scheduling algorithms. The test results confirm that the proposed algorithm outperforms the comparison algorithms.
KRW 4,000
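The update rule named above (REINFORCE with a learned baseline) can be sketched in a few lines of PyTorch. The tiny network below is only a stand-in for the Transformer policy, and the layer sizes and toy episode data are assumptions.

```python
# Minimal PyTorch sketch of a REINFORCE-with-baseline update for a scheduling policy.
# The network is a stand-in for the Transformer; sizes and toy data are assumptions.
import torch
import torch.nn as nn

class TinyPolicy(nn.Module):
    """Scores each candidate (job, machine) choice and predicts a baseline value."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.baseline = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state):
        dist = torch.distributions.Categorical(logits=self.actor(state))
        return dist, self.baseline(state).squeeze(-1)

def reinforce_with_baseline_loss(policy, states, actions, returns):
    """returns: e.g. negative total tardiness of the episode, repeated per decision step."""
    dist, baseline = policy(states)
    log_prob = dist.log_prob(actions)
    advantage = returns - baseline.detach()          # baseline reduces gradient variance
    policy_loss = -(advantage * log_prob).mean()
    baseline_loss = nn.functional.mse_loss(baseline, returns)
    return policy_loss + 0.5 * baseline_loss

# Toy usage with random episode data
policy = TinyPolicy(state_dim=16, n_actions=5)
states, actions, returns = torch.randn(32, 16), torch.randint(0, 5, (32,)), torch.randn(32)
loss = reinforce_with_baseline_loss(policy, states, actions, returns)
loss.backward()
```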
7.
2023.11 | Free for subscribing institutions and individual members
A Nuclear Material Accountancy (NMA) system quantitatively evaluates whether nuclear material has been diverted. Under this system, the material balance is evaluated from nuclear material measurements, and the evaluation relies on statistical techniques, so performance can be assessed with modeling and simulation from the development stage. In such a performance evaluation, several diversion scenarios are established, nuclear material diversion is attempted in a virtual simulation environment according to these scenarios, and the detection probability is evaluated. It is therefore important to identify vulnerable diversion scenarios in advance. In actual facilities, however, deriving weak scenarios manually is not easy because numerous factors affect detection performance. In this study, reinforcement learning was applied to automatically derive vulnerable diversion scenarios from a virtual NMA system. Reinforcement learning trains an agent to take optimal actions in a virtual environment, and on this basis an agent can be developed that attempts to divert nuclear material according to the weakest scenario of the NMA system. A rather simple NMA system model was considered to confirm the applicability of reinforcement learning in this study. The simple model performs 10 consecutive material balance evaluations per year, with MUF uncertainty increasing with the balance period. The expected vulnerable scenario is one in which the amount of diverted material increases in proportion to the MUF uncertainty, and the total amount of diverted material was assumed to be 8 kg, corresponding to one significant quantity of plutonium. A virtual NMA system model (environment) and a divertor (agent) attempting to divert nuclear material were modeled for reinforcement learning. The agent receives a negative reward when a diversion attempt is detected by the NMA system. Reinforcement learning automatically trains the agent to maximize its reward, and through this the weakest diversion scenario can be derived. The results confirmed that the agent learned to divert nuclear material in directions with low detection probability in this system model, showing that weak scenarios can indeed be derived with reinforcement learning. The technique considered in this study suggests a way to identify and remedy weak diversion scenarios in an NMA system in advance. However, issues remain before this technology can be applied smoothly, and further research will be needed.
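A minimal environment sketch of the set-up described above (10 balance periods, MUF uncertainty growing with the period index, a penalty when a diversion attempt is detected, and an 8 kg total goal) is given below. All numerical values, the detection model, and the reward magnitudes are illustrative assumptions, not the study's actual model.

```python
# Hedged sketch of a simplified NMA diversion environment for an RL divertor agent.
# Uncertainty growth, detection rule, and reward values are illustrative assumptions.
import random

class SimpleNMAEnv:
    N_PERIODS = 10
    TOTAL_GOAL_KG = 8.0          # one significant quantity of plutonium

    def reset(self):
        self.period = 0
        self.remaining = self.TOTAL_GOAL_KG
        return (self.period, self.remaining)

    def muf_sigma(self, period):
        return 0.2 + 0.05 * period   # assumed growth of MUF uncertainty per period

    def step(self, divert_kg):
        divert_kg = min(divert_kg, self.remaining)
        sigma = self.muf_sigma(self.period)
        # crude stand-in for a statistical MUF test: detection is more likely when the
        # diverted amount is large relative to the balance-period uncertainty
        p_detect = min(1.0, max(0.0, divert_kg / (3.0 * sigma)))
        detected = random.random() < p_detect
        reward = -10.0 if detected else divert_kg
        self.remaining -= divert_kg
        self.period += 1
        done = detected or self.period == self.N_PERIODS or self.remaining <= 0.0
        return (self.period, self.remaining), reward, done

env = SimpleNMAEnv()
state = env.reset()
state, r, done = env.step(0.5)    # attempt to divert 0.5 kg in the first period
```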
9.
2023.06 | KCI-indexed | Free for subscribing institutions; fee for individual members
Reinforcement learning (RL) is widely applied in various engineering fields. In particular, RL has shown successful performance in control problems such as vehicles, robotics, and active structural control systems. However, little research on applying RL to optimal structural design has been conducted to date. In this study, the applicability of RL to the structural design of reinforced concrete (RC) beams was investigated. The RC beam design problem introduced in a previous study was used for comparison. The deep Q-network (DQN), a well-known RL algorithm with good performance in discrete action spaces, was used. The actions of the DQN agent must represent the design variables of the RC beam, but there are too many design variables to be represented by the actions of a single conventional DQN. To solve this problem, a multi-agent DQN was used. For a more effective learning process, DDQN (double Q-learning), an advanced version of the conventional DQN, was employed. The multi-agent DDQN was trained to produce optimal structural designs of RC beams satisfying ACI 318 (American Concrete Institute) without any hand-labeled dataset. Five DDQN agents provide actions for the beam width, beam depth, main rebar size, number of main rebars, and shear stirrup size, respectively. The five agents were trained for 10,000 episodes, and the performance of the multi-agent DDQN was evaluated on 100 test design cases. This study shows that the multi-agent DDQN algorithm can successfully provide structural design results for RC beams.
KRW 4,000
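The DDQN target that distinguishes double Q-learning from a plain DQN, plus the idea of one discrete agent per design variable, can be sketched as follows. The state dimension, network sizes, and the candidate counts per design variable are assumptions for illustration.

```python
# Minimal PyTorch sketch of the Double DQN (DDQN) target and of one Q-network per
# design variable. Sizes and candidate counts are illustrative assumptions.
import torch
import torch.nn as nn

def ddqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double Q-learning: the online net picks the argmax action, the target net evaluates it."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q

# One agent per design variable (beam width, depth, main-rebar size, rebar count,
# stirrup size); the shared reward would come from ACI 318 checks plus a cost objective.
design_vars = {"width": 10, "depth": 12, "bar_size": 6, "bar_count": 8, "stirrup_size": 4}
agents = {name: nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, n))
          for name, n in design_vars.items()}
```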
11.
2022.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
North Korea continues to upgrade and display its long-range rocket launchers to emphasize its military strength. Recently, the Republic of Korea kicked off the development of an anti-artillery interception system, similar to Israel's "Iron Dome", designed to protect against North Korea's arsenal of long-range rockets. The system cannot work smoothly without a function that assigns interceptors to incoming artillery rockets of various calibers. We view this assignment task as a dynamic weapon target assignment (DWTA) problem: a multistage decision process in which a decision at one stage affects the decision processes and outcomes of subsequent stages. We represent the DWTA problem as a Markov decision process (MDP). The distance from Seoul to North Korea's multiple rocket launchers positioned near the border limits the processing time of the model solver to only a few seconds. Computing the exact optimal solution within the allowed time is impossible because of the curse of dimensionality inherent in the MDP model of a practical DWTA problem. We apply two reinforcement-learning-based algorithms to obtain approximate solutions of the MDP model within the time limit. To check the quality of the approximate solutions, we adopt the Shoot-Shoot-Look (SSL) policy as a baseline. Simulation results showed that both algorithms provide better solutions than the baseline strategy.
KRW 4,200
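For intuition about the baseline mentioned above, here is a hedged sketch of a Shoot-Shoot-Look pass: two interceptors are committed per engageable rocket, then the outcome is observed before any re-engagement. The kill probability and inventory numbers are assumptions, not the paper's data.

```python
# Hedged sketch of a Shoot-Shoot-Look (SSL) baseline for interceptor assignment.
# Kill probability and inventory values are illustrative assumptions.
import random

def ssl_policy(incoming_rockets, interceptors_left, p_kill=0.7):
    """Returns surviving rockets (leakers) and remaining interceptors after one SSL pass."""
    leakers = []
    for rocket in incoming_rockets:
        shots = min(2, interceptors_left)            # "shoot, shoot"
        interceptors_left -= shots
        killed = any(random.random() < p_kill for _ in range(shots))
        if not killed:                               # "look": confirm before re-engaging
            leakers.append(rocket)
    return leakers, interceptors_left

leakers, left = ssl_policy(incoming_rockets=list(range(12)), interceptors_left=20)
print(len(leakers), left)
```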
14.
2022.06 | KCI-indexed | Free for subscribing institutions; fee for individual members
Recently, machine learning has been widely used to solve optimization problems in various engineering fields. In this study, machine learning is applied to the development of a control algorithm for a smart control device for seismic response reduction. For this purpose, the deep Q-network (DQN), one of the reinforcement learning algorithms, was employed to develop the control algorithm. A single-degree-of-freedom (SDOF) structure with a smart tuned mass damper (TMD) was used as the example structure; the smart TMD was composed of an MR (magnetorheological) damper instead of a passive damper. Because the reward design mainly determines the control performance of the smart TMD, various hyperparameters were investigated to optimize the performance of the DQN-based control algorithm. Usually, decreasing the time step of a numerical simulation is desirable to increase the accuracy of the results. However, the numerical simulations showed that decreasing the time step for reward calculation may degrade the control performance of the DQN-based algorithm. Therefore, a proper time step for reward calculation should be selected in the DQN training process.
KRW 4,000
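The distinction drawn above between the simulation step and the reward step can be sketched as follows: responses are integrated on a fine grid, while the DQN reward is computed once per coarser reward window. The step sizes, response stand-in, and normalization are assumptions for illustration.

```python
# Sketch of decoupling the numerical-integration step from the reward-calculation step
# for a DQN-controlled smart TMD. Step sizes and the response stand-in are assumptions.
import numpy as np

SIM_DT = 0.001      # numerical-integration step (s)
REWARD_DT = 0.02    # reward/decision step (s); choosing this too small can hurt learning

def reward_from_window(displacements, uncontrolled_peak):
    """Negative peak displacement over one reward window, normalized by the uncontrolled peak."""
    return -np.max(np.abs(displacements)) / uncontrolled_peak

steps_per_reward = int(REWARD_DT / SIM_DT)
disp_history = np.random.randn(steps_per_reward) * 0.01   # stand-in for simulated responses
r = reward_from_window(disp_history, uncontrolled_peak=0.05)
```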
15.
2021.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
In the current design process for civil structures such as bridges, the final deliverable is generally produced by repeating a cycle of preliminary design, structural review, and redesign whenever the design fails to meet the criteria. This repetition lengthens the design period and consumes highly skilled engineering manpower, which should be devoted to higher-quality design, on mechanical, repetitive work. The problem can be solved by automating the design process, but the analysis programs used during design have been the biggest obstacle to such automation. In this study, an AI-based automation system for the bridge design process was built to replace the repetitive work of the existing process, including an interface through which a reinforcement learning algorithm and an external analysis program can be controlled together. A prototype of the system was built for a two-span RC rigid-frame (rahmen) bridge. The developed interface framework is expected to serve as a base technology for linking state-of-the-art AI with the design of other bridge types in the future.
KRW 4,000
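A minimal sketch of the design-analysis loop described above is given below: the agent proposes a candidate design, an external analysis program evaluates it, and the code check determines the reward. The executable name, file formats, result fields, and pass/fail parsing are hypothetical placeholders, not the system's actual interface.

```python
# Hedged sketch of coupling an RL agent to an external structural-analysis program.
# "external_solver", the JSON files, and the "dcr"/"material_cost" fields are hypothetical.
import json
import subprocess

def evaluate_design(design: dict) -> float:
    with open("candidate.json", "w") as f:
        json.dump(design, f)                      # hand the candidate design to the solver
    subprocess.run(["external_solver", "candidate.json", "-o", "result.json"], check=True)
    with open("result.json") as f:
        result = json.load(f)                     # e.g., demand/capacity ratios per check
    if any(dcr > 1.0 for dcr in result["dcr"].values()):
        return -1.0                               # fails the design code: penalize
    return -result["material_cost"]               # passing designs: cheaper is better

# An agent (e.g., a DQN over discretized section dimensions) would call evaluate_design()
# inside its environment step, replacing the manual design-review-redesign loop.
```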
16.
2021.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
A mid-story isolation system was proposed for seismic response reduction of high-rise buildings and has shown good control performance. The performance of a mid-story isolation system can be enhanced by introducing semi-active control devices into the isolation system, and the seismic response reduction capacity of such a semi-active system depends mainly on the control algorithm. In this study, an AI (artificial intelligence)-based control algorithm was developed for a semi-active mid-story isolation system. An actual structure, the Shiodome Sumitomo building in Japan, which has a mid-story isolation system, was used as the example structure, and an MR (magnetorheological) damper was used to make the isolation system semi-active in the example model. In the numerical simulation, a seismic response prediction model was generated with a supervised learning model, an RNN (recurrent neural network), and the deep Q-network (DQN), one of the reinforcement learning algorithms, was employed to develop the control algorithm. The numerical simulation results showed that the DQN algorithm can effectively control the semi-active mid-story isolation system, successfully reducing seismic responses.
KRW 4,000
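The two pieces named above, an RNN surrogate that predicts responses and a DQN that selects an MR-damper command voltage from a discrete set, can be sketched as follows. Layer sizes, input dimensions, and the voltage levels are illustrative assumptions.

```python
# Minimal PyTorch sketch: an RNN response surrogate plus a DQN that picks a command
# voltage for the MR damper. Sizes and voltage levels are illustrative assumptions.
import torch
import torch.nn as nn

VOLTAGE_LEVELS = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # candidate commands (V)

class ResponsePredictor(nn.Module):
    """Supervised surrogate: ground-motion and control history -> structural responses."""
    def __init__(self, in_dim=4, hidden=32, out_dim=2):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, seq):
        out, _ = self.rnn(seq)
        return self.head(out[:, -1])               # predicted responses at the last step

q_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, len(VOLTAGE_LEVELS)))

def select_voltage(predicted_state, eps=0.1):
    """Epsilon-greedy choice of the MR-damper command voltage."""
    if torch.rand(()) < eps:
        idx = torch.randint(len(VOLTAGE_LEVELS), ())
    else:
        idx = q_net(predicted_state).argmax()
    return VOLTAGE_LEVELS[idx]
```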
18.
2021.06 | KCI-indexed | Free for subscribing institutions; fee for individual members
In this study, we investigated whether a toy such as a game piece can serve as an augmented reality (AR) object, and proposed a system model that extends it into a game element using a controllable module and wireless communication technology such as Bluetooth. The result is an online ship-type game that uses AR and wireless communication, and the existing game elements are extended through a smartphone app control module. The existing play style allows the game to be played only with limited functions in a shared physical space. This study extends it to an AR-based game by matching game objects with content through AR technology and by drawing on the variety of items that physical play alone cannot provide. We therefore standardized the size of game objects so that they can be used three-dimensionally anywhere on the screen, respecting spatial constraints such as overlap prevention, distance, and height, and AR technology allows the game to be played through smartphone manipulation. In addition, we propose a system-framework-based model applicable to various games and a framework capable of implementing various AR environments. The AR-based battle game proposed in this study combines a knowledge-based AR system that can be extended into game elements by modularizing the toy's functions, through a context-aware agent based on context information and an intelligent DB based on domain knowledge.
KRW 4,300
19.
2021.06 | KCI-indexed | Free for subscribing institutions; fee for individual members
A smart tuned mass damper (TMD) is widely studied for seismic response reduction of various structures, and the control algorithm is the most important factor in its control performance. This study used the Deep Deterministic Policy Gradient (DDPG), one of the reinforcement learning techniques, to develop a control algorithm for a smart TMD. An MR (magnetorheological) damper was used to build the smart TMD, and a single-mass model with the smart TMD was used as the reinforcement learning environment. Time-history analyses of the example structure subjected to artificial seismic loads were performed during the reinforcement learning process. An actor (policy network) and a critic (value network) were constructed for the DDPG agent. The action of the DDPG agent is the command voltage sent to the MR damper, and the reward is calculated from the displacement and velocity responses of the main mass. A groundhook control algorithm was used for comparison. After training the DDPG agent for 10,000 episodes with proper hyperparameters, a semi-active control algorithm for reducing the seismic responses of the example structure with the smart TMD was obtained. The simulation results showed that the developed DDPG model can provide effective control algorithms for smart TMDs for seismic response reduction.
KRW 4,000
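Two of the elements described above, a DDPG actor that maps the structural state to a bounded command voltage and a reward built from the main-mass displacement and velocity, can be sketched as follows. The network size, maximum voltage, and reward weights are assumptions for illustration.

```python
# Hedged sketch of a DDPG actor producing a bounded MR-damper command voltage and a
# displacement/velocity-based reward. Sizes, V_MAX, and weights are assumptions.
import torch
import torch.nn as nn

V_MAX = 5.0   # assumed maximum command voltage for the MR damper

class Actor(nn.Module):
    def __init__(self, state_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, state):
        return V_MAX * self.net(state)              # command voltage in [0, V_MAX]

def reward(displacement, velocity, w_d=1.0, w_v=0.1):
    """Penalize large main-mass displacement and velocity responses."""
    return -(w_d * abs(displacement) + w_v * abs(velocity))
```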
20.
2020.12 | KCI-indexed | Free for subscribing institutions; fee for individual members
Modern reinforcement learning, used to solve problems in virtual environments such as games as well as in the real world, relies on artificial neural networks as function approximators. However, because these are statistics-based, they require large amounts of data, which makes them difficult to use and apply when no simulator is available. For this reason, artificial neural networks are still rarely encountered in everyday settings: for most environments it is hard to build a simulator, or data and rewards are sparse. We therefore built a model that uses a memory structure to learn quickly in environments with little data and sparse rewards. In the experiments, we tackled the OpenAI CartPole problem using a conventional policy gradient combined with the memory, implementing the advantage function, which evaluates the gain, with a modified memory structure. The model showed high variance during training and therefore poor average performance. However, a comparison of learning speed with other algorithms showed that, within small budgets of fewer than 100 episodes, its top-10 and top-5 scores were higher than those of the other algorithms. In conclusion, this study found that using a memory structure can be effective with little data, and future work should investigate techniques for reducing the variance of learning.
KRW 4,000
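One way to read the memory-based advantage idea above is to keep an episodic memory of returns keyed by a discretized state and use its average as the baseline in the advantage estimate. The sketch below illustrates that reading; the discretization and the memory layout are assumptions, not the paper's exact construction.

```python
# Hedged sketch of a memory-based baseline for the advantage estimate in a policy
# gradient on CartPole-like observations. Discretization scheme is an assumption.
from collections import defaultdict

memory_sum = defaultdict(float)
memory_cnt = defaultdict(int)

def state_key(obs, bins=10):
    """Coarse discretization of a CartPole-style observation vector."""
    return tuple(int(x * bins) for x in obs)

def remember(obs, episode_return):
    k = state_key(obs)
    memory_sum[k] += episode_return
    memory_cnt[k] += 1

def advantage(obs, episode_return):
    """Return minus the memorized average return for similar states (the baseline)."""
    k = state_key(obs)
    baseline = memory_sum[k] / memory_cnt[k] if memory_cnt[k] else 0.0
    return episode_return - baseline
```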