In the current design process for civil structures such as bridges, it is common to perform an initial design, carry out a structural review, and, whenever the design fails to satisfy the criteria, repeat the redesign until a final deliverable is produced. This iterative process lengthens the design period and consumes highly skilled engineering personnel, who should be devoted to higher-quality design work, on mechanical, repetitive tasks. These problems can be addressed by automating the design process, but the analysis programs used during design have been the greatest obstacle to such automation. In this study, an AI-based automation system for the bridge design process was built to replace the repetitive tasks in the existing design process; the system includes an interface through which a reinforcement learning algorithm and an external analysis program can be controlled together. A prototype of the system was built for a two-span RC rahmen (rigid-frame) bridge. The developed interface framework is expected to serve as a base technology for linking state-of-the-art AI with the design of other bridge types in the future.
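The abstract does not give implementation details of the interface; the following is a minimal sketch, assuming a hypothetical command-line analysis program and hypothetical design variables, of how an external structural solver might be wrapped as a reinforcement learning environment.

```python
# Sketch of an RL environment wrapping an external structural analysis program.
# The solver command, file names, and design variables are hypothetical
# placeholders, not the system described in the abstract.
import json
import subprocess

import numpy as np


class BridgeDesignEnv:
    def __init__(self, solver_cmd="analyze_bridge"):
        self.solver_cmd = solver_cmd            # hypothetical external solver
        self.design = np.array([1.2, 0.8])      # e.g. slab and wall thickness (m)

    def step(self, action):
        # action: small adjustments to the design variables
        self.design = np.clip(self.design + action, 0.3, 3.0)
        with open("input.json", "w") as f:
            json.dump({"slab_t": float(self.design[0]),
                       "wall_t": float(self.design[1])}, f)
        subprocess.run([self.solver_cmd, "input.json", "output.json"], check=True)
        with open("output.json") as f:
            result = json.load(f)               # e.g. {"stress_ratio": ..., "cost": ...}
        ok = result["stress_ratio"] <= 1.0      # design-code check
        reward = -result["cost"] if ok else -1000.0
        return self.design.copy(), reward, ok, {}
```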
A mid-story isolation system was proposed for seismic response reduction of high-rise buildings and showed good control performance. The control performance of a mid-story isolation system can be further enhanced by introducing semi-active control devices into the isolation system. The seismic response reduction capacity of a semi-active mid-story isolation system mainly depends on the control algorithm. In this study, an AI (Artificial Intelligence)-based control algorithm was developed for control of a semi-active mid-story isolation system. A real structure, the Shiodome Sumitomo building in Japan, which has a mid-story isolation system, was used as the example structure. An MR (magnetorheological) damper was used to make the semi-active mid-story isolation system in the example model. In the numerical simulation, a seismic response prediction model was generated with a supervised learning model, an RNN (Recurrent Neural Network). A Deep Q-network (DQN), one of the reinforcement learning algorithms, was employed to develop the control algorithm. The numerical simulation results showed that the DQN algorithm can effectively control a semi-active mid-story isolation system, successfully reducing seismic responses.
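As a rough illustration of the DQN setup described above, the sketch below treats a small set of MR damper command voltages as the discrete action space. The state size, voltage levels, and network width are assumptions, not the paper's settings.

```python
# Minimal DQN sketch for semi-active control with discrete command voltages.
import torch
import torch.nn as nn

STATE_DIM = 8                        # e.g. selected floor displacements/velocities
VOLTAGES = [0.0, 1.0, 2.0, 3.0]      # candidate MR damper command voltages (V)


class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, len(VOLTAGES)))       # one Q-value per voltage level

    def forward(self, s):
        return self.net(s)


def select_voltage(qnet, state, eps=0.1):
    # epsilon-greedy action selection over the discrete voltage set
    if torch.rand(1).item() < eps:
        return VOLTAGES[torch.randint(len(VOLTAGES), (1,)).item()]
    with torch.no_grad():
        q = qnet(torch.as_tensor(state, dtype=torch.float32))
    return VOLTAGES[int(q.argmax())]
```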
In this study, we investigated whether a tool such as a game toy can be used as an augmented reality tool, and proposed a system model that can be extended into a game element using wireless communication technology such as Bluetooth and a controllable module. The result is an online ship-type game using augmented reality technology and wireless communication technology. In addition, the existing game element was extended by applying a smartphone app control module. The existing game method plays the game with only limited functions in the same space. This study expands it to an augmented reality-based game by implementing contents in a way that matches game objects through the grafting of augmented reality technology, and makes use of various items that overcome the limits of reality. Therefore, we standardized the size of game objects so that they can be used three-dimensionally in all spaces on the screen, according to spatial arrangement such as overlap prevention, distance, and height, and augmented reality technology was used to allow the game to be played by manipulating a smartphone. In addition, we propose a system framework-based model that can be applied to various games, and a framework that can implement various augmented reality environments. The augmented reality-based battle game proposed in this study combines a knowledge-based augmented reality system that can be extended to game elements by modularizing the functions of a toy, through a context-aware agent based on context information and an intelligent DB based on domain knowledge.
A smart tuned mass damper (TMD) is widely studied for seismic response reduction of various structures. The control algorithm is the most important factor for the control performance of a smart TMD. This study used the Deep Deterministic Policy Gradient (DDPG), one of the reinforcement learning techniques, to develop a control algorithm for a smart TMD. A magnetorheological (MR) damper was used to make the smart TMD. A single-mass model with the smart TMD was employed as the reinforcement learning environment. Time history analysis simulations of the example structure subjected to artificial seismic load were performed in the reinforcement learning process. An actor (policy) network and a critic (value) network for the DDPG agent were constructed. The action of the DDPG agent was selected as the command voltage sent to the MR damper. The reward for the DDPG action was calculated from the displacement and velocity responses of the main mass. The groundhook control algorithm was used as a comparative control algorithm. After 10,000 episodes of training the DDPG agent model with proper hyper-parameters, the semi-active control algorithm for control of the seismic responses of the example structure with the smart TMD was developed. The simulation results showed that the developed DDPG model can provide an effective control algorithm for the smart TMD to reduce seismic responses.
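A minimal sketch of the DDPG ingredients named in the abstract is given below: an actor mapping the state to a continuous command voltage, a critic scoring state-action pairs, and a reward that penalizes main-mass displacement and velocity. The state size, voltage limit, and weights are assumptions.

```python
# Minimal DDPG actor/critic sketch for the smart TMD setting.
import torch
import torch.nn as nn

STATE_DIM, MAX_VOLT = 4, 5.0    # assumed state size and MR damper voltage limit


class Actor(nn.Module):         # policy network: state -> command voltage
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, s):
        return MAX_VOLT * self.net(s)


class Critic(nn.Module):        # value network: (state, action) -> Q-value
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + 1, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))


def reward(displacement, velocity, w_d=1.0, w_v=0.1):
    # penalize main-mass responses, as in the abstract (weights are assumed)
    return -(w_d * displacement ** 2 + w_v * velocity ** 2)
```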
Modern reinforcement learning, used to solve problems in virtual environments such as games as well as in the real world, uses artificial neural networks as approximation functions. However, because these are statistics-based, they require large amounts of data, which makes them difficult to use and apply when no simulator is available. For this reason, such methods are still rarely encountered in everyday life, since for most environments it is hard to build a simulator, or data and rewards are sparse. We therefore built a model that uses a memory structure to learn quickly in environments with little data and sparse rewards. In the experiments, we tackled the OpenAI CartPole problem using a conventional policy gradient combined with the memory. The advantage function, which evaluates the gain, was implemented with a modified memory structure. In the subsequent experiments, the variance during training was large, so the model showed poor performance on average. However, a comparison of learning speed with other algorithms confirmed that, within fewer than 100 episodes, the top 10 and top 5 scores were higher than those of the other algorithms. In conclusion, this study found that a memory-based approach may be effective with little data, and future research is needed on techniques for reducing the variance of training.
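The abstract does not describe the memory-based advantage function in detail; the sketch below is a simplified stand-in in which the baseline is the mean of recent episode returns kept in a small memory, applied to CartPole with a plain policy gradient. It uses the gymnasium CartPole environment, and all sizes and hyper-parameters are assumptions.

```python
# Policy gradient on CartPole with a memory-based baseline (simplified sketch).
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
memory = deque(maxlen=20)                      # recent episode returns

for episode in range(100):
    obs, _ = env.reset()
    log_probs, ep_return, done = [], 0.0, False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, r, terminated, truncated, _ = env.step(int(action))
        ep_return += r
        done = terminated or truncated
    baseline = np.mean(memory) if memory else 0.0
    advantage = ep_return - baseline           # memory-based advantage estimate
    memory.append(ep_return)
    loss = -advantage * torch.stack(log_probs).sum()
    opt.zero_grad(); loss.backward(); opt.step()
```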
Recently, AI has grown rapidly in many fields, and games have seen major advances as well. There are several approaches to game AI. First, the supervised learning-based approach learns from game play data and imitates play behavior. However, because it linearly combines input features, its performance gains are limited on complex problems. To overcome the performance limit of linear combination, the deep neural network-based approach uses two or more neural networks to represent local and global features separately. However, the deep neural network-based approach requires a sufficiently large training set. To reduce the burden of preparing a training set, the reinforcement learning-based approach has the agent act first and then learn by analyzing the resulting reward; that is, the agent learns to maximize its reward. In this paper, we propose an AI that learns from multiple games through reinforcement learning. In the proposed AI model, a local agent plays each individual game, and a global agent is trained from the multiple local agents. Experimental results showed that an agent trained on a single game performed well on the game it was trained on but degraded on a new game. In contrast, the proposed agent trained on two games adapted well both to the trained games and to a new game.
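The abstract does not specify how the global agent is trained from the local agents; the following is a hedged sketch assuming a simple distillation scheme in which the global policy is fit to action distributions collected from the per-game local agents. The shapes and names are illustrative only.

```python
# Hypothetical distillation of local agents into a global agent (not the
# paper's stated method; an assumed mechanism for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS = 16, 4

global_policy = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                              nn.Linear(128, N_ACTIONS))
opt = torch.optim.Adam(global_policy.parameters(), lr=1e-3)


def distill_step(states, local_action_probs):
    """One update from states and the local agents' action probabilities."""
    logits = global_policy(states)
    # KL divergence between local (teacher) and global (student) distributions
    loss = F.kl_div(F.log_softmax(logits, dim=-1), local_action_probs,
                    reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```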
This paper implements an architecture for solving the Tetris game, commonly regarded as a complex problem, through reinforcement learning. Because Tetris requires quickly stacking randomly appearing blocks in optimal positions while considering their shapes and rotations, it demands fast judgment and reaction from the actor. In addition, the variety of block shapes and orderings produces an extremely large number of possible cases, so a human player relying simply on memory and memorization is limited. The reinforcement learning architecture implemented in this study therefore not only implements a learning model but also incorporates heuristics as weights in the reward to improve decision accuracy; as a result, it generally achieved higher scores than a human playing the game directly. Although the domain cannot yet be called fully conquered, the system played better than an average person across repeated experiments. However, we also identified the drawback that the factor with the greatest impact on performance is the heuristics rather than the learning model. This paper describes the structure of the architecture and the techniques and algorithms used in detail and suggests a direction for further work.
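As an illustration of the heuristic-weighted reward idea described above, the sketch below combines common Tetris board heuristics (aggregate height, holes, bumpiness) with the number of cleared lines. The specific features and weights are assumptions, not the paper's heuristic.

```python
# Heuristic-weighted reward shaping for a Tetris agent (illustrative sketch).
import numpy as np


def board_features(board):
    """board: 2D array of 0/1 cells, row 0 at the top."""
    heights = np.array([(board.shape[0] - np.argmax(col)) if col.any() else 0
                        for col in board.T])
    holes = sum(int(np.sum(col[np.argmax(col):] == 0)) if col.any() else 0
                for col in board.T)                     # empty cells under a filled cell
    bumpiness = int(np.abs(np.diff(heights)).sum())     # column height differences
    return heights.sum(), holes, bumpiness


def shaped_reward(board, lines_cleared,
                  w=(0.76, -0.51, -0.36, -0.18)):       # assumed weights
    agg_height, holes, bumpiness = board_features(board)
    return (w[0] * lines_cleared + w[1] * agg_height
            + w[2] * holes + w[3] * bumpiness)
```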
A Nuclear Material Accountancy (NMA) system quantitatively evaluates whether nuclear material has been diverted. The material balance is evaluated based on nuclear material measurements in this system, and these processes rely on statistical techniques. Therefore, performance can be evaluated with modeling and simulation techniques from the development stage. In the performance evaluation, several diversion scenarios are established, nuclear material diversion is attempted in a virtual simulation environment according to these scenarios, and the detection probability is evaluated. One important task is therefore to derive vulnerable diversion scenarios in advance. However, in actual facilities it is not easy to derive weak scenarios manually, because numerous factors affect detection performance. In this study, reinforcement learning was applied to automatically derive vulnerable diversion scenarios from a virtual NMA system. Reinforcement learning trains an agent to take optimal actions in a virtual environment, and on this basis it is possible to develop an agent that attempts to divert nuclear material according to the optimal weak scenario in the NMA system. A rather simple NMA system model was considered to confirm the applicability of reinforcement learning in this study. The simple model performs 10 consecutive material balance evaluations per year, and its MUF uncertainty increases with the balance period. The expected vulnerable diversion scenario is one in which the amount of diverted nuclear material increases in proportion to the MUF uncertainty, and the total amount of diverted nuclear material was assumed to be 8 kg, which corresponds to one significant quantity of plutonium. A virtual NMA system model (environment) and a divertor (agent) attempting to divert nuclear material were modeled to apply reinforcement learning. The agent is designed to receive a negative reward if a diversion attempt is detected by the NMA system. Reinforcement learning automatically trains the agent to maximize its reward, and through this the weakest diversion scenario can be derived. As a result of the study, it was confirmed that the agent was trained to attempt diversion of nuclear material in a direction with a low detection probability in this system model. These results show that weak scenarios can be derived based on reinforcement learning. The technique considered in this study can suggest methods to derive and remedy weak diversion scenarios in an NMA system in advance. However, issues remain to be solved before this technology can be applied smoothly, and further research will be needed in the future.
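A minimal sketch of the environment-agent setup described above is given below: at each of the 10 balance periods the agent chooses how much material to divert, the MUF uncertainty grows with the period, and a detected attempt yields a negative reward. The uncertainty model, detection test, and reward values are simplified assumptions.

```python
# Simplified NMA environment for a diversion-scenario-searching agent.
import numpy as np

N_PERIODS = 10
TARGET_KG = 8.0                                # one significant quantity of Pu


class SimpleNMAEnv:
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.period, self.diverted = 0, 0.0
        return np.array([self.period, self.diverted])

    def step(self, divert_kg):
        # MUF uncertainty assumed to grow with the balance period
        sigma_muf = 0.5 + 0.1 * self.period
        muf = divert_kg + self.rng.normal(0.0, sigma_muf)
        detected = muf > 3.0 * sigma_muf       # simplified detection test
        self.diverted += divert_kg
        self.period += 1
        done = detected or self.period == N_PERIODS
        if detected:
            reward = -10.0                     # penalty for being detected
        elif done:
            reward = 5.0 if self.diverted >= TARGET_KG else 0.0
        else:
            reward = 0.0
        return np.array([self.period, self.diverted]), reward, done, {}
```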
In this paper, we present a learning platform for robotic grasping in the real world, in which actor-critic deep reinforcement learning is employed to directly learn the grasping skill from raw image pixels and rarely observed rewards. This is a challenging task because existing algorithms based on deep reinforcement learning require an extensive amount of training data or massive computational cost, so they are not affordable in real-world settings. To address these problems, the proposed learning platform consists of two training phases: a learning phase in a simulator and subsequent learning in the real world. The main processing blocks in the platform are extraction of a latent vector based on state representation learning and disentanglement of a raw image, generation of an adapted synthetic image using generative adversarial networks, and object detection and arm segmentation for the disentanglement. We demonstrate the effectiveness of this approach in a real environment.
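A minimal sketch of the overall pipeline is given below: an encoder maps a raw image to a latent state vector, and an actor-critic policy acts on that latent vector. The image size, latent size, and action dimension are assumptions, and the GAN-based image adaptation and segmentation stages are omitted.

```python
# Latent-vector extraction feeding an actor-critic policy (illustrative sketch).
import torch
import torch.nn as nn

LATENT_DIM, ACTION_DIM = 32, 4

encoder = nn.Sequential(                       # 64x64 RGB image -> latent vector
    nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 14 * 14, LATENT_DIM))

actor = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACTION_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))

image = torch.rand(1, 3, 64, 64)               # placeholder camera frame
z = encoder(image)                             # latent state representation
action, value = actor(z), critic(z)
```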
Robots are used in various industrial sites, but traditional methods of operating a robot are limited for some kinds of tasks. For a robot to accomplish a task, an accurate formulation relating the robot and its environment must be found and solved, which is complicated work. Accordingly, reinforcement learning for robots is actively studied to overcome these difficulties. This study describes the process and results of applying reinforcement learning to such a task. The task the robot learns is bottle flipping: throwing a plastic bottle so that it lands upright on its bottom. The complexity of the motion of the liquid in the bottle while it is in the air makes this task difficult to solve in traditional ways; the reinforcement learning process makes it easier. After a 3-DOF robotic arm is instructed how to throw the bottle, the robot finds better motions that succeed at the task. Two reward functions were designed, and their learning results were compared. The finite difference method is used to obtain the policy gradient. This paper focuses on the process of designing an efficient reward function to improve the bottle-flipping motion.
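As an illustration of the finite difference policy gradient mentioned above, the sketch below perturbs each motion parameter in turn and estimates the gradient from the reward differences. The parameterization and the toy reward function are assumptions standing in for real throws on the robot.

```python
# Finite-difference policy gradient (illustrative sketch).
import numpy as np

TARGET = np.array([1.0, 0.5, 0.2])             # toy optimum standing in for a real throw


def rollout_reward(theta):
    # Stand-in for executing one throw with motion parameters theta
    # (e.g. joint velocities and release time) and scoring the landing;
    # here a smooth toy function so the sketch runs end to end.
    return -float(np.sum((theta - TARGET) ** 2))


def finite_difference_gradient(theta, eps=1e-2):
    base = rollout_reward(theta)
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        perturbed = theta.copy()
        perturbed[i] += eps                     # perturb one parameter at a time
        grad[i] = (rollout_reward(perturbed) - base) / eps
    return grad


def train(theta, lr=0.05, iterations=50):
    for _ in range(iterations):
        theta = theta + lr * finite_difference_gradient(theta)  # gradient ascent
    return theta
```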
As the development of autonomous vehicles becomes realistic, many automobile manufacturers and component producers aim to develop 'completely autonomous driving'. ADAS (Advanced Driver Assistance Systems), which has recently been applied in automobiles, supports the driver in controlling lane keeping, speed, and direction in a single lane under a limited road environment. Although obstacle-avoidance technologies have been developed, they concentrate on simple obstacle avoidance and do not consider control of the actual vehicle in real situations, where sudden changes of the steering wheel and vehicle speed make drivers feel unsafe. In order to develop a 'completely autonomous driving' automobile that perceives the surrounding environment by itself and operates accordingly, the abilities of the vehicle should be enhanced in the way a human driver's are. In this sense, this paper establishes a strategy with which autonomous vehicles behave in a human-friendly way based on vehicle dynamics, through reinforcement learning based on Q-learning, a type of machine learning. The obstacle-avoidance reinforcement learning proceeded in 5 simulations. The reward rule was set so that the car can learn by itself from recurring events, giving the experiment an environment similar to the one in which humans drive. A driving simulator was used to verify the results of the reinforcement learning. The ultimate goal of this study is to enable autonomous vehicles to avoid obstacles in a human-friendly way when obstacles appear in their sight, using control methods previously learned in various conditions through reinforcement learning.
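A minimal tabular Q-learning sketch of the kind referenced above is given below. The state and action discretizations, learning rate, and exploration rate are illustrative assumptions, not the paper's settings.

```python
# Tabular Q-learning update for a discretized obstacle-avoidance task.
import numpy as np

N_STATES, N_ACTIONS = 100, 5                   # e.g. (lateral offset, obstacle distance) bins x steering actions
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))


def choose_action(state, rng):
    if rng.random() < EPS:                     # explore
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))            # exploit


def q_update(state, action, reward, next_state):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = reward + GAMMA * np.max(Q[next_state])
    Q[state, action] += ALPHA * (td_target - Q[state, action])
```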
Board games have a large number of pieces and a large state space, so learning the game takes a long time. Reinforcement learning, moreover, has the drawback that the learning speed is slow in the early stages of training. Therefore, when several actions share the same best value during training, we attempted to speed up learning by using a heuristic based on problem-domain knowledge in the form of an influence distribution map. A board game was built to compare a piece implemented with the existing method against a piece implemented with the improved method, and each was made to play against a one-sided attacking piece. The experimental results showed that the improved piece performed better in terms of learning speed.
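The abstract describes breaking ties among equal best values with a domain heuristic; the sketch below illustrates that idea, with a placeholder "influence" score per move standing in for the influence distribution map. The influence function is an illustrative assumption.

```python
# Heuristic tie-breaking among moves with equal learned values (sketch).
def influence_score(move, board):
    """Placeholder domain heuristic, e.g. how many squares the move's piece
    would influence; a real implementation would use the game's rules."""
    return len(board.get(move, []))


def select_move(q_values, board):
    """q_values: dict mapping candidate moves to their learned values."""
    best = max(q_values.values())
    best_moves = [m for m, q in q_values.items() if q == best]
    if len(best_moves) == 1:
        return best_moves[0]
    # tie: prefer the move with the largest influence instead of a random pick
    return max(best_moves, key=lambda m: influence_score(m, board))
```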
This paper describes a program which learns good strategies for two-person, deterministic, zero-sum board games of perfect information. The program learns by simply playing the game against either a human or computer opponent. The results of the program's learning over a large number of games are reported. The program consists of a search kernel and a move generator module. Only the move generator is modified to reflect the rules of the game to be played. The kernel uses a temporal difference procedure combined with a backpropagation neural network to learn good evaluation functions for the game being played. Central to the performance of the program is the search procedure. This is the capture tree search used in most successful janggi-playing programs. It is based on the idea of using search to correct errors in evaluations of positions. This procedure is described, analyzed, tested, and implemented in the game-learning program. Both the test results and the performance of the program confirm the results of the analysis, which indicate that search improves game-playing performance for sufficiently accurate evaluation functions.
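A minimal sketch of the temporal difference update of a neural-network evaluation function described above is given below: after each move, the network's value for the previous position is moved toward the value of the new position, or toward the final outcome at game end. The feature size and network shape are assumptions.

```python
# TD(0) update of a neural-network position evaluator (illustrative sketch).
import torch
import torch.nn as nn

N_FEATURES = 64                                # assumed board-feature vector size

value_net = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.Tanh(),
                          nn.Linear(64, 1), nn.Tanh())
opt = torch.optim.SGD(value_net.parameters(), lr=0.01)


def td_update(prev_features, next_features, outcome=None, gamma=1.0):
    """One TD(0) step; outcome is the final result (+1/-1) at game end."""
    if outcome is not None:
        target = torch.tensor([[float(outcome)]])
    else:
        with torch.no_grad():
            target = gamma * value_net(next_features)
    prediction = value_net(prev_features)
    loss = nn.functional.mse_loss(prediction, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```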