With the recent surge in YouTube usage, user-generated videos in which individuals evaluate cosmetics have proliferated. Consequently, many companies are increasingly utilizing such evaluation videos for product marketing and market research. However, manually classifying these product review videos incurs significant cost and time. This paper therefore proposes a deep-learning-based cosmetics search algorithm to automate the task. The algorithm consists of two networks: one that detects candidates in images using shape features such as circles and rectangles, and another that filters and categorizes these candidates. A two-stage architecture was chosen over a one-stage one because, in videos containing background scenes, it is more robust to first detect cosmetic candidates and then classify them as specific objects. Although two-stage structures are generally known to outperform one-stage structures in terms of model architecture, this study opts for the two-stage design chiefly to address the difficulty of acquiring training and validation data that arises with a one-stage approach: data for the shape-based candidate detector and for the candidate classifier can be acquired cheaply, which keeps the overall algorithm robust.
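Below is a minimal sketch of the two-stage idea described above, assuming OpenCV is available and that `classify_crop` is a hypothetical stage-two classifier mapping an image crop to a cosmetic category or "background"; the paper's actual networks are not specified at this level of detail.

```python
import cv2
import numpy as np

def detect_candidates(frame_bgr):
    """Stage 1: propose candidate regions from shape cues (circles here)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2,
                               minDist=40, param1=100, param2=40,
                               minRadius=15, maxRadius=120)
    boxes = []
    if circles is not None:
        for x, y, r in np.round(circles[0]).astype(int):
            boxes.append((x - r, y - r, 2 * r, 2 * r))   # (x, y, w, h)
    return boxes

def search_cosmetics(frame_bgr, classify_crop):
    """Stage 2: filter and categorize the candidates."""
    results = []
    for (x, y, w, h) in detect_candidates(frame_bgr):
        crop = frame_bgr[max(y, 0):y + h, max(x, 0):x + w]
        if crop.size == 0:
            continue
        label, score = classify_crop(crop)   # hypothetical small CNN
        if label != "background":
            results.append((label, score, (x, y, w, h)))
    return results
```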
This study presents a method for improving the performance of deep-learning-based object recognition models on generated imagery such as game footage. In particular, we verify that additionally training an object recognition model, originally trained on real images, with game images improves its recognition performance. Among deep-learning-based object recognition models, we use YOLOv2, the most widely adopted. We retrain this model with 160 game frames sampled from eight different games, and by measuring IoU and accuracy we demonstrate that the proposed training with game images is effective.
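As a concrete reference for the evaluation metric mentioned above, the following is a minimal IoU computation over axis-aligned boxes; the (x1, y1, x2, y2) box format is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) pixel boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1) +
             (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.142...
```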
In this paper, we present an auto-annotation tool and a synthetic dataset built from 3D CAD models for deep-learning-based object detection. To serve as training data for deep learning methods, class, segmentation, bounding-box, contour, and pose annotations of each object are needed. We propose automated annotation together with synthetic image generation. The resulting synthetic dataset reflects occlusion between objects and is applicable to both underwater and in-air environments. To verify the synthetic dataset, we use Mask R-CNN, a state-of-the-art deep-learning-based object detection model. For the experiments, we build an environment that reflects an actual underwater setting. We show that an object detection model trained on our dataset produces accurate and robust results in the underwater environment. Lastly, we verify that our synthetic dataset is suitable for deep learning models in underwater environments.
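The following is a hedged sketch of how bounding-box and contour annotations can be derived automatically from a rendered binary mask, assuming NumPy and OpenCV; the paper's actual tool may differ.

```python
import cv2
import numpy as np

def annotate_from_mask(mask, class_id):
    """mask: HxW uint8 array, nonzero where the rendered object is visible."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None                                  # object fully occluded
    x1, y1, x2, y2 = xs.min(), ys.min(), xs.max(), ys.max()
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return {"class": class_id,
            "bbox": [int(x1), int(y1), int(x2 - x1), int(y2 - y1)],
            "contour": [c.reshape(-1, 2).tolist() for c in contours]}
```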
A robot usually adopts ANN (artificial neural network)-based object detection and instance segmentation algorithms to recognize objects, but creating datasets for these algorithms requires high labeling costs because the datasets must be labeled manually. To lower the labeling cost, a new scheme is proposed that automatically generates training images and labels them for specific objects. The scheme uses an instance segmentation algorithm trained to produce masks of unknown objects, so that the masks can be obtained in a simple environment. RGB images of the objects are extracted using these masks, and only the object classes need to be labeled through human supervision. The object images are then synthesized with various background images to create new images, which are labeled automatically using the masks and the previously entered object classes. In addition, human intervention is further reduced by using a robot arm to collect the object images. Experiments show that instance segmentation trained with the proposed method performs on par with training on a real dataset, while the time required to generate the dataset is significantly reduced.
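A minimal sketch of the synthesis step described above, assuming NumPy: a masked object crop is pasted onto a background, and the instance label comes for free from the mask and the previously entered class.

```python
import numpy as np

def composite(background, obj_rgb, obj_mask, top_left, class_id):
    """obj_mask: HxW bool array; top_left: (row, col) paste position.
    Assumes the pasted region fits inside the background."""
    out = background.copy()
    h, w = obj_mask.shape
    r, c = top_left
    roi = out[r:r + h, c:c + w]
    roi[obj_mask] = obj_rgb[obj_mask]      # view into `out`: paste object pixels
    full_mask = np.zeros(out.shape[:2], dtype=bool)
    full_mask[r:r + h, c:c + w] = obj_mask
    label = {"class": class_id, "mask": full_mask}   # automatic instance label
    return out, label
```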
Recently, smart factories have attracted much attention as a result of the 4th Industrial Revolution. Existing factory automation technologies are generally designed for simple repetition without vision sensors, and even small object assemblies still depend on manual work. To replace the existing system with new technologies such as bin picking and visual servoing, precision and real-time operation are essential. Our work therefore focuses on these core elements, using a deep learning algorithm to detect and classify the target object in real time and to analyze its features. Although there are many capable deep learning algorithms such as Mask R-CNN and Fast R-CNN, we chose the YOLO CNN because it runs in real time and combines the two tasks mentioned above. Then, from the line and interior features extracted from the target object, we obtain its final outline and estimate its posture.
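A small sketch of the post-detection feature step, assuming OpenCV: line features are extracted inside a YOLO bounding box with the probabilistic Hough transform. Parameter values here are illustrative assumptions.

```python
import cv2
import numpy as np

def outline_lines(frame_bgr, box):
    """box: (x, y, w, h) detection in pixel units."""
    x, y, w, h = box
    roi = frame_bgr[y:y + h, x:x + w]
    edges = cv2.Canny(cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY), 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                            minLineLength=int(0.3 * min(w, h)), maxLineGap=5)
    if lines is None:
        return []
    # shift line endpoints back to full-image coordinates
    return [(x1 + x, y1 + y, x2 + x, y2 + y) for x1, y1, x2, y2 in lines[:, 0]]
```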
This paper presents a method for improving the pose recognition accuracy of objects using a Kinect sensor. First, for efficient object recognition, we modify internal parameters of the SURF algorithm, one of the most widely used local feature point algorithms: the proposed method adjusts the spacing of the box filters, modifies the Hessian matrix, and eliminates improper keypoints. Second, the object orientation is estimated based on a homography. Finally, a novel auto-scaling method is proposed to further improve the accuracy of object pose estimation. The proposed algorithm is experimentally tested with objects in the plane, and its effectiveness is validated.
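A sketch of the homography-based orientation step, with ORB standing in for SURF so the example runs on stock OpenCV (SURF lives in opencv-contrib); the internal SURF modifications described above are not reproduced here.

```python
import cv2
import numpy as np

def estimate_orientation(model_img, scene_img):
    """Match local features and recover the in-plane rotation from H."""
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(model_img, None)
    k2, d2 = orb.detectAndCompute(scene_img, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:100]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    # approximate in-plane rotation, assuming a near-similarity homography
    angle = np.degrees(np.arctan2(H[1, 0], H[0, 0]))
    return H, angle
```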
This paper proposes an underwater localization algorithm based on probabilistic object recognition. It consists of two parts: 1) recognizing artificial objects using imaging sonar, and 2) localizing the recognized objects and the vehicle using EKF (Extended Kalman Filter)-based SLAM. For this purpose, we develop artificial landmarks that can be recognized even in unstable, noise-corrupted sonar images, and we propose a probabilistic recognition framework. In this way, the distance and bearing of the recognized artificial landmarks are acquired to localize the underwater vehicle. Using the recognized objects, EKF-based SLAM yields the path of the underwater vehicle and the locations of the landmarks. The proposed localization algorithm is verified by experiments in a basin.
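For reference, a compact sketch of the EKF measurement update for one recognized landmark, assuming a planar vehicle state [px, py, yaw] and a range-bearing sonar measurement; the paper's full SLAM formulation also estimates the landmark positions.

```python
import numpy as np

def ekf_update(x, P, z, landmark, R):
    """z = [range, bearing]; landmark = [lx, ly]; R = 2x2 measurement noise."""
    dx, dy = landmark[0] - x[0], landmark[1] - x[1]
    q = dx * dx + dy * dy
    z_hat = np.array([np.sqrt(q), np.arctan2(dy, dx) - x[2]])
    # Jacobian of the range-bearing model w.r.t. [px, py, yaw]
    H = np.array([[-dx / np.sqrt(q), -dy / np.sqrt(q), 0.0],
                  [dy / q,           -dx / q,         -1.0]])
    y = z - z_hat
    y[1] = (y[1] + np.pi) % (2 * np.pi) - np.pi   # wrap bearing residual
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ y, (np.eye(3) - K @ H) @ P
```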
In recent years, 3D mapping of urban environments by mobile robots equipped with multiple sensors has been studied actively. However, a map generated by simply integrating multiple sensor data gives robots only spatial information. To obtain semantic knowledge that helps an autonomous mobile robot, the robot has to convert low-level map representations into higher-level ones containing semantic knowledge of a scene. Given a 3D point cloud of an urban scene, this research proposes a method for recognizing objects effectively using a 3D graph model for autonomous mobile robots. The proposed method is decomposed into three steps: sequential range data acquisition, normal vector estimation, and incremental graph-based segmentation. The method guarantees both real-time performance and accuracy in recognizing objects in real urban environments, and it provides plentiful data for classifying the objects. To evaluate the performance of the proposed method, computation time and object recognition rate are analyzed. Experimental results show that the proposed method is effective in understanding the semantic knowledge of an urban environment.
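A minimal sketch of the normal vector estimation step, assuming NumPy and SciPy: the normal at each point is the smallest principal axis of its k-nearest-neighbor covariance.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=15):
    """points: Nx3 array; returns Nx3 unit normals."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    normals = np.empty_like(points)
    for i, nbrs in enumerate(idx):
        nbhd = points[nbrs] - points[nbrs].mean(axis=0)
        # smallest principal axis of the neighborhood = surface normal
        _, _, vt = np.linalg.svd(nbhd, full_matrices=False)
        normals[i] = vt[-1]
    return normals
```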
Cameras suffer from poor visibility in underwater environments because of the limited light and the medium's noise. However, their usefulness at close range has been proved in many studies, especially for navigation. In this paper, we therefore study vision-based object detection and tracking techniques using artificial objects for underwater robots. We employ template matching and mean shift algorithms for object detection and tracking. We also propose an adaptive-threshold-based weighted correlation coefficient and a color-region-aided approach to enhance object detection performance under various illumination conditions. The color information is incorporated into the template-matched area, and the features of the template are used to compute correlation coefficients robustly. The objects are recognized with a multi-template matching approach. Finally, water basin experiments demonstrate the performance of the proposed techniques on the underwater robot platform yShark, built by KORDI.
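A hedged sketch of the multi-template matching step using OpenCV's normalized correlation coefficient; the adaptive thresholding and color-region weighting of the full method are omitted.

```python
import cv2

def match_templates(scene_gray, templates, threshold=0.7):
    """templates: dict of name -> grayscale template image (smaller than scene)."""
    best = None
    for name, tmpl in templates.items():
        res = cv2.matchTemplate(scene_gray, tmpl, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(res)
        if max_val >= threshold and (best is None or max_val > best[1]):
            best = (name, max_val, max_loc)   # (label, score, top-left corner)
    return best
```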
This paper presents a new shape-based algorithm using an affine category shape model for object category recognition and model learning. The affine category shape model is a graph of interconnected nodes whose geometric interactions are modeled with pairwise potentials. In the learning phase, it can efficiently handle large pose variations of objects in training images by estimating the 2-D homography transformation between the model and the training images. Since the pairwise potentials depend only on the relative geometric relationships between features, the proposed matching algorithm is translation and in-plane rotation invariant and robust to affine transformation. We apply a spectral matching algorithm to find feature correspondences, which are then used as initial correspondences for a RANSAC algorithm. RANSAC efficiently estimates the 2-D homography transformation and the inlier correspondences consistent with it, and new correspondences can be detected using the estimated transformation. Experimental results on an object category database show that the proposed algorithm is robust to pose variations of objects and provides good recognition performance.
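A minimal sketch of the RANSAC verification stage, assuming OpenCV: putative correspondences (e.g., from spectral matching) are filtered by estimating a 2-D homography and keeping the consistent inliers.

```python
import cv2
import numpy as np

def verify_with_ransac(model_pts, image_pts, reproj_thresh=3.0):
    """model_pts, image_pts: Nx2 arrays of matched feature locations (N >= 4)."""
    src = np.float32(model_pts).reshape(-1, 1, 2)
    dst = np.float32(image_pts).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
    inliers = mask.ravel().astype(bool) if mask is not None else None
    return H, inliers
```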
Stereo matching is one of the most actively studied problems in stereo vision. This paper proposes stereo matching using a genetic algorithm for object position recognition. The matching environment is treated as an optimization problem, and an evolutionary strategy is used to search for the optimal solution. Accordingly, the genetic operators are designed for stereo matching, and each individual represents a disparity population; a horizontal pixel line of the image is regarded as a chromosome. The cost function is a combination of the typical constraints used in stereo matching. Because it consists of intensity, similarity, and disparity smoothness terms, all of these factors are handled at once in every generation during matching. To define the chromosomes, edges were extracted with a LoG operator, and the proposed method was validated through experiments.
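A sketch of a cost (fitness) function of the kind described above, where an individual is the disparity vector of one scanline; the weights and the exact term definitions are illustrative assumptions, not the paper's.

```python
import numpy as np

def fitness(disparity_row, left_row, right_row, w_int=1.0, w_smooth=0.5):
    """disparity_row: integer disparities for one scanline of the left image."""
    cols = np.arange(len(disparity_row))
    matched = np.clip(cols - disparity_row, 0, len(right_row) - 1)
    # intensity agreement between matched pixels (cast to avoid uint8 wraparound)
    intensity_cost = np.abs(left_row[cols].astype(int) -
                            right_row[matched].astype(int)).sum()
    # disparity smoothness along the scanline
    smoothness_cost = np.abs(np.diff(disparity_row)).sum()
    return -(w_int * intensity_cost + w_smooth * smoothness_cost)  # maximize
```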
This paper describes a method for recognizing moving objects from a mobile robot with an omnidirectional camera. A moving object is detected using the specific pattern of optical flow in the omnidirectional image. The paper consists of two parts. In the first part, the pattern of optical flow in the omnidirectional image is investigated theoretically and experimentally; the optical flow is influenced by the geometric characteristics of the omnidirectional camera. In the second part, the detection of moving objects from the estimated optical flow is presented. A moving object is extracted through a relative evaluation of optical flow vectors derived from the flow pattern. In particular, Focus-Of-Expansion (FOE) and Focus-Of-Contraction (FOC) vectors are defined from the estimated optical flow and used as reference vectors for the relative evaluation. The proposed algorithm is evaluated for four motions of a mobile robot: straight forward, left turn, right turn, and rotation. Experimental results on real video show the effectiveness of the proposed method.
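A minimal sketch of estimating dense optical flow and the FOE, assuming OpenCV and NumPy: under pure translation each flow vector points away from the FOE, so the FOE can be recovered as the least-squares intersection of the flow lines. The omnidirectional geometry handled in the paper is not modeled here.

```python
import cv2
import numpy as np

def estimate_foe(prev_gray, cur_gray, min_mag=1.0):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    u, v = flow[..., 0].ravel(), flow[..., 1].ravel()
    x, y = xs.ravel().astype(float), ys.ravel().astype(float)
    keep = np.hypot(u, v) > min_mag           # ignore near-zero flow vectors
    # each vector gives one line constraint: v*fx - u*fy = v*x - u*y
    A = np.stack([v[keep], -u[keep]], axis=1)
    b = v[keep] * x[keep] - u[keep] * y[keep]
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe                                 # (fx, fy) in pixels
```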
Robots need to understand as much as possible about their environmental situation and react appropriately to any event that provokes changes in their behavior. In this paper, we pay attention to topological relations between spatial objects and propose a model of robotic cognition that represents and infers temporal relations. Specifically, the proposed model extracts specified features of the co-occurrence matrix computed from disparity images of the stereo vision system. More importantly, a habituation model is used to infer intrinsic spatial relations between objects. A preliminary experimental investigation verifies the validity of the proposed method under real test conditions.
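A small sketch of co-occurrence-matrix features on a quantized disparity image, using scikit-image's GLCM utilities (function names per scikit-image >= 0.19); which features the paper actually extracts is not specified here.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def cooccurrence_features(disparity, levels=32):
    """disparity: HxW array with positive values; returns a texture feature vector."""
    q = np.clip(disparity / disparity.max() * (levels - 1),
                0, levels - 1).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    return np.array([graycoprops(glcm, p).mean()
                     for p in ("contrast", "homogeneity",
                               "energy", "correlation")])
```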
Based on object recognition technology, we present a new global localization method for robot navigation. To this end, we model an indoor environment with the following visual cues from a stereo camera: view-based image features for object recognition, and their 3D positions for object pose estimation. We also use the depth information at the horizontal centerline of the image, through which the optical axis passes; this is similar to the data of a 2D laser range finder. Therefore, we can build a hybrid local node for a topological map composed of an indoor metric map and an object location map. Based on this modeling, we suggest a coarse-to-fine strategy for estimating the global pose of a mobile robot: the coarse pose is obtained by object recognition and SVD-based least-squares fitting, and the refined pose is then estimated with a particle filtering algorithm. Real experiments show that the proposed method is an effective vision-based global localization algorithm.
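A minimal sketch of the coarse pose step, SVD-based least-squares (Kabsch) fitting of recognized objects' model points to their observed 3D positions, assuming NumPy.

```python
import numpy as np

def fit_rigid_transform(model_pts, observed_pts):
    """Both inputs Nx3; returns R (3x3), t (3,) with observed ~ R @ model + t."""
    mu_m, mu_o = model_pts.mean(axis=0), observed_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (observed_pts - mu_o)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T            # enforce a proper rotation (det = +1)
    t = mu_o - R @ mu_m
    return R, t
```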
In this paper, we present a practical place and object recognition method for guiding visitors in building environments. Recognizing places or objects in the real world can be difficult due to motion blur and camera noise. We present a modeling method based on the bidirectional interaction between places and objects for simultaneous reinforcement, enabling robust recognition. The unification of visual context, including scene context, object context, and temporal context, is also presented. The proposed system has been tested for guiding visitors in a large-scale building environment (10 topological places, 80 3D objects).
In this paper, we introduce visual contexts, in terms of their types and utilization methods, for robust object recognition by intelligent mobile robots. One of the core technologies for intelligent robots is visual object recognition, and robust techniques are strongly required because there are many sources of visual variation, such as geometric and photometric changes and noise. To meet these requirements, we define spatial context, hierarchical context, and temporal context; the appropriate visual contexts can be selected according to the object recognition domain. We also propose a unified framework that can utilize all of these contexts and validate it in a real working environment. Finally, we discuss future research directions for object recognition technologies for intelligent robots.