PURPOSES: This study aimed to use public and transportation big data to predict the number of future confirmed COVID-19 cases more accurately, and to suggest regional priorities for introducing major policies.
METHODS: Prediction analysis was performed using a long short-term memory (LSTM) model, which is known for high prediction accuracy on time-series data. Random forest (RF) classification analysis was used to derive regional priorities and the major influencing factors.
RESULTS: Using the daily number of confirmed COVID-19 cases from January 26 to December 12, 2020, the LSTM artificial neural network predicted the daily number of confirmed cases in Gyeonggi Province for December 24 and 25 under different social distancing stages, with an accuracy of approximately 95.8%. In addition, random forest classification identified the number of people, the social distancing stage, and mask wearing as the major factors influencing COVID-19, and Bucheon, Yongin, and Pyeongtaek were identified as regions expected to be at high risk in the future.
CONCLUSIONS: The results of this study can help forecast confirmed cases in pandemics such as COVID-19.
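The METHODS line names an LSTM but does not give its architecture. As a minimal sketch, assuming a one-layer LSTM with a scalar daily input, min-max scaling, and a fixed look-back window (the values `window=3` and `hidden=4`, the class `TinyLSTM`, and the daily counts are all hypothetical), the code below shows how a case series becomes (window, next-day) pairs and how the standard input, forget, and output gates carry state across a window. The weights are random and untrained; in practice a framework such as PyTorch or Keras would fit them by backpropagation through time.

```python
import math
import random


def min_max_scale(series):
    """Scale counts to [0, 1] so the gates do not saturate on large raw values."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0
    return [(x - lo) / span for x in series]


def make_windows(series, window):
    """Turn a daily series into (last `window` days, next day) pairs."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTM:
    """Hypothetical one-layer LSTM with scalar input and a linear read-out (untrained)."""

    def __init__(self, hidden, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # For each gate and each hidden unit: [w_x, w_h_1 .. w_h_H, bias].
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(hidden + 2)]
                   for _ in range(hidden)]
            for gate in ("i", "f", "o", "g")
        }
        self.v = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]  # read-out weights

    def _pre(self, gate, x, h):
        """Pre-activation w_x * x + w_h . h + b for every hidden unit of one gate."""
        return [row[0] * x + sum(w * hj for w, hj in zip(row[1:-1], h)) + row[-1]
                for row in self.W[gate]]

    def predict(self, window):
        """Run a window through the cell and return a next-step estimate (scaled units)."""
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state
        for x in window:
            # All four gates read the previous hidden state h.
            i = [sigmoid(z) for z in self._pre("i", x, h)]    # input gate
            f = [sigmoid(z) for z in self._pre("f", x, h)]    # forget gate
            o = [sigmoid(z) for z in self._pre("o", x, h)]    # output gate
            g = [math.tanh(z) for z in self._pre("g", x, h)]  # candidate cell values
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return sum(vk * hk for vk, hk in zip(self.v, h))


# Hypothetical daily confirmed-case counts, not data from the study.
daily = [0, 1, 3, 2, 5, 8, 6, 9, 12, 10]
pairs = make_windows(min_max_scale(daily), window=3)
model = TinyLSTM(hidden=4)
estimate = model.predict(pairs[-1][0])
```

Training would adjust `W` and `v` to minimise the error between `predict(window)` and each next-day target in `pairs`; the scaled estimate is then mapped back to case counts by inverting `min_max_scale`.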
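This study aims to compare the Geographically Weighted Regression (GWR) model and the Multi-scale Geographically Weighted Regression (MGWR) model using the number of confirmed COVID-19 cases and the factors that influence it. First, factors with a strong influence on the number of confirmed COVID-19 cases were selected through a review of previous studies. Next, Ordinary Least Squares (OLS), the Variance Inflation Factor (VIF), and Local Moran's I were used to explore the linear relationships between these factors and the number of confirmed COVID-19 cases, as well as their local spatial autocorrelation. In particular, the widely used GWR model was compared with the recently introduced MGWR model to determine the geographically weighted regression model best suited to this study. Because MGWR uses a bandwidth adjusted to the characteristics of each variable, it can explain the spatial relationships among variables more precisely. The results of this study are expected to support the infectious disease epidemiological investigation support system, which is needed for the prevention and prediction of infectious diseases.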
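As a minimal sketch of the VIF screening step, assuming the candidate predictors are given as equal-length columns (the names `x_pop` and `x_mask` and their values are hypothetical), each VIF is $1/(1-R_j^2)$, where $R_j^2$ comes from an OLS regression of predictor $j$ on all other predictors plus an intercept. The small Gaussian-elimination solver only keeps the example dependency-free; for real data, statsmodels provides `variance_inflation_factor`.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting.

    Assumes M is non-singular; exactly collinear predictors would divide by zero.
    """
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def ols_r2(y, rows):
    """R^2 of an OLS fit of y on the given predictor rows plus an intercept."""
    A = [[1.0] + list(row) for row in rows]  # design matrix with intercept column
    n, p = len(A), len(A[0])
    # Normal equations: (A^T A) b = A^T y
    AtA = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    Aty = [sum(A[r][i] * y[r] for r in range(n)) for i in range(p)]
    b = solve(AtA, Aty)
    fitted = [sum(bi * ai for bi, ai in zip(b, row)) for row in A]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2) for each predictor column (needs at least two columns)."""
    out = []
    for j, xj in enumerate(columns):
        others = [col for k, col in enumerate(columns) if k != j]
        rows = list(zip(*others))  # one row of the remaining predictors per region
        r2 = ols_r2(xj, rows)
        out.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return out


# Hypothetical, mutually uncorrelated predictors: both VIFs equal 1.
x_pop = [1, 2, 3, 4]
x_mask = [1, -1, -1, 1]
scores = vif([x_pop, x_mask])
```

A common rule of thumb treats VIF values above 10 as a sign of problematic multicollinearity; the threshold used in the study is not stated.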
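The local spatial autocorrelation step can be sketched with the usual local Moran statistic, $I_i = (z_i / m_2)\sum_j w_{ij} z_j$, where $z_i$ is the deviation of region $i$ from the mean and $m_2 = \sum_i z_i^2 / n$. This sketch assumes row-standardized contiguity weights and a hypothetical four-region layout; the study's actual weight matrix is not described, and significance testing (usually by conditional permutation) is omitted.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I for each region with row-standardized weights.

    values: one attribute per region (e.g. confirmed cases).
    neighbors: dict mapping region index -> list of neighbouring region indices
    (hypothetical contiguity; every region is assumed to have at least one neighbour).
    """
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]        # deviations from the mean
    m2 = sum(zi * zi for zi in z) / n     # second moment of the deviations
    result = []
    for i in range(n):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # spatial lag with w_ij = 1/|N(i)|
        result.append(z[i] / m2 * lag)
    return result


# Hypothetical counts for four regions arranged in a line: 0 - 1 - 2 - 3.
cases = [10, 8, 2, 1]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
local_i = local_morans_i(cases, adjacency)
```

Positive values mark a region that resembles its neighbours (high-high or low-low clusters), and negative values mark spatial outliers; here both ends of the line form clusters, so every score is positive.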
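The contrast between the two models (one shared bandwidth versus variable-specific bandwidths) can be written out using the formulation common in the GWR/MGWR literature. Here $(u_i, v_i)$ are the coordinates of region $i$, $d_{ij}$ is the distance between regions $i$ and $j$, and $x_{i0} = 1$ for the intercept; the Gaussian kernel is only one possible choice, since the abstract does not state which kernel or bandwidth-selection criterion was used.

```latex
% GWR: every coefficient varies over space, all sharing one bandwidth b
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i

\hat{\boldsymbol{\beta}}(u_i, v_i) = \left(X^{\top} W_i X\right)^{-1} X^{\top} W_i \mathbf{y},
\qquad W_i = \operatorname{diag}(w_{i1}, \dots, w_{in}),
\qquad w_{ij} = \exp\!\left(-\tfrac{1}{2}\left(d_{ij} / b\right)^{2}\right)

% MGWR: coefficient k has its own bandwidth bw_k
y_i = \sum_{k=0}^{p} \beta_{bw_k}(u_i, v_i)\, x_{ik} + \varepsilon_i
```

In MGWR the coefficients are typically estimated by backfitting, with each $bw_k$ selected separately (for example by AICc), so a variable whose effect is nearly uniform across regions can receive a large bandwidth while a strongly local one receives a small bandwidth.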