Efficient and safe maritime navigation in complex and congested coastal regions requires advanced route optimization methods that overcome the limitations of traditional shortest-path algorithms. This study applies Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) reinforcement learning (RL) algorithms to generate and refine optimal ship routes in East Asian waters, focusing on passages from Shanghai to Busan and from Ulsan to Daesan. Operating within a grid-based representation of the marine environment and subject to constraints such as restricted areas and Traffic Separation Schemes (TSS), both DQN and PPO learn policies that prioritize safety and operational efficiency. Comparative analyses with actual vessel routes demonstrate that the RL-based methods yield shorter and safer paths. Among these methods, PPO outperforms DQN, producing more stable and coherent routes. Post-processing with the Douglas-Peucker (DP) algorithm further simplifies the paths for practical navigational use. The findings underscore the potential of RL for enhancing navigational safety, reducing travel distance, and advancing autonomous ship navigation technologies.
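
The Douglas-Peucker post-processing step mentioned above can be sketched as follows. This is a minimal standalone implementation, not the paper's code; the tolerance value, the 2D coordinate convention, and the example waypoints are assumptions for illustration. The algorithm recursively keeps only waypoints that deviate from the straight chord between segment endpoints by more than a tolerance `epsilon`:

```python
import math

def perpendicular_distance(pt, start, end):
    """Distance from pt to the infinite line through start and end."""
    (x, y), (x1, y1), (x2, y2) = pt, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the segment joining the endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:
        # Recurse on both halves and merge, dropping the duplicated split point.
        left = douglas_peucker(points[:index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right
    # All interior points are within tolerance: keep only the endpoints.
    return [points[0], points[-1]]

# Hypothetical noisy grid route (not from the paper's data).
route = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(route, epsilon=1.0)
# → [(0, 0), (2, -0.1), (3, 5), (7, 9)]
```

In a routing context, `epsilon` trades off fidelity against waypoint count: a larger tolerance yields fewer course changes for the bridge team, while a smaller one stays closer to the learned path.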