The written examination for driver’s license certification plays a critical role in promoting road safety by assessing applicants’ understanding of traffic laws and safe driving practices. However, concerns have emerged regarding structural biases in multiple-choice question (MCQ) formats, such as disproportionate answer placement and leading linguistic cues, which may allow test-takers to guess the correct answers without substantive legal knowledge. To address these issues, this paper proposes a prompt-driven evaluation framework that integrates structural item analysis with response simulations using a large language model (LLM). First, we conducted a quantitative analysis of 1,000 items to assess formal biases in answer positions and option lengths. Subsequently, GPT-based simulations were performed under four distinct prompt conditions: (1) safety-oriented reasoning without access to legal knowledge, (2) safety-oriented reasoning with random choices for knowledge-based questions, (3) performance-oriented reasoning using all available knowledge, and (4) a random-guessing baseline simulating non-inferential choice behavior. The results revealed notable variations in item difficulty and prompt sensitivity, particularly in cases where safety-related keywords influenced answer selection irrespective of legal accuracy. The proposed framework enables pretest diagnosis of potential biases in MCQ design and provides a practical tool for enhancing the fairness and validity of traffic law assessments. By improving the quality control of item banks, this approach contributes to the development of more reliable knowledge-based testing systems that better support public road safety.
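As a rough illustration of the simulation protocol summarized above, the sketch below shows how the four prompt conditions might be driven against an item bank through the OpenAI chat API. The prompt wording, model name, and item schema are placeholder assumptions for illustration only, not the paper's actual materials.

```python
# Hedged sketch of the four-condition LLM response simulation described in the abstract.
# The prompt texts, model name, and item format are illustrative assumptions.
import random
from openai import OpenAI  # assumes the official openai>=1.0 Python client

client = OpenAI()          # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"      # placeholder model name

# Conditions (1)-(3); wording is paraphrased from the abstract, not verbatim prompts.
CONDITIONS = {
    "safety_no_knowledge": (
        "Choose the option that best promotes road safety. "
        "Do not rely on any knowledge of specific traffic laws."
    ),
    "safety_random_on_knowledge": (
        "Choose the safest-sounding option. If the question clearly requires "
        "legal knowledge (e.g., exact penalties or numeric limits), reply RANDOM."
    ),
    "performance_full_knowledge": (
        "Answer as accurately as possible using all of your knowledge of traffic law."
    ),
}

def ask(item, system_prompt):
    """Query the model for one MCQ item and return the chosen option index."""
    options = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(item["options"]))
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"{item['question']}\n{options}\n"
                                        "Reply with the option number only, or RANDOM."},
        ],
    )
    text = resp.choices[0].message.content.strip()
    if not text or text.upper().startswith("RANDOM") or not text[0].isdigit():
        return random.randrange(len(item["options"]))  # fall back to a random pick
    return int(text[0]) - 1

def simulate(items):
    """Return per-condition accuracy, plus the pure random-guessing baseline (4)."""
    results = {}
    for name, prompt in CONDITIONS.items():
        correct = sum(ask(it, prompt) == it["answer"] for it in items)
        results[name] = correct / len(items)
    # Condition (4): non-inferential random guessing, no model call at all.
    correct = sum(random.randrange(len(it["options"])) == it["answer"] for it in items)
    results["random_baseline"] = correct / len(items)
    return results

if __name__ == "__main__":
    demo_items = [
        {"question": "What should a driver do when approaching a crosswalk "
                     "where pedestrians are waiting?",
         "options": ["Speed up", "Honk and pass", "Slow down and yield", "Ignore them"],
         "answer": 2},
    ]
    print(simulate(demo_items))
```

Comparing per-condition accuracies produced by a harness like this against the random baseline is one way to flag items whose correct answer is recoverable from safety-related surface cues alone.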