This paper investigates the cognitive processes involved in English word recognition among young EFL learners using eye-tracking methodology. A quasi-experimental mixed-methods design was used to examine how young L2 learners engage with basic words presented with or without pictorial cues. Seventeen 6th-grade pupils from two schools participated in the experiment. The participants were presented with a list of 20 words and asked to read them aloud while their eye movements were tracked to identify their viewing patterns. Immediately after the reading task, stimulated-recall interviews were conducted to triangulate and validate the participants’ viewing behaviors. Results indicate that participants focused significantly more on the text than on the accompanying pictures, yet performed better in recognizing and reading the words presented in a picture-based mode. Some participants reported that they did not view the pictures because the words were easy to read. In contrast, others struggled to read certain words due to an over-reliance on their background knowledge, which sometimes led to misinterpretation. These results emphasize the importance of integrating visual cues with word recognition instruction in early language learning contexts and highlight when and how these cues should be used effectively.