This study proposes Lite-MCC, a lightweight deep learning model for mobile devices that reconstructs three-dimensional (3D) spatial structure from a single RGB image. Conventional 3D reconstruction models require multi-view inputs or point cloud data and depend on large-scale computational resources, which limits their real-time use in practical environments. To address this limitation, Lite-MCC simplifies the existing Multiview Compressive Coding (MCC) architecture, enabling accurate 3D reconstruction from only a single image. The model runs a Vision Transformer (ViT-Tiny) and a Geometry Encoder in parallel to extract visual and spatial features simultaneously, and a Transformer Decoder then generates the corresponding 3D point cloud. In addition, depth map–based input transformation and ONNX-based optimization enable efficient real-time inference on edge devices. Experiments on the CO3D dataset show that Lite-MCC reduces computational cost by 87% and memory usage by 65% while maintaining a Chamfer Distance of 0.045, comparable to that of the original MCC model. These results suggest that the proposed method is a promising direction for lightweight models that support low-cost, real-time 3D recording and visualization.
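
For reference, the Chamfer Distance used as the accuracy metric above can be computed along the following lines. This is a minimal NumPy sketch under stated assumptions, not the evaluation code of this study: it uses the symmetric, mean-reduced variant with squared Euclidean distances, and the function name `chamfer_distance` is illustrative. The abstract does not specify which variant, normalization, or number of sampled points underlies the reported value of 0.045.

```python
import numpy as np


def chamfer_distance(pred: np.ndarray, target: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets of shape (N, 3) and (M, 3).

    Assumption: squared Euclidean distances, averaged per direction and summed.
    Other conventions (unsquared distances, averaging the two directions) exist.
    """
    # Pairwise differences via broadcasting, shape (N, M, 3).
    diff = pred[:, None, :] - target[None, :, :]
    # Pairwise squared distances, shape (N, M).
    sq_dist = np.sum(diff ** 2, axis=-1)
    # For each predicted point, the distance to its nearest target point.
    pred_to_target = sq_dist.min(axis=1).mean()
    # For each target point, the distance to its nearest predicted point.
    target_to_pred = sq_dist.min(axis=0).mean()
    return float(pred_to_target + target_to_pred)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.random((128, 3))
    # Identical point clouds have zero Chamfer Distance.
    print(chamfer_distance(points, points))
```

The dense (N, M) distance matrix keeps the sketch short but scales quadratically in memory, so a nearest-neighbor index would be preferable for large point clouds.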