The optimal design of steel plate girders has traditionally relied on meta-heuristic techniques such as Genetic Algorithms (GA), which can handle discrete design variables and complex non-linear constraints, including shear buckling and section classification. However, these methods are computationally expensive because the optimization must be rerun for every new load condition. To address this limitation, this study proposes a sequential Multi-Agent Reinforcement Learning (MARL) framework based on the Agent-Environment Cycle (AEC) architecture. Unlike parallel one-shot approaches, the proposed model determines the design variables sequentially, so that each decision is conditioned on those already made and the dependencies between variables can be learned. Furthermore, to improve cost efficiency during the inference phase, we introduce an Adaptive Inference Chain combined with a deterministic Shrink-Refine algorithm driven by the Demand-Capacity Ratio (DCR). Experimental results on 100 diverse load cases show that the proposed method achieves an average cost reduction of 8.2% relative to the GA baseline while maintaining 100% feasibility. With an inference time of approximately 76 ms, the model shows significant potential for real-time automated design. Additionally, an in-depth analysis of the cases in which the DCR fell short of its target clarifies the exploration limits within the discrete design space and supports the robustness of the algorithm.
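The abstract does not spell out the DCR-based Shrink-Refine procedure, so the following Python sketch is only one plausible reading of it, not the authors' implementation. It assumes a coarse "shrink" pass (two catalog steps at a time) followed by a fine "refine" pass (one step at a time), each accepting only moves that keep the DCR at or below the target. The catalog values, the yield strength `FY`, the simplified bending and shear checks in `dcr`, and the area-based `cost` are all hypothetical stand-ins for the design-code checks and cost model used in the study.

```python
FY = 355.0  # MPa; assumed yield strength, not taken from the paper

# Hypothetical discrete catalogs in mm (web height, web thickness,
# flange width, flange thickness); a design is a dict of catalog indices.
CATALOG = {
    "h": [800, 1000, 1200, 1400, 1600],
    "tw": [8, 10, 12, 14, 16],
    "bf": [200, 250, 300, 350, 400],
    "tf": [12, 16, 20, 25, 30],
}


def dims(idx):
    """Map catalog indices to physical dimensions."""
    return {name: CATALOG[name][i] for name, i in idx.items()}


def cost(idx):
    """Toy cost: cross-sectional area in mm^2 (two flanges plus web)."""
    d = dims(idx)
    return 2 * d["bf"] * d["tf"] + d["h"] * d["tw"]


def dcr(idx, moment, shear):
    """Toy demand-capacity ratio: the larger of a bending and a shear ratio.

    moment is in N*mm and shear in N. Real checks (shear buckling,
    section classification) are deliberately omitted here.
    """
    d = dims(idx)
    section_modulus = d["bf"] * d["tf"] * d["h"] + d["tw"] * d["h"] ** 2 / 6.0
    bending = moment / (FY * section_modulus)
    shear_ratio = shear / (0.6 * FY * d["h"] * d["tw"])
    return max(bending, shear_ratio)


def shrink_refine(idx, moment, shear, target=1.0):
    """Greedily reduce catalog indices while DCR stays <= target.

    Pass 1 ("shrink") moves two catalog steps at a time; pass 2
    ("refine") moves one step at a time. Ties are broken by variable
    name, so the result is deterministic.
    """
    idx = dict(idx)  # do not mutate the caller's design
    for step in (2, 1):
        while True:
            candidates = []
            for name in sorted(idx):
                if idx[name] - step < 0:
                    continue
                trial = {**idx, name: idx[name] - step}
                if dcr(trial, moment, shear) <= target:
                    candidates.append((cost(trial), name, trial))
            if not candidates:
                break
            # Accept the cheapest feasible move and repeat.
            idx = min(candidates, key=lambda c: (c[0], c[1]))[2]
    return idx
```

Starting from a feasible design, for example the largest entry of every catalog, `shrink_refine(start, moment, shear)` returns a design from which no single one-step reduction remains feasible under these toy checks. Because each move is restricted to the catalog grid, the final DCR can stop short of the target, which is the kind of discrete-space limit the abstract refers to.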