This paper proposes an algorithm for the Unrelated Parallel Machine Scheduling Problem (UPMSP) without setup times, with the objective of minimizing total tardiness. Because the UPMSP is NP-hard, optimal solutions are difficult to obtain, and practical instances are therefore typically solved by relying on operators' experience or on simple heuristics. The proposed algorithm combines two components: a Transformer-based policy network that computes the correlation between individual jobs and machines, and a reinforcement learning procedure, based on REINFORCE with Baseline, that trains the network. The algorithm was evaluated on randomly generated problem instances, and its results were compared with those obtained using CPLEX and with three scheduling algorithms. The test results show that the proposed algorithm outperforms the comparison algorithms.
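
To make the objective concrete, the following minimal sketch evaluates the total tardiness of a given schedule on unrelated machines. It is an illustration only, not the proposed algorithm. The function name `total_tardiness`, the data layout, and the sample numbers are hypothetical. It also assumes that all jobs are available at time zero and that each machine processes its jobs back to back without idle time.

```python
from typing import Sequence


def total_tardiness(
    proc_time: Sequence[Sequence[float]],
    due: Sequence[float],
    machine_sequences: Sequence[Sequence[int]],
) -> float:
    """Sum of max(0, completion time - due date) over all scheduled jobs.

    proc_time[m][j] is the processing time of job j on machine m
    (unrelated machines: the time depends on both the job and the machine).
    machine_sequences[m] lists the jobs assigned to machine m, in order.
    Assumption: no setup times, all jobs released at time 0, no idle time.
    """
    total = 0.0
    for m, sequence in enumerate(machine_sequences):
        clock = 0.0  # completion time of the last job on machine m
        for j in sequence:
            clock += proc_time[m][j]
            total += max(0.0, clock - due[j])
    return total


# Hypothetical instance: 2 machines, 3 jobs.
proc_time = [[2.0, 4.0, 3.0], [3.0, 1.0, 5.0]]
due = [2.0, 3.0, 4.0]

# Jobs 0 and 2 on machine 0 (in that order), job 1 on machine 1.
example_total = total_tardiness(proc_time, due, [[0, 2], [1]])

# Same assignment, but jobs 2 and 0 swapped on machine 0.
swapped_total = total_tardiness(proc_time, due, [[2, 0], [1]])
```

In this hypothetical instance, swapping the order of two jobs on one machine changes the objective value. A scheduling policy has to decide both which machine each job goes to and the order of jobs on each machine.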