DeepSeek challenged this assumption by skipping SFT entirely and relying instead on reinforcement learning (RL ... open projects produced by Meta, such as the Llama model, and ML library ...