OpenAI introduced the gpt-oss-safeguard model series (gpt-oss-safeguard-20B and -120B), open-weight reasoning models tailored for trust-and-safety classification tasks.
Developers supply their own policy text at runtime; the model reasons over the input against that policy, classifies conversation elements (user messages, completions, or full chats), and emits a chain-of-thought explaining how it reached its decision.
OpenAI positions the series as an alternative to rigid, fixed-taxonomy classifiers: policies can be revised iteratively without retraining the model. Noted limitations include higher compute cost and latency, and traditional classifiers may still be preferable where very high precision is required.
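The runtime-policy workflow can be sketched as follows. This is a minimal illustration, assuming an OpenAI-compatible chat endpoint (e.g. a local vLLM server) hosting the open weights; the helper name `build_messages`, the example policy text, and the commented-out server details are illustrative assumptions, not a documented interface, and the exact prompt format the model expects may differ.

```python
# Example policy a developer writes and supplies at runtime (assumption:
# the policy is passed as the system message; the real format may differ).
POLICY = """\
Classify the user message as VIOLATING or NON-VIOLATING.
A message is VIOLATING if it promotes unsolicited commercial spam.
Otherwise it is NON-VIOLATING. Explain your reasoning, then give the label.
"""

def build_messages(policy: str, user_message: str) -> list[dict]:
    """Package the runtime policy and the content to classify into an
    OpenAI-style chat payload: policy in the system role, the content
    to judge in the user role."""
    return [
        {"role": "system", "content": policy},
        {"role": "user", "content": user_message},
    ]

msgs = build_messages(POLICY, "Buy cheap watches now!!! Click here!!!")

# Sending the request to a locally hosted model (hypothetical server URL):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
# resp = client.chat.completions.create(
#     model="gpt-oss-safeguard-20b", messages=msgs)
# print(resp.choices[0].message.content)  # reasoning plus the label
```

Because the policy travels with each request rather than being baked into the weights, changing the classification criteria is just a matter of editing the policy string, which is the "iterate without retraining" property described above.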




