PANews|Oct 29, 2025 12:38
[OpenAI Launches Open-Source Safety Reasoning Model gpt-oss-safeguard, Supporting Policy-Driven Classification]
OpenAI today released the open-source safety reasoning model gpt-oss-safeguard (120b, 20b), allowing developers to provide custom policies for content classification during inference. The model outputs conclusions along with reasoning chains. This model is fine-tuned based on the open-weight gpt-oss, licensed under Apache 2.0, and is available for download on Hugging Face. Internal evaluations show that it outperforms gpt-5-thinking and gpt-oss in multi-policy accuracy, with external dataset performance close to Safety Reasoner. Limitations include: traditional classifiers still perform better in scenarios with a large amount of high-quality annotations, and inference requires significant time and computational resources. ROOST will establish a model community and release a technical report.
Share To
Timeline
HotFlash
APP
X
Telegram
CopyLink