Filter PII from Training Data to Build General Knowledge

OpenAI trains ChatGPT models on publicly available internet content (for example, forum posts and blogs), data obtained through partnerships, and information that users, contractors, and researchers choose to provide, building the broad world knowledge needed for reliable responses. Only freely accessible public data is used; paywalled and private content are excluded. Before training, OpenAI applies the OpenAI Privacy Filter at multiple stages to identify and mask personal information, such as names and addresses, in public datasets and opted-in user conversations. In OpenAI's evaluations this tool outperforms other PII-removal methods, letting models learn general patterns without memorizing individual details. Developers can download it free from Hugging Face and integrate it into their own workflows.
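The masking step can be sketched as follows. This is a minimal, hypothetical illustration of replacing PII spans with placeholders before training, using simple regexes; it is not the OpenAI Privacy Filter itself, which the text describes as a dedicated tool rather than a pattern list.

```python
import re

# Illustrative PII patterns only -- a real filter would use learned models
# and cover many more categories (names, addresses, IDs, etc.).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def mask_pii(text: str) -> str:
    """Replace each matched PII span with its category placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask_pii("Reach me at jane.doe@example.com or 555-123-4567."))
# → Reach me at [EMAIL] or [PHONE].
```

Masking before training means the model sees the sentence structure ("Reach me at [EMAIL]") and can learn general language patterns without ever observing the individual's actual contact details.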

Opt-Out Controls Prevent Conversation Use in Training

- Disable "Improve the model for everyone" in Settings > Data Controls to exclude new conversations from training; they still remain in your chat history.
- Use Temporary Chat mode (top-right button) for sessions that are not saved to history and do not use or create memories; they are deleted within 30 days, retained only briefly for safety review.
- Turn off the Memory feature entirely to stop ChatGPT from saving references to personal details such as projects or contacts, or review, edit, and delete individual saved memories.
- Export your data, delete your account, or submit privacy requests via privacy.openai.com.

Regardless of these settings, avoid sharing sensitive information, since conversations may otherwise be reviewed or used for training.
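On the provider side, honoring these controls amounts to filtering conversations out of the training pool. The sketch below is hypothetical; the field names (`improve_model`, `temporary`) mirror the user-facing toggles above and are not OpenAI's internal schema.

```python
from dataclasses import dataclass

@dataclass
class Conversation:
    text: str
    improve_model: bool  # the "Improve the model for everyone" toggle
    temporary: bool      # Temporary Chat sessions are excluded outright

def eligible_for_training(convos: list[Conversation]) -> list[Conversation]:
    """Keep only conversations whose settings permit training use."""
    return [c for c in convos if c.improve_model and not c.temporary]

history = [
    Conversation("opted in", improve_model=True, temporary=False),
    Conversation("opted out", improve_model=False, temporary=False),
    Conversation("temp chat", improve_model=True, temporary=True),
]
print([c.text for c in eligible_for_training(history)])
# → ['opted in']
```

The point of the sketch is that opting out does not delete a conversation from your history; it only removes it from the set eligible for training.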

Block Sensitive Outputs and Handle Errors

ChatGPT is designed to reject requests for private individuals' personal data, favoring general knowledge over specifics about particular people. If erroneous personal information appears in a response, affected users can submit a removal request through the privacy portal. To balance privacy with safety, OpenAI detects threats of violence while keeping these filters in place, and it commits to clearer controls and stronger safeguards as models advance.
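A request-level guardrail of this kind can be sketched as a check that runs before answering. This is a deliberately simplified, hypothetical example using a keyword list; production systems rely on learned classifiers and policy models, not fixed phrases.

```python
# Hypothetical guardrail: refuse prompts that ask for a private
# individual's personal data, otherwise answer from general knowledge.
BLOCKED_INTENTS = (
    "home address of",
    "phone number of",
    "social security number of",
)

REFUSAL = "I can't help with personal information about private individuals."

def respond(prompt: str) -> str:
    lowered = prompt.lower()
    if any(intent in lowered for intent in BLOCKED_INTENTS):
        return REFUSAL
    return "(answer drawing on general knowledge)"

print(respond("What is the home address of my neighbor?"))
# → I can't help with personal information about private individuals.
```

The design choice illustrated here is that the refusal happens at the request level, so the model never attempts to recall specifics about an individual in the first place.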