GPT-4 is not trained with API customer data
4 mins read

GPT-4 is not trained with API customer data

In a marked departure from its previous practices, OpenAI has announced that it will no longer use customer data sent through its APIs to train its rich language models such as GPT-4.

The change was recently confirmed by Sam Altman, the CEO of OpenAI, in an interview with CNBC.

OpenAI’s new approach to user data

OpenAI’s policy change was implemented on March 1, 2023, when the company quietly updated its Terms of Service to reflect this new commitment to protecting user privacy.

Altman clarified, “Customers clearly don’t want us to train on their data, so we’ve changed our plans: we won’t do that.”

APIs, or application programming interfaces, are technological frameworks that allow customers to connect directly to OpenAI’s software.

Altman explained that OpenAI hasn’t used API data for model training “for a while,” suggesting that this official announcement is formalizing an existing practice.

Impact on Business Customers

OpenAI’s move has far-reaching implications, particularly for its business customers, which include giants like Microsoft, Salesforce, and Snapchat.

These companies are more likely to use OpenAI’s API capabilities to run their operations, so the privacy shift is particularly relevant to them.

However, the new data protection measures only apply to customers using the company’s API services. OpenAI’s updated Terms of Service states: “We may use content from services other than our API.”

Therefore, other forms of data entry, such as text entered into the popular chatbot ChatGPT, can still be used by OpenAI unless the data is shared via the API.

Broader industry impact

OpenAI’s policy shift comes as the industry grapples with the potential impact of large language models like OpenAI’s ChatGPT replacing traditional human-made material.

For example, the Writers Guild of America recently went on strike after negotiations between the guild and the film studios collapsed. The guild had advocated restrictions on using OpenAI’s ChatGPT to generate or rewrite scripts.

OpenAI’s decision not to use customer data for training marks a pivotal moment in the ongoing conversation about privacy and AI. As companies continue to explore and push the boundaries of AI technology, ensuring user privacy and maintaining trust will likely continue to be at the heart of these discussions.

The Evolution of ChatGPT: GPT-3 to GPT-4

It’s important to note that OpenAI’s commitment not to use customer data for training applies to its latest language model, GPT-4, released on March 14, 2023.

GPT-4 introduced several improvements over its predecessor GPT-3, including a significant increase in the word limit (25,000 compared to ChatGPT’s 3,000 word limit), a larger context window size, and improved reasoning and comprehension skills.

Another notable feature of GPT-4 is its multimodality, or ability to understand and derive information from images in addition to text. This latest model generates more human-like text and uses features like emojis for a more personal feel.

However, the exact size and architecture of GPT-4 remain secret, leading to speculation about the details of the model.

Despite these rumours, OpenAI’s CEO has denied specific claims about the size of the model.

In terms of performance, GPT-4 has shown strengths in text generation but also some limitations. For example, it scored in the 54th percentile on the Graduate Record Examination (GRE) Writing and in the 43rd to 59th percentile on the AP Calculus BC exam.

In addition, it performed well on simple Leetcode coding tasks, but its performance decreased as the task difficulty increased.

While the details of GPT-4’s training process are not officially documented, GPT models in general are known to involve large-scale machine learning with a variety of Internet texts.

I’m looking forward to

Due to changes to OpenAI’s data usage policy, the data used to train the language models does not include information shared via the API unless users explicitly consent to contribute it for this purpose.

As this technology improves our lives and plays a more significant role, it’s interesting how companies are responding to concerns and responding to keeping data private and gaining people’s trust.


Featured image created by the author using Midjourney.