Why ChatGPT poses a huge risk to your own data privacy
ChatGPT has taken the world by storm. Within two months of its release, it reached 100 million active users, making it the fastest-growing consumer application ever launched. Users are drawn to the tool's advanced capabilities — and concerned about its potential to cause disruption in various sectors.
A much less discussed implication is the privacy risk ChatGPT poses to all of us. Just yesterday, Google unveiled its own conversational AI, called Bard, and others are sure to follow. Tech companies working on AI have well and truly entered an arms race.
The problem is that it is fueled by our personal data.
300 billion words. How many are yours?
ChatGPT is powered by a large language model that requires huge amounts of data to function and improve. The more data the model is trained on, the better it gets at detecting patterns, anticipating what comes next, and generating plausible text.
OpenAI, the company behind ChatGPT, fed the tool some 300 billion words systematically scraped from the internet: books, articles, websites and posts – including personal information obtained without consent.
If you’ve ever written a blog post or product review or commented on an article online, chances are this information has been used by ChatGPT.
So why is that a problem?
The data collection used to train ChatGPT is problematic for several reasons.
First, none of us were asked if OpenAI could use our data. This is a clear violation of privacy, especially when data is sensitive and could be used to identify us, our family members or our location.
Even when data is publicly available, its use may violate what we call contextual integrity. This is a fundamental principle in legal discussions about privacy. It requires that individuals’ information not be disclosed outside the context in which it was originally produced.
Also, OpenAI offers no procedures for individuals to check whether the company stores their personal information, or to request that it be deleted. This is a guaranteed right under the European General Data Protection Regulation (GDPR) – although it is still debated whether ChatGPT is compliant with GDPR requirements.
This “right to be forgotten” is especially important in cases where the information is inaccurate or misleading, which seems to be a regular occurrence with ChatGPT.
In addition, the scraped data on which ChatGPT is trained may be proprietary or copyrighted. For example, when I asked, the tool produced the first few paragraphs of Peter Carey’s novel “True History of the Kelly Gang” – a copyrighted text.
ChatGPT does not take copyright protection into account when generating output. Anyone using the results elsewhere could be inadvertently plagiarizing.
Finally, OpenAI didn’t pay for the data it scraped from the internet. The individuals, website owners and companies that produced it were not compensated. This is especially noteworthy considering OpenAI was recently valued at US$29 billion, more than double its 2021 valuation.
OpenAI also recently announced ChatGPT Plus, a paid subscription plan that offers customers ongoing access to the tool, faster response times, and priority access to new features. This plan is expected to contribute to projected revenue of US$1 billion by 2024.
None of this would have been possible without data – our data – collected and used without our consent.
A weak privacy policy
Another privacy risk concerns the data provided to ChatGPT in the form of user prompts. When we ask the tool to answer questions or perform tasks, we may inadvertently hand over sensitive information and place it in the public domain.
For example, an attorney might ask the tool to review a draft divorce agreement, or a programmer might ask it to check a piece of code. The agreement and the code, along with the essays the tool produces, are now part of ChatGPT’s database. This means they can be used to further train the tool and can surface in responses to other people’s prompts.
In addition, OpenAI collects a wide range of other user information. According to the company’s privacy policy, it collects users’ IP addresses, browser types and settings, and data about users’ interactions with the site – including the type of content users engage with, the features they use, and the actions they take.
It also collects information about users’ browsing activities over time and across different websites. Alarmingly, OpenAI states that it may share users’ personal information with unspecified third parties, without informing them, to meet its business objectives.
Time to contain it?
Some experts believe ChatGPT is a tipping point for AI – a realization of technological development that could revolutionize the way we work, learn, write and even think. Despite its potential benefits, we must remember that OpenAI is a private, for-profit company whose interests and commercial imperatives do not necessarily align with broader societal needs.
The privacy risks associated with ChatGPT should serve as a warning. As consumers of a growing number of AI technologies, we should be extremely careful about what information we share with such tools.
- This article is republished from The Conversation under a Creative Commons license. Read the original article.