Data Drift, Bias, & Context: Why AIs Like ChatGPT Will Always Need Human Input
- Natalie Hussey
- Feb 16
- 4 min read
Generative artificial intelligence (AI) large language models (LLMs) like ChatGPT can outperform humans on many tasks, but their inherent nature means they will always require human input.

Using generative AI programs like ChatGPT is as remarkable as it is a little freaky. Generative AI can perform tasks that take humans years of specialized education and training to master. The natural response to experiencing these capabilities firsthand is to conclude that AI will soon replace humans in the workforce. This concern is not unfounded: in many cases, AI models have already replaced not just tasks but entire jobs.
Let's face it, though: software has a long history of replacing humans. However, it has long been understood that people would still be needed to program and maintain that software, so things balanced out. Now, with AI performing tasks like writing sophisticated, working code, once-reliable software careers like computer programming are under threat, as exemplified by the mass tech layoffs occurring alongside the rise of AI.
However, for now, AI, like all other software, still needs humans because of data drift, bias, and context.
Data Drift
Data drift is a phenomenon in AI where the statistical properties of the data used to train the model change over time. As data changes, the model's performance degrades and eventually becomes irrelevant. The AI will no longer make accurate predictions or generate relevant responses.
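To make "statistical properties changing over time" concrete, here is a minimal Python sketch of one common drift check: comparing the word distribution of the original training text against newer text. The sentences, the distance measure, and the 0.5 threshold are all invented for illustration; this is not how ChatGPT itself is monitored.

```python
from collections import Counter

def word_distribution(sentences):
    """Relative frequency of each word across a list of sentences."""
    counts = Counter(word.lower() for s in sentences for word in s.split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def drift_score(old, new):
    """Total variation distance between two word distributions (0 = identical, 1 = disjoint)."""
    words = set(old) | set(new)
    return 0.5 * sum(abs(old.get(w, 0) - new.get(w, 0)) for w in words)

# Invented example data: what the model was trained on vs. what people write now.
training_data = ["AI is a helpful software", "AI software can help people"]
current_data = ["AI agents empower teams", "people build agentic AI workflows"]

score = drift_score(word_distribution(training_data), word_distribution(current_data))
print(f"drift score: {score:.2f}")
if score > 0.5:  # threshold is arbitrary for this sketch
    print("Drift detected: time for fresh, human-curated training data.")
```

When the score climbs, the model's training data no longer reflects the world it is answering questions about, and humans need to step in with new data.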
A simple way to build intuition for data drift is to look at a natural language processing (NLP) model like ChatGPT. NLP models are trained on data sets to make accurate predictions. A useful (if simplified) mental model here is the decision tree: a model trained to make yes/no decisions based on its training data. When the data changes, it's like altering the branches of the tree, and the AI may no longer make accurate decisions because it isn't aligned with the new information.
So, for example, you would feed the AI a set of sentences that all describe the same core topic:
AI is a helpful software
AI software can help people
AI software can be used to generate pictures of doughnuts
AI software empowers people
Next, you would train the AI model to use common grammar and vocabulary terms associated with the topic to accurately describe the content when queried.
Out of the above data, the decision is likely to reject (say no to) "generate pictures of doughnuts" because it is clearly uncommon and does not match the rest of the data. The AI will then find the commonalities, like AI helping people, most likely settling on "help" as the most commonly generated word. So the software would generate responses like:
AI software helps people
AI is software helpful to people
People use helpful AI software
These may be redundant, but they are fine responses. However, they showcase two problems. First, what is generated is often fed back into the model as part of the machine learning process. This further dilutes the data, making "helpful" the dominant term and, eventually, the only term used for this query. Second, the AI quickly starts producing less and less specific results, and useful synonyms like "empower" are lost. This is only corrected if people continuously add new and novel data. Anyone who uses ChatGPT and other NLP tools can tell you how much filler and repetitive copy they sometimes produce.
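You can watch that dilution happen in a toy simulation. The sketch below is purely illustrative (the word pool and the sampling scheme are assumptions, not how an LLM actually trains): it "generates" text by sampling words in proportion to how often they appear, then feeds the output straight back into the pool.

```python
import random
from collections import Counter

random.seed(42)

# Word pool from the example above: the "help" family already slightly dominates.
corpus = ["help", "help", "helpful", "empowers", "doughnuts"]

for generation in range(1, 6):
    counts = Counter(corpus)
    words, weights = zip(*counts.items())
    # "Generate" new text by sampling words in proportion to how often
    # the model has seen them, then feed the output back into the pool.
    corpus.extend(random.choices(words, weights=weights, k=10))
    help_share = sum(w.startswith("help") for w in corpus) / len(corpus)
    print(f"generation {generation}: 'help*' is {help_share:.0%} of the pool")
```

On most runs, the "help" share creeps upward generation after generation, and rarer words like "empowers" never recover without fresh, human-supplied input.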
Essentially, the more diverse the input, the greater the variety of the output. Humans provide that diversity: human writers are far more creative than AI, which is fundamentally just programmed to be correct and concise.
Furthermore, language and culture are constantly in flux, which further degrades the data over time. AI like ChatGPT will only become aware of cultural and linguistic changes through human input. Thus, human input will always be required to keep the language and other AI-generated content current and relevant.
Biases
Beyond keeping data current, human input is essential to protect against biases. AI can easily develop biases as a result of bias in its training data, which can lead to information that is both non-factual and harmful. Say the data pool contains a lot of flat-earther content; that can taint the AI's responses to queries about the shape of the world. We don't want our children including a flat piece of paper in their school mobiles because AI misled them.
Data can also contain harmful racial, gender, and disability stereotypes and slurs. Humans are essential for assessing the data and removing harmful outputs. Otherwise, the AI will not be safe, ethical, or, frankly, usable.
DEI experts Ashley Kennedy, Christie Lindor, and Nika White offer enlightening insights on the social and organizational consequences of biased AI and on why there will always be a need for human input to combat bias. You can watch their discussion here.
Context
Human language is full of ambiguity and nuance. My favorite way to explain how these can be confusing for AI is by highlighting this simple word exercise:
"Yeah"
"No"
"Yeah, no."
"No, yeah."
"Yeah, no, yeah."
"No, yeah, no."
Humans read these and know the differences instantly. We also recognize that the tone in which they are said affects their meaning. AI needs human explanation to understand these differences. Further, language is full of jargon, slang, and culturally and regionally specific phrases, vocabulary, and grammar. For example, in some areas of the United States, "pop" means pop music, while in others it means a soft drink. AI needs humans to provide context for all of these complexities, clarify ambiguity, and interpret these differences.
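As a playful illustration of why context matters, here is a toy Python sketch that tries to disambiguate "pop" from its surrounding words. The cue lists are invented for this example; a real system would need vast amounts of human-written and human-labeled text to learn them.

```python
# Toy disambiguation of "pop" using surrounding words as context cues.
# The cue sets below are invented for this example, not learned from data.
MUSIC_CUES = {"song", "chart", "band", "radio", "album"}
DRINK_CUES = {"drink", "can", "fizzy", "cold", "bottle"}

def interpret_pop(sentence):
    words = set(sentence.lower().rstrip("?.!").split())
    if words & MUSIC_CUES:
        return "pop music"
    if words & DRINK_CUES:
        return "soft drink"
    return "ambiguous -- a human would ask a follow-up question"

print(interpret_pop("Turn up that pop song on the radio"))  # -> pop music
print(interpret_pop("Grab me a cold can of pop"))           # -> soft drink
print(interpret_pop("Do you like pop?"))                    # -> ambiguous
```

Notice the third case: with no cues at all, the only honest answer is to ask, which is exactly the kind of judgment humans supply.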
The Human Impact Of AI
There is no overstating how impactful AI like ChatGPT will be in the workplace. However, no matter how sophisticated generative AIs get, they will always rely on humans to ensure that AI models are accurate, valuable, and safe tools for users.
On the plus side, even as some jobs are lost to AI, we are already seeing a rise in new jobs dedicated to providing human input to AI. Jobs like:
AI Trainer & Operator
Sentiment Analyst
AI Work Auditor
AI Ethicist
AI Data Auditor
AI Writer
AI Rater/Moderator
AI Input/Output Manager
AI Prompt Engineer