Research Finds That Artificial Intelligence Gets Much Worse When It Isn’t Trained on Human-Made Data
AI is all anyone in tech (and pretty much everywhere else) can talk about right now, but it seems like every time a new story comes out, it raises more questions than answers.
Scientists at Rice and Stanford Universities recently fed AI-generated content to AI models and published their findings: the diet seemed to make the models’ output quality distinctly erode.
They attempted to train generative AI models – both large language models and image generators – on AI-generated data, and it sort of broke their “brains.”
Or, as the researchers said, it drove the AI “MAD.”
“Seismic advances in generative AI algorithms for imagery, text, and other data types has led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (‘self-consuming’) loop whose properties are poorly understood.”
They came to some pretty interesting conclusions.
“Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease. We term this condition Model Autophagy Disorder (MAD).”
Basically, without original human-made work to feed into the machine, we can’t expect it to give us anything good in return. When it’s fed AI-generated content, it starts returning less-varied data, eventually crumbling.
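To make the loop concrete, here’s a minimal toy sketch in Python – my illustration, not the researchers’ code. The “model” is just a Gaussian fitted to its training data, each generation trains only on the previous generation’s output, and a mild bit of cherry-picking of “good” samples stands in for the sampling bias the paper says accelerates diversity loss:

```python
import numpy as np

# Toy autophagous ("self-consuming") loop. This is a hypothetical
# illustration, not the paper's actual experiment: the "model" is
# just a Gaussian fitted to its training data, and each new
# generation is trained only on samples from the previous one.
rng = np.random.default_rng(0)

data = rng.normal(loc=0.0, scale=1.0, size=1000)  # generation 0: "real" data

for generation in range(1, 8):
    mu, sigma = data.mean(), data.std()         # "train" on the current data
    samples = rng.normal(mu, sigma, size=1000)  # generate synthetic data
    # Mild cherry-picking, standing in for the sampling bias the paper
    # discusses: keep only the 80% of samples closest to the mean, and
    # train the next generation on those.
    keep = np.argsort(np.abs(samples - mu))[:800]
    data = samples[keep]
    print(f"generation {generation}: std = {data.std():.3f}")
```

Run it and the spread of the data – a crude stand-in for diversity – shrinks by roughly a third every generation, until the “model” can only produce near-identical output. It’s a caricature, but it’s the same less-varied, crumbling behavior the paper calls MAD, in miniature.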
The paper has not been peer-reviewed yet, and the models they tested only made it through five rounds of training before the cracks started to show.
The real-world implications are many and fairly serious, too.
It is generally true that the more data you feed a model, the better it gets – which means AI builders need tons and tons of training material to create the best AI system around.
With more and more of the art floating around already generated by AI, well, you can see the issue that’s looming large. AI-generated content is already inextricably fused with the structure of the internet, so there’s no real way to separate human-made material from synthetic material in what gets fed to models during training.
“Since the training datasets for generative AI models tend to be sourced from the Internet, today’s AI models are unwittingly being trained on increasing amounts of AI-synthesized data. The popular LAION-5B dataset, which is used to train state-of-the-art text-to-image models like Stable Diffusion, contains synthetic images sampled from several earlier generations of generative models.”
They believe the situation will not only continue but accelerate.
Some people point out that this could be a little bit hopeful, since it does mean these systems are pretty much worthless without human input.
Without us, their brains will literally melt, and I don’t know. That feels like…something. Something not terrible.