AIs trained on AI-generated images produce glitches and blurs

Using AI-generated images to train AIs quickly creates a feedback loop in which the results worsen in either quality or visual diversity.

[Image: AI images get blurrier and less realistic if AIs are trained on AI-generated examples. Credit: Rice University]

As the internet fills up with AI-created images of human faces and strange cat portraits, there is a growing danger of a self-consuming loop in which generative AIs train mainly on their own synthetic images. That could lead to huge drops in either the quality or the diversity of these images.

The phenomenon will challenge all but the largest tech companies that can afford to filter AI training data sets scraped from the internet. “There’s going to be a slippery slope to using synthetic data, either wittingly or unwittingly,” says Richard Baraniuk at Rice University in Texas.

Baraniuk and his colleagues showed how the self-consuming AI loop can affect generative AIs, including StyleGAN models, which create images in a single pass, and diffusion models, which take many steps to gradually produce a clear image. They trained the AIs on either AI-generated images or real images, the latter consisting of 70,000 human faces taken from the Flickr online photo service.

First, each AI was trained exclusively and repeatedly on its own AI-generated images taken from previous generations. Within a few generations, wavy visual artefacts appeared on the human faces produced by the StyleGAN generator, while the diffusion generator's results became increasingly blurry.
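To see why the loop degrades, consider a minimal stand-in. The sketch below is an illustration, not the team's code: it replaces the image generator with a one-dimensional Gaussian fitted to its training data, so each generation's estimation error becomes the next generation's ground truth.

```python
import numpy as np

# Toy sketch of the fully synthetic self-consuming loop (an assumption,
# not the paper's code): the "generative model" is a Gaussian fitted to
# its current training data, and every new generation trains only on the
# previous generation's samples.

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=1_000)   # generation 0: "real" data

for gen in range(1, 11):
    mu, sigma = data.mean(), data.std()             # "train" the model on current data
    data = rng.normal(mu, sigma, size=1_000)        # next generation sees only synthetic samples
    print(f"gen {gen:2d}: mean = {mu:+.3f}, std = {sigma:.3f}")
# The fitted parameters drift further from the truth each generation; run
# long enough, the spread (diversity) tends to collapse toward zero.
```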

The declining image quality can be slowed by cherry-picking higher-quality AI-generated images to use in training. But that approach makes the generated images look increasingly alike: the StyleGAN generator eventually produced a sea of human faces that were overwhelmingly similar in both facial features and skin tone.
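Under the same toy stand-in, cherry-picking can be mimicked by keeping only the samples nearest the current mode, a crude assumed proxy for aesthetic quality; the spread of the data then collapses rapidly.

```python
import numpy as np

# Cherry-picking sketch under the toy Gaussian stand-in (the "quality"
# score is an assumption): keeping only the samples closest to the mode
# mimics selecting the best-looking images each generation.

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=2_000)

for gen in range(1, 6):
    mu, sigma = data.mean(), data.std()
    samples = rng.normal(mu, sigma, size=2_000)     # generate a synthetic batch
    nearest = np.argsort(np.abs(samples - mu))      # rank by "quality" (closeness to the mode)
    data = samples[nearest[:1_000]]                 # keep only the top half
    print(f"gen {gen}: std = {data.std():.3f}")     # shrinks by roughly the same factor each time
```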

“It will always be very tempting to sacrifice diversity for the aesthetic quality to be a more popular generative model,” says Josue Casco-Rodriguez, also at Rice University and part of the team.

In another scenario, the researchers folded a fixed set of real images into training sets that also included AI-generated images. This resembles the way researchers sometimes use AI-generated data to pad out small training sets. But this strategy only delayed the decline in image quality or diversity.
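In the same toy setting, this scenario corresponds to concatenating a fixed pool of "real" samples with each generation's synthetic batch; the pool and batch sizes here are illustrative assumptions.

```python
import numpy as np

# Fixed-real-data sketch under the toy Gaussian stand-in (details assumed):
# the same pool of real samples is folded into every generation's training set.

rng = np.random.default_rng(2)
real_fixed = rng.normal(0.0, 1.0, size=500)         # the same real examples every generation
data = real_fixed.copy()

for gen in range(1, 11):
    mu, sigma = data.mean(), data.std()
    synthetic = rng.normal(mu, sigma, size=1_500)   # fresh synthetic samples
    data = np.concatenate([real_fixed, synthetic])  # fixed real + new synthetic
# The real anchor slows the drift, but in the paper's experiments this only
# delayed, rather than prevented, the decline in quality or diversity.
```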

Finally, they trained each AI on a combination of AI-generated images and an ever-changing set of real images. This proved most effective at staving off the drop in quality or diversity, but only as long as the amount of AI-generated data used in training was kept below a certain threshold.
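The third scenario, again as a hedged toy sketch, draws a new batch of real samples every generation and caps the synthetic share; the 30 per cent cap below is purely illustrative, not the threshold reported in the paper.

```python
import numpy as np

# Fresh-real-data sketch under the toy Gaussian stand-in: a NEW batch of
# real samples arrives each generation and the synthetic fraction is capped
# (the cap value is an assumption for illustration).

rng = np.random.default_rng(3)
N, SYNTHETIC_FRACTION = 2_000, 0.3
data = rng.normal(0.0, 1.0, size=N)

for gen in range(1, 11):
    mu, sigma = data.mean(), data.std()
    n_syn = int(SYNTHETIC_FRACTION * N)
    synthetic = rng.normal(mu, sigma, size=n_syn)         # capped synthetic share
    fresh_real = rng.normal(0.0, 1.0, size=N - n_syn)     # new real data each generation
    data = np.concatenate([fresh_real, synthetic])
# With a steady majority of fresh real data, the fitted parameters stay
# pinned near the true distribution: the most stable of the three regimes.
```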

Such results align with other recent research on the decline in AI usefulness as these programmes learn from other AIs, says Ilia Shumailov at the University of Oxford.

But Shumailov remains optimistic about using synthetic data to train AIs. The key is to make sure any AI-generated data is high quality and free from systematic errors, which would compound with each training cycle.

Smaller organisations without the resources of Google or Microsoft will face the greatest challenges in filtering AI-created images scraped from the internet. Efforts to develop watermarks that let people easily identify AI-generated images could help, says team member Sina Alemohammad, also at Rice University. But hidden watermarks can themselves degrade the quality of AI-generated images if watermarked images slip into training data unnoticed.
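In data-curation terms, watermarking would enable a simple filtering step like the hypothetical sketch below; the detector and its behaviour are assumptions, not an existing tool.

```python
# A hypothetical curation step (the detector and its API are assumptions):
# drop images a watermark detector flags as AI-generated before they enter
# a training set. If the detector misses a watermark, its hidden pattern
# becomes part of what the next model learns.

def filter_training_images(images, detect_watermark):
    """Keep only images the (hypothetical) watermark detector does not flag."""
    return [img for img in images if not detect_watermark(img)]
```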

“You are damned if you do and damned if you don’t,” says Alemohammad. “But it’s definitely better to watermark the image than not.”

Reference:

arXiv, DOI: 10.48550/arXiv.2307.01850
