Investigating the Potential of Using CNNs to Identify AI Generated Images


Jarrod Muddyman

2/19/2024 · 3 min read


In recent years, the advancement of artificial intelligence (AI) has spurred the creation of increasingly lifelike artificially generated images. These images, often indistinguishable from genuine photographs, have raised concerns about their potential for misuse, particularly in the context of deepfakes and other forms of digital manipulation.

Out of curiosity, I decided to explore the feasibility of training a Convolutional Neural Network (CNN) to discern AI-generated images. With a relatively modest effort, employing a simple model and a small training set comprising fewer than 100 images, I achieved moderate success.

Code and Training Setup

I assembled a straightforward Python script to start the process. The first step was preprocessing the images into a format the CNN could consume. I experimented with various settings and favored higher resolutions for the best performance, though I'm hopeful that I can match or exceed that performance at lower resolutions in the future.
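As a rough illustration of what a preprocessing step like this can look like, here is a minimal sketch. It is not the exact script from this project: the function name `preprocess`, the 256-pixel default, and the nearest-neighbour resize are all assumptions chosen to keep the example dependent on NumPy alone.

```python
import numpy as np


def preprocess(image: np.ndarray, size: int = 256) -> np.ndarray:
    """Center-crop an H x W x C uint8 image to a square, resize it, and scale to [0, 1].

    Hypothetical helper for illustration only; `size` is an assumed default.
    """
    height, width = image.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    square = image[top:top + side, left:left + side]

    # Nearest-neighbour resize: pick one source row/column per output pixel.
    idx = (np.arange(size) * side) // size
    resized = square[idx][:, idx]

    # Convert 0-255 integers to float32 values in [0, 1].
    return resized.astype(np.float32) / 255.0


# Usage with a synthetic stand-in for a decoded photo.
fake_photo = np.random.default_rng(0).integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
prepared = preprocess(fake_photo, size=128)
```

In practice an imaging library with proper interpolation would likely give cleaner results than nearest-neighbour sampling; the point here is only the crop, resize, and normalize sequence.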

Subsequently, I loaded the two sets of data and partitioned them into training and testing subsets.
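For readers who want to try something similar, a split along these lines could be sketched as follows. The file names, the 20% test fraction, and the label convention (0 for real, 1 for AI-generated) are assumptions for the example, not details taken from my script.

```python
import random


def split_dataset(real_paths, ai_paths, test_fraction=0.2, seed=42):
    """Label, shuffle, and split two lists of image paths.

    Returns (train, test), each a list of (path, label) pairs.
    Labels are an assumed convention: 0 = real, 1 = AI-generated.
    """
    samples = [(path, 0) for path in real_paths] + [(path, 1) for path in ai_paths]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(samples)
    n_test = max(1, round(len(samples) * test_fraction))
    return samples[n_test:], samples[:n_test]


# Usage with placeholder file names.
real = [f"real_{i}.jpg" for i in range(50)]
ai = [f"ai_{i}.jpg" for i in range(40)]
train_set, test_set = split_dataset(real, ai)
```

With a dataset this small, a plain random split can leave the test subset unbalanced between the two classes, so splitting each class separately may be worth considering.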

The crux of the endeavor lay in crafting the model architecture itself, which offers myriad permutations. I also contemplated running hyperparameter optimization experiments but decided against it.

Still, I was mindful of the tradeoff between the better performance gained by adding connections and enlarging layers and the resulting ballooning of model size.

For instance, the model I trained can come close to a gigabyte in size, which is not ideal and is something I'm hoping to cut down on in the future.
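To see why size grows so quickly, it helps to count parameters. The sketch below uses a hypothetical architecture (three 3x3 convolutions with 32, 64, and 128 filters, each followed by 2x2 pooling, then a 256-unit dense layer and a single output). These layer sizes are assumptions for illustration and are not my actual model.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    """Weights plus biases for one 2D convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases for one fully connected layer."""
    return in_units * out_units + out_units


def estimate_model(input_size, filters=(32, 64, 128), dense_units=256):
    """Return (parameter count, float32 size in MB) for the assumed architecture."""
    total = 0
    channels = 3          # RGB input
    side = input_size
    for out_channels in filters:
        total += conv2d_params(channels, out_channels)
        channels = out_channels
        side //= 2        # 2x2 pooling halves each spatial dimension
    flattened = side * side * channels
    total += dense_params(flattened, dense_units)
    total += dense_params(dense_units, 1)   # single sigmoid-style output
    return total, total * 4 / 1e6           # 4 bytes per float32 weight


params_512, mb_512 = estimate_model(512)
params_128, mb_128 = estimate_model(128)
print(f"512px input: {params_512:,} parameters, about {mb_512:.0f} MB")
print(f"128px input: {params_128:,} parameters, about {mb_128:.0f} MB")
```

Under these assumptions the 512-pixel version comes to roughly 134 million parameters (about 537 MB of float32 weights), against roughly 8.5 million (about 34 MB) at 128 pixels. Almost all of that sits in the first dense layer after flattening, which is why input resolution and dense layer width are the obvious places to look when trimming the model.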

My ultimate objective is to embed this model into a browser extension to aid users in detecting AI images.

The last part of the process is configuring training, which I've intentionally kept straightforward: a batch size of 24 over 32 epochs. These parameters, though arbitrary at this stage, let me gauge the CNN's performance before committing to more meticulous fine-tuning.
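To put those two numbers in context, the sketch below shows how a batch size of 24 and 32 epochs translate into weight updates. The training set size of 80 images is an assumed figure for the example (the real set was simply "fewer than 100"), and the helper names are hypothetical.

```python
import math

BATCH_SIZE = 24
EPOCHS = 32


def iter_batches(items, batch_size=BATCH_SIZE):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def total_updates(n_train, batch_size=BATCH_SIZE, epochs=EPOCHS):
    """Number of gradient updates over the whole training run."""
    return math.ceil(n_train / batch_size) * epochs


# Usage with an assumed training set of 80 images.
n_train = 80
batch_lengths = [len(batch) for batch in iter_batches(list(range(n_train)))]
updates = total_updates(n_train)
```

With 80 training images this gives four batches per epoch (three of 24 and one of 8) and 128 updates in total, which is a small training budget and one reason these settings are only a starting point.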

Testing

Results have varied across later test runs, but the more promising initial trials reached an average detection rate of 66-80%. These findings suggest that there are discernible patterns a CNN can exploit. With further refinement and a larger sample size, the prospect of designing and training a neural network to accurately identify AI-generated images appears promising.
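For anyone reproducing this, a figure like "detection rate" can be computed in more than one way, so it is worth separating overall accuracy from the rate at which AI images are caught and the rate at which real images are wrongly flagged. The sketch below assumes labels of 1 for AI-generated and 0 for real, and the sample labels and predictions are made up for illustration, not results from my model.

```python
def detection_metrics(labels, predictions):
    """Compute accuracy, AI detection rate, and false alarm rate.

    Assumed convention: 1 = AI-generated, 0 = real.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # AI images caught
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # AI images missed
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # real images passed
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # real images flagged
    return {
        "accuracy": (tp + tn) / len(pairs),
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Made-up example: ten held-out images, not real results.
example_labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
example_predictions = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
metrics = detection_metrics(example_labels, example_predictions)
```

Reporting the false alarm rate alongside the detection rate matters for a browser extension, since flagging genuine photos as AI would quickly erode trust in the tool.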

For example, I'll show you below the results of the above model when tested on data it has never seen before.

Conclusion

Even when confronted with a minuscule dataset predominantly picturing human subjects facing the camera, the CNN surpassed random chance in detecting AI images. This success underscores the genuine potential of CNNs in discerning artificially generated imagery, a capability potentially extendable to the realm of detecting AI-generated videos.

Resources and References

All real images were sourced from the following sites; I used only the free-to-use images listed:

AI images were sourced from various generators, primarily www.getimg.ai and www.perchance.org/ai-photo-generator.

All of the code and training data can be found on my GitHub here.

jarrod.muddyman@muddykat.tech