How Deepfakes Work, And Why They're Getting Harder to Detect

A plain-language breakdown of the technology behind AI face-swapping, voice cloning, and synthetic media, and what it means for creators.

In 2023, it took a specialist with GPU access roughly 48 hours to create a convincing deepfake of a public figure. In 2026, it takes anyone with a smartphone about 90 seconds.

That shift changes everything about how we think about likeness protection. This post explains the technology, without jargon, so you understand exactly what you're up against.

The two categories of deepfake

Face-swap deepfakes replace one person's face with another in video. The core technology is a generative adversarial network (GAN): two neural networks, one that generates fake images, one that tries to detect fakes, locked in competition until the generator produces images indistinguishable from real ones.
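
To make that competition concrete, here is a minimal sketch of the two scores a GAN tracks during training. The discriminator outputs below are made-up numbers, not results from any real model, and the function names are just for illustration. The generator's loss uses the common "non-saturating" form.

```python
import math

def discriminator_loss(real_scores, fake_scores):
    """Low when real images score near 1 and fakes score near 0."""
    real_term = sum(-math.log(s) for s in real_scores) / len(real_scores)
    fake_term = sum(-math.log(1.0 - s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

def generator_loss(fake_scores):
    """Low when the discriminator scores fakes near 1, meaning it was fooled."""
    return sum(-math.log(s) for s in fake_scores) / len(fake_scores)

# Hypothetical discriminator outputs: "probability this image is real".
real_scores = [0.9, 0.8, 0.95]
early_fakes = [0.1, 0.2, 0.05]   # early in training: fakes are easy to catch
later_fakes = [0.6, 0.7, 0.55]   # later: the generator has improved

for label, fakes in [("early", early_fakes), ("later", later_fakes)]:
    d = discriminator_loss(real_scores, fakes)
    g = generator_loss(fakes)
    print(f"{label}: discriminator loss {d:.3f}, generator loss {g:.3f}")
```

As the fakes improve, the generator's loss falls while the discriminator's rises. Each network is trained to push its own number down, which is what keeps the two locked in competition.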

Modern face-swap tools (Runway, Kling, HeyGen) rely on diffusion models rather than GANs. A diffusion model starts with random noise and removes it step by step, guided by patterns learned from billions of training images. The result is a photorealistic face that never existed.
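
The "start with noise, remove it step by step" loop can be sketched in a few lines. This is a toy under heavy assumptions: `predict_clean` is a hypothetical stand-in for the trained neural network and simply returns a fixed four-value "image", and the noise schedule is invented. Only the shape of the loop resembles a real sampler.

```python
import random

TARGET = [0.2, 0.8, 0.5, 0.1]             # pretend "clean image": 4 pixel values
NOISE_LEVELS = [1.0, 0.6, 0.3, 0.1, 0.0]  # invented schedule, high noise to none

def predict_clean(noisy, noise_level):
    """Hypothetical stand-in for the trained network.

    A real model estimates the clean image from the noisy one using what it
    learned in training. Here we return a fixed target so the loop can run.
    """
    return TARGET

def sample(seed=0):
    rng = random.Random(seed)
    # Start from pure random noise at the highest noise level.
    x = [rng.gauss(0.0, NOISE_LEVELS[0]) for _ in TARGET]
    for current, nxt in zip(NOISE_LEVELS, NOISE_LEVELS[1:]):
        clean_guess = predict_clean(x, current)
        # Whatever the guess does not explain is treated as noise...
        noise_guess = [(xi - ci) / current for xi, ci in zip(x, clean_guess)]
        # ...and only a smaller amount of it is kept for the next step.
        x = [ci + nxt * ni for ci, ni in zip(clean_guess, noise_guess)]
    return x

print(sample())
```

Because the last noise level is zero, the final step keeps none of the estimated noise and the loop ends on the network's clean guess.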

Voice-clone deepfakes work differently. They extract a "voice embedding", a mathematical representation of how your voice sounds, from as little as 3 seconds of audio. A text-to-speech system then generates new speech that matches that embedding. The output can sound convincingly like you saying things you never said.
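
A voice embedding is just a list of numbers, and "matching" one usually means pointing in nearly the same direction. Here is a minimal sketch using cosine similarity, a common way to compare embeddings. The four-number vectors are invented for illustration; real embeddings come from a trained speaker-encoder model and are much longer.

```python
import math

def cosine_similarity(a, b):
    """1.0 means same direction; values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings, purely illustrative.
your_voice = [0.9, 0.1, 0.4, 0.2]
cloned_voice = [0.85, 0.15, 0.42, 0.18]
other_speaker = [0.1, 0.9, 0.2, 0.7]

print("you vs. clone:", round(cosine_similarity(your_voice, cloned_voice), 3))
print("you vs. other:", round(cosine_similarity(your_voice, other_speaker), 3))
```

The same comparison works in both directions: a cloning system aims for a high score against your embedding, and a monitoring system can flag audio that scores suspiciously high against it.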

Why detection is getting harder

Early deepfake detectors looked for artifacts: unnatural blinking, asymmetric faces, inconsistent lighting around the hairline. Generative models quickly learned to eliminate these artifacts because detection models were published openly and used as training signals.

Today's detectors use spectral analysis (looking at frequency patterns invisible to the human eye), physiological signals (pulse variation in facial color that's hard to fake), and provenance tracking (cryptographic certificates embedded at capture time). None of these is foolproof on its own.
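
As a rough illustration of the spectral idea, the sketch below measures how much of a signal's energy sits in its high frequencies. It assumes a single row of 64 pixel brightness values and a simple, invented artifact (a faint alternating pattern); real detectors analyze full 2D images and learn which patterns matter.

```python
import cmath
import math

def high_frequency_fraction(signal):
    """Share of spectral energy (ignoring the average level) in the upper half of frequencies."""
    n = len(signal)
    energies = []
    for k in range(1, n // 2 + 1):
        # Plain discrete Fourier transform coefficient for frequency k.
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energies.append(abs(coeff) ** 2)
    cutoff = len(energies) // 2
    return sum(energies[cutoff:]) / sum(energies)

# A smooth brightness gradient, like natural shading across a cheek.
smooth_row = [math.sin(2 * math.pi * 2 * t / 64) for t in range(64)]
# The same row with a faint pixel-to-pixel alternating pattern added on top.
artifact_row = [v + 0.3 * (-1) ** t for t, v in enumerate(smooth_row)]

print("smooth:  ", round(high_frequency_fraction(smooth_row), 4))
print("artifact:", round(high_frequency_fraction(artifact_row), 4))
```

The alternating pattern is small enough to be hard to spot by eye, but it shows up as a clear spike at the top of the frequency range.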

This is why multi-layer detection matters. Unimpersonationable uses face embeddings (ArcFace 512-dim), perceptual hashing, AI-generation detection (Illuminarty), and voice anti-spoofing (AASIST) in combination, because any single method can be defeated.
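
To give a feel for one of those layers, here is a minimal sketch of a perceptual hash. It uses the simple "average hash" variant on a tiny 4x4 grayscale grid. This is an assumption chosen for illustration, not a description of the hashing any particular product uses. The idea is that small edits leave the hash unchanged, so re-uploads of the same image can still be matched.

```python
def average_hash(pixels):
    """One bit per pixel: 1 if brighter than the image's average, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming_distance(h1, h2):
    """Number of positions where two hashes differ."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

# Invented 4x4 grayscale image (0 = black, 255 = white).
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
brightened = [[p + 10 for p in row] for row in original]   # lightly edited copy
different = [[255 - p for p in row] for row in original]   # a different picture

base = average_hash(original)
print("edited copy:", hamming_distance(base, average_hash(brightened)))
print("different:  ", hamming_distance(base, average_hash(different)))
```

A distance near zero suggests the same underlying image. On its own this catches copies and light edits, not freshly generated fakes, which is why it is paired with the other methods.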

What this means for creators

The practical implication: your face is already being used by AI systems you've never interacted with. Stable Diffusion, FLUX, and most commercial image generators were trained on billions of web-scraped images, which almost certainly include photos of you if you have any public presence.

Opt-out registries and provenance initiatives (Spawning's "Have I Been Trained", Adobe's Content Authenticity Initiative) help, but they're voluntary and self-reported. The only reliable protection is a three-step routine:

  1. Register your biometrics: create a cryptographic baseline before violations occur
  2. Monitor regularly: violations spread faster than manual monitoring can catch
  3. Enforce in one click: DMCA notices drafted in seconds, ready to send

The technology will keep improving. The asymmetry between creating fakes and removing them is only fixable with automated infrastructure, not manual effort.
