Deepfake Fraud in Video Calls: How to Stop It
In January 2024, a finance worker at a multinational company was tricked into transferring $25 million after joining a video call with what appeared to be the company's CFO and several colleagues. Every person on the call was a deepfake. The attacker used real-time face-swapping technology to impersonate multiple executives simultaneously. This was not a movie plot - it happened, and it is happening with increasing frequency.
The Billion Dollar Problem: Deepfake-Driven CEO Fraud
Deepfake-enabled fraud losses exceeded $1.1 billion in the United States alone in 2025, and the number is accelerating. CEO fraud, a variant of business email compromise (BEC) in which attackers impersonate senior executives, has been the most costly form of cybercrime for over a decade. Deepfakes have supercharged it by adding video and audio impersonation to what was previously an email-only attack.
The economics of deepfake fraud are terrifying for defenders:
- Creating a convincing voice clone requires as little as 3 seconds of sample audio
- Real-time video face-swapping software is available as open source (DeepFaceLive, SimSwap)
- A single successful CEO fraud attack yields an average of $2.4 million per incident
- Detection tools lag behind generation tools - by the time a detection method is published, attackers adapt
How Real-Time Video Deepfakes Actually Work
Modern deepfake video calls use a pipeline of technologies running in real time:
- Face capture - The attacker's webcam captures their face in real time
- Face swapping - A neural network maps the attacker's facial expressions and movements onto the target's face model in real time (under 50 ms of latency)
- Voice cloning - A text-to-speech model generates the target's voice from text input, or a voice conversion model transforms the attacker's speech into the target's voice in real time
- Virtual camera injection - The deepfake video output is routed through a virtual camera driver that video conferencing software (Zoom, Teams, Meet) treats as a real webcam
- Background synthesis - The background is replaced with a plausible setting (the target's office, a conference room) using standard virtual background technology
The entire pipeline runs on a consumer gaming GPU. The attacker needs only a few photos or a short video of the target to build the face model, and a few seconds of audio to clone the voice.
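Because the final step of this pipeline is virtual camera injection, one simple defensive check is to inspect the name of the capture device feeding the call. The sketch below shows the idea; the driver list is illustrative, not exhaustive, and since driver names can be spoofed this is only a weak heuristic that complements, rather than replaces, the hardware attestation discussed later.

```python
# Heuristic check: flag capture devices whose names match known
# virtual camera drivers. Names can be spoofed, so treat a match as a
# signal, not proof. The driver list below is an illustrative sample.

KNOWN_VIRTUAL_DRIVERS = (
    "obs virtual camera",
    "deepfacelive",
    "manycam",
    "xsplit vcam",
    "snap camera",
)

def is_virtual_camera(device_name: str) -> bool:
    """Return True if the device name matches a known virtual camera driver."""
    name = device_name.lower()
    return any(driver in name for driver in KNOWN_VIRTUAL_DRIVERS)

if __name__ == "__main__":
    for dev in ("FaceTime HD Camera", "OBS Virtual Camera"):
        label = "virtual" if is_virtual_camera(dev) else "physical"
        print(f"{dev}: {label}")
```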
Why Existing Video Conferencing Security Fails
Current video conferencing platforms have no built-in defense against deepfakes:
- Zoom, Teams, Meet - Accept any video input from any source, including virtual cameras. There is no verification that the video feed comes from a physical camera pointed at a real face
- End-to-end encryption - Protects the call from eavesdropping but does nothing to verify who is on the call. Encrypted deepfakes are still deepfakes
- Meeting passwords and waiting rooms - Prevent unauthorized access but do not verify identity. The attacker can join with legitimate credentials (obtained through phishing) and still be a deepfake
- Recording and transcription - Create a record of what happened but cannot distinguish real from fake in the recording
Real-Time Liveness and Human Verification During Calls
The solution is not to try to detect deepfakes (a losing arms race) but to verify that a real human is physically present at the other end of the call. This requires hardware-based liveness detection that cannot be spoofed by virtual camera injection:
- 3D depth sensing - Using TrueDepth cameras or structured light to verify three-dimensional facial geometry. Deepfake video is 2D - it cannot produce genuine depth data
- Infrared analysis - Human skin reflects infrared light differently than screens. IR sensors detect sub-surface blood flow patterns that deepfakes cannot replicate
- Camera attestation - Cryptographically verifying that the video feed comes from a physical camera sensor, not a virtual camera driver
- Challenge-response - Requesting specific micro-actions (subtle head movements, eye tracking) that must match the on-screen rendering. Deepfakes introduce measurable latency in challenge-response that physical faces do not
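The challenge-response idea can be sketched in a few lines. A verifier issues a random micro-action, measures how long the rendered video takes to comply, and compares that against a baseline for a physical face and webcam. Deepfake pipelines add per-frame synthesis latency, so consistently slow responses are suspicious. The challenge names and the 120 ms threshold below are illustrative assumptions, not measured values.

```python
# Sketch of a challenge-response timing check. A physical face responds
# within normal capture latency; a real-time face-swap pipeline adds
# synthesis delay on every frame. Threshold is an assumed baseline.
import random
import statistics

CHALLENGES = ("head_left", "head_right", "blink_twice", "look_up")
LATENCY_THRESHOLD_MS = 120.0  # assumed physical-face baseline

def issue_challenge() -> str:
    """Pick a random micro-action so the response cannot be pre-rendered."""
    return random.choice(CHALLENGES)

def passes_timing_check(response_latencies_ms: list) -> bool:
    """Pass if the median response latency is within the baseline."""
    return statistics.median(response_latencies_ms) <= LATENCY_THRESHOLD_MS
```

Using the median rather than a single measurement tolerates occasional network jitter while still catching the systematic delay a synthesis pipeline introduces.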
Deploying POY Verify for High-Stakes Video Authentication
POY Verify can be integrated into video call workflows to provide real-time human verification:
- Pre-call verification - Before a high-stakes call begins, each participant completes a 30-second POY verification on their device. This confirms a real human is physically present using on-device Secure Enclave processing
- Trust score display - Each verified participant's trust score is visible to other participants, providing a real-time confidence signal
- Periodic re-verification - For extended high-risk sessions, the system can require periodic re-verification check-ins to ensure the same human remains present throughout the call
- Tamper-evident logging - Every verification event is cryptographically logged, creating an audit trail that proves each participant was a verified human at each checkpoint
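A pre-call verification gate built on the workflow above might look like the following. POY Verify's actual SDK is not described in this document, so every name here (`VerificationEvent`, `trust_score`, the thresholds) is a hypothetical illustration of the policy logic, not a real API.

```python
# Hypothetical pre-call gate: admit a participant only if they hold a
# fresh, high-confidence verification event. All names and thresholds
# are illustrative assumptions, not the real POY Verify SDK.
from dataclasses import dataclass
from typing import Optional

MIN_TRUST_SCORE = 0.8   # assumed policy threshold for high-stakes calls
MAX_EVENT_AGE_S = 300   # assumed freshness window (5 minutes)

@dataclass
class VerificationEvent:
    participant_id: str
    trust_score: float   # 0.0-1.0, from the on-device liveness check
    age_seconds: float   # time elapsed since verification completed

def may_join_call(event: Optional[VerificationEvent]) -> bool:
    """Admit only participants with a fresh, high-confidence verification."""
    if event is None:
        return False
    return (event.trust_score >= MIN_TRUST_SCORE
            and event.age_seconds <= MAX_EVENT_AGE_S)
```

The same predicate can drive periodic re-verification: when a participant's most recent event ages past the freshness window mid-call, the system prompts a new check-in before they may continue.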
This approach does not try to detect whether the video feed is a deepfake. Instead, it establishes an independent, hardware-backed proof channel that confirms a real human is present - regardless of what appears on the video feed. The deepfake can show whatever it wants; the liveness verification proves who is actually there.
For organizations handling high-value decisions via video (M&A discussions, board meetings, financial approvals, classified briefings), this independent verification channel is becoming essential. The cost of a single successful deepfake CEO fraud attack ($2.4M average) dwarfs the cost of implementing verification for the calls that matter most.
Prove You Are Real
POY Verify is the privacy-first human verification layer for the internet. No data collected. No identity required.
VERIFY ME NOW