How POY Verify Defeats Real-Time Deepfake Face Swaps
A person holds up a photo. An app transforms their live face into a photorealistic 3D replica of the photo - in real time. They blink, nod, smile, and turn their head. The deepfake follows perfectly. Every camera-only verification system on earth is fooled. POY Verify is not. Here is exactly why.
The Attack: Real-Time Face Swap in 30 Seconds
Modern deepfake face swap apps (DeepFaceLive, SimSwap, FaceFusion) can transform a person's live camera feed into someone else's face in real time. The technology has reached a point where:
- The swap handles head rotation - turning left, right, up, down
- The swap handles expressions - blinking, smiling, frowning, raising eyebrows
- The swap handles lighting - shadows, highlights, and ambient light adjust naturally
- The swap handles occlusion - hair, glasses, and hands in front of the face
- Latency is under 50ms - indistinguishable from a live feed to the human eye
This means any verification system that only looks at the RGB camera image is compromised. The deepfake IS the camera image. There is nothing in the pixels to distinguish it from a real face.
Why Every Camera-Only System Fails
Traditional identity verification - including systems from major vendors - relies on analyzing the RGB camera feed. They check:
- Does the face match the document photo? - The deepfake face IS the document photo. Match: 99%+.
- Is the person blinking? - The deepfake blinks when the attacker blinks. Check passed.
- Can they turn their head? - The deepfake turns when the attacker turns. Check passed.
- Is the image a screen replay? - No, it is a live rendering. Not a screen. Check passed.
The fundamental problem: these systems ask "does this look like a real face?" The answer is yes - because modern deepfakes look indistinguishable from real faces in RGB. The question they should ask is "is this a real face?" That requires sensors beyond the camera.
POY Verify's 4-Layer Deepfake Defense
3D Depth Sensing - Seeing Through the Mask
Your phone's depth sensor (Apple TrueDepth, Android ToF) projects thousands of invisible infrared dots onto your face and measures how far away each point is. This creates a 3D depth map of your actual facial geometry - your real bone structure.
A deepfake filter modifies the RGB camera image but cannot modify the depth data. The depth sensor is a physically separate piece of hardware. It sees the real face underneath.
The result: the depth map shows Person A's skull (the attacker), but the RGB image shows Person B's face (the deepfake). POY Verify compares both. They don't match. Verification fails instantly.
Key measurements that differ between any two humans:
- Inter-orbital distance (eye socket spacing)
- Nasal bridge depth and angle
- Mandible width (jawbone)
- Zygomatic arch prominence (cheekbones)
- Supraorbital ridge depth (brow bone)
- Philtrum length and depth
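To make the depth-versus-RGB idea concrete, here is a minimal sketch of how two such measurements might be derived from a raw depth map. The landmark names, pixel coordinates, and millimeter units are assumptions for illustration - this is not POY Verify's actual pipeline:

```python
import numpy as np

def facial_geometry(depth_map: np.ndarray, landmarks: dict) -> dict:
    """Derive simple bone-structure measurements from a depth map (mm per pixel
    value assumed). `landmarks` maps names to (row, col) pixel coordinates; a
    real pipeline would obtain these from a face-landmark model."""
    def z(name):
        r, c = landmarks[name]
        return depth_map[r, c]
    return {
        # Eye-socket spacing: pixel distance between inner eye corners
        "inter_orbital_px": float(np.hypot(
            *np.subtract(landmarks["left_eye_inner"],
                         landmarks["right_eye_inner"]))),
        # Nasal bridge protrusion: how far the nose tip sits in front of
        # the plane of the eyes (closer to sensor = smaller depth value)
        "nasal_protrusion_mm": (z("left_eye_inner") + z("right_eye_inner")) / 2
                               - z("nose_tip"),
    }
```

Running the same measurements on the depth map and on an RGB-derived 3D reconstruction would yield matching values for a real face and divergent values when a deepfake overlay is present.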
Infrared Analysis - Seeing What Eyes Cannot
Human skin is not just a surface. It is a layered biological structure with blood vessels, melanin, collagen, and fat deposits that each interact with infrared light in specific, measurable ways.
When a phone's IR emitter illuminates a face, the resulting IR image reveals:
- Sub-surface blood flow - Visible as subtle pulsing patterns that match heartbeat
- Hemoglobin absorption - Blood absorbs specific IR wavelengths, creating a unique vascular map
- Tissue scattering - Light penetrates skin and scatters through tissue layers differently than any artificial material
- Temperature gradients - The nose tip, ears, and cheeks have different temperatures than the forehead and neck
A deepfake filter cannot modify the IR feed because it operates on a separate sensor. The IR image shows the attacker's real skin properties - which do not match the deepfake identity. A screen shows zero IR skin properties. A mask shows artificial material properties. Only real human skin passes.
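The blood-flow signal described above can be sketched as a simple frequency-domain check (remote photoplethysmography). This assumes a pre-extracted per-frame mean intensity for a skin region; the band limits and peak threshold are illustrative values, not POY Verify's production parameters:

```python
import numpy as np

def estimate_pulse_bpm(roi_means: np.ndarray, fps: float) -> float:
    """Estimate heart rate from the mean IR intensity of a skin region over
    time. Live skin shows a dominant spectral peak in the 0.7-4 Hz band
    (42-240 bpm); a screen or mask shows no such peak. Returns 0.0 when no
    pulse is detected."""
    signal = roi_means - roi_means.mean()            # remove DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)           # plausible heart rates
    # Require a clearly dominant peak, not just broadband noise
    if not band.any() or spectrum[band].max() <= 3 * spectrum[band].mean():
        return 0.0
    return 60.0 * float(freqs[band][spectrum[band].argmax()])
```

A pulsing signal at 1.2 Hz would report roughly 72 bpm; a flat signal from a screen reports no pulse at all.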
Camera Pipeline Attestation - Catching the Injection
Deepfake face swaps work by intercepting the camera pipeline. Instead of the raw sensor data going directly to the verification app, a filter app sits in the middle, modifies each frame, and passes the altered frames forward. This is called a camera injection attack.
POY Verify detects injection attacks through device attestation:
- Apple App Attest - Cryptographically verifies the app is running unmodified on genuine Apple hardware, and that the camera session was not intercepted by a third-party filter
- Google Play Integrity - Verifies the device is not rooted, the app is genuine, and the runtime environment has not been tampered with
- Camera session binding - The verification session is cryptographically bound to the physical camera sensor. If frames are coming from a virtual camera driver (which deepfake apps require), the binding breaks
Even if a deepfake is visually perfect across all sensors, the injection itself is detectable at the operating system level. The camera pipeline has been modified. POY Verify sees the modification. Verification fails.
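A session-binding scheme of this kind might look roughly like the sketch below. The function name, inputs, and hash-chain design are assumptions for illustration - the real protocol relies on App Attest and Play Integrity attestations rather than this simplified HMAC:

```python
import hashlib
import hmac

def bind_session(server_nonce: bytes, camera_id: str,
                 frame_digests: list, device_key: bytes) -> bytes:
    """Sketch of camera-session binding: a device-held key authenticates the
    server's nonce, the physical camera identifier, and a hash chain over the
    captured frames. If a virtual camera driver substitutes frames, the
    camera_id (and thus the tag) no longer matches what the OS attests to."""
    h = hashlib.sha256(server_nonce + camera_id.encode())
    for d in frame_digests:          # hash chain: each frame extends the digest
        h = hashlib.sha256(h.digest() + d)
    return hmac.new(device_key, h.digest(), hashlib.sha256).digest()
```

The server recomputes the tag using the attested camera identifier; any mismatch means the frames did not come from the physical sensor.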
468-Point Landmark Cross-Validation
POY Verify uses MediaPipe FaceLandmarker to map 468 precise points on the face. These landmarks capture the geometric relationships that define a unique human face - distances, angles, proportions, and spatial relationships that are determined by bone structure.
The critical insight: landmarks extracted from the depth sensor data reflect the attacker's real bone structure. Landmarks extracted from the RGB camera data reflect the deepfake's rendered face. POY Verify extracts both sets and compares them.
If the depth-derived landmarks and the RGB-derived landmarks describe different facial geometries, a deepfake is present. This comparison happens in under 200 milliseconds and is mathematically precise - not a subjective visual judgment.
Cross-validation checks:
- Eye socket spacing: depth vs RGB
- Nose bridge angle: depth vs RGB
- Jaw width at mandibular angle: depth vs RGB
- Brow ridge depth: depth vs RGB
- Cheekbone prominence: depth vs RGB
- Chin projection: depth vs RGB
If any measurement diverges by more than 2 standard deviations between the depth and RGB sources, the system flags a deepfake with high confidence.
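The two-standard-deviation check can be sketched as follows. The measurement names and population standard deviations are illustrative placeholders, not POY Verify's calibration data:

```python
# Illustrative population standard deviations per measurement (mm / degrees);
# a real system would use calibrated values.
POPULATION_SD = {
    "eye_socket_spacing": 3.2,
    "nose_bridge_angle": 4.1,
    "jaw_width": 5.0,
}

def is_deepfake(depth_metrics: dict, rgb_metrics: dict,
                threshold: float = 2.0) -> bool:
    """Flag a deepfake if any depth-derived measurement diverges from its
    RGB-derived counterpart by more than `threshold` standard deviations."""
    for name, sd in POPULATION_SD.items():
        z = abs(depth_metrics[name] - rgb_metrics[name]) / sd
        if z > threshold:
            return True
    return False
```

For a real face both sensor paths describe the same geometry and every z-score stays near zero; a face swap pushes at least one measurement far outside the threshold.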
POY Verify cross-validates all four layers.
What About Other Attack Types?
| Attack Type | Description | RGB Camera | POY Verify | Why POY Wins |
|---|---|---|---|---|
| Real-time face swap | Live deepfake overlay on camera feed | Fooled | Caught | Depth/RGB mismatch + camera injection detected |
| Pre-recorded video | Playing a video of the target on a screen | Sometimes caught | Caught | Zero depth data + IR detects screen surface |
| Printed photo | Holding a high-res photo in front of camera | Usually caught | Caught | Zero depth data + no IR skin properties |
| 3D-printed mask | Physical replica of target's face | Fooled | Caught | No IR blood flow + no micro-movements |
| Silicone mask | Professional-grade prosthetic mask | Fooled | Caught | Wrong IR material properties + no blood pulse |
| Injected video feed | Virtual camera driver with synthetic frames | Fooled | Caught | Camera attestation detects modified pipeline |
| AI-generated face | Entirely synthetic person that never existed | Fooled | Caught | No depth data + no IR properties + injection detected |
The Key Insight: Sensors Cannot Be Deepfaked
Deepfake technology is advancing rapidly. Visual quality will continue to improve. Within a few years, AI-generated faces may be completely indistinguishable from real faces in photographs and video.
But deepfakes operate in the digital domain - they modify pixels. They cannot modify physics. They cannot change how infrared light interacts with biological tissue. They cannot create 3D depth data from a 2D rendering. They cannot fool a hardware sensor that measures the physical world.
This is POY Verify's architectural advantage. By verifying humanity through hardware sensors rather than pixel analysis, the system is immune to improvements in deepfake visual quality. The better deepfakes get at fooling eyes, the more valuable hardware-based verification becomes.
"You can fake what a camera sees. You cannot fake what a depth sensor measures. You cannot fake what an infrared emitter detects. You cannot fake the physics of biological tissue. That is why hardware-based liveness detection is the only approach that survives the deepfake era."
All of This Happens With Zero Data Collection
Every layer of deepfake defense described above runs entirely on the user's device inside the Secure Enclave:
- No facial images are transmitted to any server
- No depth maps leave the device
- No infrared data is uploaded
- No biometric templates are stored centrally
- Only a SHA-256 hash - a one-way mathematical fingerprint - is generated and used for verification
The system proves you are a real, unique human without ever seeing your face on a server. Your biometric data never leaves the physical chip inside your phone. There is no database to breach because no biometric data exists anywhere but your device.
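Conceptually, the one-way fingerprint works like this minimal sketch. The salt handling and function shape are assumptions for illustration; the production key-derivation scheme inside the Secure Enclave is not shown here:

```python
import hashlib

def biometric_fingerprint(template: bytes, salt: bytes) -> str:
    """One-way fingerprint of an on-device biometric template. Only this hash
    ever leaves the device; SHA-256 cannot be inverted to recover the
    template itself."""
    return hashlib.sha256(salt + template).hexdigest()
```

The same template always yields the same fingerprint (enabling uniqueness checks), while different templates yield unrelated fingerprints and neither reveals the underlying biometric data.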
Read the full POY Protocol Whitepaper for the complete technical architecture, or visit the Trust Center for compliance and security details.
Frequently Asked Questions
Can a real-time deepfake face swap fool POY Verify?
Deepfake face swap apps can fool software-only verification that only analyzes the RGB camera feed. They cannot fool hardware-based verification that uses 3D depth sensors, infrared analysis, and camera attestation. POY Verify uses all three hardware layers, making deepfake face swaps detectable regardless of how realistic the visual rendering appears.
How does 3D depth sensing detect a deepfake?
3D depth sensors measure the physical distance to every point on a face using structured light or time-of-flight lasers. A deepfake filter modifies the RGB camera image but cannot change the depth data because depth is measured by a separate physical sensor. The depth map shows the real person's bone structure underneath the deepfake overlay, creating a mismatch that POY Verify detects instantly.
What if someone plays a deepfake video on a screen?
A screen displaying a deepfake video has zero depth variation - it is a flat surface. The 3D depth sensor measures a flat plane instead of the contours of a human face. Additionally, infrared sensors detect that the surface reflects IR light like a screen, not like human skin with sub-surface blood flow. Both signals immediately fail the liveness check.
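The flatness test described in this answer can be sketched as a plane fit over the depth map. The millimeter tolerance is an illustrative assumption:

```python
import numpy as np

def is_flat_surface(depth_map: np.ndarray, tolerance_mm: float = 2.0) -> bool:
    """A screen presents an (almost) planar depth map. Fit a plane by least
    squares and inspect the residuals: a real face deviates from any plane by
    centimeters, a screen by well under a few millimeters."""
    rows, cols = np.indices(depth_map.shape)
    A = np.column_stack([rows.ravel(), cols.ravel(), np.ones(depth_map.size)])
    coef, *_ = np.linalg.lstsq(A, depth_map.ravel(), rcond=None)
    residuals = depth_map.ravel() - A @ coef
    return float(np.abs(residuals).max()) < tolerance_mm
```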
Can a 3D-printed or silicone mask fool POY Verify?
3D-printed and silicone masks defeat RGB cameras but fail infrared analysis. Human skin has unique IR properties caused by sub-surface blood flow, hemoglobin absorption, and tissue layering. Masks made of plastic, silicone, resin, or any non-biological material reflect infrared light differently. Additionally, masks cannot produce involuntary micro-movements like blood pulse and micro-saccadic eye movements.
Experience Hardware-Based Verification
Try POY Verify's biometric liveness detection yourself. 30 seconds. Zero data collected. Deepfake-proof by design.
VERIFY ME NOW
Or read the technical whitepaper for the full cryptographic architecture.