Verification compares a new voice recording against a user’s stored voiceprint. Voxmind runs two parallel checks: a voiceprint match (returning a confidence score) and a deepfake detection check (identifying synthetic or replayed audio). Both results are included in the webhook payload.
Like enrollment, verification is asynchronous. You receive a 202 Accepted immediately, and your webhook receives the full result within 1–2 seconds on average.
Always check deepfake_detected before granting access. A voice clone can still produce a high match score, so match_score alone is not enough to grant access. The deepfake flag is your definitive signal: any result with deepfake_detected: true should be treated as a security event and logged, regardless of the match_score.
The voice recording from the current authentication attempt, base64-encoded. Same format requirements as enrollment: WAV or MP3, minimum 16kHz, at least 3 seconds of speech.
A unique identifier you generate for this verification request. Returned in the webhook payload so you can correlate the async result with the correct user session.
Your user’s identifier — must exactly match the external_id used during their enrollment. Voxmind uses this to retrieve the correct voiceprint for comparison.
The primary language of the verification audio. The user can speak a different language than they enrolled in — Voxmind handles this — but specifying the correct language improves accuracy.
Optional. The unique identifier of the device being used for this verification attempt. When provided, this is compared against the device fingerprint used at enrollment. Mismatches are flagged in analytics and can be used to detect account sharing or device-switching attacks.
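The request fields above can be assembled as in the following minimal sketch. Only external_id is a field name confirmed by this page; the keys audio, request_id, language, and device_id are hypothetical placeholders, as are the helper names check_wav and build_verification_request, so substitute the names from the actual request schema. The pre-check handles WAV input only (MP3 would need a separate decoder), and it measures total clip length, which is only a lower-bound test for "3 seconds of speech".

```python
import base64
import io
import uuid
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # documented minimum sample rate
MIN_DURATION_S = 3.0         # documented minimum; checked against total clip length


def check_wav(audio: bytes) -> None:
    """Raise ValueError if a WAV clip cannot meet the documented minimums."""
    with wave.open(io.BytesIO(audio), "rb") as clip:
        rate = clip.getframerate()
        duration = clip.getnframes() / rate
    if rate < MIN_SAMPLE_RATE_HZ:
        raise ValueError(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if duration < MIN_DURATION_S:
        raise ValueError(f"clip is {duration:.1f}s, need at least {MIN_DURATION_S}s")


def build_verification_request(audio, external_id, language, device_id=None):
    """Build a request body as a dict. All keys except external_id are assumed names."""
    check_wav(audio)
    body = {
        "audio": base64.b64encode(audio).decode("ascii"),  # hypothetical key
        "request_id": str(uuid.uuid4()),                   # hypothetical key
        "external_id": external_id,  # must match the value used at enrollment
        "language": language,                              # hypothetical key
    }
    if device_id is not None:
        body["device_id"] = device_id                      # hypothetical key
    return body
```

Generating the request identifier with uuid4 keeps it unique per attempt; store it alongside the user session before sending so the webhook result can be matched back to that session.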
The verification outcome. One of verified, rejected, or inconclusive. An inconclusive result means audio quality was insufficient for a reliable determination — treat it as a rejection and prompt the user to try again.
A confidence score between 0.0 and 1.0 representing how closely the submitted audio matches the enrolled voiceprint. Scores above 0.85 indicate a strong match. Your application should define a threshold appropriate to its security requirements; higher-stakes use cases should use a higher threshold.
true if the audio was identified as AI-generated, synthetic, or replayed. This runs as a parallel check to voiceprint matching and is always present in the payload. A value of true should result in immediate rejection and fraud logging, regardless of match_score.
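One way to combine the three response fields into an access decision is sketched below. The field names deepfake_detected and match_score come from this page; the key status for the verification outcome is an assumption, as are the function name decide and its return labels. The 0.85 default simply reuses the "strong match" level mentioned above and is not a recommended setting.

```python
from typing import Any, Mapping

DEFAULT_THRESHOLD = 0.85  # "strong match" level from the text; tune per use case


def decide(result: Mapping[str, Any], threshold: float = DEFAULT_THRESHOLD) -> str:
    """Return one of 'deny_fraud', 'retry', 'deny', or 'allow'."""
    # Check the deepfake flag first. Anything other than an explicit False
    # (including a missing field) fails closed.
    if result.get("deepfake_detected") is not False:
        return "deny_fraud"  # caller should log this as a security event

    status = result.get("status")  # hypothetical key for the outcome field
    if status == "inconclusive":
        return "retry"  # treated as a rejection; prompt the user to try again
    if status != "verified":
        return "deny"

    # Apply the application's own threshold on top of the reported outcome.
    score = result.get("match_score")
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return "deny"
    return "allow" if score >= threshold else "deny"
```

Checking the deepfake flag before anything else means a high match_score can never override it, and treating a missing flag as a failure keeps a malformed payload from granting access.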