Before you begin: You’ll need an API token. If you don’t have one yet, sign up at developers.voxmind.ai and your trial token will be emailed to you within a few seconds. The trial token is valid for your sandbox environment and doesn’t require a credit card.
Step 1: Get your organisation ID
Every resource in the Voxmind API is scoped to your organisation. Think of it as your tenant identifier: all your users, voiceprints, and settings live under it. After signup, your organisation ID is included in your welcome email. You can also retrieve it with an API call.
Keep your org_id handy: you’ll use it in every subsequent request.
Step 2: Enroll a user
Enrollment creates a voiceprint for a user in your system. You send a voice recording as a binary blob alongside your user’s identifier (external_id — this is your user’s ID in your own database, so you control the format).
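As a rough illustration of what "a binary blob alongside your user’s identifier" can look like in practice, here is a minimal Python sketch that builds (but does not send) an enrollment request. The base URL, the URL path, and the header layout are all placeholders invented for this example, not documented Voxmind values; check the API reference for the real ones.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: placeholder base URL and path, for illustration only.
BASE_URL = "https://api.example.invalid"


def build_enrollment_request(token, org_id, external_id, audio_bytes,
                             content_type="audio/wav"):
    """Build a POST request carrying the raw audio as the request body."""
    # external_id comes from your own database, so escape it for URL safety.
    safe_id = urllib.parse.quote(external_id, safe="")
    url = f"{BASE_URL}/organisations/{org_id}/users/{safe_id}/enroll"  # hypothetical path
    return urllib.request.Request(
        url,
        data=audio_bytes,  # the voice recording, sent as-is
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": content_type,
        },
    )


req = build_enrollment_request("my-token", "org_123", "user/42", b"RIFF...")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it. The sketch stops short of that because the real endpoint is not shown here.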
Why async? Voiceprint generation involves running audio through our ML pipeline. It typically completes in 1–3 seconds, but we return immediately so your application doesn’t block. Voxmind calls your webhook with the result when it is ready. See the Webhooks guide for setup.
Audio requirements for enrollment
For best results, the voice recording should be a WAV or MP3 file, at least 3 seconds long (5 seconds is ideal), recorded at a minimum sample rate of 16 kHz. The user can say anything: Voxmind is text-independent. Background noise is handled by our preprocessing pipeline, but quieter environments produce more accurate voiceprints.
Step 3: Verify a user
Once a user is enrolled, you can verify them at any time. The verification call is structurally identical to enrollment: you send a new voice recording with the same external_id that was used during enrollment. Voxmind finds their stored voiceprint and compares it.
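Because both enrollment and verification upload a recording, it can help to check a file locally before sending it. The sketch below uses Python’s standard wave module to test a WAV file against the 3-second and 16 kHz minimums given for enrollment. It assumes the same limits are sensible for verification recordings, and it does not handle MP3, which would need a third-party decoder. The function and constant names are illustrative.

```python
import io
import wave

MIN_SECONDS = 3.0          # minimum length given in the audio requirements
MIN_SAMPLE_RATE = 16_000   # minimum sample rate given in the audio requirements


def check_wav(data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means the file passes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    problems = []
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if seconds < MIN_SECONDS:
        problems.append(f"duration {seconds:.2f} s is below {MIN_SECONDS} s")
    return problems


def make_silent_wav(seconds: float, rate: int) -> bytes:
    """Generate a mono 16-bit silent WAV in memory, for trying the checker."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(check_wav(make_silent_wav(5, 16_000)))  # passes both checks
print(check_wav(make_silent_wav(1, 8_000)))   # fails both checks
```

A check like this only covers the container-level properties. It says nothing about background noise or speech content, which the guide leaves to the preprocessing pipeline.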
Step 4: Understand the result
When Voxmind calls your webhook, the payload will contain three key pieces of information: whether the voice matched the enrolled voiceprint, the confidence score for that match, and whether the audio was flagged as a deepfake or replay attack. Your application logic should combine all three signals. A passing score alone isn’t enough: you should reject any verification attempt where deepfake_detected is true, even if the voice score is technically above your threshold. An attacker using a high-quality voice clone might produce a reasonable match score, and the deepfake flag is your last line of defence.
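One way to combine the three signals is a single decision function that checks the deepfake flag first. In the sketch below, deepfake_detected is the field named in this guide, while matched and score are assumed names for the other two signals, and the 0.8 threshold is an arbitrary example value. Confirm the real payload schema in the Webhooks guide before relying on any of them.

```python
from typing import Any, Mapping


def accept_verification(payload: Mapping[str, Any], threshold: float) -> bool:
    """Accept only if no attack was flagged, the voice matched, and the score clears the threshold."""
    # Fail closed: if the flag is missing, treat the attempt as suspicious.
    if payload.get("deepfake_detected", True):
        return False
    # "matched" and "score" are ASSUMED field names, not confirmed by this guide.
    if not payload.get("matched", False):
        return False
    return float(payload.get("score", 0.0)) >= threshold


THRESHOLD = 0.8  # example value only; tune for your own risk tolerance

genuine = {"matched": True, "score": 0.93, "deepfake_detected": False}
cloned = {"matched": True, "score": 0.95, "deepfake_detected": True}

print(accept_verification(genuine, THRESHOLD))  # True
print(accept_verification(cloned, THRESHOLD))   # False
```

Checking the flag before the score keeps the rejection rule independent of the threshold, so loosening the threshold later cannot let a flagged attempt through.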

