Minimum audio requirements
Voxmind accepts WAV and MP3 audio files. A recording needs at least 3 seconds of clean speech, and around 5 seconds gives the best voiceprint accuracy. The sample rate must be at least 16kHz. Most modern recording APIs on web and mobile default to 44.1kHz or 48kHz, which is fine, because Voxmind downsamples internally. Signal-to-noise ratio matters far more than length. Three seconds of clean speech in a quiet room produces a better voiceprint than ten seconds of speech over constant background noise. Noise makes it harder for the phoneme-frequency extraction pipeline to isolate clean phoneme boundaries.
Designing your enrollment UX
The enrollment experience matters for two reasons. It affects audio quality, because users who understand what you need speak more naturally and clearly. It also affects completion rates, because users who find the process confusing abandon it.
Tell users what to say. Even though Voxmind is text-independent, users benefit from a prompt. “Please say your full name and confirm today’s date” works well. It’s natural, it generates varied phoneme content, and it gives users something specific to focus on rather than feeling like they’re talking into the void.
Use a visual indicator to show that recording is active. A simple animated waveform or countdown timer signals that the system is listening. Without one, users often speak too softly or stop speaking before the required duration.
Validate before submitting. Record the audio client-side and run quick checks before you submit to the API. Check the duration (is it at least 3 seconds?) and the amplitude (is speech actually present?). This catches the common failure modes, such as a user who didn’t speak or a recording that was too brief, without wasting an API call.
Offer a re-enrollment path. Circumstances change. A user who enrolled on a phone in a quiet room might need to re-enroll when you ship a desktop app. Make it easy to update their voiceprint from your account settings flow.
Handling the async response
Enrollment returns HTTP 202 (Accepted) immediately. Voxmind then delivers the result to your configured webhook endpoint when processing completes, typically within 1–3 seconds. The webhook payload indicates whether enrollment succeeded and whether the voiceprint quality meets the threshold for reliable verification.
The enrollment quality score can fall below the minimum threshold. This can happen with very short audio, very noisy recordings, or audio with no clearly detectable speech, and Voxmind flags it in the webhook payload. Handle this gracefully: rather than failing silently, tell the user the enrollment didn’t capture clearly and prompt them to try again.
Re-enrollment and voiceprint updates
You can submit a new enrollment for an external_id at any time, and the new recording replaces the existing voiceprint. Enrollments do not accumulate. Each user has a single active voiceprint, tied to their external_id within your organisation.
This is intentional: maintaining a single current voiceprint keeps the matching model simple and avoids the complexity of managing voiceprint versions. If a user’s voice characteristics change significantly — which is rare but can happen after surgery, illness, or significant aging — re-enrollment resolves it cleanly.
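On your side, the single-voiceprint model maps naturally onto one status record per external_id, and each webhook overwrites that record. The Python sketch below assumes a JSON webhook body with the fields `external_id`, `success` and `quality_ok`. These field names are hypothetical because the real payload schema is not given here, so check Voxmind's webhook documentation before relying on them. The sketch also shows the retry path for enrollments flagged as below the quality threshold.

```python
import json
from dataclasses import dataclass


@dataclass
class EnrollmentStatus:
    """Latest enrollment outcome your app has seen for one user."""
    external_id: str
    success: bool
    quality_ok: bool


# One entry per external_id, mirroring Voxmind's single active voiceprint:
# each new webhook overwrites the previous entry instead of adding to a history.
statuses: dict[str, EnrollmentStatus] = {}


def handle_enrollment_webhook(body: bytes) -> str:
    """Record the outcome and return the next UI action for this user.

    ASSUMPTION: "external_id", "success" and "quality_ok" are hypothetical
    field names; replace them with the fields in Voxmind's real payload.
    """
    payload = json.loads(body)
    status = EnrollmentStatus(
        external_id=str(payload["external_id"]),
        success=bool(payload.get("success", False)),
        quality_ok=bool(payload.get("quality_ok", False)),
    )
    statuses[status.external_id] = status  # replace, never append

    if status.success and status.quality_ok:
        return "enrolled"
    # Failed or below the quality threshold: ask the user to record again
    # rather than failing silently.
    return "prompt_retry"
```

The in-memory dict stands in for your user table. A production handler would also authenticate the incoming request, which is omitted here.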

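Re-enrollment goes through the same recording flow as a first enrollment, so the minimums and pre-submit checks described earlier apply here too. The following is a minimal sketch, shown in Python for brevity, and the same logic ports to a web or mobile client. It assumes 16-bit PCM WAV input; MP3 would need decoding first. `check_samples` enforces the 16kHz and 3-second minimums. It also uses per-frame RMS energy as a crude speech-presence test, which is not a real voice-activity detector. The frame length, `RMS_THRESHOLD` and `MIN_VOICED_RATIO` are hypothetical values to tune for your microphones, not Voxmind parameters.

```python
import array
import io
import math
import sys
import wave

# Minimums from the audio requirements above.
MIN_SAMPLE_RATE = 16_000  # Hz
MIN_DURATION_S = 3.0      # seconds

# HYPOTHETICAL tuning values, not Voxmind parameters.
FRAME_MS = 30             # analysis frame length
RMS_THRESHOLD = 500       # "speech present" level on the 16-bit scale
MIN_VOICED_RATIO = 0.3    # share of frames that must exceed RMS_THRESHOLD


def check_samples(samples: array.array, rate: int) -> list[str]:
    """Return problems with mono 16-bit samples; an empty list means OK to submit."""
    problems = []
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")

    duration = len(samples) / rate
    if duration < MIN_DURATION_S:
        problems.append(f"recording is {duration:.1f} s; at least {MIN_DURATION_S:.0f} s is needed")

    # Crude speech-presence test: fraction of fixed-size frames whose RMS
    # energy clears a threshold.
    frame_len = max(1, rate * FRAME_MS // 1000)
    voiced = total = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        total += 1
        if rms >= RMS_THRESHOLD:
            voiced += 1
    if total == 0 or voiced / total < MIN_VOICED_RATIO:
        problems.append("little or no speech detected")
    return problems


def check_enrollment_wav(data: bytes) -> list[str]:
    """Decode a 16-bit PCM WAV held in memory and run check_samples on channel 0."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())
    if width != 2:
        return ["expected 16-bit PCM samples"]
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return check_samples(samples[::channels], rate)
```

Returning a list of problems, rather than a single pass/fail flag, lets the UI tell the user exactly what to fix before they record again.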
