
Voice mode

Voice mode turns a session into a conversation. You speak. I listen, think, and speak back. The chat thread keeps a written transcript so nothing is lost when you finish.

When you can use it

Voice mode is available inside any practice session — focus session, free practice, micro, milestone validation, or gauntlet — wherever the chat composer renders.

It runs on web only, and only in browsers that support the Opus codec via MediaRecorder. In practice that means Chrome or Edge. On Safari, Firefox, or any mobile browser without Opus support, the voice-mode strip shows the verbatim message:

Voice mode requires Chrome or Edge.

…and the toggle is suppressed. You can still type.
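A minimal sketch of how a capability check like this could look, assuming the app probes `MediaRecorder.isTypeSupported` with an Opus MIME type. The MIME string `audio/webm;codecs=opus`, the `checkVoiceSupport` function, and the `VoiceSupport` shape are illustrative assumptions, not the app's documented internals; only the message text comes from this page. The probe is passed in as a parameter so the logic can run outside a browser.

```typescript
// Hypothetical helper: decides whether to show the voice toggle.
// ASSUMPTION: the exact MIME string the app probes is not documented here.
type MimeProbe = (mimeType: string) => boolean;

const ASSUMED_OPUS_MIME = "audio/webm;codecs=opus";
const UNSUPPORTED_MESSAGE = "Voice mode requires Chrome or Edge.";

interface VoiceSupport {
  showToggle: boolean;
  message: string | null;
}

function checkVoiceSupport(probe: MimeProbe | undefined): VoiceSupport {
  // A missing probe (no MediaRecorder at all) counts as unsupported.
  const supported = probe !== undefined && probe(ASSUMED_OPUS_MIME);
  return supported
    ? { showToggle: true, message: null }
    : { showToggle: false, message: UNSUPPORTED_MESSAGE };
}

// In a browser, the probe would typically be wired up like this:
//   const probe = typeof MediaRecorder !== "undefined"
//     ? (m: string) => MediaRecorder.isTypeSupported(m)
//     : undefined;
//   const support = checkVoiceSupport(probe);
```

Typing stays available in either case; the check only decides whether the toggle renders.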

The native iOS and Android apps don't use the browser voice path. Voice on native is a separate roadmap item.

How to start a turn

Inside a session, the strip above the composer shows a green pill button: Start voice mode.

Tap it and:

  1. Your browser asks for microphone permission. Grant it. Without it, voice can't start.
  2. The fullscreen voice orb takes over the chat. The breathing circle appears in the center.
  3. The orb cycles through states as the turn runs: connecting, listening, thinking, ARIA speaking.

When you're done with the turn, tap the End voice pill at the bottom of the orb. The orb dismisses and you return to the chat thread with both your turn and my response transcribed in.

The orb states

The orb is the only UI you see during a turn. Its color and animation tell you what's happening:

| State | Color | What it means |
| --- | --- | --- |
| Connecting | Muted grey | I'm setting up the connection. |
| Requesting microphone | Muted grey | Browser permission prompt is up. |
| Listening | Green, pulsing with your voice | Your microphone is live. Speak. |
| Thinking | Gold, gentle breathe | I have your audio and I'm composing a response. |
| ARIA speaking | Green, pulsing with playback | Audio is playing back. Don't talk over it. |
| Wrapping up | Muted grey | The session is closing. |
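The states above can be sketched as a small lookup table. This is an illustration only: the `OrbState` identifiers, the `ORB_APPEARANCE` map, and the `orbScalesWithAudio` helper are hypothetical names, and the animation of the muted-grey states is marked `unspecified` because this page doesn't describe it.

```typescript
// Hypothetical model of the orb states listed in the table.
type OrbState =
  | "connecting"
  | "requesting-microphone"
  | "listening"
  | "thinking"
  | "aria-speaking"
  | "wrapping-up";

interface OrbAppearance {
  color: "muted grey" | "green" | "gold";
  // "unspecified": the page gives no animation for this state.
  animation: "pulse-with-audio" | "gentle-breathe" | "unspecified";
}

const ORB_APPEARANCE: Record<OrbState, OrbAppearance> = {
  "connecting": { color: "muted grey", animation: "unspecified" },
  "requesting-microphone": { color: "muted grey", animation: "unspecified" },
  "listening": { color: "green", animation: "pulse-with-audio" },
  "thinking": { color: "gold", animation: "gentle-breathe" },
  "aria-speaking": { color: "green", animation: "pulse-with-audio" },
  "wrapping-up": { color: "muted grey", animation: "unspecified" },
};

// True for the two states in which the orb follows live audio amplitude.
function orbScalesWithAudio(state: OrbState): boolean {
  return ORB_APPEARANCE[state].animation === "pulse-with-audio";
}
```

Listening and ARIA speaking share a color, so the only way to tell them apart at a glance is whose audio the orb is following.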

The orb scales with real audio amplitude when you're talking and when I'm talking back, so you have a live signal that the mic is registering and that playback is actually moving.

A live transcript appears below the orb: your interim words in italic grey, finalized text in solid grey. The two-stage display is intentional, not a delay artifact. It lets you see what I'm hearing and correct yourself if I misheard.

Pricing

Voice is more expensive than text because each turn runs three pieces of infrastructure (speech-to-text, my reasoning, text-to-speech).

  • Each voice turn costs 3 credits.
  • The closing report (when you end the session) costs 5 credits.
  • Voice mode requires a balance of at least 30 credits to start. This is a floor — not the cost of the first turn — that exists so you have ~10 turns of headroom before the next paywall.

If your balance is below 30, the toggle button shows the verbatim message:

Voice mode needs 30 credits

…and is greyed out. Top up credits and the button activates.
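The arithmetic behind these numbers, as a rough sketch. The three constants come from this page; the function names are hypothetical, and the sketch assumes the 30-credit floor is checked only when voice mode starts, which the page does not state either way.

```typescript
// Credit figures taken from the pricing list above.
const TURN_COST = 3;
const CLOSING_REPORT_COST = 5;
const START_FLOOR = 30;

// ASSUMPTION: the floor gates starting voice mode, not each later turn.
function canStartVoice(balance: number): boolean {
  return balance >= START_FLOOR;
}

// Total credits for a session with the given number of voice turns.
function sessionCost(turns: number, includeReport: boolean): number {
  return turns * TURN_COST + (includeReport ? CLOSING_REPORT_COST : 0);
}

// Whole voice turns a balance covers, optionally holding back
// enough credits for the closing report.
function turnsAffordable(balance: number, reserveReport: boolean): number {
  const spendable = reserveReport ? balance - CLOSING_REPORT_COST : balance;
  return Math.max(0, Math.floor(spendable / TURN_COST));
}
```

At the 30-credit floor, `turnsAffordable(30, false)` gives 10 turns, which matches the ~10 turns of headroom. Holding back 5 credits for the closing report leaves 8 turns, and a 10-turn session with a report costs `sessionCost(10, true)`, or 35 credits.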

I charge before the turn starts, not after. If a turn fails mid-flight (network drop, server error, my model timing out), the credits are refunded automatically.
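The charge-first, refund-on-failure behavior can be sketched as follows. This is a simplified in-memory illustration, not the real billing code: `CreditLedger`, `TurnOutcome`, and `runChargedTurn` are hypothetical names, and a production system would do this server-side with durable transactions.

```typescript
// Hypothetical in-memory ledger, for illustration only.
interface CreditLedger {
  balance: number;
}

type TurnOutcome<T> =
  | { status: "completed"; result: T }
  | { status: "paywall" }
  | { status: "refunded"; error: unknown };

async function runChargedTurn<T>(
  ledger: CreditLedger,
  cost: number,
  turn: () => Promise<T>,
): Promise<TurnOutcome<T>> {
  // Not enough credits: the turn never starts and nothing is charged.
  if (ledger.balance < cost) {
    return { status: "paywall" };
  }
  // Charge before the turn starts.
  ledger.balance -= cost;
  try {
    const result = await turn();
    return { status: "completed", result };
  } catch (error) {
    // Network drop, server error, model timeout: give the credits back.
    ledger.balance += cost;
    return { status: "refunded", error };
  }
}
```

Charging up front means a turn never runs on credits that aren't there, and the `catch` branch keeps a failed turn from costing anything.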

If you run out of credits inside an active voice session, the next turn fails with a paywall message that the app surfaces as a modal. End voice, top up, and start a new turn.

What voice doesn't change

  • The session contract is identical. Same Socratic interrogation, same weakness-report format at the end, same scoring. Voice is a UI mode, not a different examiner.
  • Cognitive errors are still tagged. Wrong answers in voice get classified and written to your error log the same way they do in text.
  • Voice and text mix freely. You can do half a session in voice and half in text. The transcript stitches them together.
  • The chat thread is the source of truth. Whatever I say in voice ends up as a text bubble in the thread.

When to use voice

  • You think out loud better than you type. Saying "OK, the question is asking about… and I think the trap here is…" is faster and more natural than typing it. The reasoning is what I'm scoring.
  • You're walking, riding as a passenger, or doing something that keeps your hands busy. A voice micro session at lunch is realistic in a way a typed one isn't.
  • You want exam-day rhythm. Speaking your reasoning aloud is closer to how an oral exam or live-coding interview works.

When to skip voice

  • You're in a quiet space where you can't speak. Type.
  • The cert involves heavy notation, code, or formulas. Speech-to-text will mishear technical terms. Type those questions.
  • You're low on credits and the test is in 3 days. Stretch the budget on text sessions; reserve voice for moments it actually helps.

Privacy

Audio is streamed to the speech-to-text vendor, not stored. The transcribed text is what lives in your session record and feeds my responses. Audio responses are streamed from the text-to-speech vendor; they're not retained server-side after playback.

Microphone permission is browser-scoped — revoke it any time from your browser's site settings and voice mode will fall back to "Voice mode requires Chrome or Edge"-style messaging.

Tip

The first time you run a voice turn, take 30 seconds to test the orb on a softball question. Confirm the mic registers (the orb pulses with your voice), confirm you can hear the playback, and confirm the transcript is reading you correctly. If the orb doesn't pulse or you don't see your words appear, end the turn and check your microphone settings — running a real practice question through a broken setup wastes credits.