Multi-Modal AI features are Coming Soon and not yet available.
Transcription
Bota transcribes audio recordings using Automatic Speech Recognition (ASR). Transcription is asynchronous: you submit a job and receive results via webhook or polling.
- Upload a recording: see Quickstart
- Create a transcription: specify the recording and an optional language hint
- Wait for completion: poll or listen for the transcription.completed webhook
- Retrieve results: structured output with timestamps, speaker labels, and confidence scores
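The polling path in the steps above can be sketched as follows. This is a minimal sketch, not Bota's client library: `fetch_status` is a hypothetical callable you would implement with your own HTTP request, and the `status` and `error` fields and the `completed` / `failed` values are assumptions modeled on the webhook event names.

```python
import time
from typing import Callable


def wait_for_transcription(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal state or the timeout expires.

    fetch_status is a hypothetical callable returning the job as a dict.
    The "status" key and its values are assumptions, not documented fields.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(
                f"transcription failed: {job.get('error', 'unknown error')}"
            )
        if time.monotonic() >= deadline:
            raise TimeoutError("transcription did not finish before the timeout")
        # Wait before asking again so the API is not hammered.
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays. If you can receive webhooks, listening for the completion event avoids polling altogether.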
Output Format
A completed transcription includes a full text string and time-stamped segments with speaker diarization:

| Field | Description |
|---|---|
| start / end | Timestamps in seconds |
| text | Transcribed text for this segment |
| speaker | Speaker label (e.g., Speaker 1, Speaker 2) |
| confidence | Per-segment confidence score (0–1) |
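To illustrate how the segment fields fit together, here is a small sketch that renders segments as speaker-labelled lines and drops low-confidence ones. The per-segment field names come from the table above; the top-level `text` and `segments` keys and the sample values are assumptions made for illustration, so check the API reference for the actual response shape.

```python
def format_segments(transcription: dict, min_confidence: float = 0.0) -> list[str]:
    """Return one '[start-end] speaker: text' line per sufficiently confident segment."""
    lines = []
    for seg in transcription.get("segments", []):
        if seg["confidence"] < min_confidence:
            continue  # skip segments the ASR engine was unsure about
        lines.append(
            f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['speaker']}: {seg['text']}"
        )
    return lines


# Illustrative data only; the real payload may be shaped differently.
sample = {
    "text": "Hello there. Hi, how are you?",
    "segments": [
        {"start": 0.0, "end": 1.2, "text": "Hello there.",
         "speaker": "Speaker 1", "confidence": 0.95},
        {"start": 1.4, "end": 2.8, "text": "Hi, how are you?",
         "speaker": "Speaker 2", "confidence": 0.62},
    ],
}

for line in format_segments(sample, min_confidence=0.8):
    print(line)  # prints "[0.0-1.2] Speaker 1: Hello there."
```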
ASR Providers
| Provider | Best For |
|---|---|
| whisper | General purpose, multilingual support |
| deepgram | Low latency, real-time processing |
| assemblyai | Speaker diarization, content analysis |
| elevenlabs | High accuracy transcription |
Language Support
Transcription supports 50+ languages. Provide a language hint (e.g., en, es, zh) to improve accuracy, or omit it for automatic detection.
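One way to handle the optional hint is to leave the field out of the request entirely when no language is known, rather than sending an empty value. The sketch below assumes hypothetical field names (`recording_id`, `language`); consult the API reference for the real ones.

```python
from typing import Optional


def build_transcription_request(recording_id: str, language: Optional[str] = None) -> dict:
    """Build a request body, omitting the language hint when none is given.

    The key names are hypothetical and used for illustration only.
    """
    body = {"recording_id": recording_id}
    hint = (language or "").strip().lower()
    if hint:
        body["language"] = hint  # e.g. "en", "es", "zh"
    return body
```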
Transcription API Reference
Summarization
Bota generates structured summaries from transcriptions using LLM providers. Use built-in templates for common formats (SOAP notes, sales calls, legal memos) or provide custom prompts.
Templates vs Custom Prompts
- Template — Use a built-in template for standardized, structured output. Best for repeatable workflows.
- Custom Prompt — Provide your own instructions for flexible, ad-hoc summarization.
Provide either a template or a custom prompt, not both.
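The either-or rule is easy to enforce on the client before a request is sent. The following is a minimal sketch under stated assumptions: the key names (`transcription_id`, `template`, `prompt`) are hypothetical, and only the template ID used in the usage example comes from the template reference on this page.

```python
from typing import Optional


def build_summary_request(
    transcription_id: str,
    template: Optional[str] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Build a request body containing exactly one of template or prompt.

    The key names are hypothetical and used for illustration only.
    """
    # True when both are missing or both are supplied; either case is invalid.
    if (template is None) == (prompt is None):
        raise ValueError("provide either a template or a custom prompt, not both")
    body = {"transcription_id": transcription_id}
    if template is not None:
        body["template"] = template
    else:
        body["prompt"] = prompt
    return body
```

For example, `build_summary_request("tr_1", template="tmpl_sales_call")` returns a body with a `template` key, while passing both arguments, or neither, raises `ValueError`.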
Built-in Templates
General Notes
Ideal for meetings, discussions, and team syncs. Extracts key points, action items, decisions, and participants.
Sales Call
Captures pain points, budget, next steps, and deal sentiment from sales conversations.
Clinical SOAP
Generates structured SOAP notes from healthcare encounters.
Legal Memo
Summarizes legal proceedings, depositions, and client meetings into structured memos with facts, issues, and analysis.
Template Reference
| Template | ID | Use Case |
|---|---|---|
| General Notes | tmpl_general_notes | Meetings, discussions |
| Sales Call | tmpl_sales_call | Sales conversations |
| Clinical SOAP | tmpl_clinical_soap | Healthcare encounters |
| Legal Memo | tmpl_legal_memo | Legal proceedings |
LLM Providers
| Provider | Best For |
|---|---|
| gemini | Fast processing, good general quality |
| openai | High accuracy, structured output |
| claude | Nuanced analysis, long transcripts |
Summarization API Reference
Multi-Modal Analysis
Multi-Modal extends the pipeline with visual context from the Bota Pin Pro. The Pin Pro captures images and video alongside audio, enabling AI that understands both what was said and what was seen.
Media Types
| Type | Format | Best For |
|---|---|---|
| Images | JPEG, PNG | Periodic snapshots, whiteboard captures, document scans, equipment photos |
| Video clips | MP4 (H.264) | Short scene captures, demonstrations, walkthroughs |

| Trigger | Description |
|---|---|
| Periodic | Capture at fixed intervals (e.g., every 30 seconds, every 5 minutes) |
| Motion | Capture when significant scene change is detected |
| Manual | Capture on button press |
Video Summary
Generates a visual summary from video clips by identifying key frames, generating captions, and producing a timeline of visual highlights. Useful for quickly reviewing long recordings without watching the entire video. See Create Video Summary for the API reference.
Use Cases
Field Inspection
Inspector narrates findings while the camera captures equipment and damage. Video summary highlights key visual moments alongside the transcript.
Clinical Encounter
Doctor-patient conversation captured alongside video of the examination. Transcript + video summary provide a complete record.
Meeting + Whiteboard
Discussion transcript combined with video of whiteboard diagrams. Video summary extracts key frames for quick review.
Training Session
Trainer’s spoken instructions paired with video of demonstrations. Video summary creates a visual timeline of the session.
End-to-End Flow
A typical multi-modal workflow:
- Record: The end user wears the Pin Pro and presses the button to start. Audio records continuously; the camera captures video.
- Upload: The device uploads audio and video via the Upload URL endpoint (repeated per file), then calls Complete Upload.
- Transcribe: Create a transcription from the audio.
- Summarize: Create a summary from the transcript.
- Video Summary: Create a video summary for visual highlights.
- Deliver: Results are delivered via webhook or polling.
BYO API Keys
All AI processing supports bringing your own provider API keys. This gives you control over costs, rate limits, and model selection.
- Register your provider API key through the Integrations API
- Test the key to verify it works
- Bota automatically uses your key when you select that provider
Webhooks
| Event | Description |
|---|---|
| transcription.completed | Transcription finished successfully |
| transcription.failed | Transcription encountered an error |
| summary.completed | Summary generated successfully |
| summary.failed | Summary encountered an error |
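A receiver for these events might route on the event name, as in the sketch below. The four event names come from the table above. The payload shape (a JSON object with `type` and `data.id`) is an assumption for illustration, as are the handler return values.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], str]

# Event names are from the webhook table; the handler bodies are placeholders.
HANDLERS: Dict[str, Handler] = {
    "transcription.completed": lambda data: f"fetch transcription {data.get('id')}",
    "transcription.failed": lambda data: f"alert on transcription {data.get('id')}",
    "summary.completed": lambda data: f"fetch summary {data.get('id')}",
    "summary.failed": lambda data: f"alert on summary {data.get('id')}",
}


def handle_webhook(raw_body: str) -> str:
    """Parse a webhook body and dispatch on its event type.

    Assumes a payload like {"type": "...", "data": {"id": "..."}}, which is
    a hypothetical shape rather than a documented one.
    """
    event = json.loads(raw_body)
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        return "ignored"  # unknown events are acknowledged but not acted on
    return handler(event.get("data", {}))
```

Returning a short string keeps the sketch testable; a real handler would more likely enqueue work and respond quickly.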
Related
- Bota Pin Pro — Multi-modal hardware
- Upload API Reference — File upload documentation
- Streaming Upload — Upload while recording for faster processing

