Live Two-Way Chat
Real-time conversational AI with natural speech flow - moving beyond forum-style turn-taking.
Vision
Current chatbot conversations are essentially forums with near-instant replies. Humans don't converse that way: nobody listens to someone speak, stops, thinks about the context, and then responds with an entire paragraph. Live Two-Way Chat aims to simulate natural human conversation instead.
The Problem
Traditional chat interfaces:
- Wait for complete user input before processing
- Generate entire responses at once
- Can't be interrupted or course-corrected mid-thought
- Feel robotic and turn-based
The Solution
A real-time bidirectional conversation where:
- Continuous transcription - Human voice is transcribed in small constant chunks in the background
- Predictive response preparation - AI analyzes context and pre-prepares replies, modifying them as new context arrives
- Natural interruption - AI decides when to speak:
  - Sometimes interrupting if an important point needs to be made
  - Sometimes waiting for a question to be asked
- Bidirectional listening - The chatbot listens even while speaking, taking into account what it was saying when interrupted
- Shared context window - A visual workspace for files and artifacts
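Predictive response preparation can be sketched as a draft reply that is recomputed each time a transcript chunk arrives. This is a minimal sketch, not a definitive design: `PredictiveResponder`, `draft_fn`, and `toy_draft` are hypothetical names, and `toy_draft` is a keyword rule standing in for a real language-model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PredictiveResponder:
    """Keeps a draft reply in sync with a growing transcript."""

    # Hypothetical stand-in for a language-model call: transcript -> draft reply.
    draft_fn: Callable[[str], str]
    chunks: List[str] = field(default_factory=list)
    draft: str = ""
    revisions: int = 0

    def on_chunk(self, text: str) -> str:
        """Fold a new transcript chunk into the context and re-draft."""
        self.chunks.append(text)
        new_draft = self.draft_fn(" ".join(self.chunks))
        if new_draft != self.draft:
            # The prepared reply changed because new context arrived.
            self.draft = new_draft
            self.revisions += 1
        return self.draft


def toy_draft(transcript: str) -> str:
    """Toy drafting rule, used only to make the sketch runnable."""
    if "error" in transcript:
        return "Let's look at the error message first."
    return "Go on, I'm listening."


responder = PredictiveResponder(draft_fn=toy_draft)
responder.on_chunk("so I ran the script")
responder.on_chunk("and it threw an error")
print(responder.draft)  # prints: Let's look at the error message first.
```

The draft is held, not spoken; deciding when to voice it is a separate turn-taking concern.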
Shared Context Window
A drag-and-drop workspace visible to both human and AI:
| Content Type | Behavior |
|---|---|
| Images | Displayed for user, visible to AI for analysis |
| Code | Displayed and editable by user, AI can view and modify |
| Documents | Shared context for conversation |
| Split view | Window can split to show 2+ files simultaneously |
The AI can:
- View what's in the window
- Edit code or text files
- Reference images in conversation
- Suggest changes visually
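One way to model the shared workspace is as a set of items plus a list of panes currently on screen. The sketch below is illustrative only: `Item`, `SharedWorkspace`, and their methods are hypothetical names, and treating images as view-only while code and documents are editable is an assumption drawn from the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: images are view-only; code and text documents can be edited.
EDITABLE_KINDS = {"code", "document"}


@dataclass
class Item:
    name: str
    kind: str                      # "image", "code", or "document"
    content: str
    history: List[str] = field(default_factory=list)  # authors of edits, in order


@dataclass
class SharedWorkspace:
    items: Dict[str, Item] = field(default_factory=dict)
    panes: List[str] = field(default_factory=list)    # item names shown on screen

    def drop(self, item: Item) -> None:
        """Add a dragged-in file and show it in a new pane (split view)."""
        self.items[item.name] = item
        self.panes.append(item.name)

    def edit(self, name: str, new_content: str, author: str) -> None:
        """Apply an edit from either participant ("human" or "ai")."""
        item = self.items[name]
        if item.kind not in EDITABLE_KINDS:
            raise ValueError(f"{item.kind} items are view-only")
        item.content = new_content
        item.history.append(author)

    def visible_to_ai(self) -> List[Item]:
        """The AI sees the same panes the human sees."""
        return [self.items[name] for name in self.panes]


ws = SharedWorkspace()
ws.drop(Item("app.py", "code", "print('hi')"))
ws.drop(Item("diagram.png", "image", "<binary>"))
ws.edit("app.py", "print('hello')", author="ai")
```

Recording the author of each edit is a design choice that would let the conversation refer back to who changed what.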
Technical Challenges
- Streaming ASR - Real-time speech-to-text with low latency
- Incremental response generation - Partial responses that can be updated
- Turn-taking model - When to speak, when to wait, when to interrupt
- Context threading - Tracking what had been said and what was still being said when an interruption occurs
- Audio ducking - Managing simultaneous speech gracefully
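Two of these challenges, the turn-taking model and context threading, can be illustrated with a small sketch. Everything here is an assumption made for illustration: `decide_turn` is a rule-based placeholder for what would likely be a learned model, the thresholds (600 ms pause, 0.8 urgency) are arbitrary, and `Utterance` is a hypothetical helper that records how much of a reply was spoken before an interruption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    WAIT = "wait"
    SPEAK = "speak"
    INTERRUPT = "interrupt"


def decide_turn(user_speaking: bool, pause_ms: int, urgency: float,
                question_pending: bool,
                pause_threshold_ms: int = 600,
                interrupt_threshold: float = 0.8) -> Action:
    """Rule-based placeholder for a turn-taking model."""
    if user_speaking:
        # Cut in only when the drafted point is important enough.
        return Action.INTERRUPT if urgency >= interrupt_threshold else Action.WAIT
    if question_pending or pause_ms >= pause_threshold_ms:
        return Action.SPEAK
    return Action.WAIT


@dataclass
class Utterance:
    """Tracks how much of a planned reply was actually spoken."""
    words: List[str]
    spoken: int = 0

    def speak_next(self) -> str:
        word = self.words[self.spoken]
        self.spoken += 1
        return word

    def interrupt(self) -> dict:
        """Split the reply at the interruption point for the context log."""
        return {
            "said": " ".join(self.words[:self.spoken]),
            "unsaid": " ".join(self.words[self.spoken:]),
        }


reply = Utterance("you should check the config file first".split())
for _ in range(3):
    reply.speak_next()
record = reply.interrupt()
print(record["said"])    # prints: you should check
print(record["unsaid"])  # prints: the config file first
```

Keeping the unsaid remainder lets the response engine decide whether to resume, rephrase, or drop the point after the user finishes.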
Potential Architecture
┌─────────────────┐     ┌──────────────────┐
│   Microphone    │────▶│  Streaming ASR   │
│  (continuous)   │     │  (Whisper/etc)   │
└─────────────────┘     └────────┬─────────┘
                                 │ text chunks
                                 ▼
┌─────────────────┐     ┌──────────────────┐
│    Speaker      │◀────│ Response Engine  │
│     (TTS)       │     │  (predictive)    │
└─────────────────┘     └────────┬─────────┘
                                 │
                        ┌────────▼─────────┐
                        │  Context Window  │
                        │  (shared state)  │
                        └──────────────────┘
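The data flow in the diagram can be sketched with `asyncio` queues connecting the three stages. This is a minimal sketch under stated assumptions: `fake_asr`, `response_engine`, `fake_tts`, and `run_pipeline` are hypothetical names, the ASR and TTS stages are stubs, `None` is used as an end-of-stream marker, and the engine replies only once at the end instead of drafting incrementally.

```python
import asyncio
from typing import List


async def fake_asr(chunks: List[str], out: asyncio.Queue) -> None:
    """Stand-in for streaming ASR: emits text chunks, then None as end marker."""
    for chunk in chunks:
        await out.put(chunk)
    await out.put(None)


async def response_engine(inp: asyncio.Queue, out: asyncio.Queue,
                          context: dict) -> None:
    """Consumes text chunks, updates shared state, and emits a reply for TTS."""
    while (chunk := await inp.get()) is not None:
        context["transcript"].append(chunk)
    # Placeholder reply; a real engine might speak before the input ends.
    await out.put(f"Heard {len(context['transcript'])} chunks.")
    await out.put(None)


async def fake_tts(inp: asyncio.Queue, spoken: List[str]) -> None:
    """Stand-in for the speaker: records what would be synthesized."""
    while (text := await inp.get()) is not None:
        spoken.append(text)


async def run_pipeline(chunks: List[str]) -> List[str]:
    text_q: asyncio.Queue = asyncio.Queue()
    reply_q: asyncio.Queue = asyncio.Queue()
    context = {"transcript": []}  # the "Context Window (shared state)" box
    spoken: List[str] = []
    # All three stages run concurrently, as in the diagram.
    await asyncio.gather(
        fake_asr(chunks, text_q),
        response_engine(text_q, reply_q, context),
        fake_tts(reply_q, spoken),
    )
    return spoken


print(asyncio.run(run_pipeline(["hello", "can you", "hear me"])))
# prints: ['Heard 3 chunks.']
```

Queues decouple the stages, so the microphone side can keep producing chunks while the engine is drafting or the speaker is playing.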
Inspiration
- Natural human conversations (overlapping speech, interruptions, backchanneling)
- Real-time collaborative editors (Google Docs)
- Voice assistants that feel less robotic
- Pair programming conversations
Related Projects
- Ramble - Voice transcription (could provide ASR component)
- Artifact Editor - Could power the shared context window