feat: Implement continuous dictation with chunked transcription

- Add ContinuousRecorder class using sounddevice for gapless audio capture
- Record continuously and extract 10-second chunks with 0.5s overlap
- Overlap prevents words from being cut off at chunk boundaries
- Transcription happens in parallel with recording
- Text appears incrementally as chunks complete
- Add sounddevice and numpy as GUI dependencies
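The overlap extraction described above can be sketched as follows. This is a minimal illustration of the technique, not the shipped code; the sample counts are scaled down (10 samples per "second" instead of 16000) purely for readability:

```python
import numpy as np

def extract_chunk(buffer, chunk_samples, overlap_samples):
    """If enough audio has accumulated, return (chunk, remaining_buffer).

    The emitted chunk is chunk_samples + overlap_samples long, and the
    buffer keeps everything from chunk_samples onward, so the next chunk
    re-hears the overlap and no word is lost at the boundary.
    """
    if len(buffer) < chunk_samples + overlap_samples:
        return None, buffer  # not enough audio yet
    chunk = buffer[:chunk_samples + overlap_samples]
    return chunk, buffer[chunk_samples:]

# 12.5 "seconds" of audio, 10 s chunks, 0.5 s overlap (scaled down)
audio = np.arange(125)
chunk, rest = extract_chunk(audio, 100, 5)
assert len(chunk) == 105          # chunk plus overlap
assert chunk[100] == rest[0]      # the overlap is shared by both
```

The same arithmetic appears in `_chunk_loop` below, where `min_samples_needed = chunk_samples + overlap_samples` gates extraction.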

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rob 2026-01-04 03:03:10 -04:00
parent 55aec4c7f2
commit 57efe6c931
3 changed files with 386 additions and 111 deletions


@ -14,3 +14,93 @@ This is just a general brainstorming discussion.
---
*Discussion begins below.*
---
Name: rob
I'm trying to think of a good way of documenting development projects.
What programs exist? I usually put it all in a markdown file.
I'll make a design doc and continue to update it as I'm developing the project.
Sometimes, if the project gets really complicated, I make different sections; then the design doc gets really big, and sometimes I break it up into smaller documents.
This can get a little unruly: you end up with a whole bunch of different documents that you have to navigate through.
You end up with huge documents that almost become a research project to catch up on and understand.
It's hard to see the entire project all in one place. Sometimes I try to use UML, like PlantUML, to help me see the entire project.
Those end up in different files, and I have to use a different program to view them. Sometimes I use SVG files for UI mockups, and I have to use yet another program to view those.
I find it hard to keep it all together; you end up with documents spread throughout your project,
some for users to see, some for developers, some just for me.
I've tried things like Obsidian, and sometimes I just create entire web pages because you can link different documents, but I
haven't found a really good solution where I can view, navigate, and edit it all in one place. What ideas do you have?
---
Name: AI-Architect
Parse error
VOTE: NONE
---
Name: AI-Backend
Backend feedback: You're describing a doc hub that unifies markdown, diagrams, and media with seamless navigation/editing. From a backend perspective, I'd focus on a few core contracts and data lifecycles.
- Data model: treat “artifact” as a first-class entity with a type (markdown, plantuml, svg, image, link), version history, and metadata (tags, owners, visibility). This lets you index/search and compose views without scattering files.
- Storage strategy: keep sources as text blobs (for diff/merge) and derived renders (SVG/PNG/HTML) as cached artifacts. Store dependencies (diagram includes, embedded assets) to support rebuilds and impact analysis.
- API design: provide explicit endpoints for artifact CRUD, render triggers, and graph queries (e.g., backlinks, referenced-by). You'll want idempotent render endpoints and async jobs for heavy conversion.
- Permissions: user vs. team vs. public docs will need ACLs at the artifact and folder/tag level. Plan for mixed visibility within a single project.
- Indexing/search: full-text search over markdown plus extracted text from diagrams, and a graph index for link navigation. This makes “see the whole project” feasible.
- Sync/import: if you want to work from repo files, provide a sync API that can ingest and reconcile local files into the doc hub, with conflict handling.
- Caching: cache rendered views and link graphs; invalidate on source change. Keep a render queue to avoid spikes.
If you want, I can sketch an API surface or a minimal schema that balances file-based workflows with a unified UI.
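A minimal sketch of the artifact model and the "referenced-by" graph query described above. All names and fields here are illustrative assumptions, not a committed schema:

```python
from dataclasses import dataclass, field
from collections import defaultdict

@dataclass
class Artifact:
    """A first-class doc-hub entity: source text plus metadata."""
    id: str
    type: str                                  # "markdown" | "plantuml" | "svg" | ...
    source: str
    tags: list = field(default_factory=list)
    links: list = field(default_factory=list)  # artifact ids this one references

class ArtifactIndex:
    """In-memory link graph supporting backlink ("referenced-by") queries."""
    def __init__(self):
        self._artifacts = {}
        self._backlinks = defaultdict(set)

    def add(self, artifact: Artifact):
        self._artifacts[artifact.id] = artifact
        for target in artifact.links:
            self._backlinks[target].add(artifact.id)

    def referenced_by(self, artifact_id: str) -> set:
        return set(self._backlinks[artifact_id])

idx = ArtifactIndex()
idx.add(Artifact("design-doc", "markdown", "# Design", links=["arch-diagram"]))
idx.add(Artifact("arch-diagram", "plantuml", "@startuml ... @enduml"))
assert idx.referenced_by("arch-diagram") == {"design-doc"}
```

A real backend would persist this in a database and maintain the backlink index incrementally on artifact save, but the query contract is the same.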
VOTE: NONE
---
Name: AI-Designer
The core issue you're describing is an **information architecture problem manifesting as a visual/navigation problem**. Let me break down the design challenges:

### Current Pain Points (Design Perspective)

1. **Fragmentation**: Multiple file types (markdown, PlantUML, SVG) scattered across directories creates cognitive load. Each context switch (switching programs) breaks focus.

2. **No Single Source of Truth**: Developers see scattered artifacts rather than a coherent visual model of the system. This creates ambiguity about what's current/canonical.

3. **Scale Problem**: Large documents become unnavigable. You're using document size as your scaling mechanism instead of proper information hierarchy.

4. **No Visual Cross-Referencing**: Diagrams exist separately from prose. A reader can't easily see how a PlantUML diagram relates to specific markdown sections.

### Design Solution Space

**1. Unified Document with Embedded Visuals** (Best for cohesion)
- Single markdown file as source of truth
- Inline diagram code that renders on-demand
- Color-coded sections for different audiences (dev/user/personal notes)
- Table of contents with jump links
- *Trade-off*: Single large file, but structured with clear visual hierarchy

**2. Hierarchical Document Structure** (Best for scalability)
- Index document with clear visual hierarchy
- Child documents explicitly linked (not scattered)
- Shared visual identity (consistent headers, color scheme, typography)
- Breadcrumb navigation to show context
- *Trade-off*: Multiple files but with intentional architecture

**3. Integrated Development Notebook** (Best for discoverability)
- Development log with timestamped entries
- Embedded diagrams and decision records
- Tags/categories for cross-cutting concerns
- Search-first navigation
- *Trade-off*: Requires tooling

### Immediate Design Recommendations

1. **Establish Visual Hierarchy**: Create a template with:
   - Typography scale (consistent H1-H3 sizing)
   - Color system (primary/secondary/accent)
   - Spacing rules and rhythm

2. **Content Zoning**: Visually separate audiences within the same document:
   - User-facing docs in standard style
   - Developer notes in distinct color/style
   - Personal thinking in another

3. **Diagram Integration**: Embed diagram source with prose:
   - Fenced code blocks render inline
   - Single file, no context switching

4. **Navigation Design**: If multiple documents:
   - Always show breadcrumbs
   - Maintain identical navigation UI
   - Use consistent link styling

### Key Insight

You need a **designed system**, not just a different tool. Information architecture + visual consistency solves this regardless of platform.

DIAGRAM: diagrams/general-discussion_designer_1.puml
VOTE: READY
---
Name: AI-Ux
The user's problem statement highlights a critical UX challenge: the fragmentation of project documentation across various tools and file formats. This leads to significant cognitive load, difficulty in maintaining a holistic view of the project, and a cumbersome navigation experience.
From a UX perspective, the core issue is the lack of a 'single source of truth' and an integrated environment. Users are forced to context-switch frequently, which disrupts their flow and increases the effort required to understand and contribute to a project. The current process is neither intuitive nor efficient.
Key UX pain points identified:
- **High Cognitive Load:** Juggling multiple documents, tools, and mental models for different content types (text, diagrams, UI mockups).
- **Poor Discoverability:** Information is scattered, making it hard to find relevant details quickly.
- **Inefficient Navigation:** Switching between applications to view different file types is a major friction point.
- **Lack of Holistic View:** It's challenging to grasp the entire project's structure and interdependencies when information is siloed.
An ideal solution would prioritize:
1. **Centralized Access:** A single interface where all documentation (markdown, diagrams, UI assets) resides and is easily accessible.
2. **Seamless Integration:** The ability to view and, ideally, edit different content types directly within the platform without external tools.
3. **Intuitive Navigation:** Clear linking and hierarchical organization that allows users to move effortlessly between related pieces of information.
4. **Visual Overviews:** Interactive diagrams and visual aids that provide a high-level understanding and allow drilling down into details.
5. **Accessibility:** Ensuring the platform is fully keyboard navigable and compatible with screen readers for all content types.
6. **Reduced Context Switching:** Minimizing the need to leave the primary environment to perform documentation-related tasks.
DIAGRAM: diagrams/general-discussion_ux_1.puml
VOTE: CHANGES
---
Name: AI-Frontend
From a frontend perspective, this problem screams for a unified documentation interface. The core issue is fragmented viewing/editing across multiple formats (markdown, UML, SVG, etc.). A modern single-page application could solve this with:
1. **Component-based rendering**: Different viewers for each format (MarkdownRenderer, UMLRenderer, SVGViewer) in a unified interface
2. **Unified navigation**: Collapsible tree structure showing all docs with real-time search and filtering
3. **Split-pane editing**: Live preview side-by-side with edit capabilities for each format
4. **Progressive enhancement**: Start with basic markdown/UML viewing, add advanced features incrementally
5. **State management**: Centralized store for document structure, active selections, and edit states
The frontend should focus on performance (lazy loading heavy assets like UML diagrams) and accessibility (keyboard navigation, screen reader support for complex diagrams).
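The per-format viewer dispatch in point 1 (plus the lazy loading mentioned above) is language-agnostic; it is sketched here in Python for brevity, and the renderer names are illustrative assumptions:

```python
class RendererRegistry:
    """Maps a document format to a lazily-constructed renderer, so heavy
    viewers (e.g. a UML renderer) are only built when first needed."""
    def __init__(self):
        self._factories = {}   # fmt -> zero-arg constructor
        self._instances = {}   # fmt -> constructed renderer (cache)

    def register(self, fmt, factory):
        self._factories[fmt] = factory

    def get(self, fmt):
        if fmt not in self._instances:  # construct lazily, once
            self._instances[fmt] = self._factories[fmt]()
        return self._instances[fmt]

registry = RendererRegistry()
registry.register("md", lambda: "MarkdownRenderer")
registry.register("puml", lambda: "UMLRenderer")
assert registry.get("md") == "MarkdownRenderer"
assert registry.get("puml") is registry.get("puml")  # cached after first use
```

In a real SPA the factories would construct actual viewer components, but the registry/cache shape is the same.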
DIAGRAM: diagrams/general-discussion_frontend_1.puml
VOTE: CHANGES


@ -38,6 +38,8 @@ tui = [
 gui = [
     "dearpygui>=2.0.0",
     "artifact-editor @ git+https://gitea.brrd.tech/rob/artifact-editor.git",
+    "sounddevice>=0.4.6",
+    "numpy>=1.24.0",
 ]
 dev = [
     "pytest>=7.0",


@ -42,6 +42,203 @@ except ImportError:
    get_format_info = None
class ContinuousRecorder:
    """
    Continuous audio recorder with chunked transcription support.

    Records audio continuously and extracts overlapping chunks for transcription.
    This prevents word cutoff at chunk boundaries.
    """

    def __init__(
        self,
        chunk_duration: float = 10.0,
        overlap_duration: float = 0.5,
        sample_rate: int = 16000,
        on_chunk_ready: Callable[[bytes], None] = None,
        on_error: Callable[[str], None] = None
    ):
        """
        Initialize the continuous recorder.

        Args:
            chunk_duration: Duration of each chunk in seconds
            overlap_duration: Overlap between chunks in seconds
            sample_rate: Audio sample rate in Hz
            on_chunk_ready: Callback when a chunk is ready (receives WAV bytes)
            on_error: Callback when an error occurs
        """
        self.chunk_duration = chunk_duration
        self.overlap_duration = overlap_duration
        self.sample_rate = sample_rate
        self.on_chunk_ready = on_chunk_ready
        self.on_error = on_error
        self._recording = False
        self._stream = None
        self._buffer = []
        self._buffer_lock = threading.Lock()
        self._chunk_thread = None
        self._stop_event = threading.Event()
        # Calculate samples
        self._chunk_samples = int(chunk_duration * sample_rate)
        self._overlap_samples = int(overlap_duration * sample_rate)

    def start(self):
        """Start continuous recording."""
        if self._recording:
            return
        try:
            import sounddevice as sd
            import numpy as np
        except ImportError as e:
            if self.on_error:
                self.on_error(f"sounddevice not installed: {e}")
            return
        self._recording = True
        self._stop_event.clear()
        self._buffer = []

        def audio_callback(indata, frames, time_info, status):
            """Called by sounddevice for each audio block."""
            if status and self.on_error:
                self.on_error(f"Audio status: {status}")
            with self._buffer_lock:
                self._buffer.append(indata.copy())

        try:
            self._stream = sd.InputStream(
                samplerate=self.sample_rate,
                channels=1,
                dtype='int16',
                callback=audio_callback,
                blocksize=1024
            )
            self._stream.start()
            # Start chunk extraction thread
            self._chunk_thread = threading.Thread(target=self._chunk_loop, daemon=True)
            self._chunk_thread.start()
        except Exception as e:
            self._recording = False
            if self.on_error:
                self.on_error(f"Failed to start recording: {e}")

    def stop(self) -> bytes:
        """
        Stop recording and return any remaining audio as WAV bytes.

        Returns:
            WAV bytes of remaining audio, or empty bytes if none
        """
        if not self._recording:
            return b''
        self._recording = False
        self._stop_event.set()
        # Stop the stream
        if self._stream:
            try:
                self._stream.stop()
                self._stream.close()
            except Exception:
                pass
            self._stream = None
        # Wait for chunk thread to finish
        if self._chunk_thread:
            self._chunk_thread.join(timeout=2)
            self._chunk_thread = None
        # Get remaining audio
        remaining_wav = self._extract_remaining()
        return remaining_wav

    def _chunk_loop(self):
        """Background thread that extracts and processes chunks."""
        import numpy as np
        # We need chunk_duration + overlap of audio before we can extract a chunk
        min_samples_needed = self._chunk_samples + self._overlap_samples
        while not self._stop_event.is_set():
            # Wait a bit before checking
            self._stop_event.wait(timeout=1.0)
            if self._stop_event.is_set():
                break
            # Check if we have enough audio for a chunk
            with self._buffer_lock:
                if not self._buffer:
                    continue
                # Concatenate buffer
                try:
                    audio_data = np.concatenate(self._buffer)
                except ValueError:
                    continue
                total_samples = len(audio_data)
                # Need enough samples for a full chunk plus overlap
                if total_samples < min_samples_needed:
                    continue
                # Extract chunk (the first chunk_samples + overlap_samples)
                chunk_data = audio_data[:min_samples_needed]
                # Keep the overlap portion for next chunk (prevents word cutoff)
                # We keep from (chunk_samples) onwards, so next chunk starts with overlap
                keep_from = self._chunk_samples
                remaining = audio_data[keep_from:]
                self._buffer = [remaining] if len(remaining) > 0 else []
            # Convert to WAV and send to callback
            if self.on_chunk_ready:
                wav_bytes = self._to_wav(chunk_data)
                self.on_chunk_ready(wav_bytes)

    def _extract_remaining(self) -> bytes:
        """Extract any remaining audio from the buffer as WAV bytes."""
        import numpy as np
        with self._buffer_lock:
            if not self._buffer:
                return b''
            try:
                audio_data = np.concatenate(self._buffer)
            except ValueError:
                return b''
            self._buffer = []
        if len(audio_data) < self.sample_rate:  # Less than 1 second
            return b''
        return self._to_wav(audio_data)

    def _to_wav(self, audio_data) -> bytes:
        """Convert numpy audio data to WAV bytes."""
        import io
        import wave
        buffer = io.BytesIO()
        with wave.open(buffer, 'wb') as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)  # 16-bit
            wf.setframerate(self.sample_rate)
            wf.writeframes(audio_data.tobytes())
        return buffer.getvalue()
class DiscussionGUI:
    """
    Main GUI application using Dear PyGui.
@ -214,142 +411,127 @@ class DiscussionGUI:
     def _dictate_clicked(self):
         """Handle dictate button click - toggle recording."""
-        if self._dictation_process is not None:
-            # Currently recording - stop and transcribe
+        if self._continuous_recorder is not None:
+            # Currently recording - stop
             self._stop_dictation()
         else:
             # Start recording
             self._start_dictation()

     def _start_dictation(self):
-        """Start recording audio from microphone."""
-        import tempfile
-        import signal
-        # Create temp file for audio
-        self._dictation_audio_file = tempfile.mktemp(suffix=".wav")
-        # Start arecord without duration limit (records until interrupted)
-        try:
-            self._dictation_process = subprocess.Popen(
-                [
-                    "arecord",
-                    "-D", "default",
-                    "-f", "S16_LE",
-                    "-r", "16000",
-                    "-c", "1",
-                    "-t", "wav",
-                    self._dictation_audio_file
-                ],
-                stdout=subprocess.PIPE,
-                stderr=subprocess.PIPE
-            )
-            # Update button to show recording state
-            if dpg.does_item_exist("dictate_btn"):
-                dpg.set_item_label("dictate_btn", "Stop Recording")
-        except FileNotFoundError:
-            self._show_error("arecord not found. Install alsa-utils: sudo apt install alsa-utils")
-            self._dictation_process = None
-            self._dictation_audio_file = None
-        except Exception as e:
-            self._show_error(f"Failed to start recording: {e}")
-            self._dictation_process = None
-            self._dictation_audio_file = None
+        """Start continuous recording with chunked transcription."""
+        # Create recorder with callbacks
+        self._continuous_recorder = ContinuousRecorder(
+            chunk_duration=10.0,  # 10 second chunks
+            overlap_duration=0.5,  # 0.5 second overlap to prevent word cutoff
+            sample_rate=16000,
+            on_chunk_ready=self._on_audio_chunk_ready,
+            on_error=lambda e: self._show_error(f"Recording error: {e}")
+        )
+        self._continuous_recorder.start()
+        # Update button to show recording state
+        if dpg.does_item_exist("dictate_btn"):
+            dpg.set_item_label("dictate_btn", "● Recording...")

     def _stop_dictation(self):
-        """Stop recording and transcribe the audio."""
-        import signal
-        if self._dictation_process is None:
+        """Stop recording and transcribe any remaining audio."""
+        if self._continuous_recorder is None:
             return
-        # Send SIGINT to stop arecord gracefully
-        try:
-            self._dictation_process.send_signal(signal.SIGINT)
-            self._dictation_process.wait(timeout=2)
-        except Exception:
-            # Force kill if needed
-            try:
-                self._dictation_process.kill()
-                self._dictation_process.wait(timeout=1)
-            except Exception:
-                pass
-        self._dictation_process = None
-        # Update button back to Dictate
+        # Update button
         if dpg.does_item_exist("dictate_btn"):
-            dpg.set_item_label("dictate_btn", "Transcribing...")
+            dpg.set_item_label("dictate_btn", "Finishing...")
+        # Stop recording and get remaining audio
+        remaining_wav = self._continuous_recorder.stop()
+        self._continuous_recorder = None
+        # Transcribe remaining audio if any
+        if remaining_wav:
+            self._transcribe_chunk(remaining_wav, is_final=True)
+        else:
+            # No remaining audio, just reset button
+            if dpg.does_item_exist("dictate_btn"):
+                dpg.set_item_label("dictate_btn", "Dictate")

+    def _on_audio_chunk_ready(self, wav_bytes: bytes):
+        """Called when a chunk of audio is ready for transcription."""
         # Transcribe in background thread
-        audio_file = self._dictation_audio_file
-        self._dictation_audio_file = None
-        def transcribe():
+        self._transcribe_chunk(wav_bytes, is_final=False)

+    def _transcribe_chunk(self, wav_bytes: bytes, is_final: bool = False):
+        """Transcribe a chunk of audio and append to comment text."""
+        def do_transcribe():
             transcript = ""
+            temp_file = None
             try:
-                if not audio_file:
-                    transcript = "[No audio file path]"
-                elif not os.path.exists(audio_file):
-                    transcript = f"[Audio file not found: {audio_file}]"
-                else:
-                    # Check file size to ensure recording happened
-                    file_size = os.path.getsize(audio_file)
-                    if file_size < 1000:  # Less than 1KB likely means no audio
-                        transcript = "[Recording too short or empty]"
-                    else:
-                        transcribe_cmd = os.path.expanduser("~/.local/bin/transcribe")
-                        if not os.path.exists(transcribe_cmd):
-                            transcript = "[transcribe tool not found at ~/.local/bin/transcribe]"
-                        else:
-                            result = subprocess.run(
-                                [transcribe_cmd],
-                                input=audio_file,
-                                capture_output=True,
-                                text=True,
-                                timeout=60
-                            )
-                            transcript = result.stdout.strip()
-                            if result.returncode != 0 and result.stderr:
-                                transcript = f"[Transcribe error: {result.stderr.strip()}]"
-                            elif not transcript:
-                                transcript = "[No speech detected]"
-                # Clean up audio file
+                # Write WAV bytes to temp file
+                temp_file = tempfile.mktemp(suffix=".wav")
+                with open(temp_file, 'wb') as f:
+                    f.write(wav_bytes)
+                # Check file size
+                file_size = len(wav_bytes)
+                if file_size < 1000:
+                    transcript = ""  # Too short, skip silently
+                else:
+                    transcribe_cmd = os.path.expanduser("~/.local/bin/transcribe")
+                    if not os.path.exists(transcribe_cmd):
+                        transcript = "[transcribe tool not found]"
+                    else:
+                        result = subprocess.run(
+                            [transcribe_cmd],
+                            input=temp_file,
+                            capture_output=True,
+                            text=True,
+                            timeout=60
+                        )
+                        transcript = result.stdout.strip()
+                        if result.returncode != 0 and result.stderr:
+                            transcript = f"[Error: {result.stderr.strip()[:100]}]"
+            except subprocess.TimeoutExpired:
+                transcript = "[Timeout]"
+            except Exception as e:
+                transcript = f"[Error: {e}]"
+            finally:
+                # Clean up temp file
+                if temp_file and os.path.exists(temp_file):
                     try:
-                        os.remove(audio_file)
+                        os.remove(temp_file)
                     except Exception:
                         pass
-            except subprocess.TimeoutExpired:
-                transcript = "[Transcription timed out after 60 seconds]"
-            except Exception as e:
-                transcript = f"[Transcription error: {e}]"
-            # Update UI in main thread
-            self._on_transcription_complete(transcript)
-        thread = threading.Thread(target=transcribe, daemon=True)
+            # Append result to UI
+            self._on_transcription_chunk_complete(transcript, is_final)
+        thread = threading.Thread(target=do_transcribe, daemon=True)
         thread.start()

-    def _on_transcription_complete(self, transcript: str):
-        """Called when transcription completes - append to comment text."""
-        # Reset button
-        if dpg.does_item_exist("dictate_btn"):
-            dpg.set_item_label("dictate_btn", "Dictate")
-        if not transcript:
-            self._show_error("Transcription returned empty result")
-        elif transcript.startswith("["):
-            # Error message from transcribe tool
-            self._show_error(f"Transcription failed: {transcript}")
-        else:
-            # Append to existing comment text
-            if dpg.does_item_exist("comment_text"):
-                current = dpg.get_value("comment_text") or ""
-                separator = " " if current.strip() else ""
-                dpg.set_value("comment_text", current + separator + transcript)
+    def _on_transcription_chunk_complete(self, transcript: str, is_final: bool):
+        """Called when a chunk transcription completes - append to comment text."""
+        # Handle final chunk - reset button
+        if is_final:
+            if dpg.does_item_exist("dictate_btn"):
+                dpg.set_item_label("dictate_btn", "Dictate")
+        # Skip empty or error transcripts (for non-final chunks, don't show errors)
+        if not transcript:
+            return
+        if transcript.startswith("[") and not is_final:
+            # Only show errors for final chunk
+            return
+        if transcript.startswith("["):
+            self._show_error(f"Transcription: {transcript}")
+            return
+        # Append to existing comment text
+        if dpg.does_item_exist("comment_text"):
+            current = dpg.get_value("comment_text") or ""
+            separator = " " if current.strip() else ""
+            dpg.set_value("comment_text", current + separator + transcript)
    def __init__(self, discussions_dir: Path = None):
        if discussions_dir is None:
@ -369,9 +551,10 @@ class DiscussionGUI:
         self._reading_button_tag: Optional[str] = None
         self._comment_counter = 0  # For generating unique button tags

-        # Dictation state
-        self._dictation_process: Optional[subprocess.Popen] = None
-        self._dictation_audio_file: Optional[str] = None
+        # Dictation state - continuous recording with chunked transcription
+        self._continuous_recorder: Optional[ContinuousRecorder] = None
+        self._transcription_queue = []  # Queue of transcription results
+        self._transcription_lock = threading.Lock()

         # Initialize Dear PyGui
         dpg.create_context()