feat: Implement continuous dictation with chunked transcription
- Add ContinuousRecorder class using sounddevice for gapless audio capture
- Record continuously and extract 10-second chunks with 0.5s overlap
- Overlap prevents words from being cut off at chunk boundaries
- Transcription happens in parallel with recording
- Text appears incrementally as chunks complete
- Add sounddevice and numpy as GUI dependencies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
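The chunk/overlap arithmetic described in the commit message works out as follows (a sketch using the parameters stated above; variable names are illustrative):

```python
# Sketch of the chunk extraction math: 10 s chunks, 0.5 s overlap, 16 kHz mono.
chunk_duration = 10.0
overlap_duration = 0.5
sample_rate = 16000

chunk_samples = int(chunk_duration * sample_rate)      # samples advanced per chunk
overlap_samples = int(overlap_duration * sample_rate)  # samples carried into the next chunk

# A chunk is emitted once chunk_samples + overlap_samples have been buffered;
# the buffer then keeps everything from chunk_samples onward, so each new chunk
# starts with the trailing 0.5 s of the previous one (no words cut at boundaries).
emitted_per_chunk = chunk_samples + overlap_samples
print(chunk_samples, overlap_samples, emitted_per_chunk)  # 160000 8000 168000
```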
This commit is contained in:
parent 55aec4c7f2
commit 57efe6c931
@@ -14,3 +14,93 @@ This is just a general brainstorming discussion.
---

*Discussion begins below.*

---
Name: rob

I'm trying to think of a good way of documenting development projects.

What programs exist? I usually put it all in a markdown file.

I'll make a design doc and continue to update it as I'm developing the project.

Sometimes, if the project gets really complicated, I make different sections; then the design doc gets really big, and sometimes I break it up into smaller documents.

This can get a little unruly: you end up with a whole bunch of different documents that you have to navigate through.

You end up with huge documents that almost become a research project to catch up on and understand.

It's hard to see the entire project all in one place. Sometimes I try to use UML, like PlantUML, to help me see the entire project.

Those end up being in different files, and I have to use a different program to view them. Sometimes I use SVG files for UI mockups, and I have to use a different program to view those too.

I find it hard to keep it all together; you end up with documents spread all throughout your project, some that are for the users to see, some for the developers, some for me.

I've tried things like Obsidian, and sometimes I just create entire web pages because you can link different documents, but I haven't found a really good solution where I can view, navigate, and edit it all in one place. What ideas do you have?
---

Name: AI-Architect

Parse error

VOTE: NONE

---
Name: AI-Backend

Backend feedback: You're describing a doc hub that unifies markdown, diagrams, and media with seamless navigation/editing. From a backend perspective, I'd focus on a few core contracts and data lifecycles.

- Data model: treat "artifact" as a first-class entity with type (markdown, plantuml, svg, image, link), version history, and metadata (tags, owners, visibility). This allows you to index/search and compose views without scattering files.
- Storage strategy: keep source as text blobs (for diff/merge) and derived renders (SVG/PNG/HTML) as cached artifacts. Store dependencies (diagram includes, embedded assets) to support rebuilds and impact analysis.
- API design: provide explicit endpoints for artifact CRUD, render triggers, and graph queries (e.g., backlinks, referenced-by). You'll want idempotent render endpoints and async jobs for heavy conversion.
- Permissions: users vs team vs public docs will need ACLs at artifact and folder/tag level. Plan for mixed visibility within a single project.
- Indexing/search: full-text search over markdown + extracted text from diagrams, and a graph index for link navigation. This makes "see the whole project" feasible.
- Sync/import: if you want to work from repo files, provide a sync API that can ingest and reconcile local files into the doc hub, with conflict handling.
- Caching: cache rendered views and link graphs; invalidate on source change. Keep a render queue to avoid spikes.

If you want, I can sketch an API surface or a minimal schema that balances file-based workflows with a unified UI.
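A minimal sketch of the artifact model and graph query described above (all names here are illustrative, not an existing API):

```python
from dataclasses import dataclass, field
from enum import Enum


class ArtifactType(Enum):
    MARKDOWN = "markdown"
    PLANTUML = "plantuml"
    SVG = "svg"
    IMAGE = "image"
    LINK = "link"


@dataclass
class Artifact:
    """One first-class documentation artifact: source text plus metadata."""
    id: str
    type: ArtifactType
    source: str                                       # diffable source blob
    tags: list = field(default_factory=list)
    visibility: str = "team"                          # "private" | "team" | "public"
    depends_on: list = field(default_factory=list)    # ids of included assets/diagrams


def backlinks(artifacts, target_id):
    """Graph query: which artifacts reference target_id?"""
    return [a.id for a in artifacts if target_id in a.depends_on]


docs = [
    Artifact("design-doc", ArtifactType.MARKDOWN, "# Design", depends_on=["arch-diagram"]),
    Artifact("arch-diagram", ArtifactType.PLANTUML, "@startuml\n@enduml"),
]
print(backlinks(docs, "arch-diagram"))  # ['design-doc']
```

With dependencies stored on each artifact, both backlink navigation and render invalidation fall out of the same graph walk.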
VOTE: NONE

---
Name: AI-Designer

The core issue you're describing is an **information architecture problem manifesting as a visual/navigation problem**. Let me break down the design challenges:

### Current Pain Points (Design Perspective)

1. **Fragmentation**: Multiple file types (markdown, PlantUML, SVG) scattered across directories creates cognitive load. Each context switch (switching programs) breaks focus.

2. **No Single Source of Truth**: Developers see scattered artifacts rather than a coherent visual model of the system. This creates ambiguity about what's current/canonical.

3. **Scale Problem**: Large documents become unnavigable. You're using document size as your scaling mechanism instead of proper information hierarchy.

4. **No Visual Cross-Referencing**: Diagrams exist separately from prose. A reader can't easily see how a PlantUML diagram relates to specific markdown sections.

### Design Solution Space

**1. Unified Document with Embedded Visuals** (Best for cohesion)
- Single markdown file as source of truth
- Inline diagram code that renders on-demand
- Color-coded sections for different audiences (dev/user/personal notes)
- Table of contents with jump links
- *Trade-off*: Single large file, but structured with clear visual hierarchy

**2. Hierarchical Document Structure** (Best for scalability)
- Index document with clear visual hierarchy
- Child documents explicitly linked (not scattered)
- Shared visual identity (consistent headers, color scheme, typography)
- Breadcrumb navigation to show context
- *Trade-off*: Multiple files but with intentional architecture

**3. Integrated Development Notebook** (Best for discoverability)
- Development log with timestamped entries
- Embedded diagrams and decision records
- Tags/categories for cross-cutting concerns
- Search-first navigation
- *Trade-off*: Requires tooling

### Immediate Design Recommendations

1. **Establish Visual Hierarchy**: Create a template with:
   - Typography scale (consistent H1-H3 sizing)
   - Color system (primary/secondary/accent)
   - Spacing rules and rhythm

2. **Content Zoning**: Visually separate audiences within the same document:
   - User-facing docs in standard style
   - Developer notes in distinct color/style
   - Personal thinking in another

3. **Diagram Integration**: Embed diagram source with prose:
   - Fenced code blocks render inline
   - Single file, no context switching

4. **Navigation Design**: If multiple documents:
   - Always show breadcrumbs
   - Maintain identical navigation UI
   - Use consistent link styling

### Key Insight

You need a **designed system**, not just a different tool. Information architecture + visual consistency solves this regardless of platform.

DIAGRAM: diagrams/general-discussion_designer_1.puml

VOTE: READY
---

Name: AI-Ux
The user's problem statement highlights a critical UX challenge: the fragmentation of project documentation across various tools and file formats. This leads to significant cognitive load, difficulty in maintaining a holistic view of the project, and a cumbersome navigation experience.

From a UX perspective, the core issue is the lack of a 'single source of truth' and an integrated environment. Users are forced to context-switch frequently, which disrupts their flow and increases the effort required to understand and contribute to a project. The current process is neither intuitive nor efficient.

Key UX pain points identified:

- **High Cognitive Load:** Juggling multiple documents, tools, and mental models for different content types (text, diagrams, UI mockups).
- **Poor Discoverability:** Information is scattered, making it hard to find relevant details quickly.
- **Inefficient Navigation:** Switching between applications to view different file types is a major friction point.
- **Lack of Holistic View:** It's challenging to grasp the entire project's structure and interdependencies when information is siloed.

An ideal solution would prioritize:

1. **Centralized Access:** A single interface where all documentation (markdown, diagrams, UI assets) resides and is easily accessible.
2. **Seamless Integration:** The ability to view and, ideally, edit different content types directly within the platform without external tools.
3. **Intuitive Navigation:** Clear linking and hierarchical organization that allows users to move effortlessly between related pieces of information.
4. **Visual Overviews:** Interactive diagrams and visual aids that provide a high-level understanding and allow drilling down into details.
5. **Accessibility:** Ensuring the platform is fully keyboard navigable and compatible with screen readers for all content types.
6. **Reduced Context Switching:** Minimizing the need to leave the primary environment to perform documentation-related tasks.

DIAGRAM: diagrams/general-discussion_ux_1.puml

VOTE: CHANGES

---
Name: AI-Frontend

From a frontend perspective, this problem screams for a unified documentation interface. The core issue is fragmented viewing/editing across multiple formats (markdown, UML, SVG, etc.). A modern single-page application could solve this with:

1. **Component-based rendering**: Different viewers for each format (MarkdownRenderer, UMLRenderer, SVGViewer) in a unified interface
2. **Unified navigation**: Collapsible tree structure showing all docs with real-time search and filtering
3. **Split-pane editing**: Live preview side-by-side with edit capabilities for each format
4. **Progressive enhancement**: Start with basic markdown/UML viewing, add advanced features incrementally
5. **State management**: Centralized store for document structure, active selections, and edit states

The frontend should focus on performance (lazy loading heavy assets like UML diagrams) and accessibility (keyboard navigation, screen reader support for complex diagrams).
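The component-based rendering idea above boils down to dispatch on artifact format. A language-agnostic sketch of that registry pattern (all names here are hypothetical, not from the project):

```python
# One registry maps a format name to a renderer; the UI calls a single
# entry point and unknown formats degrade gracefully to preformatted text.
renderers = {}

def renderer(fmt):
    """Decorator: register a render function for a format."""
    def register(fn):
        renderers[fmt] = fn
        return fn
    return register

@renderer("markdown")
def render_markdown(source):
    return f"<article>{source}</article>"

@renderer("plantuml")
def render_plantuml(source):
    # Real viewers would render the diagram; here we emit a placeholder tag.
    return f"<img alt='diagram' data-source='{source}'>"

def render(fmt, source):
    """Unified entry point used by the viewer shell."""
    fn = renderers.get(fmt)
    return fn(source) if fn else f"<pre>{source}</pre>"

print(render("markdown", "# Title"))  # <article># Title</article>
```

Adding a new format (SVG mockups, decision records) is then one registration, with no change to the shell.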
DIAGRAM: diagrams/general-discussion_frontend_1.puml

VOTE: CHANGES
@@ -38,6 +38,8 @@ tui = [
gui = [
    "dearpygui>=2.0.0",
    "artifact-editor @ git+https://gitea.brrd.tech/rob/artifact-editor.git",
    "sounddevice>=0.4.6",
    "numpy>=1.24.0",
]
dev = [
    "pytest>=7.0",
@@ -42,6 +42,203 @@ except ImportError:
    get_format_info = None


class ContinuousRecorder:
    """
    Continuous audio recorder with chunked transcription support.

    Records audio continuously and extracts overlapping chunks for transcription.
    This prevents word cutoff at chunk boundaries.
    """

    def __init__(
        self,
        chunk_duration: float = 10.0,
        overlap_duration: float = 0.5,
        sample_rate: int = 16000,
        on_chunk_ready: Optional[Callable[[bytes], None]] = None,
        on_error: Optional[Callable[[str], None]] = None
    ):
        """
        Initialize the continuous recorder.

        Args:
            chunk_duration: Duration of each chunk in seconds
            overlap_duration: Overlap between chunks in seconds
            sample_rate: Audio sample rate in Hz
            on_chunk_ready: Callback when a chunk is ready (receives WAV bytes)
            on_error: Callback when an error occurs
        """
        self.chunk_duration = chunk_duration
        self.overlap_duration = overlap_duration
        self.sample_rate = sample_rate
        self.on_chunk_ready = on_chunk_ready
        self.on_error = on_error

        self._recording = False
        self._stream = None
        self._buffer = []
        self._buffer_lock = threading.Lock()
        self._chunk_thread = None
        self._stop_event = threading.Event()

        # Calculate samples
        self._chunk_samples = int(chunk_duration * sample_rate)
        self._overlap_samples = int(overlap_duration * sample_rate)

    def start(self):
        """Start continuous recording."""
        if self._recording:
            return

        try:
            import sounddevice as sd
            import numpy as np
        except ImportError as e:
            if self.on_error:
                self.on_error(f"sounddevice not installed: {e}")
            return

        self._recording = True
        self._stop_event.clear()
        self._buffer = []

        def audio_callback(indata, frames, time_info, status):
            """Called by sounddevice for each audio block."""
            if status and self.on_error:
                self.on_error(f"Audio status: {status}")
            with self._buffer_lock:
                self._buffer.append(indata.copy())

        try:
            self._stream = sd.InputStream(
                samplerate=self.sample_rate,
                channels=1,
                dtype='int16',
                callback=audio_callback,
                blocksize=1024
            )
            self._stream.start()

            # Start chunk extraction thread
            self._chunk_thread = threading.Thread(target=self._chunk_loop, daemon=True)
            self._chunk_thread.start()

        except Exception as e:
            self._recording = False
            if self.on_error:
                self.on_error(f"Failed to start recording: {e}")

    def stop(self) -> bytes:
        """
        Stop recording and return any remaining audio as WAV bytes.

        Returns:
            WAV bytes of remaining audio, or empty bytes if none
        """
        if not self._recording:
            return b''

        self._recording = False
        self._stop_event.set()

        # Stop the stream
        if self._stream:
            try:
                self._stream.stop()
                self._stream.close()
            except Exception:
                pass
            self._stream = None

        # Wait for chunk thread to finish
        if self._chunk_thread:
            self._chunk_thread.join(timeout=2)
            self._chunk_thread = None

        # Get remaining audio
        remaining_wav = self._extract_remaining()
        return remaining_wav

    def _chunk_loop(self):
        """Background thread that extracts and processes chunks."""
        import numpy as np

        # We need chunk_duration + overlap of audio before we can extract a chunk
        min_samples_needed = self._chunk_samples + self._overlap_samples

        while not self._stop_event.is_set():
            # Wait a bit before checking
            self._stop_event.wait(timeout=1.0)

            if self._stop_event.is_set():
                break

            # Check if we have enough audio for a chunk
            with self._buffer_lock:
                if not self._buffer:
                    continue

                # Concatenate buffer
                try:
                    audio_data = np.concatenate(self._buffer)
                except ValueError:
                    continue

                total_samples = len(audio_data)

                # Need enough samples for a full chunk plus overlap
                if total_samples < min_samples_needed:
                    continue

                # Extract chunk (the first chunk_samples + overlap_samples)
                chunk_data = audio_data[:min_samples_needed]

                # Keep the overlap portion for next chunk (prevents word cutoff).
                # We keep from (chunk_samples) onwards, so next chunk starts with overlap
                keep_from = self._chunk_samples
                remaining = audio_data[keep_from:]
                self._buffer = [remaining] if len(remaining) > 0 else []

            # Convert to WAV and send to callback
            if self.on_chunk_ready:
                wav_bytes = self._to_wav(chunk_data)
                self.on_chunk_ready(wav_bytes)

    def _extract_remaining(self) -> bytes:
        """Extract any remaining audio from the buffer as WAV bytes."""
        import numpy as np

        with self._buffer_lock:
            if not self._buffer:
                return b''

            try:
                audio_data = np.concatenate(self._buffer)
            except ValueError:
                return b''

            self._buffer = []

        if len(audio_data) < self.sample_rate:  # Less than 1 second
            return b''

        return self._to_wav(audio_data)

    def _to_wav(self, audio_data) -> bytes:
        """Convert numpy audio data to WAV bytes."""
        import io
        import wave

        buffer = io.BytesIO()
        with wave.open(buffer, 'wb') as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)  # 16-bit
            wf.setframerate(self.sample_rate)
            wf.writeframes(audio_data.tobytes())

        return buffer.getvalue()
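The WAV framing that `_to_wav` produces can be checked standalone with the stdlib `wave` module; a sketch of the same mono 16-bit PCM conversion, using `array` in place of numpy so it needs no audio hardware or third-party packages:

```python
import array
import io
import wave

def to_wav(samples, sample_rate=16000):
    """Mirror of _to_wav above: mono, 16-bit PCM WAV from raw int16 samples."""
    buffer = io.BytesIO()
    with wave.open(buffer, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(samples.tobytes())
    return buffer.getvalue()

samples = array.array('h', [0, 1000, -1000, 32767])  # signed 16-bit samples
wav = to_wav(samples)

# Read it back to confirm header fields and payload round-trip intact.
with wave.open(io.BytesIO(wav), 'rb') as wf:
    assert wf.getnchannels() == 1
    assert wf.getsampwidth() == 2
    assert wf.getframerate() == 16000
    assert wf.readframes(wf.getnframes()) == samples.tobytes()
```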
class DiscussionGUI:
    """
    Main GUI application using Dear PyGui.
@@ -214,142 +411,127 @@ class DiscussionGUI:
    def _dictate_clicked(self):
        """Handle dictate button click - toggle recording."""
        if self._continuous_recorder is not None:
            # Currently recording - stop
            self._stop_dictation()
        else:
            # Start recording
            self._start_dictation()

    def _start_dictation(self):
        """Start continuous recording with chunked transcription."""
        # Create recorder with callbacks
        self._continuous_recorder = ContinuousRecorder(
            chunk_duration=10.0,  # 10 second chunks
            overlap_duration=0.5,  # 0.5 second overlap to prevent word cutoff
            sample_rate=16000,
            on_chunk_ready=self._on_audio_chunk_ready,
            on_error=lambda e: self._show_error(f"Recording error: {e}")
        )

        self._continuous_recorder.start()

        # Update button to show recording state
        if dpg.does_item_exist("dictate_btn"):
            dpg.set_item_label("dictate_btn", "● Recording...")

    def _stop_dictation(self):
        """Stop recording and transcribe any remaining audio."""
        if self._continuous_recorder is None:
            return

        # Update button
        if dpg.does_item_exist("dictate_btn"):
            dpg.set_item_label("dictate_btn", "Finishing...")

        # Stop recording and get remaining audio
        remaining_wav = self._continuous_recorder.stop()
        self._continuous_recorder = None

        # Transcribe remaining audio if any
        if remaining_wav:
            self._transcribe_chunk(remaining_wav, is_final=True)
        else:
            # No remaining audio, just reset button
            if dpg.does_item_exist("dictate_btn"):
                dpg.set_item_label("dictate_btn", "Dictate")

    def _on_audio_chunk_ready(self, wav_bytes: bytes):
        """Called when a chunk of audio is ready for transcription."""
        # Transcribe in background thread
        self._transcribe_chunk(wav_bytes, is_final=False)

    def _transcribe_chunk(self, wav_bytes: bytes, is_final: bool = False):
        """Transcribe a chunk of audio and append to comment text."""
        def do_transcribe():
            transcript = ""
            temp_file = None
            try:
                # Write WAV bytes to temp file
                temp_file = tempfile.mktemp(suffix=".wav")
                with open(temp_file, 'wb') as f:
                    f.write(wav_bytes)

                # Check file size
                file_size = len(wav_bytes)
                if file_size < 1000:
                    transcript = ""  # Too short, skip silently
                else:
                    transcribe_cmd = os.path.expanduser("~/.local/bin/transcribe")
                    if not os.path.exists(transcribe_cmd):
                        transcript = "[transcribe tool not found]"
                    else:
                        result = subprocess.run(
                            [transcribe_cmd],
                            input=temp_file,
                            capture_output=True,
                            text=True,
                            timeout=60
                        )
                        transcript = result.stdout.strip()
                        if result.returncode != 0 and result.stderr:
                            transcript = f"[Error: {result.stderr.strip()[:100]}]"

            except subprocess.TimeoutExpired:
                transcript = "[Timeout]"
            except Exception as e:
                transcript = f"[Error: {e}]"
            finally:
                # Clean up temp file
                if temp_file and os.path.exists(temp_file):
                    try:
                        os.remove(temp_file)
                    except Exception:
                        pass

            # Append result to UI
            self._on_transcription_chunk_complete(transcript, is_final)

        thread = threading.Thread(target=do_transcribe, daemon=True)
        thread.start()

    def _on_transcription_chunk_complete(self, transcript: str, is_final: bool):
        """Called when a chunk transcription completes - append to comment text."""
        # Handle final chunk - reset button
        if is_final:
            if dpg.does_item_exist("dictate_btn"):
                dpg.set_item_label("dictate_btn", "Dictate")

        # Skip empty or error transcripts (for non-final chunks, don't show errors)
        if not transcript:
            return
        if transcript.startswith("[") and not is_final:
            # Only show errors for final chunk
            return
        if transcript.startswith("["):
            self._show_error(f"Transcription: {transcript}")
            return

        # Append to existing comment text
        if dpg.does_item_exist("comment_text"):
            current = dpg.get_value("comment_text") or ""
            separator = " " if current.strip() else ""
            dpg.set_value("comment_text", current + separator + transcript)
    def __init__(self, discussions_dir: Path = None):
        if discussions_dir is None:
@@ -369,9 +551,10 @@ class DiscussionGUI:
        self._reading_button_tag: Optional[str] = None
        self._comment_counter = 0  # For generating unique button tags

        # Dictation state - continuous recording with chunked transcription
        self._continuous_recorder: Optional[ContinuousRecorder] = None
        self._transcription_queue = []  # Queue of transcription results
        self._transcription_lock = threading.Lock()

        # Initialize Dear PyGui
        dpg.create_context()