Compare commits

...

2 Commits

Author SHA1 Message Date
rob a2d6cb6f3c feat: Add dual-mode dictation (walkie-talkie + continuous)
- Press and hold: walkie-talkie mode, records while held, transcribes on release
- Double-click: continuous mode (existing behavior), click again to stop
- Uses Dear PyGui item handlers for mouse down/up detection
- Visual feedback shows current mode on button label

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 03:13:41 -04:00
rob 57efe6c931 feat: Implement continuous dictation with chunked transcription
- Add ContinuousRecorder class using sounddevice for gapless audio capture
- Record continuously and extract 10-second chunks with 0.5s overlap
- Overlap prevents words from being cut off at chunk boundaries
- Transcription happens in parallel with recording
- Text appears incrementally as chunks complete
- Add sounddevice and numpy as GUI dependencies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 03:03:10 -04:00
3 changed files with 480 additions and 123 deletions

View File

@@ -14,3 +14,93 @@ This is just a general brainstorming discussion.
---
*Discussion begins below.*
---
Name: rob
I'm trying to think of a good way of documenting development projects.
What programs exist? I usually put it all in a markdown file.
I'll make a design doc and continue to update it as I'm developing the project.
Sometimes, if the project gets really complicated, I make different sections; then the design doc gets really big, and sometimes I break it up into smaller documents.
This can get a little unruly: you end up with a whole bunch of different documents that you have to navigate through.
You end up with huge documents that almost become a research project to catch up on and understand.
It's hard to see the entire project all in one place. Sometimes I try to use UML, like PlantUML, to help me see the entire project.
Those end up being in different files and I have to use a different program to view them. Sometimes I use SVG files for UI mockups and I have to use a different program to view those too.
I find it hard to keep it all together; you end up with documents spread all throughout your project,
some that are for the users to see, some for the developers, some for me.
I've tried things like Obsidian, and sometimes I just create entire web pages because you can link different documents, but I
haven't found a really good solution where I can view, navigate, and edit it all in one place. What ideas do you have?
---
Name: AI-Architect
Parse error
VOTE: NONE
---
Name: AI-Backend
Backend feedback: You're describing a doc hub that unifies markdown, diagrams, and media with seamless navigation/editing. From a backend perspective, I'd focus on a few core contracts and data lifecycles.
- Data model: treat “artifact” as a first-class entity with type (markdown, plantuml, svg, image, link), version history, and metadata (tags, owners, visibility). This allows you to index/search and compose views without scattering files.
- Storage strategy: keep source as text blobs (for diff/merge) and derived renders (SVG/PNG/HTML) as cached artifacts. Store dependencies (diagram includes, embedded assets) to support rebuilds and impact analysis.
- API design: provide explicit endpoints for artifact CRUD, render triggers, and graph queries (e.g., backlinks, referenced-by). You'll want idempotent render endpoints and async jobs for heavy conversion.
- Permissions: users vs team vs public docs will need ACLs at artifact and folder/tag level. Plan for mixed visibility within a single project.
- Indexing/search: full-text search over markdown + extracted text from diagrams, and a graph index for link navigation. This makes “see the whole project” feasible.
- Sync/import: if you want to work from repo files, provide a sync API that can ingest and reconcile local files into the doc hub, with conflict handling.
- Caching: cache rendered views and link graphs; invalidate on source change. Keep a render queue to avoid spikes.
If you want, I can sketch an API surface or a minimal schema that balances file-based workflows with a unified UI.
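For concreteness, here is one minimal sketch of that artifact-centric schema as Python dataclasses. It is purely illustrative: every type, field, and name below is an assumption for discussion, not an existing API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

class ArtifactType(Enum):
    MARKDOWN = "markdown"
    PLANTUML = "plantuml"
    SVG = "svg"
    IMAGE = "image"
    LINK = "link"

@dataclass
class ArtifactVersion:
    """One immutable revision of an artifact's source text."""
    version: int
    source: str          # kept as a text blob so it stays diffable/mergeable
    created_at: datetime

@dataclass
class Artifact:
    """First-class documentation unit: prose, diagram, or asset."""
    id: str
    type: ArtifactType
    tags: list[str] = field(default_factory=list)
    owners: list[str] = field(default_factory=list)
    visibility: str = "team"  # "private" | "team" | "public" (ACL hook)
    versions: list[ArtifactVersion] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)  # diagram includes, embedded assets
    rendered_cache: dict[str, bytes] = field(default_factory=dict)  # e.g. {"svg": ...}, invalidated on source change
```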
VOTE: NONE
---
Name: AI-Designer
The core issue you're describing is an **information architecture problem manifesting as a visual/navigation problem**. Let me break down the design challenges:

### Current Pain Points (Design Perspective)

1. **Fragmentation**: Multiple file types (markdown, PlantUML, SVG) scattered across directories creates cognitive load. Each context switch (switching programs) breaks focus.

2. **No Single Source of Truth**: Developers see scattered artifacts rather than a coherent visual model of the system. This creates ambiguity about what's current/canonical.

3. **Scale Problem**: Large documents become unnavigable. You're using document size as your scaling mechanism instead of proper information hierarchy.

4. **No Visual Cross-Referencing**: Diagrams exist separately from prose. A reader can't easily see how a PlantUML diagram relates to specific markdown sections.

### Design Solution Space

**1. Unified Document with Embedded Visuals** (Best for cohesion)
- Single markdown file as source of truth
- Inline diagram code that renders on-demand
- Color-coded sections for different audiences (dev/user/personal notes)
- Table of contents with jump links
- *Trade-off*: Single large file, but structured with clear visual hierarchy

**2. Hierarchical Document Structure** (Best for scalability)
- Index document with clear visual hierarchy
- Child documents explicitly linked (not scattered)
- Shared visual identity (consistent headers, color scheme, typography)
- Breadcrumb navigation to show context
- *Trade-off*: Multiple files but with intentional architecture

**3. Integrated Development Notebook** (Best for discoverability)
- Development log with timestamped entries
- Embedded diagrams and decision records
- Tags/categories for cross-cutting concerns
- Search-first navigation
- *Trade-off*: Requires tooling

### Immediate Design Recommendations

1. **Establish Visual Hierarchy**: Create a template with:
   - Typography scale (consistent H1-H3 sizing)
   - Color system (primary/secondary/accent)
   - Spacing rules and rhythm

2. **Content Zoning**: Visually separate audiences within the same document:
   - User-facing docs in standard style
   - Developer notes in distinct color/style
   - Personal thinking in another

3. **Diagram Integration**: Embed diagram source with prose:
   - Fenced code blocks render inline
   - Single file, no context switching

4. **Navigation Design**: If multiple documents:
   - Always show breadcrumbs
   - Maintain identical navigation UI
   - Use consistent link styling

### Key Insight

You need a **designed system**, not just a different tool. Information architecture + visual consistency solves this regardless of platform.

DIAGRAM: diagrams/general-discussion_designer_1.puml
VOTE: READY
---
Name: AI-Ux
The user's problem statement highlights a critical UX challenge: the fragmentation of project documentation across various tools and file formats. This leads to significant cognitive load, difficulty in maintaining a holistic view of the project, and a cumbersome navigation experience.
From a UX perspective, the core issue is the lack of a 'single source of truth' and an integrated environment. Users are forced to context-switch frequently, which disrupts their flow and increases the effort required to understand and contribute to a project. The current process is neither intuitive nor efficient.
Key UX pain points identified:
- **High Cognitive Load:** Juggling multiple documents, tools, and mental models for different content types (text, diagrams, UI mockups).
- **Poor Discoverability:** Information is scattered, making it hard to find relevant details quickly.
- **Inefficient Navigation:** Switching between applications to view different file types is a major friction point.
- **Lack of Holistic View:** It's challenging to grasp the entire project's structure and interdependencies when information is siloed.
An ideal solution would prioritize:
1. **Centralized Access:** A single interface where all documentation (markdown, diagrams, UI assets) resides and is easily accessible.
2. **Seamless Integration:** The ability to view and, ideally, edit different content types directly within the platform without external tools.
3. **Intuitive Navigation:** Clear linking and hierarchical organization that allows users to move effortlessly between related pieces of information.
4. **Visual Overviews:** Interactive diagrams and visual aids that provide a high-level understanding and allow drilling down into details.
5. **Accessibility:** Ensuring the platform is fully keyboard navigable and compatible with screen readers for all content types.
6. **Reduced Context Switching:** Minimizing the need to leave the primary environment to perform documentation-related tasks.
DIAGRAM: diagrams/general-discussion_ux_1.puml
VOTE: CHANGES
---
Name: AI-Frontend
From a frontend perspective, this problem screams for a unified documentation interface. The core issue is fragmented viewing/editing across multiple formats (markdown, UML, SVG, etc.). A modern single-page application could solve this with:
1. **Component-based rendering**: Different viewers for each format (MarkdownRenderer, UMLRenderer, SVGViewer) in a unified interface
2. **Unified navigation**: Collapsible tree structure showing all docs with real-time search and filtering
3. **Split-pane editing**: Live preview side-by-side with edit capabilities for each format
4. **Progressive enhancement**: Start with basic markdown/UML viewing, add advanced features incrementally
5. **State management**: Centralized store for document structure, active selections, and edit states
The frontend should focus on performance (lazy loading heavy assets like UML diagrams) and accessibility (keyboard navigation, screen reader support for complex diagrams).
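As a rough illustration of the component-based rendering idea, here is a sketch in Python (to match the rest of this project). The registry, function names, and placeholder renderers are all assumptions for discussion, not a chosen stack:

```python
from typing import Callable, Dict

# Illustrative registry mapping artifact formats to renderer callables.
RENDERERS: Dict[str, Callable[[str], str]] = {}

def register(fmt: str):
    """Decorator that registers a renderer for a given format."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        RENDERERS[fmt] = fn
        return fn
    return wrap

@register("markdown")
def render_markdown(source: str) -> str:
    import html  # placeholder: a real app would call a markdown library here
    return f"<article>{html.escape(source)}</article>"

@register("svg")
def render_svg(source: str) -> str:
    return source  # SVG is already displayable markup

def render(fmt: str, source: str) -> str:
    """Dispatch to the registered viewer; unknown formats degrade gracefully."""
    renderer = RENDERERS.get(fmt)
    return renderer(source) if renderer else f"<pre>no viewer for {fmt}</pre>"
```

Heavy renderers (e.g., a UML pipeline) would be imported lazily inside their registered function, so opening the document tree stays fast.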
DIAGRAM: diagrams/general-discussion_frontend_1.puml
VOTE: CHANGES

View File

@@ -38,6 +38,8 @@ tui = [
 gui = [
     "dearpygui>=2.0.0",
     "artifact-editor @ git+https://gitea.brrd.tech/rob/artifact-editor.git",
+    "sounddevice>=0.4.6",
+    "numpy>=1.24.0",
 ]
 dev = [
     "pytest>=7.0",

View File

@@ -42,6 +42,203 @@ except ImportError:
     get_format_info = None
 
+
+class ContinuousRecorder:
+    """
+    Continuous audio recorder with chunked transcription support.
+
+    Records audio continuously and extracts overlapping chunks for transcription.
+    This prevents word cutoff at chunk boundaries.
+    """
+
+    def __init__(
+        self,
+        chunk_duration: float = 10.0,
+        overlap_duration: float = 0.5,
+        sample_rate: int = 16000,
+        on_chunk_ready: Callable[[bytes], None] = None,
+        on_error: Callable[[str], None] = None
+    ):
+        """
+        Initialize the continuous recorder.
+
+        Args:
+            chunk_duration: Duration of each chunk in seconds
+            overlap_duration: Overlap between chunks in seconds
+            sample_rate: Audio sample rate in Hz
+            on_chunk_ready: Callback when a chunk is ready (receives WAV bytes)
+            on_error: Callback when an error occurs
+        """
+        self.chunk_duration = chunk_duration
+        self.overlap_duration = overlap_duration
+        self.sample_rate = sample_rate
+        self.on_chunk_ready = on_chunk_ready
+        self.on_error = on_error
+
+        self._recording = False
+        self._stream = None
+        self._buffer = []
+        self._buffer_lock = threading.Lock()
+        self._chunk_thread = None
+        self._stop_event = threading.Event()
+
+        # Calculate samples
+        self._chunk_samples = int(chunk_duration * sample_rate)
+        self._overlap_samples = int(overlap_duration * sample_rate)
+
+    def start(self):
+        """Start continuous recording."""
+        if self._recording:
+            return
+
+        try:
+            import sounddevice as sd
+            import numpy as np
+        except ImportError as e:
+            if self.on_error:
+                self.on_error(f"sounddevice not installed: {e}")
+            return
+
+        self._recording = True
+        self._stop_event.clear()
+        self._buffer = []
+
+        def audio_callback(indata, frames, time_info, status):
+            """Called by sounddevice for each audio block."""
+            if status and self.on_error:
+                self.on_error(f"Audio status: {status}")
+            with self._buffer_lock:
+                self._buffer.append(indata.copy())
+
+        try:
+            self._stream = sd.InputStream(
+                samplerate=self.sample_rate,
+                channels=1,
+                dtype='int16',
+                callback=audio_callback,
+                blocksize=1024
+            )
+            self._stream.start()
+
+            # Start chunk extraction thread
+            self._chunk_thread = threading.Thread(target=self._chunk_loop, daemon=True)
+            self._chunk_thread.start()
+        except Exception as e:
+            self._recording = False
+            if self.on_error:
+                self.on_error(f"Failed to start recording: {e}")
+
+    def stop(self) -> bytes:
+        """
+        Stop recording and return any remaining audio as WAV bytes.
+
+        Returns:
+            WAV bytes of remaining audio, or empty bytes if none
+        """
+        if not self._recording:
+            return b''
+
+        self._recording = False
+        self._stop_event.set()
+
+        # Stop the stream
+        if self._stream:
+            try:
+                self._stream.stop()
+                self._stream.close()
+            except Exception:
+                pass
+            self._stream = None
+
+        # Wait for chunk thread to finish
+        if self._chunk_thread:
+            self._chunk_thread.join(timeout=2)
+            self._chunk_thread = None
+
+        # Get remaining audio
+        remaining_wav = self._extract_remaining()
+        return remaining_wav
+
+    def _chunk_loop(self):
+        """Background thread that extracts and processes chunks."""
+        import numpy as np
+
+        # We need chunk_duration + overlap of audio before we can extract a chunk
+        min_samples_needed = self._chunk_samples + self._overlap_samples
+
+        while not self._stop_event.is_set():
+            # Wait a bit before checking
+            self._stop_event.wait(timeout=1.0)
+            if self._stop_event.is_set():
+                break
+
+            # Check if we have enough audio for a chunk
+            with self._buffer_lock:
+                if not self._buffer:
+                    continue
+
+                # Concatenate buffer
+                try:
+                    audio_data = np.concatenate(self._buffer)
+                except ValueError:
+                    continue
+
+                total_samples = len(audio_data)
+
+                # Need enough samples for a full chunk plus overlap
+                if total_samples < min_samples_needed:
+                    continue
+
+                # Extract chunk (the first chunk_samples + overlap_samples)
+                chunk_data = audio_data[:min_samples_needed]
+
+                # Keep the overlap portion for next chunk (prevents word cutoff)
+                # We keep from (chunk_samples) onwards, so next chunk starts with overlap
+                keep_from = self._chunk_samples
+                remaining = audio_data[keep_from:]
+                self._buffer = [remaining] if len(remaining) > 0 else []
+
+            # Convert to WAV and send to callback
+            if self.on_chunk_ready:
+                wav_bytes = self._to_wav(chunk_data)
+                self.on_chunk_ready(wav_bytes)
+
+    def _extract_remaining(self) -> bytes:
+        """Extract any remaining audio from the buffer as WAV bytes."""
+        import numpy as np
+
+        with self._buffer_lock:
+            if not self._buffer:
+                return b''
+            try:
+                audio_data = np.concatenate(self._buffer)
+            except ValueError:
+                return b''
+            self._buffer = []
+
+        if len(audio_data) < self.sample_rate:  # Less than 1 second
+            return b''
+
+        return self._to_wav(audio_data)
+
+    def _to_wav(self, audio_data) -> bytes:
+        """Convert numpy audio data to WAV bytes."""
+        import io
+        import wave
+
+        buffer = io.BytesIO()
+        with wave.open(buffer, 'wb') as wf:
+            wf.setnchannels(1)
+            wf.setsampwidth(2)  # 16-bit
+            wf.setframerate(self.sample_rate)
+            wf.writeframes(audio_data.tobytes())
+        return buffer.getvalue()
 
 
 class DiscussionGUI:
     """
     Main GUI application using Dear PyGui.
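A quick sanity check on the chunk arithmetic above, plus a minimal standalone driver for ContinuousRecorder. This sketch is not part of the diff: the print-based callbacks are stand-ins for the GUI's transcription pipeline. With the defaults (10 s chunks, 0.5 s overlap, 16 kHz), each extracted chunk is 168,000 samples and consecutive chunks share 8,000 samples, which is the cushion against words being clipped at boundaries.

import time

# chunk_samples = 10.0 * 16000 = 160,000; overlap_samples = 0.5 * 16000 = 8,000.
# Chunk k covers samples [160000*k, 160000*k + 168000), so chunks k and k+1
# overlap by 8,000 samples (0.5 s).

def handle_chunk(wav_bytes: bytes) -> None:
    # Stand-in for the GUI's _on_audio_chunk_ready -> transcription pipeline.
    print(f"chunk ready: {len(wav_bytes)} WAV bytes")

recorder = ContinuousRecorder(
    chunk_duration=10.0,
    overlap_duration=0.5,
    sample_rate=16000,
    on_chunk_ready=handle_chunk,
    on_error=print,
)
recorder.start()
time.sleep(25)             # record ~25 s: expect two full chunks
tail = recorder.stop()     # remaining ~5 s comes back as WAV bytes
print(f"tail: {len(tail)} bytes")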
@@ -212,139 +409,190 @@ class DiscussionGUI:
         if dpg.does_item_exist(button_tag):
             dpg.set_item_label(button_tag, "Read")
 
-    def _dictate_clicked(self):
-        """Handle dictate button click - toggle recording."""
-        if self._dictation_process is not None:
-            # Currently recording - stop and transcribe
-            self._stop_dictation()
-        else:
-            # Start recording
-            self._start_dictation()
+    def _on_dictate_activated(self, sender, app_data):
+        """Handle mouse down on dictate button - start timing for mode detection."""
+        import time
+        current_time = time.time()
+
+        # Check for double-click (two clicks within 300ms)
+        if current_time - self._last_dictate_click_time < 0.3:
+            # Double-click detected
+            self._last_dictate_click_time = 0  # Reset to prevent triple-click issues
+            self._dictation_mode = "continuous"
+            self._start_dictation()
+            return
+
+        self._last_dictate_click_time = current_time
+        self._mouse_down_time = current_time
+
+        # If we're in continuous mode, a click should stop it
+        if self._dictation_mode == "continuous":
+            self._stop_dictation()
+            self._dictation_mode = "idle"
+            return
+
+        # Start push-to-talk mode preparation
+        # We'll confirm it in deactivated if held long enough
+        self._dictation_mode = "push_to_talk_pending"
+
+        # Update button visual to show it's being pressed
+        if dpg.does_item_exist("dictate_btn"):
+            dpg.set_item_label("dictate_btn", "● Hold...")
+
+    def _on_dictate_deactivated(self, sender, app_data):
+        """Handle mouse up on dictate button - determine action based on hold duration."""
+        import time
+        current_time = time.time()
+        hold_duration = current_time - getattr(self, '_mouse_down_time', current_time)
+
+        if self._dictation_mode == "push_to_talk_pending":
+            if hold_duration > 0.2:  # Held for more than 200ms - it was push-to-talk
+                # Button was held down, treat as walkie-talkie release
+                self._dictation_mode = "idle"
+                if self._continuous_recorder is not None:
+                    self._stop_dictation()
+                else:
+                    # Was just a long press but recording didn't start yet
+                    if dpg.does_item_exist("dictate_btn"):
+                        dpg.set_item_label("dictate_btn", "Dictate")
+            else:
+                # Short click - wait for potential double-click
+                self._dictation_mode = "idle"
+                if dpg.does_item_exist("dictate_btn"):
+                    dpg.set_item_label("dictate_btn", "Dictate")
+        elif self._dictation_mode == "push_to_talk":
+            # Release from push-to-talk mode - stop recording
+            self._dictation_mode = "idle"
+            self._stop_dictation()
+        # continuous mode is handled by click, not release
+
+    def _on_dictate_active(self, sender, app_data):
+        """Handle button being held down - start push-to-talk recording after threshold."""
+        import time
+        current_time = time.time()
+        hold_duration = current_time - getattr(self, '_mouse_down_time', current_time)
+
+        if self._dictation_mode == "push_to_talk_pending" and hold_duration > 0.2:
+            # Held long enough - start push-to-talk recording
+            self._dictation_mode = "push_to_talk"
+            if self._continuous_recorder is None:
+                self._start_dictation()
 
-    def _start_dictation(self):
-        """Start recording audio from microphone."""
-        import tempfile
-        import signal
-
-        # Create temp file for audio
-        self._dictation_audio_file = tempfile.mktemp(suffix=".wav")
-
-        # Start arecord without duration limit (records until interrupted)
-        try:
-            self._dictation_process = subprocess.Popen(
-                [
-                    "arecord",
-                    "-D", "default",
-                    "-f", "S16_LE",
-                    "-r", "16000",
-                    "-c", "1",
-                    "-t", "wav",
-                    self._dictation_audio_file
-                ],
-                stdout=subprocess.PIPE,
-                stderr=subprocess.PIPE
-            )
-            # Update button to show recording state
-            if dpg.does_item_exist("dictate_btn"):
-                dpg.set_item_label("dictate_btn", "Stop Recording")
-        except FileNotFoundError:
-            self._show_error("arecord not found. Install alsa-utils: sudo apt install alsa-utils")
-            self._dictation_process = None
-            self._dictation_audio_file = None
-        except Exception as e:
-            self._show_error(f"Failed to start recording: {e}")
-            self._dictation_process = None
-            self._dictation_audio_file = None
+    def _start_dictation(self):
+        """Start continuous recording with chunked transcription."""
+        # Create recorder with callbacks
+        self._continuous_recorder = ContinuousRecorder(
+            chunk_duration=10.0,  # 10 second chunks
+            overlap_duration=0.5,  # 0.5 second overlap to prevent word cutoff
+            sample_rate=16000,
+            on_chunk_ready=self._on_audio_chunk_ready,
+            on_error=lambda e: self._show_error(f"Recording error: {e}")
+        )
+        self._continuous_recorder.start()
+
+        # Update button to show recording state based on mode
+        if dpg.does_item_exist("dictate_btn"):
+            if self._dictation_mode == "continuous":
+                dpg.set_item_label("dictate_btn", "● Continuous (click to stop)")
+            else:  # push_to_talk
+                dpg.set_item_label("dictate_btn", "● Recording (release to stop)")
 
-    def _stop_dictation(self):
-        """Stop recording and transcribe the audio."""
-        import signal
-        if self._dictation_process is None:
-            return
-
-        # Send SIGINT to stop arecord gracefully
-        try:
-            self._dictation_process.send_signal(signal.SIGINT)
-            self._dictation_process.wait(timeout=2)
-        except Exception:
-            # Force kill if needed
-            try:
-                self._dictation_process.kill()
-                self._dictation_process.wait(timeout=1)
-            except Exception:
-                pass
-        self._dictation_process = None
-
-        # Update button back to Dictate
-        if dpg.does_item_exist("dictate_btn"):
-            dpg.set_item_label("dictate_btn", "Transcribing...")
-
-        # Transcribe in background thread
-        audio_file = self._dictation_audio_file
-        self._dictation_audio_file = None
-
-        def transcribe():
-            transcript = ""
-            try:
-                if not audio_file:
-                    transcript = "[No audio file path]"
-                elif not os.path.exists(audio_file):
-                    transcript = f"[Audio file not found: {audio_file}]"
-                else:
-                    # Check file size to ensure recording happened
-                    file_size = os.path.getsize(audio_file)
-                    if file_size < 1000:  # Less than 1KB likely means no audio
-                        transcript = "[Recording too short or empty]"
-                    else:
-                        transcribe_cmd = os.path.expanduser("~/.local/bin/transcribe")
-                        if not os.path.exists(transcribe_cmd):
-                            transcript = "[transcribe tool not found at ~/.local/bin/transcribe]"
-                        else:
-                            result = subprocess.run(
-                                [transcribe_cmd],
-                                input=audio_file,
-                                capture_output=True,
-                                text=True,
-                                timeout=60
-                            )
-                            transcript = result.stdout.strip()
-                            if result.returncode != 0 and result.stderr:
-                                transcript = f"[Transcribe error: {result.stderr.strip()}]"
-                            elif not transcript:
-                                transcript = "[No speech detected]"
-                # Clean up audio file
-                try:
-                    os.remove(audio_file)
-                except Exception:
-                    pass
-            except subprocess.TimeoutExpired:
-                transcript = "[Transcription timed out after 60 seconds]"
-            except Exception as e:
-                transcript = f"[Transcription error: {e}]"
-            # Update UI in main thread
-            self._on_transcription_complete(transcript)
-
-        thread = threading.Thread(target=transcribe, daemon=True)
-        thread.start()
+    def _stop_dictation(self):
+        """Stop recording and transcribe any remaining audio."""
+        if self._continuous_recorder is None:
+            return
+
+        # Update button
+        if dpg.does_item_exist("dictate_btn"):
+            dpg.set_item_label("dictate_btn", "Finishing...")
+
+        # Stop recording and get remaining audio
+        remaining_wav = self._continuous_recorder.stop()
+        self._continuous_recorder = None
+
+        # Transcribe remaining audio if any
+        if remaining_wav:
+            self._transcribe_chunk(remaining_wav, is_final=True)
+        else:
+            # No remaining audio, just reset button
+            if dpg.does_item_exist("dictate_btn"):
+                dpg.set_item_label("dictate_btn", "Dictate")
+
+    def _on_audio_chunk_ready(self, wav_bytes: bytes):
+        """Called when a chunk of audio is ready for transcription."""
+        # Transcribe in background thread
+        self._transcribe_chunk(wav_bytes, is_final=False)
+
+    def _transcribe_chunk(self, wav_bytes: bytes, is_final: bool = False):
+        """Transcribe a chunk of audio and append to comment text."""
+        def do_transcribe():
+            transcript = ""
+            temp_file = None
+            try:
+                # Write WAV bytes to temp file
+                temp_file = tempfile.mktemp(suffix=".wav")
+                with open(temp_file, 'wb') as f:
+                    f.write(wav_bytes)
+
+                # Check file size
+                file_size = len(wav_bytes)
+                if file_size < 1000:
+                    transcript = ""  # Too short, skip silently
+                else:
+                    transcribe_cmd = os.path.expanduser("~/.local/bin/transcribe")
+                    if not os.path.exists(transcribe_cmd):
+                        transcript = "[transcribe tool not found]"
+                    else:
+                        result = subprocess.run(
+                            [transcribe_cmd],
+                            input=temp_file,
+                            capture_output=True,
+                            text=True,
+                            timeout=60
+                        )
+                        transcript = result.stdout.strip()
+                        if result.returncode != 0 and result.stderr:
+                            transcript = f"[Error: {result.stderr.strip()[:100]}]"
+            except subprocess.TimeoutExpired:
+                transcript = "[Timeout]"
+            except Exception as e:
+                transcript = f"[Error: {e}]"
+            finally:
+                # Clean up temp file
+                if temp_file and os.path.exists(temp_file):
+                    try:
+                        os.remove(temp_file)
+                    except Exception:
+                        pass
+            # Append result to UI
+            self._on_transcription_chunk_complete(transcript, is_final)
+
+        thread = threading.Thread(target=do_transcribe, daemon=True)
+        thread.start()
 
-    def _on_transcription_complete(self, transcript: str):
-        """Called when transcription completes - append to comment text."""
-        # Reset button
-        if dpg.does_item_exist("dictate_btn"):
-            dpg.set_item_label("dictate_btn", "Dictate")
-
-        if not transcript:
-            self._show_error("Transcription returned empty result")
-        elif transcript.startswith("["):
-            # Error message from transcribe tool
-            self._show_error(f"Transcription failed: {transcript}")
-        else:
-            # Append to existing comment text
-            if dpg.does_item_exist("comment_text"):
-                current = dpg.get_value("comment_text") or ""
+    def _on_transcription_chunk_complete(self, transcript: str, is_final: bool):
+        """Called when a chunk transcription completes - append to comment text."""
+        # Handle final chunk - reset button
+        if is_final:
+            if dpg.does_item_exist("dictate_btn"):
+                dpg.set_item_label("dictate_btn", "Dictate")
+
+        # Skip empty or error transcripts (for non-final chunks, don't show errors)
+        if not transcript:
+            return
+        if transcript.startswith("[") and not is_final:
+            # Only show errors for final chunk
+            return
+        if transcript.startswith("["):
+            self._show_error(f"Transcription: {transcript}")
+            return
+
+        # Append to existing comment text
+        if dpg.does_item_exist("comment_text"):
+            current = dpg.get_value("comment_text") or ""
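For readers tracing the three handlers above, the gesture logic reduces to two thresholds: a press released within 200 ms is a click, and a second click within 300 ms switches to continuous mode, while a press held past 200 ms becomes push-to-talk that stops on release. A condensed, GUI-free restatement of that decision follows; it is illustrative only, and the real logic lives in the handlers in the diff.

def classify_gesture(hold_s: float, gap_since_last_click_s: float) -> str:
    """Illustrative restatement of the dictation gesture thresholds."""
    if gap_since_last_click_s < 0.3:
        return "continuous"     # double-click: toggle continuous mode
    if hold_s > 0.2:
        return "push_to_talk"   # held: record while pressed, stop on release
    return "idle"               # short single click: wait for a second click

assert classify_gesture(hold_s=0.8, gap_since_last_click_s=5.0) == "push_to_talk"
assert classify_gesture(hold_s=0.1, gap_since_last_click_s=0.15) == "continuous"
assert classify_gesture(hold_s=0.1, gap_since_last_click_s=5.0) == "idle"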
@@ -369,9 +617,16 @@ class DiscussionGUI:
         self._reading_button_tag: Optional[str] = None
         self._comment_counter = 0  # For generating unique button tags
 
-        # Dictation state
-        self._dictation_process: Optional[subprocess.Popen] = None
-        self._dictation_audio_file: Optional[str] = None
+        # Dictation state - continuous recording with chunked transcription
+        self._continuous_recorder: Optional[ContinuousRecorder] = None
+        self._transcription_queue = []  # Queue of transcription results
+        self._transcription_lock = threading.Lock()
+
+        # Dictation mode tracking for press-and-hold vs double-click
+        self._dictation_mode = "idle"  # "idle", "push_to_talk", "push_to_talk_pending", "continuous"
+        self._last_dictate_click_time = 0.0
+        self._mouse_down_time = 0.0
+        self._push_to_talk_thread: Optional[threading.Thread] = None
 
         # Initialize Dear PyGui
         dpg.create_context()
@@ -1735,6 +1990,11 @@ class DiscussionGUI:
         window_tag = "comment_dialog"
         if dpg.does_item_exist(window_tag):
            dpg.delete_item(window_tag)
+        if dpg.does_item_exist("dictate_btn_handlers"):
+            dpg.delete_item("dictate_btn_handlers")
+
+        # Reset dictation state when dialog opens
+        self._dictation_mode = "idle"
+        self._last_dictate_click_time = 0.0
 
         with dpg.window(label="Add Comment", tag=window_tag, modal=True,
                         width=600, height=400, pos=[400, 200]):
@@ -1745,9 +2005,14 @@ class DiscussionGUI:
             dpg.add_button(
                 label="Dictate",
                 tag="dictate_btn",
-                callback=lambda: self._dictate_clicked(),
                 width=100
             )
+            # Add item handlers for press-and-hold vs double-click detection
+            with dpg.item_handler_registry(tag="dictate_btn_handlers"):
+                dpg.add_item_activated_handler(callback=self._on_dictate_activated)
+                dpg.add_item_deactivated_handler(callback=self._on_dictate_deactivated)
+                dpg.add_item_active_handler(callback=self._on_dictate_active)
+            dpg.bind_item_handler_registry("dictate_btn", "dictate_btn_handlers")
             dpg.add_button(
                 label="Add Artifact",
                 callback=lambda: self._launch_artifact_editor_for_comment(),