diff --git a/general-discussion.discussion.md b/general-discussion.discussion.md
index 639485d..36d29f7 100644
--- a/general-discussion.discussion.md
+++ b/general-discussion.discussion.md
@@ -14,3 +14,93 @@ This is just a general brainstorming discussion.
 ---
 *Discussion begins below.*
+
+---
+
+Name: rob
+I'm trying to think of a good way of documenting development projects.
+What programs exist? I usually put it all in a markdown file.
+I'll make a design doc and continue to update it as I'm developing the project.
+Sometimes, if the project gets really complicated, I make different sections; then the design doc gets really big, so sometimes I break it up into smaller documents.
+This can get a little unruly: you end up with a whole bunch of different documents that you have to navigate through.
+You end up with huge documents that almost become a research project to catch up on and understand.
+It's hard to see the entire project all in one place. Sometimes I try to use UML, like PlantUML, to help me see the entire project.
+Those end up in different files, and I have to use a different program to view them. Sometimes I use SVG files for UI mockups, and I have to use yet another program to view those.
+I find it hard to keep it all together; you end up with documents spread all throughout your project,
+some that are for users to see, some for developers, some just for me.
+I've tried things like Obsidian, and sometimes I just create entire web pages because you can link different documents, but I
+haven't found a really good solution where I can view, navigate, and edit it all in one place. What ideas do you have?
+
+---
+
+Name: AI-Architect
+Parse error
+VOTE: NONE
+
+---
+
+Name: AI-Backend
+Backend feedback: You're describing a doc hub that unifies markdown, diagrams, and media with seamless navigation/editing. From a backend perspective, I'd focus on a few core contracts and data lifecycles.
+
+- Data model: treat "artifact" as a first-class entity with type (markdown, plantuml, svg, image, link), version history, and metadata (tags, owners, visibility). This allows you to index/search and compose views without scattering files.
+- Storage strategy: keep source as text blobs (for diff/merge) and derived renders (SVG/PNG/HTML) as cached artifacts. Store dependencies (diagram includes, embedded assets) to support rebuilds and impact analysis.
+- API design: provide explicit endpoints for artifact CRUD, render triggers, and graph queries (e.g., backlinks, referenced-by). You'll want idempotent render endpoints and async jobs for heavy conversion.
+- Permissions: user vs. team vs. public docs will need ACLs at artifact and folder/tag level. Plan for mixed visibility within a single project.
+- Indexing/search: full-text search over markdown + extracted text from diagrams, and a graph index for link navigation. This makes "see the whole project" feasible.
+- Sync/import: if you want to work from repo files, provide a sync API that can ingest and reconcile local files into the doc hub, with conflict handling.
+- Caching: cache rendered views and link graphs; invalidate on source change. Keep a render queue to avoid spikes.
+
+If you want, I can flesh out an API surface as well; a minimal schema sketch that balances file-based workflows with a unified UI follows below.
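+
+A minimal sketch of that schema (Python dataclasses; all names and fields here are assumptions, not a committed design):
+
+```python
+from dataclasses import dataclass, field
+from datetime import datetime
+from enum import Enum
+
+
+class ArtifactType(Enum):
+    MARKDOWN = "markdown"
+    PLANTUML = "plantuml"
+    SVG = "svg"
+    IMAGE = "image"
+    LINK = "link"
+
+
+@dataclass
+class Artifact:
+    """One first-class documentation artifact; renders are derived and cached separately."""
+    id: str
+    type: ArtifactType
+    source: str                                           # canonical text blob (diff/merge friendly)
+    tags: list[str] = field(default_factory=list)
+    owners: list[str] = field(default_factory=list)
+    visibility: str = "private"                           # "private" | "team" | "public"
+    depends_on: list[str] = field(default_factory=list)   # artifact ids: diagram includes, embedded assets
+    updated_at: datetime = field(default_factory=datetime.now)
+```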
+VOTE: NONE
+
+---
+
+Name: AI-Designer
+The core issue you're describing is an **information architecture problem manifesting as a visual/navigation problem**.
+
+Let me break down the design challenges:
+
+### Current Pain Points (Design Perspective)
+
+1. **Fragmentation**: Multiple file types (markdown, PlantUML, SVG) scattered across directories create cognitive load. Each context switch (switching programs) breaks focus.
+
+2. **No Single Source of Truth**: Developers see scattered artifacts rather than a coherent visual model of the system. This creates ambiguity about what's current/canonical.
+
+3. **Scale Problem**: Large documents become unnavigable. You're using document size as your scaling mechanism instead of proper information hierarchy.
+
+4. **No Visual Cross-Referencing**: Diagrams exist separately from prose. A reader can't easily see how a PlantUML diagram relates to specific markdown sections.
+
+### Design Solution Space
+
+**1. Unified Document with Embedded Visuals** (Best for cohesion)
+- Single markdown file as source of truth
+- Inline diagram code that renders on demand
+- Color-coded sections for different audiences (dev/user/personal notes)
+- Table of contents with jump links
+- *Trade-off*: Single large file, but structured with clear visual hierarchy
+
+**2. Hierarchical Document Structure** (Best for scalability)
+- Index document with clear visual hierarchy
+- Child documents explicitly linked (not scattered)
+- Shared visual identity (consistent headers, color scheme, typography)
+- Breadcrumb navigation to show context
+- *Trade-off*: Multiple files, but with intentional architecture
+
+**3. Integrated Development Notebook** (Best for discoverability)
+- Development log with timestamped entries
+- Embedded diagrams and decision records
+- Tags/categories for cross-cutting concerns
+- Search-first navigation
+- *Trade-off*: Requires tooling
+
+### Immediate Design Recommendations
+
+1. **Establish Visual Hierarchy**: Create a template with:
+   - Typography scale (consistent H1-H3 sizing)
+   - Color system (primary/secondary/accent)
+   - Spacing rules and rhythm
+
+2. **Content Zoning**: Visually separate audiences within the same document:
+   - User-facing docs in standard style
+   - Developer notes in a distinct color/style
+   - Personal thinking in another
+
+3. **Diagram Integration**: Embed diagram source with prose:
+   - Fenced code blocks render inline
+   - Single file, no context switching
+
+4. **Navigation Design**: If multiple documents:
+   - Always show breadcrumbs
+   - Maintain identical navigation UI
+   - Use consistent link styling
+
+### Key Insight
+
+You need a **designed system**, not just a different tool. Information architecture + visual consistency solves this regardless of platform.
+
+DIAGRAM: diagrams/general-discussion_designer_1.puml
+VOTE: READY
+
+---
+
+Name: AI-Ux
+The user's problem statement highlights a critical UX challenge: the fragmentation of project documentation across various tools and file formats. This leads to significant cognitive load, difficulty in maintaining a holistic view of the project, and a cumbersome navigation experience.
+
+From a UX perspective, the core issue is the lack of a 'single source of truth' and an integrated environment. Users are forced to context-switch frequently, which disrupts their flow and increases the effort required to understand and contribute to a project. The current process is neither intuitive nor efficient.
+
+Key UX pain points identified:
+- **High Cognitive Load:** Juggling multiple documents, tools, and mental models for different content types (text, diagrams, UI mockups).
+- **Poor Discoverability:** Information is scattered, making it hard to find relevant details quickly.
+- **Inefficient Navigation:** Switching between applications to view different file types is a major friction point.
+- **Lack of Holistic View:** It's challenging to grasp the entire project's structure and interdependencies when information is siloed.
+
+An ideal solution would prioritize:
+1. **Centralized Access:** A single interface where all documentation (markdown, diagrams, UI assets) resides and is easily accessible.
+2. **Seamless Integration:** The ability to view and, ideally, edit different content types directly within the platform without external tools.
+3. **Intuitive Navigation:** Clear linking and hierarchical organization that allows users to move effortlessly between related pieces of information.
+4. **Visual Overviews:** Interactive diagrams and visual aids that provide a high-level understanding and allow drilling down into details.
+5. **Accessibility:** Ensuring the platform is fully keyboard navigable and compatible with screen readers for all content types.
+6. **Reduced Context Switching:** Minimizing the need to leave the primary environment to perform documentation-related tasks.
+
+DIAGRAM: diagrams/general-discussion_ux_1.puml
+VOTE: CHANGES
+
+---
+
+Name: AI-Frontend
+From a frontend perspective, this problem screams for a unified documentation interface. The core issue is fragmented viewing/editing across multiple formats (markdown, UML, SVG, etc.). A modern single-page application could solve this with:
+
+1. **Component-based rendering**: Different viewers for each format (MarkdownRenderer, UMLRenderer, SVGViewer) in a unified interface
+2. **Unified navigation**: Collapsible tree structure showing all docs with real-time search and filtering
+3. **Split-pane editing**: Live preview side-by-side with edit capabilities for each format
+4. **Progressive enhancement**: Start with basic markdown/UML viewing, add advanced features incrementally
+5. **State management**: Centralized store for document structure, active selections, and edit states
+
+The frontend should focus on performance (lazy loading heavy assets like UML diagrams) and accessibility (keyboard navigation, screen reader support for complex diagrams).
+
+DIAGRAM: diagrams/general-discussion_frontend_1.puml
+VOTE: CHANGES
diff --git a/pyproject.toml b/pyproject.toml
index 21e5589..41e55fc 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -38,6 +38,8 @@ tui = [
 gui = [
     "dearpygui>=2.0.0",
     "artifact-editor @ git+https://gitea.brrd.tech/rob/artifact-editor.git",
+    "sounddevice>=0.4.6",
+    "numpy>=1.24.0",
 ]
 dev = [
     "pytest>=7.0",
diff --git a/src/discussions/ui/gui.py b/src/discussions/ui/gui.py
index cd87291..de7e41a 100644
--- a/src/discussions/ui/gui.py
+++ b/src/discussions/ui/gui.py
@@ -42,6 +42,203 @@ except ImportError:
     get_format_info = None
 
 
+class ContinuousRecorder:
+    """
+    Continuous audio recorder with chunked transcription support.
+
+    Records audio continuously and extracts overlapping chunks for transcription.
+    This prevents word cutoff at chunk boundaries.
+    """
+
+    def __init__(
+        self,
+        chunk_duration: float = 10.0,
+        overlap_duration: float = 0.5,
+        sample_rate: int = 16000,
+        on_chunk_ready: Optional[Callable[[bytes], None]] = None,
+        on_error: Optional[Callable[[str], None]] = None
+    ):
+        """
+        Initialize the continuous recorder.
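+
+        Each emitted chunk is chunk_duration + overlap_duration seconds long;
+        consecutive chunks share overlap_duration seconds of audio so that
+        words spanning a chunk boundary are not cut off.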
+ + Args: + chunk_duration: Duration of each chunk in seconds + overlap_duration: Overlap between chunks in seconds + sample_rate: Audio sample rate in Hz + on_chunk_ready: Callback when a chunk is ready (receives WAV bytes) + on_error: Callback when an error occurs + """ + self.chunk_duration = chunk_duration + self.overlap_duration = overlap_duration + self.sample_rate = sample_rate + self.on_chunk_ready = on_chunk_ready + self.on_error = on_error + + self._recording = False + self._stream = None + self._buffer = [] + self._buffer_lock = threading.Lock() + self._chunk_thread = None + self._stop_event = threading.Event() + + # Calculate samples + self._chunk_samples = int(chunk_duration * sample_rate) + self._overlap_samples = int(overlap_duration * sample_rate) + + def start(self): + """Start continuous recording.""" + if self._recording: + return + + try: + import sounddevice as sd + import numpy as np + except ImportError as e: + if self.on_error: + self.on_error(f"sounddevice not installed: {e}") + return + + self._recording = True + self._stop_event.clear() + self._buffer = [] + + def audio_callback(indata, frames, time_info, status): + """Called by sounddevice for each audio block.""" + if status and self.on_error: + self.on_error(f"Audio status: {status}") + with self._buffer_lock: + self._buffer.append(indata.copy()) + + try: + self._stream = sd.InputStream( + samplerate=self.sample_rate, + channels=1, + dtype='int16', + callback=audio_callback, + blocksize=1024 + ) + self._stream.start() + + # Start chunk extraction thread + self._chunk_thread = threading.Thread(target=self._chunk_loop, daemon=True) + self._chunk_thread.start() + + except Exception as e: + self._recording = False + if self.on_error: + self.on_error(f"Failed to start recording: {e}") + + def stop(self) -> bytes: + """ + Stop recording and return any remaining audio as WAV bytes. 
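+
+        Any remaining audio shorter than one second is discarded.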
+ + Returns: + WAV bytes of remaining audio, or empty bytes if none + """ + if not self._recording: + return b'' + + self._recording = False + self._stop_event.set() + + # Stop the stream + if self._stream: + try: + self._stream.stop() + self._stream.close() + except Exception: + pass + self._stream = None + + # Wait for chunk thread to finish + if self._chunk_thread: + self._chunk_thread.join(timeout=2) + self._chunk_thread = None + + # Get remaining audio + remaining_wav = self._extract_remaining() + return remaining_wav + + def _chunk_loop(self): + """Background thread that extracts and processes chunks.""" + import numpy as np + + # We need chunk_duration + overlap of audio before we can extract a chunk + min_samples_needed = self._chunk_samples + self._overlap_samples + + while not self._stop_event.is_set(): + # Wait a bit before checking + self._stop_event.wait(timeout=1.0) + + if self._stop_event.is_set(): + break + + # Check if we have enough audio for a chunk + with self._buffer_lock: + if not self._buffer: + continue + + # Concatenate buffer + try: + audio_data = np.concatenate(self._buffer) + except ValueError: + continue + + total_samples = len(audio_data) + + # Need enough samples for a full chunk plus overlap + if total_samples < min_samples_needed: + continue + + # Extract chunk (the first chunk_samples + overlap_samples) + chunk_data = audio_data[:min_samples_needed] + + # Keep the overlap portion for next chunk (prevents word cutoff) + # We keep from (chunk_samples) onwards, so next chunk starts with overlap + keep_from = self._chunk_samples + remaining = audio_data[keep_from:] + self._buffer = [remaining] if len(remaining) > 0 else [] + + # Convert to WAV and send to callback + if self.on_chunk_ready: + wav_bytes = self._to_wav(chunk_data) + self.on_chunk_ready(wav_bytes) + + def _extract_remaining(self) -> bytes: + """Extract any remaining audio from the buffer as WAV bytes.""" + import numpy as np + + with self._buffer_lock: + if not self._buffer: + return b'' + + try: + audio_data = np.concatenate(self._buffer) + except ValueError: + return b'' + + self._buffer = [] + + if len(audio_data) < self.sample_rate: # Less than 1 second + return b'' + + return self._to_wav(audio_data) + + def _to_wav(self, audio_data) -> bytes: + """Convert numpy audio data to WAV bytes.""" + import io + import wave + + buffer = io.BytesIO() + with wave.open(buffer, 'wb') as wf: + wf.setnchannels(1) + wf.setsampwidth(2) # 16-bit + wf.setframerate(self.sample_rate) + wf.writeframes(audio_data.tobytes()) + + return buffer.getvalue() + + class DiscussionGUI: """ Main GUI application using Dear PyGui. 
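
For context, a minimal usage sketch of the `ContinuousRecorder` class added above (not part of the patch; the print callbacks and timings are illustrative assumptions standing in for the GUI's real transcription hooks):

```python
import time

# Record ~25 s of microphone audio in overlapping chunks.
recorder = ContinuousRecorder(
    chunk_duration=10.0,    # a chunk is emitted once 10.5 s of audio is buffered
    overlap_duration=0.5,   # each chunk repeats the last 0.5 s of the previous one
    sample_rate=16000,
    on_chunk_ready=lambda wav: print(f"chunk ready: {len(wav)} bytes"),
    on_error=lambda msg: print(f"error: {msg}"),
)
recorder.start()
time.sleep(25)              # two full chunks fire during this window
tail = recorder.stop()      # leftover audio as WAV bytes (b'' if under 1 s)
print(f"final audio: {len(tail)} bytes")
```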
@@ -214,142 +411,127 @@ class DiscussionGUI: def _dictate_clicked(self): """Handle dictate button click - toggle recording.""" - if self._dictation_process is not None: - # Currently recording - stop and transcribe + if self._continuous_recorder is not None: + # Currently recording - stop self._stop_dictation() else: # Start recording self._start_dictation() def _start_dictation(self): - """Start recording audio from microphone.""" - import tempfile - import signal + """Start continuous recording with chunked transcription.""" + # Create recorder with callbacks + self._continuous_recorder = ContinuousRecorder( + chunk_duration=10.0, # 10 second chunks + overlap_duration=0.5, # 0.5 second overlap to prevent word cutoff + sample_rate=16000, + on_chunk_ready=self._on_audio_chunk_ready, + on_error=lambda e: self._show_error(f"Recording error: {e}") + ) - # Create temp file for audio - self._dictation_audio_file = tempfile.mktemp(suffix=".wav") + self._continuous_recorder.start() - # Start arecord without duration limit (records until interrupted) - try: - self._dictation_process = subprocess.Popen( - [ - "arecord", - "-D", "default", - "-f", "S16_LE", - "-r", "16000", - "-c", "1", - "-t", "wav", - self._dictation_audio_file - ], - stdout=subprocess.PIPE, - stderr=subprocess.PIPE - ) - - # Update button to show recording state - if dpg.does_item_exist("dictate_btn"): - dpg.set_item_label("dictate_btn", "Stop Recording") - - except FileNotFoundError: - self._show_error("arecord not found. Install alsa-utils: sudo apt install alsa-utils") - self._dictation_process = None - self._dictation_audio_file = None - except Exception as e: - self._show_error(f"Failed to start recording: {e}") - self._dictation_process = None - self._dictation_audio_file = None + # Update button to show recording state + if dpg.does_item_exist("dictate_btn"): + dpg.set_item_label("dictate_btn", "● Recording...") def _stop_dictation(self): - """Stop recording and transcribe the audio.""" - import signal - - if self._dictation_process is None: + """Stop recording and transcribe any remaining audio.""" + if self._continuous_recorder is None: return - # Send SIGINT to stop arecord gracefully - try: - self._dictation_process.send_signal(signal.SIGINT) - self._dictation_process.wait(timeout=2) - except Exception: - # Force kill if needed - try: - self._dictation_process.kill() - self._dictation_process.wait(timeout=1) - except Exception: - pass - - self._dictation_process = None - - # Update button back to Dictate + # Update button if dpg.does_item_exist("dictate_btn"): - dpg.set_item_label("dictate_btn", "Transcribing...") + dpg.set_item_label("dictate_btn", "Finishing...") + # Stop recording and get remaining audio + remaining_wav = self._continuous_recorder.stop() + self._continuous_recorder = None + + # Transcribe remaining audio if any + if remaining_wav: + self._transcribe_chunk(remaining_wav, is_final=True) + else: + # No remaining audio, just reset button + if dpg.does_item_exist("dictate_btn"): + dpg.set_item_label("dictate_btn", "Dictate") + + def _on_audio_chunk_ready(self, wav_bytes: bytes): + """Called when a chunk of audio is ready for transcription.""" # Transcribe in background thread - audio_file = self._dictation_audio_file - self._dictation_audio_file = None + self._transcribe_chunk(wav_bytes, is_final=False) - def transcribe(): + def _transcribe_chunk(self, wav_bytes: bytes, is_final: bool = False): + """Transcribe a chunk of audio and append to comment text.""" + def do_transcribe(): transcript = "" + 
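import tempfile  # local import; tempfile is not imported at module level
+            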
temp_file = None try: - if not audio_file: - transcript = "[No audio file path]" - elif not os.path.exists(audio_file): - transcript = f"[Audio file not found: {audio_file}]" - else: - # Check file size to ensure recording happened - file_size = os.path.getsize(audio_file) - if file_size < 1000: # Less than 1KB likely means no audio - transcript = "[Recording too short or empty]" - else: - transcribe_cmd = os.path.expanduser("~/.local/bin/transcribe") - if not os.path.exists(transcribe_cmd): - transcript = "[transcribe tool not found at ~/.local/bin/transcribe]" - else: - result = subprocess.run( - [transcribe_cmd], - input=audio_file, - capture_output=True, - text=True, - timeout=60 - ) - transcript = result.stdout.strip() - if result.returncode != 0 and result.stderr: - transcript = f"[Transcribe error: {result.stderr.strip()}]" - elif not transcript: - transcript = "[No speech detected]" + # Write WAV bytes to temp file + temp_file = tempfile.mktemp(suffix=".wav") + with open(temp_file, 'wb') as f: + f.write(wav_bytes) - # Clean up audio file + # Check file size + file_size = len(wav_bytes) + if file_size < 1000: + transcript = "" # Too short, skip silently + else: + transcribe_cmd = os.path.expanduser("~/.local/bin/transcribe") + if not os.path.exists(transcribe_cmd): + transcript = "[transcribe tool not found]" + else: + result = subprocess.run( + [transcribe_cmd], + input=temp_file, + capture_output=True, + text=True, + timeout=60 + ) + transcript = result.stdout.strip() + if result.returncode != 0 and result.stderr: + transcript = f"[Error: {result.stderr.strip()[:100]}]" + + except subprocess.TimeoutExpired: + transcript = "[Timeout]" + except Exception as e: + transcript = f"[Error: {e}]" + finally: + # Clean up temp file + if temp_file and os.path.exists(temp_file): try: - os.remove(audio_file) + os.remove(temp_file) except Exception: pass - except subprocess.TimeoutExpired: - transcript = "[Transcription timed out after 60 seconds]" - except Exception as e: - transcript = f"[Transcription error: {e}]" - # Update UI in main thread - self._on_transcription_complete(transcript) + # Append result to UI + self._on_transcription_chunk_complete(transcript, is_final) - thread = threading.Thread(target=transcribe, daemon=True) + thread = threading.Thread(target=do_transcribe, daemon=True) thread.start() - def _on_transcription_complete(self, transcript: str): - """Called when transcription completes - append to comment text.""" - # Reset button - if dpg.does_item_exist("dictate_btn"): - dpg.set_item_label("dictate_btn", "Dictate") + def _on_transcription_chunk_complete(self, transcript: str, is_final: bool): + """Called when a chunk transcription completes - append to comment text.""" + # Handle final chunk - reset button + if is_final: + if dpg.does_item_exist("dictate_btn"): + dpg.set_item_label("dictate_btn", "Dictate") + # Skip empty or error transcripts (for non-final chunks, don't show errors) if not transcript: - self._show_error("Transcription returned empty result") - elif transcript.startswith("["): - # Error message from transcribe tool - self._show_error(f"Transcription failed: {transcript}") - else: - # Append to existing comment text - if dpg.does_item_exist("comment_text"): - current = dpg.get_value("comment_text") or "" - separator = " " if current.strip() else "" - dpg.set_value("comment_text", current + separator + transcript) + return + if transcript.startswith("[") and not is_final: + # Only show errors for final chunk + return + if transcript.startswith("["): + 
self._show_error(f"Transcription: {transcript}") + return + + # Append to existing comment text + if dpg.does_item_exist("comment_text"): + current = dpg.get_value("comment_text") or "" + separator = " " if current.strip() else "" + dpg.set_value("comment_text", current + separator + transcript) def __init__(self, discussions_dir: Path = None): if discussions_dir is None: @@ -369,9 +551,10 @@ class DiscussionGUI: self._reading_button_tag: Optional[str] = None self._comment_counter = 0 # For generating unique button tags - # Dictation state - self._dictation_process: Optional[subprocess.Popen] = None - self._dictation_audio_file: Optional[str] = None + # Dictation state - continuous recording with chunked transcription + self._continuous_recorder: Optional[ContinuousRecorder] = None + self._transcription_queue = [] # Queue of transcription results + self._transcription_lock = threading.Lock() # Initialize Dear PyGui dpg.create_context()