feat: Cursor-aware dictation insertion for walkie-talkie mode

- Walkie-talkie mode: inserts transcribed text at the cursor position
- Continuous mode: appends transcribed text to the end
- Tracks an approximate cursor position by detecting where text edits occur
- Adds appropriate spacing around inserted text (see the sketch below)
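
A minimal sketch of the insertion-with-spacing behavior described above, assuming a plain-string model of the input box; `insert_with_spacing` is a hypothetical helper for illustration, not code from this commit:

```python
def insert_with_spacing(text: str, pos: int, chunk: str) -> str:
    """Insert `chunk` at `pos`, padding with spaces so words don't run together."""
    pos = min(pos, len(text))  # clamp to the valid range
    prefix = " " if pos > 0 and text[pos - 1] not in (" ", "\n") else ""
    suffix = " " if pos < len(text) and text[pos] not in (" ", "\n") else ""
    return text[:pos] + prefix + chunk + suffix + text[pos:]

assert insert_with_spacing("hello world", 5, "there") == "hello there world"
assert insert_with_spacing("", 0, "hi") == "hi"
```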

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rob 2026-01-04 12:15:37 -04:00
parent a2d6cb6f3c
commit f063739348
2 changed files with 233 additions and 5 deletions

View File

@@ -104,3 +104,153 @@ DIAGRAM: diagrams/general-discussion_frontend_1.puml
DIAGRAM: diagrams/general-discussion_frontend_1.puml
VOTE: CHANGES
---
Name: rob
Before we start thinking about ways in which we can build this thing, are there any tools you know of, or can you guys search online for any tools, that might do all of this stuff already?
---
Name: AI-Architect
Good instinct to research existing solutions first - this is a crowded space with mature tools. From an architectural perspective, here's what exists and the build-vs-buy decision:
## Existing Solutions by Category
**Editor-Based (Best for developers already in IDE):**
- **VS Code** with PlantUML + Markdown Preview Enhanced extensions - unified editing/viewing for markdown + UML diagrams [1][2]
- **IntelliJ IDEA** - built-in markdown + diagram support [2]
- **Markdown Monster** - PlantUML rendering in markdown [1]
**Diagram-as-Code Tools:**
- **Mermaid** - JavaScript-based, GitHub native support, live editor, supports flowcharts/UML/Git graphs [3][4]
- **PlantUML** - mature UML from text, exports SVG/PNG [1][2]
- Both integrate directly into markdown files as code blocks
**Documentation Platforms:**
- **Docusaurus** (Meta) - free, static site generator, versioning/i18n, Markdown+React [5]
- **GitBook** - cloud-based, collaborative, commercial product [5]
- **Read the Docs** - auto-builds from Git, integrates Sphinx/MkDocs [5]
**Knowledge Bases:**
- **Obsidian** - local markdown files, graph view, fast, offline-first, struggles with team collaboration [6]
- **Notion** - flexible, AI features, gets slow with large datasets, good for small teams [6]
- **Confluence** - enterprise-grade, Atlassian ecosystem, expensive, best team collaboration [6][7]
## Architectural Gap Analysis
None of these fully solve your stated problem:
1. **Format Fragmentation**: Most tools handle markdown + ONE diagram type well, but not "markdown + PlantUML + SVG mockups + images" in a truly unified experience
2. **Audience Segmentation**: No native support for "user docs vs dev docs vs personal notes" in same file with visual differentiation
3. **Context Switching**: You still need separate tools for SVG editing vs PlantUML vs markdown
4. **Scale Navigation**: Large docs remain unwieldy; hierarchical solutions scatter information
## Build vs. Buy Recommendation
**DON'T BUILD if:**
- You can standardize on Mermaid (instead of PlantUML) + Markdown + GitHub/GitLab [3][4]
- Your team already uses VS Code - extensions provide 80% of unified experience [1][2]
- You need enterprise features (access control, audit logs) - use Confluence [6]
**CONSIDER BUILDING if:**
- You need true multi-format artifact management (markdown + multiple diagram types + SVGs + images)
- Audience-based rendering is critical (same doc, different views for users/devs/personal)
- You want docs-as-data (queryable, programmable documentation)
- You're willing to invest in a specialized tool for your workflow
## Hybrid Approach (Recommended)
1. **Short term**: Use **VS Code + Markdown Preview Enhanced + PlantUML extension** - gets you 70% there immediately [1][2]
2. **Standardize formats**: Migrate SVG mockups to Mermaid diagrams where possible - reduces tool switching [3]
3. **Add structure**: Use a documentation site generator like **Docusaurus** for navigation/search [5]
4. **Evaluate pain**: If after 3 months the toolchain still hurts, you have data to justify custom tooling
## If You Build
Architecturally, this is a **documentation artifact management system** with these core services:
- Artifact store (markdown/diagrams/media as versioned entities)
- Rendering pipeline (markdown → HTML, PlantUML → SVG, etc.)
- Graph index (links, references, backlinks)
- Multi-format editor/viewer UI
- Search/navigation layer
See AI-Backend's comment for API design - their artifact-as-entity model is sound. This is a 6-12 month project to match existing tool quality.
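
To make the artifact-as-entity idea concrete, here is a minimal, hypothetical sketch; the names, fields, and the `backlinks` helper are illustrative assumptions, not an existing API:

```python
from dataclasses import dataclass, field
from typing import Literal

ArtifactKind = Literal["markdown", "plantuml", "mermaid", "svg", "image"]

@dataclass
class Artifact:
    id: str
    kind: ArtifactKind
    source: bytes                                     # raw content; versioning lives elsewhere (e.g. git)
    audience: set[str] = field(default_factory=set)   # e.g. {"user", "dev", "personal"}
    links: set[str] = field(default_factory=set)      # outgoing references by artifact id

def backlinks(artifacts: list[Artifact], target_id: str) -> list[str]:
    """Graph-index primitive: which artifacts reference `target_id`?"""
    return [a.id for a in artifacts if target_id in a.links]
```

The point is that markdown, diagrams, and media all become the same kind of record, so the graph index, rendering pipeline, and audience-based views can treat them uniformly.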
## Sources
1. [How to edit Markdown + UML in Visual Studio Code](https://www.freecodecamp.org/news/inserting-uml-in-markdown-using-vscode/)
2. [PlantUML](https://plantuml.com/)
3. [Include diagrams in your Markdown files with Mermaid](https://github.blog/developer-skills/github/include-diagrams-markdown-files-mermaid/)
4. [Mermaid | Diagramming and charting tool](https://mermaid.js.org/)
5. [Introduction | Docusaurus](https://docusaurus.io/docs)
6. [Notion vs Obsidian All Features Compared (2026)](https://productive.io/blog/notion-vs-obsidian/)
7. [Technical Documentation in Software Development](https://www.altexsoft.com/blog/technical-documentation-in-software-development-types-best-practices-and-tools/)
DIAGRAM: diagrams/general-discussion_4.puml
VOTE: CHANGES
---
Name: AI-Backend
I can't browse the web in this environment, so I can't live-search tools. From a backend perspective, the key is whether an existing tool supports a single artifact model with mixed types, versioning, and searchable link graphs. When evaluating options, check for: REST/GraphQL APIs, import/sync from a repo, render pipelines (PlantUML/SVG) with caching, and granular ACLs. If a tool lacks these, it will be hard to keep everything unified without custom backend work.
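
As a rough illustration of the "render pipelines with caching" point: a content-addressed cache keyed on the diagram source avoids re-rendering unchanged diagrams. This is a sketch under that assumption; `render_plantuml` is a placeholder for whatever renderer the chosen tool actually exposes.

```python
import hashlib
from pathlib import Path
from typing import Callable

CACHE_DIR = Path(".render-cache")

def cached_render(source: str, render_plantuml: Callable[[str], bytes]) -> bytes:
    """Render diagram source to SVG, reusing a content-addressed cache entry if present."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    cached = CACHE_DIR / f"{key}.svg"
    if cached.exists():
        return cached.read_bytes()
    svg = render_plantuml(source)   # e.g. shell out to a local PlantUML install
    cached.write_bytes(svg)
    return svg
```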
VOTE: CHANGES
---
Name: AI-Designer
Good instinct asking about existing tools first - this is a design best practice called 'competitive analysis.' Before designing a custom solution, we should understand what's already in the market and where gaps exist.

From a design perspective, several categories of tools might already address parts of this:

1. **Knowledge Management Systems** (Obsidian, Notion, Confluence, LogSeq): These focus on linkage and unified viewing, but may lack native diagram/visual editing
2. **Collaborative Doc Platforms** (Google Docs, Notion, Coda): Good at unified interface, less good at code/diagram handling
3. **Static Site Generators + IDEs** (MkDocs, Docusaurus, Hugo): Great for a single source of truth, weak on inline editing
4. **Specialized Dev Doc Platforms** (Slite, GitBook): Better at code/markdown, variable diagram support
5. **Monorepo Documentation** (Nx, Turborepo docs): Designed for scattered docs, but a standard doc platform under the hood

The key question: **Do any of these provide a single visual interface where you can simultaneously:**
- View/edit markdown
- Embed and render diagrams
- View/edit SVG mockups
- Maintain cross-file linking
- All without context-switching?

If no existing tool checks all boxes, that's when we have a **genuine design opportunity**. At that point, we should prototype a low-fidelity wireframe showing the unified interface before building anything.

Recommendation: Research existing solutions first. If gaps remain, I'll design a visual specification showing information architecture and key interface patterns.
VOTE: NONE
---
Name: AI-Ux
The search for existing tools reveals that many modern developer knowledge base platforms, especially when combined with 'diagrams as code' solutions, directly address the UX pain points of fragmentation and cognitive load.
**Key UX Improvements Offered by Existing Tools:**
1. **Centralized Access & Reduced Context Switching:** Platforms like Docusaurus, GitBook, MkDocs, and Confluence provide a single interface for documentation. When these are integrated with tools like Mermaid or PlantUML, users can view and often edit both prose and diagrams within the same environment, significantly reducing the need to switch between applications.
2. **Seamless Integration of Diagrams:** The ability to embed 'diagrams as code' (e.g., Mermaid, PlantUML) directly within Markdown files is a major UX win. This eliminates the need for separate diagram files and viewers, allowing diagrams to live alongside the text they describe. This improves visual cross-referencing and reduces mental effort.
3. **Improved Discoverability & Navigation:** Most knowledge base solutions offer robust search, hierarchical organization, and internal linking capabilities. When diagrams are embedded, they become part of this searchable and navigable content, making it easier to find and understand related information.
4. **Potential for Holistic View:** While not all tools offer a 'graph view' like Obsidian, the integrated nature of content (Markdown + embedded diagrams) inherently helps users build a more holistic understanding of the project structure and interdependencies.
5. **Accessibility Considerations:** Modern web-based platforms generally strive for accessibility. Diagrams rendered as SVG (like Mermaid outputs) can be made accessible with proper semantic markup, though this would depend on the specific platform's implementation.
**Overall UX Assessment:**
These existing solutions offer a strong foundation for a much-improved documentation experience. The core idea of using Markdown for text and text-based diagramming tools for visuals within a unified platform directly tackles the user's problem of scattered information and high cognitive load. The challenge now shifts from *finding* a solution to *selecting and configuring* the right combination of tools to best fit the project's specific needs and ensure optimal usability and accessibility.
DIAGRAM: diagrams/general-discussion_ux_4.puml
VOTE: READY
---
Name: AI-Frontend
Before building, let's research existing solutions. From a frontend perspective, there are several tools that already address parts of this problem:
1. **Notion** - Unified doc platform with embedded databases, diagrams, and media. Has excellent component-based rendering and split editing.
2. **Obsidian** - Local-first markdown with graph view, embedded diagrams, and plugins. Great for developers but lacks built-in UML rendering.
3. **GitBook** - Documentation platform with embedded code blocks, diagrams, and collaborative editing.
4. **Docusaurus** - Static site generator with MDX for embedding React components (diagrams, interactive elements).
5. **Draw.io/Diagrams.net** - Integrated diagramming with markdown export capabilities.
The frontend approach should consider whether to extend existing tools (Obsidian plugins, Notion API) vs building from scratch. The key frontend challenge is real-time rendering of different formats in a cohesive interface - existing tools have solved parts of this but none perfectly address the developer-focused workflow with embedded UML and SVG.
Frontend implementation should prioritize:
- Web Component approach for format-specific viewers
- Efficient lazy loading for heavy diagram renders
- Local-first storage with sync capabilities
DIAGRAM: diagrams/general-discussion_frontend_4.puml
VOTE: CHANGES

View File

@@ -576,8 +576,41 @@ class DiscussionGUI:
        thread = threading.Thread(target=do_transcribe, daemon=True)
        thread.start()

    def _on_comment_text_changed(self, sender, app_data):
        """Track text changes to approximate cursor position.

        Since DearPyGui doesn't expose cursor position, we detect where edits
        occur by comparing old and new text. The cursor is likely at the edit point.
        """
        new_text = app_data or ""
        old_text = self._last_text_value
        if len(new_text) > len(old_text):
            # Text was added - find where the insertion happened
            # Compare from start to find first difference
            for i in range(len(old_text) + 1):
                if i >= len(old_text) or (i < len(new_text) and i < len(old_text) and new_text[i] != old_text[i]):
                    # Insertion point found, cursor is after the inserted text
                    inserted_len = len(new_text) - len(old_text)
                    self._approx_cursor_pos = i + inserted_len
                    break
                elif i == len(old_text):
                    # Appended at end
                    self._approx_cursor_pos = len(new_text)
        elif len(new_text) < len(old_text):
            # Text was deleted - cursor is at deletion point
            for i in range(len(new_text) + 1):
                if i >= len(new_text) or (i < len(new_text) and i < len(old_text) and new_text[i] != old_text[i]):
                    self._approx_cursor_pos = i
                    break
                elif i == len(new_text):
                    self._approx_cursor_pos = len(new_text)
        # If same length, assume cursor didn't move (replacement)
        self._last_text_value = new_text

    def _on_transcription_chunk_complete(self, transcript: str, is_final: bool):
        """Called when a chunk transcription completes - insert or append based on mode."""
        # Handle final chunk - reset button
        if is_final:
            if dpg.does_item_exist("dictate_btn"):
@@ -593,11 +626,40 @@ class DiscussionGUI:
            self._show_error(f"Transcription: {transcript}")
            return

        # Insert or append based on dictation mode
        if dpg.does_item_exist("comment_text"):
            current = dpg.get_value("comment_text") or ""

            # Determine insertion mode based on how dictation was started
            # For walkie-talkie (push_to_talk), insert at cursor position
            # For continuous mode, append to end
            if self._dictation_mode in ("push_to_talk", "push_to_talk_pending", "idle"):
                # Walkie-talkie mode or just finished - insert at cursor
                pos = min(self._approx_cursor_pos, len(current))
                # Add space before if inserting mid-text and previous char isn't space
                prefix = ""
                if pos > 0 and current[pos-1:pos] not in (" ", "\n", ""):
                    prefix = " "
                # Add space after if there's text after and it doesn't start with space
                suffix = ""
                if pos < len(current) and current[pos:pos+1] not in (" ", "\n", ""):
                    suffix = " "
                new_text = current[:pos] + prefix + transcript + suffix + current[pos:]
                dpg.set_value("comment_text", new_text)
                # Update cursor position to after inserted text
                self._approx_cursor_pos = pos + len(prefix) + len(transcript) + len(suffix)
                self._last_text_value = new_text
            else:
                # Continuous mode - append to end
                separator = " " if current.strip() else ""
                new_text = current + separator + transcript
                dpg.set_value("comment_text", new_text)
                self._last_text_value = new_text
                self._approx_cursor_pos = len(new_text)

    def __init__(self, discussions_dir: Path = None):
        if discussions_dir is None:
@@ -628,6 +690,12 @@ class DiscussionGUI:
        self._mouse_down_time = 0.0
        self._push_to_talk_thread: Optional[threading.Thread] = None

        # Cursor position tracking for walkie-talkie insertion
        # Since DearPyGui doesn't expose cursor position, we approximate it by
        # tracking where text changes occur
        self._last_text_value = ""
        self._approx_cursor_pos = 0  # Approximate cursor position for insertion

        # Initialize Dear PyGui
        dpg.create_context()
        dpg.create_viewport(title="Orchestrated Discussions", width=1400, height=900)
@@ -1995,6 +2063,9 @@ class DiscussionGUI:
        # Reset dictation state when dialog opens
        self._dictation_mode = "idle"
        self._last_dictate_click_time = 0.0
        # Reset cursor tracking for walkie-talkie insertion
        self._last_text_value = ""
        self._approx_cursor_pos = 0

        with dpg.window(label="Add Comment", tag=window_tag, modal=True,
                        width=600, height=400, pos=[400, 200]):
@@ -2019,7 +2090,14 @@ class DiscussionGUI:
                width=100
            )

            dpg.add_input_text(
                tag="comment_text",
                multiline=True,
                width=-1,
                height=200,
                callback=self._on_comment_text_changed,
                on_enter=False  # Don't trigger on enter for multiline
            )

            dpg.add_spacer(height=10)
            dpg.add_text("Vote (optional):")