docs: complete update for AI normalization architecture

Updated all documentation to reflect the new two-tier extraction system:

**workflow-marker-extraction.puml:**
- Completely rewritten to show AI normalization flow
- Documents agents.normalize_discussion() as primary method
- Shows simple line-start fallback for explicit markers
- Includes natural conversation examples vs. explicit markers
- Demonstrates resilience and cost-effectiveness

**AUTOMATION.md:**
- Restructured "Conversation Guidelines" section
- Emphasizes natural conversation as recommended approach
- Clarifies AI normalization extracts from conversational text
- Documents explicit markers as fallback when AI unavailable
- Explains two-tier architecture benefits

**diagrams-README.md:**
- Already updated in previous commit

All documentation now accurately reflects:
✅ AI-powered extraction (agents.py) for natural conversation
✅ Simple fallback parsing (workflow.py) for explicit markers
✅ Multi-provider resilience (claude → codex → gemini)
✅ No strict formatting requirements for participants

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-02 20:04:09 -04:00 · 2025-11-02 20:04:09 -04:00 · 33b550ad5b
parent 380c7b5d12
commit 33b550ad5b
3 changed files with 293 additions and 211 deletions
--- a/docs/AUTOMATION.md
+++ b/docs/AUTOMATION.md
@@ -265,41 +265,73 @@ Captures architectural decisions with rationale.
 <!-- SUMMARY:DECISIONS END -->
 ```

-## Conversation Guidelines (Optional)
+## Conversation Guidelines

-Using these markers helps extract information accurately. **Many work without AI using regex:**
+### Natural Conversation (Recommended)
+
+**Write naturally - AI normalization extracts markers automatically:**

 ```markdown
-# Markers (✅ = works without AI)
+# Examples of natural conversation that AI understands:

-Q: <question>          # ✅ Mark questions explicitly (also: "Question:", or ending with ?)
-A: <answer>            # Mark answers explicitly (AI tracks these)
-Re: <response>         # Partial answers or follow-ups (AI tracks these)
+- Alice: I think we should use OAuth2. Does anyone know if we need OAuth 2.1 specifically?
+  VOTE: READY
+
+- Bob: Good question Alice. I'm making a decision here - we'll use OAuth 2.0 for now.
+  @Carol can you research migration paths to 2.1? VOTE: CHANGES
+
+- Carol: I've completed the OAuth research. We can upgrade later without breaking changes.
+  VOTE: READY
+```
+
+**AI normalization (via `agents.py`) extracts:**
+- Decisions from natural language ("I'm making a decision here - ...")
+- Questions from conversational text ("Does anyone know if...")
+- Action items with @mentions ("@Carol can you research...")
+- Votes (always tracked: `VOTE: READY|CHANGES|REJECT`)
+
+### Explicit Markers (Fallback)
+
+**If AI is unavailable, these explicit line-start markers work as fallback:**
+
+```markdown
+# Markers (✅ = works without AI as simple fallback)
+
+QUESTION: <question>   # ✅ Explicit question marker
+Q: <question>          # ✅ Short form

 TODO: <action>         # ✅ New unassigned task
-ACTION: <action>       # ✅ Task with implied ownership (alias for TODO)
-ASSIGNED: <task> @name # ✅ Claimed task (extracts @mention as assignee)
+ACTION: <action>       # ✅ Task with implied ownership
+ASSIGNED: <task> @name # ✅ Claimed task
 DONE: <completion>     # ✅ Mark task complete

-DECISION: <choice>     # ✅ Architectural decision (AI adds rationale/alternatives)
-Rationale: <why>       # Explain reasoning (AI extracts this)
+DECISION: <choice>     # ✅ Architectural decision

-VOTE: READY|CHANGES|REJECT  # ✅ REQUIRED for voting (always tracked)
+VOTE: READY|CHANGES|REJECT  # ✅ ALWAYS tracked (with or without AI)

-@Name                  # ✅ Mention someone specifically
-@all                   # ✅ Mention everyone
+@Name                  # ✅ Mention extraction (simple regex)
 ```

-**Example Workflow:**
+**Example with explicit markers:**
 ```markdown
-- Alice: Q: Should we support OAuth2?
+- Alice: QUESTION: Should we support OAuth2?
 - Bob: TODO: Research OAuth2 libraries
-- Bob: ASSIGNED: OAuth2 library research (@Bob taking ownership)
-- Carol: DECISION: Use OAuth2 for authentication. Rationale: Industry standard with good library support.
-- Carol: DONE: Completed OAuth2 comparison document
-- Dave: @all Please review the comparison by Friday. VOTE: READY
+- Bob: ASSIGNED: OAuth2 library research
+- Carol: DECISION: Use OAuth2 for authentication
+- Dave: @all Please review. VOTE: READY
 ```

+### Two-Tier Architecture
+
+1. **AI Normalization (Primary):** Handles natural conversation, embedded markers, context understanding
+2. **Simple Fallback:** Handles explicit line-start markers when AI unavailable
+
+Benefits:
+- ✅ Participants write naturally without strict formatting
+- ✅ Resilient (multi-provider fallback: claude → codex → gemini)
+- ✅ Works offline/API-down with explicit markers
+- ✅ Cost-effective (uses fast models for extraction)
+
 ## Implementation Details

 ### Incremental Processing
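
Reviewer note: the "Explicit Markers (Fallback)" rules in the AUTOMATION.md hunk above can be illustrated with a small sketch. This is not the project's `extract_structured_basic()`; the function name `parse_fallback`, the `- Name: text` comment shape, and the result layout are assumptions made for illustration. It matches markers only at the start of a comment body (case-insensitive `startswith()`), while votes and @mentions are picked up anywhere in the line with simple regexes.

```python
import re

# Marker prefixes from the fallback list; the grouping into result keys is an assumption.
MARKERS = {
    "decision:": "decisions",
    "question:": "questions",
    "q:": "questions",
    "action:": "actions",
    "todo:": "actions",
    "assigned:": "actions",
    "done:": "actions",
}
COMMENT_RE = re.compile(r"^-\s*(\w+):\s*(.*)$")  # assumed "- Name: text" comment shape
VOTE_RE = re.compile(r"\bVOTE:\s*(READY|CHANGES|REJECT)\b", re.IGNORECASE)
MENTION_RE = re.compile(r"@(\w+)")


def parse_fallback(text):
    """Hypothetical line-start fallback parser (illustration only)."""
    result = {"decisions": [], "questions": [], "actions": [],
              "mentions": [], "votes": {}}
    speaker = None
    for raw in text.splitlines():
        line = raw.strip()
        match = COMMENT_RE.match(line)
        if match:
            speaker, body = match.group(1), match.group(2)
        else:
            body = line  # continuation of the previous participant's comment
        if speaker is None or not body:
            continue
        lowered = body.lower()
        for prefix, key in MARKERS.items():
            if lowered.startswith(prefix):
                result[key].append({
                    "participant": speaker,
                    "marker": prefix[:-1].upper(),
                    "text": body[len(prefix):].strip(),
                })
                break
        vote = VOTE_RE.search(body)
        if vote:
            result["votes"][speaker] = vote.group(1).upper()  # latest vote wins
        for name in MENTION_RE.findall(body):
            result["mentions"].append({"from": speaker, "to": name})
    return result
```

Fed the "Example with explicit markers" block, a parser like this would record the question, both of Bob's tasks, Carol's decision, and Dave's vote. Fed the natural-conversation example, it would find only votes and @mentions, which is the gap the AI tier is meant to cover.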
--- a/docs/workflow-marker-extraction.puml
+++ b/docs/workflow-marker-extraction.puml
@@ -1,6 +1,6 @@
 @startuml workflow-marker-extraction
 !theme plain
-title Workflow Marker Extraction with Regex Pattern Matching
+title Workflow Marker Extraction with AI Normalization

 start

@@ -8,152 +8,80 @@ start

 :workflow.py reads file content;

-partition "Parse Comments" {
-  :Split file into lines;
-
-  repeat
-    :Read next line;
-
-    if (Line is HTML comment?) then (yes)
-      :Skip (metadata);
-    else if (Line is heading?) then (yes)
-      :Skip (structure);
-    else (participant comment)
-      :Extract participant name\n(before first ":");
+partition "Two-Tier Extraction" {
+  :Call extract_structured_basic()\nSimple fallback parsing;

  note right
-        **Participant Format:**
-        - Rob: Comment text...
-        - Sarah: Comment text...
-        - AI_Claude: Comment text...
+    **Fallback: Simple Line-Start Matching**
+    Only matches explicit markers at line start:
+    - DECISION: text
+    - QUESTION: text
+    - Q: text
+    - ACTION: text
+    - TODO: text
+    - ASSIGNED: text
+    - DONE: text

-        Names starting with "AI_"
-        are excluded from voting if
-        allow_agent_votes: false
+    Uses case-insensitive startswith() matching.
+    Handles strictly-formatted discussions.
  end note

-      partition "Extract Structured Markers" {
-        :Apply regex patterns\nto comment text;
+  :Store fallback results\n(decisions, questions, actions, mentions);

-        if (**DECISION**: found?) then (yes)
-          :Extract decision text;
-          :Store decision record;
+  :Call agents.normalize_discussion()\nAI-powered extraction;
+
+  partition "AI Normalization (agents.py)" {
+    :Build prompt for AI model;
    note right
-            **Pattern:**
-            (?:\\*\\*)?DECISION(?:\\*\\*)?
-            \\s*:\\s*(.+?)
-            (?=\\s*(?:\\*\\*QUESTION|\\*\\*ACTION|VOTE:)|$)
+      **AI Prompt:**
+      "Extract structured information from discussion.
+      Return JSON with: votes, questions, decisions,
+      action_items, mentions"

-            **Captures:** Decision text until next marker
-
-            **Example:**
-            {
-              participant: "Rob",
-              decision: "text...",
-              rationale: "",
-              supporters: []
-            }
+      Supports natural conversation like:
+      "I'm making a decision here - we'll use X"
+      "Does anyone know if we need Y?"
+      "@Sarah can you check Z?"
    end note
-        endif

-        if (**QUESTION**: found?) then (yes)
-          :Extract question text;
-          :Store question record;
+    :Execute command chain\n(claude → codex → gemini);
+
+    if (AI returned valid JSON?) then (yes)
+      :Parse JSON response;
+      :Extract structured data:\n- votes\n- questions\n- decisions\n- action_items\n- mentions;
+      :Override fallback results\nwith AI results;
      note right
-            **Pattern:**
-            (?:\\*\\*)?(?:QUESTION|Q)(?:\\*\\*)?
-            \\s*:\\s*(.+?)
-            (?=\\s*(?:\\*\\*DECISION|\\*\\*ACTION|VOTE:)|$)
-
-            **Captures:** Question text until next marker
-
-            **Example:**
-            {
-              participant: "Rob",
-              question: "text...",
-              status: "OPEN"
-            }
+        **AI advantages:**
+        - Handles embedded markers
+        - Understands context
+        - Extracts from natural language
+        - No strict formatting required
      end note
-        endif
-
-        if (**ACTION**: found?) then (yes)
-          :Extract action text;
-          :Search for @mention in text;
-          :Store action record;
+    else (no - AI failed or unavailable)
+      :Use fallback results only;
      note right
-            **Pattern:**
-            (?:\\*\\*)?(?:ACTION|TODO)(?:\\*\\*)?
-            \\s*:\\s*(.+?)
-            (?=\\s*(?:\\*\\*DECISION|\\*\\*QUESTION|VOTE:)|$)
-
-            **Captures:** Action text + assignee from @mention
-
-            **Example:**
-            {
-              participant: "Rob",
-              action: "text...",
-              assignee: "Sarah",
-              status: "TODO"
-            }
+        **Fallback activated when:**
+        - All providers fail
+        - Invalid JSON response
+        - agents.py import fails
+        - API rate limits hit
      end note
    endif
-
-        if (Line ends with "?") then (yes)
-          :Auto-detect as question;
-          note right
-            Fallback heuristic:
-            If no explicit marker but
-            line ends with "?",
-            treat as question
-          end note
-        endif
-
-        if (@mention found?) then (yes)
-          :Extract @mentions;
-          :Store in "Awaiting Replies" list;
-        endif
  }
-
-      if (VOTE: line found?) then (yes)
-        :Extract vote value:\nREADY|CHANGES|REJECT;
-        :Store latest vote per participant;
-      endif
-    endif
-
-  repeat while (More lines?) is (yes)
-  -> no;
 }

 partition "Generate Summary Sections" {
-  :Format Decisions section:
-  - Group by participant
-  - Number sequentially
-  - Include rationale if present;
+  :Format Decisions section:\n- Group by participant\n- Number sequentially\n- Include rationale if present;

-  :Format Open Questions section:
-  - List unanswered questions
-  - Track by participant
-  - Mark status (OPEN/PARTIAL);
+  :Format Open Questions section:\n- List unanswered questions\n- Track by participant\n- Mark status (OPEN/PARTIAL);

-  :Format Action Items section:
-  - Group by status (TODO/ASSIGNED/DONE)
-  - Show assignees
-  - Link to requesters;
+  :Format Action Items section:\n- Group by status (TODO/ASSIGNED/DONE)\n- Show assignees\n- Link to requesters;

-  :Format Awaiting Replies section:
-  - Group by @mentioned person
-  - Show context of request
-  - Track unresolved mentions;
+  :Format Awaiting Replies section:\n- Group by @mentioned person\n- Show context of request\n- Track unresolved mentions;

-  :Format Votes section:
-  - Count by value (READY/CHANGES/REJECT)
-  - List latest vote per participant
-  - Exclude AI votes if configured;
+  :Format Votes section:\n- Count by value (READY/CHANGES/REJECT)\n- List latest vote per participant\n- Exclude AI votes if configured;

-  :Format Timeline section:
-  - Chronological order (newest first)
-  - Include status changes
-  - Summarize key events;
+  :Format Timeline section:\n- Chronological order (newest first)\n- Include status changes\n- Summarize key events;
 }

 :Update marker blocks in .sum.md;
@@ -168,73 +96,44 @@ end note
 stop

 legend bottom
-  **Example Input (feature.discussion.md):**
+  **Example Input (natural conversation):**

-  Rob: The architecture looks solid. **DECISION**: We'll use PostgreSQL
-  for the database. **QUESTION**: Should we use TypeScript or JavaScript?
-  **ACTION**: @Sarah please research auth libraries. Looking forward to
-  feedback. VOTE: CHANGES
+  Rob: I've been thinking about the timeline. I'm making a decision here -
+  we'll build the upload system first. Does anyone know if we need real-time
+  preview? @Sarah can you research Unity Asset Store API? VOTE: READY

-  **Extracted Output (.sum.md):**
+  **AI Normalization Output (JSON):**
+  {
+    "votes": [{"participant": "Rob", "vote": "READY"}],
+    "decisions": [{"participant": "Rob",
+                   "decision": "build the upload system first"}],
+    "questions": [{"participant": "Rob",
+                   "question": "if we need real-time preview"}],
+    "action_items": [{"participant": "Rob", "action": "research Unity API",
+                      "assignee": "Sarah"}],
+    "mentions": [{"from": "Rob", "to": "Sarah"}]
+  }

-  <!-- SUMMARY:DECISIONS START -->
-  ## Decisions (ADR-style)
-  ### Decision 1: We'll use PostgreSQL for the database.
-  - **Proposed by:** @Rob
-  <!-- SUMMARY:DECISIONS END -->
-
-  <!-- SUMMARY:OPEN_QUESTIONS START -->
-  ## Open Questions
-  - @Rob: Should we use TypeScript or JavaScript?
-  <!-- SUMMARY:OPEN_QUESTIONS END -->
-
-  <!-- SUMMARY:ACTION_ITEMS START -->
-  ## Action Items
-  ### TODO (unassigned):
-  - [ ] @Sarah please research auth libraries (suggested by @Rob)
-  <!-- SUMMARY:ACTION_ITEMS END -->
-
-  <!-- SUMMARY:AWAITING START -->
-  ## Awaiting Replies
-  ### @Sarah
-  - @Rob: ... **ACTION**: @Sarah please research auth libraries ...
-  <!-- SUMMARY:AWAITING END -->
+  **Fallback Only Matches:**
+  DECISION: We'll build upload first
+  QUESTION: Do we need real-time preview?
+  ACTION: @Sarah research Unity API
 endlegend

 note right
-  **Regex Pattern Details:**
+  **Architecture Benefits:**

-  **Decision Pattern:**
-  (?:\\*\\*)?DECISION(?:\\*\\*)?\\s*:\\s*(.+?)
-  (?=\\s*(?:\\*\\*QUESTION|\\*\\*ACTION|VOTE:)|$)
+  ✓ Participants write naturally
+  ✓ No strict formatting rules
+  ✓ AI handles understanding
+  ✓ Simple code for fallback
+  ✓ Resilient (multi-provider chain)
+  ✓ Cost-effective (fast models)

-  **Features:**
-  - Case-insensitive
-  - Optional markdown bold (**) on both sides
-  - Captures text until next marker or VOTE:
-  - DOTALL mode for multi-line capture
-
-  **Supported Formats:**
-  - DECISION: text
-  - **DECISION**: text
-  - decision: text
-  - **decision**: text
-end note
-
-note right
-  **Why Regex Instead of Line-Start Matching?**
-
-  ✗ Old approach: `if line.startswith("decision:"):`
-  Problem: Markers embedded mid-sentence fail
-
-  ✓ New approach: Regex search anywhere in line
-  Handles: "Good point. **DECISION**: We'll use X."
-
-  **Benefits:**
-  - Natural conversational style
-  - Markdown formatting preserved
-  - Multiple markers per comment
-  - Robust extraction
+  **Files:**
+  - automation/agents.py (AI normalization)
+  - automation/workflow.py (fallback + orchestration)
+  - automation/patcher.py (provider chain execution)
 end note

 @enduml
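
Reviewer note: the "Two-Tier Extraction" partition in the diagram above (run the fallback, try the provider chain, override with AI results only when valid JSON comes back) can be sketched as follows. The names `normalize_with_chain` and `extract_two_tier`, and the convention that a provider is a plain callable returning a string, are hypothetical; the real chain in `agents.py` and `patcher.py` may be wired quite differently.

```python
import json

# Top-level keys the diagram's AI prompt asks for.
REQUIRED_KEYS = ("votes", "questions", "decisions", "action_items", "mentions")


def normalize_with_chain(text, providers):
    """Try each (name, callable) provider in order; return a dict or None.

    Each callable takes the discussion text and returns a string that is
    expected to be JSON. This interface is an assumption for illustration.
    """
    for _name, call in providers:
        try:
            data = json.loads(call(text))
        except Exception:
            # Provider raised or returned something that is not JSON:
            # fall through to the next provider in the chain.
            continue
        if isinstance(data, dict) and all(key in data for key in REQUIRED_KEYS):
            return data
    return None


def extract_two_tier(text, fallback, providers):
    """Run the simple fallback first, then let valid AI output override it."""
    basic = fallback(text)
    ai_result = normalize_with_chain(text, providers)
    if ai_result is None:
        return basic  # all providers failed: use fallback results only
    merged = dict(basic)
    merged.update(ai_result)  # AI results replace fallback results key by key
    return merged
```

Running the fallback before the AI call, as the diagram does, means a usable result always exists even when every provider in the chain fails.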
--- a/docs/workflow-marker-extraction.svg
+++ b/docs/workflow-marker-extraction.svg
@@ -0,0 +1,151 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" contentScriptType="application/ecmascript" contentStyleType="text/css" height="103px" preserveAspectRatio="none" style="width:352px;height:103px;background:#000000;" version="1.1" viewBox="0 0 352 103" width="352px" zoomAndPan="magnify"><defs/><g><rect fill="#11060A" height="1" style="stroke: #11060A; stroke-width: 1.0;" width="1" x="0" y="0"/><rect fill="#33FF02" height="24.0679" style="stroke: #33FF02; stroke-width: 1.0;" width="346" x="5" y="5"/><text fill="#000000" font-family="sans-serif" font-size="14" font-weight="bold" lengthAdjust="spacingAndGlyphs" textLength="344" x="6" y="20">[From workflow-marker-extraction.puml (line 2) ]</text><text fill="#33FF02" font-family="sans-serif" font-size="14" font-weight="bold" lengthAdjust="spacingAndGlyphs" textLength="0" x="9" y="43.0679"/><text fill="#33FF02" font-family="sans-serif" font-size="14" font-weight="bold" lengthAdjust="spacingAndGlyphs" textLength="275" x="5" y="62.1358">@startuml workflow-marker-extraction</text><text fill="#33FF02" font-family="sans-serif" font-size="14" font-weight="bold" lengthAdjust="spacingAndGlyphs" textLength="87" x="5" y="81.2038">!theme plain</text><text fill="#FF0000" font-family="sans-serif" font-size="14" font-weight="bold" lengthAdjust="spacingAndGlyphs" textLength="93" x="9" y="100.2717">Syntax Error?</text><!--MD5=[32d7802434cc4c797d2bc79c191390cf]
+@startuml workflow-marker-extraction
+!theme plain
+title Workflow Marker Extraction with AI Normalization
+
+start
+
+:Discussion file staged\n(feature.discussion.md,\ndesign.discussion.md, etc);
+
+:workflow.py reads file content;
+
+partition "Two-Tier Extraction" {
+  :Call extract_structured_basic()\nSimple fallback parsing;
+
+  note right
+    **Fallback: Simple Line-Start Matching**
+    Only matches explicit markers at line start:
+    - DECISION: text
+    - QUESTION: text
+    - Q: text
+    - ACTION: text
+    - TODO: text
+    - ASSIGNED: text
+    - DONE: text
+
+    Uses case-insensitive startswith() matching.
+    Handles strictly-formatted discussions.
+  end note
+
+  :Store fallback results\n(decisions, questions, actions, mentions);
+
+  :Call agents.normalize_discussion()\nAI-powered extraction;
+
+  partition "AI Normalization (agents.py)" {
+    :Build prompt for AI model;
+    note right
+      **AI Prompt:**
+      "Extract structured information from discussion.
+      Return JSON with: votes, questions, decisions,
+      action_items, mentions"
+
+      Supports natural conversation like:
+      "I'm making a decision here - we'll use X"
+      "Does anyone know if we need Y?"
+      "@Sarah can you check Z?"
+    end note
+
+    :Execute command chain\n(claude → codex → gemini);
+
+    if (AI returned valid JSON?) then (yes)
+      :Parse JSON response;
+      :Extract structured data:\n- votes\n- questions\n- decisions\n- action_items\n- mentions;
+      :Override fallback results\nwith AI results;
+      note right
+        **AI advantages:**
+        - Handles embedded markers
+        - Understands context
+        - Extracts from natural language
+        - No strict formatting required
+      end note
+    else (no - AI failed or unavailable)
+      :Use fallback results only;
+      note right
+        **Fallback activated when:**
+        - All providers fail
+        - Invalid JSON response
+        - agents.py import fails
+        - API rate limits hit
+      end note
+    endif
+  }
+}
+
+partition "Generate Summary Sections" {
+  :Format Decisions section:\n- Group by participant\n- Number sequentially\n- Include rationale if present;
+
+  :Format Open Questions section:\n- List unanswered questions\n- Track by participant\n- Mark status (OPEN/PARTIAL);
+
+  :Format Action Items section:\n- Group by status (TODO/ASSIGNED/DONE)\n- Show assignees\n- Link to requesters;
+
+  :Format Awaiting Replies section:\n- Group by @mentioned person\n- Show context of request\n- Track unresolved mentions;
+
+  :Format Votes section:\n- Count by value (READY/CHANGES/REJECT)\n- List latest vote per participant\n- Exclude AI votes if configured;
+
+  :Format Timeline section:\n- Chronological order (newest first)\n- Include status changes\n- Summarize key events;
+}
+
+:Update marker blocks in .sum.md;
+note right
+  <!- - SUMMARY:DECISIONS START - ->
+  ...
+  <!- - SUMMARY:DECISIONS END - ->
+end note
+
+:Stage updated .sum.md file;
+
+stop
+
+legend bottom
+  **Example Input (natural conversation):**
+
+  Rob: I've been thinking about the timeline. I'm making a decision here -
+  we'll build the upload system first. Does anyone know if we need real-time
+  preview? @Sarah can you research Unity Asset Store API? VOTE: READY
+
+  **AI Normalization Output (JSON):**
+  {
+    "votes": [{"participant": "Rob", "vote": "READY"}],
+    "decisions": [{"participant": "Rob",
+                   "decision": "build the upload system first"}],
+    "questions": [{"participant": "Rob",
+                   "question": "if we need real-time preview"}],
+    "action_items": [{"participant": "Rob", "action": "research Unity API",
+                      "assignee": "Sarah"}],
+    "mentions": [{"from": "Rob", "to": "Sarah"}]
+  }
+
+  **Fallback Only Matches:**
+  DECISION: We'll build upload first
+  QUESTION: Do we need real-time preview?
+  ACTION: @Sarah research Unity API
+endlegend
+
+note right
+  **Architecture Benefits:**
+
+  ✓ Participants write naturally
+  ✓ No strict formatting rules
+  ✓ AI handles understanding
+  ✓ Simple code for fallback
+  ✓ Resilient (multi-provider chain)
+  ✓ Cost-effective (fast models)
+
+  **Files:**
+  - automation/agents.py (AI normalization)
+  - automation/workflow.py (fallback + orchestration)
+  - automation/patcher.py (provider chain execution)
+end note
+
+@enduml
+
+PlantUML version 1.2020.02(Sun Mar 01 06:22:07 AST 2020)
+(GPL source distribution)
+Java Runtime: OpenJDK Runtime Environment
+JVM: OpenJDK 64-Bit Server VM
+Java Version: 21.0.8+9-Ubuntu-0ubuntu124.04.1
+Operating System: Linux
+Default Encoding: UTF-8
+Language: en
+Country: CA
+--></g></svg>
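
Reviewer note: the "Format Votes section" step carried in the diagram source (count by value, keep the latest vote per participant, exclude AI votes if configured) could look roughly like the sketch below. The `tally_votes` name is hypothetical, the input shape follows the `votes` array in the diagram's JSON example, and treating names that start with `AI_` as agents follows the participant note removed in this commit, so treat both as assumptions.

```python
from collections import Counter

VOTE_VALUES = ("READY", "CHANGES", "REJECT")


def tally_votes(votes, allow_agent_votes=False):
    """Return (latest vote per participant, count per vote value).

    `votes` is a list of {"participant": ..., "vote": ...} dicts in
    chronological order; later entries replace earlier ones.
    """
    latest = {}
    for entry in votes:
        latest[entry["participant"]] = entry["vote"].upper()
    if not allow_agent_votes:
        # Assumed convention: agent participants are named "AI_<something>".
        latest = {name: vote for name, vote in latest.items()
                  if not name.startswith("AI_")}
    counts = Counter(latest.values())
    return latest, {value: counts.get(value, 0) for value in VOTE_VALUES}
```

Keeping only the latest vote per participant lets someone move from `CHANGES` to `READY` without being counted twice.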