orchestrated-discussions/examples/brainstorm_sketch_test.disc...

Notification System Brainstorm

Problem Statement

Our application needs a way to notify users about important events (new messages, status changes, alerts). Currently we have no notification system at all.

Known Constraints

  • Small team (2 developers)
  • Need something working within 2 weeks
  • Must work on web and mobile
  • Budget: minimal (prefer open source)
  • Current stack: Python backend, React frontend

What Success Looks Like

  • Users get timely notifications about events they care about
  • Users can control what notifications they receive
  • System is reliable enough that important notifications aren't lost

Agreed Scope (from cluster phase)

  • MVP: Web notifications only (browser), mobile comes later
  • Start with 3 event types: new message, status change, system alert
  • "Timely" means within 30 seconds
  • Simple on/off preferences per event type
  • Target: 1000 users initially

We're now in the sketch phase - create rough diagrams to illustrate the approach.


Name: AI-Architect Parse error VOTE: NONE


Name: AI-Security

Security Analysis - Notification System

From a security perspective, a notification system introduces several critical attack surfaces that need early consideration:

Key Security Components

  1. Authentication & Authorization Layer

    • Who can send notifications? (internal services only? authenticated users?)
    • Who can receive which notifications? (proper user segmentation)
    • API authentication between backend and notification service
  2. Input Validation & Sanitization

    • Notification content must be sanitized to prevent XSS in browser notifications
    • Event type validation to prevent unauthorized notification types
    • Rate limiting per user/service to prevent notification spam/DoS
  3. Data Protection

    • Notification content may contain sensitive data (PII, business data)
    • Storage considerations if implementing delivery guarantees
    • Encryption in transit (WSS for WebSocket, HTTPS for polling)
  4. Session Management

    • WebSocket connection authentication (token-based preferred)
    • Session hijacking prevention
    • Proper cleanup of stale connections

Threat Model

DIAGRAM: diagrams/notification-system-brainstorm_security_1.puml

Primary Threats:

  • XSS via notifications: Malicious user injects script into notification content → executes in victim's browser
  • Notification spam/DoS: Attacker floods user with notifications, exhausting resources
  • Authorization bypass: User receives notifications intended for another user
  • MitM attacks: Unencrypted WebSocket connections expose notification content
  • Token theft: Stolen auth tokens allow impersonation for WebSocket connections

Critical Security Requirements

  1. Mandatory: Sanitize ALL notification content server-side before sending
  2. Mandatory: Authenticate WebSocket connections with short-lived tokens
  3. Mandatory: Enforce per-user/per-service rate limits (suggest: 100/min per user)
  4. Recommended: Use WSS (WebSocket Secure) protocol only
  5. Recommended: Implement notification content ACLs (user can only receive their own data)

CONCERN: Browser notifications API requires user permission, but malicious notifications could still be used for phishing attacks ("Your account will be suspended - click here")

CONCERN: If using WebSockets, need strategy for connection state management - stale connections are a DoS vector

QUESTION: Will notifications contain sensitive data, or just "You have 1 new message" style summaries? This affects our encryption and storage requirements.

QUESTION: Will there be a notification history/persistence? This significantly expands the attack surface.

Recommendations for MVP

  • Use established libraries (e.g., Socket.IO with built-in security features) rather than raw WebSockets
  • Implement rate limiting from day 1 (much harder to add later)
  • Sanitize on server side using a whitelist approach (only allow plain text for MVP)
  • Use short-lived JWT tokens for WebSocket auth (5-15 min expiry)
  • Log all notification sends for security auditing
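
The short-lived token recommendation can be illustrated with the standard library alone. This is a minimal sketch, not a JWT implementation: it uses an HMAC-signed, expiring token as a stand-in, and `SECRET`, `issue_token`, and `verify_token` are hypothetical names. A real deployment would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing key; in practice this would come from configuration.
SECRET = b"change-me"


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    """Inverse of _b64: restore padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())


def issue_token(user_id, ttl_s=600, now=None):
    """Return a signed token that expires ttl_s seconds from now (default 10 min)."""
    now = time.time() if now is None else now
    claims = {"sub": user_id, "exp": int(now) + ttl_s}
    payload = _b64(json.dumps(claims).encode())
    return f"{payload}.{_sign(payload)}"


def verify_token(token, now=None):
    """Return the user id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload, sig = token.split(".")
    except ValueError:
        return None  # wrong number of parts
    # Constant-time comparison of the signature before trusting the payload.
    if not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        return None
    claims = json.loads(_unb64(payload))
    if claims["exp"] <= now:
        return None  # expired
    return claims["sub"]
```

The 600-second default sits inside the suggested 5-15 minute range; the client would need to fetch a fresh token before expiry.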

VOTE: CHANGES - Need to explicitly address XSS prevention and rate limiting before implementation

Diagram: diagrams/notification-system-brainstorm_security_1.puml VOTE: CHANGES


Name: AI-Pragmatist

Looking at the agreed scope, this is actually well-scoped already. Let me sketch the simplest thing that could work.

MVP Architecture (Week 1-2):

  1. Event Generation (Python backend)

    • When event happens → insert row in notifications table
    • Fields: user_id, event_type, message, created_at, read
  2. Delivery (polling - yes, polling)

    • Frontend polls /api/notifications/unread every 30s
    • Returns new notifications since last check
    • Browser Notification API shows popup
  3. Preferences (simple)

    • user_preferences table: user_id, event_type, enabled
    • Check before inserting notification

That's it. No WebSockets, no message queue, no separate notification service.
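
The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: SQLite stands in for whatever database the backend actually uses, the helper names (`open_db`, `notify`, `unread`) are hypothetical, and treating a missing preference row as "enabled" is an assumption the discussion does not settle.

```python
import sqlite3

# Tables and fields follow the sketch above; column types are assumptions.
SCHEMA = """
CREATE TABLE notifications (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    message TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    read INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE user_preferences (
    user_id INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    enabled INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (user_id, event_type)
);
"""


def open_db():
    """Create an in-memory database with both tables (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def is_enabled(conn, user_id, event_type):
    """Look up the on/off preference; no row is treated as enabled (assumption)."""
    row = conn.execute(
        "SELECT enabled FROM user_preferences WHERE user_id = ? AND event_type = ?",
        (user_id, event_type),
    ).fetchone()
    return True if row is None else bool(row[0])


def notify(conn, user_id, event_type, message):
    """Step 1 + 3: check the preference, then insert a notification row."""
    if not is_enabled(conn, user_id, event_type):
        return False
    conn.execute(
        "INSERT INTO notifications (user_id, event_type, message) VALUES (?, ?, ?)",
        (user_id, event_type, message),
    )
    conn.commit()
    return True


def unread(conn, user_id):
    """Step 2: what a polling endpoint would return for one user."""
    return conn.execute(
        "SELECT id, event_type, message FROM notifications "
        "WHERE user_id = ? AND read = 0 ORDER BY id",
        (user_id,),
    ).fetchall()
```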

Why polling?

  • Works everywhere (no WebSocket infrastructure)
  • Handles 1000 users easily (30s interval = ~33 req/s)
  • Dead simple to implement and debug
  • Can upgrade later if needed
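
The load figure is simple arithmetic, sketched here for checking other user counts. The function names are hypothetical, and the calculation assumes polls are spread evenly over the interval and that events arrive at random times relative to a client's polls.

```python
def polling_requests_per_second(users: int, interval_s: float) -> float:
    """Average request rate when every user polls once per interval."""
    return users / interval_s


def mean_delivery_delay_s(interval_s: float) -> float:
    """Average wait between an event and the next poll that picks it up."""
    return interval_s / 2


if __name__ == "__main__":
    for users in (1_000, 10_000):
        rate = polling_requests_per_second(users, 30)
        print(f"{users} users at 30s: {rate:.1f} req/s")
    print(f"mean delay at 30s: {mean_delivery_delay_s(30):.0f}s")
```

At 1000 users this gives about 33 req/s. The average delay is about 15s, while the worst case approaches the full 30s interval plus request time.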

What we're NOT doing (yet):

  • Push notifications (mobile)
  • Real-time delivery (<30s)
  • Notification history/pagination
  • Batching/grouping
  • Email/SMS delivery

See diagram for the flow. We can build this in a week, then iterate based on real usage.

READY to sketch - this is simple enough to ship quickly.

Diagram: diagrams/notification-system-brainstorm_mvp_1.puml VOTE: READY


Name: AI-Moderator

Discussion Status

We've just entered the sketch phase following a successful clustering phase that defined our MVP scope:

  • Web notifications only (browser)
  • 3 event types: new message, status change, system alert
  • 30-second delivery target
  • Simple on/off preferences
  • 1000 initial users

Next Steps

We need rough diagrams to illustrate the technical approach. Specifically:

  • Component diagrams showing key pieces (3-5 components)
  • Sequence diagrams for notification flow
  • Mark any unknowns with "?"
  • Keep it rough - we're exploring, not finalizing

Waiting For

  • @architect - Please start us off with a high-level component diagram and sequence flow
  • @security - Review the approach for security considerations
  • @pragmatist - Assess feasibility for 2-week timeline with 2 developers

All participants should focus on creating simple diagrams and identifying the key architectural pieces.


Name: AI-Architect Parse error VOTE: NONE


Name: AI-Security Parse error VOTE: NONE


Name: AI-Pragmatist

Pragmatic Reality Check

Looking at what's been proposed:

Security's approach: 5 critical requirements, threat modeling, ACLs, JWT rotation, audit logging, rate limiting "from day 1".

Reality: We have 2 developers and 2 weeks. That security spec is a month of work.

CONCERN: COMPLEXITY - Security is architecting for enterprise scale when we need to ship for 1000 users

What We Actually Need

For 1000 users receiving basic notifications:

  1. Input sanitization: Use Python's bleach library (2 lines of code)
  2. Rate limiting: Flask-Limiter with @limiter.limit("100/minute") decorator (1 line per endpoint)
  3. Auth: Reuse existing session auth (0 new code)

That's it. That's the security MVP.
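
Since bleach is a third-party package, the sketch below uses only the standard library to show the same "plain text only" idea. It is a stand-in, not bleach's behavior: `to_plain_text` and the 200-character cap are hypothetical, and unlike a tag-stripping cleaner it also drops the contents of `<script>` and `<style>` elements.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and discards all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def to_plain_text(raw: str, max_len: int = 200) -> str:
    """Strip markup, collapse whitespace, and cap the length.

    The result is plain text, not escaped HTML: it is only safe when the
    client displays it as text (for example as a notification body).
    """
    parser = _TextOnly()
    parser.feed(raw)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text[:max_len]
```

Called on `"<script>alert(1)</script>Hello <b>world</b>"`, this returns `"Hello world"`.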

What we're NOT doing yet:

  • WebSocket infrastructure (we're polling, remember?)
  • Notification persistence (just show & dismiss)
  • Separate auth tokens (use existing sessions)
  • Audit logging (add when someone asks for it)

Simplified Flow

DIAGRAM: diagrams/notification-system-brainstorm_mvp_3.puml

This diagram shows polling vs the WebSocket complexity Security is implying. Polling means:

  • No connection state management
  • No token refresh logic
  • No WebSocket security surface
  • Standard HTTPS/session auth

Week 1-2 Implementation

Week 1: Core flow

  • Day 1-2: DB table + insert on events
  • Day 3-4: /api/notifications/unread endpoint + preferences
  • Day 5: Frontend polling + Browser Notification API

Week 2: Polish

  • Add bleach sanitization
  • Add Flask-Limiter (100/min)
  • Test with team
  • Ship to 50 beta users

Then we iterate based on actual problems, not theoretical ones.

QUESTION: @security - Can we agree on sanitization + rate limiting as the security MVP, and defer WebSocket security since we're using polling?

VOTE: READY - This is shippable in 2 weeks with 2 devs

DIAGRAM: diagrams/notification-system-brainstorm_mvp_3.puml VOTE: READY


Name: AI-Moderator

Sketch Phase Summary

We have two architectural approaches with diagrams:

AI-Pragmatist's Polling Approach:

  • Simple polling every 30s to /api/notifications/unread
  • notifications and user_preferences tables in existing DB
  • Browser Notification API for popups
  • No WebSockets, no message queue
  • Vote: READY - considers this shippable in 1 week

AI-Security's Analysis:

  • Identified 5 critical attack surfaces (XSS, DoS, auth bypass, MitM, token theft)
  • Provided threat model diagram
  • Recommends WebSocket approach with Socket.IO + security measures
  • Vote: CHANGES - requires explicit XSS prevention and rate limiting

AI-Architect:

  • Parse error on response - needs to re-respond

Key Tension

There's an architectural mismatch:

  • Pragmatist proposes simple polling (30s intervals, DB-based)
  • Security's recommendations assume WebSockets (tokens, connection management)

Both are valid for different approaches, but we need to align on which path before advancing.

Open Questions

  1. Data Sensitivity (from Security): Will notifications contain sensitive data or just summaries?
  2. Persistence (from Security): Will there be notification history? This affects security surface.
  3. XSS Prevention: How will we sanitize notification content? (Security requires this to be explicit)
  4. Rate Limiting: What limits per user/service? (Security requires from day 1)

Phase Advancement Assessment

We have diagrams (✓) but:

  • Missing architect's main flow diagram (parse error)
  • Need to resolve polling vs WebSocket approach
  • Security concerns not yet addressed in chosen design

Per phase instructions, we should have 3-5 key pieces identified with rough diagrams. We're close but not ready.

Recommendations

  1. @architect - Please re-submit your response (there was a parse error). We need your component diagram and sequence flow.
  2. @security - After architect responds, please review the specific approach chosen and provide targeted security requirements for that architecture (polling vs WebSocket)
  3. @pragmatist - Once security requirements are clear, confirm the 2-week timeline still holds with security measures included

Next phase readiness: Not yet - need architect's diagram and alignment on technical approach before advancing to reality_check phase.


Name: AI-Architect

Architectural Sketch - Component & Flow Analysis

Let me provide the missing architectural perspective to resolve the polling vs WebSocket tension.

High-Level Component View

ARTIFACT: diagrams/notification-system-brainstorm_architect_6.puml

The diagram shows both approaches side-by-side to clarify the architectural tradeoffs.

Architectural Analysis

Key Components (5 pieces):

  1. Event Source - Existing backend services that trigger notifications
  2. Notification Store - Database table (notifications) - persistence layer
  3. Notification API - REST endpoint(s) for delivery
  4. Client Poller - Frontend polling mechanism (30s interval)
  5. Browser Notification API - Native browser popup system

Why Polling is Architecturally Sound for MVP

Alignment with constraints:

  • Small team (2 devs) → Simpler mental model, fewer moving parts
  • 2-week timeline → No new infrastructure (WebSocket servers, connection pools)
  • 1000 users → 30s polling = ~33 req/s = trivial load
  • Existing stack → Reuses current HTTP/session infrastructure

Architectural advantages:

  • Stateless - No connection state management, no reconnection logic
  • Debuggable - Standard HTTP requests in browser DevTools
  • Scalable horizontally - Any app server can handle any poll request
  • Fail-safe - Client automatically recovers by next poll
  • Cache-friendly - Can use HTTP caching headers if needed

When to evolve:

  • User count > 10,000 (polling becomes inefficient)
  • Latency requirement < 10s (30s polling insufficient)
  • Battery concerns on mobile (constant polling drains battery)
  • Rich interactions (typing indicators, presence - need real-time)

Security Integration

@security's concerns are valid but apply differently to polling:

Required (Day 1):

  1. XSS Prevention - Sanitize notification content server-side using bleach (Security's point #1)
  2. Rate Limiting - Flask-Limiter on /api/notifications/unread endpoint (Security's point #3)
  3. Authorization - Existing session auth ensures users only see their notifications (Security's point #2)

Not Applicable (Polling Architecture):

  • WebSocket token management (we're not using WebSockets)
  • Connection hijacking prevention (stateless HTTP)
  • Stale connection cleanup (no persistent connections)

Deferred (Post-MVP):

  • Notification history/persistence beyond current session
  • Audit logging (add when compliance requires it)
  • Content encryption at rest (notifications are ephemeral for MVP)

Answers to Open Questions

Q: Will notifications contain sensitive data?
A: For MVP, use summaries only ("You have 1 new message") - keeps security surface minimal. Full content requires user to click through to authenticated page.

Q: Will there be notification history?
A: Not for MVP - notifications shown once, then dismissed. This avoids storage security concerns. Add persistence in v2 if users request it.

Unknown/Risks (Marked with "?")

  1. Browser support - Browser Notification API permission model varies. Need to test on target browsers.
  2. Notification volume - If users get >10 notifications/min, UX degrades. May need batching logic (not in scope yet).
  3. Backend event integration - How do existing services trigger notifications? Need hook points identified (implementation detail).

Recommendation

VOTE: READY to proceed with polling architecture + security essentials (sanitization + rate limiting).

This is a classic example of "boring technology" - proven, simple, maintainable. We can ship in 2 weeks, learn from real usage, and evolve to WebSockets later only if metrics show we need it.

Phase advancement criteria met:

  • ✓ 3-5 key components identified
  • ✓ Rough diagrams created (component + sequence)
  • ✓ Unknowns marked with "?"
  • ✓ Architectural tradeoffs explicit

Ready for reality_check phase to validate implementation details. VOTE: READY


Name: AI-Security Parse error VOTE: NONE


Name: AI-Pragmatist

Moderator's Point is Critical

We have a fundamental mismatch: I proposed polling, Security analyzed WebSockets. Let me clarify the polling approach with explicit security measures that address Security's concerns.

Polling Architecture with Security Built-In

Core Components (4 pieces):

  1. Event Publisher (Python backend)

    • Sanitizes content with bleach.clean(text, tags=[], strip=True) - no HTML allowed
    • Inserts to notifications table only if user preference enabled
    • Rate limit: max 100 notifications per user per minute (app-level check)
  2. Notifications Table (PostgreSQL)

    • Columns: id, user_id, event_type, message_text (sanitized), created_at, read_at
    • Index on (user_id, read_at) for fast unread queries
    • No sensitive data - just event summaries ("You have a new message")
  3. REST API Endpoint (Flask)

    • GET /api/notifications/unread - returns JSON array
    • Uses existing session authentication (no new auth)
    • Flask-Limiter: @limiter.limit("120/minute") on endpoint
    • SQL with user_id filter prevents authorization bypass
  4. Frontend Poller (React)

    • Polls every 30s with setInterval
    • Browser Notification API (requires user permission)
    • Text-only display (no HTML rendering) - double defense against XSS

Security Measures Explicitly Addressed

XSS Prevention:

  • Server-side: bleach.clean() strips all HTML/JS
  • Client-side: Display as plain text only in Browser Notification API
  • Double defense: Even if sanitization fails, no rendering attack surface

Rate Limiting:

  • Application layer: 100 notifications/user/minute when creating
  • API layer: 120 requests/minute per user on endpoint (Flask-Limiter)
  • Database constraint: prevent spam at source
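
The application-layer check (100 notifications per user per minute) could look roughly like the sliding-window limiter below. This is a sketch under assumptions: `PerUserRateLimiter` is a hypothetical name, and the state lives in one process's memory, so several app servers would need a shared store instead.

```python
import time
from collections import defaultdict, deque


class PerUserRateLimiter:
    """Sliding-window limiter: at most `limit` events per `window_s` per user."""

    def __init__(self, limit=100, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._events = defaultdict(deque)  # user_id -> recent event timestamps

    def allow(self, user_id) -> bool:
        """Record and allow the event, or return False if the user is over limit."""
        now = self.clock()
        recent = self._events[user_id]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

The event publisher would call `allow(user_id)` before inserting a notification and skip the insert when it returns False.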

Authorization:

  • SQL query: WHERE user_id = session.user_id
  • Cannot query other users' notifications (enforced in the query itself, using the session's user ID rather than a client-supplied one)
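
A rough sketch of that user-scoped query, using the columns and index listed for the notifications table. SQLite stands in for PostgreSQL here, and `fetch_unread`, `mark_read`, and the index name are hypothetical; the point is only that every statement filters on the user ID taken from the session.

```python
import sqlite3

# Columns follow the table described above; types are assumptions.
SCHEMA = """
CREATE TABLE notifications (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    message_text TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    read_at TEXT
);
CREATE INDEX idx_notifications_user_unread ON notifications (user_id, read_at);
"""


def open_db():
    """Create an in-memory database with the notifications table (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def fetch_unread(conn, session_user_id):
    """Return unread rows for the logged-in user only."""
    return conn.execute(
        "SELECT id, event_type, message_text FROM notifications "
        "WHERE user_id = ? AND read_at IS NULL ORDER BY id",
        (session_user_id,),
    ).fetchall()


def mark_read(conn, session_user_id, notification_id):
    """Mark one notification read; fails if it belongs to another user."""
    cur = conn.execute(
        "UPDATE notifications SET read_at = CURRENT_TIMESTAMP "
        "WHERE id = ? AND user_id = ? AND read_at IS NULL",
        (notification_id, session_user_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because the user ID never comes from the request, a client cannot ask for or modify another user's rows by changing a parameter.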

Data Protection:

  • No sensitive data in notifications (answer to Security's question)
  • No persistence beyond 30 days (auto-delete old notifications)
  • HTTPS enforced (existing infrastructure)
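
The 30-day auto-delete could be a small periodic job along these lines. This is a sketch only: SQLite stands in for the real database, `purge_old_notifications` and `make_demo_db` are hypothetical names, and how the job gets scheduled is left open.

```python
import sqlite3


def make_demo_db():
    """In-memory table with one very old row and one fresh row (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notifications ("
        "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
        "message_text TEXT NOT NULL, "
        "created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.execute(
        "INSERT INTO notifications (user_id, message_text, created_at) "
        "VALUES (1, 'old', '2000-01-01 00:00:00')"
    )
    conn.execute("INSERT INTO notifications (user_id, message_text) VALUES (1, 'fresh')")
    conn.commit()
    return conn


def purge_old_notifications(conn, max_age_days=30):
    """Delete rows older than max_age_days and return how many were removed."""
    cur = conn.execute(
        "DELETE FROM notifications WHERE created_at < datetime('now', ?)",
        (f"-{int(max_age_days)} days",),
    )
    conn.commit()
    return cur.rowcount
```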

DoS Prevention:

  • Polling interval (30s) self-regulates load
  • Rate limits prevent notification spam
  • No persistent connections to exhaust

DIAGRAM: diagrams/notification-system-brainstorm_mvp_6.puml

Implementation Timeline

Week 1:

  • Day 1: DB migration + sanitization helper
  • Day 2-3: Event publisher integration (3 event types)
  • Day 4: REST endpoint + Flask-Limiter
  • Day 5: Frontend poller + Browser API

Week 2:

  • Day 1-2: User preferences UI
  • Day 3: Security testing (XSS attempts, rate limit verification)
  • Day 4: Performance testing (1000 concurrent users)
  • Day 5: Ship to 50 beta users

Total: 1 week of core build (2 devs × 5 days = 10 developer-days) + 1 week of polish = 2 weeks

What We're Deferring

  • WebSocket infrastructure (not needed for 30s latency)
  • Notification history UI (just keep in DB for now)
  • Mobile push notifications (scoped out already)
  • Email/SMS fallback (not in requirements)
  • Advanced batching/grouping (not needed at 1000 users)

Response to Security's Questions

  1. Sensitive data? No - summaries only ("You have 1 new message")
  2. History/persistence? DB storage yes, UI access no (defer to v2)
  3. Rate limiting? 100/min creation + 120/min API calls
  4. Sanitization? bleach.clean() server-side + text-only display

VOTE: READY - This addresses all mandatory security requirements while staying shippable in 2 weeks

DIAGRAM: diagrams/notification-system-brainstorm_mvp_6.puml VOTE: READY


Name: AI-Moderator

Sketch Phase Status

We've made significant progress with two concrete architectural proposals and diagrams:

What's Been Discussed

AI-Pragmatist's Polling Architecture:

  • Simple 30-second polling to /api/notifications/unread endpoint
  • Two DB tables: notifications and user_preferences
  • Browser Notification API for display
  • Implementation timeline: 1-2 weeks with clear breakdown
  • Diagrams: diagrams/notification-system-brainstorm_mvp_1.puml, diagrams/notification-system-brainstorm_mvp_3.puml

AI-Security's Security Analysis:

  • Comprehensive threat model with 5 attack surfaces identified
  • Critical requirements: XSS prevention, rate limiting, authentication, encryption
  • Diagram: diagrams/notification-system-brainstorm_security_1.puml
  • Recommends Socket.IO with security hardening

AI-Architect:

  • Two parse errors - responses not recorded

Architectural Tension

There's a productive tension between approaches:

  • Pragmatist: Minimal polling-based approach, can ship in 1 week
  • Security: More robust requirements that assume WebSocket infrastructure
  • Pragmatist's counter: Simplified security MVP (sanitization + rate limiting) for polling approach

Open Questions

  1. Data sensitivity: Will notifications contain sensitive data or just summaries? (affects encryption/storage)
  2. Persistence: Will there be notification history? (expands attack surface)
  3. Security MVP agreement: Can we proceed with sanitization + rate limiting for polling, defer WebSocket security? (@security)

Concerns Raised

  1. Browser notification phishing potential (even with permissions)
  2. WebSocket connection state management complexity
  3. COMPLEXITY: Risk of over-engineering for 1000-user MVP

Phase Advancement Assessment

Per sketch phase instructions:

  • ✓ Created simple diagrams (3 diagrams submitted)
  • ✓ Identified 3-5 key pieces (polling endpoint, DB tables, sanitization, rate limiting, browser API)
  • ✓ Marked unknowns (data sensitivity, persistence needs)
  • ✗ AI-Architect's diagrams missing due to parse errors

However: The core architectural approach is now clear:

  • Polling-based delivery (30s intervals)
  • DB persistence for unread notifications
  • Sanitization via bleach library
  • Rate limiting via Flask-Limiter (100/min)
  • Browser Notification API for display

The discussion has converged on a pragmatic polling approach with security hardening. @security's final question about the security MVP is the last blocker.

Recommendation

ADVANCE TO REALITY_CHECK PHASE: Yes, with caveat

We have sufficient architectural clarity to move forward:

  • Technical approach is defined (polling-based)
  • Security requirements are scoped to the approach (not WebSockets)
  • Implementation timeline is concrete (1-2 weeks)
  • Key unknowns are identified and can be addressed in reality_check

@architect can contribute in the reality_check phase if the parse issues are resolved.

@security - Please confirm you can accept the simplified security MVP (sanitization + rate limiting) for the polling approach, or state specific blockers.