# Notification System Brainstorm

## Problem Statement

Our application needs a way to notify users about important events (new messages, status changes, alerts). Currently we have no notification system at all.

## Known Constraints

- Small team (2 developers)
- Need something working within 2 weeks
- Must work on web and mobile
- Budget: minimal (prefer open source)
- Current stack: Python backend, React frontend

## What Success Looks Like

- Users get timely notifications about events they care about
- Users can control what notifications they receive
- System is reliable enough that important notifications aren't lost

## Agreed Scope (from cluster phase)

- MVP: Web notifications only (browser), mobile comes later
- Start with 3 event types: new message, status change, system alert
- "Timely" means within 30 seconds
- Simple on/off preferences per event type
- Target: 1000 users initially

---

*We're now in the sketch phase - create rough diagrams to illustrate the approach.*

---

---

Name: AI-Architect

Parse error

VOTE: NONE

---

Name: AI-Security

## Security Analysis - Notification System

From a security perspective, a notification system introduces several critical attack surfaces that need early consideration:

### Key Security Components

1. **Authentication & Authorization Layer**
   - Who can send notifications? (internal services only? authenticated users?)
   - Who can receive which notifications? (proper user segmentation)
   - API authentication between backend and notification service
2. **Input Validation & Sanitization**
   - Notification content must be sanitized to prevent XSS in browser notifications
   - Event type validation to prevent unauthorized notification types
   - Rate limiting per user/service to prevent notification spam/DoS
3. **Data Protection**
   - Notification content may contain sensitive data (PII, business data)
   - Storage considerations if implementing delivery guarantees
   - Encryption in transit (WSS for WebSocket, HTTPS for polling)
4. **Session Management**
   - WebSocket connection authentication (token-based preferred)
   - Session hijacking prevention
   - Proper cleanup of stale connections

### Threat Model

DIAGRAM: diagrams/notification-system-brainstorm_security_1.puml

**Primary Threats:**
- **XSS via notifications**: Malicious user injects script into notification content → executes in victim's browser
- **Notification spam/DoS**: Attacker floods user with notifications, exhausting resources
- **Authorization bypass**: User receives notifications intended for another user
- **MitM attacks**: Unencrypted WebSocket connections expose notification content
- **Token theft**: Stolen auth tokens allow impersonation for WebSocket connections

### Critical Security Requirements

1. **Mandatory**: Sanitize ALL notification content server-side before sending
2. **Mandatory**: Authenticate WebSocket connections with short-lived tokens
3. **Mandatory**: Enforce per-user/per-service rate limits (suggest: 100/min per user)
4. **Recommended**: Use WSS (WebSocket Secure) protocol only
5. **Recommended**: Implement notification content ACLs (user can only receive their own data)

CONCERN: Browser notifications API requires user permission, but malicious notifications could still be used for phishing attacks ("Your account will be suspended - click here")

CONCERN: If using WebSockets, need strategy for connection state management - stale connections are a DoS vector

QUESTION: Will notifications contain sensitive data, or just "You have 1 new message" style summaries? This affects our encryption and storage requirements.

QUESTION: Will there be a notification history/persistence? This significantly expands the attack surface.
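To make the suggested per-user limit (100/min) concrete, here is a minimal sliding-window sketch. It is an illustration only, under several assumptions: the class name `PerUserRateLimiter`, its parameters, and the injectable `clock` are hypothetical, and state is held in memory for a single process, so it would not hold up across multiple app servers as written.

```python
import time
from collections import defaultdict, deque


class PerUserRateLimiter:
    """Sliding-window limiter (hypothetical helper, in-memory, single process)."""

    def __init__(self, limit=100, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit              # max events per user per window
        self.window = window_seconds    # window length in seconds
        self.clock = clock              # injectable so tests can control time
        self._events = defaultdict(deque)  # user_id -> timestamps of allowed events

    def allow(self, user_id):
        """Return True and record the event if the user is under the limit."""
        now = self.clock()
        timestamps = self._events[user_id]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

A caller would check `limiter.allow(user_id)` before creating a notification and skip (or queue) the send when it returns `False`.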
### Recommendations for MVP

- Use established libraries (e.g., Socket.IO with built-in security features) rather than raw WebSockets
- Implement rate limiting from day 1 (much harder to add later)
- Sanitize on server side using a whitelist approach (only allow plain text for MVP)
- Use short-lived JWT tokens for WebSocket auth (5-15 min expiry)
- Log all notification sends for security auditing

VOTE: CHANGES - Need to explicitly address XSS prevention and rate limiting before implementation

**Diagram:** `diagrams/notification-system-brainstorm_security_1.puml`

VOTE: CHANGES

---

Name: AI-Pragmatist

Looking at the agreed scope, this is actually well-scoped already. Let me sketch the simplest thing that could work.

MVP Architecture (Week 1-2):

1. **Event Generation** (Python backend)
   - When event happens → insert row in `notifications` table
   - Fields: user_id, event_type, message, created_at, read
2. **Delivery** (polling - yes, polling)
   - Frontend polls `/api/notifications/unread` every 30s
   - Returns new notifications since last check
   - Browser Notification API shows popup
3. **Preferences** (simple)
   - `user_preferences` table: user_id, event_type, enabled
   - Check before inserting notification

That's it. No WebSockets, no message queue, no separate notification service.

**Why polling?**
- Works everywhere (no WebSocket infrastructure)
- Handles 1000 users easily (30s interval = ~33 req/s)
- Dead simple to implement and debug
- Can upgrade later if needed

**What we're NOT doing (yet):**
- Push notifications (mobile)
- Real-time delivery (<30s)
- Notification history/pagination
- Batching/grouping
- Email/SMS delivery

See diagram for the flow. We can build this in a week, then iterate based on real usage.

READY to sketch - this is simple enough to ship quickly.
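The three steps above can be sketched as follows. This is a rough illustration, not the agreed implementation: it uses Python's built-in `sqlite3` as a stand-in database so the sketch runs on its own, the helper names `create_db`, `notify`, and `unread` are hypothetical, and it assumes a missing preference row means the event type is enabled.

```python
import sqlite3
from datetime import datetime, timezone

# Table and column names follow the proposal; the types are assumptions.
SCHEMA = """
CREATE TABLE notifications (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    message TEXT NOT NULL,
    created_at TEXT NOT NULL,
    read INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE user_preferences (
    user_id INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    enabled INTEGER NOT NULL,
    PRIMARY KEY (user_id, event_type)
);
"""


def create_db():
    """Create an in-memory database with both tables (stand-in for the real DB)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def notify(conn, user_id, event_type, message):
    """Insert a notification unless the user switched this event type off."""
    row = conn.execute(
        "SELECT enabled FROM user_preferences WHERE user_id = ? AND event_type = ?",
        (user_id, event_type),
    ).fetchone()
    if row is not None and not row[0]:
        return False  # preference check happens before the insert
    conn.execute(
        "INSERT INTO notifications (user_id, event_type, message, created_at) "
        "VALUES (?, ?, ?, ?)",
        (user_id, event_type, message, datetime.now(timezone.utc).isoformat()),
    )
    return True


def unread(conn, user_id):
    """Return this user's unread notifications, oldest first."""
    rows = conn.execute(
        "SELECT id, event_type, message FROM notifications "
        "WHERE user_id = ? AND read = 0 ORDER BY id",
        (user_id,),
    ).fetchall()
    return [{"id": r[0], "event_type": r[1], "message": r[2]} for r in rows]
```

The `/api/notifications/unread` handler would return the result of `unread(conn, current_user_id)` as JSON; how rows get marked as read is left open here, as it is in the proposal.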
**Diagram:** `diagrams/notification-system-brainstorm_mvp_1.puml`

VOTE: READY

---

Name: AI-Moderator

## Discussion Status

We've just entered the **sketch phase** following a successful clustering phase that defined our MVP scope:

- Web notifications only (browser)
- 3 event types: new message, status change, system alert
- 30-second delivery target
- Simple on/off preferences
- 1000 initial users

## Next Steps

We need rough diagrams to illustrate the technical approach. Specifically:

- Component diagrams showing key pieces (3-5 components)
- Sequence diagrams for notification flow
- Mark any unknowns with "?"
- Keep it rough - we're exploring, not finalizing

## Waiting For

@architect - Please start us off with a high-level component diagram and sequence flow

@security - Review the approach for security considerations

@pragmatist - Assess feasibility for 2-week timeline with 2 developers

All participants should focus on creating simple diagrams and identifying the key architectural pieces.

---

Name: AI-Architect

Parse error

VOTE: NONE

---

Name: AI-Security

Parse error

VOTE: NONE

---

Name: AI-Pragmatist

## Pragmatic Reality Check

Looking at what's been proposed:

**Security's approach**: 5 critical requirements, threat modeling, ACLs, JWT rotation, audit logging, rate limiting "from day 1".

**Reality**: We have 2 developers and 2 weeks. That security spec is a month of work.

CONCERN: COMPLEXITY - Security is architecting for enterprise scale when we need to ship for 1000 users

## What We Actually Need

For 1000 users receiving basic notifications:

1. **Input sanitization**: Use Python's `bleach` library (2 lines of code)
2. **Rate limiting**: Flask-Limiter with `@limiter.limit("100/minute")` decorator (1 line per endpoint)
3. **Auth**: Reuse existing session auth (0 new code)

That's it. That's the security MVP.

**What we're NOT doing yet:**
- WebSocket infrastructure (we're polling, remember?)
- Notification persistence (just show & dismiss)
- Separate auth tokens (use existing sessions)
- Audit logging (add when someone asks for it)

## Simplified Flow

DIAGRAM: diagrams/notification-system-brainstorm_mvp_3.puml

This diagram shows polling vs the WebSocket complexity Security is implying. Polling means:

- No connection state management
- No token refresh logic
- No WebSocket security surface
- Standard HTTPS/session auth

## Week 1-2 Implementation

**Week 1**: Core flow
- Day 1-2: DB table + insert on events
- Day 3-4: `/api/notifications/unread` endpoint + preferences
- Day 5: Frontend polling + Browser Notification API

**Week 2**: Polish
- Add `bleach` sanitization
- Add Flask-Limiter (100/min)
- Test with team
- Ship to 50 beta users

Then we iterate based on actual problems, not theoretical ones.

QUESTION: @security - Can we agree on sanitization + rate limiting as the security MVP, and defer WebSocket security since we're using polling?

VOTE: READY - This is shippable in 2 weeks with 2 devs

DIAGRAM: diagrams/notification-system-brainstorm_mvp_3.puml

VOTE: READY

---

Name: AI-Moderator

## Sketch Phase Summary

We have two architectural approaches with diagrams:

**AI-Pragmatist's Polling Approach:**
- Simple polling every 30s to `/api/notifications/unread`
- `notifications` and `user_preferences` tables in existing DB
- Browser Notification API for popups
- No WebSockets, no message queue
- **Vote: READY** - considers this shippable in 1 week

**AI-Security's Analysis:**
- Identified 5 critical attack surfaces (XSS, DoS, auth bypass, MitM, token theft)
- Provided threat model diagram
- Recommends WebSocket approach with Socket.IO + security measures
- **Vote: CHANGES** - requires explicit XSS prevention and rate limiting

**AI-Architect:**
- Parse error on response - **needs to re-respond**

## Key Tension

There's an architectural mismatch:

- Pragmatist proposes simple polling (30s intervals, DB-based)
- Security's recommendations assume WebSockets (tokens, connection management)

Both are valid for different approaches, but we need to align on which path before advancing.

## Open Questions

1. **Data Sensitivity** (from Security): Will notifications contain sensitive data or just summaries?
2. **Persistence** (from Security): Will there be notification history? This affects security surface.
3. **XSS Prevention**: How will we sanitize notification content? (Security requires this explicitly)
4. **Rate Limiting**: What limits per user/service? (Security requires from day 1)

## Phase Advancement Assessment

We have diagrams (✓) but:

- Missing architect's main flow diagram (parse error)
- Need to resolve polling vs WebSocket approach
- Security concerns not yet addressed in chosen design

Per phase instructions, we should have 3-5 key pieces identified with rough diagrams. We're close but not ready.

## Recommendations

1. @architect - Please re-submit your response (there was a parse error). We need your component diagram and sequence flow.
2. @security - After architect responds, please review the *specific* approach chosen and provide targeted security requirements for that architecture (polling vs WebSocket)
3. @pragmatist - Once security requirements are clear, confirm the 2-week timeline still holds with security measures included

**Next phase readiness:** Not yet - need architect's diagram and alignment on technical approach before advancing to reality_check phase.

---

Name: AI-Architect

## Architectural Sketch - Component & Flow Analysis

Let me provide the missing architectural perspective to resolve the polling vs WebSocket tension.

### High-Level Component View

ARTIFACT: diagrams/notification-system-brainstorm_architect_6.puml

The diagram shows both approaches side-by-side to clarify the architectural tradeoffs.

### Architectural Analysis

**Key Components (5 pieces):**

1. **Event Source** - Existing backend services that trigger notifications
2. **Notification Store** - Database table (`notifications`) - persistence layer
3. **Notification API** - REST endpoint(s) for delivery
4. **Client Poller** - Frontend polling mechanism (30s interval)
5. **Browser Notification API** - Native browser popup system

### Why Polling is Architecturally Sound for MVP

**Alignment with constraints:**
- Small team (2 devs) → Simpler mental model, fewer moving parts
- 2-week timeline → No new infrastructure (WebSocket servers, connection pools)
- 1000 users → 30s polling = ~33 req/s = trivial load
- Existing stack → Reuses current HTTP/session infrastructure

**Architectural advantages:**
- **Stateless** - No connection state management, no reconnection logic
- **Debuggable** - Standard HTTP requests in browser DevTools
- **Scalable horizontally** - Any app server can handle any poll request
- **Fail-safe** - Client automatically recovers by next poll
- **Cache-friendly** - Can use HTTP caching headers if needed

**When to evolve:**
- User count > 10,000 (polling becomes inefficient)
- Latency requirement < 10s (30s polling insufficient)
- Battery concerns on mobile (constant polling drains battery)
- Rich interactions (typing indicators, presence - need real-time)

### Security Integration

@security's concerns are valid but apply differently to polling:

**Required (Day 1):**

1. **XSS Prevention** - Sanitize notification content server-side using `bleach` (Security's point #1)
2. **Rate Limiting** - Flask-Limiter on `/api/notifications/unread` endpoint (Security's point #3)
3. **Authorization** - Existing session auth ensures users only see their notifications (Security's point #2)

**Not Applicable (Polling Architecture):**
- WebSocket token management (we're not using WebSockets)
- Connection hijacking prevention (stateless HTTP)
- Stale connection cleanup (no persistent connections)

**Deferred (Post-MVP):**
- Notification history/persistence beyond current session
- Audit logging (add when compliance requires it)
- Content encryption at rest (notifications are ephemeral for MVP)

### Answers to Open Questions

**Q: Will notifications contain sensitive data?**
A: For MVP, use summaries only ("You have 1 new message") - keeps security surface minimal. Full content requires user to click through to authenticated page.

**Q: Will there be notification history?**
A: Not for MVP - notifications shown once, then dismissed. This avoids storage security concerns. Add persistence in v2 if users request it.

### Unknown/Risks (Marked with "?")

1. **Browser support** - Browser Notification API permission model varies. Need to test on target browsers.
2. **Notification volume** - If users get >10 notifications/min, UX degrades. May need batching logic (not in scope yet).
3. **Backend event integration** - How do existing services trigger notifications? Need hook points identified (implementation detail).

### Recommendation

**VOTE: READY** to proceed with polling architecture + security essentials (sanitization + rate limiting).

This is a classic example of "boring technology" - proven, simple, maintainable. We can ship in 2 weeks, learn from real usage, and evolve to WebSockets later *only if* metrics show we need it.

**Phase advancement criteria met:**
- ✓ 3-5 key components identified
- ✓ Rough diagrams created (component + sequence)
- ✓ Unknowns marked with "?"
- ✓ Architectural tradeoffs explicit

Ready for reality_check phase to validate implementation details.
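As a quick check of the load figure quoted in this sketch (1000 users polling every 30s ≈ 33 req/s), the arithmetic can be written out as below. The function names are hypothetical, and the model assumes polls are spread evenly over the interval, which real clients will only approximate.

```python
def polls_per_second(users, interval_seconds):
    """Average request rate if every user polls once per interval."""
    return users / interval_seconds


def worst_case_delay_seconds(interval_seconds, request_seconds=0.0):
    """Longest wait for an event created just after a user's last poll."""
    return interval_seconds + request_seconds


# MVP target: 1000 users, 30-second polling interval.
mvp_rate = polls_per_second(1000, 30)         # about 33.3 requests per second
evolve_rate = polls_per_second(10_000, 30)    # about 333.3 at the "evolve" threshold
mvp_delay = worst_case_delay_seconds(30, request_seconds=0.2)  # 0.2s is an assumed request time
```

One thing the numbers suggest: with a 30s interval the worst-case delay sits right at the "within 30 seconds" target once request time is added, so the interval may need to be slightly shorter if that target is meant strictly.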
VOTE: READY

---

Name: AI-Security

Parse error

VOTE: NONE

---

Name: AI-Pragmatist

## Moderator's Point is Critical

We have a fundamental mismatch: I proposed polling, Security analyzed WebSockets. Let me clarify the polling approach with explicit security measures that address Security's concerns.

## Polling Architecture with Security Built-In

**Core Components (4 pieces):**

1. **Event Publisher** (Python backend)
   - Sanitizes content with `bleach.clean(text, tags=[], strip=True)` - no HTML allowed
   - Inserts to `notifications` table only if user preference enabled
   - Rate limit: max 100 notifications per user per minute (app-level check)
2. **Notifications Table** (PostgreSQL)
   - Columns: id, user_id, event_type, message_text (sanitized), created_at, read_at
   - Index on (user_id, read_at) for fast unread queries
   - No sensitive data - just event summaries ("You have a new message")
3. **REST API Endpoint** (Flask)
   - `GET /api/notifications/unread` - returns JSON array
   - Uses existing session authentication (no new auth)
   - Flask-Limiter: `@limiter.limit("120/minute")` on endpoint
   - SQL with user_id filter prevents authorization bypass
4. **Frontend Poller** (React)
   - Polls every 30s with `setInterval`
   - Browser Notification API (requires user permission)
   - Text-only display (no HTML rendering) - double defense against XSS

## Security Measures Explicitly Addressed

**XSS Prevention:**
- Server-side: `bleach.clean()` strips all HTML/JS
- Client-side: Display as plain text only in Browser Notification API
- **Double defense**: Even if sanitization fails, no rendering attack surface

**Rate Limiting:**
- Application layer: 100 notifications/user/minute when creating
- API layer: 120 requests/minute per user on endpoint (Flask-Limiter)
- Database constraint: prevent spam at source

**Authorization:**
- SQL query: `WHERE user_id = session.user_id`
- Cannot query other users' notifications (DB-level enforcement)

**Data Protection:**
- No sensitive data in notifications (answer to Security's question)
- No persistence beyond 30 days (auto-delete old notifications)
- HTTPS enforced (existing infrastructure)

**DoS Prevention:**
- Polling interval (30s) self-regulates load
- Rate limits prevent notification spam
- No persistent connections to exhaust

DIAGRAM: diagrams/notification-system-brainstorm_mvp_6.puml

## Implementation Timeline

**Week 1:**
- Day 1: DB migration + sanitization helper
- Day 2-3: Event publisher integration (3 event types)
- Day 4: REST endpoint + Flask-Limiter
- Day 5: Frontend poller + Browser API

**Week 2:**
- Day 1-2: User preferences UI
- Day 3: Security testing (XSS attempts, rate limit verification)
- Day 4: Performance testing (1000 concurrent users)
- Day 5: Ship to 50 beta users

**Total:** 10 developer-days (2 devs × 1 week) + 1 week polish = 2 weeks

## What We're Deferring

- WebSocket infrastructure (not needed for 30s latency)
- Notification history UI (just keep in DB for now)
- Mobile push notifications (scoped out already)
- Email/SMS fallback (not in requirements)
- Advanced batching/grouping (not needed at 1000 users)

## Response to Security's Questions

1. **Sensitive data?** No - summaries only ("You have 1 new message")
2. **History/persistence?** DB storage yes, UI access no (defer to v2)
3. **Rate limiting?** 100/min creation + 120/min API calls
4. **Sanitization?** bleach.clean() server-side + text-only display

VOTE: READY - This addresses all mandatory security requirements while staying shippable in 2 weeks

DIAGRAM: diagrams/notification-system-brainstorm_mvp_6.puml

VOTE: READY

---

Name: AI-Moderator

## Sketch Phase Status

We've made significant progress with two concrete architectural proposals and diagrams:

### What's Been Discussed

**AI-Pragmatist's Polling Architecture:**
- Simple 30-second polling to `/api/notifications/unread` endpoint
- Two DB tables: `notifications` and `user_preferences`
- Browser Notification API for display
- Implementation timeline: 1-2 weeks with clear breakdown
- **Diagrams:** `diagrams/notification-system-brainstorm_mvp_1.puml`, `diagrams/notification-system-brainstorm_mvp_3.puml`

**AI-Security's Security Analysis:**
- Comprehensive threat model with 5 attack surfaces identified
- Critical requirements: XSS prevention, rate limiting, authentication, encryption
- **Diagram:** `diagrams/notification-system-brainstorm_security_1.puml`
- Recommends Socket.IO with security hardening

**AI-Architect:**
- Two parse errors - **responses not recorded**

### Architectural Tension

There's a productive tension between approaches:

- **Pragmatist:** Minimal polling-based approach, can ship in 1 week
- **Security:** More robust requirements that assume WebSocket infrastructure
- **Pragmatist's counter:** Simplified security MVP (sanitization + rate limiting) for polling approach

### Open Questions

1. **Data sensitivity:** Will notifications contain sensitive data or just summaries? (affects encryption/storage)
2. **Persistence:** Will there be notification history? (expands attack surface)
3. **Security MVP agreement:** Can we proceed with sanitization + rate limiting for polling, defer WebSocket security? (@security)

### Concerns Raised

1. Browser notification phishing potential (even with permissions)
2. WebSocket connection state management complexity
3. **COMPLEXITY:** Risk of over-engineering for 1000-user MVP

## Phase Advancement Assessment

**Per sketch phase instructions:**
- ✓ Created simple diagrams (3 diagrams submitted)
- ✓ Identified 3-5 key pieces (polling endpoint, DB tables, sanitization, rate limiting, browser API)
- ✓ Marked unknowns (data sensitivity, persistence needs)
- ✗ AI-Architect's diagrams missing due to parse errors

**However:** The core architectural approach is now clear:

- Polling-based delivery (30s intervals)
- DB persistence for unread notifications
- Sanitization via `bleach` library
- Rate limiting via Flask-Limiter (100/min)
- Browser Notification API for display

The discussion has converged on a pragmatic polling approach with security hardening. @security's final question about the security MVP is the last blocker.

## Recommendation

**ADVANCE TO REALITY_CHECK PHASE:** Yes, with caveat

We have sufficient architectural clarity to move forward:

- Technical approach is defined (polling-based)
- Security requirements are scoped to the approach (not WebSockets)
- Implementation timeline is concrete (1-2 weeks)
- Key unknowns are identified and can be addressed in reality_check

@architect can contribute in the reality_check phase if the parse issues are resolved.

@security - Please confirm you can accept the simplified security MVP (sanitization + rate limiting) for the polling approach, or state specific blockers.
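For reference while that confirmation is pending, the plain-text sanitization step the thread has converged on could look roughly like the sketch below. The discussion names `bleach.clean(text, tags=[], strip=True)` for this; the sketch swaps in Python's standard-library `html.parser` purely so it runs without extra dependencies, and it is not a hardened replacement. The names `to_plain_text` and `max_len` are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content and discards every tag and attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(raw, max_len=200):
    """Strip markup from notification text, collapse whitespace, cap the length.

    Stand-in for the bleach-based helper discussed in the thread (assumption).
    """
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any text still buffered by the parser
    text = " ".join("".join(parser.parts).split())
    return text[:max_len]


summary = to_plain_text("<b>New</b> message from <i>Ana</i>")
```

Text that was inside a stripped tag is kept as inert plain text, which fits the text-only display the thread describes for the Browser Notification API.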