Notification System Brainstorm
Problem Statement
Our application needs a way to notify users about important events (new messages, status changes, alerts). Currently we have no notification system at all.
Known Constraints
- Small team (2 developers)
- Need something working within 2 weeks
- Must work on web and mobile
- Budget: minimal (prefer open source)
- Current stack: Python backend, React frontend
What Success Looks Like
- Users get timely notifications about events they care about
- Users can control what notifications they receive
- System is reliable enough that important notifications aren't lost
Agreed Scope (from cluster phase)
- MVP: Web notifications only (browser), mobile comes later
- Start with 3 event types: new message, status change, system alert
- "Timely" means within 30 seconds
- Simple on/off preferences per event type
- Target: 1000 users initially
We're now in the sketch phase - create rough diagrams to illustrate the approach.
Name: AI-Architect Parse error VOTE: NONE
Name: AI-Security
Security Analysis - Notification System
From a security perspective, a notification system introduces several critical attack surfaces that need early consideration:
Key Security Components
- Authentication & Authorization Layer
  - Who can send notifications? (internal services only? authenticated users?)
  - Who can receive which notifications? (proper user segmentation)
  - API authentication between backend and notification service
- Input Validation & Sanitization
  - Notification content must be sanitized to prevent XSS in browser notifications
  - Event type validation to prevent unauthorized notification types
  - Rate limiting per user/service to prevent notification spam/DoS
- Data Protection
  - Notification content may contain sensitive data (PII, business data)
  - Storage considerations if implementing delivery guarantees
  - Encryption in transit (WSS for WebSocket, HTTPS for polling)
- Session Management
  - WebSocket connection authentication (token-based preferred)
  - Session hijacking prevention
  - Proper cleanup of stale connections
Threat Model
DIAGRAM: diagrams/notification-system-brainstorm_security_1.puml
Primary Threats:
- XSS via notifications: Malicious user injects script into notification content → executes in victim's browser
- Notification spam/DoS: Attacker floods user with notifications, exhausting resources
- Authorization bypass: User receives notifications intended for another user
- MitM attacks: Unencrypted WebSocket connections expose notification content
- Token theft: Stolen auth tokens allow impersonation for WebSocket connections
Critical Security Requirements
- Mandatory: Sanitize ALL notification content server-side before sending
- Mandatory: Authenticate WebSocket connections with short-lived tokens
- Mandatory: Enforce per-user/per-service rate limits (suggest: 100/min per user)
- Recommended: Use WSS (WebSocket Secure) protocol only
- Recommended: Implement notification content ACLs (user can only receive their own data)
CONCERN: Browser notifications API requires user permission, but malicious notifications could still be used for phishing attacks ("Your account will be suspended - click here")
CONCERN: If using WebSockets, need strategy for connection state management - stale connections are a DoS vector
QUESTION: Will notifications contain sensitive data, or just "You have 1 new message" style summaries? This affects our encryption and storage requirements.
QUESTION: Will there be a notification history/persistence? This significantly expands the attack surface.
Recommendations for MVP
- Use established libraries (e.g., Socket.IO with built-in security features) rather than raw WebSockets
- Implement rate limiting from day 1 (much harder to add later)
- Sanitize on server side using a whitelist approach (only allow plain text for MVP)
- Use short-lived JWT tokens for WebSocket auth (5-15 min expiry)
- Log all notification sends for security auditing
VOTE: CHANGES - Need to explicitly address XSS prevention and rate limiting before implementation
Diagram: diagrams/notification-system-brainstorm_security_1.puml
VOTE: CHANGES
Name: AI-Pragmatist
Looking at the agreed scope, this is actually well-scoped already. Let me sketch the simplest thing that could work.
MVP Architecture (Week 1-2):
- Event Generation (Python backend)
  - When event happens → insert row in notifications table
  - Fields: user_id, event_type, message, created_at, read
- Delivery (polling - yes, polling)
  - Frontend polls /api/notifications/unread every 30s
  - Returns new notifications since last check
  - Browser Notification API shows popup
- Preferences (simple)
  - user_preferences table: user_id, event_type, enabled
  - Check before inserting notification
That's it. No WebSockets, no message queue, no separate notification service.
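A minimal sketch of the three pieces above, using SQLite from the standard library as a stand-in for the project's database. The table and column names follow the fields listed above; the function names `publish` and `fetch_unread`, the "enabled unless switched off" default, and marking rows as read on fetch are assumptions made for illustration only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE notifications (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    event_type TEXT    NOT NULL,
    message    TEXT    NOT NULL,
    created_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    read       INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE user_preferences (
    user_id    INTEGER NOT NULL,
    event_type TEXT    NOT NULL,
    enabled    INTEGER NOT NULL,
    PRIMARY KEY (user_id, event_type)
);
"""


def publish(conn, user_id, event_type, message):
    """Insert a notification unless the user switched this event type off."""
    row = conn.execute(
        "SELECT enabled FROM user_preferences WHERE user_id = ? AND event_type = ?",
        (user_id, event_type),
    ).fetchone()
    # Assumption: a missing preference row means "enabled by default".
    if row is not None and not row[0]:
        return False
    conn.execute(
        "INSERT INTO notifications (user_id, event_type, message) VALUES (?, ?, ?)",
        (user_id, event_type, message),
    )
    return True


def fetch_unread(conn, user_id):
    """Return unread messages for one user and mark them as read (assumption)."""
    rows = conn.execute(
        "SELECT id, message FROM notifications "
        "WHERE user_id = ? AND read = 0 ORDER BY id",
        (user_id,),
    ).fetchall()
    conn.executemany(
        "UPDATE notifications SET read = 1 WHERE id = ?", [(r[0],) for r in rows]
    )
    return [r[1] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO user_preferences VALUES (1, 'status_change', 0)")
publish(conn, 1, "new_message", "You have 1 new message")
publish(conn, 1, "status_change", "Status changed")  # dropped: preference is off
print(fetch_unread(conn, 1))  # ['You have 1 new message']
print(fetch_unread(conn, 1))  # []
```

In the real backend, `fetch_unread` would sit behind the /api/notifications/unread endpoint and take the user id from the session.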
Why polling?
- Works everywhere (no WebSocket infrastructure)
- Handles 1000 users easily (30s interval = ~33 req/s)
- Dead simple to implement and debug
- Can upgrade later if needed
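The ~33 req/s figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes the worst case where all 1000 users are online at once with a single tab each and polls are spread evenly over the interval; the function names are illustrative.

```python
def polling_load(users, interval_s):
    """Average requests per second if every user polls once per interval."""
    return users / interval_s


def requests_per_day(users, interval_s):
    """Total poll requests in 24 hours under the same assumption."""
    return users * 86_400 // interval_s


print(round(polling_load(1000, 30), 1))  # 33.3
print(requests_per_day(1000, 30))        # 2880000
```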
What we're NOT doing (yet):
- Push notifications (mobile)
- Real-time delivery (<30s)
- Notification history/pagination
- Batching/grouping
- Email/SMS delivery
See diagram for the flow. We can build this in a week, then iterate based on real usage.
READY to sketch - this is simple enough to ship quickly.
Diagram: diagrams/notification-system-brainstorm_mvp_1.puml
VOTE: READY
Name: AI-Moderator
Discussion Status
We've just entered the sketch phase following a successful clustering phase that defined our MVP scope:
- Web notifications only (browser)
- 3 event types: new message, status change, system alert
- 30-second delivery target
- Simple on/off preferences
- 1000 initial users
Next Steps
We need rough diagrams to illustrate the technical approach. Specifically:
- Component diagrams showing key pieces (3-5 components)
- Sequence diagrams for notification flow
- Mark any unknowns with "?"
- Keep it rough - we're exploring, not finalizing
Waiting For
- @architect - Please start us off with a high-level component diagram and sequence flow
- @security - Review the approach for security considerations
- @pragmatist - Assess feasibility for 2-week timeline with 2 developers
All participants should focus on creating simple diagrams and identifying the key architectural pieces.
Name: AI-Architect Parse error VOTE: NONE
Name: AI-Security Parse error VOTE: NONE
Name: AI-Pragmatist
Pragmatic Reality Check
Looking at what's been proposed:
Security's approach: 5 critical requirements, threat modeling, ACLs, JWT rotation, audit logging, rate limiting "from day 1".
Reality: We have 2 developers and 2 weeks. That security spec is a month of work.
CONCERN: COMPLEXITY - Security is architecting for enterprise scale when we need to ship for 1000 users
What We Actually Need
For 1000 users receiving basic notifications:
- Input sanitization: Use Python's bleach library (2 lines of code)
- Rate limiting: Flask-Limiter with @limiter.limit("100/minute") decorator (1 line per endpoint)
- Auth: Reuse existing session auth (0 new code)
That's it. That's the security MVP.
What we're NOT doing yet:
- WebSocket infrastructure (we're polling, remember?)
- Notification persistence (just show & dismiss)
- Separate auth tokens (use existing sessions)
- Audit logging (add when someone asks for it)
Simplified Flow
DIAGRAM: diagrams/notification-system-brainstorm_mvp_3.puml
This diagram shows polling vs the WebSocket complexity Security is implying. Polling means:
- No connection state management
- No token refresh logic
- No WebSocket security surface
- Standard HTTPS/session auth
Week 1-2 Implementation
Week 1: Core flow
- Day 1-2: DB table + insert on events
- Day 3-4: /api/notifications/unread endpoint + preferences
- Day 5: Frontend polling + Browser Notification API
Week 2: Polish
- Add bleach sanitization
- Add Flask-Limiter (100/min)
- Test with team
- Ship to 50 beta users
Then we iterate based on actual problems, not theoretical ones.
QUESTION: @security - Can we agree on sanitization + rate limiting as the security MVP, and defer WebSocket security since we're using polling?
VOTE: READY - This is shippable in 2 weeks with 2 devs
DIAGRAM: diagrams/notification-system-brainstorm_mvp_3.puml
VOTE: READY
Name: AI-Moderator
Sketch Phase Summary
We have two architectural approaches with diagrams:
AI-Pragmatist's Polling Approach:
- Simple polling every 30s to /api/notifications/unread
- notifications and user_preferences tables in existing DB
- Browser Notification API for popups
- No WebSockets, no message queue
- Vote: READY - considers this shippable in 1 week
AI-Security's Analysis:
- Identified 5 critical attack surfaces (XSS, DoS, auth bypass, MitM, token theft)
- Provided threat model diagram
- Recommends WebSocket approach with Socket.IO + security measures
- Vote: CHANGES - requires explicit XSS prevention and rate limiting
AI-Architect:
- Parse error on response - needs to re-respond
Key Tension
There's an architectural mismatch:
- Pragmatist proposes simple polling (30s intervals, DB-based)
- Security's recommendations assume WebSockets (tokens, connection management)
Both are valid for different approaches, but we need to align on which path before advancing.
Open Questions
- Data Sensitivity (from Security): Will notifications contain sensitive data or just summaries?
- Persistence (from Security): Will there be notification history? This affects security surface.
- XSS Prevention: How will we sanitize notification content? (Security requires this to be explicit)
- Rate Limiting: What limits per user/service? (Security requires from day 1)
Phase Advancement Assessment
We have diagrams (✓) but:
- Missing architect's main flow diagram (parse error)
- Need to resolve polling vs WebSocket approach
- Security concerns not yet addressed in chosen design
Per phase instructions, we should have 3-5 key pieces identified with rough diagrams. We're close but not ready.
Recommendations
- @architect - Please re-submit your response (there was a parse error). We need your component diagram and sequence flow.
- @security - After architect responds, please review the specific approach chosen and provide targeted security requirements for that architecture (polling vs WebSocket)
- @pragmatist - Once security requirements are clear, confirm the 2-week timeline still holds with security measures included
Next phase readiness: Not yet - need architect's diagram and alignment on technical approach before advancing to reality_check phase.
Name: AI-Architect
Architectural Sketch - Component & Flow Analysis
Let me provide the missing architectural perspective to resolve the polling vs WebSocket tension.
High-Level Component View
ARTIFACT: diagrams/notification-system-brainstorm_architect_6.puml
The diagram shows both approaches side-by-side to clarify the architectural tradeoffs.
Architectural Analysis
Key Components (5 pieces):
- Event Source - Existing backend services that trigger notifications
- Notification Store - Database table (notifications) - persistence layer
- Notification API - REST endpoint(s) for delivery
- Client Poller - Frontend polling mechanism (30s interval)
- Browser Notification API - Native browser popup system
Why Polling is Architecturally Sound for MVP
Alignment with constraints:
- Small team (2 devs) → Simpler mental model, fewer moving parts
- 2-week timeline → No new infrastructure (WebSocket servers, connection pools)
- 1000 users → 30s polling = ~33 req/s = trivial load
- Existing stack → Reuses current HTTP/session infrastructure
Architectural advantages:
- Stateless - No connection state management, no reconnection logic
- Debuggable - Standard HTTP requests in browser DevTools
- Scalable horizontally - Any app server can handle any poll request
- Fail-safe - Client automatically recovers by next poll
- Cache-friendly - Can use HTTP caching headers if needed
When to evolve:
- User count > 10,000 (polling becomes inefficient)
- Latency requirement < 10s (30s polling insufficient)
- Battery concerns on mobile (constant polling drains battery)
- Rich interactions (typing indicators, presence - need real-time)
Security Integration
@security's concerns are valid but apply differently to polling:
Required (Day 1):
- XSS Prevention - Sanitize notification content server-side using bleach (Security's point #1)
- Rate Limiting - Flask-Limiter on /api/notifications/unread endpoint (Security's point #3)
- Authorization - Existing session auth ensures users only see their notifications (Security's point #2)
Not Applicable (Polling Architecture):
- WebSocket token management (we're not using WebSockets)
- Connection hijacking prevention (stateless HTTP)
- Stale connection cleanup (no persistent connections)
Deferred (Post-MVP):
- Notification history/persistence beyond current session
- Audit logging (add when compliance requires it)
- Content encryption at rest (notifications are ephemeral for MVP)
Answers to Open Questions
Q: Will notifications contain sensitive data?
A: For MVP, use summaries only ("You have 1 new message") - keeps security surface minimal. Full content requires user to click through to authenticated page.
Q: Will there be notification history?
A: Not for MVP - notifications shown once, then dismissed. This avoids storage security concerns. Add persistence in v2 if users request it.
Unknown/Risks (Marked with "?")
- Browser support - Browser Notification API permission model varies. Need to test on target browsers.
- Notification volume - If users get >10 notifications/min, UX degrades. May need batching logic (not in scope yet).
- Backend event integration - How do existing services trigger notifications? Need hook points identified (implementation detail).
Recommendation
VOTE: READY to proceed with polling architecture + security essentials (sanitization + rate limiting).
This is a classic example of "boring technology" - proven, simple, maintainable. We can ship in 2 weeks, learn from real usage, and evolve to WebSockets later only if metrics show we need it.
Phase advancement criteria met:
- ✓ 3-5 key components identified
- ✓ Rough diagrams created (component + sequence)
- ✓ Unknowns marked with "?"
- ✓ Architectural tradeoffs explicit
Ready for reality_check phase to validate implementation details.
VOTE: READY
Name: AI-Security Parse error VOTE: NONE
Name: AI-Pragmatist
Moderator's Point is Critical
We have a fundamental mismatch: I proposed polling, Security analyzed WebSockets. Let me clarify the polling approach with explicit security measures that address Security's concerns.
Polling Architecture with Security Built-In
Core Components (4 pieces):
- Event Publisher (Python backend)
  - Sanitizes content with bleach.clean(text, tags=[], strip=True) - no HTML allowed
  - Inserts to notifications table only if user preference enabled
  - Rate limit: max 100 notifications per user per minute (app-level check)
- Notifications Table (PostgreSQL)
  - Columns: id, user_id, event_type, message_text (sanitized), created_at, read_at
  - Index on (user_id, read_at) for fast unread queries
  - No sensitive data - just event summaries ("You have a new message")
- REST API Endpoint (Flask)
  - GET /api/notifications/unread - returns JSON array
  - Uses existing session authentication (no new auth)
  - Flask-Limiter: @limiter.limit("120/minute") on endpoint
  - SQL with user_id filter prevents authorization bypass
- Frontend Poller (React)
  - Polls every 30s with setInterval
  - Browser Notification API (requires user permission)
  - Text-only display (no HTML rendering) - double defense against XSS
Security Measures Explicitly Addressed
XSS Prevention:
- Server-side: bleach.clean() strips all HTML/JS
- Client-side: Display as plain text only in Browser Notification API
- Double defense: Even if sanitization fails, no rendering attack surface
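To illustrate the "plain text only" rule without pulling in a dependency, here is a standard-library stand-in. It is not bleach and does not reproduce bleach's exact output; it only shows the idea of discarding all markup and keeping text. The names `to_plain_text` and `max_len`, and the choice to drop script/style contents entirely, are assumptions for this sketch.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and discards every tag and attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def to_plain_text(raw, max_len=200):
    """Strip all markup, collapse whitespace, and cap the length.

    The result is plain text, NOT HTML-safe: entities are decoded, so it
    must only be shown as text and never inserted into the page as HTML.
    """
    parser = _TextOnly()
    parser.feed(raw)
    parser.close()
    collapsed = " ".join(parser.text().split())
    return collapsed[:max_len]


print(to_plain_text("<b>New</b> message <script>alert(1)</script>"))  # New message
print(to_plain_text("Hi <img src=x onerror=alert(1)> there"))          # Hi there
```

The length cap is a convenience for short notification summaries and is not part of the requirements discussed above.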
Rate Limiting:
- Application layer: 100 notifications/user/minute when creating
- API layer: 120 requests/minute per user on endpoint (Flask-Limiter)
- Database constraint: prevent spam at source
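The application-layer limit (100 notifications per user per minute) could be sketched as a sliding-window counter. This is an in-process illustration only: the class name and the injectable `clock` parameter are assumptions, and with several app servers each process would keep its own counts, so a shared store would be needed for a strict global limit. The API-layer limit is handled separately by Flask-Limiter and is not shown here.

```python
import time
from collections import defaultdict, deque


class PerUserRateLimiter:
    """Sliding-window limiter: at most max_events per user per window."""

    def __init__(self, max_events=100, window_s=60.0, clock=time.monotonic):
        self.max_events = max_events
        self.window_s = window_s
        self._clock = clock
        self._events = defaultdict(deque)  # user_id -> timestamps of recent events

    def allow(self, user_id):
        now = self._clock()
        recent = self._events[user_id]
        # Forget events that have fallen out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.max_events:
            return False
        recent.append(now)
        return True


limiter = PerUserRateLimiter(max_events=100, window_s=60.0)
accepted = sum(limiter.allow(user_id=1) for _ in range(150))
print(accepted)  # 100: the remaining 50 are rejected within the same minute
```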
Authorization:
- SQL query: WHERE user_id = session.user_id
- Cannot query other users' notifications (DB-level enforcement)
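A small sketch of that scoping rule, again using SQLite as a stand-in for PostgreSQL. The point it illustrates is that the user id comes only from the server-side session and is bound as a query parameter, so the client has no way to name another user. The plain dict standing in for the Flask session, the function name, and the index name are assumptions.

```python
import sqlite3


def unread_for_session(conn, session):
    """Return unread notifications for the logged-in user only."""
    # The user id is read from the server-side session, never from the request.
    user_id = session["user_id"]
    rows = conn.execute(
        "SELECT id, event_type, message_text FROM notifications "
        "WHERE user_id = ? AND read_at IS NULL ORDER BY created_at, id",
        (user_id,),
    ).fetchall()
    return [{"id": r[0], "event_type": r[1], "message": r[2]} for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE notifications (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL,
        event_type   TEXT    NOT NULL,
        message_text TEXT    NOT NULL,
        created_at   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        read_at      TEXT
    );
    CREATE INDEX idx_notifications_unread ON notifications (user_id, read_at);
    """
)
conn.executemany(
    "INSERT INTO notifications (user_id, event_type, message_text) VALUES (?, ?, ?)",
    [
        (1, "new_message", "You have 1 new message"),
        (2, "system_alert", "Maintenance at 22:00"),
    ],
)
print(unread_for_session(conn, {"user_id": 1}))
# [{'id': 1, 'event_type': 'new_message', 'message': 'You have 1 new message'}]
```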
Data Protection:
- No sensitive data in notifications (answer to Security's question)
- No persistence beyond 30 days (auto-delete old notifications)
- HTTPS enforced (existing infrastructure)
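The 30-day auto-delete could be a small scheduled job. This sketch assumes `created_at` is stored as an ISO-8601 UTC string written by the application, so string comparison matches time order; the function name `purge_old` and the use of SQLite are illustrative, and how the job is scheduled is left open.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)


def purge_old(conn, now=None):
    """Delete notifications older than the retention period; return the count."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - RETENTION).isoformat()
    cur = conn.execute("DELETE FROM notifications WHERE created_at < ?", (cutoff,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notifications ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT NOT NULL)"
)
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
for age_days in (1, 29, 31, 90):
    conn.execute(
        "INSERT INTO notifications (user_id, created_at) VALUES (1, ?)",
        ((now - timedelta(days=age_days)).isoformat(),),
    )
print(purge_old(conn, now))  # 2: the rows aged 31 and 90 days are removed
```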
DoS Prevention:
- Polling interval (30s) self-regulates load
- Rate limits prevent notification spam
- No persistent connections to exhaust
DIAGRAM: diagrams/notification-system-brainstorm_mvp_6.puml
Implementation Timeline
Week 1:
- Day 1: DB migration + sanitization helper
- Day 2-3: Event publisher integration (3 event types)
- Day 4: REST endpoint + Flask-Limiter
- Day 5: Frontend poller + Browser API
Week 2:
- Day 1-2: User preferences UI
- Day 3: Security testing (XSS attempts, rate limit verification)
- Day 4: Performance testing (1000 concurrent users)
- Day 5: Ship to 50 beta users
Total: 10 developer-days (2 devs × 1 week) + 1 week polish = 2 weeks
What We're Deferring
- WebSocket infrastructure (not needed for 30s latency)
- Notification history UI (just keep in DB for now)
- Mobile push notifications (scoped out already)
- Email/SMS fallback (not in requirements)
- Advanced batching/grouping (not needed at 1000 users)
Response to Security's Questions
- Sensitive data? No - summaries only ("You have 1 new message")
- History/persistence? DB storage yes, UI access no (defer to v2)
- Rate limiting? 100/min creation + 120/min API calls
- Sanitization? bleach.clean() server-side + text-only display
VOTE: READY - This addresses all mandatory security requirements while staying shippable in 2 weeks
DIAGRAM: diagrams/notification-system-brainstorm_mvp_6.puml
VOTE: READY
Name: AI-Moderator
Sketch Phase Status
We've made significant progress with two concrete architectural proposals and diagrams:
What's Been Discussed
AI-Pragmatist's Polling Architecture:
- Simple 30-second polling to /api/notifications/unread endpoint
- Two DB tables: notifications and user_preferences
- Browser Notification API for display
- Implementation timeline: 1-2 weeks with clear breakdown
- Diagrams: diagrams/notification-system-brainstorm_mvp_1.puml, diagrams/notification-system-brainstorm_mvp_3.puml
AI-Security's Security Analysis:
- Comprehensive threat model with 5 attack surfaces identified
- Critical requirements: XSS prevention, rate limiting, authentication, encryption
- Diagram: diagrams/notification-system-brainstorm_security_1.puml
- Recommends Socket.IO with security hardening
AI-Architect:
- Two parse errors - responses not recorded
Architectural Tension
There's a productive tension between approaches:
- Pragmatist: Minimal polling-based approach, can ship in 1 week
- Security: More robust requirements that assume WebSocket infrastructure
- Pragmatist's counter: Simplified security MVP (sanitization + rate limiting) for polling approach
Open Questions
- Data sensitivity: Will notifications contain sensitive data or just summaries? (affects encryption/storage)
- Persistence: Will there be notification history? (expands attack surface)
- Security MVP agreement: Can we proceed with sanitization + rate limiting for polling, defer WebSocket security? (@security)
Concerns Raised
- Browser notification phishing potential (even with permissions)
- WebSocket connection state management complexity
- COMPLEXITY: Risk of over-engineering for 1000-user MVP
Phase Advancement Assessment
Per sketch phase instructions:
- ✓ Created simple diagrams (3 diagrams submitted)
- ✓ Identified 3-5 key pieces (polling endpoint, DB tables, sanitization, rate limiting, browser API)
- ✓ Marked unknowns (data sensitivity, persistence needs)
- ✗ AI-Architect's diagrams missing due to parse errors
However: The core architectural approach is now clear:
- Polling-based delivery (30s intervals)
- DB persistence for unread notifications
- Sanitization via bleach library
- Rate limiting via Flask-Limiter (100/min)
- Browser Notification API for display
The discussion has converged on a pragmatic polling approach with security hardening. @security's final question about the security MVP is the last blocker.
Recommendation
ADVANCE TO REALITY_CHECK PHASE: Yes, with caveat
We have sufficient architectural clarity to move forward:
- Technical approach is defined (polling-based)
- Security requirements are scoped to the approach (not WebSockets)
- Implementation timeline is concrete (1-2 weeks)
- Key unknowns are identified and can be addressed in reality_check
@architect can contribute in the reality_check phase if the parse issues are resolved.
@security - Please confirm you can accept the simplified security MVP (sanitization + rate limiting) for the polling approach, or state specific blockers.