
Notification System Brainstorm

Problem Statement

Our application needs a way to notify users about important events (new messages, status changes, alerts). Currently we have no notification system at all.

Known Constraints

Small team (2 developers)
Need something working within 2 weeks
Must work on web and mobile
Budget: minimal (prefer open source)
Current stack: Python backend, React frontend

What Success Looks Like

Users get timely notifications about events they care about
Users can control what notifications they receive
System is reliable enough that important notifications aren't lost

Agreed Scope (from cluster phase)

MVP: Web notifications only (browser), mobile comes later
Start with 3 event types: new message, status change, system alert
"Timely" means within 30 seconds
Simple on/off preferences per event type
Target: 1000 users initially

We're now in the sketch phase - create rough diagrams to illustrate the approach.

Name: AI-Architect Parse error VOTE: NONE

Name: AI-Security

Security Analysis - Notification System

From a security perspective, a notification system introduces several critical attack surfaces that need early consideration:

Key Security Components

Authentication & Authorization Layer
- Who can send notifications? (internal services only? authenticated users?)
- Who can receive which notifications? (proper user segmentation)
- API authentication between backend and notification service
Input Validation & Sanitization
- Notification content must be sanitized to prevent XSS in browser notifications
- Event type validation to prevent unauthorized notification types
- Rate limiting per user/service to prevent notification spam/DoS
Data Protection
- Notification content may contain sensitive data (PII, business data)
- Storage considerations if implementing delivery guarantees
- Encryption in transit (WSS for WebSocket, HTTPS for polling)
Session Management
- WebSocket connection authentication (token-based preferred)
- Session hijacking prevention
- Proper cleanup of stale connections

Threat Model

DIAGRAM: diagrams/notification-system-brainstorm_security_1.puml

Primary Threats:

XSS via notifications: Malicious user injects script into notification content → executes in victim's browser
Notification spam/DoS: Attacker floods user with notifications, exhausting resources
Authorization bypass: User receives notifications intended for another user
MitM attacks: Unencrypted WebSocket connections expose notification content
Token theft: Stolen auth tokens allow impersonation for WebSocket connections

Critical Security Requirements

Mandatory: Sanitize ALL notification content server-side before sending
Mandatory: Authenticate WebSocket connections with short-lived tokens
Mandatory: Enforce per-user/per-service rate limits (suggest: 100/min per user)
Recommended: Use WSS (WebSocket Secure) protocol only
Recommended: Implement notification content ACLs (user can only receive their own data)

CONCERN: Browser notifications API requires user permission, but malicious notifications could still be used for phishing attacks ("Your account will be suspended - click here")

CONCERN: If using WebSockets, need strategy for connection state management - stale connections are a DoS vector

QUESTION: Will notifications contain sensitive data, or just "You have 1 new message" style summaries? This affects our encryption and storage requirements.

QUESTION: Will there be a notification history/persistence? This significantly expands the attack surface.

Recommendations for MVP

Use established libraries (e.g., Socket.IO with built-in security features) rather than raw WebSockets
Implement rate limiting from day 1 (much harder to add later)
Sanitize on server side using a whitelist approach (only allow plain text for MVP)
Use short-lived JWT tokens for WebSocket auth (5-15 min expiry)
Log all notification sends for security auditing
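The short-lived token recommendation applies to the WebSocket option. To show what a 5-15 minute expiry means in practice, here is a stdlib-only sketch of an HMAC-signed token that carries an expiry claim. It is deliberately simplified and is not a JWT implementation. The `issue_token`/`verify_token` names, the hard-coded secret, and the 10-minute TTL are assumptions for illustration; a real deployment would use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical secret; a real service would load this from configuration.
SECRET = b"change-me"
TOKEN_TTL_SECONDS = 10 * 60  # inside the suggested 5-15 minute window


def _b64encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64decode(text: str) -> bytes:
    # Restore the padding stripped by _b64encode before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(body: str) -> bytes:
    return hmac.new(SECRET, body.encode("utf-8"), hashlib.sha256).digest()


def issue_token(user_id: int, now: Optional[float] = None) -> str:
    """Return '<payload>.<signature>' that expires TOKEN_TTL_SECONDS after issue."""
    issued = time.time() if now is None else now
    claims = {"sub": user_id, "exp": int(issued) + TOKEN_TTL_SECONDS}
    body = _b64encode(json.dumps(claims).encode("utf-8"))
    return f"{body}.{_b64encode(_sign(body))}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[int]:
    """Return the user id for a valid, unexpired token; otherwise None."""
    try:
        body, sig = token.split(".")
        sig_bytes = _b64decode(sig)
    except ValueError:  # wrong shape or invalid base64
        return None
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(_sign(body), sig_bytes):
        return None
    claims = json.loads(_b64decode(body))
    current = time.time() if now is None else now
    return claims["sub"] if current < claims["exp"] else None
```

Verification rejects both tampered payloads and expired tokens, so a stolen token is only useful until its short expiry passes.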

VOTE: CHANGES - Need to explicitly address XSS prevention and rate limiting before implementation

Diagram: diagrams/notification-system-brainstorm_security_1.puml VOTE: CHANGES

Name: AI-Pragmatist

Looking at the agreed scope, this is actually well-scoped already. Let me sketch the simplest thing that could work.

MVP Architecture (Week 1-2):

Event Generation (Python backend)
- When event happens → insert row in notifications table
- Fields: user_id, event_type, message, created_at, read
Delivery (polling - yes, polling)
- Frontend polls /api/notifications/unread every 30s
- Returns new notifications since last check
- Browser Notification API shows popup
Preferences (simple)
- user_preferences table: user_id, event_type, enabled
- Check before inserting notification

That's it. No WebSockets, no message queue, no separate notification service.
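A minimal sketch of the two tables and the preference check, using an in-memory SQLite database so it runs on its own (the real backend would use the existing database). Column names follow the fields listed above; the event-type strings, the `notify` helper, and treating a missing preference row as "enabled" are assumptions, since the brainstorm does not pin them down.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE notifications (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    message TEXT NOT NULL,
    created_at TEXT NOT NULL,
    read INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE user_preferences (
    user_id INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    enabled INTEGER NOT NULL,
    PRIMARY KEY (user_id, event_type)
);
"""

# Hypothetical identifiers for the three MVP event types.
EVENT_TYPES = {"new_message", "status_change", "system_alert"}


def notify(conn: sqlite3.Connection, user_id: int, event_type: str, message: str) -> bool:
    """Insert a notification unless the user has switched this event type off.

    Returns True if a row was inserted. A missing preference row counts as
    enabled (an assumption; the brainstorm does not specify a default).
    """
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    row = conn.execute(
        "SELECT enabled FROM user_preferences WHERE user_id = ? AND event_type = ?",
        (user_id, event_type),
    ).fetchone()
    if row is not None and not row[0]:
        return False  # user opted out of this event type
    conn.execute(
        "INSERT INTO notifications (user_id, event_type, message, created_at) "
        "VALUES (?, ?, ?, ?)",
        (user_id, event_type, message, datetime.now(timezone.utc).isoformat(timespec="seconds")),
    )
    return True
```

Rejecting unknown event types at this point also covers the event type validation that Security asks for.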

Why polling?

Works everywhere (no WebSocket infrastructure)
Handles 1000 users easily (30s interval = ~33 req/s)
Dead simple to implement and debug
Can upgrade later if needed
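The load figure is simply users divided by poll interval. A tiny helper makes it easy to rerun the estimate. It assumes every user is online with a tab open and that polls are spread evenly, so clients that happen to synchronize could produce short bursts above this average.

```python
def polling_load(users: int, interval_s: float) -> float:
    """Average requests per second when each user polls once every interval_s seconds."""
    return users / interval_s


print(f"{polling_load(1000, 30):.1f} req/s")    # 33.3 req/s for the MVP target
print(f"{polling_load(10_000, 30):.1f} req/s")  # 333.3 req/s at 10,000 users
```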

What we're NOT doing (yet):

Push notifications (mobile)
Real-time delivery (<30s)
Notification history/pagination
Batching/grouping
Email/SMS delivery

See diagram for the flow. We can build this in a week, then iterate based on real usage.

READY to sketch - this is simple enough to ship quickly.

Diagram: diagrams/notification-system-brainstorm_mvp_1.puml VOTE: READY

Name: AI-Moderator

Discussion Status

We've just entered the sketch phase following a successful clustering phase that defined our MVP scope:

Web notifications only (browser)
3 event types: new message, status change, system alert
30-second delivery target
Simple on/off preferences
1000 initial users

Next Steps

We need rough diagrams to illustrate the technical approach. Specifically:

Component diagrams showing key pieces (3-5 components)
Sequence diagrams for notification flow
Mark any unknowns with "?"
Keep it rough - we're exploring, not finalizing

Waiting For

@architect - Please start us off with a high-level component diagram and sequence flow
@security - Review the approach for security considerations
@pragmatist - Assess feasibility for 2-week timeline with 2 developers

All participants should focus on creating simple diagrams and identifying the key architectural pieces.

Name: AI-Architect Parse error VOTE: NONE

Name: AI-Security Parse error VOTE: NONE

Name: AI-Pragmatist

Pragmatic Reality Check

Looking at what's been proposed:

Security's approach: 5 critical requirements, threat modeling, ACLs, JWT rotation, audit logging, rate limiting "from day 1".

Reality: We have 2 developers and 2 weeks. That security spec is a month of work.

CONCERN: COMPLEXITY - Security is architecting for enterprise scale when we need to ship for 1000 users

What We Actually Need

For 1000 users receiving basic notifications:

Input sanitization: Use Python's bleach library (2 lines of code)
Rate limiting: Flask-Limiter with @limiter.limit("100/minute") decorator (1 line per endpoint)
Auth: Reuse existing session auth (0 new code)

That's it. That's the security MVP.

What we're NOT doing yet:

WebSocket infrastructure (we're polling, remember?)
Notification persistence (just show & dismiss)
Separate auth tokens (use existing sessions)
Audit logging (add when someone asks for it)

Simplified Flow

DIAGRAM: diagrams/notification-system-brainstorm_mvp_3.puml

This diagram shows polling vs the WebSocket complexity Security is implying. Polling means:

No connection state management
No token refresh logic
No WebSocket security surface
Standard HTTPS/session auth

Week 1-2 Implementation

Week 1: Core flow

Day 1-2: DB table + insert on events
Day 3-4: /api/notifications/unread endpoint + preferences
Day 5: Frontend polling + Browser Notification API

Week 2: Polish

Add bleach sanitization
Add Flask-Limiter (100/min)
Test with team
Ship to 50 beta users

Then we iterate based on actual problems, not theoretical ones.

QUESTION: @security - Can we agree on sanitization + rate limiting as the security MVP, and defer WebSocket security since we're using polling?

VOTE: READY - This is shippable in 2 weeks with 2 devs

DIAGRAM: diagrams/notification-system-brainstorm_mvp_3.puml VOTE: READY

Name: AI-Moderator

Sketch Phase Summary

We have two architectural approaches with diagrams:

AI-Pragmatist's Polling Approach:

Simple polling every 30s to /api/notifications/unread
notifications and user_preferences tables in existing DB
Browser Notification API for popups
No WebSockets, no message queue
Vote: READY - considers this shippable in 1 week

AI-Security's Analysis:

Identified 5 critical attack surfaces (XSS, DoS, auth bypass, MitM, token theft)
Provided threat model diagram
Recommends WebSocket approach with Socket.IO + security measures
Vote: CHANGES - requires explicit XSS prevention and rate limiting

AI-Architect:

Parse error on response - needs to re-respond

Key Tension

There's an architectural mismatch:

Pragmatist proposes simple polling (30s intervals, DB-based)
Security's recommendations assume WebSockets (tokens, connection management)

Both are valid for different approaches, but we need to align on which path before advancing.

Open Questions

Data Sensitivity (from Security): Will notifications contain sensitive data or just summaries?
Persistence (from Security): Will there be notification history? This affects security surface.
XSS Prevention: How will we sanitize notification content? (Security requires this to be explicit)
Rate Limiting: What limits per user/service? (Security requires this from day 1)

Phase Advancement Assessment

We have diagrams (✓) but:

Missing architect's main flow diagram (parse error)
Need to resolve polling vs WebSocket approach
Security concerns not yet addressed in chosen design

Per phase instructions, we should have 3-5 key pieces identified with rough diagrams. We're close but not ready.

Recommendations

@architect - Please re-submit your response (there was a parse error). We need your component diagram and sequence flow.
@security - After architect responds, please review the specific approach chosen and provide targeted security requirements for that architecture (polling vs WebSocket)
@pragmatist - Once security requirements are clear, confirm the 2-week timeline still holds with security measures included

Next phase readiness: Not yet - need architect's diagram and alignment on technical approach before advancing to reality_check phase.

Name: AI-Architect

Architectural Sketch - Component & Flow Analysis

Let me provide the missing architectural perspective to resolve the polling vs WebSocket tension.

High-Level Component View

ARTIFACT: diagrams/notification-system-brainstorm_architect_6.puml

The diagram shows both approaches side-by-side to clarify the architectural tradeoffs.

Architectural Analysis

Key Components (5 pieces):

Event Source - Existing backend services that trigger notifications
Notification Store - Database table (notifications) - persistence layer
Notification API - REST endpoint(s) for delivery
Client Poller - Frontend polling mechanism (30s interval)
Browser Notification API - Native browser popup system

Why Polling is Architecturally Sound for MVP

Alignment with constraints:

Small team (2 devs) → Simpler mental model, fewer moving parts
2-week timeline → No new infrastructure (WebSocket servers, connection pools)
1000 users → 30s polling = ~33 req/s = trivial load
Existing stack → Reuses current HTTP/session infrastructure

Architectural advantages:

Stateless - No connection state management, no reconnection logic
Debuggable - Standard HTTP requests in browser DevTools
Scalable horizontally - Any app server can handle any poll request
Fail-safe - Client automatically recovers by next poll
Cache-friendly - Can use HTTP caching headers if needed

When to evolve:

User count > 10,000 (polling becomes inefficient)
Latency requirement < 10s (30s polling insufficient)
Battery concerns on mobile (constant polling drains battery)
Rich interactions (typing indicators, presence - need real-time)

Security Integration

@security's concerns are valid but apply differently to polling:

Required (Day 1):

XSS Prevention - Sanitize notification content server-side using bleach (Security's point #1)
Rate Limiting - Flask-Limiter on /api/notifications/unread endpoint (Security's point #3)
Authorization - Existing session auth ensures users only see their notifications (Security's point #2)

Not Applicable (Polling Architecture):

WebSocket token management (we're not using WebSockets)
Connection hijacking prevention (stateless HTTP)
Stale connection cleanup (no persistent connections)

Deferred (Post-MVP):

Notification history/persistence beyond current session
Audit logging (add when compliance requires it)
Content encryption at rest (notifications are ephemeral for MVP)

Answers to Open Questions

Q: Will notifications contain sensitive data? A: For MVP, use summaries only ("You have 1 new message") - keeps security surface minimal. Full content requires the user to click through to an authenticated page.

Q: Will there be notification history? A: Not for MVP - notifications shown once, then dismissed. This avoids storage security concerns. Add persistence in v2 if users request it.
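One way to guarantee the summaries-only answer is to build the notification text from a fixed template per event type, so no message body or other user-supplied content ever reaches the popup. The template wording and event-type keys below are hypothetical.

```python
# Hypothetical fixed templates: the text depends only on event type and count,
# so message bodies, sender names and other user-supplied data never appear.
SUMMARY_TEMPLATES = {
    "new_message": "You have {count} new message{plural}",
    "status_change": "A status you follow has changed",
    "system_alert": "There is a new system alert",
}


def build_summary(event_type: str, count: int = 1) -> str:
    """Return popup text for an event; the details stay behind the authenticated page."""
    template = SUMMARY_TEMPLATES[event_type]  # KeyError for unknown event types
    return template.format(count=count, plural="" if count == 1 else "s")


print(build_summary("new_message"))     # You have 1 new message
print(build_summary("new_message", 3))  # You have 3 new messages
```

A side benefit of this design is that the popup text contains nothing an attacker controls, which shrinks the XSS and phishing surface Security raised.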

Unknown/Risks (Marked with "?")

Browser support - Browser Notification API permission model varies. Need to test on target browsers.
Notification volume - If users get >10 notifications/min, UX degrades. May need batching logic (not in scope yet).
Backend event integration - How do existing services trigger notifications? Need hook points identified (implementation detail).

Recommendation

VOTE: READY to proceed with polling architecture + security essentials (sanitization + rate limiting).

This is a classic example of "boring technology" - proven, simple, maintainable. We can ship in 2 weeks, learn from real usage, and evolve to WebSockets later only if metrics show we need it.

Phase advancement criteria met:

✓ 3-5 key components identified
✓ Rough diagrams created (component + sequence)
✓ Unknowns marked with "?"
✓ Architectural tradeoffs explicit

Ready for reality_check phase to validate implementation details. VOTE: READY

Name: AI-Security Parse error VOTE: NONE

Name: AI-Pragmatist

Moderator's Point is Critical

We have a fundamental mismatch: I proposed polling, Security analyzed WebSockets. Let me clarify the polling approach with explicit security measures that address Security's concerns.

Polling Architecture with Security Built-In

Core Components (4 pieces):

Event Publisher (Python backend)
- Sanitizes content with bleach.clean(text, tags=[], strip=True) - no HTML allowed
- Inserts to notifications table only if user preference enabled
- Rate limit: max 100 notifications per user per minute (app-level check)
Notifications Table (PostgreSQL)
- Columns: id, user_id, event_type, message_text (sanitized), created_at, read_at
- Index on (user_id, read_at) for fast unread queries
- No sensitive data - just event summaries ("You have a new message")
REST API Endpoint (Flask)
- GET /api/notifications/unread - returns JSON array
- Uses existing session authentication (no new auth)
- Flask-Limiter: @limiter.limit("120/minute") on endpoint
- SQL with user_id filter prevents authorization bypass
Frontend Poller (React)
- Polls every 30s with setInterval
- Browser Notification API (requires user permission)
- Text-only display (no HTML rendering) - double defense against XSS
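A framework-free sketch of what GET /api/notifications/unread might do once the route and session lookup have run: select the caller's unread rows, mark them read, and return a JSON array. Flask routing, the session object, and Flask-Limiter are left out; the `fetch_unread` name and marking rows read at fetch time (rather than on click) are assumptions.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE notifications (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    message_text TEXT NOT NULL,  -- sanitized before insert
    created_at TEXT NOT NULL,
    read_at TEXT                 -- NULL means unread
);
CREATE INDEX idx_notifications_user_read ON notifications (user_id, read_at);
"""


def fetch_unread(conn: sqlite3.Connection, session_user_id: int) -> str:
    """Return the caller's unread notifications as a JSON array and mark them read.

    session_user_id must come from the server-side session, never from a request
    parameter; the WHERE clause is what keeps users out of each other's rows.
    """
    rows = conn.execute(
        "SELECT id, event_type, message_text, created_at FROM notifications "
        "WHERE user_id = ? AND read_at IS NULL ORDER BY created_at",
        (session_user_id,),
    ).fetchall()
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Mark only the rows being returned; anything inserted after the SELECT
    # stays unread and is picked up by the next poll.
    with conn:  # commits the updates, or rolls back on error
        conn.executemany(
            "UPDATE notifications SET read_at = ? WHERE id = ?",
            [(now, row[0]) for row in rows],
        )
    return json.dumps(
        [{"id": r[0], "event_type": r[1], "message": r[2], "created_at": r[3]} for r in rows]
    )
```

Calling it twice for the same user returns each notification once and then an empty array, and one user's call never touches another user's rows.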

Security Measures Explicitly Addressed

XSS Prevention:

Server-side: bleach.clean() strips all HTML/JS
Client-side: Display as plain text only in Browser Notification API
Double defense: Even if sanitization fails, no rendering attack surface
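The proposal calls `bleach.clean(text, tags=[], strip=True)`. To illustrate the same "allow no markup" idea without the third-party dependency, here is a stdlib-only stand-in built on `html.parser.HTMLParser`: it keeps text, discards every tag and attribute, and also drops the contents of `<script>` and `<style>`. It is an illustration, not a reimplementation of bleach, whose output can differ in edge cases. Because entities are decoded to characters, the result is plain text and should only go to text sinks such as the Notification body, never into HTML.

```python
from html.parser import HTMLParser


class _TextOnlyParser(HTMLParser):
    """Collects text content and discards all tags and attributes."""

    DROP_CONTENT = {"script", "style"}  # their text is code, not message content

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into characters
        self.parts = []
        self._dropping = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP_CONTENT:
            self._dropping += 1

    def handle_endtag(self, tag):
        if tag in self.DROP_CONTENT and self._dropping:
            self._dropping -= 1

    def handle_data(self, data):
        if not self._dropping:
            self.parts.append(data)


def to_plain_text(raw: str, max_len: int = 200) -> str:
    """Strip all markup and collapse whitespace; max_len is a hypothetical cap."""
    parser = _TextOnlyParser()
    parser.feed(raw)
    parser.close()
    return " ".join("".join(parser.parts).split())[:max_len]
```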

Rate Limiting:

Application layer: 100 notifications/user/minute when creating
API layer: 120 requests/minute per user on endpoint (Flask-Limiter)
Database constraint: prevent spam at source
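For the application-layer cap of 100 notifications per user per minute, an in-memory sliding-window counter illustrates the check. This is a sketch under stated assumptions: it is single-process and not thread-safe, so with several app servers the counts would need a shared store. The class name is hypothetical, and Flask-Limiter still covers the API-layer limit.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` events per user within any `window_s`-second window."""

    def __init__(self, limit: int = 100, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._events: Dict[int, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: int) -> bool:
        now = self.clock()
        events = self._events[user_id]
        # Forget timestamps that have slid out of the window.
        while events and now - events[0] >= self.window_s:
            events.popleft()
        if len(events) >= self.limit:
            return False  # over the cap: caller skips creating the notification
        events.append(now)
        return True
```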

Authorization:

SQL query: WHERE user_id = session.user_id
Cannot query other users' notifications (DB-level enforcement)

Data Protection:

No sensitive data in notifications (answer to Security's question)
No persistence beyond 30 days (auto-delete old notifications)
HTTPS enforced (existing infrastructure)
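The 30-day auto-delete could be a periodic job (cron or similar, an assumption) running one DELETE. This sketch assumes `created_at` holds UTC ISO-8601 strings written with isoformat(timespec="seconds"), which sort chronologically as text; `purge_old_notifications` is a hypothetical name.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=30)


def purge_old_notifications(conn: sqlite3.Connection,
                            now: Optional[datetime] = None) -> int:
    """Delete notifications older than RETENTION and return how many rows were removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - RETENTION).isoformat(timespec="seconds")
    with conn:  # commit the delete as one unit
        cur = conn.execute("DELETE FROM notifications WHERE created_at < ?", (cutoff,))
    return cur.rowcount
```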

DoS Prevention:

Polling interval (30s) self-regulates load
Rate limits prevent notification spam
No persistent connections to exhaust

DIAGRAM: diagrams/notification-system-brainstorm_mvp_6.puml

Implementation Timeline

Week 1:

Day 1: DB migration + sanitization helper
Day 2-3: Event publisher integration (3 event types)
Day 4: REST endpoint + Flask-Limiter
Day 5: Frontend poller + Browser API

Week 2:

Day 1-2: User preferences UI
Day 3: Security testing (XSS attempts, rate limit verification)
Day 4: Performance testing (1000 concurrent users)
Day 5: Ship to 50 beta users

Total: 10 developer-days (2 devs × 1 week) + 1 week polish = 2 weeks

What We're Deferring

WebSocket infrastructure (not needed for 30s latency)
Notification history UI (just keep in DB for now)
Mobile push notifications (scoped out already)
Email/SMS fallback (not in requirements)
Advanced batching/grouping (not needed at 1000 users)

Response to Security's Questions

Sensitive data? No - summaries only ("You have 1 new message")
History/persistence? DB storage yes, UI access no (defer to v2)
Rate limiting? 100/min creation + 120/min API calls
Sanitization? bleach.clean() server-side + text-only display

VOTE: READY - This addresses all mandatory security requirements while staying shippable in 2 weeks

DIAGRAM: diagrams/notification-system-brainstorm_mvp_6.puml VOTE: READY

Name: AI-Moderator

Sketch Phase Status

We've made significant progress with two concrete architectural proposals and diagrams:

What's Been Discussed

AI-Pragmatist's Polling Architecture:

Simple 30-second polling to /api/notifications/unread endpoint
Two DB tables: notifications and user_preferences
Browser Notification API for display
Implementation timeline: 1-2 weeks with clear breakdown
Diagrams: diagrams/notification-system-brainstorm_mvp_1.puml, diagrams/notification-system-brainstorm_mvp_3.puml

AI-Security's Security Analysis:

Comprehensive threat model with 5 attack surfaces identified
Critical requirements: XSS prevention, rate limiting, authentication, encryption
Diagram: diagrams/notification-system-brainstorm_security_1.puml
Recommends Socket.IO with security hardening

AI-Architect:

Two parse errors - responses not recorded

Architectural Tension

There's a productive tension between approaches:

Pragmatist: Minimal polling-based approach, can ship in 1 week
Security: More robust requirements that assume WebSocket infrastructure
Pragmatist's counter: Simplified security MVP (sanitization + rate limiting) for polling approach

Open Questions

Data sensitivity: Will notifications contain sensitive data or just summaries? (affects encryption/storage)
Persistence: Will there be notification history? (expands attack surface)
Security MVP agreement: Can we proceed with sanitization + rate limiting for polling, defer WebSocket security? (@security)

Concerns Raised

Browser notification phishing potential (even with permissions)
WebSocket connection state management complexity
COMPLEXITY: Risk of over-engineering for 1000-user MVP

Phase Advancement Assessment

Per sketch phase instructions:

✓ Created simple diagrams (3 diagrams submitted)
✓ Identified 3-5 key pieces (polling endpoint, DB tables, sanitization, rate limiting, browser API)
✓ Marked unknowns (data sensitivity, persistence needs)
✗ AI-Architect's diagrams missing due to parse errors

However: The core architectural approach is now clear:

Polling-based delivery (30s intervals)
DB persistence for unread notifications
Sanitization via bleach library
Rate limiting via Flask-Limiter (100/min)
Browser Notification API for display

The discussion has converged on a pragmatic polling approach with security hardening. The last blocker is @security's answer to the open question about the security MVP.

Recommendation

ADVANCE TO REALITY_CHECK PHASE: Yes, with caveat

We have sufficient architectural clarity to move forward:

Technical approach is defined (polling-based)
Security requirements are scoped to the approach (not WebSockets)
Implementation timeline is concrete (1-2 weeks)
Key unknowns are identified and can be addressed in reality_check

@architect can contribute in the reality_check phase if the parse issues are resolved.

@security - Please confirm you can accept the simplified security MVP (sanitization + rate limiting) for the polling approach, or state specific blockers.
