smarttools/docs/REGISTRY.md

# SmartTools Registry Design

## Purpose
Build a centralized registry for SmartTools to enable discovery, publishing, dependency management, and future curation at scale.

## Terminology

| Term | Definition |
|------|------------|
| **Tool definition** | The full YAML file in the registry (`config.yaml`) containing name, steps, arguments, etc. |
| **Tool config** | The configuration within a tool definition (arguments, steps, provider settings) |
| **smarttools.yaml** | Project manifest file declaring tool dependencies and overrides |
| **config.yaml** | The tool definition file, both in registry and when installed locally |
| **Owner** | Immutable namespace slug identifying the publisher (e.g., `rob`, `alice`) |
| **Publisher** | A registered user who can publish tools to the registry |
| **Wrapper script** | Auto-generated bash script in `~/.local/bin/` that invokes a tool |

**Canonical naming:** Use `SmartTools-Registry` (capitalized, hyphenated) for the repository name.

## Diagram References
- System overview: `discussions/diagrams/smarttools-registry_rob_1.puml`
- Data flows: `discussions/diagrams/smarttools-registry_rob_5.puml`

## System Overview
Users interact via the CLI and a future Web UI. Both call a Registry API hosted at `https://gitea.brrd.tech/api/v1` (future alias: `registry.smarttools.dev/api/v1`). The API syncs from a Gitea-backed registry repo and maintains a SQLite cache/search index.

**Canonical API base path:** `https://gitea.brrd.tech/api/v1`

All API endpoints are versioned under `/api/v1`. When breaking changes are needed, a new version (`/api/v2`) will be introduced with deprecation notices.

Core API endpoints:
- `GET /api/v1/tools`
- `GET /api/v1/tools/search?q=...`
- `GET /api/v1/tools/{owner}/{name}`
- `GET /api/v1/tools/{owner}/{name}/versions`
- `GET /api/v1/tools/{owner}/{name}/download?version=...`
- `POST /api/v1/tools` (publish)
- `GET /api/v1/categories`
- `GET /api/v1/stats/popular`
- `POST /api/v1/webhook/gitea`

### Pagination

All list endpoints support pagination:

| Parameter | Default | Max | Description |
|-----------|---------|-----|-------------|
| `page` | 1 | - | Page number (1-indexed) |
| `per_page` | 20 | 100 | Items per page |
| `sort` | `downloads` | - | Sort field |
| `order` | `desc` | - | Sort order (asc/desc) |

**Stable ordering:** To ensure deterministic results across pages, sorting always appends tie-breaker keys after the requested field:
- Primary: requested field (e.g., `downloads`)
- Secondary: `published_at` (desc)
- Tertiary: `id` (for absolute stability)

```sql
ORDER BY downloads DESC, published_at DESC, id DESC
LIMIT 20 OFFSET 0
```

**Response pagination metadata:**
```json
{
  "data": [...],
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 142,
    "total_pages": 8
  }
}
```
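
A minimal sketch of how a handler might turn the parameters above into `LIMIT`/`OFFSET` values and the `meta` block. The function name `paginate` and its return shape are illustrative assumptions, not part of the registry code.

```python
import math

DEFAULT_PER_PAGE = 20
MAX_PER_PAGE = 100

def paginate(total: int, page: int = 1, per_page: int = DEFAULT_PER_PAGE) -> dict:
    """Clamp paging parameters and compute LIMIT/OFFSET plus response metadata."""
    page = max(1, page)                             # pages are 1-indexed
    per_page = min(max(1, per_page), MAX_PER_PAGE)  # enforce the documented maximum
    return {
        "limit": per_page,
        "offset": (page - 1) * per_page,
        "meta": {
            "page": page,
            "per_page": per_page,
            "total": total,
            "total_pages": math.ceil(total / per_page),
        },
    }
```

With 142 tools and the default page size this yields 8 pages, matching the metadata example above.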

### Input Constraints

Size limits to prevent oversized uploads:

| Field | Max Size | Notes |
|-------|----------|-------|
| `config.yaml` | 64 KB | Tool definition |
| `README.md` | 256 KB | Documentation |
| Request body | 512 KB | Total POST payload |
| Tool name | 64 chars | Alphanumeric + hyphen |
| Description | 500 chars | Short summary |
| Tag | 32 chars | Individual tag |
| Tags array | 10 items | Maximum tags per tool |

**Validation errors:**
```json
{
  "error": {
    "code": "PAYLOAD_TOO_LARGE",
    "message": "config.yaml exceeds 64KB limit",
    "details": {
      "field": "config",
      "size": 72000,
      "limit": 65536
    }
  }
}
```
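
As a rough illustration, the `config.yaml` limit could be checked as below. The helper name `check_config_size` is hypothetical, and measuring the size in UTF-8 bytes (rather than characters) is an assumption of this sketch.

```python
MAX_CONFIG_BYTES = 64 * 1024  # 64 KB limit from the table above

def check_config_size(config_yaml: str):
    """Return an error payload if config.yaml is too large, otherwise None."""
    size = len(config_yaml.encode("utf-8"))  # assumption: the limit counts bytes
    if size <= MAX_CONFIG_BYTES:
        return None
    return {
        "error": {
            "code": "PAYLOAD_TOO_LARGE",
            "message": "config.yaml exceeds 64KB limit",
            "details": {"field": "config", "size": size, "limit": MAX_CONFIG_BYTES},
        }
    }
```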

### Sort Fields and Indexes

**Allowed sort fields:**

| Endpoint | Allowed `sort` values |
|----------|----------------------|
| `GET /tools` | `downloads`, `published_at`, `name` |
| `GET /tools/search` | `relevance`, `downloads`, `published_at` |
| `GET /categories` | `name`, `tool_count` |

Invalid sort values return 400:
```json
{"error": {"code": "INVALID_SORT", "message": "Unknown sort field 'foo'. Allowed: downloads, published_at, name"}}
```
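
Column names cannot be bound as SQL parameters, so the sort field has to be checked against a whitelist before it is interpolated into the query. The sketch below shows one way to do that for `GET /tools`; `tools_order_by` is a hypothetical helper, and it assumes the tie-breaker keys described under Pagination.

```python
TOOLS_SORT_FIELDS = ("downloads", "published_at", "name")

def tools_order_by(sort: str = "downloads", order: str = "desc") -> str:
    """Build a safe ORDER BY clause with stable tie-breakers for GET /tools."""
    if sort not in TOOLS_SORT_FIELDS:
        allowed = ", ".join(TOOLS_SORT_FIELDS)
        raise ValueError(f"Unknown sort field '{sort}'. Allowed: {allowed}")
    direction = order.upper()
    if direction not in ("ASC", "DESC"):
        raise ValueError(f"Unknown sort order '{order}'")
    keys = [f"{sort} {direction}"]
    if sort != "published_at":
        keys.append("published_at DESC")  # secondary key
    keys.append("id DESC")                # final tie-breaker
    return "ORDER BY " + ", ".join(keys)
```

A caller would map the `ValueError` to the `INVALID_SORT` 400 response shown above.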

**Database indexes:**
```sql
-- Frequent query patterns
CREATE INDEX idx_tools_owner_name ON tools(owner, name);
CREATE INDEX idx_tools_category ON tools(category);
CREATE INDEX idx_tools_published_at ON tools(published_at DESC);
CREATE INDEX idx_tools_downloads ON tools(downloads DESC);
CREATE INDEX idx_tools_owner_name_version ON tools(owner, name, version);

-- For pagination stability
CREATE INDEX idx_tools_sort_stable ON tools(downloads DESC, published_at DESC, id DESC);

-- Publisher lookups
CREATE INDEX idx_publishers_slug ON publishers(slug);
CREATE INDEX idx_publishers_email ON publishers(email);

-- Token lookups
CREATE INDEX idx_api_tokens_hash ON api_tokens(token_hash);
CREATE INDEX idx_api_tokens_publisher ON api_tokens(publisher_id);
```

### API Version Compatibility

**Forward compatibility:** Clients should ignore unknown fields in API responses:

```python
# Good: ignore unknown fields
tool = response['data']
name = tool.get('name')
# Don't fail if 'new_field' exists but client doesn't know about it

# Bad: strict parsing that fails on unknown fields
tool = ToolSchema.parse(response['data'])  # May fail on new fields
```

**Backward compatibility:** The API will:
- Never remove fields in a version (only deprecate)
- Never change field types
- Add new optional fields without version bump
- Use new version (`/api/v2`) for breaking changes

**Deprecation process:**
1. Add `X-Deprecated-Field: old_field` header
2. Document in changelog
3. Remove after 6 months minimum
4. Major version bump if widely used

**Client version header:**
```
X-SmartTools-Client: cli/1.2.0
```
This header helps the server track which client versions are in use when making deprecation decisions.

## Source of Truth
- Gitea registry repo is the source of truth.
- API syncs repo content into SQLite for fast queries, stats, and FTS5 search.
- `index.json` remains useful for offline CLI search and as a fallback.

If the cache is stale, the API can fall back to repo reads; a warning header may be emitted.

## Namespacing and Paths
Support owner/name from day one:
- Registry path: `tools/{owner}/{name}/config.yaml`
- API URL: `/tools/{owner}/{name}`
- Install: `smarttools registry install rob/summarize`
- Shorthand: `smarttools registry install summarize` resolves to the official namespace.

PR branches: `submit/{owner}/{name}/{version}`.

### Namespace Identity

The `owner` is an **immutable slug**, not the display name:

```sql
-- In publishers table
slug TEXT UNIQUE NOT NULL,        -- immutable: "rob", "alice-dev"
display_name TEXT NOT NULL,       -- mutable: "Rob", "Alice Developer"
```

**Slug rules:**
- Lowercase alphanumeric + hyphens only: `^[a-z0-9][a-z0-9-]*[a-z0-9]$`
- 2-39 characters
- Cannot start/end with hyphen
- Set once at registration, cannot be changed
- Reserved slugs: `official`, `admin`, `system`, `api`, `registry`, `smarttools`
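
The slug rules can be checked with a few lines of Python. This is a sketch, not the registry's validator: `is_valid_slug` is a hypothetical name, and the reserved set includes `smarttools` because the Official Namespace section also lists it.

```python
import re

SLUG_RE = re.compile(r"[a-z0-9][a-z0-9-]*[a-z0-9]")
RESERVED_SLUGS = {"official", "admin", "system", "api", "registry", "smarttools"}

def is_valid_slug(slug: str) -> bool:
    """Check length, character set, hyphen placement, and the reserved list."""
    if not 2 <= len(slug) <= 39:
        return False
    if SLUG_RE.fullmatch(slug) is None:  # fullmatch also rejects a trailing newline
        return False
    return slug not in RESERVED_SLUGS
```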

**Rename policy:**
- `display_name` can be changed anytime via dashboard
- `slug` (owner) is permanent to preserve URLs and tool references
- If a publisher absolutely must change slug (legal reasons, etc.):
  1. Create new account with new slug
  2. Republish tools under new namespace
  3. Mark old tools as deprecated with `replacement` pointing to new namespace
  4. Old namespace remains reserved (cannot be reused by others)

**Why immutable:**
- `rob/summarize@1.0.0` must always resolve to the same tool
- Prevents namespace hijacking after rename
- Simplifies caching and CDN strategies

## Tool Format (Registry == Local)
Registry tool folders mirror local tools:
```
tools/
  rob/
    summarize/
      config.yaml
      README.md
```

Tool files match the existing SmartTools format. Registry-specific metadata is kept under `registry:`. Deprecation is tool-defined and top-level:
```yaml
name: summarize
version: "1.2.0"
deprecated: true
deprecated_message: "Security issue. Use v1.2.1"
replacement: "rob/summarize@1.2.1"
registry:
  published_at: "2025-01-15T10:30:00Z"
  downloads: 142
```

**Schema compatibility note:** The current SmartTools config parser may reject unknown top-level keys like `deprecated`, `replacement`, and `registry`. Before implementing registry features:
1. Either update the YAML parser to ignore unknown keys (permissive mode), or explicitly define these fields in the Tool dataclass with defaults
2. Validate registry-specific fields only when publishing, not when running locally

This ensures local tools continue to work even if they don't have registry fields.

## Versioning and Immutability
- Unique key: `owner/name + version`.
- Published versions are immutable.
- Deprecation uses `deprecated`, `deprecated_message`, and `replacement`.
- CLI warns on install if a version is deprecated.

### Yank Policy

Yanking allows removing a version from resolution without deleting it (for auditability):

```yaml
# In tool config
yanked: true
yanked_reason: "Critical security vulnerability CVE-2025-1234"
yanked_at: "2025-01-20T15:00:00Z"
```

**Yanked version behavior:**

| Operation | Behavior |
|-----------|----------|
| `install foo@1.0.0` (exact) | Warns but allows install |
| `install foo@^1.0.0` (constraint) | Excludes yanked, resolves to next valid |
| `search` / `browse` | Hidden by default, shown with `--include-yanked` |
| Direct URL access | Returns tool with `yanked: true` in response |
| Already installed | Continues to work, no forced removal |

**Database schema addition:**
```sql
-- Add to tools table
yanked BOOLEAN DEFAULT FALSE,
yanked_reason TEXT,
yanked_at TIMESTAMP
```

**Yank vs Delete:**
- **Yank**: Version remains in DB, excluded from resolution, auditable
- **Delete**: Reserved for DMCA/legal, requires admin action, leaves tombstone record

### Version Format

Tools use semantic versioning (semver):
```
MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]

Examples:
  1.0.0           # stable release
  1.2.3           # stable release
  2.0.0-alpha.1   # prerelease
  2.0.0-beta.2    # prerelease
  2.0.0-rc.1      # release candidate
```

### Version Constraints

Manifest files support these constraint formats:

| Constraint | Meaning | Example Match |
|------------|---------|---------------|
| `1.2.3` | Exact version | `1.2.3` only |
| `>=1.2.0` | Minimum version | `1.2.0`, `1.3.0`, `2.0.0` |
| `<2.0.0` | Below version | `1.9.9`, `1.0.0` |
| `>=1.0.0,<2.0.0` | Range | `1.0.0` up to (not including) `2.0.0` |
| `^1.2.3` | Compatible (same major) | `>=1.2.3,<2.0.0` |
| `~1.2.3` | Approximately (same minor) | `>=1.2.3,<1.3.0` |
| `*` | Any version | latest stable |
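
The `^` and `~` shorthands are equivalent to explicit ranges, which the following sketch makes concrete. `expand_shorthand` is a hypothetical helper. It follows the table above literally ("same major" for `^`), so it does not implement the special handling of `0.x` versions that some package managers apply.

```python
import re

_SHORTHAND = re.compile(r"([\^~])(\d+)\.(\d+)\.(\d+)")

def expand_shorthand(constraint: str) -> str:
    """Rewrite '^X.Y.Z' or '~X.Y.Z' as an explicit range; pass other forms through."""
    text = constraint.strip()
    m = _SHORTHAND.fullmatch(text)
    if m is None:
        return text  # exact versions, comparators, and '*' are left unchanged
    op = m.group(1)
    major, minor, patch = (int(m.group(i)) for i in (2, 3, 4))
    lower = f">={major}.{minor}.{patch}"
    upper = f"<{major + 1}.0.0" if op == "^" else f"<{major}.{minor + 1}.0"
    return f"{lower},{upper}"
```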

### Version Resolution Rules

When resolving a version constraint:

1. **Filter**: Get all versions matching the constraint
2. **Exclude prereleases**: Unless constraint explicitly includes them (e.g., `>=2.0.0-alpha.1`)
3. **Sort**: By semver precedence (descending)
4. **Select**: Highest matching version

**Tie-breakers:**
- Stable versions preferred over prereleases
- Later publish date wins if versions are equal (shouldn't happen with immutability)

**Unsatisfiable constraints:**
```json
// API Response: 404
{
  "error": {
    "code": "CONSTRAINT_UNSATISFIABLE",
    "message": "No version of 'rob/summarize' satisfies constraint '>=5.0.0'",
    "details": {
      "tool": "rob/summarize",
      "constraint": ">=5.0.0",
      "available_versions": ["1.0.0", "1.1.0", "1.2.0"],
      "latest_stable": "1.2.0"
    }
  }
}
```
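
The resolution steps can be sketched as follows. This is a simplified illustration under stated assumptions: `sort_key`, `parse_constraint`, and `resolve` are hypothetical names, versions arrive as dicts with `version` and an optional `yanked` flag, and only comparator constraints (`>=`, `<=`, `>`, `<`, exact, `*`, comma-joined) are handled, so `^` and `~` would need to be expanded to ranges first.

```python
import operator
import re

_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?")
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
        "<": operator.lt, "==": operator.eq}

def sort_key(version: str) -> tuple:
    """Key that orders versions by semver precedence (build metadata ignored)."""
    m = _SEMVER.fullmatch(version)
    if m is None:
        raise ValueError(f"not a semver string: {version!r}")
    core = tuple(int(m.group(i)) for i in (1, 2, 3))
    if m.group(4) is None:
        return core + ((1,),)  # a stable release sorts above its prereleases
    # Numeric identifiers sort below alphanumeric ones, as semver requires.
    ids = tuple((0, int(p), "") if p.isdigit() else (1, 0, p)
                for p in m.group(4).split("."))
    return core + ((0,) + ids,)

def parse_constraint(constraint: str) -> list:
    """Split a constraint into (operator, version) pairs; '*' means no bounds."""
    if constraint.strip() == "*":
        return []
    pairs = []
    for part in constraint.split(","):
        m = re.fullmatch(r"\s*(>=|<=|>|<|==)?\s*(\S+)\s*", part)
        if m is None:
            raise ValueError(f"bad constraint: {constraint!r}")
        pairs.append((m.group(1) or "==", m.group(2)))
    return pairs

def resolve(available: list, constraint: str):
    """Return the highest matching version string, or None if unsatisfiable."""
    comparators = parse_constraint(constraint)
    is_exact = len(comparators) == 1 and comparators[0][0] == "=="
    # Prereleases are eligible only if the constraint itself names one.
    allow_pre = any(sort_key(v)[3][0] == 0 for _, v in comparators)
    candidates = []
    for entry in available:
        key = sort_key(entry["version"])
        if not all(_OPS[op](key, sort_key(v)) for op, v in comparators):
            continue  # step 1: filter by constraint
        if key[3][0] == 0 and not allow_pre:
            continue  # step 2: exclude prereleases
        if entry.get("yanked") and not is_exact:
            continue  # yanked versions resolve only when requested exactly
        candidates.append((key, entry["version"]))
    return max(candidates)[1] if candidates else None  # steps 3 and 4
```

A `None` result corresponds to the 404 response shown above.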

### Prerelease Handling

- Prereleases are **not** returned for `*` or range constraints by default
- To install prerelease: `smarttools registry install rob/summarize@2.0.0-beta.1`
- To allow prereleases in manifest: `version: ">=2.0.0-0"` (the `-0` suffix includes prereleases)

### Download Endpoint Version Selection

The `/api/v1/tools/{owner}/{name}/download` endpoint accepts version parameters:

| Parameter | Behavior | Example |
|-----------|----------|---------|
| (none) | Returns latest stable version | `/download` → `1.2.0` |
| `version=1.2.0` | Exact version (must exist) | `/download?version=1.2.0` |
| `version=^1.0.0` | Server resolves constraint | `/download?version=^1.0.0` → `1.2.0` |
| `version=latest` | Alias for latest stable | `/download?version=latest` |

**Server-side resolution:** The API server resolves version constraints, not the client. This ensures consistent resolution and allows the server to apply policies (e.g., exclude yanked versions).

```
GET /api/v1/tools/rob/summarize/download?version=^1.0.0&install=true

Response (200):
{
  "data": {
    "owner": "rob",
    "name": "summarize",
    "resolved_version": "1.2.0",
    "config": "... YAML content ..."
  },
  "meta": {
    "constraint": "^1.0.0",
    "available_versions": ["1.0.0", "1.1.0", "1.2.0"]
  }
}
```

**Invalid/unsatisfiable constraint:**
```
GET /api/v1/tools/rob/summarize/download?version=^5.0.0

Response (404):
{
  "error": {
    "code": "CONSTRAINT_UNSATISFIABLE",
    "message": "No version matches constraint '^5.0.0'",
    "details": {
      "constraint": "^5.0.0",
      "latest_stable": "1.2.0",
      "available_versions": ["1.0.0", "1.1.0", "1.2.0"]
    }
  }
}
```
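
Constraint characters such as `^`, `<`, `>`, and `,` should be percent-encoded when a client builds the query string. A small sketch using only the standard library is shown below; `download_url` is a hypothetical helper, and `API_BASE` is the canonical base path from this document.

```python
from urllib.parse import urlencode

API_BASE = "https://gitea.brrd.tech/api/v1"

def download_url(owner: str, name: str, version=None, install: bool = False) -> str:
    """Build the download URL, percent-encoding the version constraint."""
    params = {}
    if version is not None:
        params["version"] = version
    if install:
        params["install"] = "true"
    url = f"{API_BASE}/tools/{owner}/{name}/download"
    return f"{url}?{urlencode(params)}" if params else url
```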

## Tool Resolution Order
When a tool is invoked, the CLI searches in this order:

1. **Local project**: `./.smarttools/<owner>/<name>/config.yaml` (or `./.smarttools/<name>/` for unnamespaced)
2. **Global user**: `~/.smarttools/<owner>/<name>/config.yaml`
3. **Registry**: Fetch from API, install to global, then run
4. **Error**: `Tool '<toolname>' not found`

Step 3 only occurs if `auto_fetch_from_registry: true` is set in the config (default: true).

**Path convention:** Use `.smarttools/` (with leading dot) for both local and global to maintain consistency.

Resolution also respects namespacing:
- `summarize` → searches for any tool named `summarize`, preferring `official/summarize` if it exists
- `rob/summarize` → searches for exactly `rob/summarize`
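
The local part of the lookup (steps 1 and 2) could look like the sketch below. `find_installed` is a hypothetical function. The way a shorthand name is matched within each root (unnamespaced folder, then `official`, then any other owner alphabetically) is an assumption made for illustration.

```python
from pathlib import Path

def _candidates(root: Path, tool: str) -> list:
    """Possible config.yaml locations for a tool under one .smarttools root."""
    if "/" in tool:
        owner, name = tool.split("/", 1)
        return [root / owner / name / "config.yaml"]
    paths = [root / tool / "config.yaml",               # unnamespaced layout
             root / "official" / tool / "config.yaml"]  # preferred namespace
    paths += sorted(root.glob(f"*/{tool}/config.yaml"))  # any other owner
    return paths

def find_installed(tool: str, project_dir: Path, home: Path):
    """Search the project directory first, then the user's home directory."""
    for root in (project_dir / ".smarttools", home / ".smarttools"):
        for candidate in _candidates(root, tool):
            if candidate.is_file():
                return candidate
    return None  # caller may auto-fetch from the registry or report "not found"
```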

### Official Namespace

The slug `official` is reserved for curated, high-quality tools maintained by the registry administrators.

- Shorthand `summarize` resolves to `official/summarize` if it exists
- If no `official/summarize`, falls back to most-downloaded tool named `summarize`
- To avoid ambiguity, always use full `owner/name` in manifests

Reserved slugs that cannot be registered: `official`, `admin`, `system`, `api`, `registry`, `smarttools`

## Auto-Fetch Behavior
When enabled (`auto_fetch_from_registry: true`), missing tools are automatically fetched:

```bash
$ summarize < file.txt
# Tool 'summarize' not found locally.
# Fetching from registry...
# Installed: official/summarize@1.2.0
# Running...
```

Behavior details:
- Fetches latest stable version unless pinned in `smarttools.yaml`
- Installs to `~/.smarttools/<owner>/<name>/`
- Generates wrapper script in `~/.local/bin/`
- Subsequent runs use local copy (no re-fetch)

To disable (require explicit install):
```yaml
# ~/.smarttools/config.yaml
auto_fetch_from_registry: false
```

### Wrapper Script Collisions

When two tools from different owners have the same name:

| Scenario | Behavior |
|----------|----------|
| Install `official/summarize` | Creates wrapper `~/.local/bin/summarize` |
| Install `rob/summarize` (collision) | Creates wrapper `~/.local/bin/rob-summarize` |
| Uninstall `official/summarize` | Removes `summarize` wrapper, promotes `rob-summarize` → `summarize` if desired |

The first-installed tool with a given name gets the short wrapper. Subsequent tools use `owner-name` format.
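
The naming rule can be expressed as a small pure function. In this sketch, `wrapper_name` and the `installed` mapping (wrapper filename to the `owner/name` that holds it) are illustrative assumptions about how the CLI might track wrappers.

```python
def wrapper_name(owner: str, name: str, installed: dict) -> str:
    """Pick the wrapper filename for owner/name given existing wrappers."""
    holder = installed.get(name)
    if holder is None or holder == f"{owner}/{name}":
        return name              # short name is free, or already ours (reinstall)
    return f"{owner}-{name}"     # collision: fall back to owner-name
```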

To invoke a specific owner's tool:
```bash
# Short form (whichever was installed first)
summarize < file.txt

# Explicit owner form (always works)
rob-summarize < file.txt

# Or via smarttools run
smarttools run rob/summarize < file.txt
```

## Project Manifest (smarttools.yaml)
Defines tool dependencies with optional runtime overrides:
```yaml
name: my-ai-project
version: "1.0.0"
dependencies:
  - name: rob/summarize
    version: ">=1.0.0"
overrides:
  rob/summarize:
    provider: ollama
```

Overrides are applied at runtime and do not mutate installed tool configs.
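
A sketch of applying overrides without touching the installed config is shown below. `effective_config` is a hypothetical name, and the shallow, top-level merge is an assumption; the actual merge depth is not specified here.

```python
import copy

def effective_config(installed: dict, overrides: dict, tool: str) -> dict:
    """Return the config to run with; the installed dict is left unmodified."""
    merged = copy.deepcopy(installed)       # never mutate the installed config
    merged.update(overrides.get(tool, {}))  # assumption: top-level keys only
    return merged
```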

## CLI Config and Tokens
Global config lives in `~/.smarttools/config.yaml`:
```yaml
registry:
  url: https://gitea.brrd.tech/api/v1    # Must match canonical base path
  token: "reg_xxxxxxxxxxxx"
client_id: "anon_abc123def456"
auto_fetch_from_registry: true
```

`client_id` is generated locally and is used to deduplicate install counts from anonymous clients.

## Publishing and Auth
Publishing uses registry accounts, not Gitea accounts:
- Public endpoints require no auth.
- `POST /tools` requires a registry token.
- The API server uses a private Gitea service account to open PRs.

### Publish Idempotency and Edge Cases

**Idempotency key:** `owner/name@version`

| Scenario | API Response | HTTP Code |
|----------|--------------|-----------|
| New version, no PR exists | Create PR, return URL | `201 Created` |
| PR already exists (pending) | Return existing PR URL | `200 OK` |
| Version already published | Error: version exists | `409 Conflict` |
| PR was closed without merge | Allow new PR | `201 Created` |
| PR was merged, then tool deleted | Error: version exists (tombstone) | `409 Conflict` |
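
The table reduces to a short decision function. The sketch below is illustrative: `publish_outcome` and its action labels are hypothetical, and `version_exists` is assumed to be true for both published versions and tombstones.

```python
def publish_outcome(version_exists: bool, pr_status):
    """Map publish state to (HTTP status, action).

    pr_status is None (no PR), 'pending', 'merged', or 'closed'.
    """
    if version_exists or pr_status == "merged":
        return 409, "version_exists"      # immutable, including tombstones
    if pr_status == "pending":
        return 200, "return_existing_pr"  # idempotent retry
    return 201, "create_pr"               # new version, or PR closed unmerged
```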

**Version immutability enforcement:**
```json
// Attempt to publish existing version
// Response: 409 Conflict
{
  "error": {
    "code": "VERSION_EXISTS",
    "message": "Version 1.2.0 of 'rob/summarize' already exists and cannot be overwritten",
    "details": {
      "published_at": "2025-01-15T10:30:00Z",
      "action": "Bump version number to publish changes"
    }
  }
}
```

**Closed PR handling:**
- Track PR state in database: `pending`, `merged`, `closed`
- If PR was closed (rejected/abandoned), allow new submission for same version
- If PR was merged, version is immutable forever

**Update flow (new version, not overwrite):**
1. Developer modifies tool locally
2. Bumps version in `config.yaml` (e.g., `1.2.0` → `1.3.0`)
3. Runs `smarttools registry publish`
4. New PR created for `1.3.0`
5. Old version `1.2.0` remains available

## Publisher Registration
Publishers register on the registry website, not Gitea:

**Registration flow:**
1. User visits `https://gitea.brrd.tech/registry/register` (or future `registry.smarttools.dev`)
2. Creates account with email + password + slug
3. Receives verification email (optional in v1, but track `verified` status)
4. Logs into dashboard at `/dashboard`
5. Generates API token from dashboard
6. Uses token in CLI for publishing

### Authentication Security

**Password hashing:**
- Algorithm: Argon2id (memory-hard, recommended by OWASP)
- Parameters: `memory=65536, iterations=3, parallelism=4`
- Library: `argon2-cffi` for Python

```python
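from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError
ph = PasswordHasher(memory_cost=65536, time_cost=3, parallelism=4)  # memory_cost is in KiB (64 MiB)
password_hash = ph.hash(password)  # named to avoid shadowing the built-in hash()
try:
    ph.verify(password_hash, password)
except VerifyMismatchError:
    ...  # reject the login attempt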
```

**API token format:**
```
reg_<random-32-bytes-base62>

Example (illustrative; shorter than a real ~47-character token): reg_7kX9mPqR2sT4vW6xY8zA1bC3dE5fG7hJ
```
- Prefix `reg_` for easy identification in logs/configs
- 32 bytes of cryptographically random data
- Base62 encoded (alphanumeric, no special chars)
- Total length: ~47 characters
- Stored as SHA-256 hash in database (never plain text)
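
One possible way to produce a token in this format is sketched below. The helper names, the Base62 alphabet order, and the zero-padding to a fixed 43-character body are assumptions. Thirty-two bytes need at most 43 Base62 digits, which together with the `reg_` prefix gives the 47 characters noted above.

```python
import hashlib
import secrets
import string

BASE62 = string.digits + string.ascii_letters  # assumed alphabet order
TOKEN_BODY_LEN = 43                            # max Base62 digits for 32 bytes

def base62_encode(data: bytes) -> str:
    """Encode bytes as a fixed-width Base62 string."""
    number = int.from_bytes(data, "big")
    chars = []
    while number:
        number, rem = divmod(number, 62)
        chars.append(BASE62[rem])
    return "".join(reversed(chars)).rjust(TOKEN_BODY_LEN, BASE62[0])

def generate_token():
    """Return (token, token_hash); only the hash should be stored."""
    token = "reg_" + base62_encode(secrets.token_bytes(32))
    token_hash = hashlib.sha256(token.encode("ascii")).hexdigest()
    return token, token_hash
```

The plain token is shown to the user once; later requests are authenticated by hashing the presented token and looking up `token_hash`.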

**Token lifecycle:**

| Action | Behavior |
|--------|----------|
| Generate | Create new token, return once, store hash |
| List | Show token name, created date, last used (not the token itself) |
| Revoke | Set `revoked_at` timestamp, reject future uses |
| Rotate | Generate new token, optionally revoke old |

**Rate limits:**

| Endpoint | Limit | Window | Scope | Retry-After |
|----------|-------|--------|-------|-------------|
| `POST /register` | 5 | 1 hour | IP | 3600 |
| `POST /login` | 10 | 15 min | IP | 900 |
| `POST /login` (failed) | 5 | 15 min | IP + email | 900 |
| `POST /tokens` | 10 | 1 hour | Token | 3600 |
| `POST /tools` | 20 | 1 hour | Token | 3600 |
| `GET /tools/*` | 100 | 1 min | IP | 60 |
| `GET /download` | 60 | 1 min | IP | 60 |

**Rate limit response (429):**
```json
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests. Try again in 60 seconds.",
    "details": {
      "limit": 100,
      "window": "1 minute",
      "retry_after": 60
    }
  }
}
```

**Headers on rate-limited response:**
```
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1705766400
```

**Scope priority:** For authenticated requests, both IP and token limits apply. The more restrictive limit wins.
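
The limiting algorithm is not specified above. As one simple option, a fixed-window counter keyed by scope (IP, token, or IP + email) could look like the sketch below. `FixedWindowLimiter` is a hypothetical in-memory class, so a multi-process deployment would need shared storage instead.

```python
import math
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed time window."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counters = {}  # key -> (window_start, count)

    def check(self, key: str):
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        window_start = now - (now % self.window)
        start, count = self._counters.get(key, (window_start, 0))
        if start != window_start:           # a new window has begun
            start, count = window_start, 0
        if count >= self.limit:
            return False, max(1, math.ceil(start + self.window - now))
        self._counters[key] = (start, count + 1)
        return True, 0
```

To apply both an IP limit and a token limit, a request would be checked against each limiter and rejected if either one refuses it.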

**Account lockout:**
- After 5 failed login attempts: 15-minute lockout for that email
- After 10 failed attempts: 1-hour lockout
- Lockout clears on successful password reset

**Password reset flow (deferred to v1.1):**
1. User requests reset via email
2. Server generates time-limited token (1 hour expiry)
3. Email contains reset link with token
4. User sets new password
5. All existing sessions/tokens optionally invalidated

**Email verification flow (deferred to v1.1):**
1. On registration, send verification email
2. User clicks link with verification token
3. Set `verified = true` in database
4. Unverified accounts can browse but not publish

### Token Scopes and Authorization

Tokens have scopes that limit their capabilities:

| Scope | Permissions |
|-------|-------------|
| `read` | View own published tools, download stats |
| `publish` | Submit new tools, update own tool metadata |
| `admin` | Yank tools, manage categories (registry admins only) |

**Default scope:** New tokens get `read,publish` by default.

**Ownership enforcement:**

```python
@app.route('/api/v1/tools', methods=['POST'])
@require_token(scopes=['publish'])
def publish_tool():
    token = get_current_token()
    tool_data = request.json

    # Enforce owner == token holder's slug
    if tool_data['owner'] != token.publisher.slug:
        return {
            "error": {
                "code": "FORBIDDEN",
                "message": f"Cannot publish to namespace '{tool_data['owner']}'. "
                           f"Your namespace is '{token.publisher.slug}'."
            }
        }, 403

    # Proceed with publish...
```

**`GET /api/v1/me/tools` authorization:**
- Requires valid token with `read` scope
- Returns only tools where `owner == token.publisher.slug`
- Includes pending PRs and all versions (including yanked)

### Web Session Security

Dashboard login uses session cookies (not tokens) for browser auth:

**Cookie settings:**
```python
SESSION_COOKIE_NAME = 'smarttools_session'
SESSION_COOKIE_HTTPONLY = True      # Prevent JS access
SESSION_COOKIE_SECURE = True        # HTTPS only in production
SESSION_COOKIE_SAMESITE = 'Lax'     # CSRF protection
PERMANENT_SESSION_LIFETIME = 86400 * 7  # 7 days (Flask's session lifetime setting)
```

**CSRF protection:**
- All POST/PUT/DELETE forms include `csrf_token` hidden field
- Token validated server-side before processing
- 403 Forbidden if token missing or invalid

**Session lifecycle:**

| Event | Action |
|-------|--------|
| Login | Create session, set cookie |
| Logout | Delete session, clear cookie |
| Idle 24h | Session expires, re-login required |
| Password change | Invalidate all sessions |
| Token revocation | Existing sessions continue (token != session) |

**Secure session storage:**
```python
# Store sessions in DB, not filesystem
from flask_session import Session
app.config['SESSION_TYPE'] = 'sqlalchemy'
app.config['SESSION_SQLALCHEMY_TABLE'] = 'sessions'
```

**Database schema:**
```sql
-- Publishers
CREATE TABLE publishers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
    slug TEXT UNIQUE NOT NULL,            -- immutable namespace: "rob", "alice-dev"
    display_name TEXT NOT NULL,           -- mutable: "Rob", "Alice Developer"
    bio TEXT,
    website TEXT,
    verified BOOLEAN DEFAULT FALSE,
    locked_until TIMESTAMP,               -- account lockout
    failed_login_attempts INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- API tokens (one publisher can have multiple)
CREATE TABLE api_tokens (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    publisher_id INTEGER NOT NULL REFERENCES publishers(id),
    token_hash TEXT NOT NULL,
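    name TEXT NOT NULL,           -- "CLI token", "CI token"
    scopes TEXT NOT NULL DEFAULT 'read,publish',  -- comma-separated; see Token Scopes and Authorization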
    last_used_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    revoked_at TIMESTAMP          -- NULL if active
);

-- Tools (links to publisher)
CREATE TABLE tools (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    owner TEXT NOT NULL,          -- namespace slug (immutable, from publisher.slug)
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    description TEXT,
    category TEXT,
    tags TEXT,                    -- JSON array
    config_yaml TEXT NOT NULL,    -- Full tool config
    readme TEXT,
    publisher_id INTEGER NOT NULL REFERENCES publishers(id),
    deprecated BOOLEAN DEFAULT FALSE,
    deprecated_message TEXT,
    replacement TEXT,
    downloads INTEGER DEFAULT 0,
    published_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(owner, name, version)
);

-- Download stats (for deduplication)
CREATE TABLE download_stats (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tool_id INTEGER NOT NULL REFERENCES tools(id),
    client_id TEXT NOT NULL,
    downloaded_at DATE NOT NULL,
    UNIQUE(tool_id, client_id, downloaded_at)
);

-- Search index (FTS5)
CREATE VIRTUAL TABLE tools_fts USING fts5(
    name, description, tags, readme,
    content='tools',
    content_rowid='id'
);

-- FTS5 sync triggers (required for external content tables)
CREATE TRIGGER tools_ai AFTER INSERT ON tools BEGIN
    INSERT INTO tools_fts(rowid, name, description, tags, readme)
    VALUES (new.id, new.name, new.description, new.tags, new.readme);
END;

CREATE TRIGGER tools_ad AFTER DELETE ON tools BEGIN
    INSERT INTO tools_fts(tools_fts, rowid, name, description, tags, readme)
    VALUES ('delete', old.id, old.name, old.description, old.tags, old.readme);
END;

CREATE TRIGGER tools_au AFTER UPDATE ON tools BEGIN
    INSERT INTO tools_fts(tools_fts, rowid, name, description, tags, readme)
    VALUES ('delete', old.id, old.name, old.description, old.tags, old.readme);
    INSERT INTO tools_fts(rowid, name, description, tags, readme)
    VALUES (new.id, new.name, new.description, new.tags, new.readme);
END;

-- Pending PRs (track publish state)
CREATE TABLE pending_prs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    publisher_id INTEGER NOT NULL REFERENCES publishers(id),
    owner TEXT NOT NULL,
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    pr_number INTEGER NOT NULL,
    pr_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending, merged, closed
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(owner, name, version)
);

-- Webhook sync log (idempotency)
CREATE TABLE webhook_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    delivery_id TEXT UNIQUE NOT NULL,        -- Gitea delivery ID
    event_type TEXT NOT NULL,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
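
The `UNIQUE(tool_id, client_id, downloaded_at)` constraint on `download_stats` is what makes install counting idempotent per client per day. The sketch below demonstrates this with an in-memory database and a trimmed-down `tools` table; `record_download` is a hypothetical helper, not the registry's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tools (id INTEGER PRIMARY KEY, downloads INTEGER DEFAULT 0);
CREATE TABLE download_stats (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tool_id INTEGER NOT NULL REFERENCES tools(id),
    client_id TEXT NOT NULL,
    downloaded_at DATE NOT NULL,
    UNIQUE(tool_id, client_id, downloaded_at)
);
INSERT INTO tools (id) VALUES (1);
""")

def record_download(conn, tool_id: int, client_id: str, day: str) -> bool:
    """Count a download once per (tool, client, day); return True if counted."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO download_stats (tool_id, client_id, downloaded_at) "
            "VALUES (?, ?, ?)",
            (tool_id, client_id, day),
        )
        if cur.rowcount == 0:
            return False  # duplicate for this client and day
        conn.execute("UPDATE tools SET downloads = downloads + 1 WHERE id = ?", (tool_id,))
        return True
```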

**Note on tags indexing:** The `tags` column stores JSON arrays as text. For v1, FTS5 will search within the JSON string. If tag filtering becomes a bottleneck, normalize to a `tool_tags` junction table:

```sql
-- Future: normalized tags (if needed)
CREATE TABLE tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL
);

CREATE TABLE tool_tags (
    tool_id INTEGER REFERENCES tools(id),
    tag_id INTEGER REFERENCES tags(id),
    PRIMARY KEY (tool_id, tag_id)
);
```

**CLI first-time publish flow:**
```bash
$ smarttools registry publish

No registry account configured.

1. Register at: https://gitea.brrd.tech/registry/register
2. Generate a token from your dashboard
3. Enter your token below

Registry token: ********
Token saved to ~/.smarttools/config.yaml

Validating tool...
✓ config.yaml is valid
✓ README.md exists (2.3 KB)
✓ Version 1.0.0 not yet published

Publishing rob/my-tool@1.0.0...
✓ PR created: https://gitea.brrd.tech/rob/SmartTools-Registry/pulls/42

Your tool is pending review. You'll receive an email when it's approved.
```

## CLI Commands Reference
Full mapping of CLI commands to API calls:

### Registry Commands

```bash
# Search for tools
$ smarttools registry search <query> [--category=<cat>] [--limit=20]
    → GET /api/v1/tools/search?q=<query>&category=<cat>&per_page=20

# Browse tools (TUI)
$ smarttools registry browse [--category=<cat>]
    → GET /api/v1/tools?category=<cat>&page=1
    → GET /api/v1/categories

# View tool details
$ smarttools registry info <owner/name>
    → GET /api/v1/tools/<owner>/<name>

# Install a tool
$ smarttools registry install <owner/name> [--version=<ver>]
    → GET /api/v1/tools/<owner>/<name>/download?version=<ver>&install=true
    → Writes to ~/.smarttools/<owner>/<name>/config.yaml
    → Generates ~/.local/bin/<name> wrapper (or <owner>-<name> if collision)

# Uninstall a tool
$ smarttools registry uninstall <owner/name>
    → Removes ~/.smarttools/<owner>/<name>/
    → Removes wrapper script

# Publish a tool
$ smarttools registry publish [path] [--dry-run]
    → POST /api/v1/tools (with registry token)
    → Returns PR URL

# List my published tools
$ smarttools registry my-tools
    → GET /api/v1/me/tools (with registry token)

# Update index cache
$ smarttools registry update
    → GET /api/v1/index.json
    → Writes to ~/.smarttools/registry/index.json
```

### Project Commands

```bash
# Install project dependencies from smarttools.yaml
$ smarttools install
    → Reads ./smarttools.yaml
    → For each dependency:
        GET /api/v1/tools/<owner>/<name>/download?version=<constraint>&install=true
    → Installs to ~/.smarttools/<owner>/<name>/

# Add a dependency to smarttools.yaml
$ smarttools add <owner/name> [--version=<constraint>]
    → Adds to ./smarttools.yaml dependencies
    → Runs install for that tool

# Show project dependencies status
$ smarttools deps
    → Reads ./smarttools.yaml
    → Shows installed status for each dependency
    → Note: "smarttools list" is reserved for listing installed tools
```

**Command naming note:** `smarttools list` already exists to list locally installed tools. Use `smarttools deps` to show project manifest dependencies.

### Flags available on most commands

| Flag | Description |
|------|-------------|
| `--offline` | Use cached index only, don't fetch |
| `--refresh` | Force refresh of cached data |
| `--json` | Output in JSON format |
| `--verbose` | Show detailed output |

## Webhooks and Security

### HMAC Verification

All Gitea webhooks are verified using HMAC-SHA256:

```python
import hmac
import hashlib

def verify_webhook(request, secret):
    signature = request.headers.get('X-Gitea-Signature')
    if not signature:
        return False

    expected = hmac.new(
        secret.encode(),
        request.get_data(),  # raw request bytes (Flask); must be the unparsed body
        hashlib.sha256
    ).hexdigest()

    return hmac.compare_digest(signature, expected)
```

### Replay Protection

While sync is idempotent, implement basic replay protection:

```python
def process_webhook(request):
    # Verify the signature first so unauthenticated callers learn nothing
    if not verify_webhook(request, WEBHOOK_SECRET):
        return {"error": "invalid_signature"}, 401

    delivery_id = request.headers.get('X-Gitea-Delivery')
    if not delivery_id:
        return {"error": "missing_delivery_id"}, 400

    # Check if already processed
    if db.webhook_log.exists(delivery_id=delivery_id):
        return {"status": "already_processed"}, 200

    # Process with lock to prevent concurrent processing
    with db.lock(f"webhook:{delivery_id}"):
        # Double-check after acquiring lock
        if db.webhook_log.exists(delivery_id=delivery_id):
            return {"status": "already_processed"}, 200

        # Process the webhook
        result = sync_from_repo()

        # Log successful processing; the event type comes from the
        # X-Gitea-Event header because push payloads carry no 'action' field
        db.webhook_log.insert(
            delivery_id=delivery_id,
            event_type=request.headers.get('X-Gitea-Event'),
            processed_at=datetime.utcnow()
        )

    return {"status": "processed"}, 200
```

### Sync Job Locking

Prevent concurrent sync operations:

```python
# Using file lock or database advisory lock
SYNC_LOCK_TIMEOUT = 300  # 5 minutes max

def sync_from_repo():
    try:
        with acquire_lock("registry_sync", timeout=SYNC_LOCK_TIMEOUT):
            # Pull latest from Gitea
            repo.fetch()
            repo.reset('origin/main', hard=True)

            # Parse and update database
            for tool_path in glob('tools/*/*/config.yaml'):
                update_tool_in_db(tool_path)

            # Rebuild FTS index if needed
            rebuild_fts_index()

    except LockTimeout:
        logger.warning("Sync already in progress, skipping")
        return {"status": "skipped", "reason": "sync_in_progress"}
```
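
The `acquire_lock` helper and `LockTimeout` exception used above are not defined in this document. One possible file-based version is sketched below, assuming a single host and a lock directory shared by all workers; the `lock_dir` and `poll_interval` parameters are illustrative additions.

```python
import os
import tempfile
import time
from contextlib import contextmanager

class LockTimeout(Exception):
    """Raised when the lock cannot be acquired within the timeout."""

@contextmanager
def acquire_lock(name, timeout, lock_dir=None, poll_interval=0.05):
    """Hold an exclusive lock file named <name>.lock for the duration of the block."""
    path = os.path.join(lock_dir or tempfile.gettempdir(), f"{name}.lock")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise LockTimeout(name)
            time.sleep(poll_interval)
    try:
        # Record the owner PID to help diagnose stale locks
        os.write(fd, str(os.getpid()).encode())
        yield path
    finally:
        os.close(fd)
        os.remove(path)
```

A lock file left behind by a crashed process blocks later syncs until it is removed, so a production version would need stale-lock detection or an OS-level advisory lock instead.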

### Atomic Sync Strategy

To avoid exposing a partially updated DB during webhook sync, use a transactional table swap:

```python
def sync_from_repo_atomic():
    with acquire_lock("registry_sync", timeout=SYNC_LOCK_TIMEOUT):
        # 1. Pull latest from Gitea
        repo.fetch()
        repo.reset('origin/main', hard=True)

        # 2. Parse all tools into memory
        new_tools = []
        for tool_path in glob('tools/*/*/config.yaml'):
            tool_data = parse_tool(tool_path)
            if tool_data:
                new_tools.append(tool_data)

        # 3. Atomic swap using transaction
        with db.transaction():
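            # Create staging table. Note: CREATE TABLE ... AS SELECT copies column
            # names only, not PRIMARY KEY, constraints, or indexes; recreate those
            # from the real schema before swapping.
            db.execute("CREATE TABLE tools_new AS SELECT * FROM tools WHERE 0")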

            # Bulk insert into temp table
            for tool in new_tools:
                db.execute("INSERT INTO tools_new ...", tool)

            # Swap tables atomically
            db.execute("ALTER TABLE tools RENAME TO tools_old")
            db.execute("ALTER TABLE tools_new RENAME TO tools")
            db.execute("DROP TABLE tools_old")

            # Rebuild FTS index
            db.execute("INSERT INTO tools_fts(tools_fts) VALUES('rebuild')")

            # Update sync timestamp
            db.execute("UPDATE sync_status SET last_sync = ?", [datetime.utcnow()])
```

**Why atomic:** Per-row updates with FTS triggers can yield inconsistent reads under load. Readers may see partial state mid-sync. Table swap ensures all-or-nothing visibility.
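
The all-or-nothing behavior can be checked with the standard `sqlite3` module. This is a minimal sketch, not the registry's actual schema: the three-column `tools` table and the `swap_tools` helper are assumptions for illustration. The connection uses `isolation_level=None` so that `BEGIN` and `COMMIT` are issued explicitly and the DDL statements stay inside the transaction.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only
SCHEMA = ("(owner TEXT NOT NULL, name TEXT NOT NULL, version TEXT NOT NULL, "
          "PRIMARY KEY (owner, name))")

def swap_tools(conn, rows):
    """Replace the contents of `tools` in a single transaction."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(f"CREATE TABLE tools_new {SCHEMA}")
        conn.executemany("INSERT INTO tools_new VALUES (?, ?, ?)", rows)
        conn.execute("ALTER TABLE tools RENAME TO tools_old")
        conn.execute("ALTER TABLE tools_new RENAME TO tools")
        conn.execute("DROP TABLE tools_old")
        conn.execute("COMMIT")
    except Exception:
        # Any failure undoes the staging table and the renames together
        conn.execute("ROLLBACK")
        raise

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(f"CREATE TABLE tools {SCHEMA}")
conn.execute("INSERT INTO tools VALUES ('rob', 'summarize', '1.0.0')")
swap_tools(conn, [("rob", "summarize", "1.2.0"), ("alice", "translate", "2.1.0")])
```

If any insert fails, for example on a duplicate `(owner, name)` pair, the rollback leaves the previous `tools` table untouched and removes `tools_new`.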

### Error Handling

| Error Scenario | Behavior |
|----------------|----------|
| Repo fetch fails | Log error, retry in 5 min, alert if 3 failures |
| YAML parse error | Skip tool, log error, continue with others |
| Database write fails | Rollback transaction, retry once, then alert |
| Lock timeout | Skip this sync, next webhook will retry |

## Automated CI Validation
PRs are validated automatically using SmartTools (dogfooding):

```
PR Submitted
    │
    ▼
┌─────────────────────────────────────┐
│  Gitea CI runs validation tools:    │
│  • schema-validator                 │
│  • security-scanner                 │
│  • duplicate-detector               │
└───────────────┬─────────────────────┘
                │
        ┌───────┴───────┐
        │               │
    All pass        Any fail
        │               │
        ▼               ▼
  Auto-merge or     Add comment,
  flag for review   request changes
```

Validation checks:
1. **Schema validation**: config.yaml matches expected format
2. **Security scan**: No dangerous shell commands, no secrets in prompts
3. **Duplicate detection**: AI-powered similarity check against existing tools
4. **README check**: README.md exists and is non-empty

CI workflow (`.gitea/workflows/validate.yaml`):
```yaml
name: Validate Tool Submission
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Validate schema
        run: python scripts/validate_tool.py ${{ github.event.pull_request.head.sha }}
      - name: Security scan
        run: smarttools run security-scanner < changed_files.txt
      - name: Check duplicates
        run: smarttools run duplicate-detector < changed_files.txt
```

## Registry Repository Structure
Full structure of the SmartTools-Registry repo:

```
SmartTools-Registry/
├── README.md                        # Registry overview
├── CONTRIBUTING.md                  # How to submit tools
├── LICENSE
│
├── tools/                           # All published tools
│   ├── rob/
│   │   ├── summarize/
│   │   │   ├── config.yaml
│   │   │   └── README.md
│   │   └── translate/
│   │       ├── config.yaml
│   │       └── README.md
│   └── alice/
│       └── code-review/
│           ├── config.yaml
│           └── README.md
│
├── categories/
│   └── categories.yaml              # Category definitions
│
├── index.json                       # Auto-generated search index
│
├── .gitea/
│   └── workflows/
│       ├── validate.yaml            # PR validation
│       ├── build-index.yaml         # Rebuild index on merge
│       └── notify-api.yaml          # Webhook to API server
│
└── scripts/
    ├── validate_tool.py             # Schema validation
    ├── build_index.py               # Generate index.json
    ├── check_duplicates.py          # Similarity detection
    └── security_scan.py             # Security checks
```

`categories.yaml` format:
```yaml
categories:
  - name: text-processing
    description: Tools for manipulating and analyzing text
    icon: 📝
  - name: code
    description: Tools for code review, generation, and analysis
    icon: 💻
  - name: data
    description: Tools for data transformation and analysis
    icon: 📊
  - name: media
    description: Tools for image, audio, and video processing
    icon: 🎨
  - name: productivity
    description: General productivity and automation tools
    icon: ⚡
```

## Download Stats

### Counting Methodology

- Count installs only, not views or searches
- Increment **after** successful download (response sent)
- Dedupe by `client_id + tool_id + date`

```python
import hashlib
from datetime import date

def download_tool(owner, name, version, install=False, client_id=None):
    tool = get_tool(owner, name, version)
    if not tool:
        return {"error": "not_found"}, 404

    config_yaml = tool.config_yaml

    # Only count if this is an install (not just viewing)
    if install:
        record_download(tool.id, client_id)

    return {"config": config_yaml}, 200

def record_download(tool_id, client_id):
    today = date.today()

    # Use client_id if provided, otherwise derive an anonymous fallback from the IP.
    # hashlib is used instead of the built-in hash(), which is randomized per
    # process for strings and would break dedupe across restarts and workers.
    ip_hash = hashlib.sha256(request.remote_addr.encode()).hexdigest()[:16]
    effective_client_id = client_id or f"anon_{ip_hash}"

    # Dedupe: only count once per client per tool per day
    try:
        db.download_stats.insert(
            tool_id=tool_id,
            client_id=effective_client_id,
            downloaded_at=today
        )
        # Increment counter (can be async/batch updated)
        db.execute("UPDATE tools SET downloads = downloads + 1 WHERE id = ?", [tool_id])
    except IntegrityError:
        pass  # Already counted today, ignore
```
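
The dedupe rule relies on a uniqueness constraint over `(tool_id, client_id, downloaded_at)`. The sketch below shows that behavior with the standard `sqlite3` module; the table definitions are simplified assumptions, and `record_download_once` is a hypothetical name chosen to avoid confusion with the function above.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tools (id INTEGER PRIMARY KEY, downloads INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE download_stats (
        tool_id INTEGER NOT NULL,
        client_id TEXT NOT NULL,
        downloaded_at TEXT NOT NULL,
        UNIQUE (tool_id, client_id, downloaded_at)
    );
    INSERT INTO tools (id) VALUES (1);
""")

def record_download_once(conn, tool_id, client_id, day):
    """Count at most one download per client, tool, and day. Returns True if counted."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO download_stats VALUES (?, ?, ?)",
            (tool_id, client_id, day.isoformat()),
        )
        if cur.rowcount == 0:
            return False  # duplicate for this day, nothing inserted
        conn.execute("UPDATE tools SET downloads = downloads + 1 WHERE id = ?", (tool_id,))
        return True
```

Keeping the insert and the counter update in one transaction means the counter cannot drift from the stats rows if the second statement fails.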

### Client ID Generation

CLI generates a persistent anonymous ID on first run:

```python
# In CLI, on first run
import uuid
import os

CONFIG_PATH = os.path.expanduser("~/.smarttools/config.yaml")

def get_or_create_client_id():
    config = load_config()
    if 'client_id' not in config:
        config['client_id'] = f"anon_{uuid.uuid4().hex[:16]}"
        save_config(config)
    return config['client_id']
```

**Fallback when client_id missing:**
- If header `X-Client-ID` not sent, use IP hash as fallback
- This still provides some dedupe for anonymous users
- Logged-in users' downloads are attributed to their account instead

### Privacy Considerations

- No IP addresses stored in database
- `client_id` is client-controlled and can be regenerated
- Stats are aggregated (total count), not individual tracking

### Async Stats Strategy

To avoid DB contention on the hot download path:

```python
import time
from datetime import date
from queue import Queue, Empty
from threading import Thread

FLUSH_INTERVAL = 5  # seconds

# In-memory queue for stats
stats_queue = Queue()

def record_download_async(tool_id, client_id):
    """Non-blocking: enqueue for background processing"""
    stats_queue.put({
        'tool_id': tool_id,
        'client_id': client_id,
        'date': date.today()
    })

def stats_worker():
    """Background thread: batch process stats every 5 seconds"""
    batch = []
    last_flush = time.monotonic()
    while True:
        try:
            batch.append(stats_queue.get(timeout=FLUSH_INTERVAL))
        except Empty:
            pass
        # Flush on a timer, not only when the queue goes idle, so a steady
        # stream of downloads cannot delay writes indefinitely
        if batch and time.monotonic() - last_flush >= FLUSH_INTERVAL:
            flush_batch(batch)
            batch = []
            last_flush = time.monotonic()

def flush_batch(batch):
    """Bulk insert with conflict ignore"""
    with db.transaction():
        for item in batch:
            try:
                db.execute("""
                    INSERT INTO download_stats (tool_id, client_id, downloaded_at)
                    VALUES (?, ?, ?)
                    ON CONFLICT DO NOTHING
                """, [item['tool_id'], item['client_id'], item['date']])
            except Exception as e:
                logger.warning(f"Stats insert failed: {e}")
                # Don't fail downloads for stats errors

# Start the worker once at application startup
Thread(target=stats_worker, daemon=True).start()
```

**Failure behavior:** If a stats DB write fails, log the error but don't fail the download. Stats are best-effort; the download must succeed.

## Search
- Primary search: SQLite FTS5 inside the API.
- `index.json` provides offline CLI search and serves as a backup.
- If the FTS5 index is stale, return results with the header `X-Search-Index-Stale: true`.

## API Caching Strategy

### Cache Headers

| Endpoint | Cache-Control | ETag | Notes |
|----------|---------------|------|-------|
| `GET /index.json` | `max-age=300, stale-while-revalidate=60` | Yes | 5 min cache, background refresh |
| `GET /tools/{owner}/{name}` | `max-age=60` | Yes | 1 min cache |
| `GET /tools/{owner}/{name}/download` | `max-age=3600, immutable` | Yes | Immutable versions, 1 hour |
| `GET /tools/search` | `no-cache` | No | Always fresh |
| `GET /categories` | `max-age=3600` | Yes | Categories change rarely |

### ETag Implementation

```python
import hashlib
from datetime import datetime
from flask import jsonify, make_response, request

def get_tool_etag(tool):
    """Generate ETag from tool identity (immutable versions don't change)"""
    # Since versions are immutable, owner/name@version is stable
    # Use published_at for extra safety (not updated_at, which doesn't exist)
    content = f"{tool.owner}/{tool.name}@{tool.version}:{tool.published_at.isoformat()}"
    return hashlib.md5(content.encode()).hexdigest()

def get_index_etag():
    """Generate ETag from last sync timestamp"""
    last_sync = db.get_last_sync_time()
    return hashlib.md5(last_sync.isoformat().encode()).hexdigest()

@app.route('/api/v1/tools/<owner>/<name>/download')
def download_tool(owner, name):
    version = request.args.get('version', 'latest')
    tool = resolve_and_get_tool(owner, name, version)
    etag = get_tool_etag(tool)

    # Check If-None-Match header. ETags travel as quoted strings, so compare
    # through Werkzeug's parsed header instead of the raw value.
    if request.if_none_match.contains(etag):
        response = make_response('', 304)  # Not Modified
        response.set_etag(etag)
        return response

    response = jsonify({
        "data": {
            "owner": tool.owner,
            "name": tool.name,
            "resolved_version": tool.version,
            "config": tool.config_yaml
        }
    })
    response.set_etag(etag)  # emits the value as a quoted ETag header
    response.headers['Cache-Control'] = 'max-age=3600, immutable'
    return response
```

**Note:** Since tool versions are immutable, the ETag based on `owner/name@version` is permanently stable. The `published_at` timestamp is included for defense-in-depth but won't change.

### DB vs Repo Read Strategy

| Scenario | Read From | Reason |
|----------|-----------|--------|
| Normal operation | SQLite DB | Fast, indexed, FTS |
| DB empty/corrupted | Gitea repo | Fallback/recovery |
| Webhook sync in progress | DB (stale OK) | Avoid blocking reads |
| Search query | SQLite FTS5 | Full-text search |
| Download specific version | DB, fallback to repo | DB is cache, repo is truth |

### Staleness Detection

```python
from datetime import datetime, timedelta
from flask import jsonify, request

STALE_THRESHOLD = timedelta(minutes=10)

def is_db_stale():
    last_sync = db.get_last_sync_time()
    return datetime.utcnow() - last_sync > STALE_THRESHOLD

@app.route('/api/v1/tools/search')
def search_tools():
    # Flask passes only URL path variables as arguments, so read q from the query string
    q = request.args.get('q', '')
    results = db.search_fts(q)

    response = jsonify({"results": results})
    if is_db_stale():
        response.headers['X-Search-Index-Stale'] = 'true'
        response.headers['X-Last-Sync'] = db.get_last_sync_time().isoformat()

    return response
```

## Error Model

### Response Envelopes

**Success response:**
```json
{
  "data": { ... },
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 42,
    "total_pages": 3
  }
}
```

**Error response:**
```json
{
  "error": {
    "code": "TOOL_NOT_FOUND",
    "message": "Tool 'foo/bar' does not exist",
    "details": {
      "owner": "foo",
      "name": "bar",
      "suggestion": "Did you mean 'rob/bar'?"
    },
    "docs_url": "https://registry.smarttools.dev/docs/errors#TOOL_NOT_FOUND"
  }
}
```

### Error Codes

| Code | HTTP | Description |
|------|------|-------------|
| `TOOL_NOT_FOUND` | 404 | Tool does not exist |
| `VERSION_NOT_FOUND` | 404 | Requested version doesn't exist |
| `VERSION_EXISTS` | 409 | Cannot overwrite published version |
| `INVALID_VERSION` | 400 | Version string is not valid semver |
| `INVALID_CONSTRAINT` | 400 | Version constraint syntax error |
| `CONSTRAINT_UNSATISFIABLE` | 404 | No version matches constraint |
| `VALIDATION_ERROR` | 400 | Tool config validation failed |
| `UNAUTHORIZED` | 401 | Missing or invalid auth token |
| `FORBIDDEN` | 403 | Token valid but lacks permission |
| `RATE_LIMITED` | 429 | Too many requests |
| `SLUG_TAKEN` | 409 | Namespace slug already registered |
| `ACCOUNT_LOCKED` | 403 | Too many failed login attempts |
| `SERVER_ERROR` | 500 | Internal error (logged for debugging) |
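
The table and envelope above can be tied together with a small helper. This is a sketch only: `error_response` is a hypothetical function, the `docs_url` pattern is copied from the `TOOL_NOT_FOUND` example, and treating unknown codes as HTTP 500 is an assumption.

```python
# HTTP status for each error code, taken from the table above
ERROR_STATUS = {
    "TOOL_NOT_FOUND": 404,
    "VERSION_NOT_FOUND": 404,
    "VERSION_EXISTS": 409,
    "INVALID_VERSION": 400,
    "INVALID_CONSTRAINT": 400,
    "CONSTRAINT_UNSATISFIABLE": 404,
    "VALIDATION_ERROR": 400,
    "UNAUTHORIZED": 401,
    "FORBIDDEN": 403,
    "RATE_LIMITED": 429,
    "SLUG_TAKEN": 409,
    "ACCOUNT_LOCKED": 403,
    "SERVER_ERROR": 500,
}

DOCS_BASE = "https://registry.smarttools.dev/docs/errors"

def error_response(code, message, details=None):
    """Build the error envelope and matching HTTP status for a known code."""
    status = ERROR_STATUS.get(code, 500)  # assumption: unknown codes map to 500
    error = {"code": code, "message": message, "docs_url": f"{DOCS_BASE}#{code}"}
    if details:
        error["details"] = details
    return {"error": error}, status
```

Keeping the mapping in one place prevents handlers from returning a code with a mismatched status.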

## Error Scenarios and Fallbacks

### CLI Error Handling

| Scenario | CLI Behavior | User Message |
|----------|--------------|--------------|
| Registry offline | Use cached tools if available | "Registry unavailable. Using cached version." |
| Tool not found | Check cache, then fail | "Tool 'foo/bar' not found in registry or cache." |
| Version constraint unsatisfiable | Show available versions | "No version matches '>=5.0.0'. Available: 1.0.0, 1.1.0, 1.2.0" |
| Auth token expired | Prompt for new token | "Token expired. Please re-authenticate." |
| Rate limited | Wait and retry (backoff) | "Rate limited. Retrying in 30 seconds..." |
| Network timeout | Retry with backoff, then fail | "Connection timed out. Check your network." |

### Validation Failure Details

When `VALIDATION_ERROR` occurs, provide specific field errors:

```json
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Tool configuration is invalid",
    "details": {
      "errors": [
        {
          "path": "steps[0].provider",
          "message": "Provider 'gpt5' is not recognized",
          "allowed": ["claude", "openai", "ollama", "mock"]
        },
        {
          "path": "version",
          "message": "Version '1.0' is not valid semver (use '1.0.0')"
        }
      ]
    },
    "docs_url": "https://registry.smarttools.dev/docs/tool-format"
  }
}
```
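
A validator producing this `errors` list might look like the sketch below. It covers only the two checks shown in the example; `validate_config` is a hypothetical name, the provider list is copied from the example response, and the semver pattern is deliberately simplified (no pre-release or build metadata).

```python
import re

ALLOWED_PROVIDERS = ["claude", "openai", "ollama", "mock"]
SEMVER_RE = re.compile(r"\d+\.\d+\.\d+")  # simplified MAJOR.MINOR.PATCH only

def validate_config(config):
    """Return a list of field errors; an empty list means the config passed."""
    errors = []
    for i, step in enumerate(config.get("steps", [])):
        provider = step.get("provider")
        if provider is not None and provider not in ALLOWED_PROVIDERS:
            errors.append({
                "path": f"steps[{i}].provider",
                "message": f"Provider '{provider}' is not recognized",
                "allowed": ALLOWED_PROVIDERS,
            })
    version = str(config.get("version", ""))
    if not SEMVER_RE.fullmatch(version):
        errors.append({
            "path": "version",
            "message": f"Version '{version}' is not valid semver",
        })
    return errors
```

Collecting every error before responding lets a publisher fix all problems in one pass instead of resubmitting once per field.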

### Dependency Resolution Failures

When `smarttools install` fails on a manifest:

```bash
$ smarttools install

Error: Could not resolve all dependencies

  rob/summarize@^2.0.0
    ✗ No matching version (latest: 1.2.0)

  alice/translate@>=1.0.0
    ✓ Found 1.3.0

Suggestions:
  - Update rob/summarize constraint to "^1.0.0"
  - Contact the tool author for a v2 release
```
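
The resolution step behind this output can be sketched as follows. It handles only the constraint forms that appear in this document (`^x.y.z`, `>=x.y.z`, `*`, and an exact version) and assumes npm-style caret semantics; `parse_version`, `satisfies`, and `resolve` are hypothetical names, and pre-release versions are not handled.

```python
def parse_version(version):
    """'1.2.0' -> (1, 2, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))

def satisfies(version, constraint):
    v = parse_version(version)
    if constraint == "*":
        return True
    if constraint.startswith(">="):
        return v >= parse_version(constraint[2:])
    if constraint.startswith("^"):
        low = parse_version(constraint[1:])
        # Caret allows changes that keep the leftmost non-zero component fixed
        if low[0] > 0:
            high = (low[0] + 1, 0, 0)
        elif low[1] > 0:
            high = (0, low[1] + 1, 0)
        else:
            high = (0, 0, low[2] + 1)
        return low <= v < high
    return v == parse_version(constraint)

def resolve(available, constraint):
    """Return the highest available version matching the constraint, or None."""
    matching = [v for v in available if satisfies(v, constraint)]
    return max(matching, key=parse_version) if matching else None
```

With versions `1.0.0`, `1.1.0`, and `1.2.0` available, `resolve(..., "^2.0.0")` returns `None`, which corresponds to the "No matching version" line in the output above.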

### Graceful Degradation

| Component Down | Fallback Behavior |
|----------------|-------------------|
| API server | CLI uses `~/.smarttools/registry/index.json` for search |
| Gitea repo | API serves from DB cache (may be stale) |
| FTS5 index | Fall back to LIKE queries (slower but works) |
| Network | Use locally installed tools, skip registry features |
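
The FTS5-to-`LIKE` fallback in the table can be sketched with the standard `sqlite3` module. The table and column names are assumptions, and `search_with_fallback` is a hypothetical helper. The demo database deliberately has no `tools_fts` table, which simulates an unavailable index.

```python
import sqlite3

def search_with_fallback(conn, query):
    """Search via FTS5; if the index is unavailable, fall back to LIKE."""
    try:
        # Quote the query so FTS5 treats it as a phrase, not as query syntax
        phrase = '"' + query.replace('"', '""') + '"'
        rows = conn.execute(
            "SELECT name FROM tools_fts WHERE tools_fts MATCH ? ORDER BY name",
            (phrase,),
        ).fetchall()
    except sqlite3.OperationalError:
        # Slower substring scan; % and _ in the query act as LIKE wildcards here
        pattern = f"%{query}%"
        rows = conn.execute(
            "SELECT name FROM tools WHERE name LIKE ? OR description LIKE ? ORDER BY name",
            (pattern, pattern),
        ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tools (name TEXT, description TEXT)")
conn.executemany("INSERT INTO tools VALUES (?, ?)", [
    ("summarize", "Summarize text using AI"),
    ("translate", "Translate text between languages"),
])
```

The two paths do not rank or tokenize identically, so results may differ slightly between them; the fallback trades relevance for availability.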

## UX Requirements (CLI/TUI)

### Publishing UX

- `smarttools registry publish --dry-run` validates locally and shows what would be submitted:
  ```bash
  $ smarttools registry publish --dry-run

  Validating tool...
  ✓ config.yaml is valid
  ✓ README.md exists (2.3 KB)
  ✓ Version 1.1.0 not yet published

  Would submit:
    Owner: rob
    Name: summarize
    Version: 1.1.0
    Category: text-processing
    Tags: summarization, ai, text

  Config preview:
  ─────────────────────────────
  name: summarize
  version: "1.1.0"
  description: Summarize text using AI
  ...
  ─────────────────────────────

  Run without --dry-run to submit for review.
  ```

- **Version bump reminder:** The CLI warns if the version in `config.yaml` matches an already published version:
  ```
  ⚠ Version 1.0.0 is already published. Bump version in config.yaml to publish changes.
  ```

- First-time publishing flow prompts for token and saves it to config.

### Progress Indicators

Long-running operations show progress:

```bash
$ smarttools install

Installing project dependencies...
  [1/3] rob/summarize@^1.0.0
        Resolving version... 1.2.0
        Downloading... done
        Installing... done ✓
  [2/3] alice/translate@>=2.0.0
        Resolving version... 2.1.0
        Downloading... done
        Installing... done ✓
  [3/3] official/code-review@*
        Resolving version... 1.0.0
        Downloading... done
        Installing... done ✓

✓ Installed 3 tools
```

```bash
$ smarttools registry publish

Submitting rob/summarize@1.1.0...
  Validating... done ✓
  Uploading... done ✓
  Creating PR... done ✓

✓ PR created: https://gitea.brrd.tech/rob/SmartTools-Registry/pulls/42

Your tool is pending review. You'll receive an email when it's approved.
```

### TUI Browse

`smarttools registry browse` opens a full-screen terminal UI:

```
┌─ SmartTools Registry ───────────────────────────────────────┐
│ Search: [________________] [All Categories ▼] [Sort: Popular ▼] │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ▶ rob/summarize v1.2.0                          ⬇ 142     │
│    Summarize text using AI                                  │
│    [text-processing] [ai] [summarization]                   │
│                                                             │
│    alice/translate v2.1.0                        ⬇ 98      │
│    Translate text between languages                         │
│    [text-processing] [translation]                          │
│                                                             │
│    official/code-review v1.0.0                   ⬇ 87      │
│    AI-powered code review                                   │
│    [code] [review] [ai]                                     │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│ ↑↓ Navigate  Enter: Details  i: Install  /: Search  q: Quit │
└─────────────────────────────────────────────────────────────┘
```

**Keyboard shortcuts:**
| Key | Action |
|-----|--------|
| `↑/↓` or `j/k` | Navigate list |
| `Enter` | View tool details |
| `i` | Install selected tool |
| `/` | Focus search box |
| `c` | Change category filter |
| `s` | Change sort order |
| `?` | Show help |
| `q` | Quit |

**Virtual scrolling:** For large tool lists (>100), use virtual scrolling to maintain performance.

### Project Initialization

```bash
$ smarttools init

Creating smarttools.yaml...

Project name [my-project]: my-ai-project
Version [1.0.0]:

Would you like to add any tools? (search with 's', skip with Enter)
> s
Search: summ
  1. rob/summarize v1.2.0 - Summarize text using AI
  2. alice/summary v1.0.0 - Generate summaries

Add tool (number, or Enter to finish): 1
Added rob/summarize@^1.2.0

Add tool (number, or Enter to finish):

✓ Created smarttools.yaml

name: my-ai-project
version: "1.0.0"
dependencies:
  - name: rob/summarize
    version: "^1.2.0"

Run 'smarttools install' to install dependencies.
```

### Accessibility

- **CLI:** All output works with screen readers, no color-only information
- **TUI:** Full keyboard navigation, high-contrast mode support
- **Web UI:** WCAG 2.1 AA compliance target
  - Semantic HTML
  - ARIA labels for interactive elements
  - Focus management in modals
  - Skip links for navigation

## Offline Cache
Cache registry index locally:
```
~/.smarttools/registry/index.json
```
Refresh when older than 24 hours; support `--offline` and `--refresh` flags.
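
The refresh decision can be sketched as below. `should_refresh` is a hypothetical helper that uses the file's modification time as the cache age. Letting `--offline` take precedence over `--refresh` is an assumption, since the document does not say how the two flags interact.

```python
import os
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # refresh when older than 24 hours

def should_refresh(index_path, offline=False, refresh=False, now=None):
    """Decide whether the CLI should fetch a fresh index.json."""
    if offline:
        return False  # assumption: --offline wins over --refresh
    if refresh or not os.path.exists(index_path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(index_path) > MAX_AGE_SECONDS
```

Using the `generated_at` field from the index instead of the file's mtime would measure the age of the data rather than the age of the download.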

### Index Integrity

The cached `index.json` includes integrity metadata:

```json
{
  "version": "1.0",
  "generated_at": "2025-01-20T12:00:00Z",
  "checksum": "sha256:abc123...",
  "tool_count": 142,
  "tools": [...]
}
```

**API response headers:**
```
ETag: "abc123def456"
X-Index-Checksum: sha256:abc123...
X-Index-Generated: 2025-01-20T12:00:00Z
```

**CLI verification:**
```python
import hashlib
import json

def verify_cached_index():
    """Verify cached index integrity on load"""
    cached = load_cached_index()
    if not cached:
        return None

    # Missing fields indicate a partial write
    if 'tools' not in cached or 'checksum' not in cached:
        logger.warning("Cached index is missing fields, will refresh")
        return None

    # Verify checksum
    content = json.dumps(cached['tools'], sort_keys=True)
    computed = hashlib.sha256(content.encode()).hexdigest()

    if computed != cached['checksum'].replace('sha256:', ''):
        logger.warning("Cached index checksum mismatch, will refresh")
        return None

    return cached
```

**Corruption handling:**
- If checksum fails, discard cache and fetch fresh
- If partial write detected (missing fields), discard and refresh
- CLI shows warning: "Cached index corrupted, fetching fresh copy..."
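
Partial writes can be largely avoided by writing to a temporary file and renaming it into place. The sketch below is an assumption about how the cache could be written, not a description of the current CLI; `index_checksum` and `write_index_atomic` are hypothetical names, and the checksum uses the same `json.dumps(..., sort_keys=True)` serialization as the verification code above.

```python
import hashlib
import json
import os
import tempfile

def index_checksum(tools):
    """Checksum over the canonical JSON form of the tools list."""
    content = json.dumps(tools, sort_keys=True)
    return "sha256:" + hashlib.sha256(content.encode()).hexdigest()

def write_index_atomic(path, tools, generated_at):
    """Write index.json so readers see either the old file or the complete new one."""
    index = {
        "version": "1.0",
        "generated_at": generated_at,
        "checksum": index_checksum(tools),
        "tool_count": len(tools),
        "tools": tools,
    }
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(index, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The checksum must be computed with exactly the same serialization on both the writing and the verifying side; any difference in key order or separators yields a mismatch on otherwise intact data.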

## Web UI Vision
The registry includes a full website, not just an API:

**Site structure:**
```
registry.smarttools.dev (or gitea.brrd.tech/registry)
├── /                           # Landing page
├── /tools                      # Browse all tools
├── /tools/{owner}/{name}       # Tool detail page
├── /categories                 # Browse by category
├── /categories/{name}          # Tools in category
├── /search?q=...               # Search results
├── /docs                       # Documentation
│   ├── /docs/getting-started
│   ├── /docs/creating-tools
│   ├── /docs/publishing
│   └── /docs/best-practices
├── /tutorials                  # Step-by-step guides
│   ├── /tutorials/first-tool
│   ├── /tutorials/chaining-steps
│   └── /tutorials/code-steps
├── /examples                   # Example projects
├── /blog                       # Updates, announcements (optional)
├── /register                   # Publisher registration
├── /login                      # Publisher login
├── /dashboard                  # Publisher dashboard
│   ├── /dashboard/tools        # My published tools
│   ├── /dashboard/tokens       # API tokens
│   └── /dashboard/settings     # Account settings
└── /api/v1/...                 # API endpoints
```

**Landing page content:**
- Hero: "Share and discover AI-powered CLI tools"
- Quick install example
- Featured/popular tools
- Category highlights
- "Get Started" CTA

**Tool detail page:**
- Name, description, version, author
- README rendered as markdown (sanitized)
- Install command (copy-to-clipboard)
- Version history
- Download stats
- Category/tags
- "Report" button for abuse

### README Security

When rendering README markdown, apply XSS sanitization:

```python
import bleach
from markdown import markdown

ALLOWED_TAGS = [
    'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'p', 'br', 'hr',
    'ul', 'ol', 'li',
    'strong', 'em', 'code', 'pre',
    'blockquote',
    'a', 'img',
    'table', 'thead', 'tbody', 'tr', 'th', 'td'
]

ALLOWED_ATTRS = {
    'a': ['href', 'title'],
    'img': ['src', 'alt', 'title'],
    'code': ['class'],  # for syntax highlighting
}

def render_readme_safe(readme_raw: str) -> str:
    """Convert markdown to sanitized HTML"""
    # Convert markdown to HTML
    html = markdown(readme_raw, extensions=['fenced_code', 'tables'])

    # Sanitize to prevent XSS
    safe_html = bleach.clean(
        html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRS,
        strip=True
    )

    # Linkify URLs
    safe_html = bleach.linkify(safe_html)

    return safe_html
```

**Storage strategy:**
- Store raw README in `tools.readme`
- Render and sanitize on request (or cache rendered HTML)
- Never trust client-submitted HTML directly

**Tech stack options:**
| Option | Pros | Cons |
|--------|------|------|
| Flask + Jinja + Tailwind | Simple, Python-only, fast to build | Less interactive |
| FastAPI + Vue/React SPA | Modern, interactive | More complex, separate build |
| Astro/Next.js | Great SEO, static-first | Different stack (Node.js) |

**Recommendation:** Flask + Jinja + Tailwind for v1
- Keeps everything in Python
- Server-rendered is fine for a registry
- Good SEO out of the box
- Can add interactivity with Alpine.js or htmx if needed

**Monetization considerations:**
- AdSense-compatible (server-rendered pages)
- Analytics tracking for traffic insights
- Future: sponsored tools, featured placements
- Future: premium publisher tiers (more tools, priority review)

## Implementation Phases

### Phase 1: Foundation
- Define `smarttools.yaml` manifest format
- Implement tool resolution order (local → global → registry)
- Create SmartTools-Registry repo on Gitea (bootstrap)
- Add 3-5 example tools to seed the registry

### Phase 2: Core Backend
- Set up Flask/FastAPI project structure
- Implement SQLite database schema
- Build core API endpoints (list, search, get, download)
- Implement webhook receiver for Gitea sync
- Set up HMAC verification

### Phase 3: CLI Commands
- `smarttools registry search`
- `smarttools registry install`
- `smarttools registry info`
- `smarttools registry browse` (TUI)
- Local index caching

### Phase 4: Publishing
- Publisher registration (web UI)
- Token management
- `smarttools registry publish` command
- PR creation via Gitea API
- CI validation workflows

### Phase 5: Project Dependencies
- `smarttools install` (from manifest)
- `smarttools add` command
- Runtime override application
- Dependency resolution

### Phase 6: Smart Features
- SQLite FTS5 search index
- AI-powered auto-categorization
- Duplicate/similarity detection
- Security scanning

### Phase 7: Full Web UI
- Landing page
- Tool browsing/search pages
- Tool detail pages with README rendering
- Publisher dashboard
- Documentation/tutorials section

### Phase 8: Polish & Scale
- Rate limiting
- Abuse reporting
- Analytics integration
- Performance optimization
- Monitoring/alerting