# SmartTools Registry Design ## Purpose Build a centralized registry for SmartTools to enable discovery, publishing, dependency management, and future curation at scale. ## Terminology | Term | Definition | |------|------------| | **Tool definition** | The full YAML file in the registry (`config.yaml`) containing name, steps, arguments, etc. | | **Tool config** | The configuration within a tool definition (arguments, steps, provider settings) | | **smarttools.yaml** | Project manifest file declaring tool dependencies and overrides | | **config.yaml** | The tool definition file, both in registry and when installed locally | | **Owner** | Immutable namespace slug identifying the publisher (e.g., `rob`, `alice`) | | **Publisher** | A registered user who can publish tools to the registry | | **Wrapper script** | Auto-generated bash script in `~/.local/bin/` that invokes a tool | **Canonical naming:** Use `SmartTools-Registry` (capitalized, hyphenated) for the repository name. ## Diagram References - System overview: `discussions/diagrams/smarttools-registry_rob_1.puml` - Data flows: `discussions/diagrams/smarttools-registry_rob_5.puml` ## System Overview Users interact via the CLI and a future Web UI. Both call a Registry API hosted at `https://gitea.brrd.tech/api/v1` (future alias: `registry.smarttools.dev/api/v1`). The API syncs from a Gitea-backed registry repo and maintains a SQLite cache/search index. **Canonical API base path:** `https://gitea.brrd.tech/api/v1` All API endpoints are versioned under `/api/v1`. When breaking changes are needed, a new version (`/api/v2`) will be introduced with deprecation notices. 
Core API endpoints: - `GET /api/v1/tools` - `GET /api/v1/tools/search?q=...` - `GET /api/v1/tools/{owner}/{name}` - `GET /api/v1/tools/{owner}/{name}/versions` - `GET /api/v1/tools/{owner}/{name}/download?version=...` - `POST /api/v1/tools` (publish) - `GET /api/v1/categories` - `GET /api/v1/stats/popular` - `POST /api/v1/webhook/gitea` ### Pagination All list endpoints support pagination: | Parameter | Default | Max | Description | |-----------|---------|-----|-------------| | `page` | 1 | - | Page number (1-indexed) | | `per_page` | 20 | 100 | Items per page | | `sort` | `downloads` | - | Sort field | | `order` | `desc` | - | Sort order (asc/desc) | **Stable ordering:** To ensure deterministic results across pages, sorting includes a secondary key: - Primary: requested field (e.g., `downloads`) - Secondary: `published_at` (desc) - Tertiary: `id` (for absolute stability) ```sql ORDER BY downloads DESC, published_at DESC, id DESC LIMIT 20 OFFSET 0 ``` **Response pagination metadata:** ```json { "data": [...], "meta": { "page": 1, "per_page": 20, "total": 142, "total_pages": 8 } } ``` ### Input Constraints Size limits to prevent oversized uploads: | Field | Max Size | Notes | |-------|----------|-------| | `config.yaml` | 64 KB | Tool definition | | `README.md` | 256 KB | Documentation | | Request body | 512 KB | Total POST payload | | Tool name | 64 chars | Alphanumeric + hyphen | | Description | 500 chars | Short summary | | Tag | 32 chars | Individual tag | | Tags array | 10 items | Maximum tags per tool | **Validation errors:** ```json { "error": { "code": "PAYLOAD_TOO_LARGE", "message": "config.yaml exceeds 64KB limit", "details": { "field": "config", "size": 72000, "limit": 65536 } } } ``` ### Sort Fields and Indexes **Allowed sort fields:** | Endpoint | Allowed `sort` values | |----------|----------------------| | `GET /tools` | `downloads`, `published_at`, `name` | | `GET /tools/search` | `relevance`, `downloads`, `published_at` | | `GET /categories` | 
`name`, `tool_count` | Invalid sort values return 400: ```json {"error": {"code": "INVALID_SORT", "message": "Unknown sort field 'foo'. Allowed: downloads, published_at, name"}} ``` **Database indexes:** ```sql -- Frequent query patterns CREATE INDEX idx_tools_owner_name ON tools(owner, name); CREATE INDEX idx_tools_category ON tools(category); CREATE INDEX idx_tools_published_at ON tools(published_at DESC); CREATE INDEX idx_tools_downloads ON tools(downloads DESC); CREATE INDEX idx_tools_owner_name_version ON tools(owner, name, version); -- For pagination stability CREATE INDEX idx_tools_sort_stable ON tools(downloads DESC, published_at DESC, id DESC); -- Publisher lookups CREATE INDEX idx_publishers_slug ON publishers(slug); CREATE INDEX idx_publishers_email ON publishers(email); -- Token lookups CREATE INDEX idx_api_tokens_hash ON api_tokens(token_hash); CREATE INDEX idx_api_tokens_publisher ON api_tokens(publisher_id); ``` ### API Version Compatibility **Forward compatibility:** Clients should ignore unknown fields in API responses: ```python # Good: ignore unknown fields tool = response['data'] name = tool.get('name') # Don't fail if 'new_field' exists but client doesn't know about it # Bad: strict parsing that fails on unknown fields tool = ToolSchema.parse(response['data']) # May fail on new fields ``` **Backward compatibility:** The API will: - Never remove fields in a version (only deprecate) - Never change field types - Add new optional fields without version bump - Use new version (`/api/v2`) for breaking changes **Deprecation process:** 1. Add `X-Deprecated-Field: old_field` header 2. Document in changelog 3. Remove after 6 months minimum 4. Major version bump if widely used **Client version header:** ``` X-SmartTools-Client: cli/1.2.0 ``` Helps server track client versions for deprecation decisions. ## Source of Truth - Gitea registry repo is the source of truth. - API syncs repo content into SQLite for fast queries, stats, and FTS5 search. 
- `index.json` remains useful for offline CLI search and as a fallback.

If the cache is stale, the API can fall back to repo reads; a warning header may be emitted.

## Namespacing and Paths

Support owner/name from day one:

- Registry path: `tools/{owner}/{name}/config.yaml`
- API URL: `/tools/{owner}/{name}`
- Install: `smarttools registry install rob/summarize`
- Shorthand: `smarttools registry install summarize` resolves to the official namespace.

PR branches: `submit/{owner}/{name}/{version}`.

### Namespace Identity

The `owner` is an **immutable slug**, not the display name:

```sql
-- In publishers table
slug TEXT UNIQUE NOT NULL,       -- immutable: "rob", "alice-dev"
display_name TEXT NOT NULL,      -- mutable: "Rob", "Alice Developer"
```

**Slug rules:**

- Lowercase alphanumeric + hyphens only: `^[a-z0-9][a-z0-9-]*[a-z0-9]$`
- 2-39 characters
- Cannot start/end with hyphen
- Set once at registration, cannot be changed
- Reserved slugs: `official`, `admin`, `system`, `api`, `registry`, `smarttools`

**Rename policy:**

- `display_name` can be changed anytime via dashboard
- `slug` (owner) is permanent to preserve URLs and tool references
- If a publisher absolutely must change slug (legal reasons, etc.):
  1. Create new account with new slug
  2. Republish tools under new namespace
  3. Mark old tools as deprecated with `replacement` pointing to new namespace
  4. Old namespace remains reserved (cannot be reused by others)

**Why immutable:**

- `rob/summarize@1.0.0` must always resolve to the same tool
- Prevents namespace hijacking after rename
- Simplifies caching and CDN strategies

## Tool Format (Registry == Local)

Registry tool folders mirror local tools:

```
tools/
  rob/
    summarize/
      config.yaml
      README.md
```

Tool files match the existing SmartTools format. Registry-specific metadata is kept under `registry:`. Deprecation is tool-defined and top-level:

```yaml
name: summarize
version: "1.2.0"
deprecated: true
deprecated_message: "Security issue.
Use v1.2.1" replacement: "rob/summarize@1.2.1" registry: published_at: "2025-01-15T10:30:00Z" downloads: 142 ``` **Schema compatibility note:** The current SmartTools config parser may reject unknown top-level keys like `deprecated`, `replacement`, and `registry`. Before implementing registry features: 1. Update the YAML parser to ignore unknown keys (permissive mode) 2. Or explicitly define these fields in the Tool dataclass with defaults 3. Validate registry-specific fields only when publishing, not when running locally This ensures local tools continue to work even if they don't have registry fields. ## Versioning and Immutability - Unique key: `owner/name + version`. - Published versions are immutable. - Deprecation uses `deprecated`, `deprecated_message`, and `replacement`. - CLI warns on install if a version is deprecated. ### Yank Policy Yanking allows removing a version from resolution without deleting it (for auditability): ```yaml # In tool config yanked: true yanked_reason: "Critical security vulnerability CVE-2025-1234" yanked_at: "2025-01-20T15:00:00Z" ``` **Yanked version behavior:** | Operation | Behavior | |-----------|----------| | `install foo@1.0.0` (exact) | Warns but allows install | | `install foo@^1.0.0` (constraint) | Excludes yanked, resolves to next valid | | `search` / `browse` | Hidden by default, shown with `--include-yanked` | | Direct URL access | Returns tool with `yanked: true` in response | | Already installed | Continues to work, no forced removal | **Database schema addition:** ```sql -- Add to tools table yanked BOOLEAN DEFAULT FALSE, yanked_reason TEXT, yanked_at TIMESTAMP ``` **Yank vs Delete:** - **Yank**: Version remains in DB, excluded from resolution, auditable - **Delete**: Reserved for DMCA/legal, requires admin action, leaves tombstone record ### Version Format Tools use semantic versioning (semver): ``` MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD] Examples: 1.0.0 # stable release 1.2.3 # stable release 2.0.0-alpha.1 # 
prerelease 2.0.0-beta.2 # prerelease 2.0.0-rc.1 # release candidate ``` ### Version Constraints Manifest files support these constraint formats: | Constraint | Meaning | Example Match | |------------|---------|---------------| | `1.2.3` | Exact version | `1.2.3` only | | `>=1.2.0` | Minimum version | `1.2.0`, `1.3.0`, `2.0.0` | | `<2.0.0` | Below version | `1.9.9`, `1.0.0` | | `>=1.0.0,<2.0.0` | Range | `1.0.0` to `1.9.9` | | `^1.2.3` | Compatible (same major) | `1.2.3` to `1.9.9` | | `~1.2.3` | Approximately (same minor) | `1.2.3` to `1.2.9` | | `*` | Any version | latest stable | ### Version Resolution Rules When resolving a version constraint: 1. **Filter**: Get all versions matching the constraint 2. **Exclude prereleases**: Unless constraint explicitly includes them (e.g., `>=2.0.0-alpha.1`) 3. **Sort**: By semver precedence (descending) 4. **Select**: Highest matching version **Tie-breakers:** - Stable versions preferred over prereleases - Later publish date wins if versions are equal (shouldn't happen with immutability) **Unsatisfiable constraints:** ```json // API Response: 404 { "error": { "code": "VERSION_NOT_FOUND", "message": "No version of 'rob/summarize' satisfies constraint '>=5.0.0'", "details": { "tool": "rob/summarize", "constraint": ">=5.0.0", "available_versions": ["1.0.0", "1.1.0", "1.2.0"], "latest_stable": "1.2.0" } } } ``` ### Prerelease Handling - Prereleases are **not** returned for `*` or range constraints by default - To install prerelease: `smarttools registry install rob/summarize@2.0.0-beta.1` - To allow prereleases in manifest: `version: ">=2.0.0-0"` (the `-0` suffix includes prereleases) ### Download Endpoint Version Selection The `/api/v1/tools/{owner}/{name}/download` endpoint accepts version parameters: | Parameter | Behavior | Example | |-----------|----------|---------| | (none) | Returns latest stable version | `/download` → `1.2.0` | | `version=1.2.0` | Exact version (must exist) | `/download?version=1.2.0` | | 
`version=^1.0.0` | Server resolves constraint | `/download?version=^1.0.0` → `1.2.0` | | `version=latest` | Alias for latest stable | `/download?version=latest` | **Server-side resolution:** The API server resolves version constraints, not the client. This ensures consistent resolution and allows the server to apply policies (e.g., exclude yanked versions). ``` GET /api/v1/tools/rob/summarize/download?version=^1.0.0&install=true Response (200): { "data": { "owner": "rob", "name": "summarize", "resolved_version": "1.2.0", "config": "... YAML content ..." }, "meta": { "constraint": "^1.0.0", "available_versions": ["1.0.0", "1.1.0", "1.2.0"] } } ``` **Invalid/unsatisfiable constraint:** ``` GET /api/v1/tools/rob/summarize/download?version=^5.0.0 Response (404): { "error": { "code": "CONSTRAINT_UNSATISFIABLE", "message": "No version matches constraint '^5.0.0'", "details": { "constraint": "^5.0.0", "latest_stable": "1.2.0", "available_versions": ["1.0.0", "1.1.0", "1.2.0"] } } } ``` ## Tool Resolution Order When a tool is invoked, the CLI searches in this order: 1. **Local project**: `./.smarttools///config.yaml` (or `./.smarttools//` for unnamespaced) 2. **Global user**: `~/.smarttools///config.yaml` 3. **Registry**: Fetch from API, install to global, then run 4. **Error**: `Tool '' not found` Step 3 only occurs if `auto_fetch_from_registry: true` in config (default: true). **Path convention:** Use `.smarttools/` (with leading dot) for both local and global to maintain consistency. Resolution also respects namespacing: - `summarize` → searches for any tool named `summarize`, prefers `official/summarize` if exists - `rob/summarize` → searches for exactly `rob/summarize` ### Official Namespace The slug `official` is reserved for curated, high-quality tools maintained by the registry administrators. 
- Shorthand `summarize` resolves to `official/summarize` if it exists
- If no `official/summarize`, falls back to most-downloaded tool named `summarize`
- To avoid ambiguity, always use full `owner/name` in manifests

Reserved slugs that cannot be registered: `official`, `admin`, `system`, `api`, `registry`, `smarttools`

## Auto-Fetch Behavior

When enabled (`auto_fetch_from_registry: true`), missing tools are automatically fetched:

```bash
$ summarize < file.txt
# Tool 'summarize' not found locally.
# Fetching from registry...
# Installed: official/summarize@1.2.0
# Running...
```

Behavior details:

- Fetches latest stable version unless pinned in `smarttools.yaml`
- Installs to `~/.smarttools///`
- Generates wrapper script in `~/.local/bin/`
- Subsequent runs use local copy (no re-fetch)

To disable (require explicit install):

```yaml
# ~/.smarttools/config.yaml
auto_fetch_from_registry: false
```

### Wrapper Script Collisions

When two tools from different owners have the same name:

| Scenario | Behavior |
|----------|----------|
| Install `official/summarize` | Creates wrapper `~/.local/bin/summarize` |
| Install `rob/summarize` (collision) | Creates wrapper `~/.local/bin/rob-summarize` |
| Uninstall `official/summarize` | Removes `summarize` wrapper, promotes `rob-summarize` → `summarize` if desired |

The first-installed tool with a given name gets the short wrapper. Subsequent tools use `owner-name` format.

To invoke a specific owner's tool:

```bash
# Short form (whichever was installed first)
summarize < file.txt

# Explicit owner form (always works)
rob-summarize < file.txt

# Or via smarttools run
smarttools run rob/summarize < file.txt
```

## Project Manifest (smarttools.yaml)

Defines tool dependencies with optional runtime overrides:

```yaml
name: my-ai-project
version: "1.0.0"
dependencies:
  - name: rob/summarize
    version: ">=1.0.0"
overrides:
  rob/summarize:
    provider: ollama
```

Overrides are applied at runtime and do not mutate installed tool configs.
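
The runtime-override rule can be sketched roughly as follows. This is a minimal illustration, not the SmartTools implementation: `apply_overrides` is a hypothetical helper, the `provider: claude` value in the installed config is made up for the example, and overrides are assumed to be a flat mapping of top-level keys per tool.

```python
import copy


def apply_overrides(installed_config, manifest, tool_ref):
    """Return a new config with manifest overrides applied.

    The installed config is deep-copied first, so the copy on disk
    (and the dict passed in) is never mutated.
    """
    overrides = manifest.get("overrides", {}).get(tool_ref, {})
    effective = copy.deepcopy(installed_config)
    effective.update(overrides)  # assumption: overrides replace top-level keys
    return effective


# Hypothetical installed tool config and project manifest
installed = {"name": "summarize", "provider": "claude", "steps": [{"type": "prompt"}]}
manifest = {
    "dependencies": [{"name": "rob/summarize", "version": ">=1.0.0"}],
    "overrides": {"rob/summarize": {"provider": "ollama"}},
}

effective = apply_overrides(installed, manifest, "rob/summarize")
print(effective["provider"])  # ollama
print(installed["provider"])  # claude (unchanged)
```

Working on a copy keeps two projects that depend on the same globally installed tool from seeing each other's overrides.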
## CLI Config and Tokens Global config lives in `~/.smarttools/config.yaml`: ```yaml registry: url: https://gitea.brrd.tech/api/v1 # Must match canonical base path token: "reg_xxxxxxxxxxxx" client_id: "anon_abc123def456" auto_fetch_from_registry: true ``` `client_id` is generated locally and used for anonymous install dedupe. ## Publishing and Auth Publishing uses registry accounts, not Gitea accounts: - Public endpoints require no auth. - `POST /tools` requires a registry token. - The API server uses a private Gitea service account to open PRs. ### Publish Idempotency and Edge Cases **Idempotency key:** `owner/name@version` | Scenario | API Response | HTTP Code | |----------|--------------|-----------| | New version, no PR exists | Create PR, return URL | `201 Created` | | PR already exists (pending) | Return existing PR URL | `200 OK` | | Version already published | Error: version exists | `409 Conflict` | | PR was closed without merge | Allow new PR | `201 Created` | | PR was merged, then tool deleted | Error: version exists (tombstone) | `409 Conflict` | **Version immutability enforcement:** ```json // Attempt to publish existing version // Response: 409 Conflict { "error": { "code": "VERSION_EXISTS", "message": "Version 1.2.0 of 'rob/summarize' already exists and cannot be overwritten", "details": { "published_at": "2025-01-15T10:30:00Z", "action": "Bump version number to publish changes" } } } ``` **Closed PR handling:** - Track PR state in database: `pending`, `merged`, `closed` - If PR was closed (rejected/abandoned), allow new submission for same version - If PR was merged, version is immutable forever **Update flow (new version, not overwrite):** 1. Developer modifies tool locally 2. Bumps version in `config.yaml` (e.g., `1.2.0` → `1.3.0`) 3. Runs `smarttools registry publish` 4. New PR created for `1.3.0` 5. Old version `1.2.0` remains available ## Publisher Registration Publishers register on the registry website, not Gitea: **Registration flow:** 1. 
User visits `https://gitea.brrd.tech/registry/register` (or future `registry.smarttools.dev`) 2. Creates account with email + password + slug 3. Receives verification email (optional in v1, but track `verified` status) 4. Logs into dashboard at `/dashboard` 5. Generates API token from dashboard 6. Uses token in CLI for publishing ### Authentication Security **Password hashing:** - Algorithm: Argon2id (memory-hard, recommended by OWASP) - Parameters: `memory=65536, iterations=3, parallelism=4` - Library: `argon2-cffi` for Python ```python from argon2 import PasswordHasher ph = PasswordHasher(memory_cost=65536, time_cost=3, parallelism=4) hash = ph.hash(password) ph.verify(hash, password) # raises on mismatch ``` **API token format:** ``` reg_ Example: reg_7kX9mPqR2sT4vW6xY8zA1bC3dE5fG7hJ ``` - Prefix `reg_` for easy identification in logs/configs - 32 bytes of cryptographically random data - Base62 encoded (alphanumeric, no special chars) - Total length: ~47 characters - Stored as SHA-256 hash in database (never plain text) **Token lifecycle:** | Action | Behavior | |--------|----------| | Generate | Create new token, return once, store hash | | List | Show token name, created date, last used (not the token itself) | | Revoke | Set `revoked_at` timestamp, reject future uses | | Rotate | Generate new token, optionally revoke old | **Rate limits:** | Endpoint | Limit | Window | Scope | Retry-After | |----------|-------|--------|-------|-------------| | `POST /register` | 5 | 1 hour | IP | 3600 | | `POST /login` | 10 | 15 min | IP | 900 | | `POST /login` (failed) | 5 | 15 min | IP + email | 900 | | `POST /tokens` | 10 | 1 hour | Token | 3600 | | `POST /tools` | 20 | 1 hour | Token | 3600 | | `GET /tools/*` | 100 | 1 min | IP | 60 | | `GET /download` | 60 | 1 min | IP | 60 | **Rate limit response (429):** ```json { "error": { "code": "RATE_LIMITED", "message": "Too many requests. 
Try again in 60 seconds.", "details": { "limit": 100, "window": "1 minute", "retry_after": 60 } } } ``` **Headers on rate-limited response:** ``` HTTP/1.1 429 Too Many Requests Retry-After: 60 X-RateLimit-Limit: 100 X-RateLimit-Remaining: 0 X-RateLimit-Reset: 1705766400 ``` **Scope priority:** For authenticated requests, both IP and token limits apply. The more restrictive limit wins. **Account lockout:** - After 5 failed login attempts: 15-minute lockout for that email - After 10 failed attempts: 1-hour lockout - Lockout clears on successful password reset **Password reset flow (deferred to v1.1):** 1. User requests reset via email 2. Server generates time-limited token (1 hour expiry) 3. Email contains reset link with token 4. User sets new password 5. All existing sessions/tokens optionally invalidated **Email verification flow (deferred to v1.1):** 1. On registration, send verification email 2. User clicks link with verification token 3. Set `verified = true` in database 4. Unverified accounts can browse but not publish ### Token Scopes and Authorization Tokens have scopes that limit their capabilities: | Scope | Permissions | |-------|-------------| | `read` | View own published tools, download stats | | `publish` | Submit new tools, update own tool metadata | | `admin` | Yank tools, manage categories (registry admins only) | **Default scope:** New tokens get `read,publish` by default. **Ownership enforcement:** ```python @app.route('/api/v1/tools', methods=['POST']) @require_token(scopes=['publish']) def publish_tool(): token = get_current_token() tool_data = request.json # Enforce owner == token holder's slug if tool_data['owner'] != token.publisher.slug: return { "error": { "code": "FORBIDDEN", "message": f"Cannot publish to namespace '{tool_data['owner']}'. " f"Your namespace is '{token.publisher.slug}'." } }, 403 # Proceed with publish... 
``` **`GET /api/v1/me/tools` authorization:** - Requires valid token with `read` scope - Returns only tools where `owner == token.publisher.slug` - Includes pending PRs and all versions (including yanked) ### Web Session Security Dashboard login uses session cookies (not tokens) for browser auth: **Cookie settings:** ```python SESSION_COOKIE_NAME = 'smarttools_session' SESSION_COOKIE_HTTPONLY = True # Prevent JS access SESSION_COOKIE_SECURE = True # HTTPS only in production SESSION_COOKIE_SAMESITE = 'Lax' # CSRF protection SESSION_COOKIE_MAX_AGE = 86400 * 7 # 7 days ``` **CSRF protection:** - All POST/PUT/DELETE forms include `csrf_token` hidden field - Token validated server-side before processing - 403 Forbidden if token missing or invalid **Session lifecycle:** | Event | Action | |-------|--------| | Login | Create session, set cookie | | Logout | Delete session, clear cookie | | Idle 24h | Session expires, re-login required | | Password change | Invalidate all sessions | | Token revocation | Existing sessions continue (token != session) | **Secure session storage:** ```python # Store sessions in DB, not filesystem from flask_session import Session app.config['SESSION_TYPE'] = 'sqlalchemy' app.config['SESSION_SQLALCHEMY_TABLE'] = 'sessions' ``` **Database schema:** ```sql -- Publishers CREATE TABLE publishers ( id INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT UNIQUE NOT NULL, password_hash TEXT NOT NULL, slug TEXT UNIQUE NOT NULL, -- immutable namespace: "rob", "alice-dev" display_name TEXT NOT NULL, -- mutable: "Rob", "Alice Developer" bio TEXT, website TEXT, verified BOOLEAN DEFAULT FALSE, locked_until TIMESTAMP, -- account lockout failed_login_attempts INTEGER DEFAULT 0, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- API tokens (one publisher can have multiple) CREATE TABLE api_tokens ( id INTEGER PRIMARY KEY AUTOINCREMENT, publisher_id INTEGER NOT NULL REFERENCES publishers(id), token_hash TEXT NOT NULL, 
name TEXT NOT NULL, -- "CLI token", "CI token" last_used_at TIMESTAMP, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, revoked_at TIMESTAMP -- NULL if active ); -- Tools (links to publisher) CREATE TABLE tools ( id INTEGER PRIMARY KEY AUTOINCREMENT, owner TEXT NOT NULL, -- namespace slug (immutable, from publisher.slug) name TEXT NOT NULL, version TEXT NOT NULL, description TEXT, category TEXT, tags TEXT, -- JSON array config_yaml TEXT NOT NULL, -- Full tool config readme TEXT, publisher_id INTEGER NOT NULL REFERENCES publishers(id), deprecated BOOLEAN DEFAULT FALSE, deprecated_message TEXT, replacement TEXT, downloads INTEGER DEFAULT 0, published_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, UNIQUE(owner, name, version) ); -- Download stats (for deduplication) CREATE TABLE download_stats ( id INTEGER PRIMARY KEY AUTOINCREMENT, tool_id INTEGER NOT NULL REFERENCES tools(id), client_id TEXT NOT NULL, downloaded_at DATE NOT NULL, UNIQUE(tool_id, client_id, downloaded_at) ); -- Search index (FTS5) CREATE VIRTUAL TABLE tools_fts USING fts5( name, description, tags, readme, content='tools', content_rowid='id' ); -- FTS5 sync triggers (required for external content tables) CREATE TRIGGER tools_ai AFTER INSERT ON tools BEGIN INSERT INTO tools_fts(rowid, name, description, tags, readme) VALUES (new.id, new.name, new.description, new.tags, new.readme); END; CREATE TRIGGER tools_ad AFTER DELETE ON tools BEGIN INSERT INTO tools_fts(tools_fts, rowid, name, description, tags, readme) VALUES ('delete', old.id, old.name, old.description, old.tags, old.readme); END; CREATE TRIGGER tools_au AFTER UPDATE ON tools BEGIN INSERT INTO tools_fts(tools_fts, rowid, name, description, tags, readme) VALUES ('delete', old.id, old.name, old.description, old.tags, old.readme); INSERT INTO tools_fts(rowid, name, description, tags, readme) VALUES (new.id, new.name, new.description, new.tags, new.readme); END; -- Pending PRs (track publish state) CREATE TABLE pending_prs ( id INTEGER PRIMARY KEY 
AUTOINCREMENT, publisher_id INTEGER NOT NULL REFERENCES publishers(id), owner TEXT NOT NULL, name TEXT NOT NULL, version TEXT NOT NULL, pr_number INTEGER NOT NULL, pr_url TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'pending', -- pending, merged, closed created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, UNIQUE(owner, name, version) ); -- Webhook sync log (idempotency) CREATE TABLE webhook_log ( id INTEGER PRIMARY KEY AUTOINCREMENT, delivery_id TEXT UNIQUE NOT NULL, -- Gitea delivery ID event_type TEXT NOT NULL, processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); ``` **Note on tags indexing:** The `tags` column stores JSON arrays as text. For v1, FTS5 will search within the JSON string. If tag filtering becomes a bottleneck, normalize to a `tool_tags` junction table: ```sql -- Future: normalized tags (if needed) CREATE TABLE tags ( id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE NOT NULL ); CREATE TABLE tool_tags ( tool_id INTEGER REFERENCES tools(id), tag_id INTEGER REFERENCES tags(id), PRIMARY KEY (tool_id, tag_id) ); ``` **CLI first-time publish flow:** ```bash $ smarttools registry publish No registry account configured. 1. Register at: https://gitea.brrd.tech/registry/register 2. Generate a token from your dashboard 3. Enter your token below Registry token: ******** Token saved to ~/.smarttools/config.yaml Validating tool... ✓ config.yaml is valid ✓ README.md exists (2.3 KB) ✓ Version 1.0.0 not yet published Publishing rob/my-tool@1.0.0... ✓ PR created: https://gitea.brrd.tech/rob/SmartTools-Registry/pulls/42 Your tool is pending review. You'll receive an email when it's approved. 
``` ## CLI Commands Reference Full mapping of CLI commands to API calls: ### Registry Commands ```bash # Search for tools $ smarttools registry search [--category=] [--limit=20] → GET /api/v1/tools/search?q=&category=&limit=20 # Browse tools (TUI) $ smarttools registry browse [--category=] → GET /api/v1/tools?category=&page=1 → GET /api/v1/categories # View tool details $ smarttools registry info → GET /api/v1/tools// # Install a tool $ smarttools registry install [--version=] → GET /api/v1/tools///download?version=&install=true → Writes to ~/.smarttools///config.yaml → Generates ~/.local/bin/ wrapper (or - if collision) # Uninstall a tool $ smarttools registry uninstall → Removes ~/.smarttools/// → Removes wrapper script # Publish a tool $ smarttools registry publish [path] [--dry-run] → POST /api/v1/tools (with registry token) → Returns PR URL # List my published tools $ smarttools registry my-tools → GET /api/v1/me/tools (with registry token) # Update index cache $ smarttools registry update → GET /api/v1/index.json → Writes to ~/.smarttools/registry/index.json ``` ### Project Commands ```bash # Install project dependencies from smarttools.yaml $ smarttools install → Reads ./smarttools.yaml → For each dependency: GET /api/v1/tools///download?version=&install=true → Installs to ~/.smarttools/// # Add a dependency to smarttools.yaml $ smarttools add [--version=] → Adds to ./smarttools.yaml dependencies → Runs install for that tool # Show project dependencies status $ smarttools deps → Reads ./smarttools.yaml → Shows installed status for each dependency → Note: "smarttools list" is reserved for listing installed tools ``` **Command naming note:** `smarttools list` already exists to list locally installed tools. Use `smarttools deps` to show project manifest dependencies. 
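
As a rough illustration of the install mapping above, the download URL could be assembled like this. The helper name `build_download_url` is hypothetical, shorthand resolution (a bare tool name without an owner) is deliberately left out, and the only firm inputs are the base path and query parameters documented in this design.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://gitea.brrd.tech/api/v1"  # canonical base path from this document


def build_download_url(tool_ref, version=None, install=True, base=API_BASE):
    """Build the download URL for an 'owner/name' reference.

    Version constraints such as '^1.0.0' or '>=1.0.0' contain characters
    that must be percent-encoded in a query string; urlencode handles that.
    """
    owner, sep, name = tool_ref.partition("/")
    if not sep or not owner or not name:
        raise ValueError("expected a reference of the form owner/name")

    params = {}
    if version:
        params["version"] = version
    if install:
        params["install"] = "true"

    url = f"{base}/tools/{quote(owner, safe='')}/{quote(name, safe='')}/download"
    return f"{url}?{urlencode(params)}" if params else url


print(build_download_url("rob/summarize", "^1.0.0"))
```

Passing the constraint through unchanged matches the server-side resolution rule: the client only encodes the constraint and lets the API pick the version.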
### Flags available on most commands

| Flag | Description |
|------|-------------|
| `--offline` | Use cached index only, don't fetch |
| `--refresh` | Force refresh of cached data |
| `--json` | Output in JSON format |
| `--verbose` | Show detailed output |

## Webhooks and Security

### HMAC Verification

All Gitea webhooks are verified using HMAC-SHA256:

```python
import hmac
import hashlib

def verify_webhook(request, secret):
    signature = request.headers.get('X-Gitea-Signature')
    if not signature:
        return False

    # The HMAC must be computed over the raw, unparsed request bytes
    # (request.get_data() in Flask)
    expected = hmac.new(
        secret.encode(),
        request.get_data(),
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison to avoid timing leaks
    return hmac.compare_digest(signature, expected)
```

### Replay Protection

While sync is idempotent, implement basic replay protection:

```python
from datetime import datetime, timezone

def process_webhook(request):
    # Verify signature first so unauthenticated requests never reach the log
    if not verify_webhook(request, WEBHOOK_SECRET):
        return {"error": "invalid_signature"}, 401

    delivery_id = request.headers.get('X-Gitea-Delivery')

    # Check if already processed
    if db.webhook_log.exists(delivery_id=delivery_id):
        return {"status": "already_processed"}, 200

    # Process with lock to prevent concurrent processing
    with db.lock(f"webhook:{delivery_id}"):
        # Double-check after acquiring lock
        if db.webhook_log.exists(delivery_id=delivery_id):
            return {"status": "already_processed"}, 200

        # Process the webhook
        result = sync_from_repo()

        # Log successful processing
        db.webhook_log.insert(
            delivery_id=delivery_id,
            event_type=request.json.get('action'),
            processed_at=datetime.now(timezone.utc)
        )

    return {"status": "processed"}, 200
```

### Sync Job Locking

Prevent concurrent sync operations:

```python
from glob import glob

# Using file lock or database advisory lock
SYNC_LOCK_TIMEOUT = 300  # 5 minutes max

def sync_from_repo():
    try:
        with acquire_lock("registry_sync", timeout=SYNC_LOCK_TIMEOUT):
            # Pull latest from Gitea
            repo.fetch()
            repo.reset('origin/main', hard=True)

            # Parse and update database
            for tool_path in glob('tools/*/*/config.yaml'):
                update_tool_in_db(tool_path)

            # Rebuild FTS index if needed
            rebuild_fts_index()
    except LockTimeout:
        logger.warning("Sync already in progress, skipping")
        return {"status": "skipped", "reason": "sync_in_progress"}
```

### Atomic Sync Strategy

To avoid a partially updated database during webhook sync, use a transactional table swap:

```python
from datetime import datetime, timezone
from glob import glob

def sync_from_repo_atomic():
    with acquire_lock("registry_sync", timeout=SYNC_LOCK_TIMEOUT):
        # 1. Pull latest from Gitea
        repo.fetch()
        repo.reset('origin/main', hard=True)

        # 2. Parse all tools into memory
        new_tools = []
        for tool_path in glob('tools/*/*/config.yaml'):
            tool_data = parse_tool(tool_path)
            if tool_data:
                new_tools.append(tool_data)

        # 3. Atomic swap using transaction
        with db.transaction():
            # Create temp table.
            # NOTE: CREATE TABLE ... AS SELECT copies columns only. The UNIQUE
            # constraint, indexes, and FTS triggers live on the old table and
            # are dropped with it, so re-create them inside this transaction.
            db.execute("CREATE TABLE tools_new AS SELECT * FROM tools WHERE 0")

            # Bulk insert into temp table
            for tool in new_tools:
                db.execute("INSERT INTO tools_new ...", tool)

            # Swap tables atomically
            db.execute("ALTER TABLE tools RENAME TO tools_old")
            db.execute("ALTER TABLE tools_new RENAME TO tools")
            db.execute("DROP TABLE tools_old")

            # Rebuild FTS index
            db.execute("INSERT INTO tools_fts(tools_fts) VALUES('rebuild')")

            # Update sync timestamp
            db.execute("UPDATE sync_status SET last_sync = ?", [datetime.now(timezone.utc)])
```

**Why atomic:** Per-row updates with FTS triggers can yield inconsistent reads under load. Readers may see partial state mid-sync. Table swap ensures all-or-nothing visibility.
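
The all-or-nothing property can be checked with a small runnable sketch using the standard-library `sqlite3` module. It assumes a simplified three-column `tools` table and a hypothetical helper `swap_tools_table`; the replacement table is created with an explicit schema because `CREATE TABLE ... AS SELECT` in SQLite does not carry over constraints.

```python
import sqlite3

TOOLS_COLUMNS = (
    "id INTEGER PRIMARY KEY, "
    "owner TEXT NOT NULL, name TEXT NOT NULL, version TEXT NOT NULL, "
    "UNIQUE(owner, name, version)"
)


def swap_tools_table(conn, new_rows):
    """Replace the contents of `tools` in a single transaction."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(f"CREATE TABLE tools_new ({TOOLS_COLUMNS})")
        conn.executemany(
            "INSERT INTO tools_new (owner, name, version) VALUES (?, ?, ?)",
            new_rows,
        )
        conn.execute("ALTER TABLE tools RENAME TO tools_old")
        conn.execute("ALTER TABLE tools_new RENAME TO tools")
        conn.execute("DROP TABLE tools_old")
        conn.execute("COMMIT")
    except Exception:
        # Any failure undoes the DDL and the inserts together
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK explicitly
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(f"CREATE TABLE tools ({TOOLS_COLUMNS})")
conn.execute("INSERT INTO tools (owner, name, version) VALUES ('rob', 'summarize', '1.0.0')")

swap_tools_table(conn, [("rob", "summarize", "1.2.0"), ("alice", "code-review", "0.1.0")])
rows = conn.execute("SELECT owner, name, version FROM tools ORDER BY id").fetchall()
print(rows)
```

If any insert fails, for example on a duplicate `owner/name/version`, the rollback leaves the previous `tools` table untouched, which is the behavior the swap strategy relies on.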
### Error Handling

| Error Scenario | Behavior |
|----------------|----------|
| Repo fetch fails | Log error, retry in 5 min, alert after 3 failures |
| YAML parse error | Skip tool, log error, continue with others |
| Database write fails | Rollback transaction, retry once, then alert |
| Lock timeout | Skip this sync, next webhook will retry |

## Automated CI Validation

PRs are validated automatically using SmartTools (dogfooding):

```
PR Submitted
                │
                ▼
┌─────────────────────────────────────┐
│  Gitea CI runs validation tools:    │
│  • schema-validator                 │
│  • security-scanner                 │
│  • duplicate-detector               │
└───────────────┬─────────────────────┘
                │
        ┌───────┴───────┐
        │               │
    All pass        Any fail
        │               │
        ▼               ▼
  Auto-merge or    Add comment,
  flag for review  request changes
```

Validation checks:

1. **Schema validation**: config.yaml matches expected format
2. **Security scan**: No dangerous shell commands, no secrets in prompts
3. **Duplicate detection**: AI-powered similarity check against existing tools
4. **README check**: README.md exists and is non-empty

CI workflow (`.gitea/workflows/validate.yaml`):

```yaml
name: Validate Tool Submission
on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Validate schema
        run: python scripts/validate_tool.py ${{ github.event.pull_request.head.sha }}
      - name: Security scan
        run: smarttools run security-scanner < changed_files.txt
      - name: Check duplicates
        run: smarttools run duplicate-detector < changed_files.txt
```

## Registry Repository Structure

Full structure of the SmartTools-Registry repo:

```
SmartTools-Registry/
├── README.md                    # Registry overview
├── CONTRIBUTING.md              # How to submit tools
├── LICENSE
│
├── tools/                       # All published tools
│   ├── rob/
│   │   ├── summarize/
│   │   │   ├── config.yaml
│   │   │   └── README.md
│   │   └── translate/
│   │       ├── config.yaml
│   │       └── README.md
│   └── alice/
│       └── code-review/
│           ├── config.yaml
│           └── README.md
│
├── categories/
│   └── categories.yaml          # Category definitions
│
├── index.json                   # Auto-generated search index
│
├── .gitea/
│   └── workflows/
│       ├── validate.yaml        # PR validation
│       ├── build-index.yaml     # Rebuild index on merge
│       └── notify-api.yaml      # Webhook to API server
│
└── scripts/
    ├── validate_tool.py         # Schema validation
    ├── build_index.py           # Generate index.json
    ├── check_duplicates.py      # Similarity detection
    └── security_scan.py         # Security checks
```

`categories.yaml` format:

```yaml
categories:
  - name: text-processing
    description: Tools for manipulating and analyzing text
    icon: 📝
  - name: code
    description: Tools for code review, generation, and analysis
    icon: 💻
  - name: data
    description: Tools for data transformation and analysis
    icon: 📊
  - name: media
    description: Tools for image, audio, and video processing
    icon: 🎨
  - name: productivity
    description: General productivity and automation tools
    icon: ⚡
```

## Download Stats

### Counting Methodology

- Count installs only, not views or searches
- Increment **after** successful download (response sent)
- Dedupe by `client_id + tool_id + date`

```python
import hashlib
from datetime import date

def download_tool(owner, name, version, install=False, client_id=None):
    tool = get_tool(owner, name, version)
    if not tool:
        return {"error": "not_found"}, 404

    config_yaml = tool.config_yaml

    # Only count if this is an install (not just viewing)
    if install:
        record_download(tool.id, client_id)

    return {"config": config_yaml}, 200

def record_download(tool_id, client_id):
    today = date.today()

    # Use client_id if provided, otherwise derive an anonymous fallback from the IP.
    # hashlib gives a stable digest; the built-in hash() is salted per process,
    # which would defeat the daily dedupe.
    if client_id:
        effective_client_id = client_id
    else:
        ip_digest = hashlib.sha256(request.remote_addr.encode()).hexdigest()[:16]
        effective_client_id = f"anon_{ip_digest}"

    # Dedupe: only count once per client per tool per day
    try:
        db.download_stats.insert(
            tool_id=tool_id,
            client_id=effective_client_id,
            downloaded_at=today
        )
        # Increment counter (can be async/batch updated)
        db.execute("UPDATE tools SET downloads = downloads + 1 WHERE id = ?", [tool_id])
    except IntegrityError:
        pass  # Already counted today, ignore
```

### Client ID Generation

CLI generates a persistent anonymous ID on first run:

```python
# In CLI, on first run
import uuid
import os

CONFIG_PATH = os.path.expanduser("~/.smarttools/config.yaml")

def get_or_create_client_id():
    config = load_config()
    if 'client_id' not in config:
        config['client_id'] = f"anon_{uuid.uuid4().hex[:16]}"
        save_config(config)
    return config['client_id']
```

**Fallback when client_id missing:**

- If header `X-Client-ID` not sent, use IP hash as fallback
- This still provides some dedupe for anonymous users
- Logged-in users' downloads are attributed to their account instead

### Privacy Considerations

- No IP addresses stored in database
- `client_id` is client-controlled and can be regenerated
- Stats are aggregated (total count), not individual tracking

### Async Stats Strategy

To avoid DB contention on the hot download path:

```python
import logging
import time
from datetime import date
from queue import Queue, Empty
from threading import Thread

logger = logging.getLogger(__name__)

# In-memory queue for stats
stats_queue = Queue()

FLUSH_INTERVAL = 5  # seconds

def record_download_async(tool_id, client_id):
    """Non-blocking: enqueue for background processing"""
    stats_queue.put({
        'tool_id': tool_id,
        'client_id': client_id,
        'date': date.today()
    })

def stats_worker():
    """Background thread: batch process stats every 5 seconds"""
    batch = []
    last_flush = time.monotonic()
    while True:
        try:
            batch.append(stats_queue.get(timeout=FLUSH_INTERVAL))
        except Empty:
            pass
        # Flush on the interval even under steady load,
        # so the batch cannot grow without bound
        if batch and time.monotonic() - last_flush >= FLUSH_INTERVAL:
            flush_batch(batch)
            batch = []
            last_flush = time.monotonic()

def flush_batch(batch):
    """Bulk insert with conflict ignore"""
    with db.transaction():
        for item in batch:
            try:
                db.execute("""
                    INSERT INTO download_stats (tool_id, client_id, downloaded_at)
                    VALUES (?, ?, ?)
                    ON CONFLICT DO NOTHING
                """, [item['tool_id'], item['client_id'], item['date']])
            except Exception as e:
                logger.warning(f"Stats insert failed: {e}")
                # Don't fail downloads for stats errors

# Start the worker once at application startup
Thread(target=stats_worker, daemon=True).start()
```

**Failure behavior:** If stats DB write fails, log the error but don't fail the download. Stats are "best effort" - the download must succeed.

## Search

- Primary search: SQLite FTS5 inside the API.
- `index.json` provides offline CLI search and backup.
- If FTS5 is stale, return results with `X-Search-Index-Stale: true`.

## API Caching Strategy

### Cache Headers

| Endpoint | Cache-Control | ETag | Notes |
|----------|---------------|------|-------|
| `GET /index.json` | `max-age=300, stale-while-revalidate=60` | Yes | 5 min cache, background refresh |
| `GET /tools/{owner}/{name}` | `max-age=60` | Yes | 1 min cache |
| `GET /tools/{owner}/{name}/download` | `max-age=3600, immutable` | Yes | Immutable versions, 1 hour |
| `GET /tools/search` | `no-cache` | No | Always fresh |
| `GET /categories` | `max-age=3600` | Yes | Categories change rarely |

### ETag Implementation

```python
import hashlib

def get_tool_etag(tool):
    """Generate ETag from tool identity (immutable versions don't change)"""
    # Since versions are immutable, owner/name@version is stable
    # Use published_at for extra safety (not updated_at, which doesn't exist)
    content = f"{tool.owner}/{tool.name}@{tool.version}:{tool.published_at.isoformat()}"
    return hashlib.md5(content.encode()).hexdigest()

def get_index_etag():
    """Generate ETag from last sync timestamp"""
    last_sync = db.get_last_sync_time()
    return hashlib.md5(last_sync.isoformat().encode()).hexdigest()

@app.route('/api/v1/tools/<owner>/<name>/download')
def download_tool(owner, name):
    version = request.args.get('version', 'latest')
    tool = resolve_and_get_tool(owner, name, version)

    etag = get_tool_etag(tool)

    # Check If-None-Match header (Werkzeug parses the quoted ETag list)
    if request.if_none_match.contains(etag):
        return '', 304  # Not Modified

    response = jsonify({
        "data": {
            "owner": tool.owner,
            "name": tool.name,
            "resolved_version": tool.version,
            "config": tool.config_yaml
        }
    })
    response.set_etag(etag)  # Sends the value quoted, as HTTP requires
    response.headers['Cache-Control'] = 'max-age=3600, immutable'
    return response
```

**Note:** Since tool versions are immutable, the ETag based on `owner/name@version` is permanently stable. The `published_at` timestamp is included for defense-in-depth but won't change.
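Clients send ETags in quoted form and may list several validators, or a weak `W/` prefix, in `If-None-Match`. For a framework-independent view of what matching involves, here is a minimal sketch. The helper name `etag_matches` is hypothetical and not part of the registry design, and the comma split assumes ETag values contain no commas (true for hex digests).

```python
def etag_matches(if_none_match, current_etag):
    """Return True if an If-None-Match header value matches current_etag.

    Hypothetical helper: accepts "*", comma-separated lists, quoted values,
    and weak validators (W/"..."). If-None-Match uses weak comparison,
    so the W/ prefix is ignored.
    """
    if not if_none_match:
        return False
    header = if_none_match.strip()
    if header == "*":
        return True
    for candidate in header.split(","):
        candidate = candidate.strip()
        if candidate.startswith("W/"):
            candidate = candidate[2:]
        # Compare the opaque value without its surrounding quotes
        if candidate.strip('"') == current_etag:
            return True
    return False
```

A handler could call `etag_matches(request.headers.get('If-None-Match'), etag)` and return 304 on a match.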
### DB vs Repo Read Strategy

| Scenario | Read From | Reason |
|----------|-----------|--------|
| Normal operation | SQLite DB | Fast, indexed, FTS |
| DB empty/corrupted | Gitea repo | Fallback/recovery |
| Webhook sync in progress | DB (stale OK) | Avoid blocking reads |
| Search query | SQLite FTS5 | Full-text search |
| Download specific version | DB, fallback to repo | DB is cache, repo is truth |

### Staleness Detection

```python
from datetime import datetime, timedelta

STALE_THRESHOLD = timedelta(minutes=10)

def is_db_stale():
    last_sync = db.get_last_sync_time()
    return datetime.utcnow() - last_sync > STALE_THRESHOLD

@app.route('/api/v1/tools/search')
def search_tools():
    # The query arrives as ?q=..., not as a route parameter
    q = request.args.get('q', '')
    results = db.search_fts(q)

    response = jsonify({"results": results})

    if is_db_stale():
        response.headers['X-Search-Index-Stale'] = 'true'
        response.headers['X-Last-Sync'] = db.get_last_sync_time().isoformat()

    return response
```

## Error Model

### Response Envelopes

**Success response:**

```json
{
  "data": { ... },
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 42,
    "total_pages": 3
  }
}
```

**Error response:**

```json
{
  "error": {
    "code": "TOOL_NOT_FOUND",
    "message": "Tool 'foo/bar' does not exist",
    "details": {
      "owner": "foo",
      "name": "bar",
      "suggestion": "Did you mean 'rob/bar'?"
    },
    "docs_url": "https://registry.smarttools.dev/docs/errors#TOOL_NOT_FOUND"
  }
}
```

### Error Codes

| Code | HTTP | Description |
|------|------|-------------|
| `TOOL_NOT_FOUND` | 404 | Tool does not exist |
| `VERSION_NOT_FOUND` | 404 | Requested version doesn't exist |
| `VERSION_EXISTS` | 409 | Cannot overwrite published version |
| `INVALID_VERSION` | 400 | Version string is not valid semver |
| `INVALID_CONSTRAINT` | 400 | Version constraint syntax error |
| `CONSTRAINT_UNSATISFIABLE` | 404 | No version matches constraint |
| `VALIDATION_ERROR` | 400 | Tool config validation failed |
| `UNAUTHORIZED` | 401 | Missing or invalid auth token |
| `FORBIDDEN` | 403 | Token valid but lacks permission |
| `RATE_LIMITED` | 429 | Too many requests |
| `SLUG_TAKEN` | 409 | Namespace slug already registered |
| `ACCOUNT_LOCKED` | 403 | Too many failed login attempts |
| `SERVER_ERROR` | 500 | Internal error (logged for debugging) |

## Error Scenarios and Fallbacks

### CLI Error Handling

| Scenario | CLI Behavior | User Message |
|----------|--------------|--------------|
| Registry offline | Use cached tools if available | "Registry unavailable. Using cached version." |
| Tool not found | Check cache, then fail | "Tool 'foo/bar' not found in registry or cache." |
| Version constraint unsatisfiable | Show available versions | "No version matches '>=5.0.0'. Available: 1.0.0, 1.1.0, 1.2.0" |
| Auth token expired | Prompt for new token | "Token expired. Please re-authenticate." |
| Rate limited | Wait and retry (backoff) | "Rate limited. Retrying in 30 seconds..." |
| Network timeout | Retry with backoff, then fail | "Connection timed out. Check your network." |

### Validation Failure Details

When `VALIDATION_ERROR` occurs, provide specific field errors:

```json
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Tool configuration is invalid",
    "details": {
      "errors": [
        {
          "path": "steps[0].provider",
          "message": "Provider 'gpt5' is not recognized",
          "allowed": ["claude", "openai", "ollama", "mock"]
        },
        {
          "path": "version",
          "message": "Version '1.0' is not valid semver (use '1.0.0')"
        }
      ]
    },
    "docs_url": "https://registry.smarttools.dev/docs/tool-format"
  }
}
```

### Dependency Resolution Failures

When `smarttools install` fails on a manifest:

```bash
$ smarttools install
Error: Could not resolve all dependencies

  rob/summarize@^2.0.0
    ✗ No matching version (latest: 1.2.0)

  alice/translate@>=1.0.0
    ✓ Found 1.3.0

Suggestions:
  - Update rob/summarize constraint to "^1.0.0"
  - Contact the tool author for a v2 release
```

### Graceful Degradation

| Component Down | Fallback Behavior |
|----------------|-------------------|
| API server | CLI uses `~/.smarttools/registry/index.json` for search |
| Gitea repo | API serves from DB cache (may be stale) |
| FTS5 index | Fall back to LIKE queries (slower but works) |
| Network | Use locally installed tools, skip registry features |

## UX Requirements (CLI/TUI)

### Publishing UX

- `smarttools registry publish --dry-run` validates locally and shows what would be submitted:

  ```bash
  $ smarttools registry publish --dry-run

  Validating tool...
    ✓ config.yaml is valid
    ✓ README.md exists (2.3 KB)
    ✓ Version 1.1.0 not yet published

  Would submit:
    Owner: rob
    Name: summarize
    Version: 1.1.0
    Category: text-processing
    Tags: summarization, ai, text

  Config preview:
  ─────────────────────────────
  name: summarize
  version: "1.1.0"
  description: Summarize text using AI
  ...
  ─────────────────────────────

  Run without --dry-run to submit for review.
  ```

- **Version bump reminder:** CLI warns if version hasn't changed from published:

  ```
  ⚠ Version 1.0.0 is already published.
    Bump version in config.yaml to publish changes.
  ```

- First-time publishing flow prompts for token and saves it to config.

### Progress Indicators

Long-running operations show progress:

```bash
$ smarttools install
Installing project dependencies...
  [1/3] rob/summarize@^1.0.0
        Resolving version... 1.2.0
        Downloading... done
        Installing... done ✓
  [2/3] alice/translate@>=2.0.0
        Resolving version... 2.1.0
        Downloading... done
        Installing... done ✓
  [3/3] official/code-review@*
        Resolving version... 1.0.0
        Downloading... done
        Installing... done ✓

✓ Installed 3 tools
```

```bash
$ smarttools registry publish
Submitting rob/summarize@1.1.0...
  Validating... done ✓
  Uploading... done ✓
  Creating PR... done ✓

✓ PR created: https://gitea.brrd.tech/rob/SmartTools-Registry/pulls/42

Your tool is pending review. You'll receive an email when it's approved.
```

### TUI Browse

`smarttools registry browse` opens a full-screen terminal UI:

```
┌─ SmartTools Registry ───────────────────────────────────────┐
│ Search: [________________] [All Categories ▼] [Sort: Popular ▼] │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ▶ rob/summarize                           v1.2.0    ⬇ 142  │
│    Summarize text using AI                                  │
│    [text-processing] [ai] [summarization]                   │
│                                                             │
│    alice/translate                         v2.1.0    ⬇ 98   │
│    Translate text between languages                         │
│    [text-processing] [translation]                          │
│                                                             │
│    official/code-review                    v1.0.0    ⬇ 87   │
│    AI-powered code review                                   │
│    [code] [review] [ai]                                     │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│ ↑↓ Navigate  Enter: Details  i: Install  /: Search  q: Quit │
└─────────────────────────────────────────────────────────────┘
```

**Keyboard shortcuts:**

| Key | Action |
|-----|--------|
| `↑/↓` or `j/k` | Navigate list |
| `Enter` | View tool details |
| `i` | Install selected tool |
| `/` | Focus search box |
| `c` | Change category filter |
| `s` | Change sort order |
| `?` | Show help |
| `q` | Quit |

**Virtual scrolling:** For large tool lists (>100), use virtual scrolling to maintain performance.
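Virtual scrolling amounts to rendering only the slice of the list that fits on screen and moving that slice as the selection changes. A minimal sketch of the window calculation follows. The function name `visible_window` and its parameters are hypothetical, and `height` is counted in tool entries rather than terminal rows, which is an assumption about how the TUI lays out each entry.

```python
def visible_window(total, selected, height, top):
    """Return (new_top, end): the half-open slice of items to draw.

    total:    number of items in the full list
    selected: index of the highlighted item
    height:   how many items fit in the viewport (assumed unit: items)
    top:      index of the first item drawn on the previous frame
    """
    if total <= 0 or height <= 0:
        return 0, 0
    # Keep the selection inside the list bounds
    selected = max(0, min(selected, total - 1))
    if selected < top:
        # Selection moved above the viewport: scroll up to it
        top = selected
    elif selected >= top + height:
        # Selection moved below the viewport: scroll down just enough
        top = selected - height + 1
    # Never scroll past the end of the list
    top = max(0, min(top, max(0, total - height)))
    return top, min(total, top + height)
```

The TUI would then draw only `tools[new_top:end]` on each frame, so the rendering cost depends on the viewport height and not on the size of the registry.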
### Project Initialization

```bash
$ smarttools init
Creating smarttools.yaml...

Project name [my-project]: my-ai-project
Version [1.0.0]:

Would you like to add any tools? (search with 's', skip with Enter)
> s
Search: summ

  1. rob/summarize v1.2.0 - Summarize text using AI
  2. alice/summary v1.0.0 - Generate summaries

Add tool (number, or Enter to finish): 1
Added rob/summarize@^1.2.0

Add tool (number, or Enter to finish):

✓ Created smarttools.yaml

name: my-ai-project
version: "1.0.0"
dependencies:
  - name: rob/summarize
    version: "^1.2.0"

Run 'smarttools install' to install dependencies.
```

### Accessibility

- **CLI:** All output works with screen readers, no color-only information
- **TUI:** Full keyboard navigation, high-contrast mode support
- **Web UI:** WCAG 2.1 AA compliance target
  - Semantic HTML
  - ARIA labels for interactive elements
  - Focus management in modals
  - Skip links for navigation

## Offline Cache

Cache registry index locally:

```
~/.smarttools/registry/index.json
```

Refresh when older than 24 hours; support `--offline` and `--refresh` flags.

### Index Integrity

The cached `index.json` includes integrity metadata:

```json
{
  "version": "1.0",
  "generated_at": "2025-01-20T12:00:00Z",
  "checksum": "sha256:abc123...",
  "tool_count": 142,
  "tools": [...]
}
```

**API response headers:**

```
ETag: "abc123def456"
X-Index-Checksum: sha256:abc123...
X-Index-Generated: 2025-01-20T12:00:00Z
```

**CLI verification:**

```python
import hashlib
import json
import logging

logger = logging.getLogger(__name__)

def verify_cached_index():
    """Verify cached index integrity on load"""
    cached = load_cached_index()
    if not cached:
        return None

    # Verify checksum
    content = json.dumps(cached['tools'], sort_keys=True)
    computed = hashlib.sha256(content.encode()).hexdigest()

    if computed != cached.get('checksum', '').replace('sha256:', ''):
        logger.warning("Cached index checksum mismatch, will refresh")
        return None

    return cached
```

**Corruption handling:**

- If checksum fails, discard cache and fetch fresh
- If partial write detected (missing fields), discard and refresh
- CLI shows warning: "Cached index corrupted, fetching fresh copy..."

## Web UI Vision

The registry includes a full website, not just an API:

**Site structure:**

```
registry.smarttools.dev (or gitea.brrd.tech/registry)
├── /                          # Landing page
├── /tools                     # Browse all tools
├── /tools/{owner}/{name}      # Tool detail page
├── /categories                # Browse by category
├── /categories/{name}         # Tools in category
├── /search?q=...              # Search results
├── /docs                      # Documentation
│   ├── /docs/getting-started
│   ├── /docs/creating-tools
│   ├── /docs/publishing
│   └── /docs/best-practices
├── /tutorials                 # Step-by-step guides
│   ├── /tutorials/first-tool
│   ├── /tutorials/chaining-steps
│   └── /tutorials/code-steps
├── /examples                  # Example projects
├── /blog                      # Updates, announcements (optional)
├── /register                  # Publisher registration
├── /login                     # Publisher login
├── /dashboard                 # Publisher dashboard
│   ├── /dashboard/tools       # My published tools
│   ├── /dashboard/tokens      # API tokens
│   └── /dashboard/settings    # Account settings
└── /api/v1/...                # API endpoints
```

**Landing page content:**

- Hero: "Share and discover AI-powered CLI tools"
- Quick install example
- Featured/popular tools
- Category highlights
- "Get Started" CTA

**Tool detail page:**

- Name, description, version, author
- README rendered as markdown (sanitized)
- Install command (copy-to-clipboard)
- Version history
- Download stats
- Category/tags
- "Report" button for abuse

### README Security

When rendering README markdown, apply XSS sanitization:

```python
import bleach
from markdown import markdown

ALLOWED_TAGS = [
    'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'p', 'br', 'hr',
    'ul', 'ol', 'li',
    'strong', 'em', 'code', 'pre',
    'blockquote',
    'a', 'img',
    'table', 'thead', 'tbody', 'tr', 'th', 'td'
]

ALLOWED_ATTRS = {
    'a': ['href', 'title'],
    'img': ['src', 'alt', 'title'],
    'code': ['class'],  # for syntax highlighting
}

def render_readme_safe(readme_raw: str) -> str:
    """Convert markdown to sanitized HTML"""
    # Convert markdown to HTML
    html = markdown(readme_raw, extensions=['fenced_code', 'tables'])

    # Sanitize to prevent XSS
    safe_html = bleach.clean(
        html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRS,
        strip=True
    )

    # Linkify URLs
    safe_html = bleach.linkify(safe_html)

    return safe_html
```

**Storage strategy:**

- Store raw README in `tools.readme`
- Render and sanitize on request (or cache rendered HTML)
- Never trust client-submitted HTML directly

**Tech stack options:**

| Option | Pros | Cons |
|--------|------|------|
| Flask + Jinja + Tailwind | Simple, Python-only, fast to build | Less interactive |
| FastAPI + Vue/React SPA | Modern, interactive | More complex, separate build |
| Astro/Next.js | Great SEO, static-first | Different stack (Node.js) |

**Recommendation:** Flask + Jinja + Tailwind for v1

- Keeps everything in Python
- Server-rendered is fine for a registry
- Good SEO out of the box
- Can add interactivity with Alpine.js or htmx if needed

**Monetization considerations:**

- AdSense-compatible (server-rendered pages)
- Analytics tracking for traffic insights
- Future: sponsored tools, featured placements
- Future: premium publisher tiers (more tools, priority review)

## Implementation Phases

### Phase 1: Foundation

- Define `smarttools.yaml` manifest format
- Implement tool resolution order (local → global → registry)
- Create SmartTools-Registry repo on Gitea (bootstrap)
- Add 3-5 example tools to seed the registry

### Phase 2: Core Backend

- Set up Flask/FastAPI project structure
- Implement SQLite database schema
- Build core API endpoints (list, search, get, download)
- Implement webhook receiver for Gitea sync
- Set up HMAC verification

### Phase 3: CLI Commands

- `smarttools registry search`
- `smarttools registry install`
- `smarttools registry info`
- `smarttools registry browse` (TUI)
- Local index caching

### Phase 4: Publishing

- Publisher registration (web UI)
- Token management
- `smarttools registry publish` command
- PR creation via Gitea API
- CI validation workflows

### Phase 5: Project Dependencies

- `smarttools install` (from manifest)
- `smarttools add` command
- Runtime override application
- Dependency resolution

### Phase 6: Smart Features

- SQLite FTS5 search index
- AI-powered auto-categorization
- Duplicate/similarity detection
- Security scanning

### Phase 7: Full Web UI

- Landing page
- Tool browsing/search pages
- Tool detail pages with README rendering
- Publisher dashboard
- Documentation/tutorials section

### Phase 8: Polish & Scale

- Rate limiting
- Abuse reporting
- Analytics integration
- Performance optimization
- Monitoring/alerting