SmartTools Registry Design

Purpose

Build a centralized registry for SmartTools to enable discovery, publishing, dependency management, and future curation at scale.

Terminology

Term	Definition
Tool definition	The full YAML file in the registry (`config.yaml`) containing name, steps, arguments, etc.
Tool config	The configuration within a tool definition (arguments, steps, provider settings)
smarttools.yaml	Project manifest file declaring tool dependencies and overrides
config.yaml	The tool definition file, both in registry and when installed locally
Owner	Immutable namespace slug identifying the publisher (e.g., `rob`, `alice`)
Publisher	A registered user who can publish tools to the registry
Wrapper script	Auto-generated bash script in `~/.local/bin/` that invokes a tool

Canonical naming: Use SmartTools-Registry (capitalized, hyphenated) for the repository name.

Diagram References

System overview: discussions/diagrams/smarttools-registry_rob_1.puml
Data flows: discussions/diagrams/smarttools-registry_rob_5.puml

System Overview

Users interact via the CLI and a future Web UI. Both call a Registry API hosted at https://gitea.brrd.tech/api/v1 (future alias: registry.smarttools.dev/api/v1). The API syncs from a Gitea-backed registry repo and maintains a SQLite cache/search index.

Canonical API base path: https://gitea.brrd.tech/api/v1

All API endpoints are versioned under /api/v1. When breaking changes are needed, a new version (/api/v2) will be introduced with deprecation notices.

Core API endpoints:

GET /api/v1/tools
GET /api/v1/tools/search?q=...
GET /api/v1/tools/{owner}/{name}
GET /api/v1/tools/{owner}/{name}/versions
GET /api/v1/tools/{owner}/{name}/download?version=...
POST /api/v1/tools (publish)
GET /api/v1/categories
GET /api/v1/stats/popular
POST /api/v1/webhook/gitea

Pagination

All list endpoints support pagination:

Parameter	Default	Max	Description
`page`	1	-	Page number (1-indexed)
`per_page`	20	100	Items per page
`sort`	`downloads`	-	Sort field
`order`	`desc`	-	Sort order (asc/desc)

Stable ordering: To ensure deterministic results across pages, every sort appends tie-breaker keys after the requested field:

Primary: requested field (e.g., downloads)
Secondary: published_at (desc)
Tertiary: id (for absolute stability)

ORDER BY downloads DESC, published_at DESC, id DESC
LIMIT 20 OFFSET 0

Response pagination metadata:

{
  "data": [...],
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 142,
    "total_pages": 8
  }
}
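As a minimal sketch of how the pagination parameters and the stable ordering could be turned into a query, consider the helper below. The names `build_page_query`, `page_meta`, and `ALLOWED_SORTS` are illustrative assumptions, not part of the design.

```python
import math

# Hypothetical whitelist for GET /tools; the values are columns of the tools table.
ALLOWED_SORTS = {"downloads", "published_at", "name"}
MAX_PER_PAGE = 100


def build_page_query(page=1, per_page=20, sort="downloads", order="desc"):
    """Return (sql, params) for one page with a deterministic ordering."""
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"Unknown sort field {sort!r}")
    if order not in ("asc", "desc"):
        raise ValueError(f"Unknown sort order {order!r}")
    page = max(1, int(page))
    per_page = min(max(1, int(per_page)), MAX_PER_PAGE)

    # The sort column is interpolated only after the whitelist check above;
    # the tie-breaker keys keep pages stable when primary values are equal.
    keys = [f"{sort} {order.upper()}"]
    if sort != "published_at":
        keys.append("published_at DESC")
    keys.append("id DESC")
    sql = f"SELECT * FROM tools ORDER BY {', '.join(keys)} LIMIT ? OFFSET ?"
    return sql, (per_page, (page - 1) * per_page)


def page_meta(page, per_page, total):
    """Build the `meta` object returned alongside `data`."""
    return {
        "page": page,
        "per_page": per_page,
        "total": total,
        "total_pages": math.ceil(total / per_page),
    }
```

With the defaults, the generated statement matches the ORDER BY example above, and `page_meta(1, 20, 142)` reports 8 total pages.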

Input Constraints

Size limits to prevent oversized uploads:

Field	Max Size	Notes
`config.yaml`	64 KB	Tool definition
`README.md`	256 KB	Documentation
Request body	512 KB	Total POST payload
Tool name	64 chars	Alphanumeric + hyphen
Description	500 chars	Short summary
Tag	32 chars	Individual tag
Tags array	10 items	Maximum tags per tool

Validation errors:

{
  "error": {
    "code": "PAYLOAD_TOO_LARGE",
    "message": "config.yaml exceeds 64KB limit",
    "details": {
      "field": "config",
      "size": 72000,
      "limit": 65536
    }
  }
}
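The size check behind this error could look roughly like the sketch below. The `LIMITS` mapping and the `check_size` helper are hypothetical names, and 1 KB is taken as 1024 bytes to match the `65536` limit in the example.

```python
# Field key -> (file name used in messages, limit in bytes). Hypothetical mapping.
LIMITS = {
    "config": ("config.yaml", 64 * 1024),
    "readme": ("README.md", 256 * 1024),
}


def check_size(field, content):
    """Return None if `content` fits, else an error body shaped like the example."""
    filename, limit = LIMITS[field]
    size = len(content.encode("utf-8"))  # limits apply to bytes, not characters
    if size <= limit:
        return None
    return {
        "error": {
            "code": "PAYLOAD_TOO_LARGE",
            "message": f"{filename} exceeds {limit // 1024}KB limit",
            "details": {"field": field, "size": size, "limit": limit},
        }
    }
```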

Sort Fields and Indexes

Allowed sort fields:

Endpoint	Allowed `sort` values
`GET /tools`	`downloads`, `published_at`, `name`
`GET /tools/search`	`relevance`, `downloads`, `published_at`
`GET /categories`	`name`, `tool_count`

Invalid sort values return 400:

{"error": {"code": "INVALID_SORT", "message": "Unknown sort field 'foo'. Allowed: downloads, published_at, name"}}

Database indexes:

-- Frequent query patterns
CREATE INDEX idx_tools_owner_name ON tools(owner, name);
CREATE INDEX idx_tools_category ON tools(category);
CREATE INDEX idx_tools_published_at ON tools(published_at DESC);
CREATE INDEX idx_tools_downloads ON tools(downloads DESC);
CREATE INDEX idx_tools_owner_name_version ON tools(owner, name, version);

-- For pagination stability
CREATE INDEX idx_tools_sort_stable ON tools(downloads DESC, published_at DESC, id DESC);

-- Publisher lookups
CREATE INDEX idx_publishers_slug ON publishers(slug);
CREATE INDEX idx_publishers_email ON publishers(email);

-- Token lookups
CREATE INDEX idx_api_tokens_hash ON api_tokens(token_hash);
CREATE INDEX idx_api_tokens_publisher ON api_tokens(publisher_id);

API Version Compatibility

Forward compatibility: Clients should ignore unknown fields in API responses:

# Good: ignore unknown fields
tool = response['data']
name = tool.get('name')
# Don't fail if 'new_field' exists but client doesn't know about it

# Bad: strict parsing that fails on unknown fields
tool = ToolSchema.parse(response['data'])  # May fail on new fields

Backward compatibility: The API will:

Never remove fields in a version (only deprecate)
Never change field types
Add new optional fields without version bump
Use new version (/api/v2) for breaking changes

Deprecation process:

Add X-Deprecated-Field: old_field header
Document in changelog
Remove after 6 months minimum
Major version bump if widely used

Client version header:

X-SmartTools-Client: cli/1.2.0

Helps server track client versions for deprecation decisions.

Source of Truth

Gitea registry repo is the source of truth.
API syncs repo content into SQLite for fast queries, stats, and FTS5 search.
index.json remains useful for offline CLI search and as a fallback.

If the cache is stale, the API can fall back to repo reads; a warning header may be emitted.

Namespacing and Paths

Support owner/name from day one:

Registry path: tools/{owner}/{name}/config.yaml
API URL: /tools/{owner}/{name}
Install: smarttools registry install rob/summarize
Shorthand: smarttools registry install summarize resolves to the official namespace.

PR branches: submit/{owner}/{name}/{version}.

Namespace Identity

The owner is an immutable slug, not the display name:

-- In publishers table
slug TEXT UNIQUE NOT NULL,        -- immutable: "rob", "alice-dev"
display_name TEXT NOT NULL,       -- mutable: "Rob", "Alice Developer"

Slug rules:

Lowercase alphanumeric + hyphens only: ^[a-z0-9][a-z0-9-]*[a-z0-9]$
2-39 characters
Cannot start/end with hyphen
Set once at registration, cannot be changed
Reserved slugs: official, admin, system, api, registry, smarttools
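
A minimal sketch of the slug rules as a validation function follows. The pattern is the one given in the rules; the length bound is checked separately because the pattern alone does not enforce it. The helper name `is_valid_slug` is an assumption, and the reserved set is the union of the reserved-slug lists in this document.

```python
import re

# Pattern from the slug rules; it guarantees no leading or trailing hyphen.
SLUG_RE = re.compile(r"^[a-z0-9][a-z0-9-]*[a-z0-9]$")
RESERVED_SLUGS = {"official", "admin", "system", "api", "registry", "smarttools"}


def is_valid_slug(slug):
    """True if `slug` may be registered as a new publisher namespace."""
    return (
        2 <= len(slug) <= 39
        and SLUG_RE.fullmatch(slug) is not None
        and slug not in RESERVED_SLUGS
    )
```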

Rename policy:

display_name can be changed anytime via dashboard
slug (owner) is permanent to preserve URLs and tool references
If a publisher absolutely must change slug (legal reasons, etc.):
1. Create new account with new slug
2. Republish tools under new namespace
3. Mark old tools as deprecated with replacement pointing to new namespace
4. Old namespace remains reserved (cannot be reused by others)

Why immutable:

rob/summarize@1.0.0 must always resolve to the same tool
Prevents namespace hijacking after rename
Simplifies caching and CDN strategies

Tool Format (Registry == Local)

Registry tool folders mirror local tools:

tools/
  rob/
    summarize/
      config.yaml
      README.md

Tool files match the existing SmartTools format. Registry-specific metadata is kept under registry:. Deprecation is tool-defined and top-level:

name: summarize
version: "1.2.0"
deprecated: true
deprecated_message: "Security issue. Use v1.2.1"
replacement: "rob/summarize@1.2.1"
registry:
  published_at: "2025-01-15T10:30:00Z"
  downloads: 142

Schema compatibility note: The current SmartTools config parser may reject unknown top-level keys like deprecated, replacement, and registry. Before implementing registry features:

Update the YAML parser to ignore unknown keys (permissive mode)
Or explicitly define these fields in the Tool dataclass with defaults
Validate registry-specific fields only when publishing, not when running locally

This ensures local tools continue to work even if they don't have registry fields.
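
One way to combine the first two options is sketched below: registry fields get defaults, and any other unknown key is dropped before the dataclass is constructed. The `Tool` class here is a hypothetical subset of the real one, and `tool_from_mapping` is an illustrative name.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Tool:
    # Hypothetical subset of the real Tool dataclass.
    name: str
    version: str = "0.0.0"
    deprecated: bool = False
    deprecated_message: str = ""
    replacement: str = ""
    registry: dict = field(default_factory=dict)


def tool_from_mapping(data):
    """Build a Tool from parsed YAML, silently dropping unknown keys."""
    known = {f.name for f in fields(Tool)}
    return Tool(**{k: v for k, v in data.items() if k in known})
```

A local tool without any registry fields loads with the defaults, and a config carrying keys added by a later registry version still loads.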

Versioning and Immutability

Unique key: owner/name + version.
Published versions are immutable.
Deprecation uses deprecated, deprecated_message, and replacement.
CLI warns on install if a version is deprecated.

Yank Policy

Yanking allows removing a version from resolution without deleting it (for auditability):

# In tool config
yanked: true
yanked_reason: "Critical security vulnerability CVE-2025-1234"
yanked_at: "2025-01-20T15:00:00Z"

Yanked version behavior:

Operation	Behavior
`install foo@1.0.0` (exact)	Warns but allows install
`install foo@^1.0.0` (constraint)	Excludes yanked, resolves to next valid
`search` / `browse`	Hidden by default, shown with `--include-yanked`
Direct URL access	Returns tool with `yanked: true` in response
Already installed	Continues to work, no forced removal

Database schema addition:

-- Add to tools table
yanked BOOLEAN DEFAULT FALSE,
yanked_reason TEXT,
yanked_at TIMESTAMP

Yank vs Delete:

Yank: Version remains in DB, excluded from resolution, auditable
Delete: Reserved for DMCA/legal, requires admin action, leaves tombstone record

Version Format

Tools use semantic versioning (semver):

MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]

Examples:
  1.0.0           # stable release
  1.2.3           # stable release
  2.0.0-alpha.1   # prerelease
  2.0.0-beta.2    # prerelease
  2.0.0-rc.1      # release candidate

Version Constraints

Manifest files support these constraint formats:

Constraint	Meaning	Example Match
`1.2.3`	Exact version	`1.2.3` only
`>=1.2.0`	Minimum version	`1.2.0`, `1.3.0`, `2.0.0`
`<2.0.0`	Below version	`1.9.9`, `1.0.0`
`>=1.0.0,<2.0.0`	Range	`1.0.0` to `1.9.9`
`^1.2.3`	Compatible (same major)	`1.2.3` up to, but not including, `2.0.0`
`~1.2.3`	Approximately (same minor)	`1.2.3` up to, but not including, `1.3.0`
`*`	Any version	latest stable
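
The constraint formats in this table can be reduced to plain comparisons. The sketch below follows the table's definitions (caret means same major, tilde means same minor) and ignores prerelease tags for brevity; `parse_core`, `expand`, and `satisfies` are hypothetical helper names.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "==": operator.eq}


def parse_core(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (prerelease/build dropped)."""
    core = version.split("+")[0].split("-")[0]
    major, minor, patch = (int(p) for p in core.split("."))
    return (major, minor, patch)


def expand(constraint):
    """Translate a constraint string into a list of (operator, bound) pairs."""
    c = constraint.strip()
    if c == "*":
        return []  # no bounds: any version matches
    if c.startswith("^"):
        lo = parse_core(c[1:])
        return [(">=", lo), ("<", (lo[0] + 1, 0, 0))]
    if c.startswith("~"):
        lo = parse_core(c[1:])
        return [(">=", lo), ("<", (lo[0], lo[1] + 1, 0))]
    pairs = []
    for part in c.split(","):
        part = part.strip()
        for op in (">=", "<=", ">", "<"):  # two-character operators first
            if part.startswith(op):
                pairs.append((op, parse_core(part[len(op):])))
                break
        else:
            pairs.append(("==", parse_core(part)))  # bare version: exact match
    return pairs


def satisfies(version, constraint):
    """True if a stable `version` string matches `constraint`."""
    v = parse_core(version)
    return all(OPS[op](v, bound) for op, bound in expand(constraint))
```

Because the upper bounds are exclusive, `^1.2.3` accepts `1.10.0` as well as `1.9.9`, while `2.0.0` is rejected.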

Version Resolution Rules

When resolving a version constraint:

Filter: Get all versions matching the constraint
Exclude prereleases: Unless constraint explicitly includes them (e.g., >=2.0.0-alpha.1)
Sort: By semver precedence (descending)
Select: Highest matching version

Tie-breakers:

Stable versions preferred over prereleases
Later publish date wins if versions are equal (shouldn't happen with immutability)
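
The filter, exclude, sort, and select steps can be sketched as follows. The constraint check is passed in as a predicate so the example stays short; `precedence_key` and `resolve` are hypothetical names, and the `yanked` parameter reflects the yank policy described earlier.

```python
def precedence_key(version):
    """Sort key following semver precedence (build metadata is ignored)."""
    core, _, pre = version.split("+")[0].partition("-")
    nums = tuple(int(p) for p in core.split("."))
    if not pre:
        return (nums, 1, ())  # a stable release outranks its own prereleases
    # Numeric identifiers sort before alphanumeric ones and compare as integers.
    ids = tuple((0, int(p), "") if p.isdigit() else (1, 0, p) for p in pre.split("."))
    return (nums, 0, ids)


def resolve(versions, matches, allow_prerelease=False, yanked=()):
    """Pick the highest version for which `matches(version)` is true.

    `matches` is a caller-supplied predicate implementing the constraint;
    `yanked` lists versions excluded from constraint resolution.
    Returns None when nothing satisfies the constraint.
    """
    candidates = [
        v for v in versions
        if matches(v)
        and v not in yanked
        and (allow_prerelease or "-" not in v.split("+")[0])
    ]
    return max(candidates, key=precedence_key, default=None)
```

A `None` result corresponds to the unsatisfiable-constraint case, which the API reports as a 404.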

Unsatisfiable constraints:

// API Response: 404
{
  "error": {
    "code": "VERSION_NOT_FOUND",
    "message": "No version of 'rob/summarize' satisfies constraint '>=5.0.0'",
    "details": {
      "tool": "rob/summarize",
      "constraint": ">=5.0.0",
      "available_versions": ["1.0.0", "1.1.0", "1.2.0"],
      "latest_stable": "1.2.0"
    }
  }
}

Prerelease Handling

Prereleases are not returned for * or range constraints by default
To install prerelease: smarttools registry install rob/summarize@2.0.0-beta.1
To allow prereleases in manifest: version: ">=2.0.0-0" (the -0 suffix includes prereleases)

Download Endpoint Version Selection

The /api/v1/tools/{owner}/{name}/download endpoint accepts version parameters:

Parameter	Behavior	Example
(none)	Returns latest stable version	`/download` → `1.2.0`
`version=1.2.0`	Exact version (must exist)	`/download?version=1.2.0`
`version=^1.0.0`	Server resolves constraint	`/download?version=^1.0.0` → `1.2.0`
`version=latest`	Alias for latest stable	`/download?version=latest`

Server-side resolution: The API server resolves version constraints, not the client. This ensures consistent resolution and allows the server to apply policies (e.g., exclude yanked versions).

GET /api/v1/tools/rob/summarize/download?version=^1.0.0&install=true

Response (200):
{
  "data": {
    "owner": "rob",
    "name": "summarize",
    "resolved_version": "1.2.0",
    "config": "... YAML content ..."
  },
  "meta": {
    "constraint": "^1.0.0",
    "available_versions": ["1.0.0", "1.1.0", "1.2.0"]
  }
}

Invalid/unsatisfiable constraint:

GET /api/v1/tools/rob/summarize/download?version=^5.0.0

Response (404):
{
  "error": {
    "code": "CONSTRAINT_UNSATISFIABLE",
    "message": "No version matches constraint '^5.0.0'",
    "details": {
      "constraint": "^5.0.0",
      "latest_stable": "1.2.0",
      "available_versions": ["1.0.0", "1.1.0", "1.2.0"]
    }
  }
}

Tool Resolution Order

When a tool is invoked, the CLI searches in this order:

Local project: ./.smarttools/<owner>/<name>/config.yaml (or ./.smarttools/<name>/ for unnamespaced)
Global user: ~/.smarttools/<owner>/<name>/config.yaml
Registry: Fetch from API, install to global, then run
Error: Tool '<toolname>' not found

Step 3 only occurs if auto_fetch_from_registry: true in config (default: true).

Path convention: Use .smarttools/ (with leading dot) for both local and global to maintain consistency.

Resolution also respects namespacing:

summarize → searches for any tool named summarize, prefers official/summarize if exists
rob/summarize → searches for exactly rob/summarize
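
The first two steps of the search order amount to a path lookup, sketched below under the assumption that both locations use the `.smarttools/` layout described above. The function name `find_local_tool` is hypothetical; the registry fetch and the "prefer official" rule for bare names are left to the caller.

```python
from pathlib import Path


def find_local_tool(spec, project_dir, home_dir):
    """Return the first existing config.yaml for `spec`, or None.

    `spec` is 'owner/name' or a bare 'name'. The project directory is
    checked before the user's global directory.
    """
    roots = [Path(project_dir) / ".smarttools", Path(home_dir) / ".smarttools"]
    for root in roots:
        candidate = root.joinpath(*spec.split("/")) / "config.yaml"
        if candidate.is_file():
            return candidate
    return None
```

A `None` result is the point at which the CLI would either fetch from the registry (when `auto_fetch_from_registry` is true) or report that the tool was not found.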

Official Namespace

The slug official is reserved for curated, high-quality tools maintained by the registry administrators.

Shorthand summarize resolves to official/summarize if it exists
If no official/summarize, falls back to most-downloaded tool named summarize
To avoid ambiguity, always use full owner/name in manifests

Reserved slugs that cannot be registered: official, admin, system, api, registry, smarttools

Auto-Fetch Behavior

When enabled (auto_fetch_from_registry: true), missing tools are automatically fetched:

$ summarize < file.txt
# Tool 'summarize' not found locally.
# Fetching from registry...
# Installed: official/summarize@1.2.0
# Running...

Behavior details:

Fetches latest stable version unless pinned in smarttools.yaml
Installs to ~/.smarttools/<owner>/<name>/
Generates wrapper script in ~/.local/bin/
Subsequent runs use local copy (no re-fetch)

To disable (require explicit install):

# ~/.smarttools/config.yaml
auto_fetch_from_registry: false

Wrapper Script Collisions

When two tools from different owners have the same name:

Scenario	Behavior
Install `official/summarize`	Creates wrapper `~/.local/bin/summarize`
Install `rob/summarize` (collision)	Creates wrapper `~/.local/bin/rob-summarize`
Uninstall `official/summarize`	Removes `summarize` wrapper, promotes `rob-summarize` → `summarize` if desired

The first-installed tool with a given name gets the short wrapper. Subsequent tools use owner-name format.
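
The naming rule can be expressed as a small function. This is a sketch only: `wrapper_name` is a hypothetical helper, and `existing` stands in for whatever bookkeeping the CLI uses to know which tool each wrapper in `~/.local/bin/` invokes.

```python
def wrapper_name(owner, name, existing):
    """Pick the wrapper filename for a newly installed tool.

    `existing` maps wrapper filenames to the 'owner/name' they invoke.
    The short name goes to the first tool that claims it; later tools
    with the same name fall back to '<owner>-<name>'.
    """
    holder = existing.get(name)
    if holder is None or holder == f"{owner}/{name}":
        return name
    return f"{owner}-{name}"
```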

To invoke a specific owner's tool:

# Short form (whichever was installed first)
summarize < file.txt

# Explicit owner form (always works)
rob-summarize < file.txt

# Or via smarttools run
smarttools run rob/summarize < file.txt

Project Manifest (smarttools.yaml)

Defines tool dependencies with optional runtime overrides:

name: my-ai-project
version: "1.0.0"
dependencies:
  - name: rob/summarize
    version: ">=1.0.0"
overrides:
  rob/summarize:
    provider: ollama

Overrides are applied at runtime and do not mutate installed tool configs.
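
A minimal sketch of applying overrides without touching the installed config is shown below. It assumes overrides are merged at the top level of the tool config; `effective_config` is a hypothetical name, and the `claude` provider value in the usage note is only an example.

```python
import copy


def effective_config(installed, overrides, tool_id):
    """Return the config to run with, leaving `installed` untouched."""
    merged = copy.deepcopy(installed)  # deep copy so nested steps are not shared
    merged.update(overrides.get(tool_id, {}))
    return merged
```

Given an installed config with `provider: claude` and the manifest above, `effective_config(installed, manifest["overrides"], "rob/summarize")` yields a config with `provider: ollama`, while the dictionary loaded from the installed `config.yaml` still says `claude`.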

CLI Config and Tokens

Global config lives in ~/.smarttools/config.yaml:

registry:
  url: https://gitea.brrd.tech/api/v1    # Must match canonical base path
  token: "reg_xxxxxxxxxxxx"
client_id: "anon_abc123def456"
auto_fetch_from_registry: true

client_id is generated locally and used for anonymous install dedupe.

Publishing and Auth

Publishing uses registry accounts, not Gitea accounts:

Public endpoints require no auth.
POST /tools requires a registry token.
The API server uses a private Gitea service account to open PRs.

Publish Idempotency and Edge Cases

Idempotency key: owner/name@version

Scenario	API Response	HTTP Code
New version, no PR exists	Create PR, return URL	`201 Created`
PR already exists (pending)	Return existing PR URL	`200 OK`
Version already published	Error: version exists	`409 Conflict`
PR was closed without merge	Allow new PR	`201 Created`
PR was merged, then tool deleted	Error: version exists (tombstone)	`409 Conflict`
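
The table above reduces to a short decision function. In this sketch, `publish_outcome` and its action labels are hypothetical; the inputs are whether the version already exists (including tombstones) and the state of any earlier PR for the same `owner/name@version`.

```python
def publish_outcome(version_published, pr_status):
    """Map registry state to (HTTP status, action) for POST /tools.

    `version_published` is True if the version exists or left a tombstone;
    `pr_status` is None, 'pending', 'merged' or 'closed'.
    """
    if version_published or pr_status == "merged":
        return 409, "version_exists"
    if pr_status == "pending":
        return 200, "return_existing_pr"
    # No PR yet, or the previous PR was closed without merge.
    return 201, "create_pr"
```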

Version immutability enforcement:

// Attempt to publish existing version
// Response: 409 Conflict
{
  "error": {
    "code": "VERSION_EXISTS",
    "message": "Version 1.2.0 of 'rob/summarize' already exists and cannot be overwritten",
    "details": {
      "published_at": "2025-01-15T10:30:00Z",
      "action": "Bump version number to publish changes"
    }
  }
}

Closed PR handling:

Track PR state in database: pending, merged, closed
If PR was closed (rejected/abandoned), allow new submission for same version
If PR was merged, version is immutable forever

Update flow (new version, not overwrite):

Developer modifies tool locally
Bumps version in config.yaml (e.g., 1.2.0 → 1.3.0)
Runs smarttools registry publish
New PR created for 1.3.0
Old version 1.2.0 remains available

Publisher Registration

Publishers register on the registry website, not Gitea:

Registration flow:

User visits https://gitea.brrd.tech/registry/register (or future registry.smarttools.dev)
Creates account with email + password + slug
Receives verification email (optional in v1, but track verified status)
Logs into dashboard at /dashboard
Generates API token from dashboard
Uses token in CLI for publishing

Authentication Security

Password hashing:

Algorithm: Argon2id (memory-hard, recommended by OWASP)
Parameters: memory=65536 KiB (64 MiB), iterations=3, parallelism=4
Library: argon2-cffi for Python

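from argon2 import PasswordHasher

# memory_cost is given in KiB; time_cost is the number of iterations
ph = PasswordHasher(memory_cost=65536, time_cost=3, parallelism=4)
password_hash = ph.hash(password)   # store this string in publishers.password_hash
ph.verify(password_hash, password)  # raises argon2.exceptions.VerifyMismatchError on mismatch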

API token format:

reg_<random-32-bytes-base62>

Example (illustrative, shorter than a real token): reg_7kX9mPqR2sT4vW6xY8zA1bC3dE5fG7hJ

Prefix reg_ for easy identification in logs/configs
32 bytes of cryptographically random data
Base62 encoded (alphanumeric, no special chars)
Total length: ~47 characters
Stored as SHA-256 hash in database (never plain text)
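
Token generation along these lines might look like the sketch below. Base62 has no single standard alphabet, so the digit-uppercase-lowercase ordering used here is an assumption, as are the helper names `base62_encode` and `generate_token`.

```python
import hashlib
import secrets
import string

# Assumed alphabet: 0-9, A-Z, a-z.
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(data):
    """Encode bytes as a base62 string by treating them as a big-endian integer."""
    n = int.from_bytes(data, "big")
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(BASE62[rem])
    return "".join(reversed(chars)) or "0"


def generate_token():
    """Return (plaintext_token, sha256_hex); only the hash is stored."""
    token = "reg_" + base62_encode(secrets.token_bytes(32))
    return token, hashlib.sha256(token.encode()).hexdigest()
```

The plaintext token is shown to the user once; lookups on later requests hash the presented token and compare against `api_tokens.token_hash`. Thirty-two random bytes encode to at most 43 base62 characters, which gives the stated total of about 47 with the prefix.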

Token lifecycle:

Action	Behavior
Generate	Create new token, return once, store hash
List	Show token name, created date, last used (not the token itself)
Revoke	Set `revoked_at` timestamp, reject future uses
Rotate	Generate new token, optionally revoke old

Rate limits:

Endpoint	Limit	Window	Scope	Retry-After
`POST /register`	5	1 hour	IP	3600
`POST /login`	10	15 min	IP	900
`POST /login` (failed)	5	15 min	IP + email	900
`POST /tokens`	10	1 hour	Token	3600
`POST /tools`	20	1 hour	Token	3600
`GET /tools/*`	100	1 min	IP	60
`GET /download`	60	1 min	IP	60

Rate limit response (429):

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests. Try again in 60 seconds.",
    "details": {
      "limit": 100,
      "window": "1 minute",
      "retry_after": 60
    }
  }
}

Headers on rate-limited response:

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1705766400

Scope priority: For authenticated requests, both IP and token limits apply. The more restrictive limit wins.
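
A fixed-window counter is one simple way to enforce limits of this shape. The sketch below is an in-memory illustration only (a deployed API would need shared storage and cleanup of old windows), and the class name `FixedWindowLimiter` is hypothetical. It reports the time remaining in the current window rather than the fixed Retry-After values from the table.

```python
import math
import time
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory fixed-window request counter keyed by scope (IP, token, ...)."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.hits = defaultdict(int)  # (scope key, window index) -> count

    def check(self, key):
        """Return (allowed, retry_after_seconds) and count allowed requests."""
        now = self.clock()
        index = int(now // self.window)
        if self.hits[(key, index)] >= self.limit:
            # Seconds until the next window starts.
            return False, math.ceil((index + 1) * self.window - now)
        self.hits[(key, index)] += 1
        return True, 0
```

To apply both IP and token limits to an authenticated request, the server would call `check` on each limiter and reject the request if either one refuses it.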

Account lockout:

After 5 failed login attempts: 15-minute lockout for that email
After 10 failed attempts: 1-hour lockout
Lockout clears on successful password reset

Password reset flow (deferred to v1.1):

User requests reset via email
Server generates time-limited token (1 hour expiry)
Email contains reset link with token
User sets new password
All existing sessions/tokens optionally invalidated

Email verification flow (deferred to v1.1):

On registration, send verification email
User clicks link with verification token
Set verified = true in database
Unverified accounts can browse but not publish

Token Scopes and Authorization

Tokens have scopes that limit their capabilities:

Scope	Permissions
`read`	View own published tools, download stats
`publish`	Submit new tools, update own tool metadata
`admin`	Yank tools, manage categories (registry admins only)

Default scope: New tokens get read,publish by default.

Ownership enforcement:

@app.route('/api/v1/tools', methods=['POST'])
@require_token(scopes=['publish'])
def publish_tool():
    token = get_current_token()
    tool_data = request.json

    # Enforce owner == token holder's slug
    if tool_data['owner'] != token.publisher.slug:
        return {
            "error": {
                "code": "FORBIDDEN",
                "message": f"Cannot publish to namespace '{tool_data['owner']}'. "
                           f"Your namespace is '{token.publisher.slug}'."
            }
        }, 403

    # Proceed with publish...

GET /api/v1/me/tools authorization:

Requires valid token with read scope
Returns only tools where owner == token.publisher.slug
Includes pending PRs and all versions (including yanked)

Web Session Security

Dashboard login uses session cookies (not tokens) for browser auth:

Cookie settings:

SESSION_COOKIE_NAME = 'smarttools_session'
SESSION_COOKIE_HTTPONLY = True      # Prevent JS access
SESSION_COOKIE_SECURE = True        # HTTPS only in production
SESSION_COOKIE_SAMESITE = 'Lax'     # CSRF protection
PERMANENT_SESSION_LIFETIME = 86400 * 7  # 7 days (Flask's session lifetime setting, in seconds)

CSRF protection:

All POST/PUT/DELETE forms include csrf_token hidden field
Token validated server-side before processing
403 Forbidden if token missing or invalid

Session lifecycle:

Event	Action
Login	Create session, set cookie
Logout	Delete session, clear cookie
Idle 24h	Session expires, re-login required
Password change	Invalidate all sessions
Token revocation	Existing sessions continue (token != session)

Secure session storage:

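# Store sessions in the database, not on the filesystem
from flask_session import Session
app.config['SESSION_TYPE'] = 'sqlalchemy'
app.config['SESSION_SQLALCHEMY_TABLE'] = 'sessions'
app.config['SESSION_SQLALCHEMY'] = db  # assumes an existing Flask-SQLAlchemy instance named db
Session(app)  # activates server-side sessions for the app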

Database schema:

-- Publishers
CREATE TABLE publishers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
    slug TEXT UNIQUE NOT NULL,            -- immutable namespace: "rob", "alice-dev"
    display_name TEXT NOT NULL,           -- mutable: "Rob", "Alice Developer"
    bio TEXT,
    website TEXT,
    verified BOOLEAN DEFAULT FALSE,
    locked_until TIMESTAMP,               -- account lockout
    failed_login_attempts INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- API tokens (one publisher can have multiple)
CREATE TABLE api_tokens (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    publisher_id INTEGER NOT NULL REFERENCES publishers(id),
    token_hash TEXT NOT NULL,
    name TEXT NOT NULL,           -- "CLI token", "CI token"
    last_used_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    revoked_at TIMESTAMP          -- NULL if active
);

-- Tools (links to publisher)
CREATE TABLE tools (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    owner TEXT NOT NULL,          -- namespace slug (immutable, from publisher.slug)
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    description TEXT,
    category TEXT,
    tags TEXT,                    -- JSON array
    config_yaml TEXT NOT NULL,    -- Full tool config
    readme TEXT,
    publisher_id INTEGER NOT NULL REFERENCES publishers(id),
    deprecated BOOLEAN DEFAULT FALSE,
    deprecated_message TEXT,
    replacement TEXT,
    downloads INTEGER DEFAULT 0,
    published_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(owner, name, version)
);

-- Download stats (for deduplication)
CREATE TABLE download_stats (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tool_id INTEGER NOT NULL REFERENCES tools(id),
    client_id TEXT NOT NULL,
    downloaded_at DATE NOT NULL,
    UNIQUE(tool_id, client_id, downloaded_at)
);

-- Search index (FTS5)
CREATE VIRTUAL TABLE tools_fts USING fts5(
    name, description, tags, readme,
    content='tools',
    content_rowid='id'
);

-- FTS5 sync triggers (required for external content tables)
CREATE TRIGGER tools_ai AFTER INSERT ON tools BEGIN
    INSERT INTO tools_fts(rowid, name, description, tags, readme)
    VALUES (new.id, new.name, new.description, new.tags, new.readme);
END;

CREATE TRIGGER tools_ad AFTER DELETE ON tools BEGIN
    INSERT INTO tools_fts(tools_fts, rowid, name, description, tags, readme)
    VALUES ('delete', old.id, old.name, old.description, old.tags, old.readme);
END;

CREATE TRIGGER tools_au AFTER UPDATE ON tools BEGIN
    INSERT INTO tools_fts(tools_fts, rowid, name, description, tags, readme)
    VALUES ('delete', old.id, old.name, old.description, old.tags, old.readme);
    INSERT INTO tools_fts(rowid, name, description, tags, readme)
    VALUES (new.id, new.name, new.description, new.tags, new.readme);
END;

-- Pending PRs (track publish state)
CREATE TABLE pending_prs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    publisher_id INTEGER NOT NULL REFERENCES publishers(id),
    owner TEXT NOT NULL,
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    pr_number INTEGER NOT NULL,
    pr_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending, merged, closed
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(owner, name, version)
);

-- Webhook sync log (idempotency)
CREATE TABLE webhook_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    delivery_id TEXT UNIQUE NOT NULL,        -- Gitea delivery ID
    event_type TEXT NOT NULL,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Note on tags indexing: The tags column stores JSON arrays as text. For v1, FTS5 will search within the JSON string. If tag filtering becomes a bottleneck, normalize to a tool_tags junction table:

-- Future: normalized tags (if needed)
CREATE TABLE tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL
);

CREATE TABLE tool_tags (
    tool_id INTEGER REFERENCES tools(id),
    tag_id INTEGER REFERENCES tags(id),
    PRIMARY KEY (tool_id, tag_id)
);
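
The UNIQUE constraint on `download_stats` is what makes install counting idempotent per client and day. A minimal sketch using Python's built-in `sqlite3` module follows; the function name `record_download` is hypothetical, and the table and column names are taken from the schema above.

```python
import sqlite3
from datetime import date


def record_download(conn, tool_id, client_id, day=None):
    """Count at most one download per tool, client and day; return True if counted."""
    day = (day or date.today()).isoformat()
    with conn:  # one transaction: the stat row and the counter change together
        cur = conn.execute(
            "INSERT OR IGNORE INTO download_stats (tool_id, client_id, downloaded_at) "
            "VALUES (?, ?, ?)",
            (tool_id, client_id, day),
        )
        if cur.rowcount == 0:
            return False  # same client already counted for this tool today
        conn.execute(
            "UPDATE tools SET downloads = downloads + 1 WHERE id = ?", (tool_id,)
        )
    return True
```

Repeated installs from the same `client_id` on the same day leave `tools.downloads` unchanged, while an install on a later day is counted again.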

CLI first-time publish flow:

$ smarttools registry publish

No registry account configured.

1. Register at: https://gitea.brrd.tech/registry/register
2. Generate a token from your dashboard
3. Enter your token below

Registry token: ********
Token saved to ~/.smarttools/config.yaml

Validating tool...
✓ config.yaml is valid
✓ README.md exists (2.3 KB)
✓ Version 1.0.0 not yet published

Publishing rob/my-tool@1.0.0...
✓ PR created: https://gitea.brrd.tech/rob/SmartTools-Registry/pulls/42

Your tool is pending review. You'll receive an email when it's approved.

CLI Commands Reference

Full mapping of CLI commands to API calls:

Registry Commands

# Search for tools
$ smarttools registry search <query> [--category=<cat>] [--limit=20]
    → GET /api/v1/tools/search?q=<query>&category=<cat>&per_page=20

# Browse tools (TUI)
$ smarttools registry browse [--category=<cat>]
    → GET /api/v1/tools?category=<cat>&page=1
    → GET /api/v1/categories

# View tool details
$ smarttools registry info <owner/name>
    → GET /api/v1/tools/<owner>/<name>

# Install a tool
$ smarttools registry install <owner/name> [--version=<ver>]
    → GET /api/v1/tools/<owner>/<name>/download?version=<ver>&install=true
    → Writes to ~/.smarttools/<owner>/<name>/config.yaml
    → Generates ~/.local/bin/<name> wrapper (or <owner>-<name> if collision)

# Uninstall a tool
$ smarttools registry uninstall <owner/name>
    → Removes ~/.smarttools/<owner>/<name>/
    → Removes wrapper script

# Publish a tool
$ smarttools registry publish [path] [--dry-run]
    → POST /api/v1/tools (with registry token)
    → Returns PR URL

# List my published tools
$ smarttools registry my-tools
    → GET /api/v1/me/tools (with registry token)

# Update index cache
$ smarttools registry update
    → GET /api/v1/index.json
    → Writes to ~/.smarttools/registry/index.json

Project Commands

# Install project dependencies from smarttools.yaml
$ smarttools install
    → Reads ./smarttools.yaml
    → For each dependency:
        GET /api/v1/tools/<owner>/<name>/download?version=<constraint>&install=true
    → Installs to ~/.smarttools/<owner>/<name>/

# Add a dependency to smarttools.yaml
$ smarttools add <owner/name> [--version=<constraint>]
    → Adds to ./smarttools.yaml dependencies
    → Runs install for that tool

# Show project dependencies status
$ smarttools deps
    → Reads ./smarttools.yaml
    → Shows installed status for each dependency
    → Note: "smarttools list" is reserved for listing installed tools

Command naming note: smarttools list already exists to list locally installed tools. Use smarttools deps to show project manifest dependencies.

Flags available on most commands

Flag	Description
`--offline`	Use cached index only, don't fetch
`--refresh`	Force refresh of cached data
`--json`	Output in JSON format
`--verbose`	Show detailed output

Webhooks and Security

HMAC Verification

All Gitea webhooks are verified using HMAC-SHA256:

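import hashlib
import hmac

def verify_webhook(request, secret):
    """Check X-Gitea-Signature against an HMAC-SHA256 of the raw request body."""
    signature = request.headers.get('X-Gitea-Signature')
    if not signature:
        return False

    expected = hmac.new(
        secret.encode(),
        request.get_data(),  # raw bytes as received (Flask); never a re-serialized body
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison avoids leaking the signature through timing
    return hmac.compare_digest(signature, expected)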

Replay Protection

While sync is idempotent, implement basic replay protection:

from datetime import datetime, timezone

def process_webhook(request):
    # Verify the signature first so unauthenticated requests never reach the DB
    if not verify_webhook(request, WEBHOOK_SECRET):
        return {"error": "invalid_signature"}, 401

    delivery_id = request.headers.get('X-Gitea-Delivery')
    if not delivery_id:
        return {"error": "missing_delivery_id"}, 400

    # Check if already processed
    if db.webhook_log.exists(delivery_id=delivery_id):
        return {"status": "already_processed"}, 200

    # Process with lock to prevent concurrent processing
    with db.lock(f"webhook:{delivery_id}"):
        # Double-check after acquiring lock
        if db.webhook_log.exists(delivery_id=delivery_id):
            return {"status": "already_processed"}, 200

        # Process the webhook
        result = sync_from_repo()

        # Log successful processing; the event name comes from the X-Gitea-Event
        # header because push payloads carry no 'action' field
        db.webhook_log.insert(
            delivery_id=delivery_id,
            event_type=request.headers.get('X-Gitea-Event', 'unknown'),
            processed_at=datetime.now(timezone.utc)
        )

    return {"status": "processed"}, 200

Sync Job Locking

Prevent concurrent sync operations:

# Using file lock or database advisory lock; acquire_lock, LockTimeout and repo
# are placeholders for whichever implementation is chosen
from glob import glob

SYNC_LOCK_TIMEOUT = 300  # 5 minutes max

def sync_from_repo():
    try:
        with acquire_lock("registry_sync", timeout=SYNC_LOCK_TIMEOUT):
            # Pull latest from Gitea
            repo.fetch()
            repo.reset('origin/main', hard=True)

            # Parse and update database
            for tool_path in glob('tools/*/*/config.yaml'):
                update_tool_in_db(tool_path)

            # Rebuild FTS index if needed
            rebuild_fts_index()

        return {"status": "synced"}

    except LockTimeout:
        logger.warning("Sync already in progress, skipping")
        return {"status": "skipped", "reason": "sync_in_progress"}

Atomic Sync Strategy

To avoid partially updated DB during webhook sync, use transactional table swap:

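from datetime import datetime, timezone
from glob import glob

def sync_from_repo_atomic():
    with acquire_lock("registry_sync", timeout=SYNC_LOCK_TIMEOUT):
        # 1. Pull latest from Gitea
        repo.fetch()
        repo.reset('origin/main', hard=True)

        # 2. Parse all tools into memory
        new_tools = []
        for tool_path in glob('tools/*/*/config.yaml'):
            tool_data = parse_tool(tool_path)
            if tool_data:
                new_tools.append(tool_data)

        # 3. Atomic swap using transaction
        with db.transaction():
            # Create temp table.
            # NOTE: in SQLite, CREATE TABLE ... AS SELECT copies columns only; the
            # new table has no PRIMARY KEY, UNIQUE constraint, or indexes, so in
            # practice tools_new should be created from the full tools DDL.
            db.execute("CREATE TABLE tools_new AS SELECT * FROM tools WHERE 0")

            # Bulk insert into temp table
            for tool in new_tools:
                db.execute("INSERT INTO tools_new ...", tool)

            # Swap tables atomically.
            # NOTE: indexes and the FTS sync triggers stay attached to the renamed
            # old table and are dropped with it; recreate them on the new tools
            # table before committing.
            db.execute("ALTER TABLE tools RENAME TO tools_old")
            db.execute("ALTER TABLE tools_new RENAME TO tools")
            db.execute("DROP TABLE tools_old")

            # Rebuild FTS index
            db.execute("INSERT INTO tools_fts(tools_fts) VALUES('rebuild')")

            # Update sync timestamp (UTC, ISO 8601)
            db.execute("UPDATE sync_status SET last_sync = ?",
                       [datetime.now(timezone.utc).isoformat()])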

Why atomic: Per-row updates with FTS triggers can yield inconsistent reads under load. Readers may see partial state mid-sync. Table swap ensures all-or-nothing visibility.

Error Handling

Error Scenario	Behavior
Repo fetch fails	Log error, retry in 5 min, alert if 3 failures
YAML parse error	Skip tool, log error, continue with others
Database write fails	Rollback transaction, retry once, then alert
Lock timeout	Skip this sync, next webhook will retry

Automated CI Validation

PRs are validated automatically using SmartTools (dogfooding):

PR Submitted
    │
    ▼
┌─────────────────────────────────────┐
│  Gitea CI runs validation tools:    │
│  • schema-validator                 │
│  • security-scanner                 │
│  • duplicate-detector               │
└───────────────┬─────────────────────┘
                │
        ┌───────┴───────┐
        │               │
    All pass        Any fail
        │               │
        ▼               ▼
  Auto-merge or     Add comment,
  flag for review   request changes

Validation checks:

Schema validation: config.yaml matches expected format
Security scan: No dangerous shell commands, no secrets in prompts
Duplicate detection: AI-powered similarity check against existing tools
README check: README.md exists and is non-empty

CI workflow (.gitea/workflows/validate.yaml):

name: Validate Tool Submission
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
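      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # full history so the PR can be diffed against its base
      - name: List changed files
        run: git diff --name-only "origin/${{ github.base_ref }}...HEAD" > changed_files.txt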
      - name: Validate schema
        run: python scripts/validate_tool.py ${{ github.event.pull_request.head.sha }}
      - name: Security scan
        run: smarttools run security-scanner < changed_files.txt
      - name: Check duplicates
        run: smarttools run duplicate-detector < changed_files.txt

Registry Repository Structure

Full structure of the SmartTools-Registry repo:

SmartTools-Registry/
├── README.md                        # Registry overview
├── CONTRIBUTING.md                  # How to submit tools
├── LICENSE
│
├── tools/                           # All published tools
│   ├── rob/
│   │   ├── summarize/
│   │   │   ├── config.yaml
│   │   │   └── README.md
│   │   └── translate/
│   │       ├── config.yaml
│   │       └── README.md
│   └── alice/
│       └── code-review/
│           ├── config.yaml
│           └── README.md
│
├── categories/
│   └── categories.yaml              # Category definitions
│
├── index.json                       # Auto-generated search index
│
├── .gitea/
│   └── workflows/
│       ├── validate.yaml            # PR validation
│       ├── build-index.yaml         # Rebuild index on merge
│       └── notify-api.yaml          # Webhook to API server
│
└── scripts/
    ├── validate_tool.py             # Schema validation
    ├── build_index.py               # Generate index.json
    ├── check_duplicates.py          # Similarity detection
    └── security_scan.py             # Security checks

categories.yaml format:

categories:
  - name: text-processing
    description: Tools for manipulating and analyzing text
    icon: 📝
  - name: code
    description: Tools for code review, generation, and analysis
    icon: 💻
  - name: data
    description: Tools for data transformation and analysis
    icon: 📊
  - name: media
    description: Tools for image, audio, and video processing
    icon: 🎨
  - name: productivity
    description: General productivity and automation tools
    icon: ⚡

Download Stats

Counting Methodology

Count installs only, not views or searches
Increment after successful download (response sent)
Dedupe by client_id + tool_id + date

def download_tool(owner, name, version, install=False, client_id=None):
    tool = get_tool(owner, name, version)
    if not tool:
        return {"error": "not_found"}, 404

    config_yaml = tool.config_yaml

    # Only count if this is an install (not just viewing)
    if install:
        record_download(tool.id, client_id)

    return {"config": config_yaml}, 200

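import hashlib

def record_download(tool_id, client_id):
    today = date.today()

    # Use client_id if provided, otherwise derive a stable anonymous fallback.
    # The built-in hash() is salted per process for strings, so it would give
    # the same IP a different ID on every worker or restart; hashlib is stable.
    if client_id:
        effective_client_id = client_id
    else:
        ip_digest = hashlib.sha256(request.remote_addr.encode()).hexdigest()[:16]
        effective_client_id = f"anon_{ip_digest}"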

    # Dedupe: only count once per client per tool per day
    try:
        db.download_stats.insert(
            tool_id=tool_id,
            client_id=effective_client_id,
            downloaded_at=today
        )
        # Increment counter (can be async/batch updated)
        db.execute("UPDATE tools SET downloads = downloads + 1 WHERE id = ?", [tool_id])
    except IntegrityError:
        pass  # Already counted today, ignore
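
The `IntegrityError` path above only works if the database enforces uniqueness. A minimal sketch of that constraint, using the built-in sqlite3 module, is shown here. The column names follow the insert statements in this section; the `UNIQUE` constraint itself and the `record_once_per_day` helper are assumptions for illustration.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE download_stats (
        tool_id INTEGER NOT NULL,
        client_id TEXT NOT NULL,
        downloaded_at TEXT NOT NULL,
        UNIQUE (tool_id, client_id, downloaded_at)
    )
""")

def record_once_per_day(conn, tool_id, client_id, day):
    """Return True if the install was newly counted, False if it was a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO download_stats (tool_id, client_id, downloaded_at) "
                "VALUES (?, ?, ?)",
                (tool_id, client_id, day.isoformat()),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = record_once_per_day(conn, 1, "anon_abc", date(2025, 1, 20))
repeat = record_once_per_day(conn, 1, "anon_abc", date(2025, 1, 20))
next_day = record_once_per_day(conn, 1, "anon_abc", date(2025, 1, 21))
counted = conn.execute("SELECT COUNT(*) FROM download_stats").fetchone()[0]
```

Only when the helper returns True should the `downloads` counter on the tool be incremented.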

Client ID Generation

CLI generates a persistent anonymous ID on first run:

# In CLI, on first run
import uuid
import os

CONFIG_PATH = os.path.expanduser("~/.smarttools/config.yaml")

def get_or_create_client_id():
    config = load_config()
    if 'client_id' not in config:
        config['client_id'] = f"anon_{uuid.uuid4().hex[:16]}"
        save_config(config)
    return config['client_id']

Fallback when client_id missing:

If header X-Client-ID not sent, use IP hash as fallback
This still provides some dedupe for anonymous users
Logged-in users' downloads are attributed to their account instead
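
The fallback rules above could be sketched as follows. This is an illustration under assumptions: the `fallback_client_id` and `effective_client_id` helpers and the server-side `salt` are hypothetical, and a keyed hash (HMAC-SHA256) is used so the derived ID is stable across processes without storing the address itself.

```python
import hashlib
import hmac

def fallback_client_id(remote_addr: str, salt: bytes) -> str:
    """Derive a stable anonymous ID from an IP address using a keyed hash."""
    digest = hmac.new(salt, remote_addr.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"anon_{digest[:16]}"

def effective_client_id(header_value, remote_addr, salt):
    """Prefer the X-Client-ID header value; fall back to the hashed IP."""
    return header_value or fallback_client_id(remote_addr, salt)

salt = b"example-server-secret"  # hypothetical; load from server config in practice
id_a = effective_client_id(None, "203.0.113.7", salt)
id_a_again = effective_client_id(None, "203.0.113.7", salt)
id_b = effective_client_id(None, "203.0.113.8", salt)
id_from_header = effective_client_id("anon_1234567890abcdef", "203.0.113.7", salt)
```

Keying the hash with a server secret makes it harder to recover an address by hashing every possible IP, which a plain unsalted hash would allow.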

Privacy Considerations

No raw IP addresses stored in database (the anonymous fallback stores only a hash of the IP)
client_id is client-controlled and can be regenerated
Stats are aggregated (total count), not individual tracking

Async Stats Strategy

To avoid DB contention on the hot download path:

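from datetime import date
from queue import Queue, Empty
from threading import Thread

# In-memory queue for stats
stats_queue = Queue()
MAX_BATCH_SIZE = 100  # flush early so sustained traffic cannot grow the batch forever

def record_download_async(tool_id, client_id):
    """Non-blocking: enqueue for background processing"""
    stats_queue.put({
        'tool_id': tool_id,
        'client_id': client_id,
        'date': date.today()
    })

def stats_worker():
    """Background thread: flush when the queue idles for 5 seconds or the batch fills up"""
    batch = []
    while True:
        try:
            batch.append(stats_queue.get(timeout=5))
            if len(batch) < MAX_BATCH_SIZE:
                continue  # keep collecting until idle or full
        except Empty:
            pass  # queue idle: fall through and flush whatever is pending
        if batch:
            flush_batch(batch)
            batch = []

# Start once at app startup: Thread(target=stats_worker, daemon=True).start()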

def flush_batch(batch):
    """Bulk insert with conflict ignore"""
    with db.transaction():
        for item in batch:
            try:
                db.execute("""
                    INSERT INTO download_stats (tool_id, client_id, downloaded_at)
                    VALUES (?, ?, ?)
                    ON CONFLICT DO NOTHING
                """, [item['tool_id'], item['client_id'], item['date']])
            except Exception as e:
                logger.warning(f"Stats insert failed: {e}")
                # Don't fail downloads for stats errors

Failure behavior: If stats DB write fails, log the error but don't fail the download. Stats are "best effort" - the download must succeed.

Search

Primary search: SQLite FTS5 inside the API.
index.json provides offline CLI search and backup.
If FTS5 is stale, return results with X-Search-Index-Stale: true.

API Caching Strategy

Cache Headers

Endpoint	Cache-Control	ETag	Notes
`GET /index.json`	`max-age=300, stale-while-revalidate=60`	Yes	5 min cache, background refresh
`GET /tools/{owner}/{name}`	`max-age=60`	Yes	1 min cache
`GET /tools/{owner}/{name}/download`	`max-age=3600, immutable`	Yes	Immutable versions, 1 hour
`GET /tools/search`	`no-cache`	No	Always fresh
`GET /categories`	`max-age=3600`	Yes	Categories change rarely

ETag Implementation

import hashlib
from datetime import datetime

def get_tool_etag(tool):
    """Generate ETag from tool identity (immutable versions don't change)"""
    # Since versions are immutable, owner/name@version is stable
    # Use published_at for extra safety (not updated_at, which doesn't exist)
    content = f"{tool.owner}/{tool.name}@{tool.version}:{tool.published_at.isoformat()}"
    return hashlib.md5(content.encode()).hexdigest()

def get_index_etag():
    """Generate ETag from last sync timestamp"""
    last_sync = db.get_last_sync_time()
    return hashlib.md5(last_sync.isoformat().encode()).hexdigest()

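@app.route('/api/v1/tools/<owner>/<name>/download')
def download_tool(owner, name):
    version = request.args.get('version', 'latest')
    tool = resolve_and_get_tool(owner, name, version)
    etag = f'"{get_tool_etag(tool)}"'  # ETag values are quoted strings on the wire

    # Check If-None-Match header
    if request.headers.get('If-None-Match') == etag:
        return '', 304  # Not Modified

    response = jsonify({
        "data": {
            "owner": tool.owner,
            "name": tool.name,
            "resolved_version": tool.version,
            "config": tool.config_yaml
        }
    })
    response.headers['ETag'] = etag
    # Only an exact version is immutable. 'latest' or a constraint can resolve
    # to a newer version later, so those get a short lifetime instead
    # (60s is an assumed value, matching the tool detail endpoint).
    if version == tool.version:
        response.headers['Cache-Control'] = 'max-age=3600, immutable'
    else:
        response.headers['Cache-Control'] = 'max-age=60'
    return response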

Note: Since tool versions are immutable, the ETag based on owner/name@version is permanently stable. The published_at timestamp is included for defense-in-depth but won't change.
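
Clients may send `If-None-Match` with quotes, a weak-validator prefix (`W/`), several comma-separated values, or `*`, so a plain string equality check can miss valid matches. The sketch below shows one way to compare; `etag_matches` is a hypothetical helper, and it assumes ETag values never contain commas (true for the hex digests used here).

```python
def etag_matches(if_none_match, etag):
    """Weakly compare an If-None-Match header value against an unquoted ETag."""
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True  # "*" matches any current representation
    for candidate in if_none_match.split(","):
        candidate = candidate.strip()
        if candidate.startswith("W/"):
            candidate = candidate[2:]  # weak comparison ignores the W/ prefix
        if candidate.strip('"') == etag:
            return True
    return False

current = "5d41402abc4b2a76b9719d911017c592"
hit_quoted = etag_matches(f'"{current}"', current)
hit_weak = etag_matches(f'W/"{current}"', current)
hit_list = etag_matches(f'"stale-value", "{current}"', current)
miss = etag_matches('"stale-value"', current)
```

When the helper returns True, the handler can answer 304 with an empty body and the same ETag header.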

DB vs Repo Read Strategy

Scenario	Read From	Reason
Normal operation	SQLite DB	Fast, indexed, FTS
DB empty/corrupted	Gitea repo	Fallback/recovery
Webhook sync in progress	DB (stale OK)	Avoid blocking reads
Search query	SQLite FTS5	Full-text search
Download specific version	DB, fallback to repo	DB is cache, repo is truth

Staleness Detection

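from datetime import datetime, timedelta

STALE_THRESHOLD = timedelta(minutes=10)

def is_db_stale():
    last_sync = db.get_last_sync_time()
    return datetime.utcnow() - last_sync > STALE_THRESHOLD

@app.route('/api/v1/tools/search')
def search_tools():
    q = request.args.get('q', '')  # Flask exposes query parameters via request.args
    results = db.search_fts(q)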

    response = jsonify({"results": results})
    if is_db_stale():
        response.headers['X-Search-Index-Stale'] = 'true'
        response.headers['X-Last-Sync'] = db.get_last_sync_time().isoformat()

    return response

Error Model

Response Envelopes

Success response:

{
  "data": { ... },
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 42,
    "total_pages": 3
  }
}

Error response:

{
  "error": {
    "code": "TOOL_NOT_FOUND",
    "message": "Tool 'foo/bar' does not exist",
    "details": {
      "owner": "foo",
      "name": "bar",
      "suggestion": "Did you mean 'rob/bar'?"
    },
    "docs_url": "https://registry.smarttools.dev/docs/errors#TOOL_NOT_FOUND"
  }
}
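
A small pair of helpers can keep every endpoint on these two envelope shapes. The sketch below is illustrative only: `success_envelope` and `error_envelope` are hypothetical names, and `total_pages` is assumed to be the ceiling of `total / per_page`, which is consistent with the example above (42 items at 20 per page gives 3 pages).

```python
import math

def success_envelope(data, page, per_page, total):
    """Wrap a page of results in the success envelope with pagination metadata."""
    return {
        "data": data,
        "meta": {
            "page": page,
            "per_page": per_page,
            "total": total,
            "total_pages": math.ceil(total / per_page),
        },
    }

def error_envelope(code, message, details=None, docs_url=None):
    """Wrap an error in the error envelope; optional fields are omitted when empty."""
    error = {"code": code, "message": message}
    if details:
        error["details"] = details
    if docs_url:
        error["docs_url"] = docs_url
    return {"error": error}

page_one = success_envelope([], page=1, per_page=20, total=42)
not_found = error_envelope(
    "TOOL_NOT_FOUND",
    "Tool 'foo/bar' does not exist",
    details={"owner": "foo", "name": "bar"},
)
```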

Error Codes

Code	HTTP	Description
`TOOL_NOT_FOUND`	404	Tool does not exist
`VERSION_NOT_FOUND`	404	Requested version doesn't exist
`VERSION_EXISTS`	409	Cannot overwrite published version
`INVALID_VERSION`	400	Version string is not valid semver
`INVALID_CONSTRAINT`	400	Version constraint syntax error
`CONSTRAINT_UNSATISFIABLE`	404	No version matches constraint
`VALIDATION_ERROR`	400	Tool config validation failed
`UNAUTHORIZED`	401	Missing or invalid auth token
`FORBIDDEN`	403	Token valid but lacks permission
`RATE_LIMITED`	429	Too many requests
`SLUG_TAKEN`	409	Namespace slug already registered
`ACCOUNT_LOCKED`	403	Too many failed login attempts
`SERVER_ERROR`	500	Internal error (logged for debugging)

Error Scenarios and Fallbacks

CLI Error Handling

Scenario	CLI Behavior	User Message
Registry offline	Use cached tools if available	"Registry unavailable. Using cached version."
Tool not found	Check cache, then fail	"Tool 'foo/bar' not found in registry or cache."
Version constraint unsatisfiable	Show available versions	"No version matches '>=5.0.0'. Available: 1.0.0, 1.1.0, 1.2.0"
Auth token expired	Prompt for new token	"Token expired. Please re-authenticate."
Rate limited	Wait and retry (backoff)	"Rate limited. Retrying in 30 seconds..."
Network timeout	Retry with backoff, then fail	"Connection timed out. Check your network."
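
The "retry with backoff" rows could be implemented roughly as below. This is a sketch under assumptions: `RetryableError`, `retry_with_backoff`, and the default delays are hypothetical, and the `sleep` parameter is injectable so the behavior can be exercised without actually waiting.

```python
import time

class RetryableError(Exception):
    """Raised for transient failures such as HTTP 429 or a network timeout."""

def retry_with_backoff(operation, max_attempts=4, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation(); on RetryableError wait 1s, 2s, 4s, ... then retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the user
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))

# Usage: fail twice, then succeed, recording delays instead of sleeping.
delays = []
attempts = {"count": 0}

def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RetryableError("rate limited")
    return "ok"

result = retry_with_backoff(flaky_fetch, sleep=delays.append)
```

If the server supplies a `Retry-After` value with a 429 response, honoring it in place of the computed delay would be a reasonable refinement.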

Validation Failure Details

When VALIDATION_ERROR occurs, provide specific field errors:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Tool configuration is invalid",
    "details": {
      "errors": [
        {
          "path": "steps[0].provider",
          "message": "Provider 'gpt5' is not recognized",
          "allowed": ["claude", "openai", "ollama", "mock"]
        },
        {
          "path": "version",
          "message": "Version '1.0' is not valid semver (use '1.0.0')"
        }
      ]
    },
    "docs_url": "https://registry.smarttools.dev/docs/tool-format"
  }
}

Dependency Resolution Failures

When smarttools install fails on a manifest:

$ smarttools install

Error: Could not resolve all dependencies

  rob/summarize@^2.0.0
    ✗ No matching version (latest: 1.2.0)

  alice/translate@>=1.0.0
    ✓ Found 1.3.0

Suggestions:
  - Update rob/summarize constraint to "^1.0.0"
  - Contact the tool author for a v2 release
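
To make the failure above concrete, here is a minimal sketch of constraint matching. It assumes npm-style caret semantics and supports only `*`, `>=`, `^`, and exact versions; pre-release tags are not handled, and `parse_version`, `satisfies`, and `resolve` are hypothetical names rather than the CLI's actual resolver.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints for comparison."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)

def satisfies(version, constraint):
    """Return True if version matches the constraint."""
    v = parse_version(version)
    if constraint == "*":
        return True
    if constraint.startswith(">="):
        return v >= parse_version(constraint[2:])
    if constraint.startswith("^"):
        low = parse_version(constraint[1:])
        # Caret allows changes that keep the left-most non-zero part fixed.
        if low[0] > 0:
            high = (low[0] + 1, 0, 0)
        elif low[1] > 0:
            high = (0, low[1] + 1, 0)
        else:
            high = (0, 0, low[2] + 1)
        return low <= v < high
    return v == parse_version(constraint)

def resolve(available, constraint):
    """Pick the highest available version matching the constraint, or None."""
    matches = [v for v in available if satisfies(v, constraint)]
    return max(matches, key=parse_version) if matches else None

published = ["1.0.0", "1.1.0", "1.2.0"]
unsatisfied = resolve(published, "^2.0.0")
suggested = resolve(published, "^1.0.0")
```

With the published versions shown, `^2.0.0` resolves to nothing while `^1.0.0` resolves to 1.2.0, which is why the CLI suggests relaxing the constraint.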

Graceful Degradation

Component Down	Fallback Behavior
API server	CLI uses `~/.smarttools/registry/index.json` for search
Gitea repo	API serves from DB cache (may be stale)
FTS5 index	Fall back to LIKE queries (slower but works)
Network	Use locally installed tools, skip registry features
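
The FTS5 fallback row could look roughly like this. The sketch assumes a `tools_fts` FTS5 table alongside a plain `tools` table with `name` and `description` columns; `search_with_fallback` is a hypothetical helper. Here the FTS table is deliberately absent to simulate a missing or broken index.

```python
import sqlite3

def search_with_fallback(conn, query):
    """Try FTS5 first; fall back to a LIKE scan if the FTS query fails."""
    try:
        rows = conn.execute(
            "SELECT name FROM tools_fts WHERE tools_fts MATCH ? ORDER BY name",
            (query,),
        ).fetchall()
        return [row[0] for row in rows], "fts5"
    except sqlite3.OperationalError:
        # Slower substring scan; % and _ in the query act as LIKE wildcards.
        pattern = f"%{query}%"
        rows = conn.execute(
            "SELECT name FROM tools WHERE name LIKE ? OR description LIKE ? "
            "ORDER BY name",
            (pattern, pattern),
        ).fetchall()
        return [row[0] for row in rows], "like"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tools (name TEXT, description TEXT)")
conn.executemany("INSERT INTO tools VALUES (?, ?)", [
    ("summarize", "Summarize text using AI"),
    ("translate", "Translate text between languages"),
])

# No tools_fts table exists, so the FTS query raises and the LIKE path runs.
names, backend = search_with_fallback(conn, "summarize")
```

Returning which backend served the query makes it easy to log or flag degraded search in the response.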

UX Requirements (CLI/TUI)

Publishing UX

smarttools registry publish --dry-run validates locally and shows what would be submitted:

$ smarttools registry publish --dry-run

Validating tool...
✓ config.yaml is valid
✓ README.md exists (2.3 KB)
✓ Version 1.1.0 not yet published

Would submit:
  Owner: rob
  Name: summarize
  Version: 1.1.0
  Category: text-processing
  Tags: summarization, ai, text

Config preview:
─────────────────────────────
name: summarize
version: "1.1.0"
description: Summarize text using AI
...
─────────────────────────────

Run without --dry-run to submit for review.

Version bump reminder: CLI warns if version hasn't changed from published:

⚠ Version 1.0.0 is already published. Bump version in config.yaml to publish changes.

First-time publishing flow prompts for token and saves it to config.

Progress Indicators

Long-running operations show progress:

$ smarttools install

Installing project dependencies...
  [1/3] rob/summarize@^1.0.0
        Resolving version... 1.2.0
        Downloading... done
        Installing... done ✓
  [2/3] alice/translate@>=2.0.0
        Resolving version... 2.1.0
        Downloading... done
        Installing... done ✓
  [3/3] official/code-review@*
        Resolving version... 1.0.0
        Downloading... done
        Installing... done ✓

✓ Installed 3 tools

$ smarttools registry publish

Submitting rob/summarize@1.1.0...
  Validating... done ✓
  Uploading... done ✓
  Creating PR... done ✓

✓ PR created: https://gitea.brrd.tech/rob/SmartTools-Registry/pulls/42

Your tool is pending review. You'll receive an email when it's approved.

TUI Browse

smarttools registry browse opens a full-screen terminal UI:

┌─ SmartTools Registry ───────────────────────────────────────┐
│ Search: [________________] [All Categories ▼] [Sort: Popular ▼] │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ▶ rob/summarize v1.2.0                          ⬇ 142     │
│    Summarize text using AI                                  │
│    [text-processing] [ai] [summarization]                   │
│                                                             │
│    alice/translate v2.1.0                        ⬇ 98      │
│    Translate text between languages                         │
│    [text-processing] [translation]                          │
│                                                             │
│    official/code-review v1.0.0                   ⬇ 87      │
│    AI-powered code review                                   │
│    [code] [review] [ai]                                     │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│ ↑↓ Navigate  Enter: Details  i: Install  /: Search  q: Quit │
└─────────────────────────────────────────────────────────────┘

Keyboard shortcuts:

Key	Action
`↑/↓` or `j/k`	Navigate list
`Enter`	View tool details
`i`	Install selected tool
`/`	Focus search box
`c`	Change category filter
`s`	Change sort order
`?`	Show help
`q`	Quit

Virtual scrolling: For large tool lists (>100), use virtual scrolling to maintain performance.
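
Virtual scrolling amounts to rendering only the rows that fit in the viewport and shifting that window as the selection moves. A minimal sketch of the window calculation follows; `visible_window` and its parameters are hypothetical and independent of any particular TUI library.

```python
def visible_window(total_items, selected_index, viewport_rows, scroll_top):
    """Return (new_scroll_top, range of item indexes to render)."""
    # Scroll up or down just enough to keep the selection visible.
    if selected_index < scroll_top:
        scroll_top = selected_index
    elif selected_index >= scroll_top + viewport_rows:
        scroll_top = selected_index - viewport_rows + 1
    # Clamp so the window never runs past either end of the list.
    scroll_top = max(0, min(scroll_top, max(0, total_items - viewport_rows)))
    end = min(total_items, scroll_top + viewport_rows)
    return scroll_top, range(scroll_top, end)

top, rows = visible_window(total_items=1000, selected_index=10,
                           viewport_rows=10, scroll_top=0)
```

Only the items in `rows` need to be formatted and drawn, so render cost depends on the viewport height rather than the size of the registry.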

Project Initialization

$ smarttools init

Creating smarttools.yaml...

Project name [my-project]: my-ai-project
Version [1.0.0]:

Would you like to add any tools? (search with 's', skip with Enter)
> s
Search: summ
  1. rob/summarize v1.2.0 - Summarize text using AI
  2. alice/summary v1.0.0 - Generate summaries

Add tool (number, or Enter to finish): 1
Added rob/summarize@^1.2.0

Add tool (number, or Enter to finish):

✓ Created smarttools.yaml

name: my-ai-project
version: "1.0.0"
dependencies:
  - name: rob/summarize
    version: "^1.2.0"

Run 'smarttools install' to install dependencies.

Accessibility

CLI: All output works with screen readers, no color-only information
TUI: Full keyboard navigation, high-contrast mode support
Web UI: WCAG 2.1 AA compliance target
- Semantic HTML
- ARIA labels for interactive elements
- Focus management in modals
- Skip links for navigation

Offline Cache

Cache registry index locally:

~/.smarttools/registry/index.json

Refresh when older than 24 hours; support --offline and --refresh flags.
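
The refresh rule can be expressed as a single decision function. This sketch assumes the index age is taken from the file's modification time and that `--offline` wins if both flags are given; `should_refresh` and `MAX_AGE_SECONDS` are hypothetical names.

```python
import os
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # refresh when older than 24 hours

def should_refresh(index_path, offline=False, refresh=False, now=None):
    """Decide whether the CLI should fetch a fresh index.json."""
    if offline:
        return False  # --offline: never touch the network
    if refresh:
        return True   # --refresh: always fetch
    try:
        mtime = os.path.getmtime(index_path)
    except OSError:
        return True   # no cached index yet
    current = time.time() if now is None else now
    return current - mtime > MAX_AGE_SECONDS

index_path = os.path.expanduser("~/.smarttools/registry/index.json")
forced = should_refresh(index_path, refresh=True)
offline_mode = should_refresh(index_path, offline=True)
```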

Index Integrity

The cached index.json includes integrity metadata:

{
  "version": "1.0",
  "generated_at": "2025-01-20T12:00:00Z",
  "checksum": "sha256:abc123...",
  "tool_count": 142,
  "tools": [...]
}

API response headers:

ETag: "abc123def456"
X-Index-Checksum: sha256:abc123...
X-Index-Generated: 2025-01-20T12:00:00Z

CLI verification:

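import hashlib
import json

def verify_cached_index():
    """Verify cached index integrity on load"""
    cached = load_cached_index()
    if not cached:
        return None

    # Partial write: required fields are missing, so treat the cache as corrupt
    if 'tools' not in cached or 'checksum' not in cached:
        logger.warning("Cached index is missing fields, will refresh")
        return None

    # Verify checksum. The registry must hash the same canonical form:
    # the tools array serialized as JSON with sorted keys.
    content = json.dumps(cached['tools'], sort_keys=True)
    computed = hashlib.sha256(content.encode()).hexdigest()

    if computed != cached['checksum'].replace('sha256:', ''):
        logger.warning("Cached index checksum mismatch, will refresh")
        return None

    return cached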

Corruption handling:

If checksum fails, discard cache and fetch fresh
If partial write detected (missing fields), discard and refresh
CLI shows warning: "Cached index corrupted, fetching fresh copy..."
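
Partial writes can largely be avoided on the writing side by saving the index to a temporary file and renaming it into place. The sketch below shows that pattern together with a checksum computed the same way the verification step reads it (SHA-256 over the tools array serialized with sorted keys). `tools_checksum` and `write_index_atomically` are hypothetical helper names.

```python
import hashlib
import json
import os
import tempfile

def tools_checksum(tools):
    """Checksum over the canonical JSON form of the tools array."""
    content = json.dumps(tools, sort_keys=True)
    return "sha256:" + hashlib.sha256(content.encode()).hexdigest()

def write_index_atomically(path, index):
    """Write index.json via a temp file so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(index, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

tools = [{"owner": "rob", "name": "summarize", "version": "1.2.0"}]
index = {
    "version": "1.0",
    "checksum": tools_checksum(tools),
    "tool_count": len(tools),
    "tools": tools,
}
```

Creating the temp file in the destination directory matters: `os.replace` is only atomic when source and target are on the same filesystem.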

Web UI Vision

The registry includes a full website, not just an API:

Site structure:

registry.smarttools.dev (or gitea.brrd.tech/registry)
├── /                           # Landing page
├── /tools                      # Browse all tools
├── /tools/{owner}/{name}       # Tool detail page
├── /categories                 # Browse by category
├── /categories/{name}          # Tools in category
├── /search?q=...               # Search results
├── /docs                       # Documentation
│   ├── /docs/getting-started
│   ├── /docs/creating-tools
│   ├── /docs/publishing
│   └── /docs/best-practices
├── /tutorials                  # Step-by-step guides
│   ├── /tutorials/first-tool
│   ├── /tutorials/chaining-steps
│   └── /tutorials/code-steps
├── /examples                   # Example projects
├── /blog                       # Updates, announcements (optional)
├── /register                   # Publisher registration
├── /login                      # Publisher login
├── /dashboard                  # Publisher dashboard
│   ├── /dashboard/tools        # My published tools
│   ├── /dashboard/tokens       # API tokens
│   └── /dashboard/settings     # Account settings
└── /api/v1/...                 # API endpoints

Landing page content:

Hero: "Share and discover AI-powered CLI tools"
Quick install example
Featured/popular tools
Category highlights
"Get Started" CTA

Tool detail page:

Name, description, version, author
README rendered as markdown (sanitized)
Install command (copy-to-clipboard)
Version history
Download stats
Category/tags
"Report" button for abuse

README Security

When rendering README markdown, apply XSS sanitization:

import bleach
from markdown import markdown

ALLOWED_TAGS = [
    'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'p', 'br', 'hr',
    'ul', 'ol', 'li',
    'strong', 'em', 'code', 'pre',
    'blockquote',
    'a', 'img',
    'table', 'thead', 'tbody', 'tr', 'th', 'td'
]

ALLOWED_ATTRS = {
    'a': ['href', 'title'],
    'img': ['src', 'alt', 'title'],
    'code': ['class'],  # for syntax highlighting
}

def render_readme_safe(readme_raw: str) -> str:
    """Convert markdown to sanitized HTML"""
    # Convert markdown to HTML
    html = markdown(readme_raw, extensions=['fenced_code', 'tables'])

    # Sanitize to prevent XSS
    safe_html = bleach.clean(
        html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRS,
        strip=True
    )

    # Linkify URLs
    safe_html = bleach.linkify(safe_html)

    return safe_html

Storage strategy:

Store raw README in tools.readme
Render and sanitize on request (or cache rendered HTML)
Never trust client-submitted HTML directly

Tech stack options:

Option	Pros	Cons
Flask + Jinja + Tailwind	Simple, Python-only, fast to build	Less interactive
FastAPI + Vue/React SPA	Modern, interactive	More complex, separate build
Astro/Next.js	Great SEO, static-first	Different stack (Node.js)

Recommendation: Flask + Jinja + Tailwind for v1

Keeps everything in Python
Server-rendered is fine for a registry
Good SEO out of the box
Can add interactivity with Alpine.js or htmx if needed

Monetization considerations:

AdSense-compatible (server-rendered pages)
Analytics tracking for traffic insights
Future: sponsored tools, featured placements
Future: premium publisher tiers (more tools, priority review)

Implementation Phases

Phase 1: Foundation

Define smarttools.yaml manifest format
Implement tool resolution order (local → global → registry)
Create SmartTools-Registry repo on Gitea (bootstrap)
Add 3-5 example tools to seed the registry

Phase 2: Core Backend

Set up Flask/FastAPI project structure
Implement SQLite database schema
Build core API endpoints (list, search, get, download)
Implement webhook receiver for Gitea sync
Set up HMAC verification

Phase 3: CLI Commands

smarttools registry search
smarttools registry install
smarttools registry info
smarttools registry browse (TUI)
Local index caching

Phase 4: Publishing

Publisher registration (web UI)
Token management
smarttools registry publish command
PR creation via Gitea API
CI validation workflows

Phase 5: Project Dependencies

smarttools install (from manifest)
smarttools add command
Runtime override application
Dependency resolution

Phase 6: Smart Features

SQLite FTS5 search index
AI-powered auto-categorization
Duplicate/similarity detection
Security scanning

Phase 7: Full Web UI

Landing page
Tool browsing/search pages
Tool detail pages with README rendering
Publisher dashboard
Documentation/tutorials section

Phase 8: Polish & Scale

Rate limiting
Abuse reporting
Analytics integration
Performance optimization
Monitoring/alerting
