smarttools/docs/REGISTRY.md

SmartTools Registry Design

Purpose

Build a centralized registry for SmartTools to enable discovery, publishing, dependency management, and future curation at scale.

Terminology

| Term | Definition |
| --- | --- |
| Tool definition | The full YAML file in the registry (config.yaml) containing name, steps, arguments, etc. |
| Tool config | The configuration within a tool definition (arguments, steps, provider settings) |
| smarttools.yaml | Project manifest file declaring tool dependencies and overrides |
| config.yaml | The tool definition file, both in registry and when installed locally |
| Owner | Immutable namespace slug identifying the publisher (e.g., rob, alice) |
| Publisher | A registered user who can publish tools to the registry |
| Wrapper script | Auto-generated bash script in ~/.local/bin/ that invokes a tool |

Canonical naming: Use SmartTools-Registry (capitalized, hyphenated) for the repository name.

Diagram References

  • System overview: discussions/diagrams/smarttools-registry_rob_1.puml
  • Data flows: discussions/diagrams/smarttools-registry_rob_5.puml

System Overview

Users interact via the CLI and a future Web UI. Both call a Registry API hosted at https://gitea.brrd.tech/api/v1 (future alias: registry.smarttools.dev/api/v1). The API syncs from a Gitea-backed registry repo and maintains a SQLite cache/search index.

Canonical API base path: https://gitea.brrd.tech/api/v1

All API endpoints are versioned under /api/v1. When breaking changes are needed, a new version (/api/v2) will be introduced with deprecation notices.

Core API endpoints:

  • GET /api/v1/tools
  • GET /api/v1/tools/search?q=...
  • GET /api/v1/tools/{owner}/{name}
  • GET /api/v1/tools/{owner}/{name}/versions
  • GET /api/v1/tools/{owner}/{name}/download?version=...
  • POST /api/v1/tools (publish)
  • GET /api/v1/categories
  • GET /api/v1/stats/popular
  • POST /api/v1/webhook/gitea

Pagination

All list endpoints support pagination:

| Parameter | Default | Max | Description |
| --- | --- | --- | --- |
| page | 1 | - | Page number (1-indexed) |
| per_page | 20 | 100 | Items per page |
| sort | downloads | - | Sort field |
| order | desc | - | Sort order (asc/desc) |

Stable ordering: To ensure deterministic results across pages, sorting includes secondary and tertiary keys:

  • Primary: requested field (e.g., downloads)
  • Secondary: published_at (desc)
  • Tertiary: id (for absolute stability)

ORDER BY downloads DESC, published_at DESC, id DESC
LIMIT 20 OFFSET 0

Response pagination metadata:

{
  "data": [...],
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 142,
    "total_pages": 8
  }
}
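
The pagination rules above can be sketched as two small helpers. This is a minimal illustration, not the registry's actual code: the function names are hypothetical, and clamping out-of-range values (rather than rejecting them with a 400) is an assumption.

```python
import math

MAX_PER_PAGE = 100  # documented maximum for per_page


def page_window(page=1, per_page=20):
    """Clamp paging inputs and return (page, per_page, offset) for LIMIT/OFFSET."""
    page = max(1, int(page))
    per_page = min(MAX_PER_PAGE, max(1, int(per_page)))
    return page, per_page, (page - 1) * per_page


def pagination_meta(page, per_page, total):
    """Build the "meta" object returned alongside "data"."""
    return {
        "page": page,
        "per_page": per_page,
        "total": total,
        "total_pages": math.ceil(total / per_page),
    }


page, per_page, offset = page_window(page=3, per_page=20)
meta = pagination_meta(page, per_page, total=142)
```

With 142 tools and 20 items per page, the metadata reports 8 pages, matching the sample response above.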

Input Constraints

Size limits to prevent oversized uploads:

| Field | Max Size | Notes |
| --- | --- | --- |
| config.yaml | 64 KB | Tool definition |
| README.md | 256 KB | Documentation |
| Request body | 512 KB | Total POST payload |
| Tool name | 64 chars | Alphanumeric + hyphen |
| Description | 500 chars | Short summary |
| Tag | 32 chars | Individual tag |
| Tags array | 10 items | Maximum tags per tool |

Validation errors:

{
  "error": {
    "code": "PAYLOAD_TOO_LARGE",
    "message": "config.yaml exceeds 64KB limit",
    "details": {
      "field": "config",
      "size": 72000,
      "limit": 65536
    }
  }
}
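
A size check that produces the error shape above might look like the following sketch. The "config" field name comes from the sample error; the "readme" field name and the helper name are assumptions made for illustration.

```python
# Byte limits from the table above (KB read as 1024 bytes, as in the sample error).
LIMITS = {"config": 64 * 1024, "readme": 256 * 1024}
LABELS = {"config": "config.yaml", "readme": "README.md"}  # "readme" key is hypothetical


def check_sizes(payload):
    """Return an error body for the first oversized field, or None if all fit."""
    for field, limit in LIMITS.items():
        size = len(payload.get(field, "").encode("utf-8"))
        if size > limit:
            return {
                "error": {
                    "code": "PAYLOAD_TOO_LARGE",
                    "message": f"{LABELS[field]} exceeds {limit // 1024}KB limit",
                    "details": {"field": field, "size": size, "limit": limit},
                }
            }
    return None


error = check_sizes({"config": "x" * 72000})
```

Sizes are measured in encoded bytes rather than characters, so multi-byte UTF-8 content counts against the limit at its real upload size.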

Sort Fields and Indexes

Allowed sort fields:

| Endpoint | Allowed sort values |
| --- | --- |
| GET /tools | downloads, published_at, name |
| GET /tools/search | relevance, downloads, published_at |
| GET /categories | name, tool_count |

Invalid sort values return 400:

{"error": {"code": "INVALID_SORT", "message": "Unknown sort field 'foo'. Allowed: downloads, published_at, name"}}
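
Because the sort field ends up in an ORDER BY clause, checking it against a whitelist also keeps user input out of the SQL text. A minimal sketch, with a hypothetical helper name and the endpoint keys written without the /api/v1 prefix:

```python
ALLOWED_SORTS = {
    "/tools": ["downloads", "published_at", "name"],
    "/tools/search": ["relevance", "downloads", "published_at"],
    "/categories": ["name", "tool_count"],
}


def validate_sort(endpoint, sort):
    """Return None if the sort field is allowed, else an (error body, status) pair."""
    allowed = ALLOWED_SORTS[endpoint]
    if sort in allowed:
        return None
    message = f"Unknown sort field '{sort}'. Allowed: {', '.join(allowed)}"
    return {"error": {"code": "INVALID_SORT", "message": message}}, 400


rejected = validate_sort("/tools", "foo")
```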

Database indexes:

-- Frequent query patterns
CREATE INDEX idx_tools_owner_name ON tools(owner, name);
CREATE INDEX idx_tools_category ON tools(category);
CREATE INDEX idx_tools_published_at ON tools(published_at DESC);
CREATE INDEX idx_tools_downloads ON tools(downloads DESC);
CREATE INDEX idx_tools_owner_name_version ON tools(owner, name, version);

-- For pagination stability
CREATE INDEX idx_tools_sort_stable ON tools(downloads DESC, published_at DESC, id DESC);

-- Publisher lookups
CREATE INDEX idx_publishers_slug ON publishers(slug);
CREATE INDEX idx_publishers_email ON publishers(email);

-- Token lookups
CREATE INDEX idx_api_tokens_hash ON api_tokens(token_hash);
CREATE INDEX idx_api_tokens_publisher ON api_tokens(publisher_id);

API Version Compatibility

Forward compatibility: Clients should ignore unknown fields in API responses:

# Good: ignore unknown fields
tool = response['data']
name = tool.get('name')
# Don't fail if 'new_field' exists but client doesn't know about it

# Bad: strict parsing that fails on unknown fields
tool = ToolSchema.parse(response['data'])  # May fail on new fields

Backward compatibility: The API will:

  • Never remove fields in a version (only deprecate)
  • Never change field types
  • Add new optional fields without version bump
  • Use new version (/api/v2) for breaking changes

Deprecation process:

  1. Add X-Deprecated-Field: old_field header
  2. Document in changelog
  3. Remove after 6 months minimum
  4. Major version bump if widely used

Client version header:

X-SmartTools-Client: cli/1.2.0

This header helps the server track which client versions are in use when making deprecation decisions.

Source of Truth

  • Gitea registry repo is the source of truth.
  • API syncs repo content into SQLite for fast queries, stats, and FTS5 search.
  • index.json remains useful for offline CLI search and as a fallback.

If the cache is stale, the API can fall back to repo reads; a warning header may be emitted.

Namespacing and Paths

Support owner/name from day one:

  • Registry path: tools/{owner}/{name}/config.yaml
  • API URL: /tools/{owner}/{name}
  • Install: smarttools registry install rob/summarize
  • Shorthand: smarttools registry install summarize resolves to the official namespace.

PR branches: submit/{owner}/{name}/{version}.

Namespace Identity

The owner is an immutable slug, not the display name:

-- In publishers table
slug TEXT UNIQUE NOT NULL,        -- immutable: "rob", "alice-dev"
display_name TEXT NOT NULL,       -- mutable: "Rob", "Alice Developer"

Slug rules:

  • Lowercase alphanumeric + hyphens only: ^[a-z0-9][a-z0-9-]*[a-z0-9]$
  • 2-39 characters
  • Cannot start/end with hyphen
  • Set once at registration, cannot be changed
  • Reserved slugs: official, admin, system, api, registry, smarttools
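
The slug rules can be checked with a few lines of Python. This is a sketch with a hypothetical function name. The reserved set follows the longer list given in the Official Namespace section, and fullmatch is used because `$` in Python's re module would otherwise accept a trailing newline.

```python
import re

SLUG_RE = re.compile(r"[a-z0-9][a-z0-9-]*[a-z0-9]")
RESERVED = {"official", "admin", "system", "api", "registry", "smarttools"}


def slug_error(slug):
    """Return a reason the slug is invalid, or None if it is acceptable."""
    if not 2 <= len(slug) <= 39:
        return "slug must be 2-39 characters"
    if not SLUG_RE.fullmatch(slug):
        return "slug must be lowercase alphanumeric or hyphens, without a leading or trailing hyphen"
    if slug in RESERVED:
        return "slug is reserved"
    return None
```

For example, slug_error("alice-dev") returns None, while "-rob", "Rob", and "admin" are each rejected with a reason.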

Rename policy:

  • display_name can be changed anytime via dashboard
  • slug (owner) is permanent to preserve URLs and tool references
  • If a publisher absolutely must change slug (legal reasons, etc.):
    1. Create new account with new slug
    2. Republish tools under new namespace
    3. Mark old tools as deprecated with replacement pointing to new namespace
    4. Old namespace remains reserved (cannot be reused by others)

Why immutable:

  • rob/summarize@1.0.0 must always resolve to the same tool
  • Prevents namespace hijacking after rename
  • Simplifies caching and CDN strategies

Tool Format (Registry == Local)

Registry tool folders mirror local tools:

tools/
  rob/
    summarize/
      config.yaml
      README.md

Tool files match the existing SmartTools format. Registry-specific metadata is kept under registry:. Deprecation is tool-defined and top-level:

name: summarize
version: "1.2.0"
deprecated: true
deprecated_message: "Security issue. Use v1.2.1"
replacement: "rob/summarize@1.2.1"
registry:
  published_at: "2025-01-15T10:30:00Z"
  downloads: 142

Schema compatibility note: The current SmartTools config parser may reject unknown top-level keys like deprecated, replacement, and registry. Before implementing registry features:

  1. Update the YAML parser to ignore unknown keys (permissive mode)
  2. Or explicitly define these fields in the Tool dataclass with defaults
  3. Validate registry-specific fields only when publishing, not when running locally

This ensures local tools continue to work even if they don't have registry fields.

Versioning and Immutability

  • Unique key: owner/name + version.
  • Published versions are immutable.
  • Deprecation uses deprecated, deprecated_message, and replacement.
  • CLI warns on install if a version is deprecated.

Yank Policy

Yanking allows removing a version from resolution without deleting it (for auditability):

# In tool config
yanked: true
yanked_reason: "Critical security vulnerability CVE-2025-1234"
yanked_at: "2025-01-20T15:00:00Z"

Yanked version behavior:

| Operation | Behavior |
| --- | --- |
| install foo@1.0.0 (exact) | Warns but allows install |
| install foo@^1.0.0 (constraint) | Excludes yanked, resolves to next valid |
| search / browse | Hidden by default, shown with --include-yanked |
| Direct URL access | Returns tool with yanked: true in response |
| Already installed | Continues to work, no forced removal |

Database schema addition:

-- Add to tools table
yanked BOOLEAN DEFAULT FALSE,
yanked_reason TEXT,
yanked_at TIMESTAMP

Yank vs Delete:

  • Yank: Version remains in DB, excluded from resolution, auditable
  • Delete: Reserved for DMCA/legal, requires admin action, leaves tombstone record

Version Format

Tools use semantic versioning (semver):

MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]

Examples:
  1.0.0           # stable release
  1.2.3           # stable release
  2.0.0-alpha.1   # prerelease
  2.0.0-beta.2    # prerelease
  2.0.0-rc.1      # release candidate

Version Constraints

Manifest files support these constraint formats:

| Constraint | Meaning | Example Match |
| --- | --- | --- |
| 1.2.3 | Exact version | 1.2.3 only |
| >=1.2.0 | Minimum version | 1.2.0, 1.3.0, 2.0.0 |
| <2.0.0 | Below version | 1.9.9, 1.0.0 |
| >=1.0.0,<2.0.0 | Range | 1.0.0 to 1.9.9 |
| ^1.2.3 | Compatible (same major) | 1.2.3 to 1.9.9 |
| ~1.2.3 | Approximately (same minor) | 1.2.3 to 1.2.9 |
| * | Any version | latest stable |

Version Resolution Rules

When resolving a version constraint:

  1. Filter: Get all versions matching the constraint
  2. Exclude prereleases: Unless constraint explicitly includes them (e.g., >=2.0.0-alpha.1)
  3. Sort: By semver precedence (descending)
  4. Select: Highest matching version

Tie-breakers:

  • Stable versions preferred over prereleases
  • Later publish date wins if versions are equal (shouldn't happen with immutability)
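
The filter, exclude, sort, and select steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the registry's resolver: the function names are hypothetical, only the constraint forms from the table are handled, `^` is treated strictly as "same major", and yanked versions are not modeled.

```python
import re

_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$")


def parse(version):
    """Return a sortable key: (major, minor, patch, is_stable, prerelease_ids)."""
    m = _SEMVER.match(version.strip())
    if not m:
        raise ValueError(f"not a semver version: {version!r}")
    major, minor, patch = (int(g) for g in m.group(1, 2, 3))
    pre = m.group(4)
    ids = ()
    if pre:
        # Numeric identifiers compare as integers and sort below alphanumeric ones.
        ids = tuple((0, int(p), "") if p.isdigit() else (1, 0, p) for p in pre.split("."))
    return (major, minor, patch, pre is None, ids)


def _clause(clause):
    """Turn one constraint clause into a predicate over parsed keys."""
    clause = clause.strip()
    if clause == "*":
        return lambda v: True
    if clause.startswith("^"):
        low = parse(clause[1:])
        return lambda v: low <= v and v[0] == low[0]
    if clause.startswith("~"):
        low = parse(clause[1:])
        return lambda v: low <= v and v[:2] == low[:2]
    if clause.startswith(">="):
        low = parse(clause[2:])
        return lambda v: v >= low
    if clause.startswith("<"):
        high = parse(clause[1:])
        return lambda v: v < high
    exact = parse(clause)
    return lambda v: v == exact


def _names_prerelease(clause):
    """True if the clause explicitly mentions a prerelease version."""
    text = clause.strip().lstrip("^~<>=")
    return text != "*" and not parse(text)[3]


def resolve(constraint, versions):
    """Return the highest version satisfying the constraint, or None."""
    clauses = constraint.split(",")
    checks = [_clause(c) for c in clauses]
    allow_pre = any(_names_prerelease(c) for c in clauses)
    candidates = []
    for text in versions:
        key = parse(text)
        if not key[3] and not allow_pre:
            continue  # skip prereleases unless the constraint opts in
        if all(check(key) for check in checks):
            candidates.append((key, text))
    return max(candidates)[1] if candidates else None


available = ["1.0.0", "1.1.0", "1.2.0", "2.0.0-alpha.1", "2.0.0-beta.2"]
```

Placing the stable flag ahead of the prerelease identifiers in the key makes 2.0.0 sort above 2.0.0-rc.1, which is the semver precedence rule the tie-breakers rely on.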

Unsatisfiable constraints:

// API Response: 404
{
  "error": {
    "code": "VERSION_NOT_FOUND",
    "message": "No version of 'rob/summarize' satisfies constraint '>=5.0.0'",
    "details": {
      "tool": "rob/summarize",
      "constraint": ">=5.0.0",
      "available_versions": ["1.0.0", "1.1.0", "1.2.0"],
      "latest_stable": "1.2.0"
    }
  }
}

Prerelease Handling

  • Prereleases are not returned for * or range constraints by default
  • To install prerelease: smarttools registry install rob/summarize@2.0.0-beta.1
  • To allow prereleases in manifest: version: ">=2.0.0-0" (the -0 suffix includes prereleases)

Download Endpoint Version Selection

The /api/v1/tools/{owner}/{name}/download endpoint accepts version parameters:

| Parameter | Behavior | Example |
| --- | --- | --- |
| (none) | Returns latest stable version | /download → 1.2.0 |
| version=1.2.0 | Exact version (must exist) | /download?version=1.2.0 |
| version=^1.0.0 | Server resolves constraint | /download?version=^1.0.0 → 1.2.0 |
| version=latest | Alias for latest stable | /download?version=latest |

Server-side resolution: The API server resolves version constraints, not the client. This ensures consistent resolution and allows the server to apply policies (e.g., exclude yanked versions).

GET /api/v1/tools/rob/summarize/download?version=^1.0.0&install=true

Response (200):
{
  "data": {
    "owner": "rob",
    "name": "summarize",
    "resolved_version": "1.2.0",
    "config": "... YAML content ..."
  },
  "meta": {
    "constraint": "^1.0.0",
    "available_versions": ["1.0.0", "1.1.0", "1.2.0"]
  }
}

Invalid/unsatisfiable constraint:

GET /api/v1/tools/rob/summarize/download?version=^5.0.0

Response (404):
{
  "error": {
    "code": "CONSTRAINT_UNSATISFIABLE",
    "message": "No version matches constraint '^5.0.0'",
    "details": {
      "constraint": "^5.0.0",
      "latest_stable": "1.2.0",
      "available_versions": ["1.0.0", "1.1.0", "1.2.0"]
    }
  }
}

Tool Resolution Order

When a tool is invoked, the CLI searches in this order:

  1. Local project: ./.smarttools/<owner>/<name>/config.yaml (or ./.smarttools/<name>/ for unnamespaced)
  2. Global user: ~/.smarttools/<owner>/<name>/config.yaml
  3. Registry: Fetch from API, install to global, then run
  4. Error: Tool '<toolname>' not found

Step 3 only occurs if auto_fetch_from_registry: true in config (default: true).

Path convention: Use .smarttools/ (with leading dot) for both local and global to maintain consistency.

Resolution also respects namespacing:

  • summarize → searches for any tool named summarize, prefers official/summarize if exists
  • rob/summarize → searches for exactly rob/summarize
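
The local part of the lookup order (steps 1 and 2) can be sketched as below. The function name and signature are hypothetical. As a simplification, the unnamespaced form is checked in both the project and global directories, and the preference for official/ on shorthand names is not modeled.

```python
from pathlib import Path


def find_tool(name, project_dir, home_dir, owner=None):
    """Return the first existing config.yaml in lookup order, or None."""
    rel = Path(owner, name) if owner else Path(name)
    roots = (
        Path(project_dir) / ".smarttools",  # step 1: local project
        Path(home_dir) / ".smarttools",     # step 2: global user
    )
    for root in roots:
        candidate = root / rel / "config.yaml"
        if candidate.is_file():
            return candidate
    # Caller falls through to the registry (step 3) or reports an error (step 4).
    return None
```

A call such as find_tool("summarize", Path.cwd(), Path.home(), owner="rob") returns the project-local copy when both a local and a global install exist.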

Official Namespace

The slug official is reserved for curated, high-quality tools maintained by the registry administrators.

  • Shorthand summarize resolves to official/summarize if it exists
  • If no official/summarize, falls back to most-downloaded tool named summarize
  • To avoid ambiguity, always use full owner/name in manifests

Reserved slugs that cannot be registered: official, admin, system, api, registry, smarttools

Auto-Fetch Behavior

When enabled (auto_fetch_from_registry: true), missing tools are automatically fetched:

$ summarize < file.txt
# Tool 'summarize' not found locally.
# Fetching from registry...
# Installed: official/summarize@1.2.0
# Running...

Behavior details:

  • Fetches latest stable version unless pinned in smarttools.yaml
  • Installs to ~/.smarttools/<owner>/<name>/
  • Generates wrapper script in ~/.local/bin/
  • Subsequent runs use local copy (no re-fetch)

To disable (require explicit install):

# ~/.smarttools/config.yaml
auto_fetch_from_registry: false

Wrapper Script Collisions

When two tools from different owners have the same name:

| Scenario | Behavior |
| --- | --- |
| Install official/summarize | Creates wrapper ~/.local/bin/summarize |
| Install rob/summarize (collision) | Creates wrapper ~/.local/bin/rob-summarize |
| Uninstall official/summarize | Removes summarize wrapper, promotes rob-summarize → summarize if desired |

The first-installed tool with a given name gets the short wrapper. Subsequent tools use owner-name format.
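
The naming rule can be expressed as one small function. This is a sketch: the function name and the `taken` mapping (wrapper name to the owner/name that holds it) are hypothetical bookkeeping, not part of the existing CLI.

```python
def wrapper_name(owner, name, taken):
    """Pick the wrapper filename for owner/name.

    `taken` maps existing wrapper names to the "owner/name" that holds them.
    """
    full = f"{owner}/{name}"
    # The short name is free, or already belongs to this same tool (reinstall).
    if taken.get(name, full) == full:
        return name
    return f"{owner}-{name}"


taken = {}
first = wrapper_name("official", "summarize", taken)
taken[first] = "official/summarize"
second = wrapper_name("rob", "summarize", taken)
```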

To invoke a specific owner's tool:

# Short form (whichever was installed first)
summarize < file.txt

# Explicit owner form (always works)
rob-summarize < file.txt

# Or via smarttools run
smarttools run rob/summarize < file.txt

Project Manifest (smarttools.yaml)

Defines tool dependencies with optional runtime overrides:

name: my-ai-project
version: "1.0.0"
dependencies:
  - name: rob/summarize
    version: ">=1.0.0"
overrides:
  rob/summarize:
    provider: ollama

Overrides are applied at runtime and do not mutate installed tool configs.
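
One way to apply overrides without touching the installed config is to merge into a deep copy at load time. The sketch below assumes a shallow, top-level merge of override keys; the function name and the "default-provider" value are hypothetical.

```python
import copy


def effective_config(installed, manifest, tool_ref):
    """Return the installed config with manifest overrides applied to a copy."""
    merged = copy.deepcopy(installed)
    overrides = manifest.get("overrides", {}).get(tool_ref, {})
    merged.update(copy.deepcopy(overrides))
    return merged


installed = {"name": "summarize", "version": "1.2.0", "provider": "default-provider"}
manifest = {
    "dependencies": [{"name": "rob/summarize", "version": ">=1.0.0"}],
    "overrides": {"rob/summarize": {"provider": "ollama"}},
}
runtime = effective_config(installed, manifest, "rob/summarize")
```

After the call, runtime uses the ollama provider while the installed dictionary (and therefore the file it was loaded from) is unchanged.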

CLI Config and Tokens

Global config lives in ~/.smarttools/config.yaml:

registry:
  url: https://gitea.brrd.tech/api/v1    # Must match canonical base path
  token: "reg_xxxxxxxxxxxx"
client_id: "anon_abc123def456"
auto_fetch_from_registry: true

client_id is generated locally and lets the registry deduplicate install counts from anonymous clients.

Publishing and Auth

Publishing uses registry accounts, not Gitea accounts:

  • Public endpoints require no auth.
  • POST /tools requires a registry token.
  • The API server uses a private Gitea service account to open PRs.

Publish Idempotency and Edge Cases

Idempotency key: owner/name@version

| Scenario | API Response | HTTP Code |
| --- | --- | --- |
| New version, no PR exists | Create PR, return URL | 201 Created |
| PR already exists (pending) | Return existing PR URL | 200 OK |
| Version already published | Error: version exists | 409 Conflict |
| PR was closed without merge | Allow new PR | 201 Created |
| PR was merged, then tool deleted | Error: version exists (tombstone) | 409 Conflict |

Version immutability enforcement:

// Attempt to publish existing version
// Response: 409 Conflict
{
  "error": {
    "code": "VERSION_EXISTS",
    "message": "Version 1.2.0 of 'rob/summarize' already exists and cannot be overwritten",
    "details": {
      "published_at": "2025-01-15T10:30:00Z",
      "action": "Bump version number to publish changes"
    }
  }
}

Closed PR handling:

  • Track PR state in database: pending, merged, closed
  • If PR was closed (rejected/abandoned), allow new submission for same version
  • If PR was merged, version is immutable forever
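
The idempotency rules reduce to a small decision function. This sketch uses a hypothetical function name and action labels; only VERSION_EXISTS and the status codes come from the text above.

```python
def publish_decision(version_published, pr_status):
    """Map publish state to (HTTP status, action).

    pr_status is None (no PR yet), "pending", "merged", or "closed".
    """
    if version_published or pr_status == "merged":
        # Covers the tombstone case: merged once means immutable forever.
        return 409, "VERSION_EXISTS"
    if pr_status == "pending":
        return 200, "RETURN_EXISTING_PR"
    # No PR yet, or the previous PR was closed without merge.
    return 201, "CREATE_PR"
```

Checking "merged" independently of the tools table is what keeps a deleted tool's version number from being reused.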

Update flow (new version, not overwrite):

  1. Developer modifies tool locally
  2. Bumps version in config.yaml (e.g., 1.2.0 → 1.3.0)
  3. Runs smarttools registry publish
  4. New PR created for 1.3.0
  5. Old version 1.2.0 remains available

Publisher Registration

Publishers register on the registry website, not Gitea:

Registration flow:

  1. User visits https://gitea.brrd.tech/registry/register (or future registry.smarttools.dev)
  2. Creates account with email + password + slug
  3. Receives verification email (optional in v1, but track verified status)
  4. Logs into dashboard at /dashboard
  5. Generates API token from dashboard
  6. Uses token in CLI for publishing

Authentication Security

Password hashing:

  • Algorithm: Argon2id (memory-hard, recommended by OWASP)
  • Parameters: memory=65536 KiB (64 MiB), iterations=3, parallelism=4
  • Library: argon2-cffi for Python

from argon2 import PasswordHasher

ph = PasswordHasher(memory_cost=65536, time_cost=3, parallelism=4)
password_hash = ph.hash(password)       # store this string in publishers.password_hash
ph.verify(password_hash, password)      # raises VerifyMismatchError on mismatch

API token format:

reg_<random-32-bytes-base62>

Example (illustrative, shortened): reg_7kX9mPqR2sT4vW6xY8zA1bC3dE5fG7hJ

  • Prefix reg_ for easy identification in logs/configs
  • 32 bytes of cryptographically random data
  • Base62 encoded (alphanumeric, no special chars)
  • Total length: ~47 characters
  • Stored as SHA-256 hash in database (never plain text)
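
Token generation along these lines needs only the standard library. This is a sketch with hypothetical function names. Two details are assumptions: the body is left-padded to a fixed 43 characters (since 62**43 > 2**256, 43 digits always suffice for 32 bytes), and the stored hash covers the whole token including the reg_ prefix.

```python
import hashlib
import secrets
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 symbols
BODY_LEN = 43  # 62**43 > 2**256, so 43 base62 digits always fit 32 bytes


def base62(data):
    """Encode bytes as a fixed-width base62 string."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)).rjust(BODY_LEN, ALPHABET[0])


def new_token():
    """Return (token, token_hash): show the token once, store only the hash."""
    token = "reg_" + base62(secrets.token_bytes(32))
    token_hash = hashlib.sha256(token.encode("ascii")).hexdigest()
    return token, token_hash


token, token_hash = new_token()
```

The fixed width gives every token the same 47-character length, which makes malformed tokens easy to reject before any database lookup.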

Token lifecycle:

| Action | Behavior |
| --- | --- |
| Generate | Create new token, return once, store hash |
| List | Show token name, created date, last used (not the token itself) |
| Revoke | Set revoked_at timestamp, reject future uses |
| Rotate | Generate new token, optionally revoke old |

Rate limits:

| Endpoint | Limit | Window | Scope | Retry-After (s) |
| --- | --- | --- | --- | --- |
| POST /register | 5 | 1 hour | IP | 3600 |
| POST /login | 10 | 15 min | IP | 900 |
| POST /login (failed) | 5 | 15 min | IP + email | 900 |
| POST /tokens | 10 | 1 hour | Token | 3600 |
| POST /tools | 20 | 1 hour | Token | 3600 |
| GET /tools/* | 100 | 1 min | IP | 60 |
| GET /download | 60 | 1 min | IP | 60 |

Rate limit response (429):

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests. Try again in 60 seconds.",
    "details": {
      "limit": 100,
      "window": "1 minute",
      "retry_after": 60
    }
  }
}

Headers on rate-limited response:

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1705766400

Scope priority: For authenticated requests, both IP and token limits apply. The more restrictive limit wins.
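
A fixed-window counter is one simple way to enforce limits like these and emit the headers shown above. The sketch below is an in-memory, single-process illustration with a hypothetical class name; the document does not specify the algorithm, and a deployed limiter would need shared storage. Retry-After is set to the full window, as in the table.

```python
import time


class FixedWindowLimiter:
    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.hits = {}  # key -> (window_start, count)

    def check(self, key):
        """Return (allowed, headers) for one request from `key` (IP or token)."""
        now = self.clock()
        start, count = self.hits.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # previous window expired, start a new one
        allowed = count < self.limit
        if allowed:
            count += 1
        self.hits[key] = (start, count)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - count),
            "X-RateLimit-Reset": str(int(start + self.window)),
        }
        if not allowed:
            headers["Retry-After"] = str(self.window)
        return allowed, headers
```

To apply both IP and token limits, a request can be checked against one limiter per scope and rejected if either check fails, which gives the "more restrictive limit wins" behavior.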

Account lockout:

  • After 5 failed login attempts: 15-minute lockout for that email
  • After 10 failed attempts: 1-hour lockout
  • Lockout clears on successful password reset

Password reset flow (deferred to v1.1):

  1. User requests reset via email
  2. Server generates time-limited token (1 hour expiry)
  3. Email contains reset link with token
  4. User sets new password
  5. All existing sessions/tokens optionally invalidated

Email verification flow (deferred to v1.1):

  1. On registration, send verification email
  2. User clicks link with verification token
  3. Set verified = true in database
  4. Unverified accounts can browse but not publish

Token Scopes and Authorization

Tokens have scopes that limit their capabilities:

| Scope | Permissions |
| --- | --- |
| read | View own published tools, download stats |
| publish | Submit new tools, update own tool metadata |
| admin | Yank tools, manage categories (registry admins only) |

Default scope: New tokens get read,publish by default.

Ownership enforcement:

@app.route('/api/v1/tools', methods=['POST'])
@require_token(scopes=['publish'])
def publish_tool():
    token = get_current_token()
    tool_data = request.json

    # Enforce owner == token holder's slug
    if tool_data['owner'] != token.publisher.slug:
        return {
            "error": {
                "code": "FORBIDDEN",
                "message": f"Cannot publish to namespace '{tool_data['owner']}'. "
                           f"Your namespace is '{token.publisher.slug}'."
            }
        }, 403

    # Proceed with publish...

GET /api/v1/me/tools authorization:

  • Requires valid token with read scope
  • Returns only tools where owner == token.publisher.slug
  • Includes pending PRs and all versions (including yanked)

Web Session Security

Dashboard login uses session cookies (not tokens) for browser auth:

Cookie settings:

SESSION_COOKIE_NAME = 'smarttools_session'
SESSION_COOKIE_HTTPONLY = True      # Prevent JS access
SESSION_COOKIE_SECURE = True        # HTTPS only in production
SESSION_COOKIE_SAMESITE = 'Lax'     # CSRF protection
PERMANENT_SESSION_LIFETIME = 86400 * 7  # 7 days, in seconds (Flask's session lifetime setting)

CSRF protection:

  • All POST/PUT/DELETE forms include csrf_token hidden field
  • Token validated server-side before processing
  • 403 Forbidden if token missing or invalid

Session lifecycle:

| Event | Action |
| --- | --- |
| Login | Create session, set cookie |
| Logout | Delete session, clear cookie |
| Idle 24h | Session expires, re-login required |
| Password change | Invalidate all sessions |
| Token revocation | Existing sessions continue (token != session) |

Secure session storage:

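# Store sessions in DB, not filesystem
from flask_session import Session

app.config['SESSION_TYPE'] = 'sqlalchemy'
app.config['SESSION_SQLALCHEMY'] = db   # existing Flask-SQLAlchemy instance (assumed)
app.config['SESSION_SQLALCHEMY_TABLE'] = 'sessions'
Session(app)                            # register the server-side session backend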

Database schema:

-- Publishers
CREATE TABLE publishers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
    slug TEXT UNIQUE NOT NULL,            -- immutable namespace: "rob", "alice-dev"
    display_name TEXT NOT NULL,           -- mutable: "Rob", "Alice Developer"
    bio TEXT,
    website TEXT,
    verified BOOLEAN DEFAULT FALSE,
    locked_until TIMESTAMP,               -- account lockout
    failed_login_attempts INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- API tokens (one publisher can have multiple)
CREATE TABLE api_tokens (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    publisher_id INTEGER NOT NULL REFERENCES publishers(id),
    token_hash TEXT NOT NULL,
    name TEXT NOT NULL,           -- "CLI token", "CI token"
    last_used_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    revoked_at TIMESTAMP          -- NULL if active
);

-- Tools (links to publisher)
CREATE TABLE tools (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    owner TEXT NOT NULL,          -- namespace slug (immutable, from publisher.slug)
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    description TEXT,
    category TEXT,
    tags TEXT,                    -- JSON array
    config_yaml TEXT NOT NULL,    -- Full tool config
    readme TEXT,
    publisher_id INTEGER NOT NULL REFERENCES publishers(id),
    deprecated BOOLEAN DEFAULT FALSE,
    deprecated_message TEXT,
    replacement TEXT,
    downloads INTEGER DEFAULT 0,
    published_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(owner, name, version)
);

-- Download stats (for deduplication)
CREATE TABLE download_stats (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tool_id INTEGER NOT NULL REFERENCES tools(id),
    client_id TEXT NOT NULL,
    downloaded_at DATE NOT NULL,
    UNIQUE(tool_id, client_id, downloaded_at)
);

-- Search index (FTS5)
CREATE VIRTUAL TABLE tools_fts USING fts5(
    name, description, tags, readme,
    content='tools',
    content_rowid='id'
);

-- FTS5 sync triggers (required for external content tables)
CREATE TRIGGER tools_ai AFTER INSERT ON tools BEGIN
    INSERT INTO tools_fts(rowid, name, description, tags, readme)
    VALUES (new.id, new.name, new.description, new.tags, new.readme);
END;

CREATE TRIGGER tools_ad AFTER DELETE ON tools BEGIN
    INSERT INTO tools_fts(tools_fts, rowid, name, description, tags, readme)
    VALUES ('delete', old.id, old.name, old.description, old.tags, old.readme);
END;

CREATE TRIGGER tools_au AFTER UPDATE ON tools BEGIN
    INSERT INTO tools_fts(tools_fts, rowid, name, description, tags, readme)
    VALUES ('delete', old.id, old.name, old.description, old.tags, old.readme);
    INSERT INTO tools_fts(rowid, name, description, tags, readme)
    VALUES (new.id, new.name, new.description, new.tags, new.readme);
END;

-- Pending PRs (track publish state)
CREATE TABLE pending_prs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    publisher_id INTEGER NOT NULL REFERENCES publishers(id),
    owner TEXT NOT NULL,
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    pr_number INTEGER NOT NULL,
    pr_url TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending, merged, closed
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(owner, name, version)
);

-- Webhook sync log (idempotency)
CREATE TABLE webhook_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    delivery_id TEXT UNIQUE NOT NULL,        -- Gitea delivery ID
    event_type TEXT NOT NULL,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
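
The UNIQUE constraint on download_stats is what makes download counting idempotent per client per day. A minimal sketch using Python's built-in sqlite3 module follows. The helper name is hypothetical, and the foreign key to tools is omitted so the example stands alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE download_stats (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        tool_id INTEGER NOT NULL,
        client_id TEXT NOT NULL,
        downloaded_at DATE NOT NULL,
        UNIQUE(tool_id, client_id, downloaded_at)
    )
    """
)


def record_download(conn, tool_id, client_id, day):
    """Return True if this is the first download by client_id for that day."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO download_stats (tool_id, client_id, downloaded_at) "
        "VALUES (?, ?, ?)",
        (tool_id, client_id, day),
    )
    return cur.rowcount == 1  # 0 when the UNIQUE constraint suppressed the insert


first = record_download(conn, 1, "anon_abc123def456", "2025-01-15")
repeat = record_download(conn, 1, "anon_abc123def456", "2025-01-15")
```

Only a True result would increment tools.downloads, so repeated installs from the same client on the same day count once.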

Note on tags indexing: The tags column stores JSON arrays as text. For v1, FTS5 will search within the JSON string. If tag filtering becomes a bottleneck, normalize to a tool_tags junction table:

-- Future: normalized tags (if needed)
CREATE TABLE tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL
);

CREATE TABLE tool_tags (
    tool_id INTEGER REFERENCES tools(id),
    tag_id INTEGER REFERENCES tags(id),
    PRIMARY KEY (tool_id, tag_id)
);

CLI first-time publish flow:

$ smarttools registry publish

No registry account configured.

1. Register at: https://gitea.brrd.tech/registry/register
2. Generate a token from your dashboard
3. Enter your token below

Registry token: ********
Token saved to ~/.smarttools/config.yaml

Validating tool...
✓ config.yaml is valid
✓ README.md exists (2.3 KB)
✓ Version 1.0.0 not yet published

Publishing rob/my-tool@1.0.0...
✓ PR created: https://gitea.brrd.tech/rob/SmartTools-Registry/pulls/42

Your tool is pending review. You'll receive an email when it's approved.

CLI Commands Reference

Full mapping of CLI commands to API calls:

Registry Commands

# Search for tools
$ smarttools registry search <query> [--category=<cat>] [--limit=20]
    → GET /api/v1/tools/search?q=<query>&category=<cat>&per_page=20

# Browse tools (TUI)
$ smarttools registry browse [--category=<cat>]
    → GET /api/v1/tools?category=<cat>&page=1
    → GET /api/v1/categories

# View tool details
$ smarttools registry info <owner/name>
    → GET /api/v1/tools/<owner>/<name>

# Install a tool
$ smarttools registry install <owner/name> [--version=<ver>]
    → GET /api/v1/tools/<owner>/<name>/download?version=<ver>&install=true
    → Writes to ~/.smarttools/<owner>/<name>/config.yaml
    → Generates ~/.local/bin/<name> wrapper (or <owner>-<name> if collision)

# Uninstall a tool
$ smarttools registry uninstall <owner/name>
    → Removes ~/.smarttools/<owner>/<name>/
    → Removes wrapper script

# Publish a tool
$ smarttools registry publish [path] [--dry-run]
    → POST /api/v1/tools (with registry token)
    → Returns PR URL

# List my published tools
$ smarttools registry my-tools
    → GET /api/v1/me/tools (with registry token)

# Update index cache
$ smarttools registry update
    → GET /api/v1/index.json
    → Writes to ~/.smarttools/registry/index.json

Project Commands

# Install project dependencies from smarttools.yaml
$ smarttools install
    → Reads ./smarttools.yaml
    → For each dependency:
        GET /api/v1/tools/<owner>/<name>/download?version=<constraint>&install=true
    → Installs to ~/.smarttools/<owner>/<name>/

# Add a dependency to smarttools.yaml
$ smarttools add <owner/name> [--version=<constraint>]
    → Adds to ./smarttools.yaml dependencies
    → Runs install for that tool

# Show project dependencies status
$ smarttools deps
    → Reads ./smarttools.yaml
    → Shows installed status for each dependency
    → Note: "smarttools list" is reserved for listing installed tools

Command naming note: smarttools list already exists to list locally installed tools. Use smarttools deps to show project manifest dependencies.

Flags available on most commands

| Flag | Description |
| --- | --- |
| --offline | Use cached index only, don't fetch |
| --refresh | Force refresh of cached data |
| --json | Output in JSON format |
| --verbose | Show detailed output |

Webhooks and Security

HMAC Verification

All Gitea webhooks are verified using HMAC-SHA256:

import hmac
import hashlib

def verify_webhook(request, secret):
    signature = request.headers.get('X-Gitea-Signature')
    if not signature:
        return False

    expected = hmac.new(
        secret.encode(),
        request.get_data(),  # raw request bytes (Flask); sign exactly what Gitea sent
        hashlib.sha256
    ).hexdigest()

    return hmac.compare_digest(signature, expected)

Replay Protection

While sync is idempotent, implement basic replay protection:

from datetime import datetime

def process_webhook(request):
    # Verify the signature first so unauthenticated callers learn nothing
    # about which delivery IDs have been seen
    if not verify_webhook(request, WEBHOOK_SECRET):
        return {"error": "invalid_signature"}, 401

    delivery_id = request.headers.get('X-Gitea-Delivery')

    # Check if already processed
    if db.webhook_log.exists(delivery_id=delivery_id):
        return {"status": "already_processed"}, 200

    # Process with lock to prevent concurrent processing
    with db.lock(f"webhook:{delivery_id}"):
        # Double-check after acquiring lock
        if db.webhook_log.exists(delivery_id=delivery_id):
            return {"status": "already_processed"}, 200

        # Process the webhook
        result = sync_from_repo()

        # Log successful processing; the event type comes from the
        # X-Gitea-Event header (push payloads carry no 'action' field)
        db.webhook_log.insert(
            delivery_id=delivery_id,
            event_type=request.headers.get('X-Gitea-Event'),
            processed_at=datetime.utcnow()
        )

    return {"status": "processed"}, 200

Sync Job Locking

Prevent concurrent sync operations:

# Using file lock or database advisory lock
SYNC_LOCK_TIMEOUT = 300  # 5 minutes max

def sync_from_repo():
    try:
        with acquire_lock("registry_sync", timeout=SYNC_LOCK_TIMEOUT):
            # Pull latest from Gitea
            repo.fetch()
            repo.reset('origin/main', hard=True)

            # Parse and update database
            for tool_path in glob('tools/*/*/config.yaml'):
                update_tool_in_db(tool_path)

            # Rebuild FTS index if needed
            rebuild_fts_index()

        return {"status": "synced"}

    except LockTimeout:
        logger.warning("Sync already in progress, skipping")
        return {"status": "skipped", "reason": "sync_in_progress"}

Atomic Sync Strategy

To avoid partially updated DB during webhook sync, use transactional table swap:

def sync_from_repo_atomic():
    with acquire_lock("registry_sync", timeout=SYNC_LOCK_TIMEOUT):
        # 1. Pull latest from Gitea
        repo.fetch()
        repo.reset('origin/main', hard=True)

        # 2. Parse all tools into memory
        new_tools = []
        for tool_path in glob('tools/*/*/config.yaml'):
            tool_data = parse_tool(tool_path)
            if tool_data:
                new_tools.append(tool_data)

        # 3. Atomic swap using transaction
        with db.transaction():
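            # Create temp table. Note: CREATE TABLE ... AS SELECT copies only
            # column names; constraints, indexes, and triggers are not carried
            # over and must be recreated on the new table after the swap.
            db.execute("CREATE TABLE tools_new AS SELECT * FROM tools WHERE 0")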

            # Bulk insert into temp table
            for tool in new_tools:
                db.execute("INSERT INTO tools_new ...", tool)

            # Swap tables atomically
            db.execute("ALTER TABLE tools RENAME TO tools_old")
            db.execute("ALTER TABLE tools_new RENAME TO tools")
            db.execute("DROP TABLE tools_old")

            # Rebuild FTS index
            db.execute("INSERT INTO tools_fts(tools_fts) VALUES('rebuild')")

            # Update sync timestamp
            db.execute("UPDATE sync_status SET last_sync = ?", [datetime.utcnow()])

Why atomic: Per-row updates with FTS triggers can yield inconsistent reads under load. Readers may see partial state mid-sync. Table swap ensures all-or-nothing visibility.

Error Handling

| Error Scenario | Behavior |
|---|---|
| Repo fetch fails | Log error, retry in 5 min, alert if 3 failures |
| YAML parse error | Skip tool, log error, continue with others |
| Database write fails | Rollback transaction, retry once, then alert |
| Lock timeout | Skip this sync, next webhook will retry |

Automated CI Validation

PRs are validated automatically using SmartTools (dogfooding):

PR Submitted
    │
    ▼
┌─────────────────────────────────────┐
│  Gitea CI runs validation tools:    │
│  • schema-validator                 │
│  • security-scanner                 │
│  • duplicate-detector               │
└───────────────┬─────────────────────┘
                │
        ┌───────┴───────┐
        │               │
    All pass        Any fail
        │               │
        ▼               ▼
  Auto-merge or     Add comment,
  flag for review   request changes

Validation checks:

  1. Schema validation: config.yaml matches expected format
  2. Security scan: No dangerous shell commands, no secrets in prompts
  3. Duplicate detection: AI-powered similarity check against existing tools
  4. README check: README.md exists and is non-empty

CI workflow (.gitea/workflows/validate.yaml):

name: Validate Tool Submission
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Validate schema
        run: python scripts/validate_tool.py ${{ github.event.pull_request.head.sha }}
      - name: Security scan
        run: smarttools run security-scanner < changed_files.txt
      - name: Check duplicates
        run: smarttools run duplicate-detector < changed_files.txt

Registry Repository Structure

Full structure of the SmartTools-Registry repo:

SmartTools-Registry/
├── README.md                        # Registry overview
├── CONTRIBUTING.md                  # How to submit tools
├── LICENSE
│
├── tools/                           # All published tools
│   ├── rob/
│   │   ├── summarize/
│   │   │   ├── config.yaml
│   │   │   └── README.md
│   │   └── translate/
│   │       ├── config.yaml
│   │       └── README.md
│   └── alice/
│       └── code-review/
│           ├── config.yaml
│           └── README.md
│
├── categories/
│   └── categories.yaml              # Category definitions
│
├── index.json                       # Auto-generated search index
│
├── .gitea/
│   └── workflows/
│       ├── validate.yaml            # PR validation
│       ├── build-index.yaml         # Rebuild index on merge
│       └── notify-api.yaml          # Webhook to API server
│
└── scripts/
    ├── validate_tool.py             # Schema validation
    ├── build_index.py               # Generate index.json
    ├── check_duplicates.py          # Similarity detection
    └── security_scan.py             # Security checks
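
Because every tool lives at tools/{owner}/{name}/config.yaml, the owner and name can be recovered from the path alone. A small sketch of that mapping follows; `owner_and_name` is a hypothetical helper, not a function the registry scripts are known to contain.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

def owner_and_name(tool_path: str) -> Optional[Tuple[str, str]]:
    """Return (owner, name) for tools/{owner}/{name}/config.yaml, else None."""
    parts = PurePosixPath(tool_path).parts
    # Expect exactly: "tools", owner, name, "config.yaml"
    if len(parts) == 4 and parts[0] == "tools" and parts[3] == "config.yaml":
        return parts[1], parts[2]
    return None
```

Paths that do not match the layout return None, so a sync loop could skip them instead of failing.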

categories.yaml format:

categories:
  - name: text-processing
    description: Tools for manipulating and analyzing text
    icon: 📝
  - name: code
    description: Tools for code review, generation, and analysis
    icon: 💻
  - name: data
    description: Tools for data transformation and analysis
    icon: 📊
  - name: media
    description: Tools for image, audio, and video processing
    icon: 🎨
  - name: productivity
    description: General productivity and automation tools
    icon: 

Download Stats

Counting Methodology

  • Count installs only, not views or searches
  • Increment after successful download (response sent)
  • Dedupe by client_id + tool_id + date

def download_tool(owner, name, version, install=False, client_id=None):
    tool = get_tool(owner, name, version)
    if not tool:
        return {"error": "not_found"}, 404

    config_yaml = tool.config_yaml

    # Only count if this is an install (not just viewing)
    if install:
        record_download(tool.id, client_id)

    return {"config": config_yaml}, 200

def record_download(tool_id, client_id):
    today = date.today()

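    # Use client_id if provided, otherwise fall back to a stable IP hash.
    # hashlib (import hashlib) is used because the built-in hash() is salted
    # per process for strings, which would break dedupe across restarts.
    ip_digest = hashlib.sha256(request.remote_addr.encode()).hexdigest()[:16]
    effective_client_id = client_id or f"anon_{ip_digest}"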

    # Dedupe: only count once per client per tool per day
    try:
        db.download_stats.insert(
            tool_id=tool_id,
            client_id=effective_client_id,
            downloaded_at=today
        )
        # Increment counter (can be async/batch updated)
        db.execute("UPDATE tools SET downloads = downloads + 1 WHERE id = ?", [tool_id])
    except IntegrityError:
        pass  # Already counted today, ignore
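
The dedupe relies on a uniqueness constraint over (tool_id, client_id, downloaded_at). The sketch below shows that behavior with plain sqlite3. The table definitions and the `count_download` name are assumptions for illustration, since the real schema is defined elsewhere in this document.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tools (id INTEGER PRIMARY KEY, downloads INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE download_stats (
        tool_id INTEGER NOT NULL,
        client_id TEXT NOT NULL,
        downloaded_at TEXT NOT NULL,
        UNIQUE (tool_id, client_id, downloaded_at)  -- one row per client per tool per day
    );
    INSERT INTO tools (id) VALUES (1);
""")

def count_download(tool_id: int, client_id: str, day: date) -> bool:
    """Return True if the download was counted, False if it was a duplicate."""
    try:
        with conn:  # commits on success, rolls back if the insert is rejected
            conn.execute(
                "INSERT INTO download_stats VALUES (?, ?, ?)",
                (tool_id, client_id, day.isoformat()),
            )
            conn.execute(
                "UPDATE tools SET downloads = downloads + 1 WHERE id = ?", (tool_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # already counted for this client, tool, and day
```

Doing the insert and the counter update in one transaction keeps the two from drifting apart when the insert is rejected.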

Client ID Generation

CLI generates a persistent anonymous ID on first run:

# In CLI, on first run
import uuid
import os

CONFIG_PATH = os.path.expanduser("~/.smarttools/config.yaml")

def get_or_create_client_id():
    config = load_config()
    if 'client_id' not in config:
        config['client_id'] = f"anon_{uuid.uuid4().hex[:16]}"
        save_config(config)
    return config['client_id']

Fallback when client_id missing:

  • If header X-Client-ID not sent, use IP hash as fallback
  • This still provides some dedupe for anonymous users
  • Logged-in users' downloads are attributed to their account instead

Privacy Considerations

  • No raw IP addresses stored in database (the anonymous fallback stores only a hash)
  • client_id is client-controlled and can be regenerated
  • Stats are aggregated (total count), not individual tracking

Async Stats Strategy

To avoid DB contention on the hot download path:

from datetime import date
from queue import Empty, Queue
from threading import Thread

# In-memory queue for stats
stats_queue = Queue()

def record_download_async(tool_id, client_id):
    """Non-blocking: enqueue for background processing"""
    stats_queue.put({
        'tool_id': tool_id,
        'client_id': client_id,
        'date': date.today()
    })

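def stats_worker():
    """Background thread: flush after 5 idle seconds or every 100 items"""
    batch = []
    while True:
        try:
            item = stats_queue.get(timeout=5)
            batch.append(item)
            # Under steady load the queue never goes idle, so cap the batch size
            if len(batch) < 100:
                continue
        except Empty:
            pass
        if batch:
            flush_batch(batch)
            batch = []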

def flush_batch(batch):
    """Bulk insert with conflict ignore"""
    with db.transaction():
        for item in batch:
            try:
                db.execute("""
                    INSERT INTO download_stats (tool_id, client_id, downloaded_at)
                    VALUES (?, ?, ?)
                    ON CONFLICT DO NOTHING
                """, [item['tool_id'], item['client_id'], item['date']])
            except Exception as e:
                logger.warning(f"Stats insert failed: {e}")
                # Don't fail downloads for stats errors

Failure behavior: If stats DB write fails, log the error but don't fail the download. Stats are "best effort" - the download must succeed.

  • Primary search: SQLite FTS5 inside the API.
  • index.json provides offline CLI search and backup.
  • If FTS5 is stale, return results with X-Search-Index-Stale: true.

API Caching Strategy

Cache Headers

| Endpoint | Cache-Control | ETag | Notes |
|---|---|---|---|
| GET /index.json | max-age=300, stale-while-revalidate=60 | Yes | 5 min cache, background refresh |
| GET /tools/{owner}/{name} | max-age=60 | Yes | 1 min cache |
| GET /tools/{owner}/{name}/download | max-age=3600, immutable | Yes | Immutable versions, 1 hour |
| GET /tools/search | no-cache | No | Always fresh |
| GET /categories | max-age=3600 | Yes | Categories change rarely |

ETag Implementation

import hashlib
from datetime import datetime

def get_tool_etag(tool):
    """Generate ETag from tool identity (immutable versions don't change)"""
    # Since versions are immutable, owner/name@version is stable
    # Use published_at for extra safety (not updated_at, which doesn't exist)
    content = f"{tool.owner}/{tool.name}@{tool.version}:{tool.published_at.isoformat()}"
    return hashlib.md5(content.encode()).hexdigest()

def get_index_etag():
    """Generate ETag from last sync timestamp"""
    last_sync = db.get_last_sync_time()
    return hashlib.md5(last_sync.isoformat().encode()).hexdigest()

@app.route('/api/v1/tools/<owner>/<name>/download')
def download_tool(owner, name):
    version = request.args.get('version', 'latest')
    tool = resolve_and_get_tool(owner, name, version)
    etag = get_tool_etag(tool)

    # Check If-None-Match header; Werkzeug parses quoted and comma-separated forms
    if etag in request.if_none_match:
        return '', 304  # Not Modified

    response = jsonify({
        "data": {
            "owner": tool.owner,
            "name": tool.name,
            "resolved_version": tool.version,
            "config": tool.config_yaml
        }
    })
    response.set_etag(etag)  # emits the quoted form that HTTP requires
    response.headers['Cache-Control'] = 'max-age=3600, immutable'
    return response

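Note: Since tool versions are immutable, the ETag based on owner/name@version is permanently stable. The published_at timestamp is included for defense-in-depth but won't change. This holds only for the resolved version: a request with version=latest can resolve to a newer release later, so the immutable directive is safe only when the request pins an exact version.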

DB vs Repo Read Strategy

| Scenario | Read From | Reason |
|---|---|---|
| Normal operation | SQLite DB | Fast, indexed, FTS |
| DB empty/corrupted | Gitea repo | Fallback/recovery |
| Webhook sync in progress | DB (stale OK) | Avoid blocking reads |
| Search query | SQLite FTS5 | Full-text search |
| Download specific version | DB, fallback to repo | DB is cache, repo is truth |

Staleness Detection

from datetime import datetime, timedelta

STALE_THRESHOLD = timedelta(minutes=10)

def is_db_stale():
    last_sync = db.get_last_sync_time()
    return datetime.utcnow() - last_sync > STALE_THRESHOLD

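@app.route('/api/v1/tools/search')
def search_tools():
    # Flask exposes query-string values via request.args, not as function arguments
    q = request.args.get('q', '')
    results = db.search_fts(q)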

    response = jsonify({"results": results})
    if is_db_stale():
        response.headers['X-Search-Index-Stale'] = 'true'
        response.headers['X-Last-Sync'] = db.get_last_sync_time().isoformat()

    return response

Error Model

Response Envelopes

Success response:

{
  "data": { ... },
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 42,
    "total_pages": 3
  }
}
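
The meta block can be derived from three inputs. The sketch below is one way to compute it; `build_meta` is a hypothetical helper, and returning 0 pages for an empty result set is an assumption rather than something this document specifies.

```python
import math

def build_meta(page: int, per_page: int, total: int) -> dict:
    """Build the pagination meta block for a list response."""
    # Round up so a partial last page still counts as a page.
    total_pages = math.ceil(total / per_page) if total else 0
    return {
        "page": page,
        "per_page": per_page,
        "total": total,
        "total_pages": total_pages,
    }
```

With the values from the example above, 42 results at 20 per page give 3 pages.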

Error response:

{
  "error": {
    "code": "TOOL_NOT_FOUND",
    "message": "Tool 'foo/bar' does not exist",
    "details": {
      "owner": "foo",
      "name": "bar",
      "suggestion": "Did you mean 'rob/bar'?"
    },
    "docs_url": "https://registry.smarttools.dev/docs/errors#TOOL_NOT_FOUND"
  }
}

Error Codes

| Code | HTTP | Description |
|---|---|---|
| TOOL_NOT_FOUND | 404 | Tool does not exist |
| VERSION_NOT_FOUND | 404 | Requested version doesn't exist |
| VERSION_EXISTS | 409 | Cannot overwrite published version |
| INVALID_VERSION | 400 | Version string is not valid semver |
| INVALID_CONSTRAINT | 400 | Version constraint syntax error |
| CONSTRAINT_UNSATISFIABLE | 404 | No version matches constraint |
| VALIDATION_ERROR | 400 | Tool config validation failed |
| UNAUTHORIZED | 401 | Missing or invalid auth token |
| FORBIDDEN | 403 | Token valid but lacks permission |
| RATE_LIMITED | 429 | Too many requests |
| SLUG_TAKEN | 409 | Namespace slug already registered |
| ACCOUNT_LOCKED | 403 | Too many failed login attempts |
| SERVER_ERROR | 500 | Internal error (logged for debugging) |
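
One way to keep codes and HTTP statuses in sync is a single lookup table that builds the error envelope. This is a sketch under assumptions: `error_response` is a hypothetical helper, the default docs_url pattern is inferred from the TOOL_NOT_FOUND example, and falling back to 500 for unknown codes is a choice made here.

```python
from typing import Optional, Tuple

ERROR_STATUS = {
    "TOOL_NOT_FOUND": 404, "VERSION_NOT_FOUND": 404, "VERSION_EXISTS": 409,
    "INVALID_VERSION": 400, "INVALID_CONSTRAINT": 400,
    "CONSTRAINT_UNSATISFIABLE": 404, "VALIDATION_ERROR": 400,
    "UNAUTHORIZED": 401, "FORBIDDEN": 403, "RATE_LIMITED": 429,
    "SLUG_TAKEN": 409, "ACCOUNT_LOCKED": 403, "SERVER_ERROR": 500,
}

def error_response(code: str, message: str, details: Optional[dict] = None,
                   docs_url: Optional[str] = None) -> Tuple[dict, int]:
    """Return (body, http_status) in the error envelope format."""
    status = ERROR_STATUS.get(code, 500)  # unknown codes are treated as server errors
    body = {
        "error": {
            "code": code,
            "message": message,
            "details": details or {},
            "docs_url": docs_url
            or f"https://registry.smarttools.dev/docs/errors#{code}",
        }
    }
    return body, status
```

A handler could then return `error_response("TOOL_NOT_FOUND", "Tool 'foo/bar' does not exist", {"owner": "foo", "name": "bar"})` directly.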

Error Scenarios and Fallbacks

CLI Error Handling

| Scenario | CLI Behavior | User Message |
|---|---|---|
| Registry offline | Use cached tools if available | "Registry unavailable. Using cached version." |
| Tool not found | Check cache, then fail | "Tool 'foo/bar' not found in registry or cache." |
| Version constraint unsatisfiable | Show available versions | "No version matches '>=5.0.0'. Available: 1.0.0, 1.1.0, 1.2.0" |
| Auth token expired | Prompt for new token | "Token expired. Please re-authenticate." |
| Rate limited | Wait and retry (backoff) | "Rate limited. Retrying in 30 seconds..." |
| Network timeout | Retry with backoff, then fail | "Connection timed out. Check your network." |
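
The "retry with backoff, then fail" behavior could look roughly like the sketch below. The retry count, the doubling delay, and the `retry_with_backoff` name are all assumptions; this document does not fix those values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(operation: Callable[[], T], retries: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation(); on a network error wait 1x, 2x, 4x... base_delay and retry."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == retries:
                raise  # out of retries: let the CLI print its failure message
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter exists only so the delay can be replaced in tests; normal callers would leave it at the default.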

Validation Failure Details

When VALIDATION_ERROR occurs, provide specific field errors:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Tool configuration is invalid",
    "details": {
      "errors": [
        {
          "path": "steps[0].provider",
          "message": "Provider 'gpt5' is not recognized",
          "allowed": ["claude", "openai", "ollama", "mock"]
        },
        {
          "path": "version",
          "message": "Version '1.0' is not valid semver (use '1.0.0')"
        }
      ]
    },
    "docs_url": "https://registry.smarttools.dev/docs/tool-format"
  }
}
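
A validator that produces entries in this shape might look like the following sketch. It is illustrative only: `validate_config` is a hypothetical function, the semver pattern is simplified (no pre-release or build metadata), and it assumes that only steps carrying a `provider` key need the provider check.

```python
import re

SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")  # simplified: MAJOR.MINOR.PATCH only
ALLOWED_PROVIDERS = ["claude", "openai", "ollama", "mock"]

def validate_config(config: dict) -> list:
    """Return a list of field errors; an empty list means the config passed."""
    errors = []
    for i, step in enumerate(config.get("steps", [])):
        # Assumption: steps without a provider key (e.g. code steps) are skipped.
        if "provider" in step and step["provider"] not in ALLOWED_PROVIDERS:
            errors.append({
                "path": f"steps[{i}].provider",
                "message": f"Provider '{step['provider']}' is not recognized",
                "allowed": ALLOWED_PROVIDERS,
            })
    version = str(config.get("version", ""))
    if not SEMVER_RE.match(version):
        errors.append({
            "path": "version",
            "message": f"Version '{version}' is not valid semver",
        })
    return errors
```

Collecting every error before responding lets the publisher fix all problems in one pass instead of one per submission.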

Dependency Resolution Failures

When smarttools install fails on a manifest:

$ smarttools install

Error: Could not resolve all dependencies

  rob/summarize@^2.0.0
    ✗ No matching version (latest: 1.2.0)

  alice/translate@>=1.0.0
    ✓ Found 1.3.0

Suggestions:
  - Update rob/summarize constraint to "^1.0.0"
  - Contact the tool author for a v2 release
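
The matching behind that output can be sketched in a few lines. This covers only the constraint forms that appear in this section (`^`, `>=`, `*`, and exact versions) and plain MAJOR.MINOR.PATCH strings; the function names are hypothetical and the real resolver may support more syntax.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]

def parse(v: str) -> Version:
    major, minor, patch = (int(p) for p in v.split("."))
    return (major, minor, patch)

def caret_upper(v: Version) -> Version:
    """Exclusive upper bound for ^v: bump the first non-zero component."""
    major, minor, patch = v
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)

def satisfies(version: str, constraint: str) -> bool:
    v = parse(version)
    if constraint == "*":
        return True
    if constraint.startswith("^"):
        low = parse(constraint[1:])
        return low <= v < caret_upper(low)
    if constraint.startswith(">="):
        return v >= parse(constraint[2:])
    return v == parse(constraint)  # exact version

def resolve(constraint: str, available: List[str]) -> Optional[str]:
    """Return the highest available version matching the constraint, or None."""
    matches = [v for v in available if satisfies(v, constraint)]
    return max(matches, key=parse) if matches else None
```

Versions are compared as integer tuples, so 1.10.0 correctly sorts above 1.9.0. A `None` result corresponds to the CONSTRAINT_UNSATISFIABLE case.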

Graceful Degradation

| Component Down | Fallback Behavior |
|---|---|
| API server | CLI uses ~/.smarttools/registry/index.json for search |
| Gitea repo | API serves from DB cache (may be stale) |
| FTS5 index | Fall back to LIKE queries (slower but works) |
| Network | Use locally installed tools, skip registry features |
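
The FTS5-to-LIKE fallback might be structured as in this sketch. The `tools_fts` table name comes from the sync code earlier in this document; the `name` and `description` columns and the `search_tool_names` helper are assumptions.

```python
import sqlite3
from typing import List

def search_tool_names(conn: sqlite3.Connection, query: str) -> List[str]:
    """Search via FTS5 when possible, otherwise fall back to a LIKE scan."""
    try:
        rows = conn.execute(
            "SELECT name FROM tools_fts WHERE tools_fts MATCH ? ORDER BY name",
            (query,),
        ).fetchall()
    except sqlite3.OperationalError:
        # FTS table missing or unusable (or the query is not valid FTS syntax):
        # degrade to a substring scan. % and _ in the query act as wildcards here.
        pattern = f"%{query}%"
        rows = conn.execute(
            "SELECT name FROM tools WHERE name LIKE ? OR description LIKE ? ORDER BY name",
            (pattern, pattern),
        ).fetchall()
    return [row[0] for row in rows]
```

The LIKE path scans every row, which is acceptable as a degraded mode but would not be the primary search path.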

UX Requirements (CLI/TUI)

Publishing UX

  • smarttools registry publish --dry-run validates locally and shows what would be submitted:

    $ smarttools registry publish --dry-run
    
    Validating tool...
    ✓ config.yaml is valid
    ✓ README.md exists (2.3 KB)
    ✓ Version 1.1.0 not yet published
    
    Would submit:
      Owner: rob
      Name: summarize
      Version: 1.1.0
      Category: text-processing
      Tags: summarization, ai, text
    
    Config preview:
    ─────────────────────────────
    name: summarize
    version: "1.1.0"
    description: Summarize text using AI
    ...
    ─────────────────────────────
    
    Run without --dry-run to submit for review.
    
  • Version bump reminder: CLI warns if version hasn't changed from published:

    ⚠ Version 1.0.0 is already published. Bump version in config.yaml to publish changes.
    
  • First-time publishing flow prompts for token and saves it to config.

Progress Indicators

Long-running operations show progress:

$ smarttools install

Installing project dependencies...
  [1/3] rob/summarize@^1.0.0
        Resolving version... 1.2.0
        Downloading... done
        Installing... done ✓
  [2/3] alice/translate@>=2.0.0
        Resolving version... 2.1.0
        Downloading... done
        Installing... done ✓
  [3/3] official/code-review@*
        Resolving version... 1.0.0
        Downloading... done
        Installing... done ✓

✓ Installed 3 tools

$ smarttools registry publish

Submitting rob/summarize@1.1.0...
  Validating... done ✓
  Uploading... done ✓
  Creating PR... done ✓

✓ PR created: https://gitea.brrd.tech/rob/SmartTools-Registry/pulls/42

Your tool is pending review. You'll receive an email when it's approved.

TUI Browse

smarttools registry browse opens a full-screen terminal UI:

┌─ SmartTools Registry ───────────────────────────────────────┐
│ Search: [____________] [All Categories ▼] [Sort: Popular ▼] │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ▶ rob/summarize v1.2.0                          ⬇ 142     │
│    Summarize text using AI                                  │
│    [text-processing] [ai] [summarization]                   │
│                                                             │
│    alice/translate v2.1.0                        ⬇ 98      │
│    Translate text between languages                         │
│    [text-processing] [translation]                          │
│                                                             │
│    official/code-review v1.0.0                   ⬇ 87      │
│    AI-powered code review                                   │
│    [code] [review] [ai]                                     │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│ ↑↓ Navigate  Enter: Details  i: Install  /: Search  q: Quit │
└─────────────────────────────────────────────────────────────┘

Keyboard shortcuts:

| Key | Action |
|---|---|
| ↑/↓ or j/k | Navigate list |
| Enter | View tool details |
| i | Install selected tool |
| / | Focus search box |
| c | Change category filter |
| s | Change sort order |
| ? | Show help |
| q | Quit |

Virtual scrolling: For large tool lists (>100), use virtual scrolling to maintain performance.
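
Virtual scrolling comes down to rendering only the slice of rows that fits on screen and shifting that slice when the selection moves past an edge. A minimal sketch of the window arithmetic, independent of any TUI library (`visible_window` is a hypothetical name):

```python
from typing import Tuple

def visible_window(selected: int, offset: int, height: int, total: int) -> Tuple[int, int]:
    """Return (start, end) of the rows to render so `selected` stays on screen."""
    if selected < offset:
        offset = selected                  # scrolled above the window
    elif selected >= offset + height:
        offset = selected - height + 1     # scrolled below the window
    # Clamp so the window never runs past either end of the list.
    offset = max(0, min(offset, max(0, total - height)))
    return offset, min(total, offset + height)
```

Only `tools[start:end]` needs to be drawn, so the render cost depends on the terminal height, not on the size of the registry.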

Project Initialization

$ smarttools init

Creating smarttools.yaml...

Project name [my-project]: my-ai-project
Version [1.0.0]:

Would you like to add any tools? (search with 's', skip with Enter)
> s
Search: summ
  1. rob/summarize v1.2.0 - Summarize text using AI
  2. alice/summary v1.0.0 - Generate summaries

Add tool (number, or Enter to finish): 1
Added rob/summarize@^1.2.0

Add tool (number, or Enter to finish):

✓ Created smarttools.yaml

name: my-ai-project
version: "1.0.0"
dependencies:
  - name: rob/summarize
    version: "^1.2.0"

Run 'smarttools install' to install dependencies.

Accessibility

  • CLI: All output works with screen readers, no color-only information
  • TUI: Full keyboard navigation, high-contrast mode support
  • Web UI: WCAG 2.1 AA compliance target
    • Semantic HTML
    • ARIA labels for interactive elements
    • Focus management in modals
    • Skip links for navigation

Offline Cache

Cache registry index locally:

~/.smarttools/registry/index.json

Refresh when older than 24 hours; support --offline and --refresh flags.
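
The refresh decision can be expressed as a small function over the cache age and the two flags. This is a sketch with assumptions: the helper names are hypothetical, the file's modification time stands in for the index age, and `--offline` is taken to win over `--refresh` when both are given.

```python
import os
import time
from typing import Optional

MAX_AGE_SECONDS = 24 * 60 * 60  # refresh when older than 24 hours

def cache_age(index_path: str) -> Optional[float]:
    """Age of the cached index in seconds, or None if there is no cache."""
    if not os.path.exists(index_path):
        return None
    return time.time() - os.path.getmtime(index_path)

def needs_refresh(age: Optional[float], offline: bool = False,
                  refresh: bool = False) -> bool:
    if offline:
        return False  # never touch the network in offline mode
    if refresh or age is None:
        return True   # forced refresh, or nothing cached yet
    return age > MAX_AGE_SECONDS
```

A caller would combine them as `needs_refresh(cache_age(path), offline=args.offline, refresh=args.refresh)`, where `args` is whatever the CLI parser produces.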

Index Integrity

The cached index.json includes integrity metadata:

{
  "version": "1.0",
  "generated_at": "2025-01-20T12:00:00Z",
  "checksum": "sha256:abc123...",
  "tool_count": 142,
  "tools": [...]
}

API response headers:

ETag: "abc123def456"
X-Index-Checksum: sha256:abc123...
X-Index-Generated: 2025-01-20T12:00:00Z

CLI verification:

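import hashlib
import json

def verify_cached_index():
    """Verify cached index integrity on load"""
    cached = load_cached_index()
    if not cached:
        return None

    # Partial write: required fields are missing, so treat the cache as corrupt
    if 'tools' not in cached or 'checksum' not in cached:
        logger.warning("Cached index is missing fields, will refresh")
        return None

    # Verify checksum
    content = json.dumps(cached['tools'], sort_keys=True)
    computed = hashlib.sha256(content.encode()).hexdigest()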

    if computed != cached.get('checksum', '').replace('sha256:', ''):
        logger.warning("Cached index checksum mismatch, will refresh")
        return None

    return cached

Corruption handling:

  • If checksum fails, discard cache and fetch fresh
  • If partial write detected (missing fields), discard and refresh
  • CLI shows warning: "Cached index corrupted, fetching fresh copy..."
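
Partial writes can be avoided on the writing side by saving the index to a temporary file and renaming it into place. The sketch below pairs that with a checksum computed the same way the verification code does (SHA-256 over the sorted-key JSON of the tools list). `tools_checksum` and `write_index_atomic` are hypothetical names.

```python
import hashlib
import json
import os
import tempfile

def tools_checksum(tools: list) -> str:
    content = json.dumps(tools, sort_keys=True)
    return "sha256:" + hashlib.sha256(content.encode()).hexdigest()

def write_index_atomic(path: str, tools: list, generated_at: str) -> dict:
    """Write index.json so readers see either the old file or the complete new one."""
    index = {
        "version": "1.0",
        "generated_at": generated_at,
        "checksum": tools_checksum(tools),
        "tool_count": len(tools),
        "tools": tools,
    }
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(index, f)
        os.replace(tmp_path, path)  # atomic replacement of the old index
    except Exception:
        os.unlink(tmp_path)  # clean up the partial temp file
        raise
    return index
```

With this approach a crash mid-write leaves a stray .tmp file at worst, and the checksum check remains as a second line of defense.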

Web UI Vision

The registry includes a full website, not just an API:

Site structure:

registry.smarttools.dev (or gitea.brrd.tech/registry)
├── /                           # Landing page
├── /tools                      # Browse all tools
├── /tools/{owner}/{name}       # Tool detail page
├── /categories                 # Browse by category
├── /categories/{name}          # Tools in category
├── /search?q=...               # Search results
├── /docs                       # Documentation
│   ├── /docs/getting-started
│   ├── /docs/creating-tools
│   ├── /docs/publishing
│   └── /docs/best-practices
├── /tutorials                  # Step-by-step guides
│   ├── /tutorials/first-tool
│   ├── /tutorials/chaining-steps
│   └── /tutorials/code-steps
├── /examples                   # Example projects
├── /blog                       # Updates, announcements (optional)
├── /register                   # Publisher registration
├── /login                      # Publisher login
├── /dashboard                  # Publisher dashboard
│   ├── /dashboard/tools        # My published tools
│   ├── /dashboard/tokens       # API tokens
│   └── /dashboard/settings     # Account settings
└── /api/v1/...                 # API endpoints

Landing page content:

  • Hero: "Share and discover AI-powered CLI tools"
  • Quick install example
  • Featured/popular tools
  • Category highlights
  • "Get Started" CTA

Tool detail page:

  • Name, description, version, author
  • README rendered as markdown (sanitized)
  • Install command (copy-to-clipboard)
  • Version history
  • Download stats
  • Category/tags
  • "Report" button for abuse

README Security

When rendering README markdown, apply XSS sanitization:

import bleach
from markdown import markdown

ALLOWED_TAGS = [
    'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'p', 'br', 'hr',
    'ul', 'ol', 'li',
    'strong', 'em', 'code', 'pre',
    'blockquote',
    'a', 'img',
    'table', 'thead', 'tbody', 'tr', 'th', 'td'
]

ALLOWED_ATTRS = {
    'a': ['href', 'title'],
    'img': ['src', 'alt', 'title'],
    'code': ['class'],  # for syntax highlighting
}

def render_readme_safe(readme_raw: str) -> str:
    """Convert markdown to sanitized HTML"""
    # Convert markdown to HTML
    html = markdown(readme_raw, extensions=['fenced_code', 'tables'])

    # Sanitize to prevent XSS
    safe_html = bleach.clean(
        html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRS,
        strip=True
    )

    # Linkify URLs
    safe_html = bleach.linkify(safe_html)

    return safe_html

Storage strategy:

  • Store raw README in tools.readme
  • Render and sanitize on request (or cache rendered HTML)
  • Never trust client-submitted HTML directly

Tech stack options:

| Option | Pros | Cons |
|---|---|---|
| Flask + Jinja + Tailwind | Simple, Python-only, fast to build | Less interactive |
| FastAPI + Vue/React SPA | Modern, interactive | More complex, separate build |
| Astro/Next.js | Great SEO, static-first | Different stack (Node.js) |

Recommendation: Flask + Jinja + Tailwind for v1

  • Keeps everything in Python
  • Server-rendered is fine for a registry
  • Good SEO out of the box
  • Can add interactivity with Alpine.js or htmx if needed

Monetization considerations:

  • AdSense-compatible (server-rendered pages)
  • Analytics tracking for traffic insights
  • Future: sponsored tools, featured placements
  • Future: premium publisher tiers (more tools, priority review)

Implementation Phases

Phase 1: Foundation

  • Define smarttools.yaml manifest format
  • Implement tool resolution order (local → global → registry)
  • Create SmartTools-Registry repo on Gitea (bootstrap)
  • Add 3-5 example tools to seed the registry

Phase 2: Core Backend

  • Set up Flask/FastAPI project structure
  • Implement SQLite database schema
  • Build core API endpoints (list, search, get, download)
  • Implement webhook receiver for Gitea sync
  • Set up HMAC verification

Phase 3: CLI Commands

  • smarttools registry search
  • smarttools registry install
  • smarttools registry info
  • smarttools registry browse (TUI)
  • Local index caching

Phase 4: Publishing

  • Publisher registration (web UI)
  • Token management
  • smarttools registry publish command
  • PR creation via Gitea API
  • CI validation workflows

Phase 5: Project Dependencies

  • smarttools install (from manifest)
  • smarttools add command
  • Runtime override application
  • Dependency resolution

Phase 6: Smart Features

  • SQLite FTS5 search index
  • AI-powered auto-categorization
  • Duplicate/similarity detection
  • Security scanning

Phase 7: Full Web UI

  • Landing page
  • Tool browsing/search pages
  • Tool detail pages with README rendering
  • Publisher dashboard
  • Documentation/tutorials section

Phase 8: Polish & Scale

  • Rate limiting
  • Abuse reporting
  • Analytics integration
  • Performance optimization
  • Monitoring/alerting