# SmartTools Registry Design
## Purpose
Build a centralized registry for SmartTools to enable discovery, publishing, dependency management, and future curation at scale.
## Terminology
| Term | Definition |
|------|------------|
| **Tool definition** | The full YAML file in the registry (`config.yaml`) containing name, steps, arguments, etc. |
| **Tool config** | The configuration within a tool definition (arguments, steps, provider settings) |
| **smarttools.yaml** | Project manifest file declaring tool dependencies and overrides |
| **config.yaml** | The tool definition file, both in registry and when installed locally |
| **Owner** | Immutable namespace slug identifying the publisher (e.g., `rob`, `alice`) |
| **Publisher** | A registered user who can publish tools to the registry |
| **Wrapper script** | Auto-generated bash script in `~/.local/bin/` that invokes a tool |
**Canonical naming:** Use `SmartTools-Registry` (capitalized, hyphenated) for the repository name.
## Diagram References
- System overview: `discussions/diagrams/smarttools-registry_rob_1.puml`
- Data flows: `discussions/diagrams/smarttools-registry_rob_5.puml`
## System Overview
Users interact via the CLI and a future Web UI. Both call a Registry API hosted at `https://gitea.brrd.tech/api/v1` (future alias: `registry.smarttools.dev/api/v1`). The API syncs from a Gitea-backed registry repo and maintains a SQLite cache/search index.
**Canonical API base path:** `https://gitea.brrd.tech/api/v1`
All API endpoints are versioned under `/api/v1`. When breaking changes are needed, a new version (`/api/v2`) will be introduced with deprecation notices.
Core API endpoints:
- `GET /api/v1/tools`
- `GET /api/v1/tools/search?q=...`
- `GET /api/v1/tools/{owner}/{name}`
- `GET /api/v1/tools/{owner}/{name}/versions`
- `GET /api/v1/tools/{owner}/{name}/download?version=...`
- `POST /api/v1/tools` (publish)
- `GET /api/v1/categories`
- `GET /api/v1/stats/popular`
- `POST /api/v1/webhook/gitea`
### Pagination
All list endpoints support pagination:
| Parameter | Default | Max | Description |
|-----------|---------|-----|-------------|
| `page` | 1 | - | Page number (1-indexed) |
| `per_page` | 20 | 100 | Items per page |
| `sort` | `downloads` | - | Sort field |
| `order` | `desc` | - | Sort order (asc/desc) |
**Stable ordering:** To ensure deterministic results across pages, sorting includes a secondary key:
- Primary: requested field (e.g., `downloads`)
- Secondary: `published_at` (desc)
- Tertiary: `id` (for absolute stability)
```sql
ORDER BY downloads DESC, published_at DESC, id DESC
LIMIT 20 OFFSET 0
```
**Response pagination metadata:**
```json
{
  "data": [...],
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 142,
    "total_pages": 8
  }
}
```
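As a rough illustration of how the pagination parameters map onto the `LIMIT`/`OFFSET` clause and the `meta` block above, here is a minimal Python sketch. The function name `paginate` and the decision to clamp out-of-range values (rather than reject them with a 400) are assumptions, not part of the design.

```python
import math

MAX_PER_PAGE = 100  # "Max" column of the pagination table


def paginate(total, page=1, per_page=20):
    """Return LIMIT/OFFSET values and the `meta` block for one page."""
    page = max(page, 1)                             # pages are 1-indexed
    per_page = min(max(per_page, 1), MAX_PER_PAGE)  # assumption: clamp, don't reject
    return {
        "limit": per_page,
        "offset": (page - 1) * per_page,
        "meta": {
            "page": page,
            "per_page": per_page,
            "total": total,
            "total_pages": math.ceil(total / per_page),
        },
    }
```

With 142 tools and the default page size, `paginate(142)` reports 8 total pages, matching the sample response.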
### Input Constraints
Size limits to prevent oversized uploads:
| Field | Max Size | Notes |
|-------|----------|-------|
| `config.yaml` | 64 KB | Tool definition |
| `README.md` | 256 KB | Documentation |
| Request body | 512 KB | Total POST payload |
| Tool name | 64 chars | Alphanumeric + hyphen |
| Description | 500 chars | Short summary |
| Tag | 32 chars | Individual tag |
| Tags array | 10 items | Maximum tags per tool |
**Validation errors:**
```json
{
  "error": {
    "code": "PAYLOAD_TOO_LARGE",
    "message": "config.yaml exceeds 64KB limit",
    "details": {
      "field": "config",
      "size": 72000,
      "limit": 65536
    }
  }
}
```
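The limits in the table could be enforced with a check along the following lines. This is only a sketch: the function names and the generic `VALIDATION_ERROR` code are hypothetical (the design names only `PAYLOAD_TOO_LARGE`), and the 512 KB request-body cap is assumed to be enforced earlier, at the HTTP layer.

```python
import re

# Limits taken from the table above
MAX_CONFIG_BYTES = 64 * 1024
MAX_README_BYTES = 256 * 1024
MAX_NAME_CHARS = 64
MAX_DESCRIPTION_CHARS = 500
MAX_TAG_CHARS = 32
MAX_TAGS = 10
NAME_RE = re.compile(r"^[A-Za-z0-9-]+$")  # "alphanumeric + hyphen"


def size_error(field, label, text, limit):
    """Build a PAYLOAD_TOO_LARGE error if `text` exceeds `limit` bytes, else None."""
    size = len(text.encode("utf-8"))
    if size <= limit:
        return None
    return {
        "code": "PAYLOAD_TOO_LARGE",
        "message": f"{label} exceeds {limit // 1024}KB limit",
        "details": {"field": field, "size": size, "limit": limit},
    }


def validate_submission(name, description, tags, config_yaml, readme):
    """Return a list of error dicts; an empty list means the payload is acceptable."""
    errors = [
        size_error("config", "config.yaml", config_yaml, MAX_CONFIG_BYTES),
        size_error("readme", "README.md", readme, MAX_README_BYTES),
    ]
    # VALIDATION_ERROR is a placeholder code, not defined by the design
    if not NAME_RE.fullmatch(name) or len(name) > MAX_NAME_CHARS:
        errors.append({"code": "VALIDATION_ERROR", "message": "invalid tool name"})
    if len(description) > MAX_DESCRIPTION_CHARS:
        errors.append({"code": "VALIDATION_ERROR", "message": "description too long"})
    if len(tags) > MAX_TAGS or any(len(t) > MAX_TAG_CHARS for t in tags):
        errors.append({"code": "VALIDATION_ERROR", "message": "too many tags or tag too long"})
    return [e for e in errors if e is not None]
```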
### Sort Fields and Indexes
**Allowed sort fields:**
| Endpoint | Allowed `sort` values |
|----------|----------------------|
| `GET /tools` | `downloads`, `published_at`, `name` |
| `GET /tools/search` | `relevance`, `downloads`, `published_at` |
| `GET /categories` | `name`, `tool_count` |
Invalid sort values return 400:
```json
{"error": {"code": "INVALID_SORT", "message": "Unknown sort field 'foo'. Allowed: downloads, published_at, name"}}
```
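Checking `sort` against a fixed allow-list also keeps user input out of the `ORDER BY` clause. A minimal sketch of that check, with `validate_sort` as a hypothetical helper name:

```python
# Allowed values copied from the table above, keyed by endpoint path
ALLOWED_SORTS = {
    "/tools": ("downloads", "published_at", "name"),
    "/tools/search": ("relevance", "downloads", "published_at"),
    "/categories": ("name", "tool_count"),
}


def validate_sort(endpoint, sort):
    """Return None if `sort` is allowed for `endpoint`, else a 400 error body."""
    allowed = ALLOWED_SORTS[endpoint]
    if sort in allowed:
        return None
    return {
        "error": {
            "code": "INVALID_SORT",
            "message": f"Unknown sort field '{sort}'. Allowed: {', '.join(allowed)}",
        }
    }
```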
**Database indexes:**
```sql
-- Frequent query patterns
CREATE INDEX idx_tools_owner_name ON tools(owner, name);
CREATE INDEX idx_tools_category ON tools(category);
CREATE INDEX idx_tools_published_at ON tools(published_at DESC);
CREATE INDEX idx_tools_downloads ON tools(downloads DESC);
CREATE INDEX idx_tools_owner_name_version ON tools(owner, name, version);
-- For pagination stability
CREATE INDEX idx_tools_sort_stable ON tools(downloads DESC, published_at DESC, id DESC);
-- Publisher lookups
CREATE INDEX idx_publishers_slug ON publishers(slug);
CREATE INDEX idx_publishers_email ON publishers(email);
-- Token lookups
CREATE INDEX idx_api_tokens_hash ON api_tokens(token_hash);
CREATE INDEX idx_api_tokens_publisher ON api_tokens(publisher_id);
```
### API Version Compatibility
**Forward compatibility:** Clients should ignore unknown fields in API responses:
```python
# Good: ignore unknown fields
tool = response['data']
name = tool.get('name')
# Don't fail if 'new_field' exists but client doesn't know about it
# Bad: strict parsing that fails on unknown fields
tool = ToolSchema.parse(response['data']) # May fail on new fields
```
**Backward compatibility:** The API will:
- Never remove fields in a version (only deprecate)
- Never change field types
- Add new optional fields without version bump
- Use new version (`/api/v2`) for breaking changes
**Deprecation process:**
1. Add `X-Deprecated-Field: old_field` header
2. Document in changelog
3. Remove after 6 months minimum
4. Major version bump if widely used
**Client version header:**
```
X-SmartTools-Client: cli/1.2.0
```
This header helps the server track which client versions are in use when making deprecation decisions.
## Source of Truth
- Gitea registry repo is the source of truth.
- API syncs repo content into SQLite for fast queries, stats, and FTS5 search.
- `index.json` remains useful for offline CLI search and as a fallback.
If the cache is stale, the API can fall back to repo reads; a warning header may be emitted.
## Namespacing and Paths
Support owner/name from day one:
- Registry path: `tools/{owner}/{name}/config.yaml`
- API URL: `/tools/{owner}/{name}`
- Install: `smarttools registry install rob/summarize`
- Shorthand: `smarttools registry install summarize` resolves to the official namespace.
PR branches: `submit/{owner}/{name}/{version}`.
### Namespace Identity
The `owner` is an **immutable slug**, not the display name:
```sql
-- In publishers table
slug TEXT UNIQUE NOT NULL, -- immutable: "rob", "alice-dev"
display_name TEXT NOT NULL, -- mutable: "Rob", "Alice Developer"
```
**Slug rules:**
- Lowercase alphanumeric + hyphens only: `^[a-z0-9][a-z0-9-]*[a-z0-9]$`
- 2-39 characters
- Cannot start/end with hyphen
- Set once at registration, cannot be changed
- Reserved slugs: `official`, `admin`, `system`, `api`, `registry`, `smarttools`
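The slug rules translate fairly directly into a registration-time check. The sketch below is illustrative only; `slug_error` is a hypothetical name, and the reserved set includes `smarttools` because the Official Namespace section lists it as reserved too.

```python
import re

SLUG_RE = re.compile(r"^[a-z0-9][a-z0-9-]*[a-z0-9]$")
# `smarttools` is taken from the reserved list in the Official Namespace section
RESERVED_SLUGS = {"official", "admin", "system", "api", "registry", "smarttools"}


def slug_error(slug):
    """Return a human-readable reason the slug is rejected, or None if it is valid."""
    if not 2 <= len(slug) <= 39:
        return "slug must be 2-39 characters"
    if not SLUG_RE.fullmatch(slug):
        return "slug must be lowercase alphanumeric or hyphens, without a leading or trailing hyphen"
    if slug in RESERVED_SLUGS:
        return "slug is reserved"
    return None
```

Note that the pattern alone does not enforce the length limit, so the length check is done separately.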
**Rename policy:**
- `display_name` can be changed anytime via dashboard
- `slug` (owner) is permanent to preserve URLs and tool references
- If a publisher absolutely must change slug (legal reasons, etc.):
  1. Create new account with new slug
  2. Republish tools under new namespace
  3. Mark old tools as deprecated with `replacement` pointing to new namespace
  4. Old namespace remains reserved (cannot be reused by others)
**Why immutable:**
- `rob/summarize@1.0.0` must always resolve to the same tool
- Prevents namespace hijacking after rename
- Simplifies caching and CDN strategies
## Tool Format (Registry == Local)
Registry tool folders mirror local tools:
```
tools/
  rob/
    summarize/
      config.yaml
      README.md
```
Tool files match the existing SmartTools format. Registry-specific metadata is kept under `registry:`. Deprecation is tool-defined and top-level:
```yaml
name: summarize
version: "1.2.0"
deprecated: true
deprecated_message: "Security issue. Use v1.2.1"
replacement: "rob/summarize@1.2.1"
registry:
  published_at: "2025-01-15T10:30:00Z"
  downloads: 142
```
**Schema compatibility note:** The current SmartTools config parser may reject unknown top-level keys such as `deprecated`, `replacement`, and `registry`. Before implementing registry features:
1. Either update the YAML parser to ignore unknown keys (permissive mode), or explicitly define these fields in the Tool dataclass with defaults
2. Validate registry-specific fields only when publishing, not when running locally
This ensures local tools continue to work even if they don't have registry fields.
## Versioning and Immutability
- Unique key: `owner/name + version`.
- Published versions are immutable.
- Deprecation uses `deprecated`, `deprecated_message`, and `replacement`.
- CLI warns on install if a version is deprecated.
### Yank Policy
Yanking allows removing a version from resolution without deleting it (for auditability):
```yaml
# In tool config
yanked: true
yanked_reason: "Critical security vulnerability CVE-2025-1234"
yanked_at: "2025-01-20T15:00:00Z"
```
**Yanked version behavior:**
| Operation | Behavior |
|-----------|----------|
| `install foo@1.0.0` (exact) | Warns but allows install |
| `install foo@^1.0.0` (constraint) | Excludes yanked, resolves to next valid |
| `search` / `browse` | Hidden by default, shown with `--include-yanked` |
| Direct URL access | Returns tool with `yanked: true` in response |
| Already installed | Continues to work, no forced removal |
**Database schema addition:**
```sql
-- Add to tools table
yanked BOOLEAN DEFAULT FALSE,
yanked_reason TEXT,
yanked_at TIMESTAMP
```
**Yank vs Delete:**
- **Yank**: Version remains in DB, excluded from resolution, auditable
- **Delete**: Reserved for DMCA/legal, requires admin action, leaves tombstone record
### Version Format
Tools use semantic versioning (semver):
```
MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]
Examples:
1.0.0 # stable release
1.2.3 # stable release
2.0.0-alpha.1 # prerelease
2.0.0-beta.2 # prerelease
2.0.0-rc.1 # release candidate
```
### Version Constraints
Manifest files support these constraint formats:
| Constraint | Meaning | Example Match |
|------------|---------|---------------|
| `1.2.3` | Exact version | `1.2.3` only |
| `>=1.2.0` | Minimum version | `1.2.0`, `1.3.0`, `2.0.0` |
| `<2.0.0` | Below version | `1.9.9`, `1.0.0` |
| `>=1.0.0,<2.0.0` | Range | `1.0.0` up to, but not including, `2.0.0` |
| `^1.2.3` | Compatible (same major) | `>=1.2.3,<2.0.0` |
| `~1.2.3` | Approximately (same minor) | `>=1.2.3,<1.3.0` |
| `*` | Any version | latest stable |
### Version Resolution Rules
When resolving a version constraint:
1. **Filter**: Get all versions matching the constraint
2. **Exclude prereleases**: Unless constraint explicitly includes them (e.g., `>=2.0.0-alpha.1`)
3. **Sort**: By semver precedence (descending)
4. **Select**: Highest matching version
**Tie-breakers:**
- Stable versions preferred over prereleases
- Later publish date wins if versions are equal (shouldn't happen with immutability)
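The constraint formats and resolution steps above can be sketched in plain Python without a semver library. This is an illustrative sketch, not the registry's implementation: `parse`, `matches`, and `resolve` are hypothetical names, `^` follows this document's "same major" definition, and the exact-pin exception for yanked versions (warn but allow) is left out for brevity.

```python
import re

SEMVER_RE = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$"
)
STABLE = (1,)  # sorts above every prerelease marker, which starts with 0


def parse(version):
    """Turn 'MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]' into a sortable tuple."""
    m = SEMVER_RE.match(version)
    if not m:
        raise ValueError(f"not a semver version: {version!r}")
    core = tuple(int(m.group(i)) for i in (1, 2, 3))
    if m.group(4) is None:
        return core + (STABLE,)
    # Numeric identifiers rank below alphanumeric ones, as in semver precedence
    ids = tuple(
        (0, int(p), "") if p.isdigit() else (1, 0, p)
        for p in m.group(4).split(".")
    )
    return core + ((0,) + ids,)


def matches(version, constraint):
    """Check one version against a comma-separated constraint string."""
    v = parse(version)
    for part in constraint.split(","):
        part = part.strip()
        if part == "*":
            continue
        m = re.match(r"^(>=|<=|>|<|\^|~|=)?(.+)$", part)
        op, target = m.group(1) or "=", parse(m.group(2).strip())
        if op in ("^", "~"):
            major, minor = target[0], target[1]
            # Exclusive upper bound; the (0,) marker also excludes its prereleases
            upper = (major + 1, 0, 0, (0,)) if op == "^" else (major, minor + 1, 0, (0,))
            ok = target <= v < upper
        else:
            ok = {"=": v == target, ">=": v >= target, "<=": v <= target,
                  ">": v > target, "<": v < target}[op]
        if not ok:
            return False
    return True


def resolve(versions, constraint="*", yanked=()):
    """Return the highest non-yanked version satisfying `constraint`, or None."""
    allow_prerelease = "-" in constraint  # e.g. ">=2.0.0-alpha.1" or ">=2.0.0-0"
    candidates = [
        v for v in versions
        if v not in yanked
        and matches(v, constraint)
        and (allow_prerelease or parse(v)[3] == STABLE)
    ]
    return max(candidates, key=parse, default=None)
```

A `None` result corresponds to the unsatisfiable-constraint error described next.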
**Unsatisfiable constraints:**
```json
// API Response: 404
{
  "error": {
    "code": "VERSION_NOT_FOUND",
    "message": "No version of 'rob/summarize' satisfies constraint '>=5.0.0'",
    "details": {
      "tool": "rob/summarize",
      "constraint": ">=5.0.0",
      "available_versions": ["1.0.0", "1.1.0", "1.2.0"],
      "latest_stable": "1.2.0"
    }
  }
}
```
### Prerelease Handling
- Prereleases are **not** returned for `*` or range constraints by default
- To install prerelease: `smarttools registry install rob/summarize@2.0.0-beta.1`
- To allow prereleases in manifest: `version: ">=2.0.0-0"` (the `-0` suffix includes prereleases)
### Download Endpoint Version Selection
The `/api/v1/tools/{owner}/{name}/download` endpoint accepts version parameters:
| Parameter | Behavior | Example |
|-----------|----------|---------|
| (none) | Returns latest stable version | `/download` → `1.2.0` |
| `version=1.2.0` | Exact version (must exist) | `/download?version=1.2.0` |
| `version=^1.0.0` | Server resolves constraint | `/download?version=^1.0.0` → `1.2.0` |
| `version=latest` | Alias for latest stable | `/download?version=latest` |
**Server-side resolution:** The API server resolves version constraints, not the client. This ensures consistent resolution and allows the server to apply policies (e.g., exclude yanked versions).
```
GET /api/v1/tools/rob/summarize/download?version=^1.0.0&install=true
Response (200):
{
  "data": {
    "owner": "rob",
    "name": "summarize",
    "resolved_version": "1.2.0",
    "config": "... YAML content ..."
  },
  "meta": {
    "constraint": "^1.0.0",
    "available_versions": ["1.0.0", "1.1.0", "1.2.0"]
  }
}
```
**Invalid/unsatisfiable constraint:**
```
GET /api/v1/tools/rob/summarize/download?version=^5.0.0
Response (404):
{
  "error": {
    "code": "CONSTRAINT_UNSATISFIABLE",
    "message": "No version matches constraint '^5.0.0'",
    "details": {
      "constraint": "^5.0.0",
      "latest_stable": "1.2.0",
      "available_versions": ["1.0.0", "1.1.0", "1.2.0"]
    }
  }
}
```
## Tool Resolution Order
When a tool is invoked, the CLI searches in this order:
1. **Local project**: `./.smarttools/<owner>/<name>/config.yaml` (or `./.smarttools/<name>/` for unnamespaced)
2. **Global user**: `~/.smarttools/<owner>/<name>/config.yaml`
3. **Registry**: Fetch from API, install to global, then run
4. **Error**: `Tool '<toolname>' not found`
Step 3 only occurs if `auto_fetch_from_registry: true` in config (default: true).
**Path convention:** Use `.smarttools/` (with leading dot) for both local and global to maintain consistency.
Resolution also respects namespacing:
- `summarize` → searches for any tool named `summarize`, prefers `official/summarize` if exists
- `rob/summarize` → searches for exactly `rob/summarize`
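The local part of this lookup (steps 1 and 2) might look like the sketch below. It is a simplification under stated assumptions: `find_tool` is a hypothetical name, an unnamespaced reference is checked only under `official/` and the bare `<name>/` directory rather than across every owner, and the registry fetch in step 3 is left to the caller.

```python
from pathlib import Path


def find_tool(ref, project_dir, home_dir):
    """Return the first existing config.yaml for `ref` ('owner/name' or 'name'), or None."""
    owner, _, name = ref.rpartition("/")
    candidates = []
    # Local project first, then the global user directory
    for root in (Path(project_dir) / ".smarttools", Path(home_dir) / ".smarttools"):
        if owner:
            candidates.append(root / owner / name / "config.yaml")
        else:
            # Prefer the official namespace, then an unnamespaced install
            candidates.append(root / "official" / name / "config.yaml")
            candidates.append(root / name / "config.yaml")
    return next((path for path in candidates if path.is_file()), None)
```

A `None` result is the point at which the CLI would either auto-fetch from the registry or report that the tool was not found.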
### Official Namespace
The slug `official` is reserved for curated, high-quality tools maintained by the registry administrators.
- Shorthand `summarize` resolves to `official/summarize` if it exists
- If no `official/summarize`, falls back to most-downloaded tool named `summarize`
- To avoid ambiguity, always use full `owner/name` in manifests
Reserved slugs that cannot be registered: `official`, `admin`, `system`, `api`, `registry`, `smarttools`
## Auto-Fetch Behavior
When enabled (`auto_fetch_from_registry: true`), missing tools are automatically fetched:
```bash
$ summarize < file.txt
# Tool 'summarize' not found locally.
# Fetching from registry...
# Installed: official/summarize@1.2.0
# Running...
```
Behavior details:
- Fetches latest stable version unless pinned in `smarttools.yaml`
- Installs to `~/.smarttools/<owner>/<name>/`
- Generates wrapper script in `~/.local/bin/`
- Subsequent runs use local copy (no re-fetch)
To disable (require explicit install):
```yaml
# ~/.smarttools/config.yaml
auto_fetch_from_registry: false
```
### Wrapper Script Collisions
When two tools from different owners have the same name:
| Scenario | Behavior |
|----------|----------|
| Install `official/summarize` | Creates wrapper `~/.local/bin/summarize` |
| Install `rob/summarize` (collision) | Creates wrapper `~/.local/bin/rob-summarize` |
| Uninstall `official/summarize` | Removes `summarize` wrapper, promotes `rob-summarize` → `summarize` if desired |
The first-installed tool with a given name gets the short wrapper. Subsequent tools use `owner-name` format.
To invoke a specific owner's tool:
```bash
# Short form (whichever was installed first)
summarize < file.txt
# Explicit owner form (always works)
rob-summarize < file.txt
# Or via smarttools run
smarttools run rob/summarize < file.txt
```
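The naming rule itself is small enough to show in code. In this sketch, `wrapper_name` and the `installed` mapping (wrapper file name to the `owner/name` it invokes) are hypothetical; how the CLI actually tracks existing wrappers is not specified here.

```python
def wrapper_name(owner, name, installed):
    """Pick the wrapper file name for `owner/name`.

    `installed` maps existing wrapper names to the 'owner/name' they invoke.
    """
    current = installed.get(name)
    if current is None or current == f"{owner}/{name}":
        return name               # short name is free, or already ours (reinstall)
    return f"{owner}-{name}"      # collision: fall back to the owner-name form
```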
## Project Manifest (smarttools.yaml)
Defines tool dependencies with optional runtime overrides:
```yaml
name: my-ai-project
version: "1.0.0"
dependencies:
  - name: rob/summarize
    version: ">=1.0.0"
overrides:
  rob/summarize:
    provider: ollama
```
Overrides are applied at runtime and do not mutate installed tool configs.
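One way to honor "applied at runtime, never written back" is to merge into a copy of the installed config each time a tool runs. The sketch below assumes a shallow, top-level merge and already-parsed dictionaries; `effective_config` is a hypothetical helper name.

```python
import copy


def effective_config(tool_ref, installed_config, manifest):
    """Return the installed config with manifest overrides applied to a copy."""
    overrides = manifest.get("overrides", {}).get(tool_ref, {})
    merged = copy.deepcopy(installed_config)  # never touch the installed config
    merged.update(overrides)                  # assumption: top-level keys only
    return merged
```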
## CLI Config and Tokens
Global config lives in `~/.smarttools/config.yaml`:
```yaml
registry:
  url: https://gitea.brrd.tech/api/v1 # Must match canonical base path
  token: "reg_xxxxxxxxxxxx"
client_id: "anon_abc123def456"
auto_fetch_from_registry: true
```
`client_id` is generated locally and used for anonymous install dedupe.
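As a sketch of how that dedupe could work, the snippet below counts at most one download per tool, client, and day by relying on a uniqueness constraint like the one on the `download_stats` table in this document's schema. The helper names are hypothetical, the tables are trimmed down for the demo, and the ID format simply mirrors the example above.

```python
import sqlite3
import uuid

# Trimmed-down versions of the tools and download_stats tables, for the demo only
SCHEMA = """
CREATE TABLE tools (id INTEGER PRIMARY KEY, downloads INTEGER DEFAULT 0);
CREATE TABLE download_stats (
    tool_id INTEGER NOT NULL REFERENCES tools(id),
    client_id TEXT NOT NULL,
    downloaded_at DATE NOT NULL,
    UNIQUE(tool_id, client_id, downloaded_at)
);
"""


def new_client_id():
    """Generate an anonymous client ID shaped like the example above."""
    return "anon_" + uuid.uuid4().hex[:12]


def record_download(conn, tool_id, client_id, day):
    """Count at most one download per (tool, client, day). Returns True if counted."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO download_stats (tool_id, client_id, downloaded_at) "
        "VALUES (?, ?, ?)",
        (tool_id, client_id, day),
    )
    counted = cur.rowcount == 1  # 0 when the UNIQUE constraint ignored the row
    if counted:
        conn.execute("UPDATE tools SET downloads = downloads + 1 WHERE id = ?", (tool_id,))
    conn.commit()
    return counted
```

Repeated installs from the same machine on the same day leave the `downloads` counter unchanged.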
## Publishing and Auth
Publishing uses registry accounts, not Gitea accounts:
- Public endpoints require no auth.
- `POST /tools` requires a registry token.
- The API server uses a private Gitea service account to open PRs.
### Publish Idempotency and Edge Cases
**Idempotency key:** `owner/name@version`
| Scenario | API Response | HTTP Code |
|----------|--------------|-----------|
| New version, no PR exists | Create PR, return URL | `201 Created` |
| PR already exists (pending) | Return existing PR URL | `200 OK` |
| Version already published | Error: version exists | `409 Conflict` |
| PR was closed without merge | Allow new PR | `201 Created` |
| PR was merged, then tool deleted | Error: version exists (tombstone) | `409 Conflict` |
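The table reduces to a small decision function. In this sketch the function name and the second element of each result (`"PR_CREATED"`, `"EXISTING_PR"`) are hypothetical labels; only the HTTP codes and `VERSION_EXISTS` come from this document.

```python
def publish_outcome(version_published, pr_status):
    """Map the current state of owner/name@version to (HTTP status, label).

    `pr_status` is None when no PR was ever opened, else 'pending', 'merged' or 'closed'.
    """
    if version_published or pr_status == "merged":
        # Merged versions stay taken even if the tool was later deleted (tombstone)
        return 409, "VERSION_EXISTS"
    if pr_status == "pending":
        return 200, "EXISTING_PR"   # idempotent retry: hand back the existing PR URL
    return 201, "PR_CREATED"        # no PR yet, or the previous one was closed unmerged
```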
**Version immutability enforcement:**
```json
// Attempt to publish existing version
// Response: 409 Conflict
{
  "error": {
    "code": "VERSION_EXISTS",
    "message": "Version 1.2.0 of 'rob/summarize' already exists and cannot be overwritten",
    "details": {
      "published_at": "2025-01-15T10:30:00Z",
      "action": "Bump version number to publish changes"
    }
  }
}
```
**Closed PR handling:**
- Track PR state in database: `pending`, `merged`, `closed`
- If PR was closed (rejected/abandoned), allow new submission for same version
- If PR was merged, version is immutable forever
**Update flow (new version, not overwrite):**
1. Developer modifies tool locally
2. Bumps version in `config.yaml` (e.g., `1.2.0` → `1.3.0`)
3. Runs `smarttools registry publish`
4. New PR created for `1.3.0`
5. Old version `1.2.0` remains available
## Publisher Registration
Publishers register on the registry website, not Gitea:
**Registration flow:**
1. User visits `https://gitea.brrd.tech/registry/register` (or future `registry.smarttools.dev`)
2. Creates account with email + password + slug
3. Receives verification email (optional in v1, but track `verified` status)
4. Logs into dashboard at `/dashboard`
5. Generates API token from dashboard
6. Uses token in CLI for publishing
### Authentication Security
**Password hashing:**
- Algorithm: Argon2id (memory-hard, recommended by OWASP)
- Parameters: `memory=65536, iterations=3, parallelism=4`
- Library: `argon2-cffi` for Python
```python
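from argon2 import PasswordHasher

# memory_cost is in KiB (65536 KiB = 64 MiB); time_cost is the iteration count
ph = PasswordHasher(memory_cost=65536, time_cost=3, parallelism=4)
password_hash = ph.hash(password)
ph.verify(password_hash, password) # raises VerifyMismatchError on mismatch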
```
**API token format:**
```
reg_<random-32-bytes-base62>
Example (body shortened for readability): reg_7kX9mPqR2sT4vW6xY8zA1bC3dE5fG7hJ
```
- Prefix `reg_` for easy identification in logs/configs
- 32 bytes of cryptographically random data
- Base62 encoded (alphanumeric, no special chars)
- Total length: ~47 characters
- Stored as SHA-256 hash in database (never plain text)
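A token with these properties could be produced as sketched below. The helper names, the alphabet ordering, and the zero-padding to a fixed 43-character body are assumptions; 32 random bytes need at most 43 base62 digits, which gives the 47-character total once the `reg_` prefix is added.

```python
import hashlib
import hmac
import secrets
import string

BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase
TOKEN_BODY_LEN = 43  # 62**43 > 2**256, so 43 digits always fit 32 bytes


def generate_token():
    """Return (plaintext_token, sha256_hex). Only the hash should be stored."""
    n = int.from_bytes(secrets.token_bytes(32), "big")
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    body = "".join(reversed(digits)).rjust(TOKEN_BODY_LEN, BASE62[0])
    token = "reg_" + body
    return token, hashlib.sha256(token.encode()).hexdigest()


def token_matches(presented, stored_hash):
    """Compare a presented token against the stored hash in constant time."""
    presented_hash = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)
```

The plaintext token is shown to the user once at generation time; afterwards only `token_matches` against the stored hash is possible.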
**Token lifecycle:**
| Action | Behavior |
|--------|----------|
| Generate | Create new token, return once, store hash |
| List | Show token name, created date, last used (not the token itself) |
| Revoke | Set `revoked_at` timestamp, reject future uses |
| Rotate | Generate new token, optionally revoke old |
**Rate limits:**
| Endpoint | Limit | Window | Scope | Retry-After |
|----------|-------|--------|-------|-------------|
| `POST /register` | 5 | 1 hour | IP | 3600 |
| `POST /login` | 10 | 15 min | IP | 900 |
| `POST /login` (failed) | 5 | 15 min | IP + email | 900 |
| `POST /tokens` | 10 | 1 hour | Token | 3600 |
| `POST /tools` | 20 | 1 hour | Token | 3600 |
| `GET /tools/*` | 100 | 1 min | IP | 60 |
| `GET /download` | 60 | 1 min | IP | 60 |
**Rate limit response (429):**
```json
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests. Try again in 60 seconds.",
    "details": {
      "limit": 100,
      "window": "1 minute",
      "retry_after": 60
    }
  }
}
```
**Headers on rate-limited response:**
```
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1705766400
```
**Scope priority:** For authenticated requests, both IP and token limits apply. The more restrictive limit wins.
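The design does not say which limiting algorithm is used. As one possible reading, here is a minimal in-memory fixed-window counter; the class name is hypothetical, and a real deployment would likely need shared storage rather than a per-process dictionary.

```python
import math
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock          # injectable so tests can control time
        self._state = {}            # key -> (window_start, request_count)

    def check(self, key):
        """Return (allowed, retry_after_seconds) and count the request if allowed."""
        now = self.clock()
        start, count = self._state.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0   # previous window has ended
        if count >= self.limit:
            return False, math.ceil(start + self.window - now)
        self._state[key] = (start, count + 1)
        return True, 0
```

For an authenticated request, one limiter keyed by IP and another keyed by token would both be consulted, and the request proceeds only if both allow it. The second value of a denied check is what would go into the `Retry-After` header.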
**Account lockout:**
- After 5 failed login attempts: 15-minute lockout for that email
- After 10 failed attempts: 1-hour lockout
- Lockout clears on successful password reset
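The two lockout tiers could be computed from the `failed_login_attempts` counter as in this sketch (`lockout_until` is a hypothetical helper; the result would be stored in `locked_until`):

```python
from datetime import datetime, timedelta, timezone


def lockout_until(failed_attempts, now):
    """Return when the lockout ends, or None if the account is not locked."""
    if failed_attempts >= 10:
        return now + timedelta(hours=1)
    if failed_attempts >= 5:
        return now + timedelta(minutes=15)
    return None
```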
**Password reset flow (deferred to v1.1):**
1. User requests reset via email
2. Server generates time-limited token (1 hour expiry)
3. Email contains reset link with token
4. User sets new password
5. All existing sessions/tokens optionally invalidated
**Email verification flow (deferred to v1.1):**
1. On registration, send verification email
2. User clicks link with verification token
3. Set `verified = true` in database
4. Unverified accounts can browse but not publish
### Token Scopes and Authorization
Tokens have scopes that limit their capabilities:
| Scope | Permissions |
|-------|-------------|
| `read` | View own published tools, download stats |
| `publish` | Submit new tools, update own tool metadata |
| `admin` | Yank tools, manage categories (registry admins only) |
**Default scope:** New tokens get `read,publish` by default.
**Ownership enforcement:**
```python
@app.route('/api/v1/tools', methods=['POST'])
@require_token(scopes=['publish'])
def publish_tool():
    token = get_current_token()
    tool_data = request.json
    # Enforce owner == token holder's slug
    if tool_data['owner'] != token.publisher.slug:
        return {
            "error": {
                "code": "FORBIDDEN",
                "message": f"Cannot publish to namespace '{tool_data['owner']}'. "
                           f"Your namespace is '{token.publisher.slug}'."
            }
        }, 403
    # Proceed with publish...
```
**`GET /api/v1/me/tools` authorization:**
- Requires valid token with `read` scope
- Returns only tools where `owner == token.publisher.slug`
- Includes pending PRs and all versions (including yanked)
### Web Session Security
Dashboard login uses session cookies (not tokens) for browser auth:
**Cookie settings:**
```python
SESSION_COOKIE_NAME = 'smarttools_session'
SESSION_COOKIE_HTTPONLY = True # Prevent JS access
SESSION_COOKIE_SECURE = True # HTTPS only in production
SESSION_COOKIE_SAMESITE = 'Lax' # CSRF protection
PERMANENT_SESSION_LIFETIME = 86400 * 7 # 7 days (Flask's setting for session cookie lifetime)
```
**CSRF protection:**
- All POST/PUT/DELETE forms include `csrf_token` hidden field
- Token validated server-side before processing
- 403 Forbidden if token missing or invalid
**Session lifecycle:**
| Event | Action |
|-------|--------|
| Login | Create session, set cookie |
| Logout | Delete session, clear cookie |
| Idle 24h | Session expires, re-login required |
| Password change | Invalidate all sessions |
| Token revocation | Existing sessions continue (token != session) |
**Secure session storage:**
```python
# Store sessions in DB, not filesystem
from flask_session import Session
app.config['SESSION_TYPE'] = 'sqlalchemy'
app.config['SESSION_SQLALCHEMY_TABLE'] = 'sessions'
```
**Database schema:**
```sql
-- Publishers
CREATE TABLE publishers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
email TEXT UNIQUE NOT NULL,
password_hash TEXT NOT NULL,
slug TEXT UNIQUE NOT NULL, -- immutable namespace: "rob", "alice-dev"
display_name TEXT NOT NULL, -- mutable: "Rob", "Alice Developer"
bio TEXT,
website TEXT,
verified BOOLEAN DEFAULT FALSE,
locked_until TIMESTAMP, -- account lockout
failed_login_attempts INTEGER DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- API tokens (one publisher can have multiple)
CREATE TABLE api_tokens (
id INTEGER PRIMARY KEY AUTOINCREMENT,
publisher_id INTEGER NOT NULL REFERENCES publishers(id),
token_hash TEXT NOT NULL,
name TEXT NOT NULL, -- "CLI token", "CI token"
last_used_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
revoked_at TIMESTAMP -- NULL if active
);
-- Tools (links to publisher)
CREATE TABLE tools (
id INTEGER PRIMARY KEY AUTOINCREMENT,
owner TEXT NOT NULL, -- namespace slug (immutable, from publisher.slug)
name TEXT NOT NULL,
version TEXT NOT NULL,
description TEXT,
category TEXT,
tags TEXT, -- JSON array
config_yaml TEXT NOT NULL, -- Full tool config
readme TEXT,
publisher_id INTEGER NOT NULL REFERENCES publishers(id),
deprecated BOOLEAN DEFAULT FALSE,
deprecated_message TEXT,
replacement TEXT,
downloads INTEGER DEFAULT 0,
published_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(owner, name, version)
);
-- Download stats (for deduplication)
CREATE TABLE download_stats (
id INTEGER PRIMARY KEY AUTOINCREMENT,
tool_id INTEGER NOT NULL REFERENCES tools(id),
client_id TEXT NOT NULL,
downloaded_at DATE NOT NULL,
UNIQUE(tool_id, client_id, downloaded_at)
);
-- Search index (FTS5)
CREATE VIRTUAL TABLE tools_fts USING fts5(
name, description, tags, readme,
content='tools',
content_rowid='id'
);
-- FTS5 sync triggers (required for external content tables)
CREATE TRIGGER tools_ai AFTER INSERT ON tools BEGIN
INSERT INTO tools_fts(rowid, name, description, tags, readme)
VALUES (new.id, new.name, new.description, new.tags, new.readme);
END;
CREATE TRIGGER tools_ad AFTER DELETE ON tools BEGIN
INSERT INTO tools_fts(tools_fts, rowid, name, description, tags, readme)
VALUES ('delete', old.id, old.name, old.description, old.tags, old.readme);
END;
CREATE TRIGGER tools_au AFTER UPDATE ON tools BEGIN
INSERT INTO tools_fts(tools_fts, rowid, name, description, tags, readme)
VALUES ('delete', old.id, old.name, old.description, old.tags, old.readme);
INSERT INTO tools_fts(rowid, name, description, tags, readme)
VALUES (new.id, new.name, new.description, new.tags, new.readme);
END;
-- Pending PRs (track publish state)
CREATE TABLE pending_prs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
publisher_id INTEGER NOT NULL REFERENCES publishers(id),
owner TEXT NOT NULL,
name TEXT NOT NULL,
version TEXT NOT NULL,
pr_number INTEGER NOT NULL,
pr_url TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending', -- pending, merged, closed
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(owner, name, version)
);
-- Webhook sync log (idempotency)
CREATE TABLE webhook_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
delivery_id TEXT UNIQUE NOT NULL, -- Gitea delivery ID
event_type TEXT NOT NULL,
processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
**Note on tags indexing:** The `tags` column stores JSON arrays as text. For v1, FTS5 will search within the JSON string. If tag filtering becomes a bottleneck, normalize to a `tool_tags` junction table:
```sql
-- Future: normalized tags (if needed)
CREATE TABLE tags (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL
);
CREATE TABLE tool_tags (
tool_id INTEGER REFERENCES tools(id),
tag_id INTEGER REFERENCES tags(id),
PRIMARY KEY (tool_id, tag_id)
);
```
**CLI first-time publish flow:**
```bash
$ smarttools registry publish
No registry account configured.
1. Register at: https://gitea.brrd.tech/registry/register
2. Generate a token from your dashboard
3. Enter your token below
Registry token: ********
Token saved to ~/.smarttools/config.yaml
Validating tool...
✓ config.yaml is valid
✓ README.md exists (2.3 KB)
✓ Version 1.0.0 not yet published
Publishing rob/my-tool@1.0.0...
✓ PR created: https://gitea.brrd.tech/rob/SmartTools-Registry/pulls/42
Your tool is pending review. You'll receive an email when it's approved.
```
## CLI Commands Reference
Full mapping of CLI commands to API calls:
### Registry Commands
```bash
# Search for tools
$ smarttools registry search <query> [--category=<cat>] [--limit=20]
→ GET /api/v1/tools/search?q=<query>&category=<cat>&per_page=20
# Browse tools (TUI)
$ smarttools registry browse [--category=<cat>]
→ GET /api/v1/tools?category=<cat>&page=1
→ GET /api/v1/categories
# View tool details
$ smarttools registry info <owner/name>
→ GET /api/v1/tools/<owner>/<name>
# Install a tool
$ smarttools registry install <owner/name> [--version=<ver>]
→ GET /api/v1/tools/<owner>/<name>/download?version=<ver>&install=true
→ Writes to ~/.smarttools/<owner>/<name>/config.yaml
→ Generates ~/.local/bin/<name> wrapper (or <owner>-<name> if collision)
# Uninstall a tool
$ smarttools registry uninstall <owner/name>
→ Removes ~/.smarttools/<owner>/<name>/
→ Removes wrapper script
# Publish a tool
$ smarttools registry publish [path] [--dry-run]
→ POST /api/v1/tools (with registry token)
→ Returns PR URL
# List my published tools
$ smarttools registry my-tools
→ GET /api/v1/me/tools (with registry token)
# Update index cache
$ smarttools registry update
→ GET /api/v1/index.json
→ Writes to ~/.smarttools/registry/index.json
```
### Project Commands
```bash
# Install project dependencies from smarttools.yaml
$ smarttools install
→ Reads ./smarttools.yaml
→ For each dependency:
GET /api/v1/tools/<owner>/<name>/download?version=<constraint>&install=true
→ Installs to ~/.smarttools/<owner>/<name>/
# Add a dependency to smarttools.yaml
$ smarttools add <owner/name> [--version=<constraint>]
→ Adds to ./smarttools.yaml dependencies
→ Runs install for that tool
# Show project dependencies status
$ smarttools deps
→ Reads ./smarttools.yaml
→ Shows installed status for each dependency
→ Note: "smarttools list" is reserved for listing installed tools
```
**Command naming note:** `smarttools list` already exists to list locally installed tools. Use `smarttools deps` to show project manifest dependencies.
### Flags available on most commands
| Flag | Description |
|------|-------------|
| `--offline` | Use cached index only, don't fetch |
| `--refresh` | Force refresh of cached data |
| `--json` | Output in JSON format |
| `--verbose` | Show detailed output |
## Webhooks and Security
### HMAC Verification
All Gitea webhooks are verified using HMAC-SHA256:
```python
import hashlib
import hmac


def verify_webhook(request, secret):
    """Check the hex HMAC-SHA256 signature Gitea sends with each delivery."""
    signature = request.headers.get('X-Gitea-Signature')
    if not signature:
        return False
    expected = hmac.new(
        secret.encode(),
        request.get_data(),  # raw request body bytes (Flask)
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(signature, expected)
```
### Replay Protection
While sync is idempotent, implement basic replay protection:
```python
from datetime import datetime, timezone


def process_webhook(request):
    # Verify the signature first so unauthenticated requests never reach the database
    if not verify_webhook(request, WEBHOOK_SECRET):
        return {"error": "invalid_signature"}, 401
    delivery_id = request.headers.get('X-Gitea-Delivery')
    # Check if already processed
    if db.webhook_log.exists(delivery_id=delivery_id):
        return {"status": "already_processed"}, 200
    # Process with lock to prevent concurrent processing
    with db.lock(f"webhook:{delivery_id}"):
        # Double-check after acquiring lock
        if db.webhook_log.exists(delivery_id=delivery_id):
            return {"status": "already_processed"}, 200
        # Process the webhook
        result = sync_from_repo()
        # Log successful processing; the event name comes from the X-Gitea-Event header
        db.webhook_log.insert(
            delivery_id=delivery_id,
            event_type=request.headers.get('X-Gitea-Event'),
            processed_at=datetime.now(timezone.utc)
        )
    return {"status": "processed"}, 200
```
### Sync Job Locking
Prevent concurrent sync operations:
```python
from glob import glob

# Using file lock or database advisory lock
SYNC_LOCK_TIMEOUT = 300 # 5 minutes max


def sync_from_repo():
    try:
        with acquire_lock("registry_sync", timeout=SYNC_LOCK_TIMEOUT):
            # Pull latest from Gitea
            repo.fetch()
            repo.reset('origin/main', hard=True)
            # Parse and update database
            for tool_path in glob('tools/*/*/config.yaml'):
                update_tool_in_db(tool_path)
            # Rebuild FTS index if needed
            rebuild_fts_index()
    except LockTimeout:
        logger.warning("Sync already in progress, skipping")
        return {"status": "skipped", "reason": "sync_in_progress"}
```
### Atomic Sync Strategy
To avoid partially updated DB during webhook sync, use transactional table swap:
```python
def sync_from_repo_atomic():
    with acquire_lock("registry_sync", timeout=SYNC_LOCK_TIMEOUT):
        # 1. Pull latest from Gitea
        repo.fetch()
        repo.reset('origin/main', hard=True)
        # 2. Parse all tools into memory
        new_tools = []
        for tool_path in glob('tools/*/*/config.yaml'):
            tool_data = parse_tool(tool_path)
            if tool_data:
                new_tools.append(tool_data)
        # 3. Atomic swap using transaction
        with db.transaction():
            # Create temp table. Note: CREATE TABLE ... AS SELECT copies columns only;
            # constraints, indexes, and the FTS sync triggers must be recreated on the new table.
            db.execute("CREATE TABLE tools_new AS SELECT * FROM tools WHERE 0")
            # Bulk insert into temp table
            for tool in new_tools:
                db.execute("INSERT INTO tools_new ...", tool)
            # Swap tables atomically
            db.execute("ALTER TABLE tools RENAME TO tools_old")
            db.execute("ALTER TABLE tools_new RENAME TO tools")
            db.execute("DROP TABLE tools_old")
            # Rebuild FTS index
            db.execute("INSERT INTO tools_fts(tools_fts) VALUES('rebuild')")
            # Update sync timestamp
            db.execute("UPDATE sync_status SET last_sync = ?", [datetime.utcnow()])
```
**Why atomic:** Per-row updates with FTS triggers can yield inconsistent reads under load. Readers may see partial state mid-sync. Table swap ensures all-or-nothing visibility.
### Error Handling
| Error Scenario | Behavior |
|----------------|----------|
| Repo fetch fails | Log error, retry in 5 min, alert after 3 failures |
| YAML parse error | Skip tool, log error, continue with others |
| Database write fails | Rollback transaction, retry once, then alert |
| Lock timeout | Skip this sync, next webhook will retry |
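The repo-fetch row above could be handled by a small failure counter. The following is a minimal sketch, not the registry's implementation: `FetchTracker`, `fetch`, and `alert` are hypothetical names, and it assumes the three failures are counted consecutively (the counter resets on success).

```python
from typing import Callable

RETRY_DELAY_SECONDS = 300   # "retry in 5 min" from the table above
ALERT_AFTER_FAILURES = 3    # alert threshold from the table above


class FetchTracker:
    """Counts repo fetch failures (hypothetical helper; assumes consecutive counting)."""

    def __init__(self, alert: Callable[[str], None]):
        self.alert = alert
        self.failures = 0

    def attempt(self, fetch: Callable[[], None]) -> dict:
        """Run one fetch attempt and report what the scheduler should do next."""
        try:
            fetch()
        except Exception as exc:  # a real job would catch the specific git/network error
            self.failures += 1
            if self.failures >= ALERT_AFTER_FAILURES:
                self.alert(f"Repo fetch failed {self.failures} times: {exc}")
            return {"status": "failed", "retry_in": RETRY_DELAY_SECONDS}
        # Success resets the counter so old failures do not trigger a later alert.
        self.failures = 0
        return {"status": "ok", "retry_in": None}
```

A scheduler would call `attempt(repo.fetch)` and re-queue the job after `retry_in` seconds when the status is `failed`.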
## Automated CI Validation
PRs are validated automatically using SmartTools (dogfooding):
```
PR Submitted
┌─────────────────────────────────────┐
│ Gitea CI runs validation tools:     │
│ • schema-validator                  │
│ • security-scanner                  │
│ • duplicate-detector                │
└───────────────┬─────────────────────┘
        ┌───────┴───────┐
        │               │
    All pass        Any fail
        │               │
        ▼               ▼
  Auto-merge or   Add comment,
 flag for review request changes
```
Validation checks:
1. **Schema validation**: config.yaml matches expected format
2. **Security scan**: No dangerous shell commands, no secrets in prompts
3. **Duplicate detection**: AI-powered similarity check against existing tools
4. **README check**: README.md exists and is non-empty
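The README check is the simplest of the four and can be sketched in a few lines. This is an illustrative sketch only: `check_readme` is a hypothetical function, not the contents of any script in the registry repo, and it assumes whitespace-only files count as empty.

```python
from pathlib import Path


def check_readme(tool_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the README check passed."""
    readme = Path(tool_dir) / "README.md"
    if not readme.is_file():
        return ["README.md is missing"]
    # Assumption: a file containing only whitespace is treated as empty.
    if not readme.read_text(encoding="utf-8").strip():
        return ["README.md is empty"]
    return []
```

Returning a list of messages rather than a boolean lets CI post every problem in a single PR comment.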
CI workflow (`.gitea/workflows/validate.yaml`):
```yaml
name: Validate Tool Submission
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Validate schema
        run: python scripts/validate_tool.py ${{ github.event.pull_request.head.sha }}
      # The steps below assume changed_files.txt was produced by an earlier step (not shown)
      - name: Security scan
        run: smarttools run security-scanner < changed_files.txt
      - name: Check duplicates
        run: smarttools run duplicate-detector < changed_files.txt
```
## Registry Repository Structure
Full structure of the SmartTools-Registry repo:
```
SmartTools-Registry/
├── README.md                 # Registry overview
├── CONTRIBUTING.md           # How to submit tools
├── LICENSE
├── tools/                    # All published tools
│   ├── rob/
│   │   ├── summarize/
│   │   │   ├── config.yaml
│   │   │   └── README.md
│   │   └── translate/
│   │       ├── config.yaml
│   │       └── README.md
│   └── alice/
│       └── code-review/
│           ├── config.yaml
│           └── README.md
├── categories/
│   └── categories.yaml       # Category definitions
├── index.json                # Auto-generated search index
├── .gitea/
│   └── workflows/
│       ├── validate.yaml     # PR validation
│       ├── build-index.yaml  # Rebuild index on merge
│       └── notify-api.yaml   # Webhook to API server
└── scripts/
    ├── validate_tool.py      # Schema validation
    ├── build_index.py        # Generate index.json
    ├── check_duplicates.py   # Similarity detection
    └── security_scan.py      # Security checks
```
`categories.yaml` format:
```yaml
categories:
  - name: text-processing
    description: Tools for manipulating and analyzing text
    icon: 📝
  - name: code
    description: Tools for code review, generation, and analysis
    icon: 💻
  - name: data
    description: Tools for data transformation and analysis
    icon: 📊
  - name: media
    description: Tools for image, audio, and video processing
    icon: 🎨
  - name: productivity
    description: General productivity and automation tools
    icon:
```
## Download Stats
### Counting Methodology
- Count installs only, not views or searches
- Increment **after** successful download (response sent)
- Dedupe by `client_id + tool_id + date`
```python
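import hashlib
from datetime import date
from sqlite3 import IntegrityError  # assumes the db wrapper surfaces sqlite3 errors

from flask import request

def download_tool(owner, name, version, install=False, client_id=None):
    tool = get_tool(owner, name, version)
    if not tool:
        return {"error": "not_found"}, 404

    config_yaml = tool.config_yaml

    # Only count if this is an install (not just viewing)
    if install:
        record_download(tool.id, client_id)

    return {"config": config_yaml}, 200

def record_download(tool_id, client_id):
    today = date.today()

    # Use client_id if provided, otherwise generate anonymous fallback.
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process and would not dedupe across workers or restarts.
    if client_id:
        effective_client_id = client_id
    else:
        ip_hash = hashlib.sha256(request.remote_addr.encode()).hexdigest()[:16]
        effective_client_id = f"anon_{ip_hash}"

    # Dedupe: only count once per client per tool per day
    # (relies on a unique constraint over tool_id, client_id, downloaded_at)
    try:
        db.download_stats.insert(
            tool_id=tool_id,
            client_id=effective_client_id,
            downloaded_at=today
        )
        # Increment counter (can be async/batch updated)
        db.execute("UPDATE tools SET downloads = downloads + 1 WHERE id = ?", [tool_id])
    except IntegrityError:
        pass  # Already counted today, ignore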
```
### Client ID Generation
CLI generates a persistent anonymous ID on first run:
```python
# In CLI, on first run
import uuid
import os

CONFIG_PATH = os.path.expanduser("~/.smarttools/config.yaml")

def get_or_create_client_id():
    config = load_config()
    if 'client_id' not in config:
        config['client_id'] = f"anon_{uuid.uuid4().hex[:16]}"
        save_config(config)
    return config['client_id']
```
**Fallback when client_id missing:**
- If header `X-Client-ID` not sent, use IP hash as fallback
- This still provides some dedupe for anonymous users
- Logged-in users' downloads are attributed to their account instead
### Privacy Considerations
- No raw IP addresses stored in database (the missing-`client_id` fallback keeps only a hash)
- `client_id` is client-controlled and can be regenerated
- Stats are aggregated (total count), not individual tracking
### Async Stats Strategy
To avoid DB contention on the hot download path:
```python
from datetime import date
from queue import Empty, Queue
from threading import Thread

# In-memory queue for stats
stats_queue = Queue()
MAX_BATCH_SIZE = 100  # illustrative cap so a busy queue still gets flushed

def record_download_async(tool_id, client_id):
    """Non-blocking: enqueue for background processing"""
    stats_queue.put({
        'tool_id': tool_id,
        'client_id': client_id,
        'date': date.today()
    })

def stats_worker():
    """Background thread: flush when the queue is idle for 5 seconds or the batch is full"""
    batch = []
    while True:
        try:
            batch.append(stats_queue.get(timeout=5))
        except Empty:
            pass  # idle: fall through and flush whatever has accumulated
        else:
            if len(batch) < MAX_BATCH_SIZE:
                continue  # keep accumulating
        if batch:
            flush_batch(batch)
            batch = []

def flush_batch(batch):
    """Bulk insert with conflict ignore"""
    with db.transaction():
        for item in batch:
            try:
                db.execute("""
                    INSERT INTO download_stats (tool_id, client_id, downloaded_at)
                    VALUES (?, ?, ?)
                    ON CONFLICT DO NOTHING
                """, [item['tool_id'], item['client_id'], item['date']])
            except Exception as e:
                logger.warning(f"Stats insert failed: {e}")
                # Don't fail downloads for stats errors

# Start the worker once at application startup
Thread(target=stats_worker, daemon=True).start()
```
**Failure behavior:** If stats DB write fails, log the error but don't fail the download. Stats are "best effort" - the download must succeed.
## Search
- Primary search: SQLite FTS5 inside the API.
- `index.json` provides offline CLI search and backup.
- If FTS5 is stale, return results with `X-Search-Index-Stale: true`.
## API Caching Strategy
### Cache Headers
| Endpoint | Cache-Control | ETag | Notes |
|----------|---------------|------|-------|
| `GET /index.json` | `max-age=300, stale-while-revalidate=60` | Yes | 5 min cache, background refresh |
| `GET /tools/{owner}/{name}` | `max-age=60` | Yes | 1 min cache |
| `GET /tools/{owner}/{name}/download` | `max-age=3600, immutable` | Yes | Immutable versions, 1 hour |
| `GET /tools/search` | `no-cache` | No | Always fresh |
| `GET /categories` | `max-age=3600` | Yes | Categories change rarely |
### ETag Implementation
```python
import hashlib
from datetime import datetime

from flask import jsonify, request

def get_tool_etag(tool):
    """Generate ETag from tool identity (immutable versions don't change)"""
    # Since versions are immutable, owner/name@version is stable
    # Use published_at for extra safety (not updated_at, which doesn't exist)
    content = f"{tool.owner}/{tool.name}@{tool.version}:{tool.published_at.isoformat()}"
    # ETag header values are quoted strings
    return f'"{hashlib.md5(content.encode()).hexdigest()}"'

def get_index_etag():
    """Generate ETag from last sync timestamp"""
    last_sync = db.get_last_sync_time()
    return f'"{hashlib.md5(last_sync.isoformat().encode()).hexdigest()}"'

@app.route('/api/v1/tools/<owner>/<name>/download')
def download_tool(owner, name):
    version = request.args.get('version', 'latest')
    tool = resolve_and_get_tool(owner, name, version)

    etag = get_tool_etag(tool)

    # Check If-None-Match header
    if request.headers.get('If-None-Match') == etag:
        return '', 304  # Not Modified

    response = jsonify({
        "data": {
            "owner": tool.owner,
            "name": tool.name,
            "resolved_version": tool.version,
            "config": tool.config_yaml
        }
    })
    response.headers['ETag'] = etag
    response.headers['Cache-Control'] = 'max-age=3600, immutable'
    return response
```
**Note:** Since tool versions are immutable, the ETag based on `owner/name@version` is permanently stable. The `published_at` timestamp is included for defense-in-depth but won't change.
### DB vs Repo Read Strategy
| Scenario | Read From | Reason |
|----------|-----------|--------|
| Normal operation | SQLite DB | Fast, indexed, FTS |
| DB empty/corrupted | Gitea repo | Fallback/recovery |
| Webhook sync in progress | DB (stale OK) | Avoid blocking reads |
| Search query | SQLite FTS5 | Full-text search |
| Download specific version | DB, fallback to repo | DB is cache, repo is truth |
### Staleness Detection
```python
from datetime import datetime, timedelta

from flask import jsonify, request

STALE_THRESHOLD = timedelta(minutes=10)

def is_db_stale():
    last_sync = db.get_last_sync_time()
    return datetime.utcnow() - last_sync > STALE_THRESHOLD

@app.route('/api/v1/tools/search')
def search_tools():
    # The query arrives as ?q=..., not as a URL path parameter
    q = request.args.get('q', '')
    results = db.search_fts(q)

    response = jsonify({"results": results})

    if is_db_stale():
        response.headers['X-Search-Index-Stale'] = 'true'
        response.headers['X-Last-Sync'] = db.get_last_sync_time().isoformat()

    return response
```
## Error Model
### Response Envelopes
**Success response:**
```json
{
  "data": { ... },
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 42,
    "total_pages": 3
  }
}
```
**Error response:**
```json
{
  "error": {
    "code": "TOOL_NOT_FOUND",
    "message": "Tool 'foo/bar' does not exist",
    "details": {
      "owner": "foo",
      "name": "bar",
      "suggestion": "Did you mean 'rob/bar'?"
    },
    "docs_url": "https://registry.smarttools.dev/docs/errors#TOOL_NOT_FOUND"
  }
}
```
### Error Codes
| Code | HTTP | Description |
|------|------|-------------|
| `TOOL_NOT_FOUND` | 404 | Tool does not exist |
| `VERSION_NOT_FOUND` | 404 | Requested version doesn't exist |
| `VERSION_EXISTS` | 409 | Cannot overwrite published version |
| `INVALID_VERSION` | 400 | Version string is not valid semver |
| `INVALID_CONSTRAINT` | 400 | Version constraint syntax error |
| `CONSTRAINT_UNSATISFIABLE` | 404 | No version matches constraint |
| `VALIDATION_ERROR` | 400 | Tool config validation failed |
| `UNAUTHORIZED` | 401 | Missing or invalid auth token |
| `FORBIDDEN` | 403 | Token valid but lacks permission |
| `RATE_LIMITED` | 429 | Too many requests |
| `SLUG_TAKEN` | 409 | Namespace slug already registered |
| `ACCOUNT_LOCKED` | 403 | Too many failed login attempts |
| `SERVER_ERROR` | 500 | Internal error (logged for debugging) |
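One way to keep the envelope and the status codes in sync is a single lookup table. The sketch below is illustrative: `ERROR_STATUS` and `error_response` are hypothetical names, and mapping unknown codes to `SERVER_ERROR` is an assumption of this example rather than documented behavior.

```python
from typing import Optional

# HTTP status per error code, copied from the table above.
ERROR_STATUS = {
    "TOOL_NOT_FOUND": 404,
    "VERSION_NOT_FOUND": 404,
    "VERSION_EXISTS": 409,
    "INVALID_VERSION": 400,
    "INVALID_CONSTRAINT": 400,
    "CONSTRAINT_UNSATISFIABLE": 404,
    "VALIDATION_ERROR": 400,
    "UNAUTHORIZED": 401,
    "FORBIDDEN": 403,
    "RATE_LIMITED": 429,
    "SLUG_TAKEN": 409,
    "ACCOUNT_LOCKED": 403,
    "SERVER_ERROR": 500,
}


def error_response(code: str, message: str,
                   details: Optional[dict] = None,
                   docs_url: Optional[str] = None) -> tuple[dict, int]:
    """Build the (body, status) pair for an error envelope."""
    status = ERROR_STATUS.get(code)
    if status is None:
        # Assumption: an unknown code is reported as an internal error.
        code, status = "SERVER_ERROR", 500
    error = {"code": code, "message": message}
    if details:
        error["details"] = details
    if docs_url:
        error["docs_url"] = docs_url
    return {"error": error}, status
```

A Flask view can return the pair directly, for example `return error_response("TOOL_NOT_FOUND", "Tool 'foo/bar' does not exist")`.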
## Error Scenarios and Fallbacks
### CLI Error Handling
| Scenario | CLI Behavior | User Message |
|----------|--------------|--------------|
| Registry offline | Use cached tools if available | "Registry unavailable. Using cached version." |
| Tool not found | Check cache, then fail | "Tool 'foo/bar' not found in registry or cache." |
| Version constraint unsatisfiable | Show available versions | "No version matches '>=5.0.0'. Available: 1.0.0, 1.1.0, 1.2.0" |
| Auth token expired | Prompt for new token | "Token expired. Please re-authenticate." |
| Rate limited | Wait and retry (backoff) | "Rate limited. Retrying in 30 seconds..." |
| Network timeout | Retry with backoff, then fail | "Connection timed out. Check your network." |
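The "retry with backoff, then fail" rows could look roughly like the sketch below. It is an assumption-laden illustration: `retry_with_backoff` is a hypothetical helper, the doubling schedule and the default of three attempts are not specified by this document, and only `TimeoutError` is retried.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T],
                       attempts: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation, doubling the delay after each timeout; re-raise on the last attempt."""
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries: let the CLI print its failure message
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so tests can record delays instead of actually waiting.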
### Validation Failure Details
When `VALIDATION_ERROR` occurs, provide specific field errors:
```json
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Tool configuration is invalid",
    "details": {
      "errors": [
        {
          "path": "steps[0].provider",
          "message": "Provider 'gpt5' is not recognized",
          "allowed": ["claude", "openai", "ollama", "mock"]
        },
        {
          "path": "version",
          "message": "Version '1.0' is not valid semver (use '1.0.0')"
        }
      ]
    },
    "docs_url": "https://registry.smarttools.dev/docs/tool-format"
  }
}
```
### Dependency Resolution Failures
When `smarttools install` fails on a manifest:
```bash
$ smarttools install
Error: Could not resolve all dependencies

  rob/summarize@^2.0.0
    ✗ No matching version (latest: 1.2.0)

  alice/translate@>=1.0.0
    ✓ Found 1.3.0

Suggestions:
  - Update rob/summarize constraint to "^1.0.0"
  - Contact the tool author for a v2 release
```
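The resolution step behind that output can be sketched as follows. This is a simplified illustration, not the CLI's resolver: `parse_version`, `satisfies`, and `resolve` are hypothetical names, only the `*`, `^`, `>=`, and exact forms are handled, caret is assumed to mean "same major version", and pre-release tags and `0.x` special cases are ignored.

```python
from typing import Optional

Version = tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple that compares numerically."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def satisfies(version: str, constraint: str) -> bool:
    """Check one version against one constraint (simplified semantics, see above)."""
    v = parse_version(version)
    if constraint == "*":
        return True
    if constraint.startswith("^"):
        base = parse_version(constraint[1:])
        return v >= base and v[0] == base[0]
    if constraint.startswith(">="):
        return v >= parse_version(constraint[2:])
    return v == parse_version(constraint)


def resolve(constraint: str, available: list[str]) -> Optional[str]:
    """Return the highest available version matching the constraint, or None."""
    matches = [v for v in available if satisfies(v, constraint)]
    return max(matches, key=parse_version) if matches else None
```

With the versions from the example, `resolve("^2.0.0", ["1.0.0", "1.1.0", "1.2.0"])` returns `None`, which is the case the CLI reports as "No matching version".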
### Graceful Degradation
| Component Down | Fallback Behavior |
|----------------|-------------------|
| API server | CLI uses `~/.smarttools/registry/index.json` for search |
| Gitea repo | API serves from DB cache (may be stale) |
| FTS5 index | Fall back to LIKE queries (slower but works) |
| Network | Use locally installed tools, skip registry features |
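The FTS5 row can be illustrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: `search_with_fallback` is a hypothetical function, the `name` and `description` columns are assumed to exist in both `tools` and `tools_fts`, and the LIKE pattern does not escape `%` or `_` in the query.

```python
import sqlite3


def search_with_fallback(conn: sqlite3.Connection, query: str) -> list[tuple]:
    """Search via FTS5 when the index is usable, otherwise fall back to LIKE."""
    try:
        return conn.execute(
            "SELECT name, description FROM tools_fts "
            "WHERE tools_fts MATCH ? ORDER BY rank",
            (query,),
        ).fetchall()
    except sqlite3.OperationalError:
        # Raised when the FTS table is missing or unusable, or the MATCH syntax is invalid.
        pattern = f"%{query}%"
        return conn.execute(
            "SELECT name, description FROM tools "
            "WHERE name LIKE ? OR description LIKE ? ORDER BY name",
            (pattern, pattern),
        ).fetchall()
```

The fallback loses relevance ranking and scans the whole table, which matches the "slower but works" trade-off in the table.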
## UX Requirements (CLI/TUI)
### Publishing UX
- `smarttools registry publish --dry-run` validates locally and shows what would be submitted:
```bash
$ smarttools registry publish --dry-run
Validating tool...
  ✓ config.yaml is valid
  ✓ README.md exists (2.3 KB)
  ✓ Version 1.1.0 not yet published

Would submit:
  Owner: rob
  Name: summarize
  Version: 1.1.0
  Category: text-processing
  Tags: summarization, ai, text

Config preview:
─────────────────────────────
name: summarize
version: "1.1.0"
description: Summarize text using AI
...
─────────────────────────────

Run without --dry-run to submit for review.
```
- **Version bump reminder:** CLI warns if version hasn't changed from published:
```
⚠ Version 1.0.0 is already published. Bump version in config.yaml to publish changes.
```
- First-time publishing flow prompts for token and saves it to config.
### Progress Indicators
Long-running operations show progress:
```bash
$ smarttools install
Installing project dependencies...
[1/3] rob/summarize@^1.0.0
      Resolving version... 1.2.0
      Downloading... done
      Installing... done
[2/3] alice/translate@>=2.0.0
      Resolving version... 2.1.0
      Downloading... done
      Installing... done
[3/3] official/code-review@*
      Resolving version... 1.0.0
      Downloading... done
      Installing... done

✓ Installed 3 tools
```
```bash
$ smarttools registry publish
Submitting rob/summarize@1.1.0...
  Validating... done
  Uploading... done
  Creating PR... done

✓ PR created: https://gitea.brrd.tech/rob/SmartTools-Registry/pulls/42
Your tool is pending review. You'll receive an email when it's approved.
```
### TUI Browse
`smarttools registry browse` opens a full-screen terminal UI:
```
┌─ SmartTools Registry ───────────────────────────────────────────┐
│ Search: [________________] [All Categories ▼] [Sort: Popular ▼] │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ ▶ rob/summarize                               v1.2.0   ⬇  142   │
│     Summarize text using AI                                     │
│     [text-processing] [ai] [summarization]                      │
│                                                                 │
│   alice/translate                             v2.1.0   ⬇   98   │
│     Translate text between languages                            │
│     [text-processing] [translation]                             │
│                                                                 │
│   official/code-review                        v1.0.0   ⬇   87   │
│     AI-powered code review                                      │
│     [code] [review] [ai]                                        │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│ ↑↓ Navigate  Enter: Details  i: Install  /: Search  q: Quit     │
└─────────────────────────────────────────────────────────────────┘
```
**Keyboard shortcuts:**
| Key | Action |
|-----|--------|
| `↑/↓` or `j/k` | Navigate list |
| `Enter` | View tool details |
| `i` | Install selected tool |
| `/` | Focus search box |
| `c` | Change category filter |
| `s` | Change sort order |
| `?` | Show help |
| `q` | Quit |
**Virtual scrolling:** For large tool lists (>100), use virtual scrolling to maintain performance.
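Virtual scrolling comes down to rendering only the slice of rows that fits on screen. The helper below is a minimal sketch: `visible_window` and its parameters are hypothetical, and it assumes every row has the same height.

```python
def visible_window(total: int, selected: int, height: int, top: int = 0) -> tuple[int, int]:
    """Return (top, bottom) row indices to render so the selected row stays visible."""
    if total <= 0 or height <= 0:
        return (0, 0)
    selected = max(0, min(selected, total - 1))
    if selected < top:
        top = selected                   # scrolled above the window: snap up
    elif selected >= top + height:
        top = selected - height + 1      # scrolled below the window: snap down
    top = max(0, min(top, max(0, total - height)))
    return (top, min(total, top + height))
```

The TUI would draw only `tools[top:bottom]` each frame, so redraw cost depends on the screen height rather than on the number of tools in the registry.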
### Project Initialization
```bash
$ smarttools init
Creating smarttools.yaml...

Project name [my-project]: my-ai-project
Version [1.0.0]:

Would you like to add any tools? (search with 's', skip with Enter)
> s
Search: summ
  1. rob/summarize v1.2.0 - Summarize text using AI
  2. alice/summary v1.0.0 - Generate summaries
Add tool (number, or Enter to finish): 1
Added rob/summarize@^1.2.0
Add tool (number, or Enter to finish):

✓ Created smarttools.yaml

name: my-ai-project
version: "1.0.0"
dependencies:
  - name: rob/summarize
    version: "^1.2.0"

Run 'smarttools install' to install dependencies.
```
### Accessibility
- **CLI:** All output works with screen readers, no color-only information
- **TUI:** Full keyboard navigation, high-contrast mode support
- **Web UI:** WCAG 2.1 AA compliance target
  - Semantic HTML
  - ARIA labels for interactive elements
  - Focus management in modals
  - Skip links for navigation
## Offline Cache
Cache registry index locally:
```
~/.smarttools/registry/index.json
```
Refresh when older than 24 hours; support `--offline` and `--refresh` flags.
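The refresh rule could be implemented roughly as below. This is a sketch with stated assumptions: `should_refresh` is a hypothetical function, cache age is taken from the file's modification time, and `--offline` is assumed to win when both flags are passed.

```python
import os
import time
from typing import Optional

INDEX_PATH = os.path.expanduser("~/.smarttools/registry/index.json")
MAX_AGE_SECONDS = 24 * 60 * 60  # refresh when older than 24 hours


def should_refresh(path: str = INDEX_PATH, offline: bool = False,
                   refresh: bool = False, now: Optional[float] = None) -> bool:
    """Decide whether the CLI should fetch a fresh index."""
    if offline:
        return False  # assumption: --offline takes precedence over --refresh
    if refresh:
        return True
    try:
        modified = os.path.getmtime(path)
    except OSError:
        return True  # no cached index yet
    now = time.time() if now is None else now
    return now - modified > MAX_AGE_SECONDS
```

The `now` parameter is there only to make the age check testable without waiting.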
### Index Integrity
The cached `index.json` includes integrity metadata:
```json
{
  "version": "1.0",
  "generated_at": "2025-01-20T12:00:00Z",
  "checksum": "sha256:abc123...",
  "tool_count": 142,
  "tools": [...]
}
```
**API response headers:**
```
ETag: "abc123def456"
X-Index-Checksum: sha256:abc123...
X-Index-Generated: 2025-01-20T12:00:00Z
```
**CLI verification:**
```python
import hashlib
import json
import logging

logger = logging.getLogger(__name__)

def verify_cached_index():
    """Verify cached index integrity on load"""
    cached = load_cached_index()
    if not cached:
        return None

    # Verify checksum
    content = json.dumps(cached['tools'], sort_keys=True)
    computed = hashlib.sha256(content.encode()).hexdigest()

    if computed != cached.get('checksum', '').replace('sha256:', ''):
        logger.warning("Cached index checksum mismatch, will refresh")
        return None

    return cached
```
**Corruption handling:**
- If checksum fails, discard cache and fetch fresh
- If partial write detected (missing fields), discard and refresh
- CLI shows warning: "Cached index corrupted, fetching fresh copy..."
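Partial writes can be avoided on the writing side by saving the cache through a temporary file. The sketch below is illustrative: `tools_checksum` and `write_index` are hypothetical helpers, and the checksum is assumed to cover `json.dumps(tools, sort_keys=True)`, the same serialization the verification code hashes.

```python
import hashlib
import json
import os
import tempfile


def tools_checksum(tools: list) -> str:
    """Checksum over the tools list, in the 'sha256:<hex>' form used by index.json."""
    content = json.dumps(tools, sort_keys=True)
    return "sha256:" + hashlib.sha256(content.encode()).hexdigest()


def write_index(path: str, index: dict) -> None:
    """Write the index to a temp file, then rename it over the old cache."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(index, handle)
        # os.replace swaps the file in one step, so readers never see a half-written cache.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Because the temp file lives in the same directory as the cache, the final rename stays on one filesystem.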
## Web UI Vision
The registry includes a full website, not just an API:
**Site structure:**
```
registry.smarttools.dev (or gitea.brrd.tech/registry)
├── /                          # Landing page
├── /tools                     # Browse all tools
├── /tools/{owner}/{name}      # Tool detail page
├── /categories                # Browse by category
├── /categories/{name}         # Tools in category
├── /search?q=...              # Search results
├── /docs                      # Documentation
│   ├── /docs/getting-started
│   ├── /docs/creating-tools
│   ├── /docs/publishing
│   └── /docs/best-practices
├── /tutorials                 # Step-by-step guides
│   ├── /tutorials/first-tool
│   ├── /tutorials/chaining-steps
│   └── /tutorials/code-steps
├── /examples                  # Example projects
├── /blog                      # Updates, announcements (optional)
├── /register                  # Publisher registration
├── /login                     # Publisher login
├── /dashboard                 # Publisher dashboard
│   ├── /dashboard/tools       # My published tools
│   ├── /dashboard/tokens      # API tokens
│   └── /dashboard/settings    # Account settings
└── /api/v1/...                # API endpoints
```
**Landing page content:**
- Hero: "Share and discover AI-powered CLI tools"
- Quick install example
- Featured/popular tools
- Category highlights
- "Get Started" CTA
**Tool detail page:**
- Name, description, version, author
- README rendered as markdown (sanitized)
- Install command (copy-to-clipboard)
- Version history
- Download stats
- Category/tags
- "Report" button for abuse
### README Security
When rendering README markdown, apply XSS sanitization:
```python
import bleach
from markdown import markdown

ALLOWED_TAGS = [
    'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'p', 'br', 'hr',
    'ul', 'ol', 'li',
    'strong', 'em', 'code', 'pre',
    'blockquote',
    'a', 'img',
    'table', 'thead', 'tbody', 'tr', 'th', 'td'
]

ALLOWED_ATTRS = {
    'a': ['href', 'title'],
    'img': ['src', 'alt', 'title'],
    'code': ['class'],  # for syntax highlighting
}

def render_readme_safe(readme_raw: str) -> str:
    """Convert markdown to sanitized HTML"""
    # Convert markdown to HTML
    html = markdown(readme_raw, extensions=['fenced_code', 'tables'])

    # Sanitize to prevent XSS
    safe_html = bleach.clean(
        html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRS,
        strip=True
    )

    # Linkify URLs
    safe_html = bleach.linkify(safe_html)

    return safe_html
```
**Storage strategy:**
- Store raw README in `tools.readme`
- Render and sanitize on request (or cache rendered HTML)
- Never trust client-submitted HTML directly
**Tech stack options:**
| Option | Pros | Cons |
|--------|------|------|
| Flask + Jinja + Tailwind | Simple, Python-only, fast to build | Less interactive |
| FastAPI + Vue/React SPA | Modern, interactive | More complex, separate build |
| Astro/Next.js | Great SEO, static-first | Different stack (Node.js) |
**Recommendation:** Flask + Jinja + Tailwind for v1
- Keeps everything in Python
- Server-rendered is fine for a registry
- Good SEO out of the box
- Can add interactivity with Alpine.js or htmx if needed
**Monetization considerations:**
- AdSense-compatible (server-rendered pages)
- Analytics tracking for traffic insights
- Future: sponsored tools, featured placements
- Future: premium publisher tiers (more tools, priority review)
## Implementation Phases
### Phase 1: Foundation
- Define `smarttools.yaml` manifest format
- Implement tool resolution order (local → global → registry)
- Create SmartTools-Registry repo on Gitea (bootstrap)
- Add 3-5 example tools to seed the registry
### Phase 2: Core Backend
- Set up Flask/FastAPI project structure
- Implement SQLite database schema
- Build core API endpoints (list, search, get, download)
- Implement webhook receiver for Gitea sync
- Set up HMAC verification
### Phase 3: CLI Commands
- `smarttools registry search`
- `smarttools registry install`
- `smarttools registry info`
- `smarttools registry browse` (TUI)
- Local index caching
### Phase 4: Publishing
- Publisher registration (web UI)
- Token management
- `smarttools registry publish` command
- PR creation via Gitea API
- CI validation workflows
### Phase 5: Project Dependencies
- `smarttools install` (from manifest)
- `smarttools add` command
- Runtime override application
- Dependency resolution
### Phase 6: Smart Features
- SQLite FTS5 search index
- AI-powered auto-categorization
- Duplicate/similarity detection
- Security scanning
### Phase 7: Full Web UI
- Landing page
- Tool browsing/search pages
- Tool detail pages with README rendering
- Publisher dashboard
- Documentation/tutorials section
### Phase 8: Polish & Scale
- Rate limiting
- Abuse reporting
- Analytics integration
- Performance optimization
- Monitoring/alerting