# Lyrenth

Lyrenth is the AI-readable web index for agents. A universal adapter for the existing web: turn any URL into clean AIDocument JSON with markdown, headings, links, structured data, crawl metadata, and token savings. No website changes required.

This is /llms-full.txt: the long-form companion to /llms.txt. If you're
an AI agent reading this, the goal is to give you everything in one
fetch so you can decide whether Lyrenth solves your problem
without crawling further.

## What it is

Lyrenth is the AI-readable web index for agents and a universal
adapter for the existing web. It is a REST API that takes any URL and
returns a structured AI-readable document: markdown body, headings, links,
structured data, metadata, render-mode trace, and per-call cost economics.
Every response is the same shape regardless of how the source page was
rendered (static HTML, React SPA, JS-hydrated content), and no website
owner has to change anything for their pages to become AI-readable through
Lyrenth.

The wedge: agents that call LLMs to read web pages today either pay to
clean raw HTML themselves (expensive, slow) or use a search API that
returns noisy results (also expensive, and lower quality). Lyrenth
resolves each URL once into a canonical AIDocument and caches it for
every agent after that. This architecture lets us offer
competitive per-request pricing while protecting origin sites from
read amplification. Per-call savings versus sending raw HTML to a
frontier LLM are roughly 60-90%.

## API

Base URL: https://api.lyrenth.com (the paths below already include the /v1 prefix where it applies)
Auth: Bearer token. Free tier is 2,000 successful API calls / month,
no credit card. Sign up at https://www.lyrenth.com/signup.

```
GET  /healthz                                 public,    not logged
GET  /aidocument.schema.json                  public,    not logged   v2 JSON Schema (draft-07)
GET  /v1/stats                                public,    logged
POST /v1/aidocument  body {"url":"..."}       Bearer,    logged   resolve URL -> v2 envelope
GET  /v1/document?url=...                     Bearer,    logged   index lookup only, 404 on miss
POST /v1/submit      body {"url":"..."}       Bearer,    logged   202 + background indexing

# Per-user site management
GET    /v1/sites                            Bearer    list of caller's verified domains
POST   /v1/sites/{domain}/verify            Bearer    issue / re-issue verification token
POST   /v1/sites/{domain}/check             Bearer    run DNS + HTML probe; 15s cooldown
DELETE /v1/sites/{domain}                   Bearer    drop the ownership record

# Self-service analytics
GET /v1/admin/stats?domain=...&days=N       Bearer    per-domain analytics
GET /v1/admin/usage?api_key_id=...&days=N   Bearer    per-key usage
GET /v1/admin/savings?api_key_id=...&days=N Bearer    per-key savings ($, tokens)
```
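
As a minimal sketch of calling the resolve endpoint listed above, the following builds and sends a `POST /v1/aidocument` request using only the Python standard library. The helper names, the `Content-Type` header, and the 30-second timeout are assumptions for illustration, not part of the documented API.

```python
import json
import urllib.request

# Host taken from the section above; the endpoint path already carries /v1.
API_ROOT = "https://api.lyrenth.com"


def build_aidocument_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/aidocument request."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_ROOT}/v1/aidocument",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type for the body.
            "Content-Type": "application/json",
        },
    )


def resolve(target_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON envelope.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_aidocument_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first half testable without network access; `resolve("https://example.com", "<your key>")` would return the envelope as a `dict`.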

### The canonical AIDocument shape

The v2 grouped envelope served by POST /v1/aidocument. Machine-readable
JSON Schema (draft-07) lives at https://api.lyrenth.com/aidocument.schema.json; that
schema is the source of truth for field names, required fields, and
enum values (cache.status, source.freshness_policy, source.render_mode).
The annotated example below shows the same shape in human-readable form.

```jsonc
{
  "schema": {
    "name":    "AIDocument",
    "version": "2.0",
    "ref":     "aidoc:sha256:7a14e9b2d8f3c901b42e5a77c0f19a34"
  },
  "source": {
    "url":              "https://...",
    "canonical_url":    "https://...",
    "fetched_at":       "...",
    "render_mode":      "static" | "rendered" | "static_after_render_failure",
    "status_code":      200,
    "freshness_policy": "cache_first" | "force_refresh"
  },
  "cache": {
    "status":           "hit" | "miss" | "refreshed" | "stale_revalidated",
    "origin_contacted": false,
    "body_fetched":     false
  },
  "identity": {
    "title":        "...",
    "description":  "...",
    "language":     "en",
    "content_type": "article"
  },
  "content": {
    "markdown": "..."                       // the body, cleaned
  },
  "structure": {
    "headings":        [{"level": 1, "text": "..."}],
    "links":           [{"url": "...", "text": "...", "internal": true}],
    "images":          [{"url": "...", "alt": "..."}],
    "structured_data": { /* parsed JSON-LD blocks, merged into one object */ }
  },
  "signals": {
    "word_count":           1552,
    "reading_time":         7,
    "has_json_ld":          true,
    "heading_hierarchy_ok": true
  },
  "economics": {
    "raw_html_tokens_approx": 12500,
    "output_tokens_approx":   1850,
    "token_savings":          10650,
    "token_savings_percent":  0.85,
    "estimated_cost_usd": {
      "raw_html":   0.0375,
      "our_output": 0.0056,
      "savings":    0.0319
    },
    "pricing_basis": {
      "input_price_per_1k_usd": 0.003,
      "model_class":            "frontier-1m-context"
    }
  }
}
```
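
The `economics` figures in the example follow from the token counts and the per-1k input price. The sketch below reproduces them. The rounding mode (half-up to four decimals) and the choice to derive `savings` from the already-rounded costs are assumptions, picked only because they match the example values; the schema remains the source of truth.

```python
from decimal import Decimal, ROUND_HALF_UP


def _usd(amount: Decimal) -> Decimal:
    """Round a dollar amount to four decimals (assumed rounding mode)."""
    return amount.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


def economics(raw_tokens: int, output_tokens: int, price_per_1k_usd: str) -> dict:
    """Recompute the economics group from token counts and an input price."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    price = Decimal(price_per_1k_usd)
    saved = raw_tokens - output_tokens
    raw_cost = _usd(Decimal(raw_tokens) / 1000 * price)
    out_cost = _usd(Decimal(output_tokens) / 1000 * price)
    # Expressed as a fraction (0.85), matching the example despite the field name.
    fraction = (Decimal(saved) / Decimal(raw_tokens)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return {
        "token_savings": saved,
        "token_savings_percent": float(fraction),
        "estimated_cost_usd": {
            "raw_html": float(raw_cost),
            "our_output": float(out_cost),
            "savings": float(raw_cost - out_cost),
        },
    }


print(economics(12500, 1850, "0.003"))
```

`Decimal` is used instead of floats because 1850 tokens at $0.003 per 1k is exactly $0.00555, a rounding tie that binary floating point does not represent exactly.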

## Render mode escalation

A static fetch runs first (~50ms). If the page is empty, an empty SPA
shell, or low-density text with a framework signal (react / vue /
angular / next / nuxt / svelte), we escalate to headless Chromium with
stealth flags (~2-5s). The captured `source.status_code` comes from the
network event for the main document, so chromedp can never silently
misreport an upstream 404.

If rendering fails AND the static fetch returned a body, we degrade
gracefully and return the static body (`render_mode` is
`static_after_render_failure`). A 4xx/5xx never escalates: a
404 stays a 404.
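
The escalation rules above can be sketched roughly as follows. This is an illustration, not Lyrenth's implementation: the thresholds, the regex-based text extraction, and the function names are all hypothetical, and the `render` callable stands in for the headless browser.

```python
import re
from typing import Callable, Tuple

FRAMEWORK_SIGNALS = ("react", "vue", "angular", "next", "nuxt", "svelte")
MIN_TEXT_CHARS = 200      # hypothetical threshold
MIN_TEXT_DENSITY = 0.05   # hypothetical: visible text chars / HTML chars


def visible_text(html: str) -> str:
    """Very rough extraction: drop script/style blocks, then all tags."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    return re.sub(r"\s+", " ", re.sub(r"(?s)<[^>]+>", " ", html)).strip()


def should_escalate(status_code: int, html: str) -> bool:
    """Decide whether a static fetch result warrants a headless render."""
    if status_code >= 400:
        return False  # a 4xx/5xx never escalates
    text = visible_text(html)
    if not text:
        return True   # empty page or empty SPA shell
    density = len(text) / max(len(html), 1)
    low_density = len(text) < MIN_TEXT_CHARS or density < MIN_TEXT_DENSITY
    has_signal = any(signal in html.lower() for signal in FRAMEWORK_SIGNALS)
    return low_density and has_signal


def resolve_body(status_code: int, static_html: str,
                 render: Callable[[], str]) -> Tuple[str, str]:
    """Return (render_mode, body), falling back to the static body on failure."""
    if not should_escalate(status_code, static_html):
        return "static", static_html
    try:
        return "rendered", render()
    except Exception:
        if static_html:
            return "static_after_render_failure", static_html
        raise
```

The three returned mode strings correspond to the `source.render_mode` enum values in the envelope.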

## Bot identification

User-Agent: `AIWebIndex/1.0 (+https://www.lyrenth.com/bot; AI-readable web index)`
Verification: `AIWebIndex/1.0 verification (+https://www.lyrenth.com/bot)`

Today we honor HTTP 429 / 503 backoff, Sitemap directives in
robots.txt, and per-domain rate caps with a 2-second cooldown.
Disallow and Crawl-delay enforcement is on the short-term roadmap.
Allowlisting our UA in your edge config lets your content into the
index. To opt out today, firewall-block our IP range or email us.
You can also add the Disallow rule now so that it takes effect the
moment enforcement ships.
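
For site owners who want to prepare the Disallow rule ahead of time, the sketch below checks a candidate robots.txt with Python's standard `urllib.robotparser`. It assumes the product token Lyrenth will match is `AIWebIndex` (taken from the User-Agent string above); that token, the sample rules, and the helper name are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

LYRENTH_UA = "AIWebIndex/1.0 (+https://www.lyrenth.com/bot; AI-readable web index)"

# Hypothetical robots.txt: block the Lyrenth crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: AIWebIndex
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a standards-following crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, LYRENTH_UA, "https://example.com/blog"))
print(allowed(ROBOTS_TXT, "SomeOtherBot/2.0", "https://example.com/blog"))
```

This only shows how a compliant parser would read the rule; until Disallow enforcement ships, the firewall block or an email remains the effective opt-out.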

## Domain ownership

Two methods; either one is sufficient:

1. DNS TXT at `_aiwebindex-verify.<domain>` with value
   `aiwi-verify=<token>`.
2. HTML file at `https://<domain>/.well-known/aiwebindex-verify.txt`
   containing `aiwi-verify=<token>` anywhere in the body.

The DNS path bypasses the host's system resolver and queries
`1.1.1.1`, `8.8.8.8`, and `9.9.9.9` in parallel; the first resolver
that finds a matching record wins. This avoids breakage from
misbehaving local / ISP nameservers that hijack lookups (we've
reproduced this in the wild).
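
A minimal sketch of the two checks, assuming the actual DNS query is supplied by the caller. The `lookup_txt` callable (resolver IP and record name in, TXT strings out) is a hypothetical interface used to keep the example self-contained; the exact-match comparison for the TXT value is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable

RESOLVERS = ("1.1.1.1", "8.8.8.8", "9.9.9.9")

# Hypothetical lookup signature: (resolver_ip, record_name) -> TXT values.
TxtLookup = Callable[[str, str], Iterable[str]]


def dns_verified(domain: str, token: str, lookup_txt: TxtLookup) -> bool:
    """Query all public resolvers in parallel; any matching record succeeds."""
    name = f"_aiwebindex-verify.{domain}"
    expected = f"aiwi-verify={token}"
    with ThreadPoolExecutor(max_workers=len(RESOLVERS)) as pool:
        futures = [pool.submit(lookup_txt, ip, name) for ip in RESOLVERS]
        for future in as_completed(futures):
            try:
                values = {value.strip() for value in future.result()}
            except Exception:
                continue  # one failing resolver must not fail the check
            if expected in values:
                # The result is decided here, although the pool still waits
                # for the remaining lookups to finish when the block exits.
                return True
    return False


def html_verified(body: str, token: str) -> bool:
    """The file check passes if the marker appears anywhere in the body."""
    return f"aiwi-verify={token}" in body
```

Injecting the lookup function means a real resolver client can be swapped in without changing the first-match-wins logic.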

## AI Readiness score

A weighted 0-10 number per verified domain, surfaced in the
website-owner dashboard as the "AI Readiness score". It measures the
quality of the content Lyrenth serves to AI agents (not a homework
list for the site owner; we don't ship advice). Computed per-page
over the indexed AIDocuments and rolled up per-domain.

Signals and weights:

  jsonld_coverage    0.20  structured-data entries on the page
  heading_hierarchy  0.15  semantic heading structure
  js_blocked_ratio   0.20  share served without JS escalation
  noise_ratio        0.20  markdown-to-html density
  meta_quality       0.10  title + description adequacy
  paywall_penalty    0.10  open-access vs paywall / login wall
  word_count_ok      0.05  meets baseline content depth

The hero on /sites/[domain] also surfaces a served-count: how many AI
agents Lyrenth served clean content to in the window.
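
To make the weighting concrete, here is a rough sketch of how a 0-10 score could be derived from the signals listed above. It assumes each signal is already normalized to the range 0 to 1 with higher meaning better (so penalty-style signals are inverted beforehand) and that the per-domain roll-up is a plain mean. Neither assumption is stated in this document.

```python
from statistics import mean

WEIGHTS = {
    "jsonld_coverage": 0.20,
    "heading_hierarchy": 0.15,
    "js_blocked_ratio": 0.20,
    "noise_ratio": 0.20,
    "meta_quality": 0.10,
    "paywall_penalty": 0.10,
    "word_count_ok": 0.05,
}


def page_score(signals: dict) -> float:
    """Weighted sum of normalized signals, scaled to 0-10."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(float(signals.get(name, 0.0)), 0.0), 1.0)  # clamp to [0, 1]
        total += weight * value
    return round(10 * total, 1)


def domain_score(page_scores: list) -> float:
    """Assumed roll-up: arithmetic mean of per-page scores."""
    return round(mean(page_scores), 1) if page_scores else 0.0
```

Because the weights sum to 1.0, a page that maxes every signal scores 10 and a page with no usable signals scores 0.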

## Privacy and what's logged

Every `/v1/*` call writes one row to `requests` from a goroutine
fired after the response is sent. Captured fields: endpoint,
status_code, duration_ms, cache_hit, render_mode, domain (as
requested), final_domain (post-redirect), url, api_key_id,
user_agent (the API caller's), error_message.

Not captured: end-user IPs, request bodies beyond the URL parameter,
response bodies, cookies / auth tokens, browser fingerprints, any
PII the agent might be relaying. The log captures the API CALLER'S
behavior, not the end-user data flowing through.
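
As an illustration of what one `requests` row carries, the captured fields listed above can be modeled as a small record type. The field names come from this section; the Python types, the optional fields, and the class name are assumptions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestLogRow:
    """One logged API call; types here are illustrative guesses."""
    endpoint: str
    status_code: int
    duration_ms: int
    cache_hit: bool
    render_mode: Optional[str]    # assumed empty for calls that fetch nothing
    domain: str                   # as requested
    final_domain: str             # post-redirect
    url: str
    api_key_id: Optional[str]     # assumed empty for public endpoints
    user_agent: str               # the API caller's, not an end user's
    error_message: Optional[str] = None


row = RequestLogRow(
    endpoint="/v1/aidocument", status_code=200, duration_ms=48,
    cache_hit=True, render_mode="static", domain="example.com",
    final_domain="www.example.com", url="https://example.com/",
    api_key_id="key_123", user_agent="my-agent/1.0",
)
print(sorted(asdict(row)))
```

Nothing in the record holds IPs, bodies, cookies, or tokens, which mirrors the "not captured" list.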

## Pricing

Every successful 2xx call that returns an AIDocument counts as one
request, regardless of which billable endpoint served it. Failed
fetches (4xx/5xx) don't count. Internal endpoints (dashboard
analytics, key management, domain verification, the
universal-bot-traffic ingest path) don't count either.

Free: 2,000 AIDocuments / month. No credit card. (https://www.lyrenth.com/pricing)
Pro tier ships in our next phase. Custom / volume / on-premise:
contact us.

## Universal AI-bot ingest (Phase 1 backend, v0.5.0)

Site owners who verify a domain automatically receive a per-domain
ingest token. Edge workers (Cloudflare Worker / Vercel Edge Middleware)
are installed at the site's CDN edge, classify request User-Agents,
batch into 30-second windows, and POST aggregated counts to
/v1/ingest/bot/<domain>. The data lets Lyrenth see AI agents that
visit a verified domain directly (i.e., never route through
Lyrenth's own API), so we can serve the whole audience cleanly. The
data is used internally and is not surfaced in the dashboard.

Wire protocol:

```
POST /v1/ingest/bot/<domain>
Authorization: Bearer aiwt_<token>
{
  "events": [
    { "raw_ua": "...", "path": "/blog", "bucket_at": "2026-05-09T10:30:00Z",
      "count": 47, "status_code": 200 }
  ]
}
```

The server classifies `raw_ua` against an evolving curated list of AI
bots (GPTBot, ChatGPT-User, ClaudeBot, Claude-Web, anthropic-ai,
PerplexityBot, Applebot-Extended, Meta-ExternalAgent, Bytespider,
Google-Extended, GoogleOther, cohere-ai, DuckAssistBot, Amazonbot,
AI2Bot) and persists one row per bucket.
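
The batching step on the worker side can be sketched as below. Real workers run on Cloudflare or Vercel, so this Python version only illustrates the aggregation logic. The grouping key (User-Agent, path, status code, window) and the use of the window start as `bucket_at` are inferred from the wire example, not confirmed by it.

```python
import json
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 30


def bucket_start(ts: datetime) -> str:
    """Floor a timezone-aware timestamp to the start of its 30-second window."""
    epoch = int(ts.timestamp())
    floored = epoch - epoch % WINDOW_SECONDS
    return datetime.fromtimestamp(floored, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def aggregate(hits) -> dict:
    """Collapse (timestamp, raw_ua, path, status_code) hits into an ingest payload."""
    counts = Counter(
        (raw_ua, path, bucket_start(ts), status) for ts, raw_ua, path, status in hits
    )
    return {
        "events": [
            {"raw_ua": raw_ua, "path": path, "bucket_at": bucket,
             "count": count, "status_code": status}
            for (raw_ua, path, bucket, status), count in counts.items()
        ]
    }


hits = [
    (datetime(2026, 5, 9, 10, 30, 5, tzinfo=timezone.utc), "GPTBot/1.0", "/blog", 200),
    (datetime(2026, 5, 9, 10, 30, 29, tzinfo=timezone.utc), "GPTBot/1.0", "/blog", 200),
    (datetime(2026, 5, 9, 10, 30, 31, tzinfo=timezone.utc), "GPTBot/1.0", "/blog", 200),
]
print(json.dumps(aggregate(hits), indent=2))
```

The resulting dictionary is what would be sent as the JSON body of `POST /v1/ingest/bot/<domain>` with the `aiwt_` bearer token.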

## Links

- Documentation: https://www.lyrenth.com/docs
- Pricing: https://www.lyrenth.com/pricing
- Bot info: https://www.lyrenth.com/bot
- Agent manifest (JSON): https://www.lyrenth.com/api/agent-manifest
- AIDocument JSON Schema (draft-07): https://api.lyrenth.com/aidocument.schema.json
- Sign up: https://www.lyrenth.com/signup
- Sitemap: https://www.lyrenth.com/sitemap.xml
- Robots: https://www.lyrenth.com/robots.txt