# Lyrenth

Lyrenth is the AI-readable web index for agents. A universal adapter for the existing web: turn any URL into clean AIDocument JSON with markdown, headings, links, structured data, crawl metadata, and token savings. No website changes required.

This is /llms-full.txt: the long-form companion to /llms.txt. If you're an AI agent reading this, the goal is to give you everything in one fetch so you can decide whether Lyrenth solves your problem without crawling further.

## What it is

Lyrenth is the AI-readable web index for agents and a universal adapter for the existing web. It is a REST API that takes any URL and returns a structured AI-readable document: markdown body, headings, links, structured data, metadata, render-mode trace, and per-call cost economics. Every response is the same shape regardless of how the source page was rendered (static HTML, React SPA, JS-hydrated content), and no website owner has to change anything for their pages to become AI-readable through Lyrenth.

The wedge: agents calling LLMs to read web pages today either pay to clean raw HTML themselves (expensive, slow), or use a search API that returns dirty results (also expensive, also bad). Lyrenth resolves each URL once into a canonical AIDocument and caches it for every agent after that. The architecture lets us offer competitive per-request pricing while protecting origin sites from read amplification. Per-call savings vs raw HTML to a frontier LLM are ~60-90%.

## API

Base URL: https://api.lyrenth.com/v1

Auth: Bearer token. Free tier is 2,000 successful API calls / month, no credit card. Sign up at https://www.lyrenth.com/signup.

```
GET    /healthz                                      public, not logged
GET    /aidocument.schema.json                       public, not logged  v2 JSON Schema (draft-07)
GET    /v1/stats                                     public, logged
POST   /v1/aidocument           body {"url":"..."}   Bearer, logged      resolve URL -> v2 envelope
GET    /v1/document?url=...                          Bearer, logged      index lookup only, 404 on miss
POST   /v1/submit               body {"url":"..."}   Bearer, logged      202 + background indexing

# Per-user site management
GET    /v1/sites                                     Bearer   list of caller's verified domains
POST   /v1/sites/{domain}/verify                     Bearer   issue / re-issue verification token
POST   /v1/sites/{domain}/check                      Bearer   run DNS + HTML probe; 15s cooldown
DELETE /v1/sites/{domain}                            Bearer   drop the ownership record

# Self-service analytics
GET    /v1/admin/stats?domain=...&days=N             Bearer   per-domain analytics
GET    /v1/admin/usage?api_key_id=...&days=N         Bearer   per-key usage
GET    /v1/admin/savings?api_key_id=...&days=N       Bearer   per-key savings ($, tokens)
```

### The canonical AIDocument shape

The v2 grouped envelope served by POST /v1/aidocument. Machine-readable JSON Schema (draft-07) lives at https://api.lyrenth.com/aidocument.schema.json; that schema is the source of truth for field names, required fields, and enum values (cache.status, source.freshness_policy, source.render_mode). The narrative example below is the same shape humans can read.

```jsonc
{
  "schema": {
    "name": "AIDocument",
    "version": "2.0",
    "ref": "aidoc:sha256:7a14e9b2d8f3c901b42e5a77c0f19a34"
  },
  "source": {
    "url": "https://...",
    "canonical_url": "https://...",
    "fetched_at": "...",
    "render_mode": "static" | "rendered" | "static_after_render_failure",
    "status_code": 200,
    "freshness_policy": "cache_first" | "force_refresh"
  },
  "cache": {
    "status": "hit" | "miss" | "refreshed" | "stale_revalidated",
    "origin_contacted": false,
    "body_fetched": false
  },
  "identity": {
    "title": "...",
    "description": "...",
    "language": "en",
    "content_type": "article"
  },
  "content": {
    "markdown": "..." // the body, cleaned
  },
  "structure": {
    "headings": [{"level": 1, "text": "..."}],
    "links": [{"url": "...", "text": "...", "internal": true}],
    "images": [{"url": "...", "alt": "..."}],
    "structured_data": { /* parsed JSON-LD blocks, merged into one object */ }
  },
  "signals": {
    "word_count": 1552,
    "reading_time": 7,
    "has_json_ld": true,
    "heading_hierarchy_ok": true
  },
  "economics": {
    "raw_html_tokens_approx": 12500,
    "output_tokens_approx": 1850,
    "token_savings": 10650,
    "token_savings_percent": 0.85,
    "estimated_cost_usd": {
      "raw_html": 0.0375,
      "our_output": 0.0056,
      "savings": 0.0319
    },
    "pricing_basis": {
      "input_price_per_1k_usd": 0.003,
      "model_class": "frontier-1m-context"
    }
  }
}
```

## Render mode escalation

Static fetch first (~50ms). If the page is empty, an empty SPA shell, or low-density text with a framework signal (react / vue / angular / next / nuxt / svelte), we escalate to headless Chromium with stealth flags (~2-5s). The captured `crawl.status_code` comes from the network event for the main document so chromedp can never silently misreport an upstream 404. If rendering fails AND the static fetch returned a body, we degrade gracefully and return the static body. A 4xx/5xx never escalates: a 404 stays a 404.

## Bot identification

User-Agent: `AIWebIndex/1.0 (+https://www.lyrenth.com/bot; AI-readable web index)`

Verification: `AIWebIndex/1.0 verification (+https://www.lyrenth.com/bot)`

Today we honor HTTP 429 / 503 backoff, Sitemap directives in robots.txt, and per-domain rate caps with a 2 second cooldown. Disallow and Crawl-delay enforcement is on the short-term roadmap. Allowlisting our UA in your edge config lets your content into the index. To opt out today, firewall-block our IP range or email us; add the Disallow rule ahead of time so the block takes effect the moment enforcement ships.

## Domain ownership

Two methods, either succeeds:

1. DNS TXT at `_aiwebindex-verify.` with value `aiwi-verify=`.
2. HTML file at `https:///.well-known/aiwebindex-verify.txt` containing `aiwi-verify=` anywhere in the body.

The DNS path bypasses the host's system resolver and queries `1.1.1.1`, `8.8.8.8`, and `9.9.9.9` in parallel; the first resolver that finds a matching record wins. This avoids breakage from misbehaving local / ISP nameservers that hijack lookups (we've reproduced this in the wild).

## AI Readiness score

A weighted 0-10 number per verified domain, surfaced in the website-owner dashboard as the "AI Readiness score". It measures the quality of the content Lyrenth serves to AI agents (not a homework list for the site owner; we don't ship advice). Computed per-page over the indexed AIDocuments and rolled up per-domain.

Signals and weights:

    jsonld_coverage     0.20   structured-data entries on the page
    heading_hierarchy   0.15   semantic heading structure
    js_blocked_ratio    0.20   share served without JS escalation
    noise_ratio         0.20   markdown-to-html density
    meta_quality        0.10   title + description adequacy
    paywall_penalty     0.10   open-access vs paywall / login wall
    word_count_ok       0.05   meets baseline content depth

The hero on /sites/[domain] also surfaces a served-count: how many AI agents Lyrenth served clean content to in the window.

## Privacy and what's logged

Every `/v1/*` call writes one row to `requests` from a goroutine fired after the response is sent.

Captured fields: endpoint, status_code, duration_ms, cache_hit, render_mode, domain (as requested), final_domain (post-redirect), url, api_key_id, user_agent (the API caller's), error_message.

Not captured: end-user IPs, request bodies beyond the URL parameter, response bodies, cookies / auth tokens, browser fingerprints, any PII the agent might be relaying.

The log captures the API CALLER'S behavior, not the end-user data flowing through.

## Pricing

Every successful 2xx call that returns an AIDocument counts as one request, regardless of which billable endpoint served it. Failed fetches (4xx/5xx) don't count. Internal endpoints (dashboard analytics, key management, domain verification, the universal-bot-traffic ingest path) don't count either.

Free: 2,000 AIDocuments / month. No credit card. (https://www.lyrenth.com/pricing)

Pro tier ships in our next phase.

Custom / volume / on-premise: contact us.

## Universal AI-bot ingest (Phase 1 backend, v0.5.0)

Site owners who verify a domain auto-receive a per-domain ingest token. Workers (Cloudflare Worker / Vercel Edge Middleware) install at the site's CDN edge, classify request User-Agents, batch into 30-second windows, and POST aggregated counts to /v1/ingest/bot/. The data lets Lyrenth see AI agents that visit a verified domain directly (i.e., never route through Lyrenth's own API), so we can serve the whole audience cleanly. The data is used internally and is not surfaced in the dashboard.

Wire protocol:

```
POST /v1/ingest/bot/
Authorization: Bearer aiwt_

{
  "events": [
    {
      "raw_ua": "...",
      "path": "/blog",
      "bucket_at": "2026-05-09T10:30:00Z",
      "count": 47,
      "status_code": 200
    }
  ]
}
```

Server classifies raw_ua against an evolving curated list of AI bots (GPTBot, ChatGPT-User, ClaudeBot, Claude-Web, anthropic-ai, PerplexityBot, Applebot-Extended, Meta-ExternalAgent, Bytespider, Google-Extended, GoogleOther, cohere-ai, DuckAssistBot, Amazonbot, AI2Bot) and persists one row per bucket.

## Links

- Documentation: https://www.lyrenth.com/docs
- Pricing: https://www.lyrenth.com/pricing
- Bot info: https://www.lyrenth.com/bot
- Agent manifest (JSON): https://www.lyrenth.com/api/agent-manifest
- AIDocument JSON Schema (draft-07): https://api.lyrenth.com/aidocument.schema.json
- Sign up: https://www.lyrenth.com/signup
- Sitemap: https://www.lyrenth.com/sitemap.xml
- Robots: https://www.lyrenth.com/robots.txt
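
## Example: calling POST /v1/aidocument (sketch)

For agents that want to see the API section in code form, here is a minimal Python sketch of a `POST /v1/aidocument` call using only the standard library. The base URL, the Bearer auth scheme, the `{"url": "..."}` body, and the envelope field names come from this document. The function names, the `Content-Type` header, and the 30-second timeout are illustrative assumptions, not documented Lyrenth behavior; the JSON Schema remains the source of truth for the response.

```python
import json
import urllib.request

API_BASE = "https://api.lyrenth.com/v1"  # base URL stated in the API section


def build_aidocument_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/aidocument request."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/aidocument",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type for the body.
            "Content-Type": "application/json",
        },
    )


def summarize(doc: dict) -> dict:
    """Pick out the v2 envelope fields an agent is most likely to need."""
    econ = doc["economics"]
    return {
        "title": doc["identity"]["title"],
        "markdown": doc["content"]["markdown"],
        "cache_status": doc["cache"]["status"],
        "render_mode": doc["source"]["render_mode"],
        # Recomputed locally from the two token counts in the envelope.
        "tokens_saved": econ["raw_html_tokens_approx"] - econ["output_tokens_approx"],
    }


def fetch_aidocument(page_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON envelope (performs network I/O)."""
    req = build_aidocument_request(page_url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A caller would typically run `summarize(fetch_aidocument(url, key))` and pass only the `markdown` field to the LLM. Non-2xx responses surface as `urllib.error.HTTPError`, which a real client would want to catch, for example to handle the 404 that `GET /v1/document` returns on an index miss.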