The AIDocument shape
A stable, grouped JSON envelope: the same shape for every URL and every agent, with eight top-level groups, versioned forever. Validate against the JSON Schema at api.lyrenth.com/aidocument.schema.json (draft-07).
{
  "schema": {
    "name": "AIDocument",
    "version": "2.0",
    "ref": "aidoc:sha256:7a14e9b2…"
  },
  "source": {
    "url": "…/Web_indexing",
    "canonical_url": "…/Web_indexing",
    "fetched_at": "2026-05-13T…",
    "render_mode": "static",
    "status_code": 200,
    "freshness_policy": "cache_first"
  },
  "cache": {
    "status": "hit",
    "origin_contacted": false,
    "body_fetched": false
  },
  "identity": {
    "title": "Web indexing",
    "language": "en",
    "content_type": "article"
  },
  "content": {
    "markdown": "Web indexing or…"
  },
  "structure": {
    "headings": […],
    "links": […],
    "images": […],
    "structured_data": {}
  },
  "signals": {
    "word_count": 1552,
    "reading_time": 7,
    "has_json_ld": true
  },
  "economics": {
    "raw_html_tokens_approx": 21331,
    "output_tokens_approx": 2715,
    "token_savings_percent": 0.873
  }
}
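As a rough illustration of the "eight top-level groups" promise, the sketch below checks a parsed document for those groups using only the key names visible in the example above. The helper names (`missing_groups`, `unexpected_groups`) are hypothetical, and this is a quick structural sanity check under those assumptions, not a substitute for validating against the published schema.

```python
# The eight top-level groups, taken from the example document above.
EXPECTED_GROUPS = (
    "schema", "source", "cache", "identity",
    "content", "structure", "signals", "economics",
)


def missing_groups(doc: dict) -> list[str]:
    """Return the expected top-level groups that are absent from doc."""
    return [group for group in EXPECTED_GROUPS if group not in doc]


def unexpected_groups(doc: dict) -> list[str]:
    """Return top-level keys that are not one of the eight expected groups."""
    return sorted(key for key in doc if key not in EXPECTED_GROUPS)
```

A caller might reject or log any response where `missing_groups(doc)` is non-empty before reading deeper fields.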
schema: Format identity and a content-addressed ref. Pin against version for stability.
source: Where it came from and how it was fetched: canonical URL, render mode, status, freshness policy.
cache: Cache truth. Whether the origin was contacted and the body re-fetched on this call.
identity: Title, description, language, and detected content type.
content: Cleaned, boilerplate-stripped Markdown. This is the part you feed a model.
structure: Headings, links, images, media, contacts, and JSON-LD structured data.
signals: Derived quality signals: word count, reading time, schema presence, heading hierarchy.
economics: Token and cost economics versus ingesting raw HTML: the savings, per call.
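The sketch below shows how a consumer might read three of these groups: pinning on `schema.version`, reading the cache flags, and recomputing the savings from the two token counts. It uses a trimmed copy of the example document and only field names that appear there. The function names are hypothetical, and exact-match pinning is an assumption about what "pin against version" means in practice.

```python
# Trimmed copy of the example document; only the fields used below.
doc = {
    "schema": {"name": "AIDocument", "version": "2.0"},
    "cache": {"status": "hit", "origin_contacted": False, "body_fetched": False},
    "economics": {
        "raw_html_tokens_approx": 21331,
        "output_tokens_approx": 2715,
        "token_savings_percent": 0.873,
    },
}


def is_pinned(doc: dict, expected_version: str = "2.0") -> bool:
    """True when the document reports exactly the version the caller pinned.

    Exact string comparison is an assumption, not documented behavior.
    """
    schema = doc["schema"]
    return schema["name"] == "AIDocument" and schema["version"] == expected_version


def served_without_origin(doc: dict) -> bool:
    """True when this call neither contacted the origin nor re-fetched the body."""
    cache = doc["cache"]
    return not cache["origin_contacted"] and not cache["body_fetched"]


def savings_fraction(doc: dict) -> float:
    """Recompute savings from the two token counts, as a fraction of raw HTML."""
    econ = doc["economics"]
    return 1 - econ["output_tokens_approx"] / econ["raw_html_tokens_approx"]
```

In the example, 1 - 2715 / 21331 rounds to 0.873, which matches the reported `token_savings_percent`. So, in this sample at least, the field holds a fraction between 0 and 1 rather than a value out of 100.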
Call it to learn it.
The fastest way to learn the shape is to resolve a URL.