LYRENTH
HomePricingIntegrationsDocsIndex statsAboutContact
/docs / aidocument
Reference · v2.0

The AIDocument shape

A stable, grouped JSON envelope. The same shape for every URL and every agent, eight top-level groups, versioned forever. Validate against api.lyrenth.com/aidocument.schema.json (draft-07).

AIDocument.v2.json200 · 9.4 KB
{
  "schema": { "name": "AIDocument", "version": "2.0",
             "ref": "aidoc:sha256:7a14e9b2…" },
  "source": {
    "url": "…/Web_indexing",
    "canonical_url": "…/Web_indexing",
    "fetched_at": "2026-05-13T…",
    "render_mode": "static",
    "status_code": 200,
    "freshness_policy": "cache_first" },
  "cache": { "status": "hit",
            "origin_contacted": false,
            "body_fetched": false },
  "identity": { "title": "Web indexing",
               "language": "en",
               "content_type": "article" },
  "content": { "markdown": "Web indexing or…" },
  "structure": { "headings": […], "links": […],
                "images": […], "structured_data": {} },
  "signals": { "word_count": 1552,
              "reading_time": 7,
              "has_json_ld": true },
  "economics": { "raw_html_tokens_approx": 21331,
                "output_tokens_approx": 2715,
                "token_savings_percent": 0.873 }
}
schema

Format identity and a content-addressed ref. Pin against version for stability.

name · version · ref
source

Where it came from and how it was fetched: canonical URL, render mode, status, freshness policy.

url · canonical_url · fetched_at · render_mode · status_code
cache

Cache truth. Whether the origin was contacted and the body re-fetched on this call.

status · origin_contacted · body_fetched
identity

Title, description, language, and detected content type.

title · description · language · content_type
content

Cleaned, boilerplate-stripped Markdown, the part you feed a model.

markdown
structure

Headings, links, images, media, contacts, and JSON-LD structured data.

headings · links · images · media · contacts · structured_data
signals

Derived quality signals: word count, reading time, schema presence, heading hierarchy.

word_count · reading_time · has_json_ld · heading_hierarchy_ok
economics

Token and cost economics versus ingesting raw HTML, the savings, per call.

raw_html_tokens_approx · output_tokens_approx · token_savings_percent · estimated_cost_usd

Call it to learn it.

The fastest way to learn the shape is to resolve a URL.