AI-readable web index

The web, rebuilt for AI.

Lyrenth continuously indexes the open web and returns every page as a clean, structured AIDocument: the signal your agents need, without the markup they don't.


Free tier: 2,000 AIDocuments / month. No card required.

Example result, served from the cached index (the site was not contacted): 2,722 tokens, 89.0% saved.

Web indexing - Wikipedia
en.wikipedia.org/wiki/Web_indexing

From Wikipedia, the free encyclopedia Methods for indexing the Internet **Web indexing**, or **Internet indexing**, comprises methods for indexing the contents of a [website](https://en.wikipedia.org/wiki/Website) or of the [Internet](https://en.wikipedia.org/wiki/Internet) as a whole. Individual websites or [intranet…

lang: en · type: Article · 1,556 words · 8 min read · has JSON-LD

Tokens saved: 89.0% (24,734 → 2,722)
AIDocument tokens: 2,722 (what the model reads)
Origin fetches: shared, cross-caller cache
Served from: cached index (instant; fresh copy on demand)

AI-readable pages indexed: 1,505,699,515
Indexed domains: 139,200,563
Pages audited: 1,503.8M
Recrawl cadence: 24 hours to 90 days, depending on plan

The transform

Signal, not markup.

Every crawl of raw HTML drags in navigation bars, ads, cookie walls, scripts, and trackers. Models pay in tokens, latency, and dollars to parse that junk before they reach a single useful sentence.

An indexed page is different: rendered, cleaned, normalized, and rewritten into one canonical document a model reads in one pass. Lyrenth removes the noise once, for everyone.

Up to 8x less cost and latency per page.

Tokens to read one page (signal is about 9% of the raw page):
Raw HTML page: ~14,800 tokens
Actual signal: ~1,330 tokens
Lyrenth AIDocument: ~1,840 tokens
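The "up to 8x" and "9%" figures follow directly from these approximate token counts. A worked check, assuming cost scales linearly with input tokens (latency is not derived here, only the token ratio):

```latex
\frac{\text{signal}}{\text{raw}} = \frac{1{,}330}{14{,}800} \approx 0.090 \;(\approx 9\%),
\qquad
\frac{\text{raw}}{\text{AIDocument}} = \frac{14{,}800}{1{,}840} \approx 8.0
```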

The adapter layer

An index, not another scraper.

Every read resolves against the shared index, not the origin. When a thousand agents request the same URL, the origin sees one fetch: everyone else is served from cache in milliseconds. When freshness matters, force_refresh re-indexes on demand.
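A minimal sketch of that read path, assuming a plain in-memory dictionary as the index. `SharedIndex` and its `fetch_origin` callback are hypothetical names standing in for a render-clean-store pipeline this page does not describe; only the two freshness modes, `cache_first` and `force_refresh`, come from the page itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SharedIndex:
    """Hypothetical model of one shared index serving many readers."""

    # Stand-in for the real crawl/render/clean step (an assumption).
    fetch_origin: Callable[[str], str]
    documents: Dict[str, str] = field(default_factory=dict)
    origin_fetches: int = 0

    def read(self, url: str, freshness: str = "cache_first") -> str:
        if freshness not in ("cache_first", "force_refresh"):
            raise ValueError(f"unknown freshness mode: {freshness}")
        # cache_first: answer from the index whenever a copy exists.
        if freshness == "cache_first" and url in self.documents:
            return self.documents[url]
        # Cache miss or force_refresh: one origin fetch, stored for every later reader.
        self.origin_fetches += 1
        doc = self.fetch_origin(url)
        self.documents[url] = doc
        return doc


index = SharedIndex(fetch_origin=lambda url: f"AIDocument for {url}")
for _ in range(1000):  # a thousand agents request the same URL
    index.read("https://example.com/page")
print(index.origin_fetches)  # 1
index.read("https://example.com/page", freshness="force_refresh")
print(index.origin_fetches)  # 2
```

A thousand reads of one URL contact the origin once; a `force_refresh` read contacts it again and overwrites the stored copy for everyone who reads next. A production index would also need expiry (matching the recrawl cadence) and concurrency control, both omitted from this sketch.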

Flow: origins → one crawl → the standing index → every read → readers.

01 · Origins
The open web, crawled on our schedule. Readers ask for the latest; Lyrenth fetches and serves it from the index.

02 · The standing index
A standing index sits between the two: rendered, cleaned, normalized, and held. One canonical AIDocument per URL. Written once, read by everyone: shared infrastructure, not a per-customer scrape.
Freshness, per request: cache_first or force_refresh, you choose.
Example read: POST /v1/aidocument → cache HIT.

03 · Readers
Autonomous agents: per-url · mcp
AI assistants: retrieve · cite
AI search & discovery: search · rank
RAG systems: embed · index
Model labs: corpus · bulk
Enterprise & research: pipeline · api
Every reader resolves against the index and is answered in milliseconds.


Lyrenth is not raw crawling, scraping-as-a-service, a normal search engine, or another data broker. It is the AI-readable web index.

One request, one document

A stable contract for every URL.

Send any public URL, get one AIDocument back: the same grouped JSON envelope every time, versioned forever. Validate against the public JSON Schema.

schema: version, ref
source: crawl trace
cache: truth
identity: title, lang
content: markdown
structure: headings, links
signals: quality
economics: tokens, cost
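A quick structural check of a returned envelope, sketched in Python. It assumes each group listed above appears as a top-level key of the JSON object, which the page implies but does not spell out. The published `/aidocument.schema.json` remains the authoritative contract; a full check would run that schema through a JSON Schema validator (for example the third-party `jsonschema` package) rather than this hand-written key list.

```python
import json
from typing import List

# The eight groups listed above. Treating them as top-level keys is an
# assumption; the published JSON Schema is the authoritative contract.
AIDOCUMENT_GROUPS = (
    "schema", "source", "cache", "identity",
    "content", "structure", "signals", "economics",
)


def missing_groups(raw: str) -> List[str]:
    """Return the expected envelope groups that are absent from a JSON AIDocument."""
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        raise ValueError("an AIDocument envelope must be a JSON object")
    return [group for group in AIDOCUMENT_GROUPS if group not in doc]


complete = json.dumps({group: {} for group in AIDOCUMENT_GROUPS})
print(missing_groups(complete))  # []
print(missing_groups('{"schema": {}, "content": {}}'))
```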

POST /v1/aidocument: resolve any URL
GET /v1/document?url=…: index lookup
POST /v1/submit: queue for indexing
GET /v1/stats: public counter
GET /aidocument.schema.json: the contract
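A sketch of building requests for the first two endpoints with Python's standard library. The page gives neither the API host, the request body, nor the authentication scheme, so `BASE_URL`, the `url`/`freshness` body fields, and the Bearer header below are placeholders to replace with the values in the docs. The functions only build requests; nothing is sent until one is passed to `urllib.request.urlopen`.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host: the real API base URL is not given on this page.
BASE_URL = "https://api.example.com"


def aidocument_request(url: str, api_key: str,
                       freshness: str = "cache_first") -> urllib.request.Request:
    """Build (but do not send) POST /v1/aidocument for one target URL.

    The body fields ("url", "freshness") and the Bearer auth header are
    assumptions for illustration, not the documented request shape.
    """
    if freshness not in ("cache_first", "force_refresh"):
        raise ValueError(f"unknown freshness mode: {freshness}")
    body = json.dumps({"url": url, "freshness": freshness}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/aidocument",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def document_lookup_url(url: str) -> str:
    """Build GET /v1/document?url=... with the target URL percent-encoded."""
    return f"{BASE_URL}/v1/document?{urllib.parse.urlencode({'url': url})}"


req = aidocument_request("https://en.wikipedia.org/wiki/Web_indexing", api_key="YOUR_KEY")
print(req.get_method(), req.full_url)
print(document_lookup_url("https://en.wikipedia.org/wiki/Web_indexing"))
```

Percent-encoding matters for the lookup endpoint because the target URL carries its own `:`, `/`, `?`, and `&` characters, which would otherwise be read as part of the outer query string.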

Quickstart: first call in a minute.

Built for

Every system that reads the web.

If it consumes web data, it runs better on clean AIDocuments than on raw HTML.

01 · Autonomous agents (PER-URL · MCP): Browse and act with structured pages instead of burning context on markup.

02 · AI assistants (RETRIEVE · CITE): Answer with fresh, normalized web content and clean citations.

03 · AI search & discovery (SEARCH · RANK): Build on a corpus already structured for ranking and recall.

04 · RAG systems (EMBED · INDEX): Skip the scrape-and-clean pipeline: embed AIDocuments directly.

05 · Model labs (CORPUS · BULK): Ground, retrieve, and evaluate on a clean, deduplicated web corpus.

06 · Enterprise & research (PIPELINE · API): Monitor, extract, and analyze the live web as structured data.

Real data, from the production API

What famous pages cost, raw vs indexed.

Every row is a real page read through the index, with token economics reported by the API itself. Reproducible with one call.

Page	Raw HTML tokens	AIDocument tokens	Times smaller	Tokens saved
Stripe API reference	307,902	2,000	154x	99.4%
Vercel Functions docs	237,390	2,149	110x	99.1%
Cloudflare Workers docs	94,963	1,377	69x	98.5%
GitHub REST API quickstart	88,614	4,876	18x	94.5%
Kubernetes Pods concepts	131,977	7,475	18x	94.3%

Measured 2026-07-05 · cost basis $3.00 per 1M input tokens · showing 5 of 10 benchmarks
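The ratio and percentage columns, and the dollar cost at the stated $3.00 per 1M input tokens, all follow from the two token counts in each row. The helper below was written for this page (`token_savings` is not part of any Lyrenth SDK); it assumes cost is linear in input tokens and ignores output tokens.

```python
from typing import NamedTuple

COST_PER_MILLION_USD = 3.00  # cost basis stated under the table


class Savings(NamedTuple):
    times_smaller: float
    percent_saved: float
    raw_cost_usd: float
    doc_cost_usd: float


def token_savings(raw_tokens: int, doc_tokens: int,
                  cost_per_million: float = COST_PER_MILLION_USD) -> Savings:
    """Compare reading a page's raw HTML with reading its AIDocument."""
    if raw_tokens <= 0 or doc_tokens <= 0:
        raise ValueError("token counts must be positive")
    return Savings(
        times_smaller=raw_tokens / doc_tokens,
        percent_saved=100 * (1 - doc_tokens / raw_tokens),
        raw_cost_usd=raw_tokens * cost_per_million / 1_000_000,
        doc_cost_usd=doc_tokens * cost_per_million / 1_000_000,
    )


s = token_savings(307_902, 2_000)  # Stripe API reference row
print(f"{s.times_smaller:.0f}x smaller, {s.percent_saved:.1f}% saved")  # 154x smaller, 99.4% saved
print(f"${s.raw_cost_usd:.4f} raw vs ${s.doc_cost_usd:.4f} indexed")  # $0.9237 raw vs $0.0060 indexed
```

The same call on the Wikipedia example (24,734 raw tokens, 2,722 AIDocument tokens) reproduces its 89.0% saving.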

Get started

Read the web like a machine.

Get an API key and pull your first AIDocument in under a minute. Web-scale, structured, and live.

