July 5, 2026 · tokens · agents

How to feed web pages to an LLM without blowing the context window

Raw HTML burns your context window on nav, scripts, and boilerplate. Here is how to feed web pages to an LLM as clean AIDocuments, with real token numbers.

You wire up an agent to read a web page, pass the response straight into your model, and the request either errors out on a context-length limit or quietly costs you five times what you expected. The page looked small in a browser. In tokens it was enormous. This is the single most common wall people hit when they first give an LLM the ability to read the web, and it is entirely fixable once you understand where the tokens actually go.

Why raw HTML is brutal in tokens

A modern web page is mostly not content. It is navigation, script tags, inline styles, cookie and consent markup, tracking pixels, analytics blobs, SVG icons, repeated header and footer chrome, and a wrapper of div soup around the handful of paragraphs you actually wanted. When you feed that to a model, you pay for every angle bracket and every class name. The model, for its part, has to read past all of it to find the answer, which makes retrieval and reasoning noisier at the same time it makes them more expensive.

Here is a real measurement, not an estimate. Take the Python standard library docs page for the json module, a page every backend developer has read at some point:

https://docs.python.org/3/library/json.html

Fetched as raw HTML with curl, that page is 112,865 bytes. At the common four-characters-per-token approximation, that is roughly 28,000 tokens of markup before your prompt, your instructions, or anything else you need in the window. And this is a well-behaved, mostly static documentation page. Marketing pages and single-page apps are worse.
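
The arithmetic behind that estimate is short enough to sketch. The helper name and the default ratio are illustrative only, and the approximation treats one byte as one character, which holds for mostly-ASCII markup. A real tokenizer will give a somewhat different count.

```python
def estimate_tokens(byte_count: int, chars_per_token: int = 4) -> int:
    """Rough token estimate: size divided by an assumed characters-per-token ratio."""
    return byte_count // chars_per_token


raw_html_bytes = 112_865  # curl size of the json docs page quoted above
print(estimate_tokens(raw_html_bytes))  # 28216
```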

Now read the same URL as an AIDocument through Lyrenth. Every response carries an economics line in its header. Here is the genuine one that came back:

status 200 · render static · 5303 words · ~8,629 tokens · 69% smaller than raw HTML

The body dropped from roughly 28,000 tokens of HTML to about 8,600 tokens of clean Markdown for the same page, and the API reported that as 69 percent smaller than the raw HTML it started from. (Method: token counts are the values on the economics line the API returns for that exact URL, measured July 2026; the raw-HTML figure is the actual curl byte size divided by four.)
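
If you want to log those numbers rather than eyeball them, the economics line is easy to parse. This is a minimal sketch that assumes the line always looks like the two examples in this post (fields separated by a middle dot). The field names in the returned dict are my own, not part of any documented schema.

```python
import re


def parse_economics(line: str) -> dict:
    """Extract fields from an economics line shaped like the examples in this post."""
    fields = {}
    for part in (p.strip() for p in line.split("·")):
        if m := re.fullmatch(r"status (\d+)", part):
            fields["status"] = int(m.group(1))
        elif m := re.fullmatch(r"render (\w+)", part):
            fields["render"] = m.group(1)
        elif m := re.fullmatch(r"([\d,]+) words", part):
            fields["words"] = int(m.group(1).replace(",", ""))
        elif m := re.fullmatch(r"~([\d,]+) tokens", part):
            fields["tokens"] = int(m.group(1).replace(",", ""))
        elif m := re.fullmatch(r"(\d+)% smaller than raw HTML", part):
            fields["percent_smaller"] = int(m.group(1))
        elif part == "truncated to budget":
            fields["truncated"] = True
    return fields


line = "status 200 · render static · 5303 words · ~8,629 tokens · 69% smaller than raw HTML"
info = parse_economics(line)

# Cross-check the reported saving against the bytes-divided-by-four estimate.
raw_estimate = 112_865 // 4
check = round(100 * (1 - info["tokens"] / raw_estimate))
print(info["tokens"], check)  # 8629 69
```

The cross-check lands on the same 69 percent the API reported, which is a useful sanity test to run on your own pages.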

The savings compound. If your agent reads ten pages like this one in a research loop, that is the difference between spending your window on ten pages of content and spending most of it on boilerplate, with room for only about three raw pages before you run out.
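
To put a number on that, here is a small sketch using the figures measured above. It assumes every page in the loop costs the same as the json docs page, which real pages will not, so treat the result as an order-of-magnitude check.

```python
def pages_that_fit(window_tokens: int, tokens_per_page: int) -> int:
    """How many whole pages of a given size fit in a token budget."""
    return window_tokens // tokens_per_page


clean_page = 8_629          # tokens reported for the clean read
raw_page = 112_865 // 4     # estimated tokens for the raw HTML
budget = 10 * clean_page    # a budget sized for ten clean pages

print(pages_that_fit(budget, clean_page))  # 10
print(pages_that_fit(budget, raw_page))    # 3
```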

The three workarounds that do not hold up

Before reaching for the clean shape, most people try one of three things. Each has a failure mode worth naming, because you will hit it.

Truncation. Cut the HTML at N characters and hope the answer is near the top. It usually is not. The interesting content on a real page sits well below the header, the hero, the navigation, and often a consent overlay. Truncating raw HTML is a coin flip on whether you kept the part that mattered, and when you lose it, the model confidently answers from whatever chrome survived the cut.
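
A toy illustration of that failure, using a synthetic page. The markup here is made up, but the ordering (chrome first, content last) is the typical one.

```python
# Synthetic page: 200 navigation links, then the one sentence we care about.
chrome = "<nav>" + "<a href='/x'>Link</a>" * 200 + "</nav>"
content = "<main><p>sort_keys defaults to False.</p></main>"
page = "<html><body>" + chrome + content + "</body></html>"

truncated = page[:2000]  # naive cut at N characters

print("sort_keys" in page)       # True
print("sort_keys" in truncated)  # False: the cut kept only navigation
```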

Summarization before the read. Run a cheap model to compress the page first, then hand the summary to your main model. Now you have two problems: you pay for two passes, and the summarization step hallucinates. A compressor that has never seen the ground truth will smooth over the exact number, the exact function signature, or the exact clause you needed. Lossy compression of a document you have not verified is a good way to launder a mistake into your pipeline.

Readability libraries. Client-side extraction libraries that pull the "article" out of HTML work well on a blog post and fall over on anything that renders in the browser. A JavaScript-driven site hands the fetcher a hollow div and loads the real content afterward, so the extractor gets an empty page. Shadow DOM and web-component sites serialize to markup that has no readable text in it at all. You end up maintaining a headless browser just to see the content, which is its own project.
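
You can see the hollow-div problem without any extraction library at all. The sketch below uses Python's built-in html.parser as a stand-in for a naive text extractor. It is far simpler than a real readability library, and both sample pages are invented for the demonstration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# What a plain fetch returns for a JavaScript-driven site: an empty mount point.
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
blog_post = "<html><body><article><h1>Title</h1><p>Hello.</p></article></body></html>"

print(repr(visible_text(spa_shell)))  # ''
print(visible_text(blog_post))        # Title Hello.
```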

The common thread: all three try to rescue signal from a format that was never meant to be read by a machine. The better move is to get the content in a shape that is signal to begin with.

Markdown is the right density

Markdown is the sweet spot for an LLM reading the web. It keeps the structure a model actually uses (headings, lists, links, code blocks, tables) and drops what it cannot use (layout scaffolding and scripts). It is close to how the text would read if a person copied the meaningful part of the page into a notes file. And because models are trained on a great deal of Markdown, they parse it cleanly with no special handling.

That is what an AIDocument is: one clean, stable shape for a page, made of a Markdown body plus a title, a description, and the page structure, with the boilerplate stripped and JavaScript rendered when the page needs it. If you want the full picture of the shape and the fields it carries, that is the subject of a companion post, "What an AIDocument is." For this post the relevant part is simple: the body you get back is content, not chrome, so nearly every token you pay for is a token worth reading.

Practical guidance: trim at clean boundaries with max_tokens

Sometimes even the clean body is bigger than the budget you want to spend on a single page. A 5,000-word reference page is legitimately long. Rather than fetch the whole thing and truncate it yourself in the middle of a sentence, cap it at the source. The reader accepts a max_tokens parameter that trims the body to roughly that many tokens at a clean paragraph or sentence boundary, so the tail is never a half-finished thought.
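
To make "trim at a clean boundary" concrete, here is a local sketch of the idea. It is not how the reader implements max_tokens, which this post does not describe. It assumes the four-characters-per-token estimate and treats a blank line as a paragraph break, so a code block containing blank lines could still be split.

```python
def trim_to_budget(markdown: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Keep whole paragraphs until the estimated token budget is spent."""
    budget_chars = max_tokens * chars_per_token
    kept, used = [], 0
    for para in markdown.split("\n\n"):
        cost = len(para) + (2 if kept else 0)  # include the blank-line separator
        if used + cost > budget_chars:
            break
        kept.append(para)
        used += cost
    return "\n\n".join(kept)


doc = (
    "# Title\n\n"
    "First paragraph of the page.\n\n"
    "Second paragraph, which is a bit longer than the first."
)
print(trim_to_budget(doc, max_tokens=10))
```

With a budget of 10 tokens (about 40 characters), the heading and the first paragraph fit and the second paragraph is dropped whole rather than cut mid-sentence.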

Here is a real trimmed call against the same json docs page, capped at 200 tokens. This is the genuine response header and the start of the body:

status 200 · render static · 5303 words · ~195 tokens · 99% smaller than raw HTML · truncated to budget

# `json` — JSON encoder and decoder

**Source code:** Lib/json/__init__.py

JSON (JavaScript Object Notation), specified by RFC 7159 (which
obsoletes RFC 4627) and by ECMA-404, is a lightweight data
interchange format inspired by JavaScript object literal syntax
(although it is not a strict subset of JavaScript).

Note

Two things to notice. The header now reads ~195 tokens · 99% smaller than raw HTML · truncated to budget, so you get the same economics accounting on the trimmed result and can see exactly what you spent. And the body stops at a boundary, at the end of a note, not mid-word. The model gets the top of the document intact instead of a jagged fragment. (Same method as above: these are the real values the API returned for max_tokens=200 on that URL in July 2026.)

The practical pattern for a research agent is to set max_tokens to a page budget that leaves room for your prompt and your other pages. If a first read at a tight cap does not contain the answer, read again without the cap, or read a more specific URL. You are trading a cheap, bounded first look against an occasional full read, which is far better than paying for the whole raw page on every hop.
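
That cap-first, full-read-second pattern can be sketched as follows. The read callable is a placeholder for whatever client you use, and its (url, max_tokens) signature is an assumption. Checking for a literal substring is also a simplification: in a real agent the model itself decides whether the capped read answered the question.

```python
from typing import Callable, Optional, Tuple

# Hypothetical reader signature: (url, max_tokens or None) -> Markdown body.
Reader = Callable[[str, Optional[int]], str]


def read_with_fallback(read: Reader, url: str, needle: str, cap: int = 200) -> Tuple[str, bool]:
    """Try a capped read first; reread without the cap if the needle is missing.

    Returns the body and a flag saying whether the full read was needed.
    """
    body = read(url, cap)
    if needle in body:
        return body, False
    return read(url, None), True


# Fake reader for demonstration: the "cap" just slices a fixed string.
def fake_read(url: str, max_tokens: Optional[int]) -> str:
    full = "# json\n\nIntro text.\n\nsort_keys defaults to False."
    return full if max_tokens is None else full[:20]


body, needed_full_read = read_with_fallback(fake_read, "https://example.com/doc", "sort_keys")
print(needed_full_read)  # True: the capped read stopped before the answer
```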

A worked example

Put it together as the loop an agent actually runs. Say the task is "find the default value of the sort_keys argument in Python's json.dumps."

1. Read the page as an AIDocument. You get about 8,600 tokens of clean Markdown instead of about 28,000 tokens of HTML, and the headings and the parameter list survive intact because structure is preserved.
2. If you are budget-conscious across many pages, cap the read with max_tokens so each hop stays bounded, and the trim lands on a clean boundary rather than slicing a code block in half.
3. Hand that to the model. The answer, sort_keys defaults to False, sits in a clean parameter list, not buried under three levels of div and a cookie banner, so the model finds it on the first pass.

Same answer, a fraction of the tokens, and no summarization layer inventing a default that was never on the page.
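
For the read step, a request can be built with the Python standard library alone. This sketch uses the endpoint, the url and max_tokens parameters, and the bearer header from the curl command in this post. The function name is mine, and percent-encoding the nested URL is standard practice that I am assuming the API accepts. The code only constructs the request; pass it to urllib.request.urlopen to send it.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request


def build_read_request(page_url: str, max_tokens=None) -> Request:
    """Build (but do not send) a GET request for the reader endpoint."""
    params = {"url": page_url}
    if max_tokens is not None:
        params["max_tokens"] = max_tokens
    query = urlencode(params)  # percent-encodes the nested URL
    api_key = os.environ.get("LYRENTH_API_KEY", "")
    return Request(
        "https://api.lyrenth.com/v1/read?" + query,
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = build_read_request("https://docs.python.org/3/library/json.html", max_tokens=200)
print(req.full_url)
```

Encoding the page URL matters as soon as it carries its own query string, since an unescaped & would otherwise be read as the start of a new parameter.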

The same capped read, issued with curl:

curl -H "Authorization: Bearer $LYRENTH_API_KEY" \
  "https://api.lyrenth.com/v1/read?url=https://docs.python.org/3/library/json.html&max_tokens=200"

The point

The context window is not the constraint people think it is. The constraint is spending it on the wrong bytes. Raw HTML is mostly scaffolding, about seventy percent of the tokens on the docs page measured above and more on heavier pages; a clean Markdown body is almost entirely signal. Once your agent reads content instead of page chrome, the same window holds several times as much of what you actually came for, your token bill drops in proportion, and the model stops tripping over navigation menus on its way to the answer. Truncation, pre-summarization, and readability libraries are all attempts to rescue signal after the fact; getting the clean shape in the first place is cheaper and more reliable than any of them.

If you want to try it on your own URLs, the free tier is 2,000 AIDocuments a month, no card, and every response carries its own economics line so you can check the token savings on the pages you actually read. The five-minute version is in the quickstart.