30x Token Compression: How Plasmate Cuts LLM Costs for Web-Scraping AI Agents
Every token your AI agent spends parsing CSS class names and tracking scripts is money burned. We ran Plasmate against 100 production websites to quantify exactly how much waste raw HTML introduces and how much you can save by switching to structured semantic output.
The headline number: 30x mean compression from HTML to SOM. The best case: 864x on accounts.google.com. The worst case: documentation sites that are already lean still achieve 3-9x.
This post presents the full benchmark methodology, results by category, cost calculations at scale, and an honest assessment of where Plasmate helps most and where it helps least.
TL;DR: The Numbers
- 100 URLs tested, 98% success rate
- Mean compression: 30x (HTML to SOM)
- Median compression: 10.2x
- P95 compression: 98.4x
- Mean fetch time: 232ms
- Mean parse+SOM compilation: 19ms
- Total pipeline latency: ~250ms
Methodology
We selected 100 URLs across eight categories: e-commerce, news, SaaS applications, search engines, social media, developer documentation, government sites, and financial services. Selection criteria prioritized diversity over cherry-picking favorable results.
For each URL, we performed the following:
- Raw HTML baseline: Fetched the page using a standard HTTP client and measured the byte size of the HTML response body.
- Plasmate SOM: Ran
plasmate fetchwith default settings and measured the byte size of the JSON output.
- Token estimation: Applied the cl100k_base tokenizer (GPT-4/Claude tokenizer) to both representations.
- Compression ratio: Calculated
html_tokens / som_tokensfor each URL.
- Performance timing: Recorded network fetch time and parse+compile time separately.
Results by Category
Search Engines
Search engine homepages and results pages are among the highest-compression targets due to heavy JavaScript bundling, analytics, and A/B testing infrastructure.
| Site | Compression Ratio |
|---|---|
| www.google.com | 114.4x |
| www.bing.com | 98.2x |
| duckduckgo.com | 9.3x |
SaaS Applications
Modern SaaS applications use component frameworks that generate verbose class names and nested wrapper divs. These compress extremely well.
| Site | Compression Ratio |
|---|---|
| linear.app | 105.1x |
| www.figma.com | 63.6x |
| vercel.com | 28.4x |
| stripe.com | 14.2x |
E-commerce
Retail sites carry product grids, recommendation carousels, and tracking pixels. Compression is consistent but not extreme.
| Site | Compression Ratio |
|---|---|
| www.walmart.com | 32.1x |
| www.target.com | 27.8x |
| www.bestbuy.com | 21.4x |
| www.amazon.com | 19.6x |
News Sites
News sites vary dramatically based on advertising load and article density.
| Site | Compression Ratio |
|---|---|
| www.nytimes.com | 98.4x |
| www.washingtonpost.com | 45.2x |
| www.theguardian.com | 31.7x |
| www.bbc.com | 15.8x |
Social Media and Authentication
These platforms generate the most dramatic compression ratios in our benchmark.
| Site | Compression Ratio |
|---|---|
| accounts.google.com | 864.9x |
| x.com | 163.6x |
| store.steampowered.com | 77.7x |
Developer Documentation
Documentation sites are already optimized for readability and produce the lowest compression ratios.
| Site | Compression Ratio |
|---|---|
| developer.chrome.com | 9.1x |
| developer.mozilla.org | 5.8x |
| docs.python.org | 3.2x |
Why Compression Matters: The Cost Calculation
Let's make this concrete with a cost model for a production agent system.
Assumptions:- Agent processes 10,000 web pages per day
- Average raw HTML size: 250KB (approximately 35,000 tokens)
- Average SOM size: 8KB (approximately 1,200 tokens)
- LLM input pricing: $3 per million tokens (Claude Sonnet 4 tier)
| Format | Tokens/Page | Daily Tokens | Daily Cost |
|---|---|---|---|
| Raw HTML | 35,000 | 350,000,000 | $1,050 |
| SOM | 1,200 | 12,000,000 | $36 |
| Format | Annual Cost |
|---|---|
| Raw HTML | $383,250 |
| SOM | $13,140 |
| Savings | $370,110 |
Even conservative estimates matter. If your actual compression is only 10x instead of 30x, you still save $345,000 per year at 10,000 pages/day.
Performance: The 250ms Pipeline
Token compression is meaningless if it adds latency. Our benchmarks show Plasmate's pipeline runs faster than most agents' decision-making loops.
| Stage | Mean Time | P95 Time |
|---|---|---|
| Network fetch | 232ms | 890ms |
| Parse + SOM compile | 19ms | 42ms |
| Total | 251ms | 932ms |
For comparison, agents using browser automation tools like Playwright or Puppeteer typically see 2-5 second page load times due to full browser rendering. Plasmate's HTTP-first approach with JavaScript execution on demand provides a middle ground between raw HTML fetch speed and full browser fidelity.
Honest Limitations: Where SOM Helps Less
Transparency matters more than marketing. Here are scenarios where Plasmate provides minimal benefit:
Documentation and text-heavy sites
Sites like MDN, Python docs, and technical wikis already optimize for content. You see 3-9x compression instead of 30x+. The savings are real but not dramatic.
Sites requiring full browser state
Some applications depend on client-side rendering that cannot be replicated without a full browser session. Plasmate handles most JavaScript, but edge cases exist in heavily dynamic SPAs.
Already-structured APIs
If a site offers a public API with structured JSON responses, use that directly. Plasmate solves the "website as interface" problem, not the "API exists but I'm ignoring it" problem.
Real-time content
Websocket-driven dashboards, live sports scores, and streaming content update faster than any crawl cycle. SOM captures snapshots, not streams.
Authentication-gated content
Plasmate supports custom headers including Authorization tokens, but managing session state across complex multi-step authentication flows requires additional orchestration.
What the Data Tells Us
Three findings stand out from this benchmark:
1. Framework overhead dominates modern web pages. Sites using React, Vue, Angular, or Tailwind CSS generate HTML that is 90%+ presentation markup. This is not a criticism of those tools; they optimize for developer experience and visual fidelity. But agents pay the token cost of that optimization even though they need none of it. 2. The variance is massive. 864x to 3x is a 288x range. Any single-number claim about "how much Plasmate helps" is necessarily incomplete. Your mileage depends entirely on which sites your agent needs to read. 3. Documentation sites set the floor. If your agent primarily reads docs.python.org and MDN, expect 5-10x compression. If it reads SaaS dashboards and e-commerce sites, expect 20-100x.Try It Yourself
Reproduce these benchmarks on your target URLs:
# Install Plasmate
npm install -g plasmate
# Benchmark a single URL
plasmate bench https://your-target-site.com
# Benchmark multiple URLs from a file
plasmate bench --urls urls.txt --output results.json
The benchmark command outputs compression ratio, timing breakdown, and byte counts for both HTML and SOM.
For production integration, see our guides for MCP server setup, LangChain integration, and direct API usage.