Bot Detection
The Edge Function detects AI crawlers accessing protected content without a signed URL and returns a 403 Forbidden response directing them to the Exchange. Bot detection only applies to requests without signed URL parameters — a valid signed URL always takes precedence.
Detection Signals
Three layers of bot detection, from cheapest to most expensive:
Layer 1: User-Agent String Matching (< 0.01ms)
Match the request's User-Agent against a list of known AI crawler User-Agents. This is the primary detection mechanism.
```text
# Well-known AI bot User-Agents (maintained list)
ClaudeBot/1.0         - Anthropic training crawler
anthropic-ai          - Anthropic inference/RAG
GPTBot/1.0            - OpenAI training crawler
ChatGPT-User          - OpenAI inference/RAG
CCBot/2.0             - Common Crawl
Google-Extended       - Google AI training
Googlebot-Extended    - Google AI training (alternate)
Bytespider            - ByteDance training crawler
PerplexityBot         - Perplexity AI
YouBot                - You.com
Applebot-Extended     - Apple AI training
cohere-ai             - Cohere
Meta-ExternalAgent    - Meta AI training
Amazonbot             - Amazon AI
AI2Bot                - Allen Institute for AI
Diffbot               - Diffbot
Omgilibot             - Webz.io
FacebookBot           - Meta
ramp-ai-buyer         - RAMP protocol agent (self-identified)
```

The edge function maintains this list in configuration (KV store or inline config). The list is additive: new bots are added, and existing entries are never removed. Note that Google-Extended and Applebot-Extended are robots.txt control tokens rather than User-Agent strings. Google and Apple fetch content with their regular crawlers, so these two entries are unlikely ever to match a request header.
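The matching step can be sketched in TypeScript as follows. This is a minimal sketch, not a reference implementation. `BOT_PATTERNS` is an illustrative subset of the list above, and `matchBotUserAgent` is a hypothetical helper name. The sketch also assumes matching is case-sensitive, which the list does not specify.

```typescript
// Illustrative subset of the maintained list above. In practice the full list
// would be loaded from KV or inline configuration.
const BOT_PATTERNS: readonly string[] = [
  "ClaudeBot/1.0",
  "anthropic-ai",
  "GPTBot/1.0",
  "ChatGPT-User",
  "CCBot/2.0",
  "Bytespider",
  "PerplexityBot",
  "Meta-ExternalAgent",
  "ramp-ai-buyer",
];

// Returns the first pattern found verbatim inside the User-Agent, or null.
// Exact (case-sensitive) substring matching: no regexes, no fuzzy matching.
// A missing or unknown User-Agent passes through (null).
function matchBotUserAgent(
  userAgent: string | null | undefined,
  patterns: readonly string[] = BOT_PATTERNS,
): string | null {
  if (!userAgent) return null;
  for (const pattern of patterns) {
    if (userAgent.includes(pattern)) return pattern;
  }
  return null;
}

// Usage
const crawlerUa =
  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)";
console.log(matchBotUserAgent(crawlerUa)); // GPTBot/1.0
console.log(matchBotUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")); // null
```

Plain substring checks keep this layer in the microsecond range. Because entries are only ever added, a longer list costs a few extra comparisons rather than a redesign.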
Layer 2: CDN-Native Bot Classification (0ms — pre-computed)
All major CDNs maintain their own bot classification databases:
- Cloudflare: `cf.bot_management.score` (1-99, lower = more likely bot), `cf.bot_management.verified_bot`, `cf.bot_management.ja3_hash`.
- Akamai: Bot Manager Premier (behavioral analysis, device fingerprinting).
- Fastly: Signal Sciences WAF — bot detection signals.
- CloudFront: AWS WAF Bot Control — managed rule group with bot categorization.
These are available as request metadata without any computation cost at the edge. The edge function can check CDN-native bot signals as a second layer.
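The sketch below shows one way an edge function might fold CDN-native metadata into a single verdict. The `CdnBotSignals` shape is hypothetical. On Cloudflare Workers the values would come from `request.cf.botManagement`, while other CDNs surface comparable data through WAF labels or headers. The default threshold of 30 is an assumption based on Cloudflare's guidance that scores below 30 are commonly associated with bot traffic. The verdict is treated here as a supplementary input, not a blocking decision.

```typescript
// Hypothetical, CDN-agnostic view of pre-computed bot metadata.
interface CdnBotSignals {
  score?: number;        // Cloudflare-style score, 1-99, lower = more likely bot
  verifiedBot?: boolean; // the CDN has verified the bot's identity
}

type CdnVerdict = "verified-bot" | "likely-bot" | "likely-human" | "unknown";

// Assumed threshold; tune per deployment.
const LIKELY_BOT_THRESHOLD = 30;

function classifyCdnSignals(
  signals: CdnBotSignals,
  threshold: number = LIKELY_BOT_THRESHOLD,
): CdnVerdict {
  if (signals.verifiedBot) return "verified-bot";
  // A missing or out-of-range score means "no opinion", not "bot".
  const score = signals.score;
  if (score === undefined || score < 1 || score > 99) return "unknown";
  return score < threshold ? "likely-bot" : "likely-human";
}

// Usage
console.log(classifyCdnSignals({ score: 12 }));                   // likely-bot
console.log(classifyCdnSignals({ score: 85 }));                   // likely-human
console.log(classifyCdnSignals({ score: 5, verifiedBot: true })); // verified-bot
console.log(classifyCdnSignals({}));                              // unknown
```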
Layer 3: Behavioral Signals (0ms — pattern matching)
Heuristic patterns that indicate automated access:
- Request rate: More than N requests to protected paths within T seconds from the same IP.
- Missing headers: Real browsers send `Accept`, `Accept-Language`, and `Accept-Encoding`. Bots often omit them.
- No cookie/session: Legitimate users have session cookies. First-time visitors get a redirect to set a cookie (but this interferes with the bot protocol flow, so use it sparingly).
- Sequential path access: Bots tend to walk paths sequentially (`/premium/1`, `/premium/2`, …). Humans jump around.
- TLS fingerprint: JA3/JA4 fingerprint mismatches (the User-Agent claims to be Chrome, but the TLS fingerprint is that of Python `requests`).
Layer 3 signals are supplementary. They can reduce false negatives (a bot with an unknown User-Agent) but should never override a valid signed URL.
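The Layer 3 heuristics can be expressed as a function that reports which signals fired without deciding anything itself, which keeps them supplementary. This sketch rests on several assumptions. `behavioralSignals` and `BehaviorInput` are hypothetical names, header names are assumed to be lower-cased, and the per-client path history is assumed to be tracked elsewhere. The three-request window for sequential access is an arbitrary choice. Request-rate and TLS fingerprint checks are omitted because they need shared state or a fingerprint database.

```typescript
// Hypothetical request view: lower-cased header names plus the most recent
// protected paths requested by this client (tracking that history is out of scope).
interface BehaviorInput {
  headers: Record<string, string | undefined>;
  recentPaths: string[];
}

const BROWSER_HEADERS = ["accept", "accept-language", "accept-encoding"];

// Returns the names of heuristics that fired. Callers must not block on these
// alone; they only supplement User-Agent detection.
function behavioralSignals(input: BehaviorInput): string[] {
  const fired: string[] = [];

  const missing = BROWSER_HEADERS.filter((h) => !input.headers[h]);
  if (missing.length > 0) fired.push(`missing-headers:${missing.join(",")}`);

  if (!input.headers["cookie"]) fired.push("no-cookie");

  // Sequential walk: the last three paths end in /n, /n+1, /n+2.
  const ids = input.recentPaths
    .map((p) => /\/(\d+)$/.exec(p))
    .map((m) => (m ? Number(m[1]) : NaN));
  const lastThree = ids.slice(-3);
  if (
    lastThree.length === 3 &&
    lastThree.every((n, i) => i === 0 || n === lastThree[i - 1] + 1)
  ) {
    fired.push("sequential-paths");
  }
  return fired;
}

// Usage
const signals = behavioralSignals({
  headers: { accept: "*/*" },
  recentPaths: ["/premium/1", "/premium/2", "/premium/3"],
});
console.log(signals.join(" | "));
// missing-headers:accept-language,accept-encoding | no-cookie | sequential-paths
```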
Bot Classification Tiers
The edge function catches cooperative AI bots: those that identify themselves via well-known User-Agent strings. It is not a replacement for the provider’s CDN WAF. Both are required:
| Layer | What It Catches | Mechanism |
|---|---|---|
| Edge function (RAMP) | Cooperative AI bots (ClaudeBot, GPTBot, etc.) | User-Agent string matching |
| CDN WAF | Everything else: disguised bots, credential stuffing, DDoS, scraping with spoofed UAs | Fingerprinting, behavioral analysis, rate limiting, ML models |
The edge function is additive to the WAF, not a replacement. It must deploy alongside existing WAF rules without conflict.
Integration requirement: The edge function runs after the WAF (or alongside it, depending on CDN architecture). WAF rules should not block requests to /.well-known/ramp.json or /rsl.txt, and should not interfere with signed URL parameters. Providers must allowlist these paths in their WAF configuration.
CDN WAF Products by Platform
| CDN | WAF / Bot Management Product |
|---|---|
| CloudFront | AWS WAF (Bot Control managed rule group) |
| Cloudflare | Cloudflare Bot Management (Enterprise), Super Bot Fight Mode (Pro/Business) |
| Akamai | Akamai Bot Manager (Premier / Standard) |
| Fastly | Fastly Signal Sciences (next-gen WAF) |
The 403 Response
When a bot is detected on a protected path without a signed URL:
```http
HTTP/1.1 403 Forbidden
Content-Type: application/json
X-Content-Rules: https://exchange.ssp-example.com/ramp/v1/info
Cache-Control: no-store

{
  "error": "Licensed content. Negotiate access via the Exchange.",
  "protocol": "RAMP",
  "version": "1.0",
  "info_url": "https://exchange.ssp-example.com/ramp/v1/info",
  "ramp_json_url": "https://techcrunch.com/.well-known/ramp.json"
}
```

Header semantics:
- `X-Content-Rules`: the Exchange info endpoint. This is the key discovery mechanism for bots that did not check `ramp.json`. The header value is a URL that the agent can `GET` to learn about available content and pricing.
- `Cache-Control: no-store`: the 403 should not be cached, because content availability may change.
Body semantics:
- `error`: human- and machine-readable explanation.
- `protocol`: identifies this as a RAMP protocol response, not a generic 403.
- `version`: protocol version for forward compatibility.
- `info_url`: same as the `X-Content-Rules` header, repeated in the body for agents that parse JSON responses.
- `ramp_json_url`: direct link to `ramp.json`, so the agent can discover all authorized Exchanges, not just the one in `X-Content-Rules`.
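The following sketch assembles the 403 and assumes a hypothetical per-site `RampConfig`. Following the body semantics above, it sends the same Exchange info URL in `X-Content-Rules` and in `info_url`. The returned object is platform-neutral. On Cloudflare Workers, for example, it could be passed to `new Response(res.body, { status: res.status, headers: res.headers })`.

```typescript
// Hypothetical per-site configuration; the URLs used below are the examples above.
interface RampConfig {
  exchangeInfoUrl: string; // Exchange info endpoint, also sent as X-Content-Rules
  rampJsonUrl: string;     // this site's /.well-known/ramp.json
}

interface EdgeResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

function buildRamp403(cfg: RampConfig): EdgeResponse {
  const body = {
    error: "Licensed content. Negotiate access via the Exchange.",
    protocol: "RAMP",
    version: "1.0",
    info_url: cfg.exchangeInfoUrl, // mirrors the X-Content-Rules header
    ramp_json_url: cfg.rampJsonUrl,
  };
  return {
    status: 403,
    headers: {
      "Content-Type": "application/json",
      "X-Content-Rules": cfg.exchangeInfoUrl,
      "Cache-Control": "no-store", // availability may change, so never cache the 403
    },
    body: JSON.stringify(body),
  };
}

// Usage
const res = buildRamp403({
  exchangeInfoUrl: "https://exchange.ssp-example.com/ramp/v1/info",
  rampJsonUrl: "https://techcrunch.com/.well-known/ramp.json",
});
console.log(res.status, res.headers["X-Content-Rules"]);
// 403 https://exchange.ssp-example.com/ramp/v1/info
```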
Classification Algorithm
The detection algorithm follows a strict priority order to minimize false positives:
- Only process protected paths: requests for open content pass through and are never blocked.
- Check for signed URL parameters before any bot detection: if they are present, skip bot detection entirely and proceed to signature verification.
- Match the User-Agent using exact substring matching, not fuzzy matching. A match returns the RAMP 403; unknown User-Agents pass through.
- Never block based on behavioral signals alone: they supplement User-Agent detection rather than replace it.
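The priority order can be sketched as a single classification function. The protected prefix `/premium/`, the signed URL parameter name `ramp_sig`, and the short pattern list are all hypothetical placeholders. Signature verification itself is out of scope and appears only as the `verify-signature` outcome.

```typescript
// Outcome of classification; "verify-signature" hands off to signed-URL checks.
type Decision = "pass" | "verify-signature" | "block-403";

interface ClassifyInput {
  path: string;
  queryParams: Record<string, string | undefined>;
  userAgent?: string;
}

// Hypothetical configuration values for illustration only.
const PROTECTED_PREFIXES = ["/premium/"];
const SIGNATURE_PARAM = "ramp_sig";
const BOT_PATTERNS = ["ClaudeBot/1.0", "GPTBot/1.0", "CCBot/2.0", "PerplexityBot"];

function classify(req: ClassifyInput): Decision {
  // 1. Open content is never blocked.
  if (!PROTECTED_PREFIXES.some((p) => req.path.startsWith(p))) return "pass";
  // 2. A signed URL skips bot detection; signature verification decides.
  if (req.queryParams[SIGNATURE_PARAM] !== undefined) return "verify-signature";
  // 3. Exact substring match on the User-Agent; unknown agents pass.
  const ua = req.userAgent ?? "";
  if (BOT_PATTERNS.some((p) => ua.includes(p))) return "block-403";
  // 4. Behavioral signals never block on their own, so nothing below
  //    this point returns "block-403".
  return "pass";
}

// Usage
console.log(classify({ path: "/free/article", queryParams: {}, userAgent: "GPTBot/1.0" })); // pass
console.log(classify({ path: "/premium/1", queryParams: { ramp_sig: "abc" }, userAgent: "GPTBot/1.0" })); // verify-signature
console.log(classify({ path: "/premium/1", queryParams: {}, userAgent: "x GPTBot/1.0 y" })); // block-403
console.log(classify({ path: "/premium/1", queryParams: {}, userAgent: "Mozilla/5.0" })); // pass
```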
False Positive Handling
A false positive (a regular user blocked as a bot) is worse than a false negative (a bot not detected). The edge function should err on the side of allowing access:
- Only block on protected paths — never block requests to open content.
- Never block requests with valid signed URLs — signature verification overrides bot detection.
- User-Agent matching is exact substring match, not fuzzy. Unknown User-Agents pass through.
- Do not block based on behavioral signals alone — they supplement User-Agent detection, not replace it.
Rate Limiting
The edge function should rate limit at two levels:
Level 1: Per-IP rate limiting on protected path 403 responses
If the same IP generates more than N 403 responses per minute on protected paths, return a synthetic 429 Too Many Requests instead of the informative 403. This prevents:
- DoS via 403 generation (each 403 includes a JSON body).
- Enumeration attacks (scanning all protected paths).
Target: 100 requests/minute per IP before throttling.
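One way to implement the Level 1 throttle is a fixed-window counter keyed by client IP, sketched below with the 100-per-minute target. The class and function names are hypothetical. The state lives in memory, so on a CDN the counts would only be approximate (each isolate or instance keeps its own), and this sketch never evicts old entries. The platform rate limiting products listed below are the more robust option. Time is passed in explicitly to keep the logic testable.

```typescript
// Fixed-window counter keyed by client IP (in-memory, per instance).
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records one event for `key` at time `now` (ms) and reports whether it is
  // still within the limit for the current window.
  hit(key: string, now: number): boolean {
    const w = this.windows.get(key);
    if (w === undefined || now - w.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= this.limit;
  }
}

// Up to 100 informative 403s per IP per minute, then a bare 429.
const forbiddenLimiter = new FixedWindowLimiter(100, 60_000);

function blockedResponseStatus(ip: string, now: number = Date.now()): 403 | 429 {
  return forbiddenLimiter.hit(ip, now) ? 403 : 429;
}

// Usage: the 101st blocked request within one minute is throttled.
let last = 0;
for (let i = 0; i < 101; i++) last = blockedResponseStatus("203.0.113.7", i);
console.log(last); // 429
console.log(blockedResponseStatus("203.0.113.7", 60_000)); // 403 (new window)
```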
Level 2: Per-IP rate limiting on ramp.json
ramp.json is cacheable, but an aggressive agent might poll it. Rate limit to 10 requests/minute per IP and rely on Cache-Control for normal operation.
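For the lower `ramp.json` limit, a sliding-window log is an affordable alternative to fixed windows. At 10 requests per minute, each IP's log holds at most ten timestamps, and there is no burst allowance at window boundaries. This is a sketch with hypothetical names. Its state is in memory, so on a CDN each instance would keep its own counts.

```typescript
// Sliding-window log: keep recent request timestamps per IP and allow a
// request only if fewer than `limit` fall within the last `windowMs`.
class SlidingWindowLog {
  private readonly log = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have aged out of the window.
    const recent = (this.log.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.log.set(key, recent);
      return false; // rejected requests are not recorded
    }
    recent.push(now);
    this.log.set(key, recent);
    return true;
  }
}

// 10 ramp.json fetches per IP per minute.
const rampJsonLimiter = new SlidingWindowLog(10, 60_000);

// Usage: 11 requests one second apart, then one after the first ages out.
const results: boolean[] = [];
for (let i = 0; i < 11; i++) results.push(rampJsonLimiter.allow("198.51.100.9", i * 1000));
console.log(results.filter(Boolean).length); // 10
console.log(rampJsonLimiter.allow("198.51.100.9", 60_500)); // true
```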
Rate limiting implementation:
- Cloudflare: Use `request.cf.botManagement` and Cloudflare Rate Limiting rules (not in Worker code; platform rate limiting is more efficient).
- CloudFront: AWS WAF rate-based rules on the distribution.
- Akamai: Rate control via Property Manager.
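- Fastly: Use `req.http.Fastly-Client-IP` with the Fastly rate limiting product. In-memory counters in a Compute service are not a reliable substitute, because each Compute request runs in a fresh instance whose memory is not shared with other requests.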
Bot Pattern Updates
The RAMP project should maintain a community-curated bot pattern list as a JSON file at a well-known URL (e.g., https://ramp-protocol.org/bot-patterns.json). Edge functions can periodically fetch updates from this list. The fetch happens on a timer (hourly), not per request.
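A sketch of the periodic update follows. The `bot-patterns.json` schema is not specified, so the `{ "patterns": [...] }` shape here is an assumption, as are the class and type names. The fetch is injected so it can run from whatever timer the platform provides, such as a scheduled job. The merge is additive to match the rule that existing entries are never removed, and a failed fetch leaves the current list in place.

```typescript
// Hypothetical shape of bot-patterns.json; the real schema is not specified.
interface BotPatternFile {
  patterns: string[];
}

type Fetcher = () => Promise<BotPatternFile>;

const REFRESH_INTERVAL_MS = 60 * 60 * 1000; // hourly

class BotPatternStore {
  private readonly patterns: Set<string>;
  private lastRefresh = Number.NEGATIVE_INFINITY;

  constructor(initial: readonly string[]) {
    this.patterns = new Set(initial);
  }

  list(): string[] {
    return Array.from(this.patterns);
  }

  // Call from a timer or scheduled job, never per request. Merges additively:
  // new patterns are added and existing ones are never removed. Returns true
  // only when a fetch ran and succeeded.
  async refresh(fetchPatterns: Fetcher, now: number = Date.now()): Promise<boolean> {
    if (now - this.lastRefresh < REFRESH_INTERVAL_MS) return false;
    try {
      const file = await fetchPatterns();
      for (const p of file.patterns) {
        if (typeof p === "string" && p.length > 0) this.patterns.add(p);
      }
      this.lastRefresh = now;
      return true;
    } catch {
      return false; // keep serving the current list
    }
  }
}

// Usage with a stubbed fetch; "ExampleNewBot" is a made-up entry.
const store = new BotPatternStore(["ClaudeBot/1.0", "GPTBot/1.0"]);
const fakeFetch: Fetcher = async () => ({ patterns: ["GPTBot/1.0", "ExampleNewBot"] });

store.refresh(fakeFetch, 0).then((refreshed) => {
  console.log(refreshed, store.list().join(", ")); // true ClaudeBot/1.0, GPTBot/1.0, ExampleNewBot
});
```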