Disallow /live/longpoll in the generated robots.txt by patrols · Pull Request #6699 · phoenixframework/phoenix

patrols · 2026-06-07T00:37:09Z

New LiveView apps enable a longpoll fallback by default, but the generated
robots.txt is empty — so search engine crawlers spend crawl budget polling a
transport endpoint that has no indexable content.

The chain:

1. The generated endpoint mounts socket "/live" with longpoll: enabled, and the generated app.js sets longPollFallbackMs: 2500.
2. Googlebot's renderer doesn't open WebSockets, so on every render it falls back to longpoll and fetches /live/longpoll.
3. That endpoint is transport-only: it returns serialized socket messages, not HTML. Every fetch is wasted crawl budget.

I ran into this on a production LiveView site.

This PR adds Disallow: /live/longpoll to the scaffolded robots.txt, and documents
the behavior on longPollFallbackMs in Phoenix.Socket.

Why robots.txt is the right lever

The renderer honors robots.txt for the resources it fetches during rendering
(JS/CSS/XHR), so the Disallow actually stops the longpoll fetch — it doesn't merely
deindex it.
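As a rough illustration of how a robots.txt-honoring client would treat the rule, the sketch below uses Python's standard urllib.robotparser. It is not Googlebot's matcher. The "User-agent: *" line, the query string, and the /posts/1 path are assumptions added for the example; only Disallow: /live/longpoll comes from this PR.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of the scaffolded robots.txt after this PR. The
# "User-agent: *" line is an assumption so the rule belongs to a group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /live/longpoll
"""


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # /posts/1 is a hypothetical content page; the query string is illustrative.
    for path in ["/", "/posts/1", "/live/longpoll", "/live/longpoll?vsn=2.0.0"]:
        print(path, "allowed" if allowed(path) else "blocked")
```

The rule is a prefix match, so it covers the longpoll path with or without a query string and leaves ordinary page URLs fetchable.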

It also can't hide content. LiveView's disconnected ("dead") render already emits the
page's server-rendered HTML; the socket is for interactivity, not first-paint content.
So for apps that render in mount/render (the disconnected pass), nothing indexable
lives behind the socket. The one exception is an app that gates indexable content behind
connected?(socket) — it serves that content only over the socket, and this rule removes
the renderer's last path to it (WebSocket was already unavailable). That's a discouraged
pattern for SEO and such content indexes poorly today regardless, but it's worth calling
out.

Scope (a deliberate default)

/live/longpoll is correct for the default socket mount (Phoenix.LiveView.Socket at
/live). An app that remounts the socket, runs multiple sockets, or uses a plain
Phoenix.Socket with longpoll would need to adjust the rule. Since this is a scaffolded
file that bakes in the generator's own defaults, I kept it as a single unconditional line
rather than templating it: it matches the existing "one shared static robots.txt"
approach, and it's a harmless no-op for --no-html/--no-live apps where the /live
socket is commented out anyway.
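For apps that do deviate from the default mount, adjusting the rule could look like the sketch below. The helper longpoll_disallow_rules is hypothetical and not part of Phoenix or this PR, "/socket" is a made-up second mount, and the sketch assumes each socket keeps the default "<mount>/longpoll" transport path.

```python
def longpoll_disallow_rules(socket_mounts):
    """Build robots.txt text that disallows each socket's longpoll path.

    Hypothetical helper: assumes every socket uses the default
    "<mount>/longpoll" transport path.
    """
    lines = ["User-agent: *"]
    for mount in socket_mounts:
        # Strip a trailing slash so "/live/" and "/live" yield the same rule.
        lines.append(f"Disallow: {mount.rstrip('/')}/longpoll")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "/live" is the generator default; "/socket" is an illustrative extra mount.
    print(longpoll_disallow_rules(["/live", "/socket"]))
```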

The two commits are split so the Phoenix.Socket doc change can stand on its own if you'd
rather take them separately.

Refs

- Googlebot doesn't support WebSockets; provide an HTTP fallback: https://developers.google.com/search/docs/crawling-indexing/javascript/fix-search-javascript
- robots.txt governs the resources the renderer fetches: https://developers.google.com/search/blog/2024/12/crawling-december-resources

Clients that can't open a WebSocket fall back to the LongPoll transport, and search engine crawlers are in that group: their renderers don't open WebSockets. With longPollFallbackMs set (the default in generated apps), they fall back and repeatedly fetch /live/longpoll while rendering each page. That endpoint serves no indexable content, so the requests are wasted crawl budget. Add the rule to the scaffolded robots.txt and assert it in the installer test so a future template edit can't silently drop it.

Clients without WebSocket support fall back to LongPoll, and search engine crawlers are the common case: their renderers don't open WebSockets and repeatedly request the LongPoll endpoint (/live/longpoll for LiveView), which serves no indexable content. Note this next to the option that enables the fallback so the robots.txt advice has a home.

SteffenDE · 2026-06-08T08:16:54Z

I'm not yet convinced that this is the default we should generate. If a crawler opens the longpoll connection, it is because it's executing the JavaScript. So letting it do that seems just fine to me. As long as we don't see crawls failing due to this?

patrols · 2026-06-08T10:15:16Z

@SteffenDE Fair pushback. I pulled my site's crawl stats before replying, since my first instinct was the same as yours.

You're right the longpoll fetch means the renderer is running our JS, and we want that. But Disallow: /live/longpoll doesn't stop it. The dead render already emits the full HTML before any socket exists, so Googlebot still runs the JS, renders, and indexes. It just skips the one request that returns transport frames instead of content.

And it adds up. With longPollFallbackMs: 2500 it keeps polling on every render. In my Search Console, 569 of 684 sampled JSON URLs (83%) were /live/longpoll, and JSON was 66% of all Googlebot requests vs 26% for real HTML pages. It's the single biggest thing Googlebot does on the site.
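The arithmetic behind those figures can be checked with a short sketch. The last step is an estimate, not a reported number: it assumes the 684 sampled JSON URLs are representative of all JSON requests.

```python
# Figures quoted from Search Console above.
longpoll_sampled = 569         # sampled JSON URLs that were /live/longpoll
json_sampled = 684             # all sampled JSON URLs
json_share_of_requests = 0.66  # JSON as a share of all Googlebot requests
html_share_of_requests = 0.26  # HTML pages as a share of all Googlebot requests

# Share of sampled JSON URLs that hit the longpoll endpoint.
longpoll_share_of_json = longpoll_sampled / json_sampled

# Estimate only: assumes the sample is representative of all JSON requests.
est_longpoll_share = json_share_of_requests * longpoll_share_of_json

print(f"longpoll share of sampled JSON: {longpoll_share_of_json:.0%}")
print(f"estimated longpoll share of all requests: {est_longpoll_share:.0%}")
print(f"HTML share of all requests: {html_share_of_requests:.0%}")
```

Under that assumption, longpoll works out to roughly 55% of all Googlebot requests, about twice the share that goes to HTML pages.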

On "as long as we don't see crawls failing": nothing does. 94% are clean 200s, which is why it's easy to miss. The cost isn't errors, it's crawl budget spent on a transport endpoint instead of pages.

Either way the commits are split on purpose, so if you'd rather not touch the default I'm happy to keep just the Phoenix.Socket doc note.

patrols added 2 commits June 6, 2026 22:57

patrols changed the title ~~Robots disallow live longpoll~~ Disallow /live/longpoll in the generated robots.txt Jun 7, 2026

patrols wants to merge 2 commits into phoenixframework:main from patrols:robots-disallow-live-longpoll
