Fix possible 100% CPU loop in CivetWeb#2882
Conversation
…connections Signed-off-by: Dominik <dl6er@dl6er.de>
|
TODO: Need to submit a PR upstream |
There was a problem hiding this comment.
Pull request overview
This pull request introduces a small configurable backoff for non-blocking mbedTLS operations in CivetWeb to prevent tight retry loops that can otherwise drive a worker thread to 100% CPU on idle/keep-alive HTTPS connections.
Changes:
- Define
MG_MBEDTLS_WANT_RETRY_DELAY_MS(default: 5ms) as a tunable backoff interval. - Sleep briefly in the mbedTLS handshake loop when
WANT_READ/WRITE(or async-in-progress) is returned. - Sleep briefly in the mbedTLS read path when
WANT_READ/WRITE(or async-in-progress) is returned after a poll-readability event.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/webserver/civetweb/mod_mbedtls.inl |
Adds a tiny sleep in the mbedTLS handshake retry loop to avoid spinning on non-blocking sockets. |
src/webserver/civetweb/civetweb.c |
Introduces the backoff macro and applies it in the mbedTLS read path when WANT_* is returned. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
yubiuser
left a comment
There was a problem hiding this comment.
Should such a change not better be a patch at https://github.com/pi-hole/FTL/tree/master/patch/civetweb
|
I think this is the intention:
|
|
FWIW, I think I maybe ran into this after issuing a request to the admin UI (clean session; was just going to add a blocklist entry). I saw a single pihole-FTL thread spike to, and stay at, 100% cpu use. From
fd35 = 0.0.0.0:80 If I run into it again I can try to get some more useful info via gdb or something, but it's just my home network so I just dumped what info I could and HUP'd the process |
|
Sorry for the long delay in replying - real life has been really busy lately. Thanks for the detailed dump - that's genuinely useful. One thing stands out though: the strace shows the spinning thread sitting in a tight So as it stands this looks like it may be a related but distinct loop rather than the exact one this PR targets — possibly the accept loop or the worker queue getting wedged. It's plausible the two are connected (e.g. workers stuck spinning never return to the pool and starve the accept side), but the trace only shows one hot thread and it's the master on the listeners, so I can't tie it to the TLS path from this alone. If you hit it again, the one thing that would nail it down is a backtrace of the 100%-CPU thread under gdb: (or at least Appreciate you grabbing what you could before HUP'ing it! |
What does this implement/fix?
Try tiny backoff to avoid tight retry loops on idle HTTPS keep-alive connections
Related issue or feature (if applicable): N/A
Pull request in docs with documentation (if applicable): N/A
By submitting this pull request, I confirm the following:
git rebase)Checklist:
developmentbranch.