feat(electric-telemetry): add process_subtype attribute for supervisor/erlang/logger_olp granularity #4397
erik-the-implementer wants to merge 2 commits into
Adds a new low-cardinality `process_subtype` attribute alongside the
existing `process_type` on all telemetry events that today carry it
(`vm.monitor.long_{gc,schedule,message_queue}`, `process.memory`,
`process.bin_memory`).
For the three coarse `process_type` buckets that previously hid most
of the signal during overload, `process_subtype` is derived as:
* `:supervisor` -> registered name, else first atom in $ancestors
* `:erlang` -> registered name, else initial_call MFA string
* `:logger_olp` -> registered name (handler id)
For every other `process_type` value, `process_subtype` is `nil`.
The existing `process_type` taxonomy is unchanged, so Honeycomb boards
and alerts that group by it continue to work; `process_subtype` adds
a finer-grained drill-down without exploding cardinality.
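
The derivation rules above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the module name `SubtypeSketch` and the function `subtype/2` are hypothetical, the process type is assumed to be already known, and the MFA string format is an assumption that merely reproduces the `":erlang.apply/2"` shape mentioned in the description.

```elixir
defmodule SubtypeSketch do
  # Hypothetical module for illustration; not the PR's ElectricTelemetry.Processes.
  # `info` is the keyword list returned by
  # Process.info(pid, [:registered_name, :dictionary, :initial_call]),
  # or nil when the process has already exited.

  @coarse_types [:supervisor, :erlang, :logger_olp]

  # A dead process has no info, so there is nothing to derive.
  def subtype(_type, nil), do: nil

  def subtype(type, info) when type in @coarse_types do
    case Keyword.get(info, :registered_name) do
      # In list form, process info reports an unregistered process as [].
      name when is_atom(name) and not is_nil(name) -> name
      _ -> fallback(type, info)
    end
  end

  # Every other process type gets no subtype.
  def subtype(_other_type, _info), do: nil

  # Supervisors: first atom (a registered ancestor name) in $ancestors.
  defp fallback(:supervisor, info) do
    dictionary = Keyword.get(info, :dictionary, [])

    case List.keyfind(dictionary, :"$ancestors", 0) do
      {_, ancestors} when is_list(ancestors) -> Enum.find(ancestors, &is_atom/1)
      _ -> nil
    end
  end

  # Plain Erlang processes: the initial call rendered as an MFA string.
  defp fallback(:erlang, info) do
    case Keyword.get(info, :initial_call) do
      {m, f, a} -> "#{inspect(m)}.#{f}/#{a}"
      _ -> nil
    end
  end

  # logger_olp processes: registered name only, so no fallback.
  defp fallback(:logger_olp, _info), do: nil
end
```

Under these assumptions, a caller would fetch `Process.info(pid, [:registered_name, :dictionary, :initial_call])` once and pass the result, together with the already-computed `process_type`, to `SubtypeSketch.subtype/2`. Only registered names and MFA strings can come out, which is what keeps the attribute low-cardinality.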
Refs electric-sql/alco-agent-tasks#46.
Codecov Report
❌ Patch coverage is …

Coverage Diff

| | main | #4397 | +/- |
|---|---|---|---|
| Coverage | 59.46% | 59.66% | +0.20% |
| Files | 304 | 319 | +15 |
| Lines | 30626 | 31224 | +598 |
| Branches | 8335 | 8334 | -1 |
| Hits | 18211 | 18630 | +419 |
| Misses | 12397 | 12576 | +179 |
| Partials | 18 | 18 | |
Claude Code Review

Summary
Adds a …

What's Working Well
…

Issues Found

Critical (Must Fix)
None.

Important (Should Fix)
None blocking.

Suggestions (Nice to Have)
1. … File: …
2. Stale subtype on … File: …
3. … File: … For any unnamed …
4. Cardinality note for closure MFAs. File: …
5. No test for … File: …
6. Reporter behavior with … File: … Worth a manual sanity check: how do the configured reporters ( …

Issue Conformance
No linked issue. The PR description references "recent investigations into long-mailbox spikes" but no tracking ticket. Per project guidelines, PRs should reference an issue (even a short one capturing the original observability gap). This is flagged as a minor process note, not a blocker. The PR description itself is detailed and adequate as a self-contained spec.

Previous Review Status
First review, no prior context.

Review iteration: 1 | 2026-05-22
Claude Code Review

Summary
Adds a new …

What's Working Well
…

Issues Found

Critical (Must Fix)
None.

Important (Should Fix)
Missing linked issue
Per project review convention (Phase 2.7), PRs should reference the issue they address. The PR description mentions "recent investigations into long-mailbox spikes" but doesn't link an issue. Worth adding a tracking issue (or linking the relevant incident/investigation) so future readers can find the motivation.

Suggestions (Nice to Have)
Test gap: dead process behaviour
File: … Existing tests cover …
Summary
Adds a new low-cardinality `process_subtype` attribute alongside the existing `process_type` on all `electric-telemetry` events that today carry it: `vm.monitor.long_gc`, `vm.monitor.long_schedule`, `vm.monitor.long_message_queue`, `process.memory`, `process.bin_memory`.

For the three coarse `process_type` buckets that hide the most signal during overload (per recent investigations into long-mailbox spikes), `process_subtype` is derived from cheap process introspection:
* `process_type = "supervisor"` → registered name; else first atom in `$ancestors`; else `nil`.
* `process_type = "erlang"` → registered name (catches named VM helpers like `:erts_dirty_process_signal_handler`); else `initial_call` MFA string (e.g. `":erlang.apply/2"`).
* `process_type = "logger_olp"` → registered name (the handler id: `default`, `otel_log_handler`, `logger_proxy`, …).

For all other `process_type` values, `process_subtype` is `nil`.

The change is purely additive: `process_type` values are unchanged, so existing Honeycomb boards and alerts that group by `process_type` continue to work. `process_subtype` gives a drill-down dimension without exploding cardinality (registered names + MFAs only; no pids, no dynamic registry tuples).

Implementation notes
* `ElectricTelemetry.Processes.proc_type_and_subtype/1` returns `{type, subtype}` in a single `Process.info/2` call; `proc_subtype/1` is also exported for callers that only want the subtype.
* `Process.info/2` now also fetches `:registered_name` (one extra key per call).
* `sorted_groups/2` groups by `{type, subtype}` so the `process.memory` / `process.bin_memory` metrics break down by subtype as well.
* `:process_subtype` is added to the `tags:` lists of the affected `last_value`, `sum`, and `distribution` metric definitions.
* `proc_type/1` is kept unchanged for backward compatibility.

Test plan
* … (`packages/electric-telemetry/test/electric/telemetry/processes_test.exs`).
* `proc_type/1` tests unchanged and still pass (121/121 tests pass in `electric-telemetry`).
* `@core/electric-telemetry`: minor.

🤖 Generated with Claude Code
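
To illustrate the `{type, subtype}` grouping mentioned in the implementation notes, here is a small sketch of how per-process memory samples might be summed per group and sorted. It is an assumption-laden illustration: `GroupingSketch`, `memory_by_group/1`, and the `{type, subtype, bytes}` sample shape are all hypothetical, and the PR's actual `sorted_groups/2` may take different arguments and return a different structure.

```elixir
defmodule GroupingSketch do
  # Hypothetical helper for illustration; not the PR's sorted_groups/2.
  # Each sample is assumed to be a {type, subtype, bytes} tuple.
  def memory_by_group(samples) do
    samples
    # Bucket the byte counts by the composite {type, subtype} key.
    |> Enum.group_by(
      fn {type, subtype, _bytes} -> {type, subtype} end,
      fn {_type, _subtype, bytes} -> bytes end
    )
    # Sum each bucket into a single measurement with both tags attached.
    |> Enum.map(fn {{type, subtype}, bytes} ->
      %{type: type, subtype: subtype, memory: Enum.sum(bytes)}
    end)
    # Largest consumers first.
    |> Enum.sort_by(& &1.memory, :desc)
  end
end
```

Because a process type with no subtype carries `nil`, it still forms exactly one group per type, so grouping by the pair yields the same groups as before for every type outside the three coarse buckets.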