
Fix Windows logging tests and guard against runtime panic #6105

Open
EvanDorsky wants to merge 1 commit into
viamrobotics:mainfrom
EvanDorsky:winio-fix

Conversation

@EvanDorsky

Member

Fix for CI failure: https://github.com/viamrobotics/rdk/actions/runs/27287348888/job/80598208096

2026-06-10T15:41:09.5634785Z panic: runtime error: invalid memory address or nil pointer dereference
2026-06-10T15:41:09.5635127Z [signal 0xc0000005 code=0x1 addr=0x40 pc=0x7ff7ee2a0f44]
2026-06-10T15:41:09.5635307Z
2026-06-10T15:41:09.5635396Z goroutine 17 [running, locked to thread]:
2026-06-10T15:41:09.5635947Z github.com/Microsoft/go-winio/pkg/etw.providerCallback({0x0, 0x0, 0x0, {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}}, ...)
2026-06-10T15:41:09.5636595Z 	C:/Users/runneradmin/go/pkg/mod/github.com/!microsoft/go-winio@v0.6.2/pkg/etw/provider.go:84 +0x44
2026-06-10T15:41:09.5637136Z github.com/Microsoft/go-winio/pkg/etw.providerCallbackAdapter(0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
2026-06-10T15:41:09.5637800Z 	C:/Users/runneradmin/go/pkg/mod/github.com/!microsoft/go-winio@v0.6.2/pkg/etw/wrapper_64.go:57 +0x36

Background

ETW logging operates with a provider / session model, where a provider is like a publisher and a session is like a subscriber in a pub/sub model.

viam-server now calls out to logman to start a logging session when it opens, and to kill that session before it exits. Provider registration is handled by the go-winio library.
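As a rough illustration of the session half of this model, the sketch below builds `logman` command lines for starting and stopping an event trace session and runs them with `os/exec`. The session name, provider GUID, output path, and helper names are all hypothetical placeholders; the exact arguments viam-server passes to `logman` are not shown in this PR description.

```go
package main

import (
	"fmt"
	"os/exec"
)

// Hypothetical placeholders, not the values viam-server uses.
const (
	sessionName  = "example-session"
	providerGUID = "{00000000-0000-0000-0000-000000000000}"
)

// startSessionArgs builds arguments for `logman start`, which starts an
// event trace session (-ets) that subscribes to a single provider (-p)
// and writes events to the given output file (-o).
func startSessionArgs(name, guid, outPath string) []string {
	return []string{"start", name, "-p", guid, "-o", outPath, "-ets"}
}

// stopSessionArgs builds arguments for `logman stop`, which ends the
// named event trace session.
func stopSessionArgs(name string) []string {
	return []string{"stop", name, "-ets"}
}

// runLogman shells out to logman and includes its combined output in
// any returned error. It only works on Windows, where logman exists.
func runLogman(args []string) error {
	out, err := exec.Command("logman", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("logman %v failed: %w: %s", args, err, out)
	}
	return nil
}

func main() {
	// Print the command lines instead of running them, so the sketch
	// is harmless on machines without logman.
	fmt.Println("logman", startSessionArgs(sessionName, providerGUID, "example.etl"))
	fmt.Println("logman", stopSessionArgs(sessionName))
}
```

Keeping argument construction separate from execution makes the command lines easy to check in tests without a Windows runner.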

There is a bug in the go-winio library that can cause a nil pointer dereference if a logging session is still active after a provider has been closed. This scenario shouldn't even be reachable in prod because:

  • providers are killed by Windows when the process that owns them exits
  • viam-server kills its own session when it exits
  • viam-server explicitly kills its provider AFTER killing the session, before exiting (ensuring that no sessions should be active when closing a provider)
  • when viam-server starts, it checks for existing sessions and kills them before trying to start its own

In CI, multiple logging providers are created in parallel, all with the same name and ID. Multiple loggers also try to create logging sessions with the same name. So if test A and test B start in sequence, and each test starts a logging provider and a logging session, the session from test B can still be active while the provider from test A is being closed.

None of this would be a problem, though, if not for the bug PLUS a quirk of how providers and sessions interact. When a session closes, it sends a notification to all active providers informing them of the state change. This notification is caught by a callback in go-winio, and that callback assumes that the provider is non-nil:

func providerCallback(...) {
	provider := providers.getProvider(uint(i)) // returns nil if the provider has already been killed

	switch state {
	case ProviderStateCaptureState:
	case ProviderStateDisable:
		provider.enabled = false // causes nil pointer dereference
	case ProviderStateEnable:
		provider.enabled = true
		provider.level = level
		provider.keywordAny = matchAnyKeyword
		provider.keywordAll = matchAllKeyword
	}

	// nil check happens too late
	if provider.callback != nil {
		provider.callback(sourceID, state, level, matchAnyKeyword, matchAllKeyword, filterData)
	}
}

permalink to go-winio code

So if the callback fires after the provider has already been closed (which makes it nil), then we get the nil pointer panic at runtime.
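The failure mode can be reproduced in miniature. The sketch below is a simplified, self-contained model, not go-winio's actual code and not the change made in this branch: the `provider`, `registry`, and `guardedCallback` names are hypothetical. It shows a lookup that returns nil once a provider has been removed, and a callback that checks for nil before touching any fields.

```go
package main

import (
	"fmt"
	"sync"
)

// provider is a simplified stand-in for an ETW provider (hypothetical).
type provider struct {
	enabled bool
}

// registry is a hypothetical model of a table of live providers,
// indexed the way the callback looks them up.
type registry struct {
	mu        sync.Mutex
	providers map[uint]*provider
}

// get returns the provider at index i, or nil if it was removed.
func (r *registry) get(i uint) *provider {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.providers[i]
}

// remove models closing a provider: later lookups return nil.
func (r *registry) remove(i uint) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.providers, i)
}

// guardedCallback models a state-change notification. It checks for a
// nil provider before touching its fields and reports whether the
// notification was applied.
func guardedCallback(r *registry, i uint, enable bool) bool {
	p := r.get(i)
	if p == nil {
		// The provider was already closed; ignore the notification.
		return false
	}
	p.enabled = enable
	return true
}

func main() {
	r := &registry{providers: map[uint]*provider{0: {}}}

	fmt.Println(guardedCallback(r, 0, true)) // provider present

	r.remove(0) // provider closed while a session is still active

	fmt.Println(guardedCallback(r, 0, false)) // provider gone, no panic
}
```

Without the early nil check, the second call would dereference a nil pointer, which is the same shape of panic seen in the CI log above.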

Fix

The fix (suggested by Claude) seems to be to just never explicitly call provider.Close. Windows already unregisters providers on its own, and if we never call provider.Close then the provider will never be nil when the callback fires.

So this branch just removes the calls to provider.Close and also changes the tests a little bit so they don't unnecessarily leak sessions.

@EvanDorsky EvanDorsky requested a review from dgottlieb June 11, 2026 17:30
@viambot viambot added the safe to test label Jun 11, 2026
@EvanDorsky

Member Author

Hm, macOS tests are failing because of bad dependencies? Did something happen to our runners? @cheukt

From this PR's CI run just now:

You have 17 outdated formulae and 3 outdated casks installed.
You can upgrade them with brew upgrade
or list them with brew outdated.
Warning: Skipping aws/tap because it is not trusted. Run `brew trust aws/tap` to trust it.
Warning: Skipping azure/bicep because it is not trusted. Run `brew trust azure/bicep` to trust it.
Warning: Skipping hashicorp/tap because it is not trusted. Run `brew trust hashicorp/tap` to trust it.
Warning: Skipping aws/tap because it is not trusted. Run `brew trust aws/tap` to trust it.
Warning: Skipping hashicorp/tap because it is not trusted. Run `brew trust hashicorp/tap` to trust it.
==> Tapping viamrobotics/brews
Cloning into '/opt/homebrew/Library/Taps/viamrobotics/homebrew-brews'...
Tapped 15 formulae (37 files, 577.9KB).
Error: Refusing to load formula viamrobotics/brews/nlopt-static from untrusted tap viamrobotics/brews.
Run `brew trust --formula viamrobotics/brews/nlopt-static` or `brew trust viamrobotics/brews` to trust it.
Error: Process completed with exit code 1.

@EvanDorsky EvanDorsky added and removed the safe to test label Jun 12, 2026

@dgottlieb dgottlieb left a comment

Member

lgtm -- though I expect you'll look into how modules are working.

If there turns out to be a problem with modules, I expect the fix could be more involved, and you'll want to open a separate PR.

But happy to re-review here if that time comes also.


Labels

safe to test This pull request is marked safe to test from a trusted zone


3 participants