Operator SBOM watcher dispatches `TypeScanImages` commands with empty `Wlid` when SBOM event beats Pod event, vuln results lose workload attribution

# Description

The operator's `SBOMWatcher` correlates incoming `SBOMSyft` events with their owning workload by looking up the image hash in an in-memory map (`ImageToContainerData`) that is populated by Pod events from a separate informer channel. The two channels are merged in a `select`:

```go
// operator/watcher/sbomwatcher.go:81-104 (excerpt)
for {
    select {
    // FIXME select processes the events randomly, so we might see the SBOM event before the pod event
    case event := <-wh.eventQueue.ResultChan: // pod events fill ImageToContainerData
        ...
        for _, containerStatus := range containerStatuses {
            hash := hashFromImageID(containerStatus.ImageID)
            wh.ImageToContainerData.Set(hash, utils.ContainerData{
                ContainerName: containerStatus.Name,
                Wlid:          wlid,
            })
        }
    case sbomEvent, ok := <-sbomEvents:
        if ok { eventQueue.Enqueue(sbomEvent) } else { ... }
    ...
}
```
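The FIXME rests on a Go language rule: when several `select` cases are ready at the same time, one is chosen uniformly at random (Go spec, "Select statements"). The standalone sketch below shows this with two hypothetical buffered channels. It does not use the operator's types, and in the real loop the SBOM case enqueues into a second queue rather than being handled inline.

```go
package main

import "fmt"

// raceOnce models one iteration of a select loop in which a pod event and an
// SBOM event are both already waiting. Go picks uniformly at random among
// ready cases, so either event may be handled first.
func raceOnce() string {
	podEvents := make(chan string, 1)
	sbomEvents := make(chan string, 1)
	podEvents <- "pod"
	sbomEvents <- "sbom"

	select {
	case e := <-podEvents:
		return e
	case e := <-sbomEvents:
		return e
	}
}

// countFirst runs raceOnce n times and tallies which event won each time.
func countFirst(n int) map[string]int {
	counts := map[string]int{}
	for i := 0; i < n; i++ {
		counts[raceOnce()]++
	}
	return counts
}

func main() {
	fmt.Println(countFirst(1000))
}
```

Over 1000 iterations both orderings show up, so code that assumes the pod event is processed first is relying on luck.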

Two problems compose:

1. **Initial list races with pod cache fill.** Before the `select` loop starts, the watcher pages through the *full existing* SBOM list and enqueues each item as `watch.Added` (`sbomwatcher.go:65-77`). The race is sharper than the FIXME at line 82 suggests, because `HandleSBOMEvents` runs in its own goroutine (started at `sbomwatcher.go:54`, *before* the SBOM list call) and consumes from a separate cooldown queue. The `ImageToContainerData` map it reads is populated only by the main `select` loop as it drains pod events from `wh.eventQueue` (lines 83-101). So although `listPods` is invoked synchronously at line 45 when `ServiceDiscovery.Enabled == true`, it only **enqueues** pod events; it does not populate the map. The map is filled lazily by the select loop, which has not yet started when the SBOM goroutine begins servicing events.

   Two concrete failure paths:
   - With `ServiceDiscovery.Enabled == true`: a race between the SBOM-handling goroutine and the select loop draining the queued pod events; whichever wins decides whether `Wlid` is empty.
   - With `ServiceDiscovery.Enabled == false` (line 43): `listPods` never runs at all, so `ImageToContainerData` is never populated and **every** SBOM dispatch goes out with `Wlid=""`.

2. **`validateContainerData` does not check `Wlid`.** When `HandleSBOMEvents` processes an SBOM whose image hash isn't in `ImageToContainerData` yet, `imageContainerData` is the zero-value struct and `containerData.Wlid == ""`. The validation step only rejects empty `ImageID`/`ImageTag`:

   ```go
   // sbomwatcher.go:221-229
   func validateContainerData(containerData *utils.ContainerData) error {
       if containerData.ImageID == "" { return ErrMissingImageID }
       if containerData.ImageTag == "" { return ErrMissingImageTag }
       return nil
   }
   ```

   So validation passes, and a `TypeScanImages` command is dispatched with `Wlid = ""`:

   ```go
   // sbomwatcher.go:181-191
   cmd := &apis.Command{
       Wlid:        containerData.Wlid,          // = ""
       CommandName: apis.TypeScanImages,
       Args: map[string]interface{}{
           utils.ArgsContainerData: containerData,
       },
   }
   logger.L().Info("scanning SBOM", helpers.String("wlid", cmd.Wlid), ...)
   producedCommands <- cmd
   ```

   The downstream `kubevuln` HTTP scan endpoint (`controllers/http.go:140-175`) accepts the empty `Wlid`: `ScanService.ValidateScanCVE` (`kubevuln/core/services/scan.go:857-881`) checks only `ImageHash` and `ImageSlug`, so the command passes validation and the scan executes. Then, at `kubevuln/core/services/scan.go:499-507`, the platform CVE submission is gated on `if workload.Wlid != ""`, so the manifest is computed but **silently never submitted** to the backend. End-to-end effect: the operator dispatches a malformed scan, kubevuln spends CPU and network scanning the image, and the result is silently dropped from the platform submission path. There is no retry path: once the SBOM event has been drained from the channel it is gone, and nothing along the way logs above `info`.

   The presence of the `if workload.Wlid != ""` guard in kubevuln suggests the empty-`Wlid` case was already a known-bad input on the consumer side; it was patched defensively downstream rather than rejected at the source.

# Environment
- Repo: `kubescape/operator` at `main` HEAD
- Go: 1.25.0 (per `go.mod`)
- Tests: `go test ./...` in `operator/` — passes (the race is not covered by an existing test).

# Steps To Reproduce

This is hard to script in a unit test today because `HandleSBOMEvents` and the `select` loop that fills `ImageToContainerData` share that private map, and there is no seam for controlling their order. Conceptually:

1. Apply a `SBOMSyft` object to storage with valid `ImageID`/`ImageTag` annotations but **before** the corresponding Pod is observed by the operator. For example, delete the operator pod while leaving SBOMs and workload pods alive, then restart only the operator: the initial SBOM list runs immediately while the pod informer is still warming up.
2. Observe `scanning SBOM` log lines with `wlid=""`.
3. Inspect the vuln report: the affected entries have no workload attribution.

A minimal reproducing unit test could be added by exposing a test seam (e.g. injecting the initial SBOM list and the pod informer as separate channels through a constructor).

# Expected behavior

Either:

- **Wait for the Wlid.** If the image hash is not yet in `ImageToContainerData`, the SBOM event should be parked (re-enqueued with backoff, or buffered until the corresponding pod event arrives) rather than dispatched with an empty Wlid. The map can also be pre-populated by listing pods *before* the SBOM list call.

- **Reject explicitly.** If parking is too complex, `validateContainerData` should treat empty `Wlid` as an error — at least surfacing the problem as `ErrMissingWlid` instead of dispatching a malformed command. This trades a silent attribution loss for an explicit, retryable failure.

In either case the FIXME at line 82 should be resolved, not just annotated.

# Actual Behavior

- On operator startup, the full existing SBOM list is enqueued before the pod cache has filled. Affected entries are dispatched with `Wlid=""`.
- When `ServiceDiscovery.Enabled == false`, `listPods` is never called and **every** SBOM dispatch goes out with `Wlid=""`.
- `validateContainerData` accepts the empty-`Wlid` case as valid.
- A vuln scan command is sent over the worker pool with the malformed identifier.
- `kubevuln`'s `ValidateScanCVE` does not check `Wlid` either, so the scan executes.
- At `kubevuln/core/services/scan.go:500` the platform CVE submission is gated on `if workload.Wlid != ""` and silently no-ops for empty `Wlid` — the scan result is computed but never submitted to the backend.
- No log line above `info` is emitted that would let an operator know correlation was lost or that the submission was skipped.
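The kubevuln half of the chain fits in a few lines. This is a hypothetical simplification of the behavior listed above (`ValidateScanCVE` checks only `ImageHash`/`ImageSlug`; submission is gated on a non-empty `Wlid`), not kubevuln's code; `ScanCommand` and `scanAndSubmit` are invented names.

```go
package main

import (
	"errors"
	"fmt"
)

// ScanCommand is a hypothetical, reduced view of a scan request in kubevuln.
type ScanCommand struct {
	ImageHash string
	ImageSlug string
	Wlid      string
}

// validateScanCVE models the documented checks: Wlid is not inspected.
func validateScanCVE(c ScanCommand) error {
	if c.ImageHash == "" {
		return errors.New("missing image hash")
	}
	if c.ImageSlug == "" {
		return errors.New("missing image slug")
	}
	return nil
}

// scanAndSubmit reports whether the scan ran and whether its result was
// submitted. Skipping submission returns no error and logs nothing.
func scanAndSubmit(c ScanCommand) (bool, bool, error) {
	if err := validateScanCVE(c); err != nil {
		return false, false, err
	}
	scanned := true // the expensive image scan would run here
	submitted := false
	if c.Wlid != "" {
		submitted = true // platform CVE submission, gated as in scan.go:499-507
	}
	return scanned, submitted, nil
}

func main() {
	scanned, submitted, err := scanAndSubmit(ScanCommand{ImageHash: "sha256:abc", ImageSlug: "nginx-abc"})
	fmt.Println(scanned, submitted, err) // true false <nil>
}
```

The `true false <nil>` result is the whole problem in one line: work was done, nothing was submitted, and no error surfaced.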

# Suggested fix

Minimum, low-risk change in `operator/watcher/sbomwatcher.go`:

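```go
// sbomwatcher.go:221
// ErrMissingWlid is a new (proposed) sentinel, declared alongside
// ErrMissingImageID and ErrMissingImageTag; the message is a placeholder.
var ErrMissingWlid = errors.New("missing wlid")

func validateContainerData(containerData *utils.ContainerData) error {
    if containerData.ImageID == "" {
        return ErrMissingImageID
    }
    if containerData.ImageTag == "" {
        return ErrMissingImageTag
    }
    // Reject commands that would reach kubevuln without workload attribution.
    if containerData.Wlid == "" {
        return ErrMissingWlid
    }
    return nil
}
```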

Combine this with re-queueing the SBOM event on `ErrMissingWlid` (with a small TTL/backoff) so that an SBOM arriving slightly too early does not drop the scan permanently. Pseudocode in `HandleSBOMEvents`:

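```go
// deferredSBOMs is a new, hypothetical delay queue; e is the SBOM event being handled.
if err := validateContainerData(containerData); err != nil {
    if errors.Is(err, ErrMissingWlid) {
        // Park the event and retry after a short delay instead of dispatching it.
        wh.deferredSBOMs.Enqueue(e, time.Now().Add(5*time.Second))
        continue
    }
    errorCh <- err
    continue
}
```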

A small worker drains `deferredSBOMs` on a timer and re-enqueues into the main queue, with a max-retries cap.

A better long-term fix: order the startup explicitly. List pods first, populate `ImageToContainerData` synchronously, *then* call the initial SBOM `EachListItem`. The current code does the SBOM list (`sbomwatcher.go:65`) without waiting on any pod-informer sync signal.
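The ordering can be sketched as a single function, assuming hypothetical `Pod`/`SBOM` stand-ins rather than the real Kubernetes and storage types. In the operator, step 1 would correspond to a synchronous pod list (or waiting for the pod informer to sync) that writes to `ImageToContainerData` directly instead of enqueueing events.

```go
package main

import "fmt"

// ContainerData keeps only the fields relevant to attribution.
type ContainerData struct {
	ContainerName string
	Wlid          string
}

// Pod and SBOM are hypothetical, minimal stand-ins for the listed objects.
type Pod struct {
	Wlid       string
	Containers map[string]string // container name -> image hash
}

type SBOM struct {
	Name      string
	ImageHash string
}

// startup fills the map from the pod list first, then walks the initial SBOM
// list. It returns SBOM name -> Wlid for resolved entries and the names of
// SBOMs that still have no owning pod.
func startup(pods []Pod, sboms []SBOM) (map[string]string, []string) {
	imageToContainer := map[string]ContainerData{}
	for _, p := range pods { // step 1: populate directly, no queue in between
		for name, hash := range p.Containers {
			imageToContainer[hash] = ContainerData{ContainerName: name, Wlid: p.Wlid}
		}
	}
	resolved := map[string]string{}
	var unresolved []string
	for _, s := range sboms { // step 2: only now process the SBOM list
		if cd, ok := imageToContainer[s.ImageHash]; ok && cd.Wlid != "" {
			resolved[s.Name] = cd.Wlid
		} else {
			unresolved = append(unresolved, s.Name)
		}
	}
	return resolved, unresolved
}

func main() {
	pods := []Pod{{Wlid: "wlid-nginx", Containers: map[string]string{"nginx": "sha256-abc"}}}
	sboms := []SBOM{{Name: "sbom-abc", ImageHash: "sha256-abc"}, {Name: "sbom-old", ImageHash: "sha256-gone"}}
	resolved, unresolved := startup(pods, sboms)
	fmt.Println(resolved, unresolved) // map[sbom-abc:wlid-nginx] [sbom-old]
}
```

SBOMs for images with no running pod (`sbom-old` here) still come back unresolved, so a `Wlid` check remains useful even with the ordering fixed.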

Regression test: inject a fake informer pair into `SBOMWatcher` whose pod channel is blocked at start. Push an SBOM event. Assert that either (a) no command is dispatched with `Wlid=""`, or (b) the SBOM is re-enqueued and eventually dispatched with the correct Wlid after the pod event is unblocked.
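The two assertions can be expressed against a hypothetical single-threaded seam, where "pod channel blocked at start" simply means the pod event has not been delivered yet. This sketch uses the buffer-until-the-pod-arrives variant of option (b); `fakeWatcher`, `OnSBOM`, `OnPod`, and `runScenario` are invented names, and a real test would drive the operator's `SBOMWatcher` through injected channels instead.

```go
package main

import "fmt"

// Command is a hypothetical, reduced TypeScanImages command.
type Command struct {
	Wlid      string
	ImageHash string
}

// fakeWatcher is a hypothetical test seam: the test decides when each pod
// and SBOM event is delivered.
type fakeWatcher struct {
	imageToWlid map[string]string
	parked      []string // image hashes of SBOMs waiting for their pod
	dispatched  []Command
}

func newFakeWatcher() *fakeWatcher {
	return &fakeWatcher{imageToWlid: map[string]string{}}
}

// OnSBOM dispatches only when the Wlid is known; otherwise it parks the event.
func (w *fakeWatcher) OnSBOM(hash string) {
	if wlid := w.imageToWlid[hash]; wlid != "" {
		w.dispatched = append(w.dispatched, Command{Wlid: wlid, ImageHash: hash})
		return
	}
	w.parked = append(w.parked, hash)
}

// OnPod records the image -> Wlid mapping and retries every parked SBOM.
func (w *fakeWatcher) OnPod(hash, wlid string) {
	w.imageToWlid[hash] = wlid
	pending := w.parked
	w.parked = nil
	for _, h := range pending {
		w.OnSBOM(h)
	}
}

// runScenario plays the regression scenario and returns an error if either
// assertion fails.
func runScenario() error {
	w := newFakeWatcher()
	w.OnSBOM("sha256-abc") // SBOM arrives while the pod event is held back
	for _, c := range w.dispatched {
		if c.Wlid == "" { // assertion (a): never dispatch an empty Wlid
			return fmt.Errorf("dispatched command with empty Wlid: %+v", c)
		}
	}
	if len(w.dispatched) != 0 {
		return fmt.Errorf("expected nothing dispatched before the pod event, got %d", len(w.dispatched))
	}
	w.OnPod("sha256-abc", "wlid-nginx") // release the pod event
	if len(w.dispatched) != 1 || w.dispatched[0].Wlid != "wlid-nginx" { // assertion (b)
		return fmt.Errorf("expected one command with the correct Wlid, got %+v", w.dispatched)
	}
	return nil
}

func main() {
	fmt.Println(runScenario()) // <nil>
}
```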

# Source
- `operator/watcher/sbomwatcher.go:82` — the FIXME marking the race.
- `operator/watcher/sbomwatcher.go:43-51` — `listPods` only runs when `ServiceDiscovery.Enabled`; even then it only enqueues pod events rather than populating the map directly.
- `operator/watcher/sbomwatcher.go:54` — `HandleSBOMEvents` goroutine started before the SBOM list paging at lines 65-77.
- `operator/watcher/sbomwatcher.go:65-77` — initial SBOM list, enqueued before the pod informer warms up.
- `operator/watcher/sbomwatcher.go:137-191` — `HandleSBOMEvents` reads the racing map (line 163) and dispatches the scan command (lines 181-190).
- `operator/watcher/sbomwatcher.go:221-229` — `validateContainerData` (missing the `Wlid` check).
- `kubevuln/controllers/http.go:140-175` — HTTP handler that forwards the command to the scan service.
- `kubevuln/core/services/scan.go:857-881` — `ValidateScanCVE` only validates `ImageHash` and `ImageSlug`, not `Wlid`.
- `kubevuln/core/services/scan.go:499-507` — `if workload.Wlid != ""` guard that silently skips platform CVE submission when `Wlid` is empty.


Operator SBOM watcher dispatches `TypeScanImages` commands with empty `Wlid` when SBOM event beats Pod event, vuln results lose workload attribution #378
