Block multiple sled reservations with the same gen by jmpesp · Pull Request #10479 · oxidecomputer/omicron

jmpesp · 2026-05-21T16:56:39Z

If multiple instance-start sagas concurrently attempt to allocate for the same instance, this temporarily results in multiple rows in `sled_resource_vmm` with different propolis ids for the same instance id. One of the instance-start sagas will succeed, while the other(s) will unwind (due to an "instance changed state before it could be started" error from `sis_move_to_starting`) and remove the `sled_resource_vmm` record that they added by matching on that saga's propolis id.

There's never been a uniqueness constraint for instance id in the `sled_resource_vmm` table, because there can't be: otherwise we'd never be able to migrate an instance (which makes a new record on a different sled for the same instance).

For an instance start that performs any new local storage allocation, this is a problem: the latent assumption in inserting / updating local storage related records is that this type of duplication could not occur, and that if the insert succeeded then the allocation will only be performed once. Because this is not true, the CTE will happily stomp all over the local storage allocation related records, and that leads to the orphaning seen in the linked issue.

The fix is to add a uniqueness constraint to `sled_resource_vmm` that ensures only one record exists for a given instance id plus instance state generation number. This will not affect migration because the instance state generation is bumped in that case.
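As a rough illustration of why keying on instance id plus state generation blocks the racing start sagas while still permitting migration, here is a minimal in-memory sketch. It is not the actual schema or datastore code: `SledResourceVmmTable`, `reserve`, `rows_for`, and `ReserveError` are hypothetical names, and the `HashMap` key merely stands in for the unique index.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Hypothetical stand-ins for the real id and generation types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct InstanceId(u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct PropolisId(u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Generation(u64);

#[derive(Debug, PartialEq)]
enum ReserveError {
    AlreadyReserved { generation: Generation },
}

// In-memory model of the table. The map key plays the role of the
// uniqueness constraint on (instance id, instance state generation).
#[derive(Default)]
struct SledResourceVmmTable {
    rows: HashMap<(InstanceId, Generation), PropolisId>,
}

impl SledResourceVmmTable {
    // Insert a reservation unless one already exists for this
    // instance at this state generation.
    fn reserve(
        &mut self,
        instance: InstanceId,
        generation: Generation,
        propolis: PropolisId,
    ) -> Result<(), ReserveError> {
        match self.rows.entry((instance, generation)) {
            Entry::Occupied(_) => {
                Err(ReserveError::AlreadyReserved { generation })
            }
            Entry::Vacant(slot) => {
                slot.insert(propolis);
                Ok(())
            }
        }
    }

    // Number of reservation rows held for one instance.
    fn rows_for(&self, instance: InstanceId) -> usize {
        self.rows.keys().filter(|key| key.0 == instance).count()
    }
}
```

In this model, two sagas that both observed generation 1 cannot both insert a row, while a later reservation at generation 2 (as in a migration, where the generation has been bumped) is accepted alongside the first.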

This commit also changes the local storage related unit tests to explicitly specify the `ncpus` and memory for the fake instances, as inspecting the `sled_resource_vmm` records produced by the test showed the resources didn't match the instance specification.

Fixes oxidecomputer/customer-support#1184.


hawkw · 2026-05-22T17:37:55Z

 }

+#[derive(Clone, Debug)]
+pub enum SledResourceVmmInstanceStateGeneration {


super annoying obnoxious nitpick: man this is a long name... I suppose including the `SledResourceVmm` prefix is necessary because this type is currently re-exported by a `pub use sled_resource_vmm::*;`, so we can't just expect callers to refer to it as `sled_resource_vmm::InstanceStateGeneration`. which... I dunno if it's worth trying to fix that. I guess this is fine, it just makes me feel a certain type of way!

hawkw · 2026-05-22T17:44:25Z

            .sled_reservation_create(
                &opctx,
                instance_id,
+                nexus_db_model::Generation::new(),


nit, may not be important: I wonder if rather than always inserting the reservation at generation 1, we ought to change this function to take an `&Instance` rather than an `InstanceUuid`, and use the generation from the instance record. Clearly, there aren't currently any tests which are calling this helper multiple times for the same `InstanceUuid`, or else they would have already broken, but it seems like it could be potentially annoying if someone were to start adding a new test that does so and was surprised to discover this doesn't actually use the generation from the instance record.

on the other hand, maybe updating all the test code that uses this to pass an &Instance is going to be too painful.

hawkw · 2026-05-22T17:55:45Z

        instance_id: InstanceUuid,
+        instance_state_generation: db::model::Generation,


hmm, so, as written, this allows the caller to attempt to create the reservation at any instance state generation, regardless of what they believe the instance's current generation to be (and, in fact, it seems like we currently have a bunch of tests which are always providing generation 1 no matter what). I wonder if it might be a bit more misuse-resistant to change this function to instead take an `&Instance`, and always use the state generation from the instance model. That way, we're sure that these come from the same snapshot of the instance state.

Of course, this will require changing the callsites, and it may not be worth the effort, but I felt like it was worth mentioning...
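For illustration only, the shape of that suggestion might look like the sketch below. All of the types here are simplified stand-ins rather than the real `nexus_db_model` definitions, and `ReservationRequest` and `reservation_request` are hypothetical names. The point is just that deriving both values from one `&Instance` leaves no way to pair an id with a mismatched generation.

```rust
// Simplified stand-ins for the real model types (assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct InstanceUuid(u128);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Generation(u64);

// A snapshot of the instance record as read from the database.
struct Instance {
    id: InstanceUuid,
    state_generation: Generation,
}

// The values a reservation is keyed on.
#[derive(Debug, PartialEq, Eq)]
struct ReservationRequest {
    instance_id: InstanceUuid,
    instance_state_generation: Generation,
}

// Taking `&Instance` instead of separate id and generation arguments
// guarantees that both come from the same snapshot.
fn reservation_request(instance: &Instance) -> ReservationRequest {
    ReservationRequest {
        instance_id: instance.id,
        instance_state_generation: instance.state_generation,
    }
}
```

The trade-off is the one noted above: every call site, including test helpers that currently pass a bare id, would need an instance record in hand.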

hawkw · 2026-05-22T21:53:25Z

+                    Err(diesel::result::Error::DatabaseError(
+                        diesel::result::DatabaseErrorKind::UniqueViolation,
+                        error_info,
+                    )) if error_info.constraint_name()
+                        == Some(SINGLE_RESERVATION_CONSTRAINT) =>


nit, take it or leave it: i think it might be nice if this logic was stuffed into a function, so you could just say

Suggested change

-Err(diesel::result::Error::DatabaseError(
-    diesel::result::DatabaseErrorKind::UniqueViolation,
-    error_info,
-)) if error_info.constraint_name()
-    == Some(SINGLE_RESERVATION_CONSTRAINT) =>
+Err(e) if is_single_reservation_constraint_violation(e) =>

or something, here and later on?
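One possible shape for such a helper is sketched below. To keep it self-contained it uses small stand-in types (`DbError`, `DatabaseErrorKind`, `ErrorInfo`) that only mirror the structure of the diesel error matched in the diff; they are not diesel's types. The constraint name string is a made-up placeholder, and `classify` is a hypothetical caller. Note that the helper takes the error by reference, so the match guard would pass `&e`.

```rust
// Placeholder value: the real constraint name is not shown in this thread.
const SINGLE_RESERVATION_CONSTRAINT: &str = "hypothetical_constraint_name";

// Stand-ins mirroring the shape of the diesel error in the diff.
#[derive(Debug)]
enum DatabaseErrorKind {
    UniqueViolation,
    ForeignKeyViolation,
}

#[derive(Debug)]
struct ErrorInfo {
    constraint: Option<String>,
}

impl ErrorInfo {
    fn constraint_name(&self) -> Option<&str> {
        self.constraint.as_deref()
    }
}

#[derive(Debug)]
enum DbError {
    DatabaseError(DatabaseErrorKind, ErrorInfo),
    NotFound,
}

// True only for a unique violation on the single-reservation constraint.
fn is_single_reservation_constraint_violation(e: &DbError) -> bool {
    matches!(
        e,
        DbError::DatabaseError(DatabaseErrorKind::UniqueViolation, info)
            if info.constraint_name() == Some(SINGLE_RESERVATION_CONSTRAINT)
    )
}

// Hypothetical caller showing how the match arm collapses to one line.
fn classify(result: Result<(), DbError>) -> &'static str {
    match result {
        Ok(()) => "reserved",
        Err(e) if is_single_reservation_constraint_violation(&e) => {
            "already reserved"
        }
        Err(_) => "other error",
    }
}
```

With the check in one place, both match sites stay in sync if the constraint name or error shape ever changes.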

hawkw · 2026-05-22T21:57:34Z

         group membership."
    )]
    RequiredAffinitySledNotValid,
+    #[error("Instance reservation already made for generation {generation}")]


the rest of the errors here are intended to be user facing (which is why they're kinda big blocks of multi-sentence text). I don't think "Instance reservation already made for generation 69" is something that's particularly meaningful to a user. If this is going to bubble up, could it say something a little less obscure? Maybe "This instance is already running or starting on another sled."?

hawkw · 2026-05-22T21:57:53Z

    // Finally, perform the INSERT if it's still valid.
    query.sql("
-        INSERT INTO sled_resource_vmm (id, sled_id, hardware_threads, rss_ram, reservoir_ram, instance_id)
+        INSERT INTO sled_resource_vmm (id, sled_id, hardware_threads, rss_ram, reservoir_ram, instance_id, instance_state_generation)


can we line wrap this?

jmpesp requested a review from hawkw May 21, 2026 16:56

hawkw requested a review from smklein May 22, 2026 17:11

hawkw reviewed May 22, 2026

jmpesp wants to merge 1 commit into oxidecomputer:main from jmpesp:instance_state_generation_in_sled_reservation
