8 changes: 8 additions & 0 deletions drv/lpc55-sprot-server/src/handler.rs
@@ -121,6 +121,14 @@ impl<'a> Handler {
Response::pack(&body, tx_buf)
}

pub fn request_message_too_large_error(
&self,
tx_buf: &mut [u8; RESPONSE_BUF_SIZE],
) -> usize {
let body = Err(SprotProtocolError::BadMessageLength.into());
Response::pack(&body, tx_buf)
}

pub fn handle(
&mut self,
rx_buf: &[u8],
16 changes: 16 additions & 0 deletions drv/lpc55-sprot-server/src/main.rs
@@ -76,6 +76,7 @@ pub(crate) enum Trace {
Err(SprotProtocolError),
Stats(RotIoStats),
Desynchronized,
RequestMsgTooLarge,

#[cfg(feature = "sp-ctrl")]
Dump(u32),
@@ -162,6 +163,9 @@ enum IoError {
/// send a request to the RoT. We also return this error if we started
/// receiving a request in the middle.
Desynchronized,

/// The SP sent a request that was too large to fit in the RoT's rx buffer.
RequestMsgTooLarge,
}

#[unsafe(export_name = "main")]
@@ -205,6 +209,10 @@ fn main() -> ! {
ringbuf_entry!(Trace::Desynchronized);
handler.desynchronized_error(tx_buf)
}
Err(IoError::RequestMsgTooLarge) => {
ringbuf_entry!(Trace::RequestMsgTooLarge);
handler.request_message_too_large_error(tx_buf)
}
};

if io.cleanup_after_request().is_err() {
@@ -312,6 +320,14 @@ impl Io {

self.check_for_rx_error()?;

// The rx fifo contained more bytes than could fit in the buffer, which
// is more than we ever expect to receive in one request message.
if bytes_received > rx_buf.len() {
Contributor

This seems fine, but is also largely unnecessary. We statically size the buffers to always be able to handle a request or response.

I don't believe we've ever actually seen this issue in practice since #1523 merged.

The only reasons I can think of where this would matter are:

  1. The SP was misbehaving significantly
  2. We resized the buffers and messages. The SP was updated first and sent a message to the RoT that didn't fit in the old buffer.

Both would be major problems in production. For (2) we can and should avoid it by just not resizing the buffers :). If we decide we must resize then we can do it over two releases depending upon direction:

  1. If we are increasing buffer size, we just go ahead and do this, but don't add any new messages that need the larger buffer yet. Then in the second release we add the larger messages. This ensures we never overflow.
  2. If we are decreasing the buffer size, we deprecate messages that are larger and stop sending them in the first release. Then in the second release we decrease the buffer size and remove the messages.

All in all, this is fine as a defensive measure, but I find it largely unnecessary. IIRC the reason I did it the way I did was that, without fault management at the time, I figured it would be easier to spot this major issue (SP misbehavior) from a crash than from an error response and retry. But I'm not sure that's actually true or valid. So I'm generally fine with just not crashing the RoT lol.

However, I don't think we should overload the same error code. This is not a FIFO overrun and it isn't really related to flow control. I'd call it something else like RequestMsgTooLarge or something similar so that it stands out and helps us debug if this ever happens in the future. It would be a good sign to never see this distinct error code occur in practice.
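As a rough illustration of the point about keeping the two failure modes distinct, here is a minimal sketch. `RxError`, `RxStats`, and `classify_rx` are hypothetical names made up for this example and are not the types in the driver; the sketch only assumes that a hardware overrun flag and an oversized byte count are observed separately.

```rust
/// Hypothetical receive-side errors. The variant names echo the ones
/// discussed in this thread, but the type itself is illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum RxError {
    /// The FIFO was not drained in time (a flow-control problem).
    RxOverrun,
    /// The sender clocked out more bytes than the receive buffer holds.
    RequestMsgTooLarge,
}

/// Hypothetical per-cause counters, kept separate so that each failure
/// mode stands out on its own when debugging.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct RxStats {
    pub rx_overrun: u32,
    pub request_msg_too_large: u32,
}

/// Classify one completed transfer.
///
/// `fifo_overrun` stands in for whatever hardware status flag reports an
/// overrun; `buf_len` is the capacity of the receive buffer.
pub fn classify_rx(
    bytes_received: usize,
    buf_len: usize,
    fifo_overrun: bool,
    stats: &mut RxStats,
) -> Result<usize, RxError> {
    if fifo_overrun {
        stats.rx_overrun = stats.rx_overrun.wrapping_add(1);
        return Err(RxError::RxOverrun);
    }
    if bytes_received > buf_len {
        // Not an overrun: the FIFO was serviced, the message was just too big.
        stats.request_msg_too_large =
            stats.request_msg_too_large.wrapping_add(1);
        return Err(RxError::RequestMsgTooLarge);
    }
    Ok(bytes_received)
}
```

With separate counters, a nonzero `request_msg_too_large` points at a sender or buffer-sizing problem rather than at receive-side latency.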

Contributor Author

I just pushed a commit that adds IoError::RequestMsgTooLarge and converts it to the existing SprotProtocolError::BadMessageLength error when reporting to the SP. Do you think that's a good approach? I could add a new SprotProtocolError, but that would require updating the management-gateway-service repo, and BadMessageLength seems like a pretty good fit even though it's mostly used for a slightly different case.
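The shape of that approach can be sketched roughly as follows. `LocalIoError` and `WireError` are hypothetical stand-ins for the server-local error and the shared protocol error; they are not the real `IoError` or `SprotProtocolError` definitions, and the `From` impl is only one way the mapping could be written.

```rust
/// Stand-in for the shared wire-level error enum. Adding a variant here
/// would mean updating every crate that shares the definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WireError {
    BadMessageLength,
    Desynchronized,
}

/// Stand-in for the server-local error. It can grow new variants freely
/// because it never leaves the task.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LocalIoError {
    Desynchronized,
    RequestMsgTooLarge,
}

impl From<LocalIoError> for WireError {
    fn from(e: LocalIoError) -> Self {
        match e {
            LocalIoError::Desynchronized => WireError::Desynchronized,
            // Reuse an existing wire code; the distinct local variant
            // still shows up in local traces and counters.
            LocalIoError::RequestMsgTooLarge => WireError::BadMessageLength,
        }
    }
}
```

The trade-off is that the peer cannot tell an oversized request from other length errors, while the local side keeps the finer-grained distinction for debugging.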

Contributor

That seems perfect!

self.stats.request_msg_too_large =
self.stats.request_msg_too_large.wrapping_add(1);
return Err(IoError::RequestMsgTooLarge);
}

// Was this a CSn pulse?
if bytes_received == 0 {
self.stats.csn_pulses = self.stats.csn_pulses.wrapping_add(1);
5 changes: 5 additions & 0 deletions drv/sprot-api/src/lib.rs
@@ -631,6 +631,11 @@ pub struct RotIoStats {
/// Number of messages where the RoT failed to service the Rx FIFO in time.
pub rx_overrun: u32,

/// The number of times an SP sent more bytes than expected for one
/// message. In other words, the number of bytes sent by the SP to the RoT
/// between CSn assert and CSn de-assert exceeds `REQUEST_BUF_SIZE`.
pub request_msg_too_large: u32,

/// The number of CSn pulses seen by the RoT
pub csn_pulses: u32,

6 changes: 3 additions & 3 deletions drv/stm32h7-sprot-server/src/main.rs
@@ -285,7 +285,7 @@ impl<S: SpiServer> Io<S> {
hl::sleep_for(PART2_DELAY);
}

- if total_size > RESPONSE_BUF_SIZE {
+ if total_size > rx_buf.len() {
return Err(SprotProtocolError::BadMessageLength.into());
}

@@ -321,13 +321,13 @@ impl<S: SpiServer> Io<S> {
if !self.wait_rot_irq(false, TIMEOUT_QUICK) {
// Nope, it didn't complete. Pulse CSn.
ringbuf_entry!(Trace::UnexpectedRotIrq);
- self.stats.csn_pulses += self.stats.csn_pulses.wrapping_add(1);
+ self.stats.csn_pulses = self.stats.csn_pulses.wrapping_add(1);
Member

take it or leave it: if we wanted to be able to use += here, we could change stats.csn_pulses to Wrapping(u32) and then always get explicitly wrapping behavior when using normal arithmetic operators. we might consider doing this separately in a follow-up PR.

Contributor Author

Interesting. I see many uses of x = x.wrapping_add(1) in hubris and no uses of Wrapping(u32) yet. If you think it's useful I'd be happy to open a separate PR that changes all of them at once.

Member

Yeah, I think we may want to do an audit for this everywhere. The nice thing about Wrapping is that we can say "we always want to use explicitly wrapping arithmetic for this type" rather than having to remember to do it at the call site, which feels less error prone. There's also a Saturating type which I think might be a bit newer. I'll go make a separate bug for this.
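For reference, a small sketch of the difference being discussed. `Stats`, `record_pulse`, `buggy_increment`, and `fixed_increment` are hypothetical names for illustration; only `core::num::Wrapping` is a real library type. The sketch assumes the counter field is changed to `Wrapping<u32>`, which is not what the code in this PR does.

```rust
use core::num::Wrapping;

/// The pre-fix pattern: `+=` combined with `wrapping_add` adds the
/// incremented value onto the counter instead of storing it, so the
/// counter roughly doubles on every call.
pub fn buggy_increment(mut count: u32) -> u32 {
    count += count.wrapping_add(1);
    count
}

/// The pattern used after the fix: assign the wrapped result.
pub fn fixed_increment(count: u32) -> u32 {
    count.wrapping_add(1)
}

/// Hypothetical stats struct whose counter type carries the wrapping
/// policy, so call sites can use ordinary operators.
#[derive(Debug, Default)]
pub struct Stats {
    pub csn_pulses: Wrapping<u32>,
}

/// With `Wrapping<u32>`, `+=` wraps on overflow in every build profile.
pub fn record_pulse(stats: &mut Stats) {
    stats.csn_pulses += Wrapping(1);
}
```

Putting the policy in the type means a stray `+=` can no longer reintroduce the doubling bug or an overflow panic, at the cost of unwrapping with `.0` wherever a plain `u32` is needed.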

// One sample of an LPC55S28 reacting to CSn deasserted
// in about 54us. So, 10ms is plenty.
if self.do_pulse_cs(10_u64, 10_u64)?.rot_irq_end == 1 {
// Did not clear ROT_IRQ
ringbuf_entry!(Trace::PulseFailed);
- self.stats.csn_pulse_failures +=
+ self.stats.csn_pulse_failures =
self.stats.csn_pulse_failures.wrapping_add(1);
debug_set(&self.sys, false); // XXX
return Err(SprotProtocolError::RotIrqRemainsAsserted)?;