Skip to content
Draft
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
83 changes: 82 additions & 1 deletion components/plurals/src/provider.rs
Original file line number Diff line number Diff line change
Expand Up @@ -27,11 +27,11 @@ use yoke::Yokeable;
use zerofrom::ZeroFrom;
use zerovec::ule::vartuple::VarTuple;
use zerovec::ule::vartuple::VarTupleULE;
use zerovec::ule::AsULE;
use zerovec::ule::EncodeAsVarULE;
use zerovec::ule::UleError;
use zerovec::ule::VarULE;
use zerovec::ule::ULE;
use zerovec::ule::{AsULE, FixedLengthVarULE};
use zerovec::VarZeroSlice;

pub mod rules;
Expand Down Expand Up @@ -482,6 +482,87 @@ where
core::mem::transmute(bytes)
}

/// Creates a singleton [`PluralElementsPackedULE`] in a const context.
///
/// Const parameters:
///
/// - `M`: the length of `input`
/// - `N`: the length of the return value which is `M + 1`
///
/// When [generic_const_exprs] is stabilized, we will be able to add a new
/// function signature without both const parameters.
///
/// # Panics
///
/// Panics if N != M + 1.
///
/// # Examples
///
/// ```
/// use icu::plurals::provider::PluralElementsPackedULE;
/// use icu::plurals::PluralRules;
/// use icu::locale::locale;
/// use zerovec::ule::FixedLengthVarULE;
///
/// let value = "hello, world!"; // 13 bytes long
Copy link
Copy Markdown
Member

@Manishearth Manishearth Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: this isn't a motivating example: this doesn't work in const anyway (since there's try_from_encodeable)

(It's an example, but not a motivating one)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would work in const if I replaced try_from_encodeable with new_unchecked, I just wanted to not have unsafe code in the docs test

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's fine, I was just saying that the example itself wasn't motivating the new zerovec API.

But I have a clearer idea of what's going on now.

/// let inner_ule = FixedLengthVarULE::<13, str>::try_from_encodeable(value).unwrap();
/// let plural_ule = PluralElementsPackedULE::new_singleton_mn::<13, 14>(inner_ule);
/// let rules = PluralRules::try_new(locale!("en").into(), Default::default()).unwrap();
///
/// assert_eq!(plural_ule.as_varule().get(0.into(), &rules).1, "hello, world!");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: merge the assertions

Suggested change
/// assert_eq!(plural_ule.as_varule().get(0.into(), &rules).1, "hello, world!");
/// assert_eq!(plural_ule.as_varule().get(0.into(), &rules), ("hello, world!", metadata));

/// assert_eq!(plural_ule.as_varule().get(1.into(), &rules).1, "hello, world!");
/// assert_eq!(plural_ule.as_varule().get(2.into(), &rules).1, "hello, world!");
/// ```
///
/// In a const context:
///
/// ```
/// use icu::plurals::provider::PluralElementsPackedULE;
/// use icu::plurals::PluralRules;
/// use icu::locale::locale;
/// use zerovec::ule::FixedLengthVarULE;
///
/// const plural_ule: FixedLengthVarULE<1, PluralElementsPackedULE<str>> =
/// PluralElementsPackedULE::new_singleton_mn::<0, 1>(FixedLengthVarULE::EMPTY_STR);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clever

///
/// let rules = PluralRules::try_new(locale!("en").into(), Default::default()).unwrap();
///
/// assert_eq!(plural_ule.as_varule().get(0.into(), &rules).1, "");
/// assert_eq!(plural_ule.as_varule().get(1.into(), &rules).1, "");
/// assert_eq!(plural_ule.as_varule().get(2.into(), &rules).1, "");
/// ```
///
/// [generic_const_exprs]: https://doc.rust-lang.org/beta/unstable-book/language-features/generic-const-exprs.html#generic_const_exprs
pub const fn new_singleton_mn<const M: usize, const N: usize>(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like these APIs where you have to guess the output.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand your comment, but this is in an unstable module.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have to know M and N for this not to panic

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"singleton" is the wrong term, we don't use that with plurals. this method is equivalent to PluralElements::new, so it should just be new

Copy link
Copy Markdown
Member Author

@sffc sffc Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "singleton" is already used in this file multiple times, including in test function names.

The difference between singleton and non-singleton matters a great deal to PluralElementsPackedULE, because it results in two different encodings.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the caller shouldn't have to know encoding details?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also doesn't need to be qualified as _mn; it's an unstable type so we can just change the generics if generic_const_exprs ever gets stabilised (I wouldn't hold my breath). so just new

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep the _mn because it's a pain to debug cross-crate function naming issues when the function mostly stays the same but const parameters change a little. But I will remove singleton.

input: FixedLengthVarULE<M, V>,
) -> FixedLengthVarULE<N, PluralElementsPackedULE<V>> {
#[allow(clippy::panic)] // for safety, and documented
if N != M + 1 {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: this should be a const assertion, not a runtime panic

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please suggest code that compiles with a const assertion.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can't do it then we shouldn't have this yet

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm? We have lots of const code with runtime assertions because there isn't a good way to write const assertions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even the Rust standard library does runtime panics on const parameters. For example:

https://doc.rust-lang.org/std/primitive.slice.html#method.as_chunks

slice::as_chunks panics if the const parameter N is zero. There was a lengthy discussion on the proposal thread, starting here: rust-lang/rust#74985 (comment)

panic!(concat!(
"new_singleton_mn: N (",
stringify!(N),
") != 1 + M (",
stringify!(M),
")"
));
}
let mut bytes = [0u8; N];
#[allow(clippy::unwrap_used)] // the bytes are nonempty because N > 0
let (start, remainder) = bytes.split_first_mut().unwrap();
// TODO(1.87): use copy_from_slice
let mut i = 0;
#[allow(clippy::indexing_slicing)] // both remainder and input are length M
while i < M {
remainder[i] = input.as_bytes()[i];
i += 1;
}
*start = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: mention "first byte = 0 for a singleton" for redundancy

// Safety: bytes are a valid representation of this type:
// 1. The first byte is 0 which indicates a singleton
// 2. The remainder is a valid V by invariant of the input parameter
unsafe { FixedLengthVarULE::new_unchecked(bytes) }
}

/// Returns a tuple with:
/// 1. The lead byte
/// 2. Bytes corresponding to the default V
Expand Down
107 changes: 107 additions & 0 deletions utils/zerovec/src/ule/fixed_length.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
// This file is part of ICU4X. For terms of use, please see the file
// called LICENSE at the top level of the ICU4X source tree
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

use crate::ule::{EncodeAsVarULE, ULE, UleError, VarULE};
use core::fmt;
use core::marker::PhantomData;
use core::ops::Deref;

/// A container for a [`VarULE`] with a fixed byte length.
///
/// This container may be useful if the length of your VarULE is known at compile-time.
///
/// # Examples
///
/// ```
/// use zerovec::ule::FixedLengthVarULE;
///
/// let container = FixedLengthVarULE::<13, str>::try_from_encodeable("hello, world!").unwrap();
///
/// assert_eq!(&*container, "hello, world!");
///
/// // Returns an error if the container is not the correct size:
/// FixedLengthVarULE::<20, str>::try_from_encodeable("hello, world!").unwrap_err();
/// ```
#[derive(Copy, Clone, PartialEq, Eq)]
pub struct FixedLengthVarULE<const N: usize, V: VarULE + ?Sized> {
/// Invariant: The bytes MUST be a valid VarULE representation of `V`.
bytes: [u8; N],
_marker: PhantomData<V>,
}

impl<const N: usize, V: VarULE + ?Sized> FixedLengthVarULE<N, V> {
/// Creates one of these from an [`EncodeAsVarULE`].
///
/// Returns an error if the byte length in the container is not the correct length
/// for the encodeable object.
pub fn try_from_encodeable(input: impl EncodeAsVarULE<V>) -> Result<Self, UleError> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn try_from_encodeable(input: impl EncodeAsVarULE<V>) -> Result<Self, UleError> {
pub fn try_from_encodeable(input: &impl EncodeAsVarULE<V>) -> Result<Self, UleError> {

?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the recommend style here. It doesn't matter if the trait is implemented on a reference. @Manishearth ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is style choice. you only test this with a &str input, but say you have a String, you don't want to accidentally pass that by value.

let len = input.encode_var_ule_len();
if len != N {
return Err(UleError::length::<V>(len));
}
let mut bytes = [0u8; N];
input.encode_var_ule_write(&mut bytes);
// Safety: the bytes were just written from an EncodeAsVarULE impl
unsafe { Ok(Self::new_unchecked(bytes)) }
}

/// Creates one of these directly from bytes.
///
/// # Safety
///
/// The bytes MUST be a valid VarULE representation of `V`.
pub const unsafe fn new_unchecked(bytes: [u8; N]) -> Self {
Self {
bytes,
_marker: PhantomData,
}
}

/// Returns the bytes backing this [`FixedLengthVarULE`], which are
/// guaranteed to be a valid VarULE representation of `V`.
pub const fn as_bytes(&self) -> &[u8; N] {
&self.bytes
}

/// Returns the container as an instance of `V`.
pub fn as_varule(&self) -> &V {
debug_assert!(V::validate_bytes(&self.bytes).is_ok());
// Safety: self.bytes are a valid VarULE representation of `V`.
unsafe { V::from_bytes_unchecked(&self.bytes) }
}
}

impl<const N: usize, V: VarULE + ?Sized> fmt::Debug for FixedLengthVarULE<N, V>
where
V: fmt::Debug,
{
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.as_varule().fmt(f)
}
}

impl<const N: usize, V: VarULE + ?Sized> AsRef<V> for FixedLengthVarULE<N, V> {
fn as_ref(&self) -> &V {
self.as_varule()
}
}

impl<const N: usize, V: VarULE + ?Sized> Deref for FixedLengthVarULE<N, V> {
type Target = V;
fn deref(&self) -> &Self::Target {
self.as_varule()
}
}

impl FixedLengthVarULE<0, str> {
/// The empty string as a [`FixedLengthVarULE`].
// Safety: the empty slice is a valid str
pub const EMPTY_STR: Self = unsafe { Self::new_unchecked([]) };
}

impl<T: ULE> FixedLengthVarULE<0, [T]> {
/// The empty slice as a [`FixedLengthVarULE`].
// Safety: the empty slice is a valid str
pub const EMPTY_SLICE: Self = unsafe { Self::new_unchecked([]) };
}
4 changes: 4 additions & 0 deletions utils/zerovec/src/ule/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ mod chars;
#[cfg(doc)]
pub mod custom;
mod encode;
mod fixed_length;
mod macros;
mod multi;
mod niche;
Expand All @@ -30,6 +31,7 @@ pub use chars::CharULE;
#[cfg(feature = "alloc")]
pub use encode::encode_varule_to_box;
pub use encode::EncodeAsVarULE;
pub use fixed_length::FixedLengthVarULE;
pub use multi::MultiFieldsULE;
pub use niche::{NicheBytes, NichedOption, NichedOptionULE};
pub use option::{OptionULE, OptionVarULE};
Expand Down Expand Up @@ -441,6 +443,8 @@ impl UleError {
}

/// Construct an "invalid length" error for the given type and length
///
/// The length is of the input bytes, not the expected length.
pub fn length<T: ?Sized + 'static>(len: usize) -> UleError {
UleError::InvalidLength {
ty: any::type_name::<T>(),
Expand Down
Loading