attributed_text: Generalize attributed text storage and segment ranges #624
This was done with the assistance of Codex (GPT 5.5, xhigh).

Perhaps also GPT 5.4, as I've had this around for a while.
159b4f8 to e9af936
tomcur left a comment
The new TextChunk is nice for non-contiguously stored text! The docs for TextStorage::chunks and TextStorage::as_str should probably link to each other for discoverability. I've not gone through everything with a fine-toothed comb, but the main changes look good.
TextStorage is no longer dyn-compatible, but that probably does not matter. Orthogonal to this PR, it may be worth thinking about what behavior around invalid TextRanges should be throughout the crate (panic, clamp, something else?).
        TextRange::new(self, range)
    }

    /// Iterates over borrowed chunks covering `range`.
Perhaps emphasize chunks are contiguous and non-empty.
-    /// Iterates over borrowed chunks covering `range`.
+    /// Iterates over contiguous, non-empty chunks of text covering `range`.
        ChunkedTextChunks {
            chunks: self.chunks.iter(),
            chunk_start: 0,
            range,
        }
    }
}

#[derive(Clone, Debug)]
struct ChunkedTextChunks<'a> {
    chunks: core::slice::Iter<'a, &'static str>,
    chunk_start: usize,
    range: TextRange,
}

impl<'a> Iterator for ChunkedTextChunks<'a> {
    type Item = TextChunk<'a>;

    fn next(&mut self) -> Option<Self::Item> {
        for chunk in self.chunks.by_ref() {
            let chunk = *chunk;
            let chunk_start = self.chunk_start;
            let chunk_end = chunk_start + chunk.len();
            self.chunk_start = chunk_end;

            let start = self.range.start().max(chunk_start);
            let end = self.range.end().min(chunk_end);
            if start < end {
                let local_start = start - chunk_start;
                let local_end = end - chunk_start;
                return Some(TextChunk::new(
                    TextRange::new_unchecked(start, end),
                    &chunk[local_start..local_end],
                ));
            }
        }

        None
    }
The following perhaps reads a bit more clearly.
        let mut chunk_start = 0;
        self.chunks.iter().filter_map(move |&chunk| {
            let offset = chunk_start;
            let start = range.start().max(chunk_start);
            chunk_start += chunk.len();
            let end = range.end().min(chunk_start);
            (start < end).then(|| {
                let text = &chunk[start - offset..end - offset];
                TextChunk::new(TextRange::new_unchecked(start, end), text)
            })
        })
    }
}
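To try the closure-based suggestion outside the crate, here is a minimal self-contained sketch. `ByteRange`, `Chunk`, and `chunks_in` are hypothetical stand-ins invented for illustration. They are not the crate's `TextRange`, `TextChunk`, or `TextStorage::chunks`, and the sketch assumes a plain half-open byte range with no validation.

```rust
/// Hypothetical stand-in for `TextRange`: a half-open byte range, unvalidated.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ByteRange {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical stand-in for `TextChunk`: a borrowed slice plus its global range.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Chunk<'a> {
    pub range: ByteRange,
    pub text: &'a str,
}

/// Yields the non-empty intersections of `range` with each stored chunk.
pub fn chunks_in<'a>(
    chunks: &'a [&'static str],
    range: ByteRange,
) -> impl Iterator<Item = Chunk<'a>> + 'a {
    // Global byte offset of the chunk currently being visited.
    let mut chunk_start = 0;
    chunks.iter().filter_map(move |&chunk| {
        let offset = chunk_start;
        let start = range.start.max(chunk_start);
        chunk_start += chunk.len();
        let end = range.end.min(chunk_start);
        // `start < end` filters out chunks that do not overlap `range`,
        // so every yielded chunk is non-empty.
        (start < end).then(|| Chunk {
            range: ByteRange { start, end },
            // Slicing panics if the local offsets are not on UTF-8 boundaries.
            text: &chunk[start - offset..end - offset],
        })
    })
}
```

With the chunks `["hello", " ", "world"]` and the range 3..8, this yields "lo" (3..5), " " (5..6), and "wo" (6..8). One trade-off of this form is that the iterator becomes an opaque `impl Iterator` rather than a named struct, which may or may not matter to callers.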
See the review in linebender#624.
This updates attributed_text so attributed content can be backed by non-contiguous text storage while keeping range validation explicit and reusable.

Goals:
- TextRange values in AttributedText.

Non-goals:

Concepts
- TextStorage: the storage abstraction for text bytes and UTF-8 boundary checks.
- TextChunk: a borrowed contiguous slice from storage, with its global TextRange.
- TextRange: a validated byte range whose endpoints are in bounds and on UTF-8 boundaries.
- AttributeSegments: the sweep-line segment iterator over applied attribute spans.
- ActiveSpans: the active attribute spans for the most recently yielded segment.

Changes
- TextChunk and TextStorage::chunks for reading validated ranges from contiguous or chunked text storage.
- TextStorage::as_str, returning Some(&str) only for contiguous storage.
- AttributedText::as_str to return Option<&str>.
- TextRange.
- TextRange values.
- AttributeSegments::next_segment and AttributeSegment for callers that want range + active spans as one borrowed view.
- ActiveSpansIter reports an exact size_hint, matching its ExactSizeIterator implementation.
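
As a rough illustration of two ideas from the description (a validated byte range, and an `as_str` that only succeeds for contiguous storage), here is a simplified sketch. The `Storage` trait and the `CheckedRange`, `Contiguous`, and `Chunked` types are hypothetical names made up for this example. They are assumptions, not the crate's actual `TextStorage` or `TextRange` API.

```rust
use std::ops::Range;

/// Hypothetical validated byte range: in bounds, endpoints on UTF-8 boundaries.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CheckedRange {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical, simplified storage abstraction (not the crate's trait).
pub trait Storage {
    /// Total length in bytes.
    fn len(&self) -> usize;
    /// Whether `index` is a UTF-8 character boundary (the end counts as one).
    fn is_char_boundary(&self, index: usize) -> bool;
    /// The whole text as one slice, available only for contiguous storage.
    fn as_str(&self) -> Option<&str>;

    /// Validates `range` once, so later reads can rely on it.
    fn checked_range(&self, range: Range<usize>) -> Option<CheckedRange> {
        let ok = range.start <= range.end
            && range.end <= self.len()
            && self.is_char_boundary(range.start)
            && self.is_char_boundary(range.end);
        ok.then_some(CheckedRange { start: range.start, end: range.end })
    }
}

/// Text held in a single slice.
pub struct Contiguous<'a>(pub &'a str);

impl Storage for Contiguous<'_> {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn is_char_boundary(&self, index: usize) -> bool {
        self.0.is_char_boundary(index)
    }
    fn as_str(&self) -> Option<&str> {
        Some(self.0)
    }
}

/// Text held as several slices, each of which is valid UTF-8 on its own.
pub struct Chunked<'a>(pub &'a [&'a str]);

impl Storage for Chunked<'_> {
    fn len(&self) -> usize {
        self.0.iter().map(|chunk| chunk.len()).sum()
    }
    fn is_char_boundary(&self, index: usize) -> bool {
        let mut start = 0;
        for chunk in self.0 {
            let end = start + chunk.len();
            if index <= end {
                // Translate the global index into this chunk's local offset.
                return chunk.is_char_boundary(index - start);
            }
            start = end;
        }
        // Only reached when `index` is past every chunk, or storage is empty.
        index == start
    }
    fn as_str(&self) -> Option<&str> {
        // No single slice covers the whole text.
        None
    }
}
```

In this sketch, a caller that gets `None` from `as_str` would fall back to reading chunk by chunk, which is the case the PR's chunk iterator is meant to cover.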