feat: support reading blob files in data evolution flow #259

Merged
JingsongLi merged 2 commits into apache:main from QuakeWang:feat/blob-read
Apr 17, 2026

@QuakeWang
Contributor

Purpose

Linked issue: close #228

Add the minimal read-only .blob read path for BlobType, so Rust can read Java-generated blob files in the existing data-evolution flow.

Brief change log

  • add .blob format dispatch
  • add dedicated .blob reader for BlobType
  • support blob footer/index parsing, local row selection, and Arrow Binary output
  • extend DataEvolutionReader with rolling blob merge support via internal blob source planning
  • add Java compatibility coverage for blob format and rolling blob e2e

Tests

API and Format

Documentation

} else {
    let source_idx = sources.len();
    sources.push(FieldSource::BlobBunch {
        bunch: BlobBunch::new(expected_row_count, false),
Contributor

Is this argument always false here?

Contributor Author

I removed the unused row_id_push_down flag and simplified BlobBunch::new.

Comment thread crates/blob_test_utils.rs Outdated
.unwrap_or_else(|e| panic!("Failed to write blob test file {path:?}: {e}"));
}

fn encode_delta_varints(values: &[i64]) -> Vec<u8> {
Contributor

This helper is duplicated in blob.rs.

Contributor Author

I removed the duplicated helper in blob.rs tests and reused blob_test_utils::encode_delta_varints.
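
For readers unfamiliar with the helper under discussion, here is a rough sketch of what a delta-plus-varint encoder with this signature might look like. It is not the code from this PR: the zigzag mapping, the starting value of 0, and the `_sketch` names are assumptions made for illustration, and the real blob index layout may differ.

```rust
/// Maps a signed value to unsigned so that small magnitudes stay small.
/// (Assumption: the real format may or may not use zigzag.)
fn zigzag(v: i64) -> u64 {
    ((v << 1) ^ (v >> 63)) as u64
}

/// Appends `v` as a little-endian base-128 varint.
fn write_varint(mut v: u64, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push((v as u8 & 0x7f) | 0x80); // low 7 bits plus continuation bit
        v >>= 7;
    }
    out.push(v as u8);
}

/// Hypothetical encoder: stores each value as the delta from its predecessor.
fn encode_delta_varints_sketch(values: &[i64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0i64; // assumed starting point for the first delta
    for &v in values {
        write_varint(zigzag(v.wrapping_sub(prev)), &mut out);
        prev = v;
    }
    out
}

/// Hypothetical decoder, the inverse of the encoder above.
/// Returns `None` on truncated or over-long varints.
fn decode_delta_varints_sketch(bytes: &[u8]) -> Option<Vec<i64>> {
    let mut values = Vec::new();
    let (mut prev, mut acc, mut shift) = (0i64, 0u64, 0u32);
    for &b in bytes {
        if shift >= 64 {
            return None; // more than 10 bytes in one varint
        }
        acc |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            // Undo zigzag, then undo the delta.
            let delta = ((acc >> 1) as i64) ^ -((acc & 1) as i64);
            prev = prev.wrapping_add(delta);
            values.push(prev);
            acc = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    if shift != 0 {
        return None; // input ended in the middle of a varint
    }
    Some(values)
}
```

Keeping one shared copy of such a helper in the test utilities, as the reply above describes, means the writer and reader tests cannot drift apart on the encoding.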

Comment thread crates/paimon/src/arrow/format/blob.rs Outdated
}

let mut builder = BinaryBuilder::new();
for &position in positions {
Contributor

Is it possible to read blobs in parallel?

Contributor Author

I changed blob payload reads to bounded parallel reads with buffered(...) while preserving output order, and added a test for it.
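
To illustrate the idea of bounded, order-preserving parallel reads, here is a small standard-library sketch. It is not the PR's implementation: the PR mentions `buffered(...)`, which keeps a sliding window of in-flight reads, whereas this stand-in uses scoped threads in fixed-size windows. The names `read_payload` and `read_payloads_bounded`, and the in-memory `(offset, length)` index, are hypothetical.

```rust
use std::thread;

/// Hypothetical stand-in for reading one blob payload.
/// `index[position]` is assumed to hold an (offset, length) pair into `data`.
fn read_payload(data: &[u8], index: &[(usize, usize)], position: usize) -> Option<Vec<u8>> {
    let (offset, len) = *index.get(position)?;
    let end = offset.checked_add(len)?;
    data.get(offset..end).map(|bytes| bytes.to_vec())
}

/// Reads the payloads for `positions` with at most `limit` reads in flight,
/// returning results in the same order as `positions`.
fn read_payloads_bounded(
    data: &[u8],
    index: &[(usize, usize)],
    positions: &[usize],
    limit: usize,
) -> Option<Vec<Vec<u8>>> {
    let mut out = Vec::with_capacity(positions.len());
    // Each window holds at most `limit` positions, which bounds concurrency.
    for window in positions.chunks(limit.max(1)) {
        let results: Vec<Option<Vec<u8>>> = thread::scope(|scope| {
            // Spawn every read in the window first...
            let handles: Vec<_> = window
                .iter()
                .map(|&p| scope.spawn(move || read_payload(data, index, p)))
                .collect();
            // ...then join in spawn order, so output order matches input order.
            handles
                .into_iter()
                .map(|h| h.join().expect("reader thread panicked"))
                .collect()
        });
        for payload in results {
            out.push(payload?); // propagate a failed read as `None`
        }
    }
    Some(out)
}
```

Joining handles in spawn order is what preserves output order here, regardless of which read finishes first. The bound matters because an unbounded fan-out over a large selection could hold many payloads in memory at once.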

Contributor

@JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit 9796ef7 into apache:main Apr 17, 2026
8 checks passed

Development

Successfully merging this pull request may close these issues.

Support BlobType in read path

2 participants