TST: Add test for writing UUIDs to parquet with pyarrow #61602 #65647

Open
GiTaDi-CrEaTe wants to merge 16 commits into pandas-dev:main from GiTaDi-CrEaTe:tests/io-parquet-uuid-61602

Conversation

@GiTaDi-CrEaTe

Resolves #61602.
Added a test to test_parquet.py to verify that to_parquet successfully writes uuid.UUID objects when using pyarrow >= 24.0.0. The test uses importorskip to skip gracefully on older PyArrow versions where the bug still exists.

@GiTaDi-CrEaTe
Author

pre-commit.ci autofix

Member

@rhshadrach left a comment

Thanks for the PR!

Comment thread pandas/tests/io/test_parquet.py Outdated

def test_to_parquet_uuid_supported(tmp_path):
    # GH 61602
    pytest.importorskip("pyarrow", minversion="24.0.0")
Member

Skips should be done on test collection, not test execution, wherever possible. Use

@td.skip_if_no("pyarrow", min_version="24.0")

instead.

Comment thread pandas/tests/io/test_parquet.py Outdated

    # Verify it can be read back
    result = read_parquet(path, engine="pyarrow")
    assert len(result) == 2
Member

Can you test the full result? I think the following would work:

tm.assert_frame_equal(result, df)
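
As a rough illustration of why a full comparison is stricter than a length check, the sketch below builds two frames by hand rather than reading a Parquet file. The "round-tripped" frame is a made-up stand-in that holds raw 16-byte values, and it uses the public pandas.testing module in place of the internal tm alias.

```python
import uuid

import pandas as pd
import pandas.testing as pdt

ids = [uuid.UUID(int=1), uuid.UUID(int=2)]
expected = pd.DataFrame({"id": ids})

# Hypothetical read result: same number of rows, but bytes instead of UUIDs.
result = pd.DataFrame({"id": [u.bytes for u in ids]})

# The length check cannot tell the two frames apart.
length_check_passes = len(result) == 2

# The full comparison looks at the values and raises on the mismatch.
try:
    pdt.assert_frame_equal(result, expected)
    full_check_passes = True
except AssertionError:
    full_check_passes = False
```

Here length_check_passes is True while full_check_passes is False, so only the frame comparison would flag a change in the returned value type.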

@GiTaDi-CrEaTe
Author

pre-commit.ci autofix

@GiTaDi-CrEaTe
Author

Thanks for the review and the pointers, @rhshadrach! I have updated the test to use the @td.skip_if_no decorator for collection-time skipping and implemented tm.assert_frame_equal to verify the full dataframe. Pushed the changes.

@GiTaDi-CrEaTe
Author

pre-commit.ci autofix

@GiTaDi-CrEaTe requested a review from rhshadrach on May 15, 2026, 15:55
@rhshadrach
Member

It looks like there is still an issue: the read succeeds, but some of the builds are getting bytes instead of UUIDs. This will need investigation to determine whether it needs fixing on the pandas side or the PyArrow side.

@GiTaDi-CrEaTe
Author

@rhshadrach, looking at the logs, it seems the Parquet FIXED_LEN_BYTE_ARRAY isn't being cast back to Python UUID objects during deserialization, specifically on the py314 and PyArrow Nightly builds, which leaves the values as raw bytes.
I can do some digging into the PyArrow nightly changes to see if there was a recent regression in their pandas_compat translation layer for this logical type.
Let me know if you'd prefer to leave this PR open while I investigate the upstream behavior!

@GiTaDi-CrEaTe
Author

pre-commit.ci autofix

@GiTaDi-CrEaTe
Author

@rhshadrach I investigated the nightly failures. The Parquet FIXED_LEN_BYTE_ARRAY is safely preserving the data, but the PyArrow nightly/py314 builds are failing to cast those 16 bytes back into Python UUID objects during deserialization.

I pushed a commit that gracefully checks if the result is returned as raw bytes and maps it back to a UUID object for the assertion. This ensures we are still strictly validating the data integrity while bypassing the upstream nightly object-casting quirk. Let me know if this pragmatic fallback works for you!

I hope this works!
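
For readers following along, the fallback described above might look roughly like the sketch below. This is not the code from the commit, which is not shown in this thread; as_uuid is a hypothetical helper name, and the lists stand in for a column read back from Parquet.

```python
import uuid


def as_uuid(value):
    """Hypothetical helper: accept a UUID or its 16-byte form and return a UUID."""
    if isinstance(value, uuid.UUID):
        return value
    if isinstance(value, (bytes, bytearray)) and len(value) == 16:
        return uuid.UUID(bytes=bytes(value))
    raise TypeError(f"cannot interpret {value!r} as a UUID")


original = [uuid.UUID(int=1), uuid.UUID(int=2)]

# Stand-in for a read result where the values came back as raw bytes.
read_back = [u.bytes for u in original]

# Normalize before comparing so that both return types are accepted.
normalized = [as_uuid(v) for v in read_back]
```

One trade-off of normalizing inside the test is that the assertion no longer distinguishes between UUID objects and raw bytes being returned.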

@GiTaDi-CrEaTe
Author

pre-commit.ci autofix

@GiTaDi-CrEaTe
Author

pre-commit.ci autofix

Comment thread pandas/tests/io/test_parquet.py Outdated


@td.skip_if_no("pyarrow", min_version="24.0")
def test_to_parquet_uuid_supported(tmp_path):
Member

Can you use temp_file instead?

Author

@mroeschke Done! Swapped tmp_path for the temp_file fixture. Thanks for the review!!

@GiTaDi-CrEaTe
Author

pre-commit.ci autofix

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Writing UUIDs fail

3 participants