TST: Add test for writing UUIDs to parquet with pyarrow #61602 #65647
GiTaDi-CrEaTe wants to merge 16 commits into
Conversation
pre-commit.ci autofix
for more information, see https://pre-commit.ci
def test_to_parquet_uuid_supported(tmp_path):
    # GH 61602
    pytest.importorskip("pyarrow", minversion="24.0.0")
Skips should be done at test collection, not test execution, wherever possible. Use
@td.skip_if_no("pyarrow", min_version="24.0")
instead.
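For reference, a minimal sketch of what the collection-time skip could look like, assuming the test lives in pandas/tests/io/test_parquet.py, that pandas.util._test_decorators is imported under its usual alias td, and that the frame has a single "uuid" column (the column name and contents are illustrative assumptions, not shown in the quoted diff):

```python
import uuid

import pandas.util._test_decorators as td

import pandas as pd


@td.skip_if_no("pyarrow", min_version="24.0")  # skip decided at collection time
def test_to_parquet_uuid_supported(tmp_path):
    # GH 61602
    # Column name and frame contents are illustrative assumptions.
    df = pd.DataFrame({"uuid": [uuid.uuid4(), uuid.uuid4()]})
    path = tmp_path / "uuid.parquet"
    df.to_parquet(path, engine="pyarrow")
```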
# Verify it can be read back
result = read_parquet(path, engine="pyarrow")
assert len(result) == 2
Can you test the full result? I think the following would work:
tm.assert_frame_equal(result, df)
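For context, a short sketch of the stronger round-trip assertion, assuming df and path are the frame and file created earlier in the test and that pandas._testing is imported as tm:

```python
import pandas._testing as tm
from pandas import read_parquet

# Read the file back and compare the full frame, not just its length.
result = read_parquet(path, engine="pyarrow")
tm.assert_frame_equal(result, df)
```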
Thanks for the review and the pointers, @rhshadrach! I have updated the test to use the @td.skip_if_no decorator for collection-time skipping and tm.assert_frame_equal to verify the full DataFrame. Pushed the changes.
Looks like there is still an issue; while the read is successful, some of the builds are getting bytes instead of UUIDs. This will need investigation to determine whether it needs fixing on the pandas or the PyArrow side.
@rhshadrach, looking at the logs, it seems the Parquet FIXED_LEN_BYTE_ARRAY isn't being cast back to Python UUID objects during deserialization, specifically on the py314 and PyArrow nightly builds, leaving them as raw bytes.
@rhshadrach I investigated the nightly failures. The Parquet FIXED_LEN_BYTE_ARRAY is preserving the data correctly, but the PyArrow nightly/py314 builds fail to cast those 16 bytes back into Python UUID objects during deserialization. I pushed a commit that checks whether the result comes back as raw bytes and maps it back to a UUID object for the assertion. This keeps the data-integrity check strict while working around the upstream nightly object-casting quirk. Let me know if this pragmatic fallback works for you!
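A hedged sketch of the kind of fallback described above, assuming a single "uuid" column and that df and path are defined earlier in the test; the exact commit may differ:

```python
import uuid

import pandas._testing as tm
from pandas import read_parquet

result = read_parquet(path, engine="pyarrow")
# Some nightly/py314 builds return the 16-byte payload instead of uuid.UUID
# objects, so map the raw bytes back before asserting equality.
if isinstance(result["uuid"].iloc[0], bytes):
    result["uuid"] = result["uuid"].map(lambda b: uuid.UUID(bytes=b))
tm.assert_frame_equal(result, df)
```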
@td.skip_if_no("pyarrow", min_version="24.0")
def test_to_parquet_uuid_supported(tmp_path):
Can you use temp_file instead?
@mroeschke Done! Swapped tmp_path for the temp_file fixture. Thanks for the review!!
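Putting the review feedback together, a sketch of what the final test might look like, assuming pandas' temp_file conftest fixture yields a temporary file path as the reviewer suggests; the "uuid" column name and two-row frame remain illustrative assumptions:

```python
import uuid

import pandas.util._test_decorators as td

import pandas as pd
import pandas._testing as tm
from pandas import read_parquet


@td.skip_if_no("pyarrow", min_version="24.0")
def test_to_parquet_uuid_supported(temp_file):
    # GH 61602
    df = pd.DataFrame({"uuid": [uuid.uuid4(), uuid.uuid4()]})
    df.to_parquet(temp_file, engine="pyarrow")

    result = read_parquet(temp_file, engine="pyarrow")
    tm.assert_frame_equal(result, df)
```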
Resolves #61602.
Added a test to test_parquet.py to verify that to_parquet successfully writes uuid.UUID objects when using pyarrow >= 24.0.0. The test uses importorskip to skip gracefully on older PyArrow versions where the bug still exists.