From c522ae4db927ee3583ef2fb7ad82a55710e5f0bb Mon Sep 17 00:00:00 2001
From: Alexandros Anastasiou
Date: Wed, 22 Apr 2026 21:45:44 +0100
Subject: [PATCH 1/4] GH-45644: [Doc][Python] Document timezone loss when converting timestamp arrays to NumPy

---
 docs/source/python/numpy.rst | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/docs/source/python/numpy.rst b/docs/source/python/numpy.rst
index 01fb1982d598..4b0aad21c8f9 100644
--- a/docs/source/python/numpy.rst
+++ b/docs/source/python/numpy.rst
@@ -73,3 +73,36 @@ representation as Arrow, and assuming the Arrow data has no nulls.
 For more complex data types, you have to use the :meth:`~pyarrow.Array.to_pandas`
 method (which will construct a Numpy array with Pandas semantics for, e.g.,
 representation of null values).
+
+Timezone-aware Timestamps
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+NumPy's ``datetime64`` type does not support timezones. When converting a
+timezone-aware Arrow timestamp array to NumPy via :meth:`~pyarrow.Array.to_numpy`,
+the timezone information is silently dropped:
+
+.. code-block:: python
+
+   >>> arr = pa.array([1735689600, 1735689600], type=pa.timestamp("s", tz="UTC"))
+   >>> arr.type
+   TimestampType(timestamp[s, tz=UTC])
+   >>> arr.to_numpy()
+   array(['2025-01-01T00:00:00', '2025-01-01T00:00:00'], dtype='datetime64[s]')
+
+If you need to preserve timezone information, there are two alternatives:
+
+* Convert to a Pandas Series, which supports timezone-aware ``datetime64`` dtypes:
+
+  .. code-block:: python
+
+     >>> arr.to_pandas()
+     0   2025-01-01 00:00:00+00:00
+     1   2025-01-01 00:00:00+00:00
+     dtype: datetime64[s, UTC]
+
+* Convert to Python ``datetime`` objects, which carry ``tzinfo``:
+
+  .. code-block:: python
+
+     >>> arr.to_pylist()
+     [datetime.datetime(2025, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC')), datetime.datetime(2025, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC'))]

From 34e5fadb8fe7ad729a078de08607f2fc6443a930 Mon Sep 17 00:00:00 2001
From: Alexandros Anastasiou
Date: Thu, 23 Apr 2026 16:24:48 +0100
Subject: [PATCH 2/4] GH-45644: [Doc][Python] Fix doctest failures and add nested types caveat

---
 docs/source/python/numpy.rst | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/docs/source/python/numpy.rst b/docs/source/python/numpy.rst
index 4b0aad21c8f9..b9d9e1137d5b 100644
--- a/docs/source/python/numpy.rst
+++ b/docs/source/python/numpy.rst
@@ -83,11 +83,12 @@ the timezone information is silently dropped:
 
 .. code-block:: python
 
-   >>> arr = pa.array([1735689600, 1735689600], type=pa.timestamp("s", tz="UTC"))
-   >>> arr.type
+   >>> arr = pa.array([1735689600, 1735689600], type=pa.timestamp("s", tz="UTC")) # doctest: +SKIP
+   >>> arr.type # doctest: +SKIP
    TimestampType(timestamp[s, tz=UTC])
-   >>> arr.to_numpy()
-   array(['2025-01-01T00:00:00', '2025-01-01T00:00:00'], dtype='datetime64[s]')
+   >>> arr.to_numpy() # doctest: +SKIP
+   array(['2025-01-01T00:00:00', '2025-01-01T00:00:00'],
+         dtype='datetime64[s]')
 
 If you need to preserve timezone information, there are two alternatives:
 
@@ -95,14 +96,22 @@ If you need to preserve timezone information, there are two alternatives:
 
   .. code-block:: python
 
-     >>> arr.to_pandas()
+     >>> arr.to_pandas() # doctest: +SKIP
      0   2025-01-01 00:00:00+00:00
      1   2025-01-01 00:00:00+00:00
      dtype: datetime64[s, UTC]
 
+  .. note::
+
+     For nested types (e.g., list arrays containing timestamps),
+     ``to_pandas()`` may not preserve timezone information. Structs and maps
+     do retain timezones, but lists currently do not. See
+     `GH-41162 `_ for details.
+
 * Convert to Python ``datetime`` objects, which carry ``tzinfo``:
 
   .. code-block:: python
 
-     >>> arr.to_pylist()
-     [datetime.datetime(2025, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC')), datetime.datetime(2025, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC'))]
+     >>> arr.to_pylist() # doctest: +SKIP
+     [datetime.datetime(2025, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC')),
+      datetime.datetime(2025, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC'))]

From 7dcd9df34097c218ea804b2c94c73375e8d659f5 Mon Sep 17 00:00:00 2001
From: Alexandros Anastasiou
Date: Fri, 24 Apr 2026 11:58:06 +0100
Subject: [PATCH 3/4] GH-45644: [Doc][Python] Replace doctest +SKIP with minimal directives

---
 docs/source/python/numpy.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/source/python/numpy.rst b/docs/source/python/numpy.rst
index b9d9e1137d5b..835cd14196ee 100644
--- a/docs/source/python/numpy.rst
+++ b/docs/source/python/numpy.rst
@@ -83,10 +83,10 @@ the timezone information is silently dropped:
 
 .. code-block:: python
 
-   >>> arr = pa.array([1735689600, 1735689600], type=pa.timestamp("s", tz="UTC")) # doctest: +SKIP
-   >>> arr.type # doctest: +SKIP
+   >>> arr = pa.array([1735689600, 1735689600], type=pa.timestamp("s", tz="UTC"))
+   >>> arr.type
    TimestampType(timestamp[s, tz=UTC])
-   >>> arr.to_numpy() # doctest: +SKIP
+   >>> arr.to_numpy()
    array(['2025-01-01T00:00:00', '2025-01-01T00:00:00'],
          dtype='datetime64[s]')
 
@@ -96,7 +96,7 @@ If you need to preserve timezone information, there are two alternatives:
 
   .. code-block:: python
 
-     >>> arr.to_pandas() # doctest: +SKIP
+     >>> arr.to_pandas()
      0   2025-01-01 00:00:00+00:00
      1   2025-01-01 00:00:00+00:00
      dtype: datetime64[s, UTC]

From cf094e6f2c5f7ca48bb149361d97b5f1f3b349f4 Mon Sep 17 00:00:00 2001
From: Alexandros Anastasiou
Date: Mon, 27 Apr 2026 23:56:59 +0100
Subject: [PATCH 4/4] GH-45644: [Doc][Python] Document timezone-aware NumPy object conversion

---
 docs/source/python/numpy.rst | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docs/source/python/numpy.rst b/docs/source/python/numpy.rst
index 835cd14196ee..5a76cb4ed3ab 100644
--- a/docs/source/python/numpy.rst
+++ b/docs/source/python/numpy.rst
@@ -101,6 +101,17 @@ If you need to preserve timezone information, there are two alternatives:
      1   2025-01-01 00:00:00+00:00
      dtype: datetime64[s, UTC]
 
+  To convert back to NumPy while preserving timezone information, use
+  ``timestamp_as_object=True`` to get an object array of Python ``datetime``
+  objects:
+
+  .. code-block:: python
+
+     >>> arr.to_pandas(timestamp_as_object=True).to_numpy() # doctest: +ELLIPSIS
+     array([datetime.datetime(2025, 1, 1, 0, 0, tzinfo=...),
+            datetime.datetime(2025, 1, 1, 0, 0, tzinfo=...)],
+           dtype=object)
+
   .. note::
 
      For nested types (e.g., list arrays containing timestamps),