
python(feat): add HDF5 client-side detect_config #536

Merged
wei-qlu merged 11 commits into main from python/add-hdf5-client-support
Apr 16, 2026

wei-qlu (Contributor) commented Apr 14, 2026

What was changed

Implement HDF5 detect_config client-side. Compound datasets are handled by treating the first field as time and the remaining fields as value channels. Single-column datasets use a root time dataset if one is present. Channel metadata is read from HDF5 attributes, matching the existing backend implementation.
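
The compound-dataset rule can be sketched in a few lines. This is an illustration only, not the code from this PR: `DetectedColumn` and `columns_from_compound` are hypothetical names, the channel name is assumed to default to the field name, and the field names are assumed to have been read from the dataset's compound dtype already (for example via h5py's `dataset.dtype.names`).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DetectedColumn:
    """Hypothetical stand-in for one detected channel (not the sift_client type)."""

    name: str
    time_dataset: str
    value_dataset: str
    time_field: str
    value_field: str


def columns_from_compound(
    dataset_path: str, field_names: Sequence[str]
) -> List[DetectedColumn]:
    """Treat the first compound field as time and every other field as a value channel."""
    if len(field_names) < 2:
        # A time field plus at least one value field is required.
        return []
    time_field, *value_fields = field_names
    return [
        DetectedColumn(
            name=value_field,  # assumption: channel name defaults to the field name
            time_dataset=dataset_path,
            value_dataset=dataset_path,  # same path; fields tell time and value apart
            time_field=time_field,
            value_field=value_field,
        )
        for value_field in value_fields
    ]


columns = columns_from_compound(
    "/dataset_name", ["Timestamp", "Sensor Reading", "Status"]
)
for column in columns:
    print(column.name, column.time_field, column.value_field)
```

Each value field becomes its own channel that shares the dataset path and the time field, which mirrors how the `time_field`/`value_field` pair is described for compound datasets in the example below.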

Verification

Unit tests and manual testing with a local script, verifying the full import process succeeded:

  • Creating a config and importing
  • Calling detect_config, making adjustments, and importing

Example import workflow

"""
Supported HDF5 layouts:
  1. Groups with separate datasets — time_dataset and value_dataset point to different paths
       /group_name/timestamps   (1D)
       /group_name/values       (1D)

  2. 2D datasets — same path for both, differentiated by column index
       /dataset_name            (N x 2, col 0 = time, col 1 = value)

  3. Compound datasets — same path for both, differentiated by field name
       /dataset_name            (compound dtype with named fields, e.g. ('Timestamp', 'Sensor Reading'))
       Use time_field/value_field to select the appropriate fields.
"""

import os
from dotenv import load_dotenv

from sift_client import SiftClient, SiftConnectionConfig
from sift_client.sift_types.channel import ChannelDataType
from sift_client.sift_types.data_import import (
    Hdf5ImportConfig,
    Hdf5DataColumn,
    TimeFormat,
)

load_dotenv()

HDF5_FILE = "all_types.h5"
ASSET_NAME = "hdf5_all_types_example"


def get_client() -> SiftClient:
    return SiftClient(
        connection_config=SiftConnectionConfig(
            api_key=os.getenv("SIFT_API_KEY"),
            grpc_url=os.getenv("GRPC_API_URL"),
            rest_url=os.getenv("REST_API_URL"),
            use_ssl=False,
        )
    )


def main():
    client = get_client()
    
    # Build the config by hand. Alternatively, call detect_config and adjust the result as necessary.
    config = Hdf5ImportConfig(
        asset_name=ASSET_NAME,
        run_name="all-types-manual-run",
        time_format=TimeFormat.ABSOLUTE_UNIX_NANOSECONDS,
        data=[
            # Layout 1: Groups — separate timestamps/values datasets
            Hdf5DataColumn(
                name="channel_bool",
                data_type=ChannelDataType.BOOL,
                time_dataset="channel_bool/timestamps",
                value_dataset="channel_bool/values",
            ),
            Hdf5DataColumn(
                name="channel_int32",
                data_type=ChannelDataType.INT_32,
                time_dataset="channel_int32/timestamps",
                value_dataset="channel_int32/values",
            ),
            Hdf5DataColumn(
                name="channel_int64",
                data_type=ChannelDataType.INT_64,
                time_dataset="channel_int64/timestamps",
                value_dataset="channel_int64/values",
            ),
            Hdf5DataColumn(
                name="channel_uint32",
                data_type=ChannelDataType.UINT_32,
                time_dataset="channel_uint32/timestamps",
                value_dataset="channel_uint32/values",
            ),
            Hdf5DataColumn(
                name="channel_uint64",
                data_type=ChannelDataType.UINT_64,
                time_dataset="channel_uint64/timestamps",
                value_dataset="channel_uint64/values",
            ),
            Hdf5DataColumn(
                name="channel_string",
                data_type=ChannelDataType.STRING,
                time_dataset="channel_string/timestamps",
                value_dataset="channel_string/values",
            ),
            Hdf5DataColumn(
                name="channel_enum",
                data_type=ChannelDataType.INT_64,
                time_dataset="channel_enum/timestamps",
                value_dataset="channel_enum/values",
            ),
            Hdf5DataColumn(
                name="channel_bit_field",
                data_type=ChannelDataType.UINT_64,
                time_dataset="channel_bit_field/timestamps",
                value_dataset="channel_bit_field/values",
            ),
            Hdf5DataColumn(
                name="channel_bytes",
                data_type=ChannelDataType.BYTES,
                time_dataset="channel_bytes/timestamps",
                value_dataset="channel_bytes/values",
            ),
            # Layout 2: 2D datasets — same path, differentiated by column index
            Hdf5DataColumn(
                name="channel_double",
                data_type=ChannelDataType.DOUBLE,
                time_dataset="channel_double",
                value_dataset="channel_double",
                time_index=0,
                value_index=1,
            ),
            Hdf5DataColumn(
                name="channel_float",
                data_type=ChannelDataType.DOUBLE,
                time_dataset="channel_float",
                value_dataset="channel_float",
                time_index=0,
                value_index=1,
            ),
        ],
    )

    job = client.data_import.import_from_path(
        file_path=HDF5_FILE,
        asset=ASSET_NAME,
        config=config,
        show_progress=True,
    )

    result = job.wait_until_complete()
    result.get_import_run()

if __name__ == "__main__":
    main()
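
The three layouts listed in the docstring above can be told apart from a dataset's dtype and shape alone. The following is a minimal sketch under that assumption, not the detection logic from this PR: `classify_layout` is a hypothetical helper, and its inputs mirror what h5py exposes as `dataset.shape` and `dataset.dtype.names`.

```python
from typing import Optional, Sequence, Tuple


def classify_layout(
    shape: Tuple[int, ...], field_names: Optional[Sequence[str]]
) -> str:
    """Map a dataset's shape and compound field names to one of the three layouts."""
    if field_names:
        # Layout 3: compound dtype, time and value selected by field name.
        return "compound"
    if len(shape) == 2 and shape[1] >= 2:
        # Layout 2: N x 2 (or wider), time and value selected by column index.
        return "2d"
    if len(shape) == 1:
        # Layout 1: plain 1D dataset, paired with a separate timestamps dataset.
        return "separate"
    return "unsupported"


print(classify_layout((100,), None))
print(classify_layout((100, 2), None))
print(classify_layout((100,), ("Timestamp", "Sensor Reading")))
```

The compound check comes first because a compound dataset is typically 1D and would otherwise be mistaken for a Layout 1 dataset.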

wei-qlu requested a review from marc-sift April 14, 2026 00:13
wei-qlu self-assigned this Apr 14, 2026
wei-qlu requested a review from alexluck-sift April 14, 2026 18:43
wei-qlu marked this pull request as ready for review April 14, 2026 18:43
wei-qlu requested a review from solidiquis April 14, 2026 18:44
wei-qlu changed the title from "python(feat): add hdf5 client-side detect_config" to "python(feat): add HDF5 client-side detect_config" Apr 14, 2026
wei-qlu force-pushed the python/add-hdf5-client-support branch from 4cb86d3 to d770ac4 April 14, 2026 19:28
wei-qlu removed their assignment Apr 14, 2026
Comment thread python/lib/sift_client/_internal/util/hdf5.py
wei-qlu requested a review from marc-sift April 16, 2026 16:51
wei-qlu enabled auto-merge (squash) April 16, 2026 18:29
wei-qlu merged commit e2352d1 into main Apr 16, 2026
22 checks passed
wei-qlu deleted the python/add-hdf5-client-support branch April 16, 2026 18:31
2 participants