Skip to content

SG-43437 Data and storage module migration from Flow framework to vendor package in tk-core#1103

Draft
yungsiow wants to merge 53 commits into
masterfrom
ticket/sg-43437/migrate-data-storage-framework
Draft

SG-43437 Data and storage module migration from Flow framework to vendor package in tk-core#1103
yungsiow wants to merge 53 commits into
masterfrom
ticket/sg-43437/migrate-data-storage-framework

Conversation

@yungsiow
Copy link
Copy Markdown
Contributor

@yungsiow yungsiow commented Jun 3, 2026

Summary: The "data" and "sandbox" modules of the tk-framework-flowam within the Flow-Toolkit integration POC have been refactored and migrated into tk-core as a tank_vendor package. This has been named the "flow integration sdk" for now.

Refactoring changes:

  • Created a separation between storage management and data wrappers
  • Storage management is now implemented as a lower layer (interfacing only with the MEDM Python SDK)
  • Object wrappers use both the MEDM Python SDK and the storage management layer
  • Removed some redundant functions
  • Ported over all supporting utility functions, exception classes, constants, etc.

High-level description of functionality:
The integration sdk provides utilities for

  • managing a gql client and interactions with MEDM
  • managing storage locations/paths
  • providing file transfer capabilities
  • providing publish and fetching capabilities
  • implementing a sandbox mechanism (local work-in-progress storage)
  • managing a local file cache
  • Flow entity object wrappers for convenient property access and querying
  • a light schema cache and schema querying capabilities

stevelittlefish and others added 30 commits May 13, 2026 15:49
Adds requirements/any/ for Python-version-independent vendor zips and
teaches python/tank_vendor/__init__.py to auto-discover and load them
alongside the existing pkgs.zip. Drops in flow_data_sdk-beta.zip as the
first such vendor.

The loader refactor extracts the existing pkgs.zip init into a reusable
_load_packages_from_zip helper. Shared zips load after pkgs.zip so
per-version pins win on name collision; collisions warn and skip rather
than overwrite. Per-package import failures continue to warn-and-continue
(the SDK uses 3.10+ syntax, so it'll simply be absent on 3.7/3.9 instead
of breaking import tank_vendor).

Includes a small _patch_flow_data_sdk_version workaround for an upstream
bug: the SDK's _version.py queries importlib.metadata.version(
"adsk-flow-data") but the published wheel's distribution name is
"flow-data-sdk", so SDK_VERSION otherwise falls back to "local_dev"
even with .dist-info present. The patch is a self-disabling no-op once
upstream is fixed.

Tests cover the new package via PACKAGES_TO_TEST (3.10+ gated) plus a
TestFlowDataSDK class with a dist-info canary that catches future
regressions in the zip's metadata packaging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Flow Data SDK previously hardcoded the wrong distribution name in
flow_data_sdk/base/_version.py ("adsk-flow-data" instead of the actual
published name "flow-data-sdk"), so importlib.metadata.version() always
raised PackageNotFoundError and SDK_VERSION fell back to "local_dev"
even with .dist-info present in our shared zip.

The new SDK zip ships the upstream one-line fix — _version.py now
queries the correct name and SDK_VERSION resolves to the real version
on its own. The local workaround patch can go.

Removes:
- _patch_flow_data_sdk_version function and its call site in
  tank_vendor/__init__.py (~37 lines)
- The upstream-bug commentary in the test_dist_info_via_importlib_metadata
  docstring (the assertion itself is unchanged and still passes)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The required parameter was only meaningful in the late-stage-exception
branch (raise vs warn). The other two failure paths returned False
identically in both modes, so the parameter was mostly dead weight.

Collapse to always-raise for any zip's wholesale load failure, matching
the original pkgs.zip posture. Shared zips in requirements/any/ are now
held to the same standard: a corrupt or import-broken shared zip will
fail import tank_vendor with a clean RuntimeError instead of silently
degrading. Per-package ImportError inside the loop still warns and
skips, so flow_data_sdk being absent on Python 3.7/3.9 (3.10+ syntax)
remains non-fatal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes a regression in the share_core integration test on Windows /
Python 3.13.

flow_data_sdk's _version.py calls importlib.metadata.version(
"flow-data-sdk") at import time. importlib.metadata iterates sys.path
in order, and FastPath.zip_children() is @lru_cache'd — the cached
FastPath holds an open zipfile.ZipFile to whichever zip it probed.

Previously pkgs.zip was at sys.path[0] and flow_data_sdk-beta.zip at
sys.path[1], so the scan probed pkgs.zip first (no match → cached open
handle), then matched in flow_data_sdk-beta.zip. The lingering handle
on pkgs.zip caused share_core's shutil.move to fail with WinError 32
when relocating install/core on Windows.

Reorder so shared zips end up at sys.path[0], pkgs.zip at sys.path[1].
importlib.metadata then short-circuits on the first probe and never
opens pkgs.zip. pkgs.zip is still loaded into sys.modules first, so
collision precedence is unchanged.

Drop the path_position parameter from _load_packages_from_zip — every
zip is now always inserted at sys.path[0], and the call order in
tank_vendor/__init__.py determines the final sys.path ordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… zips

The previous reorder commit (f96c7d6) moved the share_core WinError 32
from pkgs.zip to flow_data_sdk-beta.zip — same root cause, different
file. This is the actual fix.

importlib.metadata.FastPath.__new__ is @lru_cache'd. The FastPath
instance for whichever zip importlib.metadata probes is kept alive
forever, and inside FastPath.zip_children() the line
`self.joinpath = zip_path.joinpath` binds a zipfile.Path (with its
underlying open ZipFile) as an instance attribute. The result: the
cache permanently pins an open file handle on every zip that ever
yielded a metadata match.

flow_data_sdk's _version.py triggers this by calling
importlib.metadata.version("flow-data-sdk") at module import time.
The cached FastPath then keeps flow_data_sdk-beta.zip open, which
on Windows blocks share_core's shutil.move(install/core, ...).

Fix: after all zips are loaded, call MetadataPathFinder().invalidate_caches()
to drop FastPath references, then gc.collect() so the underlying
ZipFile.__del__ fires immediately and releases the OS handle.

invalidate_caches is called on an instance, not the class, because it
isn't decorated as @classmethod in older Python versions but takes
`cls` by convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cleanup added in 8c006ca (clearing FastPath cache + gc.collect to
release zipfile handles) is a workaround for Windows' sharing-violation
file-move semantics. Linux and macOS allow moving files with open
handles, so the cleanup is unnecessary there — and it was observed to
break a Linux / Python 3.13 integration test in CI.

Guard with sys.platform == "win32" so non-Windows platforms get the
previous behaviour unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes prompted by PR review on #1098:

1. Broaden per-package import catch from ImportError to Exception. The
   inner try/except is best-effort by design (the outer wholesale-failure
   handler still raises). A future shared vendor using PEP 604 unions or
   other syntax-level newness would currently break import tank_vendor
   on older Pythons with a SyntaxError escaping the inner catch;
   widening the except matches the documented intent.

2. Add TestFlowDataSDKAbsentOnOldPython, gated to Python < 3.10. Pins the
   contract that the loader warns and continues when a shared vendor
   fails to import, so the PR's behavioural claim ("on 3.7/3.9 the SDK
   is simply absent") is actually exercised in CI rather than just
   asserted in the description.

3. Soften the misleading "mandatory" label on pkgs.zip in the module
   docstring. Missing pkgs.zip is tolerated to support pip-installed
   tk-core where dependencies come from the environment; the docstring
   for _load_packages_from_zip already says so, but the top-of-file
   summary still claimed otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Filter top-level .dll files in _discover_top_level_packages so a stray
  Windows DLL at the root of a vendor zip is not treated as an
  importable package (Carlos).
- Reword the unreadable-zip warning to acknowledge that affected
  dependencies may still resolve from the Python environment instead of
  implying a guaranteed failure (Copilot).
- Pass RuntimeWarning + stacklevel=2 on per-package import failures so
  they match the other warnings in this module and point at the caller
  (Copilot).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a Flow/MEDM authentication path that triggers proactively during
ToolkitManager bootstrap when the resolved project is "AM-ready" (project
field sg_flow_am_id is set). On the silent path (keyring or refresh
token), no UI surfaces; otherwise a browser PKCE flow opens before the
engine starts.

Sourced from the tk-framework-flowam PoC, which will be deleted in time.

  * python/tank_vendor/adsk_auth/      vendored PKCE + keyring helper
                                       (PyJWT + keyring as new third-party
                                       deps, added to per-Python pkgs.zip
                                       via requirements/<py>/requirements.txt)

  * python/tank/authentication/flow_auth/
      init_authentication(), get_access_token(), check_token_expiry()
      FlowAuthSettings + resolve_flow_auth_settings() — defaults + env-var
      overrides (TK_FLOW_AUTH_APPLICATION_ID/BASE_URL/CALLBACK_URL)
      FlowAuthError / FlowAuthConfigurationError
      AM_READY_PROJECT_FIELD (single point of truth — currently
      sg_flow_am_id pending confirmation from Julien)

  * python/tank/bootstrap/manager.py
      _resolve_project_id() extracted from _get_configuration
      _check_and_trigger_am_auth() new hook called right before
      _get_updated_configuration returns. Configuration errors raise
      TankBootstrapError; runtime auth failures are logged and swallowed
      unless TK_FLOW_AUTH_REQUIRED=1.

Tests: 29 new (16 flow_auth unit + 13 bootstrap hook), all existing
bootstrap (103) and authentication (101) tests still pass.

FlowAuthenticationHandler from flowam is intentionally dropped — out
of scope for this ticket; tk-core only triggers auth here, it does not
own a GQL handler.

TODOs flagged in code for follow-up: confirm AM-ready field name and
real production APS values (application_id, base_url, callback_url) with
Julien Langlois before release. pkgs.zip regeneration is a release-time
step and not included in this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Restore the unicode → arrow in the _install_import_hook docstring.
- Wrap the new requirements/any/ paragraph in developer/README.md.
- Restore the Step 1/2/3/4 navigational comments inside
  _load_packages_from_zip that the refactor had stripped.
- Drop the Python<3.8 fallback in _release_importlib_metadata_handles;
  Python 3.7/3.8 compatibility was discontinued after March 2026.
- Reorganise __init__.py so all helper defs (including
  _release_importlib_metadata_handles) come before the MAIN
  INITIALIZATION block, and call the release helper from the main block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add Sphinx-style type info to _resolve_project_id and _check_and_trigger_am_auth
- Change _check_and_trigger_am_auth to accept entity instead of project_id,
  merging the ShotGrid AM-ready field fetch with project resolution to save
  one API round-trip (deep-field notation for the general entity case)
- Drop the redundant _resolve_project_id call in _get_updated_configuration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sertion

Regenerated pkgs.zip, frozen_requirements.txt, and certs for Python 3.7,
3.9, 3.10, 3.11, and 3.13. Also fixed update_python_packages.py to count
.dist-info directories instead of package directories when asserting install
completeness, since namespace packages (ruamel.yaml, jaraco.*) share parent
directories and caused a false assertion failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
stevelittlefish and others added 5 commits June 2, 2026 17:19
Bind the PKCE callback HTTP server to 127.0.0.1 / ::1 instead of
0.0.0.0 / :: to prevent exposing the OAuth redirect endpoint on the
local network. Port-in-use probes updated to match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Call server.shutdown() + server.server_close() in a finally block in
web_authenticate() so the bound port and server thread are released
once the auth code is obtained, on timeout, or on error — rather than
persisting for the lifetime of the host process.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 95.87629% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.27%. Comparing base (34cb3fb) to head (5415a9b).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...n/tank/authentication/flow_auth/_authentication.py 94.59% 4 Missing ⚠️
python/tank/bootstrap/manager.py 94.87% 2 Missing ⚠️
python/tank/authentication/flow_auth/_settings.py 91.66% 1 Missing ⚠️
python/tank/context.py 92.85% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1103      +/-   ##
==========================================
- Coverage   79.42%   79.27%   -0.15%     
==========================================
  Files         198      204       +6     
  Lines       20750    20933     +183     
==========================================
+ Hits        16481    16595     +114     
- Misses       4269     4338      +69     
Flag Coverage Δ
Linux 79.09% <95.87%> (+0.19%) ⬆️
Python-3.10 79.09% <95.87%> (-0.15%) ⬇️
Python-3.11 78.77% <95.87%> (-0.38%) ⬇️
Python-3.13 ?
Python-3.9 78.87% <95.87%> (-0.34%) ⬇️
Windows ?
macOS 78.93% <95.87%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

stevelittlefish and others added 7 commits June 4, 2026 12:43
Co-authored-by: Julien Langlois <16244608+julien-lang@users.noreply.github.com>
Expose the FlowAM project ID (sg_flow_am_id ShotGrid field) as a
first-class property on the Context class. The value is fetched lazily
from ShotGrid on first access and then cached, so non-AM projects pay
no extra cost. It is included in to_dict()/from_dict() serialization
so it survives cross-process context handoffs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Guard the new context-based AM check against site-level bootstraps
where entity is None, which would otherwise crash before reaching
_trigger_am_auth's own None guard.

Update tests to match the renamed _trigger_am_auth method and the
restructured AM-readiness check (now at the call site via
ctx.flow_am_project_id).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
config.get_tk_instance() requires a fully serializable user object
which is not available mid-bootstrap. Query sg_flow_am_id via
self._sg_connection instead, using the existing _resolve_project_id
helper to keep it clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- The original solution (caching in private __flow_am_project_id var in context object) didn't seem to work as intended.  The project query ended up running multiple times rather than the value being cached and persisting.
- Setting the value on the context.project dictionary works and makes the value persist
- Changed the flow_am_project_id property to read from context.project, so we still get the benefit of the nicer interface
- Removed unnecessary context.__flow_am_project_id which is now unused
@yungsiow yungsiow force-pushed the ticket/sg-43437/migrate-data-storage-framework branch from 315fddb to b44abb6 Compare June 4, 2026 20:45
stevelittlefish and others added 10 commits June 5, 2026 13:00
- Fix KeyError in flow_am_project_id property: use .get() so non-AM
  projects return None instead of raising on a missing dict key
- Fix NameError in Context._from_dict: remove stale _FLOW_AM_ID_NOT_FETCHED
  sentinel; restore sg_flow_am_id onto ctx.project from serialized data
  so serialize/deserialize round-trips are preserved
- Fix _trigger_am_auth: add entity is None guard matching the documented
  no-op behaviour
- Update TestContextFlowAmProjectId tests to match the new design where
  sg_flow_am_id is read directly from the project dict rather than
  lazy-fetched from ShotGrid

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix path matching

When writing local cached paths back to SG, the matching logic for finding the corresponding entity to update was flawed.  Due to potential path format mismatches, the correct match was not being found.  Add an internal function to normalize these paths in an os agnostic way to ensure correct matches are found.

* Added unit test coverage for new normalize function

* fix unit test
…1096)

* Tests to reproduce unresolved version behaviour

* Fixed unkown version issue for pip installed tk-core

* SG-42277 Call get_currently_running_api_version via submodule in integration tests

The function is defined in tank.pipelineconfig_utils and is not re-exported
at the top of the tank/sgtk module, so sgtk.get_currently_running_api_version
raises AttributeError. Call it via sgtk.pipelineconfig_utils instead.

* SG-42277 Strip PYTHONPATH for venv subprocess in pip_install integration test

The integration test runner prepends the source tree's python/ directory to
PYTHONPATH, which the test's subprocess inherits. This shadows the venv's
pip-installed sgtk with the source tree, causing _get_version_from_manifest
to pick up the source repo's info.yml (version: "HEAD") instead of
exercising the importlib.metadata fallback. Drop PYTHONPATH from the
subprocess env so the venv's site-packages takes precedence.

* Linter

* Code review feedback addressed

* SG-42277 Use dist metadata in get_core_api_version when pip installed

Mirrors the importlib.metadata fallback from get_currently_running_api_version,
but only when the requested core_install_root refers to the currently-running
core. Other paths still return "unknown" so callers comparing distinct cores
(e.g. core_localize) aren't given the running version by mistake.
- refactored "data" and "sandbox" modules to become the foundation of a new integration sdk
- isolated storage management concerns from wrapper objects
- moved into tk-core as part of tank_vendor packages as a first step
- included all supporting globals and utility functions
- CODE UNVALIDATED!
- add caching to fetch.fetch_blob_urls()
- add SessionCollection class and global var, replacing SessionProject
- in FlowRevision.fetch(), add ability to choose whether to fetch dependencies too
- store revision object from within FlowComponent object
- fixed schema module to use SessionCollection
- fixed tracing output
- added Source and Thumbnail component specs and supporting globals because this is probably generally useful
- added globals.set_logger_callback()
- added FlowProject.collection_id property
…in-context' into ticket/sg-43437/migrate-data-storage-framework"

This reverts commit 74bbb78, reversing
changes made to cfb6ef1.
@yungsiow yungsiow force-pushed the ticket/sg-43437/migrate-data-storage-framework branch from b44abb6 to 9c4a088 Compare June 5, 2026 15:24
- created "flowam" submodule for containing toolkit specific flowam utilities
- added some key constants and custom schema config to the the above
- resolve config settings as overrides for flow authentication values
- initialize flow session during engine init
FLOW_AUTH_APP_ID, DEFAULT_AUTH_APPLICATION_ID
),
auth_base_url=overrides.get(FLOW_AUTH_BASE_URL, DEFAULT_AUTH_BASE_URL),
auth_callback_url=overrides.get(FLOW_AUTH_CALLBACK_URL, DEFAULT_AUTH_CALLBACK_URL),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Black would make changes.


from __future__ import annotations

import os
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'os' imported but unused


# Location of config for specifying custom schemas used
# in the toolkit Flow integration
FLOW_SCHEMA_CONFIG_PATH = str(pathlib.Path(__file__).resolve().parent) + "/config.json" No newline at end of file
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no newline at end of file


self.log_info("Doing Flow Integration SDK initialization...")
self.log_info(f"Flow AM Project ID: {flow_project_id}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line contains whitespace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants