Multi-tenant /Catalogs Extension #880

Open
jonhealy1 wants to merge 52 commits into stac-utils:main from jonhealy1:multi-tenant-catalogs-extension

Conversation

@jonhealy1
Collaborator

@jonhealy1 jonhealy1 commented Feb 7, 2026

Related Issue(s):

Description: Multi-Tenant Catalogs Extension

This extension introduces a recursive /catalogs endpoint to the STAC API, enabling complex, nested hierarchies beyond the standard flat Root -> Collections structure. It transforms a STAC API into a Multi-Tenant system capable of serving distinct catalog trees (e.g., Provider -> Theme -> Project) within a single instance.

Key Architectural Concepts:

  • Recursive Hierarchy: Unlike standard STAC, which flattens data into a single list of Collections, this extension allows Catalogs to contain other Sub-Catalogs to unlimited depth.
  • Poly-Hierarchy (Virtual Organization): Collections are not physically moved but logically linked. A single Collection can belong to multiple parent Catalogs simultaneously (e.g., a "Sentinel-2" collection can exist under both a "USGS" catalog and an "Optical Data" catalog).
  • Safety-First Management: The architecture strictly separates "Organization" from "Data."
    • Unlinking vs. Deleting: Deleting a Catalog via this extension is non-destructive to the actual data. It "unlinks" the child Collections.
    • Orphan Adoption: If a Collection is unlinked from its last parent, it is automatically adopted by the Root Catalog to ensure no data is ever lost or becomes undiscoverable.
  • Unified Discovery: Integrates with the Children Extension to provide a single view (/children) that lists both Sub-Catalogs and Collections, supporting optional type filtering.
  • Configurable Transactions: The extension supports an enable_transactions flag (default: False).
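
The linking, unlinking, and orphan-adoption rules above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: `Registry`, `Node`, and `ROOT_ID` are hypothetical names that do not come from this PR, and a real backend would persist `parent_ids` in its own data model.

```python
from dataclasses import dataclass, field

ROOT_ID = "root"  # hypothetical id standing in for the Root Catalog


@dataclass
class Node:
    """A Catalog or Collection; parent_ids holds its logical links."""

    id: str
    type: str  # "Catalog" or "Collection"
    parent_ids: set = field(default_factory=set)


class Registry:
    """Toy in-memory store standing in for a real backend (assumption)."""

    def __init__(self):
        self.nodes = {}

    def link(self, node_id, type_, parent_id=ROOT_ID):
        """Create the node if needed, then link it under parent_id."""
        node = self.nodes.setdefault(node_id, Node(node_id, type_))
        node.parent_ids.add(parent_id)
        return node

    def unlink(self, parent_id, child_id):
        """Remove one link; the root adopts the child if it is orphaned."""
        child = self.nodes[child_id]
        child.parent_ids.discard(parent_id)
        if not child.parent_ids:
            child.parent_ids.add(ROOT_ID)

    def delete_catalog(self, catalog_id):
        """Disband a catalog: unlink its children but keep their data."""
        for node in list(self.nodes.values()):
            if catalog_id in node.parent_ids:
                self.unlink(catalog_id, node.id)
        del self.nodes[catalog_id]

    def children(self, catalog_id, type_=None):
        """List child ids, optionally filtered by type."""
        return sorted(
            n.id
            for n in self.nodes.values()
            if catalog_id in n.parent_ids and type_ in (None, n.type)
        )


reg = Registry()
reg.link("usgs", "Catalog")
reg.link("optical", "Catalog")
reg.link("sentinel-2", "Collection", parent_id="usgs")
reg.link("sentinel-2", "Collection", parent_id="optical")

reg.delete_catalog("usgs")     # sentinel-2 is still linked under "optical"
reg.delete_catalog("optical")  # last parent gone: the root adopts sentinel-2
```

After both deletions the collection still exists and is listed under the root, which is the non-destructive behavior described above.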

Full Endpoint List:

  • Registry & Root:
    • GET /catalogs - List all root catalogs.
    • POST /catalogs - Register a new root catalog.
  • Catalog Management:
    • GET /catalogs/{catalog_id} - Get catalog metadata.
    • PUT /catalogs/{catalog_id} - Update catalog metadata.
    • DELETE /catalogs/{catalog_id} - Disband a catalog (does not delete data).
  • Sub-Catalogs (Recursive):
    • GET /catalogs/{catalog_id}/catalogs - List sub-catalogs.
    • POST /catalogs/{catalog_id}/catalogs - Create or link a sub-catalog.
    • DELETE /catalogs/{catalog_id}/catalogs/{sub_catalog_id} - Unlink a sub-catalog.
  • Collections (Poly-hierarchy):
    • GET /catalogs/{catalog_id}/collections - List linked collections.
    • POST /catalogs/{catalog_id}/collections - Create or link a collection.
    • GET /catalogs/{catalog_id}/collections/{collection_id} - Get collection details.
    • DELETE /catalogs/{catalog_id}/collections/{collection_id} - Unlink a collection.
  • Items (Scoped Access):
    • GET /catalogs/{catalog_id}/collections/{collection_id}/items - Search items within catalog context.
    • GET /catalogs/{catalog_id}/collections/{collection_id}/items/{item_id} - Get single item.
  • Discovery & Capabilities:
    • GET /catalogs/{catalog_id}/children - Unified list of child Catalogs and Collections.
    • GET /catalogs/{catalog_id}/conformance - Conformance classes for this catalog tree.
    • GET /catalogs/{catalog_id}/queryables - Queryable fields for this catalog tree.
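
One way to picture how the enable_transactions flag relates to this endpoint list is a route table split into read and write routes. This sketch assumes the flag gates exactly the POST, PUT, and DELETE routes listed above; `READ_ROUTES`, `WRITE_ROUTES`, and `build_routes` are hypothetical names, not part of the extension.

```python
# Read-only routes, assumed to be registered regardless of the flag.
READ_ROUTES = [
    ("GET", "/catalogs"),
    ("GET", "/catalogs/{catalog_id}"),
    ("GET", "/catalogs/{catalog_id}/catalogs"),
    ("GET", "/catalogs/{catalog_id}/collections"),
    ("GET", "/catalogs/{catalog_id}/collections/{collection_id}"),
    ("GET", "/catalogs/{catalog_id}/collections/{collection_id}/items"),
    ("GET", "/catalogs/{catalog_id}/collections/{collection_id}/items/{item_id}"),
    ("GET", "/catalogs/{catalog_id}/children"),
    ("GET", "/catalogs/{catalog_id}/conformance"),
    ("GET", "/catalogs/{catalog_id}/queryables"),
]

# Write routes, assumed to be registered only when transactions are enabled.
WRITE_ROUTES = [
    ("POST", "/catalogs"),
    ("PUT", "/catalogs/{catalog_id}"),
    ("DELETE", "/catalogs/{catalog_id}"),
    ("POST", "/catalogs/{catalog_id}/catalogs"),
    ("DELETE", "/catalogs/{catalog_id}/catalogs/{sub_catalog_id}"),
    ("POST", "/catalogs/{catalog_id}/collections"),
    ("DELETE", "/catalogs/{catalog_id}/collections/{collection_id}"),
]


def build_routes(enable_transactions=False):
    """Return the (method, path) pairs to register for a given flag value."""
    routes = list(READ_ROUTES)
    if enable_transactions:
        routes.extend(WRITE_ROUTES)
    return routes


read_only = build_routes()
full = build_routes(enable_transactions=True)
```

With the default flag value only the GET routes are exposed, so a read-only deployment cannot modify the hierarchy.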

Specification Reference:
Healy-Hyperspatial/multi-tenant-catalogs

PR Checklist:

  • pre-commit hooks pass locally
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable, and docs build successfully (run make docs)
  • Changes are added to the CHANGELOG.

@jonhealy1 jonhealy1 changed the title from "Multi tenant catalogs extension" to "Multi-tenant catalogs extension" Feb 8, 2026
@jonhealy1 jonhealy1 marked this pull request as ready for review February 8, 2026 17:08
@jonhealy1 jonhealy1 marked this pull request as draft February 8, 2026 17:18
@jonhealy1 jonhealy1 removed the request for review from vincentsarago February 8, 2026 17:19
@jonhealy1 jonhealy1 marked this pull request as ready for review February 9, 2026 06:59
@vincentsarago
Member

Thanks for the PR @jonhealy1

Before I start the review I have a quick question: should this extension be in core or in third_party?

@jonhealy1
Collaborator Author

Hi @vincentsarago I was wondering about that. I can move it to third party.

@jonhealy1
Collaborator Author

jonhealy1 commented Mar 22, 2026

I've created an alternate PR, #891. It involves hosting the extension at @StacLabs and adding a section to the README in this repository that lists third-party stac-fastapi extensions.

I feel like this is a good pattern because we have another third-party extension we would like to publish soon as well. We are open to having either of these two prs approved.

@gadomski
Member

> We are open to having either of these two prs approved.

How should we make the decision about which to review first?

@jonhealy1
Collaborator Author

jonhealy1 commented Mar 23, 2026

@gadomski I think the decision depends on whether the team wants this extension hosted in this repository - stac-fastapi - or they would be more comfortable with the extension hosted in another repository: https://github.com/StacLabs/stac-fastapi-catalogs-extension

#891

  • involves a few lines and a link added to the Readme, pointing to the repository where the extension will be hosted.
  • future changes to the extension won't involve more of your time

#880

  • involves adding a considerable amount of code to this repository
  • there will be more changes coming down the pipeline, like support for catalogs search

There are pros and cons to both approaches maybe. I don't think you need to review both, although you could review #891 in a few minutes, because it is so short.

If there is interest here in reviewing our full extension #880, we would really appreciate it. But, we also understand that there is a lot going on, not just with the amount of code, but also with the ideas and the motivation for making different design choices relating to the specification.

@hrodmn
Collaborator

hrodmn commented Mar 24, 2026

@jonhealy1 I have been wading through all of the issues and discussions surrounding hierarchical catalogs today and will take another look at this PR tomorrow. Right now my opinion is that if we can structure this extension such that existing clients like pystac-client and STAC Browser can navigate the hierarchy reasonably well then it is a great addition to the ecosystem.

I think that scoped search of collections within a sub-catalog would be a really useful feature so I would like to at least sketch out the plans for scoped search functionality within sub-catalogs so we can be sure that the structure does not prevent that functionality.

@jonhealy1
Collaborator Author

jonhealy1 commented Mar 24, 2026

@hrodmn Implementing scoped search of collections within a sub-catalog would be a huge contribution. I also agree that interoperability with existing clients like stac-browser and pystac would be ideal.

jonhealy1 added a commit to stac-utils/stac-fastapi-elasticsearch-opensearch that referenced this pull request Mar 24, 2026
**Related Issue(s):**

- #308 
- stac-utils/stac-fastapi#880
- stac-utils/stac-fastapi#891

**Description:**

This PR migrates the internal Catalogs Extension implementation to the
new, standardized stac-fastapi-catalogs-extension library available on
PyPI, reducing local maintenance burden. This transition aligns our
multi-tenant catalog support with the official STAC API ecosystem:
https://github.com/StacLabs/stac-fastapi-catalogs-extension

**PR Checklist:**

- [x] Code is formatted and linted (run `pre-commit run --all-files`)
- [x] Tests pass (run `make test`)
- [x] Documentation has been updated to reflect changes, if applicable
- [x] Changes are added to the changelog
Collaborator

@hrodmn hrodmn left a comment

I realize that it is inherently backend-specific, but it would be good to note somewhere that, since links tend to be dynamically generated by stac-fastapi applications, the relationships between catalogs and collections (and catalogs and catalogs) need to be tracked somehow.

I am also wondering if the children extension gets activated does that mean the API root will potentially have a rel=child link for all collections (not just sub-catalogs)?

Comment on lines +225 to +231
Logic:
1. Verifies the parent catalog exists.
2. If the sub-catalog already exists: Appends the parent ID to its
parent_ids (enabling poly-hierarchy - a catalog can have multiple
parents).
3. If the sub-catalog is new: Creates it with parent_ids initialized
to [catalog_id].
Collaborator

Is parent_id assumed to be a linking field in the backend data model?

Collaborator Author

Yes I think you would need to track parent_ids in both catalogs and collections. That's how we do it in SFEOS anyways. The wording here definitely doesn't make this clear.
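
The three-step logic quoted above can be sketched against a plain dictionary. This is a minimal illustration, assuming records are dicts keyed by id with an internal `parent_ids` list; `create_or_link_sub_catalog` and `store` are hypothetical names, and skipping a parent id that is already present is an added assumption rather than something the docstring states.

```python
def create_or_link_sub_catalog(store, catalog_id, sub_catalog):
    """Create sub_catalog under catalog_id, or link it if it already exists."""
    # 1. Verify the parent catalog exists.
    if catalog_id not in store:
        raise KeyError(f"Catalog {catalog_id!r} not found")

    existing = store.get(sub_catalog["id"])
    if existing is not None:
        # 2. Already exists: append the parent id (poly-hierarchy).
        #    Assumption: do not append the same parent twice.
        if catalog_id not in existing["parent_ids"]:
            existing["parent_ids"].append(catalog_id)
        return existing

    # 3. New sub-catalog: parent_ids starts as [catalog_id].
    created = {**sub_catalog, "parent_ids": [catalog_id]}
    store[created["id"]] = created
    return created


store = {
    "esa": {"id": "esa", "parent_ids": []},
    "nasa": {"id": "nasa", "parent_ids": []},
}
create_or_link_sub_catalog(store, "esa", {"id": "climate"})   # creates it
create_or_link_sub_catalog(store, "nasa", {"id": "climate"})  # links it again
```

After the two calls, `store["climate"]["parent_ids"]` lists both parents, which is the state a backend would need to persist in order to generate parent and child links dynamically.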

"""

client: AsyncBaseCatalogsClient = attr.ib(kw_only=True)
enable_transactions: bool = attr.ib(default=False, kw_only=True)
Collaborator

@jonhealy1 what do you think about having a separate CatalogsTransactions extension? The transaction capability is a pretty large feature set to put behind a feature flag.

It's true that you would never add the transactions endpoints alone, so maybe keeping it as an add-on in the CatalogsExtension is better.

Collaborator Author

I was thinking about that too. The normal Transactions extension wouldn't be used without the core routes either. It would make some sense to separate them?

Collaborator

It will create some more meta-work in the specification and in this PR but I think we should separate them. Keeping them separate simplifies the conformance class logic in the main CatalogsExtension and separates the concerns of access and management.

Collaborator Author

Sounds good - would that mean 2 prs?

Collaborator

We could implement both extensions in stac-fastapi in this PR. I am not sure if it would require a new extension repo (i.e. StacLabs/multi-tenant-catalogs-transactions).

@bitner
Collaborator

bitner commented Mar 26, 2026

So, a thought I just had: could this be implemented in a backend-agnostic way by acting as a proxy that rewrites queries into collection search requests, similar to how stac-auth-proxy works, rather than needing to implement it in stac-fastapi-pgstac at all?

@jonhealy1
Collaborator Author

> I realize that it is inherently backend-specific, but it would be good to note somewhere that, since links tend to be dynamically generated by stac-fastapi applications, the relationships between catalogs and collections (and catalogs and catalogs) need to be tracked somehow.

Definitely. We should give some guidance on this. In stac-fastapi-elasticsearch-opensearch, we track an internal-only parent_ids list on both collections and catalogs stored in the database.

> I am also wondering if the children extension gets activated does that mean the API root will potentially have a rel=child link for all collections (not just sub-catalogs)?

This is something to think about. We don't presently do this.

@jonhealy1
Collaborator Author

jonhealy1 commented Mar 26, 2026

> so the thought i just had. could this be implemented backend agnostic by just acting as a proxy to rewrite queries into collection search requests similar to how stac-auth-proxy works rather than needing to implement in stac-fastapi-pgstac at all??

@bitner It's an interesting idea. Some quick thoughts:

  1. DAG Storage: You would need to store the DAG structure somewhere. This extension allows complex catalog and sub-catalog nesting. The initial motivation for this was to help provide SKOS thematic organization for large providers like ESA and NASA. There may be performance issues with searching a separately stored data structure - "give me every id in this sub-tree" - and then sending large api requests to the backend.

  2. State Synchronization: This extension provides a virtual, catalog DAG management system via catalog transactions that would be difficult to emulate via a proxy. It could be done via the externally stored DAG, but one issue would be state synchronization: let's say you delete a collection via the normal STAC API routes - how would your proxy's DAG know it was deleted?

  3. Sub-Catalog Search: We are looking at adding sub-catalog search soon. While this could be done via a proxy (find the collections and catalogs under a sub-tree, then run a collection search), the proxy's API requests would, as in point 1, get very large and could hit URL length or payload limits.

  4. Virtual Collections: Another feature we would like to add is the ability to create virtual collections of items based on cql2 filtering. Let's say you want a virtual collection based on geography or other criteria. A proxy would have to parse and inject complex CQL2 on the fly.

  5. Dynamic Link Rewriting: STAC relies heavily on relational links (self, parent, root, child). We also add related and duplicate. If a proxy is creating a virtual structure, it must intercept every JSON response from the backend, parse it, rewrite all the href URLs to match the virtual proxy paths, and repackage it. This adds significant latency and computational overhead to every single request.

  6. Double API Latency & Payload Size: By introducing a proxy, we are forcing every request to travel through two complete HTTP layers.

Given the complexity of building a proxy that covers all of the features our extension provides (and more in the future), combined with the performance issues and JSON-rewriting overhead, I don't think it's the best solution.
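
To make points 1 and 3 concrete, here is a rough sketch of what a proxy would have to do for "give me every id in this sub-tree" before forwarding a search. It assumes the DAG is stored as a `children` mapping from parent id to child ids; `collections_in_subtree`, `search_url`, and the sample ids are hypothetical, and the `/search?collections=...` URL is just one possible shape for the forwarded request.

```python
from collections import deque
from urllib.parse import urlencode


def collections_in_subtree(children, collections, root_id):
    """Breadth-first walk of the DAG; each collection id is returned once."""
    seen = {root_id}
    queue = deque([root_id])
    found = []
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child in seen:
                continue  # already reached through another parent
            seen.add(child)
            if child in collections:
                found.append(child)
            else:
                queue.append(child)  # a sub-catalog: keep descending
    return found


def search_url(collection_ids):
    """Build the (hypothetical) backend request the proxy would send."""
    return "/search?" + urlencode({"collections": ",".join(collection_ids)})


children = {
    "esa": ["land", "ocean"],
    "land": ["sentinel-2", "forests"],
    "forests": ["sentinel-2", "biomass"],  # sentinel-2 has two parents
    "ocean": ["sentinel-3"],
}
collections = {"sentinel-2", "sentinel-3", "biomass"}

ids = collections_in_subtree(children, collections, "esa")
url = search_url(ids)
```

The forwarded URL grows with every collection in the sub-tree, so a large thematic hierarchy would push the proxy toward URL length or payload limits, as noted above.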

@jonhealy1
Collaborator Author

Closing in favor of #891 and hosting this extension externally at https://github.com/StacLabs/multi-tenant-catalogs. If anyone wants to contribute there - issues, prs, discussions, questions - we would love to have you.

@jonhealy1
Collaborator Author

Reopening. Thoughts:

  • All of the other stac-fastapi extensions live here
  • How will the community know about this extension if it's hosted somewhere else and we don't want to add a link to the extension in the Readme?
  • This extension, like all other extensions, is optional
  • Similarly, the code from this extension does not touch any of the core api code - there is no danger to approving this pr
  • This extension is being used in SFEOS, the second most popular stac-fastapi implementation and an important project to many organisations.
  • There is a draft pr open in stac-fastapi-pgstac to implement this extension there
  • People have been asking for a /catalogs route for years
  • Our extension is not closed source and is open to new ideas and input

@gadomski has said that there isn't a consensus on whether to eventually approve this pr. I would like to know what the objections are. Has any other extension ever been refused in this project? Some extensions hosted here don't even have tests (#901).

@jonhealy1
Collaborator Author

@hrodmn I have some time coming up to clean up the docstrings and divide this into two separate extensions. Thanks for all of the input.

@jonhealy1 jonhealy1 requested a review from hrodmn April 4, 2026 11:01
@jonhealy1
Collaborator Author

Slowly adding to stac-fastapi-pgstac: stac-utils/stac-fastapi-pgstac#366

6 participants