feat: [lungmap] add lungmap projects to google datasets catalog (#4808) #4832
Draft
frano-m wants to merge 5 commits into
Summary
Adds Schema.org Dataset JSON-LD to LungMAP project detail pages so Google Dataset Search can index them. Completes the three-consumer rollout alongside #4806 (HCA) and #4807 (AnVIL).
LungMAP shares the HCA Azul backend, so this PR refactors the HCA builder into a shared parameterized core (`buildProjectJsonLd`) and adds a thin LungMAP wrapper supplying its own catalog identity.

Refactor:
- `app/utils/schemaOrg/projectDataset.ts`: shared `buildProjectJsonLd(data, browserURL, options)` core extracted from `hcaProjectDataset.ts` (lift-and-rename; logic unchanged). Defines `ProjectCatalogOptions` for catalog identity (`catalogName`, `descriptionFallbackSuffix`).
- `app/utils/schemaOrg/hcaProjectDataset.ts`: reduced to a 27-line wrapper passing HCA catalog config.

LungMAP:
- `app/utils/schemaOrg/lungmapProjectDataset.ts`: 27-line wrapper passing LungMAP catalog config.
- `__tests__/utils/schemaOrg/lungmapProjectDataset.test.ts`: 3 tests verifying LungMAP catalog identity + URL pattern + description padding (the shared core is covered by the 14 existing HCA tests).
- `pages/[entityListType]/[...params].tsx`: added `isLungMap = siteConfig.appTitle?.includes("LungMAP")` guard and mount via the generic `renderJsonLd` helper introduced in [AnVIL DX] Add AnVIL datasets to Google Datasets catalog #4807.

Closes #4808. Stacked on #4831 (AnVIL PR), which is stacked on #4829 (HCA PR). Once #4829 and #4831 merge, rebase this PR's base to `main`.

Ticket scope audit (MVP)
- `name`, `description` (required)
- `identifier`, `url`, `sameAs`, `includedInDataCatalog`, `isAccessibleForFree`, `keywords`, `creator`, `citation`
- `funder`, `license`, `distribution`, `measurementTechnique`, `variableMeasured`

LungMAP-specific differences from HCA: `catalogName = "LungMAP Data Explorer"`, padding suffix `"LungMAP Data Explorer project."`. Every other mapping is identical because LungMAP uses HCA's `ProjectResponse` shape.

Test plan
- `npx tsc --noEmit` passes
- `npm run lint`, `npm run check-format` pass
- `npx jest __tests__/utils/schemaOrg`: 28/28 tests pass (14 HCA + 11 AnVIL + 3 LungMAP)
- `npm run build-dev:lungmap`: 4/10 project detail pages emit JSON-LD with `"name":"LungMAP Data Explorer"` catalog (remainder are sub-tab routes where `processEntityProps` short-circuits; same gating pattern as HCA/AnVIL)
- `npm run build-ma-dev:hca-dcp`: HCA still 110/116 (no regression from the core extraction)
- `npm run build:anvil-cmg`: AnVIL still 375/422 (no regression)
- `npm run build-dev:anvil-catalog`: clean, no JSON-LD (correctly gated)

🤖 Generated with Claude Code
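
For reviewers who want a quick mental model of the shared-core-plus-wrapper split described in the Summary, here is a rough sketch. It is an illustration only, not the code in this PR. The names `buildProjectJsonLd`, `ProjectCatalogOptions`, `catalogName`, `descriptionFallbackSuffix`, the `isLungMap` check, and the two LungMAP strings come from the description. Everything else is an assumption made for the example: the `ProjectLike` type, the `MIN_DESCRIPTION_LENGTH` threshold, the `/projects/<id>` URL pattern, the padding rule, the set of emitted fields, and the `maybeBuildLungMapJsonLd` helper.

```typescript
// Hypothetical, simplified stand-in for the Azul ProjectResponse shape.
interface ProjectLike {
  projectDescription?: string;
  projectId: string;
  projectTitle: string;
}

// Catalog identity supplied by each wrapper (field names from the PR description).
interface ProjectCatalogOptions {
  catalogName: string;
  descriptionFallbackSuffix: string;
}

// Assumed threshold: Google's Dataset guidelines ask for a 50-5000 character description.
const MIN_DESCRIPTION_LENGTH = 50;

// Shared core: every catalog-specific value arrives through `options`.
function buildProjectJsonLd(
  data: ProjectLike,
  browserURL: string,
  options: ProjectCatalogOptions
): Record<string, unknown> {
  const base = data.projectDescription?.trim() || data.projectTitle;
  // Assumed padding rule: append the catalog suffix once when the text is short.
  const description =
    base.length >= MIN_DESCRIPTION_LENGTH
      ? base
      : `${base} ${options.descriptionFallbackSuffix}`;
  return {
    "@context": "https://schema.org",
    "@type": "Dataset",
    description,
    identifier: data.projectId,
    includedInDataCatalog: { "@type": "DataCatalog", name: options.catalogName },
    name: data.projectTitle,
    url: `${browserURL}/projects/${data.projectId}`, // assumed URL pattern
  };
}

// Thin LungMAP wrapper: it only supplies catalog identity.
const LUNGMAP_OPTIONS: ProjectCatalogOptions = {
  catalogName: "LungMAP Data Explorer",
  descriptionFallbackSuffix: "LungMAP Data Explorer project.",
};

function buildLungMapProjectJsonLd(
  data: ProjectLike,
  browserURL: string
): Record<string, unknown> {
  return buildProjectJsonLd(data, browserURL, LUNGMAP_OPTIONS);
}

// Hypothetical gating helper mirroring the isLungMap guard on the page.
function maybeBuildLungMapJsonLd(
  appTitle: string | undefined,
  data: ProjectLike,
  browserURL: string
): Record<string, unknown> | undefined {
  const isLungMap = appTitle?.includes("LungMAP") ?? false;
  return isLungMap ? buildLungMapProjectJsonLd(data, browserURL) : undefined;
}
```

With this shape, adding another consumer of the same backend would amount to a new options object and a one-line wrapper, which is why the core's behavior only needs to be tested once.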