feat: [lungmap] add lungmap projects to google datasets catalog (#4808) #4832

Draft
frano-m wants to merge 5 commits into main from fran/4808-lungmap-google-datasets-jsonld

frano-m commented May 13, 2026

Summary

Adds Schema.org Dataset JSON-LD to LungMAP project detail pages so Google Dataset Search can index them. Completes the three-consumer rollout alongside #4806 (HCA) and #4807 (AnVIL).

LungMAP shares the HCA Azul backend, so this PR refactors the HCA builder into a shared parameterized core (buildProjectJsonLd) and adds a thin LungMAP wrapper supplying its own catalog identity.

Refactor:

  • NEW app/utils/schemaOrg/projectDataset.ts — shared buildProjectJsonLd(data, browserURL, options) core extracted from hcaProjectDataset.ts (lift-and-rename; logic unchanged). Defines ProjectCatalogOptions for catalog identity (catalogName, descriptionFallbackSuffix).
  • app/utils/schemaOrg/hcaProjectDataset.ts — reduced to a 27-line wrapper passing HCA catalog config.
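
A minimal sketch of the core-plus-wrapper shape described above, not the code in this PR. Only buildProjectJsonLd, its (data, browserURL, options) parameters, ProjectCatalogOptions, catalogName, descriptionFallbackSuffix, and the two LungMAP values come from this description. The ProjectLike type, its field names, the /projects/<id> URL pattern, the wrapper name buildLungmapJsonLd, and the exact output mapping are illustrative assumptions.

```typescript
// Hypothetical, simplified stand-in for the Azul project response.
interface ProjectLike {
  projectId: string;
  projectTitle: string;
  projectDescription?: string;
}

// Catalog identity supplied by each wrapper (field names from the PR description).
interface ProjectCatalogOptions {
  catalogName: string;
  descriptionFallbackSuffix: string;
}

type JsonLd = Record<string, unknown>;

// Shared core: maps one project to a Schema.org Dataset object.
function buildProjectJsonLd(
  data: ProjectLike,
  browserURL: string,
  options: ProjectCatalogOptions
): JsonLd {
  // Assumed fallback: use the title plus the catalog suffix when no description exists.
  const description =
    data.projectDescription ||
    `${data.projectTitle} ${options.descriptionFallbackSuffix}`;
  return {
    "@context": "https://schema.org",
    "@type": "Dataset",
    description,
    identifier: data.projectId,
    includedInDataCatalog: { "@type": "DataCatalog", name: options.catalogName },
    isAccessibleForFree: true,
    name: data.projectTitle,
    url: `${browserURL}/projects/${data.projectId}`, // assumed URL pattern
  };
}

// Thin wrapper: only the catalog identity differs between consumers.
const LUNGMAP_OPTIONS: ProjectCatalogOptions = {
  catalogName: "LungMAP Data Explorer",
  descriptionFallbackSuffix: "LungMAP Data Explorer project.",
};

function buildLungmapJsonLd(data: ProjectLike, browserURL: string): JsonLd {
  return buildProjectJsonLd(data, browserURL, LUNGMAP_OPTIONS);
}
```

Keeping the catalog identity in an options object means a new consumer needs only a wrapper and a config constant, while the field mapping stays covered by the existing core tests.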

LungMAP:

  • NEW app/utils/schemaOrg/lungmapProjectDataset.ts — 27-line wrapper passing LungMAP catalog config.
  • NEW __tests__/utils/schemaOrg/lungmapProjectDataset.test.ts — 3 tests verifying LungMAP catalog identity + URL pattern + description padding (the shared core is covered by the 14 existing HCA tests).
  • pages/[entityListType]/[...params].tsx: added an isLungMap = siteConfig.appTitle?.includes("LungMAP") guard and mounted the JSON-LD via the generic renderJsonLd helper introduced in #4807 ([AnVIL DX] Add AnVIL datasets to Google Datasets catalog).
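
For illustration, the guard and the mount step could look roughly like the sketch below. The signature of renderJsonLd is not shown in this description, so toJsonLdScript is a hypothetical stand-in that returns the script tag as a string; SiteConfigLike and isLungMapSite are likewise assumed names. One detail worth noting: appTitle?.includes(...) evaluates to undefined when appTitle is missing, so the sketch coalesces it to a strict boolean.

```typescript
// Hypothetical subset of the site config; only appTitle is taken from the PR description.
interface SiteConfigLike {
  appTitle?: string;
}

// Guard: true only when the configured title mentions LungMAP.
function isLungMapSite(siteConfig: SiteConfigLike): boolean {
  return siteConfig.appTitle?.includes("LungMAP") ?? false;
}

// Hypothetical stand-in for a render helper: serializes JSON-LD into a script tag.
// "<" is escaped so a value such as "</script>" cannot close the tag early.
function toJsonLdScript(jsonLd: Record<string, unknown>): string {
  const json = JSON.stringify(jsonLd).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

A page would then emit the tag only when the guard passes, which matches the gating reported in the test plan for builds that should produce no JSON-LD.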

Closes #4808. Stacked on #4831 (AnVIL PR), which is stacked on #4829 (HCA PR). Once #4829 and #4831 merge, retarget this PR's base to main.

Ticket scope audit (MVP)

| Field | Status |
| --- | --- |
| name, description (required) | |
| identifier, url, sameAs, includedInDataCatalog, isAccessibleForFree, keywords, creator, citation | ✅ (inherited from shared core; same field coverage as HCA) |
| funder, license, distribution, measurementTechnique, variableMeasured | ⏸ deferred per the HCA PR's deferral list |

LungMAP-specific differences from HCA: catalogName = "LungMAP Data Explorer", padding suffix "LungMAP Data Explorer project.". Every other mapping is identical because LungMAP uses HCA's ProjectResponse shape.
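
The padding rule itself is not spelled out here. As a hedged sketch, assume the suffix is appended when a description falls below the 50-character minimum that Google's Dataset guidelines give for description. The function name padDescription and the threshold constant are assumptions; only the suffix text comes from this PR.

```typescript
// Google's Dataset guidelines ask for descriptions of 50 to 5000 characters.
const MIN_DESCRIPTION_LENGTH = 50;

// Assumed rule: leave long-enough descriptions alone, otherwise append the catalog suffix.
function padDescription(description: string, suffix: string): string {
  const trimmed = description.trim();
  if (trimmed.length >= MIN_DESCRIPTION_LENGTH) {
    return trimmed;
  }
  // An empty description yields just the suffix.
  return trimmed ? `${trimmed} ${suffix}` : suffix;
}

const LUNGMAP_SUFFIX = "LungMAP Data Explorer project.";
const padded = padDescription("Short summary.", LUNGMAP_SUFFIX);
```

Under these assumptions, padded is "Short summary. LungMAP Data Explorer project.". A very short input can still land under the minimum after padding, so the real builder may apply further handling.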

Test plan

  • npx tsc --noEmit passes
  • npm run lint, npm run check-format pass
  • npx jest __tests__/utils/schemaOrg — 28/28 tests pass (14 HCA + 11 AnVIL + 3 LungMAP)
  • npm run build-dev:lungmap: 4/10 project detail pages emit JSON-LD with the "name":"LungMAP Data Explorer" catalog (the remaining 6 are sub-tab routes where processEntityProps short-circuits; same gating pattern as HCA/AnVIL)
  • npm run build-ma-dev:hca-dcp — HCA still 110/116 (no regression from the core extraction)
  • npm run build:anvil-cmg — AnVIL still 375/422 (no regression)
  • npm run build-dev:anvil-catalog — clean, no JSON-LD (correctly gated)
  • Validate output against Google's Rich Results Test and Schema Markup Validator after deploy
  • Request indexing via Google Search Console post-merge
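
The page counts above could be reproduced with a check along these lines. This is a sketch under assumptions, not the method used for this PR: extractJsonLd, hasCatalog, and countPagesWithCatalog are hypothetical helpers, and the regex assumes the script tag is written with a double-quoted type attribute.

```typescript
// Pulls every parseable JSON-LD block out of one HTML document.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/g;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch (_error) {
      // Skip blocks that are not valid JSON.
    }
  }
  return blocks;
}

// True when a block names the expected catalog under includedInDataCatalog.
function hasCatalog(block: unknown, catalogName: string): boolean {
  if (typeof block !== "object" || block === null) {
    return false;
  }
  const catalog = (block as { includedInDataCatalog?: { name?: unknown } })
    .includedInDataCatalog;
  return catalog?.name === catalogName;
}

// Counts pages that carry at least one JSON-LD block for the given catalog.
function countPagesWithCatalog(pages: string[], catalogName: string): number {
  return pages.filter((html) =>
    extractJsonLd(html).some((block) => hasCatalog(block, catalogName))
  ).length;
}
```

Feeding it the HTML of each built project page with "LungMAP Data Explorer" as the catalog name would give the numerator of the ratios reported above.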

🤖 Generated with Claude Code

@frano-m frano-m changed the base branch from fran/4807-anvil-dx-google-datasets-jsonld to main May 13, 2026 12:29