
[SCHEMA] Support structured survey data as a BIDS modality #2404

Draft
karl-koschutnig wants to merge 3 commits into bids-standard:master from karl-koschutnig:feat/phenotype-modality-structured

karl-koschutnig commented Apr 20, 2026


Summary

This PR proposes adding schema support for structured instrument-based survey data as a valid BIDS representation, organized in the same subject-, session-, and run-resolved way as other BIDS modalities.

The goal is not to replace aggregated tabular phenotypic data. It is to complement them with a canonical acquisition-facing structure that preserves provenance, timing, and instrument context, while still allowing aggregated tables to be generated later when needed.

A working reference implementation already exists in PRISM Studio, with documentation at prism-studio.readthedocs.io.

Rationale

BIDS is strongest when its canonical structures reflect how data are actually acquired. For imaging, physio, events, and other modalities, the normative pattern is a subject-resolved structure first, with higher-level summaries and derivatives produced later.

Instrument-based phenotypic data fit this same pattern. Treating phenotype/ as the only primary home for those data flattens acquisition context at the point where BIDS usually preserves it.

A structured survey modality would:

  • preserve provenance and administration context,
  • support repeated assessments across sessions and runs,
  • keep sidecar metadata next to the corresponding data files, and
  • still allow aggregated phenotype tables to be generated later.

That direction is structurally stronger because aggregated tables can be written from a structured survey layout with little ambiguity, whereas reconstructing the original structure from only an aggregate table often requires extra assumptions.
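To make the asymmetry concrete, here is a minimal Python sketch that flattens per-run files, named as in the example structure below, into one long-format summary table. It assumes, hypothetically, that each per-run TSV holds a header of item IDs and a single row of responses; the item names (pss01, pss02), the session_id/run column names, and the helper functions are illustrative only and are not part of this PR or of PRISM Studio.

```python
import csv
import io
import re

# Filename pattern for the survey files in the example structure:
# sub- and task- are required, ses- and run- are optional.
SURVEY_NAME = re.compile(
    r"sub-(?P<sub>[a-zA-Z0-9]+)"
    r"(?:_ses-(?P<ses>[a-zA-Z0-9]+))?"
    r"_task-(?P<task>[a-zA-Z0-9]+)"
    r"(?:_run-(?P<run>[0-9]+))?"
    r"_survey\.tsv"
)


def aggregate(files, task):
    """Flatten per-run survey TSVs for one task into long-format rows.

    `files` maps a dataset-relative path to that file's TSV text. Every
    output row carries the subject, session, and run parsed from the
    filename, so acquisition context survives into the summary table.
    """
    rows = []
    for path in sorted(files):
        match = SURVEY_NAME.fullmatch(path.rsplit("/", 1)[-1])
        if match is None or match["task"] != task:
            continue
        reader = csv.DictReader(io.StringIO(files[path]), delimiter="\t")
        for record in reader:
            rows.append({
                "participant_id": f"sub-{match['sub']}",
                "session_id": f"ses-{match['ses']}" if match["ses"] else "n/a",
                "run": match["run"] or "n/a",
                **record,
            })
    return rows


def to_tsv(rows):
    """Serialize rows as TSV, using the union of columns in first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            restval="n/a", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical item responses for the three runs in the example structure.
files = {
    "sub-01/ses-baseline/survey/sub-01_ses-baseline_task-pss_run-01_survey.tsv": "pss01\tpss02\n3\t2\n",
    "sub-01/ses-week04/survey/sub-01_ses-week04_task-pss_run-01_survey.tsv": "pss01\tpss02\n2\t2\n",
    "sub-01/ses-week04/survey/sub-01_ses-week04_task-pss_run-02_survey.tsv": "pss01\tpss02\n1\t3\n",
}
print(to_tsv(aggregate(files, "pss")), end="")
```

Going the other way, splitting a summary table back into per-run files, would need extra conventions (which column encodes the session, which the run, where administration metadata lives), which is the ambiguity described above.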

Proposed Changes

This PR implements a minimal but complete first pass:

| File | What changed |
|---|---|
| src/modality-specific-files/survey.md | New documentation page: file naming conventions, sidecar structure, instrument-level and item-level field tables, worked dataset structure, and JSON/TSV examples |
| src/schema/objects/datatypes.yaml | Added survey datatype |
| src/schema/objects/modalities.yaml | Added survey modality |
| src/schema/objects/suffixes.yaml | Added survey suffix |
| src/schema/rules/files/raw/survey.yaml | New file rule using the task entity template; .tsv and .json extensions |
| src/schema/rules/modalities.yaml | Mapped survey modality → survey datatype |
| mkdocs.yml | Added navigation entry |

Example Structure

Below is the subject-, session-, and run-resolved structure this PR supports. A root-level sidecar applies to all matching files via the Inheritance Principle, avoiding duplication across sessions and runs. Per-file sidecars remain valid when instrument versions or languages differ between sessions.

study/
├── dataset_description.json
├── participants.tsv
├── task-pss_survey.json          ← shared sidecar via inheritance
├── sub-01/
│   ├── ses-baseline/
│   │   └── survey/
│   │       └── sub-01_ses-baseline_task-pss_run-01_survey.tsv
│   └── ses-week04/
│       └── survey/
│           ├── sub-01_ses-week04_task-pss_run-01_survey.tsv
│           └── sub-01_ses-week04_task-pss_run-02_survey.tsv
└── phenotype/
    └── pss_summary.tsv           ← optional aggregated downstream view

The phenotype/ directory in this example is deliberate. It shows that aggregated outputs remain compatible with this proposal, but they are downstream views of structured data rather than the only canonical form.
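The Inheritance Principle lookup for this layout can be sketched in a few lines of Python. This is a simplified illustration, not the bids-validator's or PRISM Studio's implementation: a JSON sidecar is treated as applicable when it sits at the root, in the data file's directory, or in an ancestor, shares the data file's suffix, and carries only entities that the data file also carries with the same labels. Applicable sidecars are merged from the root downward so deeper files override. The sketch does not enforce the rule that at most one applicable sidecar may exist per directory level, and the per-run sidecar and the Language key in the usage are hypothetical.

```python
import posixpath


def parse_name(filename):
    """Split a BIDS filename into (entities, suffix, extension)."""
    stem, dot, ext = filename.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, dot + ext


def applicable_sidecars(data_path, sidecar_paths):
    """Return the JSON sidecars that apply to data_path, ordered root-first."""
    data_dir, data_name = posixpath.split(data_path)
    entities, suffix, _ = parse_name(data_name)
    hits = []
    for path in sidecar_paths:
        side_dir, side_name = posixpath.split(path)
        side_entities, side_suffix, ext = parse_name(side_name)
        # The sidecar must live at the root, in the same directory, or in an ancestor.
        in_scope = side_dir in ("", data_dir) or data_dir.startswith(side_dir + "/")
        # Every entity in the sidecar name must appear in the data name with the same label.
        subset = all(entities.get(k) == v for k, v in side_entities.items())
        if ext == ".json" and side_suffix == suffix and in_scope and subset:
            hits.append(path)
    # Shallower files first, so deeper (more specific) files win when merged.
    return sorted(hits, key=lambda p: p.count("/"))


def merge_metadata(sidecars):
    """Merge already-loaded sidecar dicts in root-first order; later keys override."""
    merged = {}
    for sidecar in sidecars:
        merged.update(sidecar)
    return merged


sidecars = [
    "task-pss_survey.json",
    # Hypothetical per-file sidecar, e.g. for a run administered in another language.
    "sub-01/ses-week04/survey/sub-01_ses-week04_task-pss_run-02_survey.json",
]
print(applicable_sidecars(
    "sub-01/ses-week04/survey/sub-01_ses-week04_task-pss_run-02_survey.tsv", sidecars))
print(merge_metadata([{"TaskName": "pss", "Language": "en"}, {"Language": "de"}]))
```

Here the baseline run and week04 run-01 pick up only the root-level sidecar, while week04 run-02 also picks up its own file, whose values take precedence.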

Reference Implementation

This proposal is grounded in an existing toolchain rather than a purely abstract design discussion:

These references show that the proposed structure is already practical for curation, metadata authoring, template-based reuse, and validation.

Scope

This PR does not propose removing aggregated phenotype tables. It proposes that BIDS should also recognize a canonical modality-style structure for instrument-based phenotypic data, especially when those data are acquired repeatedly across sessions and runs.

Questions for Reviewers

  • Should instrument-based phenotypic data be representable in a subject-, session-, and run-resolved modality structure in the schema?
  • If so, should phenotype/ remain an optional aggregate or derived representation rather than the only primary representation?
  • Is survey the right directory and suffix label, or should the working group prefer another term such as pheno, assess, form, inst, or meas?
  • The current sidecar spec uses the task entity to identify the instrument (e.g. task-pss). Should a dedicated instrument entity be considered instead?
  • The sidecar fields (TaskName, OriginalName, StimulusType, Respondent, etc.) follow existing BIDS conventions. Are any fields missing, or should any move between REQUIRED and OPTIONAL?

Add survey as a first-class BIDS datatype with subject- and session-resolved file structure mirroring other modalities.

Changes:
- objects/suffixes.yaml: add 'survey' suffix
- objects/datatypes.yaml: add 'survey' datatype
- objects/modalities.yaml: add 'survey' modality
- rules/modalities.yaml: map survey modality to survey datatype
- rules/files/raw/survey.yaml: filename rules (sub+task required, ses+run optional)
- modality-specific-files/survey.md: documentation chapter
- mkdocs.yml: add survey chapter to navigation

Reference implementation: https://github.com/MRI-Lab-Graz/prism-studio
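The filename rule summarized in the commit message (sub and task required, ses and run optional, .tsv or .json) can be approximated with a single regular expression. This is a hand-written stand-in for illustration only: the actual bids-validator derives its checks from the schema YAML. The pattern additionally assumes the usual BIDS conventions that survey/ sits under sub-<label>/[ses-<label>/], that directory and filename labels agree, and that entities appear in the order sub, ses, task, run.

```python
import re

# Approximation of the survey file rule: sub/task required, ses/run optional,
# .tsv or .json. Named backreferences force the sub (and, if present, ses)
# labels in the filename to match the enclosing directories.
SURVEY_PATH = re.compile(
    r"sub-(?P<sub>[a-zA-Z0-9]+)/"
    r"(?:ses-(?P<ses>[a-zA-Z0-9]+)/)?"
    r"survey/"
    r"sub-(?P=sub)"
    r"(?(ses)_ses-(?P=ses))"  # ses entity required if and only if a session directory exists
    r"_task-[a-zA-Z0-9]+"
    r"(?:_run-[0-9]+)?"
    r"_survey\.(?:tsv|json)"
)


def is_valid_survey_path(path):
    """Return True if a dataset-relative path matches the survey file rule."""
    return SURVEY_PATH.fullmatch(path) is not None


for candidate in [
    "sub-01/ses-week04/survey/sub-01_ses-week04_task-pss_run-02_survey.tsv",  # valid
    "sub-01/survey/sub-01_task-pss_survey.json",  # valid, no session
    "sub-01/ses-week04/survey/sub-01_ses-week04_run-02_survey.tsv",  # invalid: missing task
    "sub-01/survey/sub-01_ses-week04_task-pss_survey.tsv",  # invalid: ses without session dir
]:
    print(is_valid_survey_path(candidate), candidate)
```

If a dedicated instrument entity replaced task (one of the open questions above), only the `_task-` segment of this sketch would need to change.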

codecov Bot commented Apr 21, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.07%. Comparing base (308395f) to head (9c1a560).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2404   +/-   ##
=======================================
  Coverage   83.07%   83.07%           
=======================================
  Files          22       22           
  Lines        1696     1696           
=======================================
  Hits         1409     1409           
  Misses        287      287           


and a guide for using macros can be found at
https://github.com/bids-standard/bids-specification/blob/master/macros_doc.md
-->
{{ MACROS___make_filename_template("raw", datatypes=["survey"]) }}

yarikoptic (Collaborator) commented Apr 29, 2026

quick one (edited/expanded 2026/05/12):
Overall, I like this approach, but an immediate question: is there a semantic difference between "phenotype" and "survey" in this PR? If not, I would prefer to stay consistent and make it phenotype/ (or pheno/) (I think I expressed something like that elsewhere; TODO: find refs here). That would follow a consistent principle we already have overall and are distilling further for BEPs, e.g. for stimuli/ (@bids-standard/bep044; TODO: find/add refs). See:

and potentially others that hint at it one way or another, e.g.

Then, if there is a desire to generalize "phenotype" into "survey", that could be done for BIDS 2.0... WDYT?
