[ENH] BEP036 - Phenotypic Data Guidelines #2123
Upstream PR
Quick update before merging our PR on surchs fork
BEP036 brings guidelines for best tabular phenotypic data to the BIDS specification.
- Includes an appendix called `phenotype.md`
- Includes admonitions for the guidelines in-line with modality agnostic files sections

Co-authored-by: Eric Earl <eric.earl@nih.gov>
Co-authored-by: Samuel Guay <samuel.guay@umontreal.ca>
Co-authored-by: Sebastian Urchs <sebastian.urchs@mcgill.ca>
Co-authored-by: Arshitha B <arshitha.basavaraj@iiitb.ac.in>
Changed "e.g." to "for example" to follow contributing style guidelines.
for more information, see https://pre-commit.ci
each `phenotype/<measurement_tool_name>.json` data dictionary.
This improves reusability and provides clarity about the measurement tool.

### 5. Use the demographics file for common variables about participants
Copying from https://github.com/surchs/bids-specification/pull/1/files#r2103117486
For this section, would it make sense to suggest that demo-like information be prioritized in this file rather than participants.tsv, making the latter primarily a list of subject IDs? I haven't seen this explicitly addressed anywhere, though I'm unsure if it's something we want to formalize 😬
Something like this could follow the paragraph?:
When all demographic data is stored in `phenotype/demographics.tsv`, `participants.tsv` may serve primarily as a minimal listing of subject identifiers with only the `participant_id` column.
I agree. It'd be good to mention this.
Hi, I am new to commenting on a PR and also relatively new to BIDS - so my comment might be stupid.
However, I want to draw attention to a potential issue with the handling of repeated usage of a phenotype/ measurement. Currently, if a measurement is conducted more than once, a run_id column MUST be added that "corresponds to an existing run- entity used in a filename(s)" (https://bids-specification--2123.org.readthedocs.build/en/2123/modality-agnostic-files/phenotypic-and-assessment-data.html).
However, in my case (and I guess many others), a phenotype measurement tool might be conducted independently of a run in an experimental task. For example, I conduct a mood questionnaire 7 times over the course of one session in which multiple tasks (with multiple runs) are conducted. Thus, I think it is not reasonable that the run_id column in the phenotype file corresponds to an existing run_ in a (task) filename. Instead, something like a measurement_timepoint_id (or similar) that is not related to the run_ in other files might be a good idea. Again, I am sorry if my comment is stupid or I misunderstood something.
Along this line: The current proposal states:
A participant identifier of the form sub-<label>, matching a participant entity found in the dataset. Note that data for one participant MAY be represented across multiple rows in case of multiple sessions or runs, and therefore the entry in the participant_id column will be repeated. The combination of participant_id, session_id and run_id MUST be unique.
The problem with this is that it doesn't cover the following cases which are in many datasets that I am familiar with:
Example 1: A dataset in which there are multiple training sessions on some equipment or on some test procedure on multiple different days before any imaging or any data that would have a run occurs.
Example 2: Sleep questionnaires and other questionnaires that are administered multiple times, unassociated with any imaging sessions.
Both of these require another column identifier (not associated with the run).
@VisLab Great to hear from you! Thanks for reading the BEP. Now to your comments...
Example 1: A dataset in which there are multiple training sessions on some equipment or on some test procedure on multiple different days before any imaging or any data that would have a run occurs.
We may not have made it explicit, but here is this example in the HTML preview of the appendix. It covers what to do when there's "1 participant with 2 sessions, where 1 session is only tabular phenotype and the other is only imaging". The same idea extends to more than 2 sessions of some combination of acquired data and tabular phenotypic data. For instance, in your Example 1, we could say:
Define many phenotypic session names
- `ses-day1equipmenttraining1`
- `ses-day1equipmenttraining2`
- `ses-day1equipmenttraining3`
- `ses-day2equipmenttraining1`
- `ses-day2equipmenttraining2`
- `ses-preproceduretraining`
- `ses-procedure`
- `ses-postprocedure`
- ...
Store most, if not all, in one aggregated file
- `phenotype/equipment_training.tsv`
  - Where `participant_id`, `session_id`, `run_id`, and `day` are joint indexes.
  - Yes, `day` is not a validate-able column, but you could alternatively create a `phenotype/day#_equipment_training.tsv` if you want stricter validation support.
  - Remember that `run_id` is not about tying the data to a particular task's run, but instead to differentiate multiple runs of the same tabular phenotypic test, survey, assessment, questionnaire, or lab result within the same `participant_id`'s `session_id`.
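As a rough illustration of the joint-index idea in the list above, here is a minimal sketch that checks whether the combination of `participant_id`, `session_id`, and `run_id` is unique in an aggregated file. The table contents, the `score` column, and the helper name `duplicated_index_rows` are made up for this example and are not part of BEP036 or the validator.

```python
import csv
import io
from collections import Counter

# Hypothetical contents of phenotype/equipment_training.tsv (illustrative values only).
TSV = (
    "participant_id\tsession_id\trun_id\tscore\n"
    "sub-01\tses-day1equipmenttraining1\t1\t12\n"
    "sub-01\tses-day1equipmenttraining1\t2\t15\n"
    "sub-01\tses-day1equipmenttraining2\t1\t17\n"
)

INDEX_COLUMNS = ("participant_id", "session_id", "run_id")


def duplicated_index_rows(tsv_text, index_columns=INDEX_COLUMNS):
    """Return the joint-index tuples that occur in more than one row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(tuple(row[col] for col in index_columns) for row in reader)
    return [key for key, n in counts.items() if n > 1]


duplicates = duplicated_index_rows(TSV)
print(duplicates)
```

Here the two rows for the same participant and session differ only in `run_id`, which is what keeps the joint index unique when a tool is repeated within a session.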
Example 2: Sleep questionnaires and other questionnaires that are administered multiple times, unassociated with any imaging sessions.
The answer to example 2 is the appendix example I linked above, and now here.
I hope this helps and let me know if you don't think this resolves your comment. Or if perhaps there's something else we should consider which you can propose?
I want to draw attention to a potential issue with the handling of repeated usage of a phenotype/ measurement. Currently, if a measurement is conducted more than once, a run_id column MUST be added that "corresponds to an existing run- entity used in a filename(s)" (https://bids-specification--2123.org.readthedocs.build/en/2123/modality-agnostic-files/phenotypic-and-assessment-data.html).
Wowza, good catch! I didn't notice that incorrect set of words there. It currently says:
(run_id is) A run identifier that corresponds to an existing run- entity used in a filename(s). A chronological run number is used when a measurement tool or assessment described by a tabular file was repeated within a session.
What it should say:
(run_id is) A run identifier to differentiate multiple runs of the same measurement tool. A chronological run number is used when a measurement tool or assessment described by a tabular file is repeated within a session.
Thanks! I'll edit that.
For example, I conduct a mood questionnaire 7 times over the course of one session in which multiple tasks (with multiple runs) are conducted.
Does the text edit above of "what it should say" make better sense and resolve the run_id labeling issue?
Put the phenotypic and assessment data content where it belongs.
Fix PhenotypeSubjectsMissing selector (suffix -> extension) and check direction (phenotype participants subset of participants.tsv). Add PhenotypeSessionFileRequired, PhenotypeSessionAggregation, PhenotypeSessionsMissing, and PhenotypeSessionLevels check rules for phenotype validation under AdditionalValidation.
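The check direction named in this commit (participants in a phenotype file should be a subset of those in `participants.tsv`) could be sketched roughly as follows. This is only an illustration in plain Python: the function names and table contents are hypothetical and this is not the schema's actual rule expression.

```python
import csv
import io


def participant_ids(tsv_text):
    """Collect the participant_id values from TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["participant_id"] for row in reader}


def phenotype_subjects_missing(participants_tsv, phenotype_tsv):
    """Return phenotype participants that are absent from participants.tsv."""
    return sorted(participant_ids(phenotype_tsv) - participant_ids(participants_tsv))


# Hypothetical file contents for illustration.
participants = "participant_id\nsub-01\nsub-02\n"
phenotype = "participant_id\tsession_id\ttotal\nsub-01\tses-01\t3\nsub-03\tses-01\t5\n"

missing = phenotype_subjects_missing(participants, phenotype)
print(missing)
```

Note that `sub-02` having no phenotype rows is not flagged, since the subset relation only runs from the phenotype file to `participants.tsv`.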
even if this file only contains a two-column inventory
of `participant_id` and `session_id`.
The `sessions.tsv` file MUST list all sessions for all subjects across
imaging and tabular phenotypic data. The `sessions.json` file’s
Are we in public review already? here is one:
if you are talking about having top level sessions.tsv to now have both participant_id and session_id index keys, I will keep coming to beg on my knees "PLEASE DO NOT" (edited) until we define it as a generic principle one way or another (e.g. #2273) -- there should be a discussion on that particular aspect alone not subjects.tsv ad-hoc, and not in BEP mixing in various changes. ATM sessions.tsv must have only session_id index, as any other {entity_plural}.tsv. Please do not break the few consistencies we have.
We are not in public review yet, but I am happy to receive feedback at all stages.
I see a need in BIDS for allowing flexibility in tabular data aggregating/moving "back" in the directory hierarchy (from subdirectory "leaves" to the top-level dataset "root"). When people interact with this kind of tabular data, they don't want 10's, 100's, or 1,000's of small sub-<label>/sub-<label>_sessions.tsv segregated files. They want one big aggregated file.
As a consequence of that: the closer to the dataset root you go, the more columns you will need to describe the aggregated data. For instance, if you have 1,000's of small sub-<label>/sub-<label>_sessions.tsv segregated files and you want one big aggregated file called sessions.tsv instead, aggregating up requires taking out the sub-<label>. So you end up with the need for that participant_id column in the sessions.tsv file.
I used to think this was relevant for ALL tabular files to be able to move back in the directory hierarchy, but I later realized that things like physio or other large files would be a real mess in a much larger aggregated file. I could maybe still make the case for scans.tsv or samples.tsv though, but that's a very different story...
I would almost like to say, "'Large' tabular files should stay as-is and 'small' tabular files should allow aggregation."
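To make the aggregation argument concrete, here is a minimal sketch of moving per-subject sessions files "up" into one table, which is where the extra `participant_id` column comes from. The input mapping, its values, and the function name `aggregate_sessions` are assumptions for illustration only.

```python
import csv
import io


def aggregate_sessions(per_subject):
    """Merge per-subject sessions tables into one list of rows.

    per_subject maps a 'sub-<label>' string to the text of that subject's
    sessions file; the label that used to live in the path becomes a column.
    """
    rows = []
    for participant_id, text in sorted(per_subject.items()):
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            rows.append({"participant_id": participant_id, **row})
    return rows


# Hypothetical contents of two sub-<label>/sub-<label>_sessions.tsv files.
per_subject = {
    "sub-01": "session_id\tacq_time\nses-01\t2001-01-01T11:12:00\nses-02\t2001-02-01T09:00:00\n",
    "sub-02": "session_id\tacq_time\nses-01\t2001-01-03T10:00:00\n",
}

aggregated = aggregate_sessions(per_subject)
for row in aggregated:
    print(row)
```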
sorry -- I am not following the argument here as nothing in what you said requires breaking an established .tsv file(s) convention as it can be easily adopted in a new .tsv file as e.g. I suggest in #2273.
"sessions.tsv": "",
"phenotype": {
    "measurement_tool.json": "",
    "measurement_tool.tsv": "",
yikes -- that will be the literal filename like measurement_tool.tsv in there??? like the suffix _tool and some measurement thing?
Yes, this is what the phenotype/ directory already permits and the validator validates pre-BEP036. It is also how the 40 (out of 1,548) datasets with a phenotype/ directory on OpenNeuro, as of 3/30/2026, curated their tabular phenotypic data. The filenames in the phenotype/ directory can be named alphanumerically (including hyphens and plus signs, I believe), but I think the practical way to make sure people continue sharing their tabular phenotypic data is to leave the file-naming of phenotype files as-is.
Names so far are usually acronyms or short descriptions or both. Some examples:
- `phenotype/demographics.tsv`
- `phenotype/whodas.tsv`
- `phenotype/nih_toolbox.tsv`
- `phenotype/ARI_parent.tsv`
- `phenotype/wisc-iii.tsv`
but I think the practical way to make sure people continue sharing their tabular phenotypic data is to leave the file-naming of phenotype files as-is.
by no means I want to restrict people from sharing their data any way they want in whatever file-names they have. It is only when we talk about standardizing file names like BIDS does, that's where I have an "opinion" or otherwise we just would have ended-up with something like data/my_favorite1.nii.gz and data/notsogoodone2.nii.gz
Agree with @yarikoptic that this breaks BIDS search principles for use of underscores.
```tsv
participant_id	session_id	acq_time
sub-01	ses-MRI1	2001-01-01T11:12:00
```
please please please no!!! what common principle such file construction follows? it introduces inconsistencies! If you want to introduce some new principle, let's formalize it. PLEASE PLEASE PLEASE lets discuss and do that, e.g. in #2273 or as I invited you there -- please formulate an alternative but generic approach!
You can keep resolving my comments but I am sorry -- I will be coming back, so we better do talk and resolve this "conflict"!
Please do keep coming back. We want the voices of those that agree AND those that have differing opinions to see whether we're satisfying 80% of the most commonly used data cases or not (see BIDS mission statement foundational principle 2). I will try to make an "alternative, but generic approach" and tag you on it referencing #2273, etc. Hopefully we can come to some agreement soon.
yarikoptic left a comment
There are a number of major aspects in the construction of file names and their content which come up here without introducing generic principles, and which ruin what consistency we have so far in BIDS:
- singular index in `{entityplural}.tsv` (see #2273 for a proposal on how to generalize to multiple)
- filename construction to generally follow `{entityname}-{labelorindex}_{suffix}.{ext}` + the aforementioned `{entityplural}.tsv` to describe those entity values (see #2283 for a proposal on how to generalize having such files across all levels).
I have voiced my concerns numerous times over the changes proposed here, which IMHO are breaking, but as far as I can see, they were just brushed off without a due search for compromise.
I agree with @yarikoptic on the file naming, which is an easy fix. A more difficult problem is the summarization across sessions and understanding the order in which data were acquired. This needs a general solution because I imagine similar problems for annotating medication that was used during different sessions/runs. For example, in our case we have patients who are using one type of medication during an MRI, and this changes across iEEG sessions and runs, or even within a run. It would be good to be able to filter which runs were collected on certain medication. Also, my initial impression is that a phenotype is a static feature that does not change across sessions. Can a phenotype change across sessions?
AFAICT there are 3 questions left to answer:
1 - For a longitudinal dataset, where should I store my demographic/phenotypic (e.g. "age", language maybe not clearly delineated) info?
BEP036 proposes I like the
2 - For a longitudinal dataset with phenotypic data (e.g. assessments, again language not super clear), where should I describe the properties (i.e.

| participant_id | session_id | acq_time |
|---|---|---|
| sub-1 | ses-1 | 2009-06-15T13:45:30 |
| sub-1 | ses-2 | 2009-06-16T14:45:30 |
| sub-2 | ses-1 |
Good:
- solves the problem
Not so good:
- imho not an intuitive place to go look for `acq_time` of a session
- requires that we address "Formalize joint index support in .tsv via {"+".join(entities)}s.tsv files" (#2273) before BEP036
- I assume this is not what @yarikoptic has in mind for this file
b) put it in /sub-<label>/sub-<label>_sessions.tsv
Good:
- status quo ante
Not so good:
- for a subject with only `/phenotype` data, I now must make an empty `sub-<label>` dir in which I put a single `sub-<label>_sessions.tsv` file, just to list the phenotypic sessions and their `acq_time`s etc
As I mentioned above I think we should not do this.
c) put it inside the individual .tsv files in /phenotype dir
For example, I might have an assessment1.tsv like so:
| participant_id | session_id | acq_time | subscale_A_total |
|---|---|---|---|
| sub-1 | ses-1 | 2009-06-15T13:45:30 | 10 |
| sub-1 | ses-2 | 2009-06-15T14:45:30 | 15 |
| sub-2 | ses-1 |
and also an other_assessment.tsv like so:
| participant_id | session_id | acq_time | other_subscale |
|---|---|---|---|
| sub-1 | ses-1 | 2009-06-15T13:45:30 | 37 |
| sub-1 | ses-2 | 2009-06-16T14:45:30 | 25 |
| sub-2 | ses-1 |
Good:
- already consensus for the BEP (i.e. repeated `session_id` or `run_id` are OK)
Not so good:
- will be very repetitive, i.e. does not respect "Duplication is evil!" mantra
- likely very hard to validate for humans and validator
- easy to get inconsistent info / data entry errors
I don't like any of these options, but of this list I like a) the best. I believe the most useful thing for us to do would be to either choose one of the 3 options I listed above, or to together come up with a better one.
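The "easy to get inconsistent info" concern under option c) can be sketched as a small cross-file check. The rows below are transcribed from the two example tables above exactly as written (the incomplete `sub-2` rows are omitted); the function name `conflicting_acq_times` is hypothetical and this is not an existing validator rule.

```python
import csv
import io
from collections import defaultdict

# Rows transcribed from the two example tables above; sub-2 rows are omitted.
FILES = {
    "assessment1.tsv": (
        "participant_id\tsession_id\tacq_time\tsubscale_A_total\n"
        "sub-1\tses-1\t2009-06-15T13:45:30\t10\n"
        "sub-1\tses-2\t2009-06-15T14:45:30\t15\n"
    ),
    "other_assessment.tsv": (
        "participant_id\tsession_id\tacq_time\tother_subscale\n"
        "sub-1\tses-1\t2009-06-15T13:45:30\t37\n"
        "sub-1\tses-2\t2009-06-16T14:45:30\t25\n"
    ),
}


def conflicting_acq_times(files):
    """Map (participant_id, session_id) to acq_time values that disagree across files."""
    seen = defaultdict(set)
    for text in files.values():
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            if row.get("acq_time"):
                seen[(row["participant_id"], row["session_id"])].add(row["acq_time"])
    return {key: sorted(times) for key, times in seen.items() if len(times) > 1}


conflicts = conflicting_acq_times(FILES)
print(conflicts)
```

With the rows as written, `sub-1`/`ses-2` carries two different `acq_time` values across the two files, which is the kind of discrepancy that repeating session properties in every phenotype file makes possible.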
3 - What rules should apply to file names in /phenotype dir
BEP036 proposes nothing, which afaik is the status quo (i.e. anything is allowed). If we want to change that and treat the files in /phenotype directory as regular BIDS file names, I think we need to be clear what this means. afaict there are no pheno-relevant suffixes in BIDS, i.e. there is no "good suffix" I can pick. So iiuc all underscores in the file name are effectively out. I believe it's the same for entities - but I am not sure if this means that dashes are also out. Unless I'm missing something, we already know that anything looking like a "suffix" we encounter in /phenotype will be either invalid or wrong.
I disagree with this, but I don't have a strong opinion. We should though consider the additional friction we impose on groups who want to store their phenotypic/assessments data in a BIDS conform way - because in practice it most likely means a lot of renaming of files.
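To get a feel for how much renaming this could mean, here is a rough sketch that tests the phenotype file names quoted earlier in this thread against a deliberately simplified entity-style pattern. The regular expression is an assumption made for illustration (zero or more `key-value_` pairs followed by an alphanumeric suffix); the real BIDS schema rules are more detailed and may judge these names differently.

```python
import re

# Assumption: a simplified reading of "{entityname}-{labelorindex}_{suffix}.{ext}".
# Zero or more "key-value_" pairs, then an alphanumeric suffix, then ".tsv".
ENTITY_STYLE = re.compile(r"^(?:[a-z]+-[A-Za-z0-9+]+_)*[A-Za-z0-9]+\.tsv$")


def looks_entity_style(filename):
    """Return True if the name fits the simplified pattern above."""
    return ENTITY_STYLE.match(filename) is not None


# File names quoted earlier in this conversation.
names = [
    "demographics.tsv",
    "whodas.tsv",
    "nih_toolbox.tsv",
    "ARI_parent.tsv",
    "wisc-iii.tsv",
]

results = {name: looks_entity_style(name) for name in names}
for name, ok in results.items():
    print(name, ok)
```

Under this simplified pattern, names with an underscore or a dash outside a `key-value_` pair do not fit, which matches the concern that such files would need renaming.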
My main point here is: I think we're a bit stuck/going in circles with the discussion of one specific implementation vs another / the HOW of the BEP. But in the end the implementation is only useful in service of the WHY, which to me are the questions above. And I think it should be easier for us to agree on what an acceptable solution to these questions looks like.
Thanks for the work on BEP036, this hits close to home. The structural mismatch between phenotypic/survey data and the rest of BIDS is exactly the problem that motivated us to build PRISM (https://github.com/MRI-Lab-Graz/prism-studio). When we tried to store multi-session questionnaire data in standard BIDS, the phenotype/ flat-file pattern was the only option, and it broke every assumption the rest of the spec is built on. Specifically what breaks:
What we ended up doing in PRISM:
We store survey/questionnaire data inside the standard hierarchy under a survey/ modality folder: sub-001/ses-01/survey/sub-001_ses-01_task-phq9_survey.tsv
Session and run fall out of filename entities. Sidecar inheritance works normally. BIDS Apps see the data without modification. The workaround is adding survey/ to .bidsignore, which is precisely the gap a proper BIDS extension should close.
BEP036 codifies and improves the existing phenotype/ pattern, which is valuable. But the underlying structural issue remains. We'd encourage the working group to discuss whether the normative pattern for per-session instrument data should follow the sub-/ses-/ hierarchy, with phenotype/ serving as an optional aggregate/derived representation. Happy to share schema definitions or converter code.
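As a small illustration of entities "falling out of" the filename, here is a sketch that splits the example name from this comment into its key-value pairs, suffix, and extension. The parser is a hypothetical helper written for this example; it is not PRISM or BIDS tooling and does no validation.

```python
def parse_entities(filename):
    """Split an entity-style filename into (entities, suffix, extension)."""
    stem, _, extension = filename.partition(".")
    *pairs, suffix = stem.split("_")
    # Each pair looks like "key-value"; split only on the first dash.
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, "." + extension


entities, suffix, extension = parse_entities("sub-001_ses-01_task-phq9_survey.tsv")
print(entities, suffix, extension)
```

With this layout the session label comes from the filename itself, whereas an aggregated `phenotype/` table has to carry it in a `session_id` column.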
The scope of BEP036 is primarily to be a set of guidelines. BEP036 provides validatable best practices for BIDS tabular phenotypic data using an opt-in for additional validation. BEP036 also introduces a new use case for a “joint index” (definition below) in the
With this comment, I would like to try to address what I think are Yarik’s main concerns. Those concerns, as I understand them, are about two currently undefined generic BIDS principles he feels should apply to BIDS files, including tabular phenotypic data:
1. Using only a singular index in participants.tsv and sessions.tsv
I have long felt that aggregated participant’s session data would be best held and found inside one of three files: (a) A quick definition before I start, for clarity to readers who may be unfamiliar with the term “joint index”. Having a joint index in a tabular file means it has more than one index column. For example “participant_id & session_id” or “participant_id & session_id & run_id” or even “participant_id & run_id”. The point of the indices is to uniquely identify rows which allows for more aggregated data. For instance, if the joint index of a tabular phenotypic file is “participant_id & session_id & run_id” then there will not be more than one unique combination of a participant, their session, and their run across all rows in the aggregated file. So, if (a) and (b) here are not allowed to use a “joint index” then the ideal location for the aggregated data should be (c) (whether it goes by that filename or some other filename) aggregated in the I think BEP036 can skip using a root-level aggregated
2. Constructing filenames to generally follow {entityname}-{labelorindex}_{suffix}.{ext}
I understand that BIDS does this in every other place in the first 2 or 3 levels of the directory hierarchy EXCEPT the I feel Some files defy this, like Arterial Spin Labeling (ASL) where the files can be either separate files with different suffixes or they can be concatenated with an Files in the I don’t think it adds any value to an aggregated tabular phenotypic data filename to introduce an
On aggregation versus segregation of tabular phenotypic data
This is an aside that I feel is important and topical in a few of the most recent comments here.
For a long time to me, the motivating factor for aggregating While the current BIDS tabular phenotypic data standard for sessions files technically works well, in practice it puts a large technical file-concatenating load on people who want to analyze the multi-session changes in tabular phenotypic data. That is why we BEP036 leads have been advocating for aggregating such data to the root-level since about half-way through this BEP’s development.
Summary
Lastly, I will try to respond to other newer comments here besides Yarik’s now. Thank you all for the comments!
I love that it looks like you solved the segregated tabular phenotypic data issue with PRISM Studio! What it looks like you all did was an earlier-on goal for BEP036, but we wanted to try getting out the aggregated tabular phenotypic data BEP first before pursuing the “you need a tool to support curating and using your tabular phenotypic data” route. I would love to talk some time about how much adoption PRISM Studio has had and how well it’s being received by users. As for:
We BEP leads have talked at length and repeatedly about that solution and we felt the aggregated use case was more important, which is why you see what we have here today. However, I would recommend you create a PR to try to formally get PRISM Studio’s filenames and folder structure supported in the BIDS schema. Maybe re-use the “phenotype/” subdirectory as your modality directory instead of “survey” and consider other possible suffixes (I like
…tadata Introduces a single new optional dataset-level file `participant+sessions.tsv` with a composite index `[participant_id, session_id]`. This provides a single top-level location for metadata that varies across both participants and sessions -- e.g. age at each visit, body weight, clinical scores in longitudinal studies -- complementing the existing `participants.tsv` (participant-constant) and per-subject `*_sessions.tsv` files. Note that it is already possible to provide such metadata in a `sub-*/ses-*_sessions.tsv` file, so this approach just serves as a way to provide an "aggregate" collection of metadata. As such, we might then need to define how it interacts with the inheritance principle, but defining that is still a TODO in general for .tsv files. The `+` in the filename signals a composite index, following the convention proposed in #2273, and is an alternative to the freshly proposed #2402, inspired by work on BEP036 (#2123), hence attn @bids-standard/bep036. Most of the changes are just a straightforward interpolation of the `participants.tsv` and `sessions.tsv` file definitions. One of the notable changes is to `meta/context.yaml`, where we added `dataset.sessions` (union of all session directories across subjects) to enable session-level validation checks. I think it is only reasonable given that we already included dataset-level summaries for datatypes and modalities, but it would require bids-validator to support it. The alternative is to drop it and the extra check we added. Ideally, though, we should figure out how to validate specific combinations of sub/sessions, and a TODO was left for that. An example `participant+sessions.tsv` with a `body_weight` column for the already existing `7t_trt` bids-examples dataset is at bids-standard/bids-examples#556, where, if you also look into the original `participants.tsv`, it becomes fairly obvious that duplicating all entries across all sessions would be dubious.
- implements a single first manifestation for #2273 - I think overall we can state that it closes #1020 which theoretically could have been closed with original introduction of _sessions.tsv files. Co-Authored-By: Claude Code 2.1.113 / Claude Opus 4.6 <noreply@anthropic.com>
Thanks, this is very helpful, and I appreciate the context from the BEP leads. I’m happy to open a PR proposing formal schema support for the PRISM Studio structure. One point I’d want to make gently in that PR is that I’m not fully convinced by using phenotype/ as the primary representation, because it breaks from the way other BIDS modalities are organized. From my perspective, a survey-style BIDS structure fits the broader BIDS logic better, while aggregated tables can still be generated later from that canonical structure when needed. That said, I’m very open to discussing naming, suffixes, and entities in a pragmatic way so the proposal is useful for the working group. I’d also be very happy to talk about adoption and user feedback sometime.
Updated the run_id description language to not associate it to a filename, specifically.
Just an FYI to the community after a discussion held between myself, @yarikoptic, and @dmoracze: BEP036 (this PR) will be trying to adopt the more explicit
P.S. This also resolves @VisLab's Example 1 concern, where
- [Release notes](https://github.com/lodash/lodash/releases) - [Commits](lodash/lodash@4.17.23...4.18.1) --- updated-dependencies: - dependency-name: lodash dependency-version: 4.18.1 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> * chore(tox): Fix environment to avoid pre-release tests (bids-standard#2391) * [pre-commit.ci] pre-commit autoupdate (bids-standard#2381) updates: - [github.com/python-jsonschema/check-jsonschema: 0.37.0 → 0.37.1](python-jsonschema/check-jsonschema@0.37.0...0.37.1) - [github.com/codespell-project/codespell: v2.4.1 → v2.4.2](codespell-project/codespell@v2.4.1...v2.4.2) - [github.com/pre-commit/mirrors-mypy: v1.19.1 → v1.20.0](pre-commit/mirrors-mypy@v1.19.1...v1.20.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ENH] Link to example datasets page on the website (bids-standard#2364) * [FIX] Accept MiscChannelCount in EEG/Motion sidecars; deprecate MISCChannelCount alias (bids-standard#2394) * schema: accept MiscChannelCount in EEG sidecar; deprecate MISCChannelCount alias The Recommended-fields rule for EEG sidecars uses MISCChannelCount, but the validator check rule (MiscChannelCountReq), the spec example in electroencephalography.md, the MEG and iEEG sidecar rules, and the legacy validator's JSON schema all use MiscChannelCount. Datasets in the wild already exist with both spellings. Make MiscChannelCount the canonical recommended key (matching the rest of the schema and the docs example) while keeping MISCChannelCount as a deprecated alias so existing datasets continue to validate. Closes bids-standard#2393. * schema: apply MISCChannelCount deprecation to motion sidecar and glossary Addresses review feedback on bids-standard#2394: - Apply the same MiscChannelCount (recommended) + MISCChannelCount (deprecated) pattern to the motion sidecar rule, matching the EEG sidecar change. 
- Simplify the EEG sidecar entry to the concise `deprecated` level (no per-rule addendum) now that the deprecation note lives with the field definition. - Document the deprecation in the MISCChannelCount description in objects/metadata.yaml, so the glossary entry surfaces the canonical replacement wherever the field is referenced. * Fix typos (bids-standard#2399) * [FIX] Update OSIPI Task force link for ASL lexicon (bids-standard#2396) * [ENH] Add phenotype and rawbids directories to "study" datasets (bids-standard#2191) * Suggested modifications to directory layout of the bids-study DatasetType * fixed minor typos * removed bids prefix from the DatasetType and subdirs. * changed rawdata to rawbids directory in the study dataset and updated description in the common principles * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed URL in the JSON example * Update src/common-principles.md * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Chris Markiewicz <effigies@gmail.com> Co-authored-by: Chris Markiewicz <markiewicz@stanford.edu> Co-authored-by: Julia-Katharina Pfarr <111446107+julia-pfarr@users.noreply.github.com> * [ENH] Allow rawbids/ in derivative datasets for the raw BIDS source (bids-standard#2409) PR bids-standard#2191 added rawbids/ to the study DatasetType but not to derivative, even though the same convention -- "rawbids/ holds the raw BIDS dataset" -- is equally applicable when a standalone derivative dataset includes its raw source. - Add rawbids as an optional opaque subdirectory of the derivative DatasetType in src/schema/rules/directories.yaml. - Update the derivative example in common-principles.md to use rawbids/ instead of sourcedata/raw/. 
- Clarify that rawbids/ is reserved for raw BIDS datasets in both study and derivative cases, and that derivatives of derivatives MUST place their source derivative under sourcedata/ (not rawbids/). Co-authored-by: Claude Code 2.1.116 / Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [ENH] Recommend controlled vocabulary for age Units, clarify that it can be overloaded (bids-standard#2400) * Clarify that age Units could be overriden and refer to Units in common principles * Add ISO 8601-based duration units for age and validate in schema Document in common-principles.md that age Units MAY be overridden to one of: year, month, week, day, hour, minute, or second (based on ISO 8601 duration designators). Add AgeUnits schema check rule that validates participants.json age Units against the allowed set. Co-Authored-By: Claude Code 2.1.110 / Claude Opus 4.6 <noreply@anthropic.com> * Keep compatible (warning, not error) + simplify check Co-authored-by: Chris Markiewicz <effigies@gmail.com> * Remove duplicate specification of units in the Description Co-authored-by: Yaroslav Halchenko <debian@onerussian.com> * Make check operate on participants.tsv not .json Co-authored-by: Chris Markiewicz <effigies@gmail.com> --------- Co-authored-by: Claude Code 2.1.110 / Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Chris Markiewicz <effigies@gmail.com> * [ENH] Allow for institutions to be listed as Authors (bids-standard#2397) * Allow for institutions to be listed as Authors It is not uncommon to have datasets which are a truly an institutional effort -- from data collection planing, acquisition, curation, harmonization etc, where it is a pipeline to deliver high quality datasets. For instance we have a number of such contributions from Allen Institute(s) in DANDI archive. In DANDI schema we allow for both Person and Organization entries, and e.g. 
in https://github.com/dandisets/000020 we have ```yaml contributor: - affiliation: - identifier: https://ror.org/00dcv1019 name: Allen Institute for Brain Science schemaKey: Affiliation email: nathang@alleninstitute.org identifier: 0000-0001-8429-4090 includeInCitation: false name: Gouwens, Nathan roleName: - dcite:ContactPerson schemaKey: Person - contactPoint: [] identifier: https://ror.org/00dcv1019 includeInCitation: true name: Allen Institute for Brain Science roleName: [] schemaKey: Organization url: https://alleninstitute.org ``` so there is an Organization, which is actually the one to cite, although we do not list any particular roleName, and then responsible ContactPerson (who is not even listed as an Author) who could be contacted ATM (but might be a different person later) about this dandiset. I have tried to make wording a bit more explicit than just listing Organizations as a possible entry here, rather to keep it for large efforts. * Shorten and generalize statement about Authors Co-authored-by: Chris Markiewicz <effigies@gmail.com> --------- Co-authored-by: Chris Markiewicz <effigies@gmail.com> Co-authored-by: Mark Mikkelsen <mark.mikkelsen@gmail.com> * [FIX] Bump pymdown-extensions to >=10.21.2 to restore code-block rendering (bids-standard#2438) * [FIX] Bump pymdown-extensions to >=10.21.2 to restore code-block rendering (bids-standard#2421) Pygments 2.20.0 made `html.escape()` on the formatter's `filename` option strict, raising `AttributeError` when the value is `None`. pymdownx.highlight 10.21 (and earlier) passed `filename=title` where `title` defaults to `None` for code blocks without a title, which triggered the crash. pymdownx.superfences catches that exception and silently falls back to inline-`<code>` rendering, so fenced code blocks inside list items (e.g. the "Plain" examples in common-principles.md) lost their `<pre>` formatting and showed the language tag as literal text. 
pymdown-extensions 10.21.2 fixes the upstream defect, so require it (or newer) in both the docs build and bidsschematools[render]. Closes: bids-standard#2421 Co-Authored-By: Claude Code 2.1.160 / Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [FIX] Regenerate tools/schemacode/uv.lock for pymdown-extensions bump The previous commit updated tools/schemacode/pyproject.toml but forgot to refresh its sibling uv.lock. The `latest` tox envs use the `uv-venv-lock-runner` with `--locked`, so they failed `Setup test suite` with the pyproject/lockfile mismatch. Co-Authored-By: Claude Code 2.1.160 / Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Code 2.1.160 / Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Chris Markiewicz <markiewicz@stanford.edu> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Remi Gau <remi_gau@hotmail.com> Co-authored-by: Bru <b.aristimunha@gmail.com> Co-authored-by: Dimitri Papadopoulos Orfanos <3234522+DimitriPapadopoulos@users.noreply.github.com> Co-authored-by: Kabilar Gunalan <kabi@mit.edu> Co-authored-by: Nikhil Bhagwat <nikhil153@users.noreply.github.com> Co-authored-by: Chris Markiewicz <effigies@gmail.com> Co-authored-by: Julia-Katharina Pfarr <111446107+julia-pfarr@users.noreply.github.com> Co-authored-by: Yaroslav Halchenko <debian@onerussian.com> Co-authored-by: Claude Code 2.1.116 / Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Mark Mikkelsen <mark.mikkelsen@gmail.com>
Completed draft text edits I planned to make for the BEP036 IndexColumns rewrite.
Fixed two broken references to other MD files in the repo. Thanks CI!
Fixed a typo found by codespell. Thanks codespell!
| tabular phenotypic data like the participants file, sessions file,
| and phenotypic and assessment data.
|
| They are recommendations and are by default ignored during validation.
Would it be useful to mention that these recommendations are ignored by default only to maintain backward compatibility for previous datasets?
| ### 3. Add `MeasurementToolMetadata` to each tabular phenotypic measurement tool
|
| Whenever possible, it is RECOMMENDED to add `MeasurementToolMetadata` to
| each `phenotype/<measurement_tool_name>.json` data dictionary.
| each `phenotype/<measurement_tool_name>.json` data dictionary. | |
| each `phenotype/tool-<ToolName>_phenotype.json` data dictionary. |
According to the file-naming template just above (L26), this should now use `tool-..._phenotype` instead of only `<measurement_tool_name>`.
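To make the naming change concrete, here is a small Python sketch. It assumes the template is `phenotype/tool-<ToolName>_phenotype.json` exactly as written in the suggestion, and that `<ToolName>` is alphanumeric like other BIDS entity labels. The regex and the helper name `tool_name` are illustrative only and are not part of the schema or the validator.

```python
import re
from typing import Optional

# Assumed pattern for the suggested template; the label is taken to be alphanumeric.
SIDECAR_PATTERN = re.compile(
    r"^phenotype/tool-(?P<tool>[A-Za-z0-9]+)_phenotype\.json$"
)


def tool_name(path: str) -> Optional[str]:
    """Return the tool label if the path follows the assumed template, else None."""
    match = SIDECAR_PATTERN.match(path)
    return match.group("tool") if match else None
```

Under these assumptions, a path in the older `phenotype/<measurement_tool_name>.json` form would not match and the helper would return `None`.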
| To read more about the guidelines for tabular phenotypic data and examples,
| see the [tabular phenotypic data guidelines appendix](../appendices/phenotype.md).
I don't think we need to repeat it one more time considering the last 3 paragraphs mention the same URL.
| sessions_tsv:
|   description: 'Contents of /sessions.tsv, accessed by column header'
|   type: object
|   additionalProperties:
|     type: array
|     items:
|       type: string
I might've missed this or simply forgot about the final decision, but are we moving forward with having a /sessions.tsv at the root level? (the rest of the changes don't appear to reflect that)
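For readers less familiar with the schema context objects, the shape declared in this hunk (an object mapping each column header to an array of strings) can be sketched in Python as below. This is only an illustration of the data shape; `load_tsv_columns` is a hypothetical helper and the example rows are made up, not taken from the validator or from any dataset.

```python
import csv
import io


def load_tsv_columns(text: str) -> dict[str, list[str]]:
    """Parse TSV text into {column header: [cell values as strings]}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    columns: dict[str, list[str]] = {name: [] for name in header}
    for row in reader:
        # Every value stays a string, matching `items: type: string`.
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns


# Hypothetical file contents, for illustration only.
example = (
    "session_id\tacq_time\n"
    "ses-01\t2020-01-01T00:00:00\n"
    "ses-02\t2020-06-01T00:00:00\n"
)
sessions_tsv = load_tsv_columns(example)
```

A schema expression could then address a column by its header, for example `sessions_tsv["session_id"]`.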
| AgeUnits:
|   issue:
|     code: AGE_UNITS
|     message: |
|       The "Units" value for age in 'participants.json' is not a valid
|       ISO 8601-based duration unit.
|       Allowed values are "year", "month", "week", "day", "hour", "minute",
|       or "second".
|     level: warning
|   selectors:
|     - path == '/participants.tsv'
|     - '"Units" in sidecar.age'
|   checks:
|     - intersects([sidecar.age.Units], ["year", "month", "week", "day", "hour", "minute", "second"])
duplicate
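As a rough illustration of what the `AgeUnits` rule quoted above evaluates, here is a minimal Python sketch. It assumes that `intersects` acts as a test for a non-empty intersection between two lists. The function names are hypothetical and this is not validator code.

```python
ALLOWED_AGE_UNITS = ["year", "month", "week", "day", "hour", "minute", "second"]


def intersects(a: list, b: list) -> bool:
    """Assumed semantics: true when the two lists share at least one element."""
    return any(item in b for item in a)


def age_units_ok(sidecar: dict) -> bool:
    """Mirror the selector and check of the AgeUnits rule for one sidecar dict."""
    age = sidecar.get("age", {})
    if "Units" not in age:
        # The selector '"Units" in sidecar.age' does not match, so the rule is skipped.
        return True
    return intersects([age["Units"]], ALLOWED_AGE_UNITS)
```

Because the rule's level is `warning`, a `False` result here would correspond to a warning rather than a validation error.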
The BEP leads can meet as needed to discuss this BEP PR.
Coordinate a meeting by emailing Eric Earl: eric.earl.nih@gmail.com.
Communicate on this PR to provide feedback otherwise.
HTML preview of this BEP
BEP036 brings best-practice guidelines for tabular phenotypic data to the BIDS specification.
- `phenotype.md`
- `AdditionalValidation` key for the `dataset_description.json`, for which the usage is described in the modality agnostic files sections
- `session_id` as the second column in the `participants.tsv`

Additional Links
Co-authored-by: Eric Earl eric.earl@nih.gov @ericearl
Co-authored-by: Samuel Guay samuel.guay@umontreal.ca @SamGuay
Co-authored-by: Sebastian Urchs sebastian.urchs@mcgill.ca @surchs
Co-authored-by: Arshitha Basavaraj arshitha.basavaraj@iiitb.ac.in @Arshitha