
[ENH] BEP036 - Phenotypic Data Guidelines#2123

Open

ericearl wants to merge 89 commits into bids-standard:master from surchs:bep036-patch-mkdocs-macro


@ericearl ericearl commented May 30, 2025

Collaborator

The BEP leads can meet as needed to discuss this BEP PR.

Coordinate a meeting by emailing Eric Earl: eric.earl.nih@gmail.com.

Otherwise, please provide feedback by commenting on this PR.

HTML preview of this BEP

BEP036 brings best-practice guidelines for tabular phenotypic data to the BIDS specification.

  • Includes an appendix called phenotype.md
  • Includes a new AdditionalValidation key for dataset_description.json, whose usage is described in the modality-agnostic files sections
  • Includes the new option to store session_id as the second column in participants.tsv
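As a rough illustration of the participants.tsv option in the last bullet, the following Python sketch checks a header in which session_id, if present, sits in the second column. It is a hypothetical helper, not the bids-validator's implementation; the function name and message strings are made up for this example, and later discussion in this thread considers dropping the joint index in participants.tsv altogether.

```python
import csv
import io


def check_participants_header(tsv_text: str) -> list:
    """Return problems found in the header row of a participants.tsv string.

    Hypothetical rule sketch: participant_id must be the first column and
    session_id, if present, must be the second column.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # empty input yields an empty header
    problems = []
    if not header or header[0] != "participant_id":
        problems.append("first column must be participant_id")
    if "session_id" in header and header.index("session_id") != 1:
        problems.append("session_id, if present, must be the second column")
    return problems


print(check_participants_header("participant_id\tsession_id\tage\nsub-01\tses-01\t30\n"))  # []
print(check_participants_header("participant_id\tage\tsession_id\n"))
```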

Additional Links

  1. Original Google Doc
  2. Draft BIDS Validator errors and warnings
  3. BIDS Examples PR

Co-authored-by: Eric Earl eric.earl@nih.gov @ericearl
Co-authored-by: Samuel Guay samuel.guay@umontreal.ca @SamGuay
Co-authored-by: Sebastian Urchs sebastian.urchs@mcgill.ca @surchs
Co-authored-by: Arshitha Basavaraj arshitha.basavaraj@iiitb.ac.in @Arshitha

ericearl and others added 4 commits May 20, 2025 08:24
Quick update before merging our PR on surchs fork
BEP036 brings best-practice guidelines for tabular phenotypic data to the BIDS specification.

- Includes an appendix called `phenotype.md`
- Includes admonitions for the guidelines in line with the modality-agnostic files sections

---------

Co-authored-by: Eric Earl <eric.earl@nih.gov>
Co-authored-by: Samuel Guay <samuel.guay@umontreal.ca>
Co-authored-by: Sebastian Urchs <sebastian.urchs@mcgill.ca>
Co-authored-by: Arshitha B <arshitha.basavaraj@iiitb.ac.in>
Changed "e.g." to "for example" to follow contributing style guidelines.
Comment thread src/appendices/phenotype.md Outdated
each `phenotype/<measurement_tool_name>.json` data dictionary.
This improves reusability and provides clarity about the measurement tool.

### 5. Use the demographics file for common variables about participants

Contributor

Copying from https://github.com/surchs/bids-specification/pull/1/files#r2103117486

For this section, would it make sense to suggest that demographic-like information be prioritized in this file rather than in participants.tsv, making the latter primarily a list of subject IDs? I haven't seen this explicitly addressed anywhere, though I'm unsure if it's something we want to formalize 😬
Something like this could follow the paragraph:

When all demographic data is stored in phenotype/demographics.tsv, participants.tsv may serve primarily as a minimal listing of subject identifiers with only the participant_id column.

Contributor

I agree. It'd be good to mention this.

Hi, I am new to commenting on PRs and also relatively new to BIDS, so my comment might be stupid.

However, I want to draw attention to a potential issue with the handling of repeated administrations of a phenotype/ measurement. Currently, if a measurement is conducted more than once, a run_id column MUST be added that "corresponds to an existing run- entity used in a filename(s)" (https://bids-specification--2123.org.readthedocs.build/en/2123/modality-agnostic-files/phenotypic-and-assessment-data.html).

However, in my case (and I guess in many others), a phenotype measurement tool might be administered independently of any run of an experimental task. For example, I conduct a mood questionnaire 7 times over the course of one session in which multiple tasks (with multiple runs) are conducted. Thus, I think it is not reasonable to require that the run_id column in the phenotype file correspond to an existing run- entity in a (task) filename. Instead, something like a measurement_timepoint_id (or similar) that is not related to the run- entity in other files might be a good idea. Again, I am sorry if my comment is stupid or I misunderstood something.

Member

Along these lines, the current proposal states:

A participant identifier of the form sub-<label>, matching a participant entity found in the dataset. Note that data for one participant MAY be represented across multiple rows in case of multiple sessions or runs, and therefore the entry in the participant_id column will be repeated. The combination of participant_id, session_id and run_id MUST be unique.

The problem with this is that it doesn't cover the following cases, which occur in many datasets that I am familiar with:

Example 1: A dataset in which there are multiple training sessions on some equipment or on some test procedure on multiple different days before any imaging or any data that would have a run occurs.

Example 2: Sleep questionnaires and other questionnaires that are administered multiple times, unassociated with any imaging session.

Both of these require an additional identifier column (not associated with the run).

Collaborator Author

@VisLab Great to hear from you! Thanks for reading the BEP. Now to your comments...

Example 1: A dataset in which there are multiple training sessions on some equipment or on some test procedure on multiple different days before any imaging or any data that would have a run occurs.

We may not have made it explicit, but here is this example in the HTML preview of the appendix. It covers what to do when there's "1 participant with 2 sessions, where 1 session is only tabular phenotype and the other is only imaging". The same idea extends to more than 2 sessions of some combination of acquired data and tabular phenotypic data. For instance, in your Example 1, we could say:

Define many phenotypic session names

  • ses-day1equipmenttraining1
  • ses-day1equipmenttraining2
  • ses-day1equipmenttraining3
  • ses-day2equipmenttraining1
  • ses-day2equipmenttraining2
  • ses-preproceduretraining
  • ses-procedure
  • ses-postprocedure
  • ...

Store most, if not all, in one aggregated file

  • phenotype/equipment_training.tsv
    • Where participant_id, session_id, run_id, and day together form a joint index.
    • Yes, day is not a validatable column, but you could alternatively create phenotype/day#_equipment_training.tsv files if you want stricter validation support.
    • Remember that run_id is not about tying the data to a particular task's run, but instead to differentiate multiple runs of the same tabular phenotypic test, survey, assessment, questionnaire, or lab result within the same participant_id's session_id.

Example 2: Sleep questionnaires and other questionnaires that are administered multiple times, unassociated with any imaging session.

The answer to example 2 is the appendix example I linked above, and now here.

I hope this helps. Let me know if you don't think this resolves your comment, or if there's something else we should consider that you can propose.

Collaborator Author

@timdressler

I want to draw attention to a potential issue with the handling of repeated administrations of a phenotype/ measurement. Currently, if a measurement is conducted more than once, a run_id column MUST be added that "corresponds to an existing run- entity used in a filename(s)" (https://bids-specification--2123.org.readthedocs.build/en/2123/modality-agnostic-files/phenotypic-and-assessment-data.html).

Wowza, good catch! I didn't notice that incorrect set of words there. It currently says:

(run_id is) A run identifier that corresponds to an existing run- entity used in a filename(s). A chronological run number is used when a measurement tool or assessment described by a tabular file was repeated within a session.

What it should say:

(run_id is) A run identifier to differentiate multiple runs of the same measurement tool. A chronological run number is used when a measurement tool or assessment described by a tabular file is repeated within a session.

Thanks! I'll edit that.

For example, I conduct a mood questionnaire 7 times over the course of one session in which multiple tasks (with multiple runs) are conducted.

Does the text edit above of "what it should say" make better sense and resolve the run_id labeling issue?
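To make the revised run_id definition concrete: a chronological run number can be derived by sorting repeated administrations by acq_time within each participant_id/session_id pair. The Python sketch below assumes each row carries an ISO 8601 acq_time and that run labels take the form run-<index>; the helper name, the label format, and the toy rows are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime


def assign_run_ids(rows):
    """Return copies of rows with a chronological run_id added.

    rows: dicts with participant_id, session_id and an ISO 8601 acq_time.
    Numbering restarts at 1 for every participant_id/session_id pair.
    The "run-<index>" label format is a hypothetical choice.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["participant_id"], row["session_id"])].append(row)

    numbered = []
    for key in sorted(groups):
        in_order = sorted(groups[key], key=lambda r: datetime.fromisoformat(r["acq_time"]))
        for index, row in enumerate(in_order, start=1):
            numbered.append({**row, "run_id": f"run-{index}"})  # input rows are not modified
    return numbered


# Toy rows: one questionnaire given twice in the same session, listed out of order.
mood = [
    {"participant_id": "sub-01", "session_id": "ses-01", "acq_time": "2024-01-01T15:00:00", "mood": "4"},
    {"participant_id": "sub-01", "session_id": "ses-01", "acq_time": "2024-01-01T09:00:00", "mood": "6"},
]
for row in assign_run_ids(mood):
    print(row["acq_time"], row["run_id"])
```

For the seven-times-per-session mood questionnaire described above, this approach would yield run-1 through run-7 within that session, independent of any task run- entity.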

Put the phenotypic and assessment data content where it belongs.
Fix PhenotypeSubjectsMissing selector (suffix -> extension) and
check direction (phenotype participants subset of participants.tsv).

Add PhenotypeSessionFileRequired, PhenotypeSessionAggregation,
PhenotypeSessionsMissing, and PhenotypeSessionLevels check rules
for phenotype validation under AdditionalValidation.
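The corrected check direction (every participant in a phenotype file must appear in participants.tsv, not the other way round) can be sketched in plain Python. This is an illustration only: PhenotypeSubjectsMissing is defined in the schema's rule language, and the helper names and the in-memory mapping of file names to TSV text below are hypothetical.

```python
import csv
import io


def participant_ids(tsv_text):
    """Return the set of participant_id values in a TSV string."""
    return {row["participant_id"] for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t")}


def phenotype_subjects_missing(participants_tsv, phenotype_tsvs):
    """Map each phenotype file to participants that are absent from participants.tsv.

    phenotype_tsvs: hypothetical mapping of file name -> TSV text.
    Participants listed in participants.tsv but absent from a phenotype file
    are NOT reported, since a phenotype file may cover a subset of participants.
    """
    known = participant_ids(participants_tsv)
    missing = {}
    for name, text in phenotype_tsvs.items():
        extra = participant_ids(text) - known
        if extra:
            missing[name] = sorted(extra)
    return missing


print(phenotype_subjects_missing(
    "participant_id\nsub-01\nsub-02\n",
    {"phenotype/demographics.tsv": "participant_id\tage\nsub-01\t30\nsub-03\t41\n"},
))  # {'phenotype/demographics.tsv': ['sub-03']}
```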
Comment thread src/appendices/phenotype.md Outdated
even if this file only contains a two-column inventory
of `participant_id` and `session_id`.
The `sessions.tsv` file MUST list all sessions for all subjects across
imaging and tabular phenotypic data. The `sessions.json` file’s

@yarikoptic yarikoptic Mar 27, 2026

Collaborator

Are we in public review already? Here is one:

If you are talking about having the top-level sessions.tsv now contain both participant_id and session_id index keys, I will keep coming to beg on my knees "PLEASE DO NOT" until we define it as a generic principle one way or another (e.g. #2273). There should be a discussion on that particular aspect alone, not an ad hoc change to subjects.tsv, and not in a BEP mixing in various changes. ATM sessions.tsv must have only the session_id index, as any other {entity_plural}.tsv does. Please do not break the few consistencies we have.

Collaborator Author

We are not in public review yet, but I am happy to receive feedback at all stages.

I see a need in BIDS for flexibility in aggregating tabular data, that is, moving it "back" up the directory hierarchy (from the subdirectory "leaves" to the top-level dataset "root"). When people interact with this kind of tabular data, they don't want 10s, 100s, or 1,000s of small, segregated sub-<label>/sub-<label>_sessions.tsv files. They want one big aggregated file.

As a consequence, the closer to the dataset root you go, the more columns you need to describe the aggregated data. For instance, if you have 1,000s of small, segregated sub-<label>/sub-<label>_sessions.tsv files and you want one big aggregated sessions.tsv file instead, aggregating up means taking the sub-<label> out of the filename. So you end up needing a participant_id column in the sessions.tsv file.

I used to think this was relevant for ALL tabular files to be able to move back in the directory hierarchy, but I later realized that things like physio or other large files would be a real mess in a much larger aggregated file. I could maybe still make the case for scans.tsv or samples.tsv though, but that's a very different story...

I would almost like to say, "'Large' tabular files should stay as-is and 'small' tabular files should allow aggregation."

Collaborator

Sorry, I am not following the argument here: nothing in what you said requires breaking an established .tsv file convention, since it can easily be adopted in a new .tsv file, e.g. as I suggest in #2273.

Comment thread src/appendices/phenotype.md Outdated
"sessions.tsv": "",
"phenotype": {
"measurement_tool.json": "",
"measurement_tool.tsv": "",

Collaborator

Yikes, will that be a literal filename like measurement_tool.tsv in there??? As in, with the suffix _tool and some measurement thing?

@ericearl ericearl Mar 30, 2026

Collaborator Author

Yes, this is what the phenotype/ directory already permits and what the validator validates pre-BEP036. It is also how the 40 datasets on OpenNeuro with a phenotype/ directory (out of 1,548 datasets, as of 3/30/2026) curated their tabular phenotypic data. Files in the phenotype/ directory can be named alphanumerically (including hyphens and plus signs, I believe), but I think the practical way to make sure people continue sharing their tabular phenotypic data is to leave the naming of phenotype files as-is.

Names so far are usually acronyms or short descriptions or both. Some examples:

  • phenotype/demographics.tsv
  • phenotype/whodas.tsv
  • phenotype/nih_toolbox.tsv
  • phenotype/ARI_parent.tsv
  • phenotype/wisc-iii.tsv

Collaborator

but I think the practical way to make sure people continue sharing their tabular phenotypic data is to leave the file-naming of phenotype files as-is.

By no means do I want to restrict people from sharing their data any way they want, under whatever file names they have. It is only when we talk about standardizing file names, as BIDS does, that I have an "opinion"; otherwise we would have just ended up with something like data/my_favorite1.nii.gz and data/notsogoodone2.nii.gz.

Member

I agree with @yarikoptic that this breaks the BIDS search principles for the use of underscores.

Comment thread src/appendices/phenotype.md Outdated

```tsv
participant_id	session_id	acq_time
sub-01	ses-MRI1	2001-01-01T11:12:00
```

Collaborator

Please please please no!!! What common principle does such a file construction follow? It introduces inconsistencies! If you want to introduce some new principle, let's formalize it. PLEASE PLEASE PLEASE let's discuss and do that, e.g. in #2273, or, as I invited you there, please formulate an alternative but generic approach!

You can keep resolving my comments, but I am sorry, I will be coming back, so we had better talk and resolve this "conflict"!

Collaborator Author

Please do keep coming back. We want to hear from those who agree AND those who have differing opinions, to see whether or not we're satisfying the 80% most common data use cases (see BIDS mission statement foundational principle 2). I will try to draft an "alternative, but generic approach" and tag you on it, referencing #2273, etc. Hopefully we can come to some agreement soon.

@yarikoptic yarikoptic left a comment

Collaborator

There are a number of major aspects of the construction of file names and their content that are changed here without introducing generic principles, ruining what consistency we have so far in BIDS:

  • singular index in {entityplural}.tsv (see #2273 for a proposal on how to generalize to multiple indices)
  • filename construction to generally follow {entityname}-{labelorindex}_{suffix}.{ext}, plus the aforementioned {entityplural}.tsv to describe those entity values (see #2283 for a proposal on how to generalize such files across all levels).

I have voiced my concerns over the (IMHO) breaking changes proposed here numerous times, but as far as I can see, they were just brushed off without a due search for compromise.
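The filename-construction point above can be illustrated with a deliberately simplified parser. It is a hypothetical helper, much looser than the real BIDS schema (which also fixes entity order and the allowed entities and suffixes). It reads the last underscore-separated token as the suffix, so measurement_tool.tsv comes out with the suffix tool and a leftover token that is not an entity-label pair, which is the ambiguity raised about underscores in phenotype/ file names.

```python
def parse_bids_like(filename):
    """Split a filename using a simplified {entity}-{label}_..._{suffix}.{ext} reading.

    Hypothetical helper: everything after the first dot is the extension,
    the last underscore-separated token of the stem is the suffix, and every
    other token is expected to be an entity-label pair joined by a hyphen.
    """
    stem, _, extension = filename.partition(".")
    *tokens, suffix = stem.split("_")
    entities, not_entity_pairs = {}, []
    for token in tokens:
        key, hyphen, label = token.partition("-")
        if hyphen and key and label:
            entities[key] = label
        else:
            not_entity_pairs.append(token)  # e.g. "measurement" has no hyphen
    return {
        "entities": entities,
        "suffix": suffix,
        "extension": "." + extension if extension else "",
        "not_entity_pairs": not_entity_pairs,
    }


print(parse_bids_like("sub-01_ses-01_task-rest_bold.nii.gz"))
print(parse_bids_like("measurement_tool.tsv"))  # suffix "tool", leftover token "measurement"
```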

@dorahermes

Member

I agree with @yarikoptic on the file naming, which is an easy fix.

A more difficult problem is summarizing across sessions and understanding the order in which data were acquired. This needs a general solution, because I imagine similar problems when annotating medication used during different sessions/runs. For example, in our case we have patients who are on one type of medication during an MRI, and this changes across iEEG sessions and runs, or even within a run. It would be good to be able to filter which runs were collected on certain medication.

Also, my initial impression is that a phenotype is a static feature that does not change across sessions. Can a phenotype change across sessions?

@surchs

surchs commented Apr 10, 2026

Contributor

AFAICT there are 3 questions left to answer:

1 - For a longitudinal dataset, where should I store my demographic/phenotypic info (e.g. "age"; the terminology here is maybe not clearly delineated)?

BEP036 proposes participants.tsv (in line with recommendations in the spec) and changes the schema to allow a double index of participant_id and session_id. If we don't want to make this schema change, i.e. don't want to allow the double index, that's OK; our alternative is then to put it in /phenotype/demographics.tsv.

I like the /phenotype/demographics.tsv option. I also think we should then recommend this demographics.tsv (or whatever the best name is) as the default target for such information about the participant.

2 - For a longitudinal dataset with phenotypic data (e.g. assessments; again, the terminology is not super clear), where should I describe the properties (e.g. acq_time) of the phenotypic sessions?

BEP036 proposes a root-level sessions.tsv and changes the schema to allow a double index of participant_id and session_id. If we don't want to allow this, that's also OK, but we should then find an alternative. AFAICT our options are:

a) put it as additional columns in the participant+session.tsv file proposed by @yarikoptic

For example, participant+session.tsv could look like this:

```tsv
participant_id	session_id	acq_time
sub-1	ses-1	2009-06-15T13:45:30
sub-1	ses-2	2009-06-16T14:45:30
sub-2	ses-1
```

Good:

  • solves the problem

Not so good:

b) put it in /sub-<label>/sub-<label>_sessions.tsv

Good:

  • status quo ante

Not so good:

  • For a subject with only /phenotype data, I must now create an otherwise empty sub-<label> directory containing a single sub-<label>_sessions.tsv file, just to list the phenotypic sessions and their acq_time values, etc.

As I mentioned above I think we should not do this.

c) put it inside the individual .tsv files in /phenotype dir

For example, I might have an assessment1.tsv like so:

```tsv
participant_id	session_id	acq_time	subscale_A_total
sub-1	ses-1	2009-06-15T13:45:30	10
sub-1	ses-2	2009-06-15T14:45:30	15
sub-2	ses-1
```

and also an other_assessment.tsv like so:

```tsv
participant_id	session_id	acq_time	other_subscale
sub-1	ses-1	2009-06-15T13:45:30	37
sub-1	ses-2	2009-06-16T14:45:30	25
sub-2	ses-1
```

Good:

  • already consensus for the BEP (i.e. repeated session_id or run_id are OK)

Not so good:

  • will be very repetitive, i.e. does not respect "Duplication is evil!" mantra
  • likely very hard to validate, for both humans and the validator
  • easy to get inconsistent info / data entry errors
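The duplication risk in option c) can be checked mechanically. This Python sketch (a hypothetical helper; it assumes every table has participant_id, session_id, and acq_time columns and skips empty cells) collects acq_time per participant and session across phenotype files and reports disagreements. Run on the two example tables above, it flags sub-1 ses-2, whose acq_time is 2009-06-15T14:45:30 in assessment1.tsv but 2009-06-16T14:45:30 in other_assessment.tsv.

```python
import csv
import io
from collections import defaultdict


def conflicting_acq_times(tables):
    """Report participant/session pairs whose acq_time differs between files.

    tables: hypothetical mapping of file name -> TSV text, each with
    participant_id, session_id and acq_time columns. Empty cells are skipped.
    """
    seen = defaultdict(dict)  # (participant_id, session_id) -> {file name: acq_time}
    for name, text in tables.items():
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            value = (row.get("acq_time") or "").strip()  # short rows give None
            if value:
                seen[(row["participant_id"], row["session_id"])][name] = value
    return {key: by_file for key, by_file in seen.items() if len(set(by_file.values())) > 1}


assessment1 = (
    "participant_id\tsession_id\tacq_time\tsubscale_A_total\n"
    "sub-1\tses-1\t2009-06-15T13:45:30\t10\n"
    "sub-1\tses-2\t2009-06-15T14:45:30\t15\n"
    "sub-2\tses-1\n"
)
other_assessment = (
    "participant_id\tsession_id\tacq_time\tother_subscale\n"
    "sub-1\tses-1\t2009-06-15T13:45:30\t37\n"
    "sub-1\tses-2\t2009-06-16T14:45:30\t25\n"
    "sub-2\tses-1\n"
)
print(conflicting_acq_times({"assessment1.tsv": assessment1, "other_assessment.tsv": other_assessment}))
```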

I don't like any of these options, but of this list I like a) the best. I believe the most useful thing for us to do would be to either choose one of the 3 options I listed above or to come up with a better one together.

3 - What rules should apply to file names in the /phenotype dir?

BEP036 proposes nothing, which AFAIK is the status quo (i.e. anything is allowed). If we want to change that and treat the files in the /phenotype directory as regular BIDS file names, I think we need to be clear about what this means. AFAICT there are no pheno-relevant suffixes in BIDS, i.e. there is no "good suffix" I can pick. So IIUC all underscores in the file name are effectively out. I believe it's the same for entities, but I am not sure whether this means that dashes are also out. Unless I'm missing something, we already know that anything looking like a "suffix" we encounter in /phenotype will be either invalid or wrong.

I disagree with this, but I don't have a strong opinion. We should, though, consider the additional friction we would impose on groups who want to store their phenotypic/assessment data in a BIDS-conformant way, because in practice it most likely means a lot of file renaming.

My main point here is: I think we're a bit stuck/going in circles with the discussion of one specific implementation vs another / the HOW of the BEP. But in the end the implementation is only useful in service of the WHY, which to me are the questions above. And I think it should be easier for us to agree on what an acceptable solution to these questions looks like.

@karl-koschutnig

Thanks for the work on BEP036 — this hits close to home.

The structural mismatch between phenotypic/survey data and the rest of BIDS is exactly the problem that motivated us to build PRISM (https://github.com/MRI-Lab-Graz/prism-studio). When we tried to store multi-session questionnaire data in standard BIDS, the phenotype/ flat-file pattern was the only option — and it broke every assumption the rest of the spec is built on.

Specifically what breaks:

  • phenotype/tool.tsv lives outside sub-/ses-/, so JSON sidecar inheritance doesn't apply. Shared instrument metadata has no canonical location.
  • session_id and run_id as TSV columns re-implement in data space what BIDS filename entities already express for every other modality.
  • BIDS Apps traversing sub-/ses-/ are structurally blind to phenotypic data.

What we ended up doing in PRISM:

We store survey/questionnaire data inside the standard hierarchy under a survey/ modality folder:

sub-001/ses-01/survey/sub-001_ses-01_task-phq9_survey.tsv
sub-001/ses-01/survey/sub-001_ses-01_task-phq9_survey.json

Session and run fall out of filename entities. Sidecar inheritance works normally. BIDS Apps see the data without modification. The workaround is adding survey/ to .bidsignore — which is precisely the gap a proper BIDS extension should close.

BEP036 codifies and improves the existing phenotype/ pattern, which is valuable. But the underlying structural issue remains. We'd encourage the working group to discuss whether the normative pattern for per-session instrument data should follow the sub-/ses-/ hierarchy, with phenotype/ serving as an optional aggregate/derived representation.

Happy to share schema definitions or converter code.

@ericearl

Collaborator Author

The scope of BEP036 is primarily to be a set of guidelines. BEP036 provides validatable best practices for BIDS tabular phenotypic data using an opt-in for additional validation. BEP036 also introduces a new use case for a “joint index” (definition below) in the phenotype/ directory to support aggregated tabular phenotypic data.

With this comment, I would like to try to address what I think are Yarik’s main concerns. Those concerns, as I understand them, are about two currently undefined generic BIDS principles he feels should apply to BIDS files, including tabular phenotypic data:

  1. Using only a singular index in {entityplural}.tsv. For example: participants.tsv or sessions.tsv.
  2. Constructing filenames to generally follow {entityname}-{labelorindex}_{suffix}.{ext}. For example: phenotype/table-demographics_pheno.tsv or phenotype/survey-ARIParent_pheno.tsv.

1. Using only a singular index in participants.tsv and sessions.tsv

I have long felt that aggregated participant-session data would be best held and found inside one of three files: (a) participants.tsv, (b) sessions.tsv, or (c) phenotype/demographics.tsv.

A quick definition before I start, for readers who may be unfamiliar with the term “joint index”. A tabular file with a joint index has more than one index column, for example “participant_id & session_id”, “participant_id & session_id & run_id”, or even “participant_id & run_id”. Together, the index columns uniquely identify each row, which allows for more aggregated data. For instance, if the joint index of a tabular phenotypic file is “participant_id & session_id & run_id”, then no combination of participant, session, and run appears in more than one row of the aggregated file.
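The joint-index definition translates directly into a uniqueness check. A minimal Python sketch, assuming the table is tab-separated with a header row (the function name and the toy table are hypothetical): it counts each combination of the index columns and returns any that occur in more than one row, so an empty result means the columns form a valid joint index.

```python
import csv
import io
from collections import Counter


def duplicate_index_rows(tsv_text, index_columns):
    """Return index tuples that occur in more than one row.

    An empty list means index_columns form a valid joint index for the table.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(tuple(row[column] for column in index_columns) for row in reader)
    return sorted(key for key, count in counts.items() if count > 1)


table = (
    "participant_id\tsession_id\trun_id\tscore\n"
    "sub-01\tses-01\trun-1\t12\n"
    "sub-01\tses-01\trun-2\t14\n"
    "sub-01\tses-02\trun-1\t11\n"
)
print(duplicate_index_rows(table, ["participant_id", "session_id", "run_id"]))  # []
print(duplicate_index_rows(table, ["participant_id", "session_id"]))  # [('sub-01', 'ses-01')]
```

The same table passes with participant_id, session_id, and run_id as the index but fails with only participant_id and session_id, so the choice of index columns matters for validation.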

So, if (a) and (b) here are not allowed to use a “joint index” then the ideal location for the aggregated data should be (c) (whether it goes by that filename or some other filename) aggregated in the phenotype/ directory.

I think BEP036 can skip using a root-level aggregated participants.tsv or sessions.tsv as the central inventory of multi-session changes in participant measurements, as long as BEP036 can instead use the phenotype/ directory for aggregated files, like phenotype/demographics.tsv.

2. Constructing filenames to generally follow {entityname}-{labelorindex}_{suffix}.{ext}

I understand that BIDS does this in every other place in the first 2 or 3 levels of the directory hierarchy EXCEPT the phenotype/ directory. The phenotype/ directory has, by convention, allowed tabular phenotypic data under a “name it anything” convention since way back in BIDS version 1.0.1.

I feel {entityname} and {suffix} in the BIDS context are for distinguishing files that would otherwise have the same filename in segregated data. For instance, by convention the “task” entity, “run” entity, or even the “ce” entity are included in the specification to show at a glance what differs between similar filenames, and to keep them separate.

Some files defy this, like Arterial Spin Labeling (ASL) where the files can be either separate files with different suffixes or they can be concatenated with an aslcontext.tsv file provided, so the end-user knows which 3D volume in the 4D volume is which ASL file.

Files in the phenotype/ directory are always “prefixed” by existing in that directory. Their extension is also always .tsv for the tabular data or .json for the corresponding data dictionary.

I don’t think it adds any value to an aggregated tabular phenotypic data filename to introduce an {entityname} (like table or survey) or a {suffix} (like pheno) in BIDS version 1. I think it would be fine to reconsider this for BIDS 2.0, perhaps supporting both segregated files, as PRISM Studio does (thanks @karl-koschutnig), and aggregated files, as BEP036 and Psych-DS do. I do think that if we are going to define entity names and suffixes, then we would need to define entity names that usefully separate filenames of tabular phenotypic data.

On aggregation versus segregation of tabular phenotypic data

This is an aside that I feel is important and topical in a few of the most recent comments here.

For a long time, the motivating factor for me in aggregating sub-<label>/sub-<label>_sessions.tsv files up to a root-level sessions.tsv file was the idea of an easier time loading a lot of session data (one load command instead of 10s or 100s of load commands plus one concatenation command). Another was to have a better place to put all the session_id name definitions (in the sessions.json data dictionary). And going up the hierarchy to the root, you could remove the sub-<label>_ from every filename and instead add a participant_id column inside, which I think is very clean.

While the current BIDS tabular phenotypic data standard for sessions files technically works well, in practice it puts a large technical file-concatenating load on people who want to analyze the multi-session changes in tabular phenotypic data. That is why we BEP036 leads have been advocating for aggregating such data to the root-level since about half-way through this BEP’s development.
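The concatenation step described above is small in code. A sketch in Python, assuming the per-subject files are given as a mapping from paths like sub-01/sub-01_sessions.tsv to their TSV text (a hypothetical interface and function name): it moves the sub-<label> from the filename into a participant_id column and fills cells missing for some subjects with n/a. Where such an aggregated table should live (a root-level sessions.tsv, phenotype/, or a participant+sessions.tsv) is exactly what this thread is still debating.

```python
import csv
import io
import re


def aggregate_sessions(per_subject_files):
    """Combine per-subject sessions tables into one table with a participant_id column.

    per_subject_files: hypothetical mapping of a path such as
    "sub-01/sub-01_sessions.tsv" -> TSV text. Returns TSV text.
    """
    rows, columns = [], ["participant_id"]
    for path, text in sorted(per_subject_files.items()):
        match = re.search(r"(sub-[^/_]+)_sessions\.tsv$", path)
        if match is None:
            raise ValueError(f"not a sessions file: {path}")
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            for column in row:
                if column not in columns:
                    columns.append(column)  # keep first-seen column order
            rows.append({"participant_id": match.group(1), **row})

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t", lineterminator="\n", restval="n/a")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(aggregate_sessions({
    "sub-01/sub-01_sessions.tsv": "session_id\tacq_time\nses-01\t2009-06-15T13:45:30\n",
    "sub-02/sub-02_sessions.tsv": "session_id\tacq_time\nses-01\t2009-06-16T10:00:00\n",
}), end="")
```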

Summary

  1. If we can agree on the need for a phenotype/demographics.tsv (by this or any other name) for at least an inventory of participants & sessions, then I agree we don’t need to allow a joint index in participants.tsv or sessions.tsv at the BIDS dataset root-level, and I can make those edits to the schema and specification.
  2. If we can agree that {entityname} and {suffix} don’t add any value to aggregated tabular phenotypic data filenames in the phenotype/ directory, then I suggest we leave the open filenaming scheme as-is for the phenotype/ directory, and I can add language to the specification to make clearer why files in the phenotype/ directory are freely named.
  3. If we cannot agree that {entityname} and {suffix} don’t add any value to aggregated tabular phenotypic data filenames in the phenotype/ directory, then please provide concrete suggestions for aggregated tabular phenotypic data {entityname} and {suffix}.

Lastly, I will try to respond to other newer comments here besides Yarik’s now. Thank you all for the comments!

Tags: @yarikoptic @dorahermes @surchs @SamGuay @Arshitha

@ericearl

Collaborator Author

@karl-koschutnig

I love that it looks like you solved the segregated tabular phenotypic data issue with PRISM Studio! What it looks like you all did was an earlier-on goal for BEP036, but we wanted to try getting out the aggregated tabular phenotypic data BEP first before pursuing the “you need a tool to support curating and using your tabular phenotypic data” route. I would love to talk some time about how much adoption PRISM Studio has had and how well it’s being received by users.

As for:

We'd encourage the working group to discuss whether the normative pattern for per-session instrument data should follow the sub-/ses-/ hierarchy, with phenotype/ serving as an optional aggregate/derived representation.

We BEP leads have talked at length and repeatedly about that solution and we felt the aggregated use case was more important, which is why you see what we have here today.

However, I would recommend you create a PR to try to formally get PRISM Studio’s filenames and folder structure supported in the BIDS schema. Maybe re-use the “phenotype/” subdirectory as your modality directory instead of “survey” and consider other possible suffixes (I like pheno as a suffix idea) or entity names (here’s a list of entity names I’ve considered: table, survey, assess, quiz, form, meas, inst, or lab), but otherwise your project looks amazing. Nice work and thanks for continuing the conversation.

yarikoptic added a commit that referenced this pull request Apr 18, 2026
…tadata

Introduces a single new optional dataset-level file `participant+sessions.tsv`
with a composite index `[participant_id, session_id]`.  This provides a single
top-level location for metadata that varies across both participants and
sessions -- e.g. age at each visit, body weight, clinical scores in
longitudinal studies -- complementing the existing `participants.tsv`
(participant-constant) and per-subject `*_sessions.tsv` files.

Note that it is already possible to provide such metadata in
`sub-*/sub-*_sessions.tsv` files, so this approach just provides an
"aggregate" collection of that metadata.  As such, we might then need to
define how it interacts with the inheritance principle, but defining that is
still a TODO for .tsv files in general.
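To make the "aggregate" relationship concrete, here is a minimal Python sketch (standard library only) that collects every `sub-<label>/sub-<label>_sessions.tsv` into one table keyed by `participant_id` and `session_id`. The helper names `aggregate_sessions` and `write_tsv` are hypothetical, and taking `participant_id` from the subject directory name is an assumption of this sketch, not something the proposal defines.

```python
import csv
from pathlib import Path


def aggregate_sessions(bids_root: Path) -> tuple[list[str], list[dict[str, str]]]:
    """Collect sub-<label>/sub-<label>_sessions.tsv files into one table.

    participant_id is taken from the subject directory name, because
    per-subject sessions files only carry session_id.
    """
    columns = ["participant_id", "session_id"]
    rows: list[dict[str, str]] = []
    for sessions_file in sorted(bids_root.glob("sub-*/sub-*_sessions.tsv")):
        participant_id = sessions_file.parent.name  # for example "sub-01"
        # Ignore files whose name does not match their subject directory.
        if sessions_file.name != f"{participant_id}_sessions.tsv":
            continue
        with sessions_file.open(newline="") as f:
            reader = csv.DictReader(f, delimiter="\t")
            # Keep the first-seen column order across all subjects.
            for field in reader.fieldnames or []:
                if field not in columns:
                    columns.append(field)
            for row in reader:
                rows.append({**row, "participant_id": participant_id})
    return columns, rows


def write_tsv(path: Path, columns: list[str], rows: list[dict[str, str]]) -> None:
    """Write rows as TSV, filling columns a subject lacks with "n/a"."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=columns, delimiter="\t", restval="n/a", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_tsv(root / "participant+sessions.tsv", *aggregate_sessions(root))` would then produce the aggregate file; columns present for only some subjects are filled with `n/a`.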

The `+` in the filename signals a composite index, following the
convention proposed in #2273, and as an alternative to the freshly proposed #2402, inspired by work on BEP036:

- #2123

hence attn @bids-standard/bep036 .
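One way a tool might read the `+` convention is to map each filename component to an index column. The Python sketch below does that for the two components discussed here; the `COMPONENT_TO_COLUMN` table and the singular/plural handling are hypothetical assumptions for illustration, not rules stated in #2273.

```python
# Hypothetical mapping from '+'-separated filename components to index
# column names; the exact naming rule is an assumption, not from #2273.
COMPONENT_TO_COLUMN = {
    "participant": "participant_id",
    "participants": "participant_id",
    "session": "session_id",
    "sessions": "session_id",
}


def index_columns_from_filename(filename: str) -> list[str]:
    """Derive the composite index columns implied by a TSV filename."""
    suffix = ".tsv"
    if not filename.endswith(suffix):
        raise ValueError(f"expected a .tsv file, got {filename!r}")
    stem = filename[: -len(suffix)]
    columns = []
    for component in stem.split("+"):
        if component not in COMPONENT_TO_COLUMN:
            raise ValueError(f"unknown index component {component!r} in {filename!r}")
        columns.append(COMPONENT_TO_COLUMN[component])
    return columns
```

Under these assumptions, `participant+sessions.tsv` maps to `["participant_id", "session_id"]`, while the existing `participants.tsv` maps to `["participant_id"]`.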

Most of the changes are just straightforward interpolation of
`participants.tsv` and `sessions.tsv` files definitions.

One of the notable changes is to `meta/context.yaml` where we added
`dataset.sessions` (union of all session directories across subjects) to enable
session-level validation checks.  I think it is only reasonable given that we
already included dataset-level summaries for datatypes and modalities. But
it would require bids-validator to support it.  The alternative is to drop it
along with the extra check we added.

Ideally though we should figure out how to validate specific combinations of
sub/sessions and TODO was left for that.

An example `participant+sessions.tsv` with a `body_weight` column for the
existing `7t_trt` bids-examples dataset is at

- bids-standard/bids-examples#556

Looking also at the original `participants.tsv` there makes it fairly
obvious that duplicating all entries across all sessions would be
dubious.

- implements a single first manifestation for #2273
- I think overall we can state that it closes #1020, which theoretically could
  have been closed with the original introduction of _sessions.tsv files.

Co-Authored-By: Claude Code 2.1.113 / Claude Opus 4.6 <noreply@anthropic.com>
@karl-koschutnig


@ericearl

Thanks, this is very helpful, and I appreciate the context from the BEP leads.

I’m happy to open a PR proposing formal schema support for the PRISM Studio structure. One point I’d want to make gently in that PR is that I’m not fully convinced phenotype/ should be the primary representation, because it breaks from the way other BIDS modalities are organized. From my perspective, a survey-style BIDS structure fits the broader BIDS logic better, while aggregated tables can still be generated later from that canonical structure when needed.

That said, I’m very open to discussing naming, suffixes, and entities in a pragmatic way so the proposal is useful for the working group. I’d also be very happy to talk about adoption and user feedback sometime.

@karl-koschutnig


As @ericearl suggested, I just opened a PR for the PRISM idea of a survey data structure:

#2404

I would be happy to discuss our approach in detail.

Updated the run_id description language to not associate it with a filename, specifically.
@ericearl

ericearl commented Jun 9, 2026

Collaborator Author

Just an FYI to the community after a discussion between myself, @yarikoptic, and @dmoracze: BEP036 (this PR) will try to adopt the more explicit "IndexColumns" solution (issue #2402), instead of implicitly expecting or allowing BIDS data curators to put more than one index column into "participants", "sessions", or "phenotypic and assessment data" files. In fact, BEP036 will drop the plan to allow multiple index columns inside the participants and sessions files entirely. Support for multiple index columns will initially be restricted to the phenotype/ directory. We also discussed how this idea could later be reused in other places across the BIDS standard. The full meeting notes from the discussion are publicly available here.

P.S. This also resolves @VisLab's Example 1 concern, where participant_id, session_id, run_id, and day all need to be joint indexes for an aggregated tabular file. I think this solution is better all-around for joint indexing in tabular files.
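An explicit list of index columns makes uniqueness easy to check. The Python sketch below is illustrative only: it assumes the sidecar would list column names as a JSON array under "IndexColumns" (the exact shape in #2402 may differ), `find_duplicate_keys` is a hypothetical helper, and the rows use the participant_id/session_id/run_id/day combination from @VisLab's example with made-up values.

```python
from collections import Counter


def find_duplicate_keys(rows: list[dict[str, str]], index_columns: list[str]) -> list[tuple]:
    """Return index tuples that occur in more than one row.

    rows are TSV rows as dicts (as csv.DictReader yields them, so every row
    has the same keys); index_columns is the assumed "IndexColumns" value.
    """
    missing = [c for c in index_columns if rows and c not in rows[0]]
    if missing:
        raise KeyError(f"index columns not present in table: {missing}")
    counts = Counter(tuple(row[c] for c in index_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Made-up rows: one participant, one session, two runs, two days.
rows = [
    {"participant_id": "sub-01", "session_id": "ses-1", "run_id": "1", "day": "1", "score": "10"},
    {"participant_id": "sub-01", "session_id": "ses-1", "run_id": "2", "day": "1", "score": "12"},
    {"participant_id": "sub-01", "session_id": "ses-1", "run_id": "1", "day": "2", "score": "11"},
]
```

With only `["participant_id", "session_id"]` as the index, `("sub-01", "ses-1")` is reported as a duplicate; declaring all four columns makes every row unique, which is the joint-indexing case described above.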

* fix(bst): Address deprecation warnings (bids-standard#2361)

* chore: Set minimum dependencies, use dependency groups

* chore: Add tox.ini to run test suite

* chore(ci): Use tox to run bidsschematools CI

* chore: Update pre-commit excludes

* chore(tox): Enable -Werror to catch incoming warnings

* fix: Address pyparsing deprecation warning

* fix: Only opt in to pandas 3.0 behaviors if pandas is not 3+

* chore(dependabot): Update uv.locks quarterly

* chore(dependabot): Drop update frequencies to quarterly

* chore(ci): Use FORCE_COLOR, limit token permissions

* chore: Drop Python 3.9 support, test on 3.14

* chore: Add uv.lock for package, update base lock

* chore: Bump schema package to 1.2.2

* chore: Bump schema package to 1.2.3-dev

* fix(ci): Call pre-build script correctly

* [DATALAD RUNCMD] bash -c 'uv lock && (cd tools/schemacode...

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "bash -c 'uv lock && (cd tools/schemacode; uv lock)'",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

* fix: Bad merge

* chore(deps): bump the build-dependencies group across 1 directory with 5 updates (bids-standard#2370)

Bumps the build-dependencies group with 5 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [mkdocs-material](https://github.com/squidfunk/mkdocs-material) | `9.7.1` | `9.7.4` |
| [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) | `10.20` | `10.21` |
| [numpy](https://github.com/numpy/numpy) | `2.4.1` | `2.4.3` |
| [myst-nb](https://github.com/executablebooks/myst-nb) | `1.3.0` | `1.4.0` |
| [universal-pathlib](https://github.com/fsspec/universal_pathlib) | `0.3.8` | `0.3.10` |



Updates `mkdocs-material` from 9.7.1 to 9.7.4
- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](squidfunk/mkdocs-material@9.7.1...9.7.4)

Updates `pymdown-extensions` from 10.20 to 10.21
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](facelessuser/pymdown-extensions@10.20...10.21)

Updates `numpy` from 2.4.1 to 2.4.3
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/RELEASE_WALKTHROUGH.rst)
- [Commits](numpy/numpy@v2.4.1...v2.4.3)

Updates `myst-nb` from 1.3.0 to 1.4.0
- [Release notes](https://github.com/executablebooks/myst-nb/releases)
- [Changelog](https://github.com/executablebooks/MyST-NB/blob/main/CHANGELOG.md)
- [Commits](executablebooks/MyST-NB@v1.3.0...v1.4.0)

Updates `universal-pathlib` from 0.3.8 to 0.3.10
- [Release notes](https://github.com/fsspec/universal_pathlib/releases)
- [Changelog](https://github.com/fsspec/universal_pathlib/blob/main/CHANGELOG.md)
- [Commits](fsspec/universal_pathlib@v0.3.8...v0.3.10)

---
updated-dependencies:
- dependency-name: mkdocs-material
  dependency-version: 9.7.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: build-dependencies
- dependency-name: pymdown-extensions
  dependency-version: '10.21'
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: build-dependencies
- dependency-name: numpy
  dependency-version: 2.4.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: build-dependencies
- dependency-name: myst-nb
  dependency-version: 1.4.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: build-dependencies
- dependency-name: universal-pathlib
  dependency-version: 0.3.10
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: build-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump tornado from 6.5.2 to 6.5.5 (bids-standard#2360)

Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.5.2 to 6.5.5.
- [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst)
- [Commits](tornadoweb/tornado@v6.5.2...v6.5.5)

---
updated-dependencies:
- dependency-name: tornado
  dependency-version: 6.5.5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump minimatch from 9.0.5 to 9.0.9 (bids-standard#2355)

Bumps [minimatch](https://github.com/isaacs/minimatch) from 9.0.5 to 9.0.9.
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](isaacs/minimatch@v9.0.5...v9.0.9)

---
updated-dependencies:
- dependency-name: minimatch
  dependency-version: 9.0.9
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump actions/upload-artifact (bids-standard#2369)

Bumps the actions-infrastructure group with 1 update: [actions/upload-artifact](https://github.com/actions/upload-artifact).


Updates `actions/upload-artifact` from 6 to 7
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v6...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-infrastructure
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump prettier in the node-utilities group (bids-standard#2352)

Bumps the node-utilities group with 1 update: [prettier](https://github.com/prettier/prettier).


Updates `prettier` from 3.8.0 to 3.8.1
- [Release notes](https://github.com/prettier/prettier/releases)
- [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md)
- [Commits](prettier/prettier@3.8.0...3.8.1)

---
updated-dependencies:
- dependency-name: prettier
  dependency-version: 3.8.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: node-utilities
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump picomatch from 2.3.1 to 2.3.2

Bumps [picomatch](https://github.com/micromatch/picomatch) from 2.3.1 to 2.3.2.
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](micromatch/picomatch@2.3.1...2.3.2)

---
updated-dependencies:
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump requests from 2.32.5 to 2.33.0 in /tools/schemacode

Bumps [requests](https://github.com/psf/requests) from 2.32.5 to 2.33.0.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.32.5...v2.33.0)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.33.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix BIDS Starter Kit link in src/index.md (bids-standard#2374)

Updated link for the BIDS Starter Kit to the correct URL.

* chore(deps): bump yaml from 2.8.1 to 2.8.3

Bumps [yaml](https://github.com/eemeli/yaml) from 2.8.1 to 2.8.3.
- [Release notes](https://github.com/eemeli/yaml/releases)
- [Commits](eemeli/yaml@v2.8.1...v2.8.3)

---
updated-dependencies:
- dependency-name: yaml
  dependency-version: 2.8.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump brace-expansion from 2.0.2 to 2.0.3

Bumps [brace-expansion](https://github.com/juliangruber/brace-expansion) from 2.0.2 to 2.0.3.
- [Release notes](https://github.com/juliangruber/brace-expansion/releases)
- [Commits](juliangruber/brace-expansion@v2.0.2...v2.0.3)

---
updated-dependencies:
- dependency-name: brace-expansion
  dependency-version: 2.0.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump pygments from 2.19.2 to 2.20.0

Bumps [pygments](https://github.com/pygments/pygments) from 2.19.2 to 2.20.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](pygments/pygments@2.19.2...2.20.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-version: 2.20.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump pygments from 2.19.2 to 2.20.0 in /tools/schemacode

Bumps [pygments](https://github.com/pygments/pygments) from 2.19.2 to 2.20.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](pygments/pygments@2.19.2...2.20.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-version: 2.20.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump actions/download-artifact

Bumps the actions-infrastructure group with 1 update: [actions/download-artifact](https://github.com/actions/download-artifact).


Updates `actions/download-artifact` from 7 to 8
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@v7...v8)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-infrastructure
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump aiohttp from 3.13.3 to 3.13.4

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): bump lodash from 4.17.23 to 4.18.1

Bumps [lodash](https://github.com/lodash/lodash) from 4.17.23 to 4.18.1.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](lodash/lodash@4.17.23...4.18.1)

---
updated-dependencies:
- dependency-name: lodash
  dependency-version: 4.18.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(tox): Fix environment to avoid pre-release tests (bids-standard#2391)

* [pre-commit.ci] pre-commit autoupdate (bids-standard#2381)

updates:
- [github.com/python-jsonschema/check-jsonschema: 0.37.0 → 0.37.1](python-jsonschema/check-jsonschema@0.37.0...0.37.1)
- [github.com/codespell-project/codespell: v2.4.1 → v2.4.2](codespell-project/codespell@v2.4.1...v2.4.2)
- [github.com/pre-commit/mirrors-mypy: v1.19.1 → v1.20.0](pre-commit/mirrors-mypy@v1.19.1...v1.20.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [ENH] Link to example datasets page on the website (bids-standard#2364)

* [FIX] Accept MiscChannelCount in EEG/Motion sidecars; deprecate MISCChannelCount alias (bids-standard#2394)

* schema: accept MiscChannelCount in EEG sidecar; deprecate MISCChannelCount alias

The Recommended-fields rule for EEG sidecars uses MISCChannelCount, but
the validator check rule (MiscChannelCountReq), the spec example in
electroencephalography.md, the MEG and iEEG sidecar rules, and the
legacy validator's JSON schema all use MiscChannelCount. Datasets in the
wild already exist with both spellings.

Make MiscChannelCount the canonical recommended key (matching the rest
of the schema and the docs example) while keeping MISCChannelCount as a
deprecated alias so existing datasets continue to validate.

Closes bids-standard#2393.

* schema: apply MISCChannelCount deprecation to motion sidecar and glossary

Addresses review feedback on bids-standard#2394:

- Apply the same MiscChannelCount (recommended) + MISCChannelCount
  (deprecated) pattern to the motion sidecar rule, matching the EEG
  sidecar change.
- Simplify the EEG sidecar entry to the concise `deprecated` level
  (no per-rule addendum) now that the deprecation note lives with the
  field definition.
- Document the deprecation in the MISCChannelCount description in
  objects/metadata.yaml, so the glossary entry surfaces the canonical
  replacement wherever the field is referenced.

* Fix typos (bids-standard#2399)

* [FIX] Update OSIPI Task force link for ASL lexicon (bids-standard#2396)

* [ENH] Add phenotype and rawbids directories to "study" datasets (bids-standard#2191)

* Suggested modifications to directory layout of the bids-study DatasetType

* fixed minor typos

* removed bids prefix from the DatasetType and subdirs.

* changed rawdata to rawbids directory in the study dataset and updated description in the common principles

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed URL in the JSON example

* Update src/common-principles.md

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Chris Markiewicz <markiewicz@stanford.edu>
Co-authored-by: Julia-Katharina Pfarr <111446107+julia-pfarr@users.noreply.github.com>

* [ENH] Allow rawbids/ in derivative datasets for the raw BIDS source (bids-standard#2409)

PR bids-standard#2191 added rawbids/ to the study DatasetType but not to derivative,
even though the same convention -- "rawbids/ holds the raw BIDS dataset"
-- is equally applicable when a standalone derivative dataset includes
its raw source.

- Add rawbids as an optional opaque subdirectory of the derivative
  DatasetType in src/schema/rules/directories.yaml.
- Update the derivative example in common-principles.md to use rawbids/
  instead of sourcedata/raw/.
- Clarify that rawbids/ is reserved for raw BIDS datasets in both study
  and derivative cases, and that derivatives of derivatives MUST place
  their source derivative under sourcedata/ (not rawbids/).

Co-authored-by: Claude Code 2.1.116 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [ENH] Recommend controlled vocabulary for age Units, clarify that it can be overloaded (bids-standard#2400)

* Clarify that age Units could be overridden and refer to Units in common principles

* Add ISO 8601-based duration units for age and validate in schema

Document in common-principles.md that age Units MAY be overridden
to one of: year, month, week, day, hour, minute, or second (based on
ISO 8601 duration designators).  Add AgeUnits schema check rule that
validates participants.json age Units against the allowed set.

Co-Authored-By: Claude Code 2.1.110 / Claude Opus 4.6 <noreply@anthropic.com>

* Keep compatible (warning, not error) + simplify check

Co-authored-by: Chris Markiewicz <effigies@gmail.com>

* Remove duplicate specification of units in the Description

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

* Make check operate on participants.tsv not .json

Co-authored-by: Chris Markiewicz <effigies@gmail.com>

---------

Co-authored-by: Claude Code 2.1.110 / Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>

* [ENH] Allow for institutions to be listed as Authors (bids-standard#2397)

* Allow for institutions to be listed as Authors

It is not uncommon to have datasets which are truly an institutional effort -- from
data collection planning, acquisition, curation, harmonization etc, where it is a pipeline
to deliver high quality datasets.  For instance we have a number of such contributions from
Allen Institute(s) in DANDI archive.  In DANDI schema we allow for both Person and Organization
entries, and e.g. in https://github.com/dandisets/000020 we have

```yaml
contributor:
- affiliation:
  - identifier: https://ror.org/00dcv1019
    name: Allen Institute for Brain Science
    schemaKey: Affiliation
  email: nathang@alleninstitute.org
  identifier: 0000-0001-8429-4090
  includeInCitation: false
  name: Gouwens, Nathan
  roleName:
  - dcite:ContactPerson
  schemaKey: Person
- contactPoint: []
  identifier: https://ror.org/00dcv1019
  includeInCitation: true
  name: Allen Institute for Brain Science
  roleName: []
  schemaKey: Organization
  url: https://alleninstitute.org
```

so there is an Organization, which is actually the one to cite, although we do
not list any particular roleName, and then responsible ContactPerson (who is
not even listed as an Author) who could be contacted ATM (but might be a
different person later) about this dandiset.

I have tried to make the wording a bit more explicit than just listing Organizations
as a possible entry here, reserving it for large efforts.

* Shorten and generalize statement about Authors

Co-authored-by: Chris Markiewicz <effigies@gmail.com>

---------

Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Mark Mikkelsen <mark.mikkelsen@gmail.com>

* [FIX] Bump pymdown-extensions to >=10.21.2 to restore code-block rendering (bids-standard#2438)

* [FIX] Bump pymdown-extensions to >=10.21.2 to restore code-block rendering (bids-standard#2421)

Pygments 2.20.0 made `html.escape()` on the formatter's `filename`
option strict, raising `AttributeError` when the value is `None`.
pymdownx.highlight 10.21 (and earlier) passed `filename=title` where
`title` defaults to `None` for code blocks without a title, which
triggered the crash. pymdownx.superfences catches that exception and
silently falls back to inline-`<code>` rendering, so fenced code
blocks inside list items (e.g. the "Plain" examples in
common-principles.md) lost their `<pre>` formatting and showed the
language tag as literal text.

pymdown-extensions 10.21.2 fixes the upstream defect, so require it
(or newer) in both the docs build and bidsschematools[render].

Closes: bids-standard#2421

Co-Authored-By: Claude Code 2.1.160 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FIX] Regenerate tools/schemacode/uv.lock for pymdown-extensions bump

The previous commit updated tools/schemacode/pyproject.toml but
forgot to refresh its sibling uv.lock. The `latest` tox envs use
the `uv-venv-lock-runner` with `--locked`, so they failed `Setup
test suite` with the pyproject/lockfile mismatch.

Co-Authored-By: Claude Code 2.1.160 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Code 2.1.160 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Chris Markiewicz <markiewicz@stanford.edu>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Remi Gau <remi_gau@hotmail.com>
Co-authored-by: Bru <b.aristimunha@gmail.com>
Co-authored-by: Dimitri Papadopoulos Orfanos <3234522+DimitriPapadopoulos@users.noreply.github.com>
Co-authored-by: Kabilar Gunalan <kabi@mit.edu>
Co-authored-by: Nikhil Bhagwat <nikhil153@users.noreply.github.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Julia-Katharina Pfarr <111446107+julia-pfarr@users.noreply.github.com>
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Co-authored-by: Claude Code 2.1.116 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Mark Mikkelsen <mark.mikkelsen@gmail.com>
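The MiscChannelCount change merged above (bids-standard#2394) keeps MISCChannelCount as a deprecated alias. A tool reading EEG or motion sidecars might honor that with something like this Python sketch; the function name and the choice to emit a `DeprecationWarning` are illustrative assumptions, not bids-validator behavior.

```python
import warnings
from typing import Any, Optional

CANONICAL = "MiscChannelCount"
DEPRECATED_ALIAS = "MISCChannelCount"


def misc_channel_count(sidecar: dict[str, Any]) -> Optional[Any]:
    """Return the misc channel count from a sidecar dict, or None if absent.

    The canonical key wins if both spellings are present; the deprecated
    alias is still accepted so older datasets keep working.
    """
    if CANONICAL in sidecar:
        return sidecar[CANONICAL]
    if DEPRECATED_ALIAS in sidecar:
        warnings.warn(
            f"{DEPRECATED_ALIAS} is deprecated; use {CANONICAL}",
            DeprecationWarning,
            stacklevel=2,
        )
        return sidecar[DEPRECATED_ALIAS]
    return None
```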
@sappelhoff sappelhoff removed their request for review June 14, 2026 09:28
ericearl added 4 commits June 18, 2026 06:38
Completed draft text edits I planned to make for the BEP036 IndexColumns rewrite.
Fixed two broken references to other MD files in the repo. Thanks CI!
Fixed a typo found by codespell. Thanks codespell!

@SamGuay SamGuay left a comment

Member


There you go @ericearl !
(Do we need to wait for #2273 to make this one mergeable?)

tabular phenotypic data like the participants file, sessions file,
and phenotypic and assessment data.

They are recommendations and are by default ignored during validation.

Member


Would it be useful to mention that these recommendations are ignored by default only to maintain backward compatibility for previous datasets?

### 3. Add `MeasurementToolMetadata` to each tabular phenotypic measurement tool

Whenever possible, it is RECOMMENDED to add `MeasurementToolMetadata` to
each `phenotype/<measurement_tool_name>.json` data dictionary.
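As a rough illustration of this recommendation, the Python sketch below lists phenotype JSON data dictionaries that lack a top-level `MeasurementToolMetadata` key. It assumes a flat `phenotype/` directory, and `missing_measurement_tool_metadata` is a hypothetical helper; it only checks for the key's presence, not the contents of its `Description` or `TermURL` fields.

```python
import json
from pathlib import Path


def missing_measurement_tool_metadata(bids_root: Path) -> list[str]:
    """List phenotype/*.json data dictionaries without MeasurementToolMetadata."""
    missing = []
    # glob on a non-existent phenotype/ directory simply yields nothing.
    for sidecar in sorted((bids_root / "phenotype").glob("*.json")):
        with sidecar.open() as f:
            data = json.load(f)
        if "MeasurementToolMetadata" not in data:
            missing.append(sidecar.name)
    return missing
```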

Member


Suggested change
each `phenotype/<measurement_tool_name>.json` data dictionary.
each `phenotype/tool-<ToolName>_phenotype.json` data dictionary.

According to the file-naming template just above (L26), this should now use `tool-..._phenotype` instead of only `<measurement_tool_name>`.
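The suggested `tool-<ToolName>_phenotype.json` template can be checked with a regular expression. In this Python sketch, restricting `<ToolName>` to alphanumerics (like other BIDS labels) and accepting both `.tsv` and `.json` are assumptions; `tool_name` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumes <ToolName> follows the usual BIDS label rule (alphanumeric only).
PHENOTYPE_FILE = re.compile(r"^tool-(?P<tool>[a-zA-Z0-9]+)_phenotype\.(?:tsv|json)$")


def tool_name(filename: str) -> Optional[str]:
    """Return <ToolName> if filename matches the template, else None."""
    match = PHENOTYPE_FILE.match(filename)
    return match.group("tool") if match else None
```

Under these assumptions, the old freely named `bdi.json` would no longer match, while `tool-bdi_phenotype.json` yields `"bdi"`.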

Comment on lines +303 to +304
To read more about the guidelines for tabular phenotypic data and examples,
see the [tabular phenotypic data guidelines appendix](../appendices/phenotype.md).

@SamGuay SamGuay Jun 25, 2026

Member


Suggested change
To read more about the guidelines for tabular phenotypic data and examples,
see the [tabular phenotypic data guidelines appendix](../appendices/phenotype.md).

I don't think we need to repeat it one more time considering the last 3 paragraphs mention the same URL.

Comment on lines +72 to +78
sessions_tsv:
  description: 'Contents of /sessions.tsv, accessed by column header'
  type: object
  additionalProperties:
    type: array
    items:
      type: string
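The context entry above describes the file as an object mapping each column header to an array of strings. A minimal Python sketch producing that shape from TSV text follows; `load_tsv_columns` is a hypothetical name, and this is not how bids-validator actually loads files.

```python
import csv
import io


def load_tsv_columns(text: str) -> dict[str, list[str]]:
    """Parse TSV text into {column header: [cell strings, one per row]}.

    Duplicate header names would collide, so a real loader should reject them.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return {}  # empty file
    columns: dict[str, list[str]] = {name: [] for name in header}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"row has {len(row)} cells, header has {len(header)}")
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns
```

Every value stays a string, matching `items: type: string`; interpreting `n/a` or numbers is left to later checks.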

@SamGuay SamGuay Jun 25, 2026

Member


I might've missed this or simply forgot about the final decision, but are we moving forward with having a /sessions.tsv at the root level? (the rest of the changes don't appear to reflect that)

Comment on lines +43 to +57

AgeUnits:
  issue:
    code: AGE_UNITS
    message: |
      The "Units" value for age in 'participants.json' is not a valid
      ISO 8601-based duration unit.
      Allowed values are "year", "month", "week", "day", "hour", "minute",
      or "second".
    level: warning
  selectors:
    - path == '/participants.tsv'
    - '"Units" in sidecar.age'
  checks:
    - intersects([sidecar.age.Units], ["year", "month", "week", "day", "hour", "minute", "second"])
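For readers less familiar with the schema expression language, this Python sketch mirrors what the AgeUnits rule does. It is an illustration only: `check_age_units` is a hypothetical helper operating on an already-parsed `participants.json` dict, and the message format is made up.

```python
from typing import Optional

ALLOWED_AGE_UNITS = ("year", "month", "week", "day", "hour", "minute", "second")


def check_age_units(sidecar: dict) -> Optional[str]:
    """Return a warning message if age "Units" is not allowed, else None.

    Mirrors the rule's selectors: nothing is checked unless the
    participants.json sidecar has an "age" entry that declares "Units".
    """
    age = sidecar.get("age")
    if not isinstance(age, dict) or "Units" not in age:
        return None  # selectors do not match, so the rule does not apply
    units = age["Units"]
    # intersects([units], ALLOWED) is true exactly when units is in ALLOWED.
    if isinstance(units, str) and units in ALLOWED_AGE_UNITS:
        return None
    return f"AGE_UNITS (warning): {units!r} is not one of " + ", ".join(ALLOWED_AGE_UNITS)
```

Because the rule's level is `warning`, a value such as `"years"` would be flagged without invalidating the dataset.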

Member


Suggested change

duplicate


Labels

BEP enhancement New feature or request phenotype
