Add common media file definitions for BEP044/BEP047 by yarikoptic · Pull Request #2367 · bids-standard/bids-specification

yarikoptic · 2026-03-19T00:13:23Z

Summary

Shortcuts:

Rendered appendix section "Media files".

Two BEPs independently define audio/video media file support with significant overlap:

BEP044 (Stimuli, [ENH] BEP044 - Stim-BIDS #2022 by @neuromechanist @bids-standard/bep044 ): media files as experimental stimuli in /stimuli/
BEP047 (Behavioral A/V, [ENH] BEP047 - Add audio/video recordings to behavioral experiments #2231 by @bendichter ): media files as behavioral recordings in beh/

Both need overlapping suffixes (audio, video), file extensions (.mp4, .wav, …), and similar technical metadata. Rather than define these independently in each BEP — risking inconsistencies and merge conflicts — this PR extracts the common foundation that both can build upon.

Following my own recommendation in [FOUNDATIONAL] Facilitate small atomic enhancements #371, I propose that we agree on media files independently of the rest of the naming and placement rules in those BEPs.

For those who want an "anecdotal" argument, here is a loose translation of one from Russian:

The French and British were planning the Channel Tunnel and looking for contractors. The Americans proposed digging from both sides, promising to meet in the middle with a maximum 15-meter margin of error. Time: two years. The Japanese agreed to the same plan, but guaranteed a 5-meter accuracy within one year. Then a Russian contractor walks in and says, “We’ll dig from both sides. Two weeks. No guarantees, but in the worst-case scenario, you’ll end up with two tunnels...”

Kudos to @vmdocua for reminding me of this one.

What this PR adds

| Component | Details |
| --- | --- |
| Suffixes | `audio`, `video`, `audiovideo`, `image` |
| Extensions | `.wav`, `.mp3`, `.aac`, `.ogg`, `.mp4`, `.avi`, `.mkv`, `.webm`, `.jpg`, `.png`, `.svg`, `.webp`, `.tif`, `.tiff` |
| Metadata fields | `RecordingDuration`, `AudioCodec`, `AudioCodecRFC6381`, `AudioSampleRate`, `AudioChannelCount`, `VideoCodec`, `VideoCodecRFC6381`, `VideoFrameRate`, `VideoFrameCount`, `ImageWidth`, `ImageHeight`, `ImagePixelFormat`, `ImageBitDepth` |
| Sidecar rules | `rules/sidecars/media.yaml`: suffix-based rules that auto-apply to any datatype using these suffixes |
| Appendix | `appendices/media-files.md`: supported formats, codec identification (FFmpeg + RFC 6381), privacy considerations, example JSON |

All extension lists, suffix lists, and sidecar field tables in the appendix are rendered from the schema via macros (make_suffix_table, make_extension_table, make_sidecar_table) — no hand-maintained duplication.

Full summary of video formats used in the DANDI archive (likely all behavioral recordings, as in BEP047), counted per dandiset: 2 with .mov, 20 with .avi, 4 with .mkv, 51 with .mp4, none with .webm.

dandi@drogon:/mnt/backup/dandi/dandiset-manifests$ for ext in mov avi mkv mp4 webm; do echo "=== $ext" ; grep -c -E "\.$ext\$" 00*/draft/assets.yaml | grep -v ':0$' | nl; done
=== mov
     1	001538/draft/assets.yaml:88
     2	001613/draft/assets.yaml:1
=== avi
     1	000360/draft/assets.yaml:1
     2	000540/draft/assets.yaml:495
     3	000559/draft/assets.yaml:2483
     4	000568/draft/assets.yaml:82
     5	000576/draft/assets.yaml:9
     6	000624/draft/assets.yaml:19
     7	000691/draft/assets.yaml:1
     8	000727/draft/assets.yaml:9
     9	000892/draft/assets.yaml:336
    10	001084/draft/assets.yaml:64
    11	001172/draft/assets.yaml:16
    12	001190/draft/assets.yaml:5
    13	001432/draft/assets.yaml:46
    14	001457/draft/assets.yaml:111
    15	001509/draft/assets.yaml:11
    16	001530/draft/assets.yaml:35
    17	001564/draft/assets.yaml:111
    18	001613/draft/assets.yaml:4
    19	001699/draft/assets.yaml:54
    20	001700/draft/assets.yaml:204
=== mkv
     1	000167/draft/assets.yaml:1
     2	000231/draft/assets.yaml:115
     3	000689/draft/assets.yaml:5
     4	001457/draft/assets.yaml:15
=== mp4
     1	000409/draft/assets.yaml:1130
     2	000578/draft/assets.yaml:31
     3	000689/draft/assets.yaml:27
     4	000720/draft/assets.yaml:360
     5	000779/draft/assets.yaml:794
     6	000780/draft/assets.yaml:400
     7	000781/draft/assets.yaml:319
     8	000782/draft/assets.yaml:319
     9	000792/draft/assets.yaml:360
    10	000793/draft/assets.yaml:360
    11	000800/draft/assets.yaml:40
    12	000801/draft/assets.yaml:40
    13	000802/draft/assets.yaml:40
    14	000803/draft/assets.yaml:40
    15	000804/draft/assets.yaml:40
    16	000805/draft/assets.yaml:40
    17	000806/draft/assets.yaml:40
    18	000807/draft/assets.yaml:40
    19	000830/draft/assets.yaml:264
    20	000831/draft/assets.yaml:264
    21	000832/draft/assets.yaml:105
    22	000833/draft/assets.yaml:105
    23	000862/draft/assets.yaml:18
    24	000863/draft/assets.yaml:18
    25	000866/draft/assets.yaml:256
    26	000867/draft/assets.yaml:256
    27	000951/draft/assets.yaml:459
    28	001180/draft/assets.yaml:2
    29	001190/draft/assets.yaml:3
    30	001195/draft/assets.yaml:41
    31	001259/draft/assets.yaml:125
    32	001265/draft/assets.yaml:1
    33	001343/draft/assets.yaml:7
    34	001413/draft/assets.yaml:1
    35	001425/draft/assets.yaml:1343
    36	001454/draft/assets.yaml:77
    37	001471/draft/assets.yaml:598
    38	001528/draft/assets.yaml:32
    39	001538/draft/assets.yaml:4
    40	001608/draft/assets.yaml:6
    41	001613/draft/assets.yaml:4
    42	001617/draft/assets.yaml:317
    43	001702/draft/assets.yaml:175
    44	001711/draft/assets.yaml:5
    45	001712/draft/assets.yaml:6
    46	001713/draft/assets.yaml:3
    47	001749/draft/assets.yaml:52
    48	001757/draft/assets.yaml:2
    49	001771/draft/assets.yaml:36
    50	001772/draft/assets.yaml:1
    51	001782/draft/assets.yaml:56
=== webm

Design decisions

Suffix-only selectors in sidecar rules (no datatype constraint), so they automatically apply to both stimuli and beh datatypes without duplication
FFmpeg codec names as RECOMMENDED convention — de facto standard in scientific computing, auto-extractable via ffprobe
RFC 6381 codec strings as OPTIONAL — for web/broadcast interoperability, provided as separate fields since the mapping from FFmpeg names is one-to-many (e.g., h264 → multiple profile/level strings)
Family prefixes (Audio*, Video*, Image*) on all generic metadata terms — VideoFrameRate rather than FrameRate, ImageWidth rather than Width — to align with the rest of the BIDS schema and to disambiguate from non-media meanings (microscopy fields-of-view, physical sizes, etc.)
ImageBitDepth and ImagePixelFormat may coexist — when both are present the bit-depth encoded in pix_fmt and the explicit ImageBitDepth integer MUST agree. The integer is the more directly discoverable summary, the pix_fmt string is the authoritative source of truth for FFmpeg-readable files.
Descriptions are context-neutral — not tied to "behavioral" or "stimulus" use cases
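
To illustrate the "auto-extractable via ffprobe" point in the design decisions above, here is a minimal Python sketch that maps ffprobe's JSON report onto the proposed sidecar fields. The helper names (`probe`, `sidecar_from_ffprobe`) are hypothetical, and treating `r_frame_rate` as the "nominal" frame rate is an assumption made for illustration; this is not tooling shipped with the PR.

```python
import json
import subprocess
from fractions import Fraction


def probe(path):
    """Run ffprobe (must be on PATH) and return its parsed JSON report."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def sidecar_from_ffprobe(report):
    """Map an ffprobe report (dict) to the sidecar fields proposed here.

    Only the first audio stream and the first video stream are used.
    """
    sidecar = {}
    duration = report.get("format", {}).get("duration")
    if duration is not None:
        sidecar["RecordingDuration"] = float(duration)
    for stream in report.get("streams", []):
        kind = stream.get("codec_type")
        if kind == "audio" and "AudioCodec" not in sidecar:
            sidecar["AudioCodec"] = stream["codec_name"]
            sidecar["AudioSampleRate"] = int(stream["sample_rate"])
            sidecar["AudioChannelCount"] = int(stream["channels"])
        elif kind == "video" and "VideoCodec" not in sidecar:
            sidecar["VideoCodec"] = stream["codec_name"]
            # Assumption: r_frame_rate stands in for the "nominal" rate.
            sidecar["VideoFrameRate"] = float(Fraction(stream["r_frame_rate"]))
            if "nb_frames" in stream:  # not every container reports it
                sidecar["VideoFrameCount"] = int(stream["nb_frames"])
            sidecar["ImageWidth"] = stream["width"]
            sidecar["ImageHeight"] = stream["height"]
            if "pix_fmt" in stream:
                sidecar["ImagePixelFormat"] = stream["pix_fmt"]
    return sidecar
```

Typical use would be `sidecar_from_ffprobe(probe("sub-01_task-rest_video.mp4"))` (the filename is made up), followed by `json.dumps` to write the sidecar. Keeping the mapping separate from the subprocess call makes it testable without FFmpeg installed.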

Review feedback addressed

Click for details — most review threads have been resolved by commits on this branch

Schema-driven rendering of suffix / extension / sidecar tables (no hand-maintained duplication) — addresses @neuromechanist #discussion_r2972405564, #discussion_r2972406839.
photo suffix relationship section introduced; _image is positioned as the broader future generalization with eventual migration tooling — addresses the discussion with @effigies (#issuecomment-4106815099).
Variable frame rate clarification added to VideoFrameRate.description — addresses part of @h-mayorquin #discussion_r2989707902. Optional VariableFrameRate boolean still open (see TODOs).
VideoFrameCount added (RECOMMENDED) — addresses @h-mayorquin #discussion_r2989982185.
ImageWidth / ImageHeight description spells out "number of columns/rows in the stored pixel grid as captured, without applying any orientation correction (for example, the EXIF Orientation tag)" — addresses @h-mayorquin #discussion_r2989971312 and follow-up with @bendichter.
ImagePixelFormat (FFmpeg pix_fmt, OPTIONAL) added under MediaImageProperties — handles color model, channel count, chroma subsampling, and bit depth in one field for any FFmpeg-readable file. Addresses @h-mayorquin #discussion_r2989988246 and @bendichter #discussion_r3348715711.
ImageBitDepth (OPTIONAL) added — discoverable integer summary for image-only sidecars whose producing tools don't naturally surface pix_fmt (PIL L/RGB/RGBA are implicitly 8-bit-per-channel). Addresses @h-mayorquin #discussion_r2989972681 and @CodyCBakerPhD's image-vs-video tension on #discussion_r3349547943.
Family-prefix harmonization (MediaVisualProperties → MediaImageProperties; Width/Height/FrameRate → ImageWidth/ImageHeight/VideoFrameRate) — addresses @yarikoptic #discussion_r3349532158, #discussion_r3349547943.
"openness" and "prevalence in the domain of application" added to format-choice considerations — addresses the long discussion on #discussion_r2990014896.

What each BEP would then add on top

BEP044: file rules under stimuli datatype, provenance metadata (license, copyright, URL), stimulus-specific entities
BEP047: file rules under beh datatype, device metadata, behavioral entities (task, recording, split)

And both would get the common media.yaml sidecar rules for free.

Relation to existing PRs

This branch is based on master and is intentionally independent of both BEP PRs. The recent merge from master keeps the branch current.

I can furnish PRs for that once we agree that this is a reasonable (even if not final) common ground! We can refine this further until satisfied, and then the first BEP to be accepted would "drag" this PR in as well.

Alternatively, we could keep those PRs separate from this one until we finalize it, to simplify review of both BEPs by separating "what are media files in BIDS" from "how does datatype X use them."

CC: @bids-standard/bep044 @ree-gupta @neuromechanist @Remi-Gau @effigies @talmo — feedback welcome from both BEP teams and maintainers.

Test plan

Completed:

All YAML files parse correctly
Schema tests pass (tools/schemacode pytest)
Pre-commit hooks pass (yamllint, prettier, codespell, embedded-JSON check)
mkdocs serve renders appendix correctly
Initial review by BEP044 — @neuromechanist approved in principle (#pullrequestreview-3988878351)
Active review by BEP047 — @bendichter engaged with multiple suggestions, several adopted as commits

Remaining (non-blocking unless flagged):

Critical, pending decision: explicit VariableFrameRate: boolean field — currently the nominal-rate convention is documented in the VideoFrameRate description; @h-mayorquin's original ask for a separate boolean flag is still open (#discussion_r2989707902).
Open thread: whether to move the "Privacy considerations" section to BEP047 instead of keeping it here (#discussion_r3348549380). Current stance: keep it here as generic guidance applicable to both stimuli and behavioral recordings.
Verify correspondence between FFmpeg codec names and RFC 6381 strings in the "Common codec reference" table — the values shown are representative examples and should be spot-checked against authoritative sources.
Consider whether .mjpeg should be added for pose-estimation training snapshots (@talmo, Motion trajectories & pose extracted data for animal beh research #2057).
Final sign-off from BEP044 and BEP047 teams confirming the shared definitions are sufficient as a foundation.
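
For the codec-correspondence item in the list above, the one-to-many mapping from a single FFmpeg name can be made concrete for H.264. RFC 6381 writes AVC codecs as `avc1.PPCCLL`, where the three hex bytes are `profile_idc`, the constraint-flags byte, and `level_idc`. The sketch below is illustrative only; `avc1_codec_string` is a hypothetical helper, not part of the schema or of FFmpeg.

```python
def avc1_codec_string(profile_idc, constraint_flags, level_idc):
    """Build an RFC 6381 'avc1.PPCCLL' string from three H.264 header bytes."""
    for value in (profile_idc, constraint_flags, level_idc):
        if not 0 <= value <= 0xFF:
            raise ValueError("each component must fit in one byte")
    return f"avc1.{profile_idc:02X}{constraint_flags:02X}{level_idc:02X}"


# One FFmpeg codec name ("h264"), several RFC 6381 strings:
examples = {
    "Constrained Baseline, level 3.0": avc1_codec_string(66, 0xE0, 30),
    "Main, level 3.0": avc1_codec_string(77, 0x40, 30),
    "High, level 3.1": avc1_codec_string(100, 0x00, 31),
}
```

These three calls yield `avc1.42E01E`, `avc1.4D401E`, and `avc1.64001F`, which is why `VideoCodec` and `VideoCodecRFC6381` are kept as separate fields rather than derived from each other.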

Deferred (out of scope, recorded for follow-up):

Optional EXIF-style Orientation field for images — would be a follow-up if there's demand.
Color-mode controlled vocabulary beyond ImagePixelFormat (PIL-style ColorMode, free-form AudioVideoAcquisitionDescription) — discussion concluded these belong to a larger "acquisition notes" topic outside this PR.
Provenance / licensing metadata (copyright, license, URL) — to be handled in BEP044's stimulus-specific extension rather than the common foundation.

🤖 PR description refreshed with Claude Code.

…decar rules)

Introduce shared media file infrastructure for BEP044 (stimuli) and BEP047 (behavioral A/V). Both BEPs need overlapping audio/video/image support, so this extracts the common foundation:

- Suffixes: audio, video, audiovideo, image
- Extensions: .wav, .mp3, .aac, .ogg, .mp4, .avi, .mkv, .webm, .svg, .webp, .tiff
- Metadata: Duration, FrameRate, Width, Height, AudioChannelCount, AudioSampleRate, VideoCodec, AudioCodec, VideoCodecRFC6381, AudioCodecRFC6381
- Sidecar rules (media.yaml): suffix-based rules that auto-apply to any datatype
- Appendix (media-files.md): formats, codec identification, privacy, examples

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

Add spaces between pipes and dashes in all separator rows (e.g., `| --- |` instead of `|---|`) to satisfy the remark-lint table-cell-padding rule. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

effigies

I approve in principle. Previous file type additions have seemed fine to me but gotten pushback, so I don't have a clear notion of what it takes to accept a (new) media file type.

yarikoptic · 2026-03-19T19:56:42Z

Previous file type additions have seemed fine to me but gotten pushback, so I don't have a clear notion of what it takes to accept a (new) media file type.

I tried to search for what you might have in mind here but failed; could you please elaborate?

effigies · 2026-03-19T21:10:48Z

The main ones that come to mind:

[ENH] Allow MED format for iEEG data (*_ieeg.medd/) #1956 (reverted in Revert "[ENH] Allow MED format for iEEG data (*_ieeg.medd/) (#1956)" #2122)
Add .nwb as supported format for EEG #2111

yarikoptic · 2026-03-20T00:25:49Z

Ah, those beasts! Gotcha. I think the situation here is different since we are talking about commodity formats, but it brought me into the realm of a different 'conflict': we already have photo

So would it be better to align with that while having image here? (image is better suited, since a stimulus or even a behavior-capture sketch is not necessarily a photo)

Replace the newly added `Duration` metadata field with the existing `RecordingDuration` field, which already has the same semantics ("length of the recording in seconds") and unit. This avoids introducing a near-duplicate field for media files. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

Add a note in the appendix explaining why AudioSampleRate is used instead of the existing SamplingFrequency: audio-video containers need to distinguish the audio sampling rate from the video frame rate, so the Audio prefix is necessary for multi-stream files. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

The existing photo suffix rules use .tif, so document both .tif and .tiff as valid TIFF extensions for image contexts. This ensures consistency when BEPs define file rules for the image suffix. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

Add a section explaining that the media file definitions generalize all media in BIDS. The existing photo suffix covers a narrower use case (still images in electrophysiology/microscopy) and predates this framework. A "photo" could equally be a video with narration, an audio description, or a drawing. The media suffixes should be adopted for new datatypes, and a future proposal may deprecate photo in favor of the broader image suffix with migration tooling. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

yarikoptic · 2026-03-21T16:35:38Z

OK, pushed some commits which I think bring it very close to a reviewable state. Review the commits, but the 'major' one adds the relationship to the 'photo' suffix we already have (rendered shortcut). I think it would be worth a separate PR to introduce that migration if we do proceed with "media files", and it would potentially establish media files even before 044/047. WDYT? IMHO it would make a lot of sense, since it could really be not just a photo but a sketch, a video, or a dictaphone recording: any media associated with data acquisition to describe locations etc.

effigies · 2026-03-22T19:15:17Z

Well, I guess it's a question of whether we care what the subject of the image is. _image tells me that it is an image, _photo tells me that it is a photograph (and asllabeling is a diagram of the ASL labeling protocol). _image definitely loses information here.

To compare to another collection of cases in BIDS, single-volume EPI images may have suffix sbref, m0scan or epi depending on the context of their acquisition or intended use.

It leads me to wonder: would it make more sense to treat this as a discussion of permissible formats, codecs, and common metadata, but leave the suffixes up to the BEP? I think audio/video/image/audiovideo could be completely reasonable for one context (under stimuli, they're quite clear), but too generic for another.

neuromechanist

LGTM in principle and +1 for implementing items with shared interest more atomically.
IMO, the tables and requirement levels could benefit from being pulled from the schema.

I bet Claude can figure out the minimal changes to the schema needed to make such implementation.

yarikoptic · 2026-03-23T16:16:34Z

I think audio/video/image/audiovideo could be completely reasonable for one context (under stimuli, they're quite clear), but too generic for another.

I am yet to think about it more, in particular

_photo is more specific than _image... could/should photo be allowed as a more specific 'subclass' of image? But then we are jumping into a potentially huge extra ontology (could have drawing, schematic, diagram) without clear boundaries and with potentially non-orthogonal descriptions. A photo could be an image of a schematic, and what really matters is that it is a schematic, not that it is a photo.
asllabeling, which we describe as "A deidentified screenshot of the planning of the labeling slab/plane ...". At first thought, it is a very nice description of the underlying content of that image (with a little overspecification in the description that it is a "screenshot"). And it seems very similar in purpose to where _photo is used to capture the locations of EEG etc. electrodes, or overall "I have this image of something which describes what I do not have a standard form to describe ATM, but might deduce later by looking at this image!". In other words, in those two specific applications, it was to capture provenance in image form (a photo, or a "screenshot" which might be a photo or a PrtSc capture). Moreover, I bet many other modalities could/would need similar ones for similar or related needs (e.g. thinking about @bids-standard/bep037 ATM). I am wondering if they would be a better fit for some dedicated and generic entity to annotate with for that reason.

but

Someone's "behavior" could be someone else's "stimuli". Think about movies ;) This reminded me of David Leopold's experiments with monkeys watching freely behaving monkeys. Hence what matters is the "content", not the "purpose" of use.
Those examples you brought up are IMHO specific parametrizations/samples of less specific instrumentation. To bring it into the media domain: I might have proposed 360audiovideo or 360image for video/image recordings from cams allowing for such acquisitions (yet to figure out how to play my hang-gliding video, damn it). They would still be video and audio, but with specific instrumental characteristics worth highlighting. But all of those suffixes are more about describing the content, not the purpose of those files' use (e.g. T1w could be used for so many purposes).
This is somewhat similar to the recent refactoring from "IntendedFor" to "B0Field": again, decoupling the purpose ("what for") from the description of the characteristics that make it appropriate for a specific use (e.g. fieldmap correction vs. assessment of distortions overall for QA or alike).

So, overall, I feel that those counter-examples are valuable to consider and relate to, but I feel that we still might want to separate the description of "data content" from "purpose" (stimuli vs. capture of beh; description of an instrumentation setup as an appendix, or just not expressible in machine-readable form) here, and hence, overall, I still stand by this PR for media files.

@neuromechanist

Replace hand-written metadata tables with MACROS___make_sidecar_table() calls that pull field names, requirement levels, types, and descriptions directly from the schema (rules/sidecars/media.yaml + objects/metadata.yaml). This eliminates duplication between the appendix prose and the schema, addressing review feedback from @neuromechanist and @effigies. The suffix applicability is noted in prose above each table since the existing macro does not render a "Suffix" column. Format/extension tables remain as manual markdown since no macro exists for that layout. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

Add MACROS___make_suffix_table() call in the introduction to render the audio, video, audiovideo, and image suffix definitions directly from the schema, keeping the appendix in sync with suffixes.yaml. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

Add MACROS___make_extension_table() that renders a table of file extensions from the schema (objects/extensions.yaml), with columns for format name, extension (linked to glossary), and description. Replace the 3 hand-written format tables in media-files.md (audio, video, image) with macro calls, eliminating duplication between the appendix prose and extensions.yaml. Other spec files with similar hand-written extension tables (EEG, iEEG, EMG, MEG appendix) can adopt this macro in follow-up PRs. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

Test that the macro correctly renders extension information from the schema, including display names, extension values, glossary links, and proper table structure. Follows the same pattern as existing render table tests. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

h-mayorquin

@bendichter asked me to take a look since I have been working with both images and video on the NWB side. I hope my review is useful.

The proposal looks good to me and I think it covers the basics very well.

I have one main suggestion about how to specify the resolution (clarifying Width and Height definitions, particularly for images where the convention is less established than for video) and some minor suggestions about adding extra metadata fields to the sidecars that could be useful for scientific reuse: bit depth, color channels, variable frame rate handling, and frame count. I also suggest including an openness angle in the recommendation for video containers.

There are other concerns I considered but think are too niche for the BIDS proposal: keyframe interval (which determines random access performance for inter-frame codecs), moov atom placement for MP4 and Cues placement for WebM/MKV (which determine whether a file is efficiently streamable over HTTP), and color spaces and gamma correction for images (which would matter for researchers who need precise physical representation of color in their data). I think those can be deferred and dealt with later.

h-mayorquin · 2026-03-25T17:03:45Z

+| Field               | Suffix                | Requirement Level |
+| ------------------- | --------------------- | ----------------- |
+| `VideoCodec`        | `video`, `audiovideo` | RECOMMENDED       |
+| `FrameRate`         | `video`, `audiovideo` | RECOMMENDED       |


The proposal includes FrameRate as a recommended field, but it should clarify how to handle variable frame rate (VFR) video. With constant frame rate, a single number is sufficient and any frame's timestamp can be computed as frame_number / frame_rate. With VFR, that arithmetic breaks down and each frame needs an explicit timestamp to be aligned with data on other recordings.

The spec should indicate whether FrameRate is expected to be the average rate, the nominal rate, or undefined for VFR files, and whether a boolean field like VariableFrameRate should accompany it so that downstream tools know they cannot rely on uniform spacing.

Partial progress: the field is now VideoFrameRate (renamed in a5b7aea for prefix consistency) and its description says "For variable rate videos, this value should be the nominal frame rate." (be841b7, line-wrapped in aba8721). Still open from your original ask: a separate VariableFrameRate: boolean flag so downstream tools can short-circuit without parsing the description. Do you think the nominal-rate convention alone is sufficient, or do you still want the explicit boolean? If the latter, happy to add it.

I think it's nice to have both the approximate framerate and the VariableFrameRate: bool for that case
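
The arithmetic discussed in this thread can be sketched in a few lines of Python. With a constant frame rate, timestamps follow from `frame_number / frame_rate`; with variable frame rate, explicit per-frame timestamps are needed, and a tool could derive a `VariableFrameRate`-style flag from them. Both function names and the tolerance value are hypothetical choices for illustration, not anything defined by the PR.

```python
def cfr_timestamps(frame_count, frame_rate):
    """Timestamps (s) under the constant-frame-rate assumption: n / rate."""
    return [n / frame_rate for n in range(frame_count)]


def is_variable_frame_rate(timestamps, rel_tol=1e-3):
    """Return True if inter-frame intervals differ by more than rel_tol.

    `timestamps` are explicit per-frame presentation times in seconds,
    e.g. as exported from the container. `rel_tol` is an arbitrary
    illustrative threshold relative to the longest interval.
    """
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(intervals) < 2:
        return False  # too few frames to observe any variation
    shortest, longest = min(intervals), max(intervals)
    return (longest - shortest) > rel_tol * longest
```

A downstream tool that sees the flag set would know not to call `cfr_timestamps` and to look for explicit timestamps instead.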

Co-authored-by: Ben Dichter <ben.dichter@gmail.com>

@h-mayorquin

Per PR review discussion:

- Width and Height descriptions now explicitly state they correspond to the number of columns and rows in the stored pixel grid as captured, without applying any orientation correction (for example, EXIF Orientation tag). Addresses thread r2989971312 and yarikoptic's follow-up r3349597959.
- Add PixelFormat (OPTIONAL) under MediaVideoProperties, using FFmpeg's pix_fmt string. A single value encodes color model, channel count, chroma subsampling, and bit depth, and is auto-extractable via ffprobe. Addresses thread r2989988246 (proposed by @h-mayorquin, refined by @bendichter in r3348715711; OPTIONAL per project convention rather than RECOMMENDED).

Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>
Co-authored-by: Ben Dichter <ben.dichter@gmail.com>
Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>

Following PR review discussion on naming consistency (yarikoptic's analysis in r3349532158 and r3349547943):

- Rename FrameRate -> VideoFrameRate to match the existing Video* / Audio* prefix convention in MediaVideoProperties / MediaAudioProperties.
- Add VideoFrameCount (RECOMMENDED) under MediaVideoProperties. Required for variable frame rate video where the count cannot be derived from VideoFrameRate * RecordingDuration; useful as an integrity check otherwise. Addresses thread r2989982185.
- Rename MediaVisualProperties -> MediaImageProperties and rename Width/Height to ImageWidth/ImageHeight. The Image prefix disambiguates these pixel-grid dimensions from other notions of width/height (for example, physical object sizes in microscopy, field-of-view extents) and aligns with the schema-wide convention of family prefixes for generic terms. Example JSON in the appendix updated accordingly.

Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>
Co-authored-by: Cody Baker <CodyCBakerPhD@gmail.com>
Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>
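
The "integrity check" use of VideoFrameCount mentioned in this commit could look roughly like the Python sketch below. The function name and the one-frame tolerance are assumptions for illustration; the check is only meaningful for constant-frame-rate video, since for variable-rate video the product of rate and duration is not expected to match.

```python
def frame_count_is_consistent(sidecar, tolerance_frames=1):
    """Compare VideoFrameCount with VideoFrameRate * RecordingDuration.

    Returns True/False for the comparison, or None when one of the
    three fields is missing from the sidecar dict.
    """
    try:
        expected = sidecar["VideoFrameRate"] * sidecar["RecordingDuration"]
        actual = sidecar["VideoFrameCount"]
    except KeyError:
        return None
    return abs(actual - expected) <= tolerance_frames
```

A validator might report a `False` result as a warning rather than an error, because container durations are often rounded.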

The pixel format (FFmpeg's pix_fmt) applies equally to single images and video frames: ffprobe reports it for both, and the encoded information (color model, channel count, chroma subsampling, bit depth) is the same concept in either case. Move the field out of MediaVideoProperties into MediaImageProperties (which already covers image, video, and audiovideo), and rename with the Image prefix to match the rest of the group (ImageWidth, ImageHeight). Description broadened from "video stream" to "video frame or image". Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

@CodyCBakerPhD

Per PR review discussion on thread r2989972681 (h-mayorquin proposed, @CodyCBakerPhD raised the image-vs-video tension on r3349547943): ImagePixelFormat (FFmpeg pix_fmt) deterministically encodes bit depth for any FFmpeg-readable file, so ImageBitDepth is redundant for video. However:

- Common PIL modes (L, RGB, RGBA, P, ...) are implicitly 8-bit-per-channel and do not encode bit depth in the mode name. Image-domain tooling (Pillow, libtiff, PNG library) surfaces bit depth as a first-class integer rather than as part of a pix_fmt string.
- For image-only sidecars whose producing tools do not naturally go through FFmpeg, ImagePixelFormat may be absent and bit depth is the only color-precision field available.
- An integer is more directly discoverable for the typical researcher than the FFmpeg pix_fmt naming convention.

Added as OPTIONAL with an explicit "redundant with ImagePixelFormat when both present; the two MUST agree" note in the description, so the redundancy is acknowledged rather than hidden.

Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>
Co-authored-by: Cody Baker <CodyCBakerPhD@gmail.com>
Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>
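
The "the two MUST agree" rule from this commit can be sketched as a small Python check. The lookup table covers only a handful of FFmpeg pix_fmt names and is an illustrative subset, the function name is hypothetical, and the sketch assumes ImageBitDepth means bits per channel; none of this is the validator's actual implementation.

```python
# Bits per channel for a few FFmpeg pix_fmt names (illustrative subset).
PIX_FMT_BIT_DEPTH = {
    "gray": 8,
    "rgb24": 8,
    "rgba": 8,
    "yuv420p": 8,
    "yuv420p10le": 10,
    "gray16le": 16,
    "rgb48le": 16,
}


def bit_depth_agrees(sidecar):
    """Check ImageBitDepth against the depth implied by ImagePixelFormat.

    Returns None when there is nothing to compare: either field is
    absent, or the pixel format is not in the small table above.
    """
    pix_fmt = sidecar.get("ImagePixelFormat")
    depth = sidecar.get("ImageBitDepth")
    if pix_fmt is None or depth is None:
        return None
    implied = PIX_FMT_BIT_DEPTH.get(pix_fmt)
    if implied is None:
        return None
    return implied == depth
```

Returning `None` for image-only sidecars that carry just ImageBitDepth matches the case described above, where ImagePixelFormat is absent and the integer is the only color-precision field.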

codecov · 2026-06-03T16:42:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.07%. Comparing base (d20ee46) to head (f49c5d9).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2367   +/-   ##
=======================================
  Coverage   83.07%   83.07%           
=======================================
  Files          22       22           
  Lines        1696     1696           
=======================================
  Hits         1409     1409           
  Misses        287      287


yarikoptic · 2026-06-03T17:03:57Z

I think we have all largely converged. We had an extended interactive session with @bendichter and Claude Code to agree to agree; we pushed changes, merged master, updated the PR description to reflect the current state, and reinvited reviewers! Overall, I think we are potentially done here, as fine-tuning could even be done in subsequent PRs ;)

CodyCBakerPhD · 2026-06-03T23:12:59Z

So this is just a pre-PR that can be reviewed/accepted/merged independently of the BEP process, right? Then 44/47 re-use these data types when defining their modalities and modality metadata?

The latest reading of the PR content LGTM

One final note I will make is that 044 for stimuli may need to expand AudioChannelCount with greater detail of channel assignments for some of the currently proposed audio file types that support surround sound; e.g., wav with extensible header. The current data typing just says 'integer' but would only be meaningful for 1 and 2 as described in that definition - I don't think behavior will need this however, so probably best to do in 044 (multi-microphone setups for behavioral audio recordings have usually used direct-to-array setups to get around this)

Open thread: whether to move the "Privacy considerations" section to BEP047 instead of keeping it here (#discussion_r3348549380). Current stance: keep it here as generic guidance applicable to both stimuli and behavioral recordings.

I cannot find this discussion from the ID given; I'd also be in favor of keeping it generic

Consider whether .mjpeg should be added for pose-estimation training snapshots (@talmo, #2057).

We've been primarily referring to source videos and frame indices through such training label data (either in .nwb/.slp) since this more naturally fits the common SLEAP use case for doing end-to-end pose estimation on the subjects so I don't personally think we need a specific .mjpeg suffix - if desired, can be done just as easily in a .avi file even if not contiguous/smooth in time

yarikoptic · 2026-06-04T01:18:24Z

So this is just a pre-PR that can be reviewed/accepted/merged independently of the BEP process, right? Then 44/47 re-use these data types when defining their modalities and modality metadata?

Yes, that's the idea! They could also potentially just merge this branch into theirs, but that would only add burden to the review of those BEPs, since it would duplicate information and potentially lead to divergences.

The latest reading of the PR content LGTM

approve then? ;)

One final note I will make is that 044 for stimuli may need to expand AudioChannelCount with greater detail of channel assignments for some of the currently proposed audio file types that support surround sound; e.g., wav with extensible header. The current data typing just says 'integer' but would only be meaningful for 1 and 2 as described in that definition - I don't think behavior will need this however, so probably best to do in 044 (multi-microphone setups for behavioral audio recordings have usually used direct-to-array setups to get around this)

How common are such setups? Thinking of the 80/20 rule, we might not want to rush to complicate it.

Also, different mics could just have separate files (e.g., a separate _acq- or _recording entity?), each with its own AudioChannelCount integer. Or am I missing some aspect?
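To make the "one file per microphone, one integer per file" idea concrete, here is a minimal sketch of how a converter might derive the count from each WAV file with Python's standard `wave` module. The helper names `audio_sidecar_fields` and `make_silent_wav` are hypothetical, and reporting `RecordingDuration` in seconds is an assumption for illustration, not something taken from this PR's schema.

```python
import io
import wave


def audio_sidecar_fields(wav_file) -> dict:
    """Read basic audio properties from a PCM WAV file (path or file object).

    The keys reuse metadata field names proposed in this PR; the choice of
    seconds for RecordingDuration is an assumption of this sketch.
    """
    with wave.open(wav_file, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "AudioChannelCount": channels,  # plain integer, not limited to 1 or 2
        "AudioSampleRate": rate,
        "RecordingDuration": frames / rate,
    }


def make_silent_wav(channels: int, rate: int = 8000, seconds: int = 1) -> io.BytesIO:
    """Build an in-memory 16-bit silent WAV, only to exercise the reader above."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * rate * seconds)
    buf.seek(0)
    return buf
```

The integer alone says how many channels there are, not which speaker or microphone each one maps to, which is the part that would need extra metadata for surround-sound stimuli.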

Open thread: whether to move the "Privacy considerations" section to BEP047 instead of keeping it here (#discussion_r3348549380). Current stance: keep it here as generic guidance applicable to both stimuli and behavioral recordings.

I cannot find this discussion from the ID given; I'd also be in favor of keeping it generic

Maybe because it was resolved, it is now trickier to find. Go to that section and expand the resolved thread.

Consider whether .mjpeg should be added for pose-estimation training snapshots (@talmo, #2057).

We've been primarily referring to source videos and frame indices through such training label data (either in .nwb/.slp) since this more naturally fits the common SLEAP use case for doing end-to-end pose estimation on the subjects so I don't personally think we need a specific .mjpeg suffix - if desired, can be done just as easily in a .avi file even if not contiguous/smooth in time.

Would .avi support a codec that compresses each frame independently (I guess the same as making every frame a key frame)? (.slp is not part of BIDS; nothing but SLEAP would know it, right?)

CodyCBakerPhD · 2026-06-04T01:30:44Z

How common are such setups? Thinking of the 80/20 rule, we might not want to rush to complicate it.

For some groups, it might be 100% of what they do. For others, they'd never consider it. I don't have statistics on the distribution of labs that do or don't.

For stimuli it's certainly more of a common historical psychophysics-type task with humans to have them blindfolded and try to determine the direction of an audio source, which could be done with a single surround sound speaker setup or multiple independent speakers.

If you think that could serve as a follow-up PR after 044 is merged though, maybe we can wait until that day - just wanted to raise awareness in the mind so we don't do something that could make it harder in the future, such as making implicit assumptions elsewhere that the audio is always 1-2 channels

Would .avi support a codec that compresses each frame independently (I guess the same as making every frame a key frame)? (.slp is not part of BIDS; nothing but SLEAP would know it, right?)

That's exactly what MJPEG in .avi is. Each frame is an independently compressed image. 'Key frame' is a term from inter-frame compression, so it is not strictly applicable there.

talmo · 2026-06-04T01:34:50Z

@yarikoptic AVI is just the container format. It can store MJPEG-encoded frames, which are I-frame-only, for reliable constant-time random seeking. A more modern alternative like FFV1 would also work and can be stored in AVI containers too.

We usually do x264 in MP4 containers because they're the most prevalent, but our current nwb exporter does compile random frames into an MJPEG AVI following discussions with the ember/nwb teams.
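As an illustration of the "I-frame-only" property discussed above, a file can be checked by listing the picture type of every frame with `ffprobe` and confirming that all of them are `I`. This is a minimal sketch, assuming `ffprobe` is installed and on `PATH`; the helper names `probe_picture_types` and `is_intra_only` are hypothetical.

```python
import subprocess


def probe_picture_types(path: str) -> str:
    """Ask ffprobe for the picture type (I, P, B) of each frame of the first video stream."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "frame=pict_type",
        "-of", "csv=p=0",
        path,
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def is_intra_only(pict_types: str) -> bool:
    """Return True if every listed frame is an I-frame (and at least one frame exists)."""
    types = [
        line.split(",")[0].strip()  # tolerate extra comma-separated columns
        for line in pict_types.splitlines()
        if line.strip()
    ]
    return bool(types) and all(t == "I" for t in types)
```

Something like `is_intra_only(probe_picture_types("clip.avi"))` (with `clip.avi` as a placeholder path) would be expected to hold for an MJPEG AVI and to fail for a typical x264 MP4, which mixes I-, P-, and B-frames.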

yarikoptic · 2026-06-04T03:04:51Z

making implicit assumptions elsewhere that the audio is always 1-2 channels

Where are we making that assumption? We just limit the count to an integer, which I think is a reasonable assumption for a count.

If you have specific suggestions, please make a suggestion or open a PR, since I might be missing the desired outcome you are seeking.

CodyCBakerPhD · 2026-06-04T03:34:26Z

Where are we making that assumption?

I didn't say we were in this PR - this is more of a topic which would come up within modalities and how the data is used more than how to describe it, but one influences the other

I might be missing the desired outcome you are seeking.

Rephrased from above, "I just wanted to raise awareness in our minds so we don't do something that could make it harder in the future, such as making implicit assumptions ..."

bendichter

Looks good to me, thanks for pushing on this @yarikoptic

OK, so the next step is to incorporate this into BEP047. When should I start doing that? Should I wait for a maintainer to review this first? Should I wait for a merge into master?

The example JSON at the end of the markdown is rendering incorrectly but that is not an issue with this PR, it's an issue upstream I am tracking independently.

yarikoptic · 2026-06-04T19:29:51Z

I see two possible ways:

A. Merge into master: preferable if there is a good chance of getting either of the BEPs in shape for the next BIDS release. Those two could then keep targeting master and would gain all the changes while keeping their diffs specific to their own portions.
B. Keep open: then the question is what to do about the BEPs. They would need to either
   a. merge this branch, but that would bloat their diff with "unrelated" changes and would require periodic merges whenever this branch changes, or
   b. change their "base" to be this branch, mediafiles (I just now realized that it lives in the fork, but I could create it here too). Then, whenever either BEP is ready, we merge both PRs and "be done".

I think A is the easiest, but B.b is kind of the "optimal" one, as it would not lead to us releasing the BIDS specification with an appendix that is not applicable anywhere. It does require changing the base of those PRs, though. If we agree to it, I will push this branch to this repo. But I do not mind A at all ;)

@effigies @neuromechanist WDYT?

Add flac extension and AudioBitDepth to common media definitions

These are generic media properties usable by any modality that stores audio, so define them alongside the other common media file definitions:

- flac (.flac) lossless audio extension
- AudioBitDepth metadata, added as an optional field of MediaAudioProperties

Moved here from the BEP047 behavioral PR per review discussion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

yarikoptic and others added 2 commits March 18, 2026 16:50

Fix table separator padding for remark-lint compliance

bd55318

Add spaces between pipes and dashes in all separator rows (e.g., `| --- |` instead of `|---|`) to satisfy the remark-lint table-cell-padding rule. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

effigies reviewed Mar 19, 2026

yarikoptic and others added 4 commits March 21, 2026 11:45

neuromechanist approved these changes Mar 23, 2026

Comment thread src/appendices/media-files.md Outdated

Comment thread src/appendices/media-files.md Outdated

yarikoptic and others added 4 commits March 23, 2026 13:09

h-mayorquin reviewed Mar 25, 2026

yarikoptic mentioned this pull request Mar 27, 2026

Formalization of "entities" flow or new-function-in-life for _mod- bids-standard/bids-2-devel#97

Open

This was referenced Apr 30, 2026

Please review/contribute to relevant BEPs facebookresearch/neuroai#31

Open

Develop a script to "slice" videos to "match" DICOM series timing ReproNim/reprostim#14

Open

vmdocua mentioned this pull request May 20, 2026

Keep bids-inject sidecar JSON format in sync with BIDS BEP044/BEP047 WiP spec ReproNim/reprostim#251

Merged

CodyCBakerPhD mentioned this pull request May 21, 2026

[ENH] BEP047 - Add audio/video recordings to behavioral experiments #2231

Open

vmdocua mentioned this pull request May 23, 2026

Add RFC 6381 codec info BitDepth and PixelFormat to video-audit, bids-inject, and split-video ReproNim/reprostim#254

Merged

5 tasks

bendichter reviewed Jun 3, 2026

Comment thread src/appendices/media-files.md Outdated

bendichter reviewed Jun 3, 2026

Comment thread src/appendices/media-files.md

bendichter reviewed Jun 3, 2026

Comment thread src/schema/objects/metadata.yaml Outdated

Remove overspecification for "photo" and clarify on variable rate

be841b7

Co-authored-by: Ben Dichter <ben.dichter@gmail.com>

bendichter reviewed Jun 3, 2026

Comment thread src/appendices/media-files.md

yarikoptic requested a review from erdalkaraca as a code owner June 3, 2026 15:08

yarikoptic and others added 4 commits June 3, 2026 11:35

Minor wording tune up on the choices

399713d

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

yarikoptic requested review from CodyCBakerPhD, bendichter, effigies, h-mayorquin and neuromechanist June 3, 2026 16:20

yarikoptic and others added 2 commits June 3, 2026 12:35

Merge branch 'master' into mediafiles

6bf8f12

CodyCBakerPhD approved these changes Jun 4, 2026

bendichter approved these changes Jun 4, 2026

This was referenced Jun 4, 2026

Merge BEP047 audio/video (#2231) onto common media file definitions (#2367) bendichter/bids-specification#2

Closed

Add BEP047 audio/video behavioral recordings on top of common media definitions yarikoptic/bids-specification#2

Open

bendichter and others added 4 commits June 4, 2026 16:17

List flac in the media-files appendix audio formats table

1330ad9

Merge pull request #3 from bendichter/media-extra-formats

811b8fc

Add flac extension and AudioBitDepth to common media definitions

Merge branch 'master' into mediafiles

f49c5d9
