Add common media file definitions for BEP044/BEP047#2367
Conversation
…decar rules) Introduce shared media file infrastructure for BEP044 (stimuli) and BEP047 (behavioral A/V). Both BEPs need overlapping audio/video/image support, so this extracts the common foundation: - Suffixes: audio, video, audiovideo, image - Extensions: .wav, .mp3, .aac, .ogg, .mp4, .avi, .mkv, .webm, .svg, .webp, .tiff - Metadata: Duration, FrameRate, Width, Height, AudioChannelCount, AudioSampleRate, VideoCodec, AudioCodec, VideoCodecRFC6381, AudioCodecRFC6381 - Sidecar rules (media.yaml): suffix-based rules that auto-apply to any datatype - Appendix (media-files.md): formats, codec identification, privacy, examples Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add spaces between pipes and dashes in all separator rows (e.g., `| --- |` instead of `|---|`) to satisfy the remark-lint table-cell-padding rule. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
effigies
left a comment
There was a problem hiding this comment.
I approve in principle. Previous file type additions have seemed fine to me but gotten pushback, so I don't have a clear notion of what it takes to accept a (new) media file type.
I tried to search up what you might have in mind here but failed, could you please elaborate? |
|
The main ones that come to mind: |
Replace the newly added `Duration` metadata field with the existing
`RecordingDuration` field, which already has the same semantics
("length of the recording in seconds") and unit. This avoids
introducing a near-duplicate field for media files.
Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add a note in the appendix explaining why AudioSampleRate is used instead of the existing SamplingFrequency: audio-video containers need to distinguish the audio sampling rate from the video frame rate, so the Audio prefix is necessary for multi-stream files. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
The existing photo suffix rules use .tif, so document both .tif and .tiff as valid TIFF extensions for image contexts. This ensures consistency when BEPs define file rules for the image suffix. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add a section explaining that the media file definitions generalize all media in BIDS. The existing photo suffix covers a narrower use case (still images in electrophysiology/microscopy) and predates this framework. A "photo" could equally be a video with narration, an audio description, or a drawing. The media suffixes should be adopted for new datatypes, and a future proposal may deprecate photo in favor of the broader image suffix with migration tooling. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
|
ok, pushed some commits which I think are bringing it very close to a reviewable state. Review commits , but 'major' one is the adding relationship to 'photo' we already have, rendered shortcut. I think it would be worth a separate PR to introduce that migration if we do proceed with "media files", and it would establish media files potentially even before 044/047, WDYT? IMHO it would make much sense since it could really be not just photo, but sketch, video, dictophone recording -- any media really IMHO to associate with data acquisition to describe locations etc. |
|
Well, I guess it's a question of whether we care what the subject of the image is. To compare to another collection of cases in BIDS, single-volume EPI images may have suffix It leads me to wonder: Would it make more sense to treat this as a discussion of permissible formats and codecs and common metadata, but leave the suffixes up to the BEP. I think |
neuromechanist
left a comment
There was a problem hiding this comment.
LGTM in principle and +1 for implementing items with shared interest more atomically.
IMO, the tables and requirement levels could benefit from being pulled from the schema.
I bet Claude can figure out the minimal changes to the schema needed to make such implementation.
I am yet to think about it more, in particular
but
So, overall, I feel that those counter-examples are valuable to consider and relate to, but I feel that we still might want to separate description of "data content" vs "purpose" (stimuli vs capture of beh; description of instrumentation setup as an appendix or just not expressable in machine readable form;) here and hence overall this PR for media files. |
Replace hand-written metadata tables with MACROS___make_sidecar_table() calls that pull field names, requirement levels, types, and descriptions directly from the schema (rules/sidecars/media.yaml + objects/metadata.yaml). This eliminates duplication between the appendix prose and the schema, addressing review feedback from @neuromechanist and @effigies. The suffix applicability is noted in prose above each table since the existing macro does not render a "Suffix" column. Format/extension tables remain as manual markdown since no macro exists for that layout. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add MACROS___make_suffix_table() call in the introduction to render the audio, video, audiovideo, and image suffix definitions directly from the schema, keeping the appendix in sync with suffixes.yaml. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add MACROS___make_extension_table() that renders a table of file extensions from the schema (objects/extensions.yaml), with columns for format name, extension (linked to glossary), and description. Replace the 3 hand-written format tables in media-files.md (audio, video, image) with macro calls, eliminating duplication between the appendix prose and extensions.yaml. Other spec files with similar hand-written extension tables (EEG, iEEG, EMG, MEG appendix) can adopt this macro in follow-up PRs. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Test that the macro correctly renders extension information from the schema, including display names, extension values, glossary links, and proper table structure. Follows the same pattern as existing render table tests. Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
h-mayorquin
left a comment
There was a problem hiding this comment.
@bendichter asked me to take a look since I have been working with both images and video on the NWB side. I hope my review is useful.
The proposal looks good to me and I think it covers the basics very well.
I have one main suggestion about how to specify the resolution (clarifying Width and Height definitions, particularly for images where the convention is less established than for video) and some minor suggestions about adding extra metadata fields to the sidecars that could be useful for scientific reuse: bit depth, color channels, variable frame rate handling, and frame count. I also suggest including an openness angle in the recommendation for video containers.
There are other concerns I considered but think are too niche for the BIDS proposal: keyframe interval (which determines random access performance for inter-frame codecs), moov atom placement for MP4 and Cues placement for WebM/MKV (which determine whether a file is efficiently streamable over HTTP), and color spaces and gamma correction for images (which would matter for researchers who need precise physical representation of color in their data). I think those can be deferred and dealt with later.
| | Field | Suffix | Requirement Level | | ||
| | ------------------- | --------------------- | ----------------- | | ||
| | `VideoCodec` | `video`, `audiovideo` | RECOMMENDED | | ||
| | `FrameRate` | `video`, `audiovideo` | RECOMMENDED | |
There was a problem hiding this comment.
The proposal includes FrameRate as a recommended field, but it should clarify how to handle variable frame rate (VFR) video. With constant frame rate, a single number is sufficient and any frame's timestamp can be computed as frame_number / frame_rate. With VFR, that arithmetic breaks down and each frame needs an explicit timestamp to be aligned with data on other recordings.
The spec should indicate whether FrameRate is expected to be the average rate, the nominal rate, or undefined for VFR files, and whether a boolean field like VariableFrameRate should accompany it so that downstream tools know they cannot rely on uniform spacing.
There was a problem hiding this comment.
Partial progress: the field is now VideoFrameRate (renamed in a5b7aea for prefix consistency) and its description says "For variable rate videos, this value should be the nominal frame rate." (be841b7, line-wrapped in aba8721). Still open from your original ask: a separate VariableFrameRate: boolean flag so downstream tools can short-circuit without parsing the description. Do you think the nominal-rate convention alone is sufficient, or do you still want the explicit boolean? If the latter, happy to add it.
There was a problem hiding this comment.
I think it's nice to have both the approximate framerate and the VariableFrameRate: bool for that case
Co-authored-by: Ben Dichter <ben.dichter@gmail.com>
Per PR review discussion: - Width and Height descriptions now explicitly state they correspond to the number of columns and rows in the stored pixel grid as captured, without applying any orientation correction (for example, EXIF Orientation tag). Addresses thread r2989971312 and yarikoptic's follow-up r3349597959. - Add PixelFormat (OPTIONAL) under MediaVideoProperties, using FFmpeg's pix_fmt string. A single value encodes color model, channel count, chroma subsampling, and bit depth, and is auto-extractable via ffprobe. Addresses thread r2989988246 (proposed by @h-mayorquin, refined by @bendichter in r3348715711; OPTIONAL per project convention rather than RECOMMENDED). Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com> Co-authored-by: Ben Dichter <ben.dichter@gmail.com> Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>
Following PR review discussion on naming consistency (yarikoptic's analysis in r3349532158 and r3349547943): - Rename FrameRate -> VideoFrameRate to match the existing Video* / Audio* prefix convention in MediaVideoProperties / MediaAudioProperties. - Add VideoFrameCount (RECOMMENDED) under MediaVideoProperties. Required for variable frame rate video where the count cannot be derived from VideoFrameRate * RecordingDuration; useful as an integrity check otherwise. Addresses thread r2989982185. - Rename MediaVisualProperties -> MediaImageProperties and rename Width/Height to ImageWidth/ImageHeight. The Image prefix disambiguates these pixel-grid dimensions from other notions of width/height (for example, physical object sizes in microscopy, field-of-view extents) and aligns with the schema-wide convention of family prefixes for generic terms. Example JSON in the appendix updated accordingly. Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com> Co-authored-by: Cody Baker <CodyCBakerPhD@gmail.com> Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>
The pixel format (FFmpeg's pix_fmt) applies equally to single images and video frames: ffprobe reports it for both, and the encoded information (color model, channel count, chroma subsampling, bit depth) is the same concept in either case. Move the field out of MediaVideoProperties into MediaImageProperties (which already covers image, video, and audiovideo), and rename with the Image prefix to match the rest of the group (ImageWidth, ImageHeight). Description broadened from "video stream" to "video frame or image". Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Per PR review discussion on thread r2989972681 (h-mayorquin proposed, @CodyCBakerPhD raised the image-vs-video tension on r3349547943): ImagePixelFormat (FFmpeg pix_fmt) deterministically encodes bit depth for any FFmpeg-readable file, so ImageBitDepth is redundant for video. However: - Common PIL modes (L, RGB, RGBA, P, ...) are implicitly 8-bit-per- channel and do not encode bit depth in the mode name. Image-domain tooling (Pillow, libtiff, PNG library) surfaces bit depth as a first-class integer rather than as part of a pix_fmt string. - For image-only sidecars whose producing tools do not naturally go through FFmpeg, ImagePixelFormat may be absent and bit depth is the only color-precision field available. - An integer is more directly discoverable for the typical researcher than the FFmpeg pix_fmt naming convention. Added as OPTIONAL with an explicit "redundant with ImagePixelFormat when both present; the two MUST agree" note in the description, so the redundancy is acknowledged rather than hidden. Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com> Co-authored-by: Cody Baker <CodyCBakerPhD@gmail.com> Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2367 +/- ##
=======================================
Coverage 83.07% 83.07%
=======================================
Files 22 22
Lines 1696 1696
=======================================
Hits 1409 1409
Misses 287 287 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
I think we at large all converged. We had extended interactive session with @bendichter and claude code to agree to agree, we pushed changes, merged master, updated pr description to reflect current state and reinvited reviewers! Overall -- I think we are potentially done here, as fine tuning could be even done in subsequent PRs ;) |
|
So this is just a pre-PR that can be reviewed/accepted/merged independently of the BEP process, right? Then 44/47 re-use these data types when defining their modalities and modality metadata? The latest reading of the PR content LGTM One final note I will make is that 044 for stimuli may need to expand
I cannot find this discussion from the ID given; I'd also be in favor of keeping it generic
We've been primarily referring to source videos and frame indices through such training label data (either in |
yes, that's the idea! they could also potentially just merge this branch into theirs but that would just add burden to the review of those beps since would duplicate information and potentially lead to divergences.
approve then? ;)
how common such setups? thinking of 80/20 rule, we might want to not rush to complicate it. Also different mics could just have separate files (e.g. separate
may be because it was resolved now trickier to find. Go to that section and see
to expand
Would .avi support some codec with independent compression per each frame (I guess the same as having each frame a key frame?)? (.slp is not a part of bids, nothing but SLEAP would know it right?) |
For some groups, it might be 100% of what they do. For others, they'd never consider it. I don't have statistics on the distribution of labs that do or don't. For stimuli it's certainly more of a common historical psychophysics-type task with humans to have them blindfolded and try to determine the direction of an audio source, which could be done with a single surround sound speaker setup or multiple independent speakers. If you think that could serve as a follow-up PR after 044 is merged though, maybe we can wait until that day - just wanted to raise awareness in the mind so we don't do something that could make it harder in the future, such as making implicit assumptions elsewhere that the audio is always 1-2 channels
That's exactly what MJPEG in |
|
@yarikoptic AVI is just the container format. It can store MJPEG encoded frames which are I-frame-only for reliable content time random seeking. A more modern alternative like FFV1 would also work and can be stored in AVI containers too. We usually do x264 in MP4 containers because they're the most prevalent, but our current nwb exporter does compile random frames into an MJPEG AVI following discussions with the ember/nwb teams. |
Where are we making that assumption? We just limit count to an integer which I think is reasonable assumption for a count. If you have specific suggestions, please do a suggestion or a PR since I might be the missing the desired outcome you are seeking. |
I didn't say we were in this PR - this is more of a topic which would come up within modalities and how the data is used more than how to describe it, but one influences the other
Rephrased from above, "I just wanted to raise awareness in our minds so we don't do something that could make it harder in the future, such as making implicit assumptions ..." |
bendichter
left a comment
There was a problem hiding this comment.
Looks good to me, thanks for pushing on this @yarikoptic
OK, so the next step is to incorporate this into BEP047. When should I start doing that? Should I wait for a maintainer to review this first? Should I wait for a merge into master?
The example JSON at the end of the markdown is rendering incorrectly but that is not an issue with this PR, it's an issue upstream I am tracking independently.
|
I see possible ways
I think A is the easiest but B.b is kinda the "optimal" as would not lead to us releasing BIDS specification with some appendix which is not applicable anywhere, but it requires PRs change of base. If we would agree to it, I will push this branch to this repo. But I do not mind A at all ;) @effigies @neuromechanist WDYT? |
These are generic media properties usable by any modality that stores audio, so define them alongside the other common media file definitions: - flac (.flac) lossless audio extension - AudioBitDepth metadata, added as an optional field of MediaAudioProperties Moved here from the BEP047 behavioral PR per review discussion. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add flac extension and AudioBitDepth to common media definitions

Summary
Shortcuts:
Two BEPs independently define audio/video media file support with significant overlap:
/stimuli/beh/Both need overlapping suffixes (
audio,video), file extensions (.mp4,.wav, …), and similar technical metadata. Rather than define these independently in each BEP — risking inconsistencies and merge conflicts — this PR extracts the common foundation that both can build upon.For those who want "anecdotal" argumentation, here is a loose translation of one from russian
The French and British were planning the Channel Tunnel and looking for contractors. The Americans proposed digging from both sides, promising to meet in the middle with a maximum 15-meter margin of error. Time: two years. The Japanese agreed to the same plan, but guaranteed a 5-meter accuracy within one year. Then a Russian contractor walks in and says, “We’ll dig from both sides. Two weeks. No guarantees, but in the worst-case scenario, you’ll end up with two tunnels...”
kudos to @vmdocua for reminding of this one
What this PR adds
audio,video,audiovideo,image.wav,.mp3,.aac,.ogg,.mp4,.avi,.mkv,.webm,.jpg,.png,.svg,.webp,.tif,.tiffRecordingDuration,AudioCodec,AudioCodecRFC6381,AudioSampleRate,AudioChannelCount,VideoCodec,VideoCodecRFC6381,VideoFrameRate,VideoFrameCount,ImageWidth,ImageHeight,ImagePixelFormat,ImageBitDepthrules/sidecars/media.yaml— suffix-based rules that auto-apply to any datatype using these suffixesappendices/media-files.md— supported formats, codec identification (FFmpeg + RFC 6381), privacy considerations, example JSONAll extension lists, suffix lists, and sidecar field tables in the appendix are rendered from the schema via macros (
make_suffix_table,make_extension_table,make_sidecar_table) — no hand-maintained duplication.full summary of video formats used in DANDI archive (likely all for beh recording BEP047) : 2 with mov, 20 with avi; 4 with mkv; 51 with mp4; none with webm
Design decisions
stimuliandbehdatatypes without duplicationffprobeh264→ multiple profile/level strings)Audio*,Video*,Image*) on all generic metadata terms —VideoFrameRaterather thanFrameRate,ImageWidthrather thanWidth— to align with the rest of the BIDS schema and to disambiguate from non-media meanings (microscopy fields-of-view, physical sizes, etc.)ImageBitDepthandImagePixelFormatmay coexist — when both are present the bit-depth encoded inpix_fmtand the explicitImageBitDepthinteger MUST agree. The integer is the more directly discoverable summary, the pix_fmt string is the authoritative source of truth for FFmpeg-readable files.Review feedback addressed
Click for details — most review threads have been resolved by commits on this branch
#discussion_r2972405564,#discussion_r2972406839.photosuffix relationship section introduced;_imageis positioned as the broader future generalization with eventual migration tooling — addresses the discussion with @effigies (#issuecomment-4106815099).VideoFrameRate.description— addresses part of @h-mayorquin#discussion_r2989707902. OptionalVariableFrameRateboolean still open (see TODOs).VideoFrameCountadded (RECOMMENDED) — addresses @h-mayorquin#discussion_r2989982185.ImageWidth/ImageHeightdescription spells out "number of columns/rows in the stored pixel grid as captured, without applying any orientation correction (for example, the EXIFOrientationtag)" — addresses @h-mayorquin#discussion_r2989971312and follow-up with @bendichter.ImagePixelFormat(FFmpegpix_fmt, OPTIONAL) added underMediaImageProperties— handles color model, channel count, chroma subsampling, and bit depth in one field for any FFmpeg-readable file. Addresses @h-mayorquin#discussion_r2989988246and @bendichter#discussion_r3348715711.ImageBitDepth(OPTIONAL) added — discoverable integer summary for image-only sidecars whose producing tools don't naturally surfacepix_fmt(PILL/RGB/RGBAare implicitly 8-bit-per-channel). Addresses @h-mayorquin#discussion_r2989972681and @CodyCBakerPhD's image-vs-video tension on#discussion_r3349547943.MediaVisualProperties→MediaImageProperties;Width/Height/FrameRate→ImageWidth/ImageHeight/VideoFrameRate) — addresses @yarikoptic#discussion_r3349532158,#discussion_r3349547943.#discussion_r2990014896.What each BEP would then add on top
stimulidatatype, provenance metadata (license, copyright, URL), stimulus-specific entitiesbehdatatype, device metadata, behavioral entities (task, recording, split)And both would get the common
media.yamlsidecar rules for free.Relation to existing PRs
This branch is based on
masterand is intentionally independent of both BEP PRs. The recent merge frommasterkeeps the branch current.I can furnish PRs for that after we agree to agree on this to be a reasonable (even if not final) common ground! We can even refine this further until satisfied and then first BEP to be accepted would "drag" this PR in as well.
Alternatively we could keep those PR separate of this until we finalize it really to simplify review of both BEPs by separating "what are media files in BIDS" from "how does datatype X use them."
CC: @bids-standard/bep044 @ree-gupta @neuromechanist @Remi-Gau @effigies @talmo — feedback welcome from both BEP teams and maintainers.
Test plan
Completed:
tools/schemacodepytest)mkdocs serverenders appendix correctly#pullrequestreview-3988878351)Remaining (non-blocking unless flagged):
VariableFrameRate: booleanfield — currently the nominal-rate convention is documented in theVideoFrameRatedescription; @h-mayorquin's original ask for a separate boolean flag is still open (#discussion_r2989707902).#discussion_r3348549380). Current stance: keep it here as generic guidance applicable to both stimuli and behavioral recordings..mjpegshould be added for pose-estimation training snapshots (@talmo, Motion trajectories & pose extracted data for animal beh research #2057).Deferred (out of scope, recorded for follow-up):
Orientationfield for images — would be a follow-up if there's demand.ImagePixelFormat(PIL-styleColorMode, free-formAudioVideoAcquisitionDescription) — discussion concluded these belong to a larger "acquisition notes" topic outside this PR.🤖 PR description refreshed with Claude Code.