Add common media file definitions for BEP044/BEP047 #2367

Open
yarikoptic wants to merge 21 commits into bids-standard:master from yarikoptic:mediafiles

Conversation

@yarikoptic

@yarikoptic yarikoptic commented Mar 19, 2026

Collaborator

Summary

Shortcuts:

Two BEPs, BEP044 (stimuli) and BEP047 (behavioral audio/video), independently define audio/video media file support with significant overlap.

Both need overlapping suffixes (audio, video), file extensions (.mp4, .wav, …), and similar technical metadata. Rather than define these independently in each BEP — risking inconsistencies and merge conflicts — this PR extracts the common foundation that both can build upon.

For those who want "anecdotal" argumentation, here is a loose translation of a Russian joke:

The French and British were planning the Channel Tunnel and looking for contractors. The Americans proposed digging from both sides, promising to meet in the middle with a maximum 15-meter margin of error. Time: two years. The Japanese agreed to the same plan, but guaranteed a 5-meter accuracy within one year. Then a Russian contractor walks in and says, “We’ll dig from both sides. Two weeks. No guarantees, but in the worst-case scenario, you’ll end up with two tunnels...”

Kudos to @vmdocua for the reminder of this one.

What this PR adds

| Component | Details |
| --- | --- |
| Suffixes | audio, video, audiovideo, image |
| Extensions | .wav, .mp3, .aac, .ogg, .mp4, .avi, .mkv, .webm, .jpg, .png, .svg, .webp, .tif, .tiff |
| Metadata fields | RecordingDuration, AudioCodec, AudioCodecRFC6381, AudioSampleRate, AudioChannelCount, VideoCodec, VideoCodecRFC6381, VideoFrameRate, VideoFrameCount, ImageWidth, ImageHeight, ImagePixelFormat, ImageBitDepth |
| Sidecar rules | rules/sidecars/media.yaml — suffix-based rules that auto-apply to any datatype using these suffixes |
| Appendix | appendices/media-files.md — supported formats, codec identification (FFmpeg + RFC 6381), privacy considerations, example JSON |

All extension lists, suffix lists, and sidecar field tables in the appendix are rendered from the schema via macros (make_suffix_table, make_extension_table, make_sidecar_table) — no hand-maintained duplication.

Full summary of video formats used in the DANDI archive (likely all for behavioral recordings, BEP047), counted in dandisets: 2 with mov, 20 with avi, 4 with mkv, 51 with mp4, none with webm.
dandi@drogon:/mnt/backup/dandi/dandiset-manifests$ for ext in mov avi mkv mp4 webm; do echo "=== $ext" ; grep -c -E "\.$ext\$" 00*/draft/assets.yaml | grep -v ':0$' | nl; done
=== mov
     1	001538/draft/assets.yaml:88
     2	001613/draft/assets.yaml:1
=== avi
     1	000360/draft/assets.yaml:1
     2	000540/draft/assets.yaml:495
     3	000559/draft/assets.yaml:2483
     4	000568/draft/assets.yaml:82
     5	000576/draft/assets.yaml:9
     6	000624/draft/assets.yaml:19
     7	000691/draft/assets.yaml:1
     8	000727/draft/assets.yaml:9
     9	000892/draft/assets.yaml:336
    10	001084/draft/assets.yaml:64
    11	001172/draft/assets.yaml:16
    12	001190/draft/assets.yaml:5
    13	001432/draft/assets.yaml:46
    14	001457/draft/assets.yaml:111
    15	001509/draft/assets.yaml:11
    16	001530/draft/assets.yaml:35
    17	001564/draft/assets.yaml:111
    18	001613/draft/assets.yaml:4
    19	001699/draft/assets.yaml:54
    20	001700/draft/assets.yaml:204
=== mkv
     1	000167/draft/assets.yaml:1
     2	000231/draft/assets.yaml:115
     3	000689/draft/assets.yaml:5
     4	001457/draft/assets.yaml:15
=== mp4
     1	000409/draft/assets.yaml:1130
     2	000578/draft/assets.yaml:31
     3	000689/draft/assets.yaml:27
     4	000720/draft/assets.yaml:360
     5	000779/draft/assets.yaml:794
     6	000780/draft/assets.yaml:400
     7	000781/draft/assets.yaml:319
     8	000782/draft/assets.yaml:319
     9	000792/draft/assets.yaml:360
    10	000793/draft/assets.yaml:360
    11	000800/draft/assets.yaml:40
    12	000801/draft/assets.yaml:40
    13	000802/draft/assets.yaml:40
    14	000803/draft/assets.yaml:40
    15	000804/draft/assets.yaml:40
    16	000805/draft/assets.yaml:40
    17	000806/draft/assets.yaml:40
    18	000807/draft/assets.yaml:40
    19	000830/draft/assets.yaml:264
    20	000831/draft/assets.yaml:264
    21	000832/draft/assets.yaml:105
    22	000833/draft/assets.yaml:105
    23	000862/draft/assets.yaml:18
    24	000863/draft/assets.yaml:18
    25	000866/draft/assets.yaml:256
    26	000867/draft/assets.yaml:256
    27	000951/draft/assets.yaml:459
    28	001180/draft/assets.yaml:2
    29	001190/draft/assets.yaml:3
    30	001195/draft/assets.yaml:41
    31	001259/draft/assets.yaml:125
    32	001265/draft/assets.yaml:1
    33	001343/draft/assets.yaml:7
    34	001413/draft/assets.yaml:1
    35	001425/draft/assets.yaml:1343
    36	001454/draft/assets.yaml:77
    37	001471/draft/assets.yaml:598
    38	001528/draft/assets.yaml:32
    39	001538/draft/assets.yaml:4
    40	001608/draft/assets.yaml:6
    41	001613/draft/assets.yaml:4
    42	001617/draft/assets.yaml:317
    43	001702/draft/assets.yaml:175
    44	001711/draft/assets.yaml:5
    45	001712/draft/assets.yaml:6
    46	001713/draft/assets.yaml:3
    47	001749/draft/assets.yaml:52
    48	001757/draft/assets.yaml:2
    49	001771/draft/assets.yaml:36
    50	001772/draft/assets.yaml:1
    51	001782/draft/assets.yaml:56
=== webm
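
The per-extension totals quoted above can be recomputed from the raw `grep -c` output. The following is a minimal Python sketch, assuming the output is available as text in the `path:count` form shown; the function name `tally` and the shortened `sample` excerpt are illustrative and not part of this PR.

```python
from collections import defaultdict


def tally(grep_output: str) -> dict[str, tuple[int, int]]:
    """Map extension -> (number of dandisets, total matching assets).

    Expects "=== ext" header lines followed by optionally numbered
    "NNNNNN/draft/assets.yaml:count" lines, as printed by the shell loop.
    """
    dandisets: dict[str, int] = defaultdict(int)
    assets: dict[str, int] = defaultdict(int)
    current = None
    for raw in grep_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("==="):
            current = line.removeprefix("===").strip()
            dandisets[current] += 0  # register extensions that have no hits
            assets[current] += 0
            continue
        if current is None:
            continue
        # Drop the leading counter added by `nl`, keep "path:count".
        entry = line.split()[-1]
        _path, _, count = entry.rpartition(":")
        dandisets[current] += 1
        assets[current] += int(count)
    return {ext: (dandisets[ext], assets[ext]) for ext in dandisets}


# Shortened excerpt of the output above (hypothetical usage).
sample = """\
=== mov
     1\t001538/draft/assets.yaml:88
     2\t001613/draft/assets.yaml:1
=== mkv
     1\t000167/draft/assets.yaml:1
     2\t000231/draft/assets.yaml:115
     3\t000689/draft/assets.yaml:5
     4\t001457/draft/assets.yaml:15
=== webm
"""
print(tally(sample))
```

On this excerpt the sketch reports 2 dandisets (89 assets) for mov, 4 dandisets (136 assets) for mkv, and none for webm, which matches the dandiset counts in the summary line.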

Design decisions

  • Suffix-only selectors in sidecar rules (no datatype constraint), so they automatically apply to both stimuli and beh datatypes without duplication
  • FFmpeg codec names as RECOMMENDED convention — de facto standard in scientific computing, auto-extractable via ffprobe
  • RFC 6381 codec strings as OPTIONAL — for web/broadcast interoperability, provided as separate fields since the mapping from FFmpeg names is one-to-many (e.g., h264 → multiple profile/level strings)
  • Family prefixes (Audio*, Video*, Image*) on all generic metadata terms — VideoFrameRate rather than FrameRate, ImageWidth rather than Width — to align with the rest of the BIDS schema and to disambiguate from non-media meanings (microscopy fields-of-view, physical sizes, etc.)
  • ImageBitDepth and ImagePixelFormat may coexist — when both are present the bit-depth encoded in pix_fmt and the explicit ImageBitDepth integer MUST agree. The integer is the more directly discoverable summary, the pix_fmt string is the authoritative source of truth for FFmpeg-readable files.
  • Descriptions are context-neutral — not tied to "behavioral" or "stimulus" use cases
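
To illustrate the "auto-extractable via ffprobe" point, here is a rough Python sketch that maps the JSON printed by `ffprobe -v error -show_format -show_streams -of json FILE` onto the sidecar fields listed in this PR. It is a sketch under assumptions, not the PR's tooling: the helper names are hypothetical, only the first video and first audio stream are used, and taking `r_frame_rate` as the nominal `VideoFrameRate` is an assumption on my side.

```python
import json
from fractions import Fraction


def parse_rate(text):
    """Convert an ffprobe rational such as "30000/1001" to float; None if unusable."""
    try:
        return float(Fraction(text))
    except (ValueError, ZeroDivisionError):
        return None


def sidecar_from_ffprobe(probe: dict) -> dict:
    """Build media sidecar fields from already-parsed ffprobe JSON output."""
    sidecar = {}
    duration = probe.get("format", {}).get("duration")
    if duration is not None:
        sidecar["RecordingDuration"] = float(duration)  # seconds
    for stream in probe.get("streams", []):
        kind = stream.get("codec_type")
        if kind == "video" and "VideoCodec" not in sidecar:
            sidecar["VideoCodec"] = stream.get("codec_name")
            sidecar["ImageWidth"] = stream.get("width")
            sidecar["ImageHeight"] = stream.get("height")
            if "pix_fmt" in stream:
                sidecar["ImagePixelFormat"] = stream["pix_fmt"]
            # Assumption: r_frame_rate stands in for the nominal frame rate.
            rate = parse_rate(stream.get("r_frame_rate", "0/0"))
            if rate is not None:
                sidecar["VideoFrameRate"] = round(rate, 3)
            # nb_frames is not reported for every container.
            if "nb_frames" in stream:
                sidecar["VideoFrameCount"] = int(stream["nb_frames"])
        elif kind == "audio" and "AudioCodec" not in sidecar:
            sidecar["AudioCodec"] = stream.get("codec_name")
            if "sample_rate" in stream:
                sidecar["AudioSampleRate"] = int(stream["sample_rate"])
            if "channels" in stream:
                sidecar["AudioChannelCount"] = int(stream["channels"])
    return sidecar


# Hand-written stand-in for ffprobe output, so the sketch runs without FFmpeg.
probe = {
    "format": {"duration": "10.010000"},
    "streams": [
        {"codec_type": "video", "codec_name": "h264", "width": 1920,
         "height": 1080, "pix_fmt": "yuv420p", "r_frame_rate": "30000/1001",
         "nb_frames": "300"},
        {"codec_type": "audio", "codec_name": "aac", "sample_rate": "48000",
         "channels": 2},
    ],
}
print(json.dumps(sidecar_from_ffprobe(probe), indent=2))
```

The RFC 6381 fields are deliberately left out of the sketch: as noted above, the mapping from an FFmpeg codec name to an RFC 6381 string is one-to-many, so it cannot be filled in from `codec_name` alone.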

Review feedback addressed

Click for details — most review threads have been resolved by commits on this branch
  • Schema-driven rendering of suffix / extension / sidecar tables (no hand-maintained duplication) — addresses @neuromechanist #discussion_r2972405564, #discussion_r2972406839.
  • photo suffix relationship section introduced; _image is positioned as the broader future generalization with eventual migration tooling — addresses the discussion with @effigies (#issuecomment-4106815099).
  • Variable frame rate clarification added to VideoFrameRate.description — addresses part of @h-mayorquin #discussion_r2989707902. Optional VariableFrameRate boolean still open (see TODOs).
  • VideoFrameCount added (RECOMMENDED) — addresses @h-mayorquin #discussion_r2989982185.
  • ImageWidth / ImageHeight description spells out "number of columns/rows in the stored pixel grid as captured, without applying any orientation correction (for example, the EXIF Orientation tag)" — addresses @h-mayorquin #discussion_r2989971312 and follow-up with @bendichter.
  • ImagePixelFormat (FFmpeg pix_fmt, OPTIONAL) added under MediaImageProperties — handles color model, channel count, chroma subsampling, and bit depth in one field for any FFmpeg-readable file. Addresses @h-mayorquin #discussion_r2989988246 and @bendichter #discussion_r3348715711.
  • ImageBitDepth (OPTIONAL) added — discoverable integer summary for image-only sidecars whose producing tools don't naturally surface pix_fmt (PIL L/RGB/RGBA are implicitly 8-bit-per-channel). Addresses @h-mayorquin #discussion_r2989972681 and @CodyCBakerPhD's image-vs-video tension on #discussion_r3349547943.
  • Family-prefix harmonization (MediaVisualProperties → MediaImageProperties; Width/Height/FrameRate → ImageWidth/ImageHeight/VideoFrameRate) — addresses @yarikoptic #discussion_r3349532158, #discussion_r3349547943.
  • "openness" and "prevalence in the domain of application" added to format-choice considerations — addresses the long discussion on #discussion_r2990014896.

What each BEP would then add on top

  • BEP044: file rules under stimuli datatype, provenance metadata (license, copyright, URL), stimulus-specific entities
  • BEP047: file rules under beh datatype, device metadata, behavioral entities (task, recording, split)

And both would get the common media.yaml sidecar rules for free.

Relation to existing PRs

This branch is based on master and is intentionally independent of both BEP PRs. The recent merge from master keeps the branch current.

I can furnish PRs for that once we agree that this is a reasonable (even if not final) common ground! We can even refine this further until satisfied, and then the first BEP to be accepted would "drag" this PR in as well.

Alternatively, we could keep those PRs separate from this one until we finalize it, to simplify review of both BEPs by separating "what are media files in BIDS" from "how does datatype X use them."

CC: @bids-standard/bep044 @ree-gupta @neuromechanist @Remi-Gau @effigies @talmo — feedback welcome from both BEP teams and maintainers.

Test plan

Completed:

  • All YAML files parse correctly
  • Schema tests pass (tools/schemacode pytest)
  • Pre-commit hooks pass (yamllint, prettier, codespell, embedded-JSON check)
  • mkdocs serve renders appendix correctly
  • Initial review by BEP044: @neuromechanist approved in principle (#pullrequestreview-3988878351)
  • Active review by BEP047: @bendichter engaged with multiple suggestions, several adopted as commits

Remaining (non-blocking unless flagged):

  • Critical, pending decision: explicit VariableFrameRate: boolean field — currently the nominal-rate convention is documented in the VideoFrameRate description; @h-mayorquin's original ask for a separate boolean flag is still open (#discussion_r2989707902).
  • Open thread: whether to move the "Privacy considerations" section to BEP047 instead of keeping it here (#discussion_r3348549380). Current stance: keep it here as generic guidance applicable to both stimuli and behavioral recordings.
  • Verify correspondence between FFmpeg codec names and RFC 6381 strings in the "Common codec reference" table — the values shown are representative examples and should be spot-checked against authoritative sources.
  • Consider whether .mjpeg should be added for pose-estimation training snapshots (@talmo, Motion trajectories & pose extracted data for animal beh research #2057).
  • Final sign-off from BEP044 and BEP047 teams confirming the shared definitions are sufficient as a foundation.

Deferred (out of scope, recorded for follow-up):

  • Optional EXIF-style Orientation field for images — would be a follow-up if there's demand.
  • Color-mode controlled vocabulary beyond ImagePixelFormat (PIL-style ColorMode, free-form AudioVideoAcquisitionDescription) — discussion concluded these belong to a larger "acquisition notes" topic outside this PR.
  • Provenance / licensing metadata (copyright, license, URL) — to be handled in BEP044's stimulus-specific extension rather than the common foundation.

🤖 PR description refreshed with Claude Code.

yarikoptic and others added 2 commits March 18, 2026 16:50
…decar rules)

Introduce shared media file infrastructure for BEP044 (stimuli) and BEP047
(behavioral A/V). Both BEPs need overlapping audio/video/image support, so
this extracts the common foundation:

- Suffixes: audio, video, audiovideo, image
- Extensions: .wav, .mp3, .aac, .ogg, .mp4, .avi, .mkv, .webm, .svg, .webp, .tiff
- Metadata: Duration, FrameRate, Width, Height, AudioChannelCount, AudioSampleRate,
  VideoCodec, AudioCodec, VideoCodecRFC6381, AudioCodecRFC6381
- Sidecar rules (media.yaml): suffix-based rules that auto-apply to any datatype
- Appendix (media-files.md): formats, codec identification, privacy, examples

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add spaces between pipes and dashes in all separator rows
(e.g., `| --- |` instead of `|---|`) to satisfy the
remark-lint table-cell-padding rule.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

@effigies effigies left a comment

Collaborator

I approve in principle. Previous file type additions have seemed fine to me but gotten pushback, so I don't have a clear notion of what it takes to accept a (new) media file type.

@yarikoptic

Collaborator Author

Previous file type additions have seemed fine to me but gotten pushback, so I don't have a clear notion of what it takes to accept a (new) media file type.

I tried to look up what you might have in mind here but failed; could you please elaborate?

@effigies

Collaborator

@yarikoptic

Collaborator Author

Ah, those beasts! Gotcha. I think the situation here is different since we are talking about commodity formats, but it brought me to a different 'conflict': we already have photo

src/schema/objects/extensions.yaml:116 jpg:
src/schema/objects/extensions.yaml:117: value: .jpg
src/schema/objects/extensions.yaml:213 png:
src/schema/objects/extensions.yaml:214: value: .png
src/schema/objects/extensions.yaml:260 tif:
src/schema/objects/extensions.yaml:261: value: .tif
src/schema/rules/files/raw/micr.yaml:1 microscopy:
src/schema/rules/files/raw/micr.yaml:21 extensions:
src/schema/rules/files/raw/micr.yaml:25: - .png
src/schema/rules/files/raw/micr.yaml:26: - .tif
src/schema/rules/files/raw/perf.yaml:30 asllabeling:
src/schema/rules/files/raw/perf.yaml:33 extensions:
src/schema/rules/files/raw/perf.yaml:34: - .jpg
src/schema/rules/files/raw/perf.yaml:35: - .png
src/schema/rules/files/raw/perf.yaml:36: - .tif
src/schema/rules/files/raw/photo.yaml:1 photo:
src/schema/rules/files/raw/photo.yaml:4 extensions:
src/schema/rules/files/raw/photo.yaml:5: - .jpg
src/schema/rules/files/raw/photo.yaml:6: - .png
src/schema/rules/files/raw/photo.yaml:7: - .tif
src/schema/rules/files/raw/photo.yaml:25 photo__micr:
src/schema/rules/files/raw/photo.yaml:27 extensions:
src/schema/rules/files/raw/photo.yaml:28: - .jpg
src/schema/rules/files/raw/photo.yaml:29: - .png
src/schema/rules/files/raw/photo.yaml:30: - .tif

So should we better align with them while having image here? (image is better suited, since a stimulus or even a sketch of a behavior capture is not necessarily a photo)

yarikoptic and others added 4 commits March 21, 2026 11:45
Replace the newly added `Duration` metadata field with the existing
`RecordingDuration` field, which already has the same semantics
("length of the recording in seconds") and unit. This avoids
introducing a near-duplicate field for media files.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add a note in the appendix explaining why AudioSampleRate is used
instead of the existing SamplingFrequency: audio-video containers
need to distinguish the audio sampling rate from the video frame
rate, so the Audio prefix is necessary for multi-stream files.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
The existing photo suffix rules use .tif, so document both .tif
and .tiff as valid TIFF extensions for image contexts. This ensures
consistency when BEPs define file rules for the image suffix.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add a section explaining that the media file definitions generalize
all media in BIDS. The existing photo suffix covers a narrower use
case (still images in electrophysiology/microscopy) and predates
this framework. A "photo" could equally be a video with narration,
an audio description, or a drawing. The media suffixes should be
adopted for new datatypes, and a future proposal may deprecate
photo in favor of the broader image suffix with migration tooling.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
@yarikoptic

Collaborator Author

OK, I pushed some commits which I think bring it very close to a reviewable state. Review the commits, but the 'major' one adds the relationship to the 'photo' suffix we already have (rendered shortcut). I think it would be worth a separate PR to introduce that migration if we do proceed with "media files", and it would potentially establish media files even before 044/047. WDYT? IMHO it would make much sense, since it could really be not just a photo but a sketch, a video, a dictaphone recording: any media to associate with data acquisition to describe locations etc.

@effigies

Collaborator

Well, I guess it's a question of whether we care what the subject of the image is. _image tells me that it is an image, _photo tells me that it is a photograph (and asllabeling is a diagram of the ASL labeling protocol). _image definitely loses information here.

To compare to another collection of cases in BIDS, single-volume EPI images may have suffix sbref, m0scan or epi depending on the context of their acquisition or intended use.

It leads me to wonder: would it make more sense to treat this as a discussion of permissible formats and codecs and common metadata, but leave the suffixes up to the BEP? I think audio/video/image/audiovideo could be completely reasonable for one context (under stimuli, they're quite clear), but too generic for another.

@neuromechanist neuromechanist left a comment

Member

LGTM in principle and +1 for implementing items with shared interest more atomically.
IMO, the tables and requirement levels could benefit from being pulled from the schema.

I bet Claude can figure out the minimal changes to the schema needed to make such implementation.

Comment thread src/appendices/media-files.md Outdated
Comment thread src/appendices/media-files.md Outdated
@yarikoptic

yarikoptic commented Mar 23, 2026

Collaborator Author

I think audio/video/image/audiovideo could be completely reasonable for one context (under stimuli, they're quite clear), but too generic for another.

I am yet to think about it more, in particular

  • _photo is more specific than _image... could/should photo be allowed as such a more specific 'subclass' of image? But then we are jumping into a potentially huge extra ontology (could have drawing, schematic, diagram) without clear boundaries and with potentially non-orthogonal descriptions. So a photo could be an image of a schematic, and what really matters is that it is a schematic, not that it is a photo.
  • asllabeling, which we describe as "A deidentified screenshot of the planning of the labeling slab/plane ...". At first thought it is a very nice description of the underlying content of that image (with a little overspecification in the description that it is a "screenshot"). And it seems to be very similar in purpose to where _photo is used to capture the locations of EEG etc. electrodes, or overall "I have this image of something which describes what I do not have a standard form to describe at the moment, but might deduce later by looking at this image!". In other words, in those two specific applications it was to capture provenance in image form (a photo or a "screenshot", which might be a photo or a PrtSc capture). Moreover, I bet many other modalities could/would need similar ones for similar or related needs (e.g. thinking about @bids-standard/bep037 ATM). I am wondering if they would be a better fit for some dedicated and generic entity to annotate with for that reason.

but

  • someone's "behavior" could be someone else's "stimuli". Think about movies ;) It reminded me of David Leopold's experiments with monkeys watching freely behaving monkeys. Hence what matters is the "content", not the "purpose" of use.
  • those examples you brought up are IMHO specific parametrizations/samples of less specific instrumentation. To bring this into the media domain: I might have proposed 360audiovideo or 360image for video/image recordings from cameras allowing for such acquisitions (yet to figure out how to play my hand-gliding video, damn it). They would still be video and audio, but with specific instrumental characteristics worth highlighting. But all of those suffixes are more about describing the content, not the purpose of those files' use (e.g. T1w could be used for so many purposes).
  • somewhat similar to recent refactoring of going from "IntendedFor" to "B0Field" -- again, decoupling away the purpose "what for" from description of characteristics making it appropriate for specific use (e.g. fieldmap correction vs assessment of distortions overall for QA or alike)

So, overall, I feel that those counter-examples are valuable to consider and relate to, but I feel that we still might want to separate the description of "data content" from "purpose" (stimuli vs capture of beh; description of instrumentation setup as an appendix or just not expressible in machine-readable form) here, and hence overall this PR for media files.

yarikoptic and others added 4 commits March 23, 2026 13:09
Replace hand-written metadata tables with MACROS___make_sidecar_table()
calls that pull field names, requirement levels, types, and descriptions
directly from the schema (rules/sidecars/media.yaml + objects/metadata.yaml).
This eliminates duplication between the appendix prose and the schema,
addressing review feedback from @neuromechanist and @effigies.

The suffix applicability is noted in prose above each table since the
existing macro does not render a "Suffix" column.

Format/extension tables remain as manual markdown since no macro exists
for that layout.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add MACROS___make_suffix_table() call in the introduction to render
the audio, video, audiovideo, and image suffix definitions directly
from the schema, keeping the appendix in sync with suffixes.yaml.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Add MACROS___make_extension_table() that renders a table of file
extensions from the schema (objects/extensions.yaml), with columns
for format name, extension (linked to glossary), and description.

Replace the 3 hand-written format tables in media-files.md (audio,
video, image) with macro calls, eliminating duplication between the
appendix prose and extensions.yaml.

Other spec files with similar hand-written extension tables (EEG,
iEEG, EMG, MEG appendix) can adopt this macro in follow-up PRs.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>
Test that the macro correctly renders extension information from the
schema, including display names, extension values, glossary links,
and proper table structure. Follows the same pattern as existing
render table tests.

Co-Authored-By: Claude Code 2.1.63 / Claude Opus 4.6 <noreply@anthropic.com>

@h-mayorquin h-mayorquin left a comment

@bendichter asked me to take a look since I have been working with both images and video on the NWB side. I hope my review is useful.

The proposal looks good to me and I think it covers the basics very well.

I have one main suggestion about how to specify the resolution (clarifying Width and Height definitions, particularly for images where the convention is less established than for video) and some minor suggestions about adding extra metadata fields to the sidecars that could be useful for scientific reuse: bit depth, color channels, variable frame rate handling, and frame count. I also suggest including an openness angle in the recommendation for video containers.

There are other concerns I considered but think are too niche for the BIDS proposal: keyframe interval (which determines random access performance for inter-frame codecs), moov atom placement for MP4 and Cues placement for WebM/MKV (which determine whether a file is efficiently streamable over HTTP), and color spaces and gamma correction for images (which would matter for researchers who need precise physical representation of color in their data). I think those can be deferred and dealt with later.

Comment thread src/appendices/media-files.md Outdated
| Field | Suffix | Requirement Level |
| ------------------- | --------------------- | ----------------- |
| `VideoCodec` | `video`, `audiovideo` | RECOMMENDED |
| `FrameRate` | `video`, `audiovideo` | RECOMMENDED |


The proposal includes FrameRate as a recommended field, but it should clarify how to handle variable frame rate (VFR) video. With constant frame rate, a single number is sufficient and any frame's timestamp can be computed as frame_number / frame_rate. With VFR, that arithmetic breaks down and each frame needs an explicit timestamp to be aligned with data on other recordings.

The spec should indicate whether FrameRate is expected to be the average rate, the nominal rate, or undefined for VFR files, and whether a boolean field like VariableFrameRate should accompany it so that downstream tools know they cannot rely on uniform spacing.
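
The arithmetic described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of the median interval as the reference spacing, and the 1% tolerance are my own choices, not something defined by the PR or by FFmpeg.

```python
import statistics


def cfr_timestamp(frame_number: int, frame_rate: float) -> float:
    """Timestamp in seconds of a frame, valid only for constant frame rate."""
    return frame_number / frame_rate


def is_variable_frame_rate(timestamps: list[float], rel_tol: float = 0.01) -> bool:
    """Return True if inter-frame spacing is not uniform.

    A video is flagged when any interval differs from the median interval by
    more than rel_tol (relative). Fewer than three timestamps cannot show
    non-uniform spacing, so they are reported as not variable.
    """
    if len(timestamps) < 3:
        return False
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    nominal = statistics.median(intervals)
    return any(abs(dt - nominal) > rel_tol * nominal for dt in intervals)


# Constant 25 fps: timestamps follow frame_number / frame_rate.
cfr = [cfr_timestamp(n, 25.0) for n in range(5)]
# Hypothetical VFR recording: one frame arrives 80 ms late.
vfr = [0.0, 0.04, 0.08, 0.20, 0.24]

print(is_variable_frame_rate(cfr), is_variable_frame_rate(vfr))
```

A check like this needs the per-frame timestamps, which is the reason an explicit flag in the sidecar would save downstream tools from reading the container.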

Partial progress: the field is now VideoFrameRate (renamed in a5b7aea for prefix consistency) and its description says "For variable rate videos, this value should be the nominal frame rate." (be841b7, line-wrapped in aba8721). Still open from your original ask: a separate VariableFrameRate: boolean flag so downstream tools can short-circuit without parsing the description. Do you think the nominal-rate convention alone is sufficient, or do you still want the explicit boolean? If the latter, happy to add it.

Contributor

I think it's nice to have both the approximate framerate and the VariableFrameRate: bool for that case

Comment thread src/appendices/media-files.md Outdated
Comment thread src/appendices/media-files.md
Comment thread src/appendices/media-files.md
Comment thread src/appendices/media-files.md
Comment thread src/appendices/media-files.md Outdated
Comment thread src/appendices/media-files.md Outdated
Comment thread src/appendices/media-files.md
Comment thread src/schema/objects/metadata.yaml Outdated
Co-authored-by: Ben Dichter <ben.dichter@gmail.com>
Comment thread src/appendices/media-files.md
@yarikoptic yarikoptic requested a review from erdalkaraca as a code owner June 3, 2026 15:08
yarikoptic and others added 4 commits June 3, 2026 11:35
Per PR review discussion:

- Width and Height descriptions now explicitly state they correspond to
  the number of columns and rows in the stored pixel grid as captured,
  without applying any orientation correction (for example, EXIF
  Orientation tag). Addresses thread r2989971312 and yarikoptic's
  follow-up r3349597959.

- Add PixelFormat (OPTIONAL) under MediaVideoProperties, using FFmpeg's
  pix_fmt string. A single value encodes color model, channel count,
  chroma subsampling, and bit depth, and is auto-extractable via
  ffprobe. Addresses thread r2989988246 (proposed by @h-mayorquin,
  refined by @bendichter in r3348715711; OPTIONAL per project
  convention rather than RECOMMENDED).

Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>
Co-authored-by: Ben Dichter <ben.dichter@gmail.com>
Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>
Following PR review discussion on naming consistency
(yarikoptic's analysis in r3349532158 and r3349547943):

- Rename FrameRate -> VideoFrameRate to match the existing Video* /
  Audio* prefix convention in MediaVideoProperties / MediaAudioProperties.
- Add VideoFrameCount (RECOMMENDED) under MediaVideoProperties.
  Required for variable frame rate video where the count cannot be
  derived from VideoFrameRate * RecordingDuration; useful as an
  integrity check otherwise. Addresses thread r2989982185.
- Rename MediaVisualProperties -> MediaImageProperties and rename
  Width/Height to ImageWidth/ImageHeight. The Image prefix
  disambiguates these pixel-grid dimensions from other notions of
  width/height (for example, physical object sizes in microscopy,
  field-of-view extents) and aligns with the schema-wide convention
  of family prefixes for generic terms.

Example JSON in the appendix updated accordingly.

Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>
Co-authored-by: Cody Baker <CodyCBakerPhD@gmail.com>
Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>
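
The "integrity check" use of VideoFrameCount mentioned in this commit message could look roughly like the Python sketch below. The function name and the one-frame default tolerance are hypothetical choices for illustration, and the check is only meaningful for constant frame rate video.

```python
def frame_count_consistent(
    frame_count: int,
    frame_rate: float,
    duration: float,
    tolerance_frames: float = 1.0,
) -> bool:
    """Compare VideoFrameCount against VideoFrameRate * RecordingDuration.

    Assumes constant frame rate; for variable frame rate video the product
    is not expected to match the stored count.
    """
    expected = frame_rate * duration
    return abs(frame_count - expected) <= tolerance_frames


print(frame_count_consistent(300, 30.0, 10.0))   # consistent sidecar
print(frame_count_consistent(250, 30.0, 10.0))   # 50 frames unaccounted for
```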
The pixel format (FFmpeg's pix_fmt) applies equally to single images and
video frames: ffprobe reports it for both, and the encoded information
(color model, channel count, chroma subsampling, bit depth) is the same
concept in either case.

Move the field out of MediaVideoProperties into MediaImageProperties
(which already covers image, video, and audiovideo), and rename with the
Image prefix to match the rest of the group (ImageWidth, ImageHeight).
Description broadened from "video stream" to "video frame or image".

Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
yarikoptic and others added 2 commits June 3, 2026 12:35
Per PR review discussion on thread r2989972681 (h-mayorquin proposed,
@CodyCBakerPhD raised the image-vs-video tension on r3349547943):
ImagePixelFormat (FFmpeg pix_fmt) deterministically encodes bit depth
for any FFmpeg-readable file, so ImageBitDepth is redundant for video.
However:

- Common PIL modes (L, RGB, RGBA, P, ...) are implicitly 8-bit-per-
  channel and do not encode bit depth in the mode name. Image-domain
  tooling (Pillow, libtiff, PNG library) surfaces bit depth as a
  first-class integer rather than as part of a pix_fmt string.
- For image-only sidecars whose producing tools do not naturally go
  through FFmpeg, ImagePixelFormat may be absent and bit depth is the
  only color-precision field available.
- An integer is more directly discoverable for the typical researcher
  than the FFmpeg pix_fmt naming convention.

Added as OPTIONAL with an explicit "redundant with ImagePixelFormat
when both present; the two MUST agree" note in the description, so the
redundancy is acknowledged rather than hidden.

Co-authored-by: Heberto Mayorquin <h.mayorquin@gmail.com>
Co-authored-by: Cody Baker <CodyCBakerPhD@gmail.com>
Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.7 <noreply@anthropic.com>
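
The "the two MUST agree" rule from this commit message can be sketched as a small Python check. Several things here are assumptions: the lookup table is a tiny illustrative subset of FFmpeg pixel formats (the full list is what `ffmpeg -pix_fmts` prints), ImageBitDepth is read as bits per channel, and the function name is hypothetical.

```python
from typing import Optional

# Illustrative subset only: bits per channel for a few FFmpeg pix_fmt names.
PIX_FMT_BIT_DEPTH = {
    "gray": 8,
    "rgb24": 8,
    "rgba": 8,
    "yuv420p": 8,
    "yuv420p10le": 10,
    "gray16le": 16,
    "rgb48le": 16,
}


def bit_depth_agrees(sidecar: dict) -> Optional[bool]:
    """Check ImageBitDepth against ImagePixelFormat.

    Returns True/False when both fields are present and the pixel format is
    in the table, and None when the rule cannot be evaluated.
    """
    pix_fmt = sidecar.get("ImagePixelFormat")
    depth = sidecar.get("ImageBitDepth")
    if pix_fmt is None or depth is None:
        return None  # only one field present: nothing to compare
    expected = PIX_FMT_BIT_DEPTH.get(pix_fmt)
    if expected is None:
        return None  # format not covered by this illustrative table
    return expected == depth


print(bit_depth_agrees({"ImagePixelFormat": "yuv420p10le", "ImageBitDepth": 10}))
print(bit_depth_agrees({"ImagePixelFormat": "rgb24", "ImageBitDepth": 16}))
print(bit_depth_agrees({"ImageBitDepth": 8}))
```

Returning None rather than False for the image-only case keeps sidecars that carry just ImageBitDepth valid, which is the situation this commit is meant to support.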
@codecov

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.07%. Comparing base (d20ee46) to head (f49c5d9).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2367   +/-   ##
=======================================
  Coverage   83.07%   83.07%           
=======================================
  Files          22       22           
  Lines        1696     1696           
=======================================
  Hits         1409     1409           
  Misses        287      287           


@yarikoptic

Collaborator Author

I think we have all largely converged. We had an extended interactive session with @bendichter and Claude Code to agree to agree, pushed the changes, merged master, updated the PR description to reflect the current state, and re-invited reviewers! Overall, I think we are potentially done here, as fine-tuning could even be done in subsequent PRs ;)

@CodyCBakerPhD

Contributor

So this is just a pre-PR that can be reviewed/accepted/merged independently of the BEP process, right? Then 44/47 re-use these data types when defining their modalities and modality metadata?

The latest reading of the PR content LGTM

One final note I will make is that 044 for stimuli may need to expand AudioChannelCount with greater detail of channel assignments for some of the currently proposed audio file types that support surround sound; e.g., wav with extensible header. The current data typing just says 'integer' but would only be meaningful for 1 and 2 as described in that definition - I don't think behavior will need this however, so probably best to do in 044 (multi-microphone setups for behavioral audio recordings have usually used direct-to-array setups to get around this)

Open thread: whether to move the "Privacy considerations" section to BEP047 instead of keeping it here (#discussion_r3348549380). Current stance: keep it here as generic guidance applicable to both stimuli and behavioral recordings.

I cannot find this discussion from the ID given; I'd also be in favor of keeping it generic

Consider whether .mjpeg should be added for pose-estimation training snapshots (@talmo, #2057).

We've been primarily referring to source videos and frame indices through such training label data (either in .nwb/.slp) since this more naturally fits the common SLEAP use case for doing end-to-end pose estimation on the subjects so I don't personally think we need a specific .mjpeg suffix - if desired, can be done just as easily in a .avi file even if not contiguous/smooth in time

@yarikoptic

Collaborator Author

So this is just a pre-PR that can be reviewed/accepted/merged independently of the BEP process, right? Then 44/47 re-use these data types when defining their modalities and modality metadata?

Yes, that's the idea! They could also just merge this branch into theirs, but that would add burden to the review of those BEPs, since it would duplicate information and potentially lead to divergences.

The latest reading of the PR content LGTM

approve then? ;)

One final note I will make is that 044 for stimuli may need to expand AudioChannelCount with greater detail of channel assignments for some of the currently proposed audio file types that support surround sound; e.g., wav with extensible header. The current data typing just says 'integer' but would only be meaningful for 1 and 2 as described in that definition - I don't think behavior will need this however, so probably best to do in 044 (multi-microphone setups for behavioral audio recordings have usually used direct-to-array setups to get around this)

How common are such setups? Thinking of the 80/20 rule, we might not want to rush to complicate it.

Also, different mics could just have separate files (e.g. a separate _acq- or _recording entity?), each with its own AudioChannelCount integer. Or am I missing some aspect?
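
To illustrate the point that an integer count works for any number of channels, here is a minimal sketch using Python's standard `wave` module. It writes small silent WAV files in memory and reads the channel count back; the helper names are hypothetical, and the `wave` module writes plain PCM headers rather than the extensible header mentioned above, so channel-to-speaker assignments are not covered.

```python
import io
import wave


def write_silent_wav(n_channels: int, sample_rate: int = 44100, n_frames: int = 100) -> bytes:
    """Build a short silent 16-bit PCM WAV in memory (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(n_channels)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00" * (2 * n_channels * n_frames))
    return buf.getvalue()


def audio_channel_count(wav_bytes: bytes) -> int:
    """Read the value one would store as AudioChannelCount."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        return reader.getnchannels()


# One file per microphone (each mono) versus a single six-channel file.
per_mic_counts = [audio_channel_count(write_silent_wav(1)) for _ in range(2)]
surround_count = audio_channel_count(write_silent_wav(6))
```

Either layout yields a valid integer; what the count alone cannot express is which channel maps to which speaker or microphone.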

Open thread: whether to move the "Privacy considerations" section to BEP047 instead of keeping it here (#discussion_r3348549380). Current stance: keep it here as generic guidance applicable to both stimuli and behavioral recordings.

I cannot find this discussion from the ID given; I'd also be in favor of keeping it generic

Maybe because it was resolved it is now trickier to find. Go to that section and click [image] to expand.

Consider whether .mjpeg should be added for pose-estimation training snapshots (@talmo, #2057).

We've been primarily referring to source videos and frame indices through such training label data (either in .nwb/.slp) since this more naturally fits the common SLEAP use case for doing end-to-end pose estimation on the subjects so I don't personally think we need a specific .mjpeg suffix - if desired, can be done just as easily in a .avi file even if not contiguous/smooth in time.

Would .avi support a codec that compresses each frame independently (I guess the same as making every frame a key frame)? (.slp is not part of BIDS; nothing but SLEAP would know it, right?)

@CodyCBakerPhD

Contributor

How common are such setups? Thinking of the 80/20 rule, we might not want to rush to complicate it.

For some groups, it might be 100% of what they do. For others, they'd never consider it. I don't have statistics on the distribution of labs that do or don't.

For stimuli it's certainly more of a common historical psychophysics-type task with humans to have them blindfolded and try to determine the direction of an audio source, which could be done with a single surround sound speaker setup or multiple independent speakers.

If you think that could serve as a follow-up PR after 044 is merged though, maybe we can wait until that day - just wanted to raise awareness in the mind so we don't do something that could make it harder in the future, such as making implicit assumptions elsewhere that the audio is always 1-2 channels

Would .avi support a codec that compresses each frame independently (I guess the same as making every frame a key frame)? (.slp is not part of BIDS; nothing but SLEAP would know it, right?)

That's exactly what MJPEG in .avi is. Each frame is an independently compressed image. 'Key frame' is a term from inter-frame compression, so it is not applicable there per se.

@talmo

talmo commented Jun 4, 2026

@yarikoptic AVI is just the container format. It can store MJPEG-encoded frames, which are I-frame-only, for reliable constant-time random seeking. A more modern alternative like FFV1 would also work and can be stored in AVI containers too.

We usually do x264 in MP4 containers because they're the most prevalent, but our current nwb exporter does compile random frames into an MJPEG AVI following discussions with the ember/nwb teams.
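
Why I-frame-only streams help random seeking can be sketched with a toy model. It assumes a stream with one key frame every `keyframe_interval` frames, where every other frame depends only on earlier frames back to the preceding key frame (B-frames and container index lookups are ignored). The function name is hypothetical and no real decoder API is involved.

```python
def frames_to_decode(target: int, keyframe_interval: int) -> int:
    """Frames a decoder must process to show frame `target` (0-based).

    Toy model: decoding restarts at the nearest key frame at or before
    `target` and proceeds forward one frame at a time.
    """
    if target < 0 or keyframe_interval < 1:
        raise ValueError("target must be >= 0 and keyframe_interval >= 1")
    return target % keyframe_interval + 1


# Intra-only (every frame is a key frame, as with MJPEG): always one frame.
intra_only_cost = frames_to_decode(499, keyframe_interval=1)
# Inter-frame coding with a key frame every 250 frames: worst case is 250.
inter_frame_cost = frames_to_decode(499, keyframe_interval=250)
```

Under this model the cost of fetching an arbitrary frame is constant for intra-only encoding, while for inter-frame encoding it grows with the distance from the preceding key frame.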

@yarikoptic

Collaborator Author

making implicit assumptions elsewhere that the audio is always 1-2 channels

Where are we making that assumption? We just limit the count to an integer, which I think is a reasonable assumption for a count.

If you have specific suggestions, please make a suggestion or open a PR, since I might be missing the desired outcome you are seeking.

@CodyCBakerPhD

Contributor

Where are we making that assumption?

I didn't say we were in this PR - this is more of a topic which would come up within modalities and how the data is used more than how to describe it, but one influences the other

I might be missing the desired outcome you are seeking.

Rephrased from above, "I just wanted to raise awareness in our minds so we don't do something that could make it harder in the future, such as making implicit assumptions ..."

@bendichter bendichter left a comment

Contributor

Looks good to me, thanks for pushing on this @yarikoptic

OK, so the next step is to incorporate this into BEP047. When should I start doing that? Should I wait for a maintainer to review this first? Should I wait for a merge into master?

The example JSON at the end of the markdown is rendering incorrectly but that is not an issue with this PR, it's an issue upstream I am tracking independently.

@yarikoptic

yarikoptic commented Jun 4, 2026

Collaborator Author

I see two possible ways:

  • A. Merge into master: preferable if there is a good chance of getting either of the BEPs in shape for the next BIDS release. Those two could then keep targeting master and would gain all the changes while keeping their diffs specific to their own portions.
  • B. Keep open: then the question is what to do about the BEPs. They would need to either
      a. merge this branch, but that would bloat their diffs with "unrelated" changes and would require periodic merges whenever this one changes, or
      b. change their "base" to this branch, mediafiles (I just now realized that it lives in the fork, but I could create it here too). Then, whenever either BEP is ready, we merge both PRs and are done.

I think A is the easiest, but B.b is kind of the "optimal" one, as it would not lead to us releasing a BIDS specification with an appendix that is not applicable anywhere; it does require changing the PRs' base, though. If we agree to it, I will push this branch to this repo. But I do not mind A at all ;)

@effigies @neuromechanist WDYT?

bendichter and others added 4 commits June 4, 2026 16:17
These are generic media properties usable by any modality that stores
audio, so define them alongside the other common media file definitions:
- flac (.flac) lossless audio extension
- AudioBitDepth metadata, added as an optional field of MediaAudioProperties

Moved here from the BEP047 behavioral PR per review discussion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add flac extension and AudioBitDepth to common media definitions

8 participants