
[ENH] BEP047 - Add audio/video recordings to behavioral experiments#2231

Open
bendichter wants to merge 43 commits into master from audio-video-clean


@bendichter

@bendichter bendichter commented Oct 25, 2025

Contributor

Add comprehensive support for audio and video recordings in behavioral
experiments:

- Add audio file extensions (mp3, wav) and video file extensions
  (mp4, mkv, avi) with corresponding _audio and _video suffixes
- Document usage of audio/video recordings in beh directory for
  capturing vocalizations, speech, facial expressions, and body movements
- Add metadata schema for audio/video device information and stream
  properties
- Include privacy warnings about personally identifiable information
  in human subject recordings
- Update behavioral experiments title to remove "with no neural
  recordings" restriction, clarifying data can be stored with or
  without neural recordings
- Add examples for file organization including multi-angle recordings
  and split files
- Define optional entities: task, acquisition, run, recording, split
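
The suffix and extension pairing in the first bullet can be sketched as a small check. This is only an illustration, not the validator's implementation: `ALLOWED_EXTENSIONS` and `check_recording_name` are hypothetical names, and the mapping covers only the suffixes and extensions listed in this description (later commits in this thread add more).

```python
from pathlib import PurePosixPath

# Suffix -> extensions, as listed in the PR description above.
# Hypothetical lookup table; later commits in the thread extend the list.
ALLOWED_EXTENSIONS = {
    "audio": {".mp3", ".wav"},
    "video": {".mp4", ".mkv", ".avi"},
}


def check_recording_name(filename: str) -> bool:
    """Return True if the file's suffix and extension form a listed pair."""
    path = PurePosixPath(filename)
    extension = path.suffix.lower()
    # The BIDS suffix is the last underscore-separated token before the extension.
    suffix = path.stem.rsplit("_", 1)[-1]
    return extension in ALLOWED_EXTENSIONS.get(suffix, set())
```

For example, `check_recording_name("sub-01_task-rest_video.mp4")` passes, while an `_audio` file with an `.mp4` extension does not.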
Comment thread src/schema/objects/metadata.yaml Outdated
Comment thread src/schema/objects/metadata.yaml Outdated
Comment thread src/schema/objects/metadata.yaml Outdated
Comment thread src/modality-specific-files/behavioral-experiments.md Outdated
Comment thread src/modality-specific-files/behavioral-experiments.md Outdated
@yarikoptic yarikoptic changed the title from "SCHEMA: Add audio video" to "SCHEMA: Add audio video behavioral data support" Oct 25, 2025
@yarikoptic yarikoptic added the "schema" label (Issues related to the YAML schema representation of the specification. Patch version release.) Oct 25, 2025
…ee macros

- Change section title from 'Behavioral experiments' to 'Behavioral recordings'
- Convert file tree examples to use MACROS___make_filetree_example for consistent rendering
- Address review comments from @yarikoptic in PR #2231
Comment thread src/modality-specific-files/behavioral-experiments.md

@effigies effigies left a comment

Collaborator

Overall this makes sense to me. It would be good to get some feedback from contributors to related BEPs, such as eye-tracking (20), motion (29), stimuli (44) and physio (45). Even if this PR doesn't propose adding this as an associated file to those data types, the potential is there and it's worth getting opinions and identifying potential conflicts.

cc @bids-standard/bep029 @bids-standard/bep044
cc @mszinte @julia-pfarr @oesteban (BEP020)
cc @m-miedema @smoia @SouravKulkarni (?) (BEP045)

Comment thread src/modality-specific-files/behavioral-experiments.md Outdated
Comment thread src/modality-specific-files/behavioral-experiments.md Outdated
Comment thread src/schema/rules/sidecars/beh.yaml Outdated
@effigies effigies changed the title from "SCHEMA: Add audio video behavioral data support" to "[ENH] Add audio/video recordings to behavioral experiments" Oct 28, 2025
@codecov

codecov Bot commented Oct 28, 2025


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.07%. Comparing base (d20ee46) to head (63a4bf7).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2231   +/-   ##
=======================================
  Coverage   83.07%   83.07%           
=======================================
  Files          22       22           
  Lines        1696     1696           
=======================================
  Hits         1409     1409           
  Misses        287      287           


Co-authored-by: Chris Markiewicz <markiewicz@stanford.edu>
Comment thread src/modality-specific-files/behavioral-experiments.md
Comment thread src/schema/rules/files/raw/beh.yaml
@yarikoptic

Collaborator

Let's continue on that in

@bendichter

Contributor Author

@neuromechanist

  1. On the audio video vs. audiovideo labeling, we went back and forth a bit in the issue. For us it doesn't make a huge difference, since you would be able to parse that from the metadata about the streams in the JSON sidecar anyway. If you feel strongly about audiovideo, I would be fine with changing it in the interest of consistency.

  2. _image. I suppose one could take a picture of a subject performing a task. I don't know if that's recording behavior per se, but I'd be fine with adding it if you think we should.

  3. I think one of our biggest differences is with the metadata in the sidecar files. Yours is attribution (License, Copyright, URL, Description) and this one is technical (AudioSampleRate, FrameRate, Height, Width, Duration, CameraPosition, AudioBitDepth). I don't think adding attribution to ours makes much sense. It will generally share the license of the rest of the dataset. However, I do think it might make sense for you to adopt our technical attributes. Maybe not CameraPosition, but it might be nice to be able to get AudioSampleRate, FrameRate, Height, Width, Duration without reading the data file.

  4. Our splitting is different. I think split is more consistent with existing usage. The only mention I see of part in the existing schema is:

part
Full name: Part

Format: part-

Allowed values: mag, phase, real, imag

Definition: This entity is used to indicate which component of the complex representation of the MRI signal is represented in voxel data. The part- entity is associated with the DICOM Tag 0008, 9208. Allowed label values for this entity are phase, mag, real and imag, which are typically used in part-mag/part-phase or part-real/part-imag pairs of files.

Phase images MAY be in radians or in arbitrary units. The sidecar JSON file MUST include the "Units" of the phase image. The possible options are "rad" or "arbitrary".

When there is only a magnitude image of a given type, the part entity MAY be omitted.

whereas split already has to do with splitting large files:

split
Full name: Split

Format: split-

Definition: In the case of long data recordings that exceed a file size of 2Gb, .fif files are conventionally split into multiple parts. Each of these files has an internal pointer to the next file. This is important when renaming these split recordings to the BIDS convention.

Instead of a simple renaming, files should be read in and saved under their new names with dedicated tools like MNE-Python, which will ensure that not only the filenames, but also the internal file pointers, will be updated.

It is RECOMMENDED that .fif files with multiple parts use the split- entity to indicate each part. If there are multiple parts of a recording and the optional scans.tsv is provided, all files MUST be listed separately in scans.tsv and the entries for the acq_time column in scans.tsv MUST all be identical, as described in Scans file.

though I can see in your case why you might want to use part, if you are splitting the stimulus up into logical components, like chapters of an audiobook. I don't mind terribly if we use different approaches for this.
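
As an illustration of point 3, the technical fields could be collected into a sidecar so that tools can read them without opening the media file. This is a minimal sketch: the field names are the ones listed above, while `make_sidecar`, the flat layout, the example values, and the implied units (Hz, pixels, seconds) are assumptions, not the schema adopted by this PR.

```python
import json

# Field names as listed in point 3; the flat layout is an assumption.
TECHNICAL_FIELDS = (
    "AudioSampleRate",
    "AudioBitDepth",
    "FrameRate",
    "Height",
    "Width",
    "Duration",
    "CameraPosition",
)


def make_sidecar(**values):
    """Build a sidecar dict holding only the known technical fields."""
    unknown = set(values) - set(TECHNICAL_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    # Keep a stable field order so sidecars are easy to compare.
    return {name: values[name] for name in TECHNICAL_FIELDS if name in values}


# Hypothetical values for a video-only recording.
sidecar = make_sidecar(
    FrameRate=30.0, Width=1920, Height=1080, Duration=12.5, CameraPosition="front"
)
print(json.dumps(sidecar, indent=2))
```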

@yarikoptic

Collaborator

On the audio video vs. audiovideo labeling... since you would be able to parse that from the metadata

which is pretty much the case with every _suffix: metadata and the type of data would let you "figure it out", but the point is to help humans and tools quickly grasp the overall content of the file. In the scope of, e.g., "stimuli" in our https://github.com/ReproNim/reprostim/ project (so not capture of beh, but rather of stimuli), it would help to tell apart audio-video stimuli from pure video capture, and thus potentially make it easier to identify inconsistencies across sessions, etc.

2. _image. I suppose one could take a picture of a subject performing a task. I don't know if that's recording behavior per se, but I'd be fine with adding it if you think we should.

fwiw, I will not miss an opportunity to promote my https://github.com/mykrok, where I capture my photos during behavior tasks ;) On a more serious note, these could be selected frames from a video for feeding into DeepLabCut etc., photos taken by location-specific cameras when the subject approaches that location (e.g. in a maze), etc.

  3. ... I don't think adding attribution to ours makes much sense. It will generally share the license of the rest of the dataset....

FWIW, it could be a video recording of real people from, e.g., YouTube, and thus carry different terms.

This aspect is IMHO an interesting demonstration of the duality of such data (which thus requires coherent annotation): for some, "captured behavior" could be a source of analytics (expressed emotions etc., as was done for the https://studyforrest.org "stimuli", the Forrest Gump movie); for others, it would be used as stimuli (IIRC @mvdoc had that in his fMRI experiment); and for some, both, bringing in the BBQS flavor of incorporating behavioral qualities into analytics over neural data.

  4. ... split vs part

I also feel that _part is more for separating out qualitatively different parts of the larger beast (e.g. "_part-head" vs "_part-feet" if we take a video of a full body and decide to produce such "parts"), whereas _split is for sequential (in time) splitting of a larger recording.

@bendichter

bendichter commented Jan 10, 2026

Contributor Author

@yarikoptic

I do understand why these decisions were made on the stimulus side. My question is specifically about whether we want to make changes to homogenize.

  1. Looking back at the discussion, I don't think there was a strong argument against including audiovideo for this BEP. I'll make the change.

  2. OK, yes, training frames for pose estimation does make sense here. I'll add _image.

  3. Regarding the copyright scenario: it sounds like you're describing a situation where a task recording includes a copyrighted video the subject is watching, and our recording might inherit some of those terms. I think this could happen, but in my judgment it's outside the 80/20 scope. Users can always add custom metadata to indicate this kind of thing if they want.

  4. On _part vs _split: I'd rather extend from the existing definitions of entities in the BIDS schema than from their English meanings. In BIDS, part is specifically for complex signal components: "Allowed label values for this entity are phase, mag, real and imag, which are typically used in part-mag/part-phase or part-real/part-imag pairs of files." That's quite different from body parts or logical segments. As it currently stands, this PR would handle different cameras recording different body parts with the recording entity, which I think fits more closely with the existing definition of recording:

This entity is commonly applied when continuous recordings are from different acquisition instruments, or have different sampling frequencies or start times. For example, physiological recordings with different sampling frequencies may be distinguished using labels like recording-100Hz and recording-500Hz.

I don't see a strong argument for changing this BEP so I'd like to leave it as:

  • task - OPTIONAL for audio and video recordings
  • acq - OPTIONAL, can distinguish different recording setups
  • run - OPTIONAL, for multiple recordings with identical parameters
  • recording - OPTIONAL, to differentiate simultaneous recordings from different angles, locations, or devices
  • split - OPTIONAL, for continuous recordings split into multiple files

and to not use part. Logically different segments in time, like training vs. testing, can be captured using task.
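
The entity list above can be illustrated with a small filename builder. This is a sketch under stated assumptions: `build_name` and `ENTITY_ORDER` are hypothetical, and the ordering of entities within the filename is assumed here for illustration (the schema's entity ordering is authoritative).

```python
# Entities proposed in this PR, plus sub/ses. The order is an assumption
# made for this sketch; consult the schema for the authoritative order.
ENTITY_ORDER = ("sub", "ses", "task", "acq", "run", "split", "recording")


def build_name(suffix: str, extension: str, **entities: str) -> str:
    """Assemble a BIDS-style filename from entity key/value pairs."""
    unknown = set(entities) - set(ENTITY_ORDER)
    if unknown:
        raise ValueError(f"unsupported entities: {sorted(unknown)}")
    if "sub" not in entities:
        raise ValueError("the sub entity is required")
    # Emit key-value pairs in the fixed order, skipping omitted entities.
    parts = [f"{key}-{entities[key]}" for key in ENTITY_ORDER if key in entities]
    return "_".join(parts + [suffix]) + extension


# Two simultaneous camera angles of the same task, told apart by recording.
front = build_name("video", ".mp4", sub="01", task="reach", recording="front")
side = build_name("video", ".mp4", sub="01", task="reach", recording="side")
```

Because `part` is not in `ENTITY_ORDER`, a call such as `build_name("video", ".mp4", sub="01", part="head")` raises a `ValueError`, mirroring the decision not to use `part` here.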

@neuromechanist

Member
  1. On the audio video vs. audiovideo labeling
  2. _image. I suppose one could take a picture of a subject performing a task.

I echo @yarikoptic's points. Also, for image, no one said it has to be only one image ;). Another question here is whether we should allow subdirs to group multiple images or multi-part videos. I think BEP044 allows that (I should double-check and make an example for it).

  3. I think one of our biggest differences is with the metadata in the sidecar files. Yours is attribution (License, Copyright, URL, Description) and this one is technical (AudioSampleRate, FrameRate, Height, Width, Duration, CameraPosition, AudioBitDepth).

Yes, will do. Same for adding the .flac audio type. I believe the file extensions should be the same (or very close) across the two BEPs.

  4. split vs part

BEP044 extends the definition of the part entity:

current def...
For stimulus files, part-<label> can be used to distinguish different parts of a single stimulus, such as
chapters in an audiobook or segments of a long movie (for example, part-1, part-2, part-epilog,
part-chapter1).

My bias comes from the literal meaning of the word: split (according to old Google) means to break or cause to break forcibly into parts, especially into halves or along the grain. Therefore, part is "more general" in the common sense, and does not carry the bias of "especially in halves." Even following the current BIDS definitions, it is assumed that splits are files of the same size. This assumption might be quite salient for behavioral files, as size could be the deciding factor for splitting, but for stimuli there are several other ways to create parts, importantly based on content. For example, stim-zootopia2_part-epilog_audiovideo.mp4 is more meaningful than stim-zootopia2_split-epilog_audiovideo.mp4.

Regarding the copyright scenario:

Please consider that videos containing participants and their responses could often have more restrictions (and therefore, licenses) compared to the main anonymized dataset.

- Add new `_audiovideo` suffix for files containing both audio and video streams
- Update documentation to distinguish between audio-only, video-only, and combined recordings
- Split AudioVideoStreams sidecar table into separate AudioStreams and VideoStreams tables
- Add example files and JSON sidecars for audiovideo recordings
- Update schema suffixes to include audiovideo definition
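
The distinction this commit draws between the three suffixes can be sketched as a function of which streams a file contains. `choose_suffix` is a hypothetical helper for illustration only; how stream presence is detected is left out.

```python
def choose_suffix(has_audio: bool, has_video: bool) -> str:
    """Pick the suffix matching the streams present in a recording."""
    if has_audio and has_video:
        return "audiovideo"
    if has_video:
        return "video"
    if has_audio:
        return "audio"
    raise ValueError("a recording needs at least one audio or video stream")
```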
@bendichter

Contributor Author

@neuromechanist can we please try to keep the discussion here to this PR? We can discuss whether split or part (or both) is more appropriate for stimuli in your thread, but I'd prefer to keep this to what should be allowed here. I would rather not support part for this BEP. Is that OK with you?

Another question here is whether we should allow subdirs to group multiple images, multi-part videos?

I originally had this in and @effigies pushed back, saying that would make this PR substantially more complex. I agree. I'd rather move forward with what we have now. We can add subdirectories later if we need to, since that would be a purely additive change to the schema.

Regarding copyright on participant recordings: that's a fair point. I'll add an optional License field to the sidecar metadata.

…iments

Add `_image` suffix for storing still images captured during behavioral
experiments in the `beh` directory. Changes include:

- Add `.jpg` and `.png` as supported image file extensions
- Document use cases: pose estimation training frames, behavioral setup
  snapshots, and extracted video frames
- Update privacy/PII warnings to include images alongside audio/video
- Add ImageProperties sidecar table and example files
- Update AudioVideoDevice macro to AudioVideoImageDevice
- Add License field to AudioVideoImageDevice sidecar schema
- Update documentation to include images in audio/video section headings
- Add note explaining licensing considerations for recordings containing
  identifiable participant data
@oesteban

oesteban commented Jan 11, 2026

Collaborator

+1 for clarity. But I am afraid the argument is not as strong, especially as BIDS has strived for clear definitions, reproducible metadata, and community-led extensibility. Following this argument, one could ask, since the stimuli/ directory does not have a mandated structure, why other directories should have a structure.

May I ask that this long response be moved to #2296, which @yarikoptic opened for that purpose? Others will likely refuse to answer here so as not to pollute this PR with the tangential discussion, which may lead a confused reader to think that this message from a maintainer is the last word on the issue.

@bendichter

Contributor Author

Would *_image.{png,jpeg,json} work? I know "image" can have many different meanings, but it would be the most convenient term for us to cover photos, screenshots, depth camera photos, etc.

@neuromechanist neuromechanist changed the title from "[ENH] Add audio/video recordings to behavioral experiments" to "[ENH] BEP047 - Add audio/video recordings to behavioral experiments" Feb 8, 2026
@neuromechanist neuromechanist removed the "schema" label (Issues related to the YAML schema representation of the specification. Patch version release.) Feb 8, 2026
body movements, or other behavioral aspects during experimental tasks or rest periods.

Still images captured during behavioral experiments MAY be stored in the `beh` directory
using the `_image` suffix.

Contributor Author

I'm on the fence about image. I mostly want a way to store audio/video data and I don't care as much about images/photos. I don't know whether to include it, and if we do include it, I don't know whether to call it image or photo.

Contributor

An image can be thought of (and indeed sometimes is extracted and saved) as a single frame of a video, so I think we should include it

"Image" is good with me, "photo" implies more about how the image was captured and stored

@yarikoptic

Collaborator

A great use case came from @xiaonansun: video recording of epileptic patients alongside EEG, iEEG, or other (single-unit) recordings.

@yarikoptic

Collaborator

and @mvdm's lab records rodent faces for eye tracking + potential whisker analysis etc!



Development

Successfully merging this pull request may close these issues.

BEP for audio/video capture of behaving subjects

9 participants