phase2: multi-task AMICA concat to lift k-factor (issue #33) #34

Draft
neuromechanist wants to merge 1 commit into main from feature/issue-33-phase2-multitask

@neuromechanist

Member

Summary

Refines Phase 2 to fix the under-determined ICA seen on PR #31. ThePresent alone gives k = samples / n_chans² = 1.1; the resulting AMICA components collapse to single electrodes (confirmed by visual inspection). The reference pipeline mitigates this by concatenating multiple HBN tasks for the ICA training pool while keeping ThePresent as the analysis target.

This PR adds an IcaTasks option to phase2_amica (default = the four passive-viewing movie tasks: DespicableMe, DiaryOfAWimpyKid, FunwithFractals, ThePresent). This lifts k from 1.1 to ≈3.5 on the 100 Hz local data.
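As a rough illustration of the k-factor arithmetic, the sketch below evaluates k = samples / n_chans² for the nominal durations quoted in the test plan (3.5 min single-task, ~11 min merged) at 100 Hz. The 128-channel count and the exact durations are assumptions for illustration; the values written to qa_amica.csv depend on the samples and channels actually retained after Phase 1, so they need not match these.

```matlab
% k-factor heuristic from the summary: k = samples / n_chans^2.
% All inputs are illustrative assumptions, not values read from qa_amica.csv.
kFactor = @(nSamples, nChans) nSamples ./ nChans.^2;

srate   = 100;         % Hz, the local data rate quoted above
nChans  = 128;         % assumed full channel count
minutes = [3.5, 11];   % nominal ThePresent-only vs merged durations

nSamples = minutes * 60 * srate;    % samples per condition
k = kFactor(nSamples, nChans);

fprintf('single-task k = %.2f, multi-task k = %.2f\n', k(1), k(2));
```

Because k is linear in the sample count for a fixed channel set, adding tasks raises it roughly in proportion to the extra recording time.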

Refs epic #1, closes #33. Blocks PR #32 (Phase 3 derivatives) until merged.

What changes

  • src/matlab/phase2_amica.m: adds the IcaTasks option. On the multi-task path: load all per-task Phase 1 .set files, restrict to the channel intersection, pop_mergeset, train AMICA on the merged data, and transplant the weights onto the Task=ThePresent EEG.
  • src/matlab/+hbn/prepare_multitask_amica_input.m (new): per-subject multi-task pre-merge step. Regenerates missing Phase 1 .set per task via phase1_preprocess.
  • src/matlab/+hbn/apply_ica_weights.m (new): copies icaweights/icasphere/icachansind/icawinv from training EEG onto target EEG; asserts channel-count match.
  • src/matlab/+hbn/write_qa_amica_csv.m: schema gains ica_tasks, ica_samples, k_factor, common_channels.
  • tests/matlab/test_phase2_smoke.m: passes IcaTasks="ThePresent" so the fixture-based smoke test stays on the single-task path (fixture has only ThePresent).
  • scripts/run_phase2_three_subjects_multitask.m (new): production 3-subject runner.

Why this isn't a brief violation

The CLAUDE.md rule "Never generalize to other HBN movies in this project (ThePresent only)" concerns the analysis scope: the contrast, epoching, ERSP, and stats all stay ThePresent-only. The reference pipeline study_handy_scripts.m explicitly uses task_group = ["surroundSupp", "RestingState", "DespicableMe", "ThePresent", "FunwithFractals", "DiaryOfAWimpyKid"] for the ICA training pool and calls pop_mergeset before runamica17_nsg. We do the same, scoped to the four movie tasks for ecological consistency.

Local verification

test_phase2_smoke: OK (118 ICs, 118 dipoles, median RV 0.436)

(Single-task fallback path; the wiring smoke test only exercises one task.)

Test plan

  • Single-task smoke green on the fixture.
  • CI green on this branch.
  • 3-subject multi-task AMICA run (in flight). Expected wall-time per subject: 1-2 h (≈3x ThePresent-only, since the merged data is ~11 min vs 3.5 min and AMICA scales near-linearly in samples).
  • Verify k_factor in qa_amica.csv lifted from 1.1 to ~3-4.
  • eeg-qa-neuroscientist Phase 2 review: topographies look component-like (bilateral / dipolar), not single-electrode; median RV improved.
  • Then unblock PR #32 (phase3: iclabel classification + non-brain IC flagging): re-run Phase 3 on the better weights.

Status

Draft. PR body will be updated with derivatives + QA review once the 3-subject run completes.

ThePresent alone gives k = samples/n_chans^2 = 1.1 for 128-channel ICA;
standard ICA wants k >= 20. Under-determined AMICA produces components
that collapse to single electrodes (visual inspection on PR #31's 3
subjects confirms). Reference pipeline mitigates by concatenating
multiple HBN tasks for ICA training while keeping ThePresent as the
analysis target.

Adds IcaTasks opt to phase2_amica (default: the four passive-viewing
movie tasks DespicableMe, DiaryOfAWimpyKid, FunwithFractals,
ThePresent). When |IcaTasks| > 1: ensure Phase 1 .set exists for each
task (regenerate via phase1_preprocess if missing), load all, restrict
to common channel intersection, pop_mergeset, train AMICA on merged
data, transplant weights onto the Task=ThePresent EEG via
hbn.apply_ica_weights, then dipfit + figures + checkpoint as before.

src/matlab/phase2_amica.m
- IcaTasks (1,:) string default 4 movie tasks. Set to [Task] for the
  legacy single-task path (smoke test uses this).
- BidsRoot opt added so prepare_multitask_amica_input can regenerate
  missing Phase 1 .set files.
- qaRow now carries ica_tasks, ica_samples, k_factor, common_channels.

src/matlab/+hbn/prepare_multitask_amica_input.m (new)
- For each task in IcaTasks: load Phase 1 .set or regenerate via
  phase1_preprocess. Intersect channel labels across tasks
  (>= 32 channels required). pop_select to intersection. pop_mergeset.
  Returns merged EEG + info struct.
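
The channel-intersection step can be sketched in base MATLAB as below. This is a minimal illustration, not the code in prepare_multitask_amica_input.m: the label lists are hypothetical stand-ins for {EEG.chanlocs.labels}, and the minimum-channel threshold is lowered from 32 so the toy data passes.

```matlab
% Hypothetical per-task channel labels (stand-ins for {EEG.chanlocs.labels}).
taskLabels = { ...
    {'E1','E2','E3','E4','E5'}, ...
    {'E1','E2','E4','E5'}, ...
    {'E2','E3','E4','E5'}};
minChannels = 3;   % the PR requires >= 32; lowered here for the toy data

% Fold the label lists down to their intersection.
common = taskLabels{1};
for t = 2:numel(taskLabels)
    % ismember preserves the channel order of the first task
    common = common(ismember(common, taskLabels{t}));
end

assert(numel(common) >= minChannels, ...
    'Only %d common channels; need at least %d.', numel(common), minChannels);
disp(common);
```

In the actual step, each task's EEG would then be reduced to these labels with pop_select before pop_mergeset, as described above.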

src/matlab/+hbn/apply_ica_weights.m (new)
- Copies icaweights/icasphere/icachansind/icawinv from sourceEEG to
  targetEEG, asserting channel-count match. After multi-task AMICA
  the target ThePresent EEG must already be on the same channel
  intersection (pop_select before the call).
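
A minimal sketch of the weight transplant, assuming plain structs in place of full EEGLAB datasets. The field names follow the list above; clearing icaact and the toy matrices are assumptions for illustration, not a copy of hbn.apply_ica_weights.

```matlab
% Minimal stand-ins for EEGLAB structs; real ones carry many more fields.
nCh = 4;
sourceEEG = struct('nbchan', nCh, ...
    'icaweights', eye(nCh), 'icasphere', 2*eye(nCh), ...
    'icawinv', 0.5*eye(nCh), 'icachansind', 1:nCh);
targetEEG = struct('nbchan', nCh, 'data', randn(nCh, 1000), ...
    'icaweights', [], 'icasphere', [], 'icawinv', [], ...
    'icachansind', [], 'icaact', []);

% Guard: the decomposition only makes sense on a matching channel set.
assert(sourceEEG.nbchan == targetEEG.nbchan, ...
    'Channel count mismatch: source %d vs target %d.', ...
    sourceEEG.nbchan, targetEEG.nbchan);

% Copy the four decomposition fields from the training EEG.
fields = {'icaweights', 'icasphere', 'icachansind', 'icawinv'};
for f = 1:numel(fields)
    targetEEG.(fields{f}) = sourceEEG.(fields{f});
end
targetEEG.icaact = [];   % assumption: drop stale activations so they are recomputed

% Component activations on the target follow from the transplanted weights.
act = targetEEG.icaweights * targetEEG.icasphere * ...
    targetEEG.data(targetEEG.icachansind, :);
```

A channel-count check does not by itself catch a reordered montage; comparing channel labels as well would be a stricter guard.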

src/matlab/+hbn/write_qa_amica_csv.m
- Schema gains ica_tasks, ica_samples, k_factor, common_channels.
  Insertion-order placement keeps participant_id, status, ica_method
  as the leftmost columns for readability.

tests/matlab/test_phase2_smoke.m
- Pass IcaTasks="ThePresent" so the fixture-based smoke test stays
  on the single-task path (the fixture only has ThePresent).
- requiredFields includes IcaTasks.

scripts/run_phase2_three_subjects_multitask.m (new)
- Production runner. Regenerates Phase 1 ThePresent .set if missing.
  Per-task .sets for the other 3 movie tasks are produced lazily by
  prepare_multitask_amica_input.

Refs #1, refs #33.

Development

Successfully merging this pull request may close these issues.

Phase 2 refinement: multi-task concat AMICA (improve k-factor)
