Skip to content

Fix #10632: RandomNodeSplit fails on HeteroData#10654

Open
JiwaniZakir wants to merge 2 commits intopyg-team:masterfrom
JiwaniZakir:fix/10632-randomnodesplit-fails-on-heterodata
Open

Fix #10632: RandomNodeSplit fails on HeteroData#10654
JiwaniZakir wants to merge 2 commits intopyg-team:masterfrom
JiwaniZakir:fix/10632-randomnodesplit-fails-on-heterodata

Conversation

@JiwaniZakir
Copy link
Copy Markdown
Contributor

Closes #10632

Before

RandomNodeSplit.__call__ in torch_geometric/transforms/random_node_split.py skipped any node store that lacked self.key (default: 'y'), regardless of the split mode. On HeteroData, this caused split='train_rest' to silently skip every node type that had no y attribute, producing a transformed object with no masks at all. Accessing .train_mask on the result then raised AttributeError: 'HeteroData' has no attribute 'train_mask'.

After

The skip condition in random_node_split.py (line 77) now gates on self.split != 'train_rest' in addition to the key check:

if (self.split != 'train_rest' and self.key is not None
        and not hasattr(store, self.key)):
    continue

train_rest distributes all remaining nodes to the training set without needing class labels, so it correctly processes every node store in a HeteroData object — including those without a y attribute (e.g., author nodes in MovieLens1M or OGB_MAG). test_rest and random still require the key to be present because they rely on per-class counts.

Changes

  • torch_geometric/transforms/random_node_split.py: Added self.split != 'train_rest' guard to the early-continue condition so unlabeled node stores are not skipped when the split mode doesn't require y.
  • test/transforms/test_random_node_split.py: Rewrote test_random_node_split_on_hetero_data to:
    • Increase author nodes from 300 → 1000 so num_val=100, num_test=200 fits within the store.
    • Explicitly test train_rest on HeteroData with both a labeled (paper) and unlabeled (author) node type, asserting exact mask cardinalities.
    • Add a second sub-test for test_rest confirming that author (no y) is still skipped while paper (has y) receives masks.

Testing

The updated test_random_node_split_on_hetero_data asserts:

data['author']: 4 attrs  → x, train_mask, val_mask, test_mask
data['paper']:  5 attrs  → x, y, train_mask, val_mask, test_mask

paper.train_mask.sum() == 1700   # 2000 - 100 - 200
paper.val_mask.sum()   == 100
paper.test_mask.sum()  == 200

author.train_mask.sum() == 700   # 1000 - 100 - 200
author.val_mask.sum()   == 100
author.test_mask.sum()  == 200

For test_rest with num_train_per_class=10, num_val=100, author (no y) is skipped (len == 1) and paper receives all three masks (len == 5).


This PR was created with AI assistance (Claude). The changes were reviewed by quality gates and a critic model before submission.

`split='train_rest'` does not use the `key` attribute at all, so the
guard that skips node stores lacking `key` (e.g. `y`) was overly
restrictive.  Node types without a label tensor were silently skipped,
leaving HeteroData graphs (such as MovieLens1M) with no masks set.

Scope the guard to only `test_rest` and `random` splits, which actually
require class labels, and update the HeteroData test accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.18%. Comparing base (c211214) to head (623ce43).
⚠️ Report is 191 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10654      +/-   ##
==========================================
- Coverage   86.11%   84.18%   -1.93%     
==========================================
  Files         496      510      +14     
  Lines       33655    36019    +2364     
==========================================
+ Hits        28981    30323    +1342     
- Misses       4674     5696    +1022     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RandomNodeSplit fails on HeteroData

1 participant