Fix MovieLens dataset incompatibility with SentenceTransformers ≥5.x #10668

Open
drivanov wants to merge 4 commits into pyg-team:master from drivanov:movie_lens

@drivanov
Contributor

Summary

This PR fixes a runtime error in the MovieLens dataset when used with newer versions of sentence-transformers (≥5.x).
The issue arises from passing a NumPy array (df['title'].values) to SentenceTransformer.encode(), which leads to incorrect modality inference and raises:

ValueError: Modality 'audio' is not supported by this SentenceTransformer model.

Root Cause

Recent versions of SentenceTransformers introduced modality-aware input handling.
When a NumPy array is passed:

df['title'].values  # np.ndarray(dtype=object)

the library may misinterpret the input as a non-text modality (e.g., audio) rather than text.
For text inputs, encode() expects a List[str].

Fix

Convert the input to an explicit list of strings before encoding:
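
The conversion could look roughly like the sketch below. It is an illustration only, not the PR's actual diff: the helper name `to_text_list` is hypothetical, and the commented usage assumes a DataFrame `df` with a `title` column and an already loaded SentenceTransformer `model`.

```python
from typing import Any, Iterable, List


def to_text_list(values: Iterable[Any]) -> List[str]:
    """Return a plain Python list of str built from array-like input.

    Hypothetical helper for illustration; it is not part of PyG or
    sentence-transformers.
    """
    # NumPy arrays and pandas Series both expose tolist(); calling it first
    # turns the container into a native Python list.
    if hasattr(values, "tolist"):
        values = values.tolist()
    # Force every element to str so the encoder only ever sees text.
    return [str(v) for v in values]


# Assumed usage in the dataset's processing step (names are assumptions):
#   titles = to_text_list(df['title'].values)
#   emb = model.encode(titles)
titles = to_text_list(("Toy Story (1995)", "Jumanji (1995)"))
```

If the column is already known to contain only strings, calling `df['title'].tolist()` directly would likely be enough; the explicit `str()` call is a defensive choice for columns that may hold missing or non-string values.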

@codecov

codecov bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.18%. Comparing base (c211214) to head (b988ba9).
⚠️ Report is 194 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10668      +/-   ##
==========================================
- Coverage   86.11%   84.18%   -1.93%     
==========================================
  Files         496      510      +14     
  Lines       33655    36022    +2367     
==========================================
+ Hits        28981    30325    +1344     
- Misses       4674     5697    +1023     
