Skip to content

Pull Request: Fix n parameter in Source.sample()#505

Open
anna-SN99 wants to merge 2 commits intouktrade:mainfrom
anna-SN99:bugfix/issue-487
Open

Pull Request: Fix n parameter in Source.sample()#505
anna-SN99 wants to merge 2 commits intouktrade:mainfrom
anna-SN99:bugfix/issue-487

Conversation

@anna-SN99
Copy link
Copy Markdown

🛠️ Changes proposed in this pull request

  1. Added limit parameter to Location.execute() in src/matchbox/client/locations.py
  • New optional limit: int | None = None parameter to restrict query results
  • Wraps SQL in a LIMIT subquery when limit is specified
  • Updated all overload signatures and implementation
  1. Updated Source.fetch() and Source.sample() methods in src/matchbox/client/sources.py
  • Added limit parameter to all fetch() overloads and implementation
  • Fixed sample() method to properly respect the n parameter instead of using batch_size
  • Added input validation for n:
    - Raises TypeError if n is not an integer (e.g., strings, floats)
    - Raises ValueError if n is not positive (≤ 0)
  • Fixed return type annotation from None to QueryReturnClass
  1. Added comprehensive tests in test/client/test_locations.py and test/client/test_sources.py
  • Test for limit parameter in execute()
  • New test_source_sample_validation() covering:
    • String input validation
    • Negative/zero value validation
    • Float input validation
    • Valid positive integer behavior

👀 Guidance to review

Key changes to focus on:

  • The LIMIT clause implementation in locations.py uses SQL subqueries to ensure portability across different database engines
  • Input validation in sample() now provides clear error messages for invalid inputs
  • The fix maintains backward compatibility for correct usage (positive integers)

🤖 AI declaration

Claude Opus (via VS Code) was used to:

  • I suggested the broader scope of changes (extending fix to locations.py in addition to sources.py)
  • It recommended and helped implement comprehensive test coverage
  • I ran pre-commit checks
  • The implementation and logic were reviewed manually to ensure correctness. Tests were added to validate the fix works as intended and catches invalid inputs.

✅ Checklist:

  • This is the smallest, simplest solution to the problem
    If you want me to remove the test, that would be simpler. Just let me know.
  • I've read our code standards and this code follows them
  • All new code is tested
  • I've updated all relevant documentation (select all that apply)
    • API documentation (docstrings and indexes)
    • Tutorials
    • Developer docs

@anna-SN99 anna-SN99 requested a review from a team as a code owner February 26, 2026 22:32
@anna-SN99 anna-SN99 marked this pull request as draft February 27, 2026 00:39
@anna-SN99 anna-SN99 marked this pull request as ready for review February 27, 2026 00:39
@anna-SN99
Copy link
Copy Markdown
Author

BTW this is is AnnaSmoot-NOAA. I just realized I needed to switch from my work account to personal account.

@leo-mazzone
Copy link
Copy Markdown
Contributor

leo-mazzone commented Mar 3, 2026

Thanks for your contribution, this is a clean looking PR.

After trying to reproduce the issue locally, with the help of your test, it seems that the sample function works even without your new limit parameter when we use the SQLAlchemy engines. The issue arose specifically when using ADBC as the client for that location. polars.read_database doesn't seem to forward a batch size to ADBC.

Trying to get batch_size to work as intended for ADBC would make this a much trickier issue to solve. However, perhaps your solution is still valid, though may I suggest at least adding a note to the docstring of Source.fetch(), where the batch_size parameter is described, to mention it might not be honoured depending on the location's client?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants