Skip to content

test(source-hubspot): add synthetic CRM Search pagination boundary tests#81335

Draft
ZaneHyattAB wants to merge 4 commits into
masterfrom
devin/1782837632-hubspot-crm-search-pagination-boundary-tests
Draft

test(source-hubspot): add synthetic CRM Search pagination boundary tests#81335
ZaneHyattAB wants to merge 4 commits into
masterfrom
devin/1782837632-hubspot-crm-search-pagination-boundary-tests

Conversation

@ZaneHyattAB

@ZaneHyattAB ZaneHyattAB commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

What

Investigation of OC airbytehq/oncall#12852: HubSpot contacts missing/duplicate records at CRM Search pagination boundaries.

This PR adds tests only — no production code changes. The tests document and prove suspected failure modes in the current master code. Some tests intentionally assert broken behavior (e.g., master returning None when it should continue paginating). These are not "passing because master is correct" — they pass because they assert what master currently does, which is believed to be wrong.

Two prior draft PRs proposed independent fixes:

  • airbytehq/airbyte#79666: hypothesized that int(id)+1 with GTE creates lexicographic gaps at the 10k boundary
  • airbytehq/airbyte#80745: hypothesized that short/empty pages with a valid paging.next.after cursor cause premature pagination stop

How

34 synthetic unit tests in test_crm_search_pagination_boundaries.py, organized into two parts:

Part A — Connector-code tests (exercise real HubspotCRMSearchPaginationStrategy):

  • A.1: Short/empty page early-stop — documents that master stops when last_page_size < page_size even if paging.next.after exists (the #80745 bug)
  • A.2: 10k boundary int(id)+1 conversion — documents master's ID conversion behavior with various ID shapes (the #79666 bug)
  • A.3: Same-timestamp records at page/10k boundaries
  • A.4: RECORDS_LIMIT threshold edge cases
  • A.5: Full pagination sequences with patched RECORDS_LIMIT
  • A.6: Combined scenario showing how the short-page bug masks the 10k boundary bug
  • A.7: Verification that #80745 does NOT fix the ID gap (the two bugs are independent)
  • A.8: String cursor edge case

Part B — Pure string/lexicographic demonstrations (no connector code called):

  • B.1: Proves GTE str(int("7198")+1) = GTE "7199" skips IDs like "719869649082" (supports #79666 hypothesis)
  • B.2: Manifest template rendering creates string comparison
  • B.3: Adjacent 30-day slice boundaries — confirms no overlap or gap (slice logic is NOT a source of missing records)

What the tests prove vs. what they do not prove

Proved:

  • Master stops pagination prematurely on short/empty pages with valid cursors
  • Master's int(id)+1 + GTE creates lexicographic gaps with mixed-length IDs
  • The two bugs are independent — fixing one does not fix the other
  • The short-page bug can mask the ID-gap bug (fires first, prevents reaching 10k boundary)
  • 30-day slice boundaries are NOT a source of missing/duplicate records

Not proved (would require real API testing):

  • Whether HubSpot actually returns short pages with valid cursors in production
  • The exact frequency of mixed-length ID scenarios in real portals
  • Whether other undiscovered boundary conditions exist

Review guide

  1. airbyte-integrations/connectors/source-hubspot/unit_tests/test_crm_search_pagination_boundaries.py — the only file changed

User Impact

No user impact — tests only, no production code changes.

Can this PR be safely reverted and rolled back?

  • YES 💚

Requested by: ZaneHyattAB

Link to Devin session: https://app.devin.ai/sessions/8024cac44d9249e1ad4a6011e17b27df

Adds comprehensive unit tests covering HubSpot CRM Search pagination
boundary behavior to investigate OC #12852 (missing/duplicate contacts).

Tests cover:
- Short page + paging.next.after present (PR #80745 bug)
- Empty page + paging.next.after present
- 10k boundary int() conversion with various ID shapes
- Lexicographic gap demonstration (PR #79666 hypothesis)
- Same-timestamp records at pagination boundaries
- Adjacent 30-day slice boundary behavior (GTE/LTE filters)
- Full pagination sequence with patched RECORDS_LIMIT
- Combined bug interaction scenarios
- Verification of both PR hypotheses

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@ZaneHyattAB ZaneHyattAB self-assigned this Jun 30, 2026
@devin-ai-integration

Copy link
Copy Markdown
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

@github-actions

Copy link
Copy Markdown
Contributor

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • 🛠️ Quick Fixes
    • /format-fix - Fixes most formatting issues.
    • /bump-version - Bumps connector versions, scraping changelog description from the PR title.
      • Bump types: patch (default), minor, major, major_rc, rc, promote.
      • The rc type is a smart default: applies minor_rc if stable, or bumps the RC number if already RC.
      • The promote type strips the RC suffix to finalize a release.
      • Example: /bump-version type=rc or /bump-version type=minor
    • /bump-progressive-rollout-version - Alias for /bump-version type=rc. Bumps with an RC suffix and enables progressive rollout.
  • ❇️ AI Testing and Review (internal link: AI-SDLC Docs):
    • /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
    • /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
    • /ai-review - AI-powered PR review for connector safety and quality gates.
  • 📝 AI Documentation:
    • /ai-docs-review - AI-powered documentation review for PRs with connector changes.
    • /ai-create-docs-pr - Creates a documentation PR for connector changes, stacked on the current PR.
  • 🚀 Connector Releases:
    • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
  • ☕️ JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      Example: /update-connector-cdk-version connector=destination-bigquery
  • 🐍 Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
  • ⚙️ Admin commands:
    • /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
      Example: /force-merge reason="CI is flaky, tests pass locally"
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

source-hubspot Connector Test Results

240 tests   232 ✅  12m 57s ⏱️
  2 suites    8 💤
  2 files      0 ❌

Results for commit 3fcaabb.

♻️ This comment has been updated with latest results.

- Split tests into Part A (connector code) and Part B (string demos)
- Rename confusing test names (e.g. 'create_overlap' -> 'no_overlap_and_no_gap')
- Add docstrings linking each group to PR #79666 or #80745
- Clarify that tests document broken master behavior, not correct behavior
- Use owner/repo#number references consistently

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants