test(source-hubspot): add synthetic CRM Search pagination boundary tests #81335
Draft
ZaneHyattAB wants to merge 4 commits into
Adds comprehensive unit tests covering HubSpot CRM Search pagination boundary behavior to investigate OC #12852 (missing/duplicate contacts).

Tests cover:
- Short page + paging.next.after present (PR #80745 bug)
- Empty page + paging.next.after present
- 10k boundary int() conversion with various ID shapes
- Lexicographic gap demonstration (PR #79666 hypothesis)
- Same-timestamp records at pagination boundaries
- Adjacent 30-day slice boundary behavior (GTE/LTE filters)
- Full pagination sequence with patched RECORDS_LIMIT
- Combined bug interaction scenarios
- Verification of both PR hypotheses

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Split tests into Part A (connector code) and Part B (string demos)
- Rename confusing test names (e.g. 'create_overlap' -> 'no_overlap_and_no_gap')
- Add docstrings linking each group to PR #79666 or #80745
- Clarify that tests document broken master behavior, not correct behavior
- Use owner/repo#number references consistently

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
What
Investigation of OC airbytehq/oncall#12852: HubSpot contacts missing/duplicate records at CRM Search pagination boundaries.
This PR adds tests only; it makes no production code changes. The tests document and prove suspected failure modes in the current master code. Some tests intentionally assert broken behavior (e.g., master returning `None` when it should continue paginating). These tests do not pass because master is correct: they pass because they assert what master currently does, which is believed to be wrong.

Two prior draft PRs proposed independent fixes:
- #79666: `int(id)+1` with GTE creates lexicographic gaps at the 10k boundary
- #80745: short pages with a `paging.next.after` cursor cause premature pagination stop

How
34 synthetic unit tests in `test_crm_search_pagination_boundaries.py`, organized into two parts:

Part A: Connector-code tests (exercise the real `HubspotCRMSearchPaginationStrategy`):
- Short pages: master stops when `last_page_size < page_size` even if `paging.next.after` exists (the #80745 bug)
- `int(id)+1` conversion: documents master's ID conversion behavior with various ID shapes (the #79666 bug)

Part B: Pure string/lexicographic demonstrations (no connector code called):
- `str(int("7198")+1)` = `"7199"`: a GTE `"7199"` filter skips IDs like `"719869649082"` (supports the #79666 hypothesis)

What the tests prove vs. what they do not prove
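As a concrete illustration of the two suspected failure modes, here is a minimal standalone sketch. It does not call the connector: `next_gte_bound`, `lexicographic_gap`, `after_cursor`, and `next_cursor_size_based` are hypothetical helpers written for this example, the response dictionaries are synthetic, and the string-comparison behavior of the GTE filter is the unverified #79666 hypothesis rather than confirmed API behavior.

```python
from typing import List, Optional


def next_gte_bound(last_id: str) -> str:
    """Hypothetical helper mirroring the int(id)+1 conversion, e.g. "7198" -> "7199"."""
    return str(int(last_id) + 1)


def lexicographic_gap(ids: List[str], last_id: str) -> List[str]:
    """Return IDs that sort after last_id as strings yet fall below the new GTE bound.

    Assumes the filter compares IDs as strings (the unverified hypothesis).
    """
    bound = next_gte_bound(last_id)
    return [i for i in ids if last_id < i < bound]


def after_cursor(response: dict) -> Optional[str]:
    """Read paging.next.after, or None when the response carries no cursor."""
    return response.get("paging", {}).get("next", {}).get("after")


def next_cursor_size_based(response: dict, page_size: int) -> Optional[str]:
    """Stop on any short page, even when a cursor is present.

    This models the behavior the tests attribute to master; it is not master's code.
    """
    if len(response.get("results", [])) < page_size:
        return None
    return after_cursor(response)


if __name__ == "__main__":
    # Mixed-length IDs: "719869649082" sits between "7198" and "7199" as a string.
    print(lexicographic_gap(["7198", "719869649082", "7199", "72"], "7198"))

    # Synthetic short page that still advertises a next cursor.
    short_page = {"results": [{"id": "57"}], "paging": {"next": {"after": "57"}}}
    print(next_cursor_size_based(short_page, page_size=100), after_cursor(short_page))
```

The size-based check discards a cursor that the response still provides, while reading `paging.next.after` directly would continue. Whether the real API ever returns a short page together with a cursor is one of the points that needs real API testing.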
Proved:
- `int(id)+1` + GTE creates lexicographic gaps with mixed-length IDs

Not proved (would require real API testing):
Review guide
`airbyte-integrations/connectors/source-hubspot/unit_tests/test_crm_search_pagination_boundaries.py`: the only file changed

User Impact
No user impact — tests only, no production code changes.
Can this PR be safely reverted and rolled back?
Requested by: ZaneHyattAB
Link to Devin session: https://app.devin.ai/sessions/8024cac44d9249e1ad4a6011e17b27df