-
Notifications
You must be signed in to change notification settings - Fork 5.2k
feat(source-mixpanel): add optional stream filtering to speed up schema discovery #81343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
a8b8c5b
7a27867
0e25744
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,4 @@ | ||
| version: 0.80.0 | ||
| version: 0.81.0 | ||
| type: DeclarativeSource | ||
|
|
||
| definitions: | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -145,12 +145,17 @@ def streams(self, config: Mapping[str, Any]) -> List[Stream]: | |
| else: | ||
| credentials["option_title"] = "Service Account" | ||
|
|
||
| streams = super().streams(config=config) | ||
| selected_streams = config.get("streams") | ||
|
|
||
| all_streams = super().streams(config=config) | ||
|
|
||
| config_transformed = copy.deepcopy(config) | ||
| config_transformed = self._validate_and_transform(config_transformed) | ||
| auth = self.get_authenticator(config) | ||
|
|
||
| streams.append(Export(authenticator=auth, **config_transformed)) | ||
| all_streams.append(Export(authenticator=auth, **config_transformed)) | ||
|
|
||
| if selected_streams: | ||
| all_streams = [s for s in all_streams if s.name in selected_streams] | ||
|
Comment on lines
+148
to
+159
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🚩 No tests added for the new streams filtering feature The PR adds a new config parameter and filtering logic but does not include any unit tests for the behavior. The existing Was this helpful? React with 👍 or 👎 to provide feedback.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ☑️ Resolved in 0e25744. Added a parametrized |
||
|
|
||
| return streams | ||
| return all_streams | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚩 Check stream interaction with the new streams filter is CDK-dependent
The manifest defines
check.stream_names: [cohorts]atsource_mixpanel/manifest.yaml:800-801. If a user sets thestreamsconfig to a subset that excludescohorts, the behavior ofcheck_connectiondepends on whether the CDK resolves check streams from the manifest definitions directly or from the output ofstreams(). If the CDK usesstreams(), the check would fail becausecohortswould be filtered out. In practice, declarative source check resolution typically resolves streams independently fromstreams(), so this is likely fine, but worth verifying with CDK documentation if this feature is used widely.Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 Not fixing. The CDK's
CheckStreamresolves check streams from the manifest definitions directly, not from the output ofstreams(). Thecheck_connectioncall intest_source.pyhits the cohorts endpoint (/api/query/cohorts/list) regardless of whatstreams()returns. So filtering via thestreamsconfig won't affect the check —cohortswill always be checked via the manifest-defined check stream.Worth noting in docs if this feature sees wide adoption, but not a code-level concern.
Devin session