[axonius][identity] Add Axonius Identity datastream#16620
[axonius][identity] Add Axonius Identity datastream#16620muskan-agarwal26 wants to merge 6 commits intoelastic:feature/axonius-0.1.0from
Conversation
|
Package microsoft_exchange_online_message_trace - 1.29.1 containing this change is available at https://epr.elastic.co/package/microsoft_exchange_online_message_trace/1.29.1/ |
|
Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as |
|
Hi! This PR has been stale for a while and we're going to close it as part of our cleanup procedure. We appreciate your contribution and would like to apologize if we have not been able to review it, due to the current heavy load of the team. Feel free to re-open this PR if you think it should stay open and is worth rebasing. Thank you for your contribution! |
|
Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as |
…ntegrations into datastream-identity
| : | ||
| string(resp.Status) + " (" + string(resp.StatusCode) + ")" | ||
| ), | ||
| "asset_type": string(state.worklist.asset_type_list[0]), |
There was a problem hiding this comment.
Why does this need a string conversion?
There was a problem hiding this comment.
No need, removing string conversion for this from everywhere in code.
| on_failure: | ||
| - append: | ||
| field: error.message | ||
| value: 'Processor {{{_ingest.on_failure_processor_type}}} with tag {{{_ingest.on_failure_processor_tag}}} in pipeline {{{_ingest.on_failure_pipeline}}} failed with message: {{{_ingest.on_failure_message}}}' |
There was a problem hiding this comment.
Check that all date and convert processors will remove the field value on failure. This field is defined as a date so if it fails we have left an unmappable value in place.
There was a problem hiding this comment.
Can this be more concisely/manageably expressed with some/most of the processors replaced with a script
| - version: 0.1.1 | ||
| changes: | ||
| - description: Add support for identity data stream. | ||
| type: enhancement | ||
| link: https://github.com/elastic/integrations/pull/16620 |
There was a problem hiding this comment.
This should be in the 0.1.1 changes. This is not being merged into main in this PR.
| name: axonius | ||
| title: Axonius | ||
| version: 0.1.0 | ||
| version: 0.1.1 |
There was a problem hiding this comment.
| version: 0.1.1 | |
| version: 0.1.0 |
| "id": "axonius-60472232-ca7b-45e6-9fa6-72e6efc41a8e", | ||
| "references": [ | ||
| { | ||
| "id": "metrics-*", |
| "type": "index-pattern" | ||
| }, | ||
| { | ||
| "id": "metrics-*", |
There was a problem hiding this comment.
logs? Also, duplicate of the index-pattern at L1275.
| { | ||
| "id": "logs-*", | ||
| "name": "kibanaSavedObjectMeta.searchSourceJSON.filter[1].meta.index", | ||
| "type": "index-pattern" | ||
| } |
There was a problem hiding this comment.
Duplicate of pattern at L1279.
| name: '{{ IngestPipeline "pipeline-account" }}' | ||
| tag: pipeline-account | ||
| if: >- | ||
| ctx.axonius?.identity?.asset_type.contains('accounts') |
There was a problem hiding this comment.
🟠 High ingest_pipeline/default.yml:605
The pipeline routing conditions on lines 605, 610, 615, and 620 use ctx.axonius?.identity?.asset_type.contains(...) which throws a NullPointerException when asset_type is null. The optional chaining only guards axonius and identity; once identity exists, the expression evaluates to null.contains(...) when asset_type is missing. This causes the entire pipeline to fail and fall through to on_failure, leaving documents incompletely processed. Consider adding null guards like ctx.axonius?.identity?.asset_type != null && ctx.axonius.identity.asset_type.contains('accounts') for all four conditions.
- ctx.axonius?.identity?.asset_type.contains('accounts')
+ ctx.axonius?.identity?.asset_type != null && ctx.axonius.identity.asset_type.contains('accounts')🤖 Copy this AI Prompt to have your agent fix this:
In file packages/axonius/data_stream/identity/elasticsearch/ingest_pipeline/default.yml around line 605:
The pipeline routing conditions on lines 605, 610, 615, and 620 use `ctx.axonius?.identity?.asset_type.contains(...)` which throws a `NullPointerException` when `asset_type` is null. The optional chaining only guards `axonius` and `identity`; once `identity` exists, the expression evaluates to `null.contains(...)` when `asset_type` is missing. This causes the entire pipeline to fail and fall through to `on_failure`, leaving documents incompletely processed. Consider adding null guards like `ctx.axonius?.identity?.asset_type != null && ctx.axonius.identity.asset_type.contains('accounts')` for all four conditions.
There was a problem hiding this comment.
This assessment is correct; there is no previous condition or action that guarantees that asset_type is not absent or null. An alternative fix is to use ctx.axonius?.identity?.asset_type?.contains('accounts') == true.
There was a problem hiding this comment.
Thank you for confirming the issue and suggesting the alternative approach. The ?.contains(...) == true pattern is indeed cleaner and more concise.
Would you like me to fix all four conditions (lines 605, 610, 615, and 620) using this approach?
There was a problem hiding this comment.
Leave all fixes for the author.
There was a problem hiding this comment.
Understood. I'll leave this for the author to fix.
There was a problem hiding this comment.
Keeping it as is.
There was a problem hiding this comment.
Sorry, I'm unable to act on this request because you do not have permissions within this repository.
| type: boolean | ||
| description: Indicates if auto-delete days for cloud recordings is configured. |
There was a problem hiding this comment.
🟡 Medium fields/fields.yml:971
axonius.identity.recording.auto_delete_cmr_days is defined as type: boolean, but the field name indicates it should store a numeric day count. When the Zoom API sends an integer like 30, the ingest pipeline's boolean conversion coerces it to true, silently dropping the actual value. Consider changing the type to long and removing this field from the boolean conversion script in pipeline-user.yml.
- name: auto_delete_cmr_days
- type: boolean
+ type: long🤖 Copy this AI Prompt to have your agent fix this:
In file packages/axonius/elasticsearch/transform/latest_identity/fields/fields.yml around lines 971-972:
`axonius.identity.recording.auto_delete_cmr_days` is defined as `type: boolean`, but the field name indicates it should store a numeric day count. When the Zoom API sends an integer like `30`, the ingest pipeline's boolean conversion coerces it to `true`, silently dropping the actual value. Consider changing the type to `long` and removing this field from the boolean conversion script in `pipeline-user.yml`.
There was a problem hiding this comment.
Do we have a test that exercises this? Is it documented somewhere? I don't see it in any tests.
There was a problem hiding this comment.
No tests currently exercise this field. I found it in the boolean conversion script at pipeline-user.yml line 775, where it's listed alongside other recording.* fields to be converted to boolean. The field is documented as type boolean with description "Indicates if auto-delete days for cloud recordings is configured." However, the field name auto_delete_cmr_days suggests it should store the actual day count (e.g., 30, 60, 90), not just a boolean flag—that's what the companion field auto_delete_cmr is for.
There was a problem hiding this comment.
The comment was actually directed to @muskan-agarwal26.
There was a problem hiding this comment.
yes, in live logs, we are getting value as true or false, hence it is kept boolean
There was a problem hiding this comment.
Sorry, I'm unable to act on this request because you do not have permissions within this repository.
| - foreach: | ||
| field: axonius.identity.user_count_link | ||
| tag: foreach_user_count_link_value | ||
| if: ctx.axonius?.identity?.user_count_link instanceof List | ||
| processor: | ||
| append: | ||
| field: related.user | ||
| tag: append_user_count_link_value_into_related_user | ||
| value: '{{{_ingest._value.value}}}' | ||
| allow_duplicates: false |
There was a problem hiding this comment.
🟡 Medium ingest_pipeline/pipeline-group.yml:88
The append processor at line 96 uses '{{{_ingest._value.value}}}' without an if condition or ignore_missing check, so when a user_count_link element lacks a value field or has an empty/null value, the mustache template resolves to an empty string that gets appended to related.user, polluting the array with empty entries. Other foreach loops in this file use ignore_missing: true on their inner processors; this one lacks equivalent protection. Consider adding if: ctx._ingest?._value?.value != null && ctx._ingest._value.value != '' to the append processor.
- foreach:
field: axonius.identity.user_count_link
tag: foreach_user_count_link_value
if: ctx.axonius?.identity?.user_count_link instanceof List
processor:
append:
field: related.user
tag: append_user_count_link_value_into_related_user
value: '{{{_ingest._value.value}}}'
+ if: ctx._ingest?._value?.value != null && ctx._ingest._value.value != ''
allow_duplicates: falseAlso found in 1 other location(s)
packages/axonius/data_stream/identity/elasticsearch/ingest_pipeline/pipeline-user.yml:118
At line 118, the
appendprocessor inside theforeachoveraxonius.identity.associated_groupsappends'{{{_ingest._value.display_name}}}'torelated.userwithout any null/empty guard on the inner processor. If any group element has a missing or nulldisplay_name, the Mustache template renders to an empty string"", which gets appended to therelated.userarray. Other similarforeach+appendpatterns in the pipeline (e.g., lines 411-416 forfirst_name) have explicitifconditions, but this one does not.
🤖 Copy this AI Prompt to have your agent fix this:
In file packages/axonius/data_stream/identity/elasticsearch/ingest_pipeline/pipeline-group.yml around lines 88-97:
The `append` processor at line 96 uses `'{{{_ingest._value.value}}}'` without an `if` condition or `ignore_missing` check, so when a `user_count_link` element lacks a `value` field or has an empty/null value, the mustache template resolves to an empty string that gets appended to `related.user`, polluting the array with empty entries. Other `foreach` loops in this file use `ignore_missing: true` on their inner processors; this one lacks equivalent protection. Consider adding `if: ctx._ingest?._value?.value != null && ctx._ingest._value.value != ''` to the `append` processor.
Also found in 1 other location(s):
- packages/axonius/data_stream/identity/elasticsearch/ingest_pipeline/pipeline-user.yml:118 -- At line 118, the `append` processor inside the `foreach` over `axonius.identity.associated_groups` appends `'{{{_ingest._value.display_name}}}'` to `related.user` without any null/empty guard on the inner processor. If any group element has a missing or null `display_name`, the Mustache template renders to an empty string `""`, which gets appended to the `related.user` array. Other similar `foreach`+`append` patterns in the pipeline (e.g., lines 411-416 for `first_name`) have explicit `if` conditions, but this one does not.
There was a problem hiding this comment.
This is not possible; conditionals in foreach processors are not usable. See ref.
There was a problem hiding this comment.
You're correct—my research confirms the codebase avoids inner conditionals in foreach processors due to this limitation. The alternative would be a script processor to check for null/empty values before appending, but that adds complexity. Is the risk of empty strings in related.user acceptable here, or would you like me to implement a script-based solution?
There was a problem hiding this comment.
any empty value will be dropped including the object in which it populates, so there is no need of if condition here
There was a problem hiding this comment.
Sorry, I'm unable to act on this request because you do not have permissions within this repository.
efd6
left a comment
There was a problem hiding this comment.
Please address https://github.com/elastic/integrations/pull/16620/changes#r3107948170 and the macroscopeapp concerns.
| { | ||
| "id": "logs-*", | ||
| "name": "kibanaSavedObjectMeta.searchSourceJSON.filter[0].meta.index", | ||
| "type": "index-pattern" | ||
| }, | ||
| { | ||
| "id": "logs-*", | ||
| "name": "kibanaSavedObjectMeta.searchSourceJSON.filter[1].meta.index", | ||
| "type": "index-pattern" | ||
| } |
There was a problem hiding this comment.
This duplication still exists.
There was a problem hiding this comment.
I had reverted that but after exporting the dashboard again, the entry got added again
also, I had a look in other integration's dashboards, they have duplicate entries similarly
|
@efd6 I have addressed the comment: https://github.com/elastic/integrations/pull/16620/changes#r3107948170 , used script processors to reduce the count of processors. |
Proposed commit message
The release includes identity data stream and associated dashboard.
Axonius fields are mapped to their corresponding ECS fields where possible.
Test samples were derived from live data samples, which were subsequently
sanitized.
Checklist
changelog.ymlfile.How to test this PR locally
To test the axonius package:
Related issues
Screenshots