Skip to content

OSIDB-4923 - Add helper function to pull cpe list#1253

Closed
darakian wants to merge 6 commits intoRedHatProductSecurity:masterfrom
darakian:collect-cpes
Closed

OSIDB-4923 - Add helper function to pull cpe list#1253
darakian wants to merge 6 commits intoRedHatProductSecurity:masterfrom
darakian:collect-cpes

Conversation

@darakian
Copy link
Copy Markdown
Contributor

Adds a helper function to extract nvd cpe data and adds it to the list of data returned Partial implementation for OSIDB-4923

I didn't see where the function response2result is being tested, so I didn't add a test, but happy to do so if that can be pointed out.

This PR does not store the data elsewhere. The eventual location will be one of the new fields discussed in OSIDB-4923

Adds a helper function to extract nvd cpe data and adds it to the list of data returned
Partial implementation for OSIDB-4923
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

Warning

Rate limit exceeded

@darakian has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 47 minutes and 15 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: e19c05c2-280d-4c8e-8ec6-3f1aaefcfe09

📥 Commits

Reviewing files that changed from the base of the PR and between a36d034 and 9169a82.

📒 Files selected for processing (1)
  • collectors/nvd/tests/test_collectors.py
📝 Walkthrough

Walkthrough

Adds extraction of CPE "criteria" values from NVD CVE records into per-CVE results under a new nvd_cpes field, and adds a VCR-backed integration test for CPE loading (the test currently contains an unconditional failing assertion).

Changes

Cohort / File(s) Summary
NVD CPE Extraction
collectors/nvd/collectors.py
Adds local get_cpe_list helper to iterate data.cpe when present and return an empty list otherwise; response2result now includes a new "nvd_cpes" key containing the extracted CPE criteria list for each CVE.
Tests — NVD integration
collectors/nvd/tests/test_collectors.py
Adds a VCR-backed test test_cpe_load that sets up a Flaw for CVE-2020-1234, runs NVD collection, checks for a cpe attribute on the resulting model, but ends with an unconditional assert False (test forced to fail).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding a helper function to extract and include CPE data from NVD responses.
Description check ✅ Passed The description is directly related to the changeset, explaining the purpose of the helper function and its integration into the NVD data extraction process.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
collectors/nvd/collectors.py (1)

68-80: Consider simplifying and deduplicating CPEs.

Two optional improvements:

  1. The helper can be collapsed into a comprehension; the outer len(data.cpe) > 0 guard is redundant with the final emptiness check.
  2. A single CVE can reference the same criteria across multiple configurations/nodes, producing duplicates. Unless downstream consumers rely on multiplicity, deduping here avoids surprises once this is persisted per OSIDB-4923.
♻️ Proposed simplification
-        def get_cpe_list(data: CVE) -> Optional[List[str]]:
-            """
-            Return a list of CPEs from the CVE `data`
-            """
-            cpe_list = []
-            if "cpe" in data and len(data.cpe) > 0:
-                for entry in data.cpe:
-                    cpe_list.append(entry.criteria)
-
-            if len(cpe_list) > 0:
-                return cpe_list
-            else:
-                return None
+        def get_cpe_list(data: CVE) -> Optional[List[str]]:
+            """
+            Return a deduplicated list of CPE criteria strings from the CVE `data`.
+            """
+            if "cpe" not in data:
+                return None
+            cpe_list = list({entry.criteria for entry in data.cpe})
+            return cpe_list or None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/nvd/collectors.py` around lines 68 - 80, The get_cpe_list helper
should be simplified and deduplicated: replace the manual loop and redundant
len(data.cpe) guard in get_cpe_list with a comprehension that extracts
entry.criteria for each entry in data.cpe, then deduplicate (e.g., via an
ordered set or dict.fromkeys) and return the deduped list or None if empty;
update references to data.cpe and entry.criteria accordingly so callers still
receive Optional[List[str]].
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@collectors/nvd/collectors.py`:
- Around line 68-80: The get_cpe_list helper should be simplified and
deduplicated: replace the manual loop and redundant len(data.cpe) guard in
get_cpe_list with a comprehension that extracts entry.criteria for each entry in
data.cpe, then deduplicate (e.g., via an ordered set or dict.fromkeys) and
return the deduped list or None if empty; update references to data.cpe and
entry.criteria accordingly so callers still receive Optional[List[str]].

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 5c96f460-e084-4870-8c77-bd1f571c6b8e

📥 Commits

Reviewing files that changed from the base of the PR and between a71cec7 and 7c1734f.

📒 Files selected for processing (1)
  • collectors/nvd/collectors.py

@Jincxz Jincxz self-requested a review April 23, 2026 21:29
Copy link
Copy Markdown
Contributor

@Jincxz Jincxz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments regarding the helper function.

There aren't tests directly for response2result but there are tests for the NVD collector at https://github.com/RedHatProductSecurity/osidb/blob/master/collectors/nvd/tests/test_collectors.py

You can add tests for the NVDQuerier, and response2result, there if you would like. Testing the collectors and the data being written to flaw(s) will have to come after the field is added.

Comment thread collectors/nvd/collectors.py Outdated
Return a list of CPEs from the CVE `data`
"""
cpe_list = []
if "cpe" in data and len(data.cpe) > 0:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be reduced to hasattr(data, "cpe") since we are checking if the field exists. An empty data.cpe will just mean that the following loop wouldn't run.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good call. Changed that in fa056a0

Comment thread collectors/nvd/collectors.py Outdated
Comment on lines +77 to +80
if len(cpe_list) > 0:
return cpe_list
else:
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be fine with just returning the cpe_list as is (potentially an empty list). The new field will likely have a default of an empty list and a None value might produce an unexpected result when writing to it. Will make adjustments later if that is no longer the case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Updated in bb2d409

Comment thread collectors/nvd/collectors.py Outdated
Return a list of CPEs from the CVE `data`
"""
cpe_list = []
if "cpe" in data and len(data.cpe) > 0:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpe doesn't seem to be a valid top-level attribute per NVD's Vulnerability API schema, I would add a test or two as Jin suggested.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
collectors/nvd/tests/test_collectors.py (1)

239-273: ⚠️ Potential issue | 🔴 Critical

Test is broken in multiple critical ways and also breaks the next test.

This new test is non-functional, and its placement also breaks the test that previously consumed the parametrize decorator:

  1. Misplaced @pytest.mark.parametrize (lines 239–259): This decorator was originally attached to test_reset_flag_on_removal and parametrizes old_flag, new_flag. By inserting test_cpe_load between the decorator and test_reset_flag_on_removal, the parametrize now applies to test_cpe_load(self) (which doesn’t accept those params → pytest collection error) while test_reset_flag_on_removal loses its parametrization (its old_flag, new_flag parameters become unresolved fixtures and the test fails). Move the new test below test_reset_flag_on_removal, or above the parametrize block.
  2. assert False (line 273): unconditional failure — must be removed before merge.
  3. assert hasattr(flaw, "cpe") (line 272): the change in collectors/nvd/collectors.py only adds nvd_cpes to a transient dict in response2result; nothing in this PR persists CPEs onto the Flaw model, and per the codebase the only cpe field lives on PSModule, not Flaw. As written, this assertion will always fail. Either assert against the collector’s returned dict (e.g., nvdc.get_cve(cve_id) and inspect nvd_cpes), or defer the test until the storage field on Flaw from OSIDB-4923 lands.
  4. Missing @pytest.mark.enable_signals: every other VCR-backed collect-driven test in this class uses it; without it, behavior may diverge from the other integration tests.
🔧 Suggested restructuring
-    `@pytest.mark.parametrize`(
-        "old_flag,new_flag",
-        [
-            (
-                Flaw.FlawNistCvssValidation.APPROVED,
-                Flaw.FlawNistCvssValidation.APPROVED,
-            ),
-            (
-                Flaw.FlawNistCvssValidation.REJECTED,
-                Flaw.FlawNistCvssValidation.APPROVED,
-            ),
-            (
-                Flaw.FlawNistCvssValidation.REQUESTED,
-                Flaw.FlawNistCvssValidation.APPROVED,
-            ),
-            (
-                Flaw.FlawNistCvssValidation.NOVALUE,
-                Flaw.FlawNistCvssValidation.NOVALUE,
-            ),
-        ],
-    )
-    `@pytest.mark.vcr`
-    def test_cpe_load(self):
-        """
-        Test that CPE values are correctly loaded in the Flaw model.
-        """
-        cve_id = "CVE-2020-1234"
-        FlawFactory(cve_id=cve_id)
-
-        nvdc = NVDCollector()
-        nvdc.collect(cve_id)
-
-        flaw = Flaw.objects.get(cve_id=cve_id)
-        assert hasattr(flaw, "cpe")
-        assert False
-
-    def test_reset_flag_on_removal(self, old_flag, new_flag):
+    `@pytest.mark.vcr`
+    `@pytest.mark.enable_signals`
+    def test_cpe_load(self):
+        """
+        Test that CPE values are correctly extracted by the NVD querier.
+        """
+        cve_id = "CVE-2020-1234"
+
+        nvdc = NVDCollector()
+        result = nvdc.get_cve(cve_id)
+        assert result, "no result returned for CVE"
+        assert "nvd_cpes" in result[0]
+        assert isinstance(result[0]["nvd_cpes"], list)
+
+    `@pytest.mark.parametrize`(
+        "old_flag,new_flag",
+        [
+            (Flaw.FlawNistCvssValidation.APPROVED, Flaw.FlawNistCvssValidation.APPROVED),
+            (Flaw.FlawNistCvssValidation.REJECTED, Flaw.FlawNistCvssValidation.APPROVED),
+            (Flaw.FlawNistCvssValidation.REQUESTED, Flaw.FlawNistCvssValidation.APPROVED),
+            (Flaw.FlawNistCvssValidation.NOVALUE, Flaw.FlawNistCvssValidation.NOVALUE),
+        ],
+    )
+    def test_reset_flag_on_removal(self, old_flag, new_flag):

Re: the PR description — response2result is exercised indirectly by every nvdc.collect(...) test in this file (e.g., test_collect_cve, test_collect_batch); a focused unit test can be added by calling NVDQuerier().get_cve("CVE-2020-1234") against a recorded cassette and asserting on nvd_cpes directly, without needing the full collector pipeline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@collectors/nvd/tests/test_collectors.py` around lines 239 - 273, The new
test_cpe_load breaks pytest collection and is incorrect: move test_cpe_load so
it does not sit between the pytest.mark.parametrize(...) block and
test_reset_flag_on_removal (or relocate the parametrize above/below the new
test) so the parametrize still applies to test_reset_flag_on_removal; remove the
stray assert False; replace the failing Flaw assertion (hasattr(flaw, "cpe"))
with an assertion against the collector result (call
NVDCollector().get_cve(cve_id) or use NVDQuerier().get_cve("CVE-2020-1234") and
assert on the returned nvd_cpes key); and add `@pytest.mark.enable_signals` and
`@pytest.mark.vcr` to the test to match other collect tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@collectors/nvd/tests/test_collectors.py`:
- Around line 239-273: The new test_cpe_load breaks pytest collection and is
incorrect: move test_cpe_load so it does not sit between the
pytest.mark.parametrize(...) block and test_reset_flag_on_removal (or relocate
the parametrize above/below the new test) so the parametrize still applies to
test_reset_flag_on_removal; remove the stray assert False; replace the failing
Flaw assertion (hasattr(flaw, "cpe")) with an assertion against the collector
result (call NVDCollector().get_cve(cve_id) or use
NVDQuerier().get_cve("CVE-2020-1234") and assert on the returned nvd_cpes key);
and add `@pytest.mark.enable_signals` and `@pytest.mark.vcr` to the test to match
other collect tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 3b8d98bd-d053-422a-a43c-c1790e0980bd

📥 Commits

Reviewing files that changed from the base of the PR and between 7c1734f and a36d034.

📒 Files selected for processing (2)
  • collectors/nvd/collectors.py
  • collectors/nvd/tests/test_collectors.py

@Elkasitu Elkasitu added the technical For PRs that introduce changes not worthy of a CHANGELOG entry label Apr 28, 2026
Comment on lines +239 to +240
@pytest.mark.vcr
def test_cpe_load(self):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@darakian
Copy link
Copy Markdown
Contributor Author

Closing this out as the work has been de-prioritized. NVDs reduced scope and unclear SLA make this less viable for our goals.

@darakian darakian closed this Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

technical For PRs that introduce changes not worthy of a CHANGELOG entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants