OSIDB-4923 - Add helper function to pull cpe list#1253
OSIDB-4923 - Add helper function to pull cpe list#1253darakian wants to merge 6 commits intoRedHatProductSecurity:masterfrom
Conversation
Adds a helper function to extract nvd cpe data and adds it to the list of data returned Partial implementation for OSIDB-4923
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds extraction of CPE "criteria" values from NVD CVE records into per-CVE results under a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
collectors/nvd/collectors.py (1)
68-80: Consider simplifying and deduplicating CPEs.Two optional improvements:
- The helper can be collapsed into a comprehension; the outer
len(data.cpe) > 0guard is redundant with the final emptiness check.- A single CVE can reference the same
criteriaacross multiple configurations/nodes, producing duplicates. Unless downstream consumers rely on multiplicity, deduping here avoids surprises once this is persisted per OSIDB-4923.♻️ Proposed simplification
- def get_cpe_list(data: CVE) -> Optional[List[str]]: - """ - Return a list of CPEs from the CVE `data` - """ - cpe_list = [] - if "cpe" in data and len(data.cpe) > 0: - for entry in data.cpe: - cpe_list.append(entry.criteria) - - if len(cpe_list) > 0: - return cpe_list - else: - return None + def get_cpe_list(data: CVE) -> Optional[List[str]]: + """ + Return a deduplicated list of CPE criteria strings from the CVE `data`. + """ + if "cpe" not in data: + return None + cpe_list = list({entry.criteria for entry in data.cpe}) + return cpe_list or None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@collectors/nvd/collectors.py` around lines 68 - 80, The get_cpe_list helper should be simplified and deduplicated: replace the manual loop and redundant len(data.cpe) guard in get_cpe_list with a comprehension that extracts entry.criteria for each entry in data.cpe, then deduplicate (e.g., via an ordered set or dict.fromkeys) and return the deduped list or None if empty; update references to data.cpe and entry.criteria accordingly so callers still receive Optional[List[str]].
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@collectors/nvd/collectors.py`:
- Around line 68-80: The get_cpe_list helper should be simplified and
deduplicated: replace the manual loop and redundant len(data.cpe) guard in
get_cpe_list with a comprehension that extracts entry.criteria for each entry in
data.cpe, then deduplicate (e.g., via an ordered set or dict.fromkeys) and
return the deduped list or None if empty; update references to data.cpe and
entry.criteria accordingly so callers still receive Optional[List[str]].
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 5c96f460-e084-4870-8c77-bd1f571c6b8e
📒 Files selected for processing (1)
collectors/nvd/collectors.py
There was a problem hiding this comment.
Some comments regarding the helper function.
There aren't tests directly for response2result but there are tests for the NVD collector at https://github.com/RedHatProductSecurity/osidb/blob/master/collectors/nvd/tests/test_collectors.py
You can add tests for the NVDQuerier, and response2result, there if you would like. Testing the collectors and the data being written to flaw(s) will have to come after the field is added.
| Return a list of CPEs from the CVE `data` | ||
| """ | ||
| cpe_list = [] | ||
| if "cpe" in data and len(data.cpe) > 0: |
There was a problem hiding this comment.
This can be reduced to hasattr(data, "cpe") since we are checking if the field exists. An empty data.cpe will just mean that the following loop wouldn't run.
| if len(cpe_list) > 0: | ||
| return cpe_list | ||
| else: | ||
| return None |
There was a problem hiding this comment.
I think it should be fine with just returning the cpe_list as is (potentially an empty list). The new field will likely have a default of an empty list and a None value might produce an unexpected result when writing to it. Will make adjustments later if that is no longer the case.
| Return a list of CPEs from the CVE `data` | ||
| """ | ||
| cpe_list = [] | ||
| if "cpe" in data and len(data.cpe) > 0: |
There was a problem hiding this comment.
cpe doesn't seem to be a valid top-level attribute per NVD's Vulnerability API schema, I would add a test or two as Jin suggested.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
collectors/nvd/tests/test_collectors.py (1)
239-273:⚠️ Potential issue | 🔴 CriticalTest is broken in multiple critical ways and also breaks the next test.
This new test is non-functional, and its placement also breaks the test that previously consumed the
parametrizedecorator:
- Misplaced
@pytest.mark.parametrize(lines 239–259): This decorator was originally attached totest_reset_flag_on_removaland parametrizesold_flag, new_flag. By insertingtest_cpe_loadbetween the decorator andtest_reset_flag_on_removal, the parametrize now applies totest_cpe_load(self)(which doesn’t accept those params → pytest collection error) whiletest_reset_flag_on_removalloses its parametrization (itsold_flag, new_flagparameters become unresolved fixtures and the test fails). Move the new test belowtest_reset_flag_on_removal, or above the parametrize block.assert False(line 273): unconditional failure — must be removed before merge.assert hasattr(flaw, "cpe")(line 272): the change incollectors/nvd/collectors.pyonly addsnvd_cpesto a transient dict inresponse2result; nothing in this PR persists CPEs onto theFlawmodel, and per the codebase the onlycpefield lives onPSModule, notFlaw. As written, this assertion will always fail. Either assert against the collector’s returned dict (e.g.,nvdc.get_cve(cve_id)and inspectnvd_cpes), or defer the test until the storage field onFlawfrom OSIDB-4923 lands.- Missing
@pytest.mark.enable_signals: every other VCR-backedcollect-driven test in this class uses it; without it, behavior may diverge from the other integration tests.🔧 Suggested restructuring
- `@pytest.mark.parametrize`( - "old_flag,new_flag", - [ - ( - Flaw.FlawNistCvssValidation.APPROVED, - Flaw.FlawNistCvssValidation.APPROVED, - ), - ( - Flaw.FlawNistCvssValidation.REJECTED, - Flaw.FlawNistCvssValidation.APPROVED, - ), - ( - Flaw.FlawNistCvssValidation.REQUESTED, - Flaw.FlawNistCvssValidation.APPROVED, - ), - ( - Flaw.FlawNistCvssValidation.NOVALUE, - Flaw.FlawNistCvssValidation.NOVALUE, - ), - ], - ) - `@pytest.mark.vcr` - def test_cpe_load(self): - """ - Test that CPE values are correctly loaded in the Flaw model. - """ - cve_id = "CVE-2020-1234" - FlawFactory(cve_id=cve_id) - - nvdc = NVDCollector() - nvdc.collect(cve_id) - - flaw = Flaw.objects.get(cve_id=cve_id) - assert hasattr(flaw, "cpe") - assert False - - def test_reset_flag_on_removal(self, old_flag, new_flag): + `@pytest.mark.vcr` + `@pytest.mark.enable_signals` + def test_cpe_load(self): + """ + Test that CPE values are correctly extracted by the NVD querier. + """ + cve_id = "CVE-2020-1234" + + nvdc = NVDCollector() + result = nvdc.get_cve(cve_id) + assert result, "no result returned for CVE" + assert "nvd_cpes" in result[0] + assert isinstance(result[0]["nvd_cpes"], list) + + `@pytest.mark.parametrize`( + "old_flag,new_flag", + [ + (Flaw.FlawNistCvssValidation.APPROVED, Flaw.FlawNistCvssValidation.APPROVED), + (Flaw.FlawNistCvssValidation.REJECTED, Flaw.FlawNistCvssValidation.APPROVED), + (Flaw.FlawNistCvssValidation.REQUESTED, Flaw.FlawNistCvssValidation.APPROVED), + (Flaw.FlawNistCvssValidation.NOVALUE, Flaw.FlawNistCvssValidation.NOVALUE), + ], + ) + def test_reset_flag_on_removal(self, old_flag, new_flag):Re: the PR description —
response2resultis exercised indirectly by everynvdc.collect(...)test in this file (e.g.,test_collect_cve,test_collect_batch); a focused unit test can be added by callingNVDQuerier().get_cve("CVE-2020-1234")against a recorded cassette and asserting onnvd_cpesdirectly, without needing the full collector pipeline.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@collectors/nvd/tests/test_collectors.py` around lines 239 - 273, The new test_cpe_load breaks pytest collection and is incorrect: move test_cpe_load so it does not sit between the pytest.mark.parametrize(...) block and test_reset_flag_on_removal (or relocate the parametrize above/below the new test) so the parametrize still applies to test_reset_flag_on_removal; remove the stray assert False; replace the failing Flaw assertion (hasattr(flaw, "cpe")) with an assertion against the collector result (call NVDCollector().get_cve(cve_id) or use NVDQuerier().get_cve("CVE-2020-1234") and assert on the returned nvd_cpes key); and add `@pytest.mark.enable_signals` and `@pytest.mark.vcr` to the test to match other collect tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@collectors/nvd/tests/test_collectors.py`:
- Around line 239-273: The new test_cpe_load breaks pytest collection and is
incorrect: move test_cpe_load so it does not sit between the
pytest.mark.parametrize(...) block and test_reset_flag_on_removal (or relocate
the parametrize above/below the new test) so the parametrize still applies to
test_reset_flag_on_removal; remove the stray assert False; replace the failing
Flaw assertion (hasattr(flaw, "cpe")) with an assertion against the collector
result (call NVDCollector().get_cve(cve_id) or use
NVDQuerier().get_cve("CVE-2020-1234") and assert on the returned nvd_cpes key);
and add `@pytest.mark.enable_signals` and `@pytest.mark.vcr` to the test to match
other collect tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 3b8d98bd-d053-422a-a43c-c1790e0980bd
📒 Files selected for processing (2)
collectors/nvd/collectors.pycollectors/nvd/tests/test_collectors.py
| @pytest.mark.vcr | ||
| def test_cpe_load(self): |
There was a problem hiding this comment.
This test won't work without the actual cassette recorded. https://github.com/RedHatProductSecurity/osidb/blob/master/docs/developer/DEVELOP.md#testing-external-apis
|
Closing this out as the work has been de-prioritized. NVDs reduced scope and unclear SLA make this less viable for our goals. |
Adds a helper function to extract nvd cpe data and adds it to the list of data returned Partial implementation for OSIDB-4923
I didn't see where the function
response2resultis being tested, so I didn't add a test, but happy to do so if that can be pointed out.This PR does not store the data elsewhere. The eventual location will be one of the new fields discussed in OSIDB-4923