
feat(script): new script to fill positive_expected_result and fixing minibug on similarityID generation in scan commands#8011

Open
cx-ricardo-jesus wants to merge 22 commits into master from AST-137381--create-new-script-to-write-positive_expected_result-file

Conversation


@cx-ricardo-jesus cx-ricardo-jesus commented Mar 24, 2026

Reason for Proposed Changes

  • Currently, we are in the process of adding more fields to the positive_expected_result.json of each query and automating what used to be a manual step: copying the results of a KICS scan into the positive_expected_result.json file. This script now does that automatically, but the content it writes should still be reviewed before committing the changes it makes.

Proposed Changes

  • Implemented a new script, generate_positive_expected_result.py, that does the work described above.
  • The first thing generate_positive_expected_result.py does is call parse_args(), which sets up a CLI with two mutually exclusive modes:
    • --run-all: Scan every query found under assets/queries/.
    • --queryID + --queryPath: Scan a single specific query.
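A minimal sketch of that CLI setup with argparse (a hypothetical reconstruction; the real script's option wiring may differ in detail):

```python
import argparse

def parse_args(argv=None):
    # Hypothetical reconstruction of the two mutually exclusive modes
    # described above: --run-all vs. --queryID/--queryPath.
    parser = argparse.ArgumentParser(
        description="Fill positive_expected_result.json files from KICS scans")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--run-all", action="store_true",
                      help="scan every query under assets/queries/")
    mode.add_argument("--queryID", help="id of a single query to scan")
    parser.add_argument("--queryPath", help="directory of that single query")
    return parser.parse_args(argv)
```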
  • If --run-all is passed, iter_queries() is called, which walks the entire assets/queries directory tree. For every subdirectory that contains a metadata.json (i.e., a directory that holds a query), it reads the "id" field and yields (query_id, query_path), so that every query present in this repo is covered. For each (query_id, query_path) pair, run_query_scans(query_id, query_path) is called.
  • If queryID and queryPath are passed on the command line, run_query_scans(args.queryID, args.queryPath) is called directly for that one query.
  • run_query_scans(query_id, query_path) discovers all positive test files for the given query, runs the appropriate KICS scans with the flags --experimental-queries, --bom, --enable_openapi-refs, and --kcs_compute_new_simid, and then writes the positive_expected_result.json output(s).
    • Step 1: Discover the test files. For that, it calls find_positive_tests(query_path), which looks inside <query_path>/test/. For every entry whose name starts with positive, it handles two layouts:

      • File layout (test/positiveX.<ext>): If the entry is a regular file (e.g., positive1.tf, positive2.yaml), it creates a PositiveTest object imported from models.py with:
        • label: positive<N>_<ext> (e.g., "positive1_tf")
        • scan_path: the path of the test file.
        • group: test - meaning results go to test/positive_expected_result.json.
      • Directory layout (test/positiveX/): if the entry is a subdirectory (e.g., positive2/), it iterates the files inside. For each file (e.g., positive2_1.tf):
        • label: same as above (e.g., positive2_1_tf)
        • scan_path: the path to the file inside the test subdirectory.
        • group: test/positive2 - meaning results go to test/positive2/positive_expected_result.json.
      • The extension is always included in the label so that files with the same base name but different extensions produce distinct labels and result files. All discovered tests are returned sorted by label using natural sort (so that positive2 comes before positive10).
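The natural sort mentioned above can be sketched like this (natural_key is an illustrative name, not necessarily the script's):

```python
import re

def natural_key(label):
    # Split the label into digit and non-digit runs so numeric parts
    # compare numerically: "positive2" then sorts before "positive10".
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", label)]

labels = ["positive10_tf", "positive2_tf", "positive1_tf"]
labels.sort(key=natural_key)
# labels is now ["positive1_tf", "positive2_tf", "positive10_tf"]
```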
    • Step 2: Set up temporary directories: A single TemporaryDirectory is created for the entire query run, with the payloads/ and results/ subdirectories where KICS writes its payload files and JSON scan result files, respectively.
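The temporary-directory setup can be sketched as follows (variable names are illustrative):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    payloads_dir = tmp_dir / "payloads"   # KICS payload files go here
    results_dir = tmp_dir / "results"     # JSON scan result files go here
    payloads_dir.mkdir()
    results_dir.mkdir()
    # ... all KICS scans for the query run against these directories ...
    created = payloads_dir.is_dir() and results_dir.is_dir()
# On exit from the with-block, the temp root and both subdirectories
# are removed automatically.
```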

    • Step 3: Choose scan strategy based on test layout:

      • After discovering the tests, run_query_scans checks if the query test directory has any subdirectory-based tests.

      • If no subdirectory tests are found (all test files in test/), the function runs two levels of scans:

        • Directory scan: Calls run_directory_scan(query_id, all_paths, ...) with every positive file at once.
        • Individual file scans (skipped for passwords_and_secrets): the function run_scan is called for each positive file separately.
        • Both approaches are used because, after several iterations of the script, some queries failed unit tests: scanning the test directory with all samples at once can produce different results than scanning each test file individually, so both result sets are collected.
      • If the query has subdirectory tests: this handles queries that have both loose files (e.g., positive1.tf) and subdirectory files (e.g., positive2/positive2_1.tf):

        • Directory scan for loose files (if any): the same as mentioned above, but only for loose files (files that are not inside a test subdirectory).
        • For the subdirectory test files, all files inside that subdirectory are scanned together via run_directory_scan.
      • To scan all the files in the test directory at once, or all the files inside one test subdirectory, the run_directory_scan function is used. As mentioned before, it runs a single KICS scan command targeting all the files in a test directory, inside a mirrored temporary directory under assets/queries so that similarityIDs match. The files passed to this function are assumed to share the same parent directory (always test/ or a test subdirectory). It takes the parent of the first file as src_dir, then computes its path relative to assets/queries/ to know where to mirror it inside the temp directory. It then iterates every positive file in the list and copies each one into the mirrored temp directory, preserving only the filename (not the full path), so that all the positive files sit together inside target_dir exactly as they do inside assets/queries/.../test. A second loop copies every file whose name does not start with positive or negative: auxiliary files, such as certificates, that the tests depend on. After this loop, the KICS CLI command is built with the temporary directory tmp_dir as the scan root and printed to stdout for traceability, then run as a subprocess. If the KICS scan exits with a code that is not in KICS_RESULTS_CODES, an error is printed.
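The mirroring step above can be sketched like this (mirror_into_tmp and its parameters are illustrative names, not the script's exact API):

```python
import shutil
import tempfile
from pathlib import Path

def mirror_into_tmp(positive_files, tmp_dir, queries_root):
    # Recreate the test directory's path relative to assets/queries/
    # inside tmp_dir, so KICS computes the same similarityIDs, then copy
    # each positive file in by filename only.
    src_dir = positive_files[0].parent           # all files share this parent
    rel = src_dir.relative_to(queries_root)      # e.g. terraform/aws/s3/test
    target_dir = tmp_dir / rel                   # mirrored location
    target_dir.mkdir(parents=True, exist_ok=True)
    for f in positive_files:
        shutil.copy2(f, target_dir / f.name)     # keep only the filename
    return target_dir
```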

      • To scan a single positive test file, the run_scan function is used. First, the path of the file relative to assets/queries is computed and stored in the rel_to_queries variable. For example, if scan_path is .../assets/queries/terraform/aws/s3/test/positive1.tf, then rel_to_queries becomes terraform/aws/s3/test/positive1.tf. As above, this relative path is replicated inside the temp directory so that the KICS engine computes the same similarityID as the unit tests do. The auxiliary files are then copied using _copy_auxiliary_files, and a KICS scan command is run using the _run_kics helper, as above.
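The relative-path computation is just pathlib's relative_to (the paths here are the example from the text, not real repo state):

```python
from pathlib import Path

# Illustrative paths; the real script resolves these against the repo root.
scan_path = Path("assets/queries/terraform/aws/s3/test/positive1.tf")
queries_root = Path("assets/queries")
rel_to_queries = scan_path.relative_to(queries_root)
# rel_to_queries is terraform/aws/s3/test/positive1.tf
```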

    • Step 4: After all scans complete, collect_and_write_expected_results(query_path, results_dir, label_to_group) aggregates results and writes the final output files.

      • First, it reads all result JSON files in results_dir. For each file, it looks up the label (filename without extension) in label_to_group to determine which group it belongs to (test or test/<dir>). It reads the data under queries and bill_of_materials (combined into the all_findings variable), and for each finding it extracts every file entry, converting it into an ExpectedResultEntry (defined in models.py).
      • In older versions of the script, the unit tests failed for the Passwords and Secrets query, so the fix_secrets_query_names function was created to fix this problem. It reads regex_rules.json, identifies which rule IDs appear more than once, compiles the regex pattern of each affected rule, and then re-matches each affected finding against those patterns using the actual line content from the source file. Once the correct rule is identified, the entry's queryName is updated accordingly. This correction step ensures that the positive_expected_result.json for passwords_and_secrets reflects the true rule that triggered each finding, thereby preventing errors in the unit tests.
      • After fixing the passwords_and_secrets query names, entries within each group are deduplicated on all fields in the FIELD_ORDER variable, using a set of tuples to remove exact duplicates that can arise when the same finding appears in both the directory scan and the individual scan results.
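That deduplication step can be sketched as follows (FIELD_ORDER's contents and the entry dicts here are illustrative stand-ins for the script's real models):

```python
# Order-preserving dedup on a fixed tuple of fields, as described above.
FIELD_ORDER = ["queryName", "file", "line"]  # illustrative field set

entries = [
    {"queryName": "Q", "file": "positive1.tf", "line": 3},
    {"queryName": "Q", "file": "positive1.tf", "line": 3},  # duplicate from the directory scan
    {"queryName": "Q", "file": "positive1.tf", "line": 7},
]

seen = set()
deduped = []
for entry in entries:
    key = tuple(entry[field] for field in FIELD_ORDER)
    if key not in seen:
        seen.add(key)
        deduped.append(entry)
```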
      • After this deduplication process, if subdirectory results exist but no loose file results were produced (edge case where test/ has only subdirectory positives), an empty test group is added. This ensures the unit tests always find a test/positive_expected_result.json to read.
      • After that, each group's entries are sorted using sort_key() from ExpectedResultEntry, which mirrors the ordering of the vulnerabilityCompare Go function in test/queries_test.go. This ensures the written file's order matches exactly what the unit test produces when it sorts its actual findings, so comparisons are deterministic.
      • Finally, each group writes its entries as a JSON array to <query-path>/<group>/positive_expected_result.json.

  • Also extended the getFilesMetadatasWithContent function in test/main_test.go to set the respective SubDocumentIndex value for each file, which is used for multi-document files in .yaml samples of some queries, mirroring the logic used for the results produced by the unit tests inside the (*Service).sink() function in pkg/kics/sink.go. This fixes cases where samples, typically in .yaml format, contain multiple documents and therefore produced different similarityIDs between CLI KICS scan commands and the unit-test results.

  • Added coverage for CNI files, which had already been implemented in an earlier PR but was possibly removed by mistake in a merge commit with the master branch. This enables KICS to detect CNI files.

  • Also updated the documentation with information about the script and how to run it.

I submit this contribution under the Apache-2.0 license.

@cx-ricardo-jesus cx-ricardo-jesus requested a review from a team as a code owner March 24, 2026 15:57

github-actions Bot commented Mar 24, 2026


KICS version: v2.1.20

Category                   Results
CRITICAL                   0
HIGH                       0
MEDIUM                     0
LOW                        0
INFO                       0
TRACE                      0
TOTAL                      0

Metric                     Values
Files scanned              1
Files parsed               1
Files failed to scan       0
Total executed queries     47
Queries failed to execute  0
Execution time             0

@cx-ricardo-jesus cx-ricardo-jesus marked this pull request as draft March 30, 2026 12:15
@cx-ricardo-jesus cx-ricardo-jesus marked this pull request as ready for review March 30, 2026 14:05
@cx-ricardo-jesus cx-ricardo-jesus marked this pull request as draft March 30, 2026 14:06
@cx-ricardo-jesus cx-ricardo-jesus force-pushed the AST-137381--create-new-script-to-write-positive_expected_result-file branch from db10620 to 2a9202a Compare April 21, 2026 09:29
@cx-ricardo-jesus cx-ricardo-jesus changed the title feat(script): new script to fill positive_expected_result feat(script): new script to fill positive_expected_result and fixing similarityID generation in scan commands Apr 22, 2026
@cx-ricardo-jesus cx-ricardo-jesus changed the title feat(script): new script to fill positive_expected_result and fixing similarityID generation in scan commands feat(script): new script to fill positive_expected_result and fixing minibug on similarityID generation in scan commands Apr 22, 2026
@cx-ricardo-jesus cx-ricardo-jesus marked this pull request as ready for review April 22, 2026 11:58