feat: Add NL2Query quality test runner by languy · Pull Request #3116 · microsoft/vscode-cosmosdb

languy · 2026-06-03T06:36:59Z

Introduces a test runner that takes a test spec and a schema as input and produces a quality report (a Markdown file) as output.
For each test spec, the runner makes the same function calls that NL2Query uses.
The report includes statistics on timing, token counts (input and output), and grading scores.
Each test spec states the purpose of the test, the prompt, and the expected query; an AI model grades the actual query against the expected one.
The runner is designed to be lightweight and generic: the user provides the test spec.
The runner must be started manually, and the user must be signed in to GitHub so that GitHub Copilot can be used.
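As a rough illustration of the inputs and report statistics described above, here is a minimal sketch. The type names (`TestSpec`, `RunResult`, `Summary`), their fields, and the helper functions are hypothetical and are not taken from the PR; the numeric grade is likewise an assumption.

```typescript
// Hypothetical shapes: the PR page does not show the actual spec or result types.
interface TestSpec {
    purpose: string; // what the test is meant to check
    prompt: string; // natural-language request sent to NL2Query
    expectedQuery: string; // reference query the grader compares against
}

interface RunResult {
    spec: TestSpec;
    durationMs: number;
    inputTokens: number;
    outputTokens: number;
    grade: number; // assumed numeric; 0 means the grader gave no credit
}

interface Summary {
    runs: number;
    totalDurationMs: number;
    meanDurationMs: number;
    totalInputTokens: number;
    totalOutputTokens: number;
    meanGrade: number;
    zeroGrades: number;
}

// Aggregate per-run measurements into the figures a report could show.
function summarize(results: RunResult[]): Summary {
    const runs = results.length;
    const sum = (pick: (r: RunResult) => number): number =>
        results.reduce((acc, r) => acc + pick(r), 0);
    const totalDurationMs = sum((r) => r.durationMs);
    return {
        runs,
        totalDurationMs,
        meanDurationMs: runs === 0 ? 0 : totalDurationMs / runs,
        totalInputTokens: sum((r) => r.inputTokens),
        totalOutputTokens: sum((r) => r.outputTokens),
        meanGrade: runs === 0 ? 0 : sum((r) => r.grade) / runs,
        zeroGrades: results.filter((r) => r.grade === 0).length,
    };
}

// Render the summary as a small Markdown fragment.
function toMarkdown(s: Summary): string {
    return [
        '## Summary',
        `- Runs: ${s.runs}`,
        `- Total duration: ${s.totalDurationMs} ms`,
        `- Mean duration: ${s.meanDurationMs.toFixed(1)} ms`,
        `- Tokens (input/output): ${s.totalInputTokens}/${s.totalOutputTokens}`,
        `- Mean grade: ${s.meanGrade.toFixed(2)}`,
        `- Zero grades: ${s.zeroGrades}`,
    ].join('\n');
}
```

Keeping aggregation separate from rendering means the same `Summary` could feed a different output format later without touching the measurement code.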

The runner can be invoked with the runNl2QueryQualityTest command. The command is only available in debug mode.
The user selects the LLM model to test.
The user selects the LLM model that grades the output.
The user selects the number of test runs.
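The grading step (a second model scoring the actual query against the expected one) could look roughly like the sketch below. The prompt wording, the JSON reply format, the 0 to 10 scale, and the function names are all assumptions made for illustration; the PR page does not show how the grader is prompted or how its reply is parsed.

```typescript
// Assumed grading scale for this sketch only.
const MAX_GRADE = 10;

// Build the text sent to the grading model (hypothetical wording).
function buildGradingPrompt(purpose: string, expectedQuery: string, actualQuery: string): string {
    return [
        'You are grading a generated database query.',
        `Test purpose: ${purpose}`,
        `Expected query: ${expectedQuery}`,
        `Actual query: ${actualQuery}`,
        `Reply with JSON only: {"grade": <integer 0-${MAX_GRADE}>, "reason": "<short text>"}`,
    ].join('\n');
}

// Extract a grade from the model reply; anything unparsable counts as 0.
function parseGrade(reply: string): number {
    // Models often wrap JSON in prose or code fences, so take the outermost {...} span.
    const match = reply.match(/\{[\s\S]*\}/);
    if (!match) {
        return 0;
    }
    try {
        const parsed: unknown = JSON.parse(match[0]);
        const grade = (parsed as { grade?: unknown }).grade;
        if (typeof grade !== 'number' || !Number.isFinite(grade)) {
            return 0;
        }
        // Clamp to the assumed scale and round to an integer.
        return Math.min(MAX_GRADE, Math.max(0, Math.round(grade)));
    } catch {
        return 0;
    }
}
```

Treating an unparsable reply as a zero grade keeps a run from aborting on one bad response, at the cost of conflating grader failures with genuinely wrong queries; a real runner might track the two separately.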

github-actions · 2026-06-05T08:56:25Z

🎉 Build Summary

🔗 Source

Commit: f14fbde
Pull Request: #3116 feat: Add NL2Query quality test runner

📦 Package Information

Version: 0.35.1
Preview: True
VSIX File: vscode-cosmosdb-0.35.1.vsix
VSIX Size: 3.34 MB
Artifact: dev-languy-add-nl2query-quality-test-0.35.1-f14fbde.zip

🧪 Test Results

Unit Tests: ✅ success
Integration Tests: ✅ success

✅ Build Status

All checks completed successfully!

github-actions · 2026-06-10T12:56:54Z

🎭 E2E Tests (Playwright + VS Code)

Commit: 2202ab3
Pull Request: #3116 feat: Add NL2Query quality test runner

🧪 Result

E2E Tests: ✅ success

📥 Artifacts (run)

e2e-results-1 — 202 B
e2e-html-report-1 — 198.7 KB

Tip: the HTML report artifact contains a self-contained Playwright report.
Download the zip, extract, and open index.html — or run
npx playwright show-report <extracted-dir> for the interactive view.

github-actions · 2026-06-10T12:57:12Z

🔨 Build, Lint & Test

🔗 Source

Commit: 2202ab3
Pull Request: #3116 feat: Add NL2Query quality test runner

📦 Package Information

Version: 0.35.1
Preview: True
VSIX File: vscode-cosmosdb-0.35.1.vsix
VSIX Size: 3.35 MB
Artifact: dev-languy-add-nl2query-quality-test-0.35.1-2202ab3.zip

🧪 Test Results

Unit Tests: ✅ success
Integration Tests (extension host): ✅ success

📥 Artifacts (run)

dev-languy-add-nl2query-quality-test-0.35.1-2202ab3 — 3.35 MB

✅ Build Status

Build and local tests passed. See sibling comments below for E2E and NoSQL integration results.

Copilot

Pull request overview

Adds a manual NL2Query quality test runner to help evaluate generateQuery behavior across a user-provided test spec and schema, producing a Markdown report with timing/token/grade statistics. This fits as a dev-only workflow tool for validating NL2Query prompt/pipeline quality during development.

Changes:

Adds a dev-only VS Code command (cosmosDB.dev.runNl2QueryQualityTest) to run NL2Query quality tests and generate a Markdown report.
Adds documentation and gitignore scaffolding for the quality test suite under test/quality/nl2query/.
Wires command visibility via a dev-mode context key (cosmosDB.devMode) and package contributions.
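The gating described in the last item can be sketched as follows. In `package.json`, the Command Palette entry for the command would carry a `when` clause of `cosmosDB.devMode`; the extension then sets that context key only in Development mode. The stand-in `ExtensionMode` enum and `CommandsLike` interface below mimic the relevant slice of the `vscode` API so the sketch runs on its own, and `registerDevCommands` and `loadRunner` are hypothetical names. The runner is loaded lazily inside the command handler here, which is one possible arrangement and not necessarily what the PR does.

```typescript
// Stand-ins for the parts of the VS Code API used here.
// A real extension would import ExtensionMode and commands from 'vscode'.
enum ExtensionMode {
    Production = 1,
    Development = 2,
    Test = 3,
}

interface CommandsLike {
    executeCommand(command: string, ...args: unknown[]): unknown;
    registerCommand(command: string, handler: () => unknown): void;
}

const DEV_CONTEXT_KEY = 'cosmosDB.devMode';
const QUALITY_TEST_COMMAND = 'cosmosDB.dev.runNl2QueryQualityTest';

// Registers the dev-only command and returns true when running in Development mode.
function registerDevCommands(
    mode: ExtensionMode,
    commands: CommandsLike,
    loadRunner: () => Promise<() => unknown>, // hypothetical: wraps a dynamic import()
): boolean {
    if (mode !== ExtensionMode.Development) {
        return false;
    }
    // 'setContext' is the built-in command that sets a context key for `when` clauses.
    void commands.executeCommand('setContext', DEV_CONTEXT_KEY, true);
    commands.registerCommand(QUALITY_TEST_COMMAND, async () => {
        // Load the runner module only when the command is actually invoked.
        const run = await loadRunner();
        return run();
    });
    return true;
}
```

In an extension's `activate`, this might be called with `context.extensionMode`, `vscode.commands`, and a loader built around a dynamic `import()` of the runner module, so production builds never load the test code.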

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 9 comments.

Show a summary per file

| File | Description |
| --- | --- |
| test/quality/nl2query/README.md | Adds instructions and structure for running the NL2Query quality suite. |
| test/quality/nl2query/.gitignore | Ignores generated reports under `results/`. |
| src/extension.ts | Registers a dev-only context + dynamically imports the quality test command in Development mode. |
| src/commands/nl2queryQualityTest.ts | Implements the interactive runner, batch grading, and Markdown report generation. |
| package.json | Contributes the new dev command and gates it in the Command Palette with `cosmosDB.devMode`. |
| package-lock.json | Lockfile update (platform metadata normalization). |

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

languy added 3 commits June 2, 2026 16:14

Add NL2Query quality test logic

14cc7cd

Merge branch 'main' into dev/languy/add-nl2query-quality-test

86a9811

feat: add NL2Query quality test command and sample schemas

60aa1d9

languy changed the title ~~feat: NL2Query quality test~~ feat: Add NL2Query quality test runner Jun 3, 2026

github-advanced-security AI found potential problems Jun 3, 2026

Comment thread src/commands/nl2queryQualityTest.ts Dismissed

languy added 4 commits June 3, 2026 13:56

Merge branch 'main' into dev/languy/add-nl2query-quality-test

00ae962

feat: enhance NL2Query quality test report with total duration and ETA formatting

d14fc54

feat: update NL2Query quality test report to include count of zero grades

f95ddaf

feat: remove unused language reference loading from NL2Query quality test

73c3a67

languy added 2 commits June 5, 2026 11:59

Merge branch 'main' into dev/languy/add-nl2query-quality-test

35bdd57

Merge branch 'main' into dev/languy/add-nl2query-quality-test

a5a880d

languy marked this pull request as ready for review June 5, 2026 15:21

languy requested a review from a team as a code owner June 5, 2026 15:21

languy marked this pull request as draft June 5, 2026 15:23

languy added 2 commits June 10, 2026 14:37

chore: remove unused sample schemas and test cases for NL2Query quality tests

f14fbde

Merge branch 'main' into dev/languy/add-nl2query-quality-test

b549356

chore: remove unnecessary 'libc' entries from package-lock.json

c250495

languy marked this pull request as ready for review June 10, 2026 13:02

languy requested a review from Copilot June 11, 2026 06:17

Copilot started reviewing on behalf of languy June 11, 2026 06:17

Copilot AI reviewed Jun 11, 2026

bk201- previously approved these changes Jun 11, 2026

Potential fix for pull request finding

7705c72

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

languy dismissed bk201-’s stale review via 7705c72 June 11, 2026 15:47

languy and others added 3 commits June 11, 2026 17:48

Potential fix for pull request finding

a54692d

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

5c79a2e

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

85d3b85

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

languy and others added 5 commits June 11, 2026 17:50

Potential fix for pull request finding

3ca9372

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

0bce48b

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

7e7d3f1

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

8aa0093

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Merge branch 'main' into dev/languy/add-nl2query-quality-test

a47e22e

github-code-quality Bot found potential problems Jun 11, 2026

Comment thread src/commands/nl2queryQualityTest.ts Fixed

Comment thread src/commands/nl2queryQualityTest.ts Fixed

languy and others added 2 commits June 11, 2026 18:37

Refactor NL2Query quality test for improved type safety and clarity

814ba06

Potential fix for pull request finding 'Unneeded defensive code'

2202ab3

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

languy wants to merge 23 commits into main from dev/languy/add-nl2query-quality-test

4 participants
