perf: Defer black import to first use #229
Conversation
uv 0.10.x is current; the <0.10.0 constraint caused build warnings.
`parse_azure_endpoint` returned the raw URL including `?api-version=...`, which `AsyncAzureOpenAI` then mangled into invalid paths like `...?api-version=2024-06-01/openai/`. Strip the query string before returning — `api_version` is already returned as a separate value and passed to the SDK independently.
`black` is only used in `create_context_prompt()` and `format_code()` -- both cold paths. Moving the import inside the functions avoids loading `black` and its transitive deps (`pathspec`, `black.nodes`, etc.) on every `import typeagent`.
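The deferred-import pattern, sketched with `pprint` standing in for `black` so the snippet is self-contained (the real change moves `import black` inside the two functions the same way):

```python
def format_value(obj: object) -> str:
    """Cold-path formatter: the heavy dependency is imported on first
    call, so importing the package itself never pays for it."""
    import pprint  # deferred import; black is handled identically

    return pprint.pformat(obj)


print(format_value({"a": 1}))  # {'a': 1}
```

The import statement runs on every call, but after the first call it is just a dict lookup in `sys.modules`, so the hot path pays essentially nothing.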
gvanrossum
left a comment
Should have done this a long time ago. :-(
Thanks! I wasn't sure you wanted import-time optimizations; if you do, I have a few more ready to go.
This one is dear to my heart because I wish we could remove black from the mandatory dependencies. Happy to review more.
Why can't you remove black? Out of curiosity.
I could, but it gives the nicest formatted expressions when doing hardcore debugging. What we could do is import it conditionally and just use repr() if black cannot be imported. Then we can remove it from the main deps. Developers have the dev tools installed (pyright, black, isort, pytest, etc.) and will see the black-formatted debug output.
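The conditional import could look like this (a sketch of the idea, not the actual diff in the follow-up PR; `debug_repr` is an illustrative name):

```python
try:
    import black

    def debug_repr(obj: object) -> str:
        # Dev environments: black reflows long reprs nicely.
        return black.format_str(repr(obj), mode=black.Mode()).rstrip()

except ImportError:

    def debug_repr(obj: object) -> str:
        # black not installed: plain repr() is good enough.
        return repr(obj)
```

Callers are unaffected either way; only the quality of the debug formatting changes depending on whether black is importable.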
Went ahead and did this — #235 replaces
**Stack: 3/4** — depends on #229. Merge #231, #229, then this PR.

---

- Add `add_terms_batch` and `add_properties_batch` to `ITermToSemanticRefIndex` and `IPropertyToSemanticRefIndex` interfaces
- SQLite backend uses `executemany` instead of individual `cursor.execute()` calls (~1000+ calls per indexing batch reduced to 2-3)
- Restructure `add_metadata_to_index_from_list` and `add_to_property_index` to collect all data first (pure functions), then batch-insert
- Memory backend implements batch methods as loops for interface compatibility

## Benchmark

### Azure Standard_D2s_v5 -- 2 vCPU, 8 GiB RAM, Python 3.13

#### Indexing Pipeline (pytest-async-benchmark pedantic, 20 rounds, 3 warmup)

Only the hot path (`add_messages_with_indexing`) is timed -- DB creation, storage init, and teardown are excluded.

| Benchmark | Before (min) | After (min) | Speedup |
|:---|---:|---:|---:|
| `add_messages_with_indexing` (200 msgs) | 28.8 ms | 25.0 ms | **1.16x** |
| `add_messages_with_indexing` (50 msgs) | 7.8 ms | 6.7 ms | **1.16x** |
| VTT ingest (40 msgs) | 6.9 ms | 6.1 ms | **1.14x** |

Consistent ~14-16% improvement -- `executemany` amortizes per-call overhead.

<details>
<summary><b>Reproduce the benchmark locally</b></summary>

Save the benchmark file below as `tests/benchmarks/test_benchmark_indexing.py`, then:

```bash
pip install 'pytest-async-benchmark @ git+https://github.com/KRRT7/pytest-async-benchmark.git@feat/pedantic-mode' pytest-asyncio

# Run on main
git checkout main
python -m pytest tests/benchmarks/test_benchmark_indexing.py -v -s

# Run on this branch
git checkout perf/batch-inserts
python -m pytest tests/benchmarks/test_benchmark_indexing.py -v -s
```

</details>

---

*Generated by codeflash optimization agent*

---------

Co-authored-by: Bernhard Merkle <bernhard.merkle@gmail.com>
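The batching shape boils down to collecting rows first, then one `executemany` call; a minimal sketch with an illustrative schema (not the actual typeagent tables):

```python
import sqlite3


def add_terms_batch(conn: sqlite3.Connection,
                    rows: list[tuple[str, int]]) -> None:
    """Insert all (term, semref_id) pairs in a single executemany call
    instead of one cursor.execute() per row."""
    conn.executemany(
        "INSERT INTO term_index (term, semref_id) VALUES (?, ?)", rows
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term_index (term TEXT, semref_id INTEGER)")
add_terms_batch(conn, [("alice", 1), ("bob", 2), ("alice", 3)])
print(conn.execute("SELECT COUNT(*) FROM term_index").fetchone()[0])  # 3
```

`executemany` compiles the statement once and loops in C, which is where the ~14-16% end-to-end win comes from once ~1000 per-row calls collapse into a handful of batched ones.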
**Stack: 4/4** — depends on #230. Merge #231, #229, #230, then this PR.

---

- Five call sites used `get_item()` per scored ref — one SELECT and full deserialization per match (N+1 pattern)
- Added `get_metadata_multiple` to `ISemanticRefCollection` that fetches only `semref_id, range_json, knowledge_type` in a single batch query
- Replaced the N+1 loop with one `get_metadata_multiple` call at each site
- Further optimized scope-filtering: binary search in `contains_range`, inline tuple comparisons in `TextRange`, skip pydantic validation in `get_metadata_multiple`

### Call sites optimized

1. `lookup_term_filtered` — batch metadata, filter by knowledge_type/range
2. `lookup_property_in_property_index` — batch metadata, filter by range scope
3. `SemanticRefAccumulator.group_matches_by_type` — batch metadata, group by knowledge_type
4. `SemanticRefAccumulator.get_matches_in_scope` — batch metadata, filter by range scope
5. `get_scored_semantic_refs_from_ordinals_iter` — two-phase: metadata filter then batch fetch

### Additional optimizations

- **Binary search in `TextRangeCollection.contains_range`**: replaced O(n) linear scan with `bisect_right` keyed on `start`, reducing scope-filtering from ~25ms to ~9ms
- **Inline tuple comparisons in `TextRange`**: replaced `TextLocation` allocations in `__eq__`/`__lt__`/`__contains__` with a shared `_effective_end` returning tuples
- **Skip pydantic validation in `get_metadata_multiple`**: construct `TextLocation`/`TextRange` directly from JSON instead of going through `__pydantic_validator__`

## Benchmark

### Azure Standard_D2s_v5 — 2 vCPU, 8 GiB RAM, Python 3.13

#### Query (pytest-async-benchmark pedantic, 200 rounds)

200 matches against a 200-message indexed SQLite transcript. Only the function under test is timed.

| Function | Before (median) | After (median) | Speedup |
|:---|---:|---:|---:|
| `lookup_term_filtered` | 2.650 ms | 1.184 ms | **2.24x** |
| `group_matches_by_type` | 2.428 ms | 978 μs | **2.48x** |
| `get_scored_semantic_refs_from_ordinals_iter` | 2.541 ms | 2.946 ms | 0.86x |
| `lookup_property_in_property_index` | 25.306 ms | 9.365 ms | **2.70x** |
| `get_matches_in_scope` | 25.011 ms | 9.160 ms | **2.73x** |

<details>
<summary><b>Reproduce the benchmark locally</b></summary>

```bash
pip install 'pytest-async-benchmark @ git+https://github.com/KRRT7/pytest-async-benchmark.git@feat/pedantic-mode' pytest-asyncio
python -m pytest tests/benchmarks/test_benchmark_query.py -v -s
```

</details>

---

*Generated by codeflash optimization agent*

---------

Co-authored-by: Bernhard Merkle <bernhard.merkle@gmail.com>
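The `bisect_right` trick in `contains_range` can be sketched like this, assuming the stored ranges are non-overlapping (as message ranges in a transcript are) and kept as `(start, end)` tuples sorted by `start`; the names are illustrative, not the actual `TextRangeCollection` API:

```python
from bisect import bisect_right


def contains_range(sorted_ranges: list[tuple[int, int]],
                   query: tuple[int, int]) -> bool:
    """True if some stored range fully contains the query range.

    With non-overlapping ranges sorted by start, only the last range
    whose start <= query start can possibly contain it, so a single
    bisect_right replaces the O(n) linear scan.
    """
    q_start, q_end = query
    i = bisect_right(sorted_ranges, (q_start, float("inf"))) - 1
    return i >= 0 and sorted_ranges[i][1] >= q_end
```

Note the non-overlap assumption is load-bearing: with overlapping ranges, an earlier range with a larger end could contain the query even when the nearest-start candidate does not.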
**Stack: 2/4** — depends on #231. Merge #231 first, then this PR.
Move `import black` from module level to first use in `answers.py` and `utils.py`.

`black` (code formatter + transitive deps: pathspec, black.nodes, etc.) loaded on every `import typeagent` but was only used in two cold formatting paths. `black.format_str()` is called in two places:

- `create_context_prompt()` in `knowpro/answers.py` — formats debug context for LLM prompts
- `format_code()` in `aitools/utils.py` — developer pretty-print utility

Neither runs during normal library operation. Moving the import inside each function eliminates ~78ms of transitive module loading from the import chain.
## Benchmark

### Azure Standard_D2s_v5 — 2 vCPU, 8 GiB RAM, Python 3.13

#### Import Time (hyperfine, warmup 5, min-runs 30)

`import typeagent`

#### Offline E2E Test Suite (hyperfine, warmup 2, min-runs 10)
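To reproduce the import-time measurement locally (the hyperfine line matches the settings in the header above; the `-X importtime` line uses `json` as a stand-in module so it runs anywhere, even without typeagent installed):

```shell
# Whole-process import cost; requires hyperfine and typeagent installed:
#   hyperfine --warmup 5 --min-runs 30 'python -c "import typeagent"'

# Per-module breakdown using only the stdlib (swap json for typeagent):
python -X importtime -c "import json" 2>&1 | tail -n 3
```

The `-X importtime` output lists self and cumulative microseconds per imported module, which is how transitive costs like pathspec and black.nodes show up in the before/after comparison.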
*Generated by codeflash optimization agent*