perf: Defer black import to first use #229
Conversation
uv 0.10.x is current; the <0.10.0 constraint caused build warnings.
`parse_azure_endpoint` returned the raw URL including `?api-version=...`, which `AsyncAzureOpenAI` then mangled into invalid paths like `...?api-version=2024-06-01/openai/`. Strip the query string before returning — `api_version` is already returned as a separate value and passed to the SDK independently.
`black` is only used in `create_context_prompt()` and `format_code()` -- both cold paths. Moving the import inside the functions avoids loading `black` and its transitive deps (`pathspec`, `black.nodes`, etc.) on every `import typeagent`.
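The deferred-import pattern, sketched with `pprint` standing in for `black` so the snippet is self-contained (the real change moves `import black` inside the two functions the same way):

```python
def format_value(obj: object) -> str:
    """Cold-path formatter: the heavy dependency is imported on first
    call, so importing the package itself never pays for it."""
    import pprint  # deferred import; black is handled identically

    return pprint.pformat(obj)


print(format_value({"a": 1}))  # {'a': 1}
```

The import statement runs on every call, but after the first call it is just a dict lookup in `sys.modules`, so the hot path pays essentially nothing.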
gvanrossum
left a comment
Should have done this a long time ago. :-(
Thanks! I wasn't sure you wanted import-time optimizations; if you do, I have a few more ready to go.
This one is dear to my heart because I wish we could remove black from the mandatory dependencies. Happy to review more.
Why can't you remove black? Out of curiosity.
I could, but it gives the nicest formatted expressions when doing hardcore debugging. What we could do is import it conditionally and just use repr() if black cannot be imported. Then we can remove it from the main deps. Developers have the dev tools installed (pyright, black, isort, pytest, etc.) and will see the black-formatted debug output.
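The conditional import could look like this (a sketch of the idea, not the actual diff in the follow-up PR; `debug_repr` is an illustrative name):

```python
try:
    import black

    def debug_repr(obj: object) -> str:
        # Dev environments: black reflows long reprs nicely.
        return black.format_str(repr(obj), mode=black.Mode()).rstrip()

except ImportError:

    def debug_repr(obj: object) -> str:
        # black not installed: plain repr() is good enough.
        return repr(obj)
```

Callers are unaffected either way; only the quality of the debug formatting changes depending on whether black is importable.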
Went ahead and did this — #235 replaces
**Stack: 3/4** — depends on #229. Merge #231, #229, then this PR.

---

- Add `add_terms_batch` and `add_properties_batch` to `ITermToSemanticRefIndex` and `IPropertyToSemanticRefIndex` interfaces
- SQLite backend uses `executemany` instead of individual `cursor.execute()` calls (~1000+ calls per indexing batch reduced to 2-3)
- Restructure `add_metadata_to_index_from_list` and `add_to_property_index` to collect all data first (pure functions), then batch-insert
- Memory backend implements batch methods as loops for interface compatibility

## Benchmark

### Azure Standard_D2s_v5 -- 2 vCPU, 8 GiB RAM, Python 3.13

#### Indexing Pipeline (pytest-async-benchmark pedantic, 20 rounds, 3 warmup)

Only the hot path (`add_messages_with_indexing`) is timed -- DB creation, storage init, and teardown are excluded.

| Benchmark | Before (min) | After (min) | Speedup |
|:---|---:|---:|---:|
| `add_messages_with_indexing` (200 msgs) | 28.8 ms | 25.0 ms | **1.16x** |
| `add_messages_with_indexing` (50 msgs) | 7.8 ms | 6.7 ms | **1.16x** |
| VTT ingest (40 msgs) | 6.9 ms | 6.1 ms | **1.14x** |

Consistent ~14-16% improvement -- `executemany` amortizes per-call overhead.

<details>
<summary><b>Reproduce the benchmark locally</b></summary>

Save the benchmark file below as `tests/benchmarks/test_benchmark_indexing.py`, then:

```bash
pip install 'pytest-async-benchmark @ git+https://github.com/KRRT7/pytest-async-benchmark.git@feat/pedantic-mode' pytest-asyncio

# Run on main
git checkout main
python -m pytest tests/benchmarks/test_benchmark_indexing.py -v -s

# Run on this branch
git checkout perf/batch-inserts
python -m pytest tests/benchmarks/test_benchmark_indexing.py -v -s
```

</details>

---

*Generated by codeflash optimization agent*

---------

Co-authored-by: Bernhard Merkle <bernhard.merkle@gmail.com>
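The batching shape boils down to collecting rows first, then one `executemany` call; a minimal sketch with an illustrative schema (not the actual typeagent tables):

```python
import sqlite3


def add_terms_batch(conn: sqlite3.Connection,
                    rows: list[tuple[str, int]]) -> None:
    """Insert all (term, semref_id) pairs in a single executemany call
    instead of one cursor.execute() per row."""
    conn.executemany(
        "INSERT INTO term_index (term, semref_id) VALUES (?, ?)", rows
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term_index (term TEXT, semref_id INTEGER)")
add_terms_batch(conn, [("alice", 1), ("bob", 2), ("alice", 3)])
print(conn.execute("SELECT COUNT(*) FROM term_index").fetchone()[0])  # 3
```

`executemany` compiles the statement once and loops in C, which is where the ~14-16% end-to-end win comes from once ~1000 per-row calls collapse into a handful of batched ones.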
**Stack: 4/4** — depends on #230. Merge #231, #229, #230, then this PR.

---

- Five call sites used `get_item()` per scored ref — one SELECT and full deserialization per match (N+1 pattern)
- Added `get_metadata_multiple` to `ISemanticRefCollection` that fetches only `semref_id, range_json, knowledge_type` in a single batch query
- Replaced the N+1 loop with one `get_metadata_multiple` call at each site
- Further optimized scope-filtering: binary search in `contains_range`, inline tuple comparisons in `TextRange`, skip pydantic validation in `get_metadata_multiple`

### Call sites optimized

1. `lookup_term_filtered` — batch metadata, filter by knowledge_type/range
2. `lookup_property_in_property_index` — batch metadata, filter by range scope
3. `SemanticRefAccumulator.group_matches_by_type` — batch metadata, group by knowledge_type
4. `SemanticRefAccumulator.get_matches_in_scope` — batch metadata, filter by range scope
5. `get_scored_semantic_refs_from_ordinals_iter` — two-phase: metadata filter then batch fetch

### Additional optimizations

- **Binary search in `TextRangeCollection.contains_range`**: replaced O(n) linear scan with `bisect_right` keyed on `start`, reducing scope-filtering from ~25ms to ~9ms
- **Inline tuple comparisons in `TextRange`**: replaced `TextLocation` allocations in `__eq__`/`__lt__`/`__contains__` with a shared `_effective_end` returning tuples
- **Skip pydantic validation in `get_metadata_multiple`**: construct `TextLocation`/`TextRange` directly from JSON instead of going through `__pydantic_validator__`

## Benchmark

### Azure Standard_D2s_v5 — 2 vCPU, 8 GiB RAM, Python 3.13

#### Query (pytest-async-benchmark pedantic, 200 rounds)

200 matches against a 200-message indexed SQLite transcript. Only the function under test is timed.

| Function | Before (median) | After (median) | Speedup |
|:---|---:|---:|---:|
| `lookup_term_filtered` | 2.650 ms | 1.184 ms | **2.24x** |
| `group_matches_by_type` | 2.428 ms | 978 μs | **2.48x** |
| `get_scored_semantic_refs_from_ordinals_iter` | 2.541 ms | 2.946 ms | 0.86x |
| `lookup_property_in_property_index` | 25.306 ms | 9.365 ms | **2.70x** |
| `get_matches_in_scope` | 25.011 ms | 9.160 ms | **2.73x** |

<details>
<summary><b>Reproduce the benchmark locally</b></summary>

```bash
pip install 'pytest-async-benchmark @ git+https://github.com/KRRT7/pytest-async-benchmark.git@feat/pedantic-mode' pytest-asyncio
python -m pytest tests/benchmarks/test_benchmark_query.py -v -s
```

</details>

---

*Generated by codeflash optimization agent*

---------

Co-authored-by: Bernhard Merkle <bernhard.merkle@gmail.com>
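The `bisect_right` trick in `contains_range` can be sketched like this, assuming the stored ranges are non-overlapping (as message ranges in a transcript are) and kept as `(start, end)` tuples sorted by `start`; the names are illustrative, not the actual `TextRangeCollection` API:

```python
from bisect import bisect_right


def contains_range(sorted_ranges: list[tuple[int, int]],
                   query: tuple[int, int]) -> bool:
    """True if some stored range fully contains the query range.

    With non-overlapping ranges sorted by start, only the last range
    whose start <= query start can possibly contain it, so a single
    bisect_right replaces the O(n) linear scan.
    """
    q_start, q_end = query
    i = bisect_right(sorted_ranges, (q_start, float("inf"))) - 1
    return i >= 0 and sorted_ranges[i][1] >= q_end
```

Note the non-overlap assumption is load-bearing: with overlapping ranges, an earlier range with a larger end could contain the query even when the nearest-start candidate does not.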
**Stack: 2/4** — depends on #231. Merge #231 first, then this PR.
Move `import black` from module level to first use in `answers.py` and `utils.py`.

`black` (code formatter + transitive deps: pathspec, black.nodes, etc.) loaded on every `import typeagent` but was only used in two cold formatting paths. `black.format_str()` is called in two places:

- `create_context_prompt()` in `knowpro/answers.py` — formats debug context for LLM prompts
- `format_code()` in `aitools/utils.py` — developer pretty-print utility

Neither runs during normal library operation. Moving the import inside each function eliminates ~78ms of transitive module loading from the import chain.
## Benchmark

### Azure Standard_D2s_v5 — 2 vCPU, 8 GiB RAM, Python 3.13

#### Import Time (hyperfine, warmup 5, min-runs 30)

`import typeagent`

#### Offline E2E Test Suite (hyperfine, warmup 2, min-runs 10)
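To reproduce the import-time measurement locally (the hyperfine line matches the settings in the header above; the `-X importtime` line uses `json` as a stand-in module so it runs anywhere, even without typeagent installed):

```shell
# Whole-process import cost; requires hyperfine and typeagent installed:
#   hyperfine --warmup 5 --min-runs 30 'python -c "import typeagent"'

# Per-module breakdown using only the stdlib (swap json for typeagent):
python -X importtime -c "import json" 2>&1 | tail -n 3
```

The `-X importtime` output lists self and cumulative microseconds per imported module, which is how transitive costs like pathspec and black.nodes show up in the before/after comparison.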
*Generated by codeflash optimization agent*