CMakeLists.txt (2 additions, 0 deletions)
@@ -254,6 +254,7 @@ message(STATUS "Binary name: ${NDD_BINARY_NAME}")
set(NDD_CORE_SOURCES
src/sparse/inverted_index.cpp
src/utils/system_sanity/system_sanity.cpp
src/core/rebuild.cpp
)

# Build non-main project sources separately so they can be compiled in parallel
@@ -288,6 +289,7 @@ target_include_directories(ndd_core PRIVATE
${ASIO_INCLUDE_DIR}
${OPENSSL_INCLUDE_DIR}
${CURL_INCLUDE_DIRS}
${LIBARCHIVE_INCLUDE_DIR}
)
target_include_directories(${NDD_BINARY_NAME} PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/src
docs/logs.md (1 addition, 0 deletions)
@@ -88,6 +88,7 @@ The same overload shapes apply to `LOG_WARN` and `LOG_ERROR`.
- `1500s` metadata logs
- `1600s` vector storage logs
- `1700s` system sanity checks (CPU compatibility, disk, memory, ulimits)
- `1800s` rebuild subsystem logs
- `2000s` index manager logs
- `2100s` HNSW load/cache logs

docs/rebuild.md (124 additions, 0 deletions)
@@ -0,0 +1,124 @@
# Index Rebuild
Collaborator

Things that are missing in the doc; make it more comprehensive:

Disk space: rebuild needs 2× the index size on disk (timestamped + canonical co-exist briefly).

Memory: 2× the graph in RAM during Phase 2 (old + new).

Expected duration: rough order of magnitude per million vectors helps capacity planning.

What "manually restart" means after a crash — operationally how does an operator know a rebuild was incomplete? (There's no persisted state.)

Collaborator Author

Fixed. Added Capacity and Timing section covering disk space (2×), memory (2×), and expected duration.


Rebuild reconstructs an index's HNSW graph with new configuration parameters (`M` and `ef_con`, the construction-time `ef`) without re-uploading vector data. Every vector is read back from MDBX storage and inserted into a new graph; only the graph structure changes.

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/v1/index/{name}/rebuild` | Start async rebuild |
| GET | `/api/v1/index/{name}/rebuild/status` | Check rebuild progress |

---

## Start Rebuild

**POST** `/api/v1/index/{name}/rebuild`

All parameters are optional, and omitted parameters keep their current values. A request that specifies no changes is rejected with 400.

```json
{
  "M": 32,
  "ef_con": 256
}
```

**Parameters:**

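| Parameter | Type | Range | Description |
|-----------|------|-------|-------------|
| `M` | int | 4–512 | HNSW graph connectivity (neighbor links per node); higher values raise recall and memory use |
| `ef_con` | int | 8–4096 | Candidate-list size used while building the graph; higher values improve graph quality but slow the build |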

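The parameter rules above can be checked on the client before anything is sent. The following is a minimal Python sketch, assuming a server reachable at a base URL of your choosing; `build_rebuild_request` and `base_url` are hypothetical names, not part of this API. It only builds the request object, which can then be passed to `urllib.request.urlopen`.

```python
import json
import urllib.parse
import urllib.request

# Ranges as documented in the parameter table.
M_RANGE = (4, 512)
EF_CON_RANGE = (8, 4096)


def build_rebuild_request(base_url, index_name, M=None, ef_con=None):
    """Build (but do not send) a POST request that starts a rebuild.

    Only parameters that are not None go into the body, so omitted ones
    keep their current values on the server.
    """
    body = {}
    for key, value, (lo, hi) in (("M", M, M_RANGE), ("ef_con", ef_con, EF_CON_RANGE)):
        if value is None:
            continue
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"{key} must be an int")
        if not lo <= value <= hi:
            raise ValueError(f"{key}={value} is outside {lo}-{hi}")
        body[key] = value
    if not body:
        # An empty body specifies no changes, which the server rejects with 400.
        raise ValueError("specify at least one of M or ef_con")

    name = urllib.parse.quote(index_name, safe="")
    url = f"{base_url.rstrip('/')}/api/v1/index/{name}/rebuild"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```
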
**Response 202:**
```json
{
  "status": "rebuilding",
  "previous_config": { "M": 16, "ef_con": 128 },
  "new_config": { "M": 32, "ef_con": 256 },
  "total_vectors": 50000
}
```

**Errors:**

| Code | Condition |
|------|-----------|
| 400 | No changes specified, invalid or out-of-range parameters, or an attempt to change `precision` or `space_type` |
| 404 | Index not found |
| 409 | Rebuild or backup already in progress for this user |

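Of these errors, only 409 describes a condition that clears on its own (the other rebuild or backup finishes); 400 and 404 need the request or index name fixed. A client can therefore retry on 409 alone. This is a minimal sketch: `send` stands for whatever function actually performs the POST and is hypothetical, as are the default retry count and delay.

```python
import time


def start_with_retry(send, attempts=5, delay_s=30.0, sleep=time.sleep):
    """Call send() until it returns something other than 409, at most `attempts` times.

    `send` is any callable returning (http_status, parsed_json_body).
    Returns the last (status, body) pair; 400 and 404 are returned immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        status, body = send()
        # 409: another rebuild or backup is running for this user, so wait and retry.
        if status != 409:
            return status, body
        if attempt < attempts - 1:
            sleep(delay_s)
    return status, body
```
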
---

## Check Progress

**GET** `/api/v1/index/{name}/rebuild/status`

**Status values:**

| Status | Meaning |
|--------|---------|
| `idle` | No rebuild has run for this index, or the most recent rebuild was on a different index |
| `in_progress` | Rebuild is currently running |
| `completed` | Rebuild finished successfully |
| `failed` | Rebuild failed (see `error` field) |

**In progress:**
```json
{
  "status": "in_progress",
  "vectors_processed": 45000,
  "total_vectors": 100000,
  "percent_complete": 45.0,
  "started_at": "2026-03-25T10:30:00Z"
}
```

**Completed:**
```json
{
  "status": "completed",
  "vectors_processed": 100000,
  "total_vectors": 100000,
  "percent_complete": 100.0,
  "started_at": "2026-03-25T10:30:00Z",
  "completed_at": "2026-03-25T10:32:15Z"
}
```

**Failed:**
```json
{
  "status": "failed",
  "vectors_processed": 45000,
  "total_vectors": 100000,
  "percent_complete": 45.0,
  "started_at": "2026-03-25T10:30:00Z",
  "completed_at": "2026-03-25T10:31:05Z",
  "error": "Out of memory"
}
```
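
In these examples `percent_complete` matches `vectors_processed / total_vectors × 100` (45000 / 100000 gives 45.0), and elapsed time can be derived from the timestamps: the completed example ran from 10:30:00 to 10:32:15, i.e. 135 seconds. The sketch below computes both from a status response; it assumes timestamps are always UTC with a literal `Z` as shown, and `summarize` is a hypothetical helper, not a server function. It expects a non-`idle` response, since `started_at` is only shown for those.

```python
from datetime import datetime, timezone

# Assumed format: UTC with a trailing "Z", as in the examples above.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_ts(value):
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def summarize(status_doc, now=None):
    """Return (percent, elapsed_seconds) for an in_progress/completed/failed response."""
    total = status_doc.get("total_vectors", 0)
    done = status_doc.get("vectors_processed", 0)
    percent = 100.0 * done / total if total else 0.0
    started = parse_ts(status_doc["started_at"])
    if "completed_at" in status_doc:
        finished = parse_ts(status_doc["completed_at"])
    else:
        # Still running: measure against the supplied or current time.
        finished = now or datetime.now(timezone.utc)
    return percent, (finished - started).total_seconds()
```
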

Status is queried per index. A `completed` or `failed` state persists until the same user starts their next rebuild.

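Because `completed` and `failed` persist until the next rebuild, a client can poll the status endpoint until it sees either one. This is a minimal sketch: `fetch_status` is a hypothetical callable wrapping the GET request, and the poll interval and timeout are illustrative values, not server recommendations.

```python
import time

# States after which polling can stop; idle means there is nothing to wait for.
STOP_STATES = {"completed", "failed", "idle"}


def wait_for_rebuild(fetch_status, poll_s=5.0, timeout_s=3600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the rebuild stops running, or raise TimeoutError.

    fetch_status returns the parsed JSON body of
    GET /api/v1/index/{name}/rebuild/status.
    """
    deadline = clock() + timeout_s
    while True:
        doc = fetch_status()
        state = doc.get("status")
        if state in STOP_STATES:
            return doc
        if clock() >= deadline:
            raise TimeoutError(f"rebuild still {state!r} after {timeout_s}s")
        sleep(poll_s)
```
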
---

## Restrictions

The following index settings **cannot** be changed by a rebuild; attempting to change them returns 400:
- `precision` (quantization level)
- `space_type`

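A client can reject these fields before sending a request. The sketch below is hypothetical (`check_rebuild_body` is an illustrative name); rejecting keys other than `M` and `ef_con` is a conservative client-side choice, since how the server treats unknown keys is not documented here.

```python
RESTRICTED_KEYS = {"precision", "space_type"}  # server returns 400 if a rebuild tries to change these
ALLOWED_KEYS = {"M", "ef_con"}


def check_rebuild_body(body):
    """Raise ValueError if `body` contains keys a rebuild request should not carry."""
    restricted = body.keys() & RESTRICTED_KEYS
    if restricted:
        raise ValueError(f"cannot change via rebuild: {sorted(restricted)}")
    unknown = body.keys() - ALLOWED_KEYS
    if unknown:
        # Client-side policy only: the server's handling of unknown keys is not specified.
        raise ValueError(f"unknown rebuild parameters: {sorted(unknown)}")
```
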

---

## Behavior

- **All vectors are re-indexed** from MDBX storage into a new HNSW graph with the updated configuration.
- **Search continues** during rebuild — queries use the old index until the rebuild completes.
- **Write operations** (insert, delete, update) block while the rebuild is running and will time out, the same as during a backup.
- **One rebuild at a time per user**: a user cannot start a rebuild on any index while another of their rebuilds is in progress, and a rebuild cannot run concurrently with a backup for the same user.
- **Periodic checkpoints** — the in-progress graph is saved to a temp file at regular intervals.
- **On completion**, the new graph replaces `default.idx`. All temporary and intermediate files are cleaned up.
- **On server restart** during an incomplete rebuild, the old index loads normally. Temp files are cleaned up automatically. The rebuild must be restarted manually.
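
The concurrency rules above amount to one exclusive slot per user, shared between rebuild and backup. The following Python model makes that rule concrete; it is hypothetical and not the server's C++ implementation (`UserOperationGuard`, `try_begin`, and `end` are illustrative names). A `False` from `try_begin` corresponds to the 409 response.

```python
import threading


class UserOperationGuard:
    """Hypothetical model: at most one rebuild or backup in flight per user."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}  # user_id -> (operation, index_name)

    def try_begin(self, user_id, operation, index_name):
        """Claim the user's slot; return False (i.e. 409) if it is already taken."""
        with self._lock:
            if user_id in self._active:
                return False
            self._active[user_id] = (operation, index_name)
            return True

    def end(self, user_id):
        """Release the user's slot when the rebuild or backup finishes or fails."""
        with self._lock:
            self._active.pop(user_id, None)
```
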