28 commits
85aded6
Rebuild index with new config (#136)
hemant-endee Mar 25, 2026
bb2f3d6
using jthread with stop token
hemant-endee Apr 8, 2026
cbaa744
Shared parallel addPoint utility function — static chunk partition(sa…
hemant-endee Apr 9, 2026
cb9f73d
comments resolved
hemant-endee Apr 21, 2026
fd2c034
rebuild also handles execute_rebuild_job
hemant-endee Apr 23, 2026
3b32a2e
correction on cleantempfiles and error code
hemant-endee Apr 23, 2026
13fe6ce
rebuild status enum and logs code change
hemant-endee Apr 23, 2026
6f4083e
rebuild.cpp
hemant-endee Apr 23, 2026
2202923
Using Rebuild as friend class on Indexmanager helps to pass CacheEntr…
hemant-endee Apr 25, 2026
c2adbac
fix(rebuild): using new_alg directly, stop_request before phase3, an…
hemant-endee Apr 27, 2026
af444b6
test cases
hemant-endee Apr 27, 2026
37c72b5
Omnish/sync release note (#223)
omnish-endee Apr 23, 2026
012bd44
add restore backup async with backupOperation
itspunkaj Apr 23, 2026
c1584f0
fix: try catch error handling
itspunkaj Apr 23, 2026
cefe072
remove commented code
itspunkaj Apr 23, 2026
075589e
remove ActiveBackupStatus
itspunkaj Apr 23, 2026
b580bed
minor change
itspunkaj Apr 23, 2026
d55cd9f
docs: add async restore backup flow
itspunkaj Apr 23, 2026
f2d70e1
bump web ui version to 1.6.0-alpha.5
itspunkaj Apr 23, 2026
0627523
refactor: add size check for restore backup
itspunkaj Apr 24, 2026
1d525a6
refactor: streamline backup thread management in BackupStore
itspunkaj Apr 24, 2026
26bb171
docs: add attachBackupThread()
itspunkaj Apr 24, 2026
9749d47
refactor: add disk space check before restoring backup
itspunkaj Apr 24, 2026
b9f38b3
refactor: add backup.cpp
itspunkaj Apr 28, 2026
b69657e
refactor: update logs
itspunkaj Apr 28, 2026
868ed74
update backup flow
itspunkaj Apr 28, 2026
2769c64
refactor: add test cases for backup flow
itspunkaj Apr 28, 2026
d028cbc
update readme.md
itspunkaj Apr 28, 2026
197 changes: 197 additions & 0 deletions .github/workflows/sync_release_notes.yml
@@ -0,0 +1,197 @@
name: Release Notes Syncing

on:
  workflow_dispatch:
    inputs:
      tag_name:
        description: 'Release tag to build (e.g. v1.0.0)'
        required: true
        type: string


jobs:

  create-and-build:
    runs-on: ubuntu-latest

    strategy:
      fail-fast: false
      matrix:
        arch:
          - name: avx2
            instance_type: c6i.large
            binary_file_name: ndd-avx2
          - name: avx512
            instance_type: c6i.large
            binary_file_name: ndd-avx2
          - name: neon
            instance_type: c6g.large
            binary_file_name: ndd-neon
          - name: sve2
            instance_type: c7g.large
            binary_file_name: ndd-neon

    steps:

      - name: Checkout PR commit
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.tag_name }}

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ vars.AWS_REGION }}

      - name: Launch Endee Server
        id: launch
        shell: bash
        run: |
          ARCH_NAME="${{ matrix.arch.name }}"
          INSTANCE_TYPE="${{ matrix.arch.instance_type }}"

          if [[ "$ARCH_NAME" == "avx2" ]] || [[ "$ARCH_NAME" == "avx512" ]]; then
            AMI_ID="${{ vars.AMI_ID }}"
          else
            AMI_ID="${{ vars.ARM_AMI_ID }}"
          fi

          ENDEE_INSTANCE_ID=$(aws ec2 run-instances \
            --region ${{ vars.AWS_REGION }} \
            --image-id "$AMI_ID" \
            --instance-type "$INSTANCE_TYPE" \
            --key-name ${{ secrets.ENDEE_PEM }} \
            --security-group-ids ${{ secrets.VECTORDBBENCH_SERVER_GROUP_ID }} \
            --subnet-id ${{ secrets.AWS_SUBNET_ID }} \
            --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30,"VolumeType":"gp3"}}]' \
            --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$ARCH_NAME}]" \
            --query 'Instances[0].InstanceId' \
            --output text)

          echo "InstanceID: $ENDEE_INSTANCE_ID"
          echo "instance_id=$ENDEE_INSTANCE_ID" >> $GITHUB_OUTPUT

          aws ec2 wait instance-running \
            --instance-ids $ENDEE_INSTANCE_ID

          IP=$(aws ec2 describe-instances \
            --instance-ids $ENDEE_INSTANCE_ID \
            --query 'Reservations[0].Instances[0].PublicIpAddress' \
            --output text)

          echo "IP: $IP"
          echo "ip=$IP" >> $GITHUB_OUTPUT

      - name: Write PEM file
        run: |
          mkdir -p "$HOME/.ssh"
          echo "${{ secrets.ENDEE_SSH_PRIVATE_KEY }}" > "$HOME/.ssh/${{ secrets.ENDEE_PEM }}"
          chmod 400 "$HOME/.ssh/${{ secrets.ENDEE_PEM }}"
          echo "PEM file created"

      - name: Wait for SSH to be ready
        shell: bash
        run: |
          ENDEE_SSH_READY=false
          ENDEE_IP="${{ steps.launch.outputs.ip }}"
          ENDEE_PEM_FILE="$HOME/.ssh/${{ secrets.ENDEE_PEM }}"

          for i in {1..20}; do
            if ssh -i "$ENDEE_PEM_FILE" -o StrictHostKeyChecking=no -o ConnectTimeout=5 -o BatchMode=yes ubuntu@"$ENDEE_IP" "echo ok" 2>/dev/null; then
              echo "SSH ready on ${{ matrix.arch.name }} @ $ENDEE_IP"
              ENDEE_SSH_READY=true
              break
            fi
            echo "Attempt $i/20 failed, retrying in 10 seconds..."
            sleep 10
          done

          if [ "$ENDEE_SSH_READY" = false ]; then
            echo "Failed to SSH to Endee Server"
            exit 1
          fi

      - name: Build Endee Binary
        run: |
          ssh -o StrictHostKeyChecking=no -i "$HOME/.ssh/${{ secrets.ENDEE_PEM }}" ubuntu@${{ steps.launch.outputs.ip }} << 'EOF'
          set -euo pipefail
          sudo apt-get update -y
          sudo apt-get install -y git build-essential
          cd ~
          git clone https://github.com/endee-io/endee.git
          cd endee
          ulimit -n 5000
          chmod +x ./install.sh
          ARCH="${{ matrix.arch.name }}"
          if [[ "$ARCH" == "avx2" ]] || [[ "$ARCH" == "avx512" ]]; then
            ./install.sh --release --avx2
          else
            ./install.sh --release --neon
          fi
          EOF

      - name: Download binary
        run: |
          # verify path exists first
          ssh -o StrictHostKeyChecking=no -i "$HOME/.ssh/${{ secrets.ENDEE_PEM }}" \
            ubuntu@${{ steps.launch.outputs.ip }} \
            "find /home/ubuntu -name '${{ matrix.arch.binary_file_name }}' 2>/dev/null"

          scp -o StrictHostKeyChecking=no -i "$HOME/.ssh/${{ secrets.ENDEE_PEM }}" \
            ubuntu@${{ steps.launch.outputs.ip }}:"/home/ubuntu/endee/build/${{ matrix.arch.binary_file_name }}" \
            ./ndd-${{ matrix.arch.name }}

      - name: Upload binary as artifact
        uses: actions/upload-artifact@v4
        with:
          name: ndd-${{ matrix.arch.name }}
          path: ./ndd-${{ matrix.arch.name }}


      - name: Terminate instance
        if: always()
        run: |
          aws ec2 terminate-instances \
            --instance-ids ${{ steps.launch.outputs.instance_id }}

  # ← separate job, runs AFTER all 4 builds finish
  push-binaries:
    runs-on: ubuntu-latest
    needs: create-and-build   # waits for all 4 matrix jobs to complete

    steps:
      - name: Download all binaries
        uses: actions/download-artifact@v4
        with:
          path: ./binaries   # downloads all 4 artifacts here

      - name: Push all binaries to ndd-repo
        run: |
          git clone https://x-access-token:${{ secrets.PAT }}@github.com/Endee-Pro/ndd-docker.git
          cd ndd-docker

          git checkout main
          mkdir -p build

          # copy all 4 binaries at once
          cp ../binaries/ndd-avx2/ndd-avx2 ./build/ndd-avx2
          cp ../binaries/ndd-avx512/ndd-avx512 ./build/ndd-avx512
          cp ../binaries/ndd-neon/ndd-neon ./build/ndd-neon
          cp ../binaries/ndd-sve2/ndd-sve2 ./build/ndd-sve2

          # UPDATE TAG IN DOCKERFILE
          sed -i 's/LABEL version=".*"/LABEL version="${{ github.event.inputs.tag_name }}"/' ./Dockerfile

          git config user.email "actions@github.com"
          git config user.name "GitHub Actions"

          git add .

          if git diff --staged --quiet; then
            echo "No changes to commit"
          else
            git commit -m "Add binaries from release ${{ github.event.inputs.tag_name }}"
            git push -u origin omnish/release-note-sync
          fi
3 changes: 3 additions & 0 deletions CMakeLists.txt
@@ -254,6 +254,8 @@ message(STATUS "Binary name: ${NDD_BINARY_NAME}")
set(NDD_CORE_SOURCES
src/sparse/inverted_index.cpp
src/utils/system_sanity/system_sanity.cpp
src/core/rebuild.cpp
src/storage/backup_store.cpp
)

# Build non-main project sources separately so they can be compiled in parallel
@@ -288,6 +290,7 @@ target_include_directories(ndd_core PRIVATE
${ASIO_INCLUDE_DIR}
${OPENSSL_INCLUDE_DIR}
${CURL_INCLUDE_DIRS}
${LIBARCHIVE_INCLUDE_DIR}
)
target_include_directories(${NDD_BINARY_NAME} PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/src
53 changes: 35 additions & 18 deletions docs/backup-system.md
@@ -2,26 +2,27 @@

`BackupStore` is a standalone utility class owned by `IndexManager` as a direct member (`BackupStore backup_store_`). It has no dependency on IndexManager — it handles tar operations, backup JSON, file paths, and active backup tracking. `IndexManager` orchestrates the backup flow (save, lock, metadata) and delegates file-level operations to `BackupStore`. All backup API calls go through `IndexManager` — `BackupStore` is not exposed to `main.cpp`.

Backups are stored as `.tar` archives in per-user directories: `{DATA_DIR}/backups/{username}/`. Temp files use a centralized `{DATA_DIR}/backups/.tmp/{username}/` directory. Active backup state is tracked in-memory with mutex protection (`active_user_backups_mutex_`).

## Architecture

```
IndexManager (ndd.hpp)
├── BackupStore backup_store_ (direct member)
├── 4 orchestration methods (inline, defined after class):
│ executeBackupJob, createBackupAsync, restoreBackupAsync, uploadBackup
├── 5 forwarding methods:
│ listBackups, deleteBackup, getActiveBackup, getBackupInfo, validateBackupName
└── Handles: saveIndexInternal, getIndexEntry, metadata_manager_, loadIndex

BackupStore (src/storage/backup_store.hpp — standalone, no IndexManager dependency)
├── Archive: createBackupTar(), extractBackupTar()
├── Helpers: getUserBackupDir(), getUserTempDir(), readBackupJson(), writeBackupJson(), cleanupTempDir()
├── Active backup: setActiveBackup(), attachBackupThread(), clearActiveBackup(), hasActiveBackup(), getActiveBackup()
│ (all protected by active_user_backups_mutex_; tracks both Creation and Restoration operations)
│ setActiveBackup() registers the entry before the thread is spawned; attachBackupThread() moves the jthread in after
├── Public methods: validateBackupName(), listBackups(), deleteBackup(), getBackupInfo()
└── Owns: data_dir_, active_user_backups_, active_user_backups_mutex_ (mutable)
```

## API Endpoints
@@ -32,7 +33,7 @@
| GET | `/api/v1/backups` | List all backup files |
| GET | `/api/v1/backups/active` | Check active backup for current user |
| GET | `/api/v1/backups/{name}/info` | Get backup metadata (read from .tar) |
| POST | `/api/v1/backups/{name}/restore` | Restore backup to new index (async, 202) |
| DELETE | `/api/v1/backups/{name}` | Delete a backup file |
| GET | `/api/v1/backups/{name}/download` | Download backup (streaming) |
| POST | `/api/v1/backups/upload` | Upload a backup file |
Expand All @@ -49,7 +50,7 @@ operation_mutex (mutex, per-index)
└── Write operations block until mutex is available
```

**Simple approach:** No atomic flags or file locks. The backup thread holds `operation_mutex` while saving and creating the tar. Write operations that arrive during backup simply block on the mutex until the backup releases it. One active operation per user is enforced via in-memory map protected by `active_user_backups_mutex_` — this covers both backup creation and restore operations, so a user cannot run a backup and a restore concurrently.
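A minimal model of this blocking behaviour (names mirror the doc, but the bodies are illustrative stand-ins, not the real `IndexManager` code):

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::mutex operation_mutex;        // per-index in the real code
bool backup_in_progress = false;
std::vector<int> index_data;       // stand-in for the index contents

// Backup thread: holds operation_mutex for the whole save + tar step.
void executeBackupJob() {
    std::lock_guard lock(operation_mutex);
    backup_in_progress = true;
    // ... saveIndexInternal() + createBackupTar() would run here ...
    backup_in_progress = false;
}

// Write path: simply blocks on the same mutex; no flags or file locks.
void addVectors(int v) {
    std::lock_guard lock(operation_mutex);
    assert(!backup_in_progress);   // a write never runs mid-backup
    index_data.push_back(v);
}
```

Because both paths take the same lock, a write arriving mid-backup parks on the mutex and resumes once the backup releases it, so there is no error state to report to the client.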

**Write path during backup:**

Expand All @@ -69,8 +70,8 @@ If backup holds the mutex, writes block until it completes. Normal write-vs-writ
```
POST /index/X/backup → validateBackupName() → check no duplicate .tar on disk
→ check active_user_backups_[username] empty (one per user)
→ setActiveBackup() — insert entry into active_user_backups_ map (no thread yet)
→ spawn jthread → attachBackupThread() — move jthread into map entry → return 202 { backup_name }
```

**Background thread** (`executeBackupJob`):
Expand All @@ -92,16 +93,31 @@ addVectors/deleteVectors/updateFilters/deleteByFilter/deleteIndex
(blocks if backup holds operation_mutex — resumes after backup completes)
```

### Restore Backup (Async)

```
POST /backups/{name}/restore → validate name → check backup exists in backup registry
→ check target index does NOT exist → check disk space (need 2x tar size)
→ check active_user_backups_[username] empty (one per user)
→ setActiveBackup() — insert entry into active_user_backups_ map (BackupOperation::Restoration, no thread yet)
→ spawn jthread → attachBackupThread() — move jthread into map entry → return 202 { backup_name, target_index, status: "in_progress" }
```

**Background thread** (`restoreBackup`):

```
→ extract tar to backups/.tmp/{username}/{backup_name}/
→ validate archive structure (expect exactly 1 directory)
→ read metadata.json → copy files to target dir → remove metadata.json from target
→ register in MetadataManager
→ [LOCK indices_mutex_] loadIndex() [UNLOCK]
→ cleanup temp dir → erase from active_user_backups_
```

**On failure**: cleanup temp dir → erase from active_user_backups_ → log error (not returned to client).
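The failure path amounts to a try/catch around the restore steps with unconditional cleanup. A sketch under assumed names (the real `restoreBackup` also registers metadata and calls `loadIndex()`; here the steps are injected as a callable so the error path is visible):

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <set>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

std::set<std::string> active_users;  // stand-in for active_user_backups_

// Background restore job: on any failure, clean up the temp dir and the
// active entry; the error is logged, not returned to the client.
bool restoreBackupJob(const std::string& user, const fs::path& temp_dir,
                      const std::function<void()>& do_restore) {
    bool ok = true;
    try {
        fs::create_directories(temp_dir);
        do_restore();                // extract tar, validate, copy, load...
    } catch (const std::exception&) {
        ok = false;                  // real code: LOG_ERROR(...)
    }
    fs::remove_all(temp_dir);        // cleanup runs on both paths
    active_users.erase(user);
    return ok;
}
```

Running the cleanup outside the catch keeps the success and failure paths identical: temp dir removed, active entry erased.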

**Status polling**: client polls `GET /api/v1/backups/active` to check if restore is still in progress.

### Download (Streaming)

```
@@ -138,11 +154,12 @@

| # | Check | Where |
|---|-------|-------|
| 1 | **One operation per user** — `active_user_backups_` map rejects if user already has an active backup or restore | createBackupAsync, restoreBackupAsync |
| 2 | **Write blocking** — writes block on `operation_mutex` until backup completes | addVectors, deleteVectors, updateFilters, deleteByFilter, deleteIndex |
| 3 | **Name validation** — alphanumeric, underscores, hyphens only; max 200 chars | validateBackupName |
| 4 | **Duplicate prevention** — checks if .tar file already exists on disk | createBackupAsync, upload |
| 5 | **Disk space (create)** — requires 2x index size available in backup dir | executeBackupJob |
| 5b | **Disk space (restore)** — requires 2x tar file size available in temp dir | restoreBackupAsync |
| 6 | **Atomic tar** — writes to `backups/.tmp/{username}/` first, then renames to final location | executeBackupJob |
| 7 | **Crash recovery** — on startup: `cleanupTempDir()` deletes entire `backups/.tmp/` directory | BackupStore constructor |
| 8 | **Restore safety** — target must not exist, metadata must be valid; cleanup (temp dir + active status) on failure in background thread | restoreBackupAsync, restoreBackup |
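Checks #3 and #5b are simple enough to sketch directly. The character rules and the 2x multiplier come from the table above; the function bodies and exact behaviour beyond that are assumptions, not the real `BackupStore` code:

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <filesystem>
#include <string>

// Check #3: alphanumeric, underscores, hyphens only; max 200 chars.
bool validateBackupName(const std::string& name) {
    if (name.empty() || name.size() > 200) return false;
    for (unsigned char c : name)
        if (!std::isalnum(c) && c != '_' && c != '-') return false;
    return true;
}

// Check #5b: restoring needs roughly 2x the tar size free in the temp
// dir (room for the extracted files plus the copy into the target dir).
bool hasSpaceForRestore(std::uintmax_t tar_size,
                        const std::filesystem::path& temp_dir) {
    std::error_code ec;
    auto info = std::filesystem::space(temp_dir, ec);
    if (ec) return false;            // treat an unreadable dir as "no space"
    return info.available >= 2 * tar_size;
}
```

Using the error-code overload of `std::filesystem::space` keeps the check non-throwing, which suits a pre-flight validation that should reject rather than crash.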
1 change: 1 addition & 0 deletions docs/logs.md
@@ -88,6 +88,7 @@ The same overload shapes apply to `LOG_WARN` and `LOG_ERROR`.
- `1500s` metadata logs
- `1600s` vector storage logs
- `1700s` system sanity checks (CPU compatibility, disk, memory, ulimits)
- `1800s` rebuild subsystem logs
- `2000s` index manager logs
- `2100s` HNSW load/cache logs
