LCORE-2499: prevent HF from downloading #1854
base: main
Changes from 1 commit
    @@ -114,6 +114,20 @@ jobs:
               echo "=== lightspeed-stack.yaml ==="
               grep -A 3 "llama_stack:" lightspeed-stack.yaml

    +      - name: Cache HuggingFace embedding model
    +        uses: actions/cache@v4
    +        with:
    +          path: /tmp/hf-cache
    +          key: hf-sentence-transformers-all-mpnet-base-v2
Comment on lines +117 to +121
Contributor
Add versioning to cache key to enable invalidation.
The cache key `hf-sentence-transformers-all-mpnet-base-v2` is a fixed string, so it never changes when the model or its dependencies change.
📦 Suggested improvement

           - name: Cache HuggingFace embedding model
             uses: actions/cache@v4
             with:
               path: /tmp/hf-cache
    -          key: hf-sentence-transformers-all-mpnet-base-v2
    +          key: hf-sentence-transformers-all-mpnet-base-v2-${{ hashFiles('**/pyproject.toml') }}
    +          restore-keys: |
    +            hf-sentence-transformers-all-mpnet-base-v2-

Include a hash of dependency lockfiles or a date component in the key.
🧰 Tools
🪛 zizmor (1.25.2)
[error] 118-118: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
    +
    +      - name: Pre-download HuggingFace embedding model
    +        env:
    +          HF_HOME: /tmp/hf-cache
    +        run: |
    +          pip install -q sentence-transformers
Contributor
Pin sentence-transformers version for reproducibility.
Installing `sentence-transformers` without a version constraint lets each CI run resolve a different release.
📌 Suggested fix

    - pip install -q sentence-transformers
    + pip install -q sentence-transformers==3.3.1

Pin to the version currently in use.
    +          python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-mpnet-base-v2')"
    +          echo "HF_CACHE_PATH=/tmp/hf-cache" >> $GITHUB_ENV
    +
           - name: Docker Login for quay access
             if: matrix.mode == 'server'
             env:
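To illustrate the cache-key comment on lines +117 to +121: a key that embeds a digest of the dependency file changes whenever that file changes, while the fixed prefix still allows a prefix match of the kind `restore-keys` performs. The Python sketch below is only an analogy and does not reproduce the exact algorithm behind `hashFiles`. The names `versioned_cache_key` and `demo`, and the `example==…` dependency strings, are hypothetical and exist only for this illustration.

```python
import hashlib
import tempfile
from pathlib import Path

# Prefix copied from the workflow's cache key.
KEY_PREFIX = "hf-sentence-transformers-all-mpnet-base-v2"


def versioned_cache_key(prefix: str, files: list[Path]) -> str:
    """Build '<prefix>-<sha256>' where the digest covers the bytes of the given files."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.read_bytes())
    return f"{prefix}-{digest.hexdigest()}"


def demo() -> tuple[str, str]:
    """Return the keys computed before and after editing a throwaway pyproject.toml."""
    with tempfile.TemporaryDirectory() as tmp:
        pyproject = Path(tmp) / "pyproject.toml"
        # Placeholder dependency strings, not real project requirements.
        pyproject.write_text('dependencies = ["example==1.0"]\n')
        before = versioned_cache_key(KEY_PREFIX, [pyproject])
        pyproject.write_text('dependencies = ["example==2.0"]\n')
        after = versioned_cache_key(KEY_PREFIX, [pyproject])
    return before, after


if __name__ == "__main__":
    old_key, new_key = demo()
    print(old_key)
    print(new_key)
```

The two printed keys differ in their digest suffix but share the prefix, which is why a dependency change would miss the exact key yet could still fall back to an older entry through a prefix match.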
    @@ -21,6 +21,7 @@ services:
           - llama-storage:/opt/app-root/src/.llama/storage
           - ./tests/e2e/rag:/opt/app-root/src/.llama/storage/rag:z
           - mock-tls-certs:/certs:ro
    +      - ${HF_CACHE_PATH:-./tmp/.hf-cache}:/opt/app-root/src/.cache/huggingface
         environment:
           - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
           - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:-}
    @@ -57,6 +58,8 @@ services:
           - LLAMA_STACK_LOGGING=${LLAMA_STACK_LOGGING:-}
           # FAISS test
           - FAISS_VECTOR_STORE_ID=${FAISS_VECTOR_STORE_ID:-}
    +      # Prevent HuggingFace Hub update checks (HTTP 429 rate-limiting in CI from parallel jobs).
    +      - HF_HUB_OFFLINE=1
Comment on lines +61 to +62
Contributor
🧩 Analysis chain
🏁 Script executed:

    #!/bin/bash
    # Check for documentation on local vs CI compose usage
    rg -i -C3 'docker.compose.*local|local.*docker|development.*setup' --type=md README.md CONTRIBUTING.md docs/
    # Check if there's a separate compose override file for local development
    fd 'docker-compose.*override|docker-compose.*local' --type=f

Repository: lightspeed-core/lightspeed-stack
Length of output: 5038
🏁 Script executed:

    #!/bin/bash
    set -euo pipefail
    # Locate docker-compose.yaml
    echo "== docker-compose.yaml candidates =="
    fd -i 'docker-compose\.ya?ml' . || true
    # Show the referenced lines (if file at repo root; otherwise find the closest match and print the requested vicinity)
    echo
    echo "== docker-compose.yaml excerpt around lines 55-75 =="
    for f in $(fd -i 'docker-compose\.ya?ml' . -x echo 2>/dev/null | tr '\n' ' '); do
      echo "--- $f ---"
      nl -ba "$f" | sed -n '55,80p' || true
    done
    echo
    echo "== occurrences of HF_HUB_OFFLINE / HF_CACHE_PATH =="
    rg -n --hidden --no-ignore -S 'HF_HUB_OFFLINE|HF_CACHE_PATH' . || true
    echo
    echo "== occurrences of huggingface offline/cache/pre-download =="
    rg -n --hidden --no-ignore -S 'hugging.?face|hf-cache|hf cache|pre-?download' . || true

Repository: lightspeed-core/lightspeed-stack
Length of output: 11390
🏁 Script executed:

    #!/bin/bash
    set -euo pipefail
    mapfile -t files < <(fd -i 'docker-compose\.ya?ml' .)
    echo "== docker-compose.yaml candidates =="
    printf '%s\n' "${files[@]:-<none>}"
    echo
    echo "== docker-compose.yaml excerpt around lines 55-80 (each match) =="
    if [ "${#files[@]}" -eq 0 ]; then
      exit 0
    fi
    for f in "${files[@]}"; do
      echo "--- $f ---"
      nl -ba "$f" | sed -n '55,80p'
      echo
    done
    echo "== occurrences of HF_HUB_OFFLINE / HF_CACHE_PATH =="
    rg -n --hidden --no-ignore -S 'HF_HUB_OFFLINE|HF_CACHE_PATH' . || true
    echo
    echo "== occurrences of huggingface offline/cache/pre-download =="
    rg -n --hidden --no-ignore -S 'hugging.?face|hf-cache|HF_HOME|HF_DATASETS_CACHE|pre-?download' . || true

Repository: lightspeed-core/lightspeed-stack
Length of output: 133
🏁 Script executed:

    #!/bin/bash
    set -euo pipefail
    echo "== docker-compose.yaml lines 50-80 =="
    sed -n '50,80p' docker-compose.yaml
    echo
    echo "== docker-compose-library.yaml lines 50-80 =="
    sed -n '50,80p' docker-compose-library.yaml
    echo
    echo "== Search: HF_HUB_OFFLINE / HF_CACHE_PATH guidance in docs README workflows =="
    rg -n --hidden --no-ignore -S 'HF_HUB_OFFLINE|HF_CACHE_PATH|hf-cache|HF_HOME' README.md CONTRIBUTING.md docs .github/workflows || true
    echo
    echo "== Show e2e_tests workflow around HF cache pre-download lines =="
    # Tight context around the lines indicated earlier
    rg -n 'Cache HuggingFace embedding model|Pre-download HuggingFace embedding model|HF_CACHE_PATH=/tmp/hf-cache' .github/workflows/e2e_tests.yaml -n
    awk 'NR>=90 && NR<=160 {print NR":"$0}' .github/workflows/e2e_tests.yaml

Repository: lightspeed-core/lightspeed-stack
Length of output: 6528
🏁 Script executed:

    #!/bin/bash
    set -euo pipefail
    echo "== docker-compose override/local candidates =="
    fd -i "docker-compose.*override.*\.ya?ml|docker-compose.*local.*\.ya?ml|docker-compose\.override\.ya?ml" .
    echo
    echo "== docker-compose_library.yaml override/local candidates =="
    fd -i "docker-compose-library.*override.*\.ya?ml|docker-compose-library.*local.*\.ya?ml" .

Repository: lightspeed-core/lightspeed-stack
Length of output: 184
🏁 Script executed:

    #!/bin/bash
    set -euo pipefail
    echo "== Find Containerfile/Dockerfile references to .cache/huggingface =="
    rg -n --hidden --no-ignore -S '\.cache/huggingface|HF_HOME|HF_HUB_OFFLINE|sentence-transformers' deploy .github/workflows || true
    echo
    echo "== Show relevant parts of deploy/lightspeed-stack/Containerfile =="
    if [ -f deploy/lightspeed-stack/Containerfile ]; then
      nl -ba deploy/lightspeed-stack/Containerfile | sed -n '90,170p'
    fi
    echo
    echo "== Show relevant parts of deploy/llama-stack/test.containerfile (if present) =="
    if [ -f deploy/llama-stack/test.containerfile ]; then
      nl -ba deploy/llama-stack/test.containerfile | sed -n '1,120p'
    fi

Repository: lightspeed-core/lightspeed-stack
Length of output: 894
🏁 Script executed:

    #!/bin/bash
    set -euo pipefail
    echo "== deploy/lightspeed-stack/Containerfile (around .cache/huggingface) =="
    sed -n '105,150p' deploy/lightspeed-stack/Containerfile
    echo
    echo "== deploy/llama-stack/test.containerfile (around line ~36) =="
    sed -n '1,80p' deploy/llama-stack/test.containerfile

Repository: lightspeed-core/lightspeed-stack
Length of output: 3857
🏁 Script executed:

    #!/bin/bash
    set -euo pipefail
    rg -n --hidden --no-ignore -S 'HF_HUB_OFFLINE' README.md CONTRIBUTING.md docs .github/workflows deploy || true
    echo
    rg -n --hidden --no-ignore -S 'hf-cache|HF_CACHE_PATH|pre-download HuggingFace|sentence-transformers/all-mpnet-base-v2' README.md docs .github/workflows || true

Repository: lightspeed-core/lightspeed-stack
Length of output: 2310

Hardcoded `HF_HUB_OFFLINE=1`
Local runs cannot download the embedding model when offline mode is forced on and the model is not already in the mounted cache.
💡 Recommended fixes
Option 1: Conditional offline mode via environment variable (preferred)

           # Prevent HuggingFace Hub update checks (HTTP 429 rate-limiting in CI from parallel jobs).
    -      - HF_HUB_OFFLINE=1
    +      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-0}

Set `HF_HUB_OFFLINE=1` in the CI workflow so that only CI runs offline.
Option 2: Local pre-download step + docs
Add a local setup script to pre-download the embedding model into the local cache directory.
           # OKP/Solr RAG
           - RH_SERVER_OKP=${RH_SERVER_OKP:-}
           - SOLR_URL=${SOLR_URL:-}
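The compose entries in this diff rely on the `${NAME:-default}` form, which falls back to the default when the variable is unset or empty. The Python sketch below emulates only that one form to show what `HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-0}` and the `HF_CACHE_PATH` volume entry would resolve to in CI versus a local shell. It is a simplified stand-in, not Compose's actual interpolation code, and `interpolate` is a hypothetical helper name.

```python
import re

# Matches only the ${NAME:-default} form; other interpolation forms are ignored.
_PATTERN = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):-(?P<default>[^}]*)\}")


def interpolate(value: str, env: dict[str, str]) -> str:
    """Resolve ${NAME:-default}, using the default when NAME is unset or empty."""

    def _substitute(match: re.Match) -> str:
        current = env.get(match.group("name"), "")
        return current if current else match.group("default")

    return _PATTERN.sub(_substitute, value)


if __name__ == "__main__":
    entry = "HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-0}"
    volume = "${HF_CACHE_PATH:-./tmp/.hf-cache}:/opt/app-root/src/.cache/huggingface"
    ci_env = {"HF_HUB_OFFLINE": "1", "HF_CACHE_PATH": "/tmp/hf-cache"}
    print(interpolate(entry, {}))       # local shell, variable unset
    print(interpolate(entry, ci_env))   # CI, variable exported by the workflow
    print(interpolate(volume, {}))
    print(interpolate(volume, ci_env))
```

Under this reading, a local run stays online unless the developer opts in, while CI gets offline mode only if the workflow exports `HF_HUB_OFFLINE=1`.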
Pin action to commit SHA for supply-chain security.
The `actions/cache@v4` reference is not pinned to a specific commit SHA. GitHub Actions security guidance recommends pinning actions to immutable commit hashes to reduce the risk of supply-chain attacks.
🔒 Recommended fix
Use `actions/cache@3624ceb22c1c5a301c8db4169662070a689d9ea8` (current v4.1.1) or the latest commit from the v4 branch. Add a comment with the version for maintainability.
🧰 Tools
🪛 zizmor (1.25.2)
[error] 118-118: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
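As a rough illustration of what an "unpinned action reference" check looks for, the Python sketch below scans workflow text for `uses:` entries that do not end in a full 40-character commit SHA. It is a deliberately simplified approximation and not zizmor's implementation; `unpinned_actions` is a hypothetical helper, and local `./` actions are skipped as an assumption of this sketch.

```python
import re

# Capture the reference that follows `uses:` on a step line.
_USES = re.compile(r"^[ \t]*(?:-[ \t]*)?uses:[ \t]*(?P<ref>\S+)", re.MULTILINE)
# A reference counts as pinned here only if it ends in a 40-hex commit SHA.
_SHA_PINNED = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return `uses:` references that are not pinned to a full commit SHA."""
    refs = (match.group("ref") for match in _USES.finditer(workflow_text))
    return [
        ref
        for ref in refs
        if not ref.startswith("./") and not _SHA_PINNED.search(ref)
    ]


if __name__ == "__main__":
    sample = """
steps:
  - name: Cache HuggingFace embedding model
    uses: actions/cache@v4
  - name: Same action, pinned
    uses: actions/cache@3624ceb22c1c5a301c8db4169662070a689d9ea8 # v4.1.1
"""
    print(unpinned_actions(sample))
```

Keeping the version as a trailing comment, as in the pinned step of the sample, preserves readability while the SHA provides the immutable reference.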