Week 4 project update #2
@@ -0,0 +1,11 @@

```
GITHUB_WEBHOOK_SECRET=
GITHUB_TOKEN=
REPO_NAME= #Enter repo from where you want to fetch issues
SIMILARITY_THRESHOLD=
MY_REPO_NAME= #Enter repo for which you want to use

GITHUB_TOKEN=your_token_here
GITHUB_WEBHOOK_SECRET=your_secret_here
REPO_NAME=OpenLake/your-repo
TARGET_REPO=your-test-repo
SIMILARITY_THRESHOLD=0.85
```
@@ -0,0 +1,19 @@

```
# Python
__pycache__/
*.pyc

# Virtual environment
venv/
.env

# Chroma database
data/issues.json
data/chroma/chroma.sqlite3
data/chroma/defcbcc9-4578-4553-b3f4-648cbe1c763b

# IDE
.vscode/
.idea/

# OS
.DS_Store
```
@@ -1,3 +1,6 @@

```markdown
<<<<<<< HEAD

=======
# SmartTriage
```
Comment on lines +1 to 4

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Remove the merge-conflict markers before merging. The README is still in a conflicted state at both the top and bottom, which breaks the rendered docs and should block merge. As per the provided line-range change details, both markers need to be removed.

Also applies to: 264-264
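A check like the following could catch leftover markers before merge. This is a minimal sketch, not part of the PR: the function name `find_conflict_markers` is hypothetical, and the pattern assumes Git's default seven-character markers at the start of a line.

```python
import re

# Git's default conflict markers: seven '<', '=' or '>' characters at the
# start of a line, followed by whitespace or the end of the line.
CONFLICT_MARKER = re.compile(r"^(<{7}|={7}|>{7})(\s|$)")


def find_conflict_markers(text: str) -> list[int]:
    """Return the 1-based line numbers that look like conflict markers."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]
```

Running it over the README and failing the build when the result is non-empty would be one way to use it. A Markdown setext underline made of exactly seven `=` characters would be reported as a false positive.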
```markdown
An intelligent GitHub bot that automates issue triage and PR reviewer assignment using vector similarity search and commit history analysis for OpenLake repositories.
```

@@ -258,3 +261,4 @@ Refer to `app/api/webhooks.py` for the full webhook routing logic.

```markdown
↥ [Back to top](#table-of-contents)

If you have any questions or feedback, feel free to reach out to the maintainers or open an issue in the repository.
>>>>>>> c703023b2e677d75aef064a1828fc5d857b18b68
```
@@ -0,0 +1,51 @@

```python
from fastapi import APIRouter, Request

from app.ml.duplicate import detect_duplicate
from app.core.github import GitHubClient

router = APIRouter()

github = GitHubClient()

@router.post("/webhook")
async def github_webhook(request: Request):

    event = request.headers.get("X-GitHub-Event")

    print(f"GitHub Event: {event}")

    try:
        payload = await request.json()
    except Exception:
        payload = {}
```
Comment on lines +10 to +21

🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift

Verify the GitHub webhook signature. The handler accepts any POST without validating the `X-Hub-Signature-256` header.

Sketch

```python
import hashlib
import hmac
import json

from fastapi import HTTPException

from app.config import WEBHOOK_SECRET

# Inside the handler: hash the raw bytes, because the signature covers the exact body sent.
raw = await request.body()
sig = request.headers.get("X-Hub-Signature-256", "")
expected = "sha256=" + hmac.new(WEBHOOK_SECRET.encode(), raw, hashlib.sha256).hexdigest()
if not hmac.compare_digest(sig, expected):
    raise HTTPException(status_code=401, detail="Invalid signature")
payload = json.loads(raw)
```

🧰 Tools

🪛 Ruff (0.15.18)

[warning] 19-19: Do not catch blind exception: `Exception` (BLE001)
```python
    if event == "issues":

        action = payload.get("action")

        if action == "opened":
            issue_text = (
                payload["issue"]["title"]
                + "\n"
                + (payload["issue"]["body"] or "")
            )
```
Comment on lines +27 to +31

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Guard against malformed `issues` payloads. If the payload is missing `issue` or `title`, the direct key lookups in this block raise `KeyError`.
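One way to add that guard is sketched below. The helper name `extract_issue_text` is hypothetical and not part of the PR; it assumes the handler would skip duplicate detection when the helper returns `None`.

```python
from typing import Any, Optional


def extract_issue_text(payload: dict[str, Any]) -> Optional[str]:
    """Return the title and body joined by a newline, or None if the payload is malformed."""
    issue = payload.get("issue")
    if not isinstance(issue, dict):
        return None
    title = issue.get("title")
    if not isinstance(title, str) or not title.strip():
        return None
    # The PR already treats a missing or null body as an empty string.
    body = issue.get("body") or ""
    return f"{title}\n{body}"
```

The result matches what the PR builds inline for well-formed payloads, so the handler could swap it in without changing the text that gets embedded.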
```python
            result = detect_duplicate(issue_text)
            print("Duplicate result:", result)

            if result and result["duplicate"]:
                github.comment_issue(
                    payload["issue"]["number"],

                    f"""
                    Possible duplicate issue detected.
                    Similar issue:
                    {result['issue']['url']}
                    Similarity:
                    {result['similarity']:.2f}
                    """
                )
```
Comment on lines +33 to +49

🚀 Performance & Scalability | 🟠 Major | ⚡ Quick win

Blocking work runs on the event loop. `detect_duplicate` and `github.comment_issue` are synchronous calls made inside an `async def` handler, so the loop cannot serve other requests while they run.

Proposed change

```diff
-result = detect_duplicate(issue_text)
+from starlette.concurrency import run_in_threadpool
+result = await run_in_threadpool(detect_duplicate, issue_text)
 print("Duplicate result:", result)
 if result and result["duplicate"]:
-    github.comment_issue(
-        payload["issue"]["number"],
-        ...
-    )
+    await run_in_threadpool(
+        github.comment_issue,
+        payload["issue"]["number"],
+        comment_body,
+    )
```
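The effect of offloading can be tried in isolation. The sketch below uses the standard library's `asyncio.to_thread` instead of Starlette's `run_in_threadpool`, and `slow_lookup` is a hypothetical stand-in for a blocking call such as `detect_duplicate`.

```python
import asyncio
import time


def slow_lookup(text: str) -> dict:
    """Hypothetical stand-in for a blocking call such as detect_duplicate."""
    time.sleep(0.05)  # simulates model inference or a network request
    return {"duplicate": False, "text": text}


async def handle(text: str) -> dict:
    # The blocking function runs in a worker thread, so the event loop stays free.
    return await asyncio.to_thread(slow_lookup, text)


async def main() -> list:
    # Both lookups overlap instead of running back to back.
    return await asyncio.gather(handle("a"), handle("b"))


results = asyncio.run(main())
```

If `slow_lookup` were called directly inside `handle`, the two requests would run one after the other and the loop would be unresponsive for the whole duration.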
```python
    return {"ok": True}
```
@@ -0,0 +1,10 @@

```python
from dotenv import load_dotenv
import os

load_dotenv()

WEBHOOK_SECRET = os.getenv("GITHUB_WEBHOOK_SECRET")
```
🔒 Security & Privacy | 🔴 Critical | 🏗️ Heavy lift

`WEBHOOK_SECRET` is loaded here, but the webhook handler in `app/api/webhooks.py` never uses it to verify incoming requests. Compute an HMAC-SHA256 over the raw request body using `WEBHOOK_SECRET` and compare it with the `X-Hub-Signature-256` header.

Contributor

@kushal281 take a look
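The check described above can be isolated in a small helper that is testable without FastAPI. This is a sketch only: the name `verify_signature` is hypothetical, and it assumes the header carries `sha256=` followed by the hex digest, as in the sketch on the handler.

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(secret: Optional[str], raw_body: bytes, signature_header: str) -> bool:
    """Return True only if the header matches the HMAC-SHA256 of the raw body."""
    # Fail closed: os.getenv returns None when the secret is not configured.
    if not secret or not signature_header:
        return False
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Compare as bytes in constant time to avoid leaking how much matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

The handler would call it with the bytes from `await request.body()` and reject the request with a 401 when it returns `False`.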
```python
GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
TARGET_REPO = os.getenv("TARGET_REPO")
SIMILARITY_THRESHOLD = float(os.getenv("SIMILARITY_THRESHOLD", 0.85))
SOURCE_REPO = os.getenv("SOURCE_REPO")
```
@@ -0,0 +1,42 @@

```python
from github import Github
from app.config import GITHUB_TOKEN
from app.config import TARGET_REPO

class GitHubClient:

    def __init__(self):
        self.client = Github(GITHUB_TOKEN)


    def get_repo(self, repo_name):
        return self.client.get_repo(repo_name)


    def fetch_issues(self, repo_name):
        repo = self.get_repo(repo_name)

        issues = []

        issue_iterator = repo.get_issues(state="all")

        for issue in issue_iterator:

            # skip pull requests
            if issue.pull_request:
                continue

            issues.append({
                "id": issue.id,
                "title": issue.title,
                "body": issue.body,
                "url": issue.html_url,
                "created_at": str(issue.created_at),
                "state": issue.state
            })

        return issues

    def comment_issue(self, issue_number, comment):
        self.repo = self.get_repo(TARGET_REPO)
        issue = self.repo.get_issue(number=issue_number)
        issue.create_comment(comment)
```
@@ -0,0 +1,42 @@

```python
from app.core.github import GitHubClient
from app.ml.clean import clean_issue
from app.db.vector import add_issue
from app.config import SOURCE_REPO
import json
import os

def main():

    github = GitHubClient()
    issues = github.fetch_issues(SOURCE_REPO)

    uncleaned_issues = []

    for issue in issues:
        cleaned = clean_issue(issue)
        add_issue(
            cleaned["id"],
            cleaned["text"],
            cleaned["vector"],
            {
                "state": cleaned["state"],
                "url": cleaned["url"]
            }
        )
        uncleaned_issues.append({
            "id": issue['id'],
            "title": issue['title'],
            "body": issue['body'],
            "state": issue['state'],
            "url": issue['url']
        })

    os.makedirs("data", exist_ok=True)

    with open("data/issues.json", "w", encoding="utf-8") as f:
        json.dump(uncleaned_issues, f, indent=4, ensure_ascii=False)
    print(f"Stored {len(issues)} issues")


if __name__ == "__main__":
    main()
```
@@ -0,0 +1,24 @@

```python
import chromadb
from chromadb.config import Settings

client = chromadb.PersistentClient(path="./data/chroma")


collection = client.get_or_create_collection(name="github_issues")
```
🎯 Functional Correctness | 🟠 Major

Set an explicit cosine distance metric for the collection. The default distance metric for a ChromaDB collection is l2 (squared Euclidean distance), while `detect_duplicate` converts the returned distance with `1 - distance`, which only reads as a similarity under cosine distance.

Proposed change

```diff
-collection = client.get_or_create_collection(name="github_issues")
+collection = client.get_or_create_collection(
+    name="github_issues",
+    metadata={"hnsw:space": "cosine"},
+)
```
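A small numeric check illustrates the mismatch. It is a sketch in plain Python that assumes unit-length vectors, for which squared L2 distance equals `2 - 2 * cosine_similarity`; whether the PR's embeddings are normalized is not confirmed by the diff.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two unit vectors 60 degrees apart.
a = [1.0, 0.0]
b = [math.cos(math.pi / 3), math.sin(math.pi / 3)]

cos_sim = cosine_similarity(a, b)  # about 0.5
d = squared_l2(a, b)               # about 1.0, i.e. 2 - 2 * cos_sim
naive = max(0.0, 1 - d)            # about 0.0: what `1 - distance` gives under l2
recovered = 1 - d / 2              # about 0.5: matches cos_sim for unit vectors
```

Under the default metric, two issues with a cosine similarity of 0.5 would be scored as roughly 0, so a threshold such as 0.85 would be far stricter than intended. With the collection set to cosine, `1 - distance` gives the cosine similarity directly.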
```python
def add_issue(issue_id, text, vector, metadata):
    collection.upsert(
        ids=[str(issue_id)],
        documents=[text],
        embeddings=[vector],
        metadatas=[metadata]
    )


def search_similar_issue(embedding, limit=1):
    results = collection.query(
        query_embeddings=[embedding],
        n_results=limit
    )
    return results
```
@@ -0,0 +1,10 @@

```python
from fastapi import FastAPI
from app.api.webhooks import router

app = FastAPI(title="SmartTriage")

app.include_router(router)

@app.get("/health")
def health():
    return {"status": "ok"}
```
@@ -0,0 +1,22 @@

```python
from app.ml.embedder import Embedder

embedder = Embedder()

def clean_issue(issue):
    text = ""

    if issue["title"]:
        text += issue["title"]

    if issue["body"]:
        text += "\n" + issue["body"]

    vector = embedder.generate_embedding(text)

    return {
        "id": issue["id"],
        "text": text.strip(),
        "url": issue["url"],
        "state": issue["state"],
        "vector": vector
    }
```
@@ -0,0 +1,30 @@

```python
from app.ml.embedder import Embedder
from app.db.vector import search_similar_issue
from app.config import SIMILARITY_THRESHOLD

embedder = Embedder()

def detect_duplicate(issue_text):

    embedding = embedder.generate_embedding(issue_text)

    result = search_similar_issue(embedding)


    if not result["distances"][0]:
        return None

    distance = result["distances"][0][0]
    similarity = max(0, 1 - distance)

    if similarity >= SIMILARITY_THRESHOLD:
        return {
            "duplicate": True,
            "similarity": similarity,
            "issue": result["metadatas"][0][0]
        }

    return {
        "duplicate": False,
        "similarity": similarity
    }
```
@@ -0,0 +1,9 @@

```python
from sentence_transformers import SentenceTransformer

class Embedder:
    def __init__(self):
        self.model = SentenceTransformer("all-MiniLM-L6-v2")

    def generate_embedding(self, text):
        embedding = self.model.encode(text)
        return embedding.tolist()
```
@@ -0,0 +1,19 @@

```python
from app.ml.embedder import Embedder
from app.db.vector import search_similar


embedder = Embedder()


text = "Cannot login into application"


embedding = embedder.generate_embedding(text)


result = search_similar(
    embedding
)
```
Comment on lines +2 to +16

🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

Import name mismatch: this script imports `search_similar`, but `app/db/vector.py` in this PR defines `search_similar_issue`, so the import fails with `ImportError`.

Proposed fix

```diff
-from app.db.vector import search_similar
+from app.db.vector import search_similar_issue
@@
-result = search_similar(
-    embedding
-)
+result = search_similar_issue(
+    embedding
+)
```

Contributor

The minimal fix is:

```python
from app.db.vector import search_similar_issue
result = search_similar_issue(embedding)
```
```python
print(result)
```
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Align the example with the actual env contract.

This file has two conflicting key blocks, and it never documents `SOURCE_REPO`, which `app/config.py` reads. New contributors will copy an incomplete/ambiguous config.

♻️ Suggested cleanup

As per the `app/config.py` snippet, the app loads `GITHUB_WEBHOOK_SECRET`, `GITHUB_TOKEN`, `TARGET_REPO`, `SIMILARITY_THRESHOLD`, and `SOURCE_REPO`.
🧰 Tools

🪛 dotenv-linter (4.0.0)

- [warning] 2-2: [UnorderedKey] The GITHUB_TOKEN key should go before the GITHUB_WEBHOOK_SECRET key
- [warning] 3-3: [SpaceCharacter] The line has spaces around equal sign
- [warning] 3-3: [ValueWithoutQuotes] This value needs to be surrounded in quotes
- [warning] 5-5: [SpaceCharacter] The line has spaces around equal sign
- [warning] 5-5: [TrailingWhitespace] Trailing whitespace detected
- [warning] 5-5: [UnorderedKey] The MY_REPO_NAME key should go before the REPO_NAME key
- [warning] 5-5: [ValueWithoutQuotes] This value needs to be surrounded in quotes
- [warning] 7-7: [DuplicatedKey] The GITHUB_TOKEN key is duplicated
- [warning] 8-8: [DuplicatedKey] The GITHUB_WEBHOOK_SECRET key is duplicated
- [warning] 9-9: [DuplicatedKey] The REPO_NAME key is duplicated
- [warning] 11-11: [DuplicatedKey] The SIMILARITY_THRESHOLD key is duplicated
- [warning] 11-11: [EndingBlankLine] No blank line at the end of the file
- [warning] 11-11: [UnorderedKey] The SIMILARITY_THRESHOLD key should go before the TARGET_REPO key
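The mismatch between the example file and `app/config.py` can also be checked mechanically. The sketch below is hypothetical tooling, not part of the PR: `audit_env` and `REQUIRED_KEYS` are illustrative names, and the key list is taken from the `os.getenv` calls in the config diff.

```python
# Keys read by app/config.py in this PR (SIMILARITY_THRESHOLD has a default there).
REQUIRED_KEYS = (
    "GITHUB_WEBHOOK_SECRET",
    "GITHUB_TOKEN",
    "TARGET_REPO",
    "SIMILARITY_THRESHOLD",
    "SOURCE_REPO",
)


def audit_env(text: str, required=REQUIRED_KEYS):
    """Return (missing, duplicated) key lists for the contents of a dotenv-style file."""
    seen = {}
    duplicated = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in seen:
            duplicated.append(key)
        seen[key] = value.strip()
    missing = [key for key in required if key not in seen]
    return missing, duplicated
```

Applied to the example file in this PR, it reports `SOURCE_REPO` as missing and the four keys that appear in both blocks as duplicated.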