17 changes: 17 additions & 0 deletions .claude/settings.local.json
@@ -0,0 +1,17 @@
{
"permissions": {
"allow": [
"mcp__claude_ai_Notion__notion-fetch",
"mcp__claude_ai_Notion__notion-create-pages",
"mcp__claude_ai_Notion__notion-update-page",
"mcp__claude_ai_Notion__notion-search",
"Bash(Get-ChildItem -Path \"e:\\\\AI\\\\Udemy\\\\AI Engineer Agentic Track The Complete Agent & MCP Course - Ed Donner\\\\agents\" -Force)",
"Bash(Select-Object Name, PSIsContainer)",
"Bash(Format-Table -AutoSize)",
"Bash(grep -E \"\\\\.py$\")",
"Bash(Get-ChildItem -Path \"e:\\\\AI\\\\Udemy\\\\AI Engineer Agentic Track The Complete Agent & MCP Course - Ed Donner\\\\agents\\\\7_advanced\" -Recurse)",
"Bash(Select-Object FullName, Length)",
"Bash(Remove-Item \"e:\\\\AI\\\\Udemy\\\\AI Engineer Agentic Track The Complete Agent & MCP Course - Ed Donner\\\\agents\\\\7_advanced\\\\ISSUES.md\")"
]
}
}
160 changes: 160 additions & 0 deletions 10_knowledge_graphs/1_lab1.ipynb
@@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab 1 — Triple Extraction\n",
"\n",
"**Goal:** Turn unstructured text into structured (subject, predicate, object) triples using an LLM with structured outputs.\n",
"\n",
"**Why triples?** \n",
"Triples are the atomic unit of a knowledge graph. Most simple facts can be expressed as: _subject_ → _relationship_ → _object_. \n",
"Collect enough triples and you have a knowledge graph."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys, os\n",
"# Notebooks do not define __file__; add the working directory (this lab folder) to the import path\n",
"sys.path.insert(0, os.getcwd())\n",
"from dotenv import load_dotenv\n",
"load_dotenv(override=True)\n",
"from knowledge_graph import extract_triples\n",
"print('Ready ✓')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Extract from a simple paragraph"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"TEXT = \"\"\"\n",
"Jeff Bezos founded Amazon in 1994 in Seattle. Amazon later acquired Whole Foods in 2017.\n",
"Andy Jassy became CEO of Amazon in 2021. Amazon Web Services, a subsidiary of Amazon, \n",
"is headquartered in Seattle. AWS competes with Microsoft Azure and Google Cloud.\n",
"\"\"\"\n",
"\n",
"triples = extract_triples(TEXT)\n",
"\n",
"print(f'Extracted {len(triples)} triples:\\n')\n",
"for t in triples:\n",
"    print(f' ({t.subject}) --[{t.predicate}]--> ({t.obj}) [conf={t.confidence:.0%}]')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Explore predicate vocabulary"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"predicates = Counter(t.predicate for t in triples)\n",
"print('Predicates found:')\n",
"for pred, count in predicates.most_common():\n",
"    print(f' {pred}: {count}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Confidence filtering\n",
"Review low-confidence triples before adding them to the graph."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"high_conf = [t for t in triples if t.confidence >= 0.9]\n",
"low_conf = [t for t in triples if t.confidence < 0.9]\n",
"\n",
"print(f'High confidence (≥90%): {len(high_conf)}')\n",
"print(f'Low confidence (<90%): {len(low_conf)}')\n",
"if low_conf:\n",
"    print('\\nLow confidence triples (review before adding):')\n",
"    for t in low_conf:\n",
"        print(f' [{t.confidence:.0%}] ({t.subject}) --[{t.predicate}]--> ({t.obj})')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Extract from multiple documents and merge"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"DOC2 = \"\"\"\n",
"Sundar Pichai is the CEO of Alphabet, the parent company of Google.\n",
"Google was founded by Larry Page and Sergey Brin at Stanford University in 1998.\n",
"Google Cloud competes with AWS and Azure in the cloud market.\n",
"\"\"\"\n",
"\n",
"triples2 = extract_triples(DOC2)\n",
"all_triples = triples + triples2\n",
"\n",
"print(f'Doc 1: {len(triples)} triples')\n",
"print(f'Doc 2: {len(triples2)} triples')\n",
"print(f'Total: {len(all_triples)} triples')\n",
"\n",
"# Find shared entities: names that appear in both docs (exact string match, so aliases are not merged)\n",
"entities1 = {t.subject for t in triples} | {t.obj for t in triples}\n",
"entities2 = {t.subject for t in triples2} | {t.obj for t in triples2}\n",
"shared = entities1 & entities2\n",
"print(f'\\nShared entities (bridge nodes): {shared}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise\n",
"Extend the extraction to also capture **temporal facts**: triples that include a year or date. \n",
"Add a `year: Optional[int]` field to the Triple model and populate it when the text mentions a specific year. \n",
"Test on text with multiple dated events."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Your code here\n"
]
}
],
"metadata": {
"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
"language_info": {"name": "python", "version": "3.12.0"}
},
"nbformat": 4,
"nbformat_minor": 5
}
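One possible starting point for the Lab 1 exercise, sketched under stated assumptions. The `knowledge_graph` module is not part of this diff, so `TemporalTriple` below is a hypothetical stand-in that only mirrors the fields the notebook reads (`subject`, `predicate`, `obj`, `confidence`) and adds the `year: Optional[int]` field the exercise asks for. The regex helper `infer_year` is also hypothetical: it is a simple fallback for filling `year`, not the LLM-based extraction the lab uses.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemporalTriple:
    """Hypothetical stand-in for the lab's Triple model, plus a `year` field."""
    subject: str
    predicate: str
    obj: str
    confidence: float = 1.0
    year: Optional[int] = None


# Matches a standalone four-digit year between 1000 and 2099.
YEAR_RE = re.compile(r"\b(1\d{3}|20\d{2})\b")


def infer_year(sentence: str) -> Optional[int]:
    """Return the single year mentioned in the sentence, else None.

    Sentences with zero or several distinct years are ambiguous for a
    single triple, so they are left unpopulated.
    """
    years = {int(y) for y in YEAR_RE.findall(sentence)}
    return years.pop() if len(years) == 1 else None


sentence = "Jeff Bezos founded Amazon in 1994 in Seattle."
t = TemporalTriple("Jeff Bezos", "founded", "Amazon",
                   confidence=0.95, year=infer_year(sentence))
print(t)
```

Returning `None` for sentences with several years (for example "SpaceX in 2002 and Tesla in 2003") keeps the helper from attaching the wrong date to a triple; resolving which year belongs to which fact is the part the exercise leaves to the extraction model.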
201 changes: 201 additions & 0 deletions 10_knowledge_graphs/2_lab2.ipynb
@@ -0,0 +1,201 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab 2 — Graph Construction & Query\n",
"\n",
"**Goal:** Store extracted triples in a NetworkX graph and query it — neighbours, shortest paths, predicate search.\n",
"\n",
"**Requires:** `pip install networkx matplotlib`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys, os\n",
"# Notebooks do not define __file__; add the working directory (this lab folder) to the import path\n",
"sys.path.insert(0, os.getcwd())\n",
"from dotenv import load_dotenv\n",
"load_dotenv(override=True)\n",
"from knowledge_graph import KnowledgeGraph, Triple\n",
"import networkx as nx\n",
"print('Ready ✓')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Build a graph from text"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"TEXT = \"\"\"\n",
"Elon Musk founded SpaceX in 2002 and Tesla in 2003. SpaceX is headquartered in Hawthorne, California.\n",
"Tesla is headquartered in Austin, Texas. In 2022, Musk acquired Twitter and renamed it X.\n",
"Gwynne Shotwell is the President of SpaceX. Sam Altman is the CEO of OpenAI.\n",
"OpenAI is based in San Francisco. Microsoft invested in OpenAI in 2019.\n",
"California is a state in the United States. Texas is also a US state.\n",
"\"\"\"\n",
"\n",
"kg = KnowledgeGraph()\n",
"triples = kg.add_text(TEXT)\n",
"\n",
"print(f'Graph stats: {kg.stats()}')\n",
"print(f'\\nAll {len(triples)} triples:')\n",
"for t in triples:\n",
"    print(f' ({t.subject}) --[{t.predicate}]--> ({t.obj})')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Query neighbours"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('What does Elon Musk connect to?')\n",
"for n in kg.neighbors('Elon Musk'):\n",
"    print(f' {n[\"direction\"]} [{n[\"predicate\"]}] {n[\"entity\"]}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Find by predicate"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Find all headquarters relationships\n",
"hq_pairs = kg.find_by_predicate('headquartered_in')\n",
"if not hq_pairs:\n",
"    # Fall back to any predicate that looks like a location relationship\n",
"    for edge_pred in {d['predicate'] for _, _, d in kg.graph.edges(data=True)}:\n",
"        if any(key in edge_pred.lower() for key in ('headquarter', 'located', 'based')):\n",
"            hq_pairs += kg.find_by_predicate(edge_pred)\n",
"\n",
"print('Companies and their locations:')\n",
"for org, loc in hq_pairs:\n",
"    print(f' {org} → {loc}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Shortest path between entities"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# What is the connection between Gwynne Shotwell and California?\n",
"path = kg.shortest_path('Gwynne Shotwell', 'California')\n",
"if path:\n",
"    print('Path: ' + ' → '.join(path))\n",
"else:\n",
"    print('No path found. Try the graph nodes:', list(kg.graph.nodes())[:10])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Visualise the graph"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
"    import matplotlib.pyplot as plt\n",
"    plt.figure(figsize=(14, 8))\n",
"    pos = nx.spring_layout(kg.graph, k=2, seed=42)\n",
"    nx.draw_networkx_nodes(kg.graph, pos, node_size=1500, node_color='lightblue', alpha=0.8)\n",
"    nx.draw_networkx_labels(kg.graph, pos, font_size=8)\n",
"    edge_labels = {(s, o): d['predicate'] for s, o, d in kg.graph.edges(data=True)}\n",
"    nx.draw_networkx_edges(kg.graph, pos, alpha=0.5, arrows=True, arrowsize=20)\n",
"    nx.draw_networkx_edge_labels(kg.graph, pos, edge_labels=edge_labels, font_size=6)\n",
"    plt.title('Knowledge Graph')\n",
"    plt.axis('off')\n",
"    plt.tight_layout()\n",
"    plt.show()\n",
"except ImportError:\n",
"    print('matplotlib not installed — skipping visualisation')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Save and reload"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tempfile, os\n",
"path = os.path.join(tempfile.gettempdir(), 'demo_kg.json')\n",
"kg.save(path)\n",
"\n",
"kg2 = KnowledgeGraph.load(path)\n",
"print(f'Saved and reloaded: {kg2.stats()}')\n",
"print('Graphs match:', kg.stats() == kg2.stats())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise\n",
"Build a function `entity_importance(kg)` that ranks entities by their degree centrality (number of connections, normalised by graph size). \n",
"Highly connected entities are often, though not always, the most important ones in a knowledge graph. \n",
"Print the top 5 entities and explain why they rank highest."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Your code here\n"
]
}
],
"metadata": {
"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
"language_info": {"name": "python", "version": "3.12.0"}
},
"nbformat": 4,
"nbformat_minor": 5
}
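A dependency-free sketch of the Lab 2 exercise, under stated assumptions. The exercise asks for `entity_importance(kg)`; because the `KnowledgeGraph` class is not shown in this diff, the version below takes a plain list of `(subject, object)` pairs instead, and the edge list is hand-written from the Lab 2 text rather than produced by the extractor. Degree centrality is computed as degree divided by (n - 1), where n is the number of nodes. If `kg.graph` is a NetworkX graph, as the notebook's calls suggest, `nx.degree_centrality(kg.graph)` should give the same normalised scores on graphs with more than one node.

```python
from collections import Counter


def entity_importance(edges, top_n=5):
    """Rank entities by degree centrality: degree / (n - 1).

    `edges` is an iterable of (subject, object) pairs. Both in- and
    out-edges count towards an entity's degree.
    """
    degree = Counter()
    for subject, obj in edges:
        degree[subject] += 1
        degree[obj] += 1

    n = len(degree)
    scale = 1 / (n - 1) if n > 1 else 0.0

    # Sort by degree (descending), then by name so ties are deterministic.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [(node, count * scale) for node, count in ranked[:top_n]]


# Hand-written edges based on the Lab 2 text (illustrative, not LLM output).
edges = [
    ("Elon Musk", "SpaceX"),
    ("Elon Musk", "Tesla"),
    ("Elon Musk", "Twitter"),
    ("Gwynne Shotwell", "SpaceX"),
    ("SpaceX", "Hawthorne"),
    ("Tesla", "Austin"),
]

ranking = entity_importance(edges)
for node, score in ranking:
    print(f"{node}: {score:.2f}")
```

In this toy edge list "Elon Musk" and "SpaceX" tie at the top because each touches three of the six edges; the alphabetical tie-break only fixes the print order and says nothing about which entity matters more.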