ed-donner · hariharanp07 · May 9, 2026 · May 9, 2026 · May 9, 2026 · Jun 6, 2026
diff --git a/.claude/settings.local.json b/.claude/settings.local.json
@@ -0,0 +1,17 @@
+{
+  "permissions": {
+    "allow": [
+      "mcp__claude_ai_Notion__notion-fetch",
+      "mcp__claude_ai_Notion__notion-create-pages",
+      "mcp__claude_ai_Notion__notion-update-page",
+      "mcp__claude_ai_Notion__notion-search",
+      "Bash(Get-ChildItem -Path \"e:\\\\AI\\\\Udemy\\\\AI Engineer Agentic Track The Complete Agent & MCP Course - Ed Donner\\\\agents\" -Force)",
+      "Bash(Select-Object Name, PSIsContainer)",
+      "Bash(Format-Table -AutoSize)",
+      "Bash(grep -E \"\\\\.py$\")",
+      "Bash(Get-ChildItem -Path \"e:\\\\AI\\\\Udemy\\\\AI Engineer Agentic Track The Complete Agent & MCP Course - Ed Donner\\\\agents\\\\7_advanced\" -Recurse)",
+      "Bash(Select-Object FullName, Length)",
+      "Bash(Remove-Item \"e:\\\\AI\\\\Udemy\\\\AI Engineer Agentic Track The Complete Agent & MCP Course - Ed Donner\\\\agents\\\\7_advanced\\\\ISSUES.md\")"
+    ]
+  }
+}
diff --git a/10_knowledge_graphs/1_lab1.ipynb b/10_knowledge_graphs/1_lab1.ipynb
@@ -0,0 +1,160 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Lab 1 — Triple Extraction\n",
+    "\n",
+    "**Goal:** Turn unstructured text into structured (subject, predicate, object) triples using an LLM with structured outputs.\n",
+    "\n",
+    "**Why triples?**  \n",
+    "Triples are the atomic unit of a knowledge graph. Most simple facts can be expressed as: _someone_ → _relationship_ → _something_.  \n",
+    "Build enough triples and you have a knowledge graph."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sys, os\n",
+    "sys.path.insert(0, os.getcwd())  # put the notebook's working directory on the import path\n",
+    "from dotenv import load_dotenv\n",
+    "load_dotenv(override=True)\n",
+    "from knowledge_graph import extract_triples\n",
+    "print('Ready ✓')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Extract from a simple paragraph"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "TEXT = \"\"\"\n",
+    "Jeff Bezos founded Amazon in 1994 in Seattle. Amazon later acquired Whole Foods in 2017.\n",
+    "Andy Jassy became CEO of Amazon in 2021. Amazon Web Services, a subsidiary of Amazon, \n",
+    "is headquartered in Seattle. AWS competes with Microsoft Azure and Google Cloud.\n",
+    "\"\"\"\n",
+    "\n",
+    "triples = extract_triples(TEXT)\n",
+    "\n",
+    "print(f'Extracted {len(triples)} triples:\\n')\n",
+    "for t in triples:\n",
+    "    print(f'  ({t.subject}) --[{t.predicate}]--> ({t.obj})  [conf={t.confidence:.0%}]')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Explore predicate vocabulary"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from collections import Counter\n",
+    "predicates = Counter(t.predicate for t in triples)\n",
+    "print('Predicates found:')\n",
+    "for pred, count in predicates.most_common():\n",
+    "    print(f'  {pred}: {count}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Confidence filtering\n",
+    "Triples below the 90% confidence threshold should be reviewed before they are added to the graph."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "high_conf = [t for t in triples if t.confidence >= 0.9]\n",
+    "low_conf  = [t for t in triples if t.confidence < 0.9]\n",
+    "\n",
+    "print(f'High confidence (≥90%): {len(high_conf)}')\n",
+    "print(f'Low confidence  (<90%): {len(low_conf)}')\n",
+    "if low_conf:\n",
+    "    print('\\nLow confidence triples (review before adding):')\n",
+    "    for t in low_conf:\n",
+    "        print(f'  [{t.confidence:.0%}] ({t.subject}) --[{t.predicate}]--> ({t.obj})')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Extract from multiple documents and merge"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "DOC2 = \"\"\"\n",
+    "Sundar Pichai is the CEO of Alphabet, the parent company of Google.\n",
+    "Google was founded by Larry Page and Sergey Brin at Stanford University in 1998.\n",
+    "Google Cloud competes with AWS and Azure in the cloud market.\n",
+    "\"\"\"\n",
+    "\n",
+    "triples2 = extract_triples(DOC2)\n",
+    "all_triples = triples + triples2\n",
+    "\n",
+    "print(f'Doc 1: {len(triples)} triples')\n",
+    "print(f'Doc 2: {len(triples2)} triples')\n",
+    "print(f'Total: {len(all_triples)} triples')\n",
+    "\n",
+    "# Find shared entities (entities mentioned in both docs)\n",
+    "entities1 = {t.subject for t in triples} | {t.obj for t in triples}\n",
+    "entities2 = {t.subject for t in triples2} | {t.obj for t in triples2}\n",
+    "shared = entities1 & entities2\n",
+    "print(f'\\nShared entities (bridge nodes): {shared}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Exercise\n",
+    "Extend the extraction to also extract **temporal facts** — triples that include a year or date.  \n",
+    "Add a `year: Optional[int]` field to the Triple model and populate it when the text mentions a specific year.  \n",
+    "Test on text with multiple dated events."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Your code here\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
+  "language_info": {"name": "python", "version": "3.12.0"}
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
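
Review note on `1_lab1.ipynb`: section 2 counts predicates exactly as they come back from `extract_triples`, so unless that function already normalises them, spelling variants of one relation are tallied separately. Below is a minimal sketch of normalising predicates before counting. It assumes nothing about the real `knowledge_graph` module: the `Triple` dataclass is a hypothetical stand-in that only mirrors the attribute names the notebook uses (`subject`, `predicate`, `obj`, `confidence`), `normalize_predicate` is an illustrative helper, and the sample triples and confidence values are hand-written rather than LLM output.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Hypothetical stand-in for the course's Triple model (same attribute names)."""
    subject: str
    predicate: str
    obj: str
    confidence: float = 1.0


def normalize_predicate(predicate: str) -> str:
    """Lower-case a predicate and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", predicate.lower()).strip("_")


# Hand-written sample triples (not LLM output) with inconsistent predicate spelling.
triples = [
    Triple("Amazon Web Services", "Headquartered In", "Seattle", 0.95),
    Triple("SpaceX", "headquartered-in", "Hawthorne", 0.90),
    Triple("Jeff Bezos", " founded ", "Amazon", 0.99),
]

# Counting raw predicates treats every spelling as a different relation.
raw_counts = Counter(t.predicate for t in triples)
# Counting normalised predicates merges the spelling variants.
clean_counts = Counter(normalize_predicate(t.predicate) for t in triples)

print(f"Distinct raw predicates: {len(raw_counts)}")
for pred, count in clean_counts.most_common():
    print(f"  {pred}: {count}")
```

Normalising at counting time leaves the stored triples untouched. Whether to normalise earlier, for example inside the extraction prompt, is a design choice this sketch does not settle.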
diff --git a/10_knowledge_graphs/2_lab2.ipynb b/10_knowledge_graphs/2_lab2.ipynb
@@ -0,0 +1,201 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Lab 2 — Graph Construction & Query\n",
+    "\n",
+    "**Goal:** Store extracted triples in a NetworkX graph and query it — neighbours, shortest paths, predicate search.\n",
+    "\n",
+    "**Requires:** `pip install networkx matplotlib`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sys, os\n",
+    "sys.path.insert(0, os.getcwd())  # put the notebook's working directory on the import path\n",
+    "from dotenv import load_dotenv\n",
+    "load_dotenv(override=True)\n",
+    "from knowledge_graph import KnowledgeGraph, Triple\n",
+    "import networkx as nx\n",
+    "print('Ready ✓')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Build a graph from text"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "TEXT = \"\"\"\n",
+    "Elon Musk founded SpaceX in 2002 and Tesla in 2003. SpaceX is headquartered in Hawthorne, California.\n",
+    "Tesla is headquartered in Austin, Texas. In 2022, Musk acquired Twitter and renamed it X.\n",
+    "Gwynne Shotwell is the President of SpaceX. Sam Altman is the CEO of OpenAI.\n",
+    "OpenAI is based in San Francisco. Microsoft invested in OpenAI in 2019.\n",
+    "California is a state in the United States. Texas is also a US state.\n",
+    "\"\"\"\n",
+    "\n",
+    "kg = KnowledgeGraph()\n",
+    "triples = kg.add_text(TEXT)\n",
+    "\n",
+    "print(f'Graph stats: {kg.stats()}')\n",
+    "print(f'\\nAll {len(triples)} triples:')\n",
+    "for t in triples:\n",
+    "    print(f'  ({t.subject}) --[{t.predicate}]--> ({t.obj})')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Query neighbours"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print('What does Elon Musk connect to?')\n",
+    "for n in kg.neighbors('Elon Musk'):\n",
+    "    print(f'  {n[\"direction\"]} [{n[\"predicate\"]}] {n[\"entity\"]}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Find by predicate"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Find all headquarters relationships\n",
+    "hq_pairs = kg.find_by_predicate('headquartered_in')\n",
+    "if not hq_pairs:\n",
+    "    # Try common variations\n",
+    "    for edge_pred in {d['predicate'] for _, _, d in kg.graph.edges(data=True)}:\n",
+    "        if any(word in edge_pred.lower() for word in ('headquarter', 'located', 'based')):\n",
+    "            hq_pairs += kg.find_by_predicate(edge_pred)\n",
+    "\n",
+    "print('Companies and their locations:')\n",
+    "for org, loc in hq_pairs:\n",
+    "    print(f'  {org} → {loc}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Shortest path between entities"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# What is the connection between Gwynne Shotwell and California?\n",
+    "path = kg.shortest_path('Gwynne Shotwell', 'California')\n",
+    "if path:\n",
+    "    print('Path: ' + ' → '.join(path))\n",
+    "else:\n",
+    "    print('No path found. First 10 graph nodes:', list(kg.graph.nodes())[:10])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Visualise the graph"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "try:\n",
+    "    import matplotlib.pyplot as plt\n",
+    "    plt.figure(figsize=(14, 8))\n",
+    "    pos = nx.spring_layout(kg.graph, k=2, seed=42)\n",
+    "    nx.draw_networkx_nodes(kg.graph, pos, node_size=1500, node_color='lightblue', alpha=0.8)\n",
+    "    nx.draw_networkx_labels(kg.graph, pos, font_size=8)\n",
+    "    edge_labels = {(s, o): d['predicate'] for s, o, d in kg.graph.edges(data=True)}\n",
+    "    nx.draw_networkx_edges(kg.graph, pos, alpha=0.5, arrows=True, arrowsize=20)\n",
+    "    nx.draw_networkx_edge_labels(kg.graph, pos, edge_labels=edge_labels, font_size=6)\n",
+    "    plt.title('Knowledge Graph')\n",
+    "    plt.axis('off')\n",
+    "    plt.tight_layout()\n",
+    "    plt.show()\nexcept ImportError:\n",
+    "    print('matplotlib not installed — skipping visualisation')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. Save and reload"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import tempfile, os\n",
+    "save_path = os.path.join(tempfile.gettempdir(), 'demo_kg.json')  # separate name so the shortest-path result in `path` is kept\n",
+    "kg.save(save_path)\n",
+    "\n",
+    "kg2 = KnowledgeGraph.load(save_path)\n",
+    "print(f'Saved and reloaded: {kg2.stats()}')\n",
+    "print('Graphs match:', kg.stats() == kg2.stats())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Exercise\n",
+    "Build a function `entity_importance(kg)` that ranks entities by degree centrality: the number of edges touching a node (`nx.degree_centrality` normalises this by the number of other nodes).  \n",
+    "A highly connected entity is often, though not always, among the most important in a knowledge graph.  \n",
+    "Print the top 5 entities and explain why they rank highest."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Your code here\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
+  "language_info": {"name": "python", "version": "3.12.0"}
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
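
Review note on `2_lab2.ipynb`: the `KnowledgeGraph` class is not part of this diff, so here is a rough sketch of how the queries in sections 2 to 4 could be written directly against NetworkX. It assumes a `DiGraph` whose edges carry a `predicate` attribute (the notebook reads `d['predicate']` from `kg.graph.edges(data=True)`, which suggests something similar). The helper functions are illustrative and may differ from the real `KnowledgeGraph` methods, and the triples are hand-written, not extracted by an LLM.

```python
import networkx as nx

# Hand-written (subject, predicate, object) triples for illustration only.
triples = [
    ("Elon Musk", "founded", "SpaceX"),
    ("Elon Musk", "founded", "Tesla"),
    ("SpaceX", "headquartered_in", "Hawthorne"),
    ("Hawthorne", "located_in", "California"),
    ("Gwynne Shotwell", "president_of", "SpaceX"),
    ("Tesla", "headquartered_in", "Austin"),
]

# One directed edge per triple, with the predicate stored as an edge attribute.
graph = nx.DiGraph()
for subject, predicate, obj in triples:
    graph.add_edge(subject, obj, predicate=predicate)


def neighbors(g, entity):
    """Return outgoing and incoming relations of an entity as dicts."""
    outgoing = [{"direction": "->", "predicate": d["predicate"], "entity": o}
                for _, o, d in g.out_edges(entity, data=True)]
    incoming = [{"direction": "<-", "predicate": d["predicate"], "entity": s}
                for s, _, d in g.in_edges(entity, data=True)]
    return outgoing + incoming


def find_by_predicate(g, predicate):
    """Return (subject, object) pairs joined by exactly this predicate."""
    return [(s, o) for s, o, d in g.edges(data=True) if d["predicate"] == predicate]


def shortest_path(g, source, target):
    """Shortest chain of entities, ignoring edge direction; None if unreachable."""
    try:
        return nx.shortest_path(g.to_undirected(), source, target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None


for n in neighbors(graph, "SpaceX"):
    print(f'{n["direction"]} [{n["predicate"]}] {n["entity"]}')
print(find_by_predicate(graph, "headquartered_in"))
print(shortest_path(graph, "Gwynne Shotwell", "California"))
```

Searching the undirected view is a deliberate choice in this sketch: `shortest_path(graph, "Tesla", "SpaceX")` connects the two companies through their shared founder, which a strictly directed search would miss because both `founded` edges point away from `Elon Musk`.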