This service listens to an IMAP inbox, filters incoming mailing-list emails, extracts event details using an LLM pipeline, and stores extracted events in a SQL database.
The core runtime file is listen.py, and extraction/cleaning logic is in extract.py.
- Connects to an IMAP server over STARTTLS (port 143 + TLS upgrade).
- Tracks mailbox progress using IMAP UID values.
- Filters emails whose To contains at least one configured target address.
- Extracts message body text (prefers text/plain, then text/html, ignores attachments).
- Cleans text with clean_text.
- Calls extract_event to parse event metadata.
- Stores extracted event fields in SQL.
- Persists the last processed UID in SQL so restarts continue from the last checkpoint.
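
The body-selection rule above (text/plain first, then text/html, attachments skipped) can be sketched with the standard library email package. This is a minimal illustration, not the code in listen.py; the function name extract_body and its return shape are assumptions.

```python
from email.message import Message
from typing import Optional, Tuple


def extract_body(msg: Message) -> Optional[Tuple[str, str]]:
    """Return (text, content_type), preferring text/plain over text/html.

    Hypothetical helper: the real listener may structure this differently.
    """
    found = {}
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body text themselves
        if part.get_content_disposition() == "attachment":
            continue  # attachments are ignored
        ctype = part.get_content_type()
        if ctype in ("text/plain", "text/html") and ctype not in found:
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                found[ctype] = payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset label: fall back to UTF-8
                found[ctype] = payload.decode("utf-8", errors="replace")
    for ctype in ("text/plain", "text/html"):
        if ctype in found:
            return found[ctype], ctype
    return None
```

Collecting both candidates first and choosing afterwards keeps the preference order independent of the order in which the parts appear in the message.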
- Start process.
- Load environment variables from .env.dev.
- Open IMAP connection and select mailbox.
- Read last_processed_uid from DB state table.
- If no state exists (0):
  - Read current mailbox max UID.
  - Save that UID as initial state.
  - Do not process historical messages.
- Wait for new mail via IMAP IDLE.
- For each UID newer than state:
  - Fetch message.
  - Log sender/recipient/subject.
  - Filter by To addresses.
  - Extract and clean body.
  - Send extraction request to model.
  - Store extracted event if present.
  - Update DB checkpoint (last_processed_uid).
- On failures, reconnect after delay.
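
The last step of the flow, reconnecting after a delay, can be sketched as a small retry wrapper. The names run_with_reconnect, session, and max_attempts are hypothetical; max_attempts and the injectable sleep exist only so the loop can be exercised without waiting, and the real listener is assumed to retry indefinitely.

```python
import logging
import time

logger = logging.getLogger("listener")


def run_with_reconnect(session, delay_seconds=10.0, sleep=time.sleep, max_attempts=None):
    """Call session() until it returns normally; wait delay_seconds after each failure.

    session is assumed to open the IMAP connection and run the processing loop.
    Returns the number of attempts made.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            session()
            return attempts  # session ended cleanly
        except Exception:
            # KeyboardInterrupt is not an Exception subclass, so Ctrl+C still stops the process
            logger.exception("Listener session failed; reconnecting in %s s", delay_seconds)
            sleep(delay_seconds)
    return attempts
```

In the service, delay_seconds would come from RECONNECT_DELAY_SECONDS.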
Responsibilities of listen.py:
- IMAP connection and mailbox selection.
- IDLE waiting (built-in path if available, manual fallback otherwise).
- New UID discovery and processing loop.
- Event persistence and checkpoint persistence.
- Runtime logging.
Responsibilities of extract.py:
- Convert HTML-ish text into normalized textual content via html2text and regex cleanup.
- Send prompt + email body to model (qwen3.5:9b).
- Parse model output as strict JSON or None.
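
The "strict JSON or None" rule can be illustrated as follows. This is a sketch under assumptions: the function name parse_event is hypothetical, and the JSON keys are assumed to match the event columns (name, venue, date, time), which the source does not state explicitly.

```python
import json
from typing import Optional

# Assumed to mirror the stored event columns
EVENT_FIELDS = ("name", "venue", "date", "time")


def parse_event(raw) -> Optional[dict]:
    """Return a dict of event fields, or None if raw is not a strict JSON object."""
    if not isinstance(raw, str):
        return None
    try:
        data = json.loads(raw.strip())
    except json.JSONDecodeError:
        return None  # any surrounding prose or malformed JSON is rejected
    if not isinstance(data, dict):
        return None  # covers JSON null, lists, numbers, strings
    return {field: data.get(field) for field in EVENT_FIELDS}
```

Being strict here means a reply such as "Here is the JSON: {...}" yields None instead of a partially parsed event.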
Model path:
- Local path by default (ollama.generate).
- Cloud path if started with --cloud (Client(host='https://ollama.com', Authorization Bearer key)).
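
A rough sketch of how the two model paths might be selected is shown below. The helper names parse_args and generate are assumptions, as is passing the key through the headers argument of Client; the ollama package is imported lazily so the argument parsing can be tried without it installed.

```python
import argparse
import os

MODEL_NAME = "qwen3.5:9b"  # model name given in this document


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Mailing-list event listener")
    parser.add_argument("--cloud", action="store_true",
                        help="use the hosted Ollama endpoint instead of a local server")
    return parser.parse_args(argv)


def generate(prompt: str, use_cloud: bool) -> str:
    """Send prompt to the model over the local or cloud path and return the text."""
    import ollama  # lazy import: only needed when a request is actually made

    if use_cloud:
        api_key = os.environ["OLLAMA_API_KEY"]  # required only in cloud mode
        client = ollama.Client(
            host="https://ollama.com",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        response = client.generate(model=MODEL_NAME, prompt=prompt)
    else:
        response = ollama.generate(model=MODEL_NAME, prompt=prompt)
    return response["response"]
```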
Tables are auto-created by SQLAlchemy on startup.
Event table columns:
- id (int, primary key, autoincrement)
- name (string, nullable)
- venue (string, nullable)
- date (string, nullable)
- time (string, nullable)
- created_at (datetime, UTC default)
State table columns:
- id (int, primary key, autoincrement)
- last_processed_uid (int)
- updated_at (datetime, UTC default)
Notes:
- Current code stores one logical state row and updates it.
- last_processed_uid is updated for every processed-or-skipped UID to avoid reprocessing on restart.
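
The single-row checkpoint pattern described in these notes can be sketched as below. To stay self-contained, the example uses the standard library sqlite3 module rather than SQLAlchemy, and the table name listener_state and the function names are hypothetical.

```python
import sqlite3


def init_state(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listener_state ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "last_processed_uid INTEGER NOT NULL, "
        "updated_at TEXT DEFAULT CURRENT_TIMESTAMP)"  # CURRENT_TIMESTAMP is UTC in SQLite
    )


def get_last_uid(conn: sqlite3.Connection) -> int:
    """Return the stored checkpoint, or 0 when no state row exists yet."""
    row = conn.execute(
        "SELECT last_processed_uid FROM listener_state ORDER BY id LIMIT 1"
    ).fetchone()
    return row[0] if row else 0


def set_last_uid(conn: sqlite3.Connection, uid: int) -> None:
    """Update the one logical state row, inserting it on first use."""
    cur = conn.execute(
        "UPDATE listener_state SET last_processed_uid = ?, updated_at = CURRENT_TIMESTAMP",
        (uid,),
    )
    if cur.rowcount == 0:
        conn.execute("INSERT INTO listener_state (last_processed_uid) VALUES (?)", (uid,))
    conn.commit()
```

Calling set_last_uid after every UID, whether the message was processed or skipped, is what lets a restart resume without revisiting filtered-out mail.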
Environment variables are loaded from .env.dev because listen.py calls:
load_dotenv('.env.dev')
Required:
- IMAP_EMAIL: IMAP login username.
- IMAP_PASSWORD: IMAP login password.
Optional:
- IMAP_HOST (default: newmailhost.cc.iitk.ac.in)
- IMAP_PORT (default: 143)
- IMAP_MAILBOX (default: INBOX)
- DATABASE_URL (default: sqlite:///events.db)
- RECONNECT_DELAY_SECONDS (default: 10)
- IDLE_TIMEOUT_SECONDS (default: 300)
- TARGET_TO_ADDRESSES (default resolved from TARGET_ADDRESSES fallback, then built-in list)
- TARGET_ADDRESSES (used as fallback seed for default target list)
- OLLAMA_API_KEY (required only when using --cloud)
Current code defines:
- DEFAULT_TARGET_TO_ADDRESSES = os.getenv('TARGET_ADDRESSES', 'students@list.iitk.ac.in,all@lists.iitk.ac.in')
- TARGET_TO_ADDRESSES from os.getenv('TARGET_TO_ADDRESSES', DEFAULT_TARGET_TO_ADDRESSES)
So either of these can affect behavior:
- TARGET_TO_ADDRESSES (primary)
- TARGET_ADDRESSES (fallback source)
Prefer setting TARGET_TO_ADDRESSES explicitly to avoid confusion.
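
The precedence between the two variables, and the To filter that uses the result, can be sketched as follows. The helper names resolve_targets and to_matches are hypothetical, and the lowercasing and whitespace trimming are assumptions about how the comparison might be normalized.

```python
from email.utils import getaddresses

# Built-in default list quoted from the definition above
BUILT_IN_TARGETS = "students@list.iitk.ac.in,all@lists.iitk.ac.in"


def resolve_targets(env) -> set:
    """Resolve target addresses: TARGET_TO_ADDRESSES, else TARGET_ADDRESSES, else built-in."""
    default = env.get("TARGET_ADDRESSES", BUILT_IN_TARGETS)
    raw = env.get("TARGET_TO_ADDRESSES", default)
    return {item.strip().lower() for item in raw.split(",") if item.strip()}


def to_matches(to_header, targets: set) -> bool:
    """True if the To header contains at least one configured target address."""
    addresses = {addr.lower() for _, addr in getaddresses([to_header or ""]) if addr}
    return bool(addresses & targets)
```

In the service, env would be os.environ after load_dotenv has run; a plain dict works the same way for experimentation.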
IMAP_HOST=newmailhost.cc.iitk.ac.in
IMAP_PORT=143
IMAP_EMAIL=your_username_or_email
IMAP_PASSWORD=your_password
IMAP_MAILBOX=INBOX
TARGET_TO_ADDRESSES=students@list.iitk.ac.in,all@lists.iitk.ac.in
DATABASE_URL=sqlite:///events.db
RECONNECT_DELAY_SECONDS=10
IDLE_TIMEOUT_SECONDS=300
# Required only for --cloud mode
OLLAMA_API_KEY=your_ollama_api_key
Python 3.11+ is recommended.
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install python-dotenv sqlalchemy ollama html2text pandas
If you use local Ollama, make sure the model exists locally and the Ollama server is available.
Local extraction path:
python3 listen.py
Cloud extraction path:
python3 listen.py --cloud
Typical log sequence for one message:
- Connection startup
- Listener state initialization
- New mail metadata log (UID, From, To, Subject)
- Body extraction start and source (multipart/text/plain, etc.)
- Extraction request sent to model
- Either:
  - No event extracted
  - Event stored
Examples:
- New email received UID ... | From: ... | To: ... | Subject: ...
- Extracting body for UID ...
- Body extracted for UID ... from multipart/text/plain
- Sending extract request for UID ...
- Stored event for UID ...: {...}
wait_for_new_mail has two paths:
- Built-in IDLE path (imap.idle) if available in Python runtime.
- Manual IMAP protocol fallback:
  - Send IDLE
  - Wait on socket with select
  - Send DONE
Both paths wait in a push-like fashion and avoid frequent polling loops.
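
The manual fallback (IDLE, select, DONE) can be sketched against the send, readline, and socket methods that imaplib.IMAP4 exposes. This is an illustration only: the function name manual_idle and the tag value are hypothetical, and imaplib may buffer server lines internally, in which case select on the raw socket can miss data that has already been read into the buffer.

```python
import select


def manual_idle(imap, timeout: float, tag: bytes = b"IDLE1") -> bool:
    """Enter IDLE, wait up to timeout seconds for server activity, then leave IDLE.

    imap is any object with send(bytes), readline() -> bytes and socket(),
    such as an imaplib.IMAP4 connection. Returns True if the socket became
    readable (new activity), False on timeout.
    """
    imap.send(tag + b" IDLE\r\n")
    if not imap.readline().startswith(b"+"):
        raise RuntimeError("server did not accept IDLE")

    # Block until the server pushes something or the timeout expires
    readable, _, _ = select.select([imap.socket()], [], [], timeout)

    imap.send(b"DONE\r\n")
    # Drain untagged responses until the tagged completion for our IDLE arrives
    while True:
        line = imap.readline()
        if not line:
            raise ConnectionError("connection closed while leaving IDLE")
        if line.startswith(tag):
            break
    return bool(readable)
```

The timeout would typically be IDLE_TIMEOUT_SECONDS, after which the caller re-enters IDLE or checks for new UIDs.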
- On first startup with empty state:
  - The listener sets checkpoint to current max UID.
  - Historical mail is not processed.
- On subsequent restarts:
  - Reads stored checkpoint.
  - Processes only UIDs greater than checkpoint.
This prevents duplicate processing across restarts.
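
The checkpoint rule can be expressed as a small pure function. The name plan_processing and its return shape are hypothetical; the value 0 stands for "no state", as in the runtime flow.

```python
from typing import List, Tuple


def plan_processing(last_processed_uid: int, mailbox_uids: List[int]) -> Tuple[int, List[int]]:
    """Return (checkpoint, uids_to_process) for the current mailbox contents."""
    if last_processed_uid == 0:
        # First startup: jump to the current max UID and skip historical mail
        return max(mailbox_uids, default=0), []
    # Later runs: only UIDs strictly greater than the checkpoint, in ascending order
    pending = sorted(uid for uid in mailbox_uids if uid > last_processed_uid)
    return last_processed_uid, pending
```

For example, plan_processing(0, [3, 7, 5]) yields a checkpoint of 7 with nothing to process, while plan_processing(7, [5, 7, 9, 8]) keeps the checkpoint at 7 and schedules UIDs 8 and 9.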
Set DATABASE_URL to a SQLAlchemy-compatible URL.
Examples:
# SQLite
DATABASE_URL=sqlite:///events.db
# PostgreSQL
DATABASE_URL=postgresql+psycopg2://user:password@host:5432/dbname
# MySQL
DATABASE_URL=mysql+pymysql://user:password@host:3306/dbname
Install the matching DB driver package when moving away from SQLite.
- Verify IMAP_EMAIL and IMAP_PASSWORD.
- Some servers accept local-part usernames, others need full email.
- Check network reachability to IMAP host and port.
- Validate TLS handshake is allowed.
- Increase RECONNECT_DELAY_SECONDS if needed.
- Confirm message To contains configured target addresses.
- Inspect logs for body source and extraction call.
- Verify model availability and model name in extract.py.
- Model output may deviate from strict JSON.
- Current code logs a warning and skips that message.
- Ensure OLLAMA_API_KEY is set.
- Confirm outbound access to https://ollama.com.
- Attachments are ignored during body extraction.
- If both plain and HTML body exist, plain text is preferred.
- clean_text may trim forwarded headers and quote prefixes.
- Process runs continuously until interrupted.
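
To make the trimming behaviour concrete, here is a rough regex sketch of removing quote prefixes and forwarded-message marker lines. It is not the clean_text implementation in extract.py; the function name and both patterns are assumptions chosen for illustration.

```python
import re

# Leading "> " quote markers, possibly nested ("> > text")
QUOTE_PREFIX = re.compile(r"^(?:>[ \t]?)+", re.MULTILINE)
# Marker lines such as "---------- Forwarded message ---------"
FORWARD_MARKER = re.compile(r"^-+[ \t]*Forwarded message[ \t]*-+[ \t]*$",
                            re.IGNORECASE | re.MULTILINE)


def strip_quotes_and_forward_markers(text: str) -> str:
    """Drop forwarded-message marker lines and quote prefixes, then tidy blank lines."""
    text = FORWARD_MARKER.sub("", text)
    text = QUOTE_PREFIX.sub("", text)
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip()
```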
- Never commit real credentials in .env.dev.
- Rotate IMAP and API credentials if leaked.
- Consider least-privileged mailbox credentials where possible.