(The project is still in beta)
Words Finder is a desktop application for Japanese learners that connects to your Anki decks, scans subtitle files (.srt / .ass), and produces a filtered list of words that appear in the subtitles but that you don't know yet — along with the sentences they appear in. It is designed to be used alongside Anki and the AnkiConnect add-on.
- Features
- Prerequisites
- Installation (Windows)
- Installation (macOS)
- Installation (Linux)
- First Launch — Initial Setup
- Main Screen
- Subtitle Results (Subs Page)
- Kanji Stats
- Words Stats
- Settings
- Keyboard Shortcuts
- Parsing Modes: Orth-base vs Lemma
- Background Images
- Data Storage
- Advanced: CLI Usage
- Tech Stack
- Anki sync — reads all cards from your configured decks and extracts the words/kanji you already know.
- Subtitle scanning — scans all
.srtand.assfiles in a folder (including subfolders) and finds every word you don't know yet. - Sentence context — for each unknown word, the app stores every sentence it appears in along with the subtitle file title and timestamp.
- Kanji tracker — shows which kanji from your subtitles you already know and which you don't, sortable by school grade or RTK order.
- Word list — a paginated view of all words pulled from your Anki decks, sortable by frequency.
- New subtitles export — save selected sentences to an output
.srtfile to feed into tools like subs2srs for card creation. - LLM prompt generator — copies a ready-made prompt (with 200 unknown words) to clipboard for pasting into an AI assistant to get readings and meanings quickly.
- Two parsing modes — orthographic-base and lemma, so you can control how word variants are matched.
- English / Japanese UI.
- Dark and light themes, with optional custom background images per subtitle collection.
Before installing Words Finder you need:
- Anki — the flashcard application. Must be running whenever you sync.
- AnkiConnect — an Anki add-on (code
2055492159) that exposes a local API the app uses to read your cards.- Install it from inside Anki: Tools → Add-ons → Get Add-ons… and enter the code above.
- The default AnkiConnect URL is
http://localhost:8765. Leave this as-is unless you have changed it manually.
Requires Windows 10 or later (64-bit)
- Download
WordsFinderSetup.exefrom the Releases page. - Run the installer and follow the prompts. The installation is roughly 1 GB — the majority of the space is the UniDic dictionary used for Japanese tokenisation.
- Launch Words Finder from the Start Menu or the desktop shortcut.
All user data (configuration, word databases, scan results) is stored in %AppData%\Roaming\WordsFinder and is never affected by updating or reinstalling the app.
Requires macOS 13 (Ventura) or later — supports both Apple Silicon (M1/M2/M3/M4) and Intel Macs
- Download
WordsFinder-macOS.zipfrom the Releases. - Unzip the file — you will get
WordsFinder.app. - Drag
WordsFinder.appinto your Applications folder. - Double-click it to launch.
First launch — security warning
Because the app is not signed with an Apple Developer certificate, macOS will block it the very first time with a message like "WordsFinder cannot be opened because the developer cannot be verified."
To get past this one time only:
- Right-click (or Control-click)
WordsFinder.app- Select Open from the context menu
- Click Open in the dialog that appears
After that, you can open the app normally by double-clicking it.
All user data (configuration, word databases, scan results) is stored in ~/Library/Application Support/WordsFinder and is never affected by updating or reinstalling the app.
Requires a GTK3 desktop environment with WebKit2GTK — Ubuntu 22.04+ / Debian 11+ or equivalent
- Download
WordsFinder.AppImagefrom the Releases page. - Make it executable (one time only):
chmod +x WordsFinder.AppImage- Double-click it in your file manager, or run it from the terminal:
./WordsFinder.AppImageFirst launch registers the app icon in your system so it appears correctly in file managers and the app launcher going forward.
Required system packages (Ubuntu/Debian — install once):
sudo apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0 gir1.2-webkit2-4.1On Ubuntu 22.04 replace
gir1.2-webkit2-4.1withgir1.2-webkit2-4.0.
- Download
WordsFinderSetup.tar.gzfrom the Releases page. - Extract and launch:
tar -xzf WordsFinderSetup.tar.gz
cd WordsFinder
./WordsFinder.shWordsFinder.sh checks for the required GTK packages and installs any that are missing before launching.
# 1. Create a virtual environment using the SYSTEM Python (required for GTK)
/usr/bin/python3 -m venv --system-site-packages venv
source venv/bin/activate
# 2. Install Python dependencies
pip install -r requirements.txt
# 3. Download the UniDic dictionary (~600 MB, one time)
python -m unidic download
# 4. Run
python main.pyOn first launch you will be asked to choose between English and 日本語 for the UI language. This can be changed later in Settings.
You must tell Words Finder which Anki deck(s) to read from and which field within each note type contains the Japanese word.
| Field | What to enter |
|---|---|
| Deck | The exact name of your Anki deck (e.g. Japanese) |
| Field | The note field that holds the word (e.g. Expression) |
Tips:
- If one deck contains multiple note types with different field names (e.g.
Expressionon some cards andSentenceon others), add a separate row for each combination. - It is strongly recommended to maintain two decks: a large archive deck (cards you no longer add to) and a smaller active deck where new cards go. After the first sync, configure Words Finder to watch only the active deck — the app only fetches cards newer than the last sync timestamp, so pointing it at a huge static deck wastes time on every subsequent sync.
- Scanning 9,000 cards takes approximately 3 minutes on a typical machine.
You can add up to 9 deck/field pairs. Click OK to save and continue to the main screen.
The main screen has three functional areas:
| Button | Description |
|---|---|
| Sync Anki | Connects to AnkiConnect and fetches all cards from the configured decks. Extracts words and kanji from the configured fields. Only cards newer than the last sync are fetched on subsequent runs. Anki must be open. |
| Scan Subs | Tokenises all subtitle files in the configured input folder and finds every word that appears in the subtitles but is not in your Anki word list. |
| Path field | The folder path to your subtitle files. All .srt and .ass files in the folder and its subfolders are processed. The name of the top-level folder is used as the title of the resulting collection card. |
| Browse | Opens a folder picker dialog to set the subtitle input path. |
| Settings (⚙) | Opens the Settings page. |
Each scanned subtitle folder appears as a card showing the collection name and your current comprehension percentage (words you know ÷ total words). Click a card to open the full word and sentence view. Right-click a card to delete it.
Two navigation panels at the bottom of the screen:
- 漢 Kanji — opens the Kanji Stats page.
- 語 Words — opens the Words Stats page.
Clicking a collection card opens the detailed results page for that subtitle folder.
| Button | Description |
|---|---|
| Rescan | Re-runs the subtitle scan for this specific collection (useful after syncing new Anki cards). |
| Prompt | Copies a prompt to your clipboard containing up to 200 of the unknown words with an instruction for an LLM (e.g. ChatGPT or Claude) to return readings and meanings. |
| Search / Ctrl+F | Filters the word list by the text you type. |
Each word entry shows:
- The word itself.
- A count of how many times it appears across all subtitle files in the collection.
- An Already Know button — marks the word as known, removes it from this collection, and adds it to your known-words database in the currently active parsing mode.
- Right-click → Delete — removes the word from this collection without marking it as known.
Clicking a word in the left panel populates the sentence panel with every sentence that word appears in. Each row shows:
| Column | Description |
|---|---|
| Title | The subtitle filename / show title |
| Timestamp | The start time of the subtitle line |
| Sentence | The full subtitle line containing the word |
| Add to output | Writes this subtitle line to the configured output .srt file (requires an output folder to be set in Settings) |
| Copy | Copies the sentence to clipboard. Right-clicking a sentence row also copies it. |
- Set an output folder in Settings.
- In the sentence panel, click Add to output for any sentence you want to study.
- The sentence is appended to an
.srtfile in the output folder, named after the collection. - Import that
.srtinto subs2srs to generate Anki cards with audio and screenshots automatically.
The Kanji page shows all kanji encountered in your subtitle files, divided into:
- Known — kanji found in your Anki cards.
- Unknown — kanji not yet in Anki.
| Option | Description |
|---|---|
| Grade | Japanese school grade order (Grade 1 → Grade 6 → Secondary → Other) |
| RTK | Remembering the Kanji (Heisig) order |
The sort preference is saved to your config automatically.
Press Ctrl+F (or click the search bar) to filter kanji by character. Matching tiles are shown across all sections with their original numbers preserved. Press Escape to clear the search and return to the normal view.
The Words page shows all words extracted from your Anki cards.
- Words are displayed in a paginated grid (up to 500 per page). Use the navigation buttons at the top and bottom to move between pages.
- Words marked as Already Know via the Subs page appear in a separate "Known" section.
- Use the Descending / Ascending sort buttons to reorder by frequency.
- Press
Ctrl+F(or click the search bar) to filter words by text. Matching tiles are shown across all pages with their original numbers preserved. PressEscapeto clear the search and return to the normal view.
Access Settings via the ⚙ icon on the main screen.
Same interface as the initial setup screen. After your first Anki sync, it is best practice to configure only your active (new-card) deck here so that subsequent syncs stay fast.
⚠ The app only fetches cards newer than the last sync. If you remove a deck from the list and later want to include older cards from it, you will need to Reset Database first and re-sync everything.
| Setting | Description |
|---|---|
| Input folder | Default folder to scan for subtitle files (same as the path field on the main screen). |
| Output folder | Where exported subtitle lines are saved. Required for the Add to output button in the Subs page. If not set, that button is disabled. |
When Ignore in brackets is enabled, any text inside the following bracket types is excluded from word and kanji extraction:
- Half-width and full-width parentheses
()() - Half-width and full-width square brackets
[][] - Curly braces
{}{} - Angle brackets
<><> - Lenticular brackets
【】 - Tortoise-shell brackets
〔〕 - Angle brackets
〈〉《》
Text inside Japanese corner brackets 「」 and white corner brackets 『』 is always included regardless of this setting.
See Parsing Modes below.
Switch between English and Japanese UI at any time.
Toggle between dark and light mode.
Choose a folder of images to use as thumbnails/backgrounds for your subtitle collection cards. The image filename (without extension) must exactly match the collection title (the subtitle folder name). If no matching image is found, the default icon is shown.
Deletes all stored word and kanji data. Subtitle collection results (the scan output files) are not deleted — remove those manually by right-clicking a collection card on the main screen and choosing Delete.
| Shortcut | Action |
|---|---|
Ctrl + |
Zoom in |
Ctrl - |
Zoom out |
Ctrl 0 |
Reset zoom |
Ctrl R |
Reload the current page |
Ctrl F |
Search / filter words on the Subs, Words, and Kanji pages |
Words Finder tokenises Japanese text using fugashi with the UniDic dictionary. Each word can be stored in one of two modes, and you can switch between them at any time in Settings. Both modes are updated simultaneously on every Anki sync and subtitle scan.
| Mode | Behaviour |
|---|---|
| Orth-base | Treats different written forms of a word as separate word, e.g. 嵌る and ハマる are counted as two distinct entries. |
| Lemma | Groups all written forms of the same word together, e.g. 嵌る and ハマる are counted as one entry. |
The Already Know button on the Subs page adds the word only to the currently active parsing mode's database. If you switch modes, you may need to re-mark some words.
To add custom artwork to your collection cards:
- Prepare a folder of image files (
.jpg,.png, etc.). - In Settings → Background folder, select that folder.
- Name each image file to exactly match the subtitle collection title.
- Example: if your subtitle folder was called
Spirited Away, the image should be namedSpirited Away.jpg.
- Example: if your subtitle folder was called
- If no matching image is found, the default icon is displayed instead.
User data is stored at a platform-specific location and is never affected by updating or reinstalling the app:
| Platform | Path |
|---|---|
| Windows | %APPDATA%\WordsFinder\ |
| Linux | ~/.local/share/WordsFinder/ |
| macOS | ~/Library/Application Support/WordsFinder/ |
The folder structure is the same across all platforms:
WordsFinder/
├── config.json # App configuration (decks, paths, settings)
├── ui_config.json # UI preferences (language, theme, background folder)
└── data/
├── anki_words_lemma.jsonl # Words database — lemma mode
├── anki_words_orth_base.jsonl # Words database — orth-base mode
├── kanji/
│ ├── kanji_static.json # Read-only kanji reference data (bundled)
│ └── kanji_dynamic.json # Your known-kanji database (written at runtime)
└── new_words/
└── <CollectionName>/ # Subtitle scan results per collection
To fully reset to a clean state, delete the WordsFinder folder from the path above.
Words Finder also exposes a command-line interface. Run it from a terminal in the installation directory or, if running from source, from the project root.
Windows:
python main.py [--config PATH] <command>
Linux (AppImage):
./WordsFinder.AppImage [--config PATH] <command>
| Command | Description |
|---|---|
sync-anki |
Fetch new cards from Anki |
scan-subs |
Scan subtitle files for new words |
mark-known <word> |
Mark a specific word as known |
remove-word <folder> <word> |
Remove a word from a subtitle collection |
delete-results <folder> |
Delete a subtitle collection result folder |
subtitle-stats <folder> |
Print comprehension statistics for a collection |
kanji-stats [--sort RTK] |
Print kanji statistics (optional RTK sort) |
reset |
Delete all word and kanji databases (prompts for confirmation) |
| Component | Library |
|---|---|
| Japanese tokenisation | fugashi + UniDic |
| Web UI framework | Flask |
| Desktop window | pywebview |
| Anki integration | AnkiConnect |
| Frontend styling | Tailwind CSS |