ML-powered manga translator, written in Rust.
Koharu introduces a local-first workflow for manga translation, using ML to automate the process. It combines object detection, OCR, inpainting, and LLMs into a seamless translation experience.
Under the hood, Koharu uses candle and llama.cpp for high-performance inference, with Tauri for the desktop app. All components are written in Rust, ensuring safety and speed.
> [!NOTE]
> Koharu runs its vision models and LLMs locally on your machine to keep your data private and secure.
> [!NOTE]
> Support and discussion are available on the Discord server.
- Automatic detection of text regions, speech bubbles, and cleanup masks
- OCR for manga dialogue, captions, and other page text
- Inpainting to remove source lettering from the page
- Translation with local or remote LLM backends
- Advanced text rendering with vertical CJK and RTL support
- Layered PSD export with editable text
- Local HTTP API and MCP server for automation
For installation and first-run guidance, see Install Koharu and Translate Your First Page.
- Ctrl + Mouse Wheel: Zoom in/out
- Ctrl + Drag: Pan the canvas
- Del: Delete selected text block
Koharu can export the current page either as a flattened rendered image or as a layered Photoshop PSD. PSD export preserves helper layers and writes translated text as editable text layers, which is useful for downstream cleanup and manual refinement.
For export behavior, PSD contents, and file naming, see Export Pages and Manage Projects.
Koharu includes a built-in MCP server for local agent integrations. By default it listens on a random local port, but you can pin it with `--port`.
```sh
# macOS / Linux
koharu --port 9999

# Windows
koharu.exe --port 9999
```

Then point your client at `http://localhost:9999/mcp`.
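As an illustration, many MCP clients accept a JSON server list along these lines. The exact schema varies by client, and the `koharu` entry name is arbitrary; only the `http://localhost:9999/mcp` URL comes from the setup above:

```json
{
  "mcpServers": {
    "koharu": {
      "url": "http://localhost:9999/mcp"
    }
  }
}
```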
For local setup and the available tools, see Run GUI, Headless, and MCP Modes, Configure MCP Clients, and MCP Tools Reference.
Koharu can run without launching the desktop window.
```sh
# macOS / Linux
koharu --port 4000 --headless

# Windows
koharu.exe --port 4000 --headless
```

You can then connect to the web client at `http://localhost:4000`.
For runtime modes, ports, and local endpoints, see Run GUI, Headless, and MCP Modes.
Koharu lets you configure the shared local data path plus HTTP connect timeout, read timeout, and retry count used by downloads and provider requests.
Because these values are loaded at startup, changing them saves the config and restarts the app.
Koharu includes built-in Google Fonts support for translated text rendering, so you can use web fonts without managing font files by hand.
Google Fonts are fetched on demand from a bundled catalog. Koharu caches downloaded files under the app data directory and reuses them for later renders, so you usually only need an internet connection the first time a family is used on that machine.
The catalog includes a small set of comic-friendly recommended families. Once cached, a Google Font behaves like any other local render font.
Koharu includes a dedicated text renderer tuned for manga lettering, using Unicode-aware OpenType shaping, script-aware line breaking, precise glyph metrics, and real glyph bounds instead of generic browser or OS text primitives.
It supports vertical CJK layout, right-to-left scripts, font fallback, vertical punctuation alignment, constrained-box fitting, and manga-oriented stroke and effect compositing so translated text reads naturally inside speech bubbles, captions, and other irregular page layouts.
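Constrained-box fitting can be sketched as a search for the largest font size whose wrapped text still fits the target box. The following is a simplified illustration, not Koharu's renderer code: it assumes a fixed average glyph width and line height, whereas the real renderer uses exact OpenType glyph metrics and real glyph bounds.

```rust
// Does text of `char_count` characters fit in a box at this font size?
// Glyph metrics are simplified to fixed ratios (illustrative assumption).
fn fits(char_count: usize, size: f32, box_w: f32, box_h: f32) -> bool {
    let glyph_w = size * 0.6; // assumed average advance width
    let line_h = size * 1.3;  // assumed line height
    let per_line = (box_w / glyph_w).floor().max(1.0) as usize;
    let lines = (char_count + per_line - 1) / per_line; // ceiling division
    lines as f32 * line_h <= box_h
}

/// Binary-search the largest font size (to within 0.5 pt) that still fits.
fn fit_font_size(char_count: usize, box_w: f32, box_h: f32) -> f32 {
    let (mut lo, mut hi) = (4.0_f32, 72.0_f32);
    while hi - lo > 0.5 {
        let mid = (lo + hi) / 2.0;
        if fits(char_count, mid, box_w, box_h) { lo = mid } else { hi = mid }
    }
    lo
}

fn main() {
    // 40 characters into a 120x160 px speech-bubble box
    let size = fit_font_size(40, 120.0, 160.0);
    println!("chosen size: {size:.1}");
}
```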
Koharu supports CUDA, experimental ZLUDA, Metal, and Vulkan. CPU fallback is always available when the accelerated path is unavailable or not worth the setup cost on your system.
On Windows, Koharu ships with CUDA support so it can use NVIDIA GPUs for the full local pipeline.
Koharu bundles CUDA Toolkit 13.0. The required DLLs are extracted to the application data directory on first run.
> [!NOTE]
> Make sure you have current NVIDIA drivers installed. You can update them through the NVIDIA App.
Koharu supports NVIDIA GPUs with compute capability 7.5 or higher.
For GPU compatibility references, see CUDA GPU Compute Capability.
Koharu supports experimental ZLUDA acceleration on Windows for AMD GPUs. ZLUDA is a CUDA compatibility layer that lets some CUDA workloads run on AMD GPUs.
To use it, install the AMD HIP SDK.
Koharu supports Metal on Apple Silicon Macs. No extra runtime setup is required beyond a normal app install.
Koharu also supports Vulkan on Windows and Linux. This backend is currently used primarily for OCR and local LLM inference.
Detection and inpainting still depend on CUDA, ZLUDA, or Metal, so Vulkan is useful but not a full replacement for the main accelerated path. AMD and Intel GPUs can still benefit from it.
You can always force Koharu to use CPU for inference:
```sh
# macOS / Linux
koharu --cpu

# Windows
koharu.exe --cpu
```

For backend selection, fallback behavior, and model runtime support, see Acceleration and Runtime.
Koharu uses a staged stack of vision and language models instead of trying to solve the entire page with a single network.
Koharu uses multiple pretrained models, each tuned for a specific part of the page pipeline.
These models find text regions, speech bubbles, and page structure.
- comic-text-bubble-detector for joint text block and speech bubble detection
- comic-text-detector for text segmentation masks
- PP-DocLayoutV3 for document layout analysis
- speech-bubble-segmentation for dedicated speech bubble detection
These models recognize source text after detection.
- PaddleOCR-VL-1.5
- Manga OCR
- MIT 48px OCR
These models remove source lettering before translated text is rendered back onto the page.
- aot-inpainting for inpainting
- lama-manga for inpainting
This model helps infer source font and color characteristics for rendering.
- YuzuMarker.FontDetection for font and color detection
The required models are downloaded automatically on first use.
Some models are used directly from their upstream Hugging Face repositories; where Koharu needs a converted bundle, Rust-friendly safetensors conversions are hosted on Hugging Face.
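The staged design can be sketched as a chain of stage functions, each consuming the previous stage's output. The types and stub bodies below are illustrative stand-ins, not Koharu's actual API; the real stages are backed by the detectors, OCR engines, inpainters, and LLMs listed above.

```rust
// Illustrative placeholder types for the page pipeline.
#[derive(Clone)]
struct Page(Vec<u8>); // raw page pixels (placeholder)
struct Region { x: u32, y: u32, w: u32, h: u32 }

fn detect(_page: &Page) -> Vec<Region> {
    // e.g. comic-text-bubble-detector: find text blocks and speech bubbles
    vec![Region { x: 10, y: 10, w: 80, h: 40 }]
}

fn ocr(_page: &Page, regions: &[Region]) -> Vec<String> {
    // e.g. Manga OCR: read the source text inside each detected region
    regions.iter().map(|_| "こんにちは".to_string()).collect()
}

fn inpaint(page: &Page, _regions: &[Region]) -> Page {
    // e.g. lama-manga: erase the source lettering
    page.clone()
}

fn translate(lines: &[String]) -> Vec<String> {
    // an LLM backend translates the OCR output (stubbed here)
    lines.iter().map(|_| "Hello".to_string()).collect()
}

fn render(page: Page, _regions: &[Region], _text: &[String]) -> Page {
    // the text renderer draws translated text back into the bubbles
    page
}

fn main() {
    let page = Page(vec![0; 16]);
    let regions = detect(&page);
    let source = ocr(&page, &regions);
    let clean = inpaint(&page, &regions);
    let target = translate(&source);
    let out = render(clean, &regions, &target);
    println!("{} region(s), {:?} -> {:?}, {} bytes",
        regions.len(), source, target, out.0.len());
}
```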
For a closer look at the pipeline, see Models and Providers and the Technical Deep Dive.
Koharu supports both local and remote LLM backends. Local models run through llama.cpp and are downloaded on demand. Hosted and self-hosted APIs are also supported when you want to use a provider instead of a downloaded model. When possible, Koharu also tries to preselect sensible defaults based on your system locale.
These are broad instruct models that work well when you want one local model for many translation tasks.
- Gemma 4 instruct: gemma4-e2b-it, gemma4-e4b-it, gemma4-26b-a4b-it, gemma4-31b-it
- Qwen 3.5: qwen3.5-0.8b, qwen3.5-2b, qwen3.5-4b, qwen3.5-9b, qwen3.5-27b, qwen3.5-35b-a3b
These variants relax the safety tuning applied to the corresponding base instruct models.
- Gemma 4 uncensored: gemma4-e2b-uncensored, gemma4-e4b-uncensored
- Qwen 3.5 uncensored: qwen3.5-2b-uncensored, qwen3.5-4b-uncensored, qwen3.5-9b-uncensored, qwen3.5-27b-uncensored, qwen3.5-35b-a3b-uncensored
These models are more specialized for translation quality, language coverage, or lower-resource setups.
- vntl-llama3-8b-v2: around 8.5 GB in Q8_0, best when translation quality matters more than speed or memory use
- lfm2.5-1.2b-instruct: a smaller multilingual instruct model that is easier to run on CPUs or low-memory GPUs
- sugoi-14b-ultra and sugoi-32b-ultra: larger translation-oriented options when you have more VRAM or RAM available
- sakura-galtransl-7b-v3.7: around 6.3 GB, a good balance of quality and speed on 8 GB GPUs
- sakura-1.5b-qwen2.5-v1.0: lighter and faster, useful on mid-range GPUs or CPU-only setups
- hunyuan-mt-7b: around 6.3 GB, with broad multilingual translation coverage
LLMs are downloaded on demand when you activate a model. For constrained memory environments, start with a smaller model. When VRAM or RAM permits, 7B and 8B class models generally provide better translation quality.
Koharu supports hosted APIs from OpenAI, Gemini, Claude, and DeepSeek instead of a local GGUF model.
Built-in cloud defaults: OpenAI gpt-5-mini, Gemini gemini-3.1-flash-lite-preview, Claude claude-haiku-4-5, and DeepSeek deepseek-chat.
Koharu supports OpenAI-compatible endpoints such as LM Studio, OpenRouter, and other self-hosted or third-party APIs that expose `/v1/models` and `/v1/chat/completions`.
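For concreteness, an OpenAI-compatible server must answer requests shaped like the following. This assumes a server is already running; the `http://localhost:1234` base URL is LM Studio's usual default, and `my-local-model` is a placeholder for whatever model name the endpoint reports:

```sh
# List available models
curl http://localhost:1234/v1/models

# Minimal chat completion request
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-local-model", "messages": [{"role": "user", "content": "Translate: こんにちは"}]}'
```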
Built-in OpenAI-compatible behavior: models are discovered from the configured endpoint.
Cloud providers are configured with API keys; OpenAI-compatible providers also need a custom base URL. API keys are stored securely in your system keychain rather than in plain-text config files. They are optional for local servers such as LM Studio, but usually required for hosted services such as OpenRouter.
Use a remote provider to avoid local model downloads, reduce VRAM or RAM requirements, or integrate with an existing hosted or self-hosted endpoint. Keep in mind that the OCR text selected for translation is sent to the provider you configured.
For LM Studio, OpenRouter, and other OpenAI-style endpoints, see Use OpenAI-Compatible APIs. For provider configuration, see Settings Reference.
You can download the latest release of Koharu from the releases page.
We provide prebuilt binaries for Windows, macOS, and Linux. For the standard install flow, see Install Koharu. If something goes wrong, see Troubleshooting.
To build Koharu from source, follow the steps below.
- Rust 1.92 or later
- Bun 1.0 or later
- LLVM 15 or later (for GPU acceleration builds)
- CUDA Toolkit 13.0 (for CUDA and ZLUDA support on Windows)
- AMD HIP SDK (for ZLUDA support on Windows)
```sh
bun install
bun dev
bun run build
```

The built binaries are written to `target/release`.
For platform-specific build notes, see Build From Source. For the local development workflow, see Contributing.
If Koharu is useful in your workflow, consider sponsoring the project.
Thanks to all the contributors who have helped make Koharu better!
Koharu is licensed under the GNU General Public License v3.0.
