ML-powered manga translator, written in Rust.
Koharu introduces a local-first workflow for manga translation, using ML to automate the process. It combines object detection, OCR, inpainting, and LLMs into a seamless translation experience.
Under the hood, Koharu uses candle and llama.cpp for high-performance inference, with Tauri for the desktop app. All components are written in Rust, ensuring safety and speed.
> [!NOTE]
> Koharu runs its vision models and LLMs locally on your machine to keep your data private and secure.
> [!NOTE]
> Support and discussion are available on the Discord server.
- Automatic detection of text regions, speech bubbles, and cleanup masks
- OCR for manga dialogue, captions, and other page text
- Inpainting to remove source lettering from the page
- Translation with local or remote LLM backends
- Advanced text rendering with vertical CJK and RTL support
- Layered PSD export with editable text
- Local HTTP API and MCP server for automation
For installation and first-run guidance, see Install Koharu and Translate Your First Page.
- Ctrl + Mouse Wheel: Zoom in/out
- Ctrl + Drag: Pan the canvas
- Del: Delete selected text block
Koharu can export the current page either as a flattened rendered image or as a layered Photoshop PSD. PSD export preserves helper layers and writes translated text as editable text layers, which is useful for downstream cleanup and manual refinement.
For export behavior, PSD contents, and file naming, see Export Pages and Manage Projects.
Koharu includes a built-in MCP server for local agent integrations. By default it listens on a random local port, but you can pin it with `--port`.
```sh
# macOS / Linux
koharu --port 9999

# Windows
koharu.exe --port 9999
```

Then point your client at `http://localhost:9999/mcp`.
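As an illustration, many MCP clients accept a JSON server list along these lines. The exact schema varies by client, and the `koharu` entry name is arbitrary; only the `http://localhost:9999/mcp` URL comes from the setup above:

```json
{
  "mcpServers": {
    "koharu": {
      "url": "http://localhost:9999/mcp"
    }
  }
}
```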
For local setup and the available tools, see Run GUI, Headless, and MCP Modes, Configure MCP Clients, and MCP Tools Reference.
Koharu can run without launching the desktop window.
```sh
# macOS / Linux
koharu --port 4000 --headless

# Windows
koharu.exe --port 4000 --headless
```

You can then connect to the web client at `http://localhost:4000`.
For runtime modes, ports, and local endpoints, see Run GUI, Headless, and MCP Modes.
Koharu lets you configure the shared local data path plus HTTP connect timeout, read timeout, and retry count used by downloads and provider requests.
Because these values are loaded at startup, changing them saves the config and restarts the app.
Koharu includes built-in Google Fonts support for translated text rendering, so you can use web fonts without managing font files by hand.
Google Fonts are fetched on demand from a bundled catalog. Koharu caches downloaded files under the app data directory and reuses them for later renders, so you usually only need an internet connection the first time a family is used on that machine.
The catalog includes a small set of comic-friendly recommended families. Once cached, a Google Font behaves like any other local render font.
Koharu includes a dedicated text renderer tuned for manga lettering, using Unicode-aware OpenType shaping, script-aware line breaking, precise glyph metrics, and real glyph bounds instead of generic browser or OS text primitives.
It supports vertical CJK layout, right-to-left scripts, font fallback, vertical punctuation alignment, constrained-box fitting, and manga-oriented stroke and effect compositing so translated text reads naturally inside speech bubbles, captions, and other irregular page layouts.
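Constrained-box fitting can be sketched as a search for the largest font size whose wrapped text still fits the target box. The following is a simplified illustration, not Koharu's renderer code: it assumes a fixed average glyph width and line height, whereas the real renderer uses exact OpenType glyph metrics and real glyph bounds.

```rust
// Does text of `char_count` characters fit in a box at this font size?
// Glyph metrics are simplified to fixed ratios (illustrative assumption).
fn fits(char_count: usize, size: f32, box_w: f32, box_h: f32) -> bool {
    let glyph_w = size * 0.6; // assumed average advance width
    let line_h = size * 1.3;  // assumed line height
    let per_line = (box_w / glyph_w).floor().max(1.0) as usize;
    let lines = (char_count + per_line - 1) / per_line; // ceiling division
    lines as f32 * line_h <= box_h
}

/// Binary-search the largest font size (to within 0.5 pt) that still fits.
fn fit_font_size(char_count: usize, box_w: f32, box_h: f32) -> f32 {
    let (mut lo, mut hi) = (4.0_f32, 72.0_f32);
    while hi - lo > 0.5 {
        let mid = (lo + hi) / 2.0;
        if fits(char_count, mid, box_w, box_h) { lo = mid } else { hi = mid }
    }
    lo
}

fn main() {
    // 40 characters into a 120x160 px speech-bubble box
    let size = fit_font_size(40, 120.0, 160.0);
    println!("chosen size: {size:.1}");
}
```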
Koharu supports CUDA, experimental ZLUDA, Metal, and Vulkan. CPU fallback is always available when the accelerated path is unavailable or not worth the setup cost on your system.
On Windows, Koharu ships with CUDA support so it can use NVIDIA GPUs for the full local pipeline.
Koharu bundles CUDA Toolkit 13.0. The required DLLs are extracted to the application data directory on first run.
> [!NOTE]
> Make sure you have current NVIDIA drivers installed. You can update them through the NVIDIA App.
Koharu supports NVIDIA GPUs with compute capability 7.5 or higher.
For GPU compatibility references, see CUDA GPU Compute Capability.
Koharu supports experimental ZLUDA acceleration on Windows for AMD GPUs. ZLUDA is a CUDA compatibility layer that lets some CUDA workloads run on AMD GPUs.
To use it, install the AMD HIP SDK.
Koharu supports Metal on Apple Silicon Macs. No extra runtime setup is required beyond a normal app install.
Koharu also supports Vulkan on Windows and Linux. This backend is currently used primarily for OCR and local LLM inference.
Detection and inpainting still depend on CUDA, ZLUDA, or Metal, so Vulkan is useful but not a full replacement for the main accelerated path. AMD and Intel GPUs can still benefit from it.
You can always force Koharu to use CPU for inference:
```sh
# macOS / Linux
koharu --cpu

# Windows
koharu.exe --cpu
```

For backend selection, fallback behavior, and model runtime support, see Acceleration and Runtime.
Koharu uses a staged stack of vision and language models instead of trying to solve the entire page with a single network.
Koharu uses multiple pretrained models, each tuned for a specific part of the page pipeline.
These models find text regions, speech bubbles, and page structure.
- comic-text-bubble-detector for joint text block and speech bubble detection
- comic-text-detector for text segmentation masks
- PP-DocLayoutV3 for document layout analysis
- speech-bubble-segmentation for dedicated speech bubble detection
These models recognize source text after detection.
- PaddleOCR-VL-1.5
- Manga OCR
- MIT 48px OCR
These models remove source lettering before translated text is rendered back onto the page.
- aot-inpainting for inpainting
- lama-manga for inpainting
This model helps infer source font and color characteristics for rendering.
- YuzuMarker.FontDetection for font and color detection
The required models are downloaded automatically on first use.
Some models are used directly from their upstream Hugging Face repositories; where Koharu needs a converted bundle, Rust-friendly safetensors conversions are hosted on Hugging Face.
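The staged design can be sketched as a chain of stage functions, each consuming the previous stage's output. The types and stub bodies below are illustrative stand-ins, not Koharu's actual API; the real stages are backed by the detectors, OCR engines, inpainters, and LLMs listed above.

```rust
// Illustrative placeholder types for the page pipeline.
#[derive(Clone)]
struct Page(Vec<u8>); // raw page pixels (placeholder)
struct Region { x: u32, y: u32, w: u32, h: u32 }

fn detect(_page: &Page) -> Vec<Region> {
    // e.g. comic-text-bubble-detector: find text blocks and speech bubbles
    vec![Region { x: 10, y: 10, w: 80, h: 40 }]
}

fn ocr(_page: &Page, regions: &[Region]) -> Vec<String> {
    // e.g. Manga OCR: read the source text inside each detected region
    regions.iter().map(|_| "こんにちは".to_string()).collect()
}

fn inpaint(page: &Page, _regions: &[Region]) -> Page {
    // e.g. lama-manga: erase the source lettering
    page.clone()
}

fn translate(lines: &[String]) -> Vec<String> {
    // an LLM backend translates the OCR output (stubbed here)
    lines.iter().map(|_| "Hello".to_string()).collect()
}

fn render(page: Page, _regions: &[Region], _text: &[String]) -> Page {
    // the text renderer draws translated text back into the bubbles
    page
}

fn main() {
    let page = Page(vec![0; 16]);
    let regions = detect(&page);
    let source = ocr(&page, &regions);
    let clean = inpaint(&page, &regions);
    let target = translate(&source);
    let out = render(clean, &regions, &target);
    println!("{} region(s), {:?} -> {:?}, {} bytes",
        regions.len(), source, target, out.0.len());
}
```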
For a closer look at the pipeline, see Models and Providers and the Technical Deep Dive.
Koharu supports both local and remote LLM backends. Local models run through llama.cpp and are downloaded on demand. Hosted and self-hosted APIs are also supported when you want to use a provider instead of a downloaded model. When possible, Koharu also tries to preselect sensible defaults based on your system locale.
These are broad instruct models that work well when you want one local model for many translation tasks.
- Gemma 4 instruct: gemma4-e2b-it, gemma4-e4b-it, gemma4-26b-a4b-it, gemma4-31b-it
- Qwen 3.5: qwen3.5-0.8b, qwen3.5-2b, qwen3.5-4b, qwen3.5-9b, qwen3.5-27b, qwen3.5-35b-a3b
These variants relax the safety tuning applied to the corresponding base instruct models.
- Gemma 4 uncensored: gemma4-e2b-uncensored, gemma4-e4b-uncensored
- Qwen 3.5 uncensored: qwen3.5-2b-uncensored, qwen3.5-4b-uncensored, qwen3.5-9b-uncensored, qwen3.5-27b-uncensored, qwen3.5-35b-a3b-uncensored
These models are more specialized for translation quality, language coverage, or lower-resource setups.
- vntl-llama3-8b-v2: around 8.5 GB in Q8_0, best when translation quality matters more than speed or memory use
- lfm2.5-1.2b-instruct: a smaller multilingual instruct model that is easier to run on CPUs or low-memory GPUs
- sugoi-14b-ultra and sugoi-32b-ultra: larger translation-oriented options when you have more VRAM or RAM available
- sakura-galtransl-7b-v3.7: around 6.3 GB, a good balance of quality and speed on 8 GB GPUs
- sakura-1.5b-qwen2.5-v1.0: lighter and faster, useful on mid-range GPUs or CPU-only setups
- hunyuan-mt-7b: around 6.3 GB, with broad multilingual translation coverage
LLMs are downloaded on demand when you activate a model. For constrained memory environments, start with a smaller model. When VRAM or RAM permits, 7B and 8B class models generally provide better translation quality.
Koharu supports hosted APIs from OpenAI, Gemini, Claude, and DeepSeek instead of a local GGUF model.
Built-in cloud defaults: OpenAI gpt-5-mini, Gemini gemini-3.1-flash-lite-preview, Claude claude-haiku-4-5, and DeepSeek deepseek-chat.
Koharu supports OpenAI-compatible endpoints such as LM Studio, OpenRouter, and other self-hosted or third-party APIs that expose `/v1/models` and `/v1/chat/completions`.
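For concreteness, an OpenAI-compatible server must answer requests shaped like the following. This assumes a server is already running; the `http://localhost:1234` base URL is LM Studio's usual default, and `my-local-model` is a placeholder for whatever model name the endpoint reports:

```sh
# List available models
curl http://localhost:1234/v1/models

# Minimal chat completion request
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-local-model", "messages": [{"role": "user", "content": "Translate: こんにちは"}]}'
```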
Built-in OpenAI-compatible behavior: models are discovered from the configured endpoint.
Cloud providers are configured with API keys; OpenAI-compatible providers also need a custom base URL. API keys are stored securely in your system keychain rather than in plain-text config files. They are optional for local servers such as LM Studio, but usually required for hosted services such as OpenRouter.
Use a remote provider to avoid local model downloads, reduce VRAM or RAM requirements, or integrate with an existing hosted or self-hosted endpoint. Keep in mind that the OCR text selected for translation is sent to the provider you configured.
For LM Studio, OpenRouter, and other OpenAI-style endpoints, see Use OpenAI-Compatible APIs. For provider configuration, see Settings Reference.
You can download the latest release of Koharu from the releases page.
We provide prebuilt binaries for Windows, macOS, and Linux. For the standard install flow, see Install Koharu. If something goes wrong, see Troubleshooting.
To build Koharu from source, follow the steps below.
- Rust 1.92 or later
- Bun 1.0 or later
- LLVM 15 or later (for GPU acceleration builds)
- CUDA Toolkit 13.0 (for CUDA and ZLUDA support on Windows)
- AMD HIP SDK (for ZLUDA support on Windows)
```sh
bun install
bun dev
bun run build
```

The built binaries are written to `target/release`.
For platform-specific build notes, see Build From Source. For the local development workflow, see Contributing.
If Koharu is useful in your workflow, consider sponsoring the project.
Thanks to all the contributors who have helped make Koharu better!
Koharu is licensed under the GNU General Public License v3.0.
