OpenCV + ddddocr pipeline for noisy Chinese captcha images.
MyCaptchaOCR focuses on preprocessing and evidence-based reranking before OCR. It generates multiple image variants, reads them with ddddocr, and reranks OCR outputs by consensus, character-position evidence, and preprocessing diversity.
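The following is a minimal sketch of what "consensus" and "character-position evidence" can mean in practice. It scores each distinct OCR string by its whole-string vote plus the average per-position support for its characters. The function names, the equal weighting of the two terms, and the sample reads are assumptions made for illustration. This is not the scoring implemented in scripts/adaptive_ocr_rerank.py.

```python
from collections import Counter

# Hypothetical reranking sketch: whole-string consensus plus per-position evidence.


def position_votes(reads: list[tuple[str, float]], length: int) -> list[Counter]:
    """Sum the weight each character receives at each position."""
    votes = [Counter() for _ in range(length)]
    for text, weight in reads:
        if len(text) != length:
            continue  # wrong-length reads give no positional evidence in this sketch
        for i, ch in enumerate(text):
            votes[i][ch] += weight
    return votes


def rank_by_position_evidence(
    reads: list[tuple[str, float]], length: int
) -> list[tuple[str, float]]:
    """Score = whole-string vote + mean positional support, highest first."""
    votes = position_votes(reads, length)
    whole: Counter = Counter()
    for text, weight in reads:
        whole[text] += weight
    scores = {
        text: whole[text] + sum(votes[i][ch] for i, ch in enumerate(text)) / length
        for text in whole
        if len(text) == length
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up reads from four preprocessing variants (not real pipeline output).
reads = [("甲乙丙丁", 1.0), ("甲乙丙丁", 1.0), ("甲乙丙十", 1.0), ("甲己丙丁", 1.0)]
ranked = rank_by_position_evidence(reads, length=4)
print(ranked[0])  # ('甲乙丙丁', 5.5)
```

The positional term lets a string that appears only once as a whole still earn credit when its individual characters are well supported by other reads.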
PaddleOCR is intentionally not included. On the bundled distorted captcha samples, PaddleOCR detection treated interference lines and glyph fragments as text boxes, while recognition-only mode produced unstable Chinese predictions.

The pipeline:

- Crops the captcha body from source images.
- Upscales before denoising to preserve Chinese glyph strokes.
- Builds conservative, color-priority, line-inpaint, dark-line-suppression, and crop variants.
- Runs ddddocr default, beta, and old recognizers.
- Uses an adaptive profile: images whose early result has low confidence expand from the 75-candidate fast set to a capped fallback set (455 candidates by default). The shipped default runs that balanced set on every image for more stable accuracy.
- Reranks candidates while reducing false consensus from near-duplicate preprocessing variants.
- Writes reproducible CSV, Markdown, and visual sheet outputs.
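The near-duplicate problem mentioned above can be illustrated with a small sketch. If many variants come from one preprocessing family, plain vote counting lets that family outvote independent ones. In the sketch below, each family contributes a total weight of 1.0, split across its reads. Tagging reads with a family name and the split-evenly rule are assumptions for illustration, not the project's documented method.

```python
from collections import Counter, defaultdict


def family_weighted_consensus(reads: list[tuple[str, str]]) -> Counter:
    """Consensus where each preprocessing family casts a total weight of 1.0.

    `reads` holds (ocr_text, family) pairs. A family's weight is split evenly
    across its reads, so ten near-identical variants from one family count no
    more than a single read from an independent family.
    """
    by_family: dict[str, list[str]] = defaultdict(list)
    for text, family in reads:
        by_family[family].append(text)
    scores: Counter = Counter()
    for texts in by_family.values():
        share = 1.0 / len(texts)
        for text in texts:
            scores[text] += share
    return scores


# Made-up reads: four near-duplicate line-inpaint variants agree on one string,
# while three independent families agree on another.
reads = [("甲乙丙十", "line-inpaint")] * 4 + [
    ("甲乙丙丁", "conservative"),
    ("甲乙丙丁", "color-priority"),
    ("甲乙丙丁", "dark-line-suppression"),
]
raw = Counter(text for text, _ in reads)
weighted = family_weighted_consensus(reads)
print(raw.most_common(1))       # [('甲乙丙十', 4)]
print(weighted.most_common(1))  # [('甲乙丙丁', 3.0)]
```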

Requirements:

- Python 3.11 through 3.14.
- Current pinned runtime dependencies:
  - ddddocr==1.6.1
  - onnxruntime==1.26.0
  - opencv-python==4.13.0.92
  - pillow==12.2.0
  - numpy==2.4.6
  - PySide6==6.11.1

opencv-contrib-python and pandas are not required.
Use a local virtual environment. Do not install dependencies globally.
python3.14 -m venv .venv
.venv/bin/python -m pip install --upgrade pip setuptools wheel
.venv/bin/python -m pip install -r requirements.txt

Check the runtime:

.venv/bin/python scripts/check_ocr_env.py

On Windows PowerShell, use a Windows virtual environment:
py -3.12 -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
.\.venv\Scripts\python.exe scripts\check_ocr_env.py

The check prints Python, ddddocr, ONNX Runtime, OpenCV, NumPy, Pillow, PySide6, and the project-local OCR cache paths.
Run the default Chinese sample set:
.venv/bin/python scripts/adaptive_ocr_pipeline.py
.venv/bin/python scripts/adaptive_ocr_rerank.py --top-n 12

Run the mini desktop UI:

.venv/bin/python scripts/ocr_desktop_app.py

On Windows, double-click 启动.bat (the "start" launcher) in the project root, or run:

.\.venv\Scripts\python.exe scripts\ocr_desktop_app.py

Click 选取区域 (Select Region) and drag over the target region once. The region is
remembered, so each later 识别 (Recognize) click re-captures the same region,
picking up a refreshed captcha in the same spot, without re-selecting. Click
选取区域 again only to change the region. The UI keeps the OCR engines loaded
after the first recognition, so later runs skip the model reload time. On macOS,
grant Screen Recording permission if the selection capture is blank or blocked.
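The remember-the-region and keep-engines-warm behaviour can be sketched as a small session object. The class, its fields, and the `capture` and `engine_factory` callables are hypothetical stand-ins. In the real app, the factory would build the ddddocr recognizers and the capture function would grab the screen. Neither of those is shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Region = tuple[int, int, int, int]  # (left, top, width, height) in screen pixels


@dataclass
class RecognizerSession:
    """Hypothetical UI-side state: remembers the selected region and keeps the
    OCR engine alive after first use, so later clicks skip model loading."""

    engine_factory: Callable[[], Any]
    region: Optional[Region] = None
    _engine: Any = field(default=None, repr=False)
    loads: int = 0  # how many times the engine was built

    def select_region(self, region: Region) -> None:
        self.region = region  # only an explicit re-selection changes the region

    def engine(self) -> Any:
        if self._engine is None:  # the first recognition pays the load cost once
            self._engine = self.engine_factory()
            self.loads += 1
        return self._engine

    def recognize(self, capture: Callable[[Region], bytes]) -> str:
        if self.region is None:
            raise RuntimeError("select a region first")
        image = capture(self.region)  # re-capture the same spot on every click
        return self.engine()(image)


# Stub engine that "reads" ASCII bytes, standing in for a real OCR model.
session = RecognizerSession(engine_factory=lambda: (lambda img: img.decode("ascii")))
session.select_region((100, 200, 160, 60))
for frame in (b"ab12", b"cd34"):  # two captcha refreshes in the same spot
    print(session.recognize(lambda region: frame), session.loads)
```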
See docs/DESKTOP_UI.md for platform notes and packaging commands.
The default input pattern is sample-[0-9]*.png, which covers the five bundled
Chinese captcha samples in data/raw/. To process every raw PNG, including the
alphanumeric smoke-test image, use:
.venv/bin/python scripts/adaptive_ocr_pipeline.py --input-dir data/raw --pattern "*.png"
.venv/bin/python scripts/adaptive_ocr_rerank.py --top-n 12

Use a different input directory:

.venv/bin/python scripts/adaptive_ocr_pipeline.py --input-dir /path/to/images --pattern "*.png"
.venv/bin/python scripts/adaptive_ocr_rerank.py --top-n 12

The --profile option controls how many preprocessing candidates each image gets:

- --profile adaptive is the default. It first runs 75 no-crop candidates per image, then expands to a capped fallback set only when the early result has low confidence. The default fallback cap is 455 candidates; use --adaptive-full-limit to tune the speed/accuracy tradeoff.
- --profile fast always uses the 75-candidate set.
- --profile full always uses the full generated candidate set.

The shipped configuration, which the desktop app also uses, sets
OCR_FORCE_BALANCED = True in scripts/ocr_project_env.py. With this setting the
adaptive profile skips the fast early exit and always runs the 455-candidate
balanced path. Set it to False to restore the confidence-based early exit, which
lets easy images finish in the faster adaptive-fast mode.
For each image, the pipeline prints the mode, candidate count, OCR row count, top
candidate, and preprocessing/OCR/total seconds. The mode adaptive-fast means the
early result was accepted; adaptive-balanced means the image expanded to the
capped fallback candidate set.
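The control flow described above (fast set, confidence check, capped fallback, and the OCR_FORCE_BALANCED override) can be sketched as follows. The sketch makes several assumptions. The candidate list is assumed to be ordered so that its first 75 entries are the no-crop fast set. The `confidence` callable stands in for "OCR the fast set and score it". The threshold of 0.8 is a hypothetical value, because the README does not document the real criterion.

```python
from typing import Callable, Sequence

FAST_SET_SIZE = 75        # no-crop candidates tried first
DEFAULT_FULL_LIMIT = 455  # default fallback cap, tunable via --adaptive-full-limit


def select_candidates(
    candidates: Sequence[str],
    confidence: Callable[[Sequence[str]], float],
    *,
    threshold: float = 0.8,       # hypothetical cut-off, not the project's value
    full_limit: int = DEFAULT_FULL_LIMIT,
    force_balanced: bool = True,  # mirrors OCR_FORCE_BALANCED
) -> tuple[str, Sequence[str]]:
    """Return (mode, candidates to OCR), following the adaptive profile's shape."""
    if not force_balanced:
        fast = candidates[:FAST_SET_SIZE]
        if confidence(fast) >= threshold:
            return "adaptive-fast", fast  # early result accepted
    return "adaptive-balanced", candidates[:full_limit]


# 600 made-up candidate names; the real generated count varies by profile.
variants = [f"variant-{i:03d}" for i in range(600)]
print(select_candidates(variants, lambda s: 0.95)[0])                        # adaptive-balanced
print(select_candidates(variants, lambda s: 0.95, force_balanced=False)[0])  # adaptive-fast
print(select_candidates(variants, lambda s: 0.30, force_balanced=False)[0])  # adaptive-balanced
```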
OCR inference over the candidate variants is parallelized across CPU cores, so a
455-candidate balanced run takes about 9s on a 6-core CPU instead of about 37s
when run serially, with byte-identical results. Tuning knobs live at the top of
scripts/ocr_project_env.py:
- OCR_WORKERS: number of concurrent inference workers (default 6; set it to your physical core count).
- OCR_INTRA_OP_THREADS and OCR_CV_THREADS: per-session ONNX Runtime threads and OpenCV threads, capped so recognition does not saturate every core.
- OCR_FORCE_BALANCED: always run the balanced path (default True).

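The following is a minimal sketch of order-preserving parallel inference. It assumes a thread pool and a recognizer callable that is safe to call from several threads. The README does not say whether the project uses threads or processes. `Executor.map` yields results in input order, which is one simple way to get output identical to a serial loop. Threads can help here because ONNX Runtime generally releases the GIL while a session runs. Per-session threads are usually set with `onnxruntime.SessionOptions.intra_op_num_threads`, and OpenCV threads with `cv2.setNumThreads`. Those two settings are presumably what OCR_INTRA_OP_THREADS and OCR_CV_THREADS feed, but that is an assumption. The helper name `recognize_all` is hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def recognize_all(
    images: Sequence[bytes],
    recognize: Callable[[bytes], str],
    workers: int = 6,  # plays the role of OCR_WORKERS
) -> list[str]:
    """Run recognize() over every candidate image concurrently.

    Executor.map returns results in input order, so the list matches a serial
    loop element for element no matter which worker finishes first.
    """
    workers = max(1, min(workers, os.cpu_count() or 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recognize, images))


# Stub recognizer standing in for a ddddocr call.
print(recognize_all([b"a", b"bb"], lambda b: b.decode().upper(), workers=2))  # ['A', 'BB']
```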
Generated files are ignored by Git:

- data/processed/adaptive_ocr/: generated candidate images
- reports/adaptive_ocr_rows.csv: OCR rows
- reports/adaptive_ocr_scores.csv: first-stage text scores
- reports/adaptive_ocr_summary.md: first-stage summary
- reports/adaptive_ocr_top_sheet.png: visual first-stage sheet
- reports/adaptive_ocr_v2_combinations.csv
- reports/adaptive_ocr_v2_summary.md
- reports/adaptive_ocr_v2_top_sheet.png

See docs/RESULTS.md for current sample outputs. The most ambiguous sample is
sample-034918.png: the reranker ranks 狱己擦九 and 狱己擦力 close together because
the evidence for the final character is nearly tied.
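One way to surface cases like this is to flag character positions whose two strongest candidates have nearly equal support. The sketch below works on per-position vote counters. The weights are made up to mimic the 九/力 split and are not the project's measured evidence. The 10% margin is likewise an arbitrary, hypothetical choice.

```python
from collections import Counter


def near_tied_positions(votes: list[Counter], margin: float = 0.1) -> list[int]:
    """Return positions whose two strongest characters differ by at most
    `margin`, measured as a fraction of that position's total vote weight."""
    flagged = []
    for i, counter in enumerate(votes):
        total = sum(counter.values())
        top_two = counter.most_common(2)
        if total <= 0 or len(top_two) < 2:
            continue  # a single candidate character cannot be tied
        gap = (top_two[0][1] - top_two[1][1]) / total
        if gap <= margin:
            flagged.append(i)
    return flagged


# Made-up weights: positions 0-2 agree, position 3 splits between 九 and 力.
votes = [
    Counter({"狱": 25}),
    Counter({"己": 24, "已": 1}),
    Counter({"擦": 25}),
    Counter({"九": 13, "力": 12}),
]
print(near_tied_positions(votes))  # [3]
```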
Recent local timing on an Intel i5-10400 (6 cores / 12 threads), Python 3.14.5, default adaptive profile with always-balanced and parallel inference:
| sample set | per-image time | notes |
|---|---|---|
| 5 Chinese samples | ~9-14s/image | every image runs the 455-candidate balanced path; time scales with image size |

Project layout:

| path | contents |
|---|---|
| data/raw/ | tracked sample input images |
| data/processed/ | generated candidate images, ignored by Git |
| docs/ | design notes and sample results |
| reports/ | generated CSV/Markdown/PNG reports, ignored by Git |
| scripts/ | runnable pipeline scripts |
| requirements.txt | pinned runtime dependencies |
| pyproject.toml | project metadata |

- This is an OCR research pipeline, not a CAPTCHA-bypass service.
- Use it only for images you own or are authorized to analyze.
- It assumes the target text is Chinese and four characters long by default; override the length with --expected-len.
- The adaptive profile is tuned on the bundled samples. Use --profile full when evaluating new captcha styles.
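The default assumptions (Chinese text, four characters) can be expressed as a simple candidate filter. This is a sketch only. The README does not say how --expected-len is applied internally, and the function names below are hypothetical. The CJK check covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so rarer extension-block characters would be rejected.

```python
def is_cjk(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def filter_candidates(texts, expected_len: int = 4) -> list[str]:
    """Keep OCR strings made of exactly `expected_len` CJK ideographs,
    stripping whitespace and dropping duplicates while preserving order."""
    seen = set()
    kept = []
    for text in texts:
        text = text.strip()
        if len(text) == expected_len and all(is_cjk(c) for c in text) and text not in seen:
            seen.add(text)
            kept.append(text)
    return kept


print(filter_candidates(["狱己擦九", "狱己擦", "ab12", " 狱己擦力 "]))  # ['狱己擦九', '狱己擦力']
```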
MIT. See LICENSE.