Add OVD local inference example #359

Open
Travor278 wants to merge 1 commit into om-ai-lab:main from Travor278:add-ovd-local-inference-example

@Travor278 Travor278 commented May 12, 2026

Description

Adds a small examples/ovd gallery for running the released VLM-R1 OVD checkpoint on local images.

The example includes:

  • a CLI inference script for omlab/VLM-R1-Qwen2.5VL-3B-OVD-0321
  • the OVD prompt template used by the public demo flow
  • robust parsing of <answer>-wrapped JSON and Markdown-fenced JSON, with optional json_repair support
  • annotated image output plus detections.json, raw_output.txt, prompt.txt, and a single-case HTML report
  • a small gallery builder for comparing multiple local inference outputs
  • three bundled smoke-test images covering single-class, multi-class, and multi-object outputs

This is additive only and does not change training, evaluation, or model-loading defaults elsewhere in the repository.
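The parsing fallback described in the list above can be pictured with a minimal sketch. This is not the code in examples/ovd/infer_ovd.py: the function names, the order of the fallbacks, and the decision to return an empty list on failure are all assumptions made for illustration. The only third-party call is `repair_json` from the optional json_repair package, and it is used only when that package is installed.

```python
import json
import re

# Built from single backticks so this sketch can itself live inside a fenced block.
FENCE = "`" * 3
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_payload(raw: str) -> str:
    """Return the most likely JSON substring of the raw model output."""
    match = ANSWER_RE.search(raw)
    text = match.group(1) if match else raw  # fall back to the whole output
    fence = FENCE_RE.search(text)
    if fence:
        text = fence.group(1)  # unwrap a Markdown-fenced payload
    return text.strip()


def parse_detections(raw: str) -> list:
    """Parse detections, returning [] when nothing usable is found."""
    payload = extract_json_payload(raw)
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        try:
            from json_repair import repair_json  # optional dependency
        except ImportError:
            return []
        try:
            data = json.loads(repair_json(payload))
        except json.JSONDecodeError:
            return []
    if isinstance(data, dict):
        data = [data]  # tolerate a single object instead of a list
    if not isinstance(data, list):
        return []
    return [item for item in data if isinstance(item, dict)]
```

Under these assumptions, an output that wraps a fenced JSON list in `<answer>` tags and an output that contains a bare JSON list both parse to the same list of dictionaries, while a plain-text reply yields an empty list.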

Gallery Preview

The gallery below was generated locally with examples/ovd/build_gallery.py from multiple OVD output directories. The checked-in builder supports the same switchable layout for any number of --case entries.

[Image: VLM-R1 OVD generated gallery preview]

Related Issue

Closes #200.

This PR is also related to #232 and #306, since the example supports multiple bounding boxes and comma-separated target labels. It may help users who are debugging OVD demo setup issues such as #297, but it does not change the hosted demo or add OVDEval evaluation templates.

Motivation and Context

Several users asked for a runnable OVD inference path and the exact prompt shape needed to get bounding-box JSON from the released OVD model. Pointing users only to the hosted Space makes local debugging harder, especially when they need to inspect raw model output, parsed boxes, and the rendered result.

This example keeps the surface small: one local image in, annotated output and JSON artifacts out.
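As a rough illustration of the "artifacts out" half, the sketch below writes the three text artifacts named in the description (prompt.txt, raw_output.txt, detections.json) into one output directory. The helper name `write_artifacts` and its signature are hypothetical and not taken from the PR; the annotated image and HTML report are left out to keep the sketch dependency-free.

```python
import json
from pathlib import Path


def write_artifacts(output_dir, prompt, raw_output, detections):
    """Write prompt, raw model output, and parsed detections side by side.

    Hypothetical helper: the file names follow the PR description, the
    function itself is an illustration only.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "prompt.txt").write_text(prompt, encoding="utf-8")
    (out / "raw_output.txt").write_text(raw_output, encoding="utf-8")
    (out / "detections.json").write_text(
        json.dumps(detections, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return out
```

Keeping the raw output next to the parsed JSON makes it easy to tell whether a missing box is a model problem or a parsing problem.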

How Has This Been Tested?

Local environment:

  • Python 3.11
  • PyTorch 2.7.1+cu128
  • Transformers 5.8.0
  • NVIDIA GeForce RTX 5070 Laptop GPU, 8GB VRAM

Commands run:

python -m py_compile examples/ovd/infer_ovd.py examples/ovd/build_gallery.py
python examples/ovd/infer_ovd.py \
  --image examples/ovd/assets/person.jpg \
  --labels person \
  --output-dir outputs/ovd_person \
  --max-memory "cuda:7GiB,cpu:24GiB" \
  --local-files-only

Result: parsed 4 detections.

python examples/ovd/infer_ovd.py \
  --image examples/ovd/assets/drinks_fruit.jpg \
  --labels "drink,fruit" \
  --output-dir outputs/ovd_drinks_fruit \
  --max-memory "cuda:7GiB,cpu:24GiB" \
  --local-files-only

Result: parsed 3 detections.

python examples/ovd/infer_ovd.py \
  --image examples/ovd/assets/desk.png \
  --labels "keyboard,white cup,laptop" \
  --output-dir outputs/ovd_desk \
  --max-memory "cuda:7GiB,cpu:24GiB" \
  --local-files-only

Result: parsed 3 detections.
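For readers wondering how a value like "cuda:7GiB,cpu:24GiB" could map onto model loading: Transformers and Accelerate accept a `max_memory` dictionary keyed by integer GPU index or the string "cpu". The sketch below shows one possible conversion. It is an assumption about the flag's meaning, not the script's actual parser; in particular, treating a bare "cuda" as GPU 0 is a guess.

```python
def parse_max_memory(spec: str) -> dict:
    """Convert "cuda:7GiB,cpu:24GiB" into a max_memory-style dict.

    Hypothetical helper; the DEVICE:LIMIT grammar is inferred from the
    commands above, not from the script's source.
    """
    result = {}
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # ignore stray commas
        device, sep, limit = part.partition(":")
        if not sep or not limit.strip():
            raise ValueError(f"expected DEVICE:LIMIT, got {part!r}")
        device = device.strip().lower()
        if device == "cuda":
            key = 0  # assumption: bare "cuda" means the first GPU
        elif device.isdigit():
            key = int(device)  # explicit GPU index such as "1:10GiB"
        else:
            key = device  # e.g. "cpu"
        result[key] = limit.strip()
    return result
```

With this mapping, the value used in the commands above would become `{0: "7GiB", "cpu": "24GiB"}`, which leaves about 1GB of headroom on the 8GB GPU listed in the test environment.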

python examples/ovd/build_gallery.py \
  --case "Person=outputs/ovd_person" \
  --case "Drinks/Fruit=outputs/ovd_drinks_fruit" \
  --case "Desk=outputs/ovd_desk"

Result: wrote outputs/ovd_gallery/index.html.
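The repeated `--case "Title=directory"` arguments can be handled with standard argparse features. The following is a minimal sketch, assuming the title and directory are separated by the first `=`; `parse_case` and `build_parser` are hypothetical names, and the real examples/ovd/build_gallery.py may be structured differently.

```python
import argparse
from pathlib import Path


def parse_case(value: str):
    """Split "Title=dir" on the first "=" into (title, Path(dir))."""
    title, sep, directory = value.partition("=")
    if not sep or not title.strip() or not directory.strip():
        raise argparse.ArgumentTypeError(f"expected TITLE=DIR, got {value!r}")
    return title.strip(), Path(directory.strip())


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: each --case adds one (title, directory) pair."""
    parser = argparse.ArgumentParser(description="OVD gallery builder sketch")
    parser.add_argument(
        "--case",
        action="append",  # collect every occurrence into a list
        type=parse_case,  # applied to each value before it is appended
        required=True,
        metavar="TITLE=DIR",
    )
    return parser
```

Splitting on the first `=` only means a title such as "Drinks/Fruit" passes through unchanged, since the slash is never interpreted as a path separator.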

Checklist

  • Added an additive example without changing training, evaluation, or model-loading defaults.
  • Included local inference artifacts for raw output, parsed detections, annotated image, and HTML report generation.
  • Tested single-label, multi-label, and multi-object OVD examples locally.
  • Tested the gallery builder on multiple generated output directories.
  • Kept generated outputs/ files out of the committed changes.

@Travor278 Travor278 marked this pull request as ready for review May 12, 2026 01:32

Development

Successfully merging this pull request may close these issues.

OVD model demo code