Migrate to tensorrt_cpp_api v7 #2

Merged
cyrusbehr merged 6 commits into main from v7-migration on May 30, 2026

cyrusbehr commented on May 29, 2026

Owner

What

Ports YOLOv9-TensorRT-CPP from v6 tensorrt-cpp-api (templated Engine<T>, OpenCV-in-the-signature, nested-vector IO) to v7 (namespace trtcpp, no-throw Status/Result, name-keyed tensors, fused GPU preprocessing). Depends on tensorrt-cpp-api #95; the libs/tensorrt-cpp-api submodule is pinned to the merged v7 main.

Changes

  • Engine<float> -> trtcpp::Engine; Options/buildLoadNetwork -> BuildOptions + EngineBuilder::buildAndLoad.
  • Preprocessing uses the v7 fused kernel preproc::letterboxToTensor over a zero-copy trtcpp::opencv::viewOf(GpuMat) view (BGR->RGB, letterbox, /255), replacing OpenCV cvtColor/resize + in-engine normalize. (Pitched GpuMat is copied into a createContinuous buffer first.)
  • runInference + transformOutput -> name-keyed engine.infer(...) + toHost(stream); the detection post-processing math is unchanged.
  • Unwrapping a failed Result now throws std::runtime_error instead of asserting in debug builds or invoking undefined behavior in -DNDEBUG release builds.
  • Precision::FP32/FP16/INT8 -> trtcpp::Precision::kFp32/kFp16/kInt8Qdq (INT8 caveat in MIGRATION.md).
  • CMake: C++20, namespaced targets + WITH_OPENCV/BUILD_PREPROC; local stopwatch.h.
  • libs/tensorrt-cpp-api had been committed as in-tree v6 source files; converted to a proper v7 submodule gitlink (run git submodule update --init after pulling).
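
As a rough CPU-side reference for what the fused preprocessing step computes per image (BGR->RGB, letterbox, /255), the sketch below works out the letterbox geometry and the per-pixel normalization. It is not the `preproc::letterboxToTensor` implementation: the names `LetterboxGeometry`, `letterboxGeometry`, `RgbF`, and `bgrToRgbUnit` are hypothetical, and round-to-nearest sizing is an assumption. Where the padding is placed (centered or right/bottom) is left open because this PR does not state it.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper type: describes how a source image fits a network input.
struct LetterboxGeometry {
    float scale;  // uniform resize factor applied to the source image
    int newW;     // resized width before padding
    int newH;     // resized height before padding
    int padW;     // total horizontal padding (dstW - newW)
    int padH;     // total vertical padding (dstH - newH)
};

// Aspect-preserving fit of (srcW x srcH) into (dstW x dstH).
// Assumption: resized dimensions are rounded to nearest and clamped to the target.
LetterboxGeometry letterboxGeometry(int srcW, int srcH, int dstW, int dstH) {
    const float scale = std::min(static_cast<float>(dstW) / srcW,
                                 static_cast<float>(dstH) / srcH);
    const int newW = std::min(dstW, static_cast<int>(std::lround(srcW * scale)));
    const int newH = std::min(dstH, static_cast<int>(std::lround(srcH * scale)));
    return {scale, newW, newH, dstW - newW, dstH - newH};
}

// One pixel of the color step: BGR bytes in, RGB floats in [0, 1] out.
struct RgbF {
    float r, g, b;
};

RgbF bgrToRgbUnit(unsigned char b, unsigned char g, unsigned char r) {
    return {r / 255.0f, g / 255.0f, b / 255.0f};
}
```

For a 1280x720 frame and a 640x640 input, this gives a scale of 0.5, a 640x360 resized image, and 280 rows of total vertical padding. The saved `scale` is what post-processing would use to map boxes back to the original frame.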

Verified

Built and run on an RTX 3080 (CUDA-12.6 OpenCV-CUDA + v7): detect_object_image on images/cars.jpg (FP16 yolov9-e) runs end-to-end and detects the vehicles in the frame (~26 objects; the exact count shifts by one or two across FP16 engine rebuilds, as TensorRT tactic selection moves borderline detections across the confidence threshold). The library + detect_object_image build and run; the video demo target additionally needs an OpenCV with highgui/videoio. No merge conflict (branch is clean against main). See MIGRATION.md.
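
To illustrate why the count is stated as approximate, the sketch below counts detections above a confidence threshold for two sets of scores that differ only in one borderline value. The scores, the 0.25 threshold, and the helper name `countAbove` are made up for illustration; they are not measurements from this PR or values taken from the repository.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of candidate detections whose confidence
// strictly exceeds the threshold (strict comparison is an assumption).
std::size_t countAbove(const std::vector<float> &scores, float threshold) {
    std::size_t n = 0;
    for (float s : scores) {
        if (s > threshold) {
            ++n;
        }
    }
    return n;
}
```

With an illustrative threshold of 0.25, scores of {0.91, 0.52, 0.249} yield 2 detections while {0.91, 0.52, 0.251} yield 3. A shift in the third decimal place of a single score, of the kind a different FP16 kernel choice could plausibly cause, is enough to change the total by one.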

🤖 Generated with Claude Code

cyrusbehr and others added 6 commits May 29, 2026 16:01
Port the inference layer from the v6 templated Engine<T> API to v7 (namespace trtcpp, no-throw
Status/Result, name-keyed tensors, fused GPU preprocessing). YOLOv9 here is detection-only.

- Engine<float> -> trtcpp::Engine; Options/buildLoadNetwork -> BuildOptions/EngineBuilder::buildAndLoad.
- Preprocessing now uses the v7 fused kernel preproc::letterboxToTensor over a zero-copy
  opencv::viewOf(GpuMat) device view (BGR->RGB, letterbox, /255), replacing the OpenCV
  cvtColor/resize + in-engine normalize.
- runInference + transformOutput -> engine.infer(name-keyed) + toHost(stream); the detection
  post-processing math is unchanged (only dim access and output buffering moved to the v7
  Shape / flat Tensor API).
- Precision::FP32/FP16/INT8 -> trtcpp::Precision::kFp32/kFp16/kInt8Qdq (INT8 caveat in MIGRATION.md).
- CMake: C++20, namespaced targets + WITH_OPENCV/BUILD_PREPROC; added a local stopwatch.h and
  explicit OpenCV/<iostream> includes that v6 pulled in transitively via engine.h.

Requires the libs/tensorrt-cpp-api submodule at v7.0.0+. Syntax-checked against the real v7 +
OpenCV headers (incl. the benchmark path); NOT yet compiled/linked/run (prepared on a host with
broken OpenCV-CUDA) -- build and run before merging. See MIGRATION.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
From the code-review pass: replace the unchecked .value() calls (tensorShape, Tensor::allocate,
TensorView::as<float>) with a must() helper that throws std::runtime_error on error. .value() on
an error Result asserts in debug and is undefined behavior in this app's -DNDEBUG release builds.
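
A minimal sketch of the pattern, assuming a result type with `ok()`, `value()`, and an error message accessor. `DemoResult` is a hypothetical stand-in written for this example, not the real `trtcpp::Result`, and the signature of `must()` here may differ from the helper in this repository.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for a no-throw result type; NOT trtcpp::Result.
template <typename T>
class DemoResult {
public:
    static DemoResult success(T v) { return DemoResult(std::move(v)); }
    static DemoResult failure(std::string msg) {
        return DemoResult(Error{std::move(msg)});
    }

    bool ok() const { return std::holds_alternative<T>(state_); }

    // Unchecked accessor: callers are expected to test ok() first.
    T &value() { return std::get<T>(state_); }

    // Error text; only meaningful when ok() is false.
    const std::string &message() const { return std::get<Error>(state_).text; }

private:
    struct Error {
        std::string text;
    };
    explicit DemoResult(T v) : state_(std::move(v)) {}
    explicit DemoResult(Error e) : state_(std::move(e)) {}
    std::variant<T, Error> state_;
};

// Unwrap a result or throw, so a failure surfaces as an exception in every
// build type instead of depending on whether assertions are compiled in.
template <typename T>
T must(DemoResult<T> r, const char *what) {
    if (!r.ok()) {
        throw std::runtime_error(std::string(what) + ": " + r.message());
    }
    return std::move(r.value());
}
```

A call site would then read something like `auto shape = must(lookupShape(name), "tensorShape");`, where `lookupShape` is again a hypothetical function returning a `DemoResult`.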

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ource)

This repo had the v6 tensorrt-cpp-api committed as in-tree source files under libs/, even though
.gitmodules declares it a submodule. Convert it to a proper submodule gitlink pointing at
tensorrt-cpp-api v7.0.0-rc1 (matching .gitmodules and the YOLOv8 repo layout), so the v7-migrated
YoloV9 code builds. Run `git submodule update --init` after pulling. See MIGRATION.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Built and ran the migration on an RTX 3080 against OpenCV-CUDA + tensorrt_cpp_api v7:
detect_object_image on cars.jpg (FP16 yolov9-e) detects 24 objects.

Fixes found by that real build/run:
- The preprocess fed cv::cuda::GpuMat directly to opencv::viewOf, but a GpuMat's rows are pitched
  (padded) while a TensorView is contiguous, so viewOf rejected it at runtime. Copy into a
  cv::cuda::createContinuous buffer when the upload isn't already continuous.
- Removed the now-dead <opencv2/cudaimgproc.hpp> includes (only cv::cuda::GpuMat from core/cuda is
  still used).
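
The pitched-versus-contiguous distinction in the first fix can be shown on the host with plain byte buffers. This is only an illustration of the layout problem, not the code in this PR (which copies on the device into a `cv::cuda::createContinuous` buffer); `packRows` and `isContiguous` are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// A 2-D buffer is contiguous exactly when rows have no padding between them.
bool isContiguous(std::size_t rowBytes, std::size_t pitchBytes) {
    return rowBytes == pitchBytes;
}

// Copy `rows` rows of `rowBytes` payload out of a pitched buffer, where
// consecutive rows start `pitchBytes` apart (pitchBytes >= rowBytes), into a
// tightly packed buffer with no padding.
std::vector<unsigned char> packRows(const unsigned char *src, std::size_t rows,
                                    std::size_t rowBytes, std::size_t pitchBytes) {
    std::vector<unsigned char> dst(rows * rowBytes);
    if (rowBytes == 0) {
        return dst;
    }
    for (std::size_t r = 0; r < rows; ++r) {
        std::memcpy(dst.data() + r * rowBytes, src + r * pitchBytes, rowBytes);
    }
    return dst;
}
```

With 3 payload bytes per row and a 4-byte pitch, the buffer `{1, 2, 3, pad, 4, 5, 6, pad}` packs to `{1, 2, 3, 4, 5, 6}`. Treating the pitched buffer as if it were packed would read the padding byte as the first pixel of the second row, which is the kind of mismatch a contiguity check guards against.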

Bump the libs/tensorrt-cpp-api submodule to the v7 release that builds the static core as PIC
(required to link libYoloV9_TRT.so). MIGRATION.md updated to reflect the verified build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Point the submodule at the squash-merged v7 main of tensorrt-cpp-api (PR #95)
instead of the rc1 commit 6a6d4dd, matching the YOLOv8 migration PR. The public
headers the app uses are byte-identical between the two commits; only CI
workflows and one unused allocator member differ. Rebuilt and re-ran:
detect_object_image on images/cars.jpg with FP16 yolov9-e detects 26 objects.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
detect_object_image detects ~26 objects on cars.jpg; the exact count shifts by
one or two across FP16 engine rebuilds (TensorRT tactic selection near the
confidence threshold), so state it as approximate rather than pinning an exact
count. Also note the video demo target needs an OpenCV with highgui/videoio.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cyrusbehr merged commit 5600add into main on May 30, 2026