Migrate to tensorrt_cpp_api v7 #78

Merged
cyrusbehr merged 8 commits into main from v7-migration on May 30, 2026

@cyrusbehr cyrusbehr commented May 29, 2026

Owner

What

Ports YOLOv8-TensorRT-CPP from v6 tensorrt-cpp-api (templated Engine<T>, OpenCV-in-the-signature, nested-vector IO) to v7 (namespace trtcpp, no-throw Status/Result, name-keyed tensors, fused GPU preprocessing). Depends on tensorrt-cpp-api #95; the libs/tensorrt-cpp-api submodule is pinned to the merged v7 main.

Changes

  • Engine<float> -> trtcpp::Engine; Options/buildLoadNetwork -> BuildOptions + EngineBuilder::buildAndLoad.
  • Preprocessing uses the v7 fused kernel preproc::letterboxToTensor over a zero-copy trtcpp::opencv::viewOf(GpuMat) view (BGR->RGB, letterbox, /255), replacing OpenCV cvtColor/resize + in-engine normalize. (Pitched GpuMat is copied into a cv::cuda::createContinuous buffer first.)
  • runInference + transformOutput -> name-keyed engine.infer(...) + toHost(stream); the detect/pose/seg post-processing math is unchanged.
  • Unwrapping a failed Result now throws std::runtime_error, instead of asserting in debug builds and being undefined behavior in release builds.
  • Precision::FP32/FP16/INT8 -> trtcpp::Precision::kFp32/kFp16/kInt8Qdq (INT8 caveat in MIGRATION.md).
  • CMake: C++20, namespaced targets + WITH_OPENCV/BUILD_PREPROC; local stopwatch.h.
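The preprocessing bullet above names three steps (BGR->RGB, letterbox, /255). The following is a host-side sketch of that arithmetic only, not the v7 kernel. `LetterboxGeometry`, `computeLetterbox`, and `bgrToNormalizedRgb` are hypothetical names, and centered padding is an assumption: the real `preproc::letterboxToTensor` may place the padding differently.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type for illustration; not part of tensorrt_cpp_api.
struct LetterboxGeometry {
    float scale;   // uniform resize factor applied to the source image
    int resizedW;  // source width after scaling
    int resizedH;  // source height after scaling
    int padLeft;   // ASSUMPTION: padding is split evenly (centered)
    int padTop;
};

// Computes the aspect-preserving resize and padding needed to fit a
// srcW x srcH image into a dstW x dstH network input.
inline LetterboxGeometry computeLetterbox(int srcW, int srcH, int dstW, int dstH) {
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    const float scale = std::min(static_cast<float>(dstW) / srcW,
                                 static_cast<float>(dstH) / srcH);
    const int resizedW = std::min(dstW, static_cast<int>(std::lround(srcW * scale)));
    const int resizedH = std::min(dstH, static_cast<int>(std::lround(srcH * scale)));
    return {scale, resizedW, resizedH, (dstW - resizedW) / 2, (dstH - resizedH) / 2};
}

// Converts one 8-bit BGR pixel to RGB floats normalized to [0, 1].
inline std::array<float, 3> bgrToNormalizedRgb(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    return {r / 255.0f, g / 255.0f, b / 255.0f};
}
```

For a 1280x720 frame and a 640x640 network input, this gives a scale of 0.5, a 640x360 resized image, and 140 rows of top padding under the centered-padding assumption.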

Synced with main

This branch is merged up to current main (which landed PRs #69/#70/#71/#74). The only content conflict, src/yolov8.cpp, was resolved by keeping the v7 form while folding in main's two new features:

  • Prebuilt-engine loading (a trtModelPath arg): re-expressed in v7 as trtcpp::Engine::loadFromFile(...) when no ONNX is given, vs EngineBuilder::buildAndLoad(...) for the ONNX path.
  • Magic-number removal (pose vs detect): the channel count is read from the cached v7 output Shape (4 + classes + kpts*3 for pose vs 4 + classes for detect) instead of a literal.
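The channel-count rule in the second bullet can be sketched as a small standalone function. This is an illustration of the rule, not the code in src/yolov8.cpp: `HeadKind` and `classifyHead` are hypothetical names, and the channel count is passed in directly rather than read from a v7 Shape.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical names for illustration only.
enum class HeadKind { Detect, Pose };

// Decides between a detect and a pose head from the output channel count:
// detect = 4 box values + one score per class,
// pose   = detect + 3 values (x, y, visibility) per keypoint.
// Throws if the channel count matches neither layout.
inline HeadKind classifyHead(std::size_t channels, std::size_t numClasses,
                             std::size_t numKeypoints) {
    const std::size_t detectChannels = 4 + numClasses;
    const std::size_t poseChannels = detectChannels + numKeypoints * 3;
    if (numKeypoints > 0 && channels == poseChannels) {
        return HeadKind::Pose;
    }
    if (channels == detectChannels) {
        return HeadKind::Detect;
    }
    throw std::runtime_error("unexpected output channel count: " + std::to_string(channels));
}
```

As an illustration, 80 classes give 84 detect channels, and a hypothetical single-class model with 17 keypoints gives 4 + 1 + 51 = 56 pose channels.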

Also fixed object_detection_csi_jetson.cpp (new in main): it relied on highgui/videoio being pulled in transitively by the v6 engine.h. v7's public headers don't leak OpenCV, so those includes are now explicit.

Verified

Built and run on an RTX 3080 (CUDA 12.6, OpenCV-CUDA, tensorrt_cpp_api v7): detect_object_image on images/team.jpg (FP16 YOLOv8n) runs end-to-end and detects the people in the frame (~10 objects; the exact count shifts by one or two across FP16 engine rebuilds, as TensorRT tactic selection moves borderline detections across the confidence threshold). The library and detect_object_image build and run; the video/CSI demo targets additionally need an OpenCV build with highgui/videoio. After pulling, run git submodule update --init. See MIGRATION.md.

🤖 Generated with Claude Code

cyrusbehr and others added 8 commits May 29, 2026 15:54
Port the inference layer from the v6 templated Engine<T> API to v7 (namespace trtcpp, no-throw
Status/Result, name-keyed tensors, fused GPU preprocessing):

- Engine<float> -> trtcpp::Engine; Options/buildLoadNetwork -> BuildOptions/EngineBuilder::buildAndLoad.
- Preprocessing now uses the v7 fused kernel preproc::letterboxToTensor over a zero-copy
  opencv::viewOf(GpuMat) device view (BGR->RGB, letterbox, /255), replacing the OpenCV
  cvtColor/resize + in-engine normalize.
- runInference + transformOutput -> engine.infer(name-keyed) + per-output toHost(stream); the
  detect/pose/seg post-processing math is unchanged (only dim access and output buffering moved
  to the v7 Shape / flat Tensor API).
- Precision::FP32/FP16/INT8 -> trtcpp::Precision::kFp32/kFp16/kInt8Qdq (INT8 caveat in MIGRATION.md).
- CMake: C++20, namespaced targets + WITH_OPENCV/BUILD_PREPROC; added a local stopwatch.h to
  replace the timing util v7 no longer ships; explicit OpenCV module includes.
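The contents of the local stopwatch.h are not shown in this PR. As a rough idea of what such a replacement timing utility could look like, here is a minimal sketch built on std::chrono::steady_clock; the `Stopwatch` class and its method names are hypothetical and may not match the actual header.

```cpp
#include <chrono>

// Hypothetical stand-in for a local timing helper; not the repository's stopwatch.h.
class Stopwatch {
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

public:
    // Starts timing at construction.
    Stopwatch() : m_start(Clock::now()) {}

    // Restarts the measurement from the current instant.
    void reset() { m_start = Clock::now(); }

    // Milliseconds elapsed since construction or the last reset().
    double elapsedMs() const {
        return std::chrono::duration<double, std::milli>(Clock::now() - m_start).count();
    }

private:
    Clock::time_point m_start;
};
```

A benchmark loop would construct a `Stopwatch` before the timed region and read `elapsedMs()` after it.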

Requires the libs/tensorrt-cpp-api submodule at v7.0.0+. Syntax-checked against the real v7 +
OpenCV headers; NOT yet compiled/linked/run (prepared on a host with broken OpenCV-CUDA) -- build
and run before merging. See MIGRATION.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
yolov8.cpp uses std::cout under ENABLE_BENCHMARKS; v6 got <iostream> transitively via engine.h,
which v7 does not pull in. Include it explicitly. (Default builds are unaffected.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Point libs/tensorrt-cpp-api at the v7 release candidate so the v7-migrated YoloV8 code
(namespaced targets, fused preproc, name-keyed IO) builds. See MIGRATION.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
From the code-review pass: replace the unchecked .value() calls (tensorShape, Tensor::allocate,
TensorView::as<float>) with a must() helper that throws std::runtime_error on error. .value() on
an error Result asserts in debug and is undefined behavior in this app's -DNDEBUG release builds
(e.g. on a dynamic/oversized input shape, CUDA OOM, or a non-float engine output).
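A minimal sketch of the must() idea follows. The `Result` class here is a hypothetical stand-in written for this example (the real trtcpp Result has its own interface), and the `must` signature is an assumption; only the behavior, throwing std::runtime_error on an error Result instead of unwrapping it unchecked, comes from the description above.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a no-throw Result type; NOT the trtcpp API.
template <typename T>
class Result {
public:
    static Result ok(T value) {
        Result r;
        r.m_value = std::move(value);
        return r;
    }
    static Result fail(std::string message) {
        Result r;
        r.m_error = std::move(message);
        return r;
    }
    bool hasValue() const { return m_value.has_value(); }
    // Unchecked access: undefined behavior when no value is held.
    T& unchecked() { return *m_value; }
    const std::string& error() const { return m_error; }

private:
    Result() = default;
    std::optional<T> m_value;
    std::string m_error;
};

// Unwraps a Result, throwing instead of touching an absent value.
// `what` names the failed operation for the exception message.
template <typename T>
T must(Result<T> result, const char* what) {
    if (!result.hasValue()) {
        throw std::runtime_error(std::string(what) + ": " + result.error());
    }
    return std::move(result.unchecked());
}
```

The check happens in every build configuration, so behavior no longer depends on whether assertions are compiled in.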

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Built and ran the migration on an RTX 3080 against OpenCV-CUDA + tensorrt_cpp_api v7:
detect_object_image on team.jpg (FP16 YOLOv8n) detects 9 objects.

Fixes found by that real build/run:
- The preprocess fed cv::cuda::GpuMat directly to opencv::viewOf, but a GpuMat's rows are pitched
  (padded) while a TensorView is contiguous, so viewOf rejected it at runtime. Copy into a
  cv::cuda::createContinuous buffer when the upload isn't already continuous.
- Removed the now-dead <opencv2/cudaimgproc.hpp> includes (the migration replaced cv::cuda::cvtColor
  with the fused preproc kernel; only cv::cuda::GpuMat from core/cuda is still used).
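The pitched-versus-contiguous mismatch in the first fix can be illustrated on host memory. This sketch is an analogy only: the actual fix copies on the device into a cv::cuda::createContinuous buffer, while `packRows` and `isContinuous` below are hypothetical helpers operating on plain byte arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A pitched image stores each row in pitchBytes bytes, of which only the
// first rowBytes carry pixels; the rest is alignment padding.
inline bool isContinuous(std::size_t rowBytes, std::size_t pitchBytes) {
    return rowBytes == pitchBytes;
}

// Copies the pixel bytes of each row into a tightly packed buffer,
// dropping the per-row padding.
inline std::vector<std::uint8_t> packRows(const std::uint8_t* src, std::size_t rows,
                                          std::size_t rowBytes, std::size_t pitchBytes) {
    assert(pitchBytes >= rowBytes);
    std::vector<std::uint8_t> dst(rows * rowBytes);
    if (rowBytes == 0) {
        return dst;  // nothing to copy
    }
    for (std::size_t r = 0; r < rows; ++r) {
        std::memcpy(dst.data() + r * rowBytes, src + r * pitchBytes, rowBytes);
    }
    return dst;
}
```

With two rows of three pixel bytes stored at a pitch of four, the packed result holds six bytes and the two padding bytes are gone.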

Bump the libs/tensorrt-cpp-api submodule to the v7 release that builds the static core as PIC
(required to link libYoloV8_TRT.so). MIGRATION.md updated to reflect the verified build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ation

origin/main advanced 9 commits since the migration branched, adding two
features that collided with the v7 rewrite in src/yolov8.cpp:

  - Optional prebuilt-TensorRT-engine loading (PR #69): a `trtModelPath`
    constructor/CLI arg. Re-expressed in v7 form via Engine::loadFromFile()
    when no ONNX path is given; ONNX path still goes through
    EngineBuilder{}.buildAndLoad(). The v6 SUB/DIV/NORMALIZE args are gone
    (preprocessing is now fused on the GPU in preprocess()).
  - Remove the magic number 56 for pose vs. detect (PR #71): now branches on
    4 + CLASS_NAMES.size() (+ NUM_KPS*3 for pose), reading the channel count
    from the cached v7 output Shape (m_outputShapes[0][1]) instead of the v6
    getOutputDims(), and throws if neither matches.

All other touched files (CMakeLists.txt, cmd_line_util.h, yolov8.h,
benchmark.cpp, the three object_detection_*.cpp, and the new
object_detection_csi_jetson.cpp) auto-merged cleanly: main's trtModelPath
threading composes with the v7 Precision/Engine/Tensor changes, and the new
CSI-Jetson executable already uses the 3-arg constructor + parseArgumentsVideo
signatures that the migration produced.

Bumped the tensorrt-cpp-api submodule from the rc1 commit 6a6d4dd to the
squash-merged v7 main 166ce91 (PR #95). The public headers the app uses are
byte-identical between the two; only CI workflows and one unused allocator
member differ.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The object_detection_csi_jetson.cpp tool added on main relied on
<opencv2/highgui.hpp> and <opencv2/videoio.hpp> being pulled in transitively
via the v6 "engine.h". Under tensorrt_cpp_api v7 the public headers
deliberately include no OpenCV, so cv::VideoCapture / cv::imshow / cv::waitKey
no longer resolved and the target failed to compile. Include highgui and
videoio explicitly (matching how the migration fixed the other tools) and drop
the unused <opencv2/cudaimgproc.hpp> include (this tool does no cv::cuda
imgproc; it uploads a cv::Mat inside detectObjects).

Verified with -fsyntax-only against the full OpenCV-CUDA headers: clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
detect_object_image detects ~10 objects on team.jpg; the exact count shifts by
one or two across FP16 engine rebuilds (TensorRT tactic selection near the
confidence threshold), so state it as approximate rather than pinning an exact
count. Also note the video/CSI demo targets need an OpenCV with highgui/videoio.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cyrusbehr cyrusbehr merged commit 7e66d88 into main May 30, 2026