Migrate to tensorrt_cpp_api v7 by cyrusbehr · Pull Request #78 · cyrusbehr/YOLOv8-TensorRT-CPP

cyrusbehr · 2026-05-29T21:44:03Z

What

Ports YOLOv8-TensorRT-CPP from v6 tensorrt-cpp-api (templated Engine<T>, OpenCV-in-the-signature, nested-vector IO) to v7 (namespace trtcpp, no-throw Status/Result, name-keyed tensors, fused GPU preprocessing). Depends on tensorrt-cpp-api #95; the libs/tensorrt-cpp-api submodule is pinned to the merged v7 main.

Changes

Engine<float> -> trtcpp::Engine; Options/buildLoadNetwork -> BuildOptions + EngineBuilder::buildAndLoad.
Preprocessing uses the v7 fused kernel preproc::letterboxToTensor over a zero-copy trtcpp::opencv::viewOf(GpuMat) view (BGR->RGB, letterbox, /255), replacing OpenCV cvtColor/resize + in-engine normalize. (Pitched GpuMat is copied into a cv::cuda::createContinuous buffer first.)
runInference + transformOutput -> name-keyed engine.infer(...) + toHost(stream); the detect/pose/seg post-processing math is unchanged.
Failed Result unwraps throw instead of asserting/UB in release builds.
Precision::FP32/FP16/INT8 -> trtcpp::Precision::kFp32/kFp16/kInt8Qdq (INT8 caveat in MIGRATION.md).
CMake: C++20, namespaced targets + WITH_OPENCV/BUILD_PREPROC; local stopwatch.h.
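For reference, the three steps the fused preprocessing performs (BGR->RGB, letterbox, /255) can be written out on the CPU. This is a minimal sketch of the arithmetic only, not the v7 kernel: `LetterboxGeometry`, `computeLetterbox`, and `bgrToNormalizedRgb` are hypothetical names, and where the padding is placed (centered or right/bottom) is deliberately left out because the PR does not state it.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Hypothetical helper type (not part of tensorrt_cpp_api): the geometry of a
// letterbox resize, which scales uniformly and pads the remainder.
struct LetterboxGeometry {
    float scale;  // uniform scale applied to both axes
    int scaledW;  // width of the resized image inside the canvas
    int scaledH;  // height of the resized image inside the canvas
    int padW;     // total horizontal padding in pixels
    int padH;     // total vertical padding in pixels
};

// Pick the largest uniform scale at which the source still fits the canvas.
inline LetterboxGeometry computeLetterbox(int srcW, int srcH, int dstW, int dstH) {
    const float scale = std::min(static_cast<float>(dstW) / srcW,
                                 static_cast<float>(dstH) / srcH);
    const int scaledW = std::min(dstW, static_cast<int>(std::round(srcW * scale)));
    const int scaledH = std::min(dstH, static_cast<int>(std::round(srcH * scale)));
    return {scale, scaledW, scaledH, dstW - scaledW, dstH - scaledH};
}

// Swap the channel order from BGR to RGB and map [0, 255] to [0, 1].
inline std::array<float, 3> bgrToNormalizedRgb(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    return {r / 255.0f, g / 255.0f, b / 255.0f};
}
```

With a 1280x720 frame and a 640x640 network input, `computeLetterbox` gives a scale of 0.5, a 640x360 image, and 280 rows of padding in total.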

Synced with `main`

This branch is merged up to current main (which landed PRs #69/#70/#71/#74). The only content conflict, src/yolov8.cpp, was resolved by keeping the v7 form while folding in main's two new features:

Prebuilt-engine loading (a trtModelPath arg): re-expressed in v7 as trtcpp::Engine::loadFromFile(...) when no ONNX is given, vs EngineBuilder::buildAndLoad(...) for the ONNX path.
Magic-number removal (pose vs detect): the channel count is read from the cached v7 output Shape (4 + classes + kpts*3 for pose vs 4 + classes for detect) instead of a literal.
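The channel-count rule above can be sketched as a small standalone function. `ModelKind` and `classifyByChannels` are hypothetical names for illustration; the actual code reads the count from the cached output Shape, and the class and keypoint counts used in the example below are assumptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical enum for illustration; not taken from the repository.
enum class ModelKind { Detect, Pose };

// Decide the model kind from the output channel count:
//   detect: 4 box values + one score per class
//   pose:   detect channels + 3 values per keypoint
// Throws if the count matches neither layout.
inline ModelKind classifyByChannels(std::size_t channels,
                                    std::size_t numClasses,
                                    std::size_t numKeypoints) {
    const std::size_t detectChannels = 4 + numClasses;
    const std::size_t poseChannels = detectChannels + numKeypoints * 3;
    if (channels == detectChannels) return ModelKind::Detect;
    if (channels == poseChannels) return ModelKind::Pose;
    throw std::runtime_error("unexpected output channel count: " +
                             std::to_string(channels));
}
```

As a worked example, assuming one class and 17 keypoints, the pose count is 4 + 1 + 17*3 = 56, which is consistent with the literal 56 this change removes; an 80-class detector would give 4 + 80 = 84.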

Also fixed object_detection_csi_jetson.cpp (new in main): it relied on highgui/videoio being pulled in transitively by the v6 engine.h. v7's public headers don't leak OpenCV, so those includes are now explicit.

Verified

Built and run on an RTX 3080 (CUDA-12.6 OpenCV-CUDA + v7): detect_object_image on images/team.jpg (FP16 YOLOv8n) runs end-to-end and detects the people in the frame (~10 objects; the exact count shifts by one or two across FP16 engine rebuilds, as TensorRT tactic selection moves borderline detections across the confidence threshold). The library + detect_object_image build and run; the video/CSI demo targets additionally need an OpenCV with highgui/videoio. After pulling, run git submodule update --init. See MIGRATION.md.

🤖 Generated with Claude Code

Port the inference layer from the v6 templated Engine<T> API to v7 (namespace trtcpp, no-throw Status/Result, name-keyed tensors, fused GPU preprocessing):

- Engine<float> -> trtcpp::Engine; Options/buildLoadNetwork -> BuildOptions/EngineBuilder::buildAndLoad.
- Preprocessing now uses the v7 fused kernel preproc::letterboxToTensor over a zero-copy opencv::viewOf(GpuMat) device view (BGR->RGB, letterbox, /255), replacing the OpenCV cvtColor/resize + in-engine normalize.
- runInference + transformOutput -> engine.infer(name-keyed) + per-output toHost(stream); the detect/pose/seg post-processing math is unchanged (only dim access and output buffering moved to the v7 Shape / flat Tensor API).
- Precision::FP32/FP16/INT8 -> trtcpp::Precision::kFp32/kFp16/kInt8Qdq (INT8 caveat in MIGRATION.md).
- CMake: C++20, namespaced targets + WITH_OPENCV/BUILD_PREPROC; added a local stopwatch.h to replace the timing util v7 no longer ships; explicit OpenCV module includes.

Requires the libs/tensorrt-cpp-api submodule at v7.0.0+. Syntax-checked against the real v7 + OpenCV headers; NOT yet compiled/linked/run (prepared on a host with broken OpenCV-CUDA) -- build and run before merging. See MIGRATION.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

yolov8.cpp uses std::cout under ENABLE_BENCHMARKS; v6 got <iostream> transitively via engine.h, which v7 does not pull in. Include it explicitly. (Default builds are unaffected.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Point libs/tensorrt-cpp-api at the v7 release candidate so the v7-migrated YoloV8 code (namespaced targets, fused preproc, name-keyed IO) builds. See MIGRATION.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

From the code-review pass: replace the unchecked .value() calls (tensorShape, Tensor::allocate, TensorView::as<float>) with a must() helper that throws std::runtime_error on error. .value() on an error Result asserts in debug and is undefined behavior in this app's -DNDEBUG release builds (e.g. on a dynamic/oversized input shape, CUDA OOM, or a non-float engine output). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
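The must() pattern described above can be illustrated with a self-contained sketch. `DemoResult` is a hypothetical stand-in written for this example and is not the trtcpp Result type; the `must` signature shown here (taking a context string) is likewise an assumption, since the PR text only says the helper throws std::runtime_error on error.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a no-throw result type; NOT the trtcpp API.
template <typename T>
class DemoResult {
public:
    static DemoResult success(T v) {
        DemoResult r;
        r.m_value = std::move(v);
        return r;
    }
    static DemoResult failure(std::string msg) {
        DemoResult r;
        r.m_error = std::move(msg);
        return r;
    }
    bool ok() const { return m_value.has_value(); }
    // Unchecked access: undefined behavior when ok() is false.
    T& value() { return *m_value; }
    const std::string& error() const { return m_error; }

private:
    DemoResult() = default;
    std::optional<T> m_value;
    std::string m_error;
};

// Unwrap a result or throw, so that a failure surfaces as an exception in
// every build type instead of relying on a debug-only assertion.
template <typename T>
T must(DemoResult<T> result, const char* what) {
    if (!result.ok()) {
        throw std::runtime_error(std::string(what) + ": " + result.error());
    }
    return std::move(result.value());
}
```

A caller would write something like `int n = must(DemoResult<int>::success(42), "tensorShape");`. On a failed result the same call throws with the context string prepended to the error message.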

Built and ran the migration on an RTX 3080 against OpenCV-CUDA + tensorrt_cpp_api v7: detect_object_image on team.jpg (FP16 YOLOv8n) detects 9 objects.

Fixes found by that real build/run:

- The preprocess fed cv::cuda::GpuMat directly to opencv::viewOf, but a GpuMat's rows are pitched (padded) while a TensorView is contiguous, so viewOf rejected it at runtime. Copy into a cv::cuda::createContinuous buffer when the upload isn't already continuous.
- Removed the now-dead <opencv2/cudaimgproc.hpp> includes (the migration replaced cv::cuda::cvtColor with the fused preproc kernel; only cv::cuda::GpuMat from core/cuda is still used).

Bump the libs/tensorrt-cpp-api submodule to the v7 release that builds the static core as PIC (required to link libYoloV8_TRT.so). MIGRATION.md updated to reflect the verified build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
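The difference between a pitched and a contiguous buffer can be shown on the host without OpenCV or CUDA. This is only an illustration of the layout problem, under the assumption that each row is padded to a fixed stride; `PitchedImage`, `isContinuous`, and `toContiguous` are hypothetical names, and the PR itself performs the copy on the GPU with cv::cuda::createContinuous.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side description of a 2D image buffer.
struct PitchedImage {
    const std::uint8_t* data;  // first byte of row 0
    int rows;                  // number of rows
    std::size_t rowBytes;      // bytes of real pixel data per row
    std::size_t step;          // bytes from one row start to the next (>= rowBytes)
};

// A buffer is contiguous when there is no padding between rows.
inline bool isContinuous(const PitchedImage& img) {
    return img.step == img.rowBytes;
}

// Repack the rows back to back, dropping the per-row padding.
inline std::vector<std::uint8_t> toContiguous(const PitchedImage& img) {
    std::vector<std::uint8_t> out(static_cast<std::size_t>(img.rows) * img.rowBytes);
    if (out.empty()) return out;
    for (int r = 0; r < img.rows; ++r) {
        std::memcpy(out.data() + static_cast<std::size_t>(r) * img.rowBytes,
                    img.data + static_cast<std::size_t>(r) * img.step,
                    img.rowBytes);
    }
    return out;
}
```

A view that assumes rows are packed back to back would read the padding bytes as pixels, which is why a pitched upload has to be repacked (or rejected) before it is treated as a contiguous tensor.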

…ation

origin/main advanced 9 commits since the migration branched, adding two features that collided with the v7 rewrite in src/yolov8.cpp:

- Optional prebuilt-TensorRT-engine loading (PR #69): a `trtModelPath` constructor/CLI arg. Re-expressed in v7 form via Engine::loadFromFile() when no ONNX path is given; ONNX path still goes through EngineBuilder{}.buildAndLoad(). The v6 SUB/DIV/NORMALIZE args are gone (preprocessing is now fused on the GPU in preprocess()).
- Remove the magic number 56 for pose vs. detect (PR #71): now branches on 4 + CLASS_NAMES.size() (+ NUM_KPS*3 for pose), reading the channel count from the cached v7 output Shape (m_outputShapes[0][1]) instead of the v6 getOutputDims(), and throws if neither matches.

All other touched files (CMakeLists.txt, cmd_line_util.h, yolov8.h, benchmark.cpp, the three object_detection_*.cpp, and the new object_detection_csi_jetson.cpp) auto-merged cleanly: main's trtModelPath threading composes with the v7 Precision/Engine/Tensor changes, and the new CSI-Jetson executable already uses the 3-arg constructor + parseArgumentsVideo signatures that the migration produced.

Bumped the tensorrt-cpp-api submodule from the rc1 commit 6a6d4dd to the squash-merged v7 main 166ce91 (PR #95). The public headers the app uses are byte-identical between the two; only CI workflows and one unused allocator member differ.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The object_detection_csi_jetson.cpp tool added on main relied on <opencv2/highgui.hpp> and <opencv2/videoio.hpp> being pulled in transitively via the v6 "engine.h". Under tensorrt_cpp_api v7 the public headers deliberately include no OpenCV, so cv::VideoCapture / cv::imshow / cv::waitKey no longer resolved and the target failed to compile. Include highgui and videoio explicitly (matching how the migration fixed the other tools) and drop the unused <opencv2/cudaimgproc.hpp> include (this tool does no cv::cuda imgproc; it uploads a cv::Mat inside detectObjects). Verified with -fsyntax-only against the full OpenCV-CUDA headers: clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

detect_object_image detects ~10 objects on team.jpg; the exact count shifts by one or two across FP16 engine rebuilds (TensorRT tactic selection near the confidence threshold), so state it as approximate rather than pinning an exact count. Also note the video/CSI demo targets need an OpenCV with highgui/videoio. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cyrusbehr and others added 8 commits May 29, 2026 15:54

Bump tensorrt-cpp-api submodule to v7.0.0-rc1

a69b629


cyrusbehr merged commit 7e66d88 into main May 30, 2026

cyrusbehr merged 8 commits into main from v7-migration
