Migrate to tensorrt_cpp_api v7 #78
Merged
Port the inference layer from the v6 templated Engine<T> API to v7 (namespace trtcpp, no-throw Status/Result, name-keyed tensors, fused GPU preprocessing):

- Engine<float> -> trtcpp::Engine; Options/buildLoadNetwork -> BuildOptions/EngineBuilder::buildAndLoad.
- Preprocessing now uses the v7 fused kernel preproc::letterboxToTensor over a zero-copy opencv::viewOf(GpuMat) device view (BGR->RGB, letterbox, /255), replacing the OpenCV cvtColor/resize + in-engine normalize.
- runInference + transformOutput -> engine.infer(name-keyed) + per-output toHost(stream); the detect/pose/seg post-processing math is unchanged (only dim access and output buffering moved to the v7 Shape / flat Tensor API).
- Precision::FP32/FP16/INT8 -> trtcpp::Precision::kFp32/kFp16/kInt8Qdq (INT8 caveat in MIGRATION.md).
- CMake: C++20, namespaced targets + WITH_OPENCV/BUILD_PREPROC; added a local stopwatch.h to replace the timing util v7 no longer ships; explicit OpenCV module includes.

Requires the libs/tensorrt-cpp-api submodule at v7.0.0+. Syntax-checked against the real v7 + OpenCV headers; NOT yet compiled/linked/run (prepared on a host with broken OpenCV-CUDA); build and run before merging. See MIGRATION.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
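For reviewers who want to check the fused kernel's output, the transform it replaces can be sketched on the CPU. This is a hypothetical reference only: Letterbox and letterboxToChw are illustrative names, not v7 API, and the sketch assumes nearest-neighbour sampling and right/bottom padding with a constant value. Whether preproc::letterboxToTensor uses bilinear sampling, centered padding, or a different pad value is not stated in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for "BGR->RGB, letterbox, /255" into an NCHW (N = 1) tensor.
struct Letterbox {
    std::vector<float> chw;  // 3 planes of dstH x dstW, RGB order, values in [0, 1]
    float scale = 1.0f;      // source -> letterboxed scale factor
    int resizedW = 0;        // width of the image region before padding
    int resizedH = 0;        // height of the image region before padding
};

Letterbox letterboxToChw(const std::vector<std::uint8_t>& bgr, int srcW, int srcH,
                         int dstW, int dstH, float padValue = 0.0f) {
    assert(bgr.size() == static_cast<std::size_t>(srcW) * srcH * 3);  // interleaved BGR8
    Letterbox out;
    // Keep the aspect ratio: scale by the tighter of the two axes.
    out.scale = std::min(static_cast<float>(dstW) / srcW, static_cast<float>(dstH) / srcH);
    out.resizedW = std::clamp(static_cast<int>(std::lround(srcW * out.scale)), 1, dstW);
    out.resizedH = std::clamp(static_cast<int>(std::lround(srcH * out.scale)), 1, dstH);

    const std::size_t plane = static_cast<std::size_t>(dstW) * dstH;
    out.chw.assign(3 * plane, padValue);  // the right/bottom padding keeps padValue
    for (int y = 0; y < out.resizedH; ++y) {
        const int sy = std::min(srcH - 1, static_cast<int>(y / out.scale));  // nearest source row
        for (int x = 0; x < out.resizedW; ++x) {
            const int sx = std::min(srcW - 1, static_cast<int>(x / out.scale));  // nearest source column
            const std::uint8_t* px = &bgr[(static_cast<std::size_t>(sy) * srcW + sx) * 3];
            const std::size_t i = static_cast<std::size_t>(y) * dstW + x;
            out.chw[i] = px[2] / 255.0f;              // R plane takes BGR[2]
            out.chw[plane + i] = px[1] / 255.0f;      // G plane
            out.chw[2 * plane + i] = px[0] / 255.0f;  // B plane takes BGR[0]
        }
    }
    return out;
}
```

Under the right/bottom-padding assumption there is no offset to undo, so post-processing maps a box back to the source image by dividing its coordinates by scale.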
yolov8.cpp uses std::cout under ENABLE_BENCHMARKS; v6 got <iostream> transitively via engine.h, which v7 does not pull in. Include it explicitly. (Default builds are unaffected.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Point libs/tensorrt-cpp-api at the v7 release candidate so the v7-migrated YoloV8 code (namespaced targets, fused preproc, name-keyed IO) builds. See MIGRATION.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
From the code-review pass: replace the unchecked .value() calls (tensorShape, Tensor::allocate, TensorView::as<float>) with a must() helper that throws std::runtime_error on error. .value() on an error Result asserts in debug and is undefined behavior in this app's -DNDEBUG release builds (e.g. on a dynamic/oversized input shape, CUDA OOM, or a non-float engine output). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
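A minimal sketch of the must() pattern, assuming a Result type that exposes ok(), value() and error(). The Result class below is a hypothetical stand-in, not trtcpp::Result; the real v7 member names and error type should be checked against the v7 headers before copying this shape.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for trtcpp::Result<T>; the real v7 type may differ.
template <class T>
class Result {
public:
    static Result success(T v) { Result r; r.m_value = std::move(v); return r; }
    static Result failure(std::string msg) { Result r; r.m_error = std::move(msg); return r; }
    bool ok() const { return m_value.has_value(); }
    T& value() { return *m_value; }  // unchecked in this stand-in: UB on an error Result
    const std::string& error() const { return m_error; }

private:
    std::optional<T> m_value;
    std::string m_error;
};

// Unwrap or throw. Unlike an assert, this check survives -DNDEBUG release builds.
template <class T>
T must(Result<T> r, const char* what) {
    if (!r.ok()) {
        throw std::runtime_error(std::string(what) + ": " + r.error());
    }
    return std::move(r.value());
}
```

At each call site a former .value() unwrap becomes must(expr, "context"), so a dynamic shape, CUDA OOM, or non-float output surfaces as a std::runtime_error naming the failing call rather than as silent undefined behavior.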
Built and ran the migration on an RTX 3080 against OpenCV-CUDA + tensorrt_cpp_api v7: detect_object_image on team.jpg (FP16 YOLOv8n) detects 9 objects. Fixes found by that real build/run:

- The preprocess fed cv::cuda::GpuMat directly to opencv::viewOf, but a GpuMat's rows are pitched (padded) while a TensorView is contiguous, so viewOf rejected it at runtime. Copy into a cv::cuda::createContinuous buffer when the upload isn't already continuous.
- Removed the now-dead <opencv2/cudaimgproc.hpp> includes (the migration replaced cv::cuda::cvtColor with the fused preproc kernel; only cv::cuda::GpuMat from core/cuda is still used).

Bump the libs/tensorrt-cpp-api submodule to the v7 release that builds the static core as PIC (required to link libYoloV8_TRT.so). MIGRATION.md updated to reflect the verified build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
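The pitch mismatch is easiest to see on the host. A pitched buffer starts each row at a fixed stride (step) that may exceed the row's payload, so a dense, stride-free shape cannot describe it. The sketch below is a hypothetical host-side analogue of the fix (PitchedImage and toContiguous are illustrative names). On the device, the PR does the equivalent with a cv::cuda::createContinuous buffer and a GpuMat copy, skipped when the upload is already continuous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A pitched 2D image: row r starts at byte r * stepBytes, and stepBytes may be
// larger than the payload width (cols * bytesPerPixel).
struct PitchedImage {
    std::vector<std::uint8_t> data;
    int rows;
    int cols;
    int bytesPerPixel;
    std::size_t stepBytes;

    std::size_t rowBytes() const { return static_cast<std::size_t>(cols) * bytesPerPixel; }
    bool isContinuous() const { return stepBytes == rowBytes(); }
};

// Pack the rows back-to-back, dropping the per-row padding, so the result can be
// described by a dense {rows, cols, channels} shape.
std::vector<std::uint8_t> toContiguous(const PitchedImage& img) {
    assert(img.rows == 0 || img.data.size() >= img.stepBytes * (img.rows - 1) + img.rowBytes());
    std::vector<std::uint8_t> out(img.rowBytes() * img.rows);
    for (int r = 0; r < img.rows; ++r) {
        std::memcpy(out.data() + r * img.rowBytes(),   // dense destination row
                    img.data.data() + r * img.stepBytes,  // pitched source row
                    img.rowBytes());
    }
    return out;
}
```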
…ation origin/main advanced 9 commits since the migration branched, adding two features that collided with the v7 rewrite in src/yolov8.cpp:

- Optional prebuilt-TensorRT-engine loading (PR #69): a `trtModelPath` constructor/CLI arg. Re-expressed in v7 form via Engine::loadFromFile() when no ONNX path is given; the ONNX path still goes through EngineBuilder{}.buildAndLoad(). The v6 SUB/DIV/NORMALIZE args are gone (preprocessing is now fused on the GPU in preprocess()).
- Remove the magic number 56 for pose vs. detect (PR #71): now branches on 4 + CLASS_NAMES.size() (+ NUM_KPS*3 for pose), reading the channel count from the cached v7 output Shape (m_outputShapes[0][1]) instead of the v6 getOutputDims(), and throws if neither matches.

All other touched files (CMakeLists.txt, cmd_line_util.h, yolov8.h, benchmark.cpp, the three object_detection_*.cpp, and the new object_detection_csi_jetson.cpp) auto-merged cleanly: main's trtModelPath threading composes with the v7 Precision/Engine/Tensor changes, and the new CSI-Jetson executable already uses the 3-arg constructor + parseArgumentsVideo signatures that the migration produced.

Bumped the tensorrt-cpp-api submodule from the rc1 commit 6a6d4dd to the squash-merged v7 main 166ce91 (PR #95). The public headers the app uses are byte-identical between the two; only CI workflows and one unused allocator member differ.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
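The replacement for the literal 56 is channel-count arithmetic. Assuming 17 COCO keypoints, a one-class pose model has 4 + 1 + 17*3 = 56 channels, while an 80-class COCO detector has 4 + 80 = 84. A minimal sketch (Head and headFromChannels are hypothetical names; per the commit, the app reads the channel count from m_outputShapes[0][1]). Detect is checked first so that numKeypoints == 0 is not misread as pose. Segmentation is assumed to be recognized elsewhere (e.g. by its extra mask output) and is out of scope here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

enum class Head { Detect, Pose };

// Per-anchor channel layout of a single-output YOLOv8 head:
//   detect: 4 box values + one score per class
//   pose:   4 box values + class scores + 3 values (x, y, confidence) per keypoint
Head headFromChannels(std::size_t channels, std::size_t numClasses, std::size_t numKeypoints) {
    if (channels == 4 + numClasses) return Head::Detect;
    if (channels == 4 + numClasses + numKeypoints * 3) return Head::Pose;
    throw std::runtime_error("unexpected output channel count: " + std::to_string(channels));
}
```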
The object_detection_csi_jetson.cpp tool added on main relied on <opencv2/highgui.hpp> and <opencv2/videoio.hpp> being pulled in transitively via the v6 "engine.h". Under tensorrt_cpp_api v7 the public headers deliberately include no OpenCV, so cv::VideoCapture / cv::imshow / cv::waitKey no longer resolved and the target failed to compile. Include highgui and videoio explicitly (matching how the migration fixed the other tools) and drop the unused <opencv2/cudaimgproc.hpp> include (this tool does no cv::cuda imgproc; it uploads a cv::Mat inside detectObjects). Verified with -fsyntax-only against the full OpenCV-CUDA headers: clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
detect_object_image detects ~10 objects on team.jpg; the exact count shifts by one or two across FP16 engine rebuilds (TensorRT tactic selection near the confidence threshold), so state it as approximate rather than pinning an exact count. Also note the video/CSI demo targets need an OpenCV with highgui/videoio. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Ports YOLOv8-TensorRT-CPP from v6 tensorrt-cpp-api (templated Engine<T>, OpenCV in the signature, nested-vector IO) to v7 (namespace trtcpp, no-throw Status/Result, name-keyed tensors, fused GPU preprocessing). Depends on tensorrt-cpp-api #95; the libs/tensorrt-cpp-api submodule is pinned to the merged v7 main.

Changes
- Engine<float> -> trtcpp::Engine; Options/buildLoadNetwork -> BuildOptions + EngineBuilder::buildAndLoad.
- Preprocessing: preproc::letterboxToTensor over a zero-copy trtcpp::opencv::viewOf(GpuMat) view (BGR->RGB, letterbox, /255), replacing OpenCV cvtColor/resize + in-engine normalize. (A pitched GpuMat is copied into a cv::cuda::createContinuous buffer first.)
- runInference + transformOutput -> name-keyed engine.infer(...) + toHost(stream); the detect/pose/seg post-processing math is unchanged.
- Result unwraps throw (via a must() helper) instead of asserting in debug builds or hitting undefined behavior in release builds.
- Precision::FP32/FP16/INT8 -> trtcpp::Precision::kFp32/kFp16/kInt8Qdq (INT8 caveat in MIGRATION.md).
- CMake: C++20, namespaced targets + WITH_OPENCV/BUILD_PREPROC; local stopwatch.h.

Synced with main
This branch is merged up to current main (which landed PRs #69/#70/#71/#74). The only content conflict, src/yolov8.cpp, was resolved by keeping the v7 form while folding in main's two new features:
- Prebuilt-engine loading (PR #69, trtModelPath arg): re-expressed in v7 as trtcpp::Engine::loadFromFile(...) when no ONNX path is given, vs EngineBuilder::buildAndLoad(...) for the ONNX path.
- Pose vs. detect without a magic number (PR #71): branches on the channel count from the cached v7 output Shape (4 + classes + kpts*3 for pose vs 4 + classes for detect) instead of a literal.

Also fixed object_detection_csi_jetson.cpp (new in main): it relied on highgui/videoio being pulled in transitively by the v6 engine.h. v7's public headers don't leak OpenCV, so those includes are now explicit.

Verified
Built and run on an RTX 3080 (CUDA-12.6 OpenCV-CUDA + v7):
detect_object_image on images/team.jpg (FP16 YOLOv8n) runs end-to-end and detects the people in the frame (~10 objects; the exact count shifts by one or two across FP16 engine rebuilds, as TensorRT tactic selection moves borderline detections across the confidence threshold). The library + detect_object_image build and run; the video/CSI demo targets additionally need an OpenCV with highgui/videoio.

After pulling, run git submodule update --init. See MIGRATION.md.

🤖 Generated with Claude Code