Migrate to tensorrt_cpp_api v7 #2
Merged
Port the inference layer from the v6 templated Engine<T> API to v7 (namespace trtcpp, no-throw Status/Result, name-keyed tensors, fused GPU preprocessing). YOLOv9 here is detection-only.

- Engine<float> -> trtcpp::Engine; Options/buildLoadNetwork -> BuildOptions/EngineBuilder::buildAndLoad.
- Preprocessing now uses the v7 fused kernel preproc::letterboxToTensor over a zero-copy opencv::viewOf(GpuMat) device view (BGR->RGB, letterbox, /255), replacing the OpenCV cvtColor/resize + in-engine normalize.
- runInference + transformOutput -> engine.infer (name-keyed) + toHost(stream); the detection post-processing math is unchanged (only dim access and output buffering moved to the v7 Shape / flat Tensor API).
- Precision::FP32/FP16/INT8 -> trtcpp::Precision::kFp32/kFp16/kInt8Qdq (INT8 caveat in MIGRATION.md).
- CMake: C++20, namespaced targets + WITH_OPENCV/BUILD_PREPROC; added a local stopwatch.h and explicit OpenCV/<iostream> includes that v6 pulled in transitively via engine.h.

Requires the libs/tensorrt-cpp-api submodule at v7.0.0+. Syntax-checked against the real v7 + OpenCV headers (including the benchmark path); NOT yet compiled, linked, or run (prepared on a host with a broken OpenCV-CUDA install). Build and run before merging. See MIGRATION.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
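The fused preprocessing described above (BGR->RGB, letterbox, /255) can be pinned down with a small CPU reference, which may help when spot-checking the GPU tensor on a single image. This is a minimal sketch, not trtcpp's implementation: `computeLetterbox` and `letterboxToChwRgb` are hypothetical names, and nearest-neighbour sampling, top-left anchoring with right/bottom padding, and a pad value of 0 are assumptions that may not match `preproc::letterboxToTensor`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for BGR->RGB + letterbox + /255 into a planar
// (CHW) float tensor. ASSUMPTIONS (not taken from trtcpp): nearest-neighbour
// sampling, image anchored top-left, padding on the right/bottom with 0.
struct Letterbox {
    float scale;     // source -> destination scale factor
    int newW, newH;  // size of the resized image inside the dst canvas
};

inline Letterbox computeLetterbox(int srcW, int srcH, int dstW, int dstH) {
    // Keep the aspect ratio: the tighter dimension decides the scale.
    const float scale = std::min(static_cast<float>(dstW) / srcW,
                                 static_cast<float>(dstH) / srcH);
    const int newW = std::min(dstW, static_cast<int>(std::round(srcW * scale)));
    const int newH = std::min(dstH, static_cast<int>(std::round(srcH * scale)));
    return {scale, newW, newH};
}

// src: interleaved 8-bit BGR with tightly packed rows (srcW * 3 bytes each).
// Returns 3 * dstH * dstW floats in R, G, B plane order, scaled to [0, 1].
inline std::vector<float> letterboxToChwRgb(const std::vector<std::uint8_t>& src,
                                            int srcW, int srcH, int dstW, int dstH) {
    const Letterbox lb = computeLetterbox(srcW, srcH, dstW, dstH);
    const std::size_t plane = static_cast<std::size_t>(dstW) * dstH;
    std::vector<float> dst(3 * plane, 0.0f);  // padding stays 0
    for (int y = 0; y < lb.newH; ++y) {
        // Nearest source row for this destination row.
        const int sy = std::min(srcH - 1, static_cast<int>(y / lb.scale));
        for (int x = 0; x < lb.newW; ++x) {
            const int sx = std::min(srcW - 1, static_cast<int>(x / lb.scale));
            const std::uint8_t* px =
                &src[(static_cast<std::size_t>(sy) * srcW + sx) * 3];
            const std::size_t i = static_cast<std::size_t>(y) * dstW + x;
            dst[0 * plane + i] = px[2] / 255.0f;  // R is BGR index 2
            dst[1 * plane + i] = px[1] / 255.0f;  // G
            dst[2 * plane + i] = px[0] / 255.0f;  // B is BGR index 0
        }
    }
    return dst;
}
```

Because the padding placement is assumed, compare only the resized region (the first `newH` rows and `newW` columns of each plane) against the GPU output unless you confirm how the kernel pads.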
From the code-review pass: replace the unchecked .value() calls (tensorShape, Tensor::allocate, TensorView::as<float>) with a must() helper that throws std::runtime_error on error. Calling .value() on an error Result asserts in debug builds and is undefined behavior in this app's -DNDEBUG release builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
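The must() pattern from this commit can be sketched in isolation. The `Result`/`Error` types below are hypothetical stand-ins so the block compiles on its own; they do not reproduce trtcpp's actual `Result` interface, and the `what` context string is an illustrative addition.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins for a no-throw Result type (NOT trtcpp's API).
struct Error {
    std::string message;
};

template <typename T>
class Result {
public:
    static Result ok(T value) { return Result(std::move(value)); }
    static Result err(std::string msg) { return Result(Error{std::move(msg)}); }

    bool hasValue() const { return std::holds_alternative<T>(state_); }
    // In this sketch std::get throws std::bad_variant_access on an error;
    // per the commit, the real value() asserts (debug) or is UB (release).
    T& value() & { return std::get<T>(state_); }
    const Error& error() const { return std::get<Error>(state_); }

private:
    explicit Result(T v) : state_(std::move(v)) {}
    explicit Result(Error e) : state_(std::move(e)) {}
    std::variant<T, Error> state_;
};

// Unwrap a Result, converting an error into std::runtime_error so the
// failure is reported the same way in debug and -DNDEBUG builds.
template <typename T>
T must(Result<T> r, const char* what) {
    if (!r.hasValue()) {
        throw std::runtime_error(std::string(what) + ": " + r.error().message);
    }
    return std::move(r.value());
}
```

The design choice is to fail loudly, with a message naming the call site, rather than depend on an assertion that `-DNDEBUG` compiles out.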
…ource) This repo had the v6 tensorrt-cpp-api committed as in-tree source files under libs/, even though .gitmodules declares it a submodule. Convert it to a proper submodule gitlink pointing at tensorrt-cpp-api v7.0.0-rc1 (matching .gitmodules and the YOLOv8 repo layout), so the v7-migrated YoloV9 code builds. Run `git submodule update --init` after pulling. See MIGRATION.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Built and ran the migration on an RTX 3080 against OpenCV-CUDA + tensorrt_cpp_api v7: detect_object_image on cars.jpg (FP16 yolov9-e) detects 24 objects.

Fixes found by that real build/run:
- The preprocessing step fed a cv::cuda::GpuMat directly to opencv::viewOf, but a GpuMat's rows are usually pitched (padded) while a TensorView is contiguous, so viewOf rejected it at runtime. Copy into a cv::cuda::createContinuous buffer when the upload isn't already continuous.
- Removed the now-dead <opencv2/cudaimgproc.hpp> includes (only cv::cuda::GpuMat from core/cuda is still used).

Bump the libs/tensorrt-cpp-api submodule to the v7 release that builds the static core as PIC (required to link libYoloV9_TRT.so). MIGRATION.md updated to reflect the verified build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
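The pitch problem can be illustrated on the host. A row holds `rowBytes` bytes of pixel data but occupies `pitchBytes` bytes in memory; a contiguous view needs the two to be equal, which is the condition `cv::cuda::GpuMat::isContinuous()` reports. The sketch below packs pitched rows into a tight buffer, which is the effect of copying into a `cv::cuda::createContinuous` allocation; `isContiguous` and `packRows` are hypothetical helpers, not part of OpenCV or trtcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A buffer is contiguous only if no row carries trailing padding.
inline bool isContiguous(std::size_t rowBytes, std::size_t pitchBytes) {
    return pitchBytes == rowBytes;
}

// Copy `rows` rows of `rowBytes` pixel bytes out of a pitched source
// (row r starts at src + r * pitchBytes) into a tightly packed buffer.
inline std::vector<std::uint8_t> packRows(const std::uint8_t* src, std::size_t rows,
                                          std::size_t rowBytes, std::size_t pitchBytes) {
    std::vector<std::uint8_t> dst(rows * rowBytes);
    for (std::size_t r = 0; r < rows; ++r) {
        // Skip the padding bytes at the end of each source row.
        std::memcpy(dst.data() + r * rowBytes, src + r * pitchBytes, rowBytes);
    }
    return dst;
}
```

Per-row copying is only needed when `isContiguous` is false; a buffer that is already continuous can be viewed directly, which matches the commit's "when the upload isn't already continuous" condition.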
Point the submodule at the squash-merged v7 main of tensorrt-cpp-api (PR #95) instead of the rc1 commit 6a6d4dd, matching the YOLOv8 migration PR. The public headers the app uses are byte-identical between the two commits; only CI workflows and one unused allocator member differ. Rebuilt and re-ran: detect_object_image on images/cars.jpg with FP16 yolov9-e detects 26 objects. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
detect_object_image detects ~26 objects on cars.jpg; the exact count shifts by one or two across FP16 engine rebuilds (TensorRT tactic selection near the confidence threshold), so state it as approximate rather than pinning an exact count. Also note the video demo target needs an OpenCV with highgui/videoio. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
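To see why the count is approximate, consider the confidence filter that runs on the flat output buffer. The sketch below assumes the common YOLOv8/YOLOv9 detection-head layout `[1, 4 + numClasses, numAnchors]` (box rows `cx, cy, w, h`, then one score row per class); that layout, the strict `>` comparison, and the names `Candidate`/`decodeAboveThreshold` are assumptions, not code from this repo. An anchor whose best score lands within rounding distance of `confThreshold` can be kept by one FP16 engine and dropped by another.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical decoded candidate (before NMS).
struct Candidate {
    int anchor;
    int classId;
    float score;
    float cx, cy, w, h;
};

// ASSUMED layout: row-major [1, 4 + numClasses, numAnchors].
inline std::vector<Candidate> decodeAboveThreshold(const std::vector<float>& out,
                                                   int numClasses, int numAnchors,
                                                   float confThreshold) {
    // Element (channel c, anchor a) lives at out[c * numAnchors + a].
    auto at = [&](int c, int a) {
        return out[static_cast<std::size_t>(c) * numAnchors + a];
    };
    std::vector<Candidate> kept;
    for (int a = 0; a < numAnchors; ++a) {
        // Pick the highest-scoring class for this anchor.
        int best = 0;
        for (int k = 1; k < numClasses; ++k) {
            if (at(4 + k, a) > at(4 + best, a)) best = k;
        }
        const float score = at(4 + best, a);
        // A score sitting right at the threshold can flip between engine
        // builds, changing the final detection count by one or two.
        if (score > confThreshold) {
            kept.push_back({a, best, score, at(0, a), at(1, a), at(2, a), at(3, a)});
        }
    }
    return kept;  // NMS would normally follow; omitted here.
}
```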
What
Ports YOLOv9-TensorRT-CPP from v6 `tensorrt-cpp-api` (templated `Engine<T>`, OpenCV in the signature, nested-vector IO) to v7 (namespace `trtcpp`, no-throw `Status`/`Result`, name-keyed tensors, fused GPU preprocessing). Depends on tensorrt-cpp-api #95; the `libs/tensorrt-cpp-api` submodule is pinned to the merged v7 `main`.

Changes

- `Engine<float>` -> `trtcpp::Engine`; `Options`/`buildLoadNetwork` -> `BuildOptions` + `EngineBuilder::buildAndLoad`.
- Preprocessing: `preproc::letterboxToTensor` over a zero-copy `trtcpp::opencv::viewOf(GpuMat)` view (BGR->RGB, letterbox, /255), replacing OpenCV `cvtColor`/`resize` + in-engine normalize. (A pitched `GpuMat` is first copied into a `createContinuous` buffer.)
- `runInference` + `transformOutput` -> name-keyed `engine.infer(...)` + `toHost(stream)`; the detection post-processing math is unchanged.
- `Result` unwraps throw instead of asserting, or invoking UB in release builds.
- `Precision::FP32/FP16/INT8` -> `trtcpp::Precision::kFp32/kFp16/kInt8Qdq` (INT8 caveat in MIGRATION.md).
- CMake: `WITH_OPENCV`/`BUILD_PREPROC`; local `stopwatch.h`.
- `libs/tensorrt-cpp-api` had been committed as in-tree v6 source files; converted to a proper v7 submodule gitlink (run `git submodule update --init` after pulling).

Verified
Built and run on an RTX 3080 (OpenCV-CUDA built against CUDA 12.6, plus tensorrt-cpp-api v7):

- `detect_object_image` on `images/cars.jpg` (FP16 yolov9-e) runs end-to-end and detects the vehicles in the frame (~26 objects; the exact count shifts by one or two across FP16 engine rebuilds, as TensorRT tactic selection moves borderline detections across the confidence threshold).
- The library and `detect_object_image` build and run; the video demo target additionally needs an OpenCV built with `highgui`/`videoio`.
- No merge conflicts (the branch is clean against `main`).

See MIGRATION.md.

🤖 Generated with Claude Code