Optimize metric label cloning#22406
Conversation
|
run benchmarks |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize-metric-label-clone (e049019) to c8b784a (merge-base) diff using: tpcds File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize-metric-label-clone (e049019) to c8b784a (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize-metric-label-clone (e049019) to c8b784a (merge-base) diff using: tpch File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpch — base (merge-base)
tpch — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
It shows similar performance improvements as lazy registration. |
alamb
left a comment
There was a problem hiding this comment.
I love it -- this is great @xudong963 🙏
I left some suggestions but I think this PR could be merged as is.
It seems like there may be a few more places where labels are created (like this) that we could (as a follow on PR) use the same trick:
datafusion/datafusion/physical-plan/src/repartition/mod.rs
Lines 1076 to 1080 in dc80bd7
| ) -> Self { | ||
| let name = name.into(); | ||
| let value = value.into(); | ||
| let name = LabelValue::from(name.into()); |
There was a problem hiding this comment.
this is quite a nice way to avoid any public API changes 👍
| fn from(value: Cow<'static, str>) -> Self { | ||
| match value { | ||
| Cow::Borrowed(value) => Self::Static(value), | ||
| Cow::Owned(value) => Self::Shared(Arc::from(value)), |
There was a problem hiding this comment.
This line is my only concern -- it requires an extra allocation (for the Arc) and deallocation (for the String) now.
We can probably remove that overhead as a follow on PR eventually.
There was a problem hiding this comment.
OKay, I'll try in a seperate PR
| @@ -533,19 +536,19 @@ impl Label { | |||
| name: impl Into<Cow<'static, str>>, | |||
| value: impl Into<Cow<'static, str>>, | |||
There was a problem hiding this comment.
We could potentially avoid the extra Arc allocation below by changing this sigature to
pub fn new(
name: impl Into<LabelValue>,
value: impl Into<LabelValue>,And then implement From<'static str> for LabelValue, From<Arc<str>> for LabelValue and From<String> for LabelValue and From<Cow<...>> for LabelValue
🤔
That way the API is the same, but you could create the Arc directly at the callsite
|
FYI @Dandandan -- this is a great find / way to improve the performance of a bunch of the already really fast clickbench queries |
|
run benchmark clickbench_partitioned |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize-metric-label-clone (e049019) to c8b784a (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
e049019 to
3a4b942
Compare
e344f5e to
fe340a8
Compare
|
run benchmark clickbench_partitioned |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize-metric-label-clone (fe340a8) to c8b784a (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
run benchmark clickbench_partitioned |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing optimize-metric-label-clone (fe340a8) to c8b784a (merge-base) diff using: clickbench_partitioned File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
File an issue against this benchmark runner |
|
One thing I find interesting is that improvement of delaying metric creation you did in #22353 seems to have a bigger positive improvement: As in there is more overhead to these metrics than just the string allocation it seems 🤔 |
Which issue does this PR close?
Rationale for this change
ParquetFileMetrics::newregisters many per-file metrics with the samefilenamelabel. Before this PR, each metric built its own owned filename label withfilename.to_string(), which repeatedly copied the same dynamicstring during parquet scan setup.
This PR keeps parquet metrics eagerly registered, so
ExecutionPlan::metrics()visibility during execution is unchanged, while reducing repeated label string allocation and copying.What changes are included in this PR?
Labelname/value strings behindArc<str>internally, whilekeeping borrowed static label strings allocation-free.
filenamelabel across the per-file parquet metrics inParquetFileMetrics::new.and display the same way.
Are these changes tested?
Yes.
Are there any user-facing changes?
No. Metric registration timing and displayed label values are unchanged.