Reduce contention in attribute metadata caches by Symmetricity · Pull Request #25 · Symmetricity/tilemaker

Symmetricity · 2026-06-04T10:13:39Z

This PR is AI generated.

Summary

This PR is stacked on systemed#902 and should be reviewed after systemed#902.

It reduces lock contention and repeated metadata work in two hot paths used
while writing OpenMapTiles attributes:

- increase the small thread-local AttributeStore lookup caches from 64 to 256
  entries
- cache vector-layer metadata updates per worker thread so repeated writes of
  the same layer/key/type do not repeatedly take vectorLayerMetadataMutex

On a warmed Austria OpenMapTiles run stacked on systemed#902, this reduced wall time by
about 4.9%, system CPU by about 43%, voluntary context switches by about 44%,
and peak RSS by about 0.9%.

Background

PR systemed#902 reduces allocation churn in the clipping path. After that change, a
large remaining cost in profiling was synchronization around small repeated
attribute and layer-metadata updates.

AttributeStore already has small thread-local direct-mapped caches to avoid
falling through to the shared sharded stores. This PR keeps that existing
approach, but grows each cache from 64 to 256 entries so that common attribute
pairs and sets are evicted by slot collisions less often.
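
To illustrate why a larger direct-mapped cache helps, here is a minimal sketch. It is not tilemaker's code: `DirectMappedCache`, `firstSurvives`, and the modulo slot function are hypothetical names and choices made for illustration, and the real AttributeStore caches may pick slots differently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical direct-mapped cache: every key maps to exactly one slot,
// and a colliding key overwrites the previous occupant of that slot.
template <std::size_t Slots>
class DirectMappedCache {
    struct Entry {
        std::uint64_t key;
        std::uint32_t index;
        bool valid;
    };
    std::array<Entry, Slots> entries_{};  // value-initialized: all slots invalid

public:
    // Returns true and fills `out` on a hit; returns false on a miss.
    bool find(std::uint64_t key, std::uint32_t& out) const {
        const Entry& e = entries_[key % Slots];
        if (!e.valid || e.key != key) return false;
        out = e.index;
        return true;
    }

    // Stores key -> index, evicting whatever was in the slot before.
    void put(std::uint64_t key, std::uint32_t index) {
        entries_[key % Slots] = Entry{key, index, true};
    }
};

// Inserts key `a`, then key `b`, and reports whether `a` is still cached.
template <std::size_t Slots>
bool firstSurvives(std::uint64_t a, std::uint64_t b) {
    DirectMappedCache<Slots> cache;  // the real caches would be thread_local
    cache.put(a, 1);
    cache.put(b, 2);
    std::uint32_t ignored = 0;
    return cache.find(a, ignored);
}
```

Under these assumptions, keys 5 and 69 share slot 5 in a 64-slot cache, so the second insert evicts the first. In a 256-slot cache they land in slots 5 and 69 and both stay cached, which is the kind of collision pattern the larger size is meant to survive.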

OsmLuaProcessing::setVectorLayerMetadata() records the type of each
layer/key pair for the final vector_layers metadata table. The shared write
still needs a mutex, but most calls repeat the same layer/key/type combination
many times from the same worker thread. This PR skips the mutex path for those
same-thread exact repeats.

Implementation

The AttributeStore change is deliberately small:

- cachedAttributePairPointers and cachedAttributePairIndexes grow from 64
  to 256 entries per thread
- cachedAttributeSetPointers and cachedAttributeSetIndexes grow from 64 to
  256 entries per thread
- the lookup behavior, shard selection, and locking behavior are otherwise
  unchanged

The vector-layer metadata change adds a thread-local cache keyed by layer and
attribute name:

- the cache is reset if the worker starts using a different LayerDefinition
  instance
- exact same-thread repeats of the same layer/key/type return before taking
  vectorLayerMetadataMutex
- first sightings and type changes still update the shared
  layers.layers[layer].attributeMap under the existing mutex

This keeps the shared metadata map as the source of truth and does not remove
the synchronization that protects actual shared writes.
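
The fast path described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual diff: `LayerDefinitionSketch`, `setMetadata`, `metadataMutex`, `lockedUpdates`, and the use of `int` for layer ids and types are all hypothetical stand-ins for tilemaker's real types.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the shared layer definitions; only the shared
// (layer, key) -> type map matters for this sketch.
struct LayerDefinitionSketch {
    std::map<std::pair<int, std::string>, int> attributeTypes;
};

std::mutex metadataMutex;  // stands in for vectorLayerMetadataMutex
int lockedUpdates = 0;     // counts trips through the locked path

// Records the type of (layer, key). Returns true if the shared map was
// updated under the mutex, false if the thread-local cache skipped the lock.
bool setMetadata(LayerDefinitionSketch& layers, int layer,
                 const std::string& key, int type) {
    thread_local const LayerDefinitionSketch* cachedOwner = nullptr;
    thread_local std::map<std::pair<int, std::string>, int> seen;

    // Switching to a different instance invalidates everything cached.
    if (cachedOwner != &layers) {
        seen.clear();
        cachedOwner = &layers;
    }

    const auto cacheKey = std::make_pair(layer, key);
    const auto it = seen.find(cacheKey);
    if (it != seen.end() && it->second == type) {
        return false;  // exact same-thread repeat: no lock taken
    }

    {
        // First sighting or type change: shared write stays under the lock.
        std::lock_guard<std::mutex> lock(metadataMutex);
        layers.attributeTypes[cacheKey] = type;
        ++lockedUpdates;
    }
    seen[cacheKey] = type;
    return true;
}
```

In this sketch the instance check compares addresses, so it assumes a LayerDefinition is not destroyed and replaced by another object at the same address while worker threads are alive. The shared map remains the only place results are read from; the per-thread map only decides whether the lock is worth taking.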

Performance

I compared clean systemed#902 with clean systemed#902 plus only these two cache changes.

Runtime fixture:

fixture: Austria Geofabrik extract
profile: OpenMapTiles profile
output: PMTiles
store: no --store
threads: 8
warmup: PBF plus coastline/landcover sidecar files
method: six alternating warmed runs with /usr/bin/time -v

Combined result:

metric	systemed#902 mean	systemed#902 + this PR mean	delta	delta %
wall time	32.795 s	31.177 s	-1.618 s	-4.93%
user CPU	189.337 s	198.053 s	+8.717 s	+4.60%
system CPU	41.465 s	23.575 s	-17.890 s	-43.14%
peak RSS	4,523,051 KiB	4,480,713 KiB	-42,337 KiB	-0.94%
minor faults	1,223,322	1,222,723	-599	-0.05%
filesystem outputs	1,111,761	1,111,955	+193	+0.02%
voluntary context switches	333,560	185,866	-147,695	-44.28%
involuntary context switches	2,438	2,132	-306	-12.54%

The user-CPU total increased, but the system-time and context-switch reductions
more than offset that in wall-clock time on this fixture.

Possible Regressions

The intended output behavior is unchanged.

Potential costs:

- each of the four AttributeStore thread-local pointer/index arrays retains
  192 more entries (256 instead of 64) per worker thread
- the vector-layer metadata cache keeps a small per-thread map of layer/key
  types already seen by that worker
- if a profile intentionally writes the same layer/key with different types,
  type changes still go through the existing mutex and update the shared map

This PR does not make the final metadata type-resolution semantics stricter or
more deterministic. It only avoids repeated same-thread writes that would have
stored the same type again.

Testing

Build:

cmake -S . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build --parallel 8

Code style:

git diff --check

Performance testing used the warmed Austria OpenMapTiles PMTiles run described
above.

Semantic checks were run during candidate acceptance on the submitted/open PR
stack used for output comparisons:

fixture: Liechtenstein Geofabrik extract
profile: OpenMapTiles profile

AttributeStore TLS cache size:
  baseline repeat changed_tiles 0
  candidate repeat changed_tiles 0
  baseline -> candidate changed_tiles 0

Vector layer metadata cache:
  baseline repeat changed_tiles 0
  candidate repeat changed_tiles 0
  baseline -> candidate changed_tiles 0

The latest timing table above is narrower: it compares clean systemed#902 with clean
systemed#902 plus only these two cache candidates, so the performance result is not
polluted by other open correctness changes.

Attribute pair and set lookups are very frequent during profile processing.
The existing 64-entry thread-local caches leave many repeated lookups to fall
back to the shared stores on larger fixtures.

Use a still-small 256-entry cache so more hot lookups are served locally
without changing generated tile semantics. On the Austria fixture this was
neutral to slightly faster and reduced native RSS in warmed alternating runs.

Co-authored-by: Codex <noreply@openai.com>

Layer metadata is shared across Lua processing threads and still needs the
mutex added by the thread-safety fix in PR systemed#761. On larger fixtures,
repeated Attribute() calls now make that coarse lock a major futex source.

Remember metadata types already recorded by the current thread for the active
LayerDefinition, and skip the lock only for exact same-type repeats. This
preserves the protected shared map update for first sightings and type changes
while avoiding redundant lock traffic.

On the Austria fixture this reduced futex syscalls by about 96%, voluntary
context switches by about 89%, and wall time by about 4% without changing
decoded Liechtenstein output.

Co-authored-by: Codex <noreply@openai.com>

Symmetricity · 2026-06-04T15:21:42Z

Closed as superseded by upstream split PRs: systemed#913 and systemed#914.

Symmetricity and others added 2 commits June 4, 2026 11:52

Symmetricity closed this Jun 4, 2026

Symmetricity deleted the perf/pr902-cache-candidates branch June 4, 2026 15:22

Symmetricity wants to merge 2 commits into perf/reduce-allocation-churn from
perf/pr902-cache-candidates