
feat: support geojson nested objects#6992

Open
HarelM wants to merge 24 commits into main from fast-serializing

Conversation

@HarelM
Collaborator

HarelM commented Jan 21, 2026

This adds a custom binary serializer to support nested properties in GeoJSON.
I've looked into CBOR, MessagePack, BSON and other binary serializers for JSON objects, and they increase the bundle size too much.
geobuf encodes GeoJSON, but the data in the GeoJSON worker, despite its name, is tile-like data that is not in GeoJSON format, mainly because it uses tile-space coordinates rather than WGS84.

Another approach I considered is adding a flag to the vt-pbf serialization so that the JSON.stringify call there prefixes its output with a special string. Later, we can simply call JSON.parse on values that carry the prefix and return the result. This removes the need for custom serialization code and should not impact performance much.
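A minimal sketch of the prefix idea, assuming a hypothetical marker value for `JSON_PREFIX` (the actual string chosen for vt-pbf is not shown in this thread) and hypothetical helper names `encodeProperties` and `decodeProperties`. The worker side turns object and array values into prefixed JSON strings before they are written as MVT string values; the main thread parses only values that carry the prefix.

```typescript
// Hypothetical marker; the string actually used by vt-pbf/MapLibre may differ.
const JSON_PREFIX = '__json__:';

// Value types an MVT feature property can hold.
type TileValue = string | number | boolean;

// Worker side: primitives pass through unchanged, while objects and arrays
// (which MVT cannot store) become prefixed JSON strings.
function encodeProperties(properties: Record<string, unknown>): Record<string, TileValue> {
    const encoded: Record<string, TileValue> = {};
    for (const key in properties) {
        const value = properties[key];
        if (typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
            encoded[key] = value;
        } else if (value !== null && typeof value === 'object') {
            encoded[key] = JSON_PREFIX + JSON.stringify(value);
        }
        // Anything else (null, undefined, functions) is skipped in this sketch;
        // the MVT Value type has no null.
    }
    return encoded;
}

// Main thread: parse only values that carry the prefix; everything else is returned as-is.
function decodeProperties(properties: Record<string, TileValue>): Record<string, unknown> {
    const decoded: Record<string, unknown> = {};
    for (const key in properties) {
        const value = properties[key];
        decoded[key] = typeof value === 'string' && value.startsWith(JSON_PREFIX)
            ? JSON.parse(value.slice(JSON_PREFIX.length))
            : value;
    }
    return decoded;
}
```

One consequence of this design: an ordinary string property that happens to start with the marker would be misread as JSON, so the marker should be something real data is very unlikely to contain.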

I've benchmarked the serializer against the existing one, and it's about 3x faster in the test I ran, but judging by the existing benchmark I don't think it changes overall GeoJSON performance much.

@mwilsnd @lucaswoj @wayofthefuture
I would appreciate your thoughts here as I'm a bit stuck in terms of which option to use.

Test failures: initially I thought the serializer was causing the tests to fail, but when I looked into it, I saw that the serializer works "too well": it serializes and deserializes to exactly the same object. For MultiPoint, this was not the case with the fromVectorTileJs method, which changed the input so that, when deserialized on the main thread, the MultiPoint points array changed from one array of two points to two arrays of one point each. This is the reason for the query test failure.
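To make the shape mismatch concrete: in vector-tile style geometry a MultiPoint is typically held as one single-point array per point, while GeoJSON MultiPoint coordinates are a flat array of positions. A minimal sketch of converting between the two, with a hypothetical `Position` tuple standing in for tile-space point objects and hypothetical function names:

```typescript
// Hypothetical stand-in for a tile-space point.
type Position = [number, number];

// Tile-style MultiPoint: one "ring" per point, e.g. [[[0, 0]], [[10, 10]]].
type MultiPointRings = Position[][];

// Flatten to GeoJSON MultiPoint coordinates, e.g. [[0, 0], [10, 10]].
function ringsToMultiPoint(rings: MultiPointRings): Position[] {
    const coordinates: Position[] = [];
    for (const ring of rings) {
        for (const point of ring) {
            coordinates.push(point);
        }
    }
    return coordinates;
}

// Inverse: wrap each position in its own single-point ring.
function multiPointToRings(coordinates: Position[]): MultiPointRings {
    return coordinates.map((point): Position[] => [point]);
}
```

A serializer that round-trips exactly preserves whichever of these two shapes it is handed, so if the feature was already rewrapped into rings, the main thread receives two arrays of one point.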

Launch Checklist

  • Confirm your changes do not include backports from Mapbox projects (unless with compliant license) - if you are not sure about this, please ask!
  • Briefly describe the changes in this PR.
  • Link to related issues.
  • Write tests for all new functionality.
  • Document any changes to public APIs.
  • Post benchmark scores.
  • Add an entry to CHANGELOG.md under the ## main section.

@wayofthefuture
Collaborator

wayofthefuture commented Jan 21, 2026

@HarelM I wish I could help but I believe this is too advanced for me

@mwilsnd
Collaborator

mwilsnd commented Jan 21, 2026

Using a reserved string to handle nested properties via regular JSON would likely be fine if we stuck with MVT.

If the existing parser from vt-pbf was eliminated, what would our bundle size look like? If we can also reduce bundle size then I'm liking the approach here - we can fully decouple GeoJSON processing from MVT limitations. We could also sunset vt-pbf and just bring over the few remaining bits we still need like the GeoJSONWrapper.

@HarelM
Collaborator Author

HarelM commented Jan 21, 2026

I tried removing all the references to vt-pbf to see the effect on bundle size, and it is negligible: the code there simply uses pbf, so there's very little of it, even less than what I added here.
The code I added performs better, but as can be seen it does not really move the needle when it comes to GeoJSON serialization.
So the main question here is whether to make our vt-pbf GeoJSON-friendly or to remove it.
I'm fine with both options, though the code I wrote is a lot less battle-tested than MVT, so there's that too...

@wayofthefuture
Collaborator

When you ran your benchmarks on updateData how did you wait for the pending transmission of the pbf back to the main thread? Did you use the waitForCompletion option?

@HarelM
Collaborator Author

HarelM commented Feb 3, 2026

I used the existing benchmarks, which use map idle event as far as I remember.

@wayofthefuture
Copy link
Copy Markdown
Collaborator

That should do it. It's surprising to me that at 3x you aren't seeing any gain.

@HarelM
Copy link
Copy Markdown
Collaborator Author

HarelM commented Feb 3, 2026

I don't think it's a bottleneck. It might save a few milliseconds, but that isn't noticeable overall, I guess; I'm not sure...

@wayofthefuture
Collaborator

Maybe give it a try with 1,000 versus 100,000 points?

@lucaswoj
Contributor

lucaswoj commented Feb 4, 2026

> I used the existing benchmarks, which use map idle event as far as I remember.

Just a heads up — the existing benchmarks use features that have no properties so they don't really exercise this new codepath.
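Since the existing benchmarks use features without properties, a benchmark for this path would need data that carries nested values. A minimal sketch of a hypothetical generator (`makeNestedFeatureCollection` is not part of the existing benchmark suite) that could be run at the 1,000 versus 100,000 point sizes suggested above:

```typescript
interface PointFeature {
    type: 'Feature';
    geometry: { type: 'Point'; coordinates: [number, number] };
    properties: Record<string, unknown>;
}

interface PointFeatureCollection {
    type: 'FeatureCollection';
    features: PointFeature[];
}

// Build `count` point features whose properties include an array and a nested
// object, so property serialization is actually exercised.
function makeNestedFeatureCollection(count: number): PointFeatureCollection {
    const features: PointFeature[] = [];
    for (let i = 0; i < count; i++) {
        features.push({
            type: 'Feature',
            geometry: {
                type: 'Point',
                // Deterministic spread: lng in [-180, 179], lat in [-85, 84].
                coordinates: [(i % 360) - 180, ((i * 7) % 170) - 85]
            },
            properties: {
                id: i,
                name: `feature-${i}`,
                tags: ['a', 'b', String(i % 10)],
                meta: { rank: i % 5, source: { kind: 'synthetic' } }
            }
        });
    }
    return { type: 'FeatureCollection', features };
}
```

In a page with a MapLibre map, the result could be passed to a GeoJSON source via `setData`, with timing taken until the map's `idle` event, as the existing benchmarks do.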

@HarelM
Collaborator Author

HarelM commented Feb 4, 2026

True, although the serialization of geometries is also a part of this change, which is also interesting.

@lucaswoj
Contributor

lucaswoj commented Feb 4, 2026

Seems to me that it'd be significantly less code & more performant to use the copy of the GeoJSON that's already on the main thread rather than encoding, transferring, and decoding it.

The only trick is finding a stable ID to associate the vector tile features back to the GeoJSON features (if one isn't provided). That's a rather easy problem to solve, though.
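A minimal sketch of that idea under stated assumptions: the worker is assumed to report, for each query hit, the index of the originating feature in the data passed to `setData` (the hypothetical `TileHit` shape), and the main thread resolves those indices against the GeoJSON it already holds. Keying by position sidesteps missing or duplicate user-supplied ids, but it is only stable for one data snapshot. `resolveHits` and `TileHit` are illustrative names, not existing MapLibre APIs.

```typescript
interface GeoJSONFeature {
    type: 'Feature';
    id?: string | number;
    geometry: unknown;
    properties: Record<string, unknown> | null;
}

// Hypothetical shape of what the worker would return for each query hit.
interface TileHit {
    layerId: string;
    sourceIndex: number; // position of the originating feature in the source data
}

interface ResolvedHit {
    layerId: string;
    feature: GeoJSONFeature;
}

// Map worker hits back to the untouched main-thread features, so nested
// properties never need to be serialized. A feature spanning several tiles
// can be hit more than once, so hits are de-duplicated per layer.
function resolveHits(originals: GeoJSONFeature[], hits: TileHit[]): ResolvedHit[] {
    const seen = new Set<string>();
    const resolved: ResolvedHit[] = [];
    for (const hit of hits) {
        const key = `${hit.layerId}/${hit.sourceIndex}`;
        if (seen.has(key)) continue;
        seen.add(key);
        const feature = originals[hit.sourceIndex];
        if (feature !== undefined) {
            resolved.push({ layerId: hit.layerId, feature });
        }
    }
    return resolved;
}
```

The trade-off is that index keys must be invalidated whenever the data changes, for example after an incremental update; a stable user-provided id would be more robust there.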

@HarelM
Collaborator Author

HarelM commented Feb 4, 2026

As far as I understand, which might not be a lot, the main problem here is queryRenderedFeatures: you need to know which GeoJSON geometry is in the relevant tile.
This information does not exist in the GeoJSON object on the main thread, as it contains the entire GeoJSON and not just the relevant tile.
If the feature index is enough for this and the GeoJSON is mapped from feature index to the relevant GeoJSON feature, then sure, we can probably skip serialization etc., but I'm not sure it's that simple.
In any case, I'll be happy to discard this entire PR if there's a simpler solution.
The main problem this PR is intended to solve, from my point of view, is the limitation of MVT with regard to GeoJSON, for example the ability to get nested properties, which MVT does not support.
If this can happen without the need to "invent" a new serialization format, I'm all for it.

@lucaswoj
Contributor

lucaswoj commented Feb 4, 2026

> If feature index is enough for this and the geojson is mapped from feature index to the relevant geojson feature, then sure, we can probably skip serialization etc, but I'm not sure it's that simple.

I'm fairly certain this can work.

@wayofthefuture
Collaborator

wayofthefuture commented Feb 4, 2026

I might be lost here but I thought we were sending back a sliced tile with extremely simplified geometry at low zoom levels. Additionally, the advantage is that we are incrementally clipping and don't have to re-process feature simplification and re-clip untouched geometry unless the specific tile is affected by a feature - but then we are seeking bottom up versus top down here.

(deleted)

@lucaswoj
Copy link
Copy Markdown
Contributor

lucaswoj commented Feb 5, 2026

@wayofthefuture I think you're overestimating the scope of this change. The only consumers of this serialized JSON format are queryRenderedFeatures and querySourceFeatures. This new serializer doesn't change anything about how those methods handle geometries.

@wayofthefuture
Copy link
Copy Markdown
Collaborator

Yes you're right I've been reading the code and am starting to see what you guys are talking about. The vast majority of my time has been spent in the geojson worker.

@wayofthefuture
Copy link
Copy Markdown
Collaborator

wayofthefuture commented Feb 5, 2026

I wish we could put a separate loadTile method in geojson_worker_source so it wasn't so closely tied-in to vector_tile_worker_source...

Or even better, remove extends VectorTileWorkerSource from GeoJSONWorkerSource entirely - it seems to have almost all of the relevant methods

@HarelM
Copy link
Copy Markdown
Collaborator Author

HarelM commented Feb 5, 2026

I just saw your PR for splitting geojson worker from vector, which is great, but solving query rendered features still remains the problem to be solved. I'll see if I can find the time this month to check out what @lucaswoj suggested, unless you beat me to it @lucaswoj.

@lucaswoj
Copy link
Copy Markdown
Contributor

lucaswoj commented Feb 5, 2026

I don't have any plans to work on this issue. Feel free to ping me on the OSM Slack if you have any questions about what I'm suggesting.

@wayofthefuture
Copy link
Copy Markdown
Collaborator

Would a possible solution be to save the generated vector tile in the worker and send the query to the worker? Then the processing for the tile is only done once and the worker can respond with the features?
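A rough sketch of that alternative, assuming a hypothetical worker-side `TileFeatureCache` that keeps each generated tile's features (a tile-space bounding box plus the source feature index) keyed by `z/x/y`, so a later query is answered in the worker and only matching indices are sent back. Names and shapes are illustrative, not existing MapLibre code.

```typescript
// Axis-aligned box in tile coordinates: [minX, minY, maxX, maxY].
type BBox = [number, number, number, number];

interface CachedFeature {
    sourceIndex: number; // index of the originating GeoJSON feature
    bbox: BBox;
}

class TileFeatureCache {
    private readonly tiles = new Map<string, CachedFeature[]>();

    private static key(z: number, x: number, y: number): string {
        return `${z}/${x}/${y}`;
    }

    // Called once after a tile is generated, so the work is not repeated.
    set(z: number, x: number, y: number, features: CachedFeature[]): void {
        this.tiles.set(TileFeatureCache.key(z, x, y), features);
    }

    // Answer a box query without re-serializing the tile. Only bounding boxes
    // are compared here; a real query would also test exact geometry.
    query(z: number, x: number, y: number, box: BBox): number[] {
        const features = this.tiles.get(TileFeatureCache.key(z, x, y)) ?? [];
        return features
            .filter(f => f.bbox[0] <= box[2] && f.bbox[2] >= box[0] &&
                         f.bbox[1] <= box[3] && f.bbox[3] >= box[1])
            .map(f => f.sourceIndex);
    }

    // Drop everything when the source data changes.
    clear(): void {
        this.tiles.clear();
    }
}
```

One cost of this design is a worker round trip per query, whereas queryRenderedFeatures currently returns its results synchronously.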

@HarelM
Copy link
Copy Markdown
Collaborator Author

HarelM commented Feb 23, 2026

I've decided to use the JSON serialization "hack" approach.
It's a lot simpler and solves this fairly well, I believe.
I'll either use this PR or open a different one.
For the time being, I've released a new version of vt-pbf that allows solving this using a JSON serialization string prefix.

@HarelM HarelM marked this pull request as ready for review February 24, 2026 08:28
@HarelM HarelM requested a review from louwers February 24, 2026 08:39
@HarelM HarelM changed the title Serialize geojson using a custom serializer feat: support geojson nested objects Feb 24, 2026
@HarelM HarelM added this to the 6.0 milestone Feb 24, 2026
@codecov

codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.67%. Comparing base (91dd083) to head (65bf51c).
⚠️ Report is 80 commits behind head on main.

Files with missing lines | Patch % | Lines
src/source/vector_tile_worker_source.ts | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6992   +/-   ##
=======================================
  Coverage   92.66%   92.67%           
=======================================
  Files         289      289           
  Lines       24061    24080   +19     
  Branches     5094     5101    +7     
=======================================
+ Hits        22297    22317   +20     
+ Misses       1764     1763    -1     

 this._z = z;
 
-this.properties = vectorTileFeature.properties;
+this.properties = Object.fromEntries(Object.entries(vectorTileFeature.properties).map(e => [e[0], e[1]?.toString().startsWith(JSON_PREFIX) ? JSON.parse(e[1]?.toString().replace(JSON_PREFIX, '')) : e[1]]));
Collaborator

wayofthefuture commented Feb 24, 2026

Not sure how this will affect performance, but here is an alternative if you prefer:

  • No allocations when no prefixed values.
  • Avoids Object.entries() array creation.
  • Avoids .map() intermediate array creation.
  • Avoids Object.fromEntries() work entirely.
  • Skips toString() calls on non-strings.
  • Computes prefix length once.
    constructor(vectorTileFeature: VectorTileFeatureLike, z: number, x: number, y: number, id: string | number | undefined) {
        this.type = 'Feature';
        this._vectorTileFeature = vectorTileFeature;
        this._x = x;
        this._y = y;
        this._z = z;

        this.properties = this.getJSONProperties(vectorTileFeature.properties);
        this.id = id;
    }

    private getJSONProperties(properties: Record<string, any>): Record<string, any> {
        // Fast path: if no value is a JSON-prefixed string, avoid allocations entirely.
        for (const key in properties) {
            const value = properties[key];
            if (typeof value !== 'string') continue;
            if (!value.startsWith(JSON_PREFIX)) continue;

            // Slower path: decode all JSON-prefixed strings.
            return this.decodeJSONPrefixedProperties(properties);
        }

        return properties;
    }

    private decodeJSONPrefixedProperties(properties: Record<string, any>): Record<string, any> {
        const decoded: Record<string, any> = {};

        for (const key in properties) {
            const value = properties[key];

            if (typeof value !== 'string' || !value.startsWith(JSON_PREFIX)) {
                decoded[key] = value;
                continue;
            }

            decoded[key] = JSON.parse(value.slice(JSON_PREFIX.length));
        }

        return decoded;
    }
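To check the claims above locally, a rough micro-benchmark sketch comparing the original one-liner with a fast-path loop in the style suggested. The prefix value, sample data, and helper names are hypothetical, and timings depend heavily on the engine and data, so no particular result is implied.

```typescript
// Hypothetical marker; the real JSON_PREFIX value is not shown in this thread.
const JSON_PREFIX = '__json__:';

// The one-liner from the diff, wrapped in a function.
function decodeWithEntries(properties: Record<string, any>): Record<string, any> {
    return Object.fromEntries(Object.entries(properties).map(e => [e[0], e[1]?.toString().startsWith(JSON_PREFIX) ? JSON.parse(e[1]?.toString().replace(JSON_PREFIX, '')) : e[1]]));
}

// Fast path: return the input untouched unless some value carries the prefix.
function decodeWithLoop(properties: Record<string, any>): Record<string, any> {
    for (const key in properties) {
        const value = properties[key];
        if (typeof value === 'string' && value.startsWith(JSON_PREFIX)) {
            const decoded: Record<string, any> = {};
            for (const k in properties) {
                const v = properties[k];
                decoded[k] = typeof v === 'string' && v.startsWith(JSON_PREFIX)
                    ? JSON.parse(v.slice(JSON_PREFIX.length))
                    : v;
            }
            return decoded;
        }
    }
    return properties;
}

// Time `iterations` calls of `fn` with a coarse millisecond clock.
function time(fn: () => void, iterations: number): number {
    const start = Date.now();
    for (let i = 0; i < iterations; i++) fn();
    return Date.now() - start;
}

const plain = { name: 'a', count: 1, kind: 'road' };
const nested = { name: 'a', meta: JSON_PREFIX + JSON.stringify({ rank: 2, tags: ['x'] }) };
const samples: Array<[string, Record<string, any>]> = [['plain', plain], ['nested', nested]];

for (const [label, props] of samples) {
    const entriesMs = time(() => decodeWithEntries(props), 100_000);
    const loopMs = time(() => decodeWithLoop(props), 100_000);
    console.log(`${label}: entries ${entriesMs} ms, loop ${loopMs} ms`);
}
```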

Collaborator Author

It's worth remembering that this is only applied to features returned by queryRenderedFeatures.
And features usually don't have many properties.
But I was concerned about performance too.
Also, you can't break early, as there might be more than one property with nested objects.
But I'll write something a bit more performant.

Collaborator

wayofthefuture commented Feb 24, 2026

I'm pretty sure it doesn't break... it's an early return in detection versus decoding. If it continues all the way through it just returns the original object - mimicking the previous implementation. I believe the performance key is the avoidance of .toString and .parse

Collaborator Author

I'll update this, give me a sec...

Collaborator Author

Done in a778796.
In theory, this means that if the same feature is used multiple times, JSON.parse will run only once per JSON property, and it will be done lazily, only when converting the feature to GeoJSON.
No extra allocation, minimal time spent for features without this.
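A minimal sketch of lazy, cached decoding in the spirit described here; it is not the code from a778796, and `LazyFeature`, `decodePrefixed`, and the prefix value are hypothetical. Properties are decoded on first access only, the result is cached, and a copy is made only once a prefixed value is actually found.

```typescript
// Hypothetical marker; the real value is not shown in this thread.
const JSON_PREFIX = '__json__:';

// Returns `raw` itself when nothing is prefixed; otherwise copies once and
// replaces each prefixed value with its parsed JSON.
function decodePrefixed(raw: Record<string, unknown>): Record<string, unknown> {
    let result: Record<string, unknown> | undefined;
    for (const key in raw) {
        const value = raw[key];
        if (typeof value !== 'string' || !value.startsWith(JSON_PREFIX)) continue;
        if (result === undefined) result = { ...raw }; // allocate only on the first hit
        result[key] = JSON.parse(value.slice(JSON_PREFIX.length));
    }
    return result ?? raw;
}

class LazyFeature {
    private readonly rawProperties: Record<string, unknown>;
    private decoded: Record<string, unknown> | undefined;

    constructor(rawProperties: Record<string, unknown>) {
        this.rawProperties = rawProperties;
    }

    // Decoding happens on first access; later accesses reuse the cached object.
    get properties(): Record<string, unknown> {
        if (this.decoded === undefined) {
            this.decoded = decodePrefixed(this.rawProperties);
        }
        return this.decoded;
    }
}
```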

Collaborator

Much better than my version!


-this.properties = Object.fromEntries(Object.entries(vectorTileFeature.properties).map(e => [e[0], e[1]?.toString().startsWith(JSON_PREFIX) ? JSON.parse(e[1]?.toString().replace(JSON_PREFIX, '')) : e[1]]));
+for (const key in vectorTileFeature.properties) {
+if (typeof vectorTileFeature.properties[key] !== 'string' || !vectorTileFeature.properties[key].startsWith(JSON_PREFIX)) {
Collaborator

wayofthefuture commented Feb 24, 2026

Blows my version away! Consider const value = vectorTileFeature.properties[key]... I'm not sure, though; I guess this is a style preference or something, I don't know.

Collaborator Author

I wanted to avoid allocation as much as possible...

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GeoJSON Feature properties stringified in map events

6 participants