Skip to content

Add Global Positioner module #1100

Open
kathirgounder wants to merge 8 commits intoborglab:masterfrom
kathirgounder:GlobalPositioner-v2
Open

Add Global Positioner module #1100
kathirgounder wants to merge 8 commits intoborglab:masterfrom
kathirgounder:GlobalPositioner-v2

Conversation

@kathirgounder
Copy link
Copy Markdown
Collaborator

@kathirgounder kathirgounder commented Apr 1, 2026

!Trying to vibe express Glomap's Global Positioner step with GTSAM to get some Factor Graph Practice and see if it moves the needle for our Multiview Optimizer!

Akshay had already implemented the BilinearAngleTranslationFactors in GTSAM and wired it in the TranslationRecovery class, but we were not combining Translation Averaging + Triangulation into one step like GLOMAP does. I scrolled through Slack for a couple of years and it seems like the furthest we went was we tried jointly estimating Translations and 3D Tracks, expressing the optimization in the exact same manner that GLOMAP does but we threw away the 3D points that came from the optimization, and ran the triangulation step again to regenerate 3D points to initialize BA (data_association in gtsfm). Glomap's insight was that we didn't need to do manual triangulation anymore thanks to the Global Positioner

This PR replaces Translation Averaging + Triangulation completely with the GlobalPositioner and initializes BA that way. Added some SIFT Frontend + GP + BA configs to the CI, the results show that GP for larger datasets provides a better initialization to BA and thus BA does less work executing faster. We also have some slight increases in Pose AUC across all degrees for the larger datasets. Skydio-32 has been failing to produce a good reconstruction in our classic MVO pipeline for a while now, BA seems to drop almost all the tracks and cameras, need to go back and look through CI logs and find the PR that broke it but the GlobalPositioner seems to be able to avoid whatever that bug or issue is.

image

Some videos showing the GlobalPositioner in action:

Palace of Fine Arts:

convergence_palace_smooth.mp4

South Building:

convergence_south.mp4

Gerrard Hall:

convergence_gerrard.mp4

Door:

convergence_native_api.mp4

@dellaert
Copy link
Copy Markdown
Member

dellaert commented Apr 4, 2026

Could you include the comparison with baseline in the PR comment?

@kathirgounder
Copy link
Copy Markdown
Collaborator Author

kathirgounder commented Apr 4, 2026

Could you include the comparison with baseline in the PR comment?

Hi Frank @dellaert, just added a screenshot of the metrics comparison and some videos.

The existing visualize_benchmark_comparison.py compares the same configs across two different CI runs (master vs branch) by matching artifact names, it wasn't built to compare two different configs and code paths within the same run since they have different artifact names and the script pairs by name (unified sift vs unified sift_global_positioner - both are identical besides the fact one does TA + Triangulation verus GlobalPositioning), the global positioner is a clean swap / drop in that replaces the two stages and I added separate configs for it and added it to the ci.yml, so I had claude write a quick script to pull down all the metrics and just compare the results between the two from the same CI Run.

@dellaert
Copy link
Copy Markdown
Member

dellaert commented Apr 4, 2026

Amazing! @akshay-krishnan whaddayathink? I’ll leave you to approve…

@kathirgounder
Copy link
Copy Markdown
Collaborator Author

Amazing! @akshay-krishnan whaddayathink? I’ll leave you to approve…

Will try to clean up and add more documentation on some of the things I added (Viz Script and optimizer logging) and also discuss with Akshay how to go about updating the default CI strategy before trying to merge this in

@kathirgounder
Copy link
Copy Markdown
Collaborator Author

Hey @dellaert @akshay-krishnan !

I was potentially thinking of adding a separate GTSAM class for the GlobalPositioner instead of using the TranslationRecovery class. The semantics are a bit misleading in comparison to how I am using it to express the Global Positioning Optimization. TranslationRecovery's documentation, variable names, and API all describe camera-to-camera translation directions. The current interface does not indicate anywhere that tracks are supported and assumes all measurements are camera-camera.

Our GTSFM code passes C(cam_idx) and L(track_idx) as keys, computes world-frame bearings from feature tracks, and maps results back to cameras+landmarks, all in Python wrapper code that a potential GTSAM user would have to reverse-engineer from our application to understand the GP use case. I think a dedicated class would make the camera+landmark formulation cleaner.

Additionally TranslationRecovery's addPrior method is designed specifically for Translation Averaging, where it tries fix the 4 degrees of freedom required for TA by pinning key1 to the origin and key2 to default scale * measured(). In GP, edges are camera-to-landmark, so key2 is a landmark. The scale prior pins the entire reconstruction's scale to one depth estimate for one landmark rather than establishing a baseline. In our tests the optimizer overcomes this since the prior is soft, but to match Glomaps GP formulation we would only add a prior to the first camera.

I think this will make both the GTSFM code and GTSAM code more explicit and separate. GTSAM already has the factor (BilinearAngleTranslationFactor) and the solver machinery, what's missing is the solve pipeline that makes GP as
easy to use and expressed as a first class citizen like ShonanAveraging and TranslationRecovery are for their respective problems!

I also wanted to learn how to contribute to GTSAM :)

Copy link
Copy Markdown
Collaborator

@akshay-krishnan akshay-krishnan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ultimately we are using the translationrecovery class here as well.

It would be cleaner to re-use the existing averaging_1dsfm code as much as possible. feel free to refactor that class to support this "global positioner" variant if needed.

images,
)

if self.global_positioner is not None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be done in a follow-up PR. I would prefer global positioner and translation averaging + triangulation to inherit from a common base class. That way you wont need this if-else here entirely.

@@ -0,0 +1,565 @@
"""Global positioning of cameras and 3D points via bearing constraints.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how much of this is duplicated code in other parts of gtsfm? can we move such functions to any utils?

# ──────────────────────────────────────────────────────────────────────────────


def compute_world_bearings(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we dont use the term bearing in gtsfm code - its "direction" or "translation direction"

@akshay-krishnan
Copy link
Copy Markdown
Collaborator

akshay-krishnan commented Apr 6, 2026

visualization is very cool! based on what Im seeing it should be fine to stop iterating sooner, not much happens after the first ~10 iterations. There is also a global rotation error guage freedom, we probably need to strengther the prior, likely inside gtsam code.

@akshay-krishnan
Copy link
Copy Markdown
Collaborator

CI seems to be failing due to some formatting errors.

@akshay-krishnan akshay-krishnan marked this pull request as ready for review April 6, 2026 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants