
Use UTF-8 safe char8_t type when iterating over encode string input #2575

Open
s77rt wants to merge 2 commits into Expensify:main from s77rt:utf8-uri-encode-decode


s77rt commented Apr 8, 2026

Details

When encoding a string, we iterate over each byte as char, and the signedness of char is platform dependent: on arm it's unsigned and on x64 it's signed. When char is signed, the bytes of a non-ASCII character, e.g. ā (11000100 10000001 in UTF-8), are read as -60 and -127, and after right-shifting by 4 we get -4 and -8. These values are used as indexes into hexChars, and since they are negative the access is out of bounds, which is UB.
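The arithmetic above can be checked with a small standalone sketch. The helper names (`asSignedByte`, `highNibbleSigned`, `highNibbleUnsigned`) are hypothetical and are not taken from the PR; the sketch assumes C++20 semantics, where narrowing to a signed 8-bit type and right-shifting a negative value are both well defined.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not the code touched by this PR.

// Reinterpret a raw byte the way a platform with signed `char` sees it.
constexpr int asSignedByte(std::uint8_t byte) {
    return static_cast<std::int8_t>(byte);
}

// High nibble as computed from a signed byte: the sign is preserved by the
// arithmetic right shift, so bytes >= 0x80 yield a negative result.
constexpr int highNibbleSigned(std::uint8_t byte) {
    return asSignedByte(byte) >> 4;
}

// High nibble as computed from an unsigned byte: always in the range 0..15.
constexpr int highNibbleUnsigned(std::uint8_t byte) {
    return byte >> 4;
}

// The two UTF-8 bytes of "ā" (0xC4 0x81), matching the numbers in the description.
static_assert(asSignedByte(0xC4) == -60 && asSignedByte(0x81) == -127);
static_assert(highNibbleSigned(0xC4) == -4 && highNibbleSigned(0x81) == -8);
static_assert(highNibbleUnsigned(0xC4) == 12 && highNibbleUnsigned(0x81) == 8);
```

With an unsigned byte the high nibble is 12 and 8 respectively, which are valid indexes into a 16-entry hex table.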

This PR changes the iteration to treat the bytes as char8_t, which is guaranteed to be unsigned.
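A minimal sketch of the idea, not the actual diff: `percentEncode`, `isUnreserved`, and the `Byte` alias are hypothetical names, and the set of characters left unescaped (the RFC 3986 unreserved set) is an assumption that may differ from what the library really does. `Byte` falls back to `unsigned char` when the compiler does not provide `char8_t`; both are unsigned 8-bit types.

```cpp
#include <string>
#include <string_view>

// Hypothetical alias: use char8_t where available (C++20), otherwise unsigned char.
#if defined(__cpp_char8_t)
using Byte = char8_t;
#else
using Byte = unsigned char;
#endif

// Assumed unreserved set (RFC 3986); the real implementation may differ.
inline bool isUnreserved(Byte b) {
    return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
           (b >= '0' && b <= '9') || b == '-' || b == '_' || b == '.' || b == '~';
}

// Hypothetical encoder: percent-encodes every byte outside the unreserved set.
inline std::string percentEncode(std::string_view input) {
    static constexpr char hexChars[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(input.size() * 3);
    for (const char raw : input) {
        // Convert to an unsigned byte first, so the value is always 0..255
        // regardless of whether plain `char` is signed on this platform.
        const Byte byte = static_cast<Byte>(raw);
        if (isUnreserved(byte)) {
            out += raw;
        } else {
            out += '%';
            out += hexChars[byte >> 4];    // index in 0..15
            out += hexChars[byte & 0x0F];  // index in 0..15
        }
    }
    return out;
}
```

Under these assumptions, `percentEncode("\xC4\x81")` yields `%C4%81` on both signed-char and unsigned-char platforms, because the shift is always applied to a non-negative value.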

Fixed Issues

Fixes https://github.com/Expensify/Expensify/issues/617781

Tests

  • LibStuff::testEncodeDecodeURIComponent

s77rt marked this pull request as ready for review April 8, 2026 22:37
s77rt requested a review from a team April 8, 2026 22:37
melvin-bot (bot) requested review from carlosmiceli and removed request for a team April 8, 2026 22:37
