Skip to content
Open
9 changes: 7 additions & 2 deletions src/SQLParser.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -59,11 +59,16 @@ bool SQLParser::tokenize(const std::string& sql, std::vector<int16_t>* tokens) {
int16_t token = hsql_lex(&yylval, &yylloc, scanner);
while (token != 0) {
tokens->push_back(token);
token = hsql_lex(&yylval, &yylloc, scanner);


if (token == SQL_IDENTIFIER || token == SQL_STRING) {
free(yylval.sval);
yylval.sval = nullptr;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to care about the dangling pointer when we overwrite sval anyways in hsql_lex?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test that would fail with sanitizers and without the patch?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to care about the dangling pointer when we overwrite sval anyways in hsql_lex?

That’s a fair point, but while hsql_lex does overwrite yylval for strings or identifiers, explicitly nullifying the pointer remains essential for several reasons. First, since yylval is a union, the sval member is typically not modified when the lexer returns tokens that don't require string values, such as semicolons or operators, meaning the stale, freed address stays in memory . Because the yylval structure is reused throughout the loop, this dangling pointer introduces a significant risk of a Double-Free if subsequent logic or future code changes attempt to release sval again while it still holds the old address.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test that would fail with sanitizers and without the patch?

Done! I have added the regression test to test/sql_parser.cpp.

The test uses a sequence of consecutive identifiers to ensure that the memory is correctly managed and that yylval.sval is properly nullified after being freed. I have verified locally that this test fails with a LeakSanitizer error without my patch and passes successfully with the fix applied.

Please let me know if there are any other adjustments needed!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I noticed we do not run sanitizer builds in the CI. To get such warnings automatically and to verify the PR works as intended, yould you please add sanitizer builds with clang (on Ubuntu and macOS) to the CI workflow that run the tests?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I noticed we do not run sanitizer builds in the CI. To get such warnings automatically and to verify the PR works as intended, yould you please add sanitizer builds with clang (on Ubuntu and macOS) to the CI workflow that run the tests?

All checks are now passing, including the new Sanitizer builds for both Ubuntu and macOS. I have also updated the actions/checkout to version 6 as requested.

The previous run successfully demonstrated that the regression test catches the memory leak in the absence of the fix. Now that the fix is re-applied and everything is green, is there anything else you would like me to address, or is this PR ready for final review?


}

token = hsql_lex(&yylval, &yylloc, scanner);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last sanatizer run was all good. Can you -- just temporarily -- remove your fix to check if it correctly caught by the sanatizer?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last sanatizer run was all good. Can you -- just temporarily -- remove your fix to check if it correctly caught by the sanatizer?

Done! I have temporarily removed the fix as requested to verify the sanitizer. The workflow is now awaiting approval to run. Please approve the CI whenever you're ready.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last sanatizer run was all good. Can you -- just temporarily -- remove your fix to check if it correctly caught by the sanatizer?

The leaks were correctly caught by the Ubuntu Sanitizers/Valgrind (as expected, since LSan is more robust on Linux), confirming the bug's presence.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last sanatizer run was all good. Can you -- just temporarily -- remove your fix to check if it correctly caught by the sanatizer?

The previous run successfully demonstrated that the regression test catches the memory leak in the absence of the fix. Now that the fix is re-applied and everything is green, is there anything else you would like me to address, or is this PR ready for final review?



}

hsql__delete_buffer(state, scanner);
Expand Down
Loading