-
Notifications
You must be signed in to change notification settings - Fork 259
Fix: Resolved deterministic memory leak and dangling pointer in SQLParser::tokenize #262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 8 commits
dd347ac
2b254f3
571dc22
14c5621
23e894d
20603d2
79f3c33
644e35a
745c930
c67194c
802ac0c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,6 +14,8 @@ jobs: | |
| env: | ||
| CC: ${{matrix.cc}} | ||
| CXX: ${{matrix.cxx}} | ||
| CXXFLAGS: ${{matrix.env_cxxflags}} | ||
| LDFLAGS: ${{matrix.env_ldflags}} | ||
| defaults: | ||
| run: | ||
| shell: bash | ||
|
|
@@ -46,9 +48,24 @@ jobs: | |
| cxx: clang++ | ||
| os: macos-latest | ||
|
|
||
| - name: clang-sanitizer-ubuntu | ||
| cc: clang-19 | ||
| cxx: clang++-19 | ||
| os: ubuntu-latest | ||
| container: ubuntu:24.04 | ||
| env_cxxflags: "-fsanitize=address,undefined" | ||
| env_ldflags: "-fsanitize=address,undefined" | ||
|
|
||
| - name: clang-sanitizer-macOS | ||
| cc: clang | ||
| cxx: clang++ | ||
| os: macos-latest | ||
| env_cxxflags: "-fsanitize=address,undefined" | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I haven't had a deeper look, but this thread describes the same issue as with the latest CI run: https://stackoverflow.com/a/40215639/1147726 |
||
| env_ldflags: "-fsanitize=address,undefined" | ||
|
|
||
| steps: | ||
| - name: Checkout | ||
| uses: actions/checkout@v4 | ||
| uses: actions/checkout@v6 | ||
| if: matrix.name != 'gcc-6' | ||
|
|
||
| - name: Checkout (Ubuntu 18.04) | ||
|
|
@@ -67,14 +84,14 @@ jobs: | |
| git checkout $GITHUB_HEAD_REF | ||
|
|
||
| - name: Setup (macOS) | ||
| if: matrix.name == 'clang-macOS' | ||
| if: matrix.os == 'macos-latest' | ||
| run: | | ||
| brew install bison flex | ||
| echo "BISON=$(brew --prefix bison)/bin/bison" >> $GITHUB_ENV | ||
| echo "FLEX=$(brew --prefix flex)/bin/flex" >> $GITHUB_ENV | ||
|
|
||
| - name: Setup (Ubuntu) | ||
| if: matrix.name != 'clang-macOS' | ||
| if: matrix.os == 'ubuntu-latest' | ||
| run: | | ||
| apt-get update | ||
| apt-get install --no-install-recommends -y bison flex ${CC} ${CXX} make valgrind | ||
|
dey4ss marked this conversation as resolved.
Outdated
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -59,11 +59,13 @@ bool SQLParser::tokenize(const std::string& sql, std::vector<int16_t>* tokens) { | |
| int16_t token = hsql_lex(&yylval, &yylloc, scanner); | ||
| while (token != 0) { | ||
| tokens->push_back(token); | ||
| token = hsql_lex(&yylval, &yylloc, scanner); | ||
|
|
||
| if (token == SQL_IDENTIFIER || token == SQL_STRING) { | ||
| free(yylval.sval); | ||
| yylval.sval = nullptr; | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to care about the dangling pointer when we overwrite
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you also add a test that would fail with sanitizers and without the patch?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That’s a fair point, but while hsql_lex does overwrite yylval for strings or identifiers, explicitly nullifying the pointer remains essential for several reasons. First, since yylval is a union, the sval member is typically not modified when the lexer returns tokens that don't require string values, such as semicolons or operators, meaning the stale, freed address stays in memory . Because the yylval structure is reused throughout the loop, this dangling pointer introduces a significant risk of a Double-Free if subsequent logic or future code changes attempt to release sval again while it still holds the old address.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Done! I have added the regression test to test/sql_parser.cpp. The test uses a sequence of consecutive identifiers to ensure that the memory is correctly managed and that yylval.sval is properly nullified after being freed. I have verified locally that this test fails with a LeakSanitizer error without my patch and passes successfully with the fix applied. Please let me know if there are any other adjustments needed!
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you! I noticed we do not run sanitizer builds in the CI. To get such warnings automatically and to verify the PR works as intended, yould you please add sanitizer builds with clang (on Ubuntu and macOS) to the CI workflow that run the tests?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
All checks are now passing, including the new Sanitizer builds for both Ubuntu and macOS. I have also updated the actions/checkout to version 6 as requested. The previous run successfully demonstrated that the regression test catches the memory leak in the absence of the fix. Now that the fix is re-applied and everything is green, is there anything else you would like me to address, or is this PR ready for final review? |
||
| } | ||
| token = hsql_lex(&yylval, &yylloc, scanner); | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Last sanatizer run was all good. Can you -- just temporarily -- remove your fix to check if it correctly caught by the sanatizer?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Done! I have temporarily removed the fix as requested to verify the sanitizer. The workflow is now awaiting approval to run. Please approve the CI whenever you're ready.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The leaks were correctly caught by the Ubuntu Sanitizers/Valgrind (as expected, since LSan is more robust on Linux), confirming the bug's presence.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The previous run successfully demonstrated that the regression test catches the memory leak in the absence of the fix. Now that the fix is re-applied and everything is green, is there anything else you would like me to address, or is this PR ready for final review? |
||
|
|
||
| } | ||
|
|
||
| hsql__delete_buffer(state, scanner); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we need this additional "runs" (whatever those are called) here. Can't we have that as a last step on the existing clang runs (
name: clang-19andname: clang-macOS)?