23 changes: 20 additions & 3 deletions .github/workflows/ci.yml
@@ -14,6 +14,8 @@ jobs:
env:
CC: ${{matrix.cc}}
CXX: ${{matrix.cxx}}
CXXFLAGS: ${{matrix.env_cxxflags}}
LDFLAGS: ${{matrix.env_ldflags}}
defaults:
run:
shell: bash
@@ -46,9 +48,24 @@ jobs:
cxx: clang++
os: macos-latest

- name: clang-sanitizer-ubuntu
cc: clang-19
cxx: clang++-19
os: ubuntu-latest
container: ubuntu:24.04
env_cxxflags: "-fsanitize=address,undefined"
Collaborator

I am wondering if we need these additional matrix entries here. Can't we add this as a last step to the existing clang jobs (name: clang-19 and name: clang-macOS)?

env_ldflags: "-fsanitize=address,undefined"

- name: clang-sanitizer-macOS
cc: clang
cxx: clang++
os: macos-latest
env_cxxflags: "-fsanitize=address,undefined"
Collaborator

I haven't had a deeper look, but this thread describes the same issue as with the latest CI run: https://stackoverflow.com/a/40215639/1147726

env_ldflags: "-fsanitize=address,undefined"

steps:
- name: Checkout
uses: actions/checkout@v4
uses: actions/checkout@v6
if: matrix.name != 'gcc-6'

- name: Checkout (Ubuntu 18.04)
@@ -67,14 +84,14 @@ jobs:
git checkout $GITHUB_HEAD_REF

- name: Setup (macOS)
if: matrix.name == 'clang-macOS'
if: matrix.os == 'macos-latest'
run: |
brew install bison flex
echo "BISON=$(brew --prefix bison)/bin/bison" >> $GITHUB_ENV
echo "FLEX=$(brew --prefix flex)/bin/flex" >> $GITHUB_ENV

- name: Setup (Ubuntu)
if: matrix.name != 'clang-macOS'
if: matrix.os == 'ubuntu-latest'
run: |
apt-get update
apt-get install --no-install-recommends -y bison flex ${CC} ${CXX} make valgrind
dey4ss marked this conversation as resolved.
8 changes: 5 additions & 3 deletions Makefile
@@ -36,7 +36,9 @@ GMAKE = make mode=$(mode)
NAME := sqlparser
PARSER_CPP = $(SRCPARSER)/bison_parser.cpp $(SRCPARSER)/flex_lexer.cpp
PARSER_H = $(SRCPARSER)/bison_parser.h $(SRCPARSER)/flex_lexer.h
LIB_CFLAGS = -std=c++17 $(OPT_FLAG)
LIB_CFLAGS = -std=c++17 $(OPT_FLAG) $(CXXFLAGS)
LIB_LFLAGS = $(LDFLAGS)


relaxed_build ?= "off"
ifeq ($(relaxed_build), on)
@@ -54,12 +56,12 @@ static ?= no
ifeq ($(static), yes)
LIB_BUILD = lib$(NAME).a
LIBLINKER = $(AR)
LIB_LFLAGS = rs
LIB_LFLAGS += rs
else
LIB_BUILD = lib$(NAME).so
LIBLINKER = $(CXX)
LIB_CFLAGS += -fPIC
LIB_LFLAGS = -shared -o
LIB_LFLAGS += -shared -o
endif
LIB_CPP = $(sort $(shell find $(SRC) -name '*.cpp' -not -path "$(SRCPARSER)/*") $(PARSER_CPP))
LIB_H = $(shell find $(SRC) -name '*.h' -not -path "$(SRCPARSER)/*") $(PARSER_H)
4 changes: 3 additions & 1 deletion src/SQLParser.cpp
@@ -59,11 +59,13 @@ bool SQLParser::tokenize(const std::string& sql, std::vector<int16_t>* tokens) {
int16_t token = hsql_lex(&yylval, &yylloc, scanner);
while (token != 0) {
tokens->push_back(token);
token = hsql_lex(&yylval, &yylloc, scanner);

if (token == SQL_IDENTIFIER || token == SQL_STRING) {
free(yylval.sval);
yylval.sval = nullptr;
Collaborator

Do we need to care about the dangling pointer when we overwrite sval anyway in hsql_lex?

Member

Can you also add a test that would fail with sanitizers and without the patch?

Author

> Do we need to care about the dangling pointer when we overwrite sval anyway in hsql_lex?

That's a fair point. While hsql_lex does overwrite yylval for strings and identifiers, explicitly nullifying the pointer is still worthwhile. Since yylval is a union, the sval member is typically not modified when the lexer returns tokens that don't carry a string value, such as semicolons or operators, so the stale, freed address stays in place. Because the yylval structure is reused throughout the loop, this dangling pointer would risk a double free if subsequent logic or future code changes attempted to release sval again while it still held the old address.
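A minimal sketch of the free-and-reset pattern discussed above, using a simplified stand-in for the lexer's semantic value. The `SemanticValue` union and the `copyString`/`releaseString` helpers are hypothetical illustrations, not part of the project. Because `free(nullptr)` is a no-op, clearing the pointer turns an accidental second release into a harmless call instead of a double free.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the lexer's semantic value: like a Bison
// yylval union, all members share the same storage.
union SemanticValue {
  char* sval;
  std::int64_t ival;
  double fval;
};

// Hypothetical helper: heap-copy a C string, as a lexer would for a
// string or identifier token. Returns nullptr if allocation fails.
char* copyString(const char* text) {
  const std::size_t size = std::strlen(text) + 1;
  char* copy = static_cast<char*>(std::malloc(size));
  if (copy != nullptr) {
    std::memcpy(copy, text, size);
  }
  return copy;
}

// Free the string and clear the pointer. A token such as ';' leaves the
// union untouched, so without the reset sval would keep the freed address
// and a second release would be a double free. free(nullptr) is a no-op,
// so calling this twice is safe.
void releaseString(SemanticValue& value) {
  std::free(value.sval);
  value.sval = nullptr;
}
```

After `value.sval = copyString("string_1");`, calling `releaseString(value)` twice leaves `value.sval == nullptr` without touching freed memory.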

Author

> Can you also add a test that would fail with sanitizers and without the patch?

Done! I have added the regression test to test/sql_parser.cpp.

The test uses a sequence of consecutive string literals to check that every string value allocated by the lexer is freed. I have verified locally that this test fails with a LeakSanitizer error without my patch and passes with the fix applied.

Please let me know if there are any other adjustments needed!
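The leak targeted by the regression test can be modeled outside the parser. The sketch below is a simplified model, not the project's code: `MockLexer`, `MockToken`, `tokenizeOld`, `tokenizeNew` and the token constants are hypothetical, and it assumes the pre-patch loop lexed the next token before the free check, which is what the moved `token = hsql_lex(...)` line in the diff suggests. The mock lexer tracks live allocations so leaks can be counted without a sanitizer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <set>
#include <utility>
#include <vector>

// Hypothetical token codes standing in for SQL_STRING, ';' and end of input.
constexpr int kEnd = 0;
constexpr int kString = 1;
constexpr int kSemicolon = ';';

struct MockToken {
  int type;
  const char* text;
};

// Hypothetical lexer: returns tokens from a fixed list and, like the real
// lexer, heap-allocates the value of every string token. It records which
// allocations are still live so leaks can be counted.
class MockLexer {
 public:
  explicit MockLexer(std::vector<MockToken> input) : input_(std::move(input)) {}
  MockLexer(const MockLexer&) = delete;
  MockLexer& operator=(const MockLexer&) = delete;
  ~MockLexer() {
    for (char* p : live_) std::free(p);  // reclaim leaks so the demo stays clean
  }

  int next(char** sval) {
    if (pos_ == input_.size()) return kEnd;
    const MockToken& tok = input_[pos_++];
    if (tok.type == kString) {
      const std::size_t size = std::strlen(tok.text) + 1;
      char* copy = static_cast<char*>(std::malloc(size));
      if (copy == nullptr) std::abort();
      std::memcpy(copy, tok.text, size);
      live_.insert(copy);
      *sval = copy;  // overwrites (and may orphan) the previous value
    }
    return tok.type;
  }

  void release(char** sval) {
    live_.erase(*sval);
    std::free(*sval);
    *sval = nullptr;
  }

  std::size_t leaked() const { return live_.size(); }

 private:
  std::vector<MockToken> input_;
  std::size_t pos_ = 0;
  std::set<char*> live_;
};

// Loop shape assumed before the patch: the next token is lexed before the
// check, so a string value lexed before the loop is overwritten unfreed.
std::vector<int> tokenizeOld(MockLexer& lexer) {
  std::vector<int> tokens;
  char* sval = nullptr;
  int token = lexer.next(&sval);
  while (token != kEnd) {
    tokens.push_back(token);
    token = lexer.next(&sval);
    if (token == kString) lexer.release(&sval);
  }
  return tokens;
}

// Loop shape after the patch: free the current token's value, then lex.
std::vector<int> tokenizeNew(MockLexer& lexer) {
  std::vector<int> tokens;
  char* sval = nullptr;
  int token = lexer.next(&sval);
  while (token != kEnd) {
    tokens.push_back(token);
    if (token == kString) lexer.release(&sval);
    token = lexer.next(&sval);
  }
  return tokens;
}
```

With the input from the regression test (three string literals followed by `;`), both loops return four tokens, but `tokenizeOld` leaves one allocation live (`leaked() == 1`) while `tokenizeNew` leaves none, which mirrors why LeakSanitizer flags the unpatched code.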

Member

Thank you! I noticed we do not run sanitizer builds in the CI. To get such warnings automatically and to verify the PR works as intended, could you please add sanitizer builds with clang (on Ubuntu and macOS) to the CI workflow that run the tests?

Author

> Thank you! I noticed we do not run sanitizer builds in the CI. To get such warnings automatically and to verify the PR works as intended, could you please add sanitizer builds with clang (on Ubuntu and macOS) to the CI workflow that run the tests?

All checks are now passing, including the new Sanitizer builds for both Ubuntu and macOS. I have also updated the actions/checkout to version 6 as requested.

The previous run successfully demonstrated that the regression test catches the memory leak in the absence of the fix. Now that the fix is re-applied and everything is green, is there anything else you would like me to address, or is this PR ready for final review?

}
token = hsql_lex(&yylval, &yylloc, scanner);
Collaborator

Last sanitizer run was all good. Can you (just temporarily) remove your fix to check whether it is correctly caught by the sanitizer?

Author

> Last sanitizer run was all good. Can you (just temporarily) remove your fix to check whether it is correctly caught by the sanitizer?

Done! I have temporarily removed the fix as requested to verify the sanitizer. The workflow is now awaiting approval to run. Please approve the CI whenever you're ready.

Author

> Last sanitizer run was all good. Can you (just temporarily) remove your fix to check whether it is correctly caught by the sanitizer?

The leaks were correctly caught by the Ubuntu sanitizer and Valgrind runs, confirming the bug's presence. This was expected, since LeakSanitizer is more reliable on Linux; on macOS, AddressSanitizer does not enable leak detection by default.

Author

> Last sanitizer run was all good. Can you (just temporarily) remove your fix to check whether it is correctly caught by the sanitizer?

The previous run successfully demonstrated that the regression test catches the memory leak in the absence of the fix. Now that the fix is re-applied and everything is green, is there anything else you would like me to address, or is this PR ready for final review?


}

hsql__delete_buffer(state, scanner);
15 changes: 15 additions & 0 deletions test/sql_parser.cpp
@@ -42,3 +42,18 @@ TEST(SQLParserTokenizeStringifyTest) {
ASSERT(query == cache[token_string]);
ASSERT(&query != &cache[token_string]);
}

// Regression test for the memory leak reported in issue #261.
TEST(SQLParserTokenizeLeakRegressionTest) {
Bouncner marked this conversation as resolved.

const std::string query = "'string_1' 'string_2' 'string_3';";
std::vector<int16_t> tokens;

ASSERT(SQLParser::tokenize(query, &tokens));

ASSERT_EQ(tokens.size(), 4);
ASSERT_EQ(tokens[0], SQL_STRING);
ASSERT_EQ(tokens[1], SQL_STRING);
ASSERT_EQ(tokens[2], SQL_STRING);
ASSERT_EQ(tokens[3], ';');
}