Perl_regexec_flags - keep and reuse a spare *offs buffer once allocated #24412
Open
richardleach wants to merge 1 commit into
When executing a regular expression that was the origin of the previous
successful match, the results of that previous match must not be
overwritten in case the current attempt at matching is unsuccessful.
(That is to say, when `(PL_curpm && (PM_GETRE(PL_curpm) == rx))`, the
contents of special punctuation variables such as `%+` and `$1` etc.
must not be clobbered if the current regex fails to match.)
Perl does this by moving the associated heap-allocated
`regexp_paren_pair` chunk to one side and working on an identically-sized
`swap` chunk. Currently, the working chunk is allocated on each entry to
`Perl_regexec_flags` and the previous chunk is freed prior to exit.
(If the current match is unsuccessful, the contents of the previous
chunk are copied to the new chunk.)
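
To make the current behaviour concrete, here is a minimal sketch of the "move aside, work on a fresh chunk, copy back on failure" pattern described above. It is not the code in `Perl_regexec_flags`: `paren_pair`, `toy_regex`, `toy_exec`, `match_fn` and the two matcher callbacks are hypothetical stand-ins invented for illustration, and only the `start` and `end` members are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for regexp_paren_pair: only start/end are modelled. */
typedef struct { ptrdiff_t start, end; } paren_pair;

/* Hypothetical toy regex object that owns one chunk of capture offsets. */
typedef struct { paren_pair *offs; size_t npairs; } toy_regex;

/* Hypothetical matcher: may scribble on offs, returns nonzero on success. */
typedef int (*match_fn)(paren_pair *offs, size_t npairs);

/* One allocation and one free per call, as the description says happens now. */
static int toy_exec(toy_regex *rx, match_fn try_match)
{
    assert(rx->offs != NULL && rx->npairs > 0);

    paren_pair *saved = rx->offs;                        /* previous results, moved aside */
    paren_pair *work = calloc(rx->npairs, sizeof *work); /* fresh, zeroed working chunk */
    if (work == NULL)
        return -1;
    rx->offs = work;

    int ok = try_match(rx->offs, rx->npairs);
    if (!ok)                                             /* failed: copy old results back */
        memcpy(rx->offs, saved, rx->npairs * sizeof *saved);

    free(saved);                                         /* previous chunk freed before exit */
    return ok;
}

/* Toy matchers for demonstration only. */
static int always_match(paren_pair *offs, size_t n)
{
    (void)n;
    offs[0].start = 2;
    offs[0].end = 5;
    return 1;
}

static int never_match(paren_pair *offs, size_t n)
{
    (void)n;
    offs[0].start = 9;   /* scribbles on the working chunk, then fails */
    offs[0].end = 9;
    return 0;
}
```

Calling `toy_exec` with `always_match` and then with `never_match` leaves the offsets from the successful call in place, at the cost of one allocation and one free per call.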
To illustrate, every iteration of the following code snippet (after the
first) will allocate and free a `regexp_paren_pair` chunk.
    while ($x =~ /(.)/g) { ... }
While allocation is unavoidable in (probably uncommon) cases where the
regex is re-entrant, for common cases such as the above snippet, only two
allocations are needed - the previous one and the current one - and the
heap management overhead is undesirable.
This commit lazily allocates a spare chunk the first time one is needed,
then keeps it around for future reuse. If additional chunks are needed,
they will still be freshly allocated and later freed, as happens now.
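
The idea can be sketched as follows, assuming a hypothetical `spare` slot stored alongside the live chunk. This is an illustration of the reuse pattern only, not the code in this commit. All names (`paren_pair`, `toy_regex`, `toy_exec`, `new_chunk`, `alloc_count` and the matcher callbacks) are made up. The sketch does not model re-entrancy, so it never takes the path where extra chunks are allocated and later freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for regexp_paren_pair: only start/end are modelled. */
typedef struct { ptrdiff_t start, end; } paren_pair;

/* Hypothetical toy regex: the live chunk plus one lazily-created spare. */
typedef struct { paren_pair *offs; paren_pair *spare; size_t npairs; } toy_regex;

/* Hypothetical matcher: may scribble on offs, returns nonzero on success. */
typedef int (*match_fn)(paren_pair *offs, size_t npairs);

static size_t alloc_count = 0;   /* counts heap allocations made by toy_exec */

static paren_pair *new_chunk(size_t n)
{
    alloc_count++;
    return malloc(n * sizeof(paren_pair));
}

static int toy_exec(toy_regex *rx, match_fn try_match)
{
    assert(rx->offs != NULL && rx->npairs > 0);

    paren_pair *saved = rx->offs;          /* previous results, moved aside */
    paren_pair *work = rx->spare;          /* reuse the spare if we have one */
    if (work != NULL)
        rx->spare = NULL;
    else
        work = new_chunk(rx->npairs);      /* first time only: allocate lazily */
    if (work == NULL)
        return -1;

    for (size_t i = 0; i < rx->npairs; i++)
        work[i].start = work[i].end = -1;  /* mark every pair as unset */
    rx->offs = work;

    int ok = try_match(rx->offs, rx->npairs);
    if (!ok)                               /* failed: copy old results back */
        memcpy(rx->offs, saved, rx->npairs * sizeof *saved);

    rx->spare = saved;                     /* keep the old chunk instead of freeing it */
    return ok;
}

/* Toy matchers for demonstration only. */
static int always_match(paren_pair *offs, size_t n)
{
    (void)n;
    offs[0].start = 2;
    offs[0].end = 5;
    return 1;
}

static int never_match(paren_pair *offs, size_t n)
{
    (void)n;
    offs[0].start = 9;   /* scribbles on the working chunk, then fails */
    offs[0].end = 9;
    return 0;
}
```

In this sketch the two chunks simply trade places on every call, so repeated calls on the same `toy_regex` allocate once in total rather than once per call.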
The commit also changes chunk initialization: instead of zeroing the
chunk, the `start` and `end` members of each struct are set to `-1`,
which matches what `S_regtry` does. (See the comments in that function
beginning with `XXXX What this code is doing here?!!!` for context.)
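
A small sketch of the initialization difference, assuming (as I understand the convention) that `-1` is the marker for a capture pair that has not been set, whereas a zeroed pair would read as an empty match at offset 0. `paren_pair`, `init_unset` and `group_is_set` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for regexp_paren_pair: only start/end are modelled. */
typedef struct { ptrdiff_t start, end; } paren_pair;

/* Set every pair to the "unset" marker instead of zeroing the chunk. */
static void init_unset(paren_pair *offs, size_t n)
{
    assert(offs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        offs[i].start = offs[i].end = -1;
}

/* Hypothetical helper: a pair counts as set only if neither member is -1. */
static int group_is_set(const paren_pair *p)
{
    return p->start != -1 && p->end != -1;
}
```

Under that assumption, a chunk prepared with `init_unset` reports every group as unset, while a zero-filled pair would be indistinguishable from a group that matched the empty string at the start of the input.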
Contributor
Author
The relative performance improvement will depend upon the complexity of the regular expression being executed. For something very simple, this should give a decent boost.
[Table: blead vs. patched; values not recovered]
Contributor
It looks reasonable to me, but I'd prefer someone with better regexp engine knowledge looked over it.