Perl_regexec_flags - keep and reuse a spare *offs buffer once allocated #24412

Open
richardleach wants to merge 1 commit into Perl:blead from richardleach:spare-regexp_paren_pair

@richardleach
Contributor

When executing a regular expression that was the origin of the previous successful match, the results of that previous match must not be overwritten in case the current attempt at matching is unsuccessful.

(That is to say, (PL_curpm && (PM_GETRE(PL_curpm) == rx)) and the contents of special punctuation variables such as %+ and $1 etc. must not be clobbered if the current regex fails to match.)

Perl does this by moving the associated heap-allocated regexp_paren_pair chunk to one side and working on an identically-sized swap chunk. Currently, the working chunk is allocated on each entry to Perl_regexec_flags and the previous chunk is freed prior to exit. (If the current match is unsuccessful, the contents of the previous chunk are copied to the new chunk.)
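The scheme described above can be sketched in a few lines of C. This is only an illustration, not the code in regexec.c: `paren_pair`, `toy_regex`, `toy_exec`, the matcher callbacks and the `toy_allocs` counter are all hypothetical names, and `long` stands in for whatever offset type Perl actually uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Perl's regexp_paren_pair. */
typedef struct { long start; long end; } paren_pair;

typedef struct toy_regex {
    paren_pair *offs;    /* captures of the last successful match */
    size_t      npairs;  /* number of pairs in each chunk */
} toy_regex;

/* Hypothetical matcher: writes into rx->offs, returns 1 on a match, 0 otherwise. */
typedef int (*toy_matcher)(toy_regex *rx);

unsigned long toy_allocs = 0;  /* heap allocations so far, for illustration only */

int toy_init(toy_regex *rx, size_t npairs) {
    assert(npairs > 0);
    rx->npairs = npairs;
    rx->offs = calloc(npairs, sizeof *rx->offs);
    if (!rx->offs) return -1;
    toy_allocs++;
    return 0;
}

/* One allocation and one free per call, as the description says happens today. */
int toy_exec(toy_regex *rx, toy_matcher try_match) {
    paren_pair *prev = rx->offs;
    paren_pair *work = calloc(rx->npairs, sizeof *work);  /* zeroed working chunk */
    if (!work) return -1;
    toy_allocs++;

    rx->offs = work;
    int ok = try_match(rx);
    if (!ok)  /* failed: copy the previous results over the scribbled chunk */
        memcpy(work, prev, rx->npairs * sizeof *work);
    free(prev);
    return ok;
}

/* Demo matchers: one that succeeds, and one that scribbles and then fails. */
int toy_match_ok(toy_regex *rx)   { rx->offs[0].start = 0; rx->offs[0].end = 1; return 1; }
int toy_match_fail(toy_regex *rx) { rx->offs[0].start = 7; rx->offs[0].end = 9; return 0; }
```

Calling `toy_exec` repeatedly with either matcher bumps `toy_allocs` once per call, which is the per-iteration heap traffic this PR aims to avoid.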

To illustrate, every iteration of the following code snippet (after the first) will allocate and free a regexp_paren_pair chunk.

while ($x =~ /(.)/g) { ... }

While allocation is unavoidable in the (probably uncommon) cases where the regex is re-entrant, common cases such as the above snippet only ever need two chunks - the previous one and the current one - so the repeated heap management overhead is undesirable.

This commit adds a lazily-allocated spare chunk when first needed, then keeps it around for future reuse. If additional chunks are needed, they will still be freshly allocated and then later freed as happens now.
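A minimal sketch of the spare-chunk idea, assuming a non-re-entrant caller. It is not the patch itself: the `spare` field, `toy_exec`, the matcher callbacks and the `toy_allocs` counter are hypothetical. On failure it swaps pointers back rather than copying, which is a simplification of my own. It does not model the re-entrant case, where the PR still falls back to fresh allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for Perl's regexp_paren_pair. */
typedef struct { long start; long end; } paren_pair;

typedef struct toy_regex {
    paren_pair *offs;    /* captures of the last successful match */
    paren_pair *spare;   /* cached spare chunk; NULL until first needed */
    size_t      npairs;  /* number of pairs in each chunk */
} toy_regex;

/* Hypothetical matcher: writes into rx->offs, returns 1 on a match, 0 otherwise. */
typedef int (*toy_matcher)(toy_regex *rx);

unsigned long toy_allocs = 0;  /* heap allocations so far, for illustration only */

/* Mark every pair in a chunk as unset. */
void toy_reset(paren_pair *p, size_t n) {
    for (size_t i = 0; i < n; i++) { p[i].start = -1; p[i].end = -1; }
}

int toy_init(toy_regex *rx, size_t npairs) {
    assert(npairs > 0);
    rx->npairs = npairs;
    rx->spare = NULL;                       /* lazily allocated later */
    rx->offs = malloc(npairs * sizeof *rx->offs);
    if (!rx->offs) return -1;
    toy_allocs++;
    toy_reset(rx->offs, npairs);
    return 0;
}

/* Not re-entrant: try_match must not call toy_exec on the same rx. */
int toy_exec(toy_regex *rx, toy_matcher try_match) {
    paren_pair *prev = rx->offs;
    paren_pair *work = rx->spare;
    if (work) {
        rx->spare = NULL;                   /* claim the cached spare */
    } else {
        work = malloc(rx->npairs * sizeof *work);  /* first use only */
        if (!work) return -1;
        toy_allocs++;
    }
    toy_reset(work, rx->npairs);

    rx->offs = work;
    int ok = try_match(rx);
    if (ok) {
        rx->spare = prev;                   /* old results become the spare */
    } else {
        rx->offs = prev;                    /* old results stay visible */
        rx->spare = work;                   /* scribbled chunk is kept for reuse */
    }
    return ok;
}

void toy_free(toy_regex *rx) {
    free(rx->offs);
    free(rx->spare);
    rx->offs = rx->spare = NULL;
}

/* Demo matchers: one that succeeds, and one that scribbles and then fails. */
int toy_match_ok(toy_regex *rx)   { rx->offs[0].start = 0; rx->offs[0].end = 1; return 1; }
int toy_match_fail(toy_regex *rx) { rx->offs[0].start = 7; rx->offs[0].end = 9; return 0; }
```

After `toy_init` and any number of `toy_exec` calls, `toy_allocs` stays at 2: the original chunk plus the one spare. The two chunks simply trade places on each successful match.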

The commit also changes chunk initialization: instead of zeroing the chunk, the start and end members of each struct are set to -1, which matches what S_regtry does. (See the comments in that function beginning with XXXX What this code is doing here?!!! for context.)
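As a rough illustration of why -1 is a more natural initial value than 0, here is a sketch using hypothetical names (`paren_pair`, `toy_reset_pairs`, `toy_pair_is_set`) and `long` in place of Perl's real offset type. It assumes, as I understand the engine's convention, that -1 marks a group that did not participate in the match. A zeroed pair looks like a valid empty capture at offset 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Perl's regexp_paren_pair. */
typedef struct { long start; long end; } paren_pair;

/* Mark every pair as "did not participate" by setting both offsets to -1. */
void toy_reset_pairs(paren_pair *pairs, size_t n) {
    assert(pairs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        pairs[i].start = -1;
        pairs[i].end   = -1;
    }
}

/* A pair counts as set only when neither offset is the -1 sentinel. */
int toy_pair_is_set(const paren_pair *p) {
    return p->start != -1 && p->end != -1;
}
```

With this convention a freshly reset chunk reports every group as unset, whereas a chunk cleared to all-zero bytes would report every group as an empty match at position 0.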


  • This set of changes requires a perldelta entry, and one will be added when the next dev cycle begins.

richardleach added the defer-next-dev label (This PR should not be merged yet, but await the next development cycle) on May 9, 2026
@richardleach
Contributor Author

The relative performance improvement will depend upon the complexity of the regular expression being executed. For something very simple, this should give a decent boost.

  1. For my $x = "frog"; for (1 .. 100_000_000) { $x =~ /(.)/g }:

blead

          7,866.71 msec task-clock                       #    0.999 CPUs utilized             
               194      context-switches                 #   24.661 /sec                      
                 0      cpu-migrations                   #    0.000 /sec                      
               224      page-faults                      #   28.474 /sec                      
    36,258,435,431      cycles                           #    4.609 GHz                       
       902,005,296      stalled-cycles-frontend          #    2.49% frontend cycles idle      
   149,817,189,709      instructions                     #    4.13  insn per cycle            
    30,103,961,397      branches                         #    3.827 G/sec                     
        27,137,608      branch-misses                    #    0.09% of all branches

patched

          6,076.87 msec task-clock                       #    0.999 CPUs utilized             
                78      context-switches                 #   12.836 /sec                      
                 0      cpu-migrations                   #    0.000 /sec                      
               207      page-faults                      #   34.064 /sec                      
    27,976,836,608      cycles                           #    4.604 GHz                       
     1,418,082,406      stalled-cycles-frontend          #    5.07% frontend cycles idle      
   105,472,414,613      instructions                     #    3.77  insn per cycle            
    19,282,837,040      branches                         #    3.173 G/sec                     
        23,767,198      branch-misses                    #    0.12% of all branches   

  2. For my $x = "frog"; for (1 .. 100_000_000) { $x =~ /./g }:

blead

          7,184.34 msec task-clock                       #    0.997 CPUs utilized             
               120      context-switches                 #   16.703 /sec                      
                 1      cpu-migrations                   #    0.139 /sec                      
               251      page-faults                      #   34.937 /sec                      
    32,871,560,763      cycles                           #    4.575 GHz                       
       569,284,472      stalled-cycles-frontend          #    1.73% frontend cycles idle      
   139,578,635,072      instructions                     #    4.25  insn per cycle            
    28,184,290,446      branches                         #    3.923 G/sec                     
        22,631,054      branch-misses                    #    0.08% of all branches

patched

          5,462.04 msec task-clock                       #    0.999 CPUs utilized             
                53      context-switches                 #    9.703 /sec                      
                 1      cpu-migrations                   #    0.183 /sec                      
               207      page-faults                      #   37.898 /sec                      
    25,205,411,011      cycles                           #    4.615 GHz                       
     1,029,187,998      stalled-cycles-frontend          #    4.08% frontend cycles idle      
    95,171,381,563      instructions                     #    3.78  insn per cycle            
    17,522,603,826      branches                         #    3.208 G/sec                     
        22,943,758      branch-misses                    #    0.13% of all branches 
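
To put the perf figures above in relative terms, the following sketch computes the percentage reduction for the task-clock and instruction counts. The helper `percent_reduction` and the `perf_rows` table are hypothetical names; the numbers are copied from the output above. By this arithmetic the patched build uses roughly 23% and 24% less task-clock time, and roughly 30% and 32% fewer instructions, for the two test cases respectively.

```c
#include <assert.h>

/* Percentage by which `after` is lower than `before`. */
double percent_reduction(double before, double after) {
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* One measurement taken on blead and on the patched build. */
typedef struct {
    const char *label;
    double      blead;
    double      patched;
} perf_row;

/* Figures copied from the perf output above. */
const perf_row perf_rows[] = {
    { "/(.)/g task-clock (msec)", 7866.71,        6076.87        },
    { "/(.)/g instructions",      149817189709.0, 105472414613.0 },
    { "/./g task-clock (msec)",   7184.34,        5462.04        },
    { "/./g instructions",        139578635072.0, 95171381563.0  },
};
```

Passing each row's `blead` and `patched` values to `percent_reduction` reproduces the percentages quoted above.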

@tonycoz
Contributor

tonycoz commented May 13, 2026

It looks reasonable to me, but I'd prefer someone with better regexp engine knowledge looked over it.
