optimize additive pubkey tweaking with vartime generator point multiplication (>80% speedup)#1843
theStack wants to merge 4 commits into bitcoin-core:master from
Conversation
Neat. My guess would be that even bigger SDMC tables would be even better, because the cost of constant-time table lookups (which need to read through the entire table) disappears.
Having this available as a global constant allows introducing an alternative `ecmult_gen` function that doesn't need access to a context; see the next commit. Note that the precomputed constant takes the name of the function that previously generated it at run-time (`secp256k1_ecmult_gen_scalar_diff`), while the function is now renamed to include the "compute" verb (`secp256k1_ecmult_gen_compute_scalar_diff`), to match the naming of the table generation function.
Add a faster variable-time variant for generator point multiplication. This is essentially `ecmult_gen` without side-channel mitigations and without requiring a context object. Intended for use cases where the scalar does not represent sensitive data (e.g. pubkey tweaking). On arm64, this is ~66% faster than the constant-time variant (with the default build table size, i.e. ECMULT_GEN_KB=86):

```
$ ./build/bin/bench_ecmult
Benchmark      , Min(us) , Avg(us) , Max(us)
ecmult_gen     , 9.94    , 10.0    , 10.1
ecmult_gen_var , 5.96    , 5.99    , 6.02
```
Rather than using the generic double multiply (with na=1), use the newly introduced fast variable-time generator point multiplication and add the result to the base public key manually. On arm64, this improves the performance of the `secp256k1_ec_pubkey_tweak_add` API function by about 80%.

Before (prior to this commit):
```
$ ./build/bin/bench tweak
Benchmark       , Min(us) , Avg(us) , Max(us)
ec_pk_tweak_add , 16.1    , 16.2    , 16.6
```

After (this commit):
```
$ ./build/bin/bench tweak
Benchmark       , Min(us) , Avg(us) , Max(us)
ec_pk_tweak_add , 8.94    , 8.98    , 9.29
```

Note that the following API functions also benefit from the improved code path:
- secp256k1_xonly_pubkey_tweak_add
- secp256k1_xonly_pubkey_tweak_add_check (this one is consensus-critical for P2TR script path spends, see BIP341)
- secp256k1_keypair_xonly_tweak_add
- secp256k1_musig_pubkey_ec_tweak_add
- secp256k1_musig_pubkey_xonly_tweak_add
Added a preparatory commit that moves the
Ah, indeed! I've tried with table sizes up to ~18 MB, and at least on my arm64 machine, the ~5MB one (GEN_KB=5120) yields the best results (see theStack@aa1a4e7 and
So we would need to embed two different tables to get the most out of it (without degrading the signing/nonce generation speed). I'm not sure how performance-critical pubkey tweaking is in practice, though; the only currently widespread scenario I can think of is verification of taproot commitments. For silentpayments I still have to check how these larger table values would affect overall performance.
Force-pushed from f64306a to 0ebb1aa
Concept ACK
@theStack Yeah, I don't think it's worth having two built-in tables just for this, but it's good to see my intuition confirmed.
The code can be deduplicated.
diff --git a/src/eckey_impl.h b/src/eckey_impl.h
index 9cee88d..566a318 100644
--- a/src/eckey_impl.h
+++ b/src/eckey_impl.h
@@ -61,6 +61,7 @@ static int secp256k1_eckey_privkey_tweak_add(secp256k1_scalar *key, const secp25
static int secp256k1_eckey_pubkey_tweak_add(secp256k1_ge *key, const secp256k1_scalar *tweak) {
secp256k1_gej pt;
+ /* `tweak` is public here, so the variable-time generator multiplication is safe. */
secp256k1_ecmult_gen_var(&pt, tweak);
secp256k1_gej_add_ge_var(&pt, &pt, key, NULL);
diff --git a/src/ecmult_gen.h b/src/ecmult_gen.h
index 74942ed..567cc48 100644
--- a/src/ecmult_gen.h
+++ b/src/ecmult_gen.h
@@ -138,6 +138,10 @@ static void secp256k1_ecmult_gen_context_clear(secp256k1_ecmult_gen_context* ctx
/** Multiply with the generator: R = a*G */
static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context* ctx, secp256k1_gej *r, const secp256k1_scalar *a);
+/** Multiply with the generator: R = a*G.
+ *
+ * Variable-time implementation. Only call this for public/non-secret scalars.
+ */
static void secp256k1_ecmult_gen_var(secp256k1_gej *r, const secp256k1_scalar *a);
static void secp256k1_ecmult_gen_blind(secp256k1_ecmult_gen_context *ctx, const secp256k1_hash_ctx *hash_ctx, const unsigned char *seed32);
diff --git a/src/ecmult_gen_impl.h b/src/ecmult_gen_impl.h
index 8e4209f..3c4460e 100644
--- a/src/ecmult_gen_impl.h
+++ b/src/ecmult_gen_impl.h
@@ -30,17 +30,89 @@ static void secp256k1_ecmult_gen_context_clear(secp256k1_ecmult_gen_context *ctx
secp256k1_fe_clear(&ctx->proj_blind);
}
+/* Convert a scalar to a zero-padded array of 32-bit words. Only the first 8 words
+ * can contain non-zero data, but the padding avoids out-of-bounds reads from the
+ * scalar when COMB_BITS > 256. */
+static void secp256k1_ecmult_gen_scalar_to_recoded(uint32_t recoded[(COMB_BITS + 31) >> 5], const secp256k1_scalar *s) {
+ int i;
+
+ memset(recoded, 0, ((COMB_BITS + 31) >> 5) * sizeof(*recoded));
+ for (i = 0; i < 8 && i < ((COMB_BITS + 31) >> 5); ++i) {
+ recoded[i] = secp256k1_scalar_get_bits_limb32(s, 32 * i, 32);
+ }
+}
+
+/* Gather the mask(block)-selected bits of the recoded scalar into a packed value:
+ * bits[tooth] = d[(block*COMB_TEETH + tooth)*COMB_SPACING + comb_off].
+ *
+ * This constant-time variant mirrors the power side-channel hardening used by
+ * secp256k1_ecmult_gen. */
+static uint32_t secp256k1_ecmult_gen_lookup_bits(const uint32_t recoded[(COMB_BITS + 31) >> 5], uint32_t comb_off, uint32_t block) {
+ uint32_t bits = 0;
+ uint32_t bit_pos = comb_off + block * COMB_TEETH * COMB_SPACING;
+ uint32_t tooth;
+
+ /* Instead of reading individual bits here to construct the bits variable,
+ * build up the result by xoring rotated reads together. In every iteration,
+ * one additional bit is made correct, starting at the bottom. The bits
+ * above that contain junk. This reduces leakage by avoiding computations
+ * on variables that can have only a low number of possible values (e.g.,
+ * just two values when reading a single bit into a variable.) See:
+ * https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-alam.pdf
+ */
+ for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
+ /* Construct bitdata s.t. the bottom bit is the bit we'd like to read.
+ *
+ * We could just set bitdata = recoded[bit_pos >> 5] >> (bit_pos & 0x1f)
+ * but this would simply discard the bits that fall off at the bottom,
+ * and thus, for example, bitdata could still have only two values if we
+ * happen to shift by exactly 31 positions. We use a rotation instead,
+ * which ensures that bitdata doesn't lose entropy. This relies on the
+ * rotation being atomic, i.e., the compiler emitting an actual rot
+ * instruction. */
+ uint32_t bitdata = secp256k1_rotr32(recoded[bit_pos >> 5], bit_pos & 0x1f);
+
+ /* Clear the bit at position tooth, but sssh, don't tell clang. */
+ uint32_t volatile vmask = ~(1 << tooth);
+ bits &= vmask;
+
+ /* Write the bit into position tooth (and junk into higher bits). */
+ bits ^= bitdata << tooth;
+ bit_pos += COMB_SPACING;
+ }
+ return bits;
+}
+
+/* Same bit packing as secp256k1_ecmult_gen_lookup_bits(), but without the
+ * constant-time/power-analysis hardening as the scalar is public here. */
+static uint32_t secp256k1_ecmult_gen_lookup_bits_var(const uint32_t recoded[(COMB_BITS + 31) >> 5], uint32_t comb_off, uint32_t block) {
+ uint32_t bits = 0;
+ uint32_t bit_pos = comb_off + block * COMB_TEETH * COMB_SPACING;
+ uint32_t tooth;
+
+ for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
+ uint32_t bit = (recoded[bit_pos >> 5] >> (bit_pos & 0x1f)) & 1;
+ bits |= bit << tooth;
+ bit_pos += COMB_SPACING;
+ }
+ return bits;
+}
+
+static void secp256k1_ecmult_gen_lookup_table_index(uint32_t bits, uint32_t *sign, uint32_t *abs) {
+ *sign = (bits >> (COMB_TEETH - 1)) & 1;
+ *abs = (bits ^ -*sign) & (COMB_POINTS - 1);
+ VERIFY_CHECK(*sign == 0 || *sign == 1);
+ VERIFY_CHECK(*abs < COMB_POINTS);
+}
+
static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp256k1_gej *r, const secp256k1_scalar *gn) {
uint32_t comb_off;
secp256k1_ge add;
secp256k1_fe neg;
secp256k1_ge_storage adds;
secp256k1_scalar d;
- /* Array of uint32_t values large enough to store COMB_BITS bits. Only the bottom
- * 8 are ever nonzero, but having the zero padding at the end if COMB_BITS>256
- * avoids the need to deal with out-of-bounds reads from a scalar. */
- uint32_t recoded[(COMB_BITS + 31) >> 5] = {0};
- int first = 1, i;
+ uint32_t recoded[(COMB_BITS + 31) >> 5];
+ int first = 1;
memset(&adds, 0, sizeof(adds));
@@ -88,9 +160,7 @@ static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp25
/* Compute the scalar d = (gn + ctx->scalar_offset). */
secp256k1_scalar_add(&d, &ctx->scalar_offset, gn);
/* Convert to recoded array. */
- for (i = 0; i < 8 && i < ((COMB_BITS + 31) >> 5); ++i) {
- recoded[i] = secp256k1_scalar_get_bits_limb32(&d, 32 * i, 32);
- }
+ secp256k1_ecmult_gen_scalar_to_recoded(recoded, &d);
secp256k1_scalar_clear(&d);
/* In secp256k1_ecmult_gen_prec_table we have precomputed sums of the
@@ -171,47 +241,11 @@ static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp25
comb_off = COMB_SPACING - 1;
while (1) {
uint32_t block;
- uint32_t bit_pos = comb_off;
/* Inner loop: for each block, add table entries to the result. */
for (block = 0; block < COMB_BLOCKS; ++block) {
- /* Gather the mask(block)-selected bits of d into bits. They're packed:
- * bits[tooth] = d[(block*COMB_TEETH + tooth)*COMB_SPACING + comb_off]. */
- uint32_t bits = 0, sign, abs, index, tooth;
- /* Instead of reading individual bits here to construct the bits variable,
- * build up the result by xoring rotated reads together. In every iteration,
- * one additional bit is made correct, starting at the bottom. The bits
- * above that contain junk. This reduces leakage by avoiding computations
- * on variables that can have only a low number of possible values (e.g.,
- * just two values when reading a single bit into a variable.) See:
- * https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-alam.pdf
- */
- for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
- /* Construct bitdata s.t. the bottom bit is the bit we'd like to read.
- *
- * We could just set bitdata = recoded[bit_pos >> 5] >> (bit_pos & 0x1f)
- * but this would simply discard the bits that fall off at the bottom,
- * and thus, for example, bitdata could still have only two values if we
- * happen to shift by exactly 31 positions. We use a rotation instead,
- * which ensures that bitdata doesn't lose entropy. This relies on the
- * rotation being atomic, i.e., the compiler emitting an actual rot
- * instruction. */
- uint32_t bitdata = secp256k1_rotr32(recoded[bit_pos >> 5], bit_pos & 0x1f);
-
- /* Clear the bit at position tooth, but sssh, don't tell clang. */
- uint32_t volatile vmask = ~(1 << tooth);
- bits &= vmask;
-
- /* Write the bit into position tooth (and junk into higher bits). */
- bits ^= bitdata << tooth;
- bit_pos += COMB_SPACING;
- }
-
- /* If the top bit of bits is 1, flip them all (corresponding to looking up
- * the negated table value), and remember to negate the result in sign. */
- sign = (bits >> (COMB_TEETH - 1)) & 1;
- abs = (bits ^ -sign) & (COMB_POINTS - 1);
- VERIFY_CHECK(sign == 0 || sign == 1);
- VERIFY_CHECK(abs < COMB_POINTS);
+ uint32_t bits, sign, abs, index;
+ bits = secp256k1_ecmult_gen_lookup_bits(recoded, comb_off, block);
+ secp256k1_ecmult_gen_lookup_table_index(bits, &sign, &abs);
/** This uses a conditional move to avoid any secret data in array indexes.
* _Any_ use of secret indexes has been demonstrated to result in timing
@@ -275,38 +309,22 @@ static void secp256k1_ecmult_gen_var(secp256k1_gej *r, const secp256k1_scalar *g
uint32_t comb_off;
secp256k1_ge add;
secp256k1_scalar d;
- uint32_t recoded[(COMB_BITS + 31) >> 5] = {0};
- int i;
+ uint32_t recoded[(COMB_BITS + 31) >> 5];
/* Adjust input scalar for difference and convert to recoded array. */
secp256k1_scalar_add(&d, &secp256k1_ecmult_gen_scalar_diff, gn);
- for (i = 0; i < 8 && i < ((COMB_BITS + 31) >> 5); ++i) {
- recoded[i] = secp256k1_scalar_get_bits_limb32(&d, 32 * i, 32);
- }
+ secp256k1_ecmult_gen_scalar_to_recoded(recoded, &d);
/* Outer loop: iterate over comb_off from COMB_SPACING - 1 down to 0. */
secp256k1_gej_set_infinity(r);
comb_off = COMB_SPACING - 1;
while (1) {
uint32_t block;
- uint32_t bit_pos = comb_off;
/* Inner loop: for each block, add table entries to the result. */
for (block = 0; block < COMB_BLOCKS; ++block) {
- /* Gather the mask(block)-selected bits of d into bits. They're packed:
- * bits[tooth] = d[(block*COMB_TEETH + tooth)*COMB_SPACING + comb_off]. */
- uint32_t bits = 0, sign, abs, tooth;
- for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
- uint32_t bit = (recoded[bit_pos >> 5] >> (bit_pos & 0x1f)) & 1;
- bits |= bit << tooth;
- bit_pos += COMB_SPACING;
- }
-
- /* If the top bit of bits is 1, flip them all (corresponding to looking up
- * the negated table value), and remember to negate the result in sign. */
- sign = (bits >> (COMB_TEETH - 1)) & 1;
- abs = (bits ^ -sign) & (COMB_POINTS - 1);
- VERIFY_CHECK(sign == 0 || sign == 1);
- VERIFY_CHECK(abs < COMB_POINTS);
+ uint32_t bits, sign, abs;
+ bits = secp256k1_ecmult_gen_lookup_bits_var(recoded, comb_off, block);
+ secp256k1_ecmult_gen_lookup_table_index(bits, &sign, &abs);
/* Perform lookup, negate if necessary and add to r. */
secp256k1_ge_from_storage(&add, &secp256k1_ecmult_gen_prec_table[block][abs]);
The word "typically" is what worries me here. We don't really know what people use these functions for. What if they do have secret tweaks? And isn't the tweak secret even in known applications, e.g., Taproot tweaking, say $Q = P + t \cdot G$? One security feature of Taproot is that it hides the Merkle root until a script-path spend. This only works if the tweak $t$ is secret. If $t$ is public, then an attacker can compute $P = Q - t \cdot G$ and search for a Merkle root $m$ such that $t = \mathrm{hash}(P, m)$.
Hm good point, the "if something sounds too good to be true, it probably is" proverb comes to my mind 😅
Makes sense. For verifying taproot tweaks though, i.e.
Hmm, I would consider that a privacy feature, and not a security feature. Generally, taproot script trees will involve multiple distrusted participants, and the leaf/script structure will be shared with them. This doesn't work if the script tree is supposed to be secret from a security perspective. And I think that we're generally not aiming to use constant time operations to protect privacy. Otherwise, it literally applies to everything. You may well want to keep your public keys private, so all functions involving public keys should be constant time? It may make sense to have a mode/extension/module for "do everything constant time", but I don't think it aligns with how the library has operated so far.
…t-path spends

- fbffe8a bench: improve `VerifyNestedIfScript` benchmark precision (make stack clearing untimed) (David Gumberg)
- 616ee6f bench: add script verification benchmark for P2TR script-path spends (Sebastian Falbesoner)

Pull request description: Similarly as #34472 already did for key-path spends, this PR adds a benchmark for P2TR script-path spends. So far we don't have a benchmark on master yet that exercises the verification of taproot commitments ([`VerifyTaprootCommitment`](https://github.com/bitcoin/bitcoin/blob/141fbe4d530b51345e62dee1348e82d8a0406ffc/src/script/interpreter.cpp#L1903)). Note that the tapscript leaf intentionally includes a single OP_CHECKSIG, as it likely reflects the real world best. Spending tapscript leaves without any signature checks doesn't make much sense (they could be trivially tampered with and thus stolen by miners), and doing more than one signature check seems the exception rather than the rule. The primary motivation for this PR is to evaluate how potential secp256k1 changes in pubkey tweaking (e.g. [#1843](bitcoin-core/secp256k1#1843)) may impact script verification performance.

ACKs for top commit:
- davidgumberg: reACK fbffe8a
- l0rinc: ACK fbffe8a
- sedited: ACK fbffe8a

Tree-SHA512: 6fa1f2c336d6332b4f2d22173279ee29ad3ec5e5431109913a6978fef32e22a34d3247729ac9092bfdbbcd8f9dfcad8da50bc4ede2cb7efc1f93d6a744ddf41b
Additive public key tweaking (i.e. given a public key $P$ and a tweak scalar $t$, calculating $P' = P + t \cdot G$) is currently implemented using the "double multiply" algorithm (computing $R = na \cdot A + ng \cdot G$), with $na = 1$ and $ng = t$:
`secp256k1_ecmult` (secp256k1/src/ecmult.h, lines 43 to 47 in 7262adb)

secp256k1/src/eckey_impl.h, lines 62 to 65 in 7262adb
After some tinkering I found that this could be sped up quite a bit. To take full advantage of the fact that the tweak scalar is typically not considered sensitive data (as opposed to secret keys or nonces), this PR first introduces a variable-time generator point multiplication routine `secp256k1_ecmult_gen_var`, which is essentially a copy of `secp256k1_ecmult_gen` [1], but with all the side-channel resistance mitigations stripped out. This function is significantly faster than its constant-time original in all COMB table size configurations, over 66% in all but the smallest 2KB setting (benchmarked via `$ ./build/bin/bench_ecmult`; per-table-size timing comparison of `ecmult_gen` vs. `ecmult_gen_var` not reproduced here).

Plugging that new routine into the internal function `secp256k1_eckey_pubkey_tweak_add` (which is easy, as the vartime variant is stateless and thus doesn't need a context object anymore) yields significant speedups for the API function `secp256k1_ec_pubkey_tweak_add` (benchmarked via `$ ./build/bin/bench tweak` on the second vs. third commit; comparison of `ec_pubkey_tweak_add` using `ecmult` vs. using the new `ecmult_gen_var` not reproduced here).

Note that the improved code path (function `secp256k1_eckey_pubkey_tweak_add`) is also used in the following API functions, so they should all benefit from it:
- `secp256k1_xonly_pubkey_tweak_add`
- `secp256k1_xonly_pubkey_tweak_add_check` (this one is used for verifying P2TR script path spends in Bitcoin Core and thus consensus-critical, see BIP341)
- `secp256k1_keypair_xonly_tweak_add`
- `secp256k1_musig_pubkey_{ec,xonly}_tweak_add`
- `secp256k1_silentpayments_recipient_scan_outputs` in PR "Add "silentpayments" module implementing BIP352 (take 4, limited to full-node scanning)" #1765

Improving Silent Payments scanning performance was the original motivation to look into this (see also https://github.com/craigraw/bench_bip352/), and in fact the "common case" benchmarks show a 15-20% speedup if this branch is applied.
[1] I've also experimented with reintroducing the much simpler `ECMULT_GEN_PREC_BITS` precomputation table (which was used prior to SDMC, in versions earlier than 0.5.0), and that performs even better with PREC_BITS=8: theStack@8f7fc93 (needing only 31 point additions rather than 42). However, given that it's unclear where to store this table data (having the choice of either bloating up the library size further or again needing some form of context, if it's calculated at runtime), I decided to stick with a solution that uses the already available table data first.