enh(Foundation): optimize numeric conversions with digit-pairs algorithm and std::to_chars/from_chars#5301
enh(Foundation): optimize numeric conversions with digit-pairs algorithm and std::to_chars/from_chars#5301
Conversation
1ae9e17 to
2c8cfd7
Compare
aleks-f
left a comment
There was a problem hiding this comment.
given the tradeoff, it seems that the best approach is hybrid - for <1000 the old way, >1000 the new one. This is essentially the same pattern used by small string optimization or SpinlockMutex
POCO Numeric Conversion Benchmark ReportComparison of
NumberFormatter — Integer Formatting
vs standard library baselines (current branch)
POCO is 2.5-4x faster than snprintf for all integer formatting options. NumberFormatter — Float/Double Formatting
vs standard library baselines (current branch)
POCO is 2-4x faster than snprintf for all float formatting options. NumberParser — Integer Parsing
vs standard library baselines (current branch)
POCO matches or beats NumberParser — Float/Double Parsing
vs standard library baselines (current branch)
macOS: POCO is 1.9x slower than raw SummaryImprovements vs main
Regressions vs main
Slower than standard library baselines
Root causes:
vs standard library (current branch)
Optimization opportunitiesformat(double, prec=2) regression (0.6x on macOS):
tryParseFloat vs strtod (1.9x slower on macOS):
format(int) vs std::to_string (0.7x slower): parse(small dec) vs strtol (0.6-0.8x slower): Techniques used
|
e10d9e8 to
4d7619f
Compare
I pushed (almost) completely rewritten formatting/parsing. Benchmarks also indicate that in many cases it does (or would make sense to) use standard library-provided implementations. |
1991226 to
64678a7
Compare
605bd9e to
8b97c51
Compare
There was a problem hiding this comment.
Pull request overview
This PR modernizes and speeds up Foundation numeric string conversions by rewriting NumericString integer/float parsing/formatting to use a digit-pairs algorithm and std::to_chars/from_chars when available, while updating NumberFormatter/NumberParser to use the new internal APIs and refreshing conversion benchmarks/tests.
Changes:
- Reworked integer formatting/parsing (
intToStr/strToInt) with digit-pairs +std::from_charsfast path and a thousand-separator slow path. - Added float/double
std::to_chars/from_charsimplementations behindPOCO_HAS_FLOAT_CHARCONVwith double-conversion fallback. - Updated
NumberFormatter/NumberParsercall sites and rewrote conversion benchmarks/tests accordingly.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| Foundation/testsuite/src/StringTest.h | Removes direct NumericString/stream parsing helpers; adds benchmarkIntToStr() declaration. |
| Foundation/testsuite/src/StringTest.cpp | Rewrites conversion benchmarks to use NumberFormatter/NumberParser APIs and adds validation/baselines. |
| Foundation/testsuite/src/NumberParserTest.cpp | Adjusts whitespace-related expectation in parse error tests. |
| Foundation/src/NumericString.cpp | Implements charconv-based float/double conversions and refactors padding/format helpers. |
| Foundation/src/NumberParser.cpp | Routes parsing through updated strToInt overloads (including range-based hex paths). |
| Foundation/src/NumberFormatter.cpp | Checks intToStr return value before appending to output strings. |
| Foundation/include/Poco/NumericString.h | Makes header “internal-only”, rewrites strToInt/intToStr, and refactors helper templates/constants. |
| Foundation/include/Poco/NumberFormatter.h | Includes NumericString.h via private-include guard and handles intToStr failure. |
| Foundation/include/Poco/Config.h | Adds POCO_HAS_FLOAT_CHARCONV feature detection for float to_chars/from_chars. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Response to review comments
All other review findings have been addressed in the latest push. |
41695b7 to
ddff866
Compare
I'm not sure what is considered "public API surface", but these functions certainly were visible and accessible to users so far. We use strToInt directly and there may be other users using it. Perhaps it could have been better thought initially, but at this point I don't see a rationale for hiding it like this - if someone wants to use it directly, there is no harm. |
Mark as deprecated if used from outside the library? |
I wouldn't. There are some utility functions there that may come handy, like safeMultiply, locale etc. Maybe we could even add string param to decimal and thousand separator functions, defaulting to empty for global locale. We needed that recently and I forgot it was there and recommended the std atrocity. |
|
Removed private guard. |
a50cef0 to
e420b4d
Compare
…hars Rewrite intToStr as inline template in NumericString.h with two paths: - Fast path (no padding/prefix/thSep): converts directly into the result buffer using std::to_chars (decimal) or shift-and-mask (hex). Fully inlined, no PLT calls. - Slow path: converts to temp buffer then calls formatIntResult() in Foundation for padding, prefix, case, and thousand separators. Move NumberFormatter::format/formatHex back to inline in the header so they benefit from the inlined intToStr fast path. Make NumericString.h a private header (include guard enforced via POCO_NUMERIC_STRING_PRIVATE). Add NumberFormatter/intToStr benchmark.
…d tryParseFloat allocation - isSafeIntCast: change return type from T& to bool (was UB) - strToInt: reject bare "+" or "-" without digits (returned true with 0) - tryParseFloat: pass std::string directly to strToDouble instead of s.c_str() which caused needless const char* -> std::string conversion
Use std::from_chars as the primary parsing path for strToInt. For inputs without thousand separators (the common case), from_chars handles sign, overflow, and all bases directly — no manual per-digit loop needed. When from_chars stops early on a non-digit character and a thousand separator is configured (base 10 only), strip separator characters and retry from_chars on the cleaned string. Also fix: use static_cast<unsigned char> for std::isspace to avoid UB on signed char with values > 0x7F.
Test specific formatting scenarios (padding, prefix, hex, octal, thousand separator, binary, edge values) across int/uint/int64/uint64, plus NumberFormatter throughput with random data.
Change all NumberParser::tryParse* calls from strToInt(s.c_str(), ...) to strToInt(s, ...) or strToInt(begin, end, ...) to use the known string length directly, avoiding a redundant strlen scan.
…numbers Remove per-case constant-value intToStr benchmark (compiler can constant-fold, not representative of real usage). Add random small numbers (0-999) to both intToStr and strToInt benchmarks. Use std::string overload in strToInt benchmarks to avoid strlen.
Replace std::to_chars-based intToStr with a direct implementation using the "digit pairs" technique: extract two decimal digits per iteration via division by 100 and a 200-byte lookup table, halving the number of expensive integer divisions. - Base 10: two-digits-at-a-time with integrated thousand separator support - Power-of-2 bases (2, 4, 8, 16): shift-and-mask, no division - Other bases (3, 5, 6, 7, 9, 11-15): division with digit table lookup - Write directly into result buffer, reverse in-place (no temp buffer) Benchmarks (1M random values, macOS Apple Clang): - small int (0-999) dec: 2.1x faster than std::to_chars - int hex: 1.9x faster - Int64 hex: 1.4x faster - int dec / Int64 dec: ~same or slightly faster
- Guard width parameter against buffer overflow (throw RangeException early)
- Replace unreachable default with poco_bugcheck() in power-of-2 switch
- Use if constexpr in isIntOverflow and safeMultiply (C++17 consistency)
- Merge split namespace Impl blocks
- Remove unused <algorithm> include
- Fix misleading whitespace test: parse(" 123") now correctly succeeds
…ars for floats Use std::to_chars/from_chars for float/double formatting and parsing when available (GCC 11+, MSVC 19.24+, non-Apple libc++ 17+, Apple Clang with macOS 26.0+ deployment target). Falls back to bundled double-conversion library on older compilers. - Add POCO_HAS_FLOAT_CHARCONV feature detection macro in Config.h - Conditional compilation in NumericString.cpp - toShortestStr helper respects lowDec/highDec for notation selection - strToFloatImpl handles inf/nan matching manually - std::to_chars uses round-half-to-even (IEEE 754) vs round-half-up
…anup - Fix UB in toShortestStr for NaN/infinity (log10 of non-finite values) - Fix log10 rounding near powers of 10 with power-of-10 verification - Add missing empty-string guard to strToFloat(const std::string&) - Deduplicate float/double string overloads via template helpers - Simplify pad(): replace unique_ptr<string> with plain string - Remove obsolete MSVC #pragma warning(disable: 4146) - Move <cmath> from header to .cpp, remove unused <memory> - Fix POCO_MAX_INT_STRING_LEN comment - Add [[nodiscard]] to strToInt, intToStr, safeMultiply - Add comment documenting 'f' suffix stripping intent
- Use std::to_chars(fixed) + adjustPrecision instead of slow std::to_chars(fixed, precision) for floatToFixedStr/doubleToFixedStr - Fix adjustPrecision carry propagation for negative numbers - Merge floatToStrImpl/floatToFixedStrImpl into floatToStrCommon - Add const to all non-mutated variables in NumericString.h and .cpp - Rewrite benchmarks to use NumberFormatter/NumberParser public API with baselines (std::to_string, snprintf, strtol, strtod, sscanf) - Add round-trip correctness validation to all benchmark sections - Remove unused parseStream helper and MemoryStream dependency from test header
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
- Restore Foundation_API on float/double functions in NumericString.h (inline NumberFormatter methods need exported symbols in shared lib) - Fix POCO_HAS_FLOAT_CHARCONV: raise libc++ threshold from 170000 to 200000 (libc++ 17-19 have float from_chars overloads deleted; only libc++ 20+ / LLVM 20 supports them — fixes Android NDK r28) - Fix [[nodiscard]] warnings: use intToStr return value to guard str.append in NumberFormatter.cpp and NumberFormatter.h - Update double-conversion comment to mention POCO_HAS_FLOAT_CHARCONV
- Fix POCO_HAS_FLOAT_CHARCONV: use __APPLE__ instead of __apple_build_version__ to handle Homebrew LLVM which uses system libc++ with Apple availability annotations - Fix libc++ version threshold: 170000 -> 200000 (libc++ 17-19 have float from_chars overloads deleted, only libc++ 20+ supports them) - Fix from_chars underflow: fall back to strtod on result_out_of_range (GCC leaves result unmodified on range error); fixes JSON parsing of extreme values like "123e-10000000" - Fix benchmark validation on Windows: use strtoul for unsigned comparison (strtol overflows 32-bit long for values > INT_MAX) - Fix [[nodiscard]] warnings: use intToStr return value in NumberFormatter.cpp/.h to guard str.append - Replace non-ASCII characters in comments with ASCII equivalents - Update double-conversion comment to mention POCO_HAS_FLOAT_CHARCONV
- Swap template parameters from <F, T> to <To, From> matching isIntOverflow<To, From>. Allows isSafeIntCast<TargetType>(value) with From deduced from the argument. - Use consistent naming (To/From) across all cast functions. - Return type was already fixed from T& to bool (the old T& returning bool was UB); [[nodiscard]] ensures callers check the result. - Remove unused isSafeIntCast using declaration from StringTest.
Following PR review feedback, NumericString.h is no longer guarded against direct inclusion. The header remains an internal implementation detail of NumberFormatter/NumberParser, but removing the #error guard allows flexibility for downstream code and tests.
e420b4d to
61cdc0e
Compare
Summary
Rewrites integer and floating-point conversion in
NumericString.h/.cppforsignificantly improved performance, using modern C++17/C++20 techniques while
maintaining full API compatibility.
Integer formatting (
intToStr)Based on the implementation suggested by @aleks-f,
replaces the old division-based loop with the two-digits-at-a-time algorithm
using a 200-byte digit-pairs lookup table (as used by {fmt}, jemalloc, Facebook Folly).
Shift-and-mask for hex and power-of-2 bases. Writes directly into the result buffer
in reverse, then reverses in-place -- no intermediate buffer copy.
Integer parsing (
strToInt)Replaced the manual digit loop with
std::from_chars(C++17). Range-based(begin, end)core overload avoidsstrlen. Thousand-separator slow path onlytriggers when
from_charsstops at a non-digit character.Float/double formatting (
floatToStr,doubleToStr)When
POCO_HAS_FLOAT_CHARCONVis defined (GCC 11+, MSVC 19.24+, non-Applelibc++ 20+, or Apple Clang with macOS 26.0+ deployment target), uses
std::to_charsfor shortest representation andstd::to_chars(fixed) + adjustPrecisionfor fixed-precision formatting. Falls back to bundleddouble-conversion library on older compilers.
To enable on Apple Clang, set the deployment target:
```bash
cmake -B build -DCMAKE_OSX_DEPLOYMENT_TARGET=26.0
```
Caveat:
std::to_chars(fixed)+ manual precision adjustment is slower thandouble-conversion's integrated
ToFixedfor fixed-precision formatting on macOSApple Clang (0.6x for
format(double, prec=2)). This is a tradeoff: thestd::to_charspath eliminates ~6800 lines of bundled double-conversion dependencywhen the feature is available. As stdlib implementations improve, this gap should
close. The double-conversion fallback remains available for older compilers where
it is the only option.
Float/double parsing (
strToFloat,strToDouble)When
POCO_HAS_FLOAT_CHARCONVis defined, usesstd::from_charswith manualinf/nan handling (matching double-conversion's exact-match semantics).
Falls back to
strtodfor out-of-range values (extreme underflow/overflow)where
from_charsreportserrc::result_out_of_range.Other changes
POCO_HAS_FLOAT_CHARCONVfeature detection macro inConfig.hNumericString.hremainFoundation_APIexported(required by inline
NumberFormattermethods compiled into other libraries)if constexprinisIntOverflowandsafeMultiply(C++17 consistency)[[nodiscard]]onstrToInt,intToStr,safeMultiplyintToStrusessizeparameter as buffer capacity for bounds checkingpoco_bugcheck()in unreachable switch default#pragma warning(disable: 4146)<cmath>from header to .cpp, removed unused<memory>tryParseHex/tryParseHex64now skip leading whitespace before 0x prefixNumberFormatter/NumberParserpublic API withbaselines (std::to_string, snprintf, strtol, strtod, sscanf) and round-trip
correctness validation
Performance (5M random values, macOS Apple Clang arm64 / Linux GCC arm64)
NumberFormatter -- Integer
NumberFormatter -- Float/Double
NumberParser -- Integer
NumberParser -- Float/Double
vs standard library
Known regressions
format(double, prec=2)on macOS: 0.6x vs main. Root cause:std::to_chars(fixed)+adjustPrecisionis slower than double-conversion'sToFixed. This is the cost ofeliminating the double-conversion dependency when
POCO_HAS_FLOAT_CHARCONVis available.On compilers without charconv support, the double-conversion path is unchanged.
Test plan