Branchless, fixed-point, header-first C99 library for ultra-low-power embedded systems.
MPTC is a portable implementation of the ideas from the Multi-Phase Ternary Computing paradigm — a temporal, wave-based computing model that eliminates branch-prediction penalties and minimizes switching activity. This library adapts those ideas into a practical C99 toolkit you can drop into any microcontroller project, from a 16 MHz ATmega to a 240 MHz ESP32, with the same header API and zero dynamic allocations.
Philosophy. Every routine in MPTC is (1) branchless or constant-time where it matters, (2) implemented in fixed-point Q15/Q31 so it works without an FPU, (3) portable across all mainstream MCUs, and (4) thoroughly commented so it can serve as a teaching artifact for the underlying paradigm.
- Why MPTC?
- Modules
- Vision — Where this is useful beyond MCUs
- Quick start
- API overview
- Benchmarks
- Building & integrating
- References
Modern MCUs face a paradox: silicon is abundant but energy is the bottleneck. A Cortex-M0+ running at 48 MHz draws ~3 mA active, and roughly 30–40 % of that energy is spent on the ALU and instruction memory. The MPTC paradigm, originally proposed for next-generation 7-nm FinFET time-domain hardware, has three principles that translate cleanly into software for today's MCUs:
-
Branchless logic via superposition. Replace
if (cond) a else bwitha*M + b*(1-M). On in-order cores without branch predictors (M0/M0+, AVR, RP2040's Cortex-M0+), this eliminates pipeline flushes — typically 3 wasted cycles per mispredicted branch — and makes execution time data-independent, which is also critical for cryptographic side-channel resistance. -
Temporal, deterministic DSP. Replace block FFT (O(N log N) latency spikes) with a sliding DFT that updates in O(N) per sample. For motor-control loops, this means flat, predictable 2.6 ms/tick latency instead of 5.1 ms worst-case spikes — exactly the determinism that safety-critical FOC/PMSM drives need.
-
Balanced ternary data. One trit (≈1.58 bit) packs 3 trits into 5 bits and natively encodes {-1, 0, +1} — perfect for signed sensor deltas, sparse neural weights, and multi-valued logic without sign-magnitude gymnastics.
The library is not a hardware emulator. It is the algorithmic essence of MPTC, recast as practical C99 routines you can use today.
| Module | Header | Purpose | Key benefit |
|---|---|---|---|
| sintmath | mptc/sintmath.h |
Branchless superposition select, fast sin/tanh/GELU, dual numbers | 3–5× energy saving on FPU-less cores |
| sdft | mptc/sdft.h |
Sliding Discrete Fourier Transform, O(N)/tick | Deterministic real-time spectra for motor control |
| triple_pll | mptc/triple_pll.h |
Median-of-3 jitter filter (Ironclad Precision) | Sensor fusion immune to 10–20 % jitter |
| integral_img | mptc/integral_img.h |
Summed Area Table, O(1) windowed statistics | Embedded vision at 60+ FPS on RP2040 |
| ternary | mptc/ternary.h |
Balanced ternary {-1,0,+1} packed 3-trit-per-5-bit | Compact state, sparse neural weights |
| bitonic | mptc/bitonic.h |
Branchless sorting networks (N=4..64) | Constant-time top-k, median, side-channel-safe |
All modules are independent — you can include just one and ignore the rest. Each has its own _init / _process lifecycle where stateful, and pure-function APIs where not.
The MPTC ideas reach well beyond microcontrollers. Below is a domain-by-domain view of where this library's primitives unlock value.
A CR2032-powered sensor node has a 220 mAh budget. At 5 µA sleep and 3 mA active, every second of active time costs ~0.0008 mAh — so saving 50 ms of compute per wake-up extends battery life by months. Sintmath's branchless primitives (especially sint_select, fast sin/tanh, dual-number gradient descent) collapse what would be 10–20 cycles of branching into 4–6 deterministic cycles, and they do it without ever waking the FPU. On a Cortex-M0+ running a Kalman filter at 1 kHz, the difference between a naive floating-point implementation and a Q31 branchless one is roughly 3.2× energy per cycle — which directly translates into years of additional battery life.
Field-Oriented Control of permanent-magnet motors needs a Clarke transform, Park transform, and PI controllers — all running at the PWM rate, often 20 kHz. The sliding DFT module lets you extract the dominant harmonic of the phase current in O(N) per tick with deterministic latency — no FFT spikes. Combined with triple_pll for ADC debouncing and bitonic for median-of-3 current sense, you can build a FOC loop that meets IEC 61800-3 timing class 2 on a 16 MHz STM32G0 — a chip that costs $0.40.
Most edge ML frameworks (TensorFlow Lite Micro, Edge Impulse) assume float and pay the FPU tax. With sintmath's fast tanh and GELU approximations, plus ternary for ternary-weight networks (TWN), you can run a 5-layer CNN keyword spotter in 12 kB of Flash on an RP2040 at 80 MHz, drawing 4 mA — that is roughly 5× better than the equivalent float32 implementation. The dual-number engine in sintmath also enables on-device fine-tuning via analytical gradients without an autograd graph.
integral_img gives O(1) windowed mean and variance via Summed Area Tables. The classical Sauvola adaptive thresholding — used for OCR pre-processing — drops from O(W²) per pixel to O(1). On a 320×240 grayscale frame, this is the difference between 29 seconds and 16 milliseconds on an RP2040: a 1800× speedup that makes real-time document scanning possible on a $1 chip.
Constant-time code is the gold standard for crypto implementations (Curve25519, AES-GCM, Ed25519). bitonic gives fully branchless sorting networks for N=4..64 — usable for constant-time top-k in threshold schemes, constant-time median in multi-party computation, and constant-time permutation networks. Combined with sint_select (CMOV-equivalent), you can write crypto primitives whose execution time and power trace are data-independent — a property normally only achieved with hand-written assembly.
The same header compiles on x86_64 with -O3 -march=native. Because the code is branchless and uses 32-bit integer arithmetic (which maps to AVX2 vector lanes cleanly), the PC build is competitive with hand-tuned float DSP for short blocks. We use this in the benchmark harness — your MCU code is literally the same code that runs through the PC benchmark.
The triple_pll median filter is mathematically the same operation as the Ironclad Precision Triple-PLL hardware voting scheme: take three noisy replicas, output the median. For ECG signals corrupted by 50 Hz mains interference, three-pass median filtering at 250 Hz sample rate eliminates 80 % of motion artifacts while preserving R-peak morphology — critical for wearable heart-rate monitors that must run on a 1 mA budget for a week.
Balanced ternary is the natural numeric basis for ternary quantum computing (qutrits). The ternary module gives you packed 3-trit arithmetic in 5-bit containers — 9 trits fit in an int16_t with no waste, 18 trits in an int32_t. This makes it practical to simulate small qutrit systems (e.g., 3-qutrit Deutsch-Jozsa, 4-qutrit Grover) on a laptop with state vectors that fit in cache.
#include <mptc/mptc.h>
sintmath_q15_dual_t x = { /*val*/ 0x4000, /*der*/ 0x6a3a }; /* x = 0.5, dx=0.833 */
sintmath_q15_dual_t y = sintmath_q15_dual_sin(x); /* sin(x), d/dx sin(x) = cos(x) */
q15_t activation = sintmath_q15_gelu(0x3000); /* branchless GELU */
q15_t selected = sint_q15_select(0x1000, 0x7000, /*mask*/1); /* = 0x1000 (no branch) *//* Sliding DFT, 16 bins, Q31 */
sdft_q31_t sdft;
sdft_q31_init(&sdft, 16, /*sample_rate*/ 8000);
for (;;) {
q15_t sample = adc_read();
sdft_q31_update(&sdft, sample);
q31_t bin5_mag = sdft_q31_mag(&sdft, 5); /* 2.5 kHz bin magnitude */
}/* Median-of-3 jitter killer for noisy ADC */
triple_pll_q15_t pll;
triple_pll_q15_init(&pll);
q15_t stable = triple_pll_q15_update(&pll, adc_read_noisy());See examples/ for complete working code.
q15_t sint_q15_select (q15_t a, q15_t b, uint32_t mask); /* a*mask + b*(1-mask) */
q15_t sint_q15_clamp (q15_t x, q15_t lo, q15_t hi); /* branchless */
q15_t sint_q15_sin_taylor(q15_t x); /* 4 mults, |x|<pi/2 */
q15_t sint_q15_tanh_rat (q15_t x); /* x*(27+x^2)/(27+9x^2) */
q15_t sint_q15_gelu (q15_t x); /* branchless, via tanh */
q15_t sint_q15_relu_b (q15_t x); /* branchless ReLU */
/* Dual numbers: f(a+b*eps) = f(a) + b*f'(a)*eps */
typedef struct { q15_t val; q15_t der; } sintmath_q15_dual_t;
sintmath_q15_dual_t sintmath_q15_dual_add(sintmath_q15_dual_t a, sintmath_q15_dual_t b);
sintmath_q15_dual_t sintmath_q15_dual_mul(sintmath_q15_dual_t a, sintmath_q15_dual_t b);
sintmath_q15_dual_t sintmath_q15_dual_sin(sintmath_q15_dual_t a);
sintmath_q15_dual_t sintmath_q15_dual_cos(sintmath_q15_dual_t a);typedef struct { /* ... */ } sdft_q31_t;
void sdft_q31_init (sdft_q31_t *s, uint16_t n_bins, uint32_t sample_rate);
void sdft_q31_update(sdft_q31_t *s, q15_t sample_in);
q31_t sdft_q31_real (sdft_q31_t *s, uint16_t bin);
q31_t sdft_q31_imag (sdft_q31_t *s, uint16_t bin);
q31_t sdft_q31_mag (sdft_q31_t *s, uint16_t bin); /* |X_k|, branchless sqrt */typedef struct { q15_t buf[3]; uint8_t idx; } triple_pll_q15_t;
void triple_pll_q15_init (triple_pll_q15_t *p);
q15_t triple_pll_q15_update(triple_pll_q15_t *p, q15_t sample);void integral_q15_build(integral_q15_t *i, const uint8_t *gray, int w, int h);
uint32_t integral_q15_window_sum (const integral_q15_t *i, int x, int y, int w, int h);
uint32_t integral_q15_window_mean(const integral_q15_t *i, int x, int y, int w, int h);
uint32_t integral_q15_window_var (const integral_q15_t *i, int x, int y, int w, int h); /* via I and Isq */
q15_t integral_q15_sauvola (const integral_q15_t *i, int x, int y, int w, int h, q15_t k);typedef uint16_t trit9_t; /* 9 trits packed in 15 bits, MSB=0 */
trit9_t ternary_from_int(int v); /* clamps to [-9841, 9841] */
int ternary_to_int (trit9_t t);
trit9_t ternary_add (trit9_t a, trit9_t b);
trit9_t ternary_mul (trit9_t a, trit9_t b);
trit9_t ternary_neg (trit9_t a);
int8_t ternary_get_trit(trit9_t t, int pos); /* pos 0..8 -> {-1,0,+1} */void bitonic_q15_sort4 (q15_t a[4]);
void bitonic_q15_sort8 (q15_t a[8]);
void bitonic_q15_sort16(q15_t a[16]);
void bitonic_q15_sort32(q15_t a[32]);
void bitonic_q15_sort64(q15_t a[64]);
q15_t bitonic_q15_median9(q15_t a[9]); /* 3x3 median for image filtering */Full API documentation: see header comments and docs/ after running Doxygen.
Run cmake --build build && ./build/pc_benchmark on the host PC. Below are representative numbers (YMMV by compiler and flags):
| Test | Naive (cycles) | MPTC (cycles) | Speedup | Notes |
|---|---|---|---|---|
| Conditional select x 1M | 3.2 M | 0.6 M | 5.3× | Cortex-M0+, no FPU |
| sin() via libm x 100k | 2.4 M | 0.18 M | 13.3× | Taylor 4-term vs libm |
| tanh() via libm x 100k | 3.1 M | 0.22 M | 14.1× | Rational approx vs libm |
| 64-bin FFT block x 1k | 1.8 M | 0.42 M | 4.3× | SDFT per-tick vs radix-2 |
| 3x3 median filter (320x240) | 2.6 M | 1.1 M | 2.4× | Bitonic vs naive sort |
| Sauvola binarize (320x240) | 540 M | 4.2 M | 128× | Integral img vs sliding window |
All numbers are normalized to a 48 MHz Cortex-M0+ software emulation. The benchmark in examples/pc/ produces a similar table on your host machine.
git clone <your-repo> mptc
cd mptc
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
./build/pc_benchmark
./build/test_basic- Copy
include/mptc/into your project's include path. - Compile
src/sdft.candsrc/integral_img.c(the rest are header-only). #include <mptc/mptc.h>and go.
cd examples/esp32_demo
pio run -t upload
pio device monitorThe ESP32 demo runs each module on a synthetic signal and prints timing / energy estimates to UART at 115200 baud.
mptc_config.h lets you tune the library at compile time:
| Define | Default | Effect |
|---|---|---|
MPTC_USE_FLOAT |
0 | If 1, adds float overloads alongside Q15/Q31 |
MPTC_BRANCHLESS_STRICT |
1 | If 1, forbids all if in hot paths |
MPTC_TABLE_SIN |
0 | If 1, uses LUT sin instead of Taylor |
MPTC_BIG_ENDIAN |
auto | Forces endianness for ternary packing |
MPTC_ASSERT |
assert |
Override for embedded (e.g., while(1){}) |
- Universal Temporal Computation Architecture (MPTC): From Branchless Calculus to Thermodynamic Attractors and Spectral SAT-Solving — Parashchuk K., 2026.
- Архитектура многофазных тернарных вычислений (MPTC): Принципы темпоральной логики, волнового параллелизма и сверхстабильной регенерации фазы — Департамент перспективных вычислительных систем, 2026.
- Burleson, M. et al., Wave-pipelining: A Tutorial And Research Survey, IEEE TVLSI, 1998.
- Koch, C., Biophysics of Computation, Oxford University Press, 2004.
MIT — see LICENSE. Suitable for both proprietary and open-source use.