Micro-Transformer PT-BR (WASM) — from scratch to the browser

Fully client-side text autocomplete for Brazilian Portuguese: a custom BPE tokenizer, a small causal Transformer trained in PyTorch, INT8 quantization, and in-browser inference via WebAssembly (Rust). A good portfolio piece: it covers NLP, model compression, front-end engineering, and low-level systems work.

Demo: run locally (steps below) or deploy to S3 + CloudFront to serve it statically.

✨ Highlights

From scratch: your own BPE tokenizer, minimal Transformer, quantization, and WASM runtime.
Browser-only: no server at inference time, so the per-request cost is $0.
PT-BR first: vocabulary and corpus tailored for Brazilian Portuguese.
Real stack: Python (training), Rust→WASM (ops), Angular (UI).

🧱 Architecture

data/                 # raw and cleaned data
model/                # tokenizer/model training, quantization and export
wasm/                 # Rust → WebAssembly core (matmul/softmax)
web/angular/          # Angular UI + TypeScript runtime

Flow: data → tokenizer → training → quantization → export .npz → WASM → UI

🔧 Prerequisites

Python 3.10+ (3.11 recommended)
PyTorch (CUDA optional for faster training)
Rust + wasm-pack (cargo install wasm-pack or curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh)
Node.js 18+ (20 recommended) and Angular CLI (npm i -g @angular/cli)

🚀 Get started in 10 minutes

Clone the repository and cd into it.
Prepare data (add .txt files to data/raw/):

python model/clean_texts.py
python model/train_bpe.py

Train the model (tweak steps/batch in model/train.py for a quick run):

python model/train.py

Quantize and export weights for the browser:

python model/quantize.py

This generates web/angular/public/weights.npz (INT8 + scales). vocab.json and merges.txt also live in web/angular/public/.
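
As a rough illustration of what "INT8 + scales" can mean, here is a minimal sketch of symmetric per-tensor quantization. It is not taken from model/quantize.py: the function names and the `_q` / `_scale` key suffixes are assumptions made for this example, and the actual export layout may differ.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, np.float32]:
    """Symmetric per-tensor quantization: w is approximated by q * scale, q in [-127, 127]."""
    max_abs = float(np.max(np.abs(w)))
    # Guard against an all-zero tensor, where any scale works.
    scale = np.float32(max_abs / 127.0) if max_abs > 0 else np.float32(1.0)
    q = np.clip(np.rint(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: np.float32) -> np.ndarray:
    """Recover an approximate float32 tensor from INT8 values and their scale."""
    return q.astype(np.float32) * scale

def export_npz(path, tensors: dict[str, np.ndarray]) -> None:
    """Store each tensor as '<name>_q' (int8) and '<name>_scale' (float32).

    The key naming is a hypothetical convention for this sketch.
    """
    arrays = {}
    for name, w in tensors.items():
        q, scale = quantize_int8(w.astype(np.float32))
        arrays[f"{name}_q"] = q
        arrays[f"{name}_scale"] = np.array(scale, dtype=np.float32)
    np.savez(path, **arrays)
```

With this scheme each weight costs one byte plus a single float32 scale per tensor, and the rounding error per weight is at most half a scale step. Per-channel scales (one per output row) usually reduce that error at the cost of a slightly larger file.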

Build the WASM core:

cd wasm
wasm-pack build --target web --release
mkdir -p ../web/angular/public/wasm
cp -r pkg/* ../web/angular/public/wasm/

Run the Angular UI:

cd ../web/angular
npm install
ng serve

Open http://localhost:4200 and try the playground.

🧠 Model configuration (suggested)

vocab_size: 12,000
n_layers: 6, n_heads: 6, d_model: 384, d_ff: 1536
seq_len: 256 for the initial training
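Approx. size: about 15M parameters with tied input/output embeddings, or about 20M untied, assuming standard GPT-style blocks (6 blocks ≈ 10.6M; each 12,000 × 384 embedding matrix ≈ 4.6M)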

You can shrink dimensions for older phones or increase them for better quality on desktops.
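
To see how a change in dimensions affects model size, the parameter count can be estimated directly from the configuration. The sketch below assumes a standard GPT-style block (Q/K/V/output projections with biases, a two-layer FFN with biases, two LayerNorms per block, learned positional embeddings, and a final LayerNorm); the real model/transformer.py may differ in details such as biases or weight tying. `count_params` is a hypothetical helper, not part of the repository.

```python
def count_params(vocab_size: int, d_model: int, n_layers: int,
                 d_ff: int, seq_len: int, tied: bool = True) -> int:
    """Estimate the parameter count of a GPT-style decoder (assumed layout)."""
    attn = 4 * (d_model * d_model + d_model)            # Q, K, V, output projections + biases
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    norms = 2 * 2 * d_model                             # two LayerNorms (gain + bias) per block
    block = attn + ffn + norms
    tok_emb = vocab_size * d_model
    pos_emb = seq_len * d_model                         # learned positions (assumption)
    final_norm = 2 * d_model
    head = 0 if tied else vocab_size * d_model          # untied output projection, no bias
    return n_layers * block + tok_emb + pos_emb + final_norm + head

cfg = dict(vocab_size=12_000, d_model=384, n_layers=6, d_ff=1536, seq_len=256)
print(count_params(**cfg, tied=True))    # 15353856
print(count_params(**cfg, tied=False))   # 19961856
```

Note that n_heads does not change the count: the heads split d_model rather than add to it. Under these assumptions the embedding matrix is a large share of the total, so reducing vocab_size is an effective way to shrink the download.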

📦 Folder structure (detailed)

micro-transformer-ptbr/
  data/
    raw/                   # put your .txt files here
    clean/                 # produced by the cleaning step
    tokenizer/
      vocab.json           # produced by BPE training
      merges.txt           # produced by BPE training
  model/
    clean_texts.py         # basic cleaning/normalization
    tokenizer_bpe.py       # custom BPE (train/encode/decode)
    train_bpe.py           # BPE training script
    transformer.py         # minimal TinyGPT (PyTorch)
    train.py               # training loop
    quantize.py            # INT8 + export .npz for the browser
  wasm/
    Cargo.toml
    src/lib.rs             # matmul and softmax via wasm-bindgen
    pkg/                   # generated by wasm-pack (copy to web/angular/public/wasm)
  web/
    angular/
      src/app/services/
        tokenizer.ts       # TypeScript BPE, compatible with the Python tokenizer
        model-runner.service.ts
      src/app/components/playground/
        playground.component.*
      public/
        weights.npz        # quantized weights
        vocab.json
        merges.txt
        wasm/              # generated wasm artifacts
      package.json

🧪 PyTorch ↔ Browser validation

To ensure numerical parity:

Run a forward pass in Python with batch=1 and a short sequence, and save the intermediate activations to an .npz file (e.g., emb_out, attn_scores, ffn_out, logits).
Feed the same input through the browser runtime and check that the maximum absolute difference satisfies |a−b| < 1e-3 for each tensor.
Common pitfalls: matrix layout (row-major vs. column-major), per-head reshapes, and quantization scales.

Tip: validate just 1 block first, then stack all blocks.
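
The comparison step can be sketched as a small helper that reports the maximum absolute difference per tensor. This is only an illustration: `compare_activations` is a hypothetical name, and it assumes the browser-side activations have already been brought into Python as arrays keyed by the same names as the reference dump.

```python
import numpy as np

def compare_activations(reference, candidate, atol: float = 1e-3):
    """Compare named tensors and return (report, failures).

    report maps each tensor name to its maximum absolute difference;
    failures lists the names whose difference is not below atol.
    Both arguments are dict-like (plain dicts, or archives from np.load).
    """
    report = {}
    for name in reference:
        if name not in candidate:
            raise KeyError(f"missing tensor in candidate: {name}")
        a = np.asarray(reference[name], dtype=np.float64)
        b = np.asarray(candidate[name], dtype=np.float64)
        if a.shape != b.shape:
            # Shape mismatches often point to a transposed or mis-reshaped tensor.
            raise ValueError(f"{name}: shape {a.shape} vs {b.shape}")
        report[name] = float(np.max(np.abs(a - b)))
    failures = [name for name, diff in report.items() if diff >= atol]
    return report, failures
```

Checking tensors in forward order (embeddings first, logits last) makes it easier to locate the first layer where the two implementations diverge.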

📈 Suggested metrics

Perplexity on a PT-BR validation set.
Latency (ms/token) on desktop vs. mobile.
Artifact sizes: .wasm, weights.npz, vocab.json.
Throughput (tokens/s) across devices.

Add a small table in your fork’s README with real results.
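
For reference, perplexity and throughput both reduce to short formulas. The helpers below are a minimal sketch with hypothetical names; they assume you already have the natural-log probability the model assigned to each validation token, and a measured latency in milliseconds per token.

```python
import math

def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

def tokens_per_second(ms_per_token: float) -> float:
    """Convert latency (ms/token) into throughput (tokens/s)."""
    if ms_per_token <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / ms_per_token
```

As a sanity check, a model that assigns probability 0.25 to every token has a perplexity of 4, the same as guessing uniformly among four options.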

🚢 Deploy (S3 + CloudFront)

Build the UI: ng build --configuration production
Upload dist/ and web/angular/public/* to a static S3 bucket.
Publish via CloudFront with OAC and short TTL for weights.npz.
Serve .wasm with Content-Type: application/wasm and enable gzip/brotli on assets.

🛠️ Troubleshooting

WASM won’t load → check Content-Type: application/wasm and the /public/wasm/* path.
“Stuck-together” text → run an encode → decode round-trip test on the BPE tokenizer before training; adjust the handling of the </w> end-of-word marker.
Slow on mobile → reduce n_layers/d_model/vocab_size, use smaller topK (e.g., 20), and limit maxNewTokens.
Logit mismatch → verify INT8 scales and Q/K/V reshapes across heads.
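
The round-trip test mentioned above can be sketched with a tiny BPE that marks word ends with </w>. This is not the code from model/tokenizer_bpe.py: the function names and the merge list are made up for illustration, and the sketch assumes whitespace-separated words and input that never contains the literal string "</w>".

```python
def bpe_encode_word(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply merges in priority order to one word; the last symbol carries </w>."""
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    ranks = {pair: i for i, pair in enumerate(merges)}
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

def encode(text: str, merges: list[tuple[str, str]]) -> list[str]:
    tokens: list[str] = []
    for word in text.split():
        tokens.extend(bpe_encode_word(word, merges))
    return tokens

def decode(tokens: list[str]) -> str:
    # Each </w> marks a word boundary, so it becomes a space on decode.
    return "".join(tokens).replace("</w>", " ").strip()

merges = [("a", "s</w>"), ("c", "as</w>"), ("n", "ã"), ("nã", "o</w>")]
text = "casas  não são"
assert decode(encode(text, merges)) == " ".join(text.split())
```

If decode drops or misplaces the </w> marker, words come out glued together, which is the symptom described above. The round trip is exact only up to whitespace normalization, since runs of spaces collapse to one.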

🧭 Suggested roadmap

Implement full parity in model-runner.service.ts (LN, attn, FFN, head)
KV cache for token-by-token generation
Per-channel quantization and SIMD
WebGPU port and benchmark vs WASM
Larger model with pruning/knowledge distillation
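
To illustrate the KV cache item, here is a minimal single-head sketch in NumPy showing that attending over cached keys and values one token at a time gives the same result as recomputing full causal attention. It is a reference for the idea only, not the planned TypeScript/WASM implementation; the class and function names are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def causal_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Full recomputation over a sequence; q, k, v have shape (T, d)."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)   # positions after the query
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v

class KVCache:
    """Stores past keys/values so each new token only attends, never recomputes."""

    def __init__(self, d: int):
        self.k = np.empty((0, d))
        self.v = np.empty((0, d))

    def step(self, q_t: np.ndarray, k_t: np.ndarray, v_t: np.ndarray) -> np.ndarray:
        # Append the new key/value, then attend over everything seen so far.
        self.k = np.vstack([self.k, k_t[None, :]])
        self.v = np.vstack([self.v, v_t[None, :]])
        scores = self.k @ q_t / np.sqrt(q_t.shape[0])
        return softmax(scores) @ self.v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((5, 4)) for _ in range(3))
cache = KVCache(d=4)
stepwise = np.stack([cache.step(q[t], k[t], v[t]) for t in range(5)])
assert np.allclose(stepwise, causal_attention(q, k, v))
```

The cached version does O(T) work per new token instead of O(T²) for the whole prefix. A production runtime would preallocate the cache up to seq_len rather than growing it with vstack.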

🙌 Credits

Implementation and engineering by Joseph Alexanndry. Conceptual inspirations: the original Transformer paper, client-side quantization work, and the Rust/WASM community.
