imsupeer/microTransformerptBR

Micro-Transformer PT-BR (WASM) — from scratch to the browser

Fully client-side text autocomplete for Brazilian Portuguese: a custom BPE tokenizer, a small causal Transformer trained in PyTorch, INT8 quantization, and in-browser inference via WebAssembly (Rust). A good portfolio piece: it showcases NLP, model compression, front-end engineering, and low-level skills.

Demo: run locally (steps below) or deploy to S3 + CloudFront to serve it statically.


✨ Highlights

  • From scratch: your own BPE tokenizer, minimal Transformer, quantization, and WASM runtime.
  • Browser-only: no server at inference time, so the serving cost per request is $0.
  • PT-BR first: vocabulary and corpus tailored for Brazilian Portuguese.
  • Real stack: Python (training), Rust→WASM (ops), Angular (UI).

🧱 Architecture

data/                 # raw and cleaned data
model/                # tokenizer/model training, quantization and export
wasm/                 # Rust → WebAssembly core (matmul/softmax)
web/angular/          # Angular UI + TypeScript runtime

Flow: data → tokenizer → training → quantization → export .npz → WASM → UI


🔧 Prerequisites

  • Python 3.10+ (3.11 recommended)
  • PyTorch (CUDA optional for faster training)
  • Rust + wasm-pack (cargo install wasm-pack or curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh)
  • Node.js 18+ (20 recommended) and Angular CLI (npm i -g @angular/cli)

🚀 Get started in 10 minutes

  1. Clone the repository and cd into it.

  2. Prepare data (add .txt files to data/raw/):

python model/clean_texts.py
python model/train_bpe.py

  3. Train the model (tweak steps/batch in model/train.py for a quick run):

python model/train.py

  4. Quantize and export weights for the browser:

python model/quantize.py

This generates web/angular/public/weights.npz (INT8 + scales). vocab.json and merges.txt also live in web/angular/public/.

  5. Build the WASM core:

cd wasm
wasm-pack build --target web --release
mkdir -p ../web/angular/public/wasm
cp -r pkg/* ../web/angular/public/wasm/

  6. Run the Angular UI:

cd ../web/angular
npm install
ng serve

Open http://localhost:4200 and try the playground.
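
To make the "INT8 + scales" output of the quantize-and-export step concrete, here is a minimal sketch of symmetric per-tensor quantization with NumPy. It is an illustration under assumptions, not a copy of model/quantize.py: the function names and the `<name>_q` / `<name>_scale` array naming are hypothetical, and the real script may use a different scheme.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization, so that w ≈ q * scale."""
    max_abs = float(np.max(np.abs(w)))
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, np.float32(scale)

def dequantize(q, scale):
    """Recover an approximate float32 tensor from INT8 values and a scale."""
    return q.astype(np.float32) * scale

def export_npz(path, weights):
    """Write every tensor as an INT8 array plus a float32 scale (hypothetical naming)."""
    arrays = {}
    for name, w in weights.items():
        q, scale = quantize_int8(w)
        arrays[name + "_q"] = q
        arrays[name + "_scale"] = np.array(scale, dtype=np.float32)
    np.savez(path, **arrays)
```

With this scheme the rounding error per weight is at most half a scale step, which is a useful number to keep in mind when comparing browser logits against PyTorch.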


🧠 Model configuration (suggested)

  • vocab_size: 12,000
  • n_layers: 6, n_heads: 6, d_model: 384, d_ff: 1536
  • seq_len: 256 for the initial training
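  • Approx size: about 15M parameters with tied input/output embeddings, or about 20M with a separate output head (assuming a standard GPT-style block)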

You can shrink dimensions for older phones or increase them for better quality on desktops.
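
When shrinking or growing the dimensions, a quick parameter estimate helps predict the download size. The sketch below assumes a standard GPT-style layout (learned positional embeddings, biases on every linear layer, two LayerNorms per block plus a final one); the function name and the `tied_head` flag are illustrative, and the actual model in model/transformer.py may differ.

```python
def count_params(vocab_size=12_000, n_layers=6, d_model=384, d_ff=1536,
                 seq_len=256, tied_head=True):
    """Rough parameter count for a GPT-style decoder (assumed layout)."""
    tok_emb = vocab_size * d_model
    pos_emb = seq_len * d_model              # learned positions (assumption)
    layer_norm = 2 * d_model                 # gain + bias
    # Fused Q/K/V projection plus the attention output projection.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Two linear layers: d_model -> d_ff -> d_model.
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    block = 2 * layer_norm + attn + ffn
    # A tied head reuses the token embedding matrix, so it adds nothing.
    head = 0 if tied_head else vocab_size * d_model
    return tok_emb + pos_emb + n_layers * block + layer_norm + head

if __name__ == "__main__":
    print(count_params())                    # tied embeddings
    print(count_params(tied_head=False))     # separate output head
```

Under these assumptions the suggested configuration comes to about 15.4M parameters with tied embeddings and about 20.0M with a separate head, so the INT8 weights would be on the order of 15 to 20 MB before compression. n_heads does not appear in the count because splitting d_model across heads does not change the projection sizes.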


📦 Folder structure (detailed)

micro-transformer-ptbr/
  data/
    raw/                   # put your .txt files here
    clean/                 # produced by the cleaning step
    tokenizer/
      vocab.json           # produced by BPE training
      merges.txt           # produced by BPE training
  model/
    clean_texts.py         # basic cleaning/normalization
    tokenizer_bpe.py       # custom BPE (train/encode/decode)
    train_bpe.py           # BPE training script
    transformer.py         # TinyGPT minimal (PyTorch)
    train.py               # training loop
    quantize.py            # INT8 + export .npz for the browser
  wasm/
    Cargo.toml
    src/lib.rs             # matmul and softmax via wasm-bindgen
    pkg/                   # generated by wasm-pack (copy to web/angular/public/wasm)
  web/
    angular/
      src/app/services/
        tokenizer.ts       # TypeScript BPE, compatible with the Python tokenizer
        model-runner.service.ts
      src/app/components/playground/
        playground.component.*
      public/
        weights.npz        # quantized weights
        vocab.json
        merges.txt
        wasm/              # generated wasm artifacts
      package.json

🧪 PyTorch ↔ Browser validation

To ensure numerical parity:

  1. Run a forward pass in Python with batch size 1 and a short sequence, and save the activations to .npz (e.g., emb_out, attn_scores, ffn_out, logits).
  2. Feed the same input to the browser runtime and check that max |a − b| < 1e-3 for each tensor.
  3. Common pitfalls: matrix order (row/col-major), head reshapes, quantization scales.

Tip: validate just 1 block first, then stack all blocks.
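
The comparison in step 2 can be sketched as follows. This assumes the browser activations have been brought back into Python somehow (for example dumped as JSON and loaded into a dict of nested lists); that transport, the function name, and the return format are assumptions, not part of the repository.

```python
import numpy as np

def compare_activations(reference, candidate, atol=1e-3):
    """Return {name: (max_abs_diff, within_tolerance)} for each reference tensor.

    `reference` and `candidate` map tensor names to array-likes,
    e.g. dict(np.load("activations.npz")) on the Python side.
    """
    results = {}
    for name, ref in reference.items():
        ref = np.asarray(ref, dtype=np.float32)
        got = np.asarray(candidate[name], dtype=np.float32)
        if ref.shape != got.shape:
            # A shape mismatch usually points at a layout or head-reshape bug.
            raise ValueError(f"{name}: shape {got.shape} != expected {ref.shape}")
        max_diff = float(np.max(np.abs(ref - got)))
        results[name] = (max_diff, max_diff < atol)
    return results
```

Checking tensors in forward order (emb_out first, logits last) makes it easier to find the first layer where the two implementations diverge.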


📈 Suggested metrics

  • Perplexity on a PT-BR validation set.
  • Latency (ms/token) on desktop vs. mobile.
  • Artifact sizes: .wasm, weights.npz, vocab.json.
  • Throughput (tokens/s) across devices.

Add a small table in your fork’s README with real results.
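
For reference, the first metrics reduce to simple arithmetic. The helpers below are a minimal sketch with hypothetical names: perplexity is the exponential of the mean per-token negative log-likelihood (natural log, as PyTorch's cross-entropy loss reports it), and latency and throughput come from one timed generation run.

```python
import math

def perplexity(token_nlls):
    """exp(mean negative log-likelihood per token), with NLL in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

def ms_per_token(n_tokens, elapsed_seconds):
    """Average latency per generated token, in milliseconds."""
    return 1000.0 * elapsed_seconds / n_tokens

def tokens_per_second(n_tokens, elapsed_seconds):
    """Throughput over the same timed run."""
    return n_tokens / elapsed_seconds
```

A model that assigns probability 1/4 to every correct token has perplexity 4, which makes a handy sanity check for the validation loop.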


🚢 Deploy (S3 + CloudFront)

  1. Build the UI: ng build --configuration production
  2. Upload dist/ and web/angular/public/* to a static S3 bucket.
  3. Publish via CloudFront with OAC and short TTL for weights.npz.
  4. Serve .wasm with Content-Type: application/wasm and enable gzip/brotli on assets.

🛠️ Troubleshooting

  • WASM won’t load → check Content-Type: application/wasm and the /public/wasm/* path.
  • “Stuck-together” text (missing spaces between words) → before training, run an encode → decode roundtrip test on the BPE tokenizer and adjust the </w> end-of-word handling.
  • Slow on mobile → reduce n_layers/d_model/vocab_size, use smaller topK (e.g., 20), and limit maxNewTokens.
  • Logit mismatch → verify INT8 scales and Q/K/V reshapes across heads.
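
To show why a smaller topK reduces work per token, here is a minimal top-k sampling sketch in Python. It illustrates the technique only: the in-browser runtime is TypeScript, and the function name, the `temperature` parameter, and the `rng` argument are assumptions introduced for this example.

```python
import math
import random

def sample_top_k(logits, k=20, temperature=1.0, rng=random):
    """Sample a token id from the k highest logits using a softmax over them."""
    k = max(1, min(k, len(logits)))
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the max before exponentiating for numerical stability.
    m = logits[top[0]]
    weights = [math.exp((logits[i] - m) / temperature) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
```

Only k exponentials are computed instead of one per vocabulary entry, and with k=1 the function reduces to greedy decoding.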

🧭 Suggested roadmap

  • Implement full parity in model-runner.service.ts (LN, attn, FFN, head)
  • KV cache for token-by-token generation
  • Per-channel quantization and SIMD
  • WebGPU port and benchmark vs WASM
  • Larger model with pruning/knowledge distillation

🙌 Credits

Implementation and engineering by Joseph Alexanndry. Conceptual inspirations: the original Transformer paper, client-side quantization work, and the Rust/WASM community.

