General

This is a from-scratch implementation of an LLM, written to gain experience with current optimization tricks and hyperparameter tweaking. Unfortunately, I own only 2x RTX 3090 GPUs (coupled via NVLink), not an entire datacenter, but I might actually spend money deploying my models on rented hardware once I hit the most recent state-of-the-art wall. I'm using Stanford's CS336 (Language Modeling from Scratch) lectures and unit-test infrastructure as a guideline, but I will also add a diffusion transformer (not covered by that course) later, either to this repository or to a separate one.

Transparency: AI Usage in this repository

The code has been written entirely by hand, using the course material (which is very sparse when it comes to implementing the features), pencil and paper to work out matrix shapes and reshuffling, and at least partial readings of the relevant papers (e.g. RoPE and AdamW) whenever something was unclear in the lecture.
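As an illustration of the kind of pencil-and-paper check mentioned above, here is a minimal sketch of rotary position embeddings (RoPE) in plain Python. It is not taken from this repository: the function names `rope` and `dot`, the interleaved pairing of dimensions, and the base of 10000 are assumptions made for the example. It checks two properties that are easy to verify by hand: the rotation preserves vector length, and the query/key dot product depends only on the relative offset between positions.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle position * base**(-2i/d).

    Hypothetical helper for illustration; pairs are interleaved here,
    other implementations split the vector into two halves instead.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product of two equally long lists."""
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (3) at two different absolute positions.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 105), rope(k, 102))
print(abs(score_a - score_b) < 1e-9)  # attention score depends only on the offset

# Rotation leaves the squared norm unchanged (up to floating-point error).
print(abs(dot(rope(q, 7), rope(q, 7)) - dot(q, q)) < 1e-9)
```

A batched tensor version would apply the same rotation along the last dimension of the query and key tensors; the scalar form above is only meant to make the shapes and angles easy to follow.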

I did use Claude Opus when asking for best practices in logging and debugging my config generation code and, once I had finished the full training loop, to search for non-obvious bugs (yes, they existed).

Features

Current features:

Needs refinement:

Logging / TensorBoard

Not implemented yet:

Sharding (well, 'applying' it, not really implementing it from scratch)
(Gated) Linear attention (=> Mamba2)
Mixture of Experts
KV Cache for inference
Fine-tuning training
Training on tasks via reinforcement learning (some RL algorithms have already been implemented in my other repos; I might copy them over, let's see)

Might land in another repo:

Diffusion Transformer

Folders and files (5 commits):

debug
templates
.gitignore
Readme.md
__init__.py
activation_fcn.py
attention.py
checkpoint.py
config.py
dataloader.py
decode.py
embedding.py
generate_configs.py
linear.py
loss.py
optimizer.py
profiler.py
rms.py
run_all_experiments.sh
run_decoder.sh
run_tokenizer.sh
tokenizer.py
train.py
transformer.py
utils.py
vocab.pickle
