This repository contains implementations of various large language models (LLMs) from scratch, including:
- Vanilla Transformer: An implementation of the Transformer architecture introduced in the paper "Attention Is All You Need".
- LLaMA-2: A re-implementation of Meta's LLaMA-2 model.
- BERT: A from-scratch implementation of Google's Bidirectional Encoder Representations from Transformers (BERT) model.
- Mistral: An implementation of the Mistral language model.
- Mixture of Experts: A sparse model that routes each input to a small subset of expert sub-networks via a learned gating function, increasing model capacity without a proportional increase in compute per token.
- Gemma: A re-implementation of Google's Gemma model.
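At the core of every architecture listed above is scaled dot-product attention. As a rough orientation before diving into the code, here is a minimal NumPy sketch; the function name and shapes are illustrative and do not correspond to this repository's actual API:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays of queries, keys, and values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # similarity of each query to each key
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                     # attention-weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The real implementations add multi-head projections, masking, and batching on top of this core operation.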
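To make the Mixture of Experts idea concrete, here is a toy NumPy sketch of top-k gating for a single token; the names (`moe_layer`, `top_k`) and the use of plain linear experts are simplifying assumptions for illustration, not the repository's implementation:

```python
import numpy as np

def moe_layer(x, experts, gate, top_k=2):
    # x: (d,) token embedding; experts: list of (d, d) expert matrices;
    # gate: (n_experts, d) gating weights
    logits = gate @ x                             # one gating score per expert
    top = np.argsort(logits)[-top_k:]             # route to the top-k experts only
    probs = np.exp(logits[top]) / np.exp(logits[top]).sum()
    # combine the selected experts' outputs, weighted by the gate probabilities
    return sum(p * (experts[i] @ x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate = rng.normal(size=(n_experts, d))
y = moe_layer(rng.normal(size=d), experts, gate)
print(y.shape)  # (8,)
```

Because only `top_k` of the `n_experts` matrices are applied per token, compute stays roughly constant as the number of experts grows.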
To get started with this repository, follow these steps: