This repository contains implementations of various large language models (LLMs) from scratch, including:
- Vanilla Transformer: An implementation of the Transformer architecture introduced in the paper "Attention Is All You Need".
- LLaMA-2: A re-implementation of Meta's LLaMA-2 model.
- BERT: A from-scratch implementation of Google's Bidirectional Encoder Representations from Transformers (BERT) model.
- Mistral: An implementation of the Mistral language model.
- Mixture of Experts: A sparse model that routes each input to a small subset of expert sub-networks via a learned gating function, increasing model capacity without a proportional increase in compute per token.
- Gemma: A re-implementation of Google's Gemma model.
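At the core of every architecture listed above is scaled dot-product attention. As a rough orientation before diving into the code, here is a minimal NumPy sketch; the function name and shapes are illustrative and do not correspond to this repository's actual API:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays of queries, keys, and values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # similarity of each query to each key
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                     # attention-weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The real implementations add multi-head projections, masking, and batching on top of this core operation.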
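To make the Mixture of Experts idea concrete, here is a toy NumPy sketch of top-k gating for a single token; the names (`moe_layer`, `top_k`) and the use of plain linear experts are simplifying assumptions for illustration, not the repository's implementation:

```python
import numpy as np

def moe_layer(x, experts, gate, top_k=2):
    # x: (d,) token embedding; experts: list of (d, d) expert matrices;
    # gate: (n_experts, d) gating weights
    logits = gate @ x                             # one gating score per expert
    top = np.argsort(logits)[-top_k:]             # route to the top-k experts only
    probs = np.exp(logits[top]) / np.exp(logits[top]).sum()
    # combine the selected experts' outputs, weighted by the gate probabilities
    return sum(p * (experts[i] @ x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate = rng.normal(size=(n_experts, d))
y = moe_layer(rng.normal(size=d), experts, gate)
print(y.shape)  # (8,)
```

Because only `top_k` of the `n_experts` matrices are applied per token, compute stays roughly constant as the number of experts grows.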
To get started with this repository, follow these steps: