Note: This project is currently a work in progress.
Welcome to the LLama-from-scratch project! Our goal is to build a large language model (LLM) entirely from scratch using C++ and CUDA, leveraging the power of parallel computing for efficient training and inference.
This project aims to implement a full-fledged LLM by following these key steps:
- Tensor Operations
- CUDA Parallelization
- Backpropagation for Tensor Class
- Enhanced Parallelization
- SentencePiece Tokenizer
- Implementing Embeddings
- Feed-Forward Networks (FFNs)
- Flash Attention Mechanism
- RoPE Scaling and Other Peripheral Functions
- Building Encoders
- Integration and Cohesion
- Training and Inference
- Instruction Fine-Tuning
**Tensor Operations**
- Objective: Develop a robust `Tensor` class to handle multidimensional arrays and basic tensor operations such as addition, subtraction, and multiplication.
- Implementation:
  - Define tensor data structures and initialize tensors with various data types.
  - Implement tensor operations with type safety and memory management (see the sketch below).
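To make the idea concrete, here is a minimal sketch of what such a class might look like: a flat row-major buffer plus a shape vector, with one elementwise operation. The names (`Tensor`, `shape_`, `data_`) and the float-only storage are illustrative assumptions, not the project's actual API.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

// Minimal dense float tensor: row-major storage plus a shape vector.
class Tensor {
public:
    explicit Tensor(std::vector<size_t> shape)
        : shape_(std::move(shape)),
          data_(std::accumulate(shape_.begin(), shape_.end(),
                                size_t{1}, std::multiplies<size_t>())) {}

    // Elementwise addition; shapes must match exactly (no broadcasting here).
    Tensor operator+(const Tensor& other) const {
        if (shape_ != other.shape_) throw std::invalid_argument("shape mismatch");
        Tensor out(shape_);
        for (size_t i = 0; i < data_.size(); ++i)
            out.data_[i] = data_[i] + other.data_[i];
        return out;
    }

    float* data() { return data_.data(); }
    const std::vector<size_t>& shape() const { return shape_; }

private:
    std::vector<size_t> shape_;
    std::vector<float> data_;
};
```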
**CUDA Parallelization**
- Objective: Leverage CUDA to parallelize tensor operations for performance improvements.
- Implementation:
  - Identify computationally intensive operations within the `Tensor` class.
  - Offload these operations to the GPU using CUDA kernels (sketched below).
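As a rough illustration of the offloading step, the sketch below moves an elementwise addition onto the GPU. The kernel and launcher names (`add_kernel`, `add_on_gpu`) are made up for this example, and production code would keep buffers resident on the device instead of copying back and forth on every call.

```cpp
#include <cuda_runtime.h>

// One thread per element: out[i] = a[i] + b[i].
__global__ void add_kernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// Host-side launcher: copies inputs to the device, runs the kernel,
// and copies the result back to the host.
void add_on_gpu(const float* a, const float* b, float* out, int n) {
    float *da, *db, *dout;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dout, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_kernel<<<blocks, threads>>>(da, db, dout, n);
    cudaDeviceSynchronize();

    cudaMemcpy(out, dout, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dout);
}
```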
**Backpropagation for Tensor Class**
- Objective: Implement backpropagation to support training of neural networks.
- Implementation:
  - Extend the `Tensor` class to store gradients and support gradient computation.
  - Implement backward operations for each tensor operation (see the sketch below).
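One way this can look, shown purely as a sketch: give each tensor a gradient buffer of the same size, and pair every forward operation with a backward rule. The `GradTensor` name and the standalone `mul_forward`/`mul_backward` functions are hypothetical; here the rule is the product rule for an elementwise multiply.

```cpp
#include <cstddef>
#include <vector>

// Gradient-carrying tensor: every value owns a grad buffer of the same size.
struct GradTensor {
    std::vector<float> data;
    std::vector<float> grad;  // dLoss/dThis, filled in during the backward pass

    explicit GradTensor(size_t n) : data(n, 0.0f), grad(n, 0.0f) {}
};

// Forward: out = a * b (elementwise).
GradTensor mul_forward(const GradTensor& a, const GradTensor& b) {
    GradTensor out(a.data.size());
    for (size_t i = 0; i < a.data.size(); ++i)
        out.data[i] = a.data[i] * b.data[i];
    return out;
}

// Backward: given dLoss/dOut in out.grad, accumulate gradients into the inputs
// using the product rule: dL/da = dL/dout * b, dL/db = dL/dout * a.
void mul_backward(const GradTensor& out, GradTensor& a, GradTensor& b) {
    for (size_t i = 0; i < out.grad.size(); ++i) {
        a.grad[i] += out.grad[i] * b.data[i];
        b.grad[i] += out.grad[i] * a.data[i];
    }
}
```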
**SentencePiece Tokenizer**
- Objective: Implement the SentencePiece tokenizer for efficient text processing.
- Implementation:
  - Integrate the SentencePiece library to tokenize and detokenize input text.
  - Ensure compatibility with the `Tensor` class for processing tokenized data (see the example below).
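Integration roughly boils down to loading a trained model and calling `Encode`/`Decode`. The example below assumes the standard SentencePiece C++ API; the model path `tokenizer.model` and the sample text are placeholders, and the resulting id vector is what would get copied into an integer `Tensor`.

```cpp
#include <iostream>
#include <string>
#include <vector>
#include <sentencepiece_processor.h>

int main() {
    sentencepiece::SentencePieceProcessor sp;
    // Path to a trained SentencePiece model file (placeholder).
    const auto status = sp.Load("tokenizer.model");
    if (!status.ok()) {
        std::cerr << status.ToString() << "\n";
        return 1;
    }

    // Text -> token ids, ready to be copied into an integer tensor.
    std::vector<int> ids;
    sp.Encode("Building an LLM from scratch.", &ids);

    // Token ids -> text (the inverse mapping, used at inference time).
    std::string text;
    sp.Decode(ids, &text);
    std::cout << text << "\n";
    return 0;
}
```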
**Implementing Embeddings**
- Objective: Develop embedding layers to convert tokens into dense vectors.
- Implementation:
  - Implement word, positional, and segment embeddings.
  - Optimize embedding lookup operations using CUDA (a CPU reference for the lookup is sketched below).
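The lookup itself is just row gathering plus addition. Below is a CPU reference that combines word and positional embeddings; segment embeddings would be a third table added the same way. The function name and the row-major `[seq_len x dim]` layout are assumptions made for the example, and this per-row gather is exactly the loop a CUDA kernel would parallelize.

```cpp
#include <cstddef>
#include <vector>

// Looks up one row of the embedding table per token and adds the
// corresponding positional embedding. Output is [seq_len x dim], row-major.
std::vector<float> embed(const std::vector<int>& token_ids,
                         const std::vector<float>& token_table,  // [vocab x dim]
                         const std::vector<float>& pos_table,    // [max_len x dim]
                         size_t dim) {
    std::vector<float> out(token_ids.size() * dim);
    for (size_t t = 0; t < token_ids.size(); ++t) {
        const float* tok_row = &token_table[static_cast<size_t>(token_ids[t]) * dim];
        const float* pos_row = &pos_table[t * dim];
        for (size_t d = 0; d < dim; ++d)
            out[t * dim + d] = tok_row[d] + pos_row[d];
    }
    return out;
}
```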
**Feed-Forward Networks (FFNs)**
- Objective: Build FFNs as core components of the neural network.
- Implementation:
  - Develop fully connected layers with activation functions.
  - Optimize forward and backward passes using parallelization (a CPU forward pass is sketched below).
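For orientation, here is a forward pass for a two-layer FFN on a single token vector. ReLU is used only to keep the sketch short; Llama-style models typically use a gated SiLU (SwiGLU) block instead, and the weight layout here is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// y = W2 * relu(W1 * x + b1) + b2, for a single token vector x.
// Weights are row-major: W1 is [hidden x dim], W2 is [dim x hidden].
std::vector<float> ffn_forward(const std::vector<float>& x,
                               const std::vector<float>& W1, const std::vector<float>& b1,
                               const std::vector<float>& W2, const std::vector<float>& b2,
                               size_t dim, size_t hidden) {
    std::vector<float> h(hidden), y(dim);
    for (size_t i = 0; i < hidden; ++i) {
        float acc = b1[i];
        for (size_t j = 0; j < dim; ++j) acc += W1[i * dim + j] * x[j];
        h[i] = std::max(acc, 0.0f);  // ReLU activation
    }
    for (size_t i = 0; i < dim; ++i) {
        float acc = b2[i];
        for (size_t j = 0; j < hidden; ++j) acc += W2[i * hidden + j] * h[j];
        y[i] = acc;
    }
    return y;
}
```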
**Flash Attention Mechanism**
- Objective: Implement an efficient attention mechanism using Flash Attention.
- Implementation:
  - Design attention layers with scaled dot-product attention.
  - Optimize memory access patterns and computation using CUDA (a naive reference version is sketched below).
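The sketch below is plain single-head scaled dot-product attention, useful mainly as a correctness baseline: Flash Attention computes the same result but streams K/V in tiles through shared memory with an online softmax, so the full `seq x seq` score matrix is never materialized. The `[seq x d]` row-major layout is an assumption, and causal masking and multi-head batching are omitted.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference attention for one head: out = softmax(Q K^T / sqrt(d)) V.
std::vector<float> attention(const std::vector<float>& Q,
                             const std::vector<float>& K,
                             const std::vector<float>& V,
                             size_t seq, size_t d) {
    std::vector<float> out(seq * d, 0.0f);
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    for (size_t i = 0; i < seq; ++i) {
        // Scores for query i against every key, then a numerically stable softmax.
        std::vector<float> scores(seq);
        float max_s = -1e30f;
        for (size_t j = 0; j < seq; ++j) {
            float s = 0.0f;
            for (size_t k = 0; k < d; ++k) s += Q[i * d + k] * K[j * d + k];
            scores[j] = s * scale;
            max_s = std::max(max_s, scores[j]);
        }
        float denom = 0.0f;
        for (size_t j = 0; j < seq; ++j) {
            scores[j] = std::exp(scores[j] - max_s);
            denom += scores[j];
        }
        // Weighted sum of value rows.
        for (size_t j = 0; j < seq; ++j)
            for (size_t k = 0; k < d; ++k)
                out[i * d + k] += (scores[j] / denom) * V[j * d + k];
    }
    return out;
}
```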
**RoPE Scaling and Other Peripheral Functions**
- Objective: Implement additional features and scaling techniques for model robustness.
- Implementation:
  - Incorporate rotary position encodings (RoPE) for better sequence modeling.
  - Develop auxiliary functions and utilities to support training and inference (the core RoPE rotation is sketched below).
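RoPE rotates each consecutive pair of query/key channels by a position-dependent angle, theta = pos / base^(i/d) for channel pair starting at index i. The sketch applies it in place to a `[seq x d]` row-major buffer; the base of 10000 is the usual convention, and context-length scaling variants mostly amount to changing how that angle is computed.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Rotary position embedding: rotate each (even, odd) pair of channels of a
// query/key vector by an angle that depends on the position and the pair index.
void apply_rope(std::vector<float>& x, size_t seq, size_t d, float base = 10000.0f) {
    for (size_t pos = 0; pos < seq; ++pos) {
        for (size_t i = 0; i + 1 < d; i += 2) {
            float theta = pos / std::pow(base, static_cast<float>(i) / d);
            float c = std::cos(theta), s = std::sin(theta);
            float x0 = x[pos * d + i], x1 = x[pos * d + i + 1];
            x[pos * d + i]     = x0 * c - x1 * s;
            x[pos * d + i + 1] = x0 * s + x1 * c;
        }
    }
}
```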
**Integration and Cohesion**
- Objective: Integrate all components to form a cohesive LLM framework.
- Implementation:
  - Ensure seamless data flow between components.
  - Validate the integrated model through rigorous testing (e.g., the tolerance check sketched below).
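One practical form that testing can take is comparing each CUDA kernel's output against its CPU reference within a tolerance. The helper below is a hypothetical example of such a check, not an existing utility in this repo.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Compare a component's GPU output against the CPU reference implementation.
// Useful for validating each stage (embeddings, attention, FFN) after integration.
bool all_close(const std::vector<float>& a, const std::vector<float>& b,
               float tol = 1e-4f) {
    if (a.size() != b.size()) return false;
    for (size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > tol) {
            std::printf("mismatch at %zu: %f vs %f\n", i, a[i], b[i]);
            return false;
        }
    return true;
}
```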
**Training and Inference**
- Objective: Train the LLM and perform efficient inference.
- Implementation:
  - Develop training loops with backpropagation and optimization algorithms.
  - Implement inference mechanisms for real-time text generation (the skeleton of a training step is sketched below).
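To show the shape of the training loop (forward, loss, backward, update), here is a toy example that fits a single weight vector with SGD on a squared error. The data, learning rate, and loss are made up for illustration; the real loop plugs the `Tensor` forward/backward machinery and a cross-entropy loss into the same structure.

```cpp
#include <cstdio>
#include <vector>

// Skeleton of a training step: forward pass, loss, backward pass, SGD update.
int main() {
    std::vector<float> w(4, 0.0f);                       // parameters
    std::vector<float> x = {1, 2, 3, 4}, target = {10};  // one toy sample
    float lr = 0.01f;

    for (int step = 0; step < 100; ++step) {
        // Forward: y = w . x, loss = (y - target)^2
        float y = 0.0f;
        for (size_t i = 0; i < w.size(); ++i) y += w[i] * x[i];
        float err = y - target[0];

        // Backward: dLoss/dw_i = 2 * err * x_i, then SGD update.
        for (size_t i = 0; i < w.size(); ++i) w[i] -= lr * 2.0f * err * x[i];

        if (step % 20 == 0) std::printf("step %d loss %f\n", step, err * err);
    }
    return 0;
}
```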
**Instruction Fine-Tuning**
- Objective: Fine-tune the trained LLM for specific instructions and tasks.
- Implementation:
  - Use supervised fine-tuning techniques with task-specific datasets.
  - Optimize the model for low-latency inference and high accuracy (a common loss-masking scheme is sketched below).
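A common detail in supervised fine-tuning on instruction data (not necessarily how this repo will do it) is to average the loss only over response tokens, so the model is not penalized for "predicting" the prompt it was given. The helper below sketches that masking; the names and the 0/1 mask convention are assumptions.

```cpp
#include <cstddef>
#include <vector>

// per_token_loss is [seq]; loss_mask is [seq] with 1 for response tokens and
// 0 for prompt tokens. Returns the mean loss over response tokens only.
float masked_mean_loss(const std::vector<float>& per_token_loss,
                       const std::vector<int>& loss_mask) {
    float sum = 0.0f;
    int count = 0;
    for (size_t i = 0; i < per_token_loss.size(); ++i) {
        if (loss_mask[i]) {
            sum += per_token_loss[i];
            ++count;
        }
    }
    return count > 0 ? sum / count : 0.0f;
}
```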
Contributions are welcome, as I prolly don't even know what I am doing lmao. Even this README was generated by ChatGPT to give readers a crude idea of what I am trying to accomplish. So if you've got anything, just open a PR and I'll most prolly merge it.
This project is licensed under the MIT License.