A custom implementation of PaliGemma, a multimodal vision-language model that combines a SigLIP vision encoder with the Gemma language model.
## Features

- Custom multimodal transformer architecture
- SigLIP-based vision encoder
- Gemma language model integration
- Custom inference pipeline
- Image and text preprocessing
## Project Structure

- `inference.py`: Token generation and model inference
- `model_siglip.py`: Vision encoder implementation
- `modeling_gamma.py`: Core model architecture
- `processing_paligamma.py`: Image/text preprocessing
- `utils.py`: Model loading utilities
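Image preprocessing for a SigLIP-style encoder typically rescales raw pixel values to `[0, 1]` and then normalizes them before they reach the vision tower. A minimal, dependency-free sketch of that step — the function name and the per-channel mean/std of 0.5 are assumptions for illustration, not taken from this repository:

```python
def preprocess_pixels(image, mean=0.5, std=0.5):
    """Rescale 8-bit pixel values to [0, 1], then normalize.

    `image` is a nested list [height][width][channels] of ints in 0..255.
    With mean=std=0.5 (a common SigLIP-style choice, assumed here),
    the output lands in [-1, 1].
    """
    return [
        [[(px / 255.0 - mean) / std for px in pixel] for pixel in row]
        for row in image
    ]
```

For example, a pure-white pixel maps to 1.0 and a pure-black pixel to -1.0 under these assumed normalization constants.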
## Requirements

- Python 3.8+
- PyTorch
- CUDA-capable GPU recommended
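Since a GPU is recommended but not required, device selection usually comes down to a small fallback helper. A sketch of the kind of logic involved — the function name is illustrative, not this repository's API:

```python
import torch


def pick_device() -> torch.device:
    """Prefer an NVIDIA GPU, then Apple Silicon (MPS), then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```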
## Installation

- Clone the repository
- Install dependencies: `pip install -r requirements.txt`
- Download the weights from the HuggingFace PaliGemma repository
- Place them in the `./weights/` directory
## Usage

```shell
chmod +x launch_inference.sh
./launch_inference.sh
```
## Key Features

- Multimodal cross-attention mechanism
- Custom token generation with top-p sampling
- Flexible image preprocessing
- Support for different device types (CUDA, MPS, CPU)
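Top-p (nucleus) sampling keeps the smallest set of highest-probability tokens whose cumulative probability reaches `p`, then samples only from that set. A dependency-free sketch of the idea — function name, signature, and defaults are illustrative, not this repository's API:

```python
import math
import random


def top_p_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Sample a token index from `logits` using nucleus (top-p) sampling.

    Illustrative sketch: `logits` is a plain list of floats.
    """
    rng = rng or random.Random()
    # Softmax with temperature (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk tokens in descending probability until cumulative mass reaches p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    # Renormalize over the nucleus and draw one token from it.
    mass = sum(probs[i] for i in nucleus)
    r = rng.random() * mass
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]
```

With a small `p`, a strongly peaked distribution collapses to greedy decoding, since the nucleus contains only the top token.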
## Limitations

- Single image per inference
- Experimental implementation
- Performance may vary from the official implementation
## License

MIT License