Codingscape Vision RAG Image Generator

This application uses real-life imagery to generate images that match the user's description. More information about the project can be found on the Codingscape Blog.

How it works

1. Find places using the Google Places API.
2. Send the coordinates from step 1 to Google Street View and get images at headings of 0, 90, 180, and 270 degrees for up to 4 of the places.
3. Send those images to LLaVA to generate an overall description of the images.
4. Use the condensed description as a prompt to Stable Diffusion.
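
Step 2 can be sketched as follows. This is a minimal illustration, not the code in main.py: it assumes the Street View Static API endpoint and its size, location, heading, and key query parameters, and the function name street_view_urls, the 640x640 size, and the (latitude, longitude) input format are hypothetical choices made for the example.

```python
from urllib.parse import urlencode

# Street View Static API endpoint (assumed; main.py may call it differently).
STREET_VIEW_URL = "https://maps.googleapis.com/maps/api/streetview"
HEADINGS = (0, 90, 180, 270)  # the four compass headings from step 2
MAX_PLACES = 4                # "up to 4 of the places"


def street_view_urls(places, api_key, size="640x640"):
    """Build one image URL per (place, heading) pair.

    `places` is a list of (latitude, longitude) tuples, for example taken
    from a Places API response. Only the first MAX_PLACES are used.
    """
    urls = []
    for lat, lng in places[:MAX_PLACES]:
        for heading in HEADINGS:
            query = urlencode({
                "size": size,
                "location": f"{lat},{lng}",
                "heading": heading,
                "key": api_key,
            })
            urls.append(f"{STREET_VIEW_URL}?{query}")
    return urls
```

With four places this yields 16 URLs, each of which could then be downloaded and passed to the vision model in step 3.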

Model versions

llava-hf/llava-v1.6-mistral-7b-hf
stabilityai/stable-diffusion-3-medium-diffusers

Running

Requirements

The models are hosted on Hugging Face, so you will need to get an API key (access token) from them. You will also need a Google Maps API key.

You will need approximately 32 GB of disk space to store the models.

You must have the NVIDIA CUDA Toolkit installed and be able to run nvcc.
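
The disk space and nvcc requirements can be checked before downloading anything. The sketch below is an assumption-laden helper, not part of main.py: preflight_problems and its model_dir parameter are hypothetical names, and the directory you check should be wherever your Hugging Face cache actually lives.

```python
import shutil

# Approximate space needed for the model weights, per the requirement above.
REQUIRED_BYTES = 32 * 1024**3


def preflight_problems(model_dir="."):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # nvcc must be on PATH for the CUDA requirement to be satisfied.
    if shutil.which("nvcc") is None:
        problems.append("nvcc not found on PATH; install the CUDA Toolkit")
    # Compare free space on the volume holding model_dir with the estimate.
    free = shutil.disk_usage(model_dir).free
    if free < REQUIRED_BYTES:
        problems.append(
            f"only {free / 1024**3:.1f} GiB free; about 32 GB is needed"
        )
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("WARNING:", problem)
```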

Deployment

SSH is the only supported way of running this on anything that's not local. Use scp to copy main.py and requirements.txt to your chosen environment.

Application

PyTorch must be installed separately because the correct build depends on your hardware. Once it is installed, run pip install -r requirements.txt. If you are unable to install flash_attn, remove it from requirements.txt and pass the --no-flash-attn CLI arg described below.

Right now, it's a standalone script. There are two ways to pass your Google and Hugging Face keys: CLI args or ENV vars. The CLI args are --google and --hf. The ENV vars are GOOGLE_API_KEY and HF_API_KEY. You must use one of the two. To run it, pass your description with the --prompt CLI arg to python main.py.
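
One way the two key sources could be combined is shown below. This is a sketch using argparse, not the actual parsing code in main.py: the function name resolve_keys is hypothetical, and letting a CLI arg take precedence over its ENV var is an assumption.

```python
import argparse
import os


def resolve_keys(argv=None, env=None):
    """Parse --google, --hf, and --prompt, falling back to ENV vars for keys."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Assumption: a CLI arg overrides the matching ENV var when both are set.
    parser.add_argument("--google", default=env.get("GOOGLE_API_KEY"))
    parser.add_argument("--hf", default=env.get("HF_API_KEY"))
    parser.add_argument("--prompt", required=True)
    args = parser.parse_args(argv)
    # Each key must come from one of the two sources.
    missing = [name for name in ("google", "hf") if not getattr(args, name)]
    if missing:
        parser.error("missing key(s): " + ", ".join("--" + m for m in missing))
    return args


if __name__ == "__main__":
    options = resolve_keys()
    print("Prompt:", options.prompt)
```

For example, with GOOGLE_API_KEY and HF_API_KEY exported in the shell, python main.py --prompt "cafe in paris france" would be enough.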

If you want to generate multiple images you can use the --num CLI arg or the NUM_IMAGES ENV var.

To load the model without attn_implementation set to "flash_attention_2", use the --no-flash-attn CLI arg.

You will need fairly powerful GPU hardware to run this locally. If you don't have it, use RunPod or GPU instances in AWS.
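
The --num and --no-flash-attn options might be wired up roughly as follows. This is a sketch under stated assumptions rather than the logic in main.py: parse_generation_options is a hypothetical name, the default of one image is assumed, and omitting attn_implementation entirely (so that the library picks its own default) is one possible behavior when flash attention is disabled.

```python
import argparse
import os


def parse_generation_options(argv=None, env=None):
    """Return (number_of_images, model_kwargs) from CLI args and ENV vars."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Assumption: generate one image unless --num or NUM_IMAGES says otherwise.
    parser.add_argument("--num", type=int,
                        default=int(env.get("NUM_IMAGES", "1")))
    parser.add_argument("--no-flash-attn", action="store_true")
    # Ignore unrelated args such as --prompt in this isolated sketch.
    args, _ = parser.parse_known_args(argv)

    model_kwargs = {}
    if not args.no_flash_attn:
        # Keyword intended for the model's from_pretrained(...) call.
        model_kwargs["attn_implementation"] = "flash_attention_2"
    return args.num, model_kwargs
```

The returned dictionary could then be unpacked into the model loading call, so that disabling flash attention only changes one keyword argument.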

Samples

luxury goods store in Beverly Hills on Rodeo Drive

cafe in paris france
