Microsoft Florence-2 Vision Model

This project demonstrates the use of Microsoft's Florence-2 Vision Model for various computer vision tasks using a Gradio interface.

Overview

Florence-2 is a novel vision foundation model designed to handle a variety of computer vision and vision-language tasks with simple text prompts. This project allows you to upload an image and select a task to perform, such as object detection, OCR, captioning, and more.
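As a rough illustration of what "simple text prompts" means in practice, the sketch below follows the usage pattern published on the Florence-2 model card on Hugging Face: a task token such as `<OD>` is sent as the text prompt, and the processor converts the generated text into structured output. It is not taken from this project's script. The helper names `run_task` and `load_florence2` are hypothetical, and the checkpoint name `microsoft/Florence-2-large` is an assumption based on the repository name.

```python
def run_task(model, processor, image, task_prompt, text_input=None):
    """Run one Florence-2 task on a PIL image and return the parsed result.

    Hypothetical helper: `model` and `processor` are the objects returned
    by `load_florence2` below (or anything with the same methods).
    """
    # Tasks that take extra text simply append it to the task token.
    prompt = task_prompt if text_input is None else task_prompt + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    # Keep special tokens: the post-processor reads the location tokens.
    generated_text = processor.batch_decode(
        generated_ids, skip_special_tokens=False
    )[0]
    return processor.post_process_generation(
        generated_text,
        task=task_prompt,
        image_size=(image.width, image.height),
    )


def load_florence2(model_id="microsoft/Florence-2-large"):
    """Load model and processor; weights are downloaded on first use."""
    # Imported lazily so the module can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor


# Example usage (the file name is a placeholder):
#   from PIL import Image
#   model, processor = load_florence2()
#   image = Image.open("example.jpg").convert("RGB")
#   print(run_task(model, processor, image, "<OD>"))
```

Passing the model and processor in as arguments keeps the expensive load step separate from the per-image call, so a UI callback can reuse one loaded model for every request.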

How to Run

Install the required dependencies: Make sure Python is installed, then install the packages with:
```
pip install -r requirements.txt
```
Run the script: Execute the Python script to start the Gradio interface:
```
python ms-florence-2-gradio-v2.py
```
Use the Gradio Interface:
- Open the provided local URL in your web browser.
- Upload an image using the "Upload Image" button.
- Select a task from the "Select Task" dropdown menu.
- Provide additional text input if the selected task requires it.
- Click the "Start" button to process the image and view the results.
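The steps above could map onto a Gradio layout roughly like the following. This is a minimal sketch, not the contents of `ms-florence-2-gradio-v2.py`: the labels "Upload Image", "Select Task", and "Start" come from this README, while `handle_start`, `build_demo`, `run_fn`, and the remaining labels are hypothetical names chosen for illustration.

```python
def handle_start(image, task, text, run_fn):
    """Callback for the "Start" button.

    `run_fn(image, task, text_or_None)` is a hypothetical function that
    performs the actual model call and returns something printable.
    """
    if image is None:
        return "Please upload an image first."
    extra = text.strip() if text else ""
    # Pass None instead of an empty string when no extra text was given.
    return str(run_fn(image, task, extra or None))


def build_demo(tasks, run_fn):
    """Build the interface: image upload, task dropdown, text box, button."""
    import gradio as gr  # imported lazily; requires the gradio package

    with gr.Blocks(title="Florence-2 demo") as demo:
        image = gr.Image(type="pil", label="Upload Image")
        task = gr.Dropdown(choices=tasks, value=tasks[0], label="Select Task")
        text = gr.Textbox(label="Text Input (optional)")
        start = gr.Button("Start")
        output = gr.Textbox(label="Result")
        start.click(
            fn=lambda img, t, txt: handle_start(img, t, txt, run_fn),
            inputs=[image, task, text],
            outputs=output,
        )
    return demo


# Example usage (my_run_fn is a placeholder for the model call):
#   build_demo(["Object Detection", "Caption"], my_run_fn).launch()
```

Calling `launch()` prints the local URL mentioned in the first step.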

Available Tasks

- Object Detection
- OCR
- OCR with Region
- Caption
- Detailed Caption
- More Detailed Caption
- Caption to Phrase Grounding
- Dense Region Caption
- Region Proposal
- Expression Segmentation
- Open Vocabulary Detection
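Each task in the list is selected in Florence-2 by a task token placed at the start of the prompt. The sketch below pairs the task names above with the tokens documented on the Florence-2 model card. The pairing itself is an assumption (in particular, that "Expression Segmentation" corresponds to `<REFERRING_EXPRESSION_SEGMENTATION>`), as is the choice of which tasks need extra text; the names `TASK_PROMPTS`, `TASKS_NEEDING_TEXT`, and `build_prompt` are hypothetical.

```python
# Display name -> Florence-2 task token (assumed pairing, see lead-in).
TASK_PROMPTS = {
    "Object Detection": "<OD>",
    "OCR": "<OCR>",
    "OCR with Region": "<OCR_WITH_REGION>",
    "Caption": "<CAPTION>",
    "Detailed Caption": "<DETAILED_CAPTION>",
    "More Detailed Caption": "<MORE_DETAILED_CAPTION>",
    "Caption to Phrase Grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "Dense Region Caption": "<DENSE_REGION_CAPTION>",
    "Region Proposal": "<REGION_PROPOSAL>",
    "Expression Segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "Open Vocabulary Detection": "<OPEN_VOCABULARY_DETECTION>",
}

# Tasks assumed to need the additional text input.
TASKS_NEEDING_TEXT = {
    "Caption to Phrase Grounding",
    "Expression Segmentation",
    "Open Vocabulary Detection",
}


def build_prompt(task_name, text_input=""):
    """Return the prompt string for a task, appending text where needed."""
    token = TASK_PROMPTS[task_name]
    if task_name in TASKS_NEEDING_TEXT:
        if not text_input.strip():
            raise ValueError(f"{task_name!r} requires additional text input")
        return token + text_input.strip()
    # Other tasks ignore any extra text and use the bare token.
    return token
```

For example, `build_prompt("Expression Segmentation", "a cup of coffee")` yields the token followed directly by the phrase, which is the prompt format the model card uses for text-conditioned tasks.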

Example

- Upload an image.
- Select "Object Detection" from the task dropdown.
- Click "Start" to see the detected objects highlighted in the image.
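To show how detected objects might end up highlighted, here is a minimal sketch that reads an object detection result and draws the boxes with Pillow. It assumes the result has the shape shown on the Florence-2 model card, `{"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}}`; the function names and the drawing style are hypothetical and may differ from what this project's script does.

```python
def iter_detections(result, task="<OD>"):
    """Pair each label with its box, rounding coordinates to whole pixels."""
    data = result[task]
    return [
        (label, tuple(round(v) for v in box))
        for label, box in zip(data["labels"], data["bboxes"])
    ]


def draw_detections(image, result, task="<OD>", color="red"):
    """Return a copy of a PIL image with labeled boxes drawn on it."""
    from PIL import ImageDraw  # imported lazily; requires the pillow package

    annotated = image.copy()  # leave the uploaded image untouched
    draw = ImageDraw.Draw(annotated)
    for label, (x1, y1, x2, y2) in iter_detections(result, task):
        draw.rectangle([x1, y1, x2, y2], outline=color, width=2)
        draw.text((x1, y1), label, fill=color)
    return annotated


# Example usage (values are made up for illustration):
#   result = {"<OD>": {"bboxes": [[10.4, 20.6, 110.0, 220.2]], "labels": ["cup"]}}
#   draw_detections(image, result).save("annotated.png")
```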

Additional Information

For more details on Florence-2, see Microsoft's official publication, "Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks".

Requirements

gradio==4.36.1
pillow==10.3.0
transformers==4.41.2
torch==2.3.1
einops==0.8.0
timm==1.0.7
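If the interface fails to start, a version mismatch is one possible cause. The following sketch uses only the standard library to compare the pins listed above against what is installed in the current environment. The function names are hypothetical, and the parser only handles plain `name==version` lines like the ones in this list.

```python
from importlib.metadata import PackageNotFoundError, version

# The pins from the list above.
PINS = """\
gradio==4.36.1
pillow==10.3.0
transformers==4.41.2
torch==2.3.1
einops==0.8.0
timm==1.0.7
"""


def parse_pins(text):
    """Parse `name==version` lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, _, pinned = line.partition("==")
        pins[name] = pinned
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Example usage:
#   for name, (want, have) in check_pins(parse_pins(PINS)).items():
#       print(f"{name}: pinned {want}, installed {have}")
```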

Example of a detailed caption

Example of a segmentation 'a cup of coffee'

Example of detailed object detection

Repository: skye0402/florence-2-large-playground
