Microsoft Phi-3 Vision-the first Multimodal model By Microsoft- Demo With Huggingface

Phi-3-vision is the first multimodal model in the Phi-3 family. It combines text and image capabilities, allowing it to reason about real-world images and extract and understand text from images. The model has been optimized specifically for understanding charts and diagrams. It can generate insights and answer questions related to charts and diagrams. Phi-3-vision builds on the language model Phi-3-mini but adds image understanding capabilities while still being a relatively small model size.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Phi_3_vision_128k_instruct.ipynb		Phi_3_vision_128k_instruct.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Microsoft Phi-3 Vision-the first Multimodal model By Microsoft- Demo With Huggingface

About

Releases

Packages

Languages

shrimantasatpati/Microsoft-Phi-3-Vision

Folders and files

Latest commit

History

Repository files navigation

Microsoft Phi-3 Vision-the first Multimodal model By Microsoft- Demo With Huggingface

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages