Skip to content

Microsoft Phi-3 Vision-the first Multimodal model By Microsoft- Demo With Huggingface

Notifications You must be signed in to change notification settings

shrimantasatpati/Microsoft-Phi-3-Vision

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Microsoft Phi-3 Vision-the first Multimodal model By Microsoft- Demo With Huggingface

Phi-3-vision is the first multimodal model in the Phi-3 family. It combines text and image capabilities, allowing it to reason about real-world images and extract and understand text from images. The model has been optimized specifically for understanding charts and diagrams. It can generate insights and answer questions related to charts and diagrams. Phi-3-vision builds on the language model Phi-3-mini but adds image understanding capabilities while still being a relatively small model size.

About

Microsoft Phi-3 Vision-the first Multimodal model By Microsoft- Demo With Huggingface

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published