Phi-3-vision is the first multimodal model in the Phi-3 family. It combines text and image capabilities, allowing it to reason about real-world images and extract and understand text from images. The model has been optimized specifically for understanding charts and diagrams. It can generate insights and answer questions related to charts and diagrams. Phi-3-vision builds on the language model Phi-3-mini but adds image understanding capabilities while still being a relatively small model size.
-
Notifications
You must be signed in to change notification settings - Fork 0
shrimantasatpati/Microsoft-Phi-3-Vision
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
Microsoft Phi-3 Vision-the first Multimodal model By Microsoft- Demo With Huggingface
Topics
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published