Florence-2: Mastering Multiple Vision Tasks with a Single VLM Model
A Guided Exploration of Florence-2’s Zero-Shot Capabilities: Captioning, Object Detection, Segmentation and OCR

Lihi Gur Arie, PhD
Published in Towards Data Science

Image annotations by Author. Original image from Pexels.

Introduction

In recent years, the field of computer vision has witnessed the rise of foundation models that enable image annotation without the need for training custom models. We’ve seen models like CLIP [2] for