Lihi Gur Arie, PhD

AI

Chat with Your Images using Multimodal LLMs

Learn how to build Llama 3.2-Vision locally in a chat-like mode, and explore its Multimodal skills on a Colab notebook

Lihi Gur Arie, PhD · Published in Towards Data Science · 7 min read · 6 hours ago

Annotated image by author. Original image by Pixabay.

Introduction

The integration of vision capabilities with Large Language Models (LLMs) is revolutionizing the computer vision field through multimodal LLMs (MLLM). These models combine text and visual

AI

Florence-2: Mastering Multiple Vision Tasks with a Single VLM Model

A Guided Exploration of Florence-2’s Zero-Shot Capabilities: Captioning, Object Detection, Segmentation and OCR.

Lihi Gur Arie, PhD · Published in Towards Data Science · 7 min read · 8 hours ago

Image annotations by Author. Original image from Pexels.

Introduction

In recent years, the field of computer vision has witnessed the rise of foundation models that enable image annotation without the need for training custom models. We’ve seen models like CLIP [2] for

AI

Mastering Object Counting in Videos

Step-by-step guide to counting strolling ants on a tree using detection and tracking techniques.

Lihi Gur Arie, PhD · Published in Towards Data Science · 7 min read · 9 hours ago

Ant counting in a video. In and Out counts appear in the upper-left corner. Each ant is assigned a unique ID and color. Labels by author. Original video by Lui Lo Franco at Pexels.

Introduction

Counting objects in videos is
