Departmental Seminar by Avi Giuili: Zero-Shot Task-Oriented Robotic Grasping Using Large Multimodal Models
ORACLE-Grasp: Zero-Shot Task-Oriented Robotic Grasping using Large Multimodal Models
Monday, May 26th, 2025, at 15:00
Wolfson Building of Mechanical Engineering, Room 206
Abstract:
Grasping unknown objects in unstructured environments remains a fundamental challenge in robotics, requiring both semantic understanding and spatial reasoning. Existing methods often rely on dense training datasets or explicit geometric modeling, limiting their scalability to real-world tasks. Recent advances in Large Multimodal Models (LMMs) offer new possibilities for integrating vision and language understanding, but their application to autonomous robotic grasping remains largely unexplored. We present ORACLE-Grasp, a zero-shot framework that leverages LMMs as semantic oracles to guide grasp selection without requiring additional training or human input. The system formulates grasp prediction as a structured, iterative decision process, using dual-prompt tool calling to first extract high-level object context and then select task-relevant grasp regions. By discretizing the image space and reasoning over candidate areas, ORACLE-Grasp mitigates the spatial imprecision common in LMMs and produces human-like, task-driven grasp suggestions. Early stopping and depth-based refinement steps further enhance efficiency and physical grasp reliability. Experiments demonstrate that the predicted grasps achieve low positional and orientation errors relative to human-annotated ground truth and lead to high success rates in real-world pick-up tasks. These results highlight the potential of combining language-driven reasoning with lightweight vision techniques to enable robust, autonomous grasping without task-specific datasets or retraining.
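The idea of discretizing the image and iteratively narrowing down a grasp region, with early stopping, might be sketched roughly as follows. This is only an illustration and not the ORACLE-Grasp implementation: the grid size, the stopping rule, and all names (`split_into_cells`, `select_grasp_region`, `oracle`) are assumptions, and the `oracle` callable is a hypothetical stand-in for an LMM query. The depth-based refinement step is omitted.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def split_into_cells(box: Box, rows: int, cols: int) -> List[Box]:
    """Split a box into a rows x cols grid of cells, in row-major order."""
    x0, y0, x1, y1 = box
    cells = []
    for r in range(rows):
        for c in range(cols):
            cells.append((
                x0 + (x1 - x0) * c // cols,
                y0 + (y1 - y0) * r // rows,
                x0 + (x1 - x0) * (c + 1) // cols,
                y0 + (y1 - y0) * (r + 1) // rows,
            ))
    return cells


def select_grasp_region(
    image_size: Tuple[int, int],
    oracle: Callable[[List[Box]], int],  # hypothetical: returns index of chosen cell
    rows: int = 3,
    cols: int = 3,
    min_side: int = 20,   # assumed early-stopping threshold, in pixels
    max_iters: int = 5,   # assumed iteration cap
) -> Box:
    """Repeatedly ask the oracle to pick one grid cell, then zoom into it."""
    width, height = image_size
    box: Box = (0, 0, width, height)
    for _ in range(max_iters):
        x0, y0, x1, y1 = box
        # Early stopping: the region is already small enough to grasp at.
        if min(x1 - x0, y1 - y0) <= min_side:
            break
        cells = split_into_cells(box, rows, cols)
        box = cells[oracle(cells)]
    return box


def center(box: Box) -> Tuple[int, int]:
    """Pixel at the center of a box, usable as a candidate grasp point."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) // 2, (y0 + y1) // 2)
```

In this sketch the oracle only ever chooses among a handful of discrete cells rather than emitting raw pixel coordinates, which is one plausible way to sidestep the spatial imprecision the abstract mentions.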
Bio:
Avi holds a B.Sc. in Mechanical Engineering and is currently completing his M.Sc. in Mechanical Engineering at Tel Aviv University, under the supervision of Prof. Avishai Sintov. His academic work focuses on the intersection of robotics, artificial intelligence, computer vision and genAI. Alongside his studies, Avi works as a Computer Vision Engineer at Qapture-Zimark, where he applies advanced visual perception techniques to real-world industrial challenges. He is passionate about developing intelligent systems that bridge the gap between visual understanding and physical interaction in robotic platforms.