№ 02 / SUMMARIES

#computer-vision

Every summary, chronological.

DAY 01 · Today · JUN 30, 2026 · 1 SUMMARY
arXiv cs.AI · AI & LLMs

COMPASS: Improving Compositional Control in Multimodal Models

COMPASS introduces a unified framework that uses a shared 'expert token' to bridge composition perception and generation, enabling precise layout control in multimodal models.

DAY 02 · Wednesday · JUN 24, 2026 · 1 SUMMARY
arXiv cs.AI · AI & LLMs

OmniPath: Automating Wheelchair Accessibility Audits with AI

OmniPath improves accessibility mapping by fusing OpenStreetMap data with high-density LiDAR to identify physical barriers like slope and surface discontinuities that standard maps ignore.

DAY 03 · JUN 20, 2026 · 1 SUMMARY
MarkTechPost · AI & LLMs

SpatialClaw: Using Code as an Action Interface for Spatial Reasoning

SpatialClaw is a training-free agent framework that improves spatial reasoning in VLMs by treating Python code—rather than structured tool calls—as the primary interface for perception and geometric tasks.

DAY 04 · JUN 17, 2026 · 1 SUMMARY
MarkTechPost · AI & LLMs

Qwen-RobotSuite: Three Foundation Models for Embodied AI

The Qwen team has released a suite of three specialized foundation models—RobotManip, RobotWorld, and RobotNav—designed to address data fragmentation in robotics through unified action representations, language-conditioned world modeling, and scalable navigation interfaces.

DAY 05 · MAY 27, 2026 · 1 SUMMARY
Google Cloud Tech · AI Automation

Edge-Based Computer Vision for Industrial Food Waste Reduction

Mill uses custom-tuned Gemma models on Nvidia Jetson hardware to process high-frame-rate video at the edge, turning food waste data into actionable procurement insights for commercial kitchens.

DAY 06 · MAY 21, 2026 · 1 SUMMARY
MarkTechPost · AI & LLMs

ByteDance's Lance: A Unified 3B Model for Vision and Video

Lance is an open-source, 3B-parameter unified model that natively integrates image and video understanding, generation, and editing within a single jointly trained framework.

