[NeurIPS submitted] DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation
Abstract: Robot manipulation succeeds only when perception preserves the aspects of a scene that matter for action. Yet most robot learning pipelines still rely on visual encoders pre-trained for static recognition or vision-language alignment, leaving motion understanding to downstream policies....