Document Object Model Tutorial Video

Visual Evidence-aware for Object Hallucinations Rectification in LLM-based Video Captioning

Abstract: Recent neural models for video captioning are typically built using a framework that combines a pre-trained visual encoder with a large language model (LLM) decoder. However, large language ...

IEEE

Zero6DOT: Zero-Shot 6D Object Pose Tracking With Monocular RGB Video

Abstract: 6D object tracking plays an important role in various applications, including robotic manipulation and virtual reality. While current methodologies have achieved significant advancements ...
