Abstract: This study introduces a visual scene understanding (VSU) pipeline that fuses scene graph generation (SGG) with task planning for agricultural robots. Mask R-CNN detects fruits, leaves, and ...