Abstract: Incorporating multimodal features and heterogeneous common sense knowledge in scene representation and visual reasoning techniques is essential for accurate and intuitive Visual Question ...