Abstract: Multimodal Question Answering (MMQA) has emerged as a challenging frontier at the intersection of natural language processing (NLP) and computer vision, demanding the integration of diverse ...