2nd Evaluation for Multimodal Generation Workshop

Multimodal generation and retrieval systems are increasingly central to modern information retrieval, powering retrieval-augmented generation (RAG), multimodal search, recommendation, and knowledge-intensive applications. Despite rapid progress in multimodal large language models (MLLMs), robust and principled evaluation of multimodal generation and retrieval remains a major open challenge for the IR community. This workshop aims to foster discussion and research by bringing together researchers and practitioners in information retrieval, natural language processing, computer vision, and multimodal AI. Our goal is to establish evaluation methods for multimodal research and to advance work in this direction.

Call for Papers

Both long papers and short papers (up to 9 pages and 4 pages respectively, with unlimited references and appendices) are welcome.

Topics relevant to this workshop include, but are not limited to:

Multimodal retrieval for RAG, agentic AI, and recommendation systems
Evaluation of retrieved cross-modal samples, without relying on augmented generation
Multi-aspect evaluation methods capturing inter- and intra-modal coherence, relevance, grounding, and contextual consistency
Benchmark retrieval datasets, evaluation protocols and annotations for text–image–audio–video–3D generation
Automatic and human-centric metrics for informativeness, factuality, fluency, faithfulness, calibration, and usability for multimodal generation
Methodology for detecting, analysing, and mitigating multimodal bias, stereotypes, toxicity, and hallucinations
Evaluation in multimodal low-resource and multilingual settings, including culturally aware and cross-lingual metrics
Agent-based evaluation of multimodal generation in multi-turn, tool-use, or iterative editing scenarios
Game-theoretic or optimization-based formulations of evaluation objectives and protocols
Evaluation of the generation quality of synthetic multimodal data, provenance/attribution, and downstream impact on training and deployment
Ethical considerations in the evaluation of multimodal text generation, including bias detection and mitigation strategies
Evaluation of security and privacy dimensions in multimodal applications

Important Dates

Mar 25, 2026: Submissions Open
Apr 21, 2026: Workshop Paper Submission Deadline
May 21, 2026: Workshop Paper Notification
Jul 24, 2026: Workshop Day

Note: All deadlines are 11:59 PM UTC-12:00 (“Anywhere on Earth”).

Submission Instructions

You are invited to submit your papers through our OpenReview portal. Papers must strictly follow the SIGIR submission guidelines. We invite both long-paper (up to 9 pages) and short-paper (up to 4 pages) submissions. All submitted papers must be anonymized for double-blind review. All accepted papers must be presented in person at the workshop.

Invited Speakers

Mark Sanderson

Talk Title: TBD

Mark Sanderson is the Dean for Research and Innovation in the STEM College of RMIT University. He is also a Professor of Information Retrieval (IR). His work in information retrieval, data analysis, and recommender systems has attracted over 15,000 Google Scholar citations.

Organisers

Wei Emma Zhang, Adelaide University
Xiang Dai, CSIRO
Sarvnaz Karimi, CSIRO
Desmond Elliott, University of Copenhagen
Byron Fang, Oracle
Mong Yuan Sim, Adelaide University & CSIRO

Previous Edition

EvalMG25 @ COLING 2025

2nd Evaluation for Multimodal Generation Workshop

EvalMG26 @ SIGIR 2026

Special Theme: Evaluation of Multimodal Generation and Retrieval Systems