
Start Date: September 10th, 2025
Submission Deadline: October 20th, 2025
Winner Announcement: October 25th, 2025
Presentation at ICAIF '25: November 15th, 2025
Location: Singapore
The AI Agentic Retrieval Grand Challenge at ICAIF ’25 is motivated by the need for financial AI systems that move beyond the accuracy achievable with traditional methods and deliver interpretable, evidence-grounded answers from complex SEC filings. Participants are tasked with building retrieval systems that support complex institutional finance questions with character-level evidence attribution.
Unlike traditional retrieval tasks, this competition is built on FinAgentBench, a benchmark purpose-built to evaluate multi-step, agentic retrieval. Each query requires reasoning across two stages: first, selecting the most relevant document type (Document-Level Ranking), and second, identifying the most relevant passages within that document (Chunk-Level Ranking). The dataset contains over 2,400 examples curated by expert financial analysts from real SEC filings (2023–2024).
Participants must return ranked results for both the document-level and chunk-level ranking tasks. Participants whose solutions stand out (and who can explain how they were achieved) will be invited to present their work at ICAIF ’25.
In the AI Agentic Retrieval Grand Challenge, participants develop systems capable of retrieving the most relevant evidence from raw SEC filings to answer institutional finance questions. The task is structured as a two-stage agentic retrieval problem: Document-Level Ranking followed by Chunk-Level Ranking. This two-stage design prioritizes accuracy and evidence attribution, qualities essential for deploying AI systems in investment research. Systems will be evaluated on their capacity to reason across both the document-type ranking and chunk-level retrieval tasks. All submissions must be made via Databricks Free Edition workspaces, which serve as the standardized evaluation interface; optional integration with the Upstage API (Solar LLM) is allowed.
Document-Level Ranking – Given a financial question, systems must rank candidate document types (10-K, 10-Q, 8-K, DEF 14A, earnings transcripts) by their likelihood of containing the information needed to answer the question.
Chunk-Level Ranking – Given a single document, systems must identify and rank its most relevant passages, returning a ranked list of candidate chunks that contain the information needed to answer the question.
The challenge consists of two sequential retrieval tasks. In the document-type ranking task, participants must rank all candidate document types by their relevance to the input question. In the chunk-level ranking task, given the selected document, participants must return the top 5 most relevant paragraph-level passages containing the information needed to answer the question.
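To make the two-stage structure concrete, here is a minimal sketch of an agentic retrieval pipeline. The scorer `lexical_overlap`, the `descriptions` argument, and all function names are illustrative assumptions, not part of the official starter kit; a competitive entry would replace the placeholder scorer with an LLM-based or trained ranker.

```python
from dataclasses import dataclass

# Candidate document types named in the challenge description.
DOC_TYPES = ["10-K", "10-Q", "8-K", "DEF 14A", "earnings transcript"]

@dataclass
class RankedItem:
    item: str
    score: float

def lexical_overlap(question: str, text: str) -> float:
    """Placeholder relevance scorer: fraction of question tokens found in
    the text. A real system could swap in an LLM-based scorer (e.g., via
    the optional Upstage Solar API) or a fine-tuned ranker."""
    q_tokens = set(question.lower().split())
    t_tokens = set(text.lower().split())
    return len(q_tokens & t_tokens) / max(len(q_tokens), 1)

def rank_document_types(question: str,
                        descriptions: dict[str, str]) -> list[RankedItem]:
    """Stage 1 (Document-Level Ranking): rank every candidate document
    type by its likelihood of containing the answer. `descriptions` maps
    each document type to a short textual description (an assumption made
    here so the lexical scorer has something to match against)."""
    scored = [RankedItem(d, lexical_overlap(question, descriptions[d]))
              for d in DOC_TYPES]
    return sorted(scored, key=lambda r: r.score, reverse=True)

def rank_chunks(question: str, chunks: list[str],
                k: int = 5) -> list[RankedItem]:
    """Stage 2 (Chunk-Level Ranking): rank paragraph-level chunks of the
    selected document and return the top k."""
    scored = [RankedItem(c, lexical_overlap(question, c)) for c in chunks]
    return sorted(scored, key=lambda r: r.score, reverse=True)[:k]

# Toy usage of stage 2:
question = "What supply chain risks did the company disclose?"
chunks = [
    "The company faces supply chain disruption risks in Asia.",
    "Revenue grew 12% year over year.",
]
print(rank_chunks(question, chunks, k=5))
```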
Both tasks will be evaluated using standard ranking metrics: MRR@5 (Mean Reciprocal Rank), MAP@5 (Mean Average Precision), and nDCG@5 (Normalized Discounted Cumulative Gain).
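For local validation, the following is a minimal sketch of per-query versions of the three metrics at a cutoff of 5, assuming binary relevance labels for MRR and MAP and graded gains for nDCG. The official scorer's exact conventions (tie handling, gain function, MAP normalization) are not specified in this announcement, so treat these as assumptions.

```python
import math

def mrr_at_k(relevance: list[int], k: int = 5) -> float:
    """Reciprocal rank of the first relevant item in the top k
    (binary labels, given in ranked order); 0 if none is relevant."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ap_at_k(relevance: list[int], k: int = 5) -> float:
    """Average Precision at k with binary labels. Normalized here by the
    number of relevant items retrieved in the top k; the official scorer
    may instead normalize by the total number of relevant items."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def ndcg_at_k(gains: list[float], k: int = 5) -> float:
    """nDCG at k with graded relevance gains, given in ranked order."""
    def dcg(gs: list[float]) -> float:
        return sum(g / math.log2(rank + 1)
                   for rank, g in enumerate(gs, start=1))
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0
```

Averaging these per-query values over the full evaluation set yields MRR@5, MAP@5, and nDCG@5.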

MIT
United States

Prof. Alejandro Lopez-Lira
University of Florida
United States

UT Austin & XTX Markets
United States

Google
United States

BlackRock
United States

LinqAlpha
United States

AllianceBernstein
United States

Fidelity Investments
United States

UNIST
South Korea

J.P. Morgan Chase
United States

Universiti Malaya
Malaysia

WeBank
China

Qube Research & Technologies
United Kingdom
Sponsored by


Contact
For further details and guidelines, please contact the workshop organizers via the Kaggle website.