
Start Date: September 10th, 2025
Submission Deadline: October 20th, 2025
Winner Announcement: October 25th, 2025
Presentation at ICAIF '25: November 15th, 2025
Location: Singapore
The AI Agentic Retrieval Grand Challenge at ICAIF ’25 is motivated by the need for financial AI systems that move beyond the accuracy achievable with traditional methods and deliver interpretable, evidence-grounded answers from complex SEC filings. Participants are tasked with building retrieval systems that support complex institutional finance questions with character-level evidence attribution.
Unlike traditional retrieval tasks, this competition is built on FinAgentBench, a benchmark purpose-built to evaluate multi-step, agentic retrieval. Each query requires reasoning across two stages: first, selecting the most relevant document type (Document-Level Ranking), and second, identifying the most relevant passages within that document (Chunk-Level Ranking). The dataset contains over 2,400 examples curated by expert financial analysts from real SEC filings (2023–2024).
Participants must return ranked results for both the document-level and chunk-level ranking tasks. Participants whose solutions stand out (and who can explain how they were achieved) will be invited to present their work at ICAIF ’25.
In the AI Agentic Retrieval Grand Challenge, participants develop systems capable of retrieving the most relevant evidence from raw SEC filings to answer institutional finance questions. The task is structured as a two-stage agentic retrieval problem: Document-Level Ranking followed by Chunk-Level Ranking. This two-stage design prioritizes accuracy and evidence attribution, qualities essential for deploying AI systems in investment research. Systems will be evaluated on their capacity to reason across both the document-type ranking and chunk-level retrieval tasks. All submissions must be made via Databricks Free Edition workspaces, which serve as the standardized evaluation interface; optional integration with the Upstage API (Solar LLM) is allowed.
Document-Level Ranking – Given a financial question, systems must rank candidate document types (10-K, 10-Q, 8-K, DEF 14A, earnings transcripts) by their likelihood of containing the information needed to answer the question.
Chunk-Level Ranking – Given a single document, systems must identify and rank its most relevant passages, returning a ranked list of candidate chunks that contain the information needed to answer the question.
The challenge consists of two sequential retrieval tasks. In the document-type ranking task, participants must rank all candidate document types by their relevance to the input question. In the chunk-level ranking task, given the selected document, participants must return the top 5 most relevant paragraph-level passages containing the information needed to answer the question.
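To make the two-stage structure concrete, here is a minimal sketch of an agentic retrieval pipeline. The scorer `lexical_overlap`, the `descriptions` argument, and all function names are illustrative assumptions, not part of the official starter kit; a competitive entry would replace the placeholder scorer with an LLM-based or trained ranker.

```python
from dataclasses import dataclass

# Candidate document types named in the challenge description.
DOC_TYPES = ["10-K", "10-Q", "8-K", "DEF 14A", "earnings transcript"]

@dataclass
class RankedItem:
    item: str
    score: float

def lexical_overlap(question: str, text: str) -> float:
    """Placeholder relevance scorer: fraction of question tokens found in
    the text. A real system could swap in an LLM-based scorer (e.g., via
    the optional Upstage Solar API) or a fine-tuned ranker."""
    q_tokens = set(question.lower().split())
    t_tokens = set(text.lower().split())
    return len(q_tokens & t_tokens) / max(len(q_tokens), 1)

def rank_document_types(question: str,
                        descriptions: dict[str, str]) -> list[RankedItem]:
    """Stage 1 (Document-Level Ranking): rank every candidate document
    type by its likelihood of containing the answer. `descriptions` maps
    each document type to a short textual description (an assumption made
    here so the lexical scorer has something to match against)."""
    scored = [RankedItem(d, lexical_overlap(question, descriptions[d]))
              for d in DOC_TYPES]
    return sorted(scored, key=lambda r: r.score, reverse=True)

def rank_chunks(question: str, chunks: list[str],
                k: int = 5) -> list[RankedItem]:
    """Stage 2 (Chunk-Level Ranking): rank paragraph-level chunks of the
    selected document and return the top k."""
    scored = [RankedItem(c, lexical_overlap(question, c)) for c in chunks]
    return sorted(scored, key=lambda r: r.score, reverse=True)[:k]

# Toy usage of stage 2:
question = "What supply chain risks did the company disclose?"
chunks = [
    "The company faces supply chain disruption risks in Asia.",
    "Revenue grew 12% year over year.",
]
print(rank_chunks(question, chunks, k=5))
```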
Both tasks will be evaluated using standard ranking metrics: MRR@5 (Mean Reciprocal Rank), MAP@5 (Mean Average Precision), and nDCG@5 (Normalized Discounted Cumulative Gain).
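For local validation, the following is a minimal sketch of per-query versions of the three metrics at a cutoff of 5, assuming binary relevance labels for MRR and MAP and graded gains for nDCG. The official scorer's exact conventions (tie handling, gain function, MAP normalization) are not specified in this announcement, so treat these as assumptions.

```python
import math

def mrr_at_k(relevance: list[int], k: int = 5) -> float:
    """Reciprocal rank of the first relevant item in the top k
    (binary labels, given in ranked order); 0 if none is relevant."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ap_at_k(relevance: list[int], k: int = 5) -> float:
    """Average Precision at k with binary labels. Normalized here by the
    number of relevant items retrieved in the top k; the official scorer
    may instead normalize by the total number of relevant items."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def ndcg_at_k(gains: list[float], k: int = 5) -> float:
    """nDCG at k with graded relevance gains, given in ranked order."""
    def dcg(gs: list[float]) -> float:
        return sum(g / math.log2(rank + 1)
                   for rank, g in enumerate(gs, start=1))
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0
```

Averaging these per-query values over the full evaluation set yields MRR@5, MAP@5, and nDCG@5.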

MIT
United States

Prof. Alejandro Lopez-Lira
University of Florida
United States

UT Austin & XTX Markets
United States

Google
United States

BlackRock
United States

LinqAlpha
United States

AllianceBernstein
United States

Fidelity Investments
United States

UNIST
South Korea

J.P. Morgan Chase
United States

Universiti Malaya
Malaysia

WeBank
China

Qube Research & Technologies
United Kingdom
Sponsored by


Contact
For further details and guidelines, please contact the workshop organizers via the Kaggle website.