RecSys Challenge 2026

About

The RecSys 2026 Challenge will be organized by Seungheon Doh (Korea Advanced Institute of Science and Technology, South Korea), Sergio Oramas (Pandora/SiriusXM), Bruno Sguerra (Deezer Research), Abhinav Bohra (Amazon), Claudio Pomo (Politecnico di Bari, Italy), and Francesco Barile (Maastricht University, Netherlands).

The RecSys Challenge 2026: Music-CRS focuses on the evolving landscape of music discovery, where static recommendation lists are being replaced by dynamic, conversational interactions. As users increasingly interact with AI through natural language, there is a critical need for systems that can seamlessly integrate Natural Language Understanding (NLU) with high-precision Recommender Systems (RecSys). This challenge aims to push the boundaries of how AI understands nuanced user preferences, explores musical tastes through dialogue, and provides contextually relevant track recommendations.

By utilizing the TalkPlayData-Challenge dataset, a large-scale conversation resource generated through an advanced agentic pipeline, we invite the global research community to tackle the complexities of multi-turn preference elicitation. In keeping with the community-driven spirit of the challenge, the dataset consists of LLM-generated multi-turn dialogues paired with music metadata and user-item interaction data derived from publicly available research datasets. No SiriusXM or Deezer user data is included in the dataset. This challenge serves as a bridge between the NLP and RecSys communities, fostering next-generation innovation in interactive and personalized music information retrieval.


Challenge Task

The primary goal is to develop a Conversational Music Recommendation system that acts as an intelligent agent capable of navigating user tastes through dialogue.

Main Task: Conversational Music Recommendation. The system must understand user music preferences from previous conversation turns and user profiles to recommend relevant tracks from a catalog while generating natural, helpful responses.


Evaluation

The challenge employs a multi-dimensional evaluation framework to assess both what a system recommends and how it communicates those recommendations. The final score combines multiple dimensions, each of which contributes to the overall ranking. Recommendation quality serves as the primary anchor of the evaluation, while diversity and response-quality dimensions capture complementary aspects of system performance.

Development vs. blind evaluation. The public evaluator supports transparent development-set evaluation. Blind Dataset A drives the interim leaderboard during the challenge, while Blind Dataset B is the final evaluation split: it will be released one week before the end of the challenge, and the final leaderboard will be computed on it.

Each dimension is described below in terms of what it measures, how it is computed, and its role in the evaluation; a minimal sketch of the automatic metrics appears after the list.

nDCG@20: Ranking quality of the recommended tracks. Computed from the ranked list of predicted tracks against the ground-truth relevant item; higher-ranked correct recommendations receive more credit. Role: primary recommendation metric.

Catalog Diversity: How broadly a system covers the music catalog. Computed as the number of unique recommended tracks across all predictions divided by the total catalog size. Role: complementary diversity indicator.

Lexical Diversity: How varied the generated language is. Measured with Distinct-2, i.e., unique bigrams divided by total bigrams across generated responses. Role: complementary response-generation indicator.

LLM-as-a-Judge: Quality of the generated explanation. Blind-set responses are judged by a Gemini model used as an automatic judge, which evaluates two text-only dimensions: Personalization and Explanation Quality. These dimensions assess the written response independently of recommendation accuracy. To preserve the integrity of the blind evaluation, we disclose the judge family but do not publish the evaluation prompt. Role: blind-set response-quality evaluation.
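The official evaluator implementation is not published here, so the following Python sketch only illustrates how the three automatic metrics are commonly computed, under the assumption (stated above) of a single ground-truth relevant item per prediction; function names and input formats are illustrative, not the official API.

```python
# Hypothetical sketch of the automatic metrics; names and signatures are assumptions.
import math

def ndcg_at_20(ranked_track_ids, ground_truth_id):
    """nDCG@20 with a single relevant item: the ideal DCG is 1, so the score is
    1 / log2(rank + 1) if the item appears in the top 20, else 0."""
    for rank, track_id in enumerate(ranked_track_ids[:20], start=1):
        if track_id == ground_truth_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0

def catalog_diversity(all_predictions, catalog_size):
    """Unique recommended tracks across all predictions divided by catalog size."""
    unique_tracks = {t for ranked in all_predictions for t in ranked}
    return len(unique_tracks) / catalog_size

def distinct_2(responses):
    """Distinct-2: unique bigrams divided by total bigrams over all generated responses."""
    bigrams = []
    for text in responses:
        tokens = text.lower().split()
        bigrams.extend(zip(tokens, tokens[1:]))
    return len(set(bigrams)) / max(len(bigrams), 1)
```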

Aggregation policy. For the public evaluator, retrieval metrics are computed for each session-turn prediction, averaged within each turn, and then macro-averaged across turns. Diversity metrics are computed globally over the full prediction file.
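As a rough illustration of this aggregation policy, the sketch below groups per-prediction retrieval scores by turn index, averages within each turn, and then macro-averages across turns; the function and variable names are hypothetical.

```python
# Hypothetical illustration of the stated aggregation for retrieval metrics.
from collections import defaultdict

def aggregate_retrieval_scores(per_prediction_scores):
    """per_prediction_scores: list of (turn_index, score) pairs, one per
    session-turn prediction. Returns the macro-average across turns."""
    by_turn = defaultdict(list)
    for turn_index, score in per_prediction_scores:
        by_turn[turn_index].append(score)
    turn_means = [sum(scores) / len(scores) for scores in by_turn.values()]
    return sum(turn_means) / len(turn_means)
```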

To ensure transparency and fairness, all dimensions used in the evaluation are disclosed publicly. While the exact aggregation formula is not published on the website, every listed dimension contributes to the final score. As noted above, the automatic response-quality judge is based on Gemini, but the evaluation prompt is not published.

Development-set scores can be reproduced with the public evaluator, while blind-set scoring is performed on the official leaderboard infrastructure.


Dataset: TalkPlayData-Challenge

The challenge utilizes TalkPlayData-Challenge, a synthetic conversational music recommendation dataset featuring multi-turn dialogues and music listening history. The dataset is split into four subsets: Train, Development, Blind A, and Blind B.

The leaderboard evaluation proceeds in two blind stages. Blind A supports the interim leaderboard during the main phase of the challenge. Blind B is the final hidden evaluation set, released one week before the end of the challenge, and the final leaderboard is based on Blind B.

Dataset Components

For further details, please refer to the dedicated website.


Useful Resources

The following resources are provided to support participation in the challenge.

Conversation Datasets

Shared Recommendation Resources

The following resources are shared across Train, Development, Blind A, and Blind B.

Code and Participation

Prize



Participation

Registration & Data Access

Registration details are available on the linked website.



Timeline

4 April 2026: Website published.
10 April 2026: Dataset release (including Blind Dataset A).
15 April 2026: Submission system and leaderboard go live (Blind Dataset A).
23 June 2026: Blind Dataset B release; final evaluation phase opens.
30 June 2026: End of the challenge; participants must submit final predictions for Blind Dataset B.
6 July 2026: Final leaderboard released, based on Blind Dataset B.
6 July 2026: Winners announcement; EasyChair opens for paper submissions.
12 July 2026: Paper submission deadline.
September 2026: RecSys Challenge Workshop at ACM RecSys 2026.

Paper Submission Guidelines

Submission website: TBA

Organization

Organizing Committee