The RecSys 2026 Challenge will be organized by Seungheon Doh (Korea Advanced Institute of Science and Technology, South Korea), Sergio Oramas (Pandora/SiriusXM), Bruno Sguerra (Deezer Research), Abhinav Bohra (Amazon), Claudio Pomo (Politecnico di Bari, Italy), and Francesco Barile (Maastricht University, Netherlands).
The RecSys Challenge 2026: Music-CRS focuses on the evolving landscape of music discovery, where static recommendation lists are being replaced by dynamic, conversational interactions. As users increasingly interact with AI through natural language, there is a critical need for systems that can seamlessly integrate Natural Language Understanding (NLU) with high-precision Recommender Systems (RecSys). This challenge aims to push the boundaries of how AI understands nuanced user preferences, explores musical tastes through dialogue, and provides contextually relevant track recommendations.
By utilizing the TalkPlayData-Challenge dataset, a large-scale conversation resource generated through an advanced agentic pipeline, we invite the global research community to tackle the complexities of multi-turn preference elicitation. In keeping with the challenge's research community-driven spirit, the dataset features LLM-generated multi-turn dialogues paired with music metadata and user-item interaction data derived from publicly available research datasets. No SiriusXM or Deezer user data is included in the dataset. This challenge serves as a bridge between the NLP and RecSys communities, fostering next-generation innovation in interactive and personalized music information retrieval.
The primary goal is to develop a Conversational Music Recommendation system that acts as an intelligent agent capable of navigating user tastes through dialogue.
Main Task: Conversational Music Recommendation. The system must understand user music preferences from previous conversation turns and user profiles to recommend relevant tracks from a catalog while generating natural, helpful responses.
The challenge employs a multi-dimensional evaluation framework that assesses both what a system recommends and how it communicates those recommendations. Every dimension contributes to the final score and the overall ranking: recommendation quality serves as the primary anchor of the evaluation, while the diversity and response-quality dimensions capture complementary aspects of system performance.
Development vs. blind evaluation. The public evaluator supports transparent development-set evaluation. Blind A serves as an interim leaderboard for the challenge, while Blind B is the final evaluation split. Blind Dataset B will be released one week before the end of the challenge, and the final leaderboard will be computed on Blind Dataset B.
| Dimension | What it measures | How it is computed | Role in evaluation |
|---|---|---|---|
| nDCG@20 | Ranking quality of the recommended tracks. | Computed from the ranked list of predicted tracks against the ground-truth relevant item. Higher-ranked correct recommendations receive more credit. | Primary recommendation metric. |
| Catalog Diversity | How broadly a system covers the music catalog. | Number of unique recommended tracks across all predictions divided by the total catalog size. | Complementary diversity indicator. |
| Lexical Diversity | How varied the generated language is. | Measured with Distinct-2, i.e., unique bigrams divided by total bigrams across generated responses. | Complementary response-generation indicator. |
| LLM-as-a-Judge | Quality of the generated explanation. | Blind-set responses are judged by a Gemini model used as an automatic judge. The judge evaluates two text-only dimensions: Personalization and Explanation Quality. These dimensions evaluate the written response independently from recommendation accuracy. To preserve the integrity of the blind evaluation, we disclose the judge family but do not publish the evaluation prompt. | Blind-set response-quality evaluation. |
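The three offline dimensions above can be sketched in a few lines of Python. This is an illustrative reimplementation of the metric definitions as described in the table (binary relevance for nDCG, set-based catalog coverage, Distinct-2 over whitespace tokens), not the official evaluator; the exact tokenization and tie-breaking of the official scoring code may differ.

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k=20):
    """Binary-relevance nDCG@k for a single ranked prediction list."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, tid in enumerate(ranked_ids[:k])
        if tid in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(r + 2) for r in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def catalog_diversity(all_predictions, catalog_size):
    """Unique recommended tracks across all predictions / total catalog size."""
    unique_tracks = {tid for preds in all_predictions for tid in preds}
    return len(unique_tracks) / catalog_size

def distinct_2(responses):
    """Distinct-2: unique bigrams / total bigrams across generated responses."""
    bigrams = []
    for text in responses:
        tokens = text.split()
        bigrams.extend(zip(tokens, tokens[1:]))
    return len(set(bigrams)) / len(bigrams) if bigrams else 0.0
```

For example, a prediction list whose single relevant track sits at rank 1 scores an nDCG of 1.0, while the same track at rank 2 scores 1/log2(3) ≈ 0.63.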
Aggregation policy. For the public evaluator, retrieval metrics are computed for each session-turn prediction, averaged within each turn, and then macro-averaged across turns. Diversity metrics are computed globally over the full prediction file.
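One plausible reading of this aggregation policy (average per-prediction scores within each turn, then macro-average across turns) can be sketched as follows. The pairing of scores with turn indices is an assumption about how the evaluator groups predictions, not a description of the official implementation.

```python
from collections import defaultdict
from statistics import mean

def macro_average_by_turn(per_prediction_scores):
    """
    per_prediction_scores: iterable of (turn_index, score) pairs, one per
    session-turn prediction. Scores are first averaged within each turn,
    then the per-turn means are macro-averaged across turns, so every
    turn contributes equally regardless of how many sessions reach it.
    """
    by_turn = defaultdict(list)
    for turn, score in per_prediction_scores:
        by_turn[turn].append(score)
    return mean(mean(scores) for scores in by_turn.values())
```

The macro-averaging step matters because later turns typically have fewer sessions; a plain per-prediction mean would over-weight early turns.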
To ensure transparency and fairness, all dimensions used in the evaluation are disclosed publicly. We do not publish the exact aggregation formula, but every listed dimension contributes to the final score. Likewise, for the response-quality evaluation we disclose that the automatic judge is based on Gemini, but we do not publish the evaluation prompt.
Blind-set scoring, in contrast to the transparent development-set evaluation, is performed on the official leaderboard infrastructure.
The challenge utilizes TalkPlayData-Challenge, a synthetic conversational music recommendation dataset featuring multi-turn dialogues and music listening history. The dataset is split into four subsets: Train, Development, Blind A, and Blind B.
The leaderboard evaluation proceeds in two blind stages. Blind A supports the interim leaderboard during the main phase of the challenge. Blind B is the final hidden evaluation set, released one week before the end of the challenge, and the final leaderboard is based on Blind B.
Dataset Components
For further details, please refer to the dedicated website.
The following resources are provided to support participation in the challenge.
The following resources are shared across Train, Development, Blind A, and Blind B.
| When? | What? |
|---|---|
| 4 April, 2026 | Website published. |
| 10 April, 2026 | Dataset release (including Blind Dataset A). |
| 15 April, 2026 | Submission system and leaderboard go live (Blind Dataset A). |
| 23 June, 2026 | Blind Dataset B release. Final evaluation phase opens. |
| 30 June, 2026 | End of the challenge. Participants must submit final predictions for Blind Dataset B. |
| 6 July, 2026 | Final leaderboard released based on Blind Dataset B. |
| 6 July, 2026 | Winners announcement. EasyChair opens for paper submissions. |
| 12 July, 2026 | Paper submission deadline. |
| September 2026 | RecSys Challenge Workshop at ACM RecSys 2026. |
Submission website: TBA
Important Note on Publication Fees
Please be aware that this workshop follows the new ACM International Conference Proceedings Series (ICPS) model. Under this model, the financial responsibility for the Article Processing Charge (APC) lies with the authors of accepted papers.
However, the RecSys conference organizers are committed to supporting the community. Authors who face genuine difficulties in covering the APC will have the opportunity to submit a motivated request for financial support. These requests will be evaluated by the main conference organizers on a case-by-case basis. Please note that support is not guaranteed.
For more details on the ACM ICPS model, please refer to the ICPS FAQ and organizer guidance.