The RecSys 2026 Challenge will be organized by Seungheon Doh (Korea Advanced Institute of Science and Technology, South Korea), Sergio Oramas (Pandora/SiriusXM), Bruno Sguerra (Deezer Research), Abhinav Bohra (Amazon), Claudio Pomo (Politecnico di Bari, Italy), and Francesco Barile (Maastricht University, Netherlands).
The RecSys Challenge 2026: Music-CRS focuses on the evolving landscape of music discovery, where static recommendation lists are being replaced by dynamic, conversational interactions. As users increasingly interact with AI through natural language, there is a critical need for systems that can seamlessly integrate Natural Language Understanding (NLU) with high-precision Recommender Systems (RecSys). This challenge aims to push the boundaries of how AI understands nuanced user preferences, explores musical tastes through dialogue, and provides contextually relevant track recommendations.
By utilizing the TalkPlayData-Challenge dataset, a large-scale conversation resource generated through an advanced agentic pipeline, we invite the global research community to tackle the complexities of multi-turn preference elicitation. In keeping with the challenge's research community-driven spirit, the dataset features LLM-generated multi-turn dialogues paired with music metadata and user-item interaction data derived from publicly available research datasets. No SiriusXM or Deezer user data is included in the dataset. This challenge serves as a bridge between the NLP and RecSys communities, fostering next-generation innovation in interactive and personalized music information retrieval.
The primary goal is to develop a Conversational Music Recommendation system that acts as an intelligent agent capable of navigating user tastes through dialogue.
Main Task: Conversational Music Recommendation. The system must understand user music preferences from previous conversation turns and user profiles to recommend relevant tracks from a catalog while generating natural, helpful responses.
The challenge employs a multi-dimensional evaluation framework that assesses both what a system recommends and how it communicates those recommendations. Every dimension contributes to the final score and the overall ranking: recommendation quality serves as the primary anchor of the evaluation, while the diversity and response-quality dimensions capture complementary aspects of system performance.
Development vs. blind evaluation. The public evaluator supports transparent development-set evaluation. Blind A serves as an interim leaderboard for the challenge, while Blind B is the final evaluation split. Blind Dataset B will be released one week before the end of the challenge, and the final leaderboard will be computed on Blind Dataset B.
| Dimension | What it measures | How it is computed | Role in evaluation |
|---|---|---|---|
| nDCG@20 | Ranking quality of the recommended tracks. | Computed from the ranked list of predicted tracks against the ground-truth relevant item. Higher-ranked correct recommendations receive more credit. | Primary recommendation metric. |
| Catalog Diversity | How broadly a system covers the music catalog. | Number of unique recommended tracks across all predictions divided by the total catalog size. | Complementary diversity indicator. |
| Lexical Diversity | How varied the generated language is. | Measured with Distinct-2, i.e., unique bigrams divided by total bigrams across generated responses. | Complementary response-generation indicator. |
| LLM-as-a-Judge | Quality of the generated explanation. | Blind-set responses are judged by a Gemini model used as an automatic judge. The judge evaluates two text-only dimensions: Personalization and Explanation Quality. These dimensions evaluate the written response independently from recommendation accuracy. To preserve the integrity of the blind evaluation, we disclose the judge family but do not publish the evaluation prompt. | Blind-set response-quality evaluation. |
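The three offline dimensions above can be sketched in a few lines of Python. This is an illustrative reimplementation of the metric definitions as described in the table (binary relevance for nDCG, set-based catalog coverage, Distinct-2 over whitespace tokens), not the official evaluator; the exact tokenization and tie-breaking of the official scoring code may differ.

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k=20):
    """Binary-relevance nDCG@k for a single ranked prediction list."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 -> log2(2)
        for rank, tid in enumerate(ranked_ids[:k])
        if tid in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(r + 2) for r in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def catalog_diversity(all_predictions, catalog_size):
    """Unique recommended tracks across all predictions / total catalog size."""
    unique_tracks = {tid for preds in all_predictions for tid in preds}
    return len(unique_tracks) / catalog_size

def distinct_2(responses):
    """Distinct-2: unique bigrams / total bigrams across generated responses."""
    bigrams = []
    for text in responses:
        tokens = text.split()
        bigrams.extend(zip(tokens, tokens[1:]))
    return len(set(bigrams)) / len(bigrams) if bigrams else 0.0
```

For example, a prediction list whose single relevant track sits at rank 1 scores an nDCG of 1.0, while the same track at rank 2 scores 1/log2(3) ≈ 0.63.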
Aggregation policy. For the public evaluator, retrieval metrics are computed for each session-turn prediction, averaged within each turn, and then macro-averaged across turns. Diversity metrics are computed globally over the full prediction file.
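One plausible reading of this aggregation policy (average per-prediction scores within each turn, then macro-average across turns) can be sketched as follows. The pairing of scores with turn indices is an assumption about how the evaluator groups predictions, not a description of the official implementation.

```python
from collections import defaultdict
from statistics import mean

def macro_average_by_turn(per_prediction_scores):
    """
    per_prediction_scores: iterable of (turn_index, score) pairs, one per
    session-turn prediction. Scores are first averaged within each turn,
    then the per-turn means are macro-averaged across turns, so every
    turn contributes equally regardless of how many sessions reach it.
    """
    by_turn = defaultdict(list)
    for turn, score in per_prediction_scores:
        by_turn[turn].append(score)
    return mean(mean(scores) for scores in by_turn.values())
```

The macro-averaging step matters because later turns typically have fewer sessions; a plain per-prediction mean would over-weight early turns.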
To ensure transparency and fairness, all dimensions used in the evaluation are disclosed publicly. We do not publish the exact aggregation formula, but every listed dimension contributes to the final score. Likewise, for the response-quality evaluation we disclose that the automatic judge is based on Gemini, but we do not publish the evaluation prompt.
Blind-set scoring, in contrast to the transparent development-set evaluation, is performed on the official leaderboard infrastructure.
The challenge utilizes TalkPlayData-Challenge, a synthetic conversational music recommendation dataset featuring multi-turn dialogues and music listening history. The dataset is split into four subsets: Train, Development, Blind A, and Blind B.
The leaderboard evaluation proceeds in two blind stages. Blind A supports the interim leaderboard during the main phase of the challenge. Blind B is the final hidden evaluation set, released one week before the end of the challenge, and the final leaderboard is based on Blind B.
Dataset Components
For further details, please refer to the dedicated website.
The following resources are provided to support participation in the challenge.
The following resources are shared across Train, Development, Blind A, and Blind B.
| When? | What? |
|---|---|
| 4 April, 2026 | Website published. |
| 10 April, 2026 | Dataset release (including Blind Dataset A). |
| 15 April, 2026 | Submission system and leaderboard go live (Blind Dataset A). |
| 23 June, 2026 | Blind Dataset B release. Final evaluation phase opens. |
| 30 June, 2026 | End of the challenge. Participants must submit final predictions for Blind Dataset B. |
| 6 July, 2026 | Final leaderboard released based on Blind Dataset B. |
| 6 July, 2026 | Winners announcement. EasyChair opens for paper submissions. |
| 12 July, 2026 | Paper submission deadline. |
| September 2026 | RecSys Challenge Workshop at ACM RecSys 2026. |
Submission website: TBA
Important Note on Publication Fees
Please be aware that this workshop follows the new ACM International Conference Proceedings Series (ICPS) model. Under this model, the financial responsibility for the Article Processing Charge (APC) lies with the authors of accepted papers.
However, the RecSys conference organizers are committed to supporting the community. Authors who face genuine difficulties in covering the APC will have the opportunity to submit a motivated request for financial support. These requests will be evaluated by the main conference organizers on a case-by-case basis. Please note that support is not guaranteed.
For more details on the ACM ICPS model, please refer to the ICPS FAQ and organizer guidance.