RecSys Challenge 2024

About

The RecSys 2024 Challenge will be organized by Johannes Kruse and Kasper Lindskow (Ekstra Bladet), Anshuk Uppal, Michael Riis Andersen, and Jes Frellsen (Technical University of Denmark), Marco Polignano (University of Bari Aldo Moro, Italy), Claudio Pomo (Politecnico di Bari, Italy), and Abhishek Srivastava (IIM Visakhapatnam, India) based on the data provided by Ekstra Bladet. This year’s challenge focuses on online news recommendation, addressing both the technical and normative challenges inherent in the design of effective and responsible recommender systems for news publishing.

The challenge will delve into the unique aspects of news recommendation. These include modeling user preferences based on implicit behavior, accounting for the influence of the news agenda on user interests, and managing the rapid decay of news items. The challenge also embraces the normative complexities of the domain: investigating the effects of recommender systems on the news flow and whether they align with editorial values. By providing participants with a comprehensive dataset and a robust news recommendation evaluation framework, our goal is to tackle these multifaceted challenges head-on. As part of the challenge, Ekstra Bladet will release an anonymized dataset with approximately 2 million random users who engaged with EkstraBladet.dk over a six-week period.


Challenge Task

The Ekstra Bladet RecSys Challenge aims to predict which article a user will click from a list of articles that were seen during a specific impression. Participants are given the user's browsing history, session details (like time and device used), and personal metadata (including gender and age), along with a list of candidate news articles listed in an impression log. The challenge's objective is to rank the candidate articles based on the user's personal preferences. This involves developing models that represent both the users and the articles through the articles' content and the users' interests. The models estimate the likelihood of a user clicking each article by evaluating the compatibility between the article's content and the user's preferences. The articles are ranked by these likelihood scores, and the quality of the rankings is measured against the actual selections made by users.
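As an illustration of the ranking task described above, the following is a minimal sketch, not the organizers' method or an official baseline. It assumes each article has a precomputed embedding, represents the user as the mean of the embeddings of previously clicked articles, and uses the sigmoid of a dot product as a stand-in for a learned click likelihood. The function names, article ids, and toy embeddings are all hypothetical.

```python
from math import exp


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors into one user vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(history_ids, candidate_ids, embeddings):
    """Return (ranked_ids, scores) with candidates sorted by descending score."""
    # Assumption: the user is summarized by the mean embedding of clicked articles.
    user_vec = mean_vector([embeddings[a] for a in history_ids])
    # Assumption: sigmoid(dot product) stands in for a learned click likelihood.
    scores = {
        c: 1.0 / (1.0 + exp(-dot(user_vec, embeddings[c])))
        for c in candidate_ids
    }
    ranked = sorted(candidate_ids, key=lambda c: scores[c], reverse=True)
    return ranked, scores


# Hypothetical toy data: article id -> 2-dimensional embedding.
embeddings = {
    "a1": [1.0, 0.0],
    "a2": [0.8, 0.2],
    "a3": [0.0, 1.0],
    "a4": [0.9, 0.1],
}

# The user previously clicked a1 and a2; the impression shows a3 and a4.
ranked, scores = rank_candidates(["a1", "a2"], ["a3", "a4"], embeddings)
print(ranked)
```

In this toy case the candidate closest to the user's click history is ranked first. A competitive model would learn the user and article representations rather than average fixed embeddings.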


Evaluation

To evaluate the models, we use several standard metrics in the recommendation field, including the area under the ROC curve (AUC), mean reciprocal rank (MRR), and normalized discounted cumulative gain (nDCG@K) for K shown recommendations. To address the normative complexities inherent in news recommendations, the test set incorporates samples specifically designed to assess models based on normative properties. This includes evaluating models on Beyond-Accuracy Objectives, such as intra-list diversity, serendipity, novelty, and coverage, among others. The final result is the average of these metrics across all impression logs.
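The accuracy metrics named above can be sketched per impression and then averaged, as follows. This is a plain-Python illustration under stated assumptions, not the official evaluation code: labels are binary (1 for the clicked article, 0 otherwise), MRR is taken as the reciprocal rank of the first clicked article, and nDCG uses a log2 discount. The official scorer may define details such as tie handling differently.

```python
from math import log2


def auc(labels, scores):
    """Chance that a clicked item outscores a non-clicked one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one non-clicked item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _labels_by_rank(labels, scores):
    """Labels reordered from highest to lowest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def mrr(labels, scores):
    """Reciprocal rank of the first clicked item, or 0.0 if nothing was clicked."""
    for rank, y in enumerate(_labels_by_rank(labels, scores), start=1):
        if y == 1:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(labels, scores, k):
    """DCG of the top-k ranking divided by the DCG of the ideal top-k ranking."""
    ranked = _labels_by_rank(labels, scores)[:k]
    ideal = sorted(labels, reverse=True)[:k]
    dcg = sum(y / log2(pos + 1) for pos, y in enumerate(ranked, start=1))
    idcg = sum(y / log2(pos + 1) for pos, y in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_over_impressions(metric, impressions, **kwargs):
    """Average a per-impression metric over all (labels, scores) pairs."""
    values = [metric(labels, scores, **kwargs) for labels, scores in impressions]
    return sum(values) / len(values)


# Hypothetical impressions: (click labels, model scores) per candidate list.
impressions = [
    ([0, 1, 0], [0.2, 0.9, 0.1]),  # clicked article is ranked first
    ([1, 0, 0], [0.3, 0.8, 0.1]),  # clicked article is ranked second
]

mean_auc = mean_over_impressions(auc, impressions)
mean_mrr = mean_over_impressions(mrr, impressions)
mean_ndcg = mean_over_impressions(ndcg_at_k, impressions, k=2)
print(mean_auc, mean_mrr, mean_ndcg)
```

Computing each metric within an impression and averaging afterwards matches the statement that the final result is the average across all impression logs. The beyond-accuracy objectives are not covered by this sketch.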


Dataset

The Ekstra Bladet News Recommendation Dataset (EB-NeRD) is a large-scale Danish dataset created by Ekstra Bladet to support advancements and benchmarking in news recommendation research. EB-NeRD comprises over 2.7 million users and more than 600 million impression logs from Ekstra Bladet. Alongside these, we offer a collection of more than 120 thousand news articles, enriched with textual content features such as titles, abstracts, and bodies. This makes it possible to use text features in a low-resource language as context for recommender systems.

EBNeRD

To support advancements in news recommendation research, we have constructed the Ekstra Bladet News Recommendation Dataset (EB-NeRD). It was collected from the user behavior logs at Ekstra Bladet. We collected behavior logs from active users during the 6 weeks from April 27 to June 8, 2023. This timeframe was selected to avoid major events, e.g., holidays or elections, that could trigger atypical behavior at Ekstra Bladet.
Active users were defined as users who had at least 5 and at most 1,000 news click records in the three-week period from May 18 to June 8, 2023. To protect user privacy, every user was de-linked from the production system and securely hashed into an anonymized ID using a one-time salt mapping. Alongside the logs, we provide Danish news articles published by Ekstra Bladet. Each article is enriched with textual context features such as title, abstract, body, and categories, among others. Furthermore, we provide features that have been generated by proprietary models, including topics, named entity recognition (NER), and article embeddings.
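The active-user filter and the anonymization step described above could look roughly like the sketch below. It is illustrative only: the text does not say which hash function or salt length was used, so SHA-256 and a 32-byte random salt are assumptions, and the function names and toy click log are hypothetical.

```python
import hashlib
import secrets
from collections import Counter


def active_users(click_log, min_clicks=5, max_clicks=1000):
    """Return user ids whose click count lies in [min_clicks, max_clicks]."""
    counts = Counter(user_id for user_id, _article_id in click_log)
    return {u for u, n in counts.items() if min_clicks <= n <= max_clicks}


def anonymize(user_ids, salt=None):
    """Map each user id to a salted SHA-256 hex digest.

    Assumption: if no salt is passed, a random salt is generated once and
    never stored, so the mapping cannot be recomputed afterwards.
    """
    if salt is None:
        salt = secrets.token_bytes(32)
    return {
        u: hashlib.sha256(salt + str(u).encode("utf-8")).hexdigest()
        for u in user_ids
    }


# Hypothetical click log of (user_id, article_id) pairs:
# u1 has 6 clicks and counts as active, u2 has 2 clicks and is filtered out.
click_log = [("u1", f"a{i}") for i in range(6)] + [("u2", "a1"), ("u2", "a2")]

kept = active_users(click_log)
mapping = anonymize(kept)
print(kept, mapping)
```

Discarding the salt after the mapping is built is what makes the hashing one-way in practice: without it, the anonymized ids cannot be recomputed from the original ids.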

Dataset Format

Each dataset bundle—demo, small, and large—consists of a training set and validation set, together with the articles (articles.parquet) present in the bundle. The official test set is to be downloaded separately from these. Each data split has two files: 1) the behavior logs for the 7-day data split period (behaviors.parquet) and 2) the users' click histories (history.parquet), i.e., 28 days of clicked news articles prior to the data split's behavior logs. The click histories are fixed to the period prior to the behavior logs; i.e., they are not updated within the data split period.
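The relationship between the two time windows can be made concrete with a small sketch. It assumes half-open intervals and a hypothetical split start date, since the text does not give the actual split boundaries. The 28-day and 7-day lengths come from the description above; the function names and toy clicks are illustrative.

```python
from datetime import datetime, timedelta

HISTORY_DAYS = 28  # from the text: 28 days of clicks before the behavior logs
BEHAVIOR_DAYS = 7  # from the text: 7-day data split period


def split_windows(split_start):
    """Return (history, behaviors) windows as half-open (start, end) pairs."""
    history = (split_start - timedelta(days=HISTORY_DAYS), split_start)
    behaviors = (split_start, split_start + timedelta(days=BEHAVIOR_DAYS))
    return history, behaviors


def partition_clicks(clicks, split_start):
    """Split (timestamp, article_id) clicks into history and behavior lists."""
    (h_start, h_end), (b_start, b_end) = split_windows(split_start)
    # The history window ends where the behavior window begins, so clicks made
    # during the split period never leak into the history.
    history = [c for c in clicks if h_start <= c[0] < h_end]
    behaviors = [c for c in clicks if b_start <= c[0] < b_end]
    return history, behaviors


# Hypothetical split start; the real split dates are not stated in the text.
start = datetime(2023, 5, 18)
clicks = [
    (datetime(2023, 4, 1), "too_old"),
    (datetime(2023, 5, 1), "in_history"),
    (datetime(2023, 5, 20), "in_behaviors"),
    (datetime(2023, 6, 1), "after_split"),
]

history, behaviors = partition_clicks(clicks, start)
print(history, behaviors)
```

Because the history is fixed before the split period, a model cannot see a user's clicks from earlier in the same week when ranking a later impression unless it derives them from the behavior logs itself.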

#  File Name          Description
1  behaviors.parquet  The impression logs
2  history.parquet    The click histories of users
3  articles.parquet   The information of news articles
4  artifacts.parquet  The embeddings of the articles' textual information

For further details, please refer to the dedicated website of Ekstra Bladet.


Prize



Participation and Data

Registration & Data Access is open now!


Timeline

When?            What?
8 March, 2024    Start RecSys Challenge; release dataset
25 March, 2024   Submission System Open
4 April, 2024    Leaderboard live
21 June, 2024    End RecSys Challenge
24 June, 2024    Final Leaderboard & Winners; EasyChair open for submissions
1 July, 2024     Code Upload: upload code of the final predictions
15 July, 2024    Paper Submission Due
3 August, 2024   Paper Acceptance Notifications
29 August, 2024  Camera-Ready Papers
October 2024     RecSys Challenge Workshop @ ACM RecSys 2024




Organization

Workshop Program Committee

TBA