There is a newer version of the record available.

Published June 22, 2026 | Version v2

SMIDGE Telegram dataset

Authors/Creators

  • 1. FASresearch GmbH

Description

The Telegram dataset was compiled by collecting messages from 86 active, publicly accessible English-language channels. These channels were drawn from the 100 channels with the highest subscriber counts among those that a prior analysis had identified as relevant to the dissemination of disinformation and conspiracy theories.
Messages were collected through the Telegram Application Programming Interface (API), accessed with the Python Telethon library, for the period from September 25, 2023, to September 24, 2024. This initial collection gathered all messages, posts, and associated metadata from the selected channels, yielding a total corpus of 449,621 messages.
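A minimal sketch of how such a collection could look with Telethon is given here for orientation. It is not the authors' collection script: the function names, the session name, the returned fields, and the choice of UTC for the window boundaries are all assumptions, and the API credentials must be supplied by the user.

```python
from datetime import datetime, timezone

# Collection window from the description: 2023-09-25 through 2024-09-24.
# Interpreting the boundaries in UTC is an assumption of this sketch.
WINDOW_START = datetime(2023, 9, 25, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 9, 25, tzinfo=timezone.utc)  # exclusive upper bound


def in_window(posted_at):
    """Return True if a timezone-aware timestamp lies inside the window."""
    return WINDOW_START <= posted_at < WINDOW_END


async def collect_channel(channel, api_id, api_hash, session="collection_sketch"):
    """Collect id, date and raw text of one public channel's messages (hypothetical helper)."""
    # Imported lazily so in_window() stays usable without Telethon installed.
    from telethon import TelegramClient

    rows = []
    async with TelegramClient(session, api_id, api_hash) as client:
        # reverse=True iterates from oldest to newest, starting at offset_date.
        async for message in client.iter_messages(
            channel, offset_date=WINDOW_START, reverse=True
        ):
            if message.date >= WINDOW_END:
                break  # everything after this point is newer than the window
            if in_window(message.date):
                rows.append(
                    {
                        "id": message.id,
                        "date": message.date,
                        "text": message.message or "",
                    }
                )
    return rows
```

The coroutine could be run with `asyncio.run(collect_channel(...))`, passing a channel username and the user's own `api_id` and `api_hash`. On first use Telethon asks interactively for a login.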
The dataset provided here is a cleaned sample of 279,000 Telegram messages drawn from this initial corpus. The cleaning step prepares the data for text-based analysis: messages consisting solely of non-textual content (hyperlinks, images, or videos without accompanying text) were systematically excluded, so that every message in the sample contains substantive textual content suitable for publication and further research.
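The exact cleaning rules are not specified in this description. As a rough illustration only, a filter of this kind could treat a message as non-textual when nothing alphanumeric remains after its hyperlinks are removed. The URL pattern, the function names, and the `text` key below are assumptions of this sketch.

```python
import re

# Simple URL matcher; an assumption of this sketch, not the authors' rule.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def has_substantive_text(text):
    """Return True if any letters or digits remain once URLs are removed."""
    if not text:
        return False  # media-only posts carry no text at all
    remainder = URL_PATTERN.sub(" ", text)
    return any(ch.isalnum() for ch in remainder)


def filter_messages(messages):
    """Keep only messages (dicts with a 'text' key) that contain text."""
    return [m for m in messages if has_substantive_text(m.get("text"))]


# Invented placeholder messages, not records from the dataset.
sample = [
    {"text": "https://example.com/article"},          # link only
    {"text": None},                                   # image or video only
    {"text": "Worth reading: https://example.com"},   # text plus link
    {"text": ""},                                     # empty caption
]
kept = filter_messages(sample)
```

In this sketch only the third placeholder message survives, because it is the only one with text beyond a link.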
Column description:
•    lang: Language of the message. All records in this dataset are English.
•    fasID: Unique internal identifier assigned to each message.
•    date: Date and time the message was posted (UTC), in YYYY-MM-DD HH:MM:SS format.
•    channel_name: Name of the public Telegram channel the message was posted in.
•    message_clean: Cleaned text content of the message used for text-based analysis.
•    url: URL(s) referenced or linked within the message, if any.
•    domain: Domain name(s) extracted from the URL(s) in the message.
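
The columns above could be read as sketched below. The file format is not stated in this description, so a comma-separated file with a header row is assumed here, and the two sample rows (including the `en` language code) are invented placeholders rather than records from the dataset.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

COLUMNS = ["lang", "fasID", "date", "channel_name", "message_clean", "url", "domain"]


def read_messages(handle):
    """Yield rows as dicts, with 'date' parsed into a UTC-aware datetime."""
    reader = csv.DictReader(handle)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        # The description gives the format as YYYY-MM-DD HH:MM:SS in UTC.
        parsed = datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S")
        row["date"] = parsed.replace(tzinfo=timezone.utc)
        yield row


# Invented placeholder rows standing in for the real file.
SAMPLE = io.StringIO(
    "lang,fasID,date,channel_name,message_clean,url,domain\n"
    "en,1,2023-09-25 08:15:00,example_channel,First placeholder message,"
    "https://example.com/a,example.com\n"
    "en,2,2024-09-24 21:40:10,example_channel,Second placeholder message,,\n"
)

rows = list(read_messages(SAMPLE))
per_channel = Counter(row["channel_name"] for row in rows)
```

For the real file, the in-memory `SAMPLE` would be replaced by an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points at the downloaded file.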

Files

Files (35.1 MB)

Checksum: md5:43a7ffa9e242c6c065919465bddfb48b
Size: 35.1 MB

Additional details

Funding

European Commission
SMIDGE - Social Media narratives: addressing extremism in middle age (grant no. 101095290)