New Dataset: ‘Digital Communication Content Corpus’

A dataset titled ‘Digital Communication Content Corpus — a manually and automatically annotated corpus’, authored by Dr. Maja Sawicka from the University of Warsaw (UW), has been published in the QDA.

The Digital Communication Content Corpus was created as part of the project ‘Methods for studying digital communication and textual data’, funded under the 16th edition of the University of Warsaw’s Didactic Innovation Fund competition (2020–2022). The project was carried out jointly by the Faculty of Sociology at UW, the Institute of Polish Language at UW, and the CLARIN-PL consortium.

The corpus includes material from public profiles on the major social media platforms used by Polish internet users: Facebook, Twitter, and YouTube. The communication recorded in the corpus covers five topics that participants in the project’s classes deemed important: methods for diagnosing and combating misinformation (fact-checking), climate change (climate), body positivity and attitudes toward the body (body), communication by actors hostile to the EU (anti-EU), and current conspiracy theories (conspiracy_theories).