This year, Textgain will once again contribute to Safer Internet Day. With a day full of free webinars, we want to raise awareness among professionals of the latest trends in online hate speech. Programme: 10:00 Emoji 🧐 & Hate Speech (NL) Every day … Read More
Resources
Here you’ll find a collection of technical reports, scientific publications and our open-source software and data sets.
Textgain Technical Reports (ISSN 2684-4842)
A technical report series detailing ongoing work at Textgain
Online anti-Semitism across platforms (TGTR-7)
We created a fine-grained AI system for the detection of anti-Semitism. This Explainable AI will identify English and German anti-Semitic expressions of dehumanization, verbal aggression and conspiracies in online social media messages across platforms, to support high-level … Read More
The sexist narrative on alternative social media dissected (TGTR-6)
Toxic language use on fringe social media platforms and image boards (4chan, 8kun, GAB, etc.) is being increasingly well-documented, with many studies describing the racist, nationalist and antisemitic rhetoric that is flooding these boards in particular. Yet, … Read More
Onze Echokamers: likes onder de loep [Our Echo Chambers: a closer look at likes] (TGTR-5)
Textgain investigated how echo chambers can be mapped automatically using public data from the Twitter accounts of news sites, influencers and politicians. In this article we describe the current state of affairs in the Dutch … Read More
GeenStijl.nl embeddings (TGTR-4)
We collected over 8M messages from the controversial Dutch websites GeenStijl and Dumpert to train a word embedding model that captures the toxic language representations contained in the dataset. The trained word embeddings (±150MB) are released for … Read More
Profanity & Offensive Words (POW): Multilingual fine-grained lexicons for hate speech (TGTR-3)
The POW lexicons are a steadily growing, interpretable NLP resource for online hate speech detection. They are currently available in English, German, French, Dutch and Hungarian, capturing thousands of verbal expressions of abusive, aggressive, dehumanizing, discriminatory, offensive … Read More
MAL NLP Lexicon: Melancholy, Anxiety & Loneliness during lockdown (TGTR-2)
We have created a new practice-based NLP resource for monitoring mental health on social media, in particular brooding. The resource is currently available for Dutch and captures 2,000+ expressions of anger, fear and sadness, along with various … Read More
4chan & 8chan embeddings (TGTR-1)
We have collected over 30 million messages from the publicly available /pol/ message boards on 4chan and 8chan, and compiled them into a model of toxic language use. The trained word embeddings (±0.4GB) are released for free … Read More
Publications
Publications co-authored by Textgain team members. Some of these are available for download from this site.
QAnon: Spreading Conspiracy Theories on Twitter
From 1st October to 5th November 2020, Textgain analyzed half a million Twitter messages related to QAnon conspiracy theories, using our Natural Language Processing (NLP) technology. This report outlines the results of the findings of our quantitative … Read More
Using a Personality-Profiling Algorithm to Investigate Political Microtargeting
Political advertisers have access to increasingly sophisticated microtargeting techniques. One such technique is tailoring ads to the personality traits of citizens. Questions have been raised about the effectiveness of this political microtargeting (PMT) technique. In two experiments, … Read More
Automatic detection of cyberbullying in social media text
While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection … Read More
Automatic Detection of Online Jihadist Hate Speech
We have developed a system that automatically detects online jihadist hate speech with over 80% accuracy, by using techniques from Natural Language Processing and Machine Learning. The system is trained on a corpus of 45,000 subversive Twitter … Read More
Code
True to our roots, we have several open-source libraries available on GitHub.
Our latest blog posts
Belgian AI company leads EU research into online hate
Textgain leads the launch of a new European centre of expertise on social media hate speech. Antwerp, 18 January 2021. The Belgian technology company Textgain will lead a European research centre on online hate, disinformation and ethically responsible Artificial Intelligence (AI) for all European language areas. … Read More
Depolarisation: Road to deeper trust
Call for workshops: 5th International Conference “Civic Actors in Conflicts”, September 29 to October 2, 2020, Bratislava, Slovakia. We invite you to submit an ONLINE workshop proposal that responds to the three main conference sections (find out more about the … Read More