Catch Me If You GAN: Generation, Detection, and Obfuscation of Deepfake Texts


Tutorial at The Web Conference 2023 in Austin, TX, USA.

Deepfake Text Tutorial Slides

Check out our video teaser



Overview

In recent years, Natural Language Generation (NLG) techniques have advanced greatly, especially in open-ended text generation using deep learning methods. The quality of generated texts is now high enough that telling human-written texts apart from NLG-generated texts (so-called deepfake texts) is no longer trivial. While this is a feat worth celebrating for NLG, it also poses new security risks (e.g., generation of misinformation or phishing messages at scale). To combat this novel challenge, researchers have developed diverse techniques to automatically detect NLG-generated texts. While this niche field of deepfake text detection is growing, the field of NLG is growing at a much faster rate, making it difficult to understand the complex interplay between state-of-the-art NLG methods and the detectability of their deepfake texts. Scaling the problem up to the case of k NLG methods (k > 1), each generating distinct yet human-quality texts, two new computational problems emerge: “Neural” Authorship Attribution (AA) and “Neural” Authorship Obfuscation (AO). The AA problem is concerned with attributing the authorship of a given deepfake text to one of the k NLG methods, while the AO problem is concerned with concealing the authorship of a given text by modifying parts of it. Both problems lie at the intersection of Machine Learning and Security/Privacy, and their importance and implications are growing rapidly on the World Wide Web, where the bulk of “information” is text-based (e.g., NLG-made Wikipedia articles with factual errors on obscure subjects may evade human curation and jeopardize the Web ecosystem).
In this tutorial, we therefore call attention to the serious security risks that both emerging problems pose and give a comprehensive review of recent literature on the (1) generation, (2) detection, and (3) authorship obfuscation of deepfake texts, (4) their utility in Web applications, and (5) their critical implications for society. The tutorial will be mainly lecture-style, together with hands-on examples of the generation and detection of deepfake texts for interactive participation from the audience. We deliberately chose the word “GAN”, which stands for Generative Adversarial Network, as part of the title to emphasize its role as a symbolic pioneer of modern generative AI models.
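
The AA and AO problems described in this overview can be made concrete with a small toy sketch. It is not the method presented in the tutorial: it assumes a very simple attribution model (character trigram profiles compared by cosine similarity) and a naive obfuscator (word substitutions applied until the attributed author changes). The generator names, sample texts, substitution table, and all function names are hypothetical and exist only for illustration; real neural AA/AO systems are far more sophisticated.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count the character n-grams of a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_profiles(samples):
    """Build one aggregate n-gram profile per (hypothetical) generator."""
    profiles = {}
    for label, texts in samples.items():
        profile = Counter()
        for text in texts:
            profile.update(char_ngrams(text))
        profiles[label] = profile
    return profiles


def attribute(text, profiles):
    """AA: return the label of the generator whose profile is most similar."""
    grams = char_ngrams(text)
    return max(sorted(profiles), key=lambda label: cosine(grams, profiles[label]))


def obfuscate(text, substitutions, profiles):
    """AO: substitute words one at a time, stopping once the attribution changes."""
    original = attribute(text, profiles)
    words = text.split()
    for i, word in enumerate(words):
        if word in substitutions:
            words[i] = substitutions[word]
            candidate = " ".join(words)
            if attribute(candidate, profiles) != original:
                return candidate
    return " ".join(words)


# Toy data, invented for this sketch: k = 2 "generators" with distinct styles.
TOY_SAMPLES = {
    "generator_a": ["the quick brown fox", "the quick brown dog"],
    "generator_b": ["zzz yyy xxx www", "zzz yyy xxx vvv"],
}
SUBSTITUTIONS = {"the": "zzz", "quick": "yyy", "brown": "xxx"}

profiles = train_profiles(TOY_SAMPLES)
query = "the quick brown cat"
hidden = obfuscate(query, SUBSTITUTIONS, profiles)
print(attribute(query, profiles), "->", attribute(hidden, profiles))
```

In this sketch, attribution is a k-way classification over generator profiles, and obfuscation succeeds once the classifier no longer assigns the text to its original generator. Note that the toy obfuscator ignores meaning preservation, which a practical AO method would also need to maintain.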


See also a similar tutorial at the INLG 2022 conference: Tutorial on Artificial Text Detection


Tutorial Parts

  1. Introduction
  2. Authorship Attribution
  3. Authorship Obfuscation
  4. Conclusion & Future Work



Materials

Tutorial Slides



Organizers

Adaku Uchendu,
Pennsylvania State University

Thai Le,
University of Mississippi

Dongwon Lee,
Pennsylvania State University