Catch Me If You GPT: Tutorial on Deepfake Texts


Tutorial at the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024) in Mexico City, Mexico
Deepfake Text Tutorial Slides (draft; will be updated later)



Description

In recent years, Natural Language Generation (NLG) techniques have advanced greatly, especially in the realm of Large Language Models (LLMs). Generated texts are now of such quality that it is no longer trivial to tell the difference between human-written and LLM-generated texts (i.e., deepfake texts). While this is a celebratory feat for NLG, it poses new security risks (e.g., the generation of misinformation). To combat this novel challenge, researchers have developed diverse techniques to detect deepfake texts. While this niche field of deepfake text detection is growing, the field of NLG is growing at a much faster rate, making it difficult to understand the complex interplay between state-of-the-art NLG methods and the detectability of their generated texts. To understand this interplay, two new computational problems emerge: (1) Deepfake Text Attribution (DTA) and (2) Deepfake Text Obfuscation (DTO). The DTA problem is concerned with attributing the authorship of a given text to one of k NLG methods, while the DTO problem is concerned with evading authorship attribution by modifying parts of the text. In this cutting-edge tutorial, therefore, we call attention to the serious security risks that both emerging problems pose and give a comprehensive review of recent literature on the detection and obfuscation of deepfake text authorship. Our tutorial will be 3 hours long, with a mix of lectures and hands-on examples for interactive audience participation.
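To make the two problem statements concrete, here is a minimal, purely illustrative sketch. It is not the method presented in the tutorial and does not use any real detector or the TextAttack library: the `ToyAttributor` class, the `obfuscate` helper, the word-overlap scoring, and all sample texts are assumptions invented for this example. DTA is treated as picking one of k candidate authors, and DTO as editing parts of a text until that pick changes.

```python
from collections import Counter


def features(text):
    """Bag-of-words counts; a crude stand-in for real stylometric features."""
    return Counter(text.lower().split())


class ToyAttributor:
    """Hypothetical DTA model: picks the candidate author with most word overlap."""

    def __init__(self, samples):
        # samples maps each of the k author labels to a list of example texts.
        self.profiles = {
            label: sum((features(t) for t in texts), Counter())
            for label, texts in samples.items()
        }

    def attribute(self, text):
        feats = features(text)
        # Counter & Counter keeps the minimum count of each shared word.
        # Labels are sorted so that ties are broken deterministically.
        return max(
            sorted(self.profiles),
            key=lambda label: sum((feats & self.profiles[label]).values()),
        )


def obfuscate(text, attributor, substitutions):
    """Greedy DTO sketch: swap words left to right until the label changes."""
    original_label = attributor.attribute(text)
    words = text.split()
    for i, word in enumerate(words):
        replacement = substitutions.get(word.lower())
        if replacement is None:
            continue
        words[i] = replacement
        candidate = " ".join(words)
        if attributor.attribute(candidate) != original_label:
            return candidate
    return " ".join(words)  # may still carry the original label


# Made-up samples for k = 3 candidate authors (one human, two hypothetical models).
samples = {
    "human": ["honestly i reckon the movie was kinda fun"],
    "model_a": ["the film was a delightful and engaging experience overall"],
    "model_b": ["in conclusion the results demonstrate significant improvements"],
}
attributor = ToyAttributor(samples)

text = "the film was delightful overall"
print(attributor.attribute(text))  # DTA: which of the k authors?

evaded = obfuscate(text, attributor, {"film": "movie", "delightful": "fun"})
print(evaded, "->", attributor.attribute(evaded))  # DTO: label after edits
```

On this toy data the original sentence is attributed to `model_a`, and two word swaps are enough to move it to `human`. Real attributors and obfuscators are far more sophisticated, but the structure is the same: a k-way classifier, and a search over small text modifications that changes its decision.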


Tutorial Parts

  1. Introduction
  2. Deepfake Text Detection
  3. Deepfake Text Obfuscation
  4. Conclusion & Future Work


See Similar Tutorials:

  1. Tutorial at the 2023 NSF Cybersecurity Summit in Berkeley, CA
  2. Tutorial at The Web Conference 2023
  3. Tutorial at the 15th International Natural Language Generation Conference (INLG 2022)


Presenters

Adaku Uchendu is currently a Technical Staff member at MIT Lincoln Laboratory. She earned her Ph.D. in Information Sciences and Technology from The Pennsylvania State University. As a graduate student, she was an Alfred P. Sloan scholar, an NSF CyberCorps SFS scholar, and a Button-Waller fellow. Her dissertation is titled “Reverse Turing Test in the Age of Deepfake Texts.” She has authored several papers on deepfake text detection in top-tier conferences and journals, including EMNLP, SIGKDD Explorations, the Web Conference, and AAAI HCOMP. In addition, she led three similar tutorials: at the INLG conference in July 2022, the Web Conference in April 2023, and the 2023 NSF Cybersecurity Summit in October 2023. She is interested in building robust and explainable deepfake text detectors to assist in both automatic and human detection of deepfake texts. More details of her research can be found at: https://adauchendu.github.io/. E-mail: adaku.uchendu@ll.mit.edu

Saranya Venkatraman is a Ph.D. student at The Pennsylvania State University, working under the guidance of Dr. Dongwon Lee in the College of Information Sciences and Technology. Her research uses psycholinguistic theories and theories of human cognition to inform natural language processing techniques, with a focus on deepfake text detection and deepfake text obfuscation. She also contributed to and presented a tutorial on Artificial Text Detection at the INLG conference in July 2022, and has published in top-tier conferences such as EACL and EMNLP. More details of her research can be found at: https://saranya-ven.github.io/. E-mail: saranyav@psu.edu

Thai Le has been an Assistant Professor at The University of Mississippi since 2022. Before joining the University of Mississippi, he worked at Amazon Alexa and obtained his doctorate from The Pennsylvania State University. He has published several relevant works at top-tier conferences such as KDD, ICDM, ACL, EMNLP, and the Web Conference. He was also one of the instructors of a similar tutorial presented at the Web Conference in April 2023 and the 2023 NSF Cybersecurity Summit in October 2023. In general, he researches the trustworthiness of machine learning and AI, with a focus on the explainability and adversarial robustness of machine learning models. He also contributed to the adversarial NLP open-source repository TextAttack, which will be used to demonstrate the authorship obfuscation section of the tutorial. More details of his research can be found at: https://lethaiq.github.io/tql3. E-mail: thaile@olemiss.edu

Dongwon Lee is a Professor in the College of Information Sciences and Technology (a.k.a. iSchool) at Penn State University, USA, and also an ACM Distinguished Scientist and Fulbright Cyber Security Scholar. Before joining Penn State, he worked at AT&T Bell Labs and obtained his Ph.D. in Computer Science from UCLA. From 2015 to 2017, he also served as a Program Director at the National Science Foundation (NSF), co-managing cybersecurity education and research programs and contributing to the development of national research priorities. In general, he researches problems in the areas of data science, machine learning, and cybersecurity. Since 2017, in particular, he has led the SysFake project at Penn State, investigating computational and socio-technical solutions to better combat fake news. More details of his research can be found at: http://pike.psu.edu. He has previously given numerous tutorials at various venues, including WWW, AAAI, CIKM, SDM, ICDE, and WebSci. E-mail: dongwon@psu.edu