A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks

A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks

Problem Description

Backdoor attacks in LLMs are classified into several types based on their trigger mechanisms:

  • Input-triggered
  • Prompt-triggered
  • Instruction-triggered (maybe)
  • Demonstration-triggered

The evaluation metrics used are ASR (Attack Success Rate) and BA (Benign Accuracy), with a high ASR indicating a successful attack, while maintaining a high BA to minimize the impact on benign functionality.

Input-trigger

Layer-weight poisoning:

  • Li L et al., “Backdoor attacks on pre-trained models by layerwise weight poisoning,” 2021. [2]
  • Yang W et al., “Be careful about poisoned word embeddings: Exploring the vulnerability of the embedding layers in NLP models,” 2021. [3]

Stealthy and natural backdoor attack triggers:

  • Li S et al., “Hidden backdoors in human-centric language models,” ACM SIGSAC 2021. [4]
  • Zhang X et al., “Trojaning language models for fun and profit,” EuroS&P 2021. [5]
  • Pan X et al., “Hidden trigger backdoor attack on NLP models via linguistic style manipulation,” USENIX Security 2022. [6]
  • Lyu W et al., “Backdoor Attacks Against Transformers with Attention Enhancement,” ICLR 2023 Workshop. [8]
  • Zhou X et al., “Backdoor Attacks with Input-unique Triggers in NLP,” 2023. [9]
  • Chen L et al., “Backdoor Learning on Sequence to Sequence Models,” 2023. [10]

Prompt-trigger

  • The manipulation of prompts can either inject a trigger or compromise the prompt through malicious user inputs:
    • Cai X et al., “Badprompt: Backdoor attacks on continuous prompts,” NeurIPS 2022. [11]
    • Perez F et al., “Ignore previous prompt: Attack techniques for language models,” 2022. [12]
    • Zhao S et al., “Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models,” 2023. [13]

Instruction-trigger

  • Xu J et al., “Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models,” 2023. [14]

Demonstration-triggered

  • Wang J et al., “Adversarial Demonstration Attacks on Large Language Models,” 2023. [15]



Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Model for Time Series
  • Backdoor Attack
  • Heterogeneous Graph Neural Network
  • Interesting Paper
  • Efficient LLM