Dealing with Cognitive Dissonance, the AI Way

How do language models handle conflicting instructions in a prompt?

Given contradictory instructions in the system message, the prompt, and the few-shot examples, which instructions will an LLM follow in its response? Image created by the author.

Cognitive dissonance is a psychological term describing the mental discomfort experienced by an individual who holds two or more contradictory beliefs. For example, if you’re at the grocery store and see a checkout lane for “10 items or fewer,” but everyone in that line has more than 10 items, what are you supposed to do?

In the context of AI, I wanted to know how large language models (LLMs) deal with cognitive dissonance in the form of contradictory instructions (for example, prompting an LLM to translate from English to Korean, but giving it examples of English-to-French translations).

In this article, I conduct experiments that provide LLMs with contradictory instructions to ascertain which source of instruction they are more likely to follow.
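
To make the setup concrete, here is a minimal sketch of one such trial, assuming the OpenAI Python SDK (the article does not specify which API or model the experiments used, so `gpt-4o-mini` is a stand-in): the system message asks for English-to-Korean translation, while the few-shot examples demonstrate English-to-French.

```python
# A minimal sketch of one contradictory-instruction trial.
# Assumption: the OpenAI Python SDK and the gpt-4o-mini model are used
# here purely for illustration; the article's actual setup may differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    # Instruction source 1: the system message (English -> Korean)
    {"role": "system",
     "content": "Translate the user's sentence from English to Korean."},
    # Instruction source 2: few-shot examples that contradict it
    # (English -> French)
    {"role": "user", "content": "Good morning."},
    {"role": "assistant", "content": "Bonjour."},
    {"role": "user", "content": "Thank you very much."},
    {"role": "assistant", "content": "Merci beaucoup."},
    # The test input: which instruction source wins?
    {"role": "user", "content": "See you tomorrow."},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
print(response.choices[0].message.content)  # Korean or French?
```

Whether the reply comes back in Korean or French tells us which source of instruction the model aligned with in that trial.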

System message, prompt instructions, and few-shot examples

As a user, you can tell an LLM what to do in one of three ways:

  • Directly describing the task in the system message
  • Directly describing the task in the normal…