How do language models handle conflicting instructions in a prompt?
Cognitive dissonance is a psychological term describing the mental discomfort experienced by an individual holding two or more contradictory beliefs. For example, if you’re at the grocery store and see a checkout lane for “10 items or fewer” but everyone in the line has more than 10 items, what are you supposed to do?
Within the context of AI, I wanted to know how large language models (LLMs) deal with cognitive dissonance in the form of contradictory instructions (for example, prompting an LLM to translate from English to Korean, but giving examples of English to French translations).

In this article, I conduct experiments that present LLMs with contradictory instructions to see which of the conflicting instructions the models are more likely to follow.
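To make that setup concrete, here is a minimal sketch of such a contradictory prompt, written as an OpenAI-style chat message list. The model name and client shown in the comments are assumptions for illustration, not the exact setup used in the experiments; the point is simply the mismatch between the stated instruction (English to Korean) and the few-shot examples (English to French).

```python
# A minimal sketch of a contradictory prompt: the system message asks for
# Korean, while the few-shot examples demonstrate French translations.
messages = [
    # The instruction states one task...
    {"role": "system", "content": "Translate the user's sentence from English to Korean."},
    # ...while the few-shot examples demonstrate a different one.
    {"role": "user", "content": "Good morning."},
    {"role": "assistant", "content": "Bonjour."},
    {"role": "user", "content": "Where is the train station?"},
    {"role": "assistant", "content": "Où est la gare ?"},
    # The actual query: which instruction will the model follow?
    {"role": "user", "content": "I would like a cup of coffee."},
]

# Sending it with an OpenAI-style client might look like this
# (model name is an assumption for illustration):
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(model="gpt-4o", messages=messages)
# print(reply.choices[0].message.content)
```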
System message, prompt instructions, and few-shot examples
As a user, you can tell an LLM what to do in one of three ways:
- Directly describing the task in the system message
- Directly describing the task in the normal…