Known biases in LLMs
Newer AI models such as LLMs are not immune to the problems of bias identified and measured in machine learning algorithms (Nazer et al., 2023), problems which have plagued predictive algorithms in real-world use cases going back to at least the 1930s (Christian, 2021, Ch. 2). Unsurprisingly, LLMs are better at recalling facts that occur frequently within the training data and struggle with long-tail knowledge (Kandpal et al., 2023). Das et al. (2024) identify a range of shortcomings of LLMs in attempting to generate human-like texts, such as underrepresenting minority viewpoints and reducing the broad concept of “positive” text simply to expressions of “joy”.
Recent work attempts to address these issues through a variety of methods, for example by upsampling underrepresented features on which prediction is otherwise sub-optimal (Gesi et al., 2023), or by evaluating the importance of input data using Shapley values (Karlaš et al., 2022). However, the mechanistic interpretability work on LLMs to date suggests that our understanding, while improving, is still very limited (e.g. Kramár et al., 2024; Wu et al., 2023). As such, direct methods for overcoming such biases are, at a minimum, not close at hand. Finally, while much of the focus is naturally on overt racial and gender biases, there may also be pervasive but less observable biases in the content and form of the output. Wendler et al. (2024), for example, provide evidence that current LLMs trained on large amounts of English text ‘rely on’ English in their latent representations, as if it were a kind of internal reference language.
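To make the upsampling idea concrete, the sketch below rebalances a dataset by duplicating examples from underrepresented groups until all groups match the largest one. It is a minimal illustration of the general technique, not the specific procedure of Gesi et al. (2023); the function name and the `get_group` labelling scheme are assumptions introduced here for exposition.

```python
import random

def upsample_minority(examples, get_group, seed=0):
    """Duplicate examples from underrepresented groups until every
    group matches the size of the largest one (simple upsampling).

    `examples` is a list of records; `get_group` maps a record to its
    group label (e.g. a demographic or topical feature of interest).
    """
    rng = random.Random(seed)
    by_group = {}
    for ex in examples:
        by_group.setdefault(get_group(ex), []).append(ex)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Sample with replacement to fill the gap for small groups.
        balanced.extend(rng.choices(members, k=target - len(members)))
    rng.shuffle(balanced)
    return balanced
```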
One particular area in which the diversity of LLM outputs has been analyzed is on a token-by-token level, in the context of decoding strategies. In some situations, using beam search to choose the most likely next token can produce degenerate, repetitive phrases (Su et al., 2022). Furthermore, somewhat like Thelonious Monk’s melodic lines, humans do not string together sequences of the most likely words but occasionally try to surprise the listener by sampling low-probability words, defying conventions, and so on (Holtzman et al., 2020, referring to Grice, 1975).
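The contrast between these decoding strategies is easy to see on a toy next-token distribution. The sketch below compares greedy decoding, which always returns the mode, with nucleus (top-p) sampling as proposed by Holtzman et al. (2020), which keeps lower-probability tokens reachable while truncating the unreliable tail; the six-token vocabulary and probability values are illustrative assumptions, not drawn from the cited papers.

```python
import numpy as np

def greedy_token(probs):
    """Pick the single most likely next token (beam search with
    width 1); prone to repetitive, degenerate continuations."""
    return int(np.argmax(probs))

def nucleus_token(probs, p=0.9, rng=None):
    """Nucleus (top-p) sampling (Holtzman et al., 2020): sample only
    from the smallest set of tokens whose cumulative probability
    reaches p, so 'surprising' words remain possible without
    admitting the long tail of implausible ones."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]          # tokens, most likely first
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    nucleus = order[:cutoff]
    renorm = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=renorm))

# Toy distribution over a 6-token vocabulary (values assumed here).
probs = np.array([0.40, 0.25, 0.15, 0.10, 0.07, 0.03])
print(greedy_token(probs))                               # always token 0
print([nucleus_token(probs, p=0.9) for _ in range(5)])   # varies across calls
```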