In this article, with Patrik Szigeti, we designed a data and network methodology supported by graph visuals to outline the complex social network behind the original Dune trilogy.
Following the success of Dune in 2021, both at the box office and with the critics, Dune: Part Two was one of the most anticipated movies of 2024, and it didn’t disappoint. At the time of writing this article, it is on track to earn more than its predecessor and holds higher ratings on both Rotten Tomatoes and IMDb. With its ever-changing political landscape, Dune is the perfect franchise to dive into through network science. In this short piece, we explore the connections between the different Houses and people of the Imperium, based on Frank Herbert’s first three books: Dune (1965), Dune Messiah (1969), and Children of Dune (1976).
In the first part of this article, we present a Python-based approach to collecting character profile data from the Dune Wiki and turning those profiles into a catchy network graph. Then, in the second, rather spoiler-heavy section, we dive into the depths of the network and extract all the stories it has to tell about the first Dune trilogy.
All images were created by the authors.
1 Building the Network
First, we use Python to collect the full list of Dune characters. Next, we download each character’s biography profile from the fan wiki and count the number of times each character’s story mentions any other character, assuming that these mentions encode interactions between pairs of characters. Finally, we use network science to turn these relationships into a complex graph.
1.1 Collecting the list of characters
First off, we collected the list of all relevant characters from the Dune fan wiki site. Namely, we used urllib and bs4 to extract the names and fan wiki IDs of each character who is mentioned in a book and has their own wiki page, identified by that ID. We did this for the first three books, Dune, Dune Messiah, and Children of Dune, which cover the rise of House Atreides.
Sources:
- https://dune.fandom.com/wiki/Dune_(novel)
- https://dune.fandom.com/wiki/Dune_Messiah
- https://dune.fandom.com/wiki/Children_of_Dune_(novel)
First, we download the HTML of each book’s character listing page:
from urllib.request import urlopen
import bs4 as bs

# character listing pages of the first three books
dune_meta = {
    'Dune': {'url': 'https://dune.fandom.com/wiki/Dune_(novel)'},
    'Dune Messiah': {'url': 'https://dune.fandom.com/wiki/Dune_Messiah'},
    'Children of Dune': {'url': 'https://dune.fandom.com/wiki/Children_of_Dune_(novel)'}
}

# download each listing page and keep all of its <li> elements
for book, meta in dune_meta.items():
    sauce = urlopen(meta['url']).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    dune_meta[book]['chars'] = soup.find_all('li')
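The loop above keeps every <li> element of the listing pages. If you would like to peek at that structure without installing bs4 and lxml, here is a rough standard-library sketch of the same step that collects each list item’s text and its first link. The class ListItemLinkParser is our own illustration, not part of the original pipeline, and it assumes that the pages close their <li> tags explicitly.

```python
from html.parser import HTMLParser


class ListItemLinkParser(HTMLParser):
    """Hypothetical helper: collect (text, first href) for every <li>.

    Items are reported in document order; an outer item's text includes the
    text of its nested items, similar to soup.find_all('li') plus .text.
    Assumes every <li> is closed explicitly with </li>.
    """

    def __init__(self):
        super().__init__()
        self._records = []  # one record per <li>, in order of opening tags
        self._open = []     # records whose </li> has not been seen yet

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            record = {"text": [], "href": None}
            self._records.append(record)
            self._open.append(record)
        elif tag == "a":
            href = dict(attrs).get("href")
            # the first link inside an item becomes that item's href
            for record in self._open:
                if record["href"] is None:
                    record["href"] = href

    def handle_endtag(self, tag):
        if tag == "li" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # text belongs to every <li> that is currently open
        for record in self._open:
            record["text"].append(data)

    def items(self):
        """Return (whitespace-normalised text, href or None) pairs."""
        return [(" ".join("".join(r["text"]).split()), r["href"])
                for r in self._records]


# Toy listing, made up for illustration
html = """
<ul>
  <li><a href="/wiki/Paul_Atreides" title="Paul Atreides">Paul Atreides</a></li>
  <li>Plain item without a link</li>
</ul>
"""
parser = ListItemLinkParser()
parser.feed(html)
print(parser.items())
# [('Paul Atreides', '/wiki/Paul_Atreides'), ('Plain item without a link', None)]
```

Reading the href from the parsed attributes, rather than splitting the raw tag string, does not depend on the order in which the attributes appear in the tag.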
Then, a little manual help: for each book, we specify the text of the list items that mark where the character names start and where they end:
dune_meta['Dune']['char_start'] = 'Abulurd'
dune_meta['Dune']['char_end'] = 'Arrakis'
dune_meta['Dune Messiah']['char_start'] = 'Abumojandis'
dune_meta['Dune Messiah']['char_end'] = 'Arrakis'
dune_meta['Children of Dune']['char_start'] = '2018 Edition'
dune_meta['Children of Dune']['char_end'] = 'Categories'
Then, we extracted all the potentially relevant names and the corresponding profile URLs. Here, we manually checked which tag blocks the names start from (as opposed to, e.g., the outline of the character listing page). Additionally, we dropped the characters marked by ‘XD’ and ‘DE’, which correspond to the extended series, as well as characters that are “Mentioned only” in a given book:
for k, v in dune_meta.items():
    names_urls = {}
    keep_row = False
    print(f'----- {k} -----')
    for char in v['chars']:
        # switch collection on at the start marker and off at the end marker
        if v['char_start'] in char.text.strip():
            keep_row = True
        if v['char_end'] in char.text.strip():
            keep_row = False
        if keep_row and 'Video' not in char.text:
            try:
                # the link target is a relative path appended to the site's domain
                url = 'https://dune.fandom.com' + str(char).split('href="')[1].split('" title')[0]
                name = char.text.strip()
                if 'wiki' in url and 'XD' not in name and 'DE' not in name and '(Mentioned only)' not in name:
                    names_urls[name] = url
                    print(name)
            except IndexError:
                # list items without a link have no href to split on
                pass
    dune_meta[k]['names_urls'] = names_urls
This code block then prints the selected character names, book by book.
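The start and end markers and the filters are somewhat buried inside the scraping loop. Below is a minimal, network-free sketch of the same selection logic operating on (text, href) pairs. The helpers select_characters and wiki_id are hypothetical names, the toy items are made up, and we join links with urllib.parse.urljoin instead of plain string concatenation.

```python
from urllib.parse import urljoin

BASE_URL = "https://dune.fandom.com"


def select_characters(items, char_start, char_end, base_url=BASE_URL):
    """Hypothetical helper mirroring the selection loop above.

    items: (text, href) pairs of a page's list items, in page order.
    Rows are kept from the item containing char_start up to, but not
    including, the item containing char_end.
    """
    names_urls = {}
    keep_row = False
    for text, href in items:
        if char_start in text:
            keep_row = True
        if char_end in text:
            keep_row = False
        if not keep_row or "Video" in text or href is None:
            continue
        url = urljoin(base_url, href)
        # plain substring tests, as in the original loop: any name containing
        # "XD" or "DE" is dropped, together with "Mentioned only" rows
        dropped = "XD" in text or "DE" in text or "(Mentioned only)" in text
        if "wiki" in url and not dropped:
            names_urls[text] = url
    return names_urls


def wiki_id(url):
    """The wiki ID is the last path segment of a profile URL."""
    return url.split("/")[-1]


# Toy listing, made up for illustration
items = [
    ("Contents", "#toc"),
    ("Abulurd Harkonnen", "/wiki/Abulurd_Harkonnen"),
    ("Paul Atreides", "/wiki/Paul_Atreides"),
    ("Some Character (Mentioned only)", "/wiki/Some_Character"),
    ("Arrakis", "/wiki/Arrakis"),
    ("Duncan Idaho", "/wiki/Duncan_Idaho"),
]
chars = select_characters(items, "Abulurd", "Arrakis")
print({name: wiki_id(url) for name, url in chars.items()})
# {'Abulurd Harkonnen': 'Abulurd_Harkonnen', 'Paul Atreides': 'Paul_Atreides'}
```

Because the start marker switches collection on before the item itself is checked, the first character on the list (here, the one containing ‘Abulurd’) is kept, while the end marker and everything after it are skipped.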
Finally, we merge the three books’ character lists, derive each character’s wiki ID from the last segment of its profile URL, and count the characters we collected:
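# merge the per-book dictionaries (a name listed in several books is kept once)
dune_names_urls = {}
for k, v in dune_meta.items():
    dune_names_urls.update(v['names_urls'])

# the wiki ID is the last segment of the profile URL
names_ids = {n: u.split('/')[-1] for n, u in dune_names_urls.items()}
print(len(dune_names_urls))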
The output of this cell shows that we collected 119 characters with profile URLs.
1.2 Downloading character profiles
Our goal is to map out the social network of the Dune characters, which means that we need to figure out who interacted with whom. In the previous subsection, we got the list of all the ‘whom,’ and now we collect information about their personal stories. We get those stories by again using simple web scraping techniques, and save the source of each character’s profile page to a separate local file:
import os
from urllib.request import urlopen

# output folder for the profile htmls
folderout = 'fandom_profiles'
os.makedirs(folderout, exist_ok=True)

# crawl and save the profile htmls, skipping the ones already downloaded
for name, url in dune_names_urls.items():
    path = os.path.join(folderout, name + '.html')
    if not os.path.exists(path):
        try:
            html = urlopen(url).read().decode('utf-8')
            with open(path, 'w', encoding='utf-8') as fout:
                fout.write(html)
        except Exception as err:
            print(f'Could not download {name}: {err}')
The result of running this code will be a folder in our local directory with all the fan wiki site profiles belonging to every single selected character.
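The crawl only downloads a profile if its file does not exist yet, which makes reruns cheap. The same caching idea, wrapped into a function that can be tested without network access, might look like the sketch below. The helpers safe_filename and fetch_cached are hypothetical, and the file-name sanitising rule is our assumption rather than part of the original pipeline.

```python
import os
import re
import tempfile
from urllib.request import urlopen


def safe_filename(name):
    """Hypothetical helper: replace characters that file systems reject."""
    return re.sub(r'[\\/:*?"<>|]', "_", name)


def download(url):
    """Default fetcher: return the raw bytes behind url."""
    with urlopen(url) as response:
        return response.read()


def fetch_cached(name, url, folder, fetch=download):
    """Return the profile HTML for name, downloading it only once.

    fetch is any function url -> bytes, so it can be swapped out to test
    the caching without network access (an assumption of this sketch).
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, safe_filename(name) + ".html")
    if not os.path.exists(path):
        html = fetch(url).decode("utf-8", errors="replace")
        with open(path, "w", encoding="utf-8") as fout:
            fout.write(html)
    with open(path, encoding="utf-8") as fin:
        return fin.read()


# Usage with a fake fetcher and a temporary folder
calls = []


def fake_fetch(url):
    calls.append(url)
    return b"<p>Profile text</p>"


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_cached("Paul Atreides", "https://example.org/paul", tmp, fake_fetch)
    second = fetch_cached("Paul Atreides", "https://example.org/paul", tmp, fake_fetch)

print(first == second, len(calls))  # True 1
```

The second call is served from disk, so the fake fetcher is called only once.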
1.3 Building the network
To build the network between characters, we count the number of times each character’s wiki page source references any other character’s wiki ID, using the following logic. In doing so, we build up the edge list: the list of connections, each containing the source and the target node (character) of the connection as well as the weight, i.e., the co-reference frequency between the two characters’ pages.
import os
import bs4 as bs

# extract the name mentions from the html sources
# and build the list of edges in a dictionary
edges = {}
for fn in [fn for fn in os.listdir(folderout) if '.html' in fn]:
    name = fn.split('.html')[0]
    with open(os.path.join(folderout, fn), encoding='utf-8') as myfile:
        text = myfile.read()
    # keep the profile paragraphs, skipping the first two <p> blocks
    soup = bs.BeautifulSoup(text, 'lxml')
    text = ' '.join([str(a) for a in soup.find_all('p')[2:]])
    for n, i in names_ids.items():
        # count links to character n's wiki ID before the image gallery
        w = text.split('Image Gallery')[0].count('/' + i)
        if w > 0:
            # sorted, tab-separated key: A->B and B->A mentions add up
            edge = '\t'.join(sorted([name, n]))
            if edge not in edges:
                edges[edge] = w
            else:
                edges[edge] += w
print(len(edges))
Running this block of code yields 307 edges connecting the 119 Dune characters.
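The counting logic can be tried out on toy data. The sketch below assumes the pages are given as a dictionary of raw text and uses a Counter keyed by sorted name tuples instead of tab-joined strings, which avoids splitting the key again later. Mentions in both directions add up, so the weight of a pair is the number of links from A’s page to B plus the number of links from B’s page to A. The function name count_coreferences and the page snippets are invented for illustration.

```python
from collections import Counter


def count_coreferences(pages, names_ids):
    """Hypothetical helper: count links from each page to each wiki ID.

    pages: {character name: profile text}
    names_ids: {character name: wiki ID}
    Returns a Counter keyed by the sorted (name, name) pair, so mentions in
    both directions add up to one undirected weight. A page that links to
    its own ID yields a (name, name) self-pair.
    """
    edges = Counter()
    for name, text in pages.items():
        body = text.split("Image Gallery")[0]  # ignore the gallery section
        for other, wiki_id in names_ids.items():
            w = body.count("/" + wiki_id)
            if w > 0:
                edges[tuple(sorted((name, other)))] += w
    return edges


# Toy pages, made up for illustration
names_ids = {"Paul": "Paul_Atreides", "Chani": "Chani", "Stilgar": "Stilgar"}
pages = {
    "Paul": '<a href="/wiki/Chani">Chani</a> <a href="/wiki/Chani">her</a>',
    "Chani": '<a href="/wiki/Paul_Atreides">Paul</a> <a href="/wiki/Stilgar">Stilgar</a>',
    "Stilgar": '<a href="/wiki/Stilgar">self link</a>',
}
edges = count_coreferences(pages, names_ids)
print(dict(edges))
# {('Chani', 'Paul'): 3, ('Chani', 'Stilgar'): 1, ('Stilgar', 'Stilgar'): 1}
```

Note that str.count matches substrings, so a wiki ID that is a prefix of another one (for instance, hypothetical IDs Leto_I and Leto_II) is also counted inside the longer one.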
Next, we use the NetworkX graph analytics library to turn the edge list into a graph object and output the number of nodes and edges the graph has:
# create the networkx graph from the dict of edges
import networkx as nx

G = nx.Graph()
for e, w in edges.items():
    if w > 0:
        e1, e2 = e.split('\t')
        G.add_edge(e1, e2, weight=w)

# drop self-loops (pages linking to their own character's profile);
# list() avoids modifying the graph while iterating over it
G.remove_edges_from(list(nx.selfloop_edges(G)))

print('Number of nodes: ', G.number_of_nodes())
print('Number of edges: ', G.number_of_edges())
The resulting graph has only 72 nodes, meaning that 47 of the 119 characters are not linked to any other character, probably because their wiki profiles are rather brief. Additionally, the number of edges decreases by four compared with the 307 counted above, because we also removed the self-loops, that is, pages linking back to their own character’s profile.
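Why characters and edges disappear can be reproduced without NetworkX. The sketch below uses toy data and a hypothetical build_graph helper that mirrors the steps above: only characters that appear in at least one counted pair become nodes, and removing self-loops drops those edges but keeps their nodes, just as G.remove_edges_from does.

```python
def build_graph(edges):
    """Hypothetical helper mirroring the NetworkX steps above.

    edges: {(name_a, name_b): weight}. Every endpoint of a pair with a
    positive weight becomes a node; self-loops are then removed, but their
    nodes are kept. Returns (nodes, {frozenset of endpoints: weight}).
    """
    nodes = set()
    weighted = {}
    for (a, b), w in edges.items():
        if w > 0:
            nodes.update((a, b))
            weighted[frozenset((a, b))] = w
    # a self-loop's frozenset has a single element
    weighted = {pair: w for pair, w in weighted.items() if len(pair) == 2}
    return nodes, weighted


# Toy data, made up for illustration
edges = {("Chani", "Paul"): 3, ("Chani", "Stilgar"): 1, ("Stilgar", "Stilgar"): 1}
characters = ["Paul", "Chani", "Stilgar", "Jamis"]
nodes, weighted = build_graph(edges)
print(len(nodes), len(weighted))        # 3 2
print(sorted(set(characters) - nodes))  # ['Jamis']
```

Here, one listed character never appears in any pair and so never becomes a node, and the three counted pairs shrink to two edges once the self-loop is removed.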
Let’s take a quick look at the network using NetworkX’s built-in, Matplotlib-based drawing function:
# take a very brief look at the network
import matplotlib.pyplot as plt
f, ax = plt.subplots(1,1,figsize=(15,15))
nx.draw(G, ax=ax, with_labels=True)
The output of this cell:
While this visual already shows some network structure, we exported the graph to a GEXF file (the graph format used by Gephi) with the following line of code and designed the network visual shown in the figure below (how to create such network visuals will be the topic of an upcoming tutorial article):
nx.write_gexf(G, 'dune_network.gexf')
The full Dune network:
2 Reading the Network
Warning: the following paragraphs contain spoilers from the first three books of the Dune franchise. The two movies (Dune and Dune: Part Two) are based on the first book.
It’s no surprise that we find Paul Atreides (also known as Lisan al-Gaib, Muad’Dib, Usul, Kwisatz Haderach, and many other names) in the middle of our network. He is the protagonist of the first book (and the movies), a central figure who, in the end, takes his place as the emperor of the Imperium. In the second book, Dune Messiah, we meet a different Paul: a leader who, after years of fighting and being cursed by the gift of foresight, walks into the desert as a blind Fremen, offering himself up to Shai-Hulud. He then reappears in Children of Dune as the Preacher, a mysterious figure preaching from the desert, who eventually meets his end. Along this journey, he crosses paths with many other characters. This is well illustrated by his so-called ego network, the subgraph of all his connections and the connections among them, which contains about half of all the nodes and 64% of all the links, as shown in the figure below.
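The ego-network shares quoted above can be computed along the following lines. This standard-library sketch assumes an undirected graph stored as an adjacency dictionary without self-loops and uses a toy graph. On the real graph, nx.ego_graph(G, 'Paul Atreides') returns the same radius-1 subgraph, where the node label ‘Paul Atreides’ is our assumption about how his name appears in the wiki listing.

```python
def ego_network_share(adjacency, center):
    """Fraction of nodes and edges inside center's radius-1 ego network.

    adjacency: {node: set of neighbours} of an undirected graph without
    self-loops, with every node present as a key. The ego network holds the
    center, its neighbours, and all edges among those nodes.
    """
    ego_nodes = {center} | adjacency[center]
    all_edges = {frozenset((u, v)) for u, nbrs in adjacency.items() for v in nbrs}
    ego_edges = {e for e in all_edges if e <= ego_nodes}
    return len(ego_nodes) / len(adjacency), len(ego_edges) / len(all_edges)


# Toy graph, made up for illustration
adjacency = {
    "Paul": {"Chani", "Stilgar"},
    "Chani": {"Paul", "Stilgar"},
    "Stilgar": {"Paul", "Chani"},
    "Feyd-Rautha": {"Rabban"},
    "Rabban": {"Feyd-Rautha"},
}
print(ego_network_share(adjacency, "Paul"))  # (0.6, 0.75)
```

Counting the edge between two of Paul’s neighbours is what distinguishes the ego network from his plain list of neighbours: the toy center has two direct links, yet three of the four edges fall inside his ego network.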
As we keep reading the network, we can see House Atreides in the middle and, of course, Paul’s family centered around him. His parents are Duke Leto Atreides I and Lady Jessica, Leto’s concubine and a Reverend Mother of the Bene Gesserit. Jessica is the daughter of Baron Vladimir Harkonnen of House Harkonnen, giving us the first connection between the yellow and light blue node groups. We can see a strong connection between Paul and his Fremen concubine, Chani, further connecting to their children, Leto II and Ghanima. Paul is also closely connected to his mentors and good friends, Duncan Idaho and Gurney Halleck, as well as to Reverend Mother Gaius Helen Mohiam, who keeps popping up across the books to further the Bene Gesserit’s cause.
Even though the network is clearly centered on Paul, and we can see the distinct groups of House Corrino (brown), House Harkonnen (light blue), and the Fremen (blue), what’s really interesting is how much this simple network, built from fan wiki articles, tells us about the plot unfolding across these three books.
We see Liet Kynes, the de facto leader of the Fremen and a planetologist whose dream was to see the barren planet of Arrakis turned into a world of green pastures and plentiful water. His daughter, Chani, connects to Stilgar, a prominent figure in Paul’s life and a religious follower, and through him to all of the Fremen. But wedged between them is Scytale, who, after Muad’Dib’s Jihad, plotted to destroy the royal family through Hayt, the ghola (an artificially created human replicated from a dead individual) of Duncan Idaho. To those who have only seen the movies, it may come as a surprise that Duncan is such a central figure in our network. However, after serving as the swordmaster of House Atreides and falling in the Desert War on Arrakis, he came back as the aforementioned ghola and played an important part, eventually marrying Alia Atreides, daughter of Lady Jessica and sister of Paul.
Moviegoers might also be surprised that Thufir Hawat is colored as part of House Harkonnen. He was the Mentat responsible for the security of House Atreides, but after the Harkonnens replaced the Atreides as rulers of Arrakis, he was forced into their service and plotted against them, his true goal always being to avenge the death of his beloved Duke, as he believed Lady Jessica was behind the attack. He later gained redemption by refusing to kill Paul and committing suicide instead.
The most fascinating part of this network, however, is that even characters with the smallest nodes may have played an important role in the plot. They might have said the right thing to the wrong audience (Bronso of Ix, claiming that Paul lost something essential to his humanity before becoming Muad’Dib), become the lover of Alia Atreides (Javid), or plotted to kill the Atreides twins, Leto and Ghanima (Tyekanik). We could go on and on; these are just a few examples of the intriguing, interconnected political landscape of Frank Herbert’s Dune.
3 Conclusion
With this article, we aimed to build a bridge between data science fans and Dune fans, and perhaps to entertain those who already belong to both communities. First, we presented a relatively generic Python framework that can be adapted to map out the social network of practically any fan wiki site. Second, we gave detailed interpretations of how these network visuals reveal entire stories unfolding: a picture is worth a thousand words, or in this case, a trilogy.