Topic Modelling Your Personal Data

Using Traditional and Transformer Models to Explore Personal Data Stored by Brokers

Image created by author using ChatGPT 4o and DALL-E-3

In a prior article, I described how you can access the personal data that is stored and used by the front-line, consumer-facing companies we engage with every day. These include retailers, social media platforms, cell providers, financial services firms, and many others. I explored how to use various machine learning models and visualizations to discover how those companies perceive you.

While working on that article, I discovered that front-line firms frequently share our personal data with another set of companies generally known as data brokers or data aggregators (hereafter referred to as aggregators). Aggregators enrich our data with data sourced from public records, other aggregators, and similar sources to create profiles of us. They then sell those profiles back to consumer-facing companies and other organizations for marketing or other purposes.

My curiosity was aroused: Just what types of data do these aggregators keep about me? How many features do they store? Are there major types of data that individual aggregators focus on? And if there are, what does that tell me about their end-customers? What industries are those end-customers in, and what personal data do…