How to Extract Personal Information from a Text Corpus Using NER Like a Pro
Introduction
Okay, imagine this: you’ve got mountains of articles, journals, and blogs stuffed with information that you want to process. You also think the community would benefit from a chance to work with this data. However, you wouldn’t want to share it right away, because it may contain personal information that shouldn’t be shared without the consent of the people involved.
Since it’s not viable to ask every one of those people for permission, you decide to use your skills and mask any personal information in line with FERPA (Family Educational Rights and Privacy Act) guidelines. Companies commonly mask their data before sharing it externally for analysis or demo purposes, and that is easier with numeric data. Here we want to do the same thing with textual data.
Since we are dealing with text data, we will use a technique from Natural Language Processing (NLP). Enter Named Entity Recognition (NER), a trusty NLP detective that unlocks those hidden data treasures. Our goal is to identify the personal information in the text so that it can be masked.
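To make the goal concrete, here is a minimal sketch of the masking step that comes after NER has done its job. It assumes the NER tool returns entities as (start, end, label) character offsets, which is a common output shape but not a given for every library. The function name `mask_entities`, the sample sentence, and the hand-written spans are all hypothetical stand-ins for real model output.

```python
from typing import List, Tuple

# (start, end, label) character offsets; an assumed output shape for an NER tool.
Span = Tuple[int, int, str]


def mask_entities(text: str, spans: List[Span]) -> str:
    """Replace each entity span in `text` with a [LABEL] placeholder."""
    masked = text
    # Work from right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        masked = masked[:start] + f"[{label}]" + masked[end:]
    return masked


# Hypothetical example: these spans are written by hand, not produced by a model.
text = "Maria Lopez emailed the registrar from Boston."
spans = [(0, 11, "PERSON"), (39, 45, "LOCATION")]

print(mask_entities(text, spans))
```

Replacing from the end of the string backwards is a deliberate choice: a placeholder is rarely the same length as the text it replaces, so going left to right would shift every offset that follows.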
Let’s dive deeper into how NER works, the concept behind the NER mechanism, ways to implement NER, which…