Everything you need to know to get started with text mining
The ideal dataset for data analysis is like Table_1:
However, the datasets that we encounter in reality are mostly like Table_2:
The main differences between these two tables are whether the data is well organized with rows and column and only presented in numbers or text. Due to these differences, the data in Table_1 is called structured data while the data in Table_2 is categorized as unstructured data.
Unstructured data refers to information that doesn’t have a predetermined structure or format. It’s difficult to store and manage in relational database. But it often contains valuable information which is useful for generating data insights, training machine learning models, or performing natural language processing (NLP).
In this article, I’ll introduce 7 advanced SQL techniques used to hand unstructured data…