Social CRM Starts with Categorization
by Jen Roberts, August 23, 2010
Categorization is the foundation of Social CRM and the most important stage in automated text analytics. The end result of Social CRM – whether reporting, dashboarding, or workflow – is a direct reflection of how well the data was categorized. If you begin with a bad data set, your output will not be accurate, and this can lead to misinformed business decisions.
The act of “categorizing” data is the application of filters to a stream of inbound data, separating the desired content from the rest and making it available for analysis. Filters play a major role in this process; once the data stream is primed for categorization, it is the filters that determine what content will be segmented. Simply put, if your filters are bad, your data is bad.
Traditionally, people filter by keyword. We have added the ability to apply semantic filtering as well. Keyword- and Boolean-based filters use terms, phrases and logical operators (OR, AND, NOT, NEAR) to segment data. Semantic filters are a more advanced form of language modeling that deciphers the context of the language used – the meaning, not just which terms are present – and matches semantically similar content to categorized data.
For example, semantic categorization allows automated detection and segmentation of content about Jaguar the car, and omits jaguar the animal and the Jaguars football team.
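To make the contrast concrete, here is a minimal Python sketch. It is not CI:Insight's proprietary engine: it stands in a simple bag-of-words cosine similarity for real semantic modeling, and the labeled example text, the category names and the helper functions (tokenize, cosine, keyword_filter, categorize) are all hypothetical. It only illustrates the idea of matching new content to already-categorized data rather than to a single keyword.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical "categorized data": a few words typical of each category.
LABELED = {
    "car": "jaguar sedan engine drive dealership luxury car horsepower",
    "animal": "jaguar jungle cat prey rainforest spotted predator zoo",
    "football": "jaguars jacksonville quarterback touchdown nfl game season",
}
PROFILES = {label: Counter(tokenize(text)) for label, text in LABELED.items()}


def keyword_filter(post):
    """Plain keyword filter: true if the post mentions the term at all."""
    tokens = tokenize(post)
    return "jaguar" in tokens or "jaguars" in tokens


def categorize(post):
    """Assign the category whose labeled profile is most similar to the post."""
    vector = Counter(tokenize(post))
    return max(PROFILES, key=lambda label: cosine(vector, PROFILES[label]))


posts = [
    "Test drove the new Jaguar sedan, the engine is so smooth",
    "A jaguar stalks its prey through the rainforest",
    "The Jaguars scored a late touchdown to win the game",
]

for post in posts:
    print(keyword_filter(post), categorize(post), "|", post)
```

In this toy data set the keyword filter accepts all three posts, while the similarity step separates the car post from the animal and football posts. A production system would need far richer language modeling than word overlap.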
Because categorization is such a vital component of Social CRM and text analytics, Collective Intellect has developed CI:Insight, which uses our proprietary semantic engine. CI:Insight applies semantic categorization to millions of blog posts, Facebook posts, tweets, news and message boards, creating a robust and content-rich dataset. Once you have this dataset, you are ready to do some deep and extensive analysis.
So, how do we know our semantic analytics engine works? We test it. One of those tests used a set of documents called “Reuters-21578, Distribution 1.0”, the most widely used text categorization test collection. This set of documents had been categorized by a group of researchers, and we wanted to compare CI:Insight’s categorization with the original categorization created by the researchers. You can read more about the background of the collection and the testing protocol in the “Reuters-21578, Distribution 1.0” test collection document. Using CI:Insight semantic categorization, 92% of the training data was categorized within the top 2 rankings, with the vast majority aligning perfectly with the researchers’ judgment.
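A "within the top 2 rankings" figure like this is commonly computed as top-k accuracy: the share of documents whose reference category appears among the first k categories the system ranks. The Python sketch below shows that calculation under stated assumptions. The function name top_k_accuracy and the tiny data set are made up for illustration, each document is assumed to have a single reference category, and this is not necessarily the exact protocol used in the CI:Insight test.

```python
def top_k_accuracy(ranked_predictions, gold_labels, k=2):
    """Fraction of documents whose gold label is among the top-k ranked categories.

    ranked_predictions: one list per document, best category first.
    gold_labels: one reference category per document (a simplifying assumption).
    """
    if len(ranked_predictions) != len(gold_labels):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold_labels:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(ranked_predictions, gold_labels)
        if gold in ranked[:k]
    )
    return hits / len(gold_labels)


# Made-up rankings for four documents; not real test results.
ranked = [
    ["grain", "wheat", "corn"],
    ["earn", "acq", "trade"],
    ["crude", "trade", "earn"],
    ["acq", "earn", "crude"],
]
gold = ["wheat", "earn", "earn", "acq"]

top1 = top_k_accuracy(ranked, gold, k=1)  # exact agreement with the reference
top2 = top_k_accuracy(ranked, gold, k=2)  # reference label in the top two
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Reporting top-1 alongside top-2 distinguishes documents that align exactly with the reference categorization from those where the reference category was only the runner-up.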
Categorization is the foundation of Social CRM and plays an essential role in automated text analytics. You can’t begin to surface themes, and then voice of customer, sentiment and actionable insights, if the data you are working from is poor.
