We have developed, and use, a variety of algorithms to calculate the sentiment expressed in a document regarding a topic. All of these algorithms have been tuned over a period of years and have carefully set parameters. The algorithms have been played off against one another to find the combination of basic algorithms that yields the best result.
When we started our work, we focused on parts of speech that we called “opinion expressing words”. This class was broad enough to include not just adjectives but also nouns (e.g., “The scoundrel!”) and certain adverbial combinations. We later extended our algorithms to consider the occurrences of verbs (e.g., “He emphasized that …”) and adverb-verb phrases (e.g., “He strongly reiterated that …”), and showed that there is a strong correlation between how such terms are used in a document and how intense a sentiment readers form on the topic.
We then developed algorithms that would first identify the sentences in a document that were relevant to a given topic. Each such sentence would be subjected to a careful linguistic analysis. Based on the precise manner in which the kinds of phrases mentioned in the preceding paragraph appeared in that sentence, we would assign a score to the intensity expressed in that sentence on the topic. Scores across multiple sentences would then be aggregated.
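As a rough illustration only, the three steps described above (select topic-relevant sentences, score each one, aggregate the scores) can be sketched in a few lines of Python. This is not our production method: the lexicon, the intensifier weights, the keyword-based relevance test, and the use of a simple mean for aggregation are all assumptions made for the example, and every function name here is hypothetical.

```python
import re

# Hypothetical toy lexicon: opinion-expressing word -> signed intensity.
OPINION_WORDS = {
    "scoundrel": -0.9,
    "excellent": 0.8,
    "terrible": -0.8,
    "good": 0.5,
    "bad": -0.5,
}

# Hypothetical multipliers for an adverb directly preceding an opinion word.
INTENSIFIERS = {"strongly": 1.5, "very": 1.5, "slightly": 0.5}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_relevant(sentence, topic_terms):
    """Assumed relevance test: the sentence mentions at least one topic term."""
    return any(tok in topic_terms for tok in tokenize(sentence))


def score_sentence(sentence):
    """Sum lexicon intensities, scaled by an immediately preceding intensifier."""
    tokens = tokenize(sentence)
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok in OPINION_WORDS:
            weight = INTENSIFIERS.get(tokens[i - 1], 1.0) if i > 0 else 1.0
            total += weight * OPINION_WORDS[tok]
    return total


def score_document(text, topic_terms):
    """Mean score over topic-relevant sentences; None if no sentence is relevant."""
    topic_terms = {t.lower() for t in topic_terms}
    scores = [
        score_sentence(s)
        for s in split_sentences(text)
        if is_relevant(s, topic_terms)
    ]
    if not scores:
        return None
    return sum(scores) / len(scores)


if __name__ == "__main__":
    sample = (
        "The new budget is very good. "
        "The weather was terrible. "
        "Critics call the budget bad."
    )
    print(score_document(sample, {"budget"}))
```

In this sketch the sentence about the weather is ignored because it never mentions the topic, so only the two budget sentences contribute to the aggregate. A real system would replace each of these stand-ins with far more careful linguistic analysis.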
Later, we extended these algorithms to take statements of fact and descriptions of events into account, and to try to capture their role in shaping the perceptions of readers.
SentiMetrix is the first company to develop such a wide-ranging body of technology spanning all aspects of opinion mining – linguistic and non-linguistic – and to use a carefully devised statistical model to provide a proven solution to the problem.