Sample sentence 1

Sample sentence 2

Since these are really long, let's instead gain some insight by looking at the word clouds formed by combining the positive reviews and the negative reviews:

Generate word cloud

A word cloud is a common visualization for text data in which each word's size is proportional to its number of occurrences in the data. That makes it handy for seeing which words occur most often under each sentiment.
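To make the sizing idea concrete, here is a minimal sketch that counts word occurrences and scales each word's size in proportion to its count. The function names, the `max_size` parameter, and the sample reviews are all hypothetical; this is not necessarily how the clouds in this post were produced.

```python
import re
from collections import Counter


def word_frequencies(reviews):
    """Count how often each lowercase word occurs across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(re.findall(r"[a-z']+", review.lower()))
    return counts


def font_sizes(counts, max_size=100):
    """Scale sizes proportionally so the most frequent word gets max_size."""
    top = max(counts.values())
    return {word: max_size * count / top for word, count in counts.items()}


# Hypothetical stand-ins for the extremely negative reviews.
negative_reviews = [
    "A bad script and bad acting.",
    "Bad pacing, dull plot.",
]

counts = word_frequencies(negative_reviews)
sizes = font_sizes(counts)
print(counts.most_common(1))
print(sizes["bad"], sizes["dull"])
```

A rendering library can then draw the image from such counts; for instance, the third-party wordcloud package provides `WordCloud.generate_from_frequencies`. In practice you would probably also drop stop words such as "a" and "and" before counting.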

Extremely Negative
Extremely Positive

We can clearly infer the sentiment expressed from the word clouds: "bad" has a high count in the extremely negative cloud, while "best" stands out in the extremely positive cloud.

We will be using the following metrics to benchmark our performance:

  1. Accuracy

  2. Confusion Matrix
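As a quick reminder of what these two metrics measure, here is a minimal pure-Python sketch. The label names and the toy predictions are hypothetical and are not results from this post.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] = number of items with true label i predicted as label j."""
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(t, p)] for p in labels] for t in labels]


# Hypothetical labels and predictions, for illustration only.
labels = ["negative", "neutral", "positive"]
y_true = ["negative", "negative", "neutral", "positive", "positive"]
y_pred = ["negative", "neutral", "neutral", "positive", "negative"]

acc = accuracy(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred, labels)
print(acc)
for row in matrix:
    print(row)
```

The diagonal of the matrix holds the correct predictions, so accuracy is the sum of the diagonal divided by the total count. The off-diagonal cells show which classes get confused with each other, which accuracy alone hides.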

Before starting on any NLP task, cleaning the data is very important. See this post for the details of data cleansing.


Simple Dictionary Lookup

A classical technique for sentiment analysis, dictionary-based lookups have received plenty of criticism for being inexhaustive, ignoring semantic meaning, and more. Yet they were among the first and simplest techniques to be applied.

The steps are simple:

  1. Have a dictionary of word:score key-value pairs, where the score is positive for positive words and negative for negative words.
  2. Start iterating through a given review word by word with a score counter of 0.
  3. If the word being considered is present in the dictionary, add its score to the score counter.
  4. The final value of the score counter at the end of the review determines the label to be assigned.

For this model, we used the AFINN dictionary.
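The steps above can be sketched roughly as follows. The tiny lexicon only imitates AFINN's style of integer scores from -5 to +5; its entries are placeholders, not values copied from AFINN. The score thresholds and the intermediate label names are likewise assumptions made for illustration.

```python
import re

# Illustrative scores in AFINN's style (integers from -5 to +5).
# These entries are placeholders, not values copied from AFINN.
LEXICON = {"bad": -3, "dull": -2, "awful": -3, "good": 3, "best": 3}


def review_score(review, lexicon):
    """Sum the lexicon scores of the words in a review."""
    score = 0  # step 2: start the counter at 0
    for word in re.findall(r"[a-z']+", review.lower()):
        score += lexicon.get(word, 0)  # step 3: unknown words add nothing
    return score


def label_for(score):
    """Step 4: map the final score to a label (hypothetical thresholds)."""
    if score <= -4:
        return "Extremely Negative"
    if score < 0:
        return "Negative"
    if score == 0:
        return "Neutral"
    if score < 4:
        return "Positive"
    return "Extremely Positive"


review = "The best cast and a good script."
score = review_score(review, LEXICON)
label = label_for(score)
print(score, label)
```

Note that a review such as "not good" would still get a positive score here, which is one example of the "ignoring semantic meaning" criticism mentioned earlier.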

Finally, if you're looking for a list of the best additional dictionaries to experiment with, you can check this link.

Bag of Words Approach (BOW)