Trutz Fries

Analyze Amazon reviews with BERTopic and Python

10/10/2021 • Reading time: ca. 5 min • by Trutz Fries

In this article, we want to show you how to use a few lines of Python code to extract relevant aspects from Amazon reviews using Google's pre-trained neural network BERT. The entire Jupyter Notebook is linked at the end of the article.

  1. What is BERT?
  2. What is BERTopic?
  3. Analysis of Amazon Reviews
  4. Conclusion

What is BERT?

BERT stands for "Bidirectional Encoder Representations from Transformers," a language model developed and pre-trained by Google for Natural Language Processing (NLP). It is a multilayered neural network that was pre-trained on a very large text corpus. Despite the use of many Google servers, this training took several days.

Simply put, the result is a model that captures the relationships between words and their meaning better than comparable earlier NLP models. BERT can solve various NLP tasks such as text classification, answering simple questions, or "Named Entity Recognition" (NER), which aims to find and classify named entities in a text, e.g., people, companies, and dates.

What is BERTopic?

BERTopic is a Python module developed by Maarten Grootendorst that specializes in extracting topics from texts. It groups texts using BERT-based embeddings and then describes each group with a modified, class-based TF-IDF analysis (c-TF-IDF). If you want to learn more about BERTopic, check out this or this article. You don't need to understand the module in detail to follow the example below.
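
To give a rough idea of the modified TF-IDF step, here is a minimal sketch in plain Python. It assumes the class-based formulation described in the BERTopic documentation: all texts of a topic are merged into one "class document", and a word's weight is its relative frequency in that topic times log(1 + A / f), where A is the average number of words per topic and f is the word's frequency across all topics. The function names class_tfidf and top_words and the toy reviews are made up for illustration, and the library's actual implementation may differ in its details.

```python
import math
from collections import Counter


def class_tfidf(docs_per_topic):
    """Score words per topic; docs_per_topic maps topic id -> list of texts."""
    # Merge all texts of a topic into one "class document" and count its words
    counts = {
        topic: Counter(" ".join(docs).lower().split())
        for topic, docs in docs_per_topic.items()
    }
    # Frequency of each word across all topics
    total = Counter()
    for topic_counts in counts.values():
        total.update(topic_counts)
    # Average number of words per topic
    avg_words = sum(total.values()) / len(counts)

    scores = {}
    for topic, topic_counts in counts.items():
        n_words = sum(topic_counts.values())
        scores[topic] = {
            word: (freq / n_words) * math.log(1 + avg_words / total[word])
            for word, freq in topic_counts.items()
        }
    return scores


def top_words(scores, topic, n=3):
    """Return the n highest-scoring words of a topic."""
    return sorted(scores[topic], key=scores[topic].get, reverse=True)[:n]


# Toy data, not taken from the real reviews
toy = {
    0: ["device broken after two months", "broken after return period"],
    1: ["too hot and painful", "painful heat on skin"],
}
scores = class_tfidf(toy)
print(top_words(scores, 0))
print(top_words(scores, 1))
```

Words that are frequent inside one topic but rare elsewhere get the highest scores, which is why they work well as topic labels.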

Analysis of Amazon Reviews

For brands, reviews are valuable feedback they receive for free. The only problem is the sheer volume: both the total number of reviews and the number of reviews per product quickly become too large to read manually.

This is where a machine learning approach can be practical, extracting and grouping relevant topics from all reviews.

An example will illustrate this. Let's take a popular product on Amazon with many reviews, the bite away.


We'll use a simple Jupyter Notebook to analyze the data.

First, let's install and import all necessary libraries:

# Install
!pip install bertopic
!pip install nltk

# Import
import pandas as pd 
import numpy as np
from bertopic import BERTopic

To remove stop words from the reviews later, we'll use the nltk module and download the stop words for the German language:

# Stopwords
import nltk
nltk.download('punkt')
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

We'll need a function to remove stop words from an array of sentences/reviews:

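# Remove stopwords from an array of strings
def remove_stopwords(data):
    # German stop word list from nltk, as a set for fast lookups
    german_stopwords = set(stopwords.words('german'))
    output_array = []
    for sentence in data:
        temp_list = []
        for word in str(sentence).split():
            if word.lower() not in german_stopwords:
                temp_list.append(word)
        output_array.append(' '.join(temp_list))
    return output_array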
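
To see what this filtering does without downloading anything, here is a small stand-alone sketch. It uses a hand-picked mini stop word set instead of nltk's full German list, so SAMPLE_STOPWORDS, remove_stopwords_sample, and the two example reviews are assumptions made purely for illustration.

```python
# Hand-picked subset for illustration only; the article uses nltk's German list
SAMPLE_STOPWORDS = {"das", "ist", "nach", "und", "nicht", "mehr", "es", "zu"}


def remove_stopwords_sample(data, stop_words=SAMPLE_STOPWORDS):
    """Drop every whitespace-separated word that is in stop_words."""
    output = []
    for sentence in data:
        kept = [w for w in str(sentence).split() if w.lower() not in stop_words]
        output.append(" ".join(kept))
    return output


# Made-up example reviews
reviews = [
    "Das Gerät ist nach zwei Monaten kaputt",
    "Es wird zu heiß und tut weh",
]
cleaned = remove_stopwords_sample(reviews)
print(cleaned)
```

Note that splitting on whitespace leaves punctuation attached to words, so a token such as "und," would not be recognized as a stop word by this simple approach.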

Next, we'll load the reviews into a DataFrame using the Pandas library. You can obtain the reviews, for example, via a report from our AMALYTIX tool:

df = pd.read_csv("./reviews-B003CMKQTS-de.csv", engine='python')

[Figure: Example reviews of bite away]

Now, we'll remove the positive reviews (4-5 stars) from the dataset to understand what customers criticize. We'll then convert the DataFrame into a list and remove the stop words using the remove_stopwords function defined above:

# Only get critical reviews (1-3 stars)
dfNegative = df[df.rating < 4]

# Convert df column "text" to list 
reviewsNegativeText = dfNegative.text.to_list()

# Remove stopwords
reviewsNegativeText = remove_stopwords(reviewsNegativeText)

Now, let's create our model based on BERTopic:

modelNegative = BERTopic(language="german", nr_topics="auto")

Next, we need to fit this model to our reviews. The same call also returns the topic assigned to each review:

topics, probabilities = modelNegative.fit_transform(reviewsNegativeText)
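
The returned topics list contains one topic id per review, in the same order as the input list. A small helper like the following can group the reviews by topic so that you can read a few original texts per topic. This is only a sketch: group_by_topic is a hypothetical helper, and the sample lists stand in for reviewsNegativeText and topics.

```python
from collections import defaultdict


def group_by_topic(docs, topics):
    """Map each topic id to the list of documents assigned to it."""
    grouped = defaultdict(list)
    for doc, topic in zip(docs, topics):
        grouped[topic].append(doc)
    return dict(grouped)


# Made-up stand-ins for reviewsNegativeText and the topics from fit_transform
sample_docs = ["device broken two months", "hot painful", "broken return period", "ok"]
sample_topics = [0, 1, 0, -1]

for topic, docs in sorted(group_by_topic(sample_docs, sample_topics).items()):
    print(topic, docs)
```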

By default, the topic descriptions only contain single words. Using a simple model update, we'll also include bi-grams (combinations of 2 words):

# Pass the same (stop-word-free) reviews that were used for fitting
modelNegative.update_topics(reviewsNegativeText, topics=topics, n_gram_range=(1, 2))

That's it. Now we can take a look at the results. Let's first look at the number of topics:

[Figure: The first ten topics]

Here you can see the first ten topics. Topic -1 collects the reviews that could not be assigned to any topic, so we can ignore it.
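
Before ignoring topic -1 entirely, it can be worth checking how many reviews ended up there, because a large share means that many reviews are not represented by any topic. The following sketch counts the topic sizes directly from the topics list; topic_sizes is a hypothetical helper, and sample_topics stands in for the list returned by fit_transform.

```python
from collections import Counter


def topic_sizes(topics):
    """Return (counts per topic without -1, share of documents in topic -1)."""
    counts = Counter(topics)
    outliers = counts.pop(-1, 0)
    share = outliers / len(topics) if len(topics) else 0.0
    return counts, share


# Made-up stand-in for the topics returned by fit_transform
sample_topics = [0, 1, 0, -1, 2, 0, -1, 1]
counts, outlier_share = topic_sizes(sample_topics)
print(counts.most_common())
print(f"{outlier_share:.0%} of reviews are unassigned")
```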

Let's look at the first topic:

[('broken', 0.011148147661148565),
 ('month', 0.008799115337334707),
 ('months', 0.007398879507378811),
 ('months more', 0.007363461462032497),
 ('is broken', 0.007026721331499257),
 ('already broken', 0.006190625875256987),
 ('late', 0.0061362178850270805),
 ('device already', 0.005747283493183007),
 ('return period', 0.005500258254624393),
 ('device 3x', 0.005473215223609577)]

Here you can see that many customers complain about poor durability. It seems the product breaks down quite often.

Let's check out topic 3:

[('disappointed', 0.007670611779864671),
 ('painful', 0.006489256905951043),
 ('hot', 0.006296142117342345),
 ('pain', 0.006095052433498948),
 ('hot children', 0.006035989865595163),
 ('heat', 0.005749678921426524),
 ('pain use', 0.005692691419037523),
 ('hurts', 0.005277188297220635),
 ('sensitive', 0.005030634228804426),
 ('actually sensitive', 0.0050215790931766755)]

It seems the product causes pain for some users, which isn't surprising since it fights mosquito bites using heat.

To quickly display relevant keywords for each topic, this simple loop is useful:

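for x in range(0, 10):
    first_tuple_elements = []
    # Each entry is a (keyword, score) tuple; keep only the keyword
    for word, score in modelNegative.get_topic(x):
        first_tuple_elements.append(word)
    print(first_tuple_elements)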

['stars', 'processing', 'hell of a pain', 'adult', 'hour', 'short time', 'pen inserted', 'inserted hour', 'stars earned', 'two points']

['defective', 'month', 'months', 'months more', 'expired', 'already defective', 'delay', 'device already', 'return period', 'device 3x']

['price', 'cheap', 'effect', 'buyer', 'buy', 'worse', 'poorly constructed', 'new batteries', 'battery cover', 'reviews']

If we compare this with the keywords Amazon extracts, the results are impressive:

  • bite away
  • next day
  • keeps the promise
  • works perfect
  • riemser pharma
  • clench your teeth
  • for kids
  • works perfectly
  • absolute purchase recommendation
  • made in germany


Conclusion

With around 30 lines of code, we were able to extract relevant information from Amazon reviews. The main challenge when analyzing multiple products is runtime: fitting the model with the fit_transform method takes a few minutes each time. Nevertheless, the effort can be worthwhile, especially since a manual analysis certainly won't be faster.

You can download the notebook here.
