Analyze the Amazon Brand Analytics search term report with Python and Jupyter Notebooks

10/01/2021 • Reading time: ca. 8 min • by Trutz Fries

For brands, Amazon provides the search term report in Brand Analytics to analyze the search and click behavior of Amazon's customers. This helps to operationalize your Amazon SEO activities.

In this post I want to show you how to analyze this data using standard tools like Python, Pandas, and Jupyter Notebooks. With a few lines of code, we demonstrate how to quickly find the most used search terms, seasonal search terms, a brand's performance, and much more.

To follow along you need:

For the more interesting analyses it makes sense to load not only the current report but, for example, the last 52 weeks of reports. You can download those in Brand Analytics as well. It takes some time, but it is doable.

Once you have the data, we need to make some preparations to load and work with it.

Ready? Let's go!


First things first. We need to load all necessary libraries:

# Setup
import pandas as pd
import numpy as np
import time

import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import seaborn as sns
import datetime
from datetime import timedelta
import glob

%config InlineBackend.figure_format='retina'
plt.rcParams["figure.dpi"] = 200

Let's also define some helper methods. These are not strictly necessary for analyzing the data; we use them here to present the results more nicely in this blog post.

# Get link to an ASIN
def get_ASIN_Link(ASIN, domain="US"):
    if domain == "DE":
        domain = "de"
    elif domain == "US":
        domain = "com"
    link = '<a href="https://www.amazon.' + domain + '/dp/' + ASIN + '" target="_blank">' + ASIN + '</a>'
    return link

# Define a function for colouring (red for negative values)
def highlight_max(s):
    # Coerce to numeric so string columns don't break the comparison
    is_neg = pd.to_numeric(s, errors='coerce') < 0
    return ['color: red;' if cell else 'color: black'
        for cell in is_neg]

Define some variables

First, let's create some variables we can use later on and make changes to our script easier:

# Common
currentWeek  = '33'
previousWeek = '32'
market       = 'US'
yearString   = '2021'

yearWeekCurrent = yearString + '-' + currentWeek
yearWeekPrevious = yearString + '-' + previousWeek

# Variables
pathToProducts = "/your/path/to/products.csv"
columnsInProducts = ['ASIN', 'marketplaceTitle', 'category', 'brand', 'productImageUrl', 'productTitle'] # US, adjust for other markets

pathToReports = "/your/path/to/brandanalyticsdata/" 
columnsInReports = ["Search Term","Search Frequency Rank","#1 Clicked ASIN","#2 Clicked ASIN","#3 Clicked ASIN"] # US, adjust for other markets
columnsInReportsSearchTerm = ["Search Term","Search Frequency Rank"] # US, adjust for other markets

thousandSeparator = "," # US, adjust for other markets

# For two weeks based analysis only:
currentWeekPath = pathToReports + "Amazon-Searchterms_US_2021_" + currentWeek + ".csv"
previousWeekPath = pathToReports + "Amazon-Searchterms_US_2021_" + previousWeek + ".csv"

Our script expects your files to follow this naming format:

Amazon-Searchterms_<Market>_<Year>_<Week>.csv, e.g. Amazon-Searchterms_US_2021_33.csv
This allows us later to extract the marketplace, the year, and the calendar week from the filename. This data will be added later to our data frame to allow for time series analysis.
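As a more robust alternative to slicing fixed character positions, the same fields can be extracted with a regular expression. A minimal sketch, assuming the naming format shown above (`parse_report_filename` is our own helper, not part of the original script):

```python
import re

# Matches e.g. "Amazon-Searchterms_US_2021_33.csv"
FILENAME_PATTERN = re.compile(r"Amazon-Searchterms_([A-Z]{2})_(\d{4})_(\d{2})\.csv$")

def parse_report_filename(path):
    """Extract marketplace, year and week from a report filename."""
    match = FILENAME_PATTERN.search(path)
    if match is None:
        raise ValueError("Unexpected filename: " + path)
    return match.groups()  # (marketplace, year, week)

print(parse_report_filename("/data/Amazon-Searchterms_US_2021_33.csv"))
# ('US', '2021', '33')
```

Unlike positional slicing, this fails loudly on files that do not match the expected pattern.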

Here is a quick tip: as Amazon's search term reports are quite large (about 1 million lines per week), you can shorten them with this quick shell script, run e.g. in a bash shell:

for file in ./csv/Amazon*.csv; do
    head -n 500000 "$file" > "./short/$(basename "$file")"
done

This reduces each file to its first 500,000 lines and saves the short version to the short folder. If you analyze e.g. 1 year of data, this will speed up your analyses significantly.
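If you prefer to stay in Python, pandas can cap the number of rows at load time instead: read_csv accepts an nrows parameter. A minimal toy example (the CSV content here is made up):

```python
import io

import pandas as pd

csv_data = (
    "Title row\n"
    "Search Term,Search Frequency Rank\n"
    "face mask,1\n"
    "desk,2\n"
    "earbuds,3\n"
)

# skiprows=1 skips the report's title line; nrows caps the data rows read
df = pd.read_csv(io.StringIO(csv_data), skiprows=1, nrows=2)
print(len(df))  # 2
```

For the real reports you would pass e.g. nrows=500000, which avoids pre-processing the CSVs entirely.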

Importing Brand Analytics search term data

First, we need to load all search term reports into memory. Again, the script expects that these files sit together in a single directory with some strict naming conditions:

# Initialize empty dataframe we'll append data to
dfFinal = pd.DataFrame()

# Get all files
all_files = sorted(glob.glob(pathToReports  + "/*.csv"))
print ('\n'.join(all_files))

# Import files to df
i = 0
for file in all_files:
  i = i + 1
  print(str(i) + ". " + file)

  # Read from CSV file
  dfTemp = pd.read_csv(file, thousands=thousandSeparator, usecols=columnsInReports, engine="python", error_bad_lines=True, encoding='utf-8', skiprows=1,  sep=",")

  # Add week from filename to dataframe as new column
  week = file[-6:][:2] # e.g. 06
  year = file[-11:][:4] # e.g. 2020
  yearWeek = year + '-' + week # e.g. 2020-06
  marketplaceTitle = file[-14:][:2]
  dfTemp['week'] = week
  dfTemp['year'] = year
  dfTemp['yearWeek'] = yearWeek
  dfTemp['marketplaceTitle'] = marketplaceTitle

  # Concat with previous results
  dfFinal = pd.concat([dfTemp, dfFinal])

# We rename the clicked-ASIN columns to `1`, `2`, `3` as this becomes handy when we melt the report
dfFinal.columns = ['searchterm', 'rank', '1', '2', '3', 'week', 'year', 'yearWeek', 'marketplaceTitle']

# Change data type to int
dfFinal = dfFinal.astype({"week": int, "year": int})
Dataframe with search term data

Some of the analyses shown below only require loading the current and the previous week of data. If you want to do that, you can just load the 2 respective files, e.g. like this:

dfCurrent = pd.read_csv(currentWeekPath, thousands=thousandSeparator, usecols=columnsInReportsSearchTerm, engine="python", error_bad_lines=True, encoding='utf-8', skiprows=1,  sep=",")

dfPrevious= pd.read_csv(previousWeekPath, thousands=thousandSeparator, usecols=columnsInReportsSearchTerm, engine="python", error_bad_lines=True, encoding='utf-8', skiprows=1,  sep=",")

The thousandSeparator parameter depends on your locale: it is either a dot or a comma. For US reports it is the comma.
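A minimal toy example of what the thousands parameter does (made-up data, not an actual report):

```python
import io

import pandas as pd

csv_data = 'Search Term,Search Frequency Rank\ndesk,"1,234"\n'

# With thousands="," pandas parses "1,234" as the integer 1234
df = pd.read_csv(io.StringIO(csv_data), thousands=",")
print(df["Search Frequency Rank"].iloc[0])  # 1234
```

Without the parameter, the rank column would stay a string and numeric comparisons would fail.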

We will need to melt (unpivot) all this data. More on that soon.

Next, we also load some product data (e.g. the brand) for each ASIN found in the search term report files, which will come in handy to enrich the data later:

dfProducts = pd.read_csv(pathToProducts, engine="python", error_bad_lines=True, encoding='utf-8', sep=",")

If you don't have this data, you can extract it with some work from the product titles in the search term reports. However, this is not covered here.

Re-formatting / shaping the data

Next, we need to bring our data into the right format. Basically, we need to "unpivot" the data, which will make the following analyses possible.

Pandas provides a very handy function for that called melt (see documentation), which makes this very easy:

# Melt dataframe

# Create a copy
dfWideMultiReports = dfFinal.copy()

# Melt from wide to long
dfWideMultiReportsLong = dfWideMultiReports.melt(id_vars=["searchterm", "rank", "week", "year", "yearWeek", "marketplaceTitle"], var_name="position", value_name="ASIN")

# Add column with a 1 so we can sum by this column
dfWideMultiReportsLong['count'] = 1

# Make position an int
dfWideMultiReportsLong = dfWideMultiReportsLong.astype({"position": int})
Melted dataframe (long format)

Now we have one row for each search term, ASIN, week combination.
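If the melt step feels abstract, here is a minimal toy example of the same wide-to-long transformation (made-up data):

```python
import pandas as pd

dfWide = pd.DataFrame({
    "searchterm": ["desk"],
    "1": ["B001"],  # #1 clicked ASIN
    "2": ["B002"],  # #2 clicked ASIN
})

# melt turns the ASIN columns into one (position, ASIN) pair per row
dfLong = dfWide.melt(id_vars=["searchterm"], var_name="position", value_name="ASIN")
print(dfLong)
```

The single wide row becomes two long rows, one per click position.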

Next, we enrich our search term report data with the product data to e.g. have a column for the ASIN's brand:

# Left join products data to searchterm data
dfWideMultiReportsLongExtended = pd.merge(left=dfWideMultiReportsLong, right=dfProducts, how='left', left_on=['marketplaceTitle','ASIN'], right_on = ['marketplaceTitle','ASIN'])

# Drop rows where searchterm or brand is NaN
dfWideMultiReportsLongExtended = dfWideMultiReportsLongExtended.dropna(subset=['searchterm', 'ASIN', 'rank', 'week', 'year', 'yearWeek', 'marketplaceTitle', 'position', 'brand', 'productTitle']) # Only drop rows with N/A in specific columns

We also drop rows where we are missing some data using the dropna method.
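A small toy example of how dropna with subset behaves (made-up data): only missing values in the listed columns cause a row to be dropped:

```python
import pandas as pd

df = pd.DataFrame({
    "searchterm": ["desk", None],
    "brand": ["IKEA", "Apple"],
    "productImageUrl": [None, None],  # missing here is fine, column not in subset
})

# Drops the second row (missing searchterm) but keeps the first
df = df.dropna(subset=["searchterm", "brand"])
print(len(df))  # 1
```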

The final dataframe we are going to work with looks like this now:

Extended dataframe

Analysis of the data

Now as we have everything in place we can start doing some simple analysis. We will start by analyzing changes between the current and the previous week. As shown above we have this data in our dfCurrent and dfPrevious dataframes.

We will look at:

  • Search terms
  • Brands
  • Products

Search term analysis

Prepare the data

We need to merge our 2 dataframes containing the current and the previous week of data for some analyses.

# Change column names
dfCurrent.columns = ['searchterm', 'currentRank']
dfPrevious.columns = ['searchterm', 'previousRank']

# Make currentRank and previousRank an integer
dfCurrent = dfCurrent.astype({"currentRank": int})
dfPrevious = dfPrevious.astype({"previousRank": int})

# Merge current and previous dataframes
dfMerged = pd.merge(left=dfCurrent, right=dfPrevious, how='left', left_on=['searchterm'], right_on=['searchterm'])

# Calculate the changes between current and previous week
dfMerged['DeltaAbs'] = dfMerged['currentRank'] - dfMerged['previousRank']

# Positive value of Change (to see biggest movers in absolute terms)
dfMerged['DeltaAbsPos'] = dfMerged['DeltaAbs'].abs()

With that out of the way, let's do some more fun stuff.

Get the top search terms of the current week

Let's start slow with some simple tables: the top search terms of the current week with the difference to the previous week. We also add some colors based on the changes using TailwindCSS classes. This is not necessary for the analysis itself; it just makes the following tables prettier.

# Get current Top 10 with previous
dfTop10Searchterms = dfMerged
dfTop10Searchterms = dfTop10Searchterms.drop('DeltaAbsPos', 1)

dfTop10Searchterms.columns = ['Searchterm', 'Rank cur.', 'Rank prev.', 'Change']
dfTop10Searchterms['Change'] = dfTop10Searchterms.apply(lambda row: '<span class="text-green-600">↑ ' + str(int(np.nan_to_num(row.Change))) + '</span>' if row.Change < 0 else '<span class="text-red-600">↓ ' + str(int(np.nan_to_num(row.Change))) + '</span>', axis=1)
searchTermsTop = dfTop10Searchterms.head(10).to_markdown(index=False)

The dfTop10Searchterms dataframe looks like this then:

Searchterm Rank cur. Rank prev. Change
disposable face masks 1 1 ↓ 0
face mask 2 2 ↓ 0
pop it 3 3 ↓ 0
desk 4 5 ↑ -1
iphone 12 pro max case 5 7 ↑ -2
earbuds 6 6 ↓ 0
n95 mask 7 4 ↓ 3
apple watch band 8 11 ↑ -3
black disposable face masks 9 24 ↑ -15
laptop 10 15 ↑ -5

The search term black disposable face masks has improved considerably compared to the previous week. Remember: the lower the search frequency rank, the more often the term is searched on Amazon.

Get newcomer search terms

Now we can try to find the top search terms (based on search frequency rank) where there was no search frequency rank in the previous week:

# Get new entries where there was no rank in previous week
dfTopNewcomer = dfMerged[dfMerged['previousRank'].isnull()]
dfTopNewcomer = dfTopNewcomer.drop(['previousRank','DeltaAbs', 'DeltaAbsPos'], 1)
dfTopNewcomer = dfTopNewcomer.sort_values(by=['currentRank'], ascending=True).head(10)

dfTopNewcomer.columns = ['New searchterms', 'Rank cur.']

The result will look something like this:

New searchterms Rank cur.
pixel 5a 5g case 2027
infinity game table 2459
cicy bell womens casual blazer 5234
donsfield lint remover 6945
go oats oatmeal in a ball 9887
ivernectin 10034
go oats 10367
ruthless empire rina kent 10711
cicy bell blazer 11057
pixel 5a 5g screen protector 11222

As you can see, these are very often long-tail keywords.

Find search terms with the highest rank change

Now let's look at the "movers and shakers" among high-traffic search terms. By "high-traffic" we mean that we restrict the result to search terms with a search frequency rank below 1,000. Feel free to play with this threshold to get different results.

# Only include searchterms which are now < 1000
filterWinners = (dfMerged['currentRank'] < 1000)

# Winners
dfWinners = dfMerged[filterWinners].sort_values(by=['DeltaAbs'], ascending=True).head(10)

# Drop DeltaAbsPos column
dfWinners = dfWinners.drop('DeltaAbsPos', 1)

# Rename columns
dfWinners.columns = ['Searchterm', 'Rank CW ' + currentWeek, 'Rank CW ' + previousWeek, 'Delta']

For the winners this will look like this:

Searchterm Rank CW 33 Rank CW 32 Delta
black american flags 847 14,708 -13,861
paw patrol movie 523 4,817 -4,294
sexy dresses for women 762 2,924 -2,162
headphones for kids for school 869 2,460 -1,591
room decor aesthetic 825 2,373 -1,548
hunger games 996 2,412 -1,416
eyebrow pencil 480 1,859 -1,379
throw pillow covers 355 1,641 -1,286
bentgo kids lunch box 705 1,902 -1,197
respect 927 2,076 -1,149

Top movers

Let's find out for which high-traffic search terms (again with a search frequency rank below 1,000) the search frequency rank has changed most compared to the previous week. We include both positive and negative changes as we sort by the absolute changes here.

# Top Movers within a certain rank threshold
filterMovers = (dfMerged['previousRank'] < 1000) & (dfMerged['currentRank'] < 1000)
dfMerged = dfMerged.astype({"currentRank": float}) # Needs to be there otherwise currentRank column is string?!
dfMovers = dfMerged[filterMovers].sort_values(by=['DeltaAbsPos'], ascending=False).head(10)
dfMovers['DeltaAbs'] = dfMovers.apply(lambda row: '<span class="text-green-600">↑ ' + str(int(np.nan_to_num(row.DeltaAbs))) + '</span>' if row.DeltaAbs < 0 else '<span class="text-red-600">↓ ' + str(int(np.nan_to_num(row.DeltaAbs))) + '</span>', axis=1)

# Drop DeltaAbs column
dfMovers = dfMovers.drop('DeltaAbsPos', 1)

# Rename columns
dfMovers.columns = ['Searchterm', 'Rank CW ' + currentWeek, 'Rank CW ' + previousWeek, 'Delta']

The result looks like this:

Searchterm Rank CW 33 Rank CW 32 Delta
field of dreams 998 196 ↓ 802
friday the 13th 902 210 ↓ 692
pop it fidget toys 241 929 ↑ -688
n95 masks for virus protection 911 395 ↓ 516
wireless earbuds bluetooth 302 785 ↑ -483
college essentials 875 454 ↓ 421
fall clothes for women 297 678 ↑ -381
air conditioner 583 206 ↓ 377
usb c cable 618 991 ↑ -373
iphone charging cords 949 594 ↓ 355

Of course, you can change the thresholds as you like.

Brand analyses

If we have the brand attached to each entry, we can also analyze by brand. Here we use our big dataframe instead of the two separate weekly files.

Let's find out which brands show up most often:

filtCurrentWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekCurrent)
filtPreviousWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekPrevious)

dfBrandCurrentWeek = dfWideMultiReportsLongExtended[filtCurrentWeek]
dfBrandPreviousWeek = dfWideMultiReportsLongExtended[filtPreviousWeek]

dfTopBrandsCurrentWeek = dfBrandCurrentWeek['brand'].value_counts().sort_values(ascending=False).to_frame('countCurrent')
dfTopBrandsPreviousWeek = dfBrandPreviousWeek['brand'].value_counts().sort_values(ascending=False).to_frame('countPrevious')

brandsTop = dfTopBrandsCurrentWeek.head(10).to_markdown().replace('countCurrent', 'Count')

The top brands then look like this:

Brand Count
Amazon Basics 9670
Apple 5620
Nike 5469
365 by Whole Foods Market 3577
adidas 3546
Amazon 3079
HP 3002
Generic 2809
LEGO 2737

We just count a brand's appearances in the search term report. You could weight by search frequency rank to give search terms with a low rank (i.e. high traffic) more weight; this was not done here.
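As a sketch of such a weighting (our own suggestion, not part of the original analysis), you could weight each appearance by the inverse of its search frequency rank, so that high-traffic terms contribute more:

```python
import pandas as pd

# Toy stand-in for one week of melted data: brand plus search frequency rank
dfToy = pd.DataFrame({
    "brand": ["Nike", "Nike", "Apple"],
    "rank":  [1, 100, 2],
})

# Inverse-rank weight: rank 1 counts fully, rank 100 contributes only 1/100
dfToy["weight"] = 1.0 / dfToy["rank"]
weighted = dfToy.groupby("brand")["weight"].sum().sort_values(ascending=False)
print(weighted)
```

Nike ends up with 1.01 and Apple with 0.5, so one top-ranked appearance outweighs many low-traffic ones. Other decay functions (e.g. 1/log(rank)) are equally possible.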

Let's now find out which brands have improved compared to the previous week. Again we do a plain count here:

dfTopBrandsMerged = dfTopBrandsCurrentWeek.merge(dfTopBrandsPreviousWeek, how='left', left_index=True, right_index=True)
dfTopBrandsMerged['DeltaAbs'] = dfTopBrandsMerged['countCurrent'] - dfTopBrandsMerged['countPrevious']
dfTopBrandsMerged['DeltaAbsPos'] = dfTopBrandsMerged['DeltaAbs'].abs()

# Winners and losers
dfBrandLosers = dfTopBrandsMerged.sort_values(by=['DeltaAbs'], ascending=True).head(10)

dfBrandWinners = dfTopBrandsMerged.sort_values(by=['DeltaAbs'], ascending=False).head(10)
dfBrandWinners.columns = ['# CW ' + currentWeek, '# CW ' + previousWeek, 'Delta', 'Delta Abs']
dfBrandWinners = dfBrandWinners.drop('Delta Abs', 1)
brandsWinner = dfBrandWinners.to_markdown()

The brandsWinner table will look like this:

# CW 33 # CW 32 Delta
Amazon 3079 2833 246
SAMSUNG 3171 2939 232
Bentgo 814 638 176
HP 3002 2863 139
Disguise 820 690 130
Texas Instruments 718 589 129
DXLOVER 249 131 118
Generic 2809 2696 113
Logitech 1208 1095 113
Spooktacular Creations 293 182 111

Product analyses

What we did with search terms and brands can also be done with products. Let's look for the products which show up most or more often compared to the previous week. We will use our full data set again here.

First, we'll filter the big dataset, then we'll merge the current and the previous week, and then we'll count the number of occurrences.

filtCurrentWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekCurrent)
filtPreviousWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekPrevious)

dfProductCurrentWeek = dfWideMultiReportsLongExtended[filtCurrentWeek]
dfProductPreviousWeek = dfWideMultiReportsLongExtended[filtPreviousWeek]

dfTopProductsCurrentWeek = dfProductCurrentWeek['ASIN'].value_counts().sort_values(ascending=False).to_frame('countCurrent')
dfTopProductsPreviousWeek = dfProductPreviousWeek['ASIN'].value_counts().sort_values(ascending=False).to_frame('countPrevious')

# ASIN is index, create column from it, add marketplaceTitle for making the merge possible
dfTopProductsCurrentWeek['ASIN'] = dfTopProductsCurrentWeek.index
dfTopProductsCurrentWeek['marketplaceTitle'] = market

dfTopProductsPreviousWeek['ASIN'] = dfTopProductsPreviousWeek.index
dfTopProductsPreviousWeek['marketplaceTitle'] = market

# Add product data, e.g. product title
dfTopProductsCurrentWeek = pd.merge(left=dfTopProductsCurrentWeek, right=dfProducts, how='left', left_on = ['marketplaceTitle', 'ASIN'], right_on = ['marketplaceTitle', 'ASIN'])

# Merge current with previous df
dfTopProductsMerged = dfTopProductsCurrentWeek.merge(dfTopProductsPreviousWeek, how='left', left_on = ['marketplaceTitle', 'ASIN'], right_on = ['marketplaceTitle', 'ASIN'])

# Calculate the changes
dfTopProductsMerged['DeltaAbs'] = dfTopProductsMerged['countCurrent'] - dfTopProductsMerged['countPrevious']
dfTopProductsMerged['DeltaAbsPos'] = dfTopProductsMerged['DeltaAbs'].abs()

# Shorten the product title and add ASIN
dfTopProductsMerged['productTitleShort'] = dfTopProductsMerged['productTitle'].astype(str).str.replace('|', '-', regex=False).str[0:30] + ' (' + get_ASIN_Link(dfTopProductsMerged['ASIN']) + ')'

# Get rid of some columns and prepare output
dfTopProductsForOutput = dfTopProductsMerged[['productTitleShort', 'countCurrent']]
dfTopProductsForOutput.columns = ['Product title', 'Count']

productsTop = dfTopProductsForOutput.head(10).to_markdown(index = False)

# Winners
dfProductsWinners = dfTopProductsMerged.sort_values(by=['DeltaAbs'], ascending = False)
dfProductsWinners = dfProductsWinners[['productTitleShort', 'DeltaAbs' ]]
dfProductsWinners.columns = ['Product title', 'Delta']
productsWinner = dfProductsWinners.head(10).to_markdown(index=False)

# Losers
dfProductsLosers = dfTopProductsMerged.sort_values(by=['DeltaAbs'], ascending = True)
dfProductsLosers = dfProductsLosers[['productTitleShort', 'DeltaAbs' ]]
dfProductsLosers.columns = ['Product title', 'Delta']
productsLoser = dfProductsLosers.head(10).to_markdown(index=False)

The first rows of productsTop look like this then:

Product title Count
EnerPlex 3-Ply Reusable Face M (B088C2WD5F) 287
Matein Travel Laptop Backpack, (B06XZTZ7GB) 226
Blink Mini – Compact indoor pl (B07X6C9RMF) 219
Gildan Men's Crew T-Shirts, Mu (B091FNSFN1) 210
Crocs Men's and Women's Classi (B09BBT7T3P) 207
AIHOU 50PC Kids Butterfly Disp (B08LL8H78D) 195
Apple AirPods Pro (B07ZPC9QD4) 195
Beckham Hotel Collection Bed P (B01LYNW421) 179
Apple AirPods with Charging Ca (B07PXGQC1Q) 177
TCP Global Salon World Safety (B08FXLVHTY) 176

Again, we only count appearances; we have not weighted them by the search frequency rank of the respective keywords.

productsWinner, respectively, looks like this:

Product title Delta
KATCHY Indoor Insect and Flyin (B07B6RZP4H) 160
Texas Instruments TI-84 Plus C (B00TFYYWQA) 64
5 Layer Protection Breathable (B08F8Y3X59) 59
Columbia Baby Girls' Benton Sp (B077FNDN8N) 55
Fire 7 Kids Edition Tablet, 7" (B07H936BZT) 54
BISSELL Crosswave All in One W (B01DTYAZO4) 51
$20 PlayStation Store Gift Car (B004RMK4BC) 50
Germ Guardian True HEPA Filter (B004VGIGVY) 50
Texas Instruments TI-84 PLUS C (B00XOLOOPY) 47
SPACE JAM Heroes of Goo JIT Zu (B08S3GWM5W) 44

Let's now find out which are the hottest newcomer products and which products showed the biggest changes. The code for this looks like this:

# Newcomer products and top movers
filtBothWeeks = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekCurrent) | (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekPrevious)
dfProductCurrentWeek = dfWideMultiReportsLongExtended[filtBothWeeks]

df_pivot_current = pd.pivot_table(dfProductCurrentWeek,index=["ASIN"], columns=["yearWeek"], values=["searchterm", "rank"],aggfunc={"searchterm":len,"rank":min})
df_pivot_current['marketplaceTitle'] = market
df_pivot_current.columns= ['minRankPrevious', 'minRankCurrent', 'countPrevious', 'countCurrent', 'marketplaceTitle']
df_pivot_current.reset_index(inplace = True)

df_pivot_current['countDelta'] = df_pivot_current['countCurrent'] - df_pivot_current['countPrevious']
df_pivot_current = df_pivot_current.sort_values(by=['countDelta'], ascending = False)

# Add product data, e.g. product title
df_pivot_current_ext = pd.merge(left=df_pivot_current, right=dfProducts, how='left', left_on = ['marketplaceTitle', 'ASIN'], right_on = ['marketplaceTitle', 'ASIN'])

# Shorten the product title and add ASIN
df_pivot_current_ext['productTitleShort'] = df_pivot_current_ext['productTitle'].astype(str).str.replace('|', '-', regex=False).str[0:30] + ' (' + get_ASIN_Link(df_pivot_current_ext['ASIN'], market) + ')'

# Products with most change between current and prev. week
df_pivot_current_ext_short = df_pivot_current_ext[['productTitleShort', 'countCurrent', 'minRankCurrent', 'countDelta']]
df_pivot_current_ext_short = df_pivot_current_ext_short.sort_values(by=['countDelta'], ascending = False)
df_pivot_current_ext_short.columns = ['Product title', '# searchterms', 'Best rank', 'Delta']

productsMovers = df_pivot_current_ext_short.head(10).to_markdown(index=False)

Let's look at the movers and shakers first:

Product title # searchterms Best rank Delta
KATCHY Indoor Insect and Flyin (B07B6RZP4H) 174 775 160
Texas Instruments TI-84 Plus C (B00TFYYWQA) 82 1914 64
5 Layer Protection Breathable (B08F8Y3X59) 92 1276 59
Columbia Baby Girls' Benton Sp (B077FNDN8N) 58 2817 55
Fire 7 Kids Edition Tablet, 7" (B07H936BZT) 66 1387 54
BISSELL Crosswave All in One W (B01DTYAZO4) 91 5223 51
Germ Guardian True HEPA Filter (B004VGIGVY) 98 149 50
$20 PlayStation Store Gift Car (B004RMK4BC) 66 6829 50
Texas Instruments TI-84 PLUS C (B00XOLOOPY) 65 1914 47
SPACE JAM Heroes of Goo JIT Zu (B08S3GWM5W) 52 1277 44

Let's figure out which are the best newcomer products:

# Best Newcomer Products (without rankings in previous week)
dfTopProductNewcomer = df_pivot_current_ext[df_pivot_current_ext['countPrevious'].isnull()]
dfTopProductNewcomer = dfTopProductNewcomer.sort_values(by=['countCurrent'], ascending = False)
dfTopProductNewcomer['productTitleShort'] = dfTopProductNewcomer['productTitle'].astype(str).str.replace('|', '-', regex=False).str[0:30] + ' (' + get_ASIN_Link(dfTopProductNewcomer['ASIN'], market) + ')'

dfTopProductNewcomer = dfTopProductNewcomer[['productTitleShort', 'countCurrent', 'minRankCurrent']]
dfTopProductNewcomer.columns = ['Product title', '# searchterms', 'Best rank']

productsNewcomer = dfTopProductNewcomer.head(10).to_markdown(index=False)

productsNewcomer looks like this:

Product title # searchterms Best rank
adidas Unisex-Child Questar Fl (B08TK7P6CN) 160 1966
Kids KN95 Face Mask - 25 Pack (B08Q8J5MNQ) 142 844
Crocs Unisex-Child Kids' Class (B08LDBPB3J) 50 2595
TeeTurtle - The Original Rever (B088X4XFNQ) 47 10620
adidas Women's Grand Court Sne (B09746H9SC) 45 588
Kids Face Mask, Disposable Kid (B08R8CHZN3) 40 17142
Potaroma Cat Wall Toys, 7.7 by (B08T9CL5T9) 31 16802
MIAODAM Face Mask Lanyards, Ma (B09CDDP2Z3) 28 279
KN95 Particulate Respirator - (B0876J4F4Y) 27 235
Earbuds Earphones Wired Stereo (B098L2DSS3) 26 3532

Additional Amazon Keyword and Brand Analyses

We can do so much more with this data. Let me show you some examples:

Single keyword analysis by week

Once we have loaded the data for e.g. the last 52 weeks we can plot the development of a single search term over time:

searchTerm = "pool"

# Filter data
dfSingleKeyword = dfFinal.loc[dfFinal['searchterm'] == searchTerm].sort_values('week')

# Plot the data
kwPlot = dfSingleKeyword.plot(x ='week', y='rank', figsize=(10,6), title="Keyword: " + searchTerm)

This will give us a better understanding of how the demand for certain products behaves over time.

Here is a graph for the term "pool" shown above.

Graph for the search term "pool" in 2021

It's easy to see when the pool season started and ended in 2021.
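The same idea extends to comparing several search terms at once by pivoting to one column per term first. A sketch on made-up data:

```python
import pandas as pd

# Toy stand-in for dfFinal: one rank per (searchterm, week)
dfToy = pd.DataFrame({
    "searchterm": ["pool", "pool", "heater", "heater"],
    "week": [20, 30, 20, 30],
    "rank": [500, 100, 900, 2000],
})

# One column per search term, indexed by week
dfPivot = dfToy.pivot(index="week", columns="searchterm", values="rank")

# Each line in the plot is one search term
ax = dfPivot.plot(figsize=(10, 6), title="Search frequency rank by week")
```

With the real dfFinal, you would filter to the terms of interest before pivoting to keep the chart readable.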

Show generic search terms a specific brand ranks for

Here we search for all search terms the brand "OPPO" ranks for that do not include the brand's name, sorted by rank ascending.

Here we assume that the data for a single brand analytics report sits in the dataframe called df:

# Show searchterms for a given brand and sort by rank ASC
myFilter = (df['brand'] == 'OPPO') & (df["Searchterm"].str.contains('oppo')==False)

This should give us generic search terms.
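On a small made-up dataframe, the filter behaves like this:

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["OPPO", "OPPO", "Apple"],
    "Searchterm": ["oppo phone", "android phone", "iphone"],
})

# Keep the brand's rows whose search term does NOT mention the brand name
myFilter = (df["brand"] == "OPPO") & (df["Searchterm"].str.contains("oppo") == False)
print(df[myFilter]["Searchterm"].tolist())  # ['android phone']
```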

This of course can also be done for multiple brands across multiple reports:

brands = ["Kasa Smart", "ARENTI", "GW Security", "Blink Home Security"]
brandsLowerCase = [x.lower() for x in brands]

# Get all rows from selected brands
dfBrandTerms = dfWideMultiReportsLongExtended[dfWideMultiReportsLongExtended.brand.isin(brands)]

# Group result
dfBrandTermsGrouped = dfBrandTerms.groupby('searchterm')['rank'].agg(['mean']).sort_values(by=['mean'])

# Reset index
dfBrandTermsGrouped.reset_index(level=0, inplace=True)
dfBrandTermsGrouped = dfBrandTermsGrouped.astype({"mean": int})

The dfBrandTermsGrouped dataframe now looks like this:

Top keywords for selected security camera brands

If we only want to find generic keywords, let's remove the search terms which include a brand's name:

brandKeywordsWithoutBrand = dfBrandTermsGrouped[~dfBrandTermsGrouped.searchterm.str.contains('|'.join(brandsLowerCase))]

Want to export both all and non-brand keywords to Excel? Pandas has you covered:

with pd.ExcelWriter('camera-export-brand-keywords.xlsx') as writer:  
    brandKeywordsWithoutBrand.to_excel(writer, sheet_name='Generic Keywords', float_format="%.2f")
    dfBrandTermsGrouped.to_excel(writer, sheet_name='All Keywords', float_format="%.2f")

Classify search terms

We could even go one step further and try to classify every search term in the report.

To do this, we first need to put all brands into a list:

# Get unique list of brands
allBrands = dfWideMultiReportsLongExtended.brand.unique()

# Lower each name
allBrands = [ str(x).lower() for x in allBrands if len(str(x)) > 3 ]

Now we can analyze each row in our dataframe and check whether the search term contains the product's own brand name, the name of a different brand, or is a generic search term.

def searchTermType(row):
    brand = str(row['brand']).lower()
    searchterm = str(row['searchterm']).lower()
    if brand != 'nan' and brand in searchterm:
        return 'brand'
    for brand2 in allBrands:
        if brand2 in searchterm:
            return 'other-brand'
    return 'generic'

This function checks whether the search term contains any brand's name.

To apply this function to a small sample of the dataframe, we run the following:

dfPlay = dfWideMultiReportsLongExtended.head(10).copy()
dfPlay['searchTermType'] = dfPlay.apply(searchTermType, axis=1)
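A row-wise apply can be slow on a million rows. As an alternative sketch (our own suggestion, not part of the original code), the brand check can be vectorized with a single joined regex:

```python
import re

import pandas as pd

allBrands = ["nike", "adidas"]
df = pd.DataFrame({"searchterm": ["nike shoes", "running shoes"]})

# Build one regex that matches any known brand name; escape names to be safe
brandPattern = "|".join(re.escape(b) for b in allBrands)
df["mentionsBrand"] = df["searchterm"].str.contains(brandPattern, regex=True)
print(df["mentionsBrand"].tolist())  # [True, False]
```

This only flags whether any brand is mentioned; distinguishing "own brand" from "other brand" still needs a per-row comparison, but the flag lets you skip the expensive loop for all generic terms.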

Show a brand's performance over time

dfAll = dfWideMultiReportsLongExtended
brandsInScope = ["Nike", "adidas", "Under Armour"]

# Create pivot table
dfPivot = pd.pivot_table(dfAll[dfAll['brand'].isin(brandsInScope)],index=["yearWeek"], columns=["brand"], values=["ASIN"], aggfunc='count', fill_value=0)

# Plot data
ax = dfPivot.plot(fontsize=4, figsize=(4, 2))
ax.legend(loc=7, prop={'size': 3}, bbox_to_anchor=(0.23,0.90))
ax.set_ylabel('Count', fontsize=5)
ax.set_xlabel('Week', fontsize=5)

The chart looks like this:

Development of specific brands over time


And there you have it. Although we covered quite some ground, there is so much more you can do with this data.

I hope this inspired you with some new ideas to try out.

You can find the full source code on Github. This of course excludes the search term data, as it is only available to registered brand owners.
