Cover Image
Trutz Fries

Analyze the Amazon Brand Analytics search term report with Python and Jupyter Notebooks

10/01/2021 • Reading time: ca. 8 min • by Trutz Fries

For brands, Amazon provides the search term report in Brand Analytics to analyze the search and click behavior of Amazon's customers. This helps to operationalize your Amazon SEO activities.

In this post I want to show you how how to analyze this data using a standard tool like Python, Pandas and Jupyter Notebooks. With some lines of code, we demonstrate how to quickly find the most used search terms, seasonal search terms, a brand's performance, and much more.

To follow along you need:

To make some interesting analyses it makes sense to not only load the current report but e.g. the 52 weeks of reports. You can download those as well in Brand Analytics. Takes some time, but it is doable.

Once you have it, we need to further make some preparations to load and work with it.

Ready? Let's go!

Setup

First things first. We need to load all necessary libraries:

# Setup
import pandas as pd
import numpy as np
import time

import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import seaborn as sns
import datetime
from datetime import timedelta
import glob

%config InlineBackend.figure_format='retina'
plt.rcParams["figure.dpi"] = 200

Let's also define some helper methods. These methods are not necessary if you would like to only analyze the data but we use them here to better show the results in this blog post.

# Get link to an ASIN
def get_ASIN_Link(ASIN, domain="US"):
    if domain == "DE":
      domain = "de"
    elif domain == "US":
      domain = "com"
    link = '<a href="https://www.amazon.' + domain + '/dp/' + ASIN + '" target="_blank">' + ASIN + '</a>'
    return link

# Define a function for colouring (red for negative, changes number format)
def highlight_max(s):
    if s.dtype == np.object:
        is_neg = [False for _ in range(s.shape[0])]
    else:
        is_neg = s < 0
    return ['color: red;' if cell else 'color:black' 
        for cell in is_neg]

Define some variables

First, let's create some variables we can use later on and make changes to our script easier:

# Common
currentWeek  = '33'
previousWeek = '32'
market       = 'US'
yearString   = '2021'

yearWeekCurrent = yearString + '-' + currentWeek
yearWeekPrevious = yearString + '-' + previousWeek

# Variables
pathToProducts = "/your/path/to/products.csv"
columnsInProducts = ['ASIN', 'marketplaceTitle', 'category', 'brand', 'productImageUrl', 'productTitle'] # US, adjust for other markets

pathToReports = "/your/path/to/brandanalyticsdata/" 
columnsInReports = ["Search Term","Search Frequency Rank","#1 Clicked ASIN","#2 Clicked ASIN","#3 Clicked ASIN"] # US, adjust for other markets
columnsInReportsSearchTerm = ["Search Term","Search Frequency Rank"] # US, adjust for other markets

thousandSeparator = "," # US, adjust for other markets

# For two weeks based analysis only:
currentWeekPath = pathToReports + "Amazon-Searchterms_US_2021_" + currentWeek + ".csv"
previousWeekPath = pathToReports + "Amazon-Searchterms_US_2021_" + previousWeek + ".csv"

Our script expects that your files should follow the following naming format:

...
Amazon-Searchterms_US_2021_27.csv
Amazon-Searchterms_US_2021_28.csv
Amazon-Searchterms_US_2021_29.csv
Amazon-Searchterms_US_2021_30.csv
Amazon-Searchterms_US_2021_31.csv
Amazon-Searchterms_US_2021_32.csv
Amazon-Searchterms_US_2021_33.csv
...

This allows us later to extract the marketplace, the year, and the calendar week from the filename. This data will be added later to our data frame to allow for time series analysis.

Here is a quick tip: As Amazon's search term reports are quite large (1 Mio. lines per week) you can shorten those via this quick shell script you can run e.g. in a bash shell:

for file in ./csv/Amazon*.csv; do
    head -n 500000 "$file" > "short/$file"
done

This will reduce each file to the first 500,000 lines and save a short version to the short folder. If you analyze e.g. 1 year of data this will reduce the speed of your analyses significantly.

Importing Brand Analytics search term data

First, we need to load all search term reports into memory. Again, the script expects that these files sit together in a single directory with some strict naming conditions:

# Initialze empty dataframe we'll append data to
dfFinal = pd.DataFrame()

# Get all files
all_files = sorted(glob.glob(pathToReports  + "/*.csv"))
print ('\n'.join(all_files))

# Import files to df
i = 0
for file in all_files:
  i = i + 1
  print(str(i) + ". " + file)

  # Read from CSV file
  dfTemp = pd.read_csv(file, thousands=thousandSeparator, usecols=columnsInReports, engine="python", error_bad_lines=True, encoding='utf-8', skiprows=1,  sep=",")

  # Add week from filename to dataframe as new column
  week = file[-6:][:2] # e.g. 06
  year = file[-11:][:4] # e.g. 2020
  yearWeek = year + '-' + week # e.g. 2020-06
  marketplaceTitle = file[-14:][:2]
  dfTemp['week'] = week
  dfTemp['year'] = year
  dfTemp['yearWeek'] = yearWeek
  dfTemp['marketplaceTitle'] = marketplaceTitle

  # Concat with previous results
  dfFinal = pd.concat([dfTemp, dfFinal])

# We rename ASIN1 into `1` as this becomes handy when we unmelt the report
dfFinal.columns = ['searchterm', 'rank', '1', '2', '3', 'week', 'year', 'yearWeek', 'marketplaceTitle']

# Change data type to int
dfFinal = dfFinal.astype({"week": int, "year": int, "year": int})
Dataframe with search term data
Dataframe with search term data

Some of the analyses shown below only require loading the current and the previous week of data. If you want to do that, you can just load the 2 respective files, e.g. like this:

dfCurrent = pd.read_csv(currentWeekPath, thousands=thousandSeparator, usecols=columnsInReportsSearchTerm, engine="python", error_bad_lines=True, encoding='utf-8', skiprows=1,  sep=",")

dfPrevious= pd.read_csv(previousWeekPath, thousands=thousandSeparator, usecols=columnsInReportsSearchTerm, engine="python", error_bad_lines=True, encoding='utf-8', skiprows=1,  sep=",")

The thousandSeparator parameter depends on your locale. It's either the dot or a comma. For US reports it is the comma.

We will need to unmelt all this data. More on that soon.

Before we also load some products data (e.g. the brand) for each ASIN found in all search term report files which will be handy to enrich the data later:

dfProducts = pd.read_csv(pathToProducts, engine="python", error_bad_lines=True, encoding='utf-8', sep=",")

If you don't have this data, you can extract this with some work from the product title of the search term reports. However, this is not covered here.

Re-formatting / shaping the data

Next, we need to bring our data into the right format. Basically, we need to "unpivot" the data we have which will make the following analyses possible.

Pandas provides a very handy function for that called melt (see documentation), which makes this very easy:

# Unmelt dataframe

# Create a copy
dfWideMultiReports = dfFinal

# Unmelt from wide to long
dfWideMultiReportsLong = dfWideMultiReports.melt(id_vars=["searchterm", "rank", "week", "year", "yearWeek", "marketplaceTitle"], var_name="position", value_name="ASIN")

# Add column with a 1 so we can sum by this colum
dfWideMultiReportsLong['count'] = 1

# Make position an int
dfWideMultiReportsLong = dfWideMultiReportsLong.astype({"position": int})
Unmelted dataframe
Unmelted data

Now we have one row for each search term, ASIN, week combination.

Next, we enrich our search term report data with the product data to e.g. have a column for the ASIN's brand:

# Left join products data to searchterm data
dfWideMultiReportsLongExtended = pd.merge(left=dfWideMultiReportsLong, right=dfProducts, how='left', left_on=['marketplaceTitle','ASIN'], right_on = ['marketplaceTitle','ASIN'])

# Drop rows where searchterm or brand is NaN
dfWideMultiReportsLongExtended = dfWideMultiReportsLongExtended.dropna(subset=['searchterm', 'ASIN', 'rank', 'week', 'year', 'yearWeek', 'marketplaceTitle', 'position', 'brand', 'productTitle']) # Only dop rows with N/A in specific columns

We also drop rows where we are missing some data using the dropna method.

The final dataframe we are going to work with looks like this now:

Extended dataframe
Extended data

Analysis of the data

Now as we have everything in place we can start doing some simple analysis. We will start by analyzing changes between the current and the previous week. As shown above we have this data in our dfCurrent and dfPrevious dataframes.

We will look at:

  • Search terms
  • Brands
  • Products

Search term analysis

Prepare the data

We need to merge our 2 dataframes containing the current and the previous week of data for some analyses.

# Change column names
dfCurrent.columns = ['searchterm', 'currentRank']
dfPrevious.columns = ['searchterm', 'previousRank']

# Make currentRank and previousRank an integer
dfCurrent = dfCurrent.astype({"currentRank": int})
dfPrevious = dfPrevious.astype({"previousRank": int})

# Merge current and previous dataframes
dfMerged = pd.merge(left=dfCurrent, right=dfPrevious, how='left', left_on=['searchterm'], right_on = ['searchterm'])
dfPrevious = dfPrevious.astype({"previousRank": int})

# Calculate the changes between current and previous
dfMerged['DeltaAbs']=dfMerged['currentRank']-dfMerged['previousRank']

# Positive value of Change (to see biggest movers in absolute terms)
dfMerged['DeltaAbsPos']=dfMerged['DeltaAbs'].abs()

With that out of the way, let's do some more fun stuff.

Get the top search terms of the current week

Let's start slow with some simple tables. Let's get the top search terms from the current week and show the difference to the previous week. We also add some colors based on the changes using TailwindCSS classes. However, this is not necessary for the sole analysis of the data. It just makes the following tables prettier.

# Get current Top 10 with previous
dfTop10Searchterms = dfMerged
dfTop10Searchterms = dfTop10Searchterms.drop('DeltaAbsPos', 1)

dfTop10Searchterms.columns = ['Searchterm', 'Rank cur.', 'Rank prev.', 'Change']
dfTop10Searchterms['Change'] = dfTop10Searchterms.apply(lambda row: '<span class="text-green-600">↑ ' + str(int(np.nan_to_num(row.Change))) + '</span>' if row.Change < 0 else '<span class="text-red-600">↓ ' + str(int(np.nan_to_num(row.Change))) + '</span>', axis=1)
searchTermsTop = dfTop10Searchterms.head(10).to_markdown(index=False)

The dfTop10Searchterms dataframe looks like this then:

SearchtermRank cur.Rank previousChange
disposable face masks11↓ 0
face mask22↓ 0
pop it33↓ 0
desk45↑ -1
iphone 12 pro max case57↑ -2
earbuds66↓ 0
n95 mask74↓ 3
apple watch band811↑ -3
black disposable face masks924↑ -15
laptop1015↑ -5

The search term black disposable face masks has quite improved compared to the previous week. Remember, the lower the search frequency rank, the more often this term is searched on Amazon.

Get newcomer search terms

Now we can try to find the top search terms (based on search frequency rank) where there was no search frequency rank in the previous week:

# Get new entries where there was no rank in previous week
dfTopNewcomer = dfMerged[dfMerged['previousRank'].isnull()]
dfTopNewcomer = dfTopNewcomer.drop(['previousRank','DeltaAbs', 'DeltaAbsPos'], 1)
dfTopNewcomer = dfTopNewcomer.sort_values(by=['currentRank'], ascending=True).head(10)

dfTopNewcomer.columns = ['New searchterms', 'Rank cur.']
searchTermsNewcomer=dfTopNewcomer.head(10).to_markdown(index=False)

This will e.g. look like this:

New searchtermsRank cur.
pixel 5a 5g case2027
infinity game table2459
cicy bell womens casual blazer5234
donsfield lint remover6945
go oats oatmeal in a ball9887
ivernectin10034
go oats10367
ruthless empire rina kent10711
cicy bell blazer11057
pixel 5a 5g screen protector11222

As you can see these are very often long-tail keywords.

Find search terms with the highest rank change

Now let's look at the "mover's and shakers" of high-traffic search terms. With "high-traffic" we mean that we reduce the result to search terms with a search frequency rank below 1,000. Feel free to play with this parameter to get different results.

# Only include searchterms which are now < 1000
filterWinners = (dfMerged['currentRank'] < 1000)

# Winners
dfWinners = dfMerged[filterWinners].sort_values(by=['DeltaAbs'], ascending=True).head(10)

# Drop DeltaAbsPos column
dfWinners = dfWinners.drop('DeltaAbsPos', 1)

# Rename columns
dfWinners.columns = ['Searchterm', 'Rank CW ' + currentWeek, 'Rank CW ' + previousWeek, 'Delta']

For the winners this will look like this:

SearchtermRank CW 33Rank CW 32Delta
black american flags84714,708-13,861
paw patrol movie5234,817-4,294
sexy dresses for women7622,924-2,162
headphones for kids for school8692,460-1,591
room decor aesthetic8252,373-1,548
hunger games9962,412-1,416
eyebrow pencil4801,859-1,379
throw pillow covers3551,641-1,286
bentgo kids lunch box7051,902-1,197
respect9272,076-1,149

Top movers

Let's find out for which high-traffic search terms (again with a search frequency rank below 1,000) the search frequency rank has changed most compared to the previous week. We include both positive and negative changes as we sort by the absolute changes here.

# Top Movers within a certain rank threshold
filterMovers = (dfMerged['previousRank'] < 1000) & (dfMerged['currentRank'] < 1000)
dfMerged = dfMerged.astype({"currentRank": float}) # Needs to be there otherwise currentRank column is string?!
dfMovers = dfMerged[filterMovers].sort_values(by=['DeltaAbsPos'], ascending=False).head(10)
dfMovers['DeltaAbs'] = dfMovers.apply(lambda row: '<span class="text-green-600">↑ ' + str(int(np.nan_to_num(row.DeltaAbs))) + '</span>' if row.DeltaAbs < 0 else '<span class="text-red-600">↓ ' + str(int(np.nan_to_num(row.DeltaAbs))) + '</span>', axis=1)

# Drop DeltaAbs column
dfMovers = dfMovers.drop('DeltaAbsPos', 1)

# Rename columns
dfMovers.columns = ['Searchterm', 'Rank CW ' + currentWeek, 'Rank CW ' + previousWeek, 'Delta']

The result looks like this:

SearchtermRank CW 33Rank CW 32Delta
field of dreams998196↓ 802
friday the 13th902210↓ 692
pop it fidget toys241929↑ -688
n95 masks for virus protection911395↓ 516
wireless earbuds bluetooth302785↑ -483
college essentials875454↓ 421
fall clothes for women297678↑ -381
air conditioner583206↓ 377
usb c cable618991↑ -373
iphone charging cords949594↓ 355

Of course, you can change the thresholds as you like.

Brand analyses

If we have the brand attached to each entry, we can also analyze by brand. Here we use our big data frame instead of the 2 separate weeks.

Let's find out which brands show up most often:

filtCurrentWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekCurrent)
filtPreviousWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekPrevious)

dfBrandCurrentWeek = dfWideMultiReportsLongExtended[filtCurrentWeek]
dfBrandPreviousWeek = dfWideMultiReportsLongExtended[filtPreviousWeek]

dfTopBrandsCurrentWeek = dfBrandCurrentWeek['brand'].value_counts().sort_values(ascending=False).to_frame('countCurrent')
dfTopBrandsPreviousWeek = dfBrandPreviousWeek['brand'].value_counts().sort_values(ascending=False).to_frame('countPrevious')

brandsTop = dfTopBrandsCurrentWeek.head(10).to_markdown().replace('countCurrent', 'Count')

The brandsTop dataframe will look like this:

Count
Amazon Basics9670
Apple5620
Nike5469
365 by Whole Foods Market3577
adidas3546
SAMSUNG3171
Amazon3079
HP3002
Generic2809
LEGO2737

We just count by a brand's appearance in the search term report. You could include the search frequency rank to give search terms with a low rank a higher weight. This was not done here.

Let's now find out which brands have improved compared to the previous week. Again we do a plain count here:

dfTopBrandsMerged = dfTopBrandsCurrentWeek.merge(dfTopBrandsPreviousWeek, how='left', left_index=True, right_index=True)
dfTopBrandsMerged['DeltaAbs'] = dfTopBrandsMerged['countCurrent'] - dfTopBrandsMerged['countPrevious']
dfTopBrandsMerged['DeltaAbsPos'] = dfTopBrandsMerged['DeltaAbs'].abs()

# Winners and Losers
dfBrandLoosers = dfTopBrandsMerged.sort_values(by=['DeltaAbs'], ascending=True).head(10)

dfBrandWinners = dfTopBrandsMerged.sort_values(by=['DeltaAbs'], ascending=False).head(10)
dfBrandWinners.columns = ['# CW ' + currentWeek, '# CW ' + previousWeek, 'Delta', 'Delta Abs']
dfBrandWinners = dfBrandWinners.drop('Delta Abs', 1)
brandsWinner = dfBrandWinners.to_markdown()

The brandsWinner dataframe will look like this:

# CW 33# CW 32Delta
Amazon30792833246
SAMSUNG31712939232
Bentgo814638176
HP30022863139
Disguise820690130
Texas Instruments718589129
DXLOVER249131118
Generic28092696113
Logitech12081095113
Spooktacular Creations293182111

Product analyses

What we did with search terms and brands can also be done with products. Let's look for the products which show up most or more often compared to the previous week. We will use our full data set again here.

First, we'll filter the big dataset, then we'll merge the current and the previous week, and then we'll count the number of occurrences.

filtCurrentWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekCurrent)
filtPreviousWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekPrevious)

dfProductCurrentWeek = dfWideMultiReportsLongExtended[filtCurrentWeek]
dfProductPreviousWeek = dfWideMultiReportsLongExtended[filtPreviousWeek]

dfTopProductsCurrentWeek = dfProductCurrentWeek['ASIN'].value_counts().sort_values(ascending=False).to_frame('countCurrent')
dfTopProductsPreviousWeek = dfProductPreviousWeek['ASIN'].value_counts().sort_values(ascending=False).to_frame('countPrevious')

# ASIN is index, create column from it, add marketplaceTitle for making the merge possible
dfTopProductsCurrentWeek['ASIN'] = dfTopProductsCurrentWeek.index
dfTopProductsCurrentWeek['marketplaceTitle'] = market

dfTopProductsPreviousWeek['ASIN'] = dfTopProductsPreviousWeek.index
dfTopProductsPreviousWeek['marketplaceTitle'] = market

# Add product data, e.g. product title
dfTopProductsCurrentWeek = pd.merge(left=dfTopProductsCurrentWeek, right=dfProducts, how='left', left_on = ['marketplaceTitle', 'ASIN'], right_on = ['marketplaceTitle', 'ASIN'])

# Merge current with previous df
dfTopProductsMerged = dfTopProductsCurrentWeek.merge(dfTopProductsPreviousWeek, how='left', left_on = ['marketplaceTitle', 'ASIN'], right_on = ['marketplaceTitle', 'ASIN'])

# Calculate the changes
dfTopProductsMerged['DeltaAbs'] = dfTopProductsMerged['countCurrent'] - dfTopProductsMerged['countPrevious']
dfTopProductsMerged['DeltaAbsPos'] = dfTopProductsMerged['DeltaAbs'].abs()

# Shorten the product title and add ASIN
dfTopProductsMerged['productTitleShort'] = dfTopProductsMerged['productTitle'].astype(str).replace('|', '-').str[0:30] + ' (' + get_ASIN_Link(dfTopProductsMerged['ASIN']) + ')'

# Get rid of some columns and prepare output
dfTopProductsForOutput = dfTopProductsMerged[['productTitleShort', 'countCurrent']]
dfTopProductsForOutput.columns = ['Product title', 'Count']

productsTop = dfTopProductsForOutput.head(10).to_markdown(index = False)

# Winners
dfProductsWinners = dfTopProductsMerged.sort_values(by=['DeltaAbs'], ascending = False)
dfProductsWinners = dfProductsWinners[['productTitleShort', 'DeltaAbs' ]]
dfProductsWinners.columns = ['Product title', 'Delta']
productsWinner = dfProductsWinners.head(10).to_markdown(index=False)

# Loosers
dfProductsLoosers = dfTopProductsMerged.sort_values(by=['DeltaAbs'], ascending = True)
dfProductsLoosers = dfProductsLoosers[['productTitleShort', 'DeltaAbs' ]]
dfProductsLoosers.columns = ['Product title', 'Delta']
productsLoser = dfProductsLoosers.head(10).to_markdown(index=False)

The first rows of productsTop look like this then:

Product titleCount
EnerPlex 3-Ply Reusable Face M (B088C2WD5F)287
Matein Travel Laptop Backpack, (B06XZTZ7GB)226
Blink Mini – Compact indoor pl (B07X6C9RMF)219
Gildan Men's Crew T-Shirts, Mu (B091FNSFN1)210
Crocs Men's and Women's Classi (B09BBT7T3P)207
AIHOU 50PC Kids Butterfly Disp (B08LL8H78D)195
Apple AirPods Pro (B07ZPC9QD4)195
Beckham Hotel Collection Bed P (B01LYNW421)179
Apple AirPods with Charging Ca (B07PXGQC1Q)177
TCP Global Salon World Safety (B08FXLVHTY)176

Again we only count the appearance. We have not weighted this by the search frequency rank of the respective keywords.

Respectively productsWinner looks like this:

Product titleDelta
KATCHY Indoor Insect and Flyin (B07B6RZP4H)160
Texas Instruments TI-84 Plus C (B00TFYYWQA)64
5 Layer Protection Breathable (B08F8Y3X59)59
Columbia Baby Girls' Benton Sp (B077FNDN8N)55
Fire 7 Kids Edition Tablet, 7" (B07H936BZT)54
BISSELL Crosswave All in One W (B01DTYAZO4)51
$20 PlayStation Store Gift Car (B004RMK4BC)50
Germ Guardian True HEPA Filter (B004VGIGVY)50
Texas Instruments TI-84 PLUS C (B00XOLOOPY)47
SPACE JAM Heroes of Goo JIT Zu (B08S3GWM5W)44

Let's now find out which are the hottest newcomer products and which products showed high changes. The code looks for this looks like this:

# Best newcoming products
filtBothWeeks = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekCurrent) | (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekPrevious)
dfProductCurrentWeek = dfWideMultiReportsLongExtended[filtBothWeeks]

df_pivot_current = pd.pivot_table(dfProductCurrentWeek,index=["ASIN"], columns=["yearWeek"], values=["searchterm", "rank"],aggfunc={"searchterm":len,"rank":min})
df_pivot_current['marketplaceTitle'] = market
df_pivot_current.columns= ['minRankPrevious', 'minRankCurrent', 'countPrevious', 'countCurrent', 'marketplaceTitle']
df_pivot_current.reset_index(inplace = True)

df_pivot_current.sort_values(by=['countCurrent'], ascending = False)

df_pivot_current['countDelta'] = df_pivot_current['countCurrent'] - df_pivot_current['countPrevious']
df_pivot_current.sort_values(by=['countDelta'], ascending = False)

# Add product data, e.g. product title
df_pivot_current_ext = pd.merge(left=df_pivot_current, right=dfProducts, how='left', left_on = ['marketplaceTitle', 'ASIN'], right_on = ['marketplaceTitle', 'ASIN'])

# Shorten the product title and add ASIN
df_pivot_current_ext['productTitleShort'] = df_pivot_current_ext['productTitle'].astype(str).replace('|', '-').str[0:30] + ' (' + get_ASIN_Link(df_pivot_current_ext['ASIN'], market) + ')'

# Products with most change between current an prev. week
df_pivot_current_ext_short = df_pivot_current_ext[['productTitleShort', 'countCurrent', 'minRankCurrent', 'countDelta']]
df_pivot_current_ext_short = df_pivot_current_ext_short.sort_values(by=['countDelta'], ascending = False)
df_pivot_current_ext_short.columns = ['Product title', '# searchterms', 'Best rank', 'Delta']

productsNewcomer = dfTopProductNewcomer.head(10).to_markdown(index=False)

Let's look at the movers and shakers first:

Product title# searchtermsBest rankDelta
KATCHY Indoor Insect and Flyin (B07B6RZP4H)174775160
Texas Instruments TI-84 Plus C (B00TFYYWQA)82191464
5 Layer Protection Breathable (B08F8Y3X59)92127659
Columbia Baby Girls' Benton Sp (B077FNDN8N)58281755
Fire 7 Kids Edition Tablet, 7" (B07H936BZT)66138754
BISSELL Crosswave All in One W (B01DTYAZO4)91522351
Germ Guardian True HEPA Filter (B004VGIGVY)9814950
$20 PlayStation Store Gift Car (B004RMK4BC)66682950
Texas Instruments TI-84 PLUS C (B00XOLOOPY)65191447
SPACE JAM Heroes of Goo JIT Zu (B08S3GWM5W)52127744

Let's figure out which are the best newcomer products:

# Best Newcomer Products (without rankings in previous week)
dfTopProductNewcomer = df_pivot_current_ext[df_pivot_current_ext['countPrevious'].isnull()]
dfTopProductNewcomer = dfTopProductNewcomer.sort_values(by=['countCurrent'], ascending = False)
dfTopProductNewcomer['productTitleShort'] = dfTopProductNewcomer['productTitle'].astype(str).replace('|', '-').str[0:30] + ' (' + get_ASIN_Link(dfTopProductNewcomer['ASIN'], market) + ')'

dfTopProductNewcomer = dfTopProductNewcomer[['productTitleShort', 'countCurrent', 'minRankCurrent']]
dfTopProductNewcomer = dfTopProductNewcomer.sort_values(by=['countCurrent'], ascending = False)
dfTopProductNewcomer.columns = ['Product title', '# searchterms', 'Best rank']

productsNewcomer looks like this:

Product title# searchtermsBest rank
adidas Unisex-Child Questar Fl (B08TK7P6CN)1601966
Kids KN95 Face Mask - 25 Pack (B08Q8J5MNQ)142844
Crocs Unisex-Child Kids' Class (B08LDBPB3J)502595
TeeTurtle - The Original Rever (B088X4XFNQ)4710620
adidas Women's Grand Court Sne (B09746H9SC)45588
Kids Face Mask, Disposable Kid (B08R8CHZN3)4017142
Potaroma Cat Wall Toys, 7.7 by (B08T9CL5T9)3116802
MIAODAM Face Mask Lanyards, Ma (B09CDDP2Z3)28279
KN95 Particulate Respirator - (B0876J4F4Y)27235
Earbuds Earphones Wired Stereo (B098L2DSS3)263532

Additional Amazon Keyword and Brand Analyses

We can do so much more with this data. Let me show you some examples:

Single keyword analysis by week

Once we have loaded the data for e.g. the last 52 weeks we can plot the development of a single search term over time:

searchTerm = "pool"

# Filter data
dfSingleKeyword = dfFinal.loc[dfFinal['searchterm'] == searchTerm].sort_values('week')

# Plot the data
kwPlot = dfSingleKeyword.plot(x ='week', y='rank', figsize=(10,6), title="Keyword: " + searchTerm)
kwPlot.set_xlabel("Weeks")
kwPlot.set_ylabel("Rank")
kwPlot

This will give us a better understanding of how the demand for certain products behaves over time.

Here is a graph for the term "pool" shown above.

Graph for the search term pool
Graph for the search term "pool" in 2021

It's easy to see when the pool season started and ended in 2021.

Show generic search terms a specific brand ranks for

Here we search for all search terms the brand "OPPO" ranks for but does not include the brand's name. Sorted by rank ascending.

Here we assume that the data for a single brand analytics report sits in the dataframe called df:

# Show searchterms for a given brand and sort by rank ASC
myFilter = (df['brand'] == 'OPPO') & (df["Searchterm"].str.contains('oppo')==False)
df.loc[myFilter].sort_values(by=['Rank'])

This should give us generic search terms.

This of course can also be done for multiple brands across multiple reports:

brands = ["Kasa Smart", "ARENTI", "GW Security", "Blink Home Security"]
brandsLowerCase = [x.lower() for x in brands]

# Get all rows from selected brands
dfBrandTerms = dfWideMultiReportsLongExtended[dfWideMultiReportsLongExtended.brand.isin(brands)]

# Group result
dfBrandTermsGrouped = dfBrandTerms.groupby('searchterm')['rank'].agg(['mean']).sort_values(by=['mean'])

# Reset index
dfBrandTermsGrouped.reset_index(level=0, inplace=True)
dfBrandTermsGrouped = dfBrandTermsGrouped.astype({"mean": int})

The dfBrandTermsGrouped dataframe now looks like this:

Keywords specific brands rank for
Top keywords for selected security camera brands

If we only want to find generic keywords, let's remove the search terms which include a brand's name:

brandKeywordsWithoutBrand = dfBrandTermsGrouped[~dfBrandTermsGrouped.searchterm.str.contains('|'.join(brandsLowerCase))]

Want to export both all and non-brand keywords to Excel? Pandas has you covered:

with pd.ExcelWriter('camera-export-brand-keywords.xlsx') as writer:  
    brandKeywordsWithoutBrand.to_excel(writer, sheet_name='Generic Keywords', float_format="%.2f")
    dfBrandTermsGrouped.to_excel(writer, sheet_name='All Keywords', float_format="%.2f")

Classify search terms

We could even go one step further and try to classify this for every search term in the report using this function.

To do this, we first need to put all brands into a list:

# Get unique list of brands
dfWideMultiReportsLongExtended['brand'].nunique()
allBrands = dfWideMultiReportsLongExtended.brand.unique()

# Lower each name
allBrands = [ x.lower() for x in allBrands if len(str(x)) > 3 ]

Now we can analyze each row in our dataframe and check if the search term contains the brand's brand name, the brand name of a different brand, or is a generic searchterm.

def searchTermType (row):
    brand = str(row['brand']).lower()
    searchterm = str(row['searchterm']).lower()
    
    if (not pd.isnull(brand)) & (brand in searchterm):
        return 'brand'
    if not pd.isnull(brand):
        for brand2 in allBrands:
            if not pd.isnull(brand2):
                if brand2.lower() in searchterm:
                    return 'other-brand'
    return 'generic'

This function checks, if the search term contains any brand's name.

To apply this function to a small sample of dataframe we need to run the following command:

dfPlay = dfWideMultiReportsLongExtended.head(10)
dfPlay['searchTermType'] = dfPlay.apply(searchTermType, axis=1)

Show a brand's performance over time

dfAll = dfWideMultiReportsLongExtended
brandsInScope = ["Nike", "adidas", "Under Armour"]

# Create pivot table
dfPivot = pd.pivot_table(dfAll[dfAll['brand'].isin(brandsInScope)],index=["yearWeek"], columns=["brand"], values=["ASIN"], aggfunc='count', fill_value=0)

# Plot data
plt = dfPivot.plot(fontsize=4, figsize=(4, 2))
plt.legend(loc=7, prop={'size': 3}, bbox_to_anchor=(0.23,0.90))
plt.set_ylabel('Count', fontsize=5)
plt.set_xlabel('Week', fontsize=5)

The chart looks like this:

Development of specific brands over time
Development of specific brands over time

Conclusion

And there you have it. Although we covered quite some ground there is so much more you can do with this data.

I hope we could inspire you with some new ideas to try out.

You can find the full source code of the code above on Github. This of course excludes the search term data as this is only availabe to registered brand owners.

Would you like to have a better overview on Amazon?
Monitor your listings 14 days for free!
Do you have any questions? Don't hesitate to call us or send us an email!
Tel. +49 221-29 19 12 32 | info@amalytix.com