Amazon is a search engine
When people talk about Amazon, they often call it the world's largest marketplace. But this analogy is not entirely accurate. In a marketplace, I stroll from one merchant's stand to the next. On Amazon, I type what I am looking for into the search box and get matching results. So Amazon is primarily a search engine, with the option to buy the products it finds.
Amazon therefore has remarkable similarities with Google, although Google's homepage is even more stripped down and consists of little more than a search box. The challenge is the same for both companies: their search engines have to pick the relevant items out of a multitude of so-called "documents" and then sort them from "very relevant" to "less relevant." In the case of Google, the documents are essentially web pages. In the case of Amazon, they are products.
Amazon's algorithm A9
When talking about Amazon's search engine, people often speak of A9. The name stands for A9.com Inc., a company founded in 2003 that initially set out to be an independent search engine for the World Wide Web (WWW). The search engine could be reached via a9.com and was, toward the end, a meta-search engine that drew on other sources such as Wikipedia, Google, or IMDb. Even then, you could use it to search books on Amazon. In 2008, the a9.com portal was discontinued, and since then the search technology has been continuously developed under Amazon's umbrella.
The challenges of Amazon's search are the same now as they were then. The search engine has to index and search an immense, constantly growing corpus of products and deliver results on demand in fractions of a second.
Amazon itself describes the challenge as follows:
product search typically operates in two stages: matching and ranking
Amazon uses matching to identify the products that fit the search query and ranking to sort them. The following sections look at both stages in more detail.
Find relevant products
In an article on semantic search published in 2019, Amazon visualizes the search as follows:
Direct match keywords
Amazon essentially says, "Products that contain words in the query (Qi) are the primary candidates." Amazon takes the texts contained in the product listing and compares them with the query. If there is a match, the algorithm considers the product relevant and passes it on to the next processing stage.
However, this approach has several disadvantages. In Amazon's words:
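A minimal sketch of such a lexical match may make the idea more concrete. It is not Amazon's implementation: the listings are invented, and requiring every query word to appear in the listing is an assumption made for simplicity.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def lexical_match(query, listings):
    """Return the IDs of listings whose text contains every query token.

    Assumption: a product only qualifies if all query words occur in it.
    """
    query_tokens = set(tokenize(query))
    matches = []
    for product_id, text in listings.items():
        if query_tokens <= set(tokenize(text)):
            matches.append(product_id)
    return matches

# Hypothetical listings, invented for illustration.
listings = {
    "p1": "Leather wallet for men, brown",
    "p2": "Women's purse with zipper",
    "p3": "Brown leather belt",
}

print(lexical_match("brown leather", listings))  # ['p1', 'p3']
print(lexical_match("purse", listings))          # ['p2']
print(lexical_match("wallets", listings))        # [] because the plural form is not in any listing
```

The last call already hints at the weakness of a purely literal comparison: a query in the plural misses a listing written in the singular.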
Pure lexical matching falls short in this respect due to several factors: a) lack of understanding of hypernyms, synonyms, and antonyms, b) fragility to morphological variants (e.g. "woman" vs. "women"), and c) sensitivity to spelling errors.
Hypernyms are generalizations of an expression, synonyms are words with the same or a similar meaning ("wallet" vs. "purse"), and antonyms are opposite expressions. An example of the antonym problem: if a person searches for "gloves without latex," there is a chance that Amazon will also show "latex gloves," because two of the three search terms appear in the product text. Search engines do not always interpret modifiers such as "without" or "free" correctly.
Singular and plural forms cause further problems. Search engines often use a technique called stemming: words in both the product texts and the queries are deliberately shortened so that different endings (singular, plural, feminine/masculine forms) map to the same base form.
Misspellings are not always easy to recognize and map to the intended search phrase either. It is estimated that 10-15% of all queries are misspelled. Amazon needs to correct these and understand what the customer is looking for. To prevent misspelled phrases in the first place, Amazon also uses, among other things, the so-called suggest function, which tries to complete the first characters typed and offers search suggestions based on them.
But even correctly spelled search queries are not always easy for a machine to understand. Someone searching for a "business suit" is probably looking for a suit. The phrase "socks suit," on the other hand, is more likely a search for socks, although both queries contain "suit."
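To illustrate stemming, here is a deliberately naive sketch that only strips a few English plural endings. The rules are an assumption chosen for readability; production systems use far more complete stemmers or lemmatizers, and nothing here reflects what Amazon actually runs.

```python
def naive_stem(word):
    """Strip a few common English plural endings. Deliberately simplistic."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # batteries -> battery
    if word.endswith(("sses", "xes", "ches", "shes")):
        return word[:-2]                # dresses -> dress, boxes -> box
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # wallets -> wallet
    return word

def stems(text):
    """Return the set of stemmed tokens of a whitespace-separated text."""
    return {naive_stem(token) for token in text.split()}

# Query and listing now agree although one uses the plural.
print(stems("leather wallets") == stems("Leather wallet"))  # True

# Irregular forms are not covered by suffix stripping.
print(naive_stem("woman"), naive_stem("women"))  # woman women
```

The second example shows why stemming alone does not solve the "woman" vs. "women" case mentioned above: irregular forms need a dictionary or a learned model.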
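Both ideas, completing a query from its first characters and mapping a misspelled query to a known one, can be sketched with Python's standard library. The query log and its frequencies are invented, and the string-similarity approach via `difflib` is an assumption for illustration, not a description of Amazon's method.

```python
import difflib

# Hypothetical log of known-good queries with made-up frequencies.
KNOWN_QUERIES = {
    "latex gloves": 120,
    "laptop stand": 300,
    "laptop sleeve": 180,
    "leather wallet": 250,
}

def suggest(prefix, limit=3):
    """Return the most frequent known queries that start with the prefix."""
    prefix = prefix.lower()
    hits = [q for q in KNOWN_QUERIES if q.startswith(prefix)]
    return sorted(hits, key=lambda q: -KNOWN_QUERIES[q])[:limit]

def correct(query, cutoff=0.8):
    """Map a possibly misspelled query to the closest known query, if any."""
    close = difflib.get_close_matches(
        query.lower(), list(KNOWN_QUERIES), n=1, cutoff=cutoff
    )
    return close[0] if close else query

print(suggest("lap"))            # ['laptop stand', 'laptop sleeve']
print(correct("lether wallet"))  # leather wallet
print(correct("garden hose"))    # garden hose (no close match, left unchanged)
```

The `cutoff` parameter controls how aggressive the correction is: set too low, it rewrites queries that were actually intended as typed.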
Another challenge is that direct matching depends on the quality of the product description. If the retailer does not make the effort to describe the product in detail and does not use synonyms, for example, the pure comparison with the search phrase may fail even though the product would be relevant. Search engines therefore often build an extended index for a product and, in the background, tag it with synonyms and hypernyms that match its terms.
Amazon draws not only on the texts provided by the seller or manufacturer but also on content such as Amazon reviews (source). In reviews, customers describe in their own words what they use a product for. Amazon stores this information and determines the search intention from the respective query. If the two match, the product is relevant.
For the reasons mentioned above, it is often not sufficient to rely on a purely lexical comparison between the search phrase and the product's text corpus. Amazon therefore also includes buyer signals in the relevance analysis. Signals that buyers send are, for example, clicking on a product, adding it to the shopping cart, and purchasing it. But there are also negative signals: a customer looks at a product but does not buy it. This could indicate that the product does not match the customer's search intention.
Such signals have to be normalized carefully. For example, the probability that a customer clicks on one of the top 3 results is significantly higher than for a product on page 3. Conversely, products without buyer signals must not be excluded completely, as they may suffer from the chicken-and-egg problem (no sales, no visibility; no visibility, no sales). Click rates are therefore adjusted by the click rate expected at a specific search result position, and only below- or above-average values are considered.
Correctly interpreted, such signals are therefore worth their weight in gold. A signal such as a purchase is the ultimate confirmation that the customer has found what they were looking for. In this way, Amazon can link search phrases with matching products.
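An extended index of this kind can be sketched as follows. The table of related terms is hypothetical and hand-written here; a real system would curate or learn it, and how Amazon builds its index is not described in the source.

```python
# Hypothetical synonym and hypernym table, invented for illustration.
RELATED_TERMS = {
    "wallet": {"purse", "billfold", "accessory"},
    "sneakers": {"trainers", "shoes", "footwear"},
}

def build_extended_index(listing_text):
    """Return the listing's own tokens plus related terms for each token."""
    tokens = set(listing_text.lower().split())
    extended = set(tokens)
    for token in tokens:
        extended |= RELATED_TERMS.get(token, set())
    return extended

index = build_extended_index("brown leather wallet")
print("purse" in index)  # True, although the seller never wrote "purse"
print("belt" in index)   # False
```

With such an index, a query for "purse" can reach a listing that only mentions "wallet," which reduces the dependence on how thoroughly the seller wrote the description.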
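The position adjustment can be sketched as a ratio of observed to expected click rate. The expected rates per position and the minimum number of impressions are made-up values for illustration; the source does not state which numbers or formula Amazon uses.

```python
# Hypothetical expected click-through rates per result position (made-up numbers).
EXPECTED_CTR = {1: 0.30, 2: 0.15, 3: 0.10, 10: 0.02, 40: 0.005}

def normalized_click_signal(clicks, impressions, position, min_impressions=100):
    """Return observed CTR divided by the CTR expected at this position.

    Values above 1 are above average, values below 1 are below average.
    Returns None when there is too little data, so that new products
    are neither rewarded nor penalized.
    """
    if impressions < min_impressions:
        return None
    observed = clicks / impressions
    return observed / EXPECTED_CTR[position]

# Average performer in position 1.
print(round(normalized_click_signal(300, 1000, position=1), 2))   # 1.0
# Far fewer clicks in absolute terms, but strong for position 40.
print(round(normalized_click_signal(20, 1000, position=40), 2))   # 4.0
# New product with hardly any impressions: no signal yet.
print(normalized_click_signal(0, 10, position=40))                # None
```

Returning `None` for sparse data is one possible way to keep products without a sales history from being filtered out; the ranking stage then has to fall back on other features for them.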
Determine Amazon Rankings
As if determining the relevant products were not complicated enough, the hardest part follows: sorting the results, that is, determining the ranking of the products found.
Amazon writes in another article:
Learning to rank (LTR) models rely on several features for a given query to rank documents. Many LTR features are based on users' interactions with documents such as impressions, clicks, and purchases. We call these features behavioral features. Ranking models are trained to optimize user engagement, and therefore, such behavioral features tend to be the most critical training signals.
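As a rough illustration of how behavioral features can enter a ranking score, the following sketch combines a text-match score with click and purchase rates in a linear model. The feature names and weights are hypothetical and set by hand; in an actual LTR system the weights are learned from training data, and the model is usually far more complex.

```python
# Hypothetical weights; a real LTR model would learn these from data.
WEIGHTS = {"text_match": 1.0, "click_rate": 2.0, "purchase_rate": 4.0}

def behavioral_features(impressions, clicks, purchases):
    """Turn raw interaction counts into rates per impression."""
    if impressions == 0:
        return {"click_rate": 0.0, "purchase_rate": 0.0}
    return {
        "click_rate": clicks / impressions,
        "purchase_rate": purchases / impressions,
    }

def score(candidate):
    """Weighted sum of the text-match score and the behavioral features."""
    features = {
        "text_match": candidate["text_match"],
        **behavioral_features(
            candidate["impressions"], candidate["clicks"], candidate["purchases"]
        ),
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())

def rank(candidates):
    """Sort candidates from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)

# Invented candidates for one query.
candidates = [
    {"id": "a", "text_match": 0.9, "impressions": 1000, "clicks": 50, "purchases": 5},
    {"id": "b", "text_match": 0.7, "impressions": 1000, "clicks": 200, "purchases": 40},
    {"id": "c", "text_match": 0.8, "impressions": 0, "clicks": 0, "purchases": 0},
]
print([c["id"] for c in rank(candidates)])  # ['b', 'a', 'c']
```

Product "b" overtakes "a" despite the weaker text match because buyers interact with it far more often, which is the effect the quote describes.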
User signals also play a significant role here. However, not all buyers are the same, and many have different preferences. A "one-size-fits-all" approach does not go far enough. Amazon is therefore also experimenting with personalized results. Suppose, for example, that Amazon learns during a session that a customer keeps looking for products of a particular brand. In that case, the algorithm can improve the result by prioritizing products of this brand. Personalized results are already the rule at Google, not least because Google also serves regionally different results.
Amazon writes about this:
We reformulate product search as a dynamic ranking problem. We leverage users' implicit feedback on the presented products as short-term context and refine the ranking of remaining items when the users request the following result pages.
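The idea of refining the remaining results from in-session feedback can be sketched as follows. The brand-based boost, its size, and the product data are assumptions for illustration; the quoted article describes the problem formulation, not this particular heuristic.

```python
from collections import Counter

def rerank_remaining(remaining, clicked_brands, boost=0.2):
    """Re-sort not-yet-shown products, boosting brands clicked in this session."""
    brand_clicks = Counter(clicked_brands)
    total = sum(brand_clicks.values())

    def adjusted(product):
        # Share of this session's clicks that went to the product's brand.
        share = brand_clicks[product["brand"]] / total if total else 0.0
        return product["score"] + boost * share

    return sorted(remaining, key=adjusted, reverse=True)

# Invented products that would appear on the next result page.
remaining = [
    {"id": "p7", "brand": "Acme", "score": 0.62},
    {"id": "p8", "brand": "Zenith", "score": 0.60},
    {"id": "p9", "brand": "Acme", "score": 0.55},
]

# Without any clicks, the original order is kept.
print([p["id"] for p in rerank_remaining(remaining, [])])
# ['p7', 'p8', 'p9']

# After two clicks on Zenith and one on Acme, Zenith moves up.
print([p["id"] for p in rerank_remaining(remaining, ["Zenith", "Zenith", "Acme"])])
# ['p8', 'p7', 'p9']
```

Only the pages the user has not yet seen are reordered, so results already displayed stay stable while later pages adapt to the session.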
Ranking models are often about identifying the relevant "features" (characteristics) that are crucial for a "good" ranking, that is, one that lets users find what they are looking for. However, this so-called "feature engineering" can be done in very different ways. The rapid development of machine learning and neural networks (artificial intelligence) opens the door to many new approaches. Finding the right approach is more of an art than a science. For example, such systems first have to be trained extensively, and the selection of the training set largely determines the result. Moreover, not enough data is available for every search query.
Specific categories also require a different approach. For example, in the "Clothing" category, female buyers often click on expensive "high-fashion" products without ever buying them. The click signal would thus be overweighted here. On the other hand, customers also want to see such products. If only the cheap products were visible, they would feel as if they were rummaging through a bargain bin rather than browsing a boutique.
Due to changing assortments and changing purchasing behavior, Amazon constantly faces the challenge of readjusting its search algorithm. In doing so, Amazon must also take its own interests (e.g., profitability) into account. Even minimal improvements in the search algorithm can have a massive impact on Amazon's (financial) results. Amazon is therefore constantly striving to improve its algorithm and works with leading scientists to do so.
For retailers, it is not easy to adapt to this. A tactic that works today may be outdated tomorrow. Even so, it is worthwhile for sellers to engage with the topic of Amazon SEO.
By the way: If you are interested in the differences and similarities in search engine optimization between Google and Amazon, we have written the following article for you: Amazon SEO vs. Google SEO - differences and similarities