
Image recognition for e-commerce: predicting your customer's taste

Dennie van den Biggelaar
Data Translator

August 10th, 2017

Since the rise of e-commerce, consumers have shifted their purchasing behavior to the digital world. Brick-and-mortar stores have been replaced by web shops, and salespeople by search queries, clicks and recommendation systems backed by predictive analytics. When consumers click through web shops, whether for travel, clothing or musical instruments, their choices are heavily influenced by pictures. This raises a question: why is website intelligence still so focused on text and tags when its users clearly aren’t? A solution to this contradiction lies in combining predictive analytics with image recognition, so that the images themselves are analyzed.

The importance of images in e-commerce

Why are people so drawn to images? The answer is threefold. Firstly, images speak to the human brain more quickly than words do. They attract attention because a product or object can be recognized within a second. When visiting a web shop, a consumer can therefore see immediately whether a product matches their taste. This is crucial in the many e-commerce environments where taste plays a substantial role, for example fashion and jewelry.

Secondly, people have a better memory for pictures than they have for textual content.

Thirdly, people are drawn to images because they can more easily picture themselves using or owning the product they’re searching for. The picture makes the product much more tangible. This is worth striving for, because the more easily consumers can imagine themselves using or owning a product, the more value they attach to it. In behavioral economics this is known as the endowment effect, and it is why displaying the right picture is crucial.

Typical use of an e-commerce website

Let’s dive deeper into the online process of buying a pair of shoes. When a consumer clicks through a web shop searching for shoes, the information typically used as input for recommendation engines includes:

Brand: Adidas
Type: Stan Smith
Manufacturing year: 2017
Color: White
Materials used: Leather
Price: €109,-

But what about shape or style? These are critical decision criteria when buying a pair of shoes. It is highly unlikely that consumers would spend over a hundred euros on a pair of shoes based only on a set of generic characteristics, without a clear image of what the product looks like.

Buying shoes is just one example. Other e-commerce environments where images are more important than text include online jewelry stores and web shops for travel. Besides analyzing images for recommendations, another application is displaying the right picture to the right person. Consider the following example: when looking for all-inclusive resorts, different consumers may be interested in different aspects of this type of vacation. While some are attracted to hanging out at the pool all day, others prefer nearby activities such as visiting cultural sites. To draw a visitor’s attention to a product and help a specific customer find the ideal vacation, you should display an image that matches the taste of that customer. This can be achieved by using historical data to analyze the images the customer clicked on, and then displaying images that match that customer’s taste.

These are examples of taste-sensitive industries, where taste is hard to convey clearly in mere text or tags.


Examples of taste-sensitive industries

 
The information about taste is already there in the images accompanying the text, albeit in an unstructured way. What if we could structure this data? What if we could estimate the taste of an individual consumer based solely on pictures?

The human eye, a powerful processor
If we want to accomplish this, it is helpful to understand the functioning of the human eye and how images are interpreted. The basic principle of the functioning of the eye is simple: the eye is an effective light sensor. This becomes clear when there is an absence of light: in a room without any light, the human eye can’t detect its surroundings.

The eye translates light waves into visual sensory information. The next step is processing, which takes place in the brain. The brain processes the visual sensory information using several complex mechanisms. The result is an understanding of what we perceive in terms of size, shape, color and place. All these interpretations are combined into one image, which is what we perceive.

We can conclude that there is a massive amount of information to be processed for perceiving a single image.

 


Simplified working process of the eye (source: www.theeyewearboutique.co.za)
 

How image recognition works

Data scientists can use this understanding of how the human eye works to replicate the process and create algorithms for image recognition.

But how exactly do we transform this complex, unstructured data into structured data that can be analyzed? This step is necessary because mathematical algorithms cannot handle unstructured data, in this case images in their original format (JPEG, PNG). Our data science solution transforms unstructured data into structured data using advanced analytics.

Note: Some data scientists prefer to see images as structured data since, by definition, they are offered to us in a structured way (i.e. the location and color of each pixel are perfectly clear). However, conventional data science techniques cannot capture patterns in image data the way a human eye can.

First, let’s investigate how pixel data is interpreted. All pixels of an image can be analyzed individually, which provides information about color (as RGB values or a hexadecimal color code). On its own, however, this information is not enough to draw conclusions about the image. Therefore, convolution data is used to provide important additional information. Convolution data takes surrounding pixels into account, which makes it possible to detect shape and structure in an image. This divides the image into different parts, for instance the object of interest and the background.
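To make the difference between single-pixel data and convolution data concrete, here is a minimal sketch in plain Python. The 5x5 grayscale image, the edge kernel and the helper names `rgb_to_hex` and `convolve` are assumptions chosen for illustration; they are not the solution described in this article.

```python
# Hypothetical 5x5 grayscale image: a bright object (9) on a dark background (0).
IMAGE = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 0, 0, 0, 0],
]

# A common 3x3 edge-detection kernel, picked here as an assumption.
# It gives 0 on flat areas and a strong response where neighbours differ.
EDGE_KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def rgb_to_hex(r, g, b):
    """Single-pixel information: convert an RGB triple to a hex color code."""
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


def convolve(image, kernel):
    """Slide the kernel over every position where it fits completely.

    The kernel is symmetric, so flipping it (as a strict mathematical
    convolution would) gives the same result.
    """
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    output = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Weighted sum of the pixel and its surrounding pixels.
            total = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(k)
                for b in range(k)
            )
            row.append(total)
        output.append(row)
    return output


if __name__ == "__main__":
    print(rgb_to_hex(255, 255, 255))
    for row in convolve(IMAGE, EDGE_KERNEL):
        print(row)
```

In the convolved output, the value in the middle of the object is 0 because all its neighbours are identical, while positions along the border of the object give large values. That is the sense in which surrounding pixels reveal shape and separate an object from its background.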

Combined, these pixel analysis techniques provide extensive information, but they also demand tremendous computing power. To ensure the recommendation system on a website can operate in real time, the amount of data has to be reduced, lowering the computing power needed while retaining as much predictive power as possible. This not only makes the data usable in real time, but also makes it easier to interpret. The output of each analyzed image is a score, which can be compared to the scores of other images using mathematical algorithms. This comparison makes it possible to group images into clusters. These clusters represent the taste of a consumer.
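The article does not say which reduction or clustering method is used, so the following is only a sketch under stated assumptions: average pooling stands in for the data reduction step, and a nearest-centroid rule with Euclidean distance stands in for the comparison. The image, the cluster names and the centroid values are all hypothetical.

```python
import math


def average_pool(image, size):
    """Reduce an image by replacing each size x size block with its mean value."""
    pooled = []
    for i in range(0, len(image) - size + 1, size):
        for j in range(0, len(image[0]) - size + 1, size):
            block = [image[i + a][j + b] for a in range(size) for b in range(size)]
            pooled.append(sum(block) / len(block))
    return pooled


def distance(u, v):
    """Euclidean distance between two score vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def nearest_cluster(scores, centroids):
    """Return the name of the cluster whose centroid is closest to the scores."""
    return min(centroids, key=lambda name: distance(scores, centroids[name]))


# Hypothetical 4x4 image whose content sits in the top half.
TOP_HEAVY = [
    [8, 8, 8, 8],
    [8, 8, 8, 8],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Hypothetical cluster centroids, e.g. learned earlier from many images.
CENTROIDS = {
    "cluster_top": [9, 9, 1, 1],
    "cluster_left": [9, 1, 9, 1],
}

if __name__ == "__main__":
    scores = average_pool(TOP_HEAVY, 2)  # 16 pixel values reduced to 4 scores
    print(scores, nearest_cluster(scores, CENTROIDS))
```

Here 16 pixel values shrink to 4 scores, which are cheaper to compare in real time. In practice the reduction would be far more sophisticated, but the principle of comparing compact scores rather than raw pixels is the same.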

 


Simplified summary of the image recognition process

Create a digital sales person with image recognition

As stated earlier, people have shifted from brick-and-mortar stores to e-commerce. A downside of this shift is the loss of the personal service a store employee could provide in finding the right product. That service is an iterative process of communicating your taste or style to the sales representative, e.g. for a pair of pants or an all-inclusive holiday. The sales representative can learn your taste either by asking you questions or by looking at the clothes you are wearing when you visit the store. This process is very important in making the right purchase decision.

In an online environment, a similar kind of service is delivered in the form of recommendation systems, which offer help to some degree while you browse a web shop. However, current recommendation systems only use overlapping transaction patterns in historical data to predict the right product. They analyze the generic characteristics of what you have bought or clicked on in the past. Much more powerful and personal recommendations are needed to match the level of a store representative. Image recognition is going to play a major role in achieving this, especially in taste-sensitive industries.

Using image recognition, one’s personal taste can be predicted based on e.g. shape, refinement and the level of detail displayed in a product’s image. It also enables companies to predict which items in a new collection will become the next best-sellers, based on popular taste characteristics of the old collection.
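As a rough illustration of how such a prediction could work, the sketch below averages the image scores of the products a visitor clicked on into a taste profile and ranks the remaining products by cosine similarity to that profile. The catalog, its three scores per product (shape, refinement, level of detail) and the function names are hypothetical, and cosine similarity is an assumed choice rather than the method used in this article.

```python
import math

# Hypothetical image scores per product: (shape, refinement, level of detail),
# each on a scale from 0 to 1.
CATALOG = {
    "sneaker_a": (0.9, 0.2, 0.1),
    "sneaker_b": (0.8, 0.3, 0.2),
    "brogue": (0.2, 0.9, 0.8),
    "loafer": (0.3, 0.8, 0.4),
}


def taste_profile(clicked):
    """Average the image scores of all clicked products into one profile."""
    vectors = [CATALOG[name] for name in clicked]
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def cosine(u, v):
    """Cosine similarity: closer to 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def recommend(clicked, top_n=1):
    """Rank products the visitor has not clicked on by similarity to their taste."""
    profile = taste_profile(clicked)
    candidates = [name for name in CATALOG if name not in clicked]
    ranked = sorted(
        candidates,
        key=lambda name: cosine(profile, CATALOG[name]),
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    print(recommend(["brogue"]))
    print(recommend(["sneaker_a"]))
```

A visitor who clicked on the detailed, refined brogue is shown the loafer first, while a visitor who clicked on a sneaker is shown the other sneaker, even though no brand, color or price tag was used.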

In conclusion, image recognition improves recommendation systems so that they help consumers find the right product. This leads to more cross-selling, faster conversions and higher conversion rates. Just as important, your customers feel that you understand their specific taste. This enables companies to grow their e-commerce revenue and to increase customer satisfaction.