Case

AllPrices
Price aggregator

Service type:: Agile development
Product type:: Price aggregator

Product development in a highly competitive market

E-commerce was one of the first large niches for IT products, so competition in this industry is high. In this case, we built a strong product in this domain, solved its hardest technical problem, and learned firsthand how difficult product development is in a highly competitive market. The result of this cooperation was a price aggregator that worked with online stores on the CPA (cost per action) model. AllPrices collected goods from many online stores and showed them to users in response to search queries. The product analyzed a vast quantity of goods and helped its users find the best offers.


While searching for an effective business model, we made several significant product pivots and kept the product working throughout. We handled all development tasks for our client in the two essential stages of the project: the Pre-Seed stage, when we launched the product in the fashion industry, and the Seed stage, when the product had to cover all product categories. As a result of the Pre-Seed stage, the product was built and the client received a seed investment. With this money, the product started competing with companies that had a larger capitalization than our client. Our client studied the market in depth, and we solved all the technological tasks while the business model was being tested. The product was started in 2016 and officially closed in 2021, and our client gave us the rights to publish the working process in detail. That is why this case contains a lot of information that is rarely made public.

Pre-seed stage (2016)

In 2016, the e-commerce market for fashion in Eastern Europe was fragmented: many sellers offered quite different products. The idea of the first version of AllPrices was simple: collect all fashion products on one site, build a user-friendly structure of categories and filters, and display real discounts for all products. At this stage, the main traffic channel was organic search, and low-frequency queries played a significant role. Our client saw an opportunity to mass-produce pages with relevant products for such queries and thus get substantial traffic.

Low-frequency queries are search queries that bring little traffic: up to 1000, or even only up to 100, visits per month. Many websites do not pay enough attention to promotion for such queries because developing high-quality pages for them is often too expensive.

We decided that this idea could work well in the fashion industry, which has many products. People often search with a very specific query, for example, a sneaker brand, a scarf color, or the text on a T-shirt. On the first page of the search results, they see pages of individual stores, which do not contain all the relevant products and are not always relevant to the query. A page with goods from all online stores for that exact query would therefore serve the visitor well: it was likely to lead the user to the desired purchase.

Besides convenient search across all stores in the niche, the product had to offer the user fair discounts. We tracked product prices every day, so we could evaluate how the current price differed from the usual one. Such a discount was honest, unlike in many stores, which could simply put a discount badge on a product whose price had been the same for the previous 30 days.

Product goals

The goals we formulated with our client were more about technology than the product itself. The main task at this stage was to prove that such an IT product could be built during the Pre-Seed stage. The goals were:

To build a technology that could, 1-4 times per day, collect 1M goods from 100+ stores and determine for each product which of the 100K+ characteristics it has. The data sources are XML files with each store's goods, constantly updated by the sellers. These files contain names, descriptions, URLs of the product webpages, and other data. The sellers allowed us to scrape these product URLs to enrich our database with more data.
To develop a technology for building 100K+ landing pages for the most popular search queries in the fashion niche. Each page should contain at least 30 search-relevant products that were available in the store within the last 24 hours.
To provide a good user experience through a quality product card, product price history, and retailers' promo codes.

Technical stack

For our client, the choice of development stack was a key question. The client had product development experience from his job at Cuponation (one of the significant players in this market), where he worked closely with developers. The specialists who advised the client recommended the following stack:

Yii2 – a PHP framework that was popular among web developers in Eastern Europe at that time.

PostgreSQL – one of the two most popular open-source RDBMSs.

Memcached – an in-memory key-value store for caching page data.

Redis – for caching the results of intermediate calculations.

We studied the task in depth and noticed several product risks, so we proposed adjustments to the stack. Yii2 suited us well because at that time it was the framework best suited to agile development, and we already knew it. PostgreSQL was also a solid choice for an open-source RDBMS. We decided that Memcached and Redis were not essential to the product, although they could be used when necessary.

We noticed that computing the categories and properties of goods was computationally expensive. We therefore proposed a dedicated solution in C++ that takes the data collected from the sources as input and writes the category and property data directly to the database. We also needed to collect data from more than 1M pages to enrich the data set, so we suggested a dedicated Python service to collect this data and import it into our DB. The client endorsed our ideas.

Goods normalization algorithm

Developing the product characterization process was a hard task. We aimed to determine 100K+ characteristics for 1M+ products, so manual work was out of the question. We developed a system that determined automatically whether an item had each characteristic. Its inputs were raw product data from the XML dumps and data from the HTML pages. Simply put, these were "Description", "Material", "Size", and other fields with numeric or string values. We developed a simple expression language that defined the criteria for a product to have a given characteristic. We called such expressions normalization rules. They looked something like this:

((Name with prefix “плат”) OR (Description with prefix “плаття”)) AND

NOT ((Name with prefix “пляжн”) OR (Description with prefix “пляжн”))

Sundresses (сарафани) are sometimes named “beach dresses” (“пляжні плаття”). They have to be in the “сарафани” category: if a name has both the prefix “плат” and the prefix “пляжн”, the product is a сарафан (sundress), not a плаття (dress). We could generate such expressions automatically for most characteristics. The content managers also refined these expressions manually for all categories and the most popular characteristics.
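A normalization rule of this kind can be sketched as a small tree of composable predicates. The Python below is only an illustration under stated assumptions: the production evaluator was written in C++, and the helper names (has_prefix, any_of, all_of, negate) are hypothetical. It also assumes that "with prefix" means "some word in the field starts with the prefix", which is what the beach-dress example suggests.

```python
from typing import Callable, Dict

Product = Dict[str, str]
Rule = Callable[[Product], bool]


def has_prefix(field: str, prefix: str) -> Rule:
    """Rule: some word of the given field starts with the prefix (case-insensitive)."""
    prefix = prefix.lower()

    def check(product: Product) -> bool:
        words = product.get(field, "").lower().split()
        return any(word.startswith(prefix) for word in words)

    return check


def any_of(*rules: Rule) -> Rule:
    """Logical OR over rules."""
    return lambda product: any(rule(product) for rule in rules)


def all_of(*rules: Rule) -> Rule:
    """Logical AND over rules."""
    return lambda product: all(rule(product) for rule in rules)


def negate(rule: Rule) -> Rule:
    """Logical NOT of a rule."""
    return lambda product: not rule(product)


# The example rule from the text: a dress, unless it is a beach dress.
is_dress = all_of(
    any_of(has_prefix("Name", "плат"), has_prefix("Description", "плаття")),
    negate(any_of(has_prefix("Name", "пляжн"), has_prefix("Description", "пляжн"))),
)

# Hypothetical sample products.
evening_dress = {"Name": "Плаття вечірнє", "Description": "Довге плаття з вишивкою"}
beach_dress = {"Name": "Пляжне плаття", "Description": "Легкий сарафан"}

print(is_dress(evening_dress), is_dress(beach_dress))
```

Keeping rules as data-like compositions of a few primitives is what makes it plausible to generate most of them automatically and let content managers refine the rest by hand.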

Bulk generation of SEO texts

The next significant task was developing the admin panel to manage the structure of all the site's pages. The category tree was a fairly typical task. Creating quality filter pages, however, was a task for which we found no publicly available information. The main difficulty was the automatic generation of SEO texts from templates, since manual content development for 100K+ pages was not possible.

We had to create landing pages for categories like "sneakers" and "dresses," properties like "leather" and "embroidered," and the general characteristics "female," "male," and "for children." Such pages would be titled "children's red sneakers" or "evening dresses with embroidery."

Example of template:

{name_plural_nominative} from {min_price} to {max_price} UAH {Low|Favorable|Attractive} prices {for the purchase|for} {product_count} {name_plural_genitive} in {store_count} {online stores|stores}

Examples of generated texts:

Dresses from 500 to 50 000 UAH. Attractive prices for the purchase of 12 544 dresses in 105 online stores

Sneakers from 2500 to 20 000 UAH. Attractive prices for 1 244 sneakers in 105 online stores

Necklaces from 200 to 200 000 UAH. Attractive prices for the purchase of 4 025 necklaces in 44 online stores

A particular difficulty was that we developed the product for languages of the Balto-Slavic branch, where adjectives take different forms depending on the noun's grammatical gender. It was also important to keep the correct word order during content generation. Some characteristics had to come before the noun, like "red sneakers," and some after, like "sneakers with high heels." To make mass generation of such pages possible, content managers had to specify all the necessary word forms of product properties and categories. The algorithm then chose the proper word form to build the grammatically correct SEO texts that successful page promotion required.
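The template mechanism can be illustrated with a short Python sketch. It is not the original implementation (the site itself ran on PHP), and the names render and format_number are hypothetical. The sketch assumes that a slot in braces is either a variable name, such as a word form supplied by a content manager, or a list of interchangeable phrases separated by "|".

```python
import random
import re
from typing import Dict

# A slot is any brace-delimited fragment without nested braces.
SLOT = re.compile(r"\{([^{}]+)\}")

TEMPLATE = (
    "{name_plural_nominative} from {min_price} to {max_price} UAH "
    "{Low|Favorable|Attractive} prices {for the purchase|for} "
    "{product_count} {name_plural_genitive} in {store_count} {online stores|stores}"
)


def format_number(value: int) -> str:
    """Group digits with spaces, e.g. 12544 -> '12 544'."""
    return f"{value:,}".replace(",", " ")


def render(template: str, values: Dict[str, str], rng: random.Random) -> str:
    """Fill variable slots from `values`; pick one option for '|' slots."""

    def substitute(match):
        body = match.group(1)
        if body in values:
            return values[body]
        # Not a known variable: treat the slot as a list of alternatives.
        return rng.choice(body.split("|"))

    return SLOT.sub(substitute, template)


# Word forms and numbers for one category (illustrative values).
values = {
    "name_plural_nominative": "Dresses",
    "name_plural_genitive": "dresses",
    "min_price": format_number(500),
    "max_price": format_number(50000),
    "product_count": format_number(12544),
    "store_count": format_number(105),
}

print(render(TEMPLATE, values, random.Random()))
```

Storing each grammatical form under its own key (nominative, genitive, and so on) is one way to let the template pick the form that fits the sentence; adjective gender and word order could be handled with additional keys in the same dictionary.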

Product development

After agreeing on the business requirements, we started work. We decided to split the development into several stages so that we could check how well the goals were met upon completion of each of them:

Data collection module, xml-parser

The solution had to process thousands of XML feeds every day, and each file could contain up to a million products. We decided that the solution should correctly process up to 5M products. For each product in the XML feeds, our module also had to fetch and parse the product page on the store's website daily. We then stored all the structured data in the database.
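Feeds of this size are usually read as a stream rather than loaded whole. The sketch below shows the idea with Python's standard xml.etree.ElementTree.iterparse. The feed layout (offer elements with name, price, and url children) is an assumption made for illustration; real store feeds differed, which is why parsing rules were configured per store.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator


def iter_offers(source: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per <offer> element without building the whole tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "offer":
            continue
        offer = {"id": elem.get("id", "")}
        for child in elem:
            offer[child.tag] = (child.text or "").strip()
        yield offer
        # Discard the offer's children and text once it has been handed over.
        elem.clear()


# Hypothetical two-item feed.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<shop>
  <offers>
    <offer id="101">
      <name>Red sneakers</name>
      <price>2500</price>
      <url>https://shop.example/p/101</url>
    </offer>
    <offer id="102">
      <name>Evening dress</name>
      <price>1800</price>
      <url>https://shop.example/p/102</url>
    </offer>
  </offers>
</shop>"""

offers = list(iter_offers(io.BytesIO(SAMPLE_FEED)))
print(offers)
```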

Collecting data from webpages was a strong feature. XML feeds did not always contain complete and up-to-date information, whereas the pages on the stores' websites were expected to be current, so all the necessary data was there. The best example of such data is the item's price on the site. By constantly collecting prices from web pages, we had an accurate price history for each product, which allowed us to calculate discounts based on actual prices.
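A discount based on price history can be sketched as follows. The exact formula used in AllPrices is not described here, so this example makes its own assumptions: the "usual" price is taken as the median of the prices observed over the previous 30 days, and the function name honest_discount is hypothetical.

```python
from datetime import date, timedelta
from statistics import median
from typing import List, Optional, Tuple

PricePoint = Tuple[date, float]  # (day of observation, price on the product page)


def honest_discount(
    history: List[PricePoint], today: date, window_days: int = 30
) -> Optional[float]:
    """Discount of today's price against the usual price, as a fraction.

    Returns None when there is no price for today or no earlier observations
    inside the window. The result is never negative.
    """
    window_start = today - timedelta(days=window_days)
    current = [price for day, price in history if day == today]
    previous = [price for day, price in history if window_start <= day < today]
    if not current or not previous:
        return None
    usual = median(previous)  # assumption: median stands in for the usual price
    if usual <= 0:
        return None
    return max(0.0, (usual - current[-1]) / usual)


today = date(2016, 10, 31)
month = [(today - timedelta(days=n), 1000.0) for n in range(1, 31)]

# A "discount" badge on a price that has not changed for 30 days is worth nothing.
print(honest_discount(month + [(today, 1000.0)], today))
# A real price drop shows up as a real discount.
print(honest_discount(month + [(today, 800.0)], today))
```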

The administrator's panel, admin

It had to contain interfaces for managing data sources, the category tree, filters, blog pages, and other entities. A content manager had to be able to add a new store and set rules for parsing its XML files and the HTML code of its pages. They could also specify rules that determine whether a product falls into a particular category or filter. In addition, we needed features for managing the SEO content of pages: manual editing of data for specific pages and mass generation of content from a set of templates.

Data normalizer, core

This module read the data collected by xml-parser from the database and determined the categories and filters for each product, according to the settings that content managers defined in the admin panel. After that, core saved the data in the database in a form suitable for the web application.

Technically, the task was as follows. We had about 10^7 texts (data from product fields). For each text, we needed to find all occurrences of the strings from the normalization rules as substrings or prefixes. There were about 10^6 such strings. The best fit for this problem was the Aho–Corasick algorithm, the classic algorithm for finding all occurrences of a dictionary of strings in a given text. On the biggest real data set of the Pre-Seed stage, this algorithm completed the task in less than two hours, against a planned limit of 12 hours.
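For readers unfamiliar with the algorithm, here is a compact Python sketch of Aho–Corasick. It is an illustration only: the production module was written in C++, and the class and method names are our own for this example. The automaton is built once from all rule strings, after which every text is scanned in a single pass regardless of how many strings are in the dictionary.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


class AhoCorasick:
    """Finds all occurrences of a set of patterns in a text in one pass."""

    def __init__(self, patterns: Iterable[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie edges per node
        self.fail: List[int] = [0]              # failure link per node
        self.out: List[List[str]] = [[]]        # patterns ending at node
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern: str) -> None:
        node = 0
        for ch in pattern:
            nxt = self.goto[node].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[node][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            node = nxt
        self.out[node].append(pattern)

    def _build_failure_links(self) -> None:
        # Breadth-first: children of the root keep failure link 0.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]

    def find(self, text: str) -> List[Tuple[int, str]]:
        """Return (start index, pattern) for every occurrence in text."""
        matches = []
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for pattern in self.out[node]:
                matches.append((i - len(pattern) + 1, pattern))
        return matches


def is_word_prefix(text: str, start: int) -> bool:
    """True if the match at `start` begins a word rather than sitting mid-word."""
    return start == 0 or not text[start - 1].isalnum()


automaton = AhoCorasick(["плат", "пляжн", "сарафан"])
name = "пляжне плаття"
hits = automaton.find(name)
print([(start, p) for start, p in hits if is_word_prefix(name, start)])
```

The start index of each match is what distinguishes a prefix hit from a plain substring hit, so one scan can serve both kinds of rule conditions.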

The website

It had to contain category and filter pages, product pages, and blog publications. Since search engine optimization was one of the most critical tasks, we achieved a high page load speed: 99% of pages loaded in no more than 200 ms.

Everything went smoothly: we solved all the technical difficulties and successfully released the product. By the end of 2016, active development stopped because the product had everything essential for testing the business model.

At its peak, the product collected more than 100K visits per month and generated more than 1000 sales with an average check of about $25.
The strategy with low-frequency queries worked: most of the sales came from them. The tools for mass detection of categories and properties and for SEO text generation did an excellent job.
By the end of this stage, the product was ready to raise its limits about tenfold.

With this product, the client took part in many conferences and pitched to many investors. Meanwhile, the product continued to generate income and validate its business model.

Seed stage (2018)

In the second half of 2018, our client raised a Seed round from a strategic investor. The client defined the task of this stage of product development as covering the entire e-commerce market that works on the CPA model. This direction opened up new technological challenges for the product. Here are the most interesting of them:

The previous product version worked with fashion goods, which were technically similar and had the same properties. Now we had to work with all product groups, so we had to provide a different set of properties for each of them.
We had to combine identical items into a unified card, determining that offers on the websites of different online stores represented one product.
The new product should increase user loyalty to the service. For this purpose, we wanted to let users register and subscribe to a mailing list with the best deals in the categories and filters that interest them.
Multiple filtering of the product list, with a calculation of the number of products remaining after each filter is applied. The main difficulty was that many filter combinations matched many items in the database. Because of this, it was impossible to calculate on the fly how many products would be available after applying each additional filter, since that would significantly increase the server response time.
Those filter combinations with many items also affected sorting. Items can be sorted on the fly if there are only a few of them, but for large sets it is worth pre-calculating the order of goods for each sorting option to respond to queries quickly.

Figure: comparative infographic of the size of the fashion niche relative to all of e-commerce, 2015

Our client faced an equally challenging task. While Google as the main traffic channel was fine for a Pre-Seed MVP, the product needed scalable traffic channels to keep growing. In addition, the product required significant growth of the store base, and agreements with some stores were not easy to reach. As a result, we could take up only a minimum of the founder's time in the product development process.

The tasks of the new product version were not entirely new to us, and we had a clear plan for achieving our goals, so we implemented all of the product improvements successfully. Let's look at the most challenging tasks: building an abstract product card structure, aggregating offers into one product card, and pre-calculating product sets for each filter.

The previous solution was a background routine that regularly processed all the data from the sources and prepared the data for fast API response. The overall scheme was as follows:

Collected goods data was stored in the database in a flexible JSON structure.

The Core loads the offer data from the database in batches and unpacks the JSON objects into RAM for processing.

The Core collects offers into product cards, and defines properties for each offer and product. For example, RAM: 128GB, Material: leather, Sleeves: loose.

Finally, the Core saves normalized goods and offers data to the database.

We extended this solution as follows. As we processed the goods, we began to maintain all the filter combinations with relevant offers in RAM. For each combination, we stored data needed for sorting. Thus, after processing all the goods in the category, we had all the data we needed to calculate the filter statistics. Since we had a list of product IDs for each combination, we computed the order of goods for each "sort by" option. We also optimized the DB schema and created appropriate indexes, improving response time for resource-intensive API requests.
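The pre-calculation described above can be sketched in Python as follows. This is a simplified illustration, not the Core's actual code: the function names, the offer fields (id, price, discount, properties), and the cap on the number of simultaneously applied filters are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, FrozenSet, List, Tuple

Filter = Tuple[str, str]   # (property name, value), e.g. ("Material", "leather")
Combo = FrozenSet[Filter]  # a set of filters applied together


def build_filter_index(
    offers: List[Dict[str, Any]], max_filters: int = 2
) -> Dict[Combo, Dict[str, Any]]:
    """For every filter combination, precompute its size and its sort orders."""
    members: Dict[Combo, List[Dict[str, Any]]] = defaultdict(list)
    for offer in offers:
        filters = sorted(offer["properties"].items())
        # Register the offer under every combination of its own properties.
        for size in range(max_filters + 1):
            for chosen in combinations(filters, size):
                members[frozenset(chosen)].append(offer)

    index: Dict[Combo, Dict[str, Any]] = {}
    for combo, matched in members.items():
        index[combo] = {
            "count": len(matched),
            "price_asc": [o["id"] for o in sorted(matched, key=lambda o: o["price"])],
            "discount_desc": [
                o["id"] for o in sorted(matched, key=lambda o: -o["discount"])
            ],
        }
    return index


def count_with_extra_filter(
    index: Dict[Combo, Dict[str, Any]], current: Combo, extra: Filter
) -> int:
    """Number of items left if `extra` is applied on top of `current`."""
    entry = index.get(current | {extra})
    return entry["count"] if entry else 0


# Hypothetical offers in one category.
offers = [
    {"id": 1, "price": 2500, "discount": 0.1,
     "properties": {"Color": "red", "Material": "leather"}},
    {"id": 2, "price": 1800, "discount": 0.3,
     "properties": {"Color": "red", "Material": "textile"}},
    {"id": 3, "price": 3000, "discount": 0.0,
     "properties": {"Color": "black", "Material": "leather"}},
]
index = build_filter_index(offers)
red = frozenset({("Color", "red")})
print(index[red]["count"], index[red]["price_asc"])
print(count_with_extra_filter(index, red, ("Material", "leather")))
```

With such an index, showing the count next to each not-yet-applied filter and returning a sorted page both become lookups instead of queries over the full offer set.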

We approached all the tasks with a similar decision-making process. After the new functionality was implemented, our client still had several unresolved business issues. We learned about them only after the plan had been carried out, because the client had decided to manage the product concept themselves for reasons of commercial confidentiality.

Our client's marketing strategy did not deliver the expected results in the first months after the release of the new product. The second problem was the quality of the content, which could no longer be generated as easily as in the first version, and the planned budget for content managers was insufficient. So the client suggested that we make several improvements to the product that would allow it to reach its potential.

CPC monetization

Some online stores were ready to pay for the product on the CPC (cost per click) model, that is, whenever a lead visited their website. These were mostly small shops with exclusive products.

We developed a web application for such stores, where they could top up their balance and track the generated traffic. We also changed the ranking algorithm so that CPC products would get more reach: we ranked products by the cost-per-click value that the store set in its account.

AllPrices White-label

Our client's sales outreach created an opportunity for a product pivot. Some websites with a large audience wanted a white-label revenue-sharing arrangement with AllPrices. Under this model, we would integrate our product into their resource in such a way that users would get to the product pages without leaving the partner's site.

To do that, we had to adapt the front-end application so that it looked like the partner's website. We also made a number of security enhancements to the API so that only verified partners had access to it and the load from one client did not cause problems for the main service or other partners.
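The specific mechanisms behind these enhancements are not detailed here. One common way to meet both requirements is to combine API keys with a separate rate limit per partner, and the Python sketch below illustrates that idea with a token bucket. Everything in it (class names, keys, limits) is hypothetical and is not a description of the AllPrices API.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows short bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill according to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PartnerGate:
    """Admits only known API keys and throttles each partner independently."""

    def __init__(self, api_keys: Dict[str, str], rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.partners = api_keys  # API key -> partner name
        self.buckets = {
            name: TokenBucket(rate, capacity, clock) for name in api_keys.values()
        }

    def check(self, api_key: str) -> str:
        partner = self.partners.get(api_key)
        if partner is None:
            return "forbidden"
        return "ok" if self.buckets[partner].allow() else "throttled"


gate = PartnerGate({"key-a": "partner-a", "key-b": "partner-b"}, rate=5.0, capacity=10)
print(gate.check("key-a"), gate.check("unknown-key"))
```

Because each partner has its own bucket, a traffic spike from one partner exhausts only that partner's budget and leaves the others unaffected.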

Restarting of SEO strategy

In the summer of 2019, we learned from our client that search traffic metrics were not as expected. The client decided to turn to another company for a second opinion. This time, Olshansky&Partners advised us. As part of this work, we changed the URL structure on many pages, changed the construction of snippets for search engines, and changed the semantic structures in the DOM tree of landing pages.

The active stage of cooperation ended in the third quarter of 2019. The client told us that the project had not confirmed the business hypotheses about attracting an audience, so the work had to be suspended for an indefinite period. Technologically, the product solved all its tasks, but it proved impossible to build an effective business model. The main reason for this, our client believes, is the difficulty of building an effective marketing strategy. This project gave us significant food for thought. Our main conclusion was that we want to work on products as a whole, not on their parts. Only in this way can we meaningfully share responsibility for the future of products with our customers.

The first stage of product development was an absolute success. The product proved its profitability and sustainability, and our client's company attracted the next investment round. The second version gave our client, and partly us, experience in understanding such business models and the principles of building products for them.

We have developed solid solutions and learned a lot about the market. A company from a related domain in the UAE has bought this product. The purchase's primary purpose is to learn and use our product's experience in their projects.

At the end of 2021, we advised the new owner of the product on the solutions we had developed, which helped him understand the product market much faster and more deeply.
