BETA

Similar Products

This feature is part of our Machine Learning APIs that are available in the Google Cloud Regions in Europe and North America.

Find Similar Products in a product catalog.

The Similar Products API uses machine learning to search through a product catalog for Products that are similar to each other. The API allows you to specify what product information the similarity search is based on, and what Products to compare.

With the help of the Similar Products API, you can find related products and detect potential duplicates in your catalog. Removing duplicate data can help improve search engine optimization.

Read our tech blog for insights on how to improve data quality using the Similar Products API.

General concepts

The Similar Products API performs requests asynchronously because the time required to perform the comparisons depends on the number of Products to compare. To manage this delay, the API has an initiation endpoint and a status endpoint.

To make a request, you need to specify two product sets using ProductSetSelectors, define how to measure similarity, and initiate the search. After your initial request, poll the search status endpoint to get the results of the search.
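The initiate-then-poll flow described above can be sketched roughly as follows. This is a minimal illustration, not a client implementation: `run_search`, `send`, `poll_seconds`, and `max_polls` are hypothetical names, and `send` stands in for whatever HTTP client you use. The sketch assumes the initiation response carries the path to poll in `uriPath`, as described under TaskToken.

```python
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
# It is expected to raise if the API answers with an error response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def run_search(send: Send, project_key: str, request_body: Dict[str, Any],
               poll_seconds: float = 5.0, max_polls: int = 120) -> Dict[str, Any]:
    """Initiate a similarity search, then poll until the task reports SUCCESS."""
    token = send("POST", f"/{project_key}/similarities/products", request_body)
    for _ in range(max_polls):
        status = send("GET", token["uriPath"], None)
        if status.get("state") == "SUCCESS":
            return status["result"]
        time.sleep(poll_seconds)  # still PENDING; wait before asking again
    raise TimeoutError(f"task {token['taskId']} did not finish in time")
```

Injecting the transport keeps the loop independent of any particular HTTP library; the polling interval and the upper bound on polls are arbitrary choices that you would tune to the size of your catalog.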

Product set selection

Product set selection is the stage in the processing of this request where product catalogs are filtered down to the Products to be compared. The search compares two sets of Products and you can specify how those sets are selected. The ProductSetSelector object specifies the selection criteria. By default, all Products in a Project will be selected for comparison. Optionally, you can specify a set of Product IDs or a set of Product Type IDs to filter by.

Product set selection examples

Compare all Products in the Project "sunrise" to a specific Product with the ID {product_id} in a Project {projectKey}

...
"productSetSelectors": [
{ "projectKey": "{projectKey}",
"productIds": ["{product_id}"] },
{ "projectKey": "sunrise" }
]
...

Compare all Products with type {product_type_id_1} with all Products with type {product_type_id_2} in the specified Projects

...
"productSetSelectors": [
{ "projectKey": "{projectKey}",
"productTypeIds": ["{product_type_id_1}"] },
{ "projectKey": "{projectKey}",
"productTypeIds": ["{product_type_id_2}"] }
]
...

Compare the staged version of Product {product_id_1} with the current version of Products {product_id_2} and {product_id_3}

...
"productSetSelectors": [
{ "projectKey": "{projectKey}",
"productIds": ["{product_id_1}"],
"staged": true },
{ "projectKey": "{projectKey}",
"productIds": ["{product_id_2}", "{product_id_3}"],
"staged": false }
]
...
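
If you assemble selectors programmatically, a small helper can keep the optional filters out of the payload when they are not needed. The sketch below is one possible approach; `product_set_selector` is a hypothetical helper name, and `my-project`, `id-1`, `id-2`, and `id-3` are placeholder values. The field names are the ones used in the examples above.

```python
from typing import Any, Dict, Iterable, Optional


def product_set_selector(project_key: str,
                         product_ids: Optional[Iterable[str]] = None,
                         product_type_ids: Optional[Iterable[str]] = None,
                         staged: Optional[bool] = None) -> Dict[str, Any]:
    """Build one ProductSetSelector, leaving out filters that were not given."""
    selector: Dict[str, Any] = {"projectKey": project_key}
    if product_ids is not None:
        selector["productIds"] = list(product_ids)
    if product_type_ids is not None:
        selector["productTypeIds"] = list(product_type_ids)
    if staged is not None:
        selector["staged"] = staged
    return selector


# Staged version of one Product compared with the current version of two others.
selectors = [
    product_set_selector("my-project", product_ids=["id-1"], staged=True),
    product_set_selector("my-project", product_ids=["id-2", "id-3"], staged=False),
]
```

Calling the helper with only a Project key yields the default selector, which selects all Products in that Project.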

Product comparisons

The maximum number of allowed Product comparisons is 20 000 000. When a Product is present in both sets, the API skips comparing the Product to itself. Therefore, the total number of comparisons depends on the underlying sets.

For example:
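
As a rough way to check a planned search against this limit, the sketch below estimates the number of comparisons from two sets of Product IDs. It rests on an assumption that is not confirmed here: every pair of one Product from the first set and one from the second counts as a single comparison, and only the self-pairs are skipped. The API may count mirrored pairs only once, so treat the result as an upper-bound estimate. `estimate_comparisons` and `within_limit` are hypothetical helper names.

```python
from typing import Set

MAX_COMPARISONS = 20_000_000  # limit stated in this section


def estimate_comparisons(first: Set[str], second: Set[str]) -> int:
    """Cross-set pairs minus the pairs that would compare a Product to itself.

    Assumption: (a, b) and (b, a) are counted separately when both Products
    appear in both sets; the API may count such mirrored pairs only once.
    """
    return len(first) * len(second) - len(first & second)


def within_limit(first: Set[str], second: Set[str]) -> bool:
    """True if the estimated number of comparisons stays within the limit."""
    return estimate_comparisons(first, second) <= MAX_COMPARISONS
```

Under this assumption, comparing a set of three Products with itself gives 3 × 3 − 3 = 6 comparisons, while comparing one Product with two other Products gives 2.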

Similarity specification

The similarityMeasures attribute defines which aspects of a Product to use to calculate product similarity, and how important each aspect is to the overall similarity score between Products.

You can use the following attributes for comparisons:

  • name
  • description
  • price
  • variantCount
  • attribute

Similarity Measure examples

Compare similarity using name, description, and attribute, where attribute is twice as important to overall similarity as name or description:

...
"similarityMeasures": {
"name": 1,
"description": 1,
"attribute": 2
}
...

Compare similarity based on price and attribute, where both are equally important to overall similarity:

...
"similarityMeasures": {
"price": 1,
"attribute": 1
}
...
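
One way to read these weights is as relative shares of the overall similarity. The sketch below only illustrates that reading; how the API combines the weights internally is not specified here, and `relative_importance` is a hypothetical helper name.

```python
from fractions import Fraction
from typing import Dict


def relative_importance(measures: Dict[str, int]) -> Dict[str, Fraction]:
    """Express each non-zero weight as a share of the sum of all weights."""
    total = sum(measures.values())
    if total == 0:
        # Assumption: with all weights at 0 there is nothing to compare on.
        raise ValueError("at least one weight must be greater than 0")
    return {name: Fraction(weight, total)
            for name, weight in measures.items() if weight > 0}
```

For the first example above, this gives attribute a share of 1/2 and name and description a share of 1/4 each; for the second, price and attribute each get 1/2.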

Representations

ProductSetSelector

A set of ProductData for comparison. If no optional attributes are specified, all current ProductData are selected for comparison. See Product set selection for more details and examples.

Default ProductSetSelector

Compare all Products within a Project.

[{ "projectKey": "{projectKey}" }, { "projectKey": "{projectKey}" }]

SimilarityMeasures

Specify which ProductData attributes to use for estimating similarity and how each attribute is weighted. The attribute's weight has to be a non-negative integer value (whole number greater than or equal to 0). The larger the integer, the higher the attribute's importance when assessing similarity. A value of 0 means that the attribute is not used during comparison. See Similarity Specification for more details and examples.

  • name - Integer - Optional
    Importance of the name attribute in overall similarity. Default: 1.
  • description - Integer - Optional
    Importance of the description attribute in overall similarity. Default: 1.
  • attribute - Integer - Optional
    Importance of the Product Variant's attribute values in overall similarity. Default: 1.
  • variantCount - Integer - Optional
    Importance of the number of Product Variants in overall similarity. Default: 0.
  • price - Integer - Optional
    Importance of the price attribute in overall similarity. Default: 0.

Default SimilarityMeasures

{
"name": 1,
"description": 1,
"attribute": 1
}
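
Before sending a request, it can help to check a SimilarityMeasures object against the rules above on the client side. The following is a minimal sketch under that reading of the rules; `validate_similarity_measures` and `ALLOWED_MEASURES` are hypothetical names, and the API's own validation may differ.

```python
from typing import Dict

# Attribute names listed in the SimilarityMeasures representation.
ALLOWED_MEASURES = {"name", "description", "attribute", "variantCount", "price"}


def validate_similarity_measures(measures: Dict[str, int]) -> Dict[str, int]:
    """Raise ValueError unless every key is known and every weight is an integer >= 0."""
    for key, weight in measures.items():
        if key not in ALLOWED_MEASURES:
            raise ValueError(f"unknown similarity measure: {key!r}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(weight, bool) or not isinstance(weight, int) or weight < 0:
            raise ValueError(f"weight for {key!r} must be a non-negative integer")
    return measures
```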

SimilarProductSearchRequest

SimilarProduct

One part of a SimilarProductPair. Refers to a specific Product Variant.

  • product - Reference to a Product
  • variantId - Integer
    ID of the ProductVariant that was compared.
  • meta - SimilarProductMeta - Optional
    Supplementary information about the data used for similarity estimation. This information helps you understand the estimated confidence score, but it should not be used to identify a Product.

SimilarProductMeta

  • name - LocalizedString - Optional
    Localized Product name used for similarity estimation.
  • description - LocalizedString - Optional
    Localized Product description used for similarity estimation.
  • price - Money - Optional
    The Product Price in cents using the currency defined in SimilarProductSearchRequest. If multiple Prices exist, the median value is taken as a representative amount.
  • variantCount - Integer - Optional
    Total number of Product Variants associated with the Product.

SimilarProductPair

A pair of SimilarProducts.

SimilarProductSearchRequestMeta

Metadata about the search parameters.

TaskStatus

A response wrapper for asynchronous requests.

  • state - TaskStatusState
    The current status of the task.
  • expires - DateTime
    The expiry date of the result. You cannot access the result after the expiry date. Default: 1 day after the result first becomes available. This is only available when the TaskStatus state is SUCCESS.
  • result - Any Type
    The response to an asynchronous request. Only populated when the status is SUCCESS.

TaskStatusState

  • PENDING: The search has started and is awaiting completion, or the task ID does not exist.
  • SUCCESS: The search completed successfully.
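
A client that keeps a TaskStatus around may want to check both the state and the expiry date before using the result. The sketch below shows one way to do that locally; `result_if_available` is a hypothetical helper name, and the `now` parameter exists only to make the check easy to test.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def result_if_available(task_status: Dict[str, Any],
                        now: Optional[datetime] = None) -> Optional[Dict[str, Any]]:
    """Return the result of a TaskStatus, or None while PENDING or once expired."""
    if task_status.get("state") != "SUCCESS":
        return None
    now = now or datetime.now(timezone.utc)
    # Older Python versions do not parse a trailing "Z", so normalise it first.
    expires = datetime.fromisoformat(task_status["expires"].replace("Z", "+00:00"))
    if now >= expires:
        return None
    return task_status["result"]
```

Because PENDING is also reported for task IDs that do not exist, a None return value here does not by itself tell you that a search is still running.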

TaskToken

Represents a URL path to poll to get the results of asynchronous requests.

  • taskId - String
    The ID for the task. Used to find the status of the task.
  • uriPath - String
    The URI path to poll for the status of the task.

Endpoints

Initiation endpoint

Host: one of the Machine Learning hosts.
Endpoint: /{projectKey}/similarities/products
Method: POST
OAuth 2.0 Scopes: view_products:{projectKey}
Response Representation: TaskToken
Request Representation: SimilarProductSearchRequest
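
As an illustration of the initiation request, the sketch below prepares the POST with Python's standard library without sending it. `build_initiation_request` is a hypothetical helper name, and `host` must be one of the Machine Learning hosts; the function does not know or validate them.

```python
import json
import urllib.request
from typing import Any, Dict


def build_initiation_request(host: str, project_key: str, access_token: str,
                             body: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the initiation endpoint."""
    return urllib.request.Request(
        url=f"https://{host}/{project_key}/similarities/products",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body is then expected to be a TaskToken.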

Status endpoint

After a search completes, the status endpoint serves the results for one day. If a search fails, the status endpoint returns an error response.

Host: one of the Machine Learning hosts.
Endpoint: /{projectKey}/similarities/products/status/{task_id}
Method: GET
OAuth 2.0 Scopes: view_products:{projectKey}
Response Representation: TaskStatus of a PagedQueryResult with results containing an array of SimilarProductPairs and the meta information of SimilarProductSearchRequestMeta. The SimilarProductPairs are sorted by confidence score in descending order.
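
Because the pairs arrive sorted by confidence in descending order, a client looking for likely duplicates can stop reading at the first pair that falls below a chosen cut-off. The sketch below illustrates this; `pairs_at_least` is a hypothetical helper name, and the threshold is a value you would pick yourself, not one defined by the API.

```python
from itertools import takewhile
from typing import Any, Dict, List


def pairs_at_least(pairs: List[Dict[str, Any]], threshold: float) -> List[Dict[str, Any]]:
    """Keep the leading pairs whose confidence is at or above the threshold.

    Relies on the documented ordering (descending confidence), so it stops
    at the first pair below the threshold instead of scanning the whole list.
    """
    return list(takewhile(lambda pair: pair["confidence"] >= threshold, pairs))
```

Since the results are paged, this only covers the pairs on the page you pass in.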

Examples

Find Similar Products within a Project between Products in two different Product Types

curl -X POST https://ml-{mlRegion}.europe-west1.gcp.commercetools.com/{projectKey}/similarities/products \
  -H "Content-Type: application/json" \
  -H 'Authorization: Bearer {access_token}' \
  -d \
'
{
  "limit": 3,
  "similarityMeasures": {
    "name": 1
  },
  "productSetSelectors": [
    {
      "projectKey": "{projectKey}",
      "productTypeIds": ["8b50b0b0-8091-8e32-4601-948a8b504606"],
      "staged": true
    },
    {
      "projectKey": "{projectKey}",
      "productTypeIds": ["46068292-4a41-4601-948a-948a8b508b50"],
      "staged": true
    }
  ]
}
'

Example Task Token Response

{
  "taskId": "078b4eb3-8e29-1276-45b1-8964cf118707",
  "location": "/{projectKey}/similarities/products/078b4eb3-8e29-1276-45b1-8964cf118707"
}

Poll for the result

curl -sH 'Authorization: Bearer {access_token}' https://ml-{mlRegion}.europe-west1.gcp.commercetools.com/{projectKey}/similarities/products/078b4eb3-8e29-1276-45b1-8964cf118707

Example Response

{
  "result": {
    "count": 3,
    "limit": 3,
    "offset": 0,
    "total": 15,
    "meta": {
      "productSet": 6,
      "similarityMeasures": {
        "name": 1
      }
    },
    "results": [
      {
        "confidence": 0.68427,
        "products": [
          {
            "product": {
              "id": "b0b08091-8e32-4601-948a-8b504606d3ac",
              "typeId": "product"
            },
            "variantId": 1,
            "meta": {
              "name": {
                "en": "White T-Shirt | Commercetools Hackathon Edition | Available in S/M/L"
              }
            }
          },
          {
            "product": {
              "id": "46014606-8b50-4606-8292-4a414601948a",
              "typeId": "product"
            },
            "variantId": 1,
            "meta": {
              "name": {
                "en": "Limited edition of the Commercetools T-Shirt - White Color - Now on Sale!"
              }
            }
          }
        ]
      },
      "... other similar products ..."
    ]
  },
  "state": "SUCCESS",
  "expires": "2019-08-10T17:08:51.244390Z"
}
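
To work with a response like the one above, it is often convenient to flatten each pair into a simple row. The sketch below is one possible way to do so; `pair_rows` is a hypothetical helper name, and it assumes that every entry in `results` is a pair object with exactly two products, so the elided placeholder entry in the example would have to be left out.

```python
from typing import Any, Dict, List, Tuple


def pair_rows(result: Dict[str, Any]) -> List[Tuple[float, str, str]]:
    """Flatten each SimilarProductPair into (confidence, first ID, second ID)."""
    rows = []
    for pair in result["results"]:
        first, second = pair["products"]  # assumes exactly two products per pair
        rows.append((pair["confidence"],
                     first["product"]["id"],
                     second["product"]["id"]))
    return rows
```

The Product IDs, together with `variantId` if you need it, are what identify the compared Products; the `meta` block is supplementary and should not be used for identification.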