Error handling through timeouts and retries

Best practices for handling errors from Composable Commerce with timeouts and retries.

To build resilient applications on commercetools Composable Commerce, it's essential to configure timeouts and retries correctly. This guide explains timeout and retry principles, outlines best practices, and recommends timeout values for common use cases of the Composable Commerce APIs.

Why timeouts and retries matter

Network latency, transient failures, and service disruptions are inevitable. Without proper timeout and retry mechanisms, applications are vulnerable to errors and disruptions, leading to degraded performance and poor user experience. Implementing robust timeout and retry strategies ensures that applications can gracefully handle these challenges and continue functioning effectively.

By implementing the strategies recommended in this guide, you can:

  • Prevent operations from hanging indefinitely, which can lead to resource exhaustion.
  • Improve user experience by failing fast and enabling error recovery or retry behavior.
  • Reduce the impact of intermittent network issues and temporary service slowdowns.
  • Improve fault tolerance in distributed systems and asynchronous workflows.
  • Maintain responsiveness by retrying after a designated timeout instead of waiting indefinitely.

Applications should avoid indefinite waits by setting a maximum timeout and initiating retries when responses exceed it. This ensures responsiveness and reduces user-perceived latency.

Assumptions and context

This guidance assumes your application design follows our performance tips. Those cover aspects like concurrency, API request planning, and caching of API responses, but this guide focuses on the strategies you can implement in the HTTP client itself.
We highly recommend building your applications with our SDKs, which provide configuration parameters for timeouts and retries, for example the Java SDK and the TypeScript SDK.
We also assume that you continuously log and monitor your application's API usage.
This document recommends best practices for the APIs that directly manage resources in a Composable Commerce Project. If your application uses the Import API, see the dedicated page for its best practices. If you also orchestrate third-party APIs, check their recommendations and take their latencies into account as well.

GET requests

While response times are consistently low, a small number of outlier requests can sometimes exceed the typical response time. If complex queries consistently take longer than 2 seconds, review them and apply our performance tips.

Recommended timeout values and retry policies:

  • Resource lookups by ID or key: 1 second, retry with exponential backoff.
  • Resource queries with a predicate: up to 5 seconds, retry immediately on transient failures.
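Applying a per-request timeout such as 1 second for lookups or 5 seconds for queries means giving each HTTP call a hard deadline. The following is a minimal TypeScript sketch, assuming a runtime with the standard `AbortController` (for example Node.js 18+ or a browser). `withTimeout` and `TimeoutError` are hypothetical names; in practice, the timeout settings of our SDKs are usually the better choice.

```typescript
// Hypothetical error type raised when the deadline passes.
class TimeoutError extends Error {
  readonly timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`Request timed out after ${timeoutMs} ms`);
    this.name = "TimeoutError";
    this.timeoutMs = timeoutMs;
  }
}

// Runs `operation` with a hard deadline. The operation receives an
// AbortSignal so it can cancel the underlying request, for example by
// passing it to fetch(url, { signal }).
async function withTimeout<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new TimeoutError(timeoutMs)); // settle the race first...
      controller.abort(); // ...then cancel the in-flight request
    }, timeoutMs);
  });
  try {
    // Whichever settles first wins: the response or the deadline.
    return await Promise.race([operation(controller.signal), deadline]);
  } finally {
    clearTimeout(timer); // do not keep the timer alive after a fast response
  }
}
```

For a lookup by ID or key you might call `withTimeout((signal) => fetch(url, { signal }), 1000)` and hand a `TimeoutError` to your retry logic. Aborting on the client only stops waiting; it does not undo a request the server has already received.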

POST requests

POST requests on most Composable Commerce APIs lead to write operations (create or update) on resources in your Project and involve integrity checks and potentially further validations before the new state is persisted.

POST requests to our Search APIs are read methods and should be treated like GET requests in your timeout and retry configuration.

Recommended timeout values for creating and updating:

  • simple resources, like Inventory Entry and Standalone Price: 2 - 5 seconds
  • complex resources, such as Carts, Orders, and Products: 5 - 10 seconds

If your POST request is forwarded to an API Extension, take the extension's latency into account. Consider an example in which a Cart update action triggers an API Extension that calls a Payment Service Provider, and the p99 latency of that call, including transfer, is 8 seconds. If you would give the update action alone a 5-second timeout, a timeout of 13 seconds (5 + 8) is recommended to keep these requests robust in the rare case of slower performance.
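The timeout budget for a request that triggers an API Extension is a simple sum, which can be captured in a small helper. This is a hedged sketch: `postTimeoutWithExtensionMs` is a hypothetical name, both inputs are latencies you measure yourself, and the 60-second ceiling is taken from the summary recommendation of this guide. With 5 seconds for the update and an 8-second extension p99, it yields 13 seconds.

```typescript
// Upper bound for any client timeout, following this guide's summary.
const MAX_TIMEOUT_MS = 60_000;

// Timeout for a POST that triggers an API Extension: the time you would
// allow the update action alone plus the extension's p99 latency
// (including transfer), capped at MAX_TIMEOUT_MS.
function postTimeoutWithExtensionMs(
  updateTimeoutMs: number,
  extensionP99Ms: number,
): number {
  return Math.min(updateTimeoutMs + extensionP99Ms, MAX_TIMEOUT_MS);
}

// Cart update calling a Payment Service Provider: 5 s + 8 s.
const cartUpdateTimeoutMs = postTimeoutWithExtensionMs(5_000, 8_000);
console.log(cartUpdateTimeoutMs); // 13000
```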

Choose values based on the slowest performance observed in production, plus some additional overhead. Make sure your application accounts for possible duplicate writes or version conflicts, and retry update requests only if the update is still required. Keep in mind that a request can still succeed after the client has timed out.
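Because a create request can still succeed after the client has timed out, blindly resending it risks a duplicate write. One way to guard against this, sketched below under the assumption that every resource you create carries a unique `key`, is to look the resource up by key before resending. `createIfAbsent`, the `Resource` type, and the `create`, `findByKey`, and `isTimeout` parameters are hypothetical hooks for your own API calls, not SDK functions.

```typescript
// Minimal shape of a created resource (illustrative).
type Resource = { id: string; key: string; version: number };

async function createIfAbsent(
  key: string,
  create: (key: string) => Promise<Resource>,
  findByKey: (key: string) => Promise<Resource | undefined>,
  isTimeout: (error: unknown) => boolean,
): Promise<Resource> {
  try {
    return await create(key);
  } catch (error) {
    // Only timeouts leave us unsure whether the write happened.
    if (!isTimeout(error)) throw error;
    // The first request may have been persisted after we stopped waiting,
    // so check for the resource before sending the create again.
    const existing = await findByKey(key);
    return existing ?? (await create(key));
  }
}
```

Relying on a unique key has a second benefit: if the first request is still in flight when the lookup runs, the resent create should be rejected for its duplicate key rather than produce a second resource.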

Retry policies

Your application must gracefully handle 502 Bad Gateway and 503 Service Unavailable errors. Enable retries on network errors and on selected 5xx status codes. Implement a backoff strategy that progressively increases the delay between retries, and load test your application to evaluate your retry strategy.
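Deciding what to retry can be isolated in a single predicate. The sketch below treats timeouts, network errors, 502, and 503 as retryable and everything else as final. The `AttemptOutcome` type and `isRetryable` function are hypothetical; which additional 5xx codes belong in the set is a decision for your own application.

```typescript
// Status codes treated as transient. 502 and 503 come from the guidance
// above; extend this set only with codes you have verified are safe to retry.
const RETRYABLE_STATUS_CODES: ReadonlySet<number> = new Set([502, 503]);

// Hypothetical summary of how a single request attempt ended.
type AttemptOutcome =
  | { kind: "response"; status: number }
  | { kind: "timeout" }
  | { kind: "networkError"; cause: unknown };

function isRetryable(outcome: AttemptOutcome): boolean {
  switch (outcome.kind) {
    case "timeout": // no answer within the configured timeout
    case "networkError": // e.g. connection reset or DNS failure
      return true;
    case "response":
      // 4xx errors such as 400 or 409 need dedicated handling, not a blind retry.
      return RETRYABLE_STATUS_CODES.has(outcome.status);
  }
}
```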

Exponential backoff

Wait before sending the next retry. The recommended default delay is 200 ms. A good practice is to implement exponential backoff, which gradually increases the delay between successive retry attempts to reduce network congestion and increase resilience during transient failures.
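Exponential backoff starting from the recommended 200 ms doubles the delay on every attempt: 200 ms, 400 ms, 800 ms, and so on. In the minimal sketch below, `backoffDelayMs` is a hypothetical name, and both the 5-second cap and the optional "full jitter" randomization are assumptions added for illustration, not part of the guidance above.

```typescript
// Delay before retry number `attempt` (0 for the first retry).
function backoffDelayMs(
  attempt: number,
  initialDelayMs = 200, // recommended default from this guide
  maxDelayMs = 5_000, // assumed cap so delays do not grow without bound
  jitter = false, // assumed option: randomize to spread out clients
): number {
  const exponential = Math.min(maxDelayMs, initialDelayMs * 2 ** attempt);
  // Full jitter picks a delay in [0, exponential) so that many clients
  // failing at the same moment do not retry in lockstep.
  return jitter ? Math.random() * exponential : exponential;
}
```

Without jitter, attempts 0 to 5 wait 200, 400, 800, 1600, 3200, and 5000 ms (the last one capped).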

Concurrent Modification error

If a 409 ConcurrentModification error is returned on a retry, fetch the newest state of the resource and retry only if the update is still required. If the expected version in the error message is the same as the actual version, a previous request is still in progress. In this case, wait 1 second before fetching the resource to get its updated version.
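The decision flow for a ConcurrentModification error can be sketched as follows. This is a hedged illustration: `resolveConcurrentModification` and all of its function parameters (`fetchLatest`, `stillRequired`, `applyUpdate`, `sleep`) are hypothetical stand-ins for your own code, and extracting the expected and actual versions from the 409 error is assumed to happen before this function is called.

```typescript
// Any resource with an optimistic-concurrency version.
type Versioned = { version: number };

async function resolveConcurrentModification<T extends Versioned>(
  expectedVersion: number,
  actualVersion: number,
  fetchLatest: () => Promise<T>,
  stillRequired: (latest: T) => boolean,
  applyUpdate: (latest: T) => Promise<T>, // should send latest.version
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  if (expectedVersion === actualVersion) {
    // Same versions: an earlier request is still being processed,
    // so give it a second before reading the resource again.
    await sleep(1_000);
  }
  const latest = await fetchLatest();
  // Only send the update again if it has not already taken effect.
  return stillRequired(latest) ? applyUpdate(latest) : latest;
}
```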

Maximum retry attempts

Define a maximum number of retry attempts to avoid indefinite retries, ensuring that the application gracefully handles persistent failures. A good default is 5 retries for the same error event, though 3 retries is sufficient for most use cases. Use exponential backoff delays before retrying the next request.
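Putting the pieces together, a bounded retry loop sends the request, retries only retryable failures, waits with exponential backoff, and gives up after the configured number of retries. In this sketch, `retryWithBackoff`, `RetryOptions`, and the injected `send`, `shouldRetry`, and `sleep` hooks are hypothetical; injecting `sleep` keeps the loop testable without real waiting.

```typescript
type RetryOptions = {
  maxRetries: number; // e.g. 3 for most use cases, up to 5
  initialDelayMs: number; // e.g. 200
  sleep: (ms: number) => Promise<void>;
};

// Real sleep for production use; tests can inject a fake instead.
const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  send: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  { maxRetries, initialDelayMs, sleep }: RetryOptions,
): Promise<T> {
  for (let retry = 0; ; retry++) {
    try {
      return await send();
    } catch (error) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (!shouldRetry(error) || retry >= maxRetries) throw error;
      // 200 ms, 400 ms, 800 ms, ... with initialDelayMs = 200.
      await sleep(initialDelayMs * 2 ** retry);
    }
  }
}
```

With `maxRetries: 3`, a request that keeps failing is attempted four times in total (one initial attempt plus three retries) before the last error is rethrown.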

Logging and monitoring

Effective timeout and retry strategies require visibility into their behavior. Integrate logging and monitoring of timeout and retry events into your application.

Use this telemetry to refine timeout values and retry policies over time.
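As an illustration of feeding telemetry back into timeout values, the sketch below records per-endpoint latencies, logs requests that were retried or timed out, and suggests a timeout from a high percentile plus overhead. `RetryTelemetry`, the `RequestEvent` shape, the endpoint labels, the p99 default, and the 50% overhead are all assumptions; in production you would send the same data to your own logging and monitoring system.

```typescript
// Hypothetical record of one logical request (including its retries).
type RequestEvent = {
  endpoint: string; // e.g. "GET /carts/{id}"
  latencyMs: number;
  timedOut: boolean;
  retries: number;
};

class RetryTelemetry {
  private readonly events: RequestEvent[] = [];

  record(event: RequestEvent): void {
    this.events.push(event);
    if (event.timedOut || event.retries > 0) {
      // Structured log line for retried or timed-out requests.
      console.warn(JSON.stringify({ msg: "request retried or timed out", ...event }));
    }
  }

  // Nearest-rank percentile of the observed latencies for one endpoint.
  percentileMs(endpoint: string, p: number): number | undefined {
    const sorted = this.events
      .filter((e) => e.endpoint === endpoint)
      .map((e) => e.latencyMs)
      .sort((a, b) => a - b);
    if (sorted.length === 0) return undefined;
    const rank = Math.ceil((p * sorted.length) / 100);
    return sorted[Math.max(0, rank - 1)];
  }

  // Suggested timeout: a high percentile plus a safety margin.
  suggestTimeoutMs(endpoint: string, p = 99, overhead = 0.5): number | undefined {
    const observed = this.percentileMs(endpoint, p);
    return observed === undefined ? undefined : Math.ceil(observed * (1 + overhead));
  }
}
```

Note that the latency recorded for a timed-out request is capped at the timeout itself, so a high timeout rate means the suggestion underestimates the real tail latency.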

Summary

Timeouts and retries are essential tools for building robust integrations with Composable Commerce. To ensure the highest reliability:

  • Use short, targeted timeouts wherever possible.
  • Do not set timeouts longer than 60 seconds.
  • Avoid applying the same timeout globally. Configure timeouts based on use case and system architecture.
  • Use exponential backoff delays before retrying the request.
  • Define a maximum number of retry attempts.
  • Monitor and log all retry and timeout behavior for observability and tuning.
  • Adjust retry behavior based on observed performance in production.

Request type                        Timeout     Retry strategy
GET a small resource by identifier  1 second    Retry with exponential backoff
GET a filtered collection           5 seconds   Retry immediately, without delay
POST (create/update)                10 seconds  Retry with exponential backoff (if applicable)
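The summary table maps naturally onto configuration, so that each call site picks its timeout and retry behavior by request type instead of sharing one global value. The request-type names, the `RequestPolicy` shape, and `retryDelayMs` below are illustrative assumptions, not part of any SDK.

```typescript
type RequestType = "getById" | "getFilteredCollection" | "postCreateOrUpdate";
type RetryStrategy = "exponentialBackoff" | "immediate";

interface RequestPolicy {
  timeoutMs: number;
  retry: RetryStrategy;
}

// One policy per row of the summary table.
const POLICIES: Record<RequestType, RequestPolicy> = {
  getById: { timeoutMs: 1_000, retry: "exponentialBackoff" },
  getFilteredCollection: { timeoutMs: 5_000, retry: "immediate" },
  // Retry writes only if the update is still required.
  postCreateOrUpdate: { timeoutMs: 10_000, retry: "exponentialBackoff" },
};

// Delay before retry number `attempt` (0-based), using the 200 ms default.
function retryDelayMs(type: RequestType, attempt: number, initialDelayMs = 200): number {
  return POLICIES[type].retry === "immediate" ? 0 : initialDelayMs * 2 ** attempt;
}
```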

By adopting these practices, your application can tolerate latency, stay responsive, and recover from failures more reliably.