Import API best practices
We recommend the following best practices when implementing the Import API.
Using Import Containers effectively
Organizing Import Containers
It is entirely up to you how to organize your Import Containers.
As a general recommendation, use more Import Containers that each contain less data, rather than fewer containers with more data. However, the best approach may differ based on your use case and organization/monitoring needs:
- When importing full data sets, creating a new dedicated Import Container for the task can help in distinguishing imports performed on different occasions.
- When performing routine data imports, reusing a dedicated Import Container may be better.
- When importing data from multiple sources, using an Import Container for each source can help in organizing and monitoring progress.
In these three use cases, Import Containers are organized by resource type, reused for recurring import activities, or organized by data source.
Use Case | Possible Import Container organization |
---|---|
Import Product and Category | Create separate containers for Product and Category |
Import Price changes daily at 5 PM | Create a reusable container. If there are more than 200 000 imports per day, split them by some other business logic, or use a temporary container for the excess. |
Import Product changes from multiple sources | Create one container per source for Product imports. |
Optimizing performance
To achieve the best performance with the Import API, we recommend having fewer than 200 000 Import Operations per Import Container. This way, monitoring activities at the container level will not be costly.
As Import Operations are automatically deleted 48 hours after they are created, you can reuse Import Containers over time. An example schedule is as follows:
Day | Import Operation total count | Import Containers (Import Operation count) |
---|---|---|
Day 1 | 100 000 | container-a (100 000) |
Day 2 | 500 000 | container-a (100 000), container-b (200 000), container-c (200 000) |
Day 3 | 400 000 | container-a (0), container-b (200 000), container-c (200 000) |
Day 4 | 200 000 | container-a (200 000), container-b (0), container-c (0) |
On Day 3, container-a will be empty, as all of its Import Operations have reached the 48-hour lifetime and have been deleted. container-a is now ready to be reused. Similarly, container-b and container-c can be reused from Day 4.
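The reuse rule above can be sketched as a small check, assuming you track when each Import Operation was created (the function name and timestamps are illustrative):

```python
from datetime import datetime, timedelta, timezone

OPERATION_LIFETIME = timedelta(hours=48)

def is_container_reusable(operation_timestamps: list[datetime],
                          now: datetime) -> bool:
    """A container is empty, and safe to reuse, once every Import
    Operation in it has passed the 48-hour lifetime."""
    return all(now - ts >= OPERATION_LIFETIME for ts in operation_timestamps)

day1 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
ops_in_container_a = [day1]          # created on Day 1

day3 = day1 + timedelta(hours=49)    # Day 3: past the 48-hour lifetime
print(is_container_reusable(ops_in_container_a, day3))  # True
```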
Limits
You can have any number of Import Containers.
There are no limits on Import Operations and Import Requests per Import Container, but we recommend keeping fewer than 200 000 Import Operations per Import Container, especially if you will be querying these containers.
Cleaning up data from Import Containers and removing unused Import Containers
You do not need to clean up Import Containers as Import Operations are automatically deleted 48 hours after they are created.
You can delete Import Containers at your own convenience. This will immediately delete all Import Operations in the container. However, data that has been imported to your Project will not be affected.
Sending Import Requests to Import Containers
The batch size is limited to 20 resources per Import Request. If you have a large number of resources to import, we recommend sending requests in parallel to deliver your data to an Import Container as quickly as possible.
Please note that the asynchronous import process starts as soon as the first Import Request is received by the Import Container.
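The 20-resource batch limit can be handled with a simple chunking helper. The `{"type": ..., "resources": [...]}` payload shape below is a sketch — verify the exact Import Request body against the API reference:

```python
from typing import Iterator

MAX_BATCH_SIZE = 20  # resources per Import Request

def to_import_requests(resources: list[dict], resource_type: str) -> Iterator[dict]:
    """Split resources into Import Request payloads of at most 20 items."""
    for i in range(0, len(resources), MAX_BATCH_SIZE):
        yield {"type": resource_type,
               "resources": resources[i:i + MAX_BATCH_SIZE]}

categories = [{"key": f"category-{n}"} for n in range(45)]
batches = list(to_import_requests(categories, "category"))
print([len(b["resources"]) for b in batches])  # [20, 20, 5]
```

The requests themselves can then be sent concurrently, since the import process is asynchronous anyway.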
Choosing the right Product import endpoint
When importing ProductDraft, Product, or ProductVariant data to update an existing resource, you must include existing values for fields or they will be removed when the data is imported.
The Import API has multiple endpoints for importing Product data. The following table summarizes what can be imported by each endpoint.
Endpoint | What can be imported |
---|---|
ProductDraft | Product data including Product Variants and Prices. |
Product | Product data without Product Variants and Prices. |
ProductVariant | Product Variant data without Prices. |
ProductVariantPatch | Product Variant Attribute data. |
EmbeddedPrice | Price data for a specific Product Variant. |
The following information explains common use cases for each endpoint.
ProductDraft
Use ProductDraftImport for large payloads that include complete sets of Product data including Product Variants and Embedded Prices. Effective use of this endpoint can remove the need to call three separate endpoints (Products, Product Variants, and Embedded Prices).
Common use cases:
- when you want to update a Product with 10+ new Product Variants, each with 20+ Embedded Prices.
- when you want to update a large number of Product Variants with new Attribute values and Prices.
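As a rough illustration, a single ProductDraftImport resource bundling Product Variants and Embedded Prices into one payload might be assembled like this (the field names approximate the ProductDraftImport shape; check the API reference before relying on them):

```python
def product_draft_import(key: str, name_en: str, product_type_key: str,
                         variants: list[dict]) -> dict:
    """Assemble one ProductDraftImport resource carrying variants and
    embedded prices in a single payload (field names are illustrative)."""
    return {
        "key": key,
        "name": {"en": name_en},
        "productType": {"typeId": "product-type", "key": product_type_key},
        "variants": variants,
    }

draft = product_draft_import(
    key="classic-shirt",
    name_en="Classic Shirt",
    product_type_key="clothing",
    variants=[{
        "sku": "SHIRT-S",
        "prices": [{"value": {"type": "centPrecision",
                              "currencyCode": "EUR", "centAmount": 1999}}],
    }],
)
print(len(draft["variants"]))  # 1
```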
Product
Use ProductImport to create or update Products without Product Variant or Price data. As ProductImport does not import any ProductVariant or Price data, it will result in better performance due to the smaller payload.
ProductVariant
Use ProductVariantImport to create or update Product Variants without Price data.
ProductVariantPatch
Use ProductVariantPatchImport to update the Attributes of existing Product Variants.
EmbeddedPrice
Use EmbeddedPriceImport to update the Price data of existing Product Variants.
Managing the published state of Products
Both ProductImport and ProductDraftImport have the field `publish`, which accepts a boolean value. The result of using this field varies based on whether you are importing data to create or update Products.
When importing data to create a Product
If `true`, the Product is created and published immediately to the current projection. If `false`, the Product is created but not published.
When importing data to update an existing Product
The result of updating existing Products depends on if you are including changes in your import request, and if the Product currently has staged changes.
Value of publish | Does the import request have changes? | Does the Product have staged changes before importing? | Result |
---|---|---|---|
true | No | Yes | The staged changes are applied to the current projection and the Product is published. |
true | No | No | If the Product is currently unpublished, it is published to the current projection. |
true | Yes | N/A | The changes are applied to both the current and staged projections. |
false | No | N/A | The Product is unpublished. |
false | Yes | N/A | The changes are applied to the staged projection, the Product is unpublished, and `hasStagedChanges` becomes `true`. |
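The update rules in the table above can be expressed as a small decision function (a sketch of the documented behavior, not an API call):

```python
def update_result(publish: bool, request_has_changes: bool,
                  has_staged_changes: bool) -> str:
    """Encode the publish behavior for imports that update an existing Product."""
    if publish:
        if request_has_changes:
            return "changes applied to current and staged projections"
        if has_staged_changes:
            return "staged changes published to current projection"
        return "Product published if currently unpublished"
    if request_has_changes:
        return "changes staged; Product unpublished; hasStagedChanges=true"
    return "Product unpublished"

# No changes in the request, but staged changes exist: publishing applies them.
print(update_result(publish=True, request_has_changes=False,
                    has_staged_changes=True))
```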
Monitoring the import progress
Two of our monitoring endpoints, Import Summary and Query Import Operations, are container-based. Call the Import Summary endpoint for a quick overview, and (later) Query Import Operations to fetch details.
- Import Summary can be used to get an aggregated progress summary, which tells you whether you have any Import Operations with errors, or in `unresolved` or `completed` states.
- Query Import Operations can be used with filters, such as states, to query specific situations. For example, fetch all Import Operations with errors to fix them, or fetch `unresolved` ones to resolve them.
- You can use debug mode to fetch the unresolved references of Import Operations in the `unresolved` state. This way, you know which references to resolve.
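A monitoring routine might inspect the aggregated summary first and only query individual operations when something needs attention. The `{"states": {...}}` response shape below is an assumption — confirm the actual fields in the Import Summary reference:

```python
def needs_attention(summary: dict) -> bool:
    """Return True if the aggregated summary reports operations that
    need action (unresolved references or rejected operations).
    The {"states": {...}} shape is an assumption about the response."""
    states = summary.get("states", {})
    return states.get("unresolved", 0) > 0 or states.get("rejected", 0) > 0

summary = {"states": {"completed": 950, "unresolved": 50}}
print(needs_attention(summary))  # True: query the unresolved operations next
```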
Utilizing the 48 hours lifetime of Import Operations
Import Operations are kept for 48 hours to allow you to send other referenced data (unresolved references) during this time period.
For example, suppose one of your teams is responsible for Product imports, but business validation usually delays the Product import by 1-2 days, while another team imports Prices much faster. In this case, the Import API keeps the Price data for up to 48 hours and waits for the Product to be imported.
Importing large data sets
You can import as much data as you need using the Import API, keeping in mind the best practices for rate limits and payload size per request. For more information on optimization, see optimizing performance.
Check the API limits to ensure that your import requests do not exceed the resource limitations of your Project.
Rate limits
Currently, there are no rate limits. However, to ensure the best performance, we recommend sending no more than 300 API calls per second, per Project, to the Import API.
For example, if you send 300 Import Category requests per second, and each CategoryImportRequest contains 20 CategoryImport items, this means that 360 000 Categories can be sent to the Import API every minute.
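The arithmetic behind this example:

```python
calls_per_second = 300   # recommended maximum per Project
items_per_request = 20   # maximum batch size per Import Request
categories_per_minute = calls_per_second * items_per_request * 60
print(categories_per_minute)  # 360000
```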
Handling retries
You only need to retry if your Import Operation has the `rejected` status. In other cases, the Import API handles the retry internally without you needing to do anything.
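A retry filter therefore only needs to look at the rejected status (the `state` field name on the Import Operation is assumed here; check the API reference):

```python
def operations_to_retry(operations: list[dict]) -> list[dict]:
    """Only rejected Import Operations need a client-side retry; the
    Import API retries other transient failures internally."""
    return [op for op in operations if op.get("state") == "rejected"]

ops = [
    {"resourceKey": "price-1", "state": "rejected"},
    {"resourceKey": "price-2", "state": "processing"},
]
print(operations_to_retry(ops))  # only price-1
```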
What not to do
- Do not send duplicate import requests concurrently. Since the Import API imports data asynchronously, the order is not guaranteed. It may also lead to a concurrent modification error.
- In case of errors, do not query Import Operations or the Import Summary endpoint frequently without fixing the problems, as this may slow down the import process. For assistance in debugging issues or errors, query the Import Operation and consult its `errors` field.
Avoiding concurrency errors
When importing Product Variant Patches, provide the reference to the Product that contains the Product Variant.
Providing a value for the `product` field in the ProductVariantPatch minimizes concurrency errors during the import process.
If you set the `product` field on one ProductVariantPatch, you must set it for every ProductVariantPatch in the same ProductVariantPatchRequest.
Otherwise, the API returns an InvalidField error.
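A client-side pre-check for this all-or-none rule might look like the following sketch (payload keys are illustrative):

```python
def validate_patch_request(patches: list[dict]) -> None:
    """Enforce the all-or-none rule: if any ProductVariantPatch sets the
    product reference, every patch in the request must set it."""
    with_product = sum(1 for p in patches if "product" in p)
    if 0 < with_product < len(patches):
        raise ValueError(
            "Set the product reference on every ProductVariantPatch or on none"
        )

# Mixed usage would be rejected by the API with an InvalidField error:
patches = [
    {"sku": "SHIRT-S", "product": {"typeId": "product", "key": "shirt"}},
    {"sku": "SHIRT-M"},  # missing the product reference
]
try:
    validate_patch_request(patches)
except ValueError as e:
    print(e)
```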