Batch processing of events

A code-optimization strategy for avoiding chatty event topics.

If your shop has a trigger event that fires incessantly (looking at you, products/update) due to some combination of high-volume operations, many tasks subscribing to the same event topic, and tasks that update the same object types they listen to update events for, then the affected tasks are likely good candidates for batch-processing optimization.


Since the aforementioned products/update event is typically the primary culprit in many shops, this technique concentrates on mitigating its impact. That said, the same approach can be applied to any other object type that generates an excessive number of update events.

Original automation criteria:

  • Task should listen for product creation

  • Task should listen for product updates

  • Task will update the product in some way

  • ~10 thousand active products

  • ~500 orders / day

  • A 3rd party integration updates the products in bulk at irregular schedules

    • These updates often include information relevant to this automation

A single task developed against these criteria would very likely have no problem efficiently processing the expected volume of products/update events generated by sales, the 3rd party integration, and its own updates to products (provided that techniques like Preventing action loops and Writing a high-quality task are adhered to).
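To illustrate the action-loop prevention piece, here is a minimal sketch in which the task only acts when a product actually needs the change, so its own products/update events never trigger further updates. The tag name and the use of the tagsAdd mutation are arbitrary choices for this sketch; note that product.tags is a comma-separated string in the event payload, so contains is a simple substring check here.

```liquid
{% comment %}
  Sketch only: add a tag to a product, but skip the action entirely when the
  tag is already present. Because no update is performed for already-tagged
  products, the task does not react to its own products/update events.
{% endcomment %}

{% assign desired_tag = "reviewed" %}

{% unless product.tags contains desired_tag %}
  {% action "shopify" %}
    mutation {
      tagsAdd(
        id: {{ product.admin_graphql_api_id | json }}
        tags: [{{ desired_tag | json }}]
      ) {
        userErrors { field message }
      }
    }
  {% endaction %}
{% endunless %}
```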

Over time, though, more tasks may be added that operate with similar criteria, the sales volume hopefully increases (❤️), the product catalog might expand, and additional apps and integrations will likely be making their own product updates.

This can lead to the dreaded jammed queue.


Some good questions to answer before refactoring a task:

  • What level of immediacy is actually needed by this task for processing updated products? (i.e. what is the longest acceptable interval between scheduled task runs?)

  • How many products on average would be updated in this interval?

If a task can get away with a daily scheduled run to process all recently updated products, then using bulk operations might be a good idea. With this approach, the task could optionally continue listening to products/create if that is useful (i.e. if this specific task would do useful work on a newly created product).
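As a rough sketch of that daily-plus-bulk-operations shape, assuming subscriptions to mechanic/scheduler/daily and mechanic/shopify/bulk_operation, and assuming Mechanic populates bulkOperation.objects with the finished result set (the field selection and the roughly one-day window are arbitrary choices for this example):

```liquid
{% if event.topic == "mechanic/scheduler/daily" %}
  {% comment %} Everything updated since the start of yesterday, roughly the last day. {% endcomment %}
  {% assign since = "now" | date: "%s" | minus: 86400 | date: "%Y-%m-%d" %}

  {% capture bulk_query %}
    {
      products(query: "updated_at:>='{{ since }}'") {
        edges {
          node {
            id
            title
            tags
          }
        }
      }
    }
  {% endcapture %}

  {% action "shopify" %}
    mutation {
      bulkOperationRunQuery(query: {{ bulk_query | json }}) {
        bulkOperation { id status }
        userErrors { field message }
      }
    }
  {% endaction %}

{% elsif event.topic == "mechanic/shopify/bulk_operation" %}
  {% comment %} The finished result set is delivered to this run. {% endcomment %}
  {% for product in bulkOperation.objects %}
    {% comment %} ...process each recently updated product... {% endcomment %}
  {% endfor %}
{% endif %}
```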

Task scenario

Let's instead assume that a high level of immediacy is desired for this exercise, and go with the most frequent 10-minute scheduler option. With this approach there generally isn't a need to include a products/create subscription, given the frequency of scheduled task runs.

Below is how a skeleton task might look for this scenario prior to refactoring. Note that this task already has a manually triggered event that includes paginated querying of up to 25 thousand products. Manually triggered events are typically used for initial task setup, or when massive bulk changes are expected across the product catalog (and the automation will proactively be disabled for that duration 😉).
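As a rough sketch of that shape, assuming subscriptions to shopify/products/create, shopify/products/update, and mechanic/user/trigger (the 100-page loop of 250 products each is what caps a manual run at roughly 25 thousand products):

```liquid
{% if event.topic contains "shopify/products/" %}
  {% comment %} A single created or updated product arrives as {{ product }}. {% endcomment %}
  {% comment %} ...process this one product... {% endcomment %}

{% elsif event.topic == "mechanic/user/trigger" %}
  {% assign cursor = nil %}

  {% for n in (1..100) %}
    {% capture query %}
      query {
        products(
          first: 250
          after: {{ cursor | json }}
          sortKey: ID
        ) {
          pageInfo { hasNextPage endCursor }
          nodes { id title tags }
        }
      }
    {% endcapture %}

    {% assign result = query | shopify %}

    {% for product in result.data.products.nodes %}
      {% comment %} ...process each product... {% endcomment %}
    {% endfor %}

    {% if result.data.products.pageInfo.hasNextPage %}
      {% assign cursor = result.data.products.pageInfo.endCursor %}
    {% else %}
      {% break %}
    {% endif %}
  {% endfor %}
{% endif %}
```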

Refactoring the task

To convert the above task code to a much more queue-friendly version, the following steps would be taken:

  • Remove the products/create and products/update listener block

  • Convert the manual trigger block to an if statement and add a contains check for any mechanic/scheduler event

  • Add Mechanic cache checking and setting using the last task run time

  • (Optionally) Add task configuration to allow manual runs to process all active products, or some subset larger than the number updated in a typical interval


The refactored task is below; it includes an option to query all products on manual runs. Importantly, when that option is enabled, this task will not schedule events, in order to avoid potential race conditions between task runs.
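As a rough sketch of that refactored shape, assuming an option named process_all_products_on_manual_runs__boolean, a cache key named last_processed_at, and a status:active search filter for full runs (a production task would also paginate results as in the skeleton above, and the exact timestamp format accepted by Shopify's search syntax is worth double-checking):

```liquid
{% if event.topic contains "mechanic/scheduler" or event.topic == "mechanic/user/trigger" %}
  {% assign run_started_at = "now" | date: "%Y-%m-%dT%H:%M:%S%z" %}

  {% if event.topic == "mechanic/user/trigger" and options.process_all_products_on_manual_runs__boolean %}
    {% comment %} Manual full run: every active product, regardless of update time. {% endcomment %}
    {% assign search_query = "status:active" %}
  {% elsif cache.last_processed_at %}
    {% comment %} Normal run: only products updated since the last recorded run. {% endcomment %}
    {% assign search_query = "updated_at:>='" | append: cache.last_processed_at | append: "'" %}
  {% else %}
    {% comment %} First run: no cached timestamp yet, so process everything once. {% endcomment %}
    {% assign search_query = "status:active" %}
  {% endif %}

  {% capture query %}
    query {
      products(first: 250, query: {{ search_query | json }}) {
        pageInfo { hasNextPage endCursor }
        nodes { id title tags updatedAt }
      }
    }
  {% endcapture %}

  {% assign result = query | shopify %}

  {% for product in result.data.products.nodes %}
    {% comment %} ...process each product... {% endcomment %}
  {% endfor %}

  {% action "cache" %}
    {
      "set": {
        "key": "last_processed_at",
        "value": {{ run_started_at | json }}
      }
    }
  {% endaction %}
{% endif %}
```

Recording the run's start time, rather than its end time, means a product updated while a run is in progress should still be picked up by the next run.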
