Designing a Fast, Scalable, and Fault-Tolerant Create Order Process

@fakhrulnugroho · December 6, 2025

When users click “Create Order”, they expect one thing: speed. But behind that single click often lies a chain of database writes and external API calls that quietly slow everything down. This article shows how a simple event-driven approach can turn a slow, fragile order flow into one that is fast, scalable, and resilient, without overengineering.

🎯 “Why is my create order so slow?”

This is a classic question that comes up as a system starts to grow. On closer inspection, that seemingly simple create-order flow often hides a lot of work behind the scenes:

Save the order to the database
Call a logistics API to generate a tracking number (AWB)
Call a WMS (Warehouse Management System) API to request shipment

It looks like just three steps. But when executed synchronously, users can end up waiting 3–10 seconds. Even worse, if an external API is slow or down, the entire process can fail.

This is the point where many developers realize something important:

“Not every process has to finish before the user gets a response.”

The solution? Event-Driven Architecture.

🌟 Why Event-Driven Architecture Is a Great Fit

Because it allows us to clearly separate:

Lightweight steps → synchronous
Heavy steps → asynchronous

The result?

Fast user responses
Stable servers
External APIs no longer blocking critical requests

🧩 Architectural Design: Separate Fast Steps from Heavy Steps

Let’s look at the architecture at a high level.

🔄 End-to-End Flow: From Click to Warehouse

1. User clicks “Create Order”

The backend receives the request.

2. The system does only what truly matters

Validate input data
Persist the order to the database
Publish the ORDER_CREATED event

Processing time: very fast, typically around 50–150 ms, because no external API is called on this path

3. The user immediately gets a response

“Order successfully created.”
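Steps 1 to 3 can be sketched as a single handler. This is a minimal illustration, not a definitive implementation: `InMemoryOrderStore`, `InMemoryEventBus`, the validation rules, and the response shape are all hypothetical stand-ins for a real database, a real message broker, and your own API contract.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryOrderStore:
    """Hypothetical stand-in for the orders table."""
    rows: dict = field(default_factory=dict)

    def save(self, order: dict) -> None:
        self.rows[order["id"]] = order


@dataclass
class InMemoryEventBus:
    """Hypothetical stand-in for a message broker."""
    published: list = field(default_factory=list)

    def publish(self, event_type: str, payload: dict) -> None:
        self.published.append((event_type, payload))


def create_order(request: dict, store: InMemoryOrderStore, bus: InMemoryEventBus) -> dict:
    # 1. Validate input data (these rules are illustrative only).
    if not request.get("customer_id") or not request.get("items"):
        return {"status": 400, "message": "customer_id and items are required"}

    # 2. Persist the order.
    order = {
        "id": str(uuid.uuid4()),
        "customer_id": request["customer_id"],
        "items": request["items"],
        "status": "CREATED",
    }
    store.save(order)

    # 3. Publish ORDER_CREATED. No external API is called on this path.
    bus.publish("ORDER_CREATED", {"order_id": order["id"]})

    # 4. Respond immediately; the AWB and WMS work happens later.
    return {"status": 201, "message": "Order successfully created.", "order_id": order["id"]}
```

Note that saving the order and publishing the event are two separate writes here. If the process crashes between them, the event is lost, which is the gap the outbox pattern is designed to close.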

4. Background workers handle the rest (asynchronously)

🎯 Logistics Processor

Listens to the ORDER_CREATED event
Calls the logistics API
Generates the tracking number (AWB)
Updates the database
Publishes the AWB_GENERATED event

📦 WMS Processor

Listens to the AWB_GENERATED event
Calls the WMS API
Updates the order status (e.g., Ready for Fulfillment)

All of this happens in the background. The user never has to wait.
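The chain of events above can be sketched with a tiny in-memory bus. Everything here is an assumption made for illustration: the `EventBus` class, the `fake_logistics_api` and `fake_wms_api` functions, and the status names stand in for a real broker and real vendor calls.

```python
from collections import defaultdict, deque


class EventBus:
    """Tiny in-memory bus; a real system would use a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def drain(self):
        # Deliver queued events until none are left, including events
        # that handlers publish while we are draining.
        while self.queue:
            event_type, payload = self.queue.popleft()
            for handler in self.handlers[event_type]:
                handler(payload)


orders = {"o-1": {"status": "CREATED", "awb": None}}
bus = EventBus()


def fake_logistics_api(order_id):
    return f"AWB-{order_id}"  # hypothetical vendor response


def fake_wms_api(order_id, awb):
    return True  # hypothetical: the WMS accepted the shipment request


def logistics_processor(payload):
    order_id = payload["order_id"]
    awb = fake_logistics_api(order_id)
    orders[order_id].update(status="AWB_GENERATED", awb=awb)
    bus.publish("AWB_GENERATED", {"order_id": order_id, "awb": awb})


def wms_processor(payload):
    if fake_wms_api(payload["order_id"], payload["awb"]):
        orders[payload["order_id"]]["status"] = "READY_FOR_FULFILLMENT"


bus.subscribe("ORDER_CREATED", logistics_processor)
bus.subscribe("AWB_GENERATED", wms_processor)

bus.publish("ORDER_CREATED", {"order_id": "o-1"})
bus.drain()
print(orders["o-1"])  # {'status': 'READY_FOR_FULFILLMENT', 'awb': 'AWB-o-1'}
```

The two processors never call each other. They only share event names, so either one can be scaled, replaced, or paused without touching the other.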

🚀 Benefits of This Architecture (and Why Big Companies Use It)

✔ Much better user experience

Requests are no longer blocked by external APIs.

✔ Decoupled from vendor reliability

If the logistics API is down → the order is still saved, and the worker retries later.

✔ Horizontally scalable

Traffic increases? Just add more workers.

✔ Fault-tolerant by design

Queues, retries, dead letter queues (DLQs), and idempotency keep the system safe.

🧱 Critical Pieces to Make This Production-Ready

1️⃣ Idempotency

Events can be delivered more than once. Workers can crash. Networks can time out.

You must ensure:

Tracking numbers are not generated twice
WMS requests are not sent twice
Order status is checked before processing

Common approaches:

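status columns (skip work the order has already completed)
event log tables (record which event IDs have been processed)
the outbox pattern (write the order and its event in one transaction so the event is published reliably; consumers still need to deduplicate)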
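As an illustration of the status-column approach, here is a minimal sketch using SQLite. The table layout, the intermediate `AWB_REQUESTED` status, and `fake_generate_awb` are assumptions made for this example. The key idea is that a conditional `UPDATE` lets exactly one delivery of the event claim the order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT, awb TEXT)")
conn.execute("INSERT INTO orders VALUES ('o-1', 'CREATED', NULL)")
conn.commit()

vendor_calls = []  # records every call to the (fake) logistics API


def fake_generate_awb(order_id):
    vendor_calls.append(order_id)
    return f"AWB-{order_id}"


def handle_order_created(conn, order_id):
    """Return True if this delivery did the work, False if it was a duplicate."""
    # Claim the order: only one delivery can move it out of CREATED.
    cur = conn.execute(
        "UPDATE orders SET status = 'AWB_REQUESTED' WHERE id = ? AND status = 'CREATED'",
        (order_id,),
    )
    claimed = cur.rowcount == 1
    conn.commit()
    if not claimed:
        return False  # already claimed or finished by an earlier delivery

    awb = fake_generate_awb(order_id)
    conn.execute(
        "UPDATE orders SET status = 'AWB_GENERATED', awb = ? WHERE id = ?",
        (awb, order_id),
    )
    conn.commit()
    return True


first = handle_order_created(conn, "o-1")   # does the work
second = handle_order_created(conn, "o-1")  # duplicate delivery, skipped
print(first, second, vendor_calls)  # True False ['o-1']
```

One trade-off in this sketch: if a worker crashes after claiming the order but before the vendor call finishes, the order stays in `AWB_REQUESTED`. A production system would need a recovery step for such stuck orders.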

2️⃣ Retry with Exponential Backoff

Avoid retrying at a short, fixed interval such as every second. Use increasing delays, for example:

1s → 5s → 20s → 60s → DLQ

If it still fails, move the message to a Dead Letter Queue for investigation.
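The retry schedule can be sketched as follows. This assumes the delays are the waits between attempts (one initial attempt plus four retries), and uses a plain list as a stand-in for a real Dead Letter Queue. The `sleep` parameter is injectable so the function can be exercised without actually waiting.

```python
import time

BACKOFF_SCHEDULE_S = [1, 5, 20, 60]  # waits between attempts, taken from the schedule above


def process_with_retry(message, handler, dead_letters, sleep=time.sleep):
    """Try once, retry after each delay, and park the message in the DLQ if all attempts fail."""
    last_error = None
    for delay in [0] + BACKOFF_SCHEDULE_S:
        if delay:
            sleep(delay)
        try:
            return handler(message)
        except Exception as exc:  # in practice, catch only retryable errors
            last_error = exc
    # Every attempt failed: keep the message and the last error for investigation.
    dead_letters.append({"message": message, "error": repr(last_error)})
    return None
```

Sleeping inside the worker keeps the sketch short, but it ties up that worker for the whole wait. Many brokers offer delayed redelivery, which would typically replace the `sleep` call.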

3️⃣ Reasonable Timeouts

External APIs are unpredictable.

Recommendations:

set a 3–5 second timeout on every external call
never let a worker block indefinitely on a slow vendor
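A minimal sketch of a bounded vendor call using Python's standard `urllib.request`. The URL, the `request_awb` function, and the `VendorUnavailable` exception are hypothetical names for this example. The `opener` parameter exists only so the call can be simulated without a network.

```python
import urllib.request


class VendorUnavailable(Exception):
    """Raised when the external API cannot be reached or does not answer in time."""


def request_awb(url, payload: bytes, timeout_s: float = 4.0, opener=urllib.request.urlopen) -> bytes:
    req = urllib.request.Request(url, data=payload, method="POST")
    try:
        with opener(req, timeout=timeout_s) as resp:
            return resp.read()
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses,
        # so connection failures, error responses, and slow replies land here.
        raise VendorUnavailable(f"logistics API call failed (timeout {timeout_s}s)") from exc


def slow_opener(req, timeout):
    # Simulates a vendor that never answers within the timeout.
    raise TimeoutError("simulated slow vendor")


try:
    request_awb("https://logistics.example/awb", b"{}", opener=slow_opener)
    outcome = "done"
except VendorUnavailable:
    outcome = "retry later"  # hand the message back to the retry logic
print(outcome)  # retry later
```

Keep in mind that the `timeout` argument of `urlopen` applies to individual blocking socket operations such as connecting and reading, not to the total duration of the call. A strict overall deadline needs additional handling.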

4️⃣ Observability

You need visibility into:

How many events are in the queue?
Which workers are failing?
How long have external APIs been down?

Use:

logs
metrics
distributed tracing
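As a rough sketch of the logs and metrics part (distributed tracing is not covered here), a thin wrapper around each handler can record successes, failures, and durations. The `observed` function, the counter names, and the in-process `Counter` are assumptions for illustration; a real setup would export these to a metrics backend.

```python
import logging
import time
from collections import Counter, deque

log = logging.getLogger("order-workers")
counters = Counter()   # e.g. "logistics.success", "logistics.failure"
durations_ms = []      # (worker_name, elapsed milliseconds)


def observed(worker_name, handler, message):
    """Run a handler, recording its outcome and duration, and log failures."""
    start = time.perf_counter()
    try:
        result = handler(message)
        counters[f"{worker_name}.success"] += 1
        return result
    except Exception:
        counters[f"{worker_name}.failure"] += 1
        log.exception("worker=%s failed for order_id=%s", worker_name, message.get("order_id"))
        raise
    finally:
        durations_ms.append((worker_name, (time.perf_counter() - start) * 1000))


def broken_logistics_call(message):
    raise ConnectionError("logistics API unreachable")  # simulated outage


pending = deque([{"order_id": "o-1"}, {"order_id": "o-2"}])
observed("wms", lambda m: "ok", pending.popleft())
try:
    observed("logistics", broken_logistics_call, pending[0])
except ConnectionError:
    pass  # the message stays in the queue for a later retry

print(len(pending), dict(counters))  # 1 {'wms.success': 1, 'logistics.failure': 1}
```

With just these three numbers (queue depth, failures per worker, and call durations) you can already answer the questions listed above.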

👑 Why This Approach Saves So Many Developers

By moving heavy processes into background workers:

Users never wait too long
Servers don’t get overwhelmed
Orders don’t fail just because a logistics API is broken
Developers can focus on building features instead of fighting fires

In short:

“Do what matters now. Let the patient workers handle the rest.”

✨ Closing Thoughts

A powerful order system doesn’t have to be complex. By clearly separating synchronous and asynchronous responsibilities, you can build a system that is:

fast
scalable
fault-tolerant
easy to evolve

Event-Driven Architecture provides exactly that foundation.

If you’re building e-commerce platforms, marketplaces, logistics systems, or any other kind of transactional system, this pattern is one of the best long-term investments you can make.