Designing a Fast, Scalable, and Fault-Tolerant Create Order Process
When users click “Create Order”, they expect one thing: speed. But behind that single click often lies a chain of database writes and external API calls that quietly slow everything down. This article shows how a simple event-driven approach can turn a slow, fragile order flow into one that is fast, scalable, and resilient—without overengineering.
🎯 “Why is my create order so slow?”
This classic question tends to surface as a system starts to grow. Look closer, and that seemingly simple create-order flow often hides a lot of work behind the scenes:
- Save the order to the database
- Call a logistics API to generate a tracking number (AWB)
- Call a WMS (Warehouse Management System) API to request shipment
It looks like just three steps. But when they run synchronously, users can end up waiting 3–10 seconds. Worse, if an external API is slow or down, the entire request can fail.
This is the point where many developers realize something important:
“Not every process has to finish before the user gets a response.”
The solution? Event-Driven Architecture.
🌟 Why Event-Driven Architecture Is a Great Fit
Because it allows us to clearly separate:
- Lightweight steps → synchronous
- Heavy steps → asynchronous
The result?
- Fast user responses
- Stable servers
- External APIs no longer blocking critical requests
🧩 Architectural Design: Separate Fast Steps from Heavy Steps
Let’s look at the architecture at a high level.
🔄 End-to-End Flow: From Click to Warehouse
1. User clicks “Create Order”
The backend receives the request.
2. The system does only what truly matters
- Validate input data
- Persist the order to the database
- Publish the ORDER_CREATED event
Processing time: extremely fast (50–150 ms)
3. The user immediately gets a response
“Order successfully created.”
4. Background workers handle the rest (asynchronously)
🎯 Logistics Processor
- Listens to the ORDER_CREATED event
- Calls the logistics API
- Generates the tracking number (AWB)
- Updates the database
- Publishes the AWB_GENERATED event
📦 WMS Processor
- Listens to the AWB_GENERATED event
- Calls the WMS API
- Updates the order status (e.g., Ready for Fulfillment)
All of this happens in the background. The user never has to wait.
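To make the flow concrete, here is a minimal Python sketch of it. It assumes an in-memory dict as the database and a `queue.Queue` as the message broker; `fake_logistics_api`, `fake_wms_api`, and the status names are hypothetical stand-ins. In a real deployment the two processors would run as separate worker processes consuming from a durable broker.

```python
import queue
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Order:
    order_id: str
    items: List[str]
    status: str = "CREATED"
    awb: Optional[str] = None


# Hypothetical in-memory stand-ins for the orders table and the message broker.
ORDERS: Dict[str, Order] = {}
BUS = queue.Queue()  # holds (event_type, order_id) tuples


def create_order(items: List[str]) -> dict:
    """Synchronous path: validate, persist, publish ORDER_CREATED, respond."""
    if not items:
        raise ValueError("an order needs at least one item")
    order = Order(order_id=str(uuid.uuid4()), items=items)
    ORDERS[order.order_id] = order
    BUS.put(("ORDER_CREATED", order.order_id))
    return {"order_id": order.order_id, "message": "Order successfully created."}


def fake_logistics_api(order_id: str) -> str:
    # Hypothetical vendor call; a real one would be an HTTP request with a timeout.
    return "AWB-" + order_id[:8]


def fake_wms_api(order_id: str, awb: Optional[str]) -> None:
    # Hypothetical shipment request to the WMS.
    pass


def logistics_processor(order_id: str) -> None:
    """Consumes ORDER_CREATED: generate the AWB, store it, publish AWB_GENERATED."""
    order = ORDERS[order_id]
    if order.awb is None:  # a redelivered event skips the vendor call
        order.awb = fake_logistics_api(order_id)
        order.status = "AWB_GENERATED"
    # Publish even on redelivery, so a crash between the update and the
    # publish does not leave the order stranded.
    BUS.put(("AWB_GENERATED", order_id))


def wms_processor(order_id: str) -> None:
    """Consumes AWB_GENERATED: request shipment, mark the order ready."""
    order = ORDERS[order_id]
    if order.status != "READY_FOR_FULFILLMENT":
        fake_wms_api(order_id, order.awb)
        order.status = "READY_FOR_FULFILLMENT"


HANDLERS = {"ORDER_CREATED": logistics_processor, "AWB_GENERATED": wms_processor}


def drain_bus() -> None:
    """Stand-in for background workers: handle events until the queue is empty."""
    while not BUS.empty():
        event_type, order_id = BUS.get()
        HANDLERS[event_type](order_id)
```

Calling `create_order(["SKU-1"])` returns at once with the order still in `CREATED`; only when `drain_bus()` runs on the worker side does it reach `READY_FOR_FULFILLMENT`. The request path does one write and one publish, nothing more.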
🚀 Benefits of This Architecture (and Why Big Companies Use It)
✔ Much better user experience
Requests are no longer blocked by external APIs.
✔ Decoupled from vendor reliability
If the logistics API is down, the order is still saved and the worker retries later.
✔ Horizontally scalable
Traffic increases? Just add more workers.
✔ Fault-tolerant by design
Queues, retries, dead letter queues (DLQs), and idempotency keep the system safe.
🧱 Critical Pieces to Make This Production-Ready
1️⃣ Idempotency
Events can be delivered more than once. Workers can crash. Networks can time out.
You must ensure:
- Tracking numbers are not generated twice
- WMS requests are not sent twice
- Order status is checked before processing
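One way to enforce the last point is a compare-and-set on a status column: the worker claims the order with a conditional `UPDATE`, and a duplicate delivery matches zero rows. The sketch below uses Python's built-in `sqlite3` to stay self-contained; the `orders` schema, the `AWB_PENDING` status, and the `generate_awb` callback are hypothetical.

```python
import sqlite3
from typing import Callable


def claim(conn: sqlite3.Connection, order_id: str, expected: str, new: str) -> bool:
    """Move an order from `expected` to `new` status; True only if this call did it."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE orders SET status = ? WHERE id = ? AND status = ?",
            (new, order_id, expected),
        )
    return cur.rowcount == 1


def handle_order_created(
    conn: sqlite3.Connection, order_id: str, generate_awb: Callable[[str], str]
) -> bool:
    """Idempotent ORDER_CREATED handler: a redelivered event becomes a no-op."""
    if not claim(conn, order_id, "CREATED", "AWB_PENDING"):
        return False  # another delivery already claimed (or finished) this order
    try:
        awb = generate_awb(order_id)  # hypothetical logistics client
    except Exception:
        claim(conn, order_id, "AWB_PENDING", "CREATED")  # release so a retry can claim it
        raise
    with conn:
        conn.execute(
            "UPDATE orders SET status = 'AWB_GENERATED', awb = ? WHERE id = ?",
            (awb, order_id),
        )
    return True
```

A worker that dies outright (rather than raising) still leaves the order in `AWB_PENDING`, so this guard is usually paired with a sweeper for stale claims, or with an idempotency reference sent to the vendor if the vendor supports one.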
Common approaches:
- status columns
- event log tables
- the outbox pattern
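The outbox pattern covers a related gap: if the API saves the order and then crashes before publishing ORDER_CREATED, the event is lost. Writing the event into an `outbox` table in the same transaction as the order, and letting a separate relay publish it, closes that gap. A minimal sketch with `sqlite3`, where the table layout and the `publish` callback are hypothetical:

```python
import json
import sqlite3
import uuid
from typing import Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS outbox (
    event_id  TEXT PRIMARY KEY,
    payload   TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def create_order_with_outbox(conn: sqlite3.Connection) -> str:
    order_id = str(uuid.uuid4())
    event_id = str(uuid.uuid4())  # consumers can record this in an event log to dedupe
    payload = json.dumps({"event_id": event_id, "type": "ORDER_CREATED", "order_id": order_id})
    with conn:  # both INSERTs commit together, or neither does
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'CREATED')", (order_id,))
        conn.execute("INSERT INTO outbox (event_id, payload) VALUES (?, ?)", (event_id, payload))
    return order_id


def relay_outbox(conn: sqlite3.Connection, publish: Callable[[dict], None]) -> int:
    """Publish unsent events in insertion order, then mark each one as sent.

    A crash between publish() and the UPDATE re-sends that event on the next
    run: delivery is at-least-once, so consumers must still be idempotent.
    """
    rows = conn.execute(
        "SELECT event_id, payload FROM outbox WHERE published = 0 ORDER BY rowid"
    ).fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,))
    return len(rows)
```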
2️⃣ Retry with Exponential Backoff
Avoid retrying every second, which only adds load to an API that is already struggling. Instead, increase the delay between attempts, for example:
1s → 5s → 20s → 60s → DLQ
If it still fails, move the message to a Dead Letter Queue for investigation.
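Here is one way to express that schedule in Python. Rather than sleeping inside the worker (which would tie it up for 86 seconds in total), it records when the message may next run and re-enqueues it, which is how broker-level delayed redelivery typically behaves. The `Message` type, the list-based queues, and the injected `now` clock are simplifications for illustration; production setups usually also add random jitter so that many failed messages do not retry in lockstep.

```python
from dataclasses import dataclass
from typing import Callable, List

# Delays between attempts, taken from the schedule above; after the last one
# the message goes to the dead letter queue.
RETRY_DELAYS_S = [1, 5, 20, 60]


@dataclass
class Message:
    body: dict
    attempts: int = 0        # retries scheduled so far
    not_before: float = 0.0  # earliest time (seconds) the message may run again
    last_error: str = ""


def on_failure(msg: Message, error: Exception, now: float,
               retry_queue: List[Message], dlq: List[Message]) -> None:
    """Re-enqueue with the next delay, or park the message in the DLQ."""
    msg.last_error = repr(error)
    if msg.attempts < len(RETRY_DELAYS_S):
        msg.not_before = now + RETRY_DELAYS_S[msg.attempts]
        msg.attempts += 1
        retry_queue.append(msg)
    else:
        dlq.append(msg)  # kept, with its last error, for investigation


def run_due(handler: Callable[[dict], None], retry_queue: List[Message],
            dlq: List[Message], now: float) -> None:
    """Run every message whose delay has elapsed; failures are rescheduled."""
    due = [m for m in retry_queue if m.not_before <= now]
    retry_queue[:] = [m for m in retry_queue if m.not_before > now]
    for msg in due:
        try:
            handler(msg.body)
        except Exception as exc:  # a real worker might send permanent errors straight to the DLQ
            on_failure(msg, exc, now, retry_queue, dlq)
```

Driving `run_due` with an always-failing handler, and advancing `now` to each message's `not_before`, produces gaps of 1, 5, 20, and 60 seconds followed by one entry in `dlq`: five attempts in total.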
3️⃣ Reasonable Timeouts
External APIs are unpredictable.
Recommendations:
- a timeout of about 3–5 seconds per call
- never let a worker block for long on a single request
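In Python's standard library, for example, `http.client.HTTPConnection` accepts a `timeout` in seconds. The helper below, with its hypothetical `ExternalApiTimeout` exception and a default of 4 seconds, turns a slow vendor into a fast failure that the retry logic can handle. Note that this timeout applies to each blocking socket operation (the connect and each read), not to the request as a whole; `HTTPSConnection` takes the same argument for TLS endpoints.

```python
import http.client
import socket
from typing import Tuple


class ExternalApiTimeout(Exception):
    """Hypothetical error type: the vendor did not answer in time (safe to retry)."""


def post_json(host: str, port: int, path: str, body: bytes,
              timeout_s: float = 4.0) -> Tuple[int, bytes]:
    """POST a JSON body and return (status, response body), failing fast on slow APIs."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("POST", path, body=body, headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        return resp.status, resp.read()
    except socket.timeout as exc:  # raised by connect or a read once timeout_s elapses
        raise ExternalApiTimeout(
            f"{host}:{port}{path} did not respond within {timeout_s}s"
        ) from exc
    finally:
        conn.close()
```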
4️⃣ Observability
You need visibility into:
- How many events are in the queue?
- Which workers are failing?
- How long have external APIs been down?
Use:
- logs
- metrics
- distributed tracing
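As a starting point, each worker's handler can be wrapped so that every event produces a success or failure counter, a latency sample, and one structured log line carrying the order ID and a trace ID. The sketch below keeps the metrics in hypothetical in-process dictionaries and assumes events are dicts with `type`, `order_id`, and an optional `trace_id`. In production these would typically be exported to a metrics system such as Prometheus and a tracing system such as OpenTelemetry, and queue depth would come from the broker's own metrics.

```python
import json
import logging
import time
from collections import Counter, defaultdict
from typing import Any, Callable, Dict, List

logger = logging.getLogger("order-workers")

# Hypothetical in-process metrics; a real system would export these instead.
METRICS: Counter = Counter()                              # e.g. "logistics.failure" -> 3
LATENCIES_MS: Dict[str, List[float]] = defaultdict(list)  # per-worker handler latency


def instrumented(worker: str, handler: Callable[[dict], Any]) -> Callable[[dict], Any]:
    """Wrap a handler with a counter, a latency sample, and a JSON log line.

    Exceptions from the handler are counted as failures and then re-raised,
    so retry logic upstream still sees them.
    """
    def wrapper(event: dict) -> Any:
        start = time.perf_counter()
        outcome = "failure"
        try:
            result = handler(event)
            outcome = "success"
            return result
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            METRICS[f"{worker}.{outcome}"] += 1
            LATENCIES_MS[worker].append(elapsed_ms)
            logger.info(json.dumps({
                "worker": worker,
                "event_type": event.get("type"),
                "order_id": event.get("order_id"),
                "trace_id": event.get("trace_id"),  # propagated from the original request
                "outcome": outcome,
                "elapsed_ms": round(elapsed_ms, 1),
            }))
    return wrapper
```

A rising `wms.failure` count answers "which workers are failing?", and how long that failure rate stays elevated is a practical proxy for how long a vendor API has been down.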
👑 Why This Approach Saves So Many Developers
By moving heavy processes into background workers:
- Users never wait too long
- Servers don’t get overwhelmed
- Orders don’t fail just because a logistics API is broken
- Developers can focus on building features instead of fighting fires
In short:
“Do what matters now. Let the patient workers handle the rest.”
✨ Closing Thoughts
A powerful order system doesn’t have to be complex. By clearly separating synchronous and asynchronous responsibilities, you can build a system that is:
- fast
- scalable
- fault-tolerant
- easy to evolve
Event-Driven Architecture provides exactly that foundation.
If you’re building e-commerce platforms, marketplaces, logistics systems, or any kind of transactional system — this pattern is one of the best long-term investments you can make.