Digital Shijil
Field notes · Operations · 6 min read

Where most store automation quietly breaks.

Three failure patterns that look like “the integration is fine” right up until they cost you a launch week. None of them show up on a status page.

1. The silent timeout

Most workflow tools retry on hard errors but not on slow responses. When an upstream API gets sluggish, which is common during sales, the call times out, the workflow still reports “success” because nothing raised an error, and the events are quietly dropped. You only notice when the data downstream looks thin a week later.

The fix: instrument every workflow with a tail count, a tally of the events that actually reach the end of the workflow. If the daily volume drops below a threshold, fire an alert. The system should notice missing events before you do.
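A minimal sketch of that volume check, in Python. Everything here is an assumption for illustration: the names `check_daily_volume` and `VolumeAlert`, the choice of the recent median as the baseline, and the 50% ratio. Your own threshold may be a fixed number instead.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass
class VolumeAlert:
    workflow: str
    observed: int
    expected_floor: float


def check_daily_volume(
    workflow: str,
    todays_count: int,
    recent_counts: Sequence[int],
    min_ratio: float = 0.5,  # hypothetical default: alert below 50% of normal
) -> Optional[VolumeAlert]:
    """Return an alert if today's event count is well below the recent norm."""
    if not recent_counts:
        # No history yet, so there is nothing to compare against.
        return None
    floor = median(recent_counts) * min_ratio
    if todays_count < floor:
        return VolumeAlert(workflow, todays_count, floor)
    return None
```

Run it once a day from a scheduler and send any non-empty result to whatever channel you already watch. Using the median rather than the mean keeps one unusually busy sale day from inflating the baseline.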

2. The schema drift

Vendors change their payloads. A field becomes optional. A new value appears in an enum. Your workflow assumed five values; now there are six, and the sixth falls through to no branch at all. The integration didn’t break — your assumption did.

The fix: add a default branch that logs every unknown value, and review that log weekly. The first time something new shows up, you find out before the customer does.
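As a rough illustration, a default branch might look like the Python below. The status values, the `route_order_status` function, and the `needs_review` outcome are all hypothetical; the point is only that an unrecognised value is logged and counted instead of falling through.

```python
import logging
from collections import Counter
from typing import Callable, Dict

logger = logging.getLogger("workflow.drift")

# Tally of values no branch recognised; this is what the weekly review reads.
unknown_values: Counter = Counter()

# Hypothetical enum values and actions, for illustration only.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "paid": lambda event: "fulfil",
    "refunded": lambda event: "restock",
    "cancelled": lambda event: "release_stock",
}


def route_order_status(event: dict) -> str:
    """Route an event by status, with an explicit branch for unknown values."""
    status = event.get("status")
    handler = HANDLERS.get(status)
    if handler is None:
        # Default branch: record the surprise instead of dropping the event.
        unknown_values[str(status)] += 1
        logger.warning("Unknown status %r in event %r", status, event.get("id"))
        return "needs_review"
    return handler(event)
```

A missing field lands in the same branch as a new enum value, so both kinds of drift surface in one place.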

3. The auth that didn’t expire (until it did)

OAuth tokens, app passwords, and shared secrets all rotate on schedules nobody remembers setting. The integration runs perfectly for months and then dies on a Tuesday. Worse: some tools fail silently when auth lapses, swallowing the error and carrying on as if nothing fired.

The fix: a single credentials register with rotation dates and ownership. Boring, durable, low-maintenance.
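The register can be as simple as a list that a daily job reads. A sketch in Python, assuming hypothetical field names (`name`, `owner`, `expires_on`) and a 14-day warning window; a spreadsheet export would serve just as well as the data source.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Credential:
    name: str
    owner: str
    expires_on: date


def due_for_rotation(
    register: Iterable[Credential],
    today: date,
    warn_days: int = 14,  # hypothetical warning window
) -> List[Credential]:
    """Return credentials that have expired or will expire within warn_days."""
    cutoff = today + timedelta(days=warn_days)
    due = [cred for cred in register if cred.expires_on <= cutoff]
    # Soonest expiry first, so the most urgent item tops the reminder.
    return sorted(due, key=lambda cred: cred.expires_on)
```

Because each entry carries an owner, the reminder can go to a named person rather than a shared inbox.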

Most automations don’t break loudly. They drift. The job is to catch the drift, not just the crash.
