October 7, 2025 · 1 min read

Cron Jobs That Don't Lie

Scheduled jobs are where silent failures go to hide. Here is the pattern I use to make them loud.

Cron, Observability, Reliability

The silent failure problem

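A cron job that fails once is a bug. A cron job that fails every night for a month without anyone noticing is a career event. Out of the box, cron on a typical Linux server emails nobody (there is usually no mail transfer agent configured to deliver its output), logs little beyond the fact that a command was launched, and retries nothing.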

The setup I ship

BullMQ or node-cron inside the service that owns the data
A heartbeat row written on every run, success or failure
A liveness probe that alerts when the heartbeat is older than the job's own interval plus a small buffer
Structured logs keyed by job_id and run_id
Idempotent handlers so replays are safe
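
Here is a minimal sketch of the heartbeat, liveness, and logging pieces from the list above. It is an illustration under assumptions, not the exact code I ship: the names (Heartbeat, runWithHeartbeat, isStale) are hypothetical, an in-memory Map stands in for the heartbeat table, and the scheduler itself (BullMQ or node-cron) is left out so the block runs on its own.

```typescript
// Hypothetical names throughout; a Map stands in for a real heartbeat table.
type RunStatus = "success" | "failure";

interface Heartbeat {
  jobId: string;
  runId: string;
  status: RunStatus;
  finishedAt: number; // epoch milliseconds
  error: string | null;
}

// Latest heartbeat row per job.
const heartbeats = new Map<string, Heartbeat>();

// Structured log: one JSON object per line, keyed by job_id and run_id.
function log(
  event: string,
  jobId: string,
  runId: string,
  extra: Record<string, unknown> = {},
): void {
  console.log(JSON.stringify({ event, job_id: jobId, run_id: runId, ...extra }));
}

// Runs the handler and writes a heartbeat whether it succeeds or throws.
async function runWithHeartbeat(
  jobId: string,
  handler: () => Promise<void>,
  now: () => number = Date.now, // injectable clock, handy for tests
): Promise<Heartbeat> {
  const runId = `${jobId}-${now()}-${Math.random().toString(36).slice(2, 10)}`;
  log("job.start", jobId, runId);

  let status: RunStatus = "success";
  let error: string | null = null;
  try {
    await handler();
  } catch (err) {
    status = "failure";
    error = err instanceof Error ? err.message : String(err);
  }

  const beat: Heartbeat = { jobId, runId, status, finishedAt: now(), error };
  heartbeats.set(jobId, beat);
  log("job.finish", jobId, runId, { status, error });
  return beat;
}

// Liveness probe: stale when there is no heartbeat at all, or when the
// latest one is older than the job's interval plus a buffer.
function isStale(
  jobId: string,
  intervalMs: number,
  bufferMs: number,
  now: number = Date.now(),
): boolean {
  const beat = heartbeats.get(jobId);
  if (!beat) return true;
  return now - beat.finishedAt > intervalMs + bufferMs;
}
```

In this sketch the probe only checks age. Because the row also records status, a separate alert on repeated "failure" rows is easy to add on top of it.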

Why this beats a cron alert service

External monitors (like Cronitor or healthchecks.io) are great, but they only see the pings your job sends them, and in the simplest setup that means whether the process started or exited. The heartbeat row tells you whether the job finished its meaningful work. On the factory floor, those are very different events.

If a job has no idempotency story, assume it will one day run twice in the same minute. Plan accordingly.
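
One way to plan for that, sketched under assumptions: derive an idempotency key from the job id and the scheduled time slot, and claim the key before doing any work. The names here (slotKey, runOnce, claimedSlots) are hypothetical, and the in-memory Set stands in for what would normally be a table with a unique constraint on the key, so this only protects a single process.

```typescript
// Hypothetical names; a Set stands in for a table with a unique key column.
const claimedSlots = new Set<string>();

// Two triggers inside the same interval map to the same key.
function slotKey(jobId: string, scheduledAt: Date, intervalMs: number): string {
  const slotStart = Math.floor(scheduledAt.getTime() / intervalMs) * intervalMs;
  return `${jobId}:${new Date(slotStart).toISOString()}`;
}

// Runs the handler at most once per key; a duplicate trigger is skipped.
async function runOnce(
  key: string,
  handler: () => Promise<void>,
): Promise<"ran" | "skipped"> {
  if (claimedSlots.has(key)) return "skipped";
  claimedSlots.add(key); // claim before the first await so a duplicate sees it
  try {
    await handler();
    return "ran";
  } catch (err) {
    claimedSlots.delete(key); // release the claim so a retry can run
    throw err;
  }
}
```

With a one-minute interval, triggers at 02:00:05 and 02:00:40 produce the same key, so the second call returns "skipped" instead of doing the work twice.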