Cron Jobs That Don't Lie
Scheduled jobs are where silent failures go to hide. Here is the pattern I use to make them loud.
The silent failure problem
A cron job that fails once is a bug. A cron job that fails every night for a month without anyone noticing is a career event. A stock Linux cron setup on a box with no mail configured will email nobody, log nowhere useful, and retry nothing.
The setup I ship
- BullMQ or node-cron inside the service that owns the data
- A heartbeat row written on every run, success or failure
- A liveness probe that alerts when the heartbeat is older than the job's own interval plus a small buffer
- Structured logs keyed by job_id and run_id
- Idempotent handlers so replays are safe
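A minimal sketch of the heartbeat and the probe, under a few assumptions: the Map stands in for a real heartbeat table, the names runWithHeartbeat and isStale are hypothetical (they are not part of BullMQ or node-cron), and treating a failure row as alert-worthy is my reading of the setup rather than something the list spells out.

```typescript
// Hypothetical shapes for illustration; not an API of BullMQ or node-cron.
type Status = "success" | "failure";

interface Heartbeat {
  jobId: string;
  runId: string;
  status: Status;
  finishedAt: number; // epoch milliseconds
}

// Stand-in for a heartbeat table: one row per job, overwritten on every run.
const heartbeats = new Map<string, Heartbeat>();

// Runs a handler and always records a heartbeat, whether it succeeds or throws.
async function runWithHeartbeat(
  jobId: string,
  handler: (runId: string) => Promise<void>,
  now: () => number = Date.now,
): Promise<Heartbeat> {
  const runId = `${jobId}-${now()}-${Math.random().toString(36).slice(2, 8)}`;
  let status: Status = "success";
  let error: string | undefined;
  try {
    await handler(runId);
  } catch (err) {
    status = "failure";
    error = err instanceof Error ? err.message : String(err);
  }
  const beat: Heartbeat = { jobId, runId, status, finishedAt: now() };
  heartbeats.set(jobId, beat);
  // One structured log line per run, keyed by job_id and run_id.
  console.log(JSON.stringify({ job_id: jobId, run_id: runId, status, error }));
  return beat;
}

// Liveness check: alert when the row is missing, records a failure
// (an assumption of this sketch), or is older than interval + buffer.
function isStale(
  beat: Heartbeat | undefined,
  intervalMs: number,
  bufferMs: number,
  nowMs: number,
): boolean {
  if (beat === undefined) return true;
  if (beat.status === "failure") return true;
  return nowMs - beat.finishedAt > intervalMs + bufferMs;
}
```

In a real service the handler would be whatever BullMQ or node-cron invokes, and the probe would read the row from the database on its own schedule, for example isStale(heartbeats.get("nightly-report"), 24 * 60 * 60 * 1000, 10 * 60 * 1000, Date.now()).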
Why this beats a cron alert service
External monitors (like Cronitor or healthchecks.io) are great, but in the simplest setup they only see that a ping arrived, which tells you the job ran, not what it did. The heartbeat row tells you whether the job finished its meaningful work. On the factory floor, those are very different events.
If a job has no idempotency story, assume it will one day run twice in the same minute. Plan accordingly.
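One way to give a job an idempotency story, sketched under assumptions: the key is built from the job name and its scheduled minute, and the Set stands in for a table with a unique constraint on that key. The names idempotencyKey and runOnce are hypothetical.

```typescript
// Hypothetical in-memory stand-in for a table with a unique constraint on the key.
const processed = new Set<string>();

// Derives a key from the job and its scheduled minute (UTC), so two runs
// in the same minute map to the same key.
function idempotencyKey(jobId: string, scheduledFor: Date): string {
  const slot = scheduledFor.toISOString().slice(0, 16); // "YYYY-MM-DDTHH:mm"
  return `${jobId}:${slot}`;
}

// Runs the work at most once per key; a replay returns false and does nothing.
function runOnce(key: string, work: () => void): boolean {
  if (processed.has(key)) return false;
  work();
  processed.add(key); // marked only after the work succeeds, so a failed run can retry
  return true;
}
```

With a real database the check and the claim would need to be atomic, for example by inserting the key under a unique constraint, because two processes can otherwise both pass the has() check before either one records the key.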







