Runbook: Incident Response Checklist
Severity Assessment
| Severity | Definition | Example |
|---|---|---|
| P1 — Critical | Site down or major feature broken for all users | Homepage 500, database down |
| P2 — High | Significant feature broken | Search not working, no ads loading |
| P3 — Medium | Partial degradation | Slow page loads, one page type broken |
| P4 — Low | Minor issue | Cosmetic bug, admin-only issue |
P1: Site Down
Immediate (0–5 minutes)
- [ ] Verify the site is actually down: `curl -I https://www.dezeen.com/`
- [ ] Check Cloudflare status: cloudflarestatus.com
- [ ] Check if it's a specific node: test each web node IP directly
- [ ] Check Varnish backend health: `varnishadm backend.list`
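
The "test each web node IP directly" step can be scripted so every node is checked in one pass, bypassing Cloudflare. This is a minimal sketch, assuming each node serves plain HTTP on port 80 and routes on the `Host` header. The helper names (`classify_status`, `probe_node`, `probe_all_nodes`) are hypothetical, and you supply the node IPs.

```shell
#!/bin/sh
# Sketch: probe each web node directly, bypassing Cloudflare.
# ASSUMPTION: each node answers plain HTTP on port 80 and routes by Host header.
# Node IPs are supplied as arguments; none are hard-coded here.

SITE_HOST="www.dezeen.com"

# Map a status code as printed by curl -w '%{http_code}' to a verdict.
classify_status() {
  case "$1" in
    000) echo "UNREACHABLE" ;;  # curl could not connect or timed out
    2??|3??) echo "UP" ;;
    5??) echo "DOWN" ;;
    *) echo "CHECK" ;;          # 4xx or anything unexpected: inspect by hand
  esac
}

# Request the homepage from one node and print "<ip> <code> <verdict>".
probe_node() {
  local ip="$1" code
  # On connection failure curl still prints 000 for %{http_code}.
  code=$(curl -sS -o /dev/null -m 10 -w '%{http_code}' \
    -H "Host: ${SITE_HOST}" "http://${ip}/") || true
  code="${code:-000}"
  printf '%s %s %s\n' "$ip" "$code" "$(classify_status "$code")"
}

# Probe every node IP passed as an argument.
probe_all_nodes() {
  for ip in "$@"; do
    probe_node "$ip"
  done
}
```

For example, `probe_all_nodes 192.0.2.11 192.0.2.12 192.0.2.13 192.0.2.14` (placeholder addresses) prints one line per node. A single `DOWN` or `UNREACHABLE` node among healthy ones points at that node rather than at Cloudflare.
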
Diagnose (5–15 minutes)
- [ ] SSH into web nodes (WS1–WS4) and check Apache status: `systemctl status apache2`
- [ ] Check PHP errors: `tail -n 100 /var/log/apache2/error.log`
- [ ] Check database: `mysql -u root -p -e "SHOW PROCESSLIST"`
- [ ] Check disk space: `df -h`
- [ ] Check memory: `free -m`
- [ ] Check WordPress debug log (from the WordPress root): `tail -n 100 wp-content/debug.log`
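
The Diagnose checks can be run against every node from one terminal. A minimal sketch, assuming `ws1` to `ws4` are SSH aliases for WS1–WS4 (hypothetical names) and that the SSH user can read the Apache error log. `flag_full_filesystems`, `diagnose_node` and `diagnose_all` are illustrative helpers, and the 90% disk threshold is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: run the read-only Diagnose checks against each web node over SSH.
# ASSUMPTIONS (hypothetical): ws1..ws4 are SSH aliases for WS1-WS4, and the
# SSH user may read /var/log/apache2/error.log.

# Print "<mount> <use%>" for every filesystem at or above THRESHOLD percent,
# reading POSIX `df -P` output on stdin (column 5 = Capacity, 6 = Mounted on).
flag_full_filesystems() {
  awk -v t="${1:-90}" 'NR > 1 {
    use = $5; sub(/%/, "", use)   # "91%" -> "91"
    if (use + 0 >= t + 0) print $6, $5
  }'
}

# Run the Diagnose checks on one node; df output is parsed locally.
diagnose_node() {
  local host="$1"
  echo "=== $host ==="
  printf 'apache2: '
  ssh "$host" systemctl is-active apache2
  echo "--- filesystems at or above 90% full:"
  ssh "$host" df -P | flag_full_filesystems 90
  echo "--- memory (MB):"
  ssh "$host" free -m
  echo "--- last 20 lines of the Apache error log:"
  ssh "$host" tail -n 20 /var/log/apache2/error.log
}

diagnose_all() {
  for host in "$@"; do
    diagnose_node "$host"
  done
}
```

Run `diagnose_all ws1 ws2 ws3 ws4`. The database and WordPress debug-log checks are left out because they depend on where the MySQL credentials and the WordPress root live.
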
Resolve
- [ ] Apache down: `systemctl restart apache2`
- [ ] MySQL down: `systemctl restart mysql`
- [ ] Memory exhaustion: Kill runaway processes; increase limits
- [ ] Disk full: Clear WP Rocket cache, old logs, temporary files
- [ ] Bad deployment: Redeploy last known good commit via DeployHQ
- [ ] Plugin crash: Disable the problem plugin via WP-CLI: `wp plugin deactivate <plugin-name>`
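
When a plugin fatals, WP-CLI can hit the same error while loading WordPress. WP-CLI's global `--skip-plugins` and `--skip-themes` flags avoid loading that code. The sketch below pairs them with a hypothetical `suspect_plugin_from_log` helper that pulls a plugin slug out of the most recent `PHP Fatal error` line. Treat its answer as a lead to confirm, not a verdict: the file named in a fatal error is not always the plugin at fault.

```shell
#!/bin/sh
# Sketch: find a likely culprit plugin and deactivate it without loading it.
# Helper names are hypothetical; run from the WordPress root or add --path.

# Print the plugin slug named in the most recent "PHP Fatal error" line read
# from stdin (e.g. piped from /var/log/apache2/error.log). Heuristic only.
suspect_plugin_from_log() {
  grep 'PHP Fatal error' \
    | grep -o 'wp-content/plugins/[^/]*' \
    | tail -n 1 \
    | cut -d/ -f3
}

# Deactivate a plugin without loading any plugin or theme code first,
# so WP-CLI is not taken down by the same fatal error.
deactivate_without_loading() {
  wp plugin deactivate "$1" --skip-plugins --skip-themes
}
```

For example, `tail -n 500 /var/log/apache2/error.log | suspect_plugin_from_log`, then `deactivate_without_loading <plugin-name>` once you have confirmed the slug.
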
Post-Incident
- [ ] Purge all caches (see cache-purge.md)
- [ ] Verify all 4 web nodes are healthy
- [ ] Monitor for 30 minutes
- [ ] Write incident report
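
For the 30-minute watch, a small polling loop keeps a timestamped record you can paste into the incident report. A minimal sketch; `watch_health` is a hypothetical helper that takes an attempt count, an interval in seconds, and any command that exits 0 when healthy.

```shell
#!/bin/sh
# Sketch: repeat a health check at a fixed interval and log each result.
# Returns non-zero if any attempt failed.

# Usage: watch_health ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
watch_health() {
  local attempts="$1" interval="$2" i=1 failures=0 status
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      status=OK
    else
      status=FAIL
      failures=$((failures + 1))
    fi
    printf '%s attempt %s/%s %s\n' "$(date -u '+%H:%M:%S')" "$i" "$attempts" "$status"
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  [ "$failures" -eq 0 ]
}
```

`watch_health 30 60 curl -fsS -o /dev/null https://www.dezeen.com/` checks once a minute for about 30 minutes. The `-f` flag makes curl exit non-zero on HTTP 400 and above, so 5xx responses count as failures.
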
P2: Feature Broken
Search Down
- [ ] Check Algolia dashboard for API status
- [ ] Verify `ALGOLIA_APPLICATION_ID` and API keys
- [ ] Check which Algolia plugin is active (only one PHP variant should be active)
- [ ] Test search endpoint directly: `curl https://www.dezeen.com/wp-json/...`
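
The first two Search checks can be done with WP-CLI from the WordPress root. A sketch, assuming `ALGOLIA_APPLICATION_ID` is defined as a constant in `wp-config.php` (it may instead come from the environment) and that every Algolia plugin has "algolia" in its slug. `count_algolia_plugins` and `check_algolia_setup` are hypothetical helpers; the sketch deliberately does not print the API keys.

```shell
#!/bin/sh
# Sketch: sanity-check the Algolia setup with WP-CLI.
# ASSUMPTION: ALGOLIA_APPLICATION_ID is a constant in wp-config.php and
# Algolia plugin slugs contain "algolia". Helper names are hypothetical.

# Count lines containing "algolia" (case-insensitive) on stdin.
count_algolia_plugins() {
  # grep -c prints 0 but exits 1 when nothing matches; keep that output.
  grep -ci 'algolia' || true
}

check_algolia_setup() {
  local n
  # Prints the constant's value, or errors if it is not defined.
  wp config get ALGOLIA_APPLICATION_ID || echo "ALGOLIA_APPLICATION_ID is not set" >&2
  n=$(wp plugin list --status=active --field=name | count_algolia_plugins)
  echo "Active Algolia plugins: $n"
  if [ "$n" -ne 1 ]; then
    echo "WARNING: expected exactly one active Algolia plugin" >&2
    return 1
  fi
}
```
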
Ads Not Loading
- [ ] Check Google Ad Manager status
- [ ] Verify Cookiebot is not blocking ad scripts
- [ ] Check browser console for JS errors
- [ ] Test in incognito / with consent accepted
Comments Not Loading
- [ ] Check Disqus status: status.disqus.com
- [ ] Verify Disqus plugin is active
- [ ] Check for JavaScript errors in browser console
Newsletter Forms Broken
- [ ] Test the Campaign Monitor (CM) API: POST to `dezeen-campaign-monitor/v1/test`
- [ ] Check Campaign Monitor service status
- [ ] Verify API keys in wp-config.php
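
The CM test route can be exercised with curl. A sketch, assuming the route is served under the default `/wp-json/` REST prefix and accepts an unauthenticated POST (add credentials if it does not). `is_2xx` and `cm_api_test` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: POST to the Campaign Monitor test route and report the result.
# ASSUMPTIONS: route lives under /wp-json/ and needs no authentication.

# Succeed only for HTTP 2xx status codes.
is_2xx() {
  case "$1" in
    2??) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: cm_api_test [BASE_URL]
cm_api_test() {
  local base="${1:-https://www.dezeen.com}" code
  code=$(curl -sS -o /dev/null -m 15 -w '%{http_code}' -X POST \
    "${base}/wp-json/dezeen-campaign-monitor/v1/test") || true
  code="${code:-000}"
  echo "HTTP $code"
  is_2xx "$code"
}
```

`cm_api_test` targets the live site by default; pass another base URL, such as a staging host, as the first argument.
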
P3: Performance Degradation
- [ ] Check Query Monitor for slow queries
- [ ] Check Varnish hit rate: `varnishstat`
- [ ] Check Cloudflare analytics for a traffic spike
- [ ] Review recent deployments for performance regression
- [ ] Check if a heavy admin operation is running (bulk edit, import)
- [ ] Run `wp cron event list` to check for stuck cron jobs
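
Two of these checks reduce to a number or a short list you can compare over time. A sketch with hypothetical helpers: `varnish_hit_rate` computes hits / (hits + misses) from the `MAIN.cache_hit` and `MAIN.cache_miss` counters in `varnishstat -1` output, and `overdue_cron_events` lists cron hooks scheduled before a cutoff time, a rough sign of stuck jobs. Varnishstat counters accumulate since varnishd started, so the ratio is a lifetime figure, not a live one.

```shell
#!/bin/sh
# Sketch: two quick performance checks. Helper names are hypothetical.

# Print the cache hit ratio (%) from `varnishstat -1` output on stdin.
varnish_hit_rate() {
  awk '$1 == "MAIN.cache_hit"  { hit = $2 }
       $1 == "MAIN.cache_miss" { miss = $2 }
       END {
         total = hit + miss
         if (total > 0) printf "%.1f\n", 100 * hit / total
         else print "n/a"
       }'
}

# Print cron hooks whose next run (UTC) is earlier than CUTOFF, reading
# `wp cron event list --fields=hook,next_run_gmt --format=csv` on stdin.
# Both use "YYYY-MM-DD HH:MM:SS", so a string comparison orders them.
overdue_cron_events() {
  awk -F, -v cutoff="$1" 'NR > 1 {
    gsub(/"/, "")                 # strip any CSV quoting
    if ($2 < cutoff) print $1, $2
  }'
}
```

Usage: `varnishstat -1 | varnish_hit_rate`, and `wp cron event list --fields=hook,next_run_gmt --format=csv | overdue_cron_events "$(date -u -d '15 minutes ago' '+%Y-%m-%d %H:%M:%S')"` (the `date -d` syntax is GNU-specific).
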
Communication Template
[TIMESTAMP] - Dezeen.com Incident
Status: [Investigating / Identified / Resolved]
Severity: P[1-4]
Impact: [Description of user impact]
Cause: [Known cause or "Under investigation"]
ETA: [Expected resolution time]
Actions taken: [List of actions]
Key Contacts
| Role | Contact | Notes |
|---|---|---|
| Hosting (Jelastic) | — | Check Enscale dashboard |
| CDN (Cloudflare) | — | Check dashboard / status page |
| Outgoing team | — | Available until October 2026 |
| Stakeholders | — | Notify for P1/P2 incidents |
Rollback Procedure
- Open DeployHQ
- Find the last successful deployment
- Click "Redeploy" to restore previous version
- Alternatively: `git revert <commit>` and push to `master`
- Purge all caches after rollback
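
The git path needs care with merge commits, which `git revert` refuses to revert without `-m`. A sketch, assuming the remote is named `origin` and you have push rights to `master`; `parent_count` and `revert_and_push` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch of the git-based rollback path.
# ASSUMPTIONS: remote is "origin"; you may push to master. Names hypothetical.

# Print how many parents a commit has, given one line of
# `git rev-list --parents -n 1 <commit>` output (commit hash, then parents).
parent_count() {
  awk '{ print NF - 1 }'
}

revert_and_push() {
  local commit="$1" parents
  git fetch origin || return 1
  git checkout master || return 1
  git pull --ff-only origin master || return 1
  git rev-parse --verify --quiet "$commit^{commit}" >/dev/null || {
    echo "unknown commit: $commit" >&2
    return 1
  }
  parents=$(git rev-list --parents -n 1 "$commit" | parent_count)
  if [ "$parents" -gt 1 ]; then
    # Merge commit: revert relative to its first parent (the mainline).
    git revert --no-edit -m 1 "$commit" || return 1
  else
    git revert --no-edit "$commit" || return 1
  fi
  git push origin master
}
```
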
Gotchas During Incidents
- Don't restart all nodes at once: keep at least 2 nodes in rotation
- Check the admin server separately: `admin.dezeen.com` is on a different server
- Database changes are not versioned: SQL rollback requires a backup restore
- Varnish takes time to recover: after a node comes back, there is a 5–15 second delay before traffic is routed to it
- Cloudflare may mask the issue: if Cloudflare is caching, users may see old content even when the origin is down
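
The first gotcha suggests a rolling restart. A minimal sketch, assuming `ws1` to `ws4` are SSH aliases for the web nodes and the SSH user may run `sudo systemctl`; `wait_until` and `rolling_restart` are hypothetical helpers. Apache reporting active does not guarantee Varnish has marked the backend healthy, so the fixed 15-second pause after each node is taken from the 5–15 second Varnish delay noted above rather than from a real health signal.

```shell
#!/bin/sh
# Sketch: restart Apache one node at a time so the others stay in rotation.
# ASSUMPTIONS: ws1..ws4 are SSH aliases; the SSH user may `sudo systemctl`.

# Run a command until it succeeds, at most MAX times, DELAY seconds apart.
wait_until() {
  local max="$1" delay="$2" n=1
  shift 2
  until "$@" >/dev/null 2>&1; do
    if [ "$n" -ge "$max" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

rolling_restart() {
  for host in "$@"; do
    echo "Restarting apache2 on $host"
    ssh "$host" sudo systemctl restart apache2 || return 1
    # Up to 12 checks, 5 s apart (about a minute), for Apache to report active.
    if ! wait_until 12 5 ssh "$host" systemctl is-active --quiet apache2; then
      echo "$host did not come back; stopping here" >&2
      return 1
    fi
    # Give Varnish time to route traffic back before the next node.
    sleep 15
  done
}
```

`rolling_restart ws1 ws2 ws3 ws4` stops at the first node that fails to come back, leaving the remaining nodes untouched.
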