
DevOps & Security

Incident Postmortem Writer

Generates a structured, blameless postmortem from incident timelines, alerts, and deploy logs, with root cause analysis, impact assessment, and owned action items. Useful for producing first-draft postmortems under operational pressure.

Built for SRE and platform teams running incident retros, engineering leads writing postmortems under time pressure, compliance teams preparing SOC 2 / ISO 27001 incident-response evidence, and founders who handle incident response for their own small teams.

After an incident, the people with the most context are too tired to write, and the people with the energy to write lack the context. The result is either a postmortem written weeks late, after accuracy has faded, or a shallow one that doesn't surface real learnings. A structured writer takes the raw material (timeline of pages, deploy logs, chat messages) and produces a first-draft postmortem that the team reviews and corrects, cutting the emotional and time cost by 80%.

Nexus Certified · Claude Code · Codex · OpenClaw · Google Antigravity
Tags: postmortems · incident-response · operations · analysis · documentation

One-Time Purchase

$19.99

Sample Output

Postmortem — INC-2026-042: Payments API 5xx Spike

Severity: SEV2 (degraded primary service, no full outage) · Author: @oncall-platform

One-paragraph summary

A deploy to the payments service landed last week with a regression in the idempotency-key cache. The cache returned stale write-state for ~9% of POST /charges requests, causing a 5xx spike that lasted 26 minutes from first alert to rollback. No money moved incorrectly. Mitigation was a rollback; resolution was a forward-fix deployed and verified 26 minutes after the rollback. Detection was fast (1 minute alert-to-page); mitigation was slow (we hesitated to roll back during a peak window).

52 min

Total detect-to-resolve. Detect-to-mitigate (rollback) was 26 minutes; the second 26 minutes was forward-fix + verification. SLO impact: 0.04% of the error budget for the month.


Impact

Dimension | Value
Requests affected | ~14,200 POST /charges with 5xx response
Customers affected | 1,840 unique merchants
Funds at risk | $0 (no incorrect charges)
Downstream services degraded | 2 (Subscriptions retried; Webhooks fell behind by ~8 min)
Error budget consumed | 0.04% of the monthly budget

Timeline (UTC)

What happened, in order

14:11  Deploy of payments-svc v3.2.0 lands in prod (canary 10% → 100% in 8 min)
14:12  Datadog alert fires: 5xx rate on /charges crosses 1.5%
14:13  On-call paged; acknowledges in 47 seconds
14:18  IC declared; #inc-payments-042 channel opened
14:24  Root cause hypothesis: idempotency-cache regression in v3.2.0
14:32  Decision to roll back; deploy queued behind a feature-flag PR
14:38  Rollback complete; 5xx rate returns to baseline within 2 minutes
15:04  Forward-fix deployed and verified; incident resolved
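
The durations quoted in the sample (26 minutes detect-to-mitigate, 52 minutes detect-to-resolve) can be re-derived from the timeline timestamps. The following is a minimal sketch, not part of the skill itself; the dictionary keys and the `minutes_between` helper are hypothetical names chosen for illustration, and it assumes all timestamps fall on the same UTC day.

```python
from datetime import datetime

# Timestamps copied from the sample timeline (UTC, same day).
# The keys are hypothetical labels, not fields the skill is known to emit.
TIMELINE = {
    "deploy": "14:11",
    "alert": "14:12",
    "paged": "14:13",
    "rollback_complete": "14:38",
    "resolved": "15:04",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two HH:MM timestamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


detect_to_mitigate = minutes_between(TIMELINE["alert"], TIMELINE["rollback_complete"])
mitigate_to_resolve = minutes_between(TIMELINE["rollback_complete"], TIMELINE["resolved"])
detect_to_resolve = minutes_between(TIMELINE["alert"], TIMELINE["resolved"])

print(detect_to_mitigate, mitigate_to_resolve, detect_to_resolve)  # 26 26 52
```

Recomputing these figures from the raw timeline is a cheap review step for a first draft, since a headline number that disagrees with the timeline is easy to miss under time pressure.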

Contributing Factors

Factor | Severity | Notes
Idempotency cache returned stale write-state for some keys | High | Direct cause; introduced in v3.2.0
Canary metrics didn't catch the regression | Medium | Cache hit-rate threshold was too lenient; 10% canary was not enough traffic to surface
Rollback hesitation during peak hour | Medium | We waited ~6 minutes debating forward-fix vs rollback; should have been a rollback the moment cause was identified
Runbook didn't list the cache as a possible 5xx source | Low | Slowed root-cause hypothesis by ~4 minutes

Action Items

# | Action | Owner | Due | Priority
1 | Add cache-consistency check to canary gates | @platform-team | This sprint | P0
2 | Tighten 5xx canary threshold; promote at 25% not 10% | @platform-team | This sprint | P0
3 | Update payments runbook: cache as 5xx source | @oncall-platform | Next week | P1
4 | Rollback-first policy doc: cause-clear → rollback within 2 min | @eng-leadership | Two weeks | P1
5 | Idempotency-cache integration test for stale-read scenario | @payments-team | Next sprint | P2

What Went Well

Detection

Alert fired within 60 seconds of the deploy. On-call acked in under a minute. The instrumentation we shipped last quarter is paying for itself.

Comms

Status-page update went out at 14:21 — nine minutes after alert. Customers reported they appreciated the early heads-up even before we had a root cause.


What to Change

Default to rollback

The biggest single time loss was deliberating between rollback and forward-fix while customers were getting 5xx. If a deploy is identified as the cause and the rollback is clean, the rollback should start within 2 minutes of cause identification. The forward-fix is a separate decision.
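
As a rough illustration of the 2-minute rule, the sketch below measures the gap between cause identification and the rollback decision. It is an assumption-laden example: it treats the 14:24 root-cause hypothesis as "cause identified" and the 14:32 rollback decision as "rollback started", and the function names are hypothetical. The sample's own Contributing Factors table estimates the debate itself at ~6 minutes, so the two figures measure slightly different things.

```python
from datetime import datetime, timedelta

# Deadline taken from the "rollback within 2 minutes" rule in the sample.
ROLLBACK_DEADLINE = timedelta(minutes=2)


def rollback_delay(cause_identified: str, rollback_started: str) -> timedelta:
    """Time between two HH:MM timestamps on the same day."""
    fmt = "%H:%M"
    return datetime.strptime(rollback_started, fmt) - datetime.strptime(cause_identified, fmt)


def within_policy(cause_identified: str, rollback_started: str) -> bool:
    """True if the rollback started within the deadline after cause identification."""
    return rollback_delay(cause_identified, rollback_started) <= ROLLBACK_DEADLINE


# Assumed mapping onto the sample timeline: hypothesis at 14:24, decision at 14:32.
delay = rollback_delay("14:24", "14:32")
print(delay, within_policy("14:24", "14:32"))  # 0:08:00 False
```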

Canary gates need teeth

A 10% canary that runs for under 10 minutes will not catch a 9%-of-requests bug at any reasonable threshold. Either expand the canary or add cache-specific signals as gates.
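
The arithmetic behind this point can be sketched as follows. This is a simplified model, not the team's actual gate logic: it assumes the 5xx rate is measured over all traffic, that the baseline 5xx rate is zero, that canary traffic is representative, and it uses the 1.5% production alert threshold from the timeline as a stand-in for a canary gate, which the sample does not specify. The function name is hypothetical.

```python
# Figures taken from the sample postmortem.
FAILURE_RATE_IN_NEW_VERSION = 0.09  # ~9% of POST /charges requests failed
ALERT_THRESHOLD = 0.015             # 5xx alert fired when the rate crossed 1.5%


def aggregate_5xx_rate(canary_fraction: float,
                       failure_rate: float = FAILURE_RATE_IN_NEW_VERSION) -> float:
    """Fleet-wide 5xx rate when only `canary_fraction` of traffic hits the new version.

    Assumes a zero baseline error rate and representative canary traffic.
    """
    return canary_fraction * failure_rate


for fraction in (0.10, 0.25, 1.00):
    rate = aggregate_5xx_rate(fraction)
    fires = rate > ALERT_THRESHOLD
    print(f"{fraction:.0%} canary -> {rate:.2%} aggregate 5xx, alert fires: {fires}")
```

Under these assumptions a 10% canary produces roughly a 0.9% aggregate 5xx rate, which stays below the 1.5% line, while 25% exposure produces about 2.25% and crosses it. That is consistent with the sample's recommendation to either widen the canary or gate on a signal scoped to the canary itself.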


This sample illustrates the skill's output format. Names, numbers, and timelines are illustrative.


All sales final. No refunds on digital products.

Includes support for Claude Code, Codex, OpenClaw, and Google Antigravity in the same license.

Also in Incident Response

Bundle price: $55. Compare this skill with the full workflow bundle or Pro access.

Best for

On-call engineers and SREs who need to ship a postmortem within 48 hours of an incident while context is fresh but writing energy is low. Most useful for teams running a real blameless retro practice where the first draft is meant to be reviewed and corrected, not published as-is.

Not ideal for

A standalone source for regulatory incident disclosures (HIPAA breach notification, GDPR Article 33 reports, public-company material event filings). It is fine to use the output as a first draft for those, but counsel must review and edit before anything that carries legal weight is filed externally.

Included in this purchase

  • Claude Code, Codex, OpenClaw, and Google Antigravity skill files.
  • Setup guidance for the right adapter in your workspace.
  • One-time license for the purchased skill version.

Setup

Plan for a short copy-and-configure setup in your preferred agent workspace. No custom integration is required for the skill file itself.


Related Skills

Incident Response
Outage Response Playbook
Generates structured, role-clear incident response playbooks for specific failure scenarios. Covers detection through resolution and post-mortem — ready to use when an incident actually happens.
Claude Code · Codex · OpenClaw · Google Antigravity
Tags: outage-response · reliability · runbooks

$19.99

One-time license

Security Scanning
OWASP Top 10 Scanner
Scans code for OWASP Top 10 vulnerability patterns including injection, XSS, IDOR, and insecure deserialization with severity ratings and remediation snippets. Useful for pre-commit security checks and enterprise compliance.
Claude Code · Codex · OpenClaw · Google Antigravity
Tags: security · owasp · vulnerabilities

$19.99

One-time license

Security Scanning
Secret Leakage Preventer
Scans code and commits for hardcoded secrets, API keys, connection strings, and credentials, then proposes secure alternatives. Useful for preventing the leading class of AI-era security incidents.
Claude Code · Codex · OpenClaw · Google Antigravity
Tags: security · secrets · credentials

$19.99

One-time license


Future Updates

This purchase includes the current version of the skill. If you want future adapter updates — meaning compatibility and packaging updates as supported platforms evolve — plus new catalog additions included automatically, upgrade to Pro.
