Incident Management

Q: How does OpsPilot handle incident tasks and action items?

OpsPilot Tasks brings all action items onto a unified board connected to the incident that created them. Whether it is a follow-up item, a runbook step converted into an action, or standalone maintenance work, every task lives in one place, so remediation steps are not lost after an incident is marked resolved.

Q: How does OpsPilot integrate incident management with the Service Catalog?

Every incident in OpsPilot is automatically contextualized using the Service Catalog, which stores ownership, service tiers, dependencies, and runbooks for every microservice. This gives Coworker AI the foundational context it needs to auto-triage incidents faster and run more targeted investigations.
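As a rough illustration of the kind of record described above, the following Python sketch models a catalog entry and walks its dependencies to find everything upstream of an affected service. The `CatalogEntry` class, its field names, the `upstream_of` helper, and the sample services are all hypothetical; this is not OpsPilot's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; field names are illustrative only."""
    name: str
    owner: str
    tier: int
    dependencies: list[str] = field(default_factory=list)
    runbooks: list[str] = field(default_factory=list)


def upstream_of(catalog: dict[str, CatalogEntry], service: str) -> set[str]:
    """Return every service that `service` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = list(catalog[service].dependencies)
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; also guards against dependency cycles
        seen.add(dep)
        if dep in catalog:
            stack.extend(catalog[dep].dependencies)
    return seen


# Made-up sample data for the sketch.
catalog = {
    "checkout": CatalogEntry("checkout", "payments-team", 1, ["payments-api"], ["restart-checkout"]),
    "payments-api": CatalogEntry("payments-api", "payments-team", 1, ["postgres"]),
    "postgres": CatalogEntry("postgres", "data-platform", 1),
}
```

With a structure like this, an incident on `checkout` can be linked to its owner, its runbooks, and the upstream services worth investigating first.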

Question 1

What is incident management in OpsPilot?

Answer

OpsPilot incident management is a centralized, high-velocity response environment built into the OpsPilot AI observability platform. It provides a standardized lifecycle (Triage → Respond → Resolve → Closed), a four-tier severity system (SEV-1 through SEV-4), a live activity timeline, a contextual sidebar with active runbooks and SLA budgets, and a post-mortem gatekeeper, all in a single workspace.

Question 2

What are SEV-1, SEV-2, SEV-3, and SEV-4 severity levels?

Answer

OpsPilot uses four severity levels. SEV-1 is a critical outage with full business impact requiring an all-hands response. SEV-2 is major degradation with key functionality impaired, requiring immediate action. SEV-3 is partial impact affecting non-critical systems, handled with a monitored response. SEV-4 is a minor issue with low user impact, scheduled for investigation. Because the levels are defined once for the whole organization, every engineer responds to a given severity with the same urgency.
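One way to picture the four levels is as an ordered enumeration mapped to the expected response. The sketch below is illustrative only: the `Severity` enum, the `RESPONSE` table, and the `more_urgent` helper are hypothetical names, and only the level descriptions come from the text above.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Lower number means higher urgency."""
    SEV1 = 1  # critical outage, full business impact
    SEV2 = 2  # major degradation, key functionality impaired
    SEV3 = 3  # partial impact on non-critical systems
    SEV4 = 4  # minor issue, low user impact


# Expected response per level, as described in the answer above.
RESPONSE = {
    Severity.SEV1: "all-hands response",
    Severity.SEV2: "immediate action",
    Severity.SEV3: "monitored response",
    Severity.SEV4: "scheduled for investigation",
}


def more_urgent(a: Severity, b: Severity) -> Severity:
    """Return whichever of two severities demands the faster response."""
    return min(a, b)
```

Using an ordered type means severities can be compared and sorted directly, for example when deciding which of two open incidents to staff first.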

Question 3

How does OpsPilot's live activity timeline work?

Answer

The live activity timeline tracks every state change, internal communication, and automated alert in real time within a single incident workspace. It creates an immutable audit trail so any engineer joining mid-incident can get up to speed immediately. Entries are logged automatically by the OpsPilot alerting engine and Coworker AI, alongside manual updates from the incident team.
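The idea of an append-only, immutable timeline can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not OpsPilot's implementation: the `TimelineEntry` and `Timeline` classes, the `source` and `kind` labels, and the sample messages are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an entry cannot be modified once written
class TimelineEntry:
    at: datetime
    source: str  # assumed labels, e.g. "alerting-engine", "coworker-ai", "manual"
    kind: str    # assumed labels, e.g. "state_change", "message", "alert"
    text: str


class Timeline:
    """Append-only log: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: list[TimelineEntry] = []

    def append(self, source: str, kind: str, text: str) -> TimelineEntry:
        entry = TimelineEntry(datetime.now(timezone.utc), source, kind, text)
        self._entries.append(entry)
        return entry

    def entries(self) -> tuple[TimelineEntry, ...]:
        # Return a tuple so callers cannot mutate the underlying list.
        return tuple(self._entries)


timeline = Timeline()
timeline.append("alerting-engine", "alert", "p99 latency above threshold")
timeline.append("manual", "state_change", "Triage -> Respond")
```

An engineer joining mid-incident would read `timeline.entries()` from the top to see what happened, in order, without being able to alter the record.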

Question 4

What does the contextual sidebar show during an incident?

Answer

The contextual sidebar displays active runbooks with live step progress, linked services from the Service Catalog, real-time SLA budget with breach warnings, and incident ownership — all without leaving the incident workspace. It is designed to eliminate tab switching and context loss during high-pressure events.

Question 5

What is the post-mortem gatekeeper?

Answer

The post-mortem gatekeeper prevents incidents from being closed before remediation has been accounted for. When an incident moves to resolved, OpsPilot opens a post-mortem editor tied to the incident record, so that root cause documentation, follow-up actions, and SLA budget notes are tracked and completed rather than abandoned once the immediate fire is out.
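The gatekeeper behaves like a guard on the final lifecycle transition. The sketch below assumes the four stages named elsewhere in this FAQ (Triage, Respond, Resolve, Closed) and assumes that all three post-mortem parts must be non-empty before closing; the `PostMortem` and `Incident` classes and the exact completeness rule are hypothetical, not OpsPilot's actual logic.

```python
from dataclasses import dataclass, field

LIFECYCLE = ["Triage", "Respond", "Resolve", "Closed"]


@dataclass
class PostMortem:
    root_cause: str = ""
    follow_up_actions: list[str] = field(default_factory=list)
    sla_budget_notes: str = ""

    def is_complete(self) -> bool:
        # Assumed rule: every section must contain something.
        return bool(self.root_cause and self.follow_up_actions and self.sla_budget_notes)


@dataclass
class Incident:
    state: str = "Triage"
    post_mortem: PostMortem = field(default_factory=PostMortem)

    def advance(self) -> str:
        """Move to the next lifecycle stage, refusing to close without a post-mortem."""
        index = LIFECYCLE.index(self.state)
        if index == len(LIFECYCLE) - 1:
            raise ValueError("incident is already closed")
        next_state = LIFECYCLE[index + 1]
        if next_state == "Closed" and not self.post_mortem.is_complete():
            raise ValueError("post-mortem incomplete; incident cannot be closed")
        self.state = next_state
        return self.state
```

Placing the check on the transition itself, rather than on a separate checklist, is what makes the post-mortem a gate instead of a reminder.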

Question 6

When does OpsPilot send incident notifications?

Answer

OpsPilot sends precision notifications at three key moments: when an SLA budget is approaching a breach threshold, when an upstream dependency impacts a service you own, and when a critical task is assigned to you. Notifications are delivered via Slack, Microsoft Teams, or wherever your team works, and only when your attention is genuinely required.
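The three triggers can be read as a simple filter over events. The following Python sketch is only an illustration: the `Event` fields, the `should_notify` function, and the 20% default for `warn_below` are assumptions made for the example, not documented OpsPilot behavior.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str                      # "sla_budget", "dependency_impact", or "task_assigned"
    recipient: str                 # person who would be notified
    budget_remaining: float = 1.0  # fraction of SLA budget left (used by "sla_budget")
    service_owner: str = ""        # owner of the impacted service (used by "dependency_impact")
    critical: bool = False         # whether the task is critical (used by "task_assigned")


def should_notify(event: Event, warn_below: float = 0.2) -> bool:
    """Return True only for the three notification moments described above."""
    if event.kind == "sla_budget":
        # Budget is approaching a breach threshold (assumed: 20% remaining).
        return event.budget_remaining <= warn_below
    if event.kind == "dependency_impact":
        # Only the owner of the impacted service is notified.
        return event.service_owner == event.recipient
    if event.kind == "task_assigned":
        return event.critical
    return False  # everything else stays silent
```

For example, `should_notify(Event("sla_budget", "alice", budget_remaining=0.1))` is true, while a non-critical task assignment for the same person is not.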

Question 7

What is Coworker AI and how does it help with incident response?

Answer

Coworker is OpsPilot's AI SRE teammate. During incidents it investigates alerts, correlates metrics, logs, and traces, and surfaces root cause analysis directly in the incident workspace. Autonomous incident response is in development: Coworker will proactively spin up incidents, dynamically assign tasks, page the correct service owners, and autonomously execute remediation steps to stop outages before they escalate.

Question 8

Does OpsPilot replace PagerDuty or other incident tools?

Answer

OpsPilot is an AI-powered observability and site reliability platform with full incident management built in — covering alerting, incident lifecycle, tasks, post-mortems, and SLA tracking. Teams using Datadog, Dynatrace, or New Relic who want to consolidate incident tooling with AI-driven investigation can replace standalone tools like PagerDuty as part of an OpsPilot adoption.

Question 9

What observability tools does OpsPilot work with?

Answer

OpsPilot ingests telemetry via OpenTelemetry (OTLP), Prometheus remote_write, and the FusionReactor APM agent. It works alongside Grafana, Prometheus, Datadog, Dynatrace, and New Relic — no rip-and-replace required. Incident context is automatically enriched with the metrics, logs, and traces already flowing into OpsPilot.

Question 10

Is incident management available on all OpsPilot plans?

Answer

Yes. All Incidents features — including the live activity timeline, contextual sidebar, post-mortem gatekeeper, tasks board, and precision notifications — are live and rolled out to every OpsPilot workspace. No additional configuration or agent migration is required.

Incident Management

Stop fighting fires.
Start controlling them.

One shared language
for every incident.

Everything that happened.
In order. In real time.

Everything you need.
Right where you are.

Built for the full
incident lifecycle.

Faster response starts
today.

Frequently asked questions
