OpenTelemetry - OTEL

But why OpenTelemetry?

OpenTelemetry is the second most active project in the CNCF,
with only Kubernetes being more active.

No Vendor Lock-in

Using an open standard keeps you from being tied to one vendor.

Easy to use

Language SDKs and auto-instrumentation let you start collecting telemetry with minimal code changes.

All Use Cases

OpenTelemetry covers metrics, logs, and traces in a single framework, so one toolkit serves all your telemetry needs.

Standardized Observability

One standard for all telemetry signals improves developer efficiency and keeps practices consistent across teams.

OpenTelemetry Coverage — OpsPilot

Everything OpsPilot reads from your stack

OpsPilot ingests all four OTEL signal types and cross-correlates them to deliver expert-level analysis. Connect once via OpenTelemetry and get continuous AI-powered recommendations across your entire infrastructure.

4 Signal Types
6 Analysis Domains
50+ Technologies
24/7 Continuous Analysis
How data flows through OpsPilot
01 📡 Instrument: Add OTEL SDK or auto-instrumentation
02 📦 Collect: OTEL Collector aggregates signals
03 🧠 Analyze: OpsPilot AI runs on your schedule
04 📊 Interpret: Cross-correlates all signals for context
05 💬 Deliver: Prioritized actions arrive in Slack
OTEL Signal Types
6 data types (4 signals, plus spans and events within traces)
Metrics

Numerical measurements over time. OpsPilot analyzes counters, gauges, and histograms to surface performance bottlenecks, cost waste, and degradation trends invisible to the human eye.

counters, gauges, histograms, summaries, up-down counters, observable gauges
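To illustrate how a histogram can reveal a latency percentile, here is a minimal Python sketch that estimates a quantile from bucket counts by linear interpolation. It assumes the OpenTelemetry explicit-bucket layout (N upper bounds, N+1 counts, the last being the overflow bucket) and a lower edge of 0 for the first bucket. The function name and sample numbers are hypothetical, and this is not a description of OpsPilot's internals.

```python
from typing import Sequence


def estimate_quantile(bounds: Sequence[float], counts: Sequence[int], q: float) -> float:
    """Estimate the q-quantile (0 < q <= 1) of an explicit-bucket histogram.

    bounds: upper boundaries of the finite buckets, in ascending order.
    counts: one count per finite bucket plus a final overflow bucket.
    """
    if len(counts) != len(bounds) + 1:
        raise ValueError("counts must have len(bounds) + 1 entries")
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")

    target = q * total  # rank of the observation we are looking for
    seen = 0
    for i, count in enumerate(counts):
        if count and seen + count >= target:
            if i == len(bounds):
                # Overflow bucket has no upper edge: report the last finite bound.
                return float(bounds[-1])
            lower = bounds[i - 1] if i > 0 else 0.0  # assumption: values are non-negative
            upper = bounds[i]
            # Interpolate linearly inside the bucket that contains the target rank.
            return lower + (upper - lower) * (target - seen) / count
        seen += count
    return float(bounds[-1])


# Hypothetical request latencies in milliseconds.
latency_bounds = [100, 250, 500, 1000]
latency_counts = [50, 30, 15, 4, 1]
p99 = estimate_quantile(latency_bounds, latency_counts, 0.99)
```

The estimate is only as precise as the bucket boundaries, which is why bucket layout matters when a histogram is meant to track tail latency.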
Logs

Structured and unstructured event records. OpsPilot mines log patterns, error rates, and severity distributions to detect anomalies and coverage gaps across your services.

structured JSON, error logs, audit logs, severity levels, log attributes, resource logs
Traces

End-to-end request journeys across services. OpsPilot maps trace topology, identifies latency hotspots, and detects services missing from your instrumentation coverage.

distributed traces, trace context, parent-child spans, trace IDs, baggage, sampling
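Trace context, trace IDs, and the sampling decision typically travel between services in the W3C `traceparent` header, which OpenTelemetry uses as its default propagation format. The Python sketch below parses that header; the `TraceContext` type and `parse_traceparent` function are hypothetical names, and the parser deliberately accepts only the four-field layout.

```python
import re
from typing import NamedTuple, Optional

# version - trace-id (32 hex) - parent span id (16 hex) - flags (2 hex), all lowercase
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the parsed context, or None if the header is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version "ff" and all-zero identifiers are invalid per the W3C specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    # Bit 0 of the flags byte carries the "sampled" decision.
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

A service that drops or mangles this header breaks the parent-child link, which is how incomplete trace propagation shows up in the data.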
Spans

Individual units of work within a trace. OpsPilot analyzes span duration, status codes, and attribute completeness to pinpoint exactly where time is being spent.

span duration, span status, span events, span attributes, span kind, db statements
Events

Point-in-time occurrences attached to spans. OpsPilot tracks exception events, message events, and custom annotations to reconstruct root cause timelines.

exception events, message events, custom events, timestamps
Profiles

Continuous profiling data where available. OpsPilot correlates CPU, memory, and goroutine profiles with trace anomalies to surface deep performance inefficiencies.

CPU profiles, memory profiles, goroutines, flamegraphs (beta)
What OpsPilot analyzes
6 domains

Performance Optimization

From metrics + spans + traces
  • P99 latency regressions in critical paths
  • Slow database queries identified from span attributes
  • N+1 query patterns detected across trace topology
  • Connection pool saturation and thread contention
  • Cache hit rate degradation over rolling windows
  • API timeout patterns and downstream dependency lag
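As one example of the checks listed above, an N+1 query pattern can be spotted by counting near-identical database statements issued under the same parent span. The Python sketch below assumes spans are plain dictionaries with `trace_id`, `parent_id`, and an `attributes` map holding `db.statement`. The field names, the threshold, and the normalization rules are illustrative assumptions, not OpsPilot's actual detection logic.

```python
import re
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def normalize(statement: str) -> str:
    """Replace literals with '?' so queries that differ only in parameters compare equal."""
    statement = re.sub(r"'[^']*'", "?", statement)   # quoted string literals
    statement = re.sub(r"\b\d+\b", "?", statement)   # standalone numeric literals
    return " ".join(statement.split()).lower()        # collapse whitespace and case


def find_n_plus_one(spans: Iterable[Mapping], threshold: int = 10) -> List[Tuple[str, str, str, int]]:
    """Return (trace_id, parent_id, statement, count) for statements repeated >= threshold times."""
    counts: Counter = Counter()
    for span in spans:
        statement = span.get("attributes", {}).get("db.statement")
        if statement:
            key = (span["trace_id"], span.get("parent_id"), normalize(statement))
            counts[key] += 1
    return [(t, p, s, n) for (t, p, s), n in counts.items() if n >= threshold]


# Hypothetical trace: one request that loads orders one row at a time.
spans = [
    {"trace_id": "t1", "parent_id": "root",
     "attributes": {"db.statement": f"SELECT * FROM orders WHERE user_id = {i}"}}
    for i in range(47)
]
findings = find_n_plus_one(spans)
```

Grouping by parent span keeps unrelated requests from being lumped together, at the cost of missing repetition that is spread across sibling parents.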

💰 Cost Optimization

From metrics + resource attributes
  • Over-provisioned Kubernetes pods and nodes
  • Unused Lambda functions with provisioned concurrency
  • Idle container replicas during low-traffic periods
  • Log verbosity waste (DEBUG in production)
  • Redundant trace sampling at excessive rates
  • Unused or stale metric time series

🔥 Error Rate Analysis

From logs + span status + events
  • Error rate spikes correlated across services
  • New exception types not seen in baseline
  • Retry storm patterns degrading downstream services
  • 5xx cascades tracing back to root cause span
  • Silent failures with no span error attribute set
  • Error budget burn rate against SLO thresholds
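The error budget burn rate in the last item is commonly defined as the observed error rate divided by the error rate the SLO allows. A minimal Python sketch of that arithmetic follows; the function name and the sample figures are hypothetical.

```python
def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """Ratio of the observed error rate to the error rate allowed by the SLO.

    A value of 1.0 spends the budget exactly over the SLO window;
    values above 1.0 exhaust it early.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    allowed_error_rate = 1.0 - slo_target  # the error budget as a fraction of requests
    return (errors / total) / allowed_error_rate


# Hypothetical window: 50 failed requests out of 10,000 against a 99.9% SLO.
rate = burn_rate(errors=50, total=10_000, slo_target=0.999)  # roughly 5x the sustainable rate
```

Comparing burn rates over a short and a long window is a common way to separate a brief spike from sustained degradation.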

🔍 Observability Gap Detection

From trace topology + log coverage
  • Services present in traces but emitting no logs
  • Spans missing essential attributes (db.statement, etc.)
  • Critical flows with incomplete trace propagation
  • Services with no health or readiness metrics
  • Missing SLI metrics for key user journeys
  • Alert coverage gaps on high-error-rate endpoints
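Two of the gaps above reduce to simple set comparisons: services that appear in traces but not in logs, and spans that lack expected attributes. The Python sketch below shows both, assuming service names come from the `service.name` resource attribute. The required-attribute list is an assumed policy and the function names are hypothetical.

```python
from typing import Iterable, List, Mapping, Sequence

# Assumption: the attributes a team expects on every database span.
REQUIRED_DB_ATTRIBUTES = ("db.system", "db.statement")


def services_without_logs(span_services: Iterable[str], log_services: Iterable[str]) -> List[str]:
    """Services seen in trace data that emitted no log records."""
    return sorted(set(span_services) - set(log_services))


def missing_attributes(span: Mapping, required: Sequence[str] = REQUIRED_DB_ATTRIBUTES) -> List[str]:
    """Names of required attributes that are absent from a span."""
    attributes = span.get("attributes", {})
    return [name for name in required if name not in attributes]


# Hypothetical inputs: service.name values taken from span and log resources.
silent = services_without_logs(
    ["checkout-service", "payment-service", "payment-service"],
    ["checkout-service"],
)
gaps = missing_attributes({"attributes": {"db.system": "postgresql"}})
```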

🔔 Alerting Effectiveness

From metrics + historical patterns
  • Noisy alerts with low signal-to-noise ratio
  • Flapping alerts that never resolve cleanly
  • Missing alerts on services with elevated error rates
  • Static thresholds that don't adapt to traffic patterns
  • Duplicate alert coverage on the same symptom
  • Alerts with no runbook or remediation guidance
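A flapping alert, for instance, can be identified by counting how often its state changes within a recent window. The Python sketch below assumes alert history is available as `(timestamp, state)` pairs. The state names, window, and threshold are illustrative assumptions rather than OpsPilot settings.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def count_state_changes(events: Iterable[Tuple[datetime, str]], now: datetime, window: timedelta) -> int:
    """Count transitions between differing states that occurred within [now - window, now]."""
    cutoff = now - window
    changes = 0
    previous = None
    for timestamp, state in sorted(events):  # process in chronological order
        if previous is not None and state != previous and cutoff <= timestamp <= now:
            changes += 1
        previous = state
    return changes


def is_flapping(events: Iterable[Tuple[datetime, str]], now: datetime,
                window: timedelta, max_changes: int) -> bool:
    """An alert is treated as flapping when it changes state more than max_changes times."""
    return count_state_changes(events, now, window) > max_changes


# Hypothetical history: the alert toggles every five minutes.
start = datetime(2026, 2, 24, 9, 0)
history = [(start + timedelta(minutes=5 * i), "firing" if i % 2 == 0 else "resolved") for i in range(6)]
flapping = is_flapping(history, now=start + timedelta(minutes=25),
                       window=timedelta(hours=1), max_changes=4)
```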

🛡️ Security Posture

From logs + span attributes + traces
  • Anomalous authentication failure spike patterns
  • Unusual service-to-service call patterns in traces
  • Sensitive data leaking through log attributes
  • Services calling deprecated or unpatched endpoints
  • Unexpected egress in service topology
  • Audit log coverage gaps on sensitive operations
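Sensitive data leaking through log attributes is one check that is easy to picture: scan each attribute value against patterns for secrets and personal data. The Python sketch below uses three deliberately simple regular expressions. The patterns, labels, and attribute names are illustrative assumptions, and a real scanner would need broader, better-tuned rules.

```python
import re
from typing import List, Mapping, Tuple

# Assumption: a small, illustrative pattern set; not exhaustive and prone to false positives.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def scan_attributes(attributes: Mapping[str, object]) -> List[Tuple[str, str]]:
    """Return (attribute key, pattern label) for every value that matches a pattern."""
    findings = []
    for key, value in attributes.items():
        text = str(value)
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                findings.append((key, label))
    return findings


# Hypothetical log record attributes.
record = {
    "user.email": "alice@example.com",
    "http.request.header.authorization": "Bearer abc.def.ghi",
    "message": "order 42 shipped",
}
leaks = scan_attributes(record)
```

Reporting the attribute key rather than the matched value avoids copying the sensitive data into the finding itself.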
Supported Technologies
50+ integrations
☁️ Cloud Platforms & Infra (10 integrations)
AWS Lambda
AWS ECS / Fargate
AWS EC2
AWS RDS / Aurora
Google Cloud Run
Google GKE
Azure AKS
Azure Functions
Cloudflare Workers
DigitalOcean Droplets
⚙️ Kubernetes & Containers (7 integrations)
Kubernetes
Docker
Helm
Istio Service Mesh
Envoy Proxy
Linkerd
Karpenter
🗄️ Databases & Data Stores (10 integrations)
PostgreSQL
MySQL / MariaDB
MongoDB
Redis
Elasticsearch
Cassandra
CockroachDB
DynamoDB
ClickHouse
Snowflake
🔧 Backend Languages & Runtimes (9 integrations)
Node.js
Python
Go
Java / JVM
Ruby on Rails
.NET / C#
Rust
PHP
Elixir / Erlang
📨 Messaging & Queues (7 integrations)
Apache Kafka
RabbitMQ
AWS SQS / SNS
Google Pub/Sub
NATS
Apache Pulsar
Celery (Python)
🌐 Web Frameworks & APIs (9 integrations)
Express.js
FastAPI
Django
Spring Boot
gRPC
GraphQL
Nginx
Kong Gateway
Traefik
Example Slack analysis output
what you receive
#ops-insights — OpsPilot Daily Digest
// ─── OpsPilot Daily Digest · 2026-02-24 09:00 UTC ──────────────────
 
📊 Stack Health Score: 87/100 (+3 this week · better than 78% of similar teams)
 
// ─── HIGH PRIORITY ──────────────────────────────────────────────────
 
🔴 [HIGH] Performance · checkout-service
   Source: traces + db spans
   Finding: P99 latency 4.2s → 12.8s (+205%) since deploy 3h ago
   Cause: N+1 query in OrderRepository.findByUser() — 47 queries per request
   Fix: Add JOIN FETCH or batch load. Effort: ~30 min. Impact: -8s latency
 
// ─── MEDIUM PRIORITY ─────────────────────────────────────────────────
 
🟡 [MEDIUM] Cost · notification-service
   Source: metrics + resource attributes
   Finding: 8 Lambda functions with provisioned concurrency, 0 invocations in 30d
   Fix: Remove provisioned concurrency. Effort: 15 min. Savings: $180/month
 
// ─── LOW PRIORITY ────────────────────────────────────────────────────
 
🟢 [LOW] Gap Detection · payment-service
   Source: trace topology analysis
   Finding: payment-service appears in traces but emits no structured logs
   Fix: Add OTEL log SDK. Critical path — blind during incidents.
 
// ─── 3 recommendations · next analysis in 23h 58m ───────────────────

Connect your stack in under 10 minutes

Point your OpenTelemetry Collector to OpsPilot's endpoint and start receiving AI-powered analysis on your schedule — hourly, daily, or weekly.
