OpenTelemetry - OTEL

But why OpenTelemetry?

OpenTelemetry is the second most active project in the CNCF,
with only Kubernetes being more active.

No Vendor Lock-in

Using an open standard keeps you from being tied to one vendor.

Easy to use

Language SDKs and zero-code auto-instrumentation let you start collecting telemetry with minimal changes to your application code.

All Use Cases

OpenTelemetry covers traces, metrics, and logs in one framework, so a single toolchain serves all of your telemetry needs.

Standardized Observability

One standard for all telemetry signals boosts developer efficiency and teamwork consistency.

OpenTelemetry Coverage - OpsPilot

Everything OpsPilot reads from your stack

OpsPilot ingests all four OTEL signal types and cross-correlates them to deliver expert-level analysis. Connect once via OpenTelemetry and get continuous AI-powered recommendations across your entire infrastructure.

4 Signal Types
6 Analysis Domains
50+ Technologies
24/7 Continuous Analysis
How data flows through OpsPilot
01 Instrument: Add the OTEL SDK or auto-instrumentation
02 Collect: The OTel Collector aggregates signals
03 Analyze: OpsPilot AI runs on your schedule
04 Interpret: Cross-correlates all signals for context
05 Deliver: Prioritized actions arrive in Slack
OTEL Signals and Data Types
6 data types across 4 signals
Metrics

Numerical measurements over time. OpsPilot analyzes counters, gauges, and histograms to surface performance bottlenecks, cost waste, and degradation trends too gradual to spot on a dashboard.

counters · gauges · histograms · summaries · up-down counters · observable gauges
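The instrument kinds listed above differ mainly in how their measurements aggregate. The sketch below is a minimal, dependency-free illustration of those semantics. It does not use the OpenTelemetry SDK, and the class names are hypothetical; the histogram follows OTel's explicit-bucket layout, where each bucket includes its upper bound.

```python
from bisect import bisect_left


class SumCounter:
    """Monotonic sum: only non-negative increments are accepted."""

    def __init__(self) -> None:
        self.total = 0.0

    def add(self, amount: float) -> None:
        if amount < 0:
            raise ValueError("counters are monotonic; use an up-down counter")
        self.total += amount


class LastValueGauge:
    """Last value wins: each measurement replaces the previous one."""

    def __init__(self) -> None:
        self.value = None

    def record(self, value: float) -> None:
        self.value = value


class ExplicitBucketHistogram:
    """Counts measurements per bucket, plus a running sum and count."""

    def __init__(self, bounds: list[float]) -> None:
        self.bounds = bounds
        # One bucket per boundary plus an overflow bucket above the last bound.
        self.counts = [0] * (len(bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def record(self, value: float) -> None:
        # bisect_left puts a value equal to bounds[i] into bucket i,
        # so bucket i covers (bounds[i-1], bounds[i]].
        self.counts[bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1


requests = SumCounter()
requests.add(3)
requests.add(2)
cpu = LastValueGauge()
cpu.record(0.42)
cpu.record(0.37)
latency = ExplicitBucketHistogram([0.1, 0.5, 1.0])
for v in (0.05, 0.1, 0.3, 2.0):
    latency.record(v)
print(requests.total, cpu.value, latency.counts)  # 5.0 0.37 [2, 1, 0, 1]
```

A gauge keeps only the latest reading, while a histogram keeps the distribution. This is why latency regressions such as a P99 jump are read from histograms rather than gauges.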
Logs

Structured and unstructured event records. OpsPilot mines log patterns, error rates, and severity distributions to detect anomalies and coverage gaps across your services.

structured JSON · error logs · audit logs · severity levels · log attributes · resource logs
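The OTel log data model assigns each record a SeverityNumber from 1 to 24, grouped in fours: TRACE, DEBUG, INFO, WARN, ERROR, FATAL. The sketch below computes a severity distribution and an error rate from that number. It assumes log records are plain dicts with a hypothetical `severity_number` key, which is a simplification of the OTLP record, not its wire format.

```python
from collections import Counter

# OTel SeverityNumber ranges from the log data model.
SEVERITY_RANGES = [
    (1, 4, "TRACE"),
    (5, 8, "DEBUG"),
    (9, 12, "INFO"),
    (13, 16, "WARN"),
    (17, 20, "ERROR"),
    (21, 24, "FATAL"),
]


def severity_text(number: int) -> str:
    for low, high, name in SEVERITY_RANGES:
        if low <= number <= high:
            return name
    return "UNSPECIFIED"  # 0 or out-of-range values


def summarize(records: list[dict]) -> tuple[Counter, float]:
    """Return (count per severity level, fraction of records at ERROR or FATAL)."""
    levels = Counter(severity_text(r.get("severity_number", 0)) for r in records)
    errors = levels["ERROR"] + levels["FATAL"]
    rate = errors / len(records) if records else 0.0
    return levels, rate


logs = [
    {"severity_number": 9, "body": "order created"},
    {"severity_number": 9, "body": "order created"},
    {"severity_number": 13, "body": "slow upstream"},
    {"severity_number": 17, "body": "payment declined"},
]
levels, error_rate = summarize(logs)
print(dict(levels), error_rate)  # {'INFO': 2, 'WARN': 1, 'ERROR': 1} 0.25
```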
Traces

End-to-end request journeys across services. OpsPilot maps trace topology, identifies latency hotspots, and detects services missing from your instrumentation coverage.

distributed traces · trace context · parent-child spans · trace IDs · baggage · sampling
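Trace topology can be derived from parent-child links. Whenever a span's parent belongs to a different service, that link is a call from one service to another. The sketch below is a minimal illustration, not OpsPilot's implementation. It assumes spans are flattened into hypothetical `Span` records with `span_id`, `parent_id`, and `service` fields, standing in for the OTLP span ID, parent span ID, and `service.name` resource attribute.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str


def service_edges(spans: list[Span]) -> Counter:
    """Count caller -> callee service edges within one trace."""
    by_id = {s.span_id: s for s in spans}
    edges: Counter = Counter()
    for span in spans:
        parent = by_id.get(span.parent_id) if span.parent_id else None
        # Only cross-service parent/child links count as calls between services.
        if parent is not None and parent.service != span.service:
            edges[(parent.service, span.service)] += 1
    return edges


trace = [
    Span("a", None, "frontend"),
    Span("b", "a", "checkout-service"),
    Span("c", "b", "checkout-service"),
    Span("d", "c", "payment-service"),
    Span("e", "b", "inventory-service"),
]
edges = service_edges(trace)
for (caller, callee), calls in sorted(edges.items()):
    print(f"{caller} -> {callee}: {calls}")
```

Spans whose parent is missing from the collected trace are skipped rather than guessed at. A high count of such orphans is itself a sign of broken context propagation.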
Spans

Individual units of work within a trace. OpsPilot analyzes span duration, status codes, and attribute completeness to pinpoint exactly where time is being spent.

span duration · span status · span events · span attributes · span kind · db statements
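To find where time is spent, a common approach is to compute self time: the part of a span's duration not covered by its children. Children can run concurrently, so the sketch below merges overlapping child intervals before subtracting. It is a minimal illustration that assumes hypothetical `(start, end)` tuples in milliseconds. Real OTLP spans carry nanosecond start and end timestamps.

```python
def self_time(parent: tuple[float, float], children: list[tuple[float, float]]) -> float:
    """Parent duration not covered by any child interval (children clipped to the parent)."""
    p_start, p_end = parent
    # Clip each child to the parent's window and drop empty intervals.
    clipped = sorted(
        (max(s, p_start), min(e, p_end))
        for s, e in children
        if min(e, p_end) > max(s, p_start)
    )
    covered = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            # Disjoint interval: close out the previous merged run.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping interval: extend the current merged run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return (p_end - p_start) - covered


# A 100 ms handler with two overlapping 30 ms calls and one later 20 ms call.
print(self_time((0, 100), [(10, 40), (20, 50), (70, 90)]))  # 40.0
```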
Events

Point-in-time occurrences attached to spans. OpsPilot tracks exception events, message events, and custom annotations to reconstruct root cause timelines.

exception events · message events · custom events · timestamps
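OTel semantic conventions record exceptions as span events named `exception`, with `exception.type` and `exception.message` attributes. The sketch below merges those events across a trace's spans into a single time-ordered timeline. The dictionary layout is a hypothetical simplification of the OTLP span and event structures, and the services and messages are invented for illustration.

```python
def exception_timeline(spans: list[dict]) -> list[tuple[int, str, str, str]]:
    """Collect exception events from every span and sort them by timestamp."""
    timeline = []
    for span in spans:
        for event in span.get("events", []):
            if event["name"] != "exception":
                continue  # skip message events and custom annotations
            attrs = event.get("attributes", {})
            timeline.append(
                (
                    event["time_unix_nano"],
                    span["service"],
                    attrs.get("exception.type", "?"),
                    attrs.get("exception.message", ""),
                )
            )
    return sorted(timeline)  # earliest exception first


spans = [
    {"service": "checkout-service", "events": [
        {"name": "exception", "time_unix_nano": 2_000,
         "attributes": {"exception.type": "TimeoutError",
                        "exception.message": "payment call timed out"}},
    ]},
    {"service": "payment-service", "events": [
        {"name": "retry", "time_unix_nano": 1_200, "attributes": {}},
        {"name": "exception", "time_unix_nano": 1_500,
         "attributes": {"exception.type": "ConnectionError",
                        "exception.message": "db pool exhausted"}},
    ]},
]
for ts, service, exc_type, message in exception_timeline(spans):
    print(ts, service, exc_type, message)
```

Ordering by timestamp puts the downstream `ConnectionError` ahead of the upstream `TimeoutError` it caused, which is the starting point for a root-cause timeline.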
Profiles

Continuous profiling data where available. OpsPilot correlates CPU, memory, and goroutine profiles with trace anomalies to surface deep performance inefficiencies.

CPU profiles · memory profiles · goroutines · flame graphs (beta)
What OpsPilot analyzes
6 domains

Performance Optimization

From metrics + spans + traces
  • P99 latency regressions in critical paths
  • Slow database queries identified from span attributes
  • N+1 query patterns detected across trace topology
  • Connection pool saturation and thread contention
  • Cache hit rate degradation over rolling windows
  • API timeout patterns and downstream dependency lag
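In a trace, an N+1 pattern appears as one parent span with many sibling database spans that run the same statement shape. The sketch below is a minimal illustration, not OpsPilot's detector. It assumes hypothetical span dicts with `parent_id` and `db.statement` keys, and a naive normalization that replaces quoted strings and numbers with `?`. The threshold of 10 is arbitrary.

```python
import re
from collections import Counter

# Replace quoted strings and numbers so "id = 7" and "id = 9" normalize to the same shape.
_LITERALS = re.compile(r"'[^']*'|\b\d+\b")


def normalize(statement: str) -> str:
    return _LITERALS.sub("?", statement).strip()


def find_n_plus_one(spans: list[dict], threshold: int = 10) -> list[tuple[str, str, int]]:
    """Return (parent_id, statement shape, count) for repeated DB calls under one parent."""
    groups = Counter(
        (s["parent_id"], normalize(s["db.statement"]))
        for s in spans
        if "db.statement" in s
    )
    return [(parent, shape, n) for (parent, shape), n in groups.items() if n >= threshold]


spans = [{"parent_id": "req-1", "db.statement": "SELECT * FROM orders WHERE user_id = 42"}]
spans += [
    {"parent_id": "req-1", "db.statement": f"SELECT * FROM items WHERE order_id = {i}"}
    for i in range(47)
]
print(find_n_plus_one(spans))
# [('req-1', 'SELECT * FROM items WHERE order_id = ?', 47)]
```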

Cost Optimization

From metrics + resource attributes
  • Over-provisioned Kubernetes pods and nodes
  • Unused Lambda functions with provisioned concurrency
  • Idle container replicas during low-traffic periods
  • Log verbosity waste (DEBUG in production)
  • Trace sampling rates set higher than needed, inflating ingest volume
  • Unused or stale metric time series
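Several of these checks amount to joining a resource inventory with usage metrics over a lookback window. The sketch below covers the provisioned-concurrency case. It assumes two hypothetical inputs: a map from function name to provisioned concurrency, and a map from function name to invocation count over the window.

```python
def idle_provisioned(
    provisioned: dict[str, int],
    invocations_30d: dict[str, int],
) -> list[str]:
    """Functions paying for provisioned concurrency with zero invocations in the window."""
    return sorted(
        name
        for name, units in provisioned.items()
        if units > 0 and invocations_30d.get(name, 0) == 0
    )


provisioned = {"send-email": 5, "send-sms": 2, "render-pdf": 0, "push-notify": 3}
invocations = {"send-email": 12_400, "send-sms": 0}
print(idle_provisioned(provisioned, invocations))  # ['push-notify', 'send-sms']
```

A function that is missing from the metrics is treated as having zero invocations. This is only safe if the metric export is known to be complete, so a real check would confirm coverage first.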

Error Rate Analysis

From logs + span status + events
  • Error rate spikes correlated across services
  • New exception types not seen in baseline
  • Retry storm patterns degrading downstream services
  • 5xx cascades tracing back to root cause span
  • Silent failures where the span status is never set to Error
  • Error budget burn rate against SLO thresholds
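Burn rate compares the observed error rate with the error rate the SLO allows: burn rate = error rate / (1 - SLO target). At a burn rate of 1, the budget is used up exactly at the end of the SLO window. Higher values exhaust it early. The sketch below is minimal, with illustrative numbers and a 30-day window assumed.

```python
def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget (1 - SLO target)."""
    if total == 0:
        return 0.0
    budget = 1.0 - slo_target
    return (errors / total) / budget


def hours_to_exhaustion(rate: float, window_days: int = 30) -> float:
    """How long a full error budget lasts at a constant burn rate."""
    return float("inf") if rate == 0 else window_days * 24 / rate


rate = burn_rate(errors=1_440, total=100_000, slo_target=0.999)
print(round(rate, 2), round(hours_to_exhaustion(rate), 1))  # 14.4 50.0
```

A 1.44% error rate against a 99.9% SLO burns budget 14.4 times faster than sustainable. A 30-day budget would then be gone in about 50 hours.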

Observability Gap Detection

From trace topology + log coverage
  • Services present in traces but emitting no logs
  • Spans missing essential attributes (db.statement, etc.)
  • Critical flows with incomplete trace propagation
  • Services with no health or readiness metrics
  • Missing SLI metrics for key user journeys
  • Alert coverage gaps on high-error-rate endpoints
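The first check reduces to a set difference between the services seen in trace data and the services seen in log data, both keyed on the `service.name` resource attribute. The sketch below is minimal and works on hypothetical lists of resource-attribute dicts.

```python
def services(resources: list[dict]) -> set[str]:
    return {r["service.name"] for r in resources if "service.name" in r}


def traced_but_not_logging(trace_resources: list[dict], log_resources: list[dict]) -> set[str]:
    """Services that emit spans but no log records."""
    return services(trace_resources) - services(log_resources)


traces = [{"service.name": "checkout-service"}, {"service.name": "payment-service"}]
logs = [{"service.name": "checkout-service"}, {"service.name": "email-worker"}]
print(traced_but_not_logging(traces, logs))  # {'payment-service'}
```

The reverse difference, services that log but never appear in traces (`email-worker` here), points to a separate gap: missing trace instrumentation.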

Alerting Effectiveness

From metrics + historical patterns
  • Noisy alerts with low signal-to-noise ratio
  • Flapping alerts that never resolve cleanly
  • Missing alerts on services with elevated error rates
  • Static thresholds that don't adapt to traffic patterns
  • Duplicate alert coverage on the same symptom
  • Alerts with no runbook or remediation guidance
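One way to measure flapping is to count state transitions within a time window. The sketch below is minimal. It assumes a hypothetical chronological list of alert states for one window, and the limit of four transitions is an arbitrary example rather than a recommended setting.

```python
def transitions(states: list[str]) -> int:
    """Number of times the alert changed state."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def is_flapping(states: list[str], max_transitions: int = 4) -> bool:
    return transitions(states) > max_transitions


history = ["ok", "firing", "ok", "firing", "ok", "firing", "ok"]
print(transitions(history), is_flapping(history))  # 6 True
```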

Security Posture

From logs + span attributes + traces
  • Anomalous authentication failure spike patterns
  • Unusual service-to-service call patterns in traces
  • Sensitive data leaking through log attributes
  • Services calling deprecated or unpatched endpoints
  • Unexpected egress in service topology
  • Audit log coverage gaps on sensitive operations
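Sensitive-data leaks in log attributes can be screened by pattern-matching attribute values. The sketch below is minimal and uses two illustrative patterns: an email address, and a 13- to 16-digit number that passes the Luhn checksum used by payment cards. Both patterns are assumptions, and a production detector needs far broader coverage.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b\d{13,16}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def sensitive_keys(attributes: dict[str, object]) -> list[str]:
    """Attribute keys whose values look like an email address or a card number."""
    flagged = []
    for key, value in attributes.items():
        text = str(value)
        if EMAIL.search(text) or any(luhn_ok(m) for m in CARD.findall(text)):
            flagged.append(key)
    return flagged


attrs = {
    "user.id": "u-1842",
    "customer.contact": "jane@example.com",
    "payment.raw": "4111111111111111",
    "order.total": 129.5,
}
print(sensitive_keys(attrs))  # ['customer.contact', 'payment.raw']
```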
Supported Technologies
50+ integrations
โ˜๏ธCloud Platforms & Infra10 integrationsโ–ถ
AWS Lambda
AWS ECS / Fargate
AWS EC2
AWS RDS / Aurora
Google Cloud Run
Google GKE
Azure AKS
Azure Functions
Cloudflare Workers
DigitalOcean Droplets
โš™๏ธKubernetes & Containers7 integrationsโ–ถ
Kubernetes
Docker
Helm
Istio Service Mesh
Envoy Proxy
Linkerd
Karpenter
๐Ÿ—„๏ธDatabases & Data Stores10 integrationsโ–ถ
PostgreSQL
MySQL / MariaDB
MongoDB
Redis
Elasticsearch
Cassandra
CockroachDB
DynamoDB
ClickHouse
Snowflake
Backend Languages & Runtimes (9 integrations)
Node.js
Python
Go
Java / JVM
Ruby on Rails
.NET / C#
Rust
PHP
Elixir / Erlang
Messaging & Queues (7 integrations)
Apache Kafka
RabbitMQ
AWS SQS / SNS
Google Pub/Sub
NATS
Apache Pulsar
Celery (Python)
๐ŸŒWeb Frameworks & APIs9 integrationsโ–ถ
Express.js
FastAPI
Django
Spring Boot
gRPC
GraphQL
Nginx
Kong Gateway
Traefik
Example Slack analysis output
what you receive
#ops-insights · OpsPilot Daily Digest
// ─── OpsPilot Daily Digest · 2026-02-24 09:00 UTC ──────────────────
 
📊 Stack Health Score: 87/100 (+3 this week · better than 78% of similar teams)
 
// ─── HIGH PRIORITY ──────────────────────────────────────────────
 
🔴 [HIGH] Performance · checkout-service
   Source: traces + db spans
   Finding: P99 latency 4.2s → 12.8s (+205%) since deploy 3h ago
   Cause: N+1 query in OrderRepository.findByUser(), 47 queries per request
   Fix: Add JOIN FETCH or batch load. Effort: ~30 min. Impact: -8s latency
 
// ─── MEDIUM PRIORITY ────────────────────────────────────────────
 
🟡 [MEDIUM] Cost · notification-service
   Source: metrics + resource attributes
   Finding: 8 Lambda functions with provisioned concurrency, 0 invocations in 30d
   Fix: Remove provisioned concurrency. Effort: 15 min. Savings: $180/month
 
// ─── LOW PRIORITY ───────────────────────────────────────────────
 
🟢 [LOW] Gap Detection · payment-service
   Source: trace topology analysis
   Finding: payment-service appears in traces but emits no structured logs
   Fix: Add the OTEL logs SDK. It sits on the critical path and has no log visibility during incidents.
 
// ─── 3 recommendations · next analysis in 23h 58m ───────────────────

Connect your stack in under 10 minutes

Point your OpenTelemetry Collector to OpsPilot's endpoint and start receiving AI-powered analysis on your schedule: hourly, daily, or weekly.
