Production Tracing
Deploy AI applications with confidence using Evaligo's production tracing. Monitor performance, catch issues early, and understand real-world usage patterns through comprehensive observability.
Production tracing bridges the gap between development evaluation and live performance. While experiments help you build great AI features, tracing ensures they stay great once deployed to real users with real data at real scale.
This guide walks you through setting up tracing for your AI application, from initial SDK integration to advanced monitoring and alerting. You'll instrument your code, configure sampling, and create dashboards that help you maintain quality in production.
Tracing works with any LLM provider, any application architecture, and any deployment environment. Whether you're running a simple chatbot or a complex multi-agent system, these patterns will help you understand what's happening in production.
Why Production Tracing Matters
Even the best laboratory evaluation can't predict every real-world scenario. Users find edge cases you didn't anticipate, model performance shifts over time, and infrastructure issues can degrade quality in subtle ways.
Production tracing captures this reality, giving you data-driven insights into how your AI performs with actual users. It's your early warning system for quality degradation, cost spikes, and emerging edge cases.


Prerequisites
Before setting up tracing, ensure you have the following components ready for integration.
1. Evaligo Project: Create a project in Evaligo to organize your production traces.
2. API Key: Generate a production API key with tracing permissions from your project settings (see the environment variable sketch below).
3. Application Access: The ability to modify your AI application code to add SDK instrumentation.
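The SDK examples in the steps that follow read credentials from environment variables. A minimal setup might look like the sketch below; the variable names match the initialization example in Step 1, and the values are placeholders for your own key and project ID.
# Expose Evaligo credentials to your application (placeholder values)
export EVALIGO_API_KEY="<your-production-api-key>"
export EVALIGO_PROJECT_ID="<your-project-id>"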
Step 1: Install and Initialize the SDK
The Evaligo SDK provides automatic instrumentation for popular LLM providers and frameworks. Install it in your application environment and configure it with your project credentials.
The SDK automatically captures request/response data, timing information, token usage, and errors. You can extend this with custom metadata to track user sessions, feature flags, or business-specific context.
# Install the Evaligo SDK
npm install @evaligo/tracing
# or
pip install evaligo-tracing

// Basic initialization in your application
import { EvaligoTracer } from '@evaligo/tracing'

const tracer = new EvaligoTracer({
  apiKey: process.env.EVALIGO_API_KEY,
  projectId: process.env.EVALIGO_PROJECT_ID,
  environment: 'production', // or 'staging', 'dev'
  serviceName: 'customer-support-bot',
  version: '1.2.0'
})

// Initialize tracing (call once at app startup)
await tracer.init()
Step 2: Instrument Your LLM Calls
Wrap your existing LLM calls with Evaligo's tracing decorators. This captures the complete request lifecycle including prompts, responses, metadata, and performance metrics.
The SDK supports auto-instrumentation for OpenAI, Anthropic, AWS Bedrock, and other major providers. For custom providers or complex workflows, use manual instrumentation to capture exactly what you need.
Include relevant metadata with each trace to enable powerful filtering and analysis later. User IDs, session identifiers, feature flags, and request types all help you understand patterns in your data.
// Auto-instrumentation (recommended)
import { OpenAI } from 'openai'
import { instrument } from '@evaligo/tracing'

const openai = instrument(new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
}))

// Your existing code works unchanged
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: userQuery }],
  // Evaligo metadata (optional)
  metadata: {
    userId: req.user.id,
    sessionId: req.sessionId,
    feature: 'customer-support',
    priority: 'high'
  }
})
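For custom providers or multi-step workflows that auto-instrumentation doesn't cover, you can create spans by hand. The sketch below is illustrative only: the span methods (startSpan, setAttributes, recordError, end) and the myCustomProvider client are assumed names rather than confirmed SDK APIs, so adapt them to the manual-instrumentation interface your SDK version exposes.
// Manual instrumentation sketch (method names are assumptions, not the confirmed API)
async function callCustomModel(userQuery) {
  const span = tracer.startSpan('custom-llm-call', {
    metadata: { feature: 'customer-support' }
  })
  try {
    const result = await myCustomProvider.generate(userQuery) // your own provider client
    // Record the fields you care about: prompt, response, token usage
    span.setAttributes({
      prompt: userQuery,
      response: result.text,
      tokens: result.usage?.totalTokens
    })
    return result
  } catch (err) {
    span.recordError(err) // make failures visible in the trace
    throw err
  } finally {
    span.end() // always close spans so the trace completes
  }
}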


Step 3: Configure Sampling and Performance
Production systems generate large volumes of traces. Smart sampling balances observability needs with performance and cost constraints while ensuring you capture representative data.
Evaligo supports multiple sampling strategies: percentage-based for uniform coverage, rate-limiting for high-volume endpoints, and intelligent sampling that prioritizes errors and outliers.
Configure different sampling rates for different parts of your application. Critical user flows might trace at 100%, while background processes might sample at 1%.
// Configure sampling strategies
const tracer = new EvaligoTracer({
  // ... other config
  sampling: {
    // Default sampling rate (10% of all requests)
    defaultRate: 0.1,
    // Always trace errors and slow requests
    alwaysTrace: {
      errors: true,
      slowRequests: true, // > 5 seconds
      highCost: true // > $0.10 per request
    },
    // Per-endpoint configuration
    rules: [
      { pattern: '/api/support/*', rate: 0.5 },   // High-value user flows
      { pattern: '/api/internal/*', rate: 0.01 }, // Background jobs
      { pattern: '/health', rate: 0 }             // Health checks
    ]
  }
})
Step 4: Verify and Monitor Your Traces
Once tracing is deployed, verify data is flowing correctly and set up monitoring to catch issues proactively. The Evaligo dashboard provides real-time visibility into your application's performance.
Create saved queries for common investigations like error patterns, slow requests, high-cost operations, and user-specific issues. Set up alerts to notify you when key metrics exceed thresholds.
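As a starting point, alerts might be configured programmatically along the lines of the sketch below; the createAlert call, its field names, and the metric identifiers are illustrative assumptions, and the same thresholds can usually be defined in the dashboard instead.
// Illustrative alert setup (createAlert and its fields are assumed, not confirmed API)
await tracer.createAlert({
  name: 'error-rate-spike',
  metric: 'error_rate',        // share of traced requests that errored
  threshold: 0.05,             // notify above 5%
  window: '15m',               // rolling 15-minute window
  channels: ['email', 'slack']
})

await tracer.createAlert({
  name: 'cost-per-request',
  metric: 'avg_cost_usd',      // average cost per traced request
  threshold: 0.10,
  window: '1h',
  channels: ['slack']
})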
Troubleshooting Common Issues
Production tracing setups can encounter various issues. Here are solutions to the most common problems teams face when getting started.
Missing Traces
Check API key permissions, network connectivity, and sampling configuration. Verify the SDK is initialized before making LLM calls.
High Overhead
Reduce sampling rates, increase batch sizes, or implement more selective tracing based on request characteristics (see the batching sketch after these items).
Incomplete Data
Ensure all LLM providers are instrumented and custom spans are properly closed. Check for exceptions that might prevent trace completion.
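When overhead is the issue, batching trace exports more aggressively can help. The options below (maxBatchSize, flushIntervalMs, maxQueueSize) are assumed names used for illustration rather than the SDK's confirmed configuration; check your SDK version for the equivalent settings.
// Hypothetical export-batching options to cut per-request overhead
const tracer = new EvaligoTracer({
  // ... other config
  export: {
    maxBatchSize: 200,     // send traces in larger batches
    flushIntervalMs: 5000, // flush at most every 5 seconds
    maxQueueSize: 2000     // cap the in-memory backlog instead of blocking requests
  }
})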
Next Steps
With production tracing operational, you can leverage this data for continuous improvement, automated quality gates, and proactive issue detection.