Telemetry and metrics

ToolHive includes built-in OpenTelemetry instrumentation for your MCP server interactions. You can export traces and metrics to popular observability backends like Jaeger, Honeycomb, Datadog, and Grafana Cloud, or expose Prometheus metrics directly.

What you can monitor

ToolHive's telemetry captures detailed information about MCP interactions including traces, metrics, and performance data. For a comprehensive overview of the telemetry architecture, metrics collection, and monitoring capabilities, see the observability overview.

Enable telemetry

There are two ways to configure telemetry: a shared MCPTelemetryConfig resource (recommended) or inline spec.telemetry on each MCPServer.

Shared telemetry configuration (recommended)

The MCPTelemetryConfig CRD lets you define telemetry settings once and reference them from multiple MCPServer resources. Each server can override its serviceName for distinct identity in your observability backend.

Step 1: Create an MCPTelemetryConfig resource

shared-otel-config.yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPTelemetryConfig
metadata:
  name: shared-otel
  namespace: toolhive-system
spec:
  openTelemetry:
    enabled: true
    endpoint: otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318
    insecure: true
    metrics:
      enabled: true
    tracing:
      enabled: true
      samplingRate: '0.05'
  prometheus:
    enabled: true

kubectl apply -f shared-otel-config.yaml

Step 2: Reference from an MCPServer

Reference the config by name in telemetryConfigRef:

mcpserver-with-shared-otel.yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: gofetch
  namespace: toolhive-system
spec:
  image: ghcr.io/stackloklabs/gofetch/server
  transport: streamable-http
  proxyPort: 8080
  telemetryConfigRef:
    name: shared-otel
    serviceName: mcp-fetch-server

Service name

Set serviceName to a meaningful name for each MCP server. This helps identify the server in your observability backend. The default is toolhive-mcp-proxy.

kubectl apply -f mcpserver-with-shared-otel.yaml

Step 3: Verify

kubectl get mcpotel -n toolhive-system

The REFERENCES column shows which workloads use this config. The READY column confirms validation passed.
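
As an illustration of several workloads sharing one config, a second MCPServer can reference shared-otel with its own serviceName. This is a minimal sketch: the resource name gofetch-secondary and the service name mcp-fetch-secondary are hypothetical, and the image, transport, and port are copied from the example above.

```yaml
# Hypothetical second workload that reuses the shared-otel config.
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: gofetch-secondary # hypothetical name
  namespace: toolhive-system
spec:
  image: ghcr.io/stackloklabs/gofetch/server
  transport: streamable-http
  proxyPort: 8080
  telemetryConfigRef:
    name: shared-otel # same MCPTelemetryConfig as the first server
    serviceName: mcp-fetch-secondary # hypothetical; distinguishes this server in the backend
```

After applying it, both workloads should be listed in the REFERENCES column for shared-otel.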

Configuration details

Set spec.openTelemetry.endpoint to the address of your OTLP-compatible collector or backend. ToolHive supports exporting traces, metrics, or both simultaneously, as shown in the example above.

note

Specify the endpoint as a hostname and optional port, without a scheme or path (for example, api.honeycomb.io or api.honeycomb.io:443, not https://api.honeycomb.io). ToolHive uses HTTPS by default; set insecure: true to disable TLS.

Set spec.openTelemetry.tracing.samplingRate to control the fraction of requests traced, as a value between 0 and 1.0 (the examples in this guide write it as a quoted string). The default is 0.05, which traces 5% of requests.

To expose a Prometheus-compatible /metrics endpoint for pull-based scraping, enable spec.prometheus.enabled. Access the metrics at http://<HOST>:<PORT>/metrics, where <HOST> is the resolvable address of the ToolHive ProxyRunner fronting your MCP server pod and <PORT> is the port the ProxyRunner service exposes for traffic.

Authentication headers

If your OTLP endpoint requires authentication, add headers to the MCPTelemetryConfig resource. Use headers for non-secret values or sensitiveHeaders to reference credentials stored in Kubernetes Secrets. A header name cannot appear in both fields.

otel-config-with-auth.yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPTelemetryConfig
metadata:
  name: otel-with-auth
  namespace: toolhive-system
spec:
  openTelemetry:
    enabled: true
    endpoint: <OTLP_ENDPOINT>
    sensitiveHeaders:
      - name: Authorization
        secretKeyRef:
          name: otel-credentials
          key: api-key
    tracing:
      enabled: true
    metrics:
      enabled: true
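
The sensitiveHeaders entry above expects a Secret named otel-credentials with a key called api-key. A minimal sketch of such a Secret follows. It assumes the Secret lives in the same namespace as the MCPTelemetryConfig and that the stored value is sent as the full header value. The Bearer format and the <TOKEN> placeholder are assumptions, so check what your backend expects.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: otel-credentials
  namespace: toolhive-system # assumed: same namespace as the MCPTelemetryConfig
type: Opaque
stringData:
  # Assumed format; replace with whatever your OTLP endpoint requires.
  api-key: 'Bearer <TOKEN>'
```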

Inline telemetry configuration

Deprecated

The inline spec.telemetry field on MCPServer is deprecated and will be removed in a future release. Use telemetryConfigRef to reference a shared MCPTelemetryConfig resource instead. You cannot set both fields on the same MCPServer.

To enable telemetry inline, specify the configuration directly in the MCPServer or MCPRemoteProxy custom resource. The inline fields mirror the shared MCPTelemetryConfig structure under spec.telemetry:

apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer # or MCPRemoteProxy
metadata:
  name: gofetch
  namespace: toolhive-system
spec:
  image: ghcr.io/stackloklabs/gofetch/server
  transport: streamable-http
  proxyPort: 8080
  mcpPort: 8080
  # ... other spec fields ...
  telemetry:
    openTelemetry:
      enabled: true
      endpoint: otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318
      serviceName: mcp-fetch-server
      insecure: true
      metrics:
        enabled: true
      tracing:
        enabled: true
        samplingRate: '0.05'
    prometheus:
      enabled: true

Observability backends

ToolHive can export traces and metrics to any observability backend that supports OTLP. Some common examples are listed below, but specific configurations will vary based on your environment and requirements.

note

The backend examples below use MCPTelemetryConfig resources. Reference them from your MCPServer resources using telemetryConfigRef as shown in the shared telemetry configuration section above.

OpenTelemetry Collector (recommended)

The OpenTelemetry Collector is a vendor-agnostic way to receive, process, and export telemetry data. It supports many backend services, scalable deployment options, and advanced processing capabilities.

To deploy the OpenTelemetry Collector in a Kubernetes cluster, see the OpenTelemetry Collector documentation. The following minimal collector configuration receives OTLP data over HTTP, forwards traces to a tracing backend, and exposes metrics on port 8889 for Prometheus to scrape:

otel-collector.yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  config:
    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        send_batch_size: 1024
        timeout: 5s
    exporters:
      otlp/traces:
        endpoint: <TRACE_BACKEND>:4317
        tls:
          insecure: true
      prometheus:
        endpoint: 0.0.0.0:8889
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp/traces]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheus]

Then point your MCPTelemetryConfig at the collector's OTLP HTTP receiver port (default 4318):

apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPTelemetryConfig
metadata:
  name: otel-collector
  namespace: toolhive-system
spec:
  openTelemetry:
    enabled: true
    endpoint: otel-collector-collector.monitoring.svc.cluster.local:4318
    insecure: true
    metrics:
      enabled: true
    tracing:
      enabled: true

Prometheus

This example scrapes the /metrics endpoint exposed by each MCP server directly. To aggregate metrics through an OpenTelemetry Collector instead (ToolHive pushes to the collector, Prometheus scrapes the collector), see the OpenTelemetry Collector section.

To enable scraping, enable Prometheus in your telemetry configuration and add the following to your Prometheus configuration:

prometheus.yml
scrape_configs:
  - job_name: 'toolhive-mcp-proxy'
    static_configs:
      - targets: ['<MCP_SERVER_PROXY_SVC_URL>:<MCP_SERVER_PORT>']
    scrape_interval: 15s
    metrics_path: /metrics

To scrape multiple MCP servers, add one entry per server to the targets list. Replace <MCP_SERVER_PROXY_SVC_URL> with the name of the ProxyRunner Service and <MCP_SERVER_PORT> with the port that Service exposes.
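
As a sketch of a multi-server setup, the job below lists two targets. The placeholders <FIRST_PROXY_SVC> and <SECOND_PROXY_SVC>, the toolhive-system namespace, and port 8080 are illustrative assumptions. Substitute the Service names and ports from your own cluster.

```yaml
scrape_configs:
  - job_name: 'toolhive-mcp-proxy'
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets:
          # Hypothetical Service addresses; one entry per MCP server.
          - '<FIRST_PROXY_SVC>.toolhive-system.svc.cluster.local:8080'
          - '<SECOND_PROXY_SVC>.toolhive-system.svc.cluster.local:8080'
```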

Jaeger

Jaeger is a popular open source distributed tracing system that natively supports OTLP. Point your telemetry configuration directly at Jaeger's OTLP HTTP port (default 4318):

apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPTelemetryConfig
metadata:
  name: jaeger-tracing
  namespace: toolhive-system
spec:
  openTelemetry:
    enabled: true
    endpoint: jaeger-collector.monitoring.svc.cluster.local:4318
    insecure: true
    tracing:
      enabled: true

Honeycomb

Send OpenTelemetry data directly to Honeycomb's OTLP endpoint, or use the OpenTelemetry Collector to forward data to Honeycomb. This example sends data directly:

apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPTelemetryConfig
metadata:
  name: honeycomb
  namespace: toolhive-system
spec:
  openTelemetry:
    enabled: true
    endpoint: api.honeycomb.io:443
    sensitiveHeaders:
      - name: x-honeycomb-team
        secretKeyRef:
          name: honeycomb-credentials
          key: api-key
    tracing:
      enabled: true
    metrics:
      enabled: true

Find your Honeycomb API key in your Honeycomb account settings. Store it in a Kubernetes Secret referenced by sensitiveHeaders.

Datadog

Datadog has multiple options for collecting OpenTelemetry data:

The OpenTelemetry Collector is recommended for existing OpenTelemetry users or users wanting a vendor-neutral solution.
The Datadog Agent is recommended for existing Datadog users.
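
Whichever option you choose, ToolHive only needs an OTLP endpoint to send data to. The sketch below assumes a Datadog Agent or collector with OTLP ingestion over HTTP enabled on port 4318 and reachable inside the cluster. The resource name, the <DATADOG_OTLP_HOST> placeholder, and the port are assumptions rather than values taken from the Datadog or ToolHive documentation, so adjust them to your deployment.

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPTelemetryConfig
metadata:
  name: datadog # hypothetical name
  namespace: toolhive-system
spec:
  openTelemetry:
    enabled: true
    # Hypothetical in-cluster address of an OTLP-enabled Datadog Agent or collector.
    endpoint: <DATADOG_OTLP_HOST>:4318
    insecure: true # assumes plain HTTP inside the cluster
    tracing:
      enabled: true
    metrics:
      enabled: true
```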

Grafana Cloud

Send OpenTelemetry data to Grafana Cloud using Grafana Alloy, Grafana Labs' supported distribution of the OpenTelemetry Collector. This is the recommended method for production deployments.

Alternatively, to send data directly to Grafana Cloud's OTLP endpoint without an intermediate collector:

apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPTelemetryConfig
metadata:
  name: grafana-cloud
  namespace: toolhive-system
spec:
  openTelemetry:
    enabled: true
    endpoint: <GRAFANA_OTLP_ENDPOINT>
    sensitiveHeaders:
      - name: Authorization
        secretKeyRef:
          name: grafana-cloud-credentials
          key: auth-header
    tracing:
      enabled: true
    metrics:
      enabled: true

Replace <GRAFANA_OTLP_ENDPOINT> with the OTLP endpoint from your Grafana Cloud portal (for example, otlp-gateway-prod-us-central-0.grafana.net:443). Store your base64-encoded instanceID:apiToken credentials in a Kubernetes Secret referenced by sensitiveHeaders.
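
A minimal sketch of the Secret referenced above follows. It assumes Grafana Cloud's OTLP gateway accepts HTTP Basic authentication and that the stored value is sent verbatim as the Authorization header. <BASE64_CREDENTIALS> is a placeholder for the base64 encoding of instanceID:apiToken.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: grafana-cloud-credentials
  namespace: toolhive-system # assumed: same namespace as the MCPTelemetryConfig
type: Opaque
stringData:
  # Assumed header format: "Basic " followed by base64(instanceID:apiToken).
  auth-header: 'Basic <BASE64_CREDENTIALS>'
```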

Performance considerations

Sampling rates

Adjust sampling rates based on your environment:

Development: samplingRate: '1.0' (100% sampling)
Production: samplingRate: '0.01' (1% sampling for high-traffic systems)
Default: samplingRate: '0.05' (5% sampling)
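
As an example of the development setting, the sketch below defines a separate MCPTelemetryConfig that traces every request. The name dev-otel is hypothetical, and <OTLP_ENDPOINT> stands for your collector or backend address.

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPTelemetryConfig
metadata:
  name: dev-otel # hypothetical name
  namespace: toolhive-system
spec:
  openTelemetry:
    enabled: true
    endpoint: <OTLP_ENDPOINT>
    tracing:
      enabled: true
      samplingRate: '1.0' # trace every request; lower this (for example '0.01') in production
```

For a sense of scale, at 1,000 requests per second a rate of '0.01' yields roughly 10 traced requests per second on average, while '1.0' traces all 1,000.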

Network overhead

Telemetry adds minimal overhead when properly configured. To keep overhead and backend costs under control:

Use appropriate sampling rates for your traffic volume
Monitor your observability backend costs and adjust sampling accordingly

Next steps

Set up audit logging for structured request and authorization event tracking
Secure your servers with authentication and authorization

Related information

Tutorial: Collect telemetry for MCP workloads - step-by-step guide to set up a local observability stack
Telemetry and monitoring concepts - overview of ToolHive's observability architecture
Kubernetes CRD reference - reference for the MCPServer Custom Resource Definition (CRD)
Deploy the operator - install the ToolHive operator
