Observability

Ella Core supports four observability pillars: Metrics, Logs, Traces, and Profiles.

1. Metrics

Ella Core exposes Prometheus metrics to monitor the health of an Ella Core instance.

Please refer to the metrics API documentation for more information on accessing metrics in Ella Core.

Default Go metrics

These metrics monitor the health of the Go runtime and garbage collector. They start with the go_ prefix.

Custom metrics

These metrics monitor the health of the system and the performance of the network. They start with the app_ prefix. Ella Core exposes the following custom metrics:

Metric	Description	Type
app_connected_radios	Number of radios currently connected to Ella Core	Gauge
app_ngap_messages_total	Total number of received NGAP messages per type	Counter
app_registered_subscribers	Number of subscribers currently registered in Ella Core	Gauge
app_registration_attempts_total	Total number of subscriber registration attempts by type and result	Counter
app_pdu_sessions_total	Number of PDU sessions currently in Ella Core.	Gauge
app_pdu_session_establishment_attempts_total	Total PDU session establishment attempts by result	Counter
app_ip_addresses_allocated_total	The total number of IP addresses currently allocated to subscribers.	Gauge
app_ip_addresses_total	The total number of IP addresses available for subscribers.	Gauge
app_xdp_action_total	Total number of packets handled by XDP, with labels for the interface (n3, n6) and the action taken.	Counter
app_xdp_fib_lookup_total	FIB lookup outcomes in the XDP data plane, with labels for interface (n3, n6) and result matching kernel return codes (success, no_neigh, blackhole, unreachable, prohibit, no_src_addr, frag_needed, not_fwded, fwd_disabled, unsupp_lwt).	Counter
app_xdp_ifindex_mismatch_total	Packets dropped because the FIB-resolved interface did not match the expected N3/N6 interface, with label for interface (n3, n6).	Counter
app_uplink_bytes	The total number of bytes transmitted in the uplink direction (N3 -> N6). This value includes the Ethernet header.	Counter
app_downlink_bytes	The total number of bytes transmitted in the downlink direction (N6 -> N3). This value includes the Ethernet header.	Counter
app_api_requests_total	Total number of HTTP requests by method, endpoint, and status code	Counter
app_api_request_duration_seconds	HTTP request duration histogram in seconds	Histogram
app_api_authentication_attempts_total	Total number of authentication attempts by type and result	Counter
app_database_storage_bytes	The total storage used by the database in bytes. This is the size of the database file on disk.	Gauge
app_database_queries_total	Total number of database queries by table and operation	Counter
app_database_query_duration_seconds	Duration of database queries	Histogram
app_raft_changeset_bytes_total	SQLite changeset bytes applied through the Raft FSM. Emitted only when clustering is enabled.	Counter
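
As a rough illustration of how these metrics can be consumed, the sketch below parses a small sample in the Prometheus text exposition format and derives IP pool utilization from app_ip_addresses_allocated_total and app_ip_addresses_total. The sample values and the label names (interface, action) are assumptions made up for the example, and the parser is deliberately simplified: it ignores optional timestamps and escaped label values.

```python
import re

# Sample scrape output in the Prometheus text exposition format.
# Values and label names are hypothetical, for illustration only.
SAMPLE = """\
# HELP app_connected_radios Number of radios currently connected to Ella Core
# TYPE app_connected_radios gauge
app_connected_radios 2
app_ip_addresses_allocated_total 45
app_ip_addresses_total 250
app_xdp_action_total{interface="n3",action="XDP_PASS"} 1200
"""

# metric name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)$')


def parse_metrics(text):
    """Return {(name, labels_string): value} for every sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[(name, labels or "")] = float(value)
    return samples


def ip_pool_utilization(samples):
    """Fraction of the subscriber IP pool that is currently allocated."""
    allocated = samples[("app_ip_addresses_allocated_total", "")]
    total = samples[("app_ip_addresses_total", "")]
    return allocated / total if total else 0.0


samples = parse_metrics(SAMPLE)
print(f"IP pool utilization: {ip_pool_utilization(samples):.0%}")
```

In practice a Prometheus server or Grafana Alloy would scrape and store these values; the parser only shows the shape of the data.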

Note

When clustering is enabled, Ella Core also exports the full upstream hashicorp/raft metrics suite (prefix raft_). These cover cluster state, leadership, replication, FSM apply latency, and snapshotting. The most useful ones for HA monitoring are:

raft_state_leader, raft_state_follower, raft_state_candidate: counters incremented on each state transition. A high rate indicates leadership flapping.
raft_leader_lastContact: time since the leader last heard from a majority of peers (leader-only). Stale values indicate leader isolation.
raft_peers: number of servers in the cluster configuration.
raft_fsm_apply: FSM apply latency histogram. Covers the changeset apply path.
raft_replication_appendEntries_rpc, raft_replication_heartbeat: per-peer replication latency, labeled by peer_id. Slow or absent values indicate an unhealthy follower.
raft_transition_heartbeat_timeout, raft_transition_leader_lease_timeout: counters for failure-driven transitions.
raft_oldestLogAge: age of the oldest retained log entry. Unbounded growth indicates that snapshotting or log compaction is stuck.
raft_commitTime, raft_commitNumLogs: commit latency and batch size on the leader.
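
To make the "rate indicates flapping" idea concrete, here is a minimal sketch that turns successive readings of a state-transition counter such as raft_state_candidate into a per-second rate. The readings and the 60-second sampling interval are hypothetical, and the reset handling is a simplification of what a metrics backend does.

```python
def counter_rate(samples):
    """Per-second increase of a counter over a series of readings.

    samples: list of (unix_timestamp_seconds, counter_value), oldest first.
    A decrease between two readings is treated as a counter reset
    (for example after a restart), so the new value counts as the increase.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical raft_state_candidate readings taken every 60 seconds.
readings = [(0, 3), (60, 3), (120, 5), (180, 6)]
transitions_per_minute = counter_rate(readings) * 60
print(f"candidate transitions per minute: {transitions_per_minute:.2f}")
```

A value that stays near zero suggests stable leadership; what counts as "flapping" depends on the deployment, so any threshold is left to the operator.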

2. Logs

Ella Core produces three types of logs:

System Logs: General operational information about the system.
Audit Logs: Logs of user actions for security and compliance. You can view audit logs and manage their retention via the API and the Web UI.
Radio Logs: Logs related to NGAP messages. You can view radio logs and manage their retention via the API and the Web UI.

All logs are output in JSON format with structured fields for easy parsing and ingestion into log aggregation systems like Loki, Elasticsearch, or Splunk.
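
Because each log entry is a JSON object, a few lines of code are enough to filter them before they reach an aggregation system. The sketch below is illustrative only: the field names (level, ts, component, msg) and the sample lines are assumptions, not Ella Core's documented log schema.

```python
import json

# Hypothetical log lines. The field names are assumed for illustration
# and may differ from what Ella Core actually emits.
RAW_LINES = [
    '{"level":"info","ts":"2025-01-01T12:00:00Z","component":"api","msg":"request served"}',
    '{"level":"error","ts":"2025-01-01T12:00:01Z","component":"ngap","msg":"decode failed"}',
    "not json at all",
]


def parse_log_lines(lines):
    """Parse JSON log lines, skipping anything that is not a JSON object."""
    records = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise in the stream
        if isinstance(record, dict):
            records.append(record)
    return records


def filter_by_level(records, level):
    """Keep only records whose (assumed) "level" field matches."""
    return [r for r in records if r.get("level") == level]


records = parse_log_lines(RAW_LINES)
errors = filter_by_level(records, "error")
print(len(records), len(errors))
```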

For more information on configuring logging in Ella Core, refer to the Configuration File documentation.

Note

Ella Core does not rotate log files itself; we recommend using a log rotation tool such as logrotate to manage them.

3. Traces

Ella Core supports distributed tracing using OpenTelemetry. Traces are exported via OTLP (gRPC) to any compatible backend such as Jaeger, Tempo, or Honeycomb.

Traces are collected for the following components:

NGAP: Traces for NGAP message handling between gNodeBs and Ella Core.
API: Traces for HTTP requests to the REST API.

For more information on configuring tracing in Ella Core, refer to the Configuration File documentation.

4. Profiles

Ella Core exposes the Go net/http/pprof API for CPU and memory profiling. You can collect profiles from Ella Core and analyze them with tools such as pprof or Pyroscope.

For more information on accessing the pprof API in Ella Core, refer to the pprof API documentation.
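
As a sketch of what profile collection might look like, the helper below builds URLs for two standard Go net/http/pprof profiles (profile for CPU, heap for memory). The base URL is hypothetical, and the debug/pprof prefix is the Go package default rather than a confirmed Ella Core path; the actual path and any authentication requirements are described in the pprof API documentation.

```python
from urllib.parse import urlencode


def pprof_url(base_url, profile, prefix="debug/pprof", **params):
    """Build a URL for a Go net/http/pprof profile.

    prefix defaults to the Go package default; the path used by
    Ella Core may differ (assumption, check the pprof API docs).
    """
    url = f"{base_url.rstrip('/')}/{prefix.strip('/')}/{profile}"
    return f"{url}?{urlencode(params)}" if params else url


# Hypothetical host and port, for illustration only.
BASE = "https://ella-core.example.com:5002"

cpu_url = pprof_url(BASE, "profile", seconds=30)  # 30-second CPU profile
heap_url = pprof_url(BASE, "heap")                # heap (memory) profile
print(cpu_url)
print(heap_url)
```

A URL of this shape can typically be passed to `go tool pprof`, which downloads the profile and opens it for analysis.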

Alert Rules

Ella Core ships with pre-configured Grafana alert rules that detect the most important failure scenarios.

Network Health

Alert	Severity	Condition
No Radios Connected	Critical	No radios connected for 2 minutes
High Registration Failure Rate	Critical	More than 10% of subscriber registrations rejected over 5 minutes
High PDU Session Failure Rate	Critical	More than 10% of PDU session establishments rejected over 5 minutes
IP Address Pool Near Exhaustion	Warning	More than 90% of the data network IP pool is allocated
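
The failure-rate conditions above are ratios of counter increases over a time window. The sketch below shows that arithmetic for the registration alert, assuming app_registration_attempts_total carries a result label with the values accepted and rejected; those label values and the counter readings are hypothetical. It does not reproduce the shipped Grafana rules.

```python
def window_increase(start, end):
    """Per-result increase of a counter between two readings."""
    return {result: end[result] - start.get(result, 0) for result in end}


def failure_ratio(increase, failure_key="rejected"):
    """Share of attempts in the window that ended in failure."""
    total = sum(increase.values())
    return increase.get(failure_key, 0) / total if total else 0.0


def is_firing(ratio, threshold=0.10):
    """True when the ratio exceeds the alert threshold (10% by default)."""
    return ratio > threshold


# Hypothetical counter readings at the start and end of a 5-minute window.
window_start = {"accepted": 900, "rejected": 40}
window_end = {"accepted": 980, "rejected": 60}

ratio = failure_ratio(window_increase(window_start, window_end))
print(f"registration failure ratio: {ratio:.0%}, firing: {is_firing(ratio)}")
```

The same calculation applies to the PDU session alert with app_pdu_session_establishment_attempts_total.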

Data Plane Health

Alert	Severity	Condition
High XDP Packet Drop Rate	Warning	More than 10 packets/s dropped by XDP for 5 minutes
No Data Plane Traffic	Critical	Radios connected but zero throughput for 10 minutes
XDP Aborted Actions	Critical	Any XDP_ABORTED events for 2 minutes (indicates eBPF program errors)

API Health

Alert	Severity	Condition
High API Error Rate	Warning	More than 5% of API responses are 5xx errors over 5 minutes
High API Latency	Warning	P99 API response time exceeds 2 seconds over 5 minutes
Authentication Failure Spike	Warning	More than 25% of API authentication attempts fail over 5 minutes
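
The P99 latency condition is derived from a histogram such as app_api_request_duration_seconds. The sketch below estimates a quantile from cumulative buckets using linear interpolation inside the matching bucket, which is the general approach Prometheus takes for histogram quantiles. The bucket boundaries and counts are hypothetical, and a real rule would operate on bucket rates over the 5-minute window rather than raw totals.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs that includes
    the (math.inf, total_count) bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # No finite upper bound (or an empty bucket) to interpolate in.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative bucket counts (upper bound in seconds, count).
api_buckets = [(0.1, 900), (0.5, 970), (1.0, 985), (2.5, 995), (math.inf, 1000)]

p99 = histogram_quantile(0.99, api_buckets)
print(f"estimated P99: {p99:.2f}s, above 2s threshold: {p99 > 2.0}")
```

Because the estimate is interpolated, its accuracy depends on how finely the buckets are spaced around the threshold of interest.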

Infrastructure Health

Alert	Severity	Condition
Instance Down	Critical	Ella Core instance is unreachable
High Memory Usage	Warning	Process memory exceeds 1 GiB for 5 minutes
High Goroutine Count	Warning	More than 10,000 goroutines for 5 minutes
High Database Query Latency	Warning	P99 database query latency exceeds 500 ms over 5 minutes
Large Database Size	Warning	Database file exceeds 1 GiB for 10 minutes

Dashboards

Ella Core ships with Grafana dashboards that you can import using the Dashboard IDs provided below.

Network Health

This dashboard uses Prometheus metrics to provide real-time visibility into all aspects of your 5G private network deployment, from radio connectivity and subscriber sessions to system performance and data plane throughput.

Data Sources: Prometheus
Dashboard ID: 24751
View online: grafana.com/grafana/dashboards/24751/

Deep Dive (for developers)

This dashboard uses metrics, logs, traces, and profiles to provide deep insights into the internal workings of Ella Core. It is intended for developers and advanced users who want to understand the performance and behavior of Ella Core at a granular level. We recommend running Grafana Alloy to collect all signals (example configuration file). A complete example observability stack (Grafana, Mimir, Loki, Tempo, Pyroscope) is provided as a Docker Compose setup.

Figure: Grafana dashboard for Deep Dive.

Data Sources: Mimir, Loki, Tempo, Pyroscope
Dashboard ID: 24770
View online: grafana.com/grafana/dashboards/24770/