Observability

Abstract

There are three pillars of observability: logs, metrics, and traces.

Important

Observability is a broader concept that includes Monitoring. The higher the observability, the faster we can find the root cause when notified of an issue.

Linux performance observability tools

Metric

Numeric measurements of a system's performance, sampled over time
Collecting different types of metrics helps us gain business insights and understand the system's health

Tool

Prometheus

Datadog

Aggregated Level Metric

Metrics that indicate the top-level health of the system by measuring its useful output
Examples are success rate and error rate
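As a minimal sketch of how success rate and error rate can be derived, the snippet below computes both from raw request counts. The `RequestCounts` type and its field names are hypothetical and only serve the illustration; real systems usually read these counts from a metrics backend.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestCounts:
    """Hypothetical container for request outcomes in one time window."""
    succeeded: int
    failed: int


def success_rate(counts: RequestCounts) -> Optional[float]:
    """Fraction of requests that succeeded, or None if there was no traffic."""
    total = counts.succeeded + counts.failed
    if total == 0:
        return None  # a rate is undefined without any requests
    return counts.succeeded / total


def error_rate(counts: RequestCounts) -> Optional[float]:
    """Fraction of requests that failed, or None if there was no traffic."""
    total = counts.succeeded + counts.failed
    if total == 0:
        return None
    return counts.failed / total


window = RequestCounts(succeeded=950, failed=50)
print(f"success rate: {success_rate(window):.2%}")
print(f"error rate:   {error_rate(window):.2%}")
```

Returning `None` for an empty window keeps "no traffic" distinct from "0% success", which matters when alerting on these values.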

Host Level Metric

Metrics that report the current state of physical resources such as CPU and main memory
Examples are CPU and memory utilisation
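Utilisation is typically a ratio of "busy" to "available". The sketch below shows one possible way to compute it, assuming two samples of cumulative CPU tick counters (similar in spirit to what Linux exposes in `/proc/stat`) and a used/total byte count for memory. The function names and parameters are illustrative assumptions, not part of any specific tool.

```python
def cpu_utilisation(prev_idle: int, prev_total: int,
                    cur_idle: int, cur_total: int) -> float:
    """CPU utilisation (0.0 to 1.0) between two samples of cumulative counters.

    Each sample is assumed to hold idle ticks and total ticks since boot.
    """
    total_delta = cur_total - prev_total
    if total_delta <= 0:
        return 0.0  # no time elapsed between samples (or counters were reset)
    idle_delta = cur_idle - prev_idle
    return 1.0 - idle_delta / total_delta


def memory_utilisation(used_bytes: int, total_bytes: int) -> float:
    """Memory utilisation (0.0 to 1.0) from used and total bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


# Hypothetical samples: 100 ticks elapsed, 50 of them idle.
print(cpu_utilisation(prev_idle=800, prev_total=1000,
                      cur_idle=850, cur_total=1100))
# Hypothetical host with 16 GiB of RAM, 4 GiB in use.
print(memory_utilisation(used_bytes=4 * 1024**3, total_bytes=16 * 1024**3))
```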

Key Business Metrics

Daily active users, retention, revenue

Log

A detailed list of events that occur within the system/application, including when and why they happened

Example

Web server logs, which contain the client IP address and the date and time of each HTTP request.
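To make the example concrete, here is a rough sketch that extracts the IP address and timestamp from a single web server log line. It assumes the line follows the Common Log Format; the sample line and the `parse_log_line` helper are illustrative, and real servers may be configured with a different format.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed layout (Common Log Format):
# ip identity user [timestamp] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the parsed fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the textual timestamp into a timezone-aware datetime.
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    fields["status"] = int(fields["status"])
    return fields


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_log_line(sample)
print(entry["ip"], entry["timestamp"].isoformat(), entry["status"])
```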

Important

Helps to identify errors and problems in the system.

Tool

Datadog

Grafana Loki

Elasticsearch

Log Router

A tool or service that collects log data from various sources and forwards it to one or more destinations

Important

These tools play a crucial role in centralised logging architectures, especially in environments with multiple applications, services, or systems that generate logs.
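The collect-and-forward idea can be sketched in a few lines. The `LogRouter` class below is a toy, in-memory illustration only: its names and behaviour are assumptions made for this note and do not mirror the configuration or API of Fluentd, Fluent Bit, Logstash, or FireLens.

```python
from typing import Callable, Iterable, List, Tuple

LogRecord = dict
Predicate = Callable[[LogRecord], bool]
Sink = Callable[[LogRecord], None]


class LogRouter:
    """Toy router: forwards each record to every destination whose rule matches."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Predicate, Sink]] = []

    def add_route(self, predicate: Predicate, sink: Sink) -> None:
        """Register a destination (sink) together with a matching rule."""
        self._routes.append((predicate, sink))

    def route(self, records: Iterable[LogRecord]) -> int:
        """Forward records from any source; return the number of deliveries."""
        delivered = 0
        for record in records:
            for predicate, sink in self._routes:
                if predicate(record):
                    sink(record)
                    delivered += 1
        return delivered


# Two hypothetical destinations, modelled as plain lists.
archive: List[LogRecord] = []
alerts: List[LogRecord] = []

router = LogRouter()
router.add_route(lambda r: True, archive.append)                  # keep everything
router.add_route(lambda r: r["level"] == "ERROR", alerts.append)  # errors only

# Records from two hypothetical sources.
router.route([
    {"source": "web", "level": "INFO", "message": "request served"},
    {"source": "db", "level": "ERROR", "message": "connection lost"},
])
print(len(archive), len(alerts))
```

Real log routers add buffering, retries, and parsing on top of this basic fan-out, which is why they are run as dedicated agents rather than in application code.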

Tool

Fluentd

Fluent Bit (a lightweight, high-performance log shipper ideal for containerised or edge environments)

Logstash (part of the ELK Stack)

AWS FireLens (for Amazon ECS and EKS)
