title: "Health Check Pattern" description: "Design and implementation of the health check pattern with pluggable health indicators" category: patterns tags:

patterns
health
monitoring
architecture related:
reference/patterns/repository-pattern.md
reference/api/health-api.md
examples/health-service-example.md last_updated: March 27, 2025 version: 1.0

Health Check Pattern

Overview

The Health Check Pattern provides a standardized way to assess the operational status of an application and its dependencies. It enables monitoring systems to detect issues and facilitates automated recovery procedures.

Problem Statement

Modern applications have numerous dependencies (databases, external services, caches, etc.) that can fail independently. Applications need to:

Report their own operational status
Check the status of all dependencies
Provide detailed diagnostics for troubleshooting
Support both simple availability checks and detailed health information
Allow easy extension for new components

Solution: Health Check Pattern with Pluggable Indicators

The Health Check Pattern in Navius uses a provider-based architecture with these components:

HealthIndicator Trait: Interface for individual component health checks
HealthProvider Trait: Interface for components that provide health indicators
HealthDiscoveryService: Automatically discovers and registers health indicators
HealthService: Orchestrates health checks and aggregates results
HealthDashboard: Tracks health history and provides detailed reporting

Pattern Structure

┌─────────────────┐          ┌───────────────────┐
│   HealthService │◄─────────┤HealthIndicator(s) │
└────────┬────────┘          └───────────────────┘
         │                            ▲
         │                            │ implements
         │                   ┌────────┴────────┐
         │                   │Component-specific│
         │                   │HealthIndicators │
         ▼                   └─────────────────┘
┌─────────────────┐
│HealthController │
└─────────────────┘

Implementation

1. Health Indicator Interface

The HealthIndicator trait defines the contract for all health checks:

#![allow(unused)]
fn main() {
pub trait HealthIndicator: Send + Sync {
    /// Get the name of this health indicator
    fn name(&self) -> String;
    
    /// Check the health of this component
    fn check_health(&self, state: &Arc<AppState>) -> DependencyStatus;
    
    /// Optional metadata about this indicator
    fn metadata(&self) -> HashMap<String, String> {
        HashMap::new()
    }
    
    /// Order in which this indicator should run (lower values run first)
    fn order(&self) -> i32 {
        0
    }
    
    /// Whether this indicator is critical (system is DOWN if it fails)
    fn is_critical(&self) -> bool {
        false
    }
}
}

2. Health Provider Interface

The HealthProvider trait enables components to provide their own health indicators:

#![allow(unused)]
fn main() {
pub trait HealthProvider: Send + Sync {
    /// Create health indicators for the application
    fn create_indicators(&self) -> Vec<Box<dyn HealthIndicator>>;
    
    /// Whether this provider is enabled
    fn is_enabled(&self, config: &AppConfig) -> bool;
}
}

3. Health Service

The HealthService aggregates and manages health indicators:

#![allow(unused)]
fn main() {
pub struct HealthService {
    indicators: Vec<Box<dyn HealthIndicator>>,
    providers: Vec<Box<dyn HealthProvider>>,
}

impl HealthService {
    pub fn new() -> Self { /* ... */ }
    
    pub fn register_indicator(&mut self, indicator: Box<dyn HealthIndicator>) { /* ... */ }
    
    pub fn register_provider(&mut self, provider: Box<dyn HealthProvider>) { /* ... */ }
    
    pub async fn check_health(&self) -> Result<HealthStatus, ServiceError> { /* ... */ }
}
}

4. Health Discovery

The HealthDiscoveryService automatically discovers health indicators:

#![allow(unused)]
fn main() {
pub struct HealthDiscoveryService;

impl HealthDiscoveryService {
    pub fn new() -> Self { /* ... */ }
    
    pub async fn discover_indicators(&self) -> Vec<Box<dyn HealthIndicator>> { /* ... */ }
}
}

Benefits

Standardization: Consistent approach to health monitoring across components
Extensibility: Easy to add health checks for new components
Automation: Facilitates automated monitoring and recovery
Detailed Diagnostics: Provides rich health information for troubleshooting
Dynamic Discovery: Automatically detects new health indicators
Priority Execution: Checks dependencies in correct order

Implementation Considerations

1. Defining Health Status

Health status should be simple but descriptive:

UP: Component is functioning normally
DOWN: Component is not functioning
DEGRADED: Component is functioning with reduced capabilities
UNKNOWN: Component status cannot be determined

2. Health Check Categories

Organize health checks into categories:

Critical Infrastructure: Database, cache, file system
External Dependencies: APIs, third-party services
Internal Components: Message queues, background tasks
Environment: Disk space, memory, CPU

3. Health Check Response

The health API should support multiple response formats:

Simple UP/DOWN for load balancers and basic monitoring
Detailed response with component-specific health for diagnostics
Historical data for trend analysis

4. Security Considerations

Health endpoints contain sensitive information:

Secure detailed health endpoints with authentication
Limit information in public health endpoints
Don't expose connection strings or credentials

API Endpoints

The health service exposes these standard endpoints:

/actuator/health: Basic health status (UP/DOWN)
/actuator/health/detail: Detailed component health
/actuator/dashboard: Health history dashboard

Example Implementation

Basic Health Indicator

#![allow(unused)]
fn main() {
pub struct DatabaseHealthIndicator {
    connection_string: String,
}

impl HealthIndicator for DatabaseHealthIndicator {
    fn name(&self) -> String {
        "database".to_string()
    }
    
    fn check_health(&self, _state: &Arc<AppState>) -> DependencyStatus {
        match check_database_connection(&self.connection_string) {
            Ok(_) => DependencyStatus::up(),
            Err(e) => DependencyStatus::down()
                .with_detail("error", e.to_string())
                .with_detail("connection", &self.connection_string)
        }
    }
    
    fn is_critical(&self) -> bool {
        true
    }
    
    fn order(&self) -> i32 {
        10  // Run early since other components may depend on DB
    }
}
}

Health Response Format

{
  "status": "UP",
  "timestamp": "2024-03-26T12:34:56.789Z",
  "components": [
    {
      "name": "database",
      "status": "UP",
      "details": {
        "type": "postgres",
        "version": "14.5"
      }
    },
    {
      "name": "redis-cache",
      "status": "UP",
      "details": {
        "used_memory": "1.2GB",
        "uptime": "3d"
      }
    },
    {
      "name": "external-api",
      "status": "DOWN",
      "details": {
        "error": "Connection timeout",
        "url": "https://api.example.com/status"
      }
    }
  ]
}

Circuit Breaker Pattern: Used with health checks to prevent cascading failures
Bulkhead Pattern: Isolates components to prevent system-wide failures
Observer Pattern: Health indicators observe component status
Repository Pattern: Often used with health checks for data access
Strategy Pattern: Different health check strategies can be implemented

Navius Documentation