
# Monitoring and Logging Library

## 1. Overview

The **Monitoring and Logging Library** (`h2h-observability`) provides a unified observability layer for the H2H platform: **structured logging**, **distributed tracing**, **metrics**, **health checks**, and **SLA monitoring**. All modules emit telemetry through this library; route code must not contain ad-hoc logging or metrics calls.

**Module:** `h2h-observability`

**Package root:** `com.heirs.h2h.observability`

**Stack alignment:** OpenTelemetry, Micrometer, SLF4J/Logback, Prometheus, Grafana, Jaeger

```mermaid
flowchart LR
  subgraph runtime [H2H Runtime Pods]
    Routes[Camel Routes]
    Obs[h2h-observability]
  end

  subgraph signals [Telemetry Backends]
    Prom[Prometheus]
    Jaeger[Jaeger / Tempo]
    OS[OpenSearch / ELK]
    Grafana[Grafana]
  end

  Routes --> Obs
  Obs -->|metrics| Prom
  Obs -->|traces| Jaeger
  Obs -->|logs| OS
  Prom --> Grafana
  Jaeger --> Grafana
  OS --> Grafana
```

***

## 2. Design Principles

| Principle                | Implementation                                      |
| ------------------------ | --------------------------------------------------- |
| Correlation everywhere   | Every log line and span carries `correlationId`     |
| Structured JSON logs     | Machine-parseable; no unstructured `println`        |
| PII-safe logging         | `DataMaskingService` applied before log emission    |
| Consistent metric names  | `h2h_*` prefix, documented catalog                  |
| Low overhead             | Async log appenders, sampled tracing in high volume |
| Cloud agnostic exporters | OTLP — works on any cloud or on-prem                |

***

## 3. Module Structure

```
h2h-observability/
├── api/
│   ├── H2hLogger.java                 # Structured logger facade
│   ├── MetricsRecorder.java           # Business metrics API
│   ├── TraceManager.java              # Span creation and propagation
│   └── AuditEventPublisher.java       # Kafka audit events
├── logging/
│   ├── StructuredLogEncoder.java      # JSON log format
│   ├── CorrelationIdFilter.java       # MDC injection
│   └── PiiSafeLogFilter.java          # Masking before write
├── metrics/
│   ├── H2hMetrics.java                # Metric name constants + registration
│   ├── PaymentMetrics.java
│   └── FileMetrics.java
├── tracing/
│   ├── OtelTraceManager.java          # OpenTelemetry implementation
│   ├── CamelTraceInterceptor.java     # Auto-trace Camel routes
│   └── KafkaTracePropagator.java      # W3C traceparent in Kafka headers
├── health/
│   ├── FinacleHealthIndicator.java
│   ├── KafkaHealthIndicator.java
│   ├── VaultHealthIndicator.java
│   └── SftpHealthIndicator.java
├── audit/
│   ├── AuditEventPublisherImpl.java
│   └── AuditEvent.java
├── alert/
│   ├── SlaMonitor.java
│   └── AlertEventPublisher.java
└── config/
    └── ObservabilityAutoConfiguration.java
```

***

## 4. Structured Logging

### 4.1 H2hLogger API

```java
public interface H2hLogger {

    void info(String event, Map<String, Object> fields);

    void warn(String event, Map<String, Object> fields);

    void error(String event, Map<String, Object> fields, Throwable cause);

    void debug(String event, Map<String, Object> fields);
}
```
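As a hedged illustration of how a caller might use this interface, the sketch below logs one event name plus a flat map of business fields. `RecordingH2hLogger` and `PaymentPostingExample` are hypothetical names introduced here; the interface is repeated only so the snippet compiles on its own, and the in-memory logger is a test double, not the library's JSON implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Same contract as the interface above, repeated so the sketch compiles on its own.
interface H2hLogger {
    void info(String event, Map<String, Object> fields);
    void warn(String event, Map<String, Object> fields);
    void error(String event, Map<String, Object> fields, Throwable cause);
    void debug(String event, Map<String, Object> fields);
}

// Hypothetical test double: keeps each entry in memory instead of writing JSON.
class RecordingH2hLogger implements H2hLogger {
    final List<String> lines = new ArrayList<>();

    private void record(String level, String event, Map<String, Object> fields) {
        // TreeMap sorts the keys so the recorded line is deterministic.
        lines.add(level + " " + event + " " + new TreeMap<>(fields));
    }

    @Override public void info(String event, Map<String, Object> fields) { record("INFO", event, fields); }
    @Override public void warn(String event, Map<String, Object> fields) { record("WARN", event, fields); }
    @Override public void debug(String event, Map<String, Object> fields) { record("DEBUG", event, fields); }

    @Override public void error(String event, Map<String, Object> fields, Throwable cause) {
        // The cause is ignored in this test double; a real logger would attach the stack trace.
        record("ERROR", event, fields);
    }
}

// Hypothetical caller: the event name is a stable code, the details go into fields.
class PaymentPostingExample {
    static void logPosted(H2hLogger log, String batchId, int paymentCount, long durationMs) {
        log.info("PAYMENT_POSTED", Map.of(
            "batchId", batchId,
            "paymentCount", paymentCount,
            "durationMs", durationMs,
            "status", "SUCCESS"));
    }
}
```

Keeping the event name as a fixed code and pushing variable data into `fields` is what makes the resulting entries searchable by `event` rather than by free-text message.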

### 4.2 Standard Log Fields

Every log entry includes:

| Field           | Source | Example              |
| --------------- | ------ | -------------------- |
| `timestamp`     | Auto   | ISO-8601             |
| `level`         | Auto   | INFO                 |
| `event`         | Caller | `PAYMENT_POSTED`     |
| `correlationId` | MDC    | `a1b2c3d4-...`       |
| `partnerId`     | MDC    | `ACME_NG`            |
| `countryCode`   | MDC    | `NG`                 |
| `messageType`   | MDC    | `BULK_PAYMENT`       |
| `routeId`       | MDC    | `bulk-payment-split` |
| `podName`       | Env    | `h2h-runtime-7f8b9c` |
| `environment`   | Env    | `production`         |

### 4.3 JSON Log Format

```json
{
  "timestamp": "2026-06-29T14:32:01.123Z",
  "level": "INFO",
  "event": "PAYMENT_POSTED",
  "correlationId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "partnerId": "ACME_NG",
  "countryCode": "NG",
  "messageType": "BULK_PAYMENT",
  "batchId": "BATCH-20260629-001",
  "paymentCount": 1500,
  "durationMs": 4230,
  "status": "SUCCESS"
}
```

### 4.4 MDC (Mapped Diagnostic Context)

`CorrelationIdFilter` and `H2hContext` (see [Execution Context](/docs/18-execution-context.md)) populate SLF4J MDC at the start of each exchange:

```java
MDC.put("correlationId", context.getCorrelationId());
MDC.put("partnerId", context.getPartnerId());
MDC.put("countryCode", context.getCountryCode());
```

MDC is **cleared** in a `finally` block after route completion.
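The populate-then-clear lifecycle can be sketched as a small scope helper. To keep the example free of dependencies, `DiagnosticContext` is a hypothetical `ThreadLocal`-backed stand-in for `org.slf4j.MDC`, and `MdcScope` is an illustrative name, not part of the library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for org.slf4j.MDC so the sketch runs without SLF4J on the classpath.
final class DiagnosticContext {
    private static final ThreadLocal<Map<String, String>> CTX = ThreadLocal.withInitial(HashMap::new);

    private DiagnosticContext() {}

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static void clear() { CTX.get().clear(); }
}

final class MdcScope {
    private MdcScope() {}

    // Populates the context, runs the route body, and clears the context even if the body throws.
    static <T> T runWithContext(String correlationId, String partnerId, String countryCode,
                                Supplier<T> routeBody) {
        DiagnosticContext.put("correlationId", correlationId);
        DiagnosticContext.put("partnerId", partnerId);
        DiagnosticContext.put("countryCode", countryCode);
        try {
            return routeBody.get();
        } finally {
            // Without this, a pooled thread would leak one exchange's IDs into the next.
            DiagnosticContext.clear();
        }
    }
}
```

The `finally` block matters because worker threads are typically reused: a context left behind by a failed exchange would otherwise tag unrelated log lines with the wrong `correlationId`.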

### 4.5 PII-Safe Logging

`PiiSafeLogFilter` intercepts log fields and masks:

| Field type          | Masked output |
| ------------------- | ------------- |
| Account number      | `****1234`    |
| PAN                 | `****`        |
| PGP key material    | `[REDACTED]`  |
| Vault secret values | `[REDACTED]`  |

Masking is delegated to `DataMaskingService` from `h2h-security`.
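A minimal sketch of the masking rules in the table, under stated assumptions: the field names `accountNumber`, `pan`, `pgpKey`, and `vaultSecret` are hypothetical, and `MaskingSketch` is not the real `DataMaskingService` API, which is documented in the Security Library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MaskingSketch {
    private MaskingSketch() {}

    // Keeps the last four characters and replaces everything before them with "****".
    static String maskAccountNumber(String accountNumber) {
        if (accountNumber == null || accountNumber.length() <= 4) {
            return "****"; // too short to reveal any part safely
        }
        return "****" + accountNumber.substring(accountNumber.length() - 4);
    }

    // Returns a masked copy of the log fields; the input map is not modified.
    static Map<String, Object> maskFields(Map<String, Object> fields) {
        Map<String, Object> masked = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();
            switch (key) { // field names are assumptions made for this sketch
                case "accountNumber":
                    masked.put(key, maskAccountNumber(String.valueOf(value)));
                    break;
                case "pan":
                    masked.put(key, "****");
                    break;
                case "pgpKey":
                case "vaultSecret":
                    masked.put(key, "[REDACTED]");
                    break;
                default:
                    masked.put(key, value);
            }
        }
        return masked;
    }
}
```

Masking a copy of the field map before it reaches the encoder keeps the caller's data intact while ensuring raw values never enter the log pipeline.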

### 4.6 Log Levels by Package

| Package               | Production level        |
| --------------------- | ----------------------- |
| `com.heirs.h2h`       | INFO                    |
| `org.apache.camel`    | WARN                    |
| `org.springframework` | WARN                    |
| `org.hibernate.SQL`   | WARN (disabled in prod) |

***

## 5. Distributed Tracing

### 5.1 TraceManager API

```java
public interface TraceManager {

    Span startSpan(String operationName, H2hContext context);

    void endSpan(Span span, SpanStatus status);

    void addEvent(Span span, String eventName, Map<String, String> attributes);

    Context propagateToKafka(ProducerRecord<?, ?> record);

    Context extractFromKafka(ConsumerRecord<?, ?> record);
}
```

### 5.2 W3C Trace Context Propagation

Uses `traceparent` and `tracestate` headers (W3C standard):

```
traceparent: 00-{traceId}-{spanId}-01
```

Propagated across:

| Boundary              | Mechanism                             |
| --------------------- | ------------------------------------- |
| API Gateway → Runtime | HTTP `traceparent` header             |
| Camel route → route   | Exchange property                     |
| Runtime → Kafka       | Kafka record header                   |
| Runtime → Finacle     | Custom metadata field (correlationId) |
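The `traceparent` layout shown earlier in this section can be built and validated with a few lines of plain Java. This is a sketch, not the code inside `KafkaTracePropagator`: the class name `TraceParent` is hypothetical, and it accepts only version `00` with lowercase hex IDs.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceParent {
    // version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
    private static final Pattern FORMAT =
        Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    final String traceId;
    final String spanId;
    final boolean sampled;

    private TraceParent(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }

    // Renders the header value; the trailing "01" means the sampled flag is set.
    static String format(String traceId, String spanId, boolean sampled) {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    // Returns empty for anything that is not a well-formed version-00 header.
    static Optional<TraceParent> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String traceId = m.group(1);
        String spanId = m.group(2);
        // All-zero IDs are invalid under W3C Trace Context.
        if (traceId.chars().allMatch(c -> c == '0') || spanId.chars().allMatch(c -> c == '0')) {
            return Optional.empty();
        }
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) == 1;
        return Optional.of(new TraceParent(traceId, spanId, sampled));
    }
}
```

A consumer that receives a malformed or missing header would normally start a new trace rather than fail the exchange, which is why `parse` returns an empty `Optional` instead of throwing.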

### 5.3 Camel Auto-Tracing

`CamelTraceInterceptor` creates spans for:

* Each route entry/exit
* Each step in the step executor pipeline
* External calls (Finacle, SFTP, S3)

Span naming convention: `h2h.{routeId}.{stepCode}`

### 5.4 Trace Sampling

| Scope                           | Sample rate                  |
| ------------------------------- | ---------------------------- |
| Development                     | 100%                         |
| UAT                             | 100%                         |
| Production                      | 10% (configurable)           |
| Failed traces (any environment) | 100% (always trace failures) |
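One way to read the table is as a ratio-based decision derived from the trace ID, with an override for failures. The sketch below is an assumption about how such a rule could look, not the library's sampler: `TraceSamplingSketch` is a hypothetical name, and the `failed` flag presumes the outcome is already known when the decision is made. A decision taken at span start cannot know that, so keeping every failed trace generally requires deferring the decision, for example to tail-based sampling in a collector.

```java
final class TraceSamplingSketch {
    private TraceSamplingSketch() {}

    // Decides from the trace ID alone, so every service reaches the same verdict for one trace.
    static boolean shouldSample(String traceIdHex, double sampleRate, boolean failed) {
        if (failed) {
            return true; // failures are always kept
        }
        if (sampleRate >= 1.0) {
            return true;
        }
        if (sampleRate <= 0.0) {
            return false;
        }
        // Take the low 64 bits (last 16 hex chars) and drop the sign bit: range [0, 2^63).
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16) & Long.MAX_VALUE;
        // Keep the trace when its value falls in the lowest sampleRate share of that range.
        long threshold = (long) (sampleRate * Long.MAX_VALUE);
        return low < threshold;
    }
}
```

For example, `shouldSample(traceId, 0.1, false)` corresponds to the production row and `shouldSample(traceId, 0.1, true)` to the failure override. Deriving the verdict from the trace ID rather than a random number keeps the decision consistent across pods that handle the same trace.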

***

## 6. Metrics

### 6.1 MetricsRecorder API

```java
public interface MetricsRecorder {

    void incrementCounter(String name, Map<String, String> tags);

    void recordTimer(String name, long durationMs, Map<String, String> tags);

    void recordGauge(String name, double value, Map<String, String> tags);
}
```
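For unit tests it can help to have an implementation of this interface that needs no metrics backend. The following is a hypothetical in-memory sketch (`InMemoryMetricsRecorder` is not part of the library, and the production implementation presumably delegates to Micrometer); the interface is repeated so the snippet compiles on its own.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Same contract as the interface above, repeated so the sketch compiles on its own.
interface MetricsRecorder {
    void incrementCounter(String name, Map<String, String> tags);
    void recordTimer(String name, long durationMs, Map<String, String> tags);
    void recordGauge(String name, double value, Map<String, String> tags);
}

// Hypothetical in-memory recorder: one series per distinct (name, tags) pair.
class InMemoryMetricsRecorder implements MetricsRecorder {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> timerTotalsMs = new ConcurrentHashMap<>();
    private final Map<String, Double> gauges = new ConcurrentHashMap<>();

    // Sorting the tags makes the key independent of map iteration order.
    private static String key(String name, Map<String, String> tags) {
        return name + new TreeMap<>(tags);
    }

    @Override public void incrementCounter(String name, Map<String, String> tags) {
        counters.computeIfAbsent(key(name, tags), k -> new LongAdder()).increment();
    }

    @Override public void recordTimer(String name, long durationMs, Map<String, String> tags) {
        timerTotalsMs.computeIfAbsent(key(name, tags), k -> new LongAdder()).add(durationMs);
    }

    @Override public void recordGauge(String name, double value, Map<String, String> tags) {
        gauges.put(key(name, tags), value); // a gauge keeps only the latest value
    }

    long counterValue(String name, Map<String, String> tags) {
        LongAdder adder = counters.get(key(name, tags));
        return adder == null ? 0L : adder.sum();
    }

    long timerTotalMs(String name, Map<String, String> tags) {
        LongAdder adder = timerTotalsMs.get(key(name, tags));
        return adder == null ? 0L : adder.sum();
    }

    Double gaugeValue(String name, Map<String, String> tags) {
        return gauges.get(key(name, tags));
    }
}
```

A call such as `recorder.incrementCounter("h2h_payments_processed_total", Map.of("partner", "ACME_NG", "status", "SUCCESS"))` then produces one series per tag combination, which mirrors how tagged metrics behave in Prometheus.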

### 6.2 Standard Metric Catalog

| Metric                         | Type    | Tags                          | Description              |
| ------------------------------ | ------- | ----------------------------- | ------------------------ |
| `h2h_files_received_total`     | Counter | partner, messageType, country | Inbound files            |
| `h2h_files_delivered_total`    | Counter | partner, deliveryMethod       | Outbound files           |
| `h2h_payments_processed_total` | Counter | partner, status               | Payment count            |
| `h2h_payments_amount_total`    | Counter | partner, currency             | Payment volume           |
| `h2h_finacle_calls_total`      | Counter | operation, status             | Finacle API calls        |
| `h2h_finacle_call_duration_ms` | Timer   | operation                     | Finacle latency          |
| `h2h_route_duration_ms`        | Timer   | routeId, status               | End-to-end route time    |
| `h2h_step_duration_ms`         | Timer   | stepCode, status              | Per-step timing          |
| `h2h_dlq_messages_total`       | Counter | routeId, errorType            | Dead letter count        |
| `h2h_config_cache_hits_total`  | Counter | —                             | Config cache performance |
| `h2h_active_processing_gauge`  | Gauge   | routeId                       | In-flight exchanges      |
| `h2h_sla_breach_total`         | Counter | partner, slaType              | SLA violations           |

### 6.3 Prometheus Export

Spring Boot Actuator exposes `/actuator/prometheus` on the management port, which is separate from the application port.

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,prometheus,info
  server:
    port: 9090
```

### 6.4 Grafana Dashboards

Pre-built dashboard templates ship with the Helm chart:

| Dashboard           | Panels                                    |
| ------------------- | ----------------------------------------- |
| H2H Overview        | TPS, error rate, latency p50/p95/p99      |
| Partner Health      | Per-partner success rate, volume          |
| File Processing     | Inbound/outbound files, delivery failures |
| Finacle Integration | Call rate, latency, circuit breaker state |
| Infrastructure      | Pod CPU/memory, Kafka lag, DB connections |
| SLA Monitor         | Cut-off misses, processing time breaches  |

***

## 7. Audit Event Publishing

### 7.1 AuditEventPublisher

Business-significant events are published to Kafka for the immutable audit store.

```java
public interface AuditEventPublisher {
    void publish(AuditEvent event);
}

public record AuditEvent(
    String eventId,
    String correlationId,
    String partnerId,
    String eventType,
    String stepCode,
    String status,
    Instant timestamp,
    Map<String, String> metadata
) {}
```
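A hedged example of building and publishing one event with these types. The factory class `AuditEvents`, the step code `FINACLE_POST`, and the `batchId` metadata key are assumptions made for illustration; the record and interface are repeated so the snippet compiles on its own (records require Java 16 or later).

```java
import java.time.Instant;
import java.util.Map;
import java.util.UUID;

// Same shapes as above, repeated so the sketch compiles on its own.
interface AuditEventPublisher {
    void publish(AuditEvent event);
}

record AuditEvent(
    String eventId,
    String correlationId,
    String partnerId,
    String eventType,
    String stepCode,
    String status,
    Instant timestamp,
    Map<String, String> metadata
) {}

// Hypothetical factory: fills in the generated fields so callers pass only business data.
final class AuditEvents {
    private AuditEvents() {}

    static AuditEvent paymentPosted(String correlationId, String partnerId,
                                    String stepCode, Map<String, String> metadata) {
        return new AuditEvent(
            UUID.randomUUID().toString(), // unique per event, usable for de-duplication
            correlationId,
            partnerId,
            "PAYMENT_POSTED",
            stepCode,
            "SUCCESS",
            Instant.now(),
            Map.copyOf(metadata));        // defensive, immutable copy
    }
}
```

A call such as `publisher.publish(AuditEvents.paymentPosted(correlationId, "ACME_NG", "FINACLE_POST", Map.of("batchId", "BATCH-20260629-001")))` keeps the event type and status consistent across call sites, since neither is typed by hand.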

### 7.2 Audit Event Types

| Event type           | When                          |
| -------------------- | ----------------------------- |
| `FILE_RECEIVED`      | Inbound file registered       |
| `FILE_DELIVERED`     | Outbound file confirmed       |
| `PAYMENT_VALIDATED`  | Validation passed             |
| `PAYMENT_POSTED`     | Finacle success               |
| `PAYMENT_FAILED`     | Finacle or validation failure |
| `DUPLICATE_DETECTED` | Idempotency rejection         |
| `CONFIG_PUBLISHED`   | Config change went live       |
| `SECURITY_EVENT`     | Crypto or auth operation      |

**Event code:** `AUDIT_EVENT` — physical topic/queue from `event_channel_def` (default seed: `h2h.audit.events`).

***

## 8. Health Checks

### 8.1 Custom Health Indicators

| Indicator                 | Checks                                            |
| ------------------------- | ------------------------------------------------- |
| `FinacleHealthIndicator`  | FCJ/FCUBS connectivity, circuit breaker state     |
| `KafkaHealthIndicator`    | Broker connectivity, consumer group lag threshold |
| `VaultHealthIndicator`    | Vault seal status, auth token validity            |
| `SftpHealthIndicator`     | SFTP gateway reachability (sample partner)        |
| `ConfigDbHealthIndicator` | PostgreSQL connectivity, migration version        |
| `S3HealthIndicator`       | Bucket accessibility                              |

### 8.2 Health Endpoint

```
GET /actuator/health
GET /actuator/health/liveness
GET /actuator/health/readiness
```

**Readiness** fails if Finacle is unreachable, Kafka is down, or the Config DB is down.

**Liveness** fails if a JVM deadlock is detected or the Camel context has stopped.
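The readiness rule amounts to "every required dependency must be up; optional ones do not count". A minimal sketch of that aggregation follows, where `ReadinessSketch` and the component keys `finacle`, `kafka`, and `configDb` are hypothetical. In a Spring Boot deployment the same grouping is usually expressed declaratively through health groups (`management.endpoint.health.group.readiness.include`) rather than in code.

```java
import java.util.Map;
import java.util.Set;

final class ReadinessSketch {
    // Dependencies whose failure should take the pod out of rotation (assumed keys).
    static final Set<String> REQUIRED = Set.of("finacle", "kafka", "configDb");

    private ReadinessSketch() {}

    // Ready only if every required component reports up; a missing report counts as down.
    static boolean isReady(Map<String, Boolean> componentUp) {
        for (String name : REQUIRED) {
            if (!componentUp.getOrDefault(name, false)) {
                return false;
            }
        }
        return true;
    }
}
```

Treating a missing report as "down" is the conservative choice: a pod that has not yet checked Finacle should not receive traffic.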

***

## 9. SLA Monitoring

### 9.1 SlaMonitor

Monitors processing time against partner SLA thresholds (database-driven):

| SLA type               | Source                                       | Breach action  |
| ---------------------- | -------------------------------------------- | -------------- |
| `FILE_PROCESSING_TIME` | `file_registry.created_at` → completion      | Alert + metric |
| `PAYMENT_POSTING_TIME` | Route timer vs `profile.sla.maxProcessingMs` | Alert          |
| `ACK_DELIVERY_TIME`    | Outbound delivery timer                      | Alert          |
| `CUT_OFF_COMPLIANCE`   | Payment received after cut-off               | Reject + alert |

SLA thresholds stored in `integration_profile.sla_config` (JSON).
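As an illustration of the `FILE_PROCESSING_TIME` row, a breach check reduces to comparing elapsed time with the configured limit. This is a sketch under assumptions: `SlaCheckSketch` is a hypothetical name, and the threshold is passed in directly because the exact shape of the `sla_config` JSON is not specified here.

```java
import java.time.Duration;
import java.time.Instant;

final class SlaCheckSketch {
    private SlaCheckSketch() {}

    // Elapsed processing time in milliseconds between registration and completion.
    static long elapsedMs(Instant createdAt, Instant completedAt) {
        return Duration.between(createdAt, completedAt).toMillis();
    }

    // Breached only when processing took strictly longer than the allowed maximum.
    static boolean isBreached(Instant createdAt, Instant completedAt, long maxProcessingMs) {
        return elapsedMs(createdAt, completedAt) > maxProcessingMs;
    }
}
```

On a breach, the caller would increment `h2h_sla_breach_total` with the `partner` and `slaType` tags and raise the alert described in the next section.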

### 9.2 Alert Publishing

SLA breaches publish `SLA_BREACH` via `EventPublisher` → `notification_subscription` (email/SMS) and/or Alertmanager / PagerDuty.

***

## 10. Camel Integration

### 10.1 Observability Processors

Registered in `h2h-camel-core`, implemented in `h2h-observability`:

| Processor                     | Action                       |
| ----------------------------- | ---------------------------- |
| `ObservabilityStartProcessor` | Start span, populate MDC     |
| `ObservabilityEndProcessor`   | End span, record route timer |
| `StepMetricsProcessor`        | Record per-step timer        |
| `AuditEventProcessor`         | Publish audit event          |

### 10.2 Route Integration

```java
from("file-mgmt:poll")
    .process(observabilityStartProcessor)
    // ... pipeline steps ...
    .process(observabilityEndProcessor);
```

***

## 11. Configuration

```yaml
h2h:
  observability:
    logging:
      format: JSON
      piiMasking: true
    tracing:
      enabled: true
      exporter: otlp
      endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT:http://jaeger:4317}
      sampleRate: 0.1
    metrics:
      enabled: true
      exportIntervalSeconds: 15
    audit:
      eventCode: AUDIT_EVENT
    sla:
      enabled: true
      checkIntervalSeconds: 60
```

***

## 12. OpenTelemetry Collector (Deployment)

Cloud-agnostic telemetry pipeline:

```mermaid
flowchart LR
  Pods[H2H Pods] -->|OTLP gRPC| Collector[OTel Collector]
  Collector -->|metrics| Prom[Prometheus]
  Collector -->|traces| Jaeger[Jaeger]
  Collector -->|logs| OS[OpenSearch]
```

Collector config deployed via Helm — same chart on AWS, Azure, GCP, or on-prem.

***

## 13. Operations Dashboard Integration

The admin operations dashboard consumes:

| Source            | Data                                |
| ----------------- | ----------------------------------- |
| Prometheus        | Real-time metrics via Grafana embed |
| OpenSearch        | Transaction search by correlationId |
| Kafka audit topic | Event timeline per batch            |
| `file_registry`   | File status                         |

***

## 14. Related Documents

* [Universal Library Extensibility](/docs/20-universal-library-extensibility.md)
* [Execution Context](/docs/18-execution-context.md) — correlation ID and MDC
* [Security Library](/docs/16-security-library.md) — PII masking, security audit events
* [File Management System](/docs/15-file-management-system.md) — file delivery metrics
* [Cloud-Agnostic Deployment](/docs/19-cloud-agnostic-deployment.md) — observability stack deployment
* [Camel Integration Patterns](/docs/08-camel-integration-patterns.md)

