# File Management System

## 1. Overview

The **File Management System (FMS)** is a unified abstraction for storing, transferring, and delivering files across the H2H platform. It provides a single API and Camel component layer over multiple backends — **SFTP**, **object storage (S3-compatible buckets)**, and **local staging** — with **signed URL delivery**, **integrity verification**, and **guaranteed delivery** semantics.

**Module:** `h2h-file-management` (builds on `h2h-file-adapter`)

**Design goals:**

| Goal                    | Approach                                                               |
| ----------------------- | ---------------------------------------------------------------------- |
| Backend agnostic        | `FileStore` SPI — swap SFTP, S3, Azure Blob, GCS without route changes |
| Database-driven paths   | Partner inbox/outbox from `channel_config` — no hardcoded paths        |
| Secure delivery         | Pre-signed URLs, PGP, checksums, optional mTLS file gateway            |
| High delivery guarantee | Staged write → verify → commit → acknowledge pattern                   |
| Audit trail             | Every file operation logged with correlation ID                        |
| Large file support      | Multipart upload, streaming, chunked processing                        |

```mermaid
flowchart TB
  subgraph clients [External Clients]
    PartnerSFTP[Partner SFTP]
    PartnerAPI[Partner API Upload]
  end

  subgraph fms [h2h-file-management]
    API[FileManagementService]
    Router[Storage Router]
    Signer[Signed URL Service]
    Integrity[Integrity Service]
    Delivery[Delivery Manager]
  end

  subgraph backends [Storage Backends]
    SFTP[SFTP Server]
    S3[S3 / MinIO Bucket]
    Stage[Local Staging Volume]
  end

  subgraph meta [Metadata]
    DB[(file_registry)]
    Vault[Vault - keys]
  end

  clients --> API
  API --> Router
  Router --> SFTP
  Router --> S3
  Router --> Stage
  API --> Signer
  API --> Integrity
  API --> Delivery
  API --> DB
  Signer --> Vault
```

***

## 2. Architecture

### 2.1 Core Components

| Component           | Class / Service          | Responsibility                                 |
| ------------------- | ------------------------ | ---------------------------------------------- |
| File Management API | `FileManagementService`  | Public facade for all file operations          |
| Storage SPI         | `FileStore`              | Backend-specific read/write/list/delete        |
| Storage router      | `FileStoreRouter`        | Resolves backend from partner `channel_config` |
| Signed URL service  | `SignedUrlService`       | Generate and validate pre-signed URLs          |
| Integrity service   | `FileIntegrityService`   | SHA-256 checksum, size validation              |
| Delivery manager    | `FileDeliveryManager`    | Guaranteed outbound delivery with retry        |
| File registry       | `FileRegistryRepository` | Metadata persistence (not file content)        |
| Camel component     | `file-mgmt:`             | Route DSL integration                          |

### 2.2 Storage Backend Types

| Backend code | Implementation         | Typical use                              |
| ------------ | ---------------------- | ---------------------------------------- |
| `SFTP`       | `SftpFileStore`        | Partner file exchange (inbound/outbound) |
| `S3`         | `S3FileStore`          | Internal staging, archive, large files   |
| `MINIO`      | `S3FileStore` (S3 API) | On-prem dev/UAT, hybrid cloud            |
| `AZURE_BLOB` | `AzureBlobFileStore`   | Azure deployments                        |
| `GCS`        | `GcsFileStore`         | GCP deployments                          |
| `LOCAL`      | `LocalFileStore`       | Pod ephemeral staging only (not durable) |

Backend selection is **database-driven** per partner channel:

```json
{
  "inboundStore": "SFTP",
  "outboundStore": "SFTP",
  "archiveStore": "S3",
  "archiveBucket": "h2h-archive-ng",
  "archiveRetentionDays": 2555
}
```
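As a rough illustration of how a router might resolve a backend code from `channel_config` to a store, the sketch below pairs a minimal `FileStore` interface with an in-memory implementation. The method signatures, `InMemoryFileStore`, and `StoreRouterSketch` are assumptions made for this example; this page names the SPI and its responsibilities (read/write/list/delete) but does not show its actual API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shape of the FileStore SPI; the real signatures are not shown on this page.
interface FileStore {
    void write(String path, byte[] content);
    byte[] read(String path);
    List<String> list(String prefix);
    void delete(String path);
}

// In-memory stand-in, used only to make the sketch runnable.
final class InMemoryFileStore implements FileStore {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    public void write(String path, byte[] content) {
        files.put(path, content.clone());
    }

    public byte[] read(String path) {
        byte[] content = files.get(path);
        if (content == null) {
            throw new IllegalArgumentException("No such file: " + path);
        }
        return content.clone();
    }

    public List<String> list(String prefix) {
        return files.keySet().stream().filter(k -> k.startsWith(prefix)).sorted().toList();
    }

    public void delete(String path) {
        files.remove(path);
    }
}

// Maps a backend code from channel_config (for example "SFTP" or "S3") to a store.
final class StoreRouterSketch {
    private final Map<String, FileStore> storesByCode;

    StoreRouterSketch(Map<String, FileStore> storesByCode) {
        this.storesByCode = Map.copyOf(storesByCode);
    }

    FileStore resolve(String backendCode) {
        FileStore store = storesByCode.get(backendCode);
        if (store == null) {
            throw new IllegalArgumentException("Unknown backend code: " + backendCode);
        }
        return store;
    }
}
```

With this shape, a route asks the router for `archiveStore` and writes to whatever comes back, so changing `"S3"` to `"MINIO"` in the partner's configuration would not touch route code.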

***

## 3. File Registry (Metadata)

File content lives in SFTP/buckets; **metadata** lives in the database.

### 3.1 Table: `file_registry`

| Column                      | Description                                               |
| --------------------------- | --------------------------------------------------------- |
| `file_id`                   | UUID primary key                                          |
| `correlation_id`            | Links to transaction / batch                              |
| `partner_id`                | FK to partner                                             |
| `direction`                 | INBOUND, OUTBOUND, ARCHIVE                                |
| `original_filename`         | Partner-facing name                                       |
| `storage_backend`           | SFTP, S3, etc.                                            |
| `storage_path`              | Full path or object key                                   |
| `content_type`              | MIME type                                                 |
| `size_bytes`                | File size                                                 |
| `checksum_sha256`           | Integrity hash                                            |
| `pgp_encrypted`             | Boolean                                                   |
| `status`                    | STAGED, VERIFIED, PROCESSING, DELIVERED, FAILED, ARCHIVED |
| `delivery_attempts`         | Outbound retry count                                      |
| `signed_url_expires_at`     | If delivered via signed URL                               |
| `created_at` / `updated_at` | Timestamps                                                |
| `metadata_json`             | Optional partner-specific metadata                        |

### 3.2 File Lifecycle States

```mermaid
stateDiagram-v2
  [*] --> STAGED: Upload / poll received
  STAGED --> VERIFIED: Checksum + size OK
  STAGED --> FAILED: Checksum mismatch
  VERIFIED --> PROCESSING: Camel route consuming
  PROCESSING --> ARCHIVED: Copy to archive store
  PROCESSING --> DELIVERED: Outbound success
  PROCESSING --> FAILED: Delivery exhausted
  FAILED --> STAGED: Manual reprocess
  DELIVERED --> ARCHIVED: Retention policy
  ARCHIVED --> [*]
```
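The lifecycle can be encoded as a small transition table so that illegal status updates are rejected before they reach `file_registry`. The following is a minimal sketch; the enum and method names are illustrative, not taken from the module.

```java
import java.util.EnumSet;
import java.util.Set;

// Mirrors the lifecycle states; names of the methods are hypothetical.
enum FileStatus {
    STAGED, VERIFIED, PROCESSING, DELIVERED, FAILED, ARCHIVED;

    // Allowed next states for this status.
    Set<FileStatus> nextStates() {
        return switch (this) {
            // FAILED here follows the error-handling table (checksum mismatch).
            case STAGED -> EnumSet.of(VERIFIED, FAILED);
            case VERIFIED -> EnumSet.of(PROCESSING);
            case PROCESSING -> EnumSet.of(ARCHIVED, DELIVERED, FAILED);
            case FAILED -> EnumSet.of(STAGED); // manual reprocess
            case DELIVERED -> EnumSet.of(ARCHIVED);
            case ARCHIVED -> EnumSet.noneOf(FileStatus.class);
        };
    }

    boolean canTransitionTo(FileStatus target) {
        return nextStates().contains(target);
    }
}
```

A repository method could call `current.canTransitionTo(next)` and refuse the update when it returns `false`, which keeps a retried or out-of-order event from moving a file backwards.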

***

## 4. SFTP Integration

### 4.1 Inbound (Polling Consumer)

SFTP polling is configured per partner in `channel_config`:

| Config field      | Example                                |
| ----------------- | -------------------------------------- |
| `sftpHostRef`     | `vault:secret/sftp/ng/acme/host`       |
| `inboxPath`       | `/inbound/payments`                    |
| `filePattern`     | `PAY_*.csv.pgp`                        |
| `pollCron`        | `0 */5 * * * *`                        |
| `deleteAfterRead` | `false` (move to `.processed` instead) |
| `moveAfterRead`   | `/inbound/payments/.processed`         |

**Flow:**

```
SFTP poll → download to staging bucket → register in file_registry (STAGED)
         → verify checksum → VERIFIED → emit Camel event → route processing
```
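The "verify checksum" step in the flow above could look roughly like the following. It is a sketch using only the JDK; `ChecksumSketch` and its method names are hypothetical, and the expected size and hash are assumed to come from wherever the partner or registry supplies them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

final class ChecksumSketch {
    private ChecksumSketch() {}

    // Streams the content through SHA-256 so the whole file is never held in memory.
    static String sha256Hex(InputStream in) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        try (DigestInputStream dis = new DigestInputStream(in, digest)) {
            byte[] buffer = new byte[8192];
            while (dis.read(buffer) != -1) {
                // Reading drives the digest; nothing else to do here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return HexFormat.of().formatHex(digest.digest());
    }

    // True only when both the size and the SHA-256 hash match the expected values.
    static boolean verify(byte[] content, long expectedSize, String expectedSha256Hex) {
        if (content.length != expectedSize) {
            return false;
        }
        String actual = sha256Hex(new ByteArrayInputStream(content));
        return actual.equalsIgnoreCase(expectedSha256Hex);
    }
}
```

For real files, `sha256Hex` would be fed the stream from the staging store rather than a byte array, so large files are hashed without being loaded fully into memory.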

### 4.2 Outbound (Guaranteed Delivery)

```
Generate ACK/content → PGP encrypt (if required) → stage locally
                    → upload to partner SFTP outbox → verify remote exists
                    → update file_registry DELIVERED → partner notification
```

**Retry policy (database-driven):**

| Parameter             | Default |
| --------------------- | ------- |
| `maxDeliveryAttempts` | 5       |
| `retryDelayMs`        | 30000   |
| `backoffMultiplier`   | 2.0     |

On exhaustion → status `FAILED` → ops dashboard alert.
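To make the retry parameters concrete, here is a minimal sketch of how the delay between attempts might be derived. It assumes geometric growth starting at `retryDelayMs`, which this page does not state explicitly; the `RetryPolicy` record and `delayAfterFailure` method are hypothetical.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical holder for the three retry parameters in the table above.
record RetryPolicy(int maxDeliveryAttempts, long retryDelayMs, double backoffMultiplier) {

    static RetryPolicy defaults() {
        return new RetryPolicy(5, 30_000L, 2.0);
    }

    // Delay to wait after the given failed attempt (1-based).
    // Empty when attempts are exhausted and the file should move to FAILED.
    Optional<Duration> delayAfterFailure(int failedAttempt) {
        if (failedAttempt < 1) {
            throw new IllegalArgumentException("failedAttempt is 1-based");
        }
        if (failedAttempt >= maxDeliveryAttempts) {
            return Optional.empty();
        }
        // Assumption: delay = retryDelayMs * multiplier^(failedAttempt - 1).
        double millis = retryDelayMs * Math.pow(backoffMultiplier, failedAttempt - 1);
        return Optional.of(Duration.ofMillis(Math.round(millis)));
    }
}
```

Under these assumptions the defaults give waits of 30 s, 60 s, 120 s, and 240 s after the first four failures, and the fifth failure exhausts the policy.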

### 4.3 SFTP Security

| Control               | Implementation                        |
| --------------------- | ------------------------------------- |
| Authentication        | SSH key from Vault (`credential_ref`) |
| Host key verification | Known hosts stored in Vault           |
| Chroot                | Partner restricted to assigned paths  |
| No password auth      | Enforced at gateway level             |

***

## 5. Object Storage (Bucket) Integration

### 5.1 S3-Compatible API

`S3FileStore` uses the AWS SDK v2 S3 client with a configurable endpoint, so it works with:

* Amazon S3
* MinIO (on-prem / hybrid)
* DigitalOcean Spaces
* Cloudflare R2
* Any S3-compatible store

**Configuration (infrastructure — not partner config):**

```yaml
h2h:
  file-management:
    s3:
      endpoint: ${S3_ENDPOINT:https://s3.amazonaws.com}
      region: ${S3_REGION:af-south-1}
      credentialsRef: vault:secret/s3/h2h/service-account
      defaultBucket: h2h-files
```

### 5.2 Bucket Layout

```
{bucket}/
├── staging/{partnerId}/{date}/{fileId}/          # In-flight files
├── inbound/{partnerId}/{date}/{filename}         # Post-verify inbound
├── outbound/{partnerId}/{date}/{filename}        # Pending partner pickup
├── archive/{partnerId}/{yyyy}/{mm}/{fileId}      # Long-term retention
└── temp/{correlationId}/                       # Processing scratch (TTL 24h)
```
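Object keys for this layout can be built by a small helper so that no route concatenates paths by hand. The sketch below is illustrative: `BucketKeys` is a hypothetical class, and rendering `{date}` as ISO `yyyy-MM-dd` is an assumption, since the layout does not fix a date format.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class BucketKeys {
    // Assumption: {date} is rendered as ISO yyyy-MM-dd.
    private static final DateTimeFormatter DATE = DateTimeFormatter.ISO_LOCAL_DATE;

    private BucketKeys() {}

    // In-flight files: staging/{partnerId}/{date}/{fileId}/
    static String staging(String partnerId, LocalDate date, String fileId) {
        return "staging/" + partnerId + "/" + DATE.format(date) + "/" + fileId + "/";
    }

    // Post-verify inbound files: inbound/{partnerId}/{date}/{filename}
    static String inbound(String partnerId, LocalDate date, String filename) {
        return "inbound/" + partnerId + "/" + DATE.format(date) + "/" + filename;
    }

    // Files pending partner pickup: outbound/{partnerId}/{date}/{filename}
    static String outbound(String partnerId, LocalDate date, String filename) {
        return "outbound/" + partnerId + "/" + DATE.format(date) + "/" + filename;
    }

    // Long-term retention: archive/{partnerId}/{yyyy}/{mm}/{fileId}
    static String archive(String partnerId, LocalDate date, String fileId) {
        return String.format(Locale.ROOT, "archive/%s/%04d/%02d/%s",
                partnerId, date.getYear(), date.getMonthValue(), fileId);
    }

    // Processing scratch space: temp/{correlationId}/
    static String temp(String correlationId) {
        return "temp/" + correlationId + "/";
    }
}
```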

### 5.3 Multipart Upload (Large Files)

Files above a configurable threshold (default **100 MB**) use multipart upload:

| Setting                   | Value                                             |
| ------------------------- | ------------------------------------------------- |
| `multipartThresholdBytes` | 104857600 (100 MB)                                |
| `partSizeBytes`           | 10485760 (10 MB)                                  |
| Streaming                 | Camel route reads stream without full memory load |
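
The arithmetic behind these settings is simple enough to sketch. `MultipartPlan` below is a hypothetical helper; it treats "above the threshold" as strictly greater than, which is an assumption about the boundary case.

```java
final class MultipartPlan {
    static final long DEFAULT_THRESHOLD_BYTES = 104_857_600L; // 100 MB
    static final long DEFAULT_PART_SIZE_BYTES = 10_485_760L;  // 10 MB

    private MultipartPlan() {}

    // Assumption: a file exactly at the threshold is still uploaded in one request.
    static boolean useMultipart(long sizeBytes, long thresholdBytes) {
        return sizeBytes > thresholdBytes;
    }

    // Number of parts needed, rounding up so the last part may be smaller.
    static long partCount(long sizeBytes, long partSizeBytes) {
        if (sizeBytes < 0 || partSizeBytes <= 0) {
            throw new IllegalArgumentException("sizeBytes must be >= 0 and partSizeBytes > 0");
        }
        return (sizeBytes + partSizeBytes - 1) / partSizeBytes;
    }
}
```

With the default part size, a 5 GiB file (5368709120 bytes) splits into 512 parts. Amazon S3 caps a multipart upload at 10,000 parts and requires every part except the last to be at least 5 MiB, so the part size has to grow for very large objects.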

***

## 6. Signed URL Delivery

### 6.1 Overview

**Pre-signed URLs** enable secure, time-limited file access without sharing long-lived credentials. Used for:

* Partner **API upload** of large files (PUT pre-signed URL)
* Partner **download** of ACKs and statements (GET pre-signed URL)
* Internal admin file preview in portal
* Cross-region file handoff without SFTP

### 6.2 Signed URL Types

| Operation        | Method           | Use case                         |
| ---------------- | ---------------- | -------------------------------- |
| **Upload URL**   | `PUT` pre-signed | Partner uploads file via HTTPS   |
| **Download URL** | `GET` pre-signed | Partner downloads ACK/statement  |
| **Post policy**  | Browser POST     | Legacy browser upload (optional) |

### 6.3 SignedUrlService API

```java
public interface SignedUrlService {

    SignedUrlResult generateUploadUrl(SignedUrlRequest request);

    SignedUrlResult generateDownloadUrl(SignedUrlRequest request);

    boolean validateUploadCompleted(String fileId);
}

public record SignedUrlRequest(
    String partnerId,
    String filename,
    String contentType,
    long maxSizeBytes,
    Duration ttl,
    Map<String, String> metadata
) {}
```
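`SignedUrlResult` is not defined on this page. The sketch below guesses at its shape from the `{ uploadUrl, fileId, expiresAt }` response in section 6.4 and adds input checks to the request record; both the result fields and the validation rules are assumptions, not the module's actual behaviour.

```java
import java.net.URI;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

// Same components as the record above, repeated so the sketch compiles on its own.
record SignedUrlRequest(String partnerId, String filename, String contentType,
                        long maxSizeBytes, Duration ttl, Map<String, String> metadata) {

    // Compact constructor: reject obviously invalid requests early (hypothetical rules).
    SignedUrlRequest {
        if (partnerId == null || partnerId.isBlank()) {
            throw new IllegalArgumentException("partnerId is required");
        }
        if (filename == null || filename.isBlank()
                || filename.contains("/") || filename.contains("..")) {
            throw new IllegalArgumentException("filename must be a plain file name");
        }
        if (maxSizeBytes <= 0) {
            throw new IllegalArgumentException("maxSizeBytes must be positive");
        }
        if (ttl == null || ttl.isZero() || ttl.isNegative()) {
            throw new IllegalArgumentException("ttl must be positive");
        }
        metadata = metadata == null ? Map.of() : Map.copyOf(metadata);
    }
}

// Hypothetical result shape: the signed URL, the registry ID, and the expiry instant.
record SignedUrlResult(URI url, String fileId, Instant expiresAt) {

    // A URL counts as expired from the expiry instant onwards.
    boolean isExpired(Instant now) {
        return !now.isBefore(expiresAt);
    }
}
```

Validating in the record's constructor means a malformed request fails before any registry row or signed URL is created.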

### 6.4 Upload Flow (Signed)

```mermaid
sequenceDiagram
  participant Partner
  participant API as Admin/API Gateway
  participant FMS as FileManagementService
  participant S3 as S3 Bucket
  participant Camel as Integration Runtime

  Partner->>API: POST /files/upload-url
  API->>FMS: generateUploadUrl(partnerId, filename)
  FMS->>FMS: Create file_registry (STAGED)
  FMS->>S3: Generate pre-signed PUT URL
  FMS-->>Partner: { uploadUrl, fileId, expiresAt }
  Partner->>S3: PUT file (direct to bucket)
  Partner->>API: POST /files/{fileId}/complete
  API->>FMS: validateUploadCompleted(fileId)
  FMS->>S3: HEAD object, verify size
  FMS->>FMS: Compute SHA-256, status VERIFIED
  FMS->>Camel: Trigger processing (Kafka / direct)
```

### 6.5 Download Flow (Signed ACK Delivery)

```mermaid
sequenceDiagram
  participant Camel
  participant FMS as FileManagementService
  participant S3
  participant Partner

  Camel->>FMS: deliver(fileId, partnerId)
  FMS->>S3: Store outbound file
  FMS->>FMS: Generate GET pre-signed URL (TTL 24h)
  FMS->>Partner: Webhook / API notification with downloadUrl
  Partner->>S3: GET via signed URL
  FMS->>FMS: status DELIVERED
```

### 6.6 Signed URL Security Controls

| Control                  | Implementation                                                      |
| ------------------------ | ------------------------------------------------------------------- |
| TTL                      | Default 15 min (upload), 24 h (download) — configurable per partner |
| Max file size            | `content-length-range` condition on POST policies; pre-signed `PUT` uploads are size-checked on completion (`HEAD` object) |
| Content-Type restriction | Optional MIME whitelist per message type                            |
| IP restriction           | Optional IP condition in policy (S3 bucket policy)                  |
| One-time upload          | `fileId` bound to single object key                                 |
| Audit                    | Log URL generation (not the signed token itself)                    |
| HTTPS only               | `aws:SecureTransport` condition on bucket policy                    |

### 6.7 Alternative: HMAC Request Signing (API channel)

For partners not using pre-signed URLs, the API gateway supports **HMAC-signed requests**:

```
Authorization: HMAC-SHA256 Credential={apiKey}, SignedHeaders=host;x-date, Signature={sig}
```

**Module:** `h2h-security` (`RequestSignatureValidator`)

Partner config:

```json
{
  "signingAlgorithm": "HMAC-SHA256",
  "signingKeyRef": "vault:secret/api/acme/signing-key",
  "signedHeaders": ["host", "x-date", "x-correlation-id"]
}
```
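A validator for this scheme has to recompute the signature and compare it in constant time. The sketch below uses only JDK classes. The canonical string format (one `name:value` line per signed header, in configured order) is an assumption, because this page does not define how the string to sign is built; `HmacSignatureSketch` is a hypothetical class and not the `RequestSignatureValidator` API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class HmacSignatureSketch {
    private HmacSignatureSketch() {}

    // Computes HMAC-SHA256 over the UTF-8 bytes of the message, as lowercase hex.
    static String hmacSha256Hex(byte[] key, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return HexFormat.of().formatHex(mac.doFinal(message.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key rejected", e);
        }
    }

    // Assumption: the string to sign is "name:value\n" per signed header, in configured order.
    static String canonicalString(List<String> signedHeaders, Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        for (String name : signedHeaders) {
            String value = headers.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing signed header: " + name);
            }
            sb.append(name).append(':').append(value.trim()).append('\n');
        }
        return sb.toString();
    }

    // Constant-time comparison, so timing does not reveal how many characters matched.
    static boolean matches(String expectedHex, String presentedHex) {
        return MessageDigest.isEqual(
                expectedHex.getBytes(StandardCharsets.US_ASCII),
                presentedHex.getBytes(StandardCharsets.US_ASCII));
    }
}
```

A validator would typically also reject requests whose `x-date` is too far from the server clock, which limits replay of a captured signature.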

***

## 7. Guaranteed Delivery Pattern

### 7.1 Staged Commit Protocol

All outbound files follow **write → verify → commit**:

| Phase          | Action                                                   |
| -------------- | -------------------------------------------------------- |
| 1. Stage       | Write to `staging/` path (not visible to partner)        |
| 2. Verify      | Checksum, size, optional PGP verify                      |
| 3. Commit      | Atomic move/rename to `outbound/` or partner SFTP outbox |
| 4. Acknowledge | Update `file_registry`, notify partner                   |
| 5. Archive     | Async copy to archive bucket                             |

**SFTP atomic commit:** Upload as `{filename}.tmp` → rename to `{filename}` on success.

**S3 atomic commit:** Write to staging key → `CopyObject` to final key → delete staging.
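
The `.tmp`-then-rename idea can be demonstrated on a local filesystem, which is what the sketch below does. It is a stand-in only: a real SFTP commit would issue the rename through the SFTP client, and `StagedCommitSketch` is a hypothetical class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StagedCommitSketch {
    private StagedCommitSketch() {}

    // Writes "{filename}.tmp" first, checks its size, then renames it into place,
    // so a reader polling the directory never sees a partially written file.
    static Path commit(Path outbox, String filename, byte[] content) {
        Path tmp = outbox.resolve(filename + ".tmp");
        Path target = outbox.resolve(filename);
        try {
            Files.write(tmp, content);
            if (Files.size(tmp) != content.length) {
                throw new IOException("Size mismatch after staging " + tmp);
            }
            return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            try {
                Files.deleteIfExists(tmp); // best-effort cleanup of the staged file
            } catch (IOException cleanupFailure) {
                e.addSuppressed(cleanupFailure);
            }
            throw new UncheckedIOException(e);
        }
    }
}
```

Partners that poll with a pattern such as `PAY_*.csv.pgp` will not match the `.tmp` name, which is what makes the rename the commit point.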

### 7.2 Inbound Idempotency

Duplicate file detection before processing:

| Key component              | Source           |
| -------------------------- | ---------------- |
| Partner ID                 | Channel config   |
| Filename + size + checksum | File metadata    |
| Optional file date         | Filename pattern |

If duplicate → skip processing, log audit event, optionally send duplicate NACK.
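
The duplicate check can be sketched as a composite key plus a "seen" set. The in-memory set below only illustrates the logic; a deployed version would presumably rely on `file_registry` (for example a unique constraint), which is an assumption. `DuplicateKey` and `DuplicateDetector` are hypothetical names, and the optional file date is left out for brevity.

```java
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Composite key built from the components in the table above.
record DuplicateKey(String partnerId, String filename, long sizeBytes, String checksumSha256) {
    DuplicateKey {
        // Normalise hex case so "ABCD" and "abcd" are treated as the same checksum.
        checksumSha256 = checksumSha256.toLowerCase(Locale.ROOT);
    }
}

final class DuplicateDetector {
    private final Set<DuplicateKey> seen = ConcurrentHashMap.newKeySet();

    // Returns true the first time a key is seen and false for a duplicate.
    boolean markIfNew(DuplicateKey key) {
        return seen.add(key);
    }
}
```

Because the checksum is part of the key, a partner re-sending a corrected file under the same name is processed as new, while a byte-identical resend is skipped.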

### 7.3 Delivery Receipt

Outbound delivery generates a **delivery receipt** record:

| Field                | Description                  |
| -------------------- | ---------------------------- |
| `file_id`            | FK                           |
| `delivered_at`       | Timestamp                    |
| `delivery_method`    | SFTP, SIGNED_URL, API        |
| `remote_path_or_url` | Destination                  |
| `remote_checksum`    | Verified hash at destination |

***

## 8. PGP Integration

PGP operations delegate to `h2h-security` (`PgpService`).

| Operation        | When                         |
| ---------------- | ---------------------------- |
| Decrypt inbound  | Before validation/transform  |
| Encrypt outbound | Before SFTP/bucket delivery  |
| Sign outbound    | Optional partner requirement |
| Verify signature | Optional inbound requirement |

Key references are resolved from `channel_config` / `credential_ref`; raw keys are never stored in the database.

***

## 9. Camel Component

### 9.1 URI Scheme: `file-mgmt:`

| URI                           | Action                                     |
| ----------------------------- | ------------------------------------------ |
| `file-mgmt:poll`              | Poll inbound (SFTP or bucket notification) |
| `file-mgmt:store`             | Store file to configured backend           |
| `file-mgmt:deliver`           | Guaranteed outbound delivery               |
| `file-mgmt:archive`           | Copy to archive store                      |
| `file-mgmt:signed-upload-url` | Generate upload URL (API route)            |

### 9.2 Example Route

```java
from("file-mgmt:poll")
    .routeId("inbound-file-poll")
    .process(correlationIdProcessor)
    .process(configResolverProcessor)
    .to("crypto:pgpDecrypt")          // h2h-security
    .to("direct:execute-profile");
```

```java
from("direct:deliver-ack")
    .to("crypto:pgpEncrypt")
    .to("file-mgmt:deliver");         // SFTP or signed URL per profile
```

***

## 10. Admin Portal — File Management Screens

| Screen               | Capability                                      |
| -------------------- | ----------------------------------------------- |
| File browser         | Search `file_registry` by partner, date, status |
| Delivery status      | View delivery attempts, retry                   |
| Manual reprocess     | Re-trigger processing for FAILED files          |
| Upload URL generator | Sandbox test for signed upload flow             |
| Storage config       | Configure backends per partner channel          |
| Archive browser      | Search archived files, generate download URL    |

***

## 11. Configuration Reference

### 11.1 Partner Channel (database)

```json
{
  "inbound": {
    "store": "SFTP",
    "path": "/inbound/payments",
    "filePattern": "PAY_*.csv.pgp",
    "pollCron": "0 */5 * * * *"
  },
  "outbound": {
    "store": "SFTP",
    "path": "/outbound/ack",
    "deliveryMethod": "SFTP",
    "pgpEncrypt": true
  },
  "archive": {
    "store": "S3",
    "bucket": "h2h-archive-ng",
    "retentionDays": 2555
  },
  "signedUrl": {
    "uploadTtlMinutes": 15,
    "downloadTtlHours": 24,
    "maxUploadSizeBytes": 5368709120
  }
}
```

### 11.2 Infrastructure (application.yml)

```yaml
h2h:
  file-management:
    defaultStagingStore: S3
    multipartThresholdBytes: 104857600
    delivery:
      maxAttempts: 5
      initialDelayMs: 30000
      backoffMultiplier: 2.0
    integrity:
      algorithm: SHA-256
```

***

## 12. Error Handling

| Error                   | Action                                     |
| ----------------------- | ------------------------------------------ |
| Checksum mismatch       | Status FAILED, alert ops, no processing    |
| SFTP connection failure | Retry with backoff                         |
| Signed URL expired      | Partner requests new URL                   |
| Upload size exceeded    | Reject at S3 policy, return 403            |
| PGP decrypt failure     | NACK to partner, audit log                 |
| Duplicate file          | Skip, audit, optional partner notification |

***

## 13. Related Documents

* [Universal Library Extensibility](/docs/20-universal-library-extensibility.md)
* [Security Library](/docs/16-security-library.md)
* [Execution Context](/docs/18-execution-context.md)
* [Monitoring and Logging](/docs/17-monitoring-and-logging.md)
* [Camel Integration Patterns](/docs/08-camel-integration-patterns.md)
* [Database-Driven Configuration](/docs/04-database-driven-configuration.md)
* [Cloud-Agnostic Deployment](/docs/19-cloud-agnostic-deployment.md)

