Security Model

HermesX enterprise security architecture. This page answers four questions: who can access the system, what data each caller can reach, how tenants are isolated, and how activity is audited.

Threat Model

Trust Boundaries

┌─────────────────────────────────────────────────────────────┐
│                    PUBLIC INTERNET                          │
└──────────────────────────┬──────────────────────────────────┘
                           │ TLS (reverse proxy)
┌──────────────────────────▼──────────────────────────────────┐
│              API SERVER (Auth Boundary)                     │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Auth Chain: Static Token → API Key → JWT → Reject   │   │
│  └──────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Tenant Derivation: AuthContext → tenant_id          │   │
│  │  (NEVER from request headers or body for non-admin)  │   │
│  └──────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  RBAC: Role + Scope → Allow/Deny                     │   │
│  └──────────────────────────────────────────────────────┘   │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│              DATA LAYER (Isolation Boundary)                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Store tenant guard + backend-specific DB controls   │   │
│  │  PostgreSQL: RLS; MySQL: static guard + regressions  │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Key Principals

Principal	Identity Source	Trust Level
Anonymous	No credential	Untrusted — rejected
Tenant User	API Key (scoped)	Trusted within own tenant
Tenant Admin	API Key (admin role)	Full access within own tenant
Platform Admin	Static Token / Super Admin	Cross-tenant access
Auditor	API Key (auditor role)	Read-only audit access

Authentication

Auth Chain

Request
  │
  ├─ Header: Authorization: Bearer <token>
  │
  ▼
┌─────────────────────────────┐
│ 1. Static Token Check       │ → Match? → AuthContext{role: admin, tenant: *}
└──────────────┬──────────────┘
               │ No match
┌──────────────▼──────────────┐
│ 2. API Key Lookup           │ → SHA-256(token) → DB lookup
│    Check: not revoked       │ → Check: not expired
│    Check: tenant active     │ → AuthContext{role, tenant, scopes}
└──────────────┬──────────────┘
               │ Not found
┌──────────────▼──────────────┐
│ 3. JWT Validation           │ → Verify signature → Extract claims
│    (prepared, not active)   │ → AuthContext{role, tenant, user}
└──────────────┬──────────────┘
               │ No valid credential
               ▼
            401 Unauthorized

API Key Security

Storage: Only SHA-256 hash stored; raw key returned once at creation
Generation: 32 bytes from crypto/rand.Read with explicit panic on failure
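Format: sk- prefix followed by the unpadded base64url encoding of the 32 random bytes (43 characters, 46 with the prefix)
Lookup: single indexed lookup by SHA-256 hash; the raw key is never compared directly, so lookup timing reveals nothing useful about it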
Lifecycle: Create → Active → Revoked (soft delete, never hard delete)
Expiry: Optional expires_at field; expired keys rejected at auth time
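A minimal Go sketch of the generation and storage rules, assuming the stored value is the hex-encoded SHA-256 of the full sk- string (the document does not specify the hash encoding). NewAPIKey and HashAPIKey are illustrative names, not HermesX APIs.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// NewAPIKey returns the raw key (shown to the caller once) and the hash to store.
func NewAPIKey() (raw, hash string) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		// Never fall back to a weaker source of randomness.
		panic(fmt.Sprintf("crypto/rand failed: %v", err))
	}
	// RawURLEncoding is unpadded base64url: 32 bytes encode to 43 characters.
	raw = "sk-" + base64.RawURLEncoding.EncodeToString(buf)
	return raw, HashAPIKey(raw)
}

// HashAPIKey produces the only form of the key that is persisted and looked up.
func HashAPIKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

func main() {
	raw, hash := NewAPIKey()
	fmt.Println(len(raw), len(hash)) // 46 64
}
```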

Public /auth/channel/* routes do not accept tenant_id; tenant is derived from channel_apps.platform + app_key.
channel_apps stores provider secret references only. Raw secrets are resolved through secrets.SecretResolver.
Provider openid/userid values are HMAC-hashed with HERMES_CHANNEL_HASH_SECRET before storage or lookup.
OAuth state and gateway binding challenges are random, short-lived, and single-use.
Browser login uses an opaque hx_session HttpOnly cookie. Unsafe methods require X-Hermes-CSRF matching the hx_csrf cookie.
Feishu, Weixin, and WeCom webhooks must pass provider token/signature validation before creating MessageEvent.

Tenant Isolation

Design Principle

Tenant identity is NEVER derived from user-supplied headers or request body (for non-admin callers).

The TenantMiddleware extracts tenant_id exclusively from AuthContext, which is set by the auth chain based on the credential presented.

Defense in Depth

Layer	Mechanism	Bypass Difficulty
1. Application	`AuthContext.TenantID` from credential	Requires valid key for target tenant
2. Middleware	All store calls include tenant_id	Requires code modification
3. Database/backend guard	PostgreSQL RLS or MySQL static SQL guard + regression	Requires privileged DB access or code change
4. Index	Unique indexes include tenant_id	Schema-level enforcement

Backend Isolation Controls

PostgreSQL uses Row-Level Security as an additional database-side guard.

Every tenant-scoped table has:

ALTER TABLE <table> ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation_<table> ON <table>
  USING (tenant_id::text = current_setting('app.current_tenant', true))
  WITH CHECK (tenant_id::text = current_setting('app.current_tenant', true));

The application sets the tenant context per transaction:

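func withTenantTx(ctx context.Context, pool *pgxpool.Pool, tenantID string, fn func(pgx.Tx) error) error {
    tx, err := pool.Begin(ctx)
    if err != nil {
        return err
    }
    defer tx.Rollback(ctx) // no-op once Commit has succeeded
    // SET LOCAL cannot take a bind parameter; set_config(..., true) is the transaction-local equivalent.
    if _, err := tx.Exec(ctx, "SELECT set_config('app.current_tenant', $1, true)", tenantID); err != nil {
        return err
    }
    if err := fn(tx); err != nil {
        return err
    }
    return tx.Commit(ctx)
}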

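Important: PostgreSQL does not apply RLS policies to superusers or to roles with the BYPASSRLS attribute, and a table's owner bypasses them unless FORCE ROW LEVEL SECURITY is set. The application therefore connects as a restricted role, not as the database or table owner.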

MySQL does not provide PostgreSQL-equivalent RLS. MySQL production support therefore relies on:

Store-layer tenant_id parameters on tenant-scoped operations.
scripts/check_tenant_sql_mysql.sh static SQL guard.
Explicit skip markers for scheduler or platform aggregate queries.
Cross-tenant regression and restore drills documented in docs/runbooks/backend-enterprise-validation-matrix.md.

Tables with RLS

Table	Policy	Indexes
sessions	tenant_isolation_sessions	idx_sessions_tenant
messages	tenant_isolation_messages	idx_messages_tenant_session
tenants	tenant_isolation_tenants	pk
api_keys	tenant_isolation_api_keys	idx_api_keys_tenant
audit_logs	tenant_isolation_audit	idx_audit_tenant
memories	tenant_isolation_memories	idx_memories_tenant_user
user_profiles	tenant_isolation_profiles	idx_profiles_tenant
roles	tenant_isolation_roles	idx_roles_tenant
cron_jobs	tenant_isolation_cron	idx_cron_tenant
execution_receipts	tenant_isolation_exec_receipts	idx_exec_receipts_tenant
egress_rules	tenant_isolation_egress_rules	idx_egress_rules_tenant

FORCE RLS tables: execution_receipts and egress_rules are configured with FORCE ROW LEVEL SECURITY (migrations 109–110), ensuring the policy applies even to the table owner role.

Sandbox Model

Agent Runtime
  │
  ├─ Policy Check: Is tool in AllowedTools?
  │     No → outcome: skipped
  │
  ├─ Idempotency Check: Has this idempotency_key been seen?
  │     Yes → outcome: deduplicated (return cached result)
  │
  ├─ Sandbox Selection: SandboxPolicy.AllowDocker?
  │     ├─ Local: subprocess with timeout + env stripping
  │     └─ Docker: --network=none, --memory, --cpus
  │
  ├─ Execution: Run tool with timing capture
  │
  └─ Receipt: Record ExecutionReceipt with outcome + trace_id
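The pipeline order can be sketched as follows. This is a single-goroutine illustration: Runtime, Receipt, Sandbox, and Invoke are hypothetical names, the local and Docker backends are passed in as plain functions, and the in-memory map does not guard against concurrent duplicates. Caching failed executions for deduplication is also an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// Receipt is a hypothetical, trimmed-down ExecutionReceipt.
type Receipt struct {
	ToolName   string
	Outcome    string // success, error, skipped, or deduplicated
	Output     string
	DurationMS int64
	TraceID    string
}

// Sandbox runs one tool call; real local and Docker backends are out of scope.
type Sandbox func(tool, input string) (string, error)

type Runtime struct {
	AllowedTools map[string]bool
	AllowDocker  bool
	seen         map[string]Receipt // idempotency key -> first receipt
	Receipts     []Receipt
}

func NewRuntime(allowed []string, allowDocker bool) *Runtime {
	rt := &Runtime{AllowedTools: map[string]bool{}, AllowDocker: allowDocker, seen: map[string]Receipt{}}
	for _, t := range allowed {
		rt.AllowedTools[t] = true
	}
	return rt
}

// Invoke follows the diagram: policy, idempotency, sandbox selection,
// timed execution, and finally a receipt for every call.
func (rt *Runtime) Invoke(tool, input, idemKey, traceID string, local, docker Sandbox) Receipt {
	r := Receipt{ToolName: tool, TraceID: traceID}
	prev, dup := rt.seen[idemKey]
	switch {
	case !rt.AllowedTools[tool]:
		r.Outcome = "skipped"
	case idemKey != "" && dup:
		r.Outcome, r.Output = "deduplicated", prev.Output
	default:
		run := local
		if rt.AllowDocker {
			run = docker
		}
		start := time.Now()
		out, err := run(tool, input)
		r.DurationMS = time.Since(start).Milliseconds()
		r.Output, r.Outcome = out, "success"
		if err != nil {
			r.Outcome = "error"
		}
		if idemKey != "" {
			rt.seen[idemKey] = r // errors are cached too (an assumption)
		}
	}
	rt.Receipts = append(rt.Receipts, r)
	return r
}

func main() {
	rt := NewRuntime([]string{"read_file"}, false)
	echo := func(tool, input string) (string, error) { return tool + ":" + input, nil }
	fmt.Println(rt.Invoke("terminal", "ls", "k1", "trace-1", echo, echo).Outcome)     // skipped
	fmt.Println(rt.Invoke("read_file", "a.txt", "k2", "trace-2", echo, echo).Outcome) // success
	fmt.Println(rt.Invoke("read_file", "a.txt", "k2", "trace-3", echo, echo).Outcome) // deduplicated
}
```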

Sandbox Controls

Control	Local	Docker
Timeout	Process kill after N seconds	Container kill after N seconds
Network	Inherited (no restriction)	`--network=none` available
Filesystem	Process CWD only	Ephemeral container filesystem
Memory	OS limits	`--memory` flag
CPU	OS scheduling	`--cpus` flag
Env vars	Stripped to PATH/HOME/LANG/TERM/TMPDIR	Minimal env
Output	Truncated at 50KB	Truncated at 50KB
Tool calls	Max 50 per session	Max 50 per session

Per-Tenant Policy

{
  "enabled": true,
  "max_timeout_seconds": 60,
  "allowed_tools": ["read_file", "write_file", "terminal", "web_search"],
  "allow_docker": true,
  "restrict_network": true,
  "max_stdout_kb": 50
}

Egress Policy

HermesX routes tool HTTP traffic, workflow service_task calls, and workflow agent_task tool traffic through SecureTransport.

Environment	Default	Override
development	`allow-all`	`HERMES_EGRESS_DEFAULT=allow-all|deny-all|log-only`
production	`deny-all` unless a tenant rule matches	`HERMES_EGRESS_DEFAULT=allow-all|deny-all|log-only`

Tenant allowlist rules are stored in egress_rules and managed through /admin/v1/egress/allowlist. Each rule specifies tenant_id, host_pattern, an optional path_prefix, action, and priority. Under deny-all, LLM provider hosts also require explicit tenant rules; the built-in provider-host shortcuts are a convenience that applies only outside production deny-all mode.
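A sketch of how a single egress decision could be computed from egress_rules. The host-pattern syntax (an exact host, or a leading "*." wildcard for subdomains), the rule that a lower priority value wins, and the Decide signature are all assumptions; the real SecureTransport logic may differ. Rules belonging to other tenants are never consulted, and the environment default applies only when no tenant rule matches.

```go
package main

import (
	"fmt"
	"strings"
)

// EgressRule mirrors an egress_rules row; field semantics are assumptions.
type EgressRule struct {
	TenantID    string
	HostPattern string // "api.example.com" or "*.example.com"
	PathPrefix  string // empty matches every path
	Action      string // "allow" or "deny"
	Priority    int    // lower value wins (assumed)
}

// hostMatches supports an exact host or a leading "*." that matches any
// subdomain but not the bare domain itself.
func hostMatches(pattern, host string) bool {
	pattern, host = strings.ToLower(pattern), strings.ToLower(host)
	if strings.HasPrefix(pattern, "*.") {
		return strings.HasSuffix(host, pattern[1:])
	}
	return pattern == host
}

// Decide returns the action of the best-matching rule for this tenant, or
// defaultAction (for example "deny" under deny-all) when no rule matches.
func Decide(rules []EgressRule, tenantID, host, path, defaultAction string) string {
	best := -1
	for i, r := range rules {
		if r.TenantID != tenantID || !hostMatches(r.HostPattern, host) || !strings.HasPrefix(path, r.PathPrefix) {
			continue // other tenants' rules are never consulted
		}
		if best < 0 || r.Priority < rules[best].Priority {
			best = i
		}
	}
	if best < 0 {
		return defaultAction
	}
	return rules[best].Action
}

func main() {
	rules := []EgressRule{{TenantID: "t1", HostPattern: "*.llm.example", Action: "allow", Priority: 10}}
	fmt.Println(Decide(rules, "t1", "api.llm.example", "/v1/chat", "deny")) // allow
	fmt.Println(Decide(rules, "t2", "api.llm.example", "/v1/chat", "deny")) // deny
}
```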

Every allowed, denied, DNS-failed, and private-IP-blocked decision is logged by the egress audit logger with tenant, host, allowed flag, and reason.

Redirect bypass protection: agent.go's CheckRedirect hook validates every redirect Location target via egress.ValidateRedirectTarget before following. Requests whose redirect target resolves to loopback (127.0.0.0/8), private (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), CGNAT (100.64.0.0/10), or link-local (169.254.0.0/16) ranges are rejected, closing the standard SSRF redirect-bypass vector.

MCP Sampling safety gates: The MCP SamplingHandler supports an optional SafetyInterceptor. When configured via NewSamplingHandlerWithSafety, every sampling request is checked via CheckInput before the LLM call and CheckOutput after; blocked requests return JSON-RPC error -32000 with a reason string.

Audit Trail

What Gets Audited

Event	Actor	Resource	Metadata
API Key created	user	api_key	key_id, scopes
API Key revoked	admin	api_key	key_id, reason
Session created	user	session	session_id
Tool executed	agent	tool	tool_name, duration, status
GDPR export	admin	tenant	export_format
GDPR delete	admin	tenant	tables_affected
Tenant created	platform	tenant	plan, config
Sandbox policy changed	admin	tenant	old/new policy
Channel login started	user	channel_app	platform, app_id
Channel login succeeded	user	browser_session	platform
Channel binding created	user	channel_identity	platform
Channel binding revoked	user/admin	channel_identity	binding_id
Gateway unbound message	provider user	channel_app	platform, app_id
Channel auth failed	provider user	channel_auth	reason

Execution Receipts

For the dedicated governance contract, API examples, and idempotency semantics, see Execution Receipts. For the boundary between free agent chat and fixed SOP workflow execution, see Workflow and Agent Runtime Boundary.

Every tool invocation produces an ExecutionReceipt with the following fields:

Field	Purpose
id	Unique receipt identifier
tenant_id	Tenant boundary
session_id	Session context
tool_name	Which tool was called
input	Truncated input (4KB max)
output	Truncated output (4KB max)
status	success / error
duration_ms	Execution time
idempotency_id	At-most-once guarantee
trace_id	Distributed trace correlation

Idempotency

A unique index on (tenant_id, idempotency_id) ensures at-most-once execution. If a duplicate request arrives:

1. The existing receipt is looked up by idempotency_id.
2. The cached output from that receipt is returned.
3. The tool is not executed again.

Rate Limiting

Architecture

Request → Extract tenant_id + user_id
  │
  ├─ Tenant limit: sliding window per tenant (Redis Lua script)
  ├─ User limit: sliding window per user within tenant
  │
  ├─ Both pass? → Allow
  ├─ Either fails? → 429 Too Many Requests
  │
  └─ Redis down? → Local LRU fallback (degraded accuracy)

Redis Sliding Window

A Lua script performs the following steps atomically, so concurrent requests cannot race; the equivalent MULTI/EXEC transaction is:

MULTI
  ZREMRANGEBYSCORE key 0 <now - window>
  ZADD key <now> <now>
  ZCARD key
EXEC

Secret Management

Principles

No hardcoded credentials in source code
All secrets via environment variables
API keys stored as SHA-256 hashes only
Raw keys returned exactly once at creation
crypto/rand for all key generation with explicit failure handling
No default passwords in any configuration

Credential Rotation

API Keys: Create new → migrate clients → revoke old
Database: Connection string rotation via env var update + restart
LLM keys: Hot-swap via env var (no restart required with config reload)

Network Security Recommendations

Component	Recommendation
API Server	Behind reverse proxy with TLS termination
PostgreSQL	Private network only, SSL required
Redis	Private network, AUTH enabled
MinIO	Private network, TLS for production
OTel Collector	Internal only, no public exposure
Metrics endpoint	Internal network or authenticated proxy