Operations
This section is for the team that owns the on-call pager. Every page here documents a contract the engine guarantees plus the recommended operator posture.
- Observability: OpenTelemetry-native logs, metrics, and traces. Provider adapters for every major cloud, plus OTLP collector defaults shipped with the scaffold.
- Dependency health: per-backend readiness probes for 18+ backends. Health is host-agnostic and composes into the runtime catalog.
- Runtime failure policy: the engine's contract for transient errors, terminal failures, and graceful shutdown.
- Benchmarking: the BenchmarkDotNet suite that gates hot-path regressions on composition, runtime lifecycle, AspNetCore, and scaffolding.
- Operational hardening: the known gap inventory, covering what is still in flight before adoption-ready status.
- Container image publishing: how to build and publish the runtime image, including air-gapped flows.
The mental model
A CephalonEngine app exposes operational truth through three surfaces:
- /engine/* routes: manifest, runtime, health, and telemetry summary. Always on for ASP.NET Core hosts.
- snapshot.* configuration keys: runtime-resolved configuration, including deployment-mode posture.
- OTLP telemetry: logs, metrics, and traces. Every module's telemetry shares the same resource attributes (cephalon.engine.id, cephalon.module.name, etc.).
Whoever is on-call should have those three surfaces dashboarded.
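Shared resource attributes let an on-call check group telemetry by module and catch records that have lost their origin. The Python below is a minimal sketch of such a check. It assumes the resource attributes of exported records are available as plain dictionaries, for example pulled from the observability backend. The helper name check_shared_resources is hypothetical and not part of CephalonEngine.

```python
from collections import defaultdict

ENGINE_ID_KEY = "cephalon.engine.id"
MODULE_KEY = "cephalon.module.name"


def check_shared_resources(resources: list) -> dict:
    """Summarize resource attributes collected from exported telemetry.

    Hypothetical helper: each element of `resources` is the resource-attribute
    map of one exported log, metric, or span batch, as a plain dict.
    """
    engine_ids = set()
    records_per_module = defaultdict(int)
    problems = []

    for index, attrs in enumerate(resources):
        engine_id = attrs.get(ENGINE_ID_KEY)
        module = attrs.get(MODULE_KEY)
        if engine_id is None:
            problems.append(f"record {index}: missing {ENGINE_ID_KEY}")
        else:
            engine_ids.add(engine_id)
        if module is None:
            problems.append(f"record {index}: missing {MODULE_KEY}")
        else:
            records_per_module[module] += 1

    # All modules of one app are expected to report the same engine id.
    if len(engine_ids) > 1:
        problems.append(f"conflicting engine ids: {sorted(engine_ids)}")

    return {
        "engine_ids": sorted(engine_ids),
        "records_per_module": dict(records_per_module),
        "problems": problems,
    }
```

An empty problems list means every record can be traced back to one engine and one module. That is the property the dashboards rely on.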
Runtime failure policy at a glance
- Composition failure is fatal. The host fails to start, prints the failing module, and exits non-zero.
- Lifecycle failure during OnStart is fatal. Hosts crash so that orchestrators see the failure.
- Transient runtime failures are left to caller policy. The engine does not retry on the caller's behalf.
- Graceful shutdown runs lifecycle hooks in reverse order. The host waits for an explicit drain interval (default 30s).
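The lifecycle bullets above can be modelled in a few lines to make the ordering explicit. The Python below is a hypothetical sketch, not the engine's host implementation. LifecycleHook, start_all, and shutdown are illustrative names. "Reverse order" is read as the reverse of start order, and the drain interval is assumed to elapse before the stop hooks run.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

DEFAULT_DRAIN_SECONDS = 30.0  # the documented default drain interval


@dataclass
class LifecycleHook:
    """Hypothetical stand-in for one module's lifecycle hooks."""
    name: str
    on_start: Callable[[], None]
    on_stop: Callable[[], None]


def start_all(hooks: List[LifecycleHook]) -> List[LifecycleHook]:
    """Run start hooks in order; any exception is fatal to the host."""
    started = []
    for hook in hooks:
        try:
            hook.on_start()
        except Exception as exc:
            # SystemExit with a message exits with status 1, so an
            # orchestrator observes the crash instead of a half-started host.
            raise SystemExit(f"OnStart failed in module {hook.name!r}: {exc}") from exc
        started.append(hook)
    return started


def shutdown(
    started: List[LifecycleHook],
    drain_seconds: float = DEFAULT_DRAIN_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Wait out the drain interval, then stop modules in reverse start order."""
    sleep(drain_seconds)  # assumed: in-flight work drains before stop hooks run
    stopped = []
    for hook in reversed(started):
        hook.on_stop()
        stopped.append(hook.name)
    return stopped
```

Stopping in reverse start order means a module is always stopped before the modules it was started after. This is the usual reason hosts unwind dependencies this way.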
Full contract: Source → Runtime failure policy.
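The engine leaves transient-failure retries to the caller, so each caller needs its own retry policy. A common choice is capped exponential backoff with full jitter. The Python below is a minimal, engine-agnostic sketch of that technique. TransientError and call_with_retry are hypothetical names, and deciding which exceptions count as transient remains the caller's job.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures the caller considers retryable."""


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `operation` on TransientError using capped exponential backoff with full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the failure to the caller
            # Full jitter: sleep a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Non-transient exceptions propagate immediately. Once the attempt budget is spent, the last transient failure is re-raised, so the caller still sees the error.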
Observability defaults
The generated scaffold ships:
- An OTLP-ready collector config (otel-collector-config.yaml).
- An Engine:Observability section with safe defaults.
- Per-module log scopes carrying cephalon.module.name, cephalon.module.version, and cephalon.engine.id.
- Resource attributes derived from the manifest, so traces always identify their origin.
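To illustrate deriving resource attributes from the manifest, the sketch below builds one module's attributes from a manifest-like dict. It then renders them in the OTEL_RESOURCE_ATTRIBUTES format from the OpenTelemetry SDK environment-variable specification: comma-separated key=value pairs with percent-encoded values. The manifest field names (engineId, modules, name, version) and both helper functions are hypothetical. Whether the scaffold uses this environment variable at all is also an assumption.

```python
from urllib.parse import quote


def module_resource_attributes(manifest: dict, module_name: str) -> dict:
    """Derive resource attributes for one module from a hypothetical manifest dict."""
    module = next((m for m in manifest["modules"] if m["name"] == module_name), None)
    if module is None:
        raise KeyError(f"module {module_name!r} not in manifest")
    return {
        "cephalon.engine.id": manifest["engineId"],
        "cephalon.module.name": module["name"],
        "cephalon.module.version": module["version"],
    }


def to_otel_resource_env(attributes: dict) -> str:
    """Render attributes as an OTEL_RESOURCE_ATTRIBUTES value.

    Values are percent-encoded so commas, equals signs, and spaces survive parsing.
    """
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in attributes.items())
```

Deriving the attributes from the manifest, rather than typing them per deployment, keeps the emitted identity in step with what /engine/manifest reports.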
Provider-specific guidance for Alibaba Cloud, AWS, Azure Monitor, DigitalOcean, GCP, Grafana Cloud, Huawei Cloud, Kubernetes, New Relic, OpenShift, Oracle Cloud, Serilog, and Tanzu lives in the Technology → Observability catalog.
Dependency health
CephalonEngine ships probes for 18+ backends. Each probe:
- declares a typed status (Healthy, Degraded, Unhealthy).
- reports latency, last-error, and any backend-specific metadata.
- composes into the runtime catalog so the host can publish /health.
- runs on a configurable interval.
Backends covered today: Cassandra, ClickHouse, Consul, Elasticsearch, HTTP, Kafka, Memcached, MongoDB, MQTT, MySQL, NATS, Neo4j, OpenSearch, Oracle, Postgres, RabbitMQ, Redis, SQL Server.
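The sketch below makes the probe contract concrete. It models a probe report and rolls probes up into one overall status, with the worst status winning. Several pieces are assumptions for illustration, not the documented behavior of CephalonEngine's /health: the aggregation rule, the required flag, and the HTTP mapping. That mapping returns 503 for Unhealthy and 200 for Healthy or Degraded, mirroring ASP.NET Core's default health-check status codes.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional


class Status(IntEnum):
    # Ordered so that max() picks the worst status.
    HEALTHY = 0
    DEGRADED = 1
    UNHEALTHY = 2


@dataclass
class ProbeResult:
    """Hypothetical shape of one probe report: status, latency, last error, metadata."""
    backend: str
    status: Status
    latency_ms: float
    last_error: Optional[str] = None
    required: bool = True
    metadata: Dict[str, str] = field(default_factory=dict)


def overall_status(results: List[ProbeResult]) -> Status:
    """Worst-status-wins roll-up across all probes (an assumed rule)."""
    return max((r.status for r in results), default=Status.HEALTHY)


def required_not_healthy(results: List[ProbeResult]) -> List[str]:
    """Required backends that are not Healthy, i.e. readiness blockers."""
    return [r.backend for r in results if r.required and r.status is not Status.HEALTHY]


def http_status(status: Status) -> int:
    """Assumed mapping: only Unhealthy turns /health into a 503."""
    return 503 if status is Status.UNHEALTHY else 200
```

Note that under this mapping a Degraded backend still yields 200. That is why the production checklist asks for every required backend to show Healthy rather than relying on the status code alone.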
Production readiness checklist
Before flipping traffic, confirm:
- cephalon doctor is clean on the target.
- The composition smoke test passes against production config (dotnet test ... --filter Category=Composition).
- /health returns 200 end-to-end (with dependencies wired).
- /engine/manifest returns the expected module set and capabilities.
- OTLP traffic reaches the observability backend.
- Dependency-health probes show Healthy for every required backend.
- The rollback path is documented in the deployment runbook.
- On-call documentation references the engine id and the manifest schema version.
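Two checklist items lend themselves to a scripted pre-flight check: /health returning 200, and /engine/manifest listing the expected modules. The following Python is a minimal sketch using only the standard library. It assumes the manifest is JSON with a top-level modules array whose entries carry a name field. That shape is hypothetical, not the documented manifest schema, and the sketch does not check capabilities.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, List, Set, Tuple

# A fetcher takes a path such as "/health" and returns (status code, body).
Fetch = Callable[[str], Tuple[int, str]]


def http_fetch(base_url: str) -> Fetch:
    """Return a fetcher that GETs base_url + path against a live host."""
    def fetch(path: str) -> Tuple[int, str]:
        try:
            with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=10) as resp:
                return resp.status, resp.read().decode("utf-8")
        except urllib.error.HTTPError as err:
            # Non-2xx responses still carry a status code worth reporting.
            return err.code, err.read().decode("utf-8", errors="replace")
    return fetch


def readiness_failures(fetch: Fetch, expected_modules: Set[str]) -> List[str]:
    """Return human-readable failures; an empty list means both checks passed."""
    failures = []
    status, _ = fetch("/health")
    if status != 200:
        failures.append(f"/health returned {status}, expected 200")

    status, body = fetch("/engine/manifest")
    if status != 200:
        failures.append(f"/engine/manifest returned {status}, expected 200")
    else:
        # Assumed manifest shape: {"modules": [{"name": ...}, ...]}
        found = {m["name"] for m in json.loads(body).get("modules", [])}
        missing = expected_modules - found
        unexpected = found - expected_modules
        if missing:
            failures.append(f"missing modules: {sorted(missing)}")
        if unexpected:
            failures.append(f"unexpected modules: {sorted(unexpected)}")
    return failures
```

For example, readiness_failures(http_fetch("https://orders.internal.example"), {"billing", "catalog"}) returns an empty list when both checks pass. The URL and module names are placeholders. Injecting fetch keeps the check testable without a live host. Connection errors such as a refused socket are left to propagate.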
A walkthrough of the deeper rationale is mirrored at Source → Operations. It is planned to graduate into a dedicated Operations → Production readiness page in 0.2.0-preview.