Admin guide
This page is for operators bringing up a gdsgate cluster and keeping it running. It covers the supported deployment shapes, installation, registration and PKI lifecycle, network zoning, the state store, audit export, and the hardening checklist.
For the what and why, see Concepts. For every config field with defaults and examples, see Configuration.
Deployment shapes
All-in-one
One process runs Auth, Proxy, and an embedded agent against one state
store. It is the simplest topology — useful for development, demos, and a
small single-node deployment. Set store_url to a persistent location
(a file-backed SQLite database or PostgreSQL) and the cluster survives restarts:
the transport CA is restored from the store and registered nodes keep
trusting it. With an in-memory store_url (default), every restart
yields a fresh CA — only useful for ephemeral tests.
An all-in-one cluster also accepts externally registered agents (your laptop, a remote sidecar), because the registration listener and the transport CA used on the internal channel come from the same persistent store. There is no separate "join-capable" mode.
Pros:
- one binary, one config, one process to operate;
- the simplest path to first traffic.
Cons:
- single process, single failure domain — there is no internal failover;
- the public listener is plaintext by default in this mode, because the same CA anchor is used internally; you can still turn on public TLS, but you then have to distribute the CA anchor to clients yourself.
Multi-node
Separate auth, proxy, and agent processes that authenticate to
each other with mutual TLS. New nodes obtain their transport identity
by registration (a one-time bootstrap token) and persist it. This
is the production shape.
Pros:
- horizontal scaling: many proxies in front of one Auth, many agents in the protected zone;
- failure isolation: a proxy or agent restart does not touch the control plane;
- HA: several Auth instances sharing one store, with a single audit write-leader at a time.
High availability
Set [ha].enabled = true on every Auth instance and point them at one
shared PostgreSQL store_url. Each instance starts as a follower
(refusing to write) and races for the audit write-leader lease
through the store. The single leader handles authorisation writes; the
followers serve only reads. A follower takes over within roughly
lease_ttl_secs of the leader's death. This is failover for the
single linear audit chain, not horizontal write scaling.
Installation
gdsgate ships as one binary. The CLI is a subcommand of it — there is no separate client to install.
From a release
Two artefacts per release:
| Artefact | Notes |
|---|---|
| gdsgate-<tag>-x86_64-unknown-linux-gnu | dynamically linked (glibc) |
| gdsgate-<tag>-x86_64-unknown-linux-musl | static, self-contained |
Download the binary and the integrity files, then verify before use.
Verification needs cosign and the
project's published cosign.pub public key.
```sh
# 1. checksums file signed by the project's cosign key
cosign verify-blob --key cosign.pub --signature SHA256SUMS.sig SHA256SUMS
# 2. artefacts match the checksums (skip listed files you did not download)
sha256sum -c --ignore-missing SHA256SUMS
# 3. (optional) inspect the CycloneDX SBOM
jq '.metadata.component.name, (.components | length)' gdsgate-<tag>.cdx.json
# 4. install and smoke-test
install -m 0755 gdsgate-<tag>-x86_64-unknown-linux-musl /usr/local/bin/gdsgate
gdsgate --version
```
If step 1 fails, the artefacts are not from the project (or were tampered with) — stop.
Each release is byte-for-byte reproducible: an independent rebuild from
the same commit produces an identical sha256. See
Operations → Release verification.
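Step 2 is easy to reproduce in a few lines, for example when comparing your own rebuild against the published sums. This is a minimal sketch. It assumes SHA256SUMS uses the usual sha256sum layout of one "<hex digest>  <file name>" pair per line. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file: Path) -> dict[str, bool]:
    """Check every listed artefact present next to the sums file.

    A '*' before the name (binary mode) is ignored. Missing files are
    skipped, mirroring 'sha256sum -c --ignore-missing'.
    """
    results: dict[str, bool] = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")
        target = sums_file.parent / name
        if target.exists():
            results[name] = sha256_file(target) == expected.lower()
    return results
```

A rebuild is reproducible when sha256_file on your own output equals the published digest for the same artefact name.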
As a systemd unit
Run one role per unit, each with its own config. Example for an agent:
```ini
# /etc/systemd/system/gdsgate-agent.service
[Unit]
Description=gdsgate agent
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/gdsgate --config /etc/gdsgate/agent.toml agent
# Bootstrap token (one-time secret) — keep out of the long-lived config file.
# Use an EnvironmentFile with mode 0600 to ship it on boot.
EnvironmentFile=-/etc/gdsgate/agent.env
DynamicUser=yes
StateDirectory=gdsgate
WorkingDirectory=/var/lib/gdsgate
CapabilityBoundingSet=
AmbientCapabilities=
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/lib/gdsgate
Restart=on-failure
RestartSec=2s
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
```
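/etc/gdsgate/agent.env (mode 0600, owned by root; systemd reads it before dropping privileges) carries the one-time bootstrap token, for example GDSGATE_ENROLL_TOKEN=<token>, the same variable the container example passes.
Point [enroll].state_dir at /var/lib/gdsgate (the StateDirectory=) or a subdirectory of it, such as /var/lib/gdsgate/agent,
so the persisted identity survives restarts.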
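The token file should never be readable by other users, not even for a moment while it is being written. This is a minimal sketch of creating the EnvironmentFile with mode 0600 from the start and then swapping it into place atomically. It assumes the agent reads the token from GDSGATE_ENROLL_TOKEN, as in the container example below. The write_env_file helper is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_env_file(path: Path, token: str) -> None:
    """Hypothetical helper: write GDSGATE_ENROLL_TOKEN=... atomically, mode 0600."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp creates the file readable and writable only by its owner (0600),
    # so the token is never exposed with looser permissions.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".agent.env.")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(f"GDSGATE_ENROLL_TOKEN={token}\n")
            fh.flush()
            os.fsync(fh.fileno())
        os.chmod(tmp, 0o600)
        os.replace(tmp, path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise
```

Because the rename is atomic, a unit restarting at the same moment sees either the old token or the new one, never a half-written file.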
As a container
The same binary runs in a minimal image. It needs no root and no Linux capabilities, so run it unprivileged with a read-only root filesystem:
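```dockerfile
FROM debian:bookworm-slim
COPY gdsgate /usr/local/bin/gdsgate
# Pre-create the state directory owned by the runtime user: a fresh named
# volume mounted there inherits this ownership, so the unprivileged user can write.
RUN useradd --uid 10001 --create-home gdsgate \
 && mkdir -p /var/lib/gdsgate \
 && chown 10001 /var/lib/gdsgate
USER gdsgate
ENTRYPOINT ["gdsgate"]
```

```sh
docker run --rm \
  --read-only --tmpfs /tmp \
  --cap-drop ALL --security-opt no-new-privileges \
  -v gdsgate-state:/var/lib/gdsgate \
  -v /etc/gdsgate:/etc/gdsgate:ro \
  -e GDSGATE_ENROLL_TOKEN="$token" \
  myorg/gdsgate --config /etc/gdsgate/agent.toml agent
```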
Bringing up a multi-node cluster
This section walks through a production-shaped cluster, end to end.
1 — Auth + the state store
Auth holds the audit chain, the certificate authorities (their private keys), and the registration-token registry. Give it a persistent store:
```toml
# auth.toml
profile = "prod"
store_url = "postgres://gdsgate:…@store-db:5432/gdsgate"

[endpoints]
auth = "0.0.0.0:50051"          # mTLS control plane
auth_enroll = "0.0.0.0:50050"   # plaintext bootstrap registration

[policy]
path = "/etc/gdsgate/policy.cedar"

[oidc]
issuer = "https://idp.example.com/realms/gdsgate"
client_id = "gdsgate"
```
On first start Auth generates and persists the transport CA, the User SSH CA, and the Onward SSH CA. On restart it resumes the same authorities, so already-registered nodes keep trusting it.
2 — Bootstrap tokens
Auth's control plane is mTLS-only, so a standalone Proxy or Agent must register before it can talk to it. Generate a one-time token per node; the command shares Auth's store URL:
```sh
gdsgate --config auth.toml auth create-token --role proxy --ttl 3600
gdsgate --config auth.toml auth create-token --role agent --ttl 3600
```
Each prints one token on stdout (logs go to stderr). The token is
single-use and short-lived. Hand it to the joining node out of band
(a systemd EnvironmentFile, a container secret, a Vault retrieval —
not a long-lived config file).
3 — Register the Proxy
```toml
# proxy.toml
[endpoints]
auth = "auth:50051"
proxy_public = "0.0.0.0:50061"
proxy_internal = "0.0.0.0:50062"
proxy_ws = "0.0.0.0:50063"

[enroll]
endpoint = "http://auth:50050"
state_dir = "/var/lib/gdsgate/proxy"
```
The Proxy presents the token plus a certificate-signing request, receives
a signed transport leaf and the trust bundle, and persists them to
state_dir. From now on:
- the public listener serves TLS verified by the cluster transport CA;
- the internal listener requires mutual TLS;
- the Auth client runs mutual TLS off the same identity.
4 — Register the Agent
The agent declares the resources it serves and the tunnel target:
```toml
# agent.toml
[endpoints]
auth = "auth:50051"
proxy_internal = "proxy:50062"
proxy_ws = "proxy:50063"

[enroll]
endpoint = "http://auth:50050"
state_dir = "/var/lib/gdsgate/agent"

[agent]
id = "edge-1"

[[agent.backends]]
resource = "prod-db"
kind = "postgres"
addr = "10.0.0.5:5432"

[[agent.backends]]
resource = "jump-host"
kind = "ssh"
```
The Agent registers, opens the reverse tunnel to the Proxy, registers its resources, and is ready to serve.
5 — Give clients the transport CA
Clients verify the Proxy's public TLS against the cluster transport CA.
That anchor is the transport-ca.pem an enrolled node writes into its
state_dir (every node receives the same cluster CA). Distribute that
file to your users and point their client config at it:
# client.toml
[endpoints]
proxy_public = "gdsgate.example.com:50061"
[client]
transport_ca = "/etc/gdsgate/transport-ca.pem"
[oidc]
issuer = "https://idp.example.com/realms/gdsgate"
client_id = "gdsgate"
After that, the User guide is everything a user needs.
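To confirm that a distributed transport-ca.pem really validates the Proxy, you can complete a bare TLS handshake that trusts only that file. This is a minimal sketch with Python's ssl module. It assumes the Proxy's leaf certificate carries the host name clients dial, such as gdsgate.example.com; otherwise hostname checking fails. It exercises only TLS, not the gdsgate protocol. The function names are hypothetical.

```python
import socket
import ssl


def transport_client_context(transport_ca: str) -> ssl.SSLContext:
    """TLS client context whose only trust anchor is the cluster transport CA."""
    # PROTOCOL_TLS_CLIENT turns on CERT_REQUIRED and hostname checking and,
    # unlike create_default_context(), loads no system CA bundle.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_verify_locations(cafile=transport_ca)
    return ctx


def probe_proxy(host: str, port: int, transport_ca: str, timeout: float = 5.0) -> str:
    """Handshake with the Proxy's public listener; return the TLS version used."""
    ctx = transport_client_context(transport_ca)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version() or ""
```

A typical call is probe_proxy("gdsgate.example.com", 50061, "/etc/gdsgate/transport-ca.pem"). A certificate verification error means the client holds the wrong anchor.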
Registration lifecycle and PKI
What gets persisted where
- Auth's state store — the audit chain, the transport CA private key, the User SSH CA and Onward SSH CA private keys (all rotations retained), the registration-token registry, the resource catalogue (the seeded [discovery]), and the access-request queue.
- Each node's state_dir — its own transport key and certificate, the cluster's transport CA anchor (transport-ca.pem), and (for agents serving SSH model A) the persistent SSH host key.
Together they keep the cluster consistent across restarts. Back up the store; persist the state dir. A restored store lets Auth resume the same transport CA, so already-registered nodes keep working.
Re-registration near expiry
Node transport certificates are short-lived. A long-running node should have a fresh token available near expiry so it can re-register. Two common ways:
- ship a fresh token through the systemd EnvironmentFile on the next restart, and let your supervisor restart the process well before expiry;
- if you orchestrate from CI, generate a new token on schedule and roll the unit.
CA rotation
In addition to the persistent transport CA, Auth holds two CAs that operators can rotate:
- User SSH CA — signs the access certificates issued to users. Rotate it to limit the blast radius of a compromised signer, or simply on a calendar. Auth drives a paced double-signing rotation: the new generation is published as a candidate, verifiers refresh their trust bundle (propagation_secs), the candidate is promoted to the signing CA, and the old generation is retired only after retire_secs (set this above the issued certificate TTL so existing certificates expire naturally before their CA is dropped).
- Onward SSH CA — signs the OpenSSH user certificates the agent presents to downstream sshd in the model-B jump-host path. It uses the same paced rotation, so downstream sshd configurations must trust the rotating CA bundle: gdsgate auth onward-ca-pubs prints every active and retiring public key, one OpenSSH line each, ready to add to the file that sshd's TrustedUserCAKeys points at.
Enable scheduled rotation via [ca_rotation],
or trigger one manually:
```sh
gdsgate --config auth.toml auth rotate-ca          # User SSH CA
gdsgate --config auth.toml auth rotate-onward-ca   # Onward SSH CA
```
The transport CA is not runtime-rotatable in v1: rotating it would invalidate every enrolled node at once. Plan it as a controlled re-key event (re-register every node).
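You can check the pacing ahead of time by laying out one rotation's timeline. This is a minimal sketch. It assumes propagation_secs is counted from publishing the candidate and retire_secs from the promotion; the text above does not pin down either reference point. The RotationPlan type and plan_rotation function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RotationPlan:
    candidate_published: int  # new generation trusted by verifiers; old one still signs
    promoted: int             # new generation starts signing
    old_retired: int          # old generation dropped from the trust bundle


def plan_rotation(start: int, propagation_secs: int, retire_secs: int,
                  cert_ttl_secs: int) -> RotationPlan:
    """Lay out one paced rotation and reject a retire window shorter than the TTL."""
    if retire_secs <= cert_ttl_secs:
        # The last certificate signed by the old CA is issued just before promotion
        # and must expire before that CA leaves the bundle.
        raise ValueError(f"retire_secs={retire_secs} must exceed the certificate TTL ({cert_ttl_secs}s)")
    promoted = start + propagation_secs
    return RotationPlan(start, promoted, promoted + retire_secs)
```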
Just-in-time access — configuring approvers
JIT approval has two independent controls.
| Control | Decides | Configured in |
|---|---|---|
| Cedar approveRequest | Who may sign off on a pending request | [policy].path |
| [approvals] cascade | How many distinct approvers are required | [approvals] and [[discovery.resources]].min_approvers |
Who: Cedar approveRequest
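Permit the approver group(s) in your Cedar policy. The simplest case is a single permit on Action::"approveRequest" with the condition principal in Group::"sre", which is exactly the rule the worked deployment below uses.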
Tighten with MFA-age, an open ticket, and so on. See Policy → Approving access requests for the patterns and current Cedar-context limitations.
How many: the [approvals] cascade
The number of distinct approvers a request needs is resolved by the narrowest-wins cascade, evaluated against the resource the request targets:
- Per-resource — min_approvers on the catalogue entry;
- Per-environment — [approvals].per_environment;
- Global — [approvals].min_approvers.

The floor is 1.
The resource's environment comes from the catalogue (seeded from
[discovery]) — so JIT thresholds rely on [discovery] being
populated for every JIT-gated resource.
Day-to-day flow
```sh
# Requester
gdsgate request-access db-orders_prod \
  --reason "PROD-1234, read-only on orders" --ttl 3600

# Approvers
gdsgate requests                 # list pending
gdsgate approve <request-id>
```
Every review (allow and deny) is recorded in the audit log. Once the
threshold is reached, the request transitions to approved with
expires_at = now + requested_ttl. An active approval is exposed to
the Cedar context of the requester's subsequent access call as
context.approved_request.expires — your policy then unblocks the
connect for the remaining TTL (see
Policy → JIT approval as the only way in).
A worked deployment
```toml
# auth.toml — operator-side cascade
[approvals]
min_approvers = 1

[approvals.per_environment]
prod = 2
staging = 1

# Per-resource override for the strictest backend
[[discovery.resources]]
id = "db-orders_prod"
kind = "postgres"
min_approvers = 3
```
```cedar
// policy.cedar (fragment)
permit(principal, action == Action::"approveRequest", resource)
when { principal in Group::"sre" };

permit(principal, action == Action::"dbConnect", resource)
when {
  resource.environment == "prod"
  && context has approved_request
  && context.approved_request.expires > context.timestamp
};

forbid(principal, action == Action::"dbConnect", resource)
when {
  resource.environment == "prod"
  && !(context has approved_request)
};
```
Validate the policy before deploy with gdsgate auth policy validate.
Operational notes
- Self-approval is not blocked by Cedar in v1: the request's requester is not in the Cedar context for approveRequest. Enforce it operationally: keep requesters and approvers in distinct Cedar groups, and audit-monitor for the requester also appearing as a reviewer.
- There are no built-in notifications; wire alerts off the audit log's createAccessRequest and approveRequest events.
- Per-environment thresholds work today, but per-environment approver groups (e.g. prod-approvers for prod only) cannot yet be expressed in Cedar, because the target resource of the request is not in the context for approveRequest.
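Until Cedar can see the requester, a periodic scan of exported audit records can flag self-approvals. This is a minimal sketch over canonical-JSON records, one per line. The field names action, request_id and principal are assumptions about the export schema, so map them to your actual records. The function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def self_approvals(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (request_id, principal) wherever the requester also approved.

    Field names are hypothetical placeholders for the real export schema.
    """
    requester_of: dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        action = event.get("action")
        rid = event.get("request_id")
        who = event.get("principal")
        if action == "createAccessRequest":
            requester_of[rid] = who
        elif action == "approveRequest" and requester_of.get(rid) == who:
            yield rid, who
```

The scan window must include each request's createAccessRequest event. Otherwise an approval cannot be matched to its requester.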
Network zoning
Place the components in segmented networks so the topology itself enforces the access path. A reference layout uses four zones:
| Zone | Members | Purpose |
|---|---|---|
| edge | client, Proxy public listener, identity provider | Public client access |
| control | Auth, Proxy, state store, identity provider (JWKS) | Control plane + state |
| backend | agent, resources | Protected zone — resources reachable only here |
| uplink | agent → Proxy, Auth | Agent egress: registration + reverse tunnel |
```mermaid
flowchart LR
    client([client]) --- edge
    edge --- proxy
    proxy --- control
    control --- auth & store[(store)]
    agent --- backend
    backend --- resources[(resources)]
    agent --- uplink
    uplink --- proxy
```
Key properties:
- Agent sits in backend with the resources and reaches out over uplink to the Proxy. Nothing connects into the backend zone — the reverse tunnel is outbound-only.
- Proxy, Auth, store are not on backend — even they cannot reach a resource directly. The only path is the agent's tunnel.
- Clients are on edge only — they can reach neither the protected zone nor the control plane, only the Proxy's public TLS.
With real firewalling, this reduces to two rules: allow egress from the agent to the Proxy (and to Auth for registration); deny all inbound traffic to the protected zone.
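Before writing real firewall rules, you can check an inventory of intended connections against the two zoning rules. This is a minimal sketch. It assumes each connection is listed as (initiator, target) using component names mapped to the zones in the table above. The ZONE map and the violations function are hypothetical.

```python
# Hypothetical component-to-zone map following the zoning table.
ZONE = {
    "client": "edge",
    "proxy_public": "edge",
    "proxy": "control",
    "auth": "control",
    "store": "control",
    "agent": "backend",
    "resource": "backend",
}


def violations(flows: list[tuple[str, str]]) -> list[str]:
    """Check (initiator, target) connections against the zoning rules."""
    found = []
    for src, dst in flows:
        if ZONE[dst] == "backend" and ZONE[src] != "backend":
            found.append(f"{src} -> {dst}: inbound to the protected zone")
        if src == "client" and dst != "proxy_public":
            found.append(f"{src} -> {dst}: clients may reach only the Proxy's public TLS")
    return found
```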
State store
Auth's store is the cluster's source of truth.
| Store | When | Notes |
|---|---|---|
| sqlite::memory: | development, unit tests, quick gdsgate all demos | Lost on restart — every restart rolls fresh CAs. |
| sqlite:///var/lib/gdsgate/state.db?mode=rwc | single-node, small deployments | File-backed. Back up the file. Survives restarts; external nodes can register. |
| postgres://… | multi-node, HA | Required for multiple Auth instances sharing the audit chain. Use a real backup strategy. |
Whatever you choose, back it up. A restored store lets Auth resume the same transport CA, so already-registered nodes keep their identity across a restore. Without it you would re-register every node and re-issue every credential.
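With the file-backed SQLite store, copying the file while Auth is writing can capture an inconsistent state. Python's sqlite3 module exposes SQLite's online backup API, which takes a consistent snapshot of a live database. This is a minimal sketch. The backup_sqlite function name and the paths are illustrative. For PostgreSQL, use that server's own backup tooling instead.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(src: Path, dst: Path) -> None:
    """Snapshot a live SQLite database through the online backup API."""
    # Open the live store read-only so a typo in the path cannot create an empty DB.
    source = sqlite3.connect(f"file:{src}?mode=ro", uri=True)
    try:
        target = sqlite3.connect(dst)
        try:
            source.backup(target)  # copies all pages, consistent as of completion
        finally:
            target.close()
    finally:
        source.close()
```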
Audit export
Records are appended to a hash-chained log. They export as:
- Canonical JSON — event fields plus hex hashes of this and the previous record;
- Splunk HEC — the JSON event wrapped in a HEC envelope;
- CEF / syslog — a single CEF line for classic SIEMs.
Ship the log to your SIEM and periodically verify the chain (a gap or a broken link is an alert). See Operations → Audit.
Hardening checklist
The reference test stand runs every gdsgate node with all of the measures below and stays green, which shows that the hardening does not break the data plane.
Network¶
- [ ] Backends only on the protected network; the agent bridges out, nothing in.
- [ ] Proxy / Auth not on the backend network.
- [ ] Clients reach only the proxy's public TLS, verified by the cluster transport CA.
Transport¶
- [ ] Internal mutual TLS on every control hop, off the transport CA.
- [ ] Registration tokens are single-use and short-lived; minted per node, out of band.
- [ ] Short credential TTLs; a renewal path for long-running nodes.
Process¶
- [ ] Non-root user; no Linux capabilities; NoNewPrivileges=yes.
- [ ] Read-only root filesystem; only the state directory writable; tmpfs /tmp.
- [ ] One role per unit / container.
State¶
- [ ] Auth's state store private to the control plane; backed up.
- [ ] Each node's state directory persisted on durable storage.
- [ ] Audit log exported off-box; chain verification scheduled.
Policy¶
- [ ] A real Cedar policy is loaded (without one, Auth runs deny-all).
- [ ] The Cedar policy is strict-validated before deploy (gdsgate auth policy validate).
- [ ] Per-environment thresholds set in [approvals] for sensitive resources.
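Part of the Process section can be checked mechanically against each unit file. This is a minimal sketch using configparser, which can read simple unit files like the agent example (no continuation lines). The required-directive table mirrors this checklist and the example unit. The lint_unit name is hypothetical. For a fuller, systemd-native assessment, run systemd-analyze security on the unit.

```python
import configparser

# Directives the checklist and the example agent unit rely on.
REQUIRED = {
    "NoNewPrivileges": "yes",
    "CapabilityBoundingSet": "",   # empty: no capabilities at all
    "AmbientCapabilities": "",
    "ProtectSystem": "strict",     # read-only OS tree; only ReadWritePaths writable
}


def lint_unit(unit_text: str) -> list[str]:
    """Hypothetical lint: report [Service] hardening that is missing or weaker."""
    parser = configparser.ConfigParser(strict=False, interpolation=None, delimiters=("=",))
    parser.optionxform = str  # systemd keys are case-sensitive
    parser.read_string(unit_text)
    service = parser["Service"] if parser.has_section("Service") else {}
    problems = []
    for key, want in REQUIRED.items():
        got = service.get(key)
        if got is None:
            problems.append(f"{key} not set")
        elif got.strip() != want:
            problems.append(f"{key}={got} (want {want!r})")
    if service.get("User", "root") == "root" and service.get("DynamicUser") != "yes":
        problems.append("runs as root: set DynamicUser=yes or User=")
    return problems
```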