Strategy guide

Designing retention and immutable backup policies for PBS

Turn RPO/RTO targets into practical retention, pruning, garbage collection, and immutability settings for Proxmox Backup Server.

Quick facts

Turn policy into practice.

  • 01 Map RPO/RTO to retention tiers
  • 02 Schedule prune + GC without collisions
  • 03 Add immutability via namespaces/sync
  • 04 Use Remote Sync for offsite retention

1) Define RPO/RTO first

Retention trades storage cost against recovery control, and both sides of that trade are set by RPO (maximum tolerable data loss) and RTO (maximum tolerable time to restore). Capture these per workload:

  • Tier apps (e.g., Tier 0 critical, Tier 1 important, Tier 2 standard).
  • Set RPO targets (e.g., Tier 0: 1–4 hours; Tier 1: 8–24 hours; Tier 2: daily).
  • Set RTO targets (e.g., Tier 0: <2 hours; Tier 1: <8 hours; Tier 2: next business day).
  • Align verification cadence to RTO: a tighter RTO calls for more frequent verify jobs and restore drills.
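
One way to keep these targets auditable is to record them as data and check each backup schedule against its RPO. The sketch below is illustrative only: the `Tier` class, its field names, and the `meets_rpo` helper are hypothetical, the numbers use the upper end of the example ranges above, and "next business day" is approximated as 24 hours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rpo_hours: float  # maximum tolerable data loss
    rto_hours: float  # target time to restore


# Upper bound of each example range above; adjust to your own targets.
TIERS = {
    "tier0": Tier("Tier 0 (critical)", rpo_hours=4, rto_hours=2),
    "tier1": Tier("Tier 1 (important)", rpo_hours=24, rto_hours=8),
    "tier2": Tier("Tier 2 (standard)", rpo_hours=24, rto_hours=24),  # assumption: next business day ~ 24h
}


def meets_rpo(tier: Tier, backup_interval_hours: float) -> bool:
    """A schedule meets the RPO if backups run at least as often as the RPO window.

    Simplification: ignores how long each backup takes to complete.
    """
    return backup_interval_hours <= tier.rpo_hours


print(meets_rpo(TIERS["tier0"], backup_interval_hours=2))   # True
print(meets_rpo(TIERS["tier0"], backup_interval_hours=24))  # False
```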

2) Choose retention rules that map to tiers

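Start with daily/weekly/monthly patterns and add yearly retention where needed. PBS prune options are keep-last, keep-hourly, keep-daily, keep-weekly, keep-monthly, and keep-yearly; there is no quarterly option, so cover quarterly requirements with enough monthly snapshots. Examples:

  • Tier 0: keep 14 daily, 6 weekly, 12 monthly, plus yearly snapshots where compliance requires them (the Tier 0 playbook below uses 5).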
  • Tier 1: keep 7 daily, 4 weekly, 6 monthly.
  • Tier 2: keep 7 daily, 2 weekly, 3 monthly.

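Set retention either on the PVE side (backup job or storage retention settings) or as a prune job on the PBS datastore. Pick one place as the source of truth: if both prune the same backup group, a snapshot survives only when both rule sets keep it, which can delete backups earlier than expected.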
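
To see what a daily/weekly/monthly rule set actually leaves behind, it can help to simulate it before applying it. The following is a simplified model, loosely following the behaviour described in the PBS prune documentation (the newest backup in each period is kept, rules are applied in order, and later rules only consider older backups). It is not PBS's implementation: it ignores keep-last, keep-hourly, keep-yearly, protected snapshots, and time zones, and `simulate_prune` is a hypothetical helper name. Treat the prune simulator in the PBS documentation or a dry run as authoritative.

```python
from datetime import datetime, timedelta


def simulate_prune(timestamps, keep_daily=0, keep_weekly=0, keep_monthly=0):
    """Return the set of backup timestamps a simplified prune run would keep."""
    ordered = sorted(timestamps, reverse=True)  # newest first
    kept, decided = set(), set()
    rules = [
        (keep_daily, lambda t: t.strftime("%Y-%m-%d")),
        (keep_weekly, lambda t: "{}-W{:02d}".format(*t.isocalendar()[:2])),  # ISO weeks
        (keep_monthly, lambda t: t.strftime("%Y-%m")),
    ]
    for keep, period_of in rules:
        if keep <= 0:
            continue
        # Periods already represented by backups kept under earlier rules.
        covered = {period_of(t) for t in kept}
        selected = set()
        for t in ordered:
            if t in decided:
                continue
            period = period_of(t)
            if period in covered:
                continue
            if period not in selected:
                if len(selected) >= keep:
                    break  # this rule is satisfied; older backups go to the next rule
                selected.add(period)
                kept.add(t)
            # Kept, or an older backup in an already-selected period (removed).
            decided.add(t)
    return kept


# One backup per day at 02:00 for 90 days, ending Sunday 2024-03-31.
backups = [datetime(2024, 3, 31, 2) - timedelta(days=i) for i in range(90)]
kept = simulate_prune(backups, keep_daily=7, keep_weekly=4, keep_monthly=6)
print(len(kept))  # 13: 7 daily + 4 weekly + 2 monthly (only 90 days of history exist)
```

Running the model against a realistic backup history makes it easier to spot rules that keep far fewer snapshots than stakeholders assume, for example a monthly count that cannot be reached until the datastore has been running for that many months.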

3) Prune and garbage collection cadence

  • Prune: Run daily after backup windows close. Mirror the retention rules above.
  • GC: Run after prune, at least weekly (nightly for fast-churning data). Avoid peak backup/verify windows.
  • Monitoring: Alert on prune/GC failures, chunk errors, or namespace warnings; these often signal permissions or storage latency.

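PBS GUI: Datastore > Prune and Datastore > Garbage Collection (combined as Prune & GC in recent releases). CLI: proxmox-backup-manager prune-job manages scheduled prune jobs, and proxmox-backup-manager garbage-collection start <datastore> starts GC; ad hoc pruning of a single backup group is done with proxmox-backup-client prune. Check --help on your PBS version for the exact options.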
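
A quick way to sanity-check a schedule before entering it in PBS is to model each job as a daily window and look for overlaps. This is a minimal sketch: the start times and durations are made-up assumptions (take real durations from your task logs), and `hhmm`, `overlaps`, and `find_collisions` are hypothetical helpers, not PBS features.

```python
from itertools import combinations

DAY = 24 * 60  # minutes in a day


def hhmm(text):
    """Convert 'HH:MM' to minutes after midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def overlaps(a, b):
    """True if two daily windows (start minute, length in minutes) overlap.

    Works on a 24h clock, so windows that cross midnight are handled.
    """
    (a_start, a_len), (b_start, b_len) = a, b
    return (b_start - a_start) % DAY < a_len or (a_start - b_start) % DAY < b_len


def find_collisions(windows):
    """Return every pair of job names whose windows overlap."""
    return [
        (x, y)
        for x, y in combinations(windows, 2)
        if overlaps(windows[x], windows[y])
    ]


# Assumed schedule: (start, expected duration in minutes)
windows = {
    "backup": (hhmm("22:00"), 180),
    "prune": (hhmm("01:30"), 15),
    "gc": (hhmm("02:00"), 120),
    "verify": (hhmm("04:30"), 90),
}
print(find_collisions(windows))  # []

# Moving GC to 00:30 makes it collide with the backup window and with prune.
windows["gc"] = (hhmm("00:30"), 120)
print(find_collisions(windows))  # [('backup', 'gc'), ('prune', 'gc')]
```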

4) Immutability options

PBS itself does not enforce object-lock, but you can design for immutability in these ways:

  • ZFS snapshots of datastores: Periodic, read-only snapshots on the PBS pool, retained in line with business needs. This protects against accidental deletion but not against every ransomware scenario; an attacker with root access on the PBS host can destroy the snapshots too.
  • Offsite Remote Sync to object-lock capable storage: Sync to a secondary PBS backed by object-lock/WORM storage (e.g., S3 with object-lock). WORM is enforced by the object store; PBS S3 backend is currently marked “technology preview.”
  • Namespace permissions: Use namespaces with strict roles and tokens to prevent deletes from standard operators.
  • Air-gap / isolation: Limit network paths from compromised workloads to PBS; separate admin accounts; short-lived tokens.

Document what is immutable vs. what is just retained. Communicate deletion SLAs to stakeholders.

5) Namespaces and multi-tenant hygiene

  • Create namespaces per environment (prod/stage/dev) or per business unit.
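  • Apply least-privilege roles at the namespace level: DatastoreBackup for clients that only need to back up and restore their own backups, DatastorePowerUser only where the owner must also prune them.
  • Scope prune jobs per namespace to avoid surprises when teams share a datastore. GC always runs on the whole datastore, so schedule it once per datastore.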
  • Use separate tokens per namespace; rotate regularly.
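
The per-namespace token scoping above could be scripted roughly as follows. This sketch only builds the command strings and does not run them. It assumes the `proxmox-backup-manager acl update <path> <role> --auth-id <id>` form and the `/datastore/<store>/<namespace>` ACL path; the datastore name, namespaces, and token IDs are hypothetical, so confirm the syntax with `proxmox-backup-manager acl update --help` before use.

```python
import shlex


def acl_commands(datastore, namespace_tokens, role="DatastoreBackup"):
    """Build one ACL command per namespace, granting `role` to that namespace's token."""
    commands = []
    for namespace, auth_id in namespace_tokens.items():
        path = f"/datastore/{datastore}/{namespace}"
        commands.append(" ".join([
            "proxmox-backup-manager", "acl", "update",
            shlex.quote(path), role,
            "--auth-id", shlex.quote(auth_id),  # quoting protects the '!' in token IDs
        ]))
    return commands


# Hypothetical datastore, namespaces, and API tokens.
tokens = {
    "prod": "backup@pbs!prod",
    "stage": "backup@pbs!stage",
    "dev": "backup@pbs!dev",
}
for command in acl_commands("store1", tokens):
    print(command)
```

Generating the commands from one mapping keeps the namespace-to-token relationship in a single reviewable place, which also helps when tokens are rotated.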

6) Offsite replication via Remote Sync

Keep a second PBS (or PBS over object-lock storage) in another AZ/DC.

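  • Configure the remote. In the classic pull direction, go to Remotes > Add on the secondary PBS and register the primary with a scoped, read-only token, so the primary holds no credentials for the offsite copy. Newer PBS releases also offer push sync, where the remote is added on the primary instead.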
  • Create a Sync Job per datastore/namespace; schedule after prune/verify.
  • Consider longer retention offsite (e.g., +1–3 months) and stricter immutability.
  • Test restores from the secondary monthly to validate DR posture.
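
The "longer retention offsite" idea can be expressed as a small derivation from the primary policy. This is a sketch under assumptions: `offsite_policy` and `to_prune_options` are hypothetical helpers, and the comma-separated `keep-...=N` output mirrors the style of PVE's prune-backups property string purely for readability; on a secondary PBS the values would be entered into a prune job instead.

```python
def offsite_policy(primary, extra_months=1):
    """Copy the primary keep-* settings and extend monthly retention offsite."""
    policy = dict(primary)
    policy["keep-monthly"] = primary.get("keep-monthly", 0) + extra_months
    return policy


def to_prune_options(policy):
    """Render keep-* settings as 'keep-daily=7,keep-weekly=4,...', skipping zeros."""
    return ",".join(f"{key}={value}" for key, value in policy.items() if value)


# Tier 1 example from this guide, with two extra months kept offsite.
tier1_primary = {"keep-daily": 7, "keep-weekly": 4, "keep-monthly": 6}
tier1_offsite = offsite_policy(tier1_primary, extra_months=2)
print(to_prune_options(tier1_offsite))  # keep-daily=7,keep-weekly=4,keep-monthly=8
```

Longer offsite retention only holds if the sync job does not delete snapshots that have been pruned on the primary, so review the job's handling of vanished snapshots alongside these numbers.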

7) Monitoring and alerting

  • Alert on prune/GC failures, verify failures, and chunk errors.
  • Track datastore free space, GC reclaimed bytes, and verify durations.
  • Send PBS syslog to a central system; enable email or webhook notifications.
  • Watch for token misuse or unexpected delete attempts in audit logs.
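
As a minimal illustration of the free-space tracking above, the following sketch classifies datastore usage against two thresholds using Python's standard `shutil.disk_usage`. The 80%/90% thresholds and the function names are assumptions, and a real deployment would more likely feed these numbers into an existing monitoring system than run a standalone script.

```python
import shutil


def usage_level(used_bytes, total_bytes, warn=0.80, crit=0.90):
    """Classify usage as 'ok', 'warning', or 'critical' (thresholds are assumptions)."""
    ratio = used_bytes / total_bytes
    if ratio >= crit:
        return "critical"
    if ratio >= warn:
        return "warning"
    return "ok"


def check_datastore(path):
    """Check the filesystem that backs a datastore path."""
    usage = shutil.disk_usage(path)
    return usage_level(usage.used, usage.total)


print(usage_level(85, 100))  # warning
print(check_datastore("."))  # replace "." with the datastore mount point
```

Leave headroom above the warning threshold: GC needs to run to completion before pruned data actually frees space, so a datastore that fills up between GC runs cannot be rescued by pruning alone.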

Example playbooks

Standard workload (Tier 1)

  • Retention: 7 daily, 4 weekly, 6 monthly.
  • Prune: nightly after backups; GC weekly.
  • Verify: weekly.
  • Offsite: Remote Sync nightly; offsite retention +1 month.
  • Immutability: ZFS snapshots daily retained 14 days; tokens scoped per namespace.

Critical workload (Tier 0)

  • Retention: 14 daily, 6 weekly, 12 monthly, 5 yearly.
  • Prune: nightly; GC nightly or every other night depending on churn.
  • Verify: daily light verify on recent snapshots; weekly full verify.
  • Offsite: Remote Sync twice daily; offsite immutability/object-lock enforced.
  • Immutability: Object-lock offsite; local ZFS snapshots hourly for 24h, daily for 14d.

Cost-sensitive workload (Tier 2)

  • Retention: 7 daily, 2 weekly, 3 monthly.
  • Prune: nightly; GC weekly.
  • Verify: weekly.
  • Offsite: Remote Sync weekly; offsite retention mirrors primary.
  • Immutability: Minimal; namespaces and role isolation only.
