1) Define RPO/RTO first
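Retention follows from two targets: the recovery point objective (RPO, the maximum tolerable data loss) and the recovery time objective (RTO, the maximum tolerable time to restore). It trades storage cost against how far back you can recover. Capture both per workload: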
- Tier apps (e.g., Tier 0 critical, Tier 1 important, Tier 2 standard).
- Set RPO targets (e.g., Tier 0: 1–4 hours; Tier 1: 8–24 hours; Tier 2: daily).
- Set RTO targets (e.g., Tier 0: <2 hours; Tier 1: <8 hours; Tier 2: next business day).
- Align verification cadence to RTO: a tighter RTO calls for more frequent verification and restore drills.
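The tier targets above can be captured as data and checked against the age of the newest backup. The following is a minimal sketch, not part of PBS: the names `TierTarget`, `TIERS`, and `rpo_breached` are hypothetical, the values use the upper bounds of the example ranges, and "next business day" is approximated as one day.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TierTarget:
    name: str
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable time to restore


# Example values: upper bounds of the ranges listed above (assumption).
TIERS = {
    "tier0": TierTarget("Tier 0", rpo=timedelta(hours=4), rto=timedelta(hours=2)),
    "tier1": TierTarget("Tier 1", rpo=timedelta(hours=24), rto=timedelta(hours=8)),
    # "Next business day" is approximated as one day here.
    "tier2": TierTarget("Tier 2", rpo=timedelta(days=1), rto=timedelta(days=1)),
}


def rpo_breached(tier: str, last_backup: datetime, now: datetime) -> bool:
    """Return True if the newest backup is older than the tier's RPO."""
    return now - last_backup > TIERS[tier].rpo


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    last = datetime(2024, 1, 2, 7, 0)  # newest backup is five hours old
    for tier in TIERS:
        print(tier, "RPO breached:", rpo_breached(tier, last, now))
```

A check like this could feed the alerting described later; where the timestamp of the newest backup comes from is left open.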
2) Choose retention rules that map to tiers
Start with daily/weekly/monthly patterns and scale up with quarters/years where needed. Examples:
- Tier 0: keep 14 daily, 6 weekly, 12 monthly.
- Tier 1: keep 7 daily, 4 weekly, 6 monthly.
- Tier 2: keep 7 daily, 2 weekly, 3 monthly.
Set retention either in the PVE backup job or as a prune job on the PBS datastore. If you configure both, keep the policies consistent to avoid conflicts and unexpected deletions.
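To see what a daily/weekly/monthly rule actually keeps, it helps to simulate it on a list of timestamps. The sketch below is a simplified model, not PBS's own implementation: each rule walks the snapshots newest-first and keeps the newest snapshot of every period that is not already covered, until its count is used up. The function name `select_kept` is hypothetical.

```python
from datetime import datetime, timedelta


def select_kept(snapshots, keep_daily=0, keep_weekly=0, keep_monthly=0):
    """Return the set of snapshot timestamps that the rules would keep."""
    rules = [
        (keep_daily, lambda t: t.strftime("%Y-%m-%d")),
        # ISO year and ISO week number identify a calendar week.
        (keep_weekly, lambda t: "{}-W{:02d}".format(*t.isocalendar()[:2])),
        (keep_monthly, lambda t: t.strftime("%Y-%m")),
    ]
    kept = set()
    ordered = sorted(snapshots, reverse=True)  # newest first
    for count, period_of in rules:
        # Periods already covered by snapshots kept under earlier rules.
        seen = {period_of(t) for t in kept}
        remaining = count
        for t in ordered:
            if remaining == 0:
                break
            period = period_of(t)
            if period in seen:
                continue
            seen.add(period)
            kept.add(t)
            remaining -= 1
    return kept


if __name__ == "__main__":
    # One snapshot per day at noon, 2024-01-01 through 2024-02-29.
    start = datetime(2024, 1, 1, 12, 0)
    snaps = [start + timedelta(days=i) for i in range(60)]
    for t in sorted(select_kept(snaps, keep_daily=7, keep_weekly=4, keep_monthly=6)):
        print(t.date())
```

With only two months of history, the monthly rule has nothing older to select yet, so the number of retained snapshots grows toward the configured totals over time.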
3) Prune and garbage collection cadence
- Prune: Run daily after backup windows close. Mirror the retention rules above.
- GC: Run after prune, at least weekly (nightly for fast-churning data). Avoid peak backup/verify windows.
- Monitoring: Alert on prune/GC failures, chunk errors, or namespace warnings; these often point to permission problems or storage latency.
PBS GUI: Datastore > Prune and Datastore > Garbage Collection. CLI: proxmox-backup-client prune for pruning a backup group, and proxmox-backup-manager garbage-collection start <datastore> for GC.
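The ordering rules above (prune after backups, GC after prune, GC outside backup and verify windows) can be sanity-checked before a schedule goes live. This is a minimal sketch with hypothetical names (`Job`, `schedule_problems`); the durations are estimates you would supply yourself, for example from earlier task logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Job:
    name: str
    start: datetime
    duration: timedelta  # expected runtime (an estimate, not measured here)

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def overlaps(a: Job, b: Job) -> bool:
    """Two jobs overlap if each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def schedule_problems(backup: Job, prune: Job, gc: Job, verify: Job) -> list:
    """Return human-readable warnings for one night's schedule."""
    problems = []
    if prune.start < backup.end:
        problems.append("prune starts before the backup window closes")
    if gc.start < prune.end:
        problems.append("GC starts before prune has finished")
    for busy in (backup, verify):
        if overlaps(gc, busy):
            problems.append(f"GC overlaps the {busy.name} window")
    return problems


if __name__ == "__main__":
    night = datetime(2024, 1, 1, 22, 0)
    backup = Job("backup", night, timedelta(hours=3))
    prune = Job("prune", night + timedelta(hours=3, minutes=30), timedelta(minutes=10))
    gc = Job("gc", night + timedelta(hours=4), timedelta(hours=2))
    verify = Job("verify", night + timedelta(hours=7), timedelta(hours=1))
    print(schedule_problems(backup, prune, gc, verify) or "schedule looks consistent")
```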
4) Immutability options
PBS itself does not enforce object-lock, but you can design for immutability in these ways:
- ZFS snapshots of datastores: Periodic, read-only snapshots on the PBS pool, retained in line with business needs. These protect against accidental deletion, but not against every ransomware scenario (an attacker with root access to the PBS host can destroy snapshots as well).
- Offsite Remote Sync to object-lock capable storage: Sync to a secondary PBS backed by object-lock/WORM storage (e.g., S3 with object-lock). WORM is enforced by the object store, not by PBS; the PBS S3 backend is currently marked “technology preview.”
- Namespace permissions: Use namespaces with strict roles and tokens to prevent deletes from standard operators.
- Air-gap / isolation: Limit network paths from compromised workloads to PBS; separate admin accounts; short-lived tokens.
Document what is immutable vs. what is just retained. Communicate deletion SLAs to stakeholders.
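For the ZFS snapshot option, the retention window has to be enforced by whatever creates the snapshots. The sketch below only decides which snapshots have aged out; it assumes a hypothetical naming convention (`<dataset>@auto-YYYY-MM-DD_HHMM`) and a hypothetical dataset `tank/pbs`, and it does not run any ZFS command itself.

```python
from datetime import datetime, timedelta

PREFIX = "auto-"          # hypothetical prefix for scheduled snapshots
STAMP = "%Y-%m-%d_%H%M"   # hypothetical timestamp format in the snapshot name


def expired_snapshots(names, now, keep_for):
    """Return full snapshot names whose embedded timestamp is older than keep_for."""
    expired = []
    for full_name in names:
        _dataset, _, snap = full_name.partition("@")
        if not snap.startswith(PREFIX):
            continue  # skip manually created snapshots
        taken = datetime.strptime(snap[len(PREFIX):], STAMP)
        if now - taken > keep_for:
            expired.append(full_name)
    return expired


if __name__ == "__main__":
    names = [
        "tank/pbs@auto-2024-01-01_0000",
        "tank/pbs@auto-2024-01-20_0000",
        "tank/pbs@manual-before-upgrade",
    ]
    for name in expired_snapshots(names, datetime(2024, 1, 21), timedelta(days=14)):
        print("would destroy:", name)
```

The returned names could then be reviewed and passed to `zfs destroy`. Keeping that step separate from the PBS service account is part of the point.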
5) Namespaces and multi-tenant hygiene
- Create namespaces per environment (prod/stage/dev) or per business unit.
- Apply least-privilege roles at the namespace level (DatastorePowerUser or DatastoreBackup).
- Run prune jobs per namespace to avoid surprises when teams share a datastore; GC always runs on the whole datastore.
- Use separate tokens per namespace; rotate regularly.
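One token per namespace is easier to keep consistent when the mapping is generated rather than typed by hand. The sketch below builds such a plan as plain data. It assumes the ACL path layout `/datastore/<store>/<namespace>` and the `user@realm!tokenname` token format; the user `backup@pbs`, the datastore `store1`, and the function names are hypothetical. It only produces a plan for review and does not call PBS.

```python
def acl_path(datastore: str, namespace: str) -> str:
    """ACL path for a namespace, assuming the /datastore/<store>/<ns> layout."""
    return "/datastore/{}/{}".format(datastore, namespace.strip("/"))


def access_plan(datastore, namespaces, user="backup@pbs", role="DatastoreBackup"):
    """Return one entry (token ID, ACL path, role) per namespace."""
    plan = []
    for ns in namespaces:
        # Nested namespaces such as "prod/db" get a flat token name "prod-db".
        token_id = "{}!{}".format(user, ns.replace("/", "-"))
        plan.append({"auth_id": token_id, "path": acl_path(datastore, ns), "role": role})
    return plan


if __name__ == "__main__":
    for entry in access_plan("store1", ["prod", "stage", "dev"]):
        print(entry["auth_id"], entry["role"], entry["path"])
```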
6) Offsite replication via Remote Sync
Keep a second PBS (or PBS over object-lock storage) in another AZ/DC.
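- Remotes > Add: register the other PBS with a scoped token. Sync jobs traditionally pull, so the remote is added on the PBS that runs the job (the secondary pulls from the primary); newer PBS releases also support push jobs configured on the primary.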
- Create a Sync Job per datastore/namespace; schedule after prune/verify.
- Consider longer retention offsite (e.g., +1–3 months) and stricter immutability.
- Test restores from the secondary monthly to validate DR posture.
7) Monitoring and alerting
- Alert on prune/GC failures, verify failures, and chunk errors.
- Track datastore free space, GC reclaimed bytes, and verify durations.
- Send PBS syslog to a central system; enable email or webhook notifications.
- Watch for token misuse or unexpected delete attempts in audit logs.
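Free-space tracking is most useful when it is turned into a forecast. The following sketch projects the days until a datastore is full from two usage samples, assuming linear growth; the function name and the sample format are hypothetical, and real usage after prune and GC is rarely this regular.

```python
def days_until_full(samples, capacity_bytes):
    """Estimate the days until the datastore is full.

    samples: list of (day_index, used_bytes) tuples, oldest first.
    Returns None when usage is flat or shrinking.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    if last_day == first_day:
        raise ValueError("need samples from at least two different days")
    growth_per_day = (last_used - first_used) / (last_day - first_day)
    if growth_per_day <= 0:
        return None
    return (capacity_bytes - last_used) / growth_per_day


if __name__ == "__main__":
    tib = 1024 ** 4
    samples = [(0, 4 * tib), (10, 5 * tib)]  # grew by 1 TiB in 10 days
    print(days_until_full(samples, capacity_bytes=10 * tib))
```

An alert threshold on this value (for example, fewer than 30 days remaining) tends to fire earlier than a fixed percentage-used threshold on fast-growing datastores.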
Example playbooks
Standard workload (Tier 1)
- Retention: 7 daily, 4 weekly, 6 monthly.
- Prune: nightly after backups; GC weekly.
- Verify: weekly.
- Offsite: Remote Sync nightly; offsite retention +1 month.
- Immutability: ZFS snapshots daily retained 14 days; tokens scoped per namespace.
Critical workload (Tier 0)
- Retention: 14 daily, 6 weekly, 12 monthly, 5 yearly.
- Prune: nightly; GC nightly or every other night depending on churn.
- Verify: daily light verify on recent snapshots; weekly full verify.
- Offsite: Remote Sync twice daily; offsite immutability/object-lock enforced.
- Immutability: Object-lock offsite; local ZFS snapshots hourly for 24h, daily for 14d.
Cost-sensitive workload (Tier 2)
- Retention: 7 daily, 2 weekly, 3 monthly.
- Prune: nightly; GC weekly.
- Verify: weekly.
- Offsite: Remote Sync weekly; offsite retention mirrors primary.
- Immutability: minimal (namespaces and role isolation only).
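The three playbooks can be kept as data and rendered into prune options, so the primary and offsite policies stay in step. This is a sketch with hypothetical names (`PLAYBOOKS`, `keep_options`); it emits `--keep-daily`, `--keep-weekly`, `--keep-monthly`, and `--keep-yearly` arguments and models "offsite retention +1 month" as one additional monthly snapshot, which is an assumption.

```python
PLAYBOOKS = {
    "tier0": {"daily": 14, "weekly": 6, "monthly": 12, "yearly": 5},
    "tier1": {"daily": 7, "weekly": 4, "monthly": 6},
    "tier2": {"daily": 7, "weekly": 2, "monthly": 3},
}


def keep_options(policy, extra_monthly=0):
    """Render a retention policy as a list of --keep-* arguments."""
    options = []
    for unit in ("daily", "weekly", "monthly", "yearly"):
        count = policy.get(unit, 0)
        if unit == "monthly":
            count += extra_monthly  # longer offsite retention (assumption)
        if count > 0:
            options += [f"--keep-{unit}", str(count)]
    return options


if __name__ == "__main__":
    print("tier1 primary:", " ".join(keep_options(PLAYBOOKS["tier1"])))
    print("tier1 offsite:", " ".join(keep_options(PLAYBOOKS["tier1"], extra_monthly=1)))
```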