Troubleshooting

Common PBS errors and quick fixes

Resolve the issues PBS admins hit most: prune/GC failures, verify errors, cert and fingerprint warnings, slow backups, and storage not freeing.

At a glance

Fix common PBS issues.

  • Prune/GC: permissions, namespaces, latency
  • Verify: disk health or cert mismatch
  • Space: run GC after prune; GC covers the whole datastore
  • Performance: MTU/offloads, ARC, source IO

Prune errors

  • Permission/namespace issues: Ensure the token/user has prune rights on the datastore/namespace. Check for DatastorePowerUser (prune on owned backups) or DatastoreAdmin; DatastoreBackup on its own does not grant Datastore.Prune.
  • Conflicting policies: Align PVE job retention with PBS prune rules; avoid competing keeps.
  • Path problems: Verify datastore path is mounted and writable; check for stale mounts.
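The path checks above can be scripted. The following is a minimal sketch, not an official tool: the function name is hypothetical, and it assumes it runs on the PBS host as the user that owns the datastore. It only inspects the directory and changes nothing.

```python
import os


def check_datastore_path(path: str) -> list[str]:
    """Return a list of problems found with a datastore path (empty if none)."""
    problems = []
    if not os.path.isdir(path):
        # A missing directory often means the backing filesystem is not mounted.
        problems.append(f"{path} is not a directory (missing or stale mount?)")
        return problems
    if not os.path.ismount(path):
        # Not always an error: a datastore may live in a subdirectory of a mount.
        problems.append(f"{path} is not a mount point; check the backing filesystem")
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by the current user")
    if not os.path.isdir(os.path.join(path, ".chunks")):
        # PBS datastores keep their chunk store in a .chunks subdirectory.
        problems.append(f"{path} has no .chunks directory; is this a datastore?")
    return problems


if __name__ == "__main__":
    # Hypothetical path; replace with your datastore location.
    for problem in check_datastore_path("/mnt/datastore/example"):
        print(problem)
```

An empty result does not prove the datastore is healthy; it only rules out the most common path-level causes.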

Garbage collection issues

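  • Chunks still in use: GC runs per datastore and covers all of its namespaces. It keeps chunks that are still referenced or that running backups may still need, so let active jobs finish before expecting space back.
  • Latency: High IO latency can make GC time out or run very slowly. Check ZFS health, disk SMART, and storage queues.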
  • Ordering: Run prune before GC and avoid overlapping heavy verify jobs.

Verify failures

  • IO errors: Check dmesg, ZFS status, and SMART. Bad disks or pools cause chunk read errors.
  • Timeouts/latency: Add cache/special device, reduce concurrency, or schedule verify away from peak.
  • Cert mismatch: If the failing task reaches the datastore through a remote, confirm that the remote's TLS certificate and stored fingerprint still match.
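For the ZFS part of the IO checks, a small wrapper around `zpool status -x` gives a quick yes/no answer. This is a sketch under the assumption that the datastore lives on ZFS and that `zpool` is on the PATH; the function names are illustrative. `zpool status -x` prints "all pools are healthy" when nothing needs attention and pool details otherwise.

```python
import subprocess


def pools_healthy(status_output: str) -> bool:
    """Interpret the text printed by `zpool status -x`."""
    return status_output.strip() == "all pools are healthy"


def zpool_status_x() -> str:
    """Run `zpool status -x` and return its standard output."""
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=False,  # we inspect the output ourselves
    )
    return result.stdout


if __name__ == "__main__":
    output = zpool_status_x()
    if pools_healthy(output):
        print("ZFS reports no pool problems")
    else:
        print("ZFS reports a problem:")
        print(output)
```

A healthy pool does not rule out a failing disk, so pair this with dmesg and SMART checks as listed above.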

Space not freeing after prune

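  • Run GC after prune. Prune only removes snapshots; the underlying chunks, and therefore the disk space, are released by GC. Chunks touched within roughly the last 24 hours survive the next GC run, so freed space can lag about a day behind.
  • GC always works on the whole datastore, including all namespaces. Prune is scoped to a namespace, so make sure prune jobs cover every namespace that holds backups.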
  • Check for failed prunes or jobs holding chunks (verify/backup in progress).
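Why space can stay allocated after a prune is easier to see with a toy model of GC's mark-and-sweep behaviour. This is a simplified sketch, not PBS code: the function and the data layout are invented for illustration, and the grace period of 24 hours and 5 minutes is an assumption based on the documented default, so check the documentation for your version.

```python
from datetime import datetime, timedelta

# Assumed grace period; chunks touched more recently than this are kept.
GRACE = timedelta(hours=24, minutes=5)


def sweep(chunk_atimes: dict[str, datetime], referenced: set[str], now: datetime) -> set[str]:
    """Return the chunk digests a GC run at `now` would remove in this model."""
    # Phase 1 (mark): every chunk still referenced by a snapshot gets a fresh atime.
    atimes = {
        digest: (now if digest in referenced else atime)
        for digest, atime in chunk_atimes.items()
    }
    # Phase 2 (sweep): only chunks not touched since the cutoff are removed.
    cutoff = now - GRACE
    return {digest for digest, atime in atimes.items() if atime < cutoff}


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0)
    chunks = {
        "a": datetime(2024, 1, 1),        # old, still referenced
        "b": datetime(2024, 1, 1),        # old, no longer referenced
        "c": now - timedelta(hours=1),    # unreferenced but touched an hour ago
    }
    print(sweep(chunks, referenced={"a"}, now=now))
```

In this model only chunk "b" is removed. Chunk "c" belongs to no snapshot any more, but it was touched too recently, which is the usual reason a prune followed immediately by GC frees less space than expected.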

Fingerprint and certificate warnings

  • Confirm PVE uses the same FQDN that the PBS certificate covers.
  • After renewing PBS certs, update fingerprints in PVE storage config.
  • Use proxmox-backup-manager cert info | grep Fingerprint on the PBS host to fetch the SHA-256 fingerprint.
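To compare what the server presents with what PVE has stored, the fingerprint can also be computed from the client side. The sketch below assumes PBS listens on its default port 8007; the host name and helper names are placeholders. It computes the SHA-256 digest of the DER-encoded certificate and formats it as colon-separated hex pairs.

```python
import hashlib
import ssl


def format_fingerprint(der_cert: bytes) -> str:
    """SHA-256 of a DER certificate as colon-separated lowercase hex pairs."""
    digest = hashlib.sha256(der_cert).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_fingerprint(host: str, port: int = 8007) -> str:
    """Fetch the server certificate (without validating it) and fingerprint it."""
    pem = ssl.get_server_certificate((host, port))
    return format_fingerprint(ssl.PEM_cert_to_DER_cert(pem))


def same_fingerprint(a: str, b: str) -> bool:
    """Compare two fingerprints, ignoring case and surrounding whitespace."""
    return a.strip().lower() == b.strip().lower()


if __name__ == "__main__":
    # Placeholder values; use your PBS FQDN and the fingerprint from the PVE storage config.
    presented = fetch_fingerprint("pbs.example.com")
    stored = "00:11:22:..."  # paste the stored fingerprint here
    print("match" if same_fingerprint(presented, stored) else "MISMATCH")
```

Connect using the same FQDN that PVE uses, since a different name or address may reach a different certificate.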

Slow backups or restores

  • Network: MTU mismatch, drops, or retransmits. Validate jumbo frames end-to-end; check switch counters.
  • Offloads/CPU: If CPU-bound, test disabling GRO/LRO on NICs; add CPU or reduce compression level.
  • Storage: Check disk latency, ARC pressure, and pool health; separate OS and datastore.
  • Concurrency: Reduce concurrent tasks; stagger PVE jobs; cap bandwidth during business hours.
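Jumbo frames can be validated end-to-end by sending pings that are not allowed to fragment. The sketch below assumes IPv4 and the Linux iputils `ping` (`-M do` forbids fragmentation); the helper names and the host are illustrative. The ICMP payload is the MTU minus 20 bytes of IP header and 8 bytes of ICMP header, so 9000 gives 8972.

```python
import subprocess

IP_HEADER = 20    # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload that fits into one unfragmented IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def ping_command(host: str, mtu: int, count: int = 3) -> list[str]:
    """Build a ping command that fails if the path cannot carry `mtu` bytes."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", str(count), "-s", str(payload), host]


def path_supports_mtu(host: str, mtu: int) -> bool:
    """Return True if unfragmented pings of the given size get replies."""
    result = subprocess.run(ping_command(host, mtu), capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder host; run from the PVE node towards PBS and in reverse.
    for mtu in (1500, 9000):
        print(mtu, path_supports_mtu("pbs.example.com", mtu))
```

If 1500 passes and 9000 fails, some hop between the hosts is not configured for jumbo frames.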

Quick checklist

  • Prune before GC; run GC after every prune.
  • Verify weekly; daily for critical data.
  • Align retention between PVE jobs and PBS prune rules.
  • Keep TLS certs/FQDN/fingerprints in sync across PVE and PBS.
  • Monitor disk latency, ARC hit ratio, and network drops during backup windows.
