host1 is now pve2: crossing to Proxmox 9 without dropping a container

host1 came back last month; this week it was reborn as pve2 on Proxmox 9.2. The cluster crossed a major Debian version live, eleven containers never noticed, and a two-node quorum trap got defused before it could bite.

Figure: pvecm status showing clusterFer with two nodes, quorate, in two-node mode.

Last month, host1 came back from the dead. This week, it grew up.

In "host1, back in the cluster" I told the story of a node that froze, returned on a different CPU, and revealed 319 GiB of stale TRIM. That was the recovery. This is the upgrade: the whole home cluster crossed a major version line, host1 was reborn under a new name, and eleven production containers never noticed.

Crossing a major version, live

Our main hypervisor, pve1, was running Proxmox 8.3 on Debian Bookworm. The target was Proxmox 9.2 on Debian Trixie: a major OS jump, the kind that usually means a maintenance window and crossed fingers. We did it in stages instead. First a minor upgrade to 8.4, then a reboot onto the new kernel, then the official pve8to9 pre-flight check (39 passes, a handful of warnings, zero hard blockers), and only then the jump to 9.2.
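The point of staging is that each step gates the next. Here is a minimal Python sketch of that idea, not a tool we actually ran: `run_staged` and `upgrade_plan` are hypothetical names, and only the pre-flight gate is modelled, with the failure count typed in by hand rather than parsed from pve8to9.

```python
from typing import Callable, List, Tuple

# A stage is a label plus a check that returns True when it is safe to continue.
Stage = Tuple[str, Callable[[], bool]]


def run_staged(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one whose check fails."""
    completed: List[str] = []
    for name, check in stages:
        if not check():
            break
        completed.append(name)
    return completed


def upgrade_plan(preflight_failures: int) -> List[Stage]:
    """The four stages from the post; only the pre-flight gate is modelled."""
    return [
        ("minor upgrade to 8.4", lambda: True),
        ("reboot onto the new kernel", lambda: True),
        ("pve8to9 pre-flight", lambda: preflight_failures == 0),
        ("major upgrade to 9.2", lambda: True),
    ]


# Zero hard blockers, as in our run: all four stages complete.
print(run_staged(upgrade_plan(preflight_failures=0)))

# A single hard blocker stops the plan before the major jump.
print(len(run_staged(upgrade_plan(preflight_failures=1))))  # 2
```

Warnings are deliberately left out of the gate: they get a human review, while a hard blocker stops the plan.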

Two safety nets stood behind every step: ZFS root snapshots we could roll back to in seconds, and a full set of container backups on our Proxmox Backup Server. We never needed either. After the final reboot pve1 came up on kernel 7.0, all eleven containers autostarted, and the ZFS pool reported healthy.

host1, reborn as pve2

With pve1 on 9.2, it was host1's turn, except host1 came back as pve2. We brought it up clean on the same Proxmox 9.2 and joined it to the cluster. Joining an empty node is the easy case: no guests to collide, no IDs to reconcile. Within a minute the cluster was two nodes, quorate, sharing a single configuration filesystem.

The two-node trap

Two nodes is exactly where Proxmox quorum gets dangerous. Each node holds one vote and quorum needs a strict majority, which for two votes means both of them. If either node reboots, the survivor loses quorum and freezes: no starting or stopping containers, and the cluster configuration goes read-only. With production living on pve1, that is not a trade we wanted.
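The arithmetic behind the trap is short enough to write down. This is a sketch of the usual strict-majority rule, floor(n / 2) + 1, with hypothetical function names; it is not code taken from Proxmox or corosync.

```python
def majority_quorum(expected_votes: int) -> int:
    """Votes needed for a strict majority: floor(n / 2) + 1."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True when the votes currently present reach the majority threshold."""
    return votes_present >= majority_quorum(expected_votes)


print(majority_quorum(2))  # 2: both nodes are required
print(is_quorate(1, 2))    # False: a lone survivor has lost quorum
print(is_quorate(2, 3))    # True: with three votes, one can go missing
```

Two is the awkward size: the threshold equals the total, so there is no room to lose anyone.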

The fix is two-node mode, in which a single surviving node keeps quorum. We also disabled wait-for-all so that pve1 can boot alone after a power cut without waiting for its partner. We confirmed the change in the cluster's live quorum status, where the quorum requirement dropped to one, before trusting it.
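To see how the two settings interact, here is a simplified Python model of the decision for a cluster of exactly two one-vote nodes. The parameter names `two_node` and `wait_for_all` mirror the corosync votequorum options that we assume sit behind these settings. The function itself is hypothetical and far simpler than the real logic.

```python
def two_node_quorate(
    votes_present: int,
    *,
    two_node: bool,
    wait_for_all: bool,
    seen_all_since_boot: bool,
) -> bool:
    """Simplified quorum decision for exactly two nodes with one vote each."""
    expected_votes = 2
    # Two-node mode lowers the requirement to a single vote.
    needed = 1 if two_node else expected_votes // 2 + 1
    # wait-for-all withholds quorum until every node has been seen once since boot.
    if wait_for_all and not seen_all_since_boot:
        return False
    return votes_present >= needed


# Plain majority rule: the survivor of a reboot is stuck.
print(two_node_quorate(1, two_node=False, wait_for_all=False, seen_all_since_boot=True))  # False

# Two-node mode with wait-for-all: pve1 booting alone after a power cut still waits.
print(two_node_quorate(1, two_node=True, wait_for_all=True, seen_all_since_boot=False))   # False

# Two-node mode with wait-for-all disabled: pve1 boots alone and is quorate.
print(two_node_quorate(1, two_node=True, wait_for_all=False, seen_all_since_boot=False))  # True
```

The middle case is why we touched wait-for-all at all: two-node mode alone would not let pve1 boot by itself.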

Backups follow the new node

A machine that runs something but backs up nowhere is a liability, so we extended the backup server to pve2 the same day. Rather than share credentials, pve2 got its own access token and its own isolated namespace on each of the two backup disks, so its future backups can never collide with another node's and the two stores stay cleanly separated.
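The isolation rule is easy to state as a check: no (datastore, namespace) pair may be claimed by more than one node. The Python sketch below assumes each namespace is simply named after its node and uses made-up datastore names. Neither detail comes from our real configuration, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Target = Tuple[str, str]  # (datastore, namespace)


def backup_targets(node: str, datastores: List[str]) -> List[Target]:
    """One target per backup disk, with the namespace named after the node (assumption)."""
    return [(datastore, node) for datastore in datastores]


def find_collisions(plans: Dict[str, List[Target]]) -> List[Target]:
    """Return every (datastore, namespace) pair claimed by more than one node."""
    counts = Counter(pair for targets in plans.values() for pair in targets)
    return sorted(pair for pair, claims in counts.items() if claims > 1)


DATASTORES = ["disk-a", "disk-b"]  # hypothetical names for the two backup disks
plans = {node: backup_targets(node, DATASTORES) for node in ("pve1", "pve2")}

print(plans["pve2"])           # [('disk-a', 'pve2'), ('disk-b', 'pve2')]
print(find_collisions(plans))  # []
```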

One small Proxmox 9 gotcha

The pre-flight check earned its keep: it flagged a custom permission role still referencing a privilege that Proxmox 9 had removed. The stale entry was harmless, but it printed a warning on every container listing until we trimmed it from the role. Exactly the kind of paper cut a major upgrade leaves behind.
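Trimming a role comes down to a set-membership filter. The Python sketch below uses placeholder privilege names throughout, since the privilege that was actually removed is not named here, and `trim_role` is a hypothetical helper rather than anything Proxmox ships.

```python
from typing import Iterable, List, Set, Tuple


def trim_role(privileges: Iterable[str], supported: Set[str]) -> Tuple[List[str], List[str]]:
    """Split a role's privileges into those still supported and those to drop."""
    kept: List[str] = []
    dropped: List[str] = []
    for privilege in privileges:
        (kept if privilege in supported else dropped).append(privilege)
    return kept, dropped


# Placeholder names only; these are not real Proxmox privileges.
supported_now = {"Example.Audit", "Example.Console"}
custom_role = ["Example.Audit", "Example.Removed", "Example.Console"]

kept, dropped = trim_role(custom_role, supported_now)
print(kept)     # ['Example.Audit', 'Example.Console']
print(dropped)  # ['Example.Removed']
```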

Why we bother

This is our own lab, but the discipline is the same one we bring to client platforms: stay on supported versions, keep a second node so maintenance never means downtime, and make sure every machine that runs something also has somewhere to back it up. host1 is gone. Long live pve2.
