Help restoring networking after adding an NVMe drive and upgrading PVE from 7.4 to 8

pengu1n
Apr 19, 2022
Hello.
I've had a new NVMe drive I've wanted to add for a while, waiting until I had the time, since I knew there would be downtime to adjust the newly enumerated devices.
I have two autostarting VMs on this node. The node was on the latest PVE 7.4, and one (and only one) of the VMs has a PCIe GPU passed through. I had read up to prepare and knew I had to adjust both the PCI and network device identifiers. I wanted to do this before any upgrade to PVE 8; one step at a time.
I took screenshots and saved the VMs' config files, disabled autostart and shut down the node, then added the NVMe and started it back up.
I then modified the VM settings in the UI to point at the new PCI device IDs of the GPU and its audio device. The VM would not start, no matter what combination of the All Functions, GPU and ROM options I tried; PCI-Express was enabled as before. The other VM, without passthrough, started fine.
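(For the record, this is roughly how I re-checked the new addresses after the NVMe shifted the PCI enumeration; the grep pattern is just an example for a GPU plus its audio function.)
Code:
# list GPU and audio functions with their new bus addresses and [vendor:device] IDs
lspci -nn | grep -Ei 'vga|audio'
# show details for one device once the new address is known (0b:00.0 in my case)
lspci -v -s 0b:00.0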
I had updated /etc/network/interfaces to reflect the new interface identifier and ended with:

Code:
auto lo
iface lo inet loopback

auto enp8s0
iface enp8s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.5.2/24
        gateway 192.168.5.1
        bridge-ports enp8s0
        bridge-stp off
        bridge-fd 0
Before, the interface was enp4s0.
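(In case it helps anyone else: confirming the new NIC name before editing the file can be done with something like the following; nothing here is specific to my hardware.)
Code:
# interface names with MAC addresses and state
ip -br link show
# which PCI device each interface name maps to
ls -l /sys/class/net/*/device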

After many hours I decided to upgrade PVE in place. The PCI devices to pass through now appear slightly differently, but I have been able to start the VM. However, the network interface in the VM won't get an IP address. Diagnosing is hard for me because the passthrough takes over the GPU and keyboard, so there is no real console.
I appreciate this question might belong in the networking section, but I'm asking here first since it is a result of the upgrade/VM setup.

What happens now is this:
Normal pve boot:
The VM starts and I can interact with the desktop. No network. The interface is up; I can restart it and see its definition, but there are no routes out of it or into it. My DHCPv4 logs at the firewall show no entry. It seems to me traffic can't get out of the PVE Linux bridge. I can't reach the Proxmox UI or SSH into the node (no route to host), so clearly nothing is getting out of the host either.
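(Since in a normal boot the VM's desktop is the only thing I can reach, this is a sketch of what I can try from inside the guest; I'm assuming a guest NIC name like enp6s18, which needs adjusting to whatever the guest actually has.)
Code:
# inside the guest: link state and (missing) address
ip -br addr show
# watch whether DHCP traffic actually leaves the guest
sudo tcpdump -ni enp6s18 port 67 or port 68
# in a second terminal, force a fresh DHCP attempt
# (or use whatever DHCP client the guest actually runs)
sudo dhclient -v enp6s18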

Rescue mode pve boot:
I can use the local console (as the VM hasn't taken over):
The interface and bridge will be down, and after
Code:
#systemctl restart networking
Code:
root@pve:~# ip -br -c link show
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
enp8s0           UP             f0:2f:74:1a:29:ee <BROADCAST,MULTICAST,UP,LOWER_UP>
vmbr0            UP             f0:2f:74:1a:29:ee <BROADCAST,MULTICAST,UP,LOWER_UP>
root@pve:~# ip -br -c addr show
lo               UNKNOWN        127.0.0.1/8 ::1/128
enp8s0           UP             
vmbr0            UP             192.168.5.2/24 fe80::f22f:74ff:fe1a:29ee/64
And the firewall gives the lease, all good.

Here I am stuck. If I then resume the normal boot with
Code:
#systemctl default
then I'm back to the VM having no network, and I'm out of the pve shell.
What I'm looking for is advice on how to diagnose this problem from the pve shell (a rough list of checks I have in mind is at the end of this post).
Also, before I re-enabled autostart, the networking appeared fine in the UI. It reflected the new device, and the node was getting an IP address at which I could reach it. It has a static lease on the router/firewall.
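(These are the checks I have in mind, to see whether enp8s0, and the VM's tap device when it runs, are actually in the bridge, and whether ifupdown logged anything at boot.)
Code:
# is the physical NIC (and the VM's tap device) enslaved to vmbr0?
ip -br link show master vmbr0
bridge link show
# any errors from ifupdown while bringing the interfaces up at boot?
journalctl -b -u networking --no-pager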

Thanks!
 
Thanks, I will do. I can only get the system going by starting in rescue mode, otherwise the autostarting VM takes over the graphics and keyboard. But then I can only start the networking stack; no filesystems are mounted.
Do you know a way I can mount the filesystems without starting the VM? This is what I'm researching online right now. Pointers welcome :) (a sketch of what I plan to try is at the end of this post)
Edit: It's a vanilla node. No containers, no clusters, a single disk partitioned with LVM for /dev/mapper/root and /dev/mapper/data
I'm more of a standard ext4 or ZFS legacy-filesystems guy; with LVM I need to look things up online.
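(What I plan to try from the rescue/live shell, assuming the volume group has the default name pve; the first two commands should confirm the real names before anything is mounted.)
Code:
# discover volume groups and logical volumes
vgscan
lvs -a
# activate the group and mount the root LV read-only
vgchange -ay pve
mkdir -p /mnt/pveroot
mount -o ro /dev/pve/root /mnt/pveroot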
 
Much obliged. If that's OK I'll continue tomorrow morning. It's 23:25 and I've been at it for 10 hours; my back is really hurting and my head is mush.
One more thing: that link is about mounting the VM's disk, which will be useful. Meanwhile, do you know how to mount the Proxmox filesystems so I can get at the VM configs without actually starting the VMs?
 
Of course! Right, I have started a live system and mounted the PVE LVM root. I can't find the VM definitions; I was sure that when the system is running they are in /etc/pve/qemu-server/ or thereabouts. I am unable to mount the data logical volume and am looking for ways; I imagine what I need is there.
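(From what I've read so far, /etc/pve is a FUSE view provided by pmxcfs, so it's empty when that service isn't running; the underlying data appears to live in an SQLite database on the root LV. A sketch of pulling the configs straight out of it, assuming the root LV is mounted at /mnt/pveroot; the table and column names are what I've seen described for pmxcfs and would need verifying.)
Code:
# list the files pmxcfs knows about
sqlite3 /mnt/pveroot/var/lib/pve-cluster/config.db "SELECT name FROM tree;"
# dump one VM config
sqlite3 /mnt/pveroot/var/lib/pve-cluster/config.db "SELECT data FROM tree WHERE name = '100.conf';"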
 
I haven't found the reason why I can't mount the data LV. I get a "Permission denied" message, but I can mount the root LV.
Does anyone have another idea that does not require me to get the current VM config?
P.S. I wish Proxmox didn't use only LVM when many installations don't benefit from it; straight partitioning would work better. For multi-disk servers, yes. For single-disk installations that will never need to expand volumes, it just complicates things, as in this case.
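(One thing I want to rule out: from what I've read, on a default install the data LV is a thin pool that holds the guest disks rather than a filesystem you can mount, which would explain the failure. Checking should be something like this.)
Code:
# lv_attr beginning with 't' = thin pool, 'V' = thin volume inside it
lvs -a -o lv_name,vg_name,lv_attr,pool_lv,lv_size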
 
Some progress: I started Proxmox in rescue mode, but this time on a previous kernel, 5.15.131-2-pve, followed by # systemctl start pveproxy.
With this I can now see the contents of /etc/pve/, which was empty before. I assume starting pveproxy also mounts the backend of this location.
Note to self: I need to check whether this is also possible on the new kernel from the upgrade to Proxmox 8.
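(My current understanding is that it isn't pveproxy itself but its dependency pve-cluster, which runs pmxcfs and mounts /etc/pve, so starting just that service may be enough next time; to be verified on the 6.5 kernel.)
Code:
systemctl start pve-cluster
systemctl status pve-cluster --no-pager
ls /etc/pve/qemu-server/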
Code:
root@pve:~# cat /etc/pve/qemu-server/100.conf
#hostpci0%3A 0000%3A0a%3A00
#
#Dell Mouse%3A ID 413c%3A301a Dell Computer Corp. Dell MS116 Optical Mouse
agent: 1
bios: ovmf
boot: order=scsi0;net0;scsi4
cores: 16
cpu: host
efidisk0: VMs1:100/vm-100-disk-1.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:0b:00,pcie=1,x-vga=1
hostpci1: 0000:0d:00.4,rombar=0
machine: q35
memory: 22528
meta: creation-qemu=6.1.0,ctime=1650560984
name: ubuntu1
net0: virtio=22:7C:11:B8:C2:D7,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: VMs1:100/vm-100-disk-0.qcow2,size=150G
scsi1: /dev/disk/by-id/ata-Samsung_SSD_870_EVO_1TB_S626NF0R943526A,size=976762584K
scsi2: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N2823965,size=2930266584K
scsi3: /dev/disk/by-id/ata-Hitachi_HDP725050GLA360_GEA554RF2GX3SG,size=500106780160
scsi4: cdrom,media=cdrom
scsi5: /dev/disk/by-id/nvme-INTEL_SSDPEKNU010TZ_BTKA1343082D1P0B,size=1000204632K
scsihw: virtio-scsi-pci
smbios1: uuid=ca310e8e-bb1f-48cd-80f6-5f2dfaef5f92
sockets: 1
tablet: 0
tags: homepc
usb0: host=04ca:004f
usb1: host=062a:5918
usb2: host=0c45:6366
usb3: host=04b8:1181
usb4: host=1-1,usb3=1
usb5: host=413c:301a
vmgenid: ac5aa74d-d419-4855-b1f9-9c4c09f91263

Second VM:

Code:
root@pve:~# cat /etc/pve/qemu-server/102.conf
agent: 1
boot: order=scsi0
cores: 2
cpu: host,flags=+aes
machine: q35
memory: 8192
meta: creation-qemu=6.2.0,ctime=1655678224
name: venus
net0: virtio=0A:43:21:84:52:51,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: VMs1:102/vm-102-disk-0.qcow2,discard=on,size=32G
scsi1: /dev/disk/by-id/ata-WDC_WD20EURX-63T0FY0_WD-WCC4M7SVHJ99,size=1953514584K
scsihw: virtio-scsi-pci
smbios1: uuid=ec46b603-47c2-4f90-aabe-bfba6a2bd122
sockets: 1
tablet: 0
tags: zoneminder
vcpus: 2
vmgenid: a72b3094-f25d-4e85-8cb8-a0a95f1d7d11
What do you think? I've disabled autostart of both VMs for now. I've not touched the firewall on the VMs or on Proxmox after the upgrade, but I am unclear whether they are as they were prior to it. I do all my firewalling outside Proxmox, so "enabled" is, I assume, the default; it was never a problem until now (if that's the problem at all).
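(To check the firewall state from the shell rather than the UI, I believe something like this works; host.fw sits under the node's own directory, pve in my case.)
Code:
pve-firewall status
cat /etc/pve/firewall/cluster.fw 2>/dev/null
cat /etc/pve/nodes/pve/host.fw 2>/dev/null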
 
I've now rebooted twice.
VM autostart off.
First time, allowed a normal boot on the new kernel 6.5.11-7-pve. The problem is reproducible: I can't reach the UI or SSH, and the network seems "dead" from the outside; no DHCPv4 lease is requested on the router. Please remember that due to the options in /etc/default/grub I can't interact with the local shell. They are GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset video=vesafb:off video=efifb:off video=simplefb:off"; I needed these to be able to pass through a GPU to VM 100.
Second time, used rescue mode, followed by # systemctl start pveproxy. With this, the network comes alive and I can reach the UI.
I have also disabled the firewall in pve (node) > Firewall > Options (set to "No") and in Datacenter > Firewall > Options (set to "No").

Edit to add networking of the node:
Code:
root@pve:~# cat /etc/network/interfaces
# snipped boilerplate comments
auto lo
iface lo inet loopback

auto enp8s0
iface enp8s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.5.2/24
        gateway 192.168.5.1
        bridge-ports enp8s0
        bridge-stp off
        bridge-fd 0

What can I do to diagnose my problem, please? (My current plan for what to capture is sketched below.)
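(Concretely, this is what I plan to capture from the pve shell after a normal boot and post here; the tap/fwbr/fwln devices are what Proxmox normally creates for a NIC with the firewall option set, if I understand correctly.)
Code:
# with VM 100 running: its tap/fwbr devices and the bridge they sit on
ip -br link | grep -E 'tap|fwbr|fwln'
bridge link show
# does the guest's DHCP discover reach the bridge, and does it leave on the NIC?
tcpdump -c 20 -eni vmbr0 port 67 or port 68
tcpdump -c 20 -eni enp8s0 port 67 or port 68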
 
