I was trying to get some outdated network cards (ConnectX-2) working. In the process, I tried to compile and install Mellanox OFED, which failed.
I am fairly confident that, before it failed, the installer removed packages that were otherwise required, specifically packages relating to networking.
Now I am unable to connect to any of my LVs, iSCSI or otherwise. The iSCSI storage is the most important to me, as it contains the majority of my VMs.
I am hopeful I can recover without needing to reinstall Proxmox. (While this is just a learning machine, not production, I have enough time invested that I would prefer not to start over.) However, if there is a way to preserve the VMs on my iSCSI target, then a fresh install isn't a major issue. I would still like to learn what went wrong, how to prevent it, and how to repair it.
Help!
Here is a snippet from my syslog which appears pertinent:
Nov 12 07:12:00 server systemd[1]: Starting Proxmox VE replication runner...
Nov 12 07:12:01 server systemd[1]: pvesr.service: Succeeded.
Nov 12 07:12:01 server systemd[1]: Started Proxmox VE replication runner.
Nov 12 07:13:00 server systemd[1]: Starting Proxmox VE replication runner...
Nov 12 07:13:01 server systemd[1]: pvesr.service: Succeeded.
Nov 12 07:13:01 server systemd[1]: Started Proxmox VE replication runner.
Nov 12 07:13:10 server kernel: [ 858.785573] INFO: task lvdisplay:3800 blocked for more than 120 seconds.
Nov 12 07:13:10 server kernel: [ 858.785713] Tainted: P O 5.0.15-1-pve #1
Nov 12 07:13:10 server kernel: [ 858.785828] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 12 07:13:10 server kernel: [ 858.785976] lvdisplay D 0 3800 3783 0x80000004
Nov 12 07:13:10 server kernel: [ 858.785992] Call Trace:
Nov 12 07:13:10 server kernel: [ 858.786019] __schedule+0x2d4/0x870
Nov 12 07:13:10 server kernel: [ 858.786026] schedule+0x2c/0x70
Nov 12 07:13:10 server kernel: [ 858.786032] schedule_timeout+0x258/0x360
Nov 12 07:13:10 server kernel: [ 858.786043] ? call_rcu+0x10/0x20
Nov 12 07:13:10 server kernel: [ 858.786054] ? __percpu_ref_switch_mode+0xdb/0x180
Nov 12 07:13:10 server kernel: [ 858.786059] wait_for_completion+0xb7/0x140
Nov 12 07:13:10 server kernel: [ 858.786068] ? wake_up_q+0x80/0x80
Nov 12 07:13:10 server kernel: [ 858.786077] exit_aio+0xeb/0x100
Nov 12 07:13:10 server kernel: [ 858.786086] mmput+0x2b/0x130
Nov 12 07:13:10 server kernel: [ 858.786091] do_exit+0x28a/0xb30
Nov 12 07:13:10 server kernel: [ 858.786095] ? __schedule+0x2dc/0x870
Nov 12 07:13:10 server kernel: [ 858.786098] do_group_exit+0x43/0xb0
Nov 12 07:13:10 server kernel: [ 858.786106] get_signal+0x12e/0x6d0
Nov 12 07:13:10 server kernel: [ 858.786114] ? wait_woken+0x80/0x80
Nov 12 07:13:10 server kernel: [ 858.786124] do_signal+0x34/0x710
Nov 12 07:13:10 server kernel: [ 858.786129] ? do_io_getevents+0x81/0xd0
Nov 12 07:13:10 server kernel: [ 858.786140] exit_to_usermode_loop+0x8e/0x100
Nov 12 07:13:10 server kernel: [ 858.786144] do_syscall_64+0xf0/0x110
Nov 12 07:13:10 server kernel: [ 858.786149] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 12 07:13:10 server kernel: [ 858.786156] RIP: 0033:0x7fb20b2a5f59
Nov 12 07:13:10 server kernel: [ 858.786168] Code: Bad RIP value.
Nov 12 07:13:10 server kernel: [ 858.786170] RSP: 002b:00007ffd05fee078 EFLAGS: 00000246 ORIG_RAX: 00000000000000d0
Nov 12 07:13:10 server kernel: [ 858.786173] RAX: fffffffffffffffc RBX: 00007fb20af46700 RCX: 00007fb20b2a5f59
Nov 12 07:13:10 server kernel: [ 858.786175] RDX: 0000000000000040 RSI: 0000000000000001 RDI: 00007fb20b900000
Nov 12 07:13:10 server kernel: [ 858.786177] RBP: 00007fb20b900000 R08: 0000000000000000 R09: 0000000205fee100
Nov 12 07:13:10 server kernel: [ 858.786179] R10: 00007ffd05fee100 R11: 0000000000000246 R12: 0000000000000001
Nov 12 07:13:10 server kernel: [ 858.786181] R13: 0000000000000000 R14: 0000000000000040 R15: 00007ffd05fee100
Nov 12 07:13:51 server systemd[1]: Starting Cleanup of Temporary Directories...
Nov 12 07:13:51 server systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Nov 12 07:13:51 server systemd[1]: Started Cleanup of Temporary Directories.
Nov 12 07:14:00 server systemd[1]: Starting Proxmox VE replication runner...
Nov 12 07:14:01 server systemd[1]: pvesr.service: Succeeded.
Nov 12 07:14:01 server systemd[1]: Started Proxmox VE replication runner.
Nov 12 07:14:35 server postfix/qmgr[2555]: 8FEB13A0648: from=<>, size=2874, nrcpt=1 (queue active)
Nov 12 07:14:35 server postfix/qmgr[2555]: D47043A1014: from=<root@server.joshuawest.ca>, size=732, nrcpt=1 (queue active)
Nov 12 07:14:35 server postfix/local[4816]: error: open database /etc/aliases.db: No such file or directory
Nov 12 07:14:35 server postfix/local[4816]: warning: hash:/etc/aliases is unavailable. open database /etc/aliases.db: No such file or directory
Nov 12 07:14:35 server postfix/local[4816]: error: open database /etc/aliases.db: No such file or directory
Nov 12 07:14:35 server postfix/local[4816]: warning: hash:/etc/aliases is unavailable. open database /etc/aliases.db: No such file or directory
Nov 12 07:14:35 server postfix/local[4816]: warning: hash:/etc/aliases: lookup of 'root' failed
Nov 12 07:14:35 server postfix/local[4819]: error: open database /etc/aliases.db: No such file or directory
Nov 12 07:14:35 server postfix/local[4819]: warning: hash:/etc/aliases is unavailable. open database /etc/aliases.db: No such file or directory
Nov 12 07:14:35 server postfix/local[4819]: warning: hash:/etc/aliases: lookup of 'root' failed
Nov 12 07:14:35 server postfix/local[4816]: 8FEB13A0648: to=<root@server.joshuawest.ca>, relay=local, delay=91906, delays=91906/0.08/0/0.04, dsn=4.3.0, status=deferred (alias database unavailable)
Nov 12 07:14:35 server postfix/local[4816]: using backwards-compatible default setting relay_domains=$mydestination to update fast-flush logfile for domain "server.joshuawest.ca"
Nov 12 07:14:35 server postfix/local[4819]: D47043A1014: to=<root@server.joshuawest.ca>, orig_to=<root>, relay=local, delay=262174, delays=262174/0.06/0/0.05, dsn=4.3.0, status=deferred (alias database unavailable)
Nov 12 07:14:35 server postfix/local[4819]: using backwards-compatible default setting relay_domains=$mydestination to update fast-flush logfile for domain "server.joshuawest.ca"
Nov 12 07:15:00 server systemd[1]: Starting Proxmox VE replication runner...
Nov 12 07:15:01 server systemd[1]: pvesr.service: Succeeded.
Nov 12 07:15:01 server systemd[1]: Started Proxmox VE replication runner.
Nov 12 07:15:01 server CRON[4859]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 12 07:15:10 server kernel: [ 979.615650] INFO: task lvdisplay:3800 blocked for more than 120 seconds.
Nov 12 07:15:10 server kernel: [ 979.615789] Tainted: P O 5.0.15-1-pve #1
Nov 12 07:15:10 server kernel: [ 979.615908] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 12 07:15:10 server kernel: [ 979.616054] lvdisplay D 0 3800 3783 0x80000004
Nov 12 07:15:10 server kernel: [ 979.616069] Call Trace:
Nov 12 07:15:10 server kernel: [ 979.616098] __schedule+0x2d4/0x870
Nov 12 07:15:10 server kernel: [ 979.616104] schedule+0x2c/0x70
Nov 12 07:15:10 server kernel: [ 979.616111] schedule_timeout+0x258/0x360
Nov 12 07:15:10 server kernel: [ 979.616132] ? call_rcu+0x10/0x20
Nov 12 07:15:10 server kernel: [ 979.616142] ? __percpu_ref_switch_mode+0xdb/0x180
Nov 12 07:15:10 server kernel: [ 979.616147] wait_for_completion+0xb7/0x140
Nov 12 07:15:10 server kernel: [ 979.616157] ? wake_up_q+0x80/0x80
Nov 12 07:15:10 server kernel: [ 979.616170] exit_aio+0xeb/0x100
Nov 12 07:15:10 server kernel: [ 979.616180] mmput+0x2b/0x130
Nov 12 07:15:10 server kernel: [ 979.616184] do_exit+0x28a/0xb30
Nov 12 07:15:10 server kernel: [ 979.616189] ? __schedule+0x2dc/0x870
Nov 12 07:15:10 server kernel: [ 979.616192] do_group_exit+0x43/0xb0
Nov 12 07:15:10 server kernel: [ 979.616200] get_signal+0x12e/0x6d0
Nov 12 07:15:10 server kernel: [ 979.616210] ? wait_woken+0x80/0x80
Nov 12 07:15:10 server kernel: [ 979.616220] do_signal+0x34/0x710
Nov 12 07:15:10 server kernel: [ 979.616225] ? do_io_getevents+0x81/0xd0
Nov 12 07:15:10 server kernel: [ 979.616236] exit_to_usermode_loop+0x8e/0x100
Nov 12 07:15:10 server kernel: [ 979.616239] do_syscall_64+0xf0/0x110
Nov 12 07:15:10 server kernel: [ 979.616244] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 12 07:15:10 server kernel: [ 979.616251] RIP: 0033:0x7fb20b2a5f59
Nov 12 07:15:10 server kernel: [ 979.616263] Code: Bad RIP value.
Nov 12 07:15:10 server kernel: [ 979.616265] RSP: 002b:00007ffd05fee078 EFLAGS: 00000246 ORIG_RAX: 00000000000000d0
Nov 12 07:15:10 server kernel: [ 979.616269] RAX: fffffffffffffffc RBX: 00007fb20af46700 RCX: 00007fb20b2a5f59
Nov 12 07:15:10 server kernel: [ 979.616271] RDX: 0000000000000040 RSI: 0000000000000001 RDI: 00007fb20b900000
Nov 12 07:15:10 server kernel: [ 979.616273] RBP: 00007fb20b900000 R08: 0000000000000000 R09: 0000000205fee100
Nov 12 07:15:10 server kernel: [ 979.616275] R10: 00007ffd05fee100 R11: 0000000000000246 R12: 0000000000000001
Nov 12 07:15:10 server kernel: [ 979.616276] R13: 0000000000000000 R14: 0000000000000040 R15: 00007ffd05fee100