Hi. My Proxmox hypervisor reports a huge spike in disk writes, up to 200 PB per second (which obviously isn't possible), and then CPU usage on my SAN shoots up to 100% and the SAN stays unresponsive until the entire system is rebooted. I need to know who to point fingers at. Is this some glitch in Proxmox that causes my SAN to freak out, or vice versa? If the SAN were just crashing by itself, wouldn't the Proxmox graphs show no such I/O spike? I have a screenshot attached.
Here is what I have from the /var/log/messages file:
Code:
Aug 14 05:11:54 proxmox1 kernel: lost page write due to I/O error on dm-6 (this one is higher up in the log)
Aug 14 05:12:02 proxmox1 kernel: __ratelimit: 991 callbacks suppressed
Aug 14 05:12:02 proxmox1 kernel: lost page write due to I/O error on dm-3 (a lot more of these higher up in the log)
Aug 14 09:39:52 proxmox1 kernel: lost page write due to I/O error on dm-3 (some of these)
Aug 14 10:47:14 proxmox1 kernel: __ratelimit: 1040 callbacks suppressed ( A lot of these)
Aug 14 11:11:55 proxmox1 kernel: vmbr0: port 3(tap103i0) entering disabled state
Aug 14 11:11:55 proxmox1 kernel: vmbr0: port 3(tap103i0) entering disabled state
Aug 14 11:12:14 proxmox1 kernel: vmbr0: port 4(tap106i0) entering disabled state
Aug 14 11:12:14 proxmox1 kernel: vmbr0: port 4(tap106i0) entering disabled state
Aug 14 11:12:17 proxmox1 kernel: device tap103i0 entered promiscuous mode
Aug 14 11:12:17 proxmox1 kernel: HTB: quantum of class 10001 is big. Consider r2q change.
Aug 14 11:12:17 proxmox1 kernel: vmbr0: port 3(tap103i0) entering forwarding state
Aug 14 11:12:54 proxmox1 kernel: device tap106i0 entered promiscuous mode
Aug 14 11:12:54 proxmox1 kernel: vmbr0: port 4(tap106i0) entering forwarding state
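To see which device-mapper devices are affected and how often, the "lost page write" lines can be tallied per device. The following is only a rough sketch, assuming the log format matches the lines quoted above; the function names are made up for illustration. Because the kernel rate-limits these messages (the `__ratelimit` lines), the counts are a lower bound, not the true number of failed writes.

```python
import re
from collections import Counter

# Pattern assumed from the kernel messages quoted above.
IO_ERROR = re.compile(r"lost page write due to I/O error on (dm-\d+)")


def count_io_errors(lines):
    """Count 'lost page write' kernel messages per device-mapper device."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_io_errors_in_file(path="/var/log/messages"):
    """Convenience wrapper: read a log file and tally errors per device."""
    with open(path, errors="replace") as log:
        return count_io_errors(log)
```

Calling `count_io_errors_in_file().most_common()` on the hypervisor would list the devices with the most logged errors first. The `dm-3` and `dm-6` names can then be matched to logical volumes with `ls -l /dev/mapper` or `dmsetup ls`, which shows whether the failing writes are on SAN-backed storage.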
Where should I be looking to make sure this does not happen again? I am happy to provide as much server info as I can and appreciate any help in solving this urgent matter.
EDIT: The SAN messages log shows nothing unusual. I believe there is some Proxmox issue that randomly sends the system out of control (this has happened a few times over the course of a few months).
EDIT2: I ran an automated backup yesterday for the first time. It proceeded fine at first, and then the system crashed, maybe while backing up one of the later VMs, or maybe right when the backup finished? All the backups appear to be intact...