Nagios VM IO wait problem

alanspa

Member
Apr 4, 2022
Good morning,
I have a Nagios VM and unfortunately the database keeps getting corrupted.
The system works but some features crash every now and then.

I opened a ticket with Nagios support, and they told me that the problem is related to I/O wait times and that I should contact Proxmox support.

What information do you need to work out where the problem is?

OS version
CentOS Stream release 9

Thank you
Regards
 
Have you already resolved the problem? If not, it would help to see how the VM is set up (qm config <vmid>) and how your storage is set up (cat /etc/pve/storage.cfg). Also check the output of cat /proc/pressure/io while the VM is running. That could help to identify the problem and show how it could be resolved.
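
A single cat only shows one moment in time, so it can be useful to sample the pressure file for a while. A minimal sketch, assuming the usual PSI line format (some avg10=... avg60=... avg300=... total=...); the function name psi_avg10 is made up for illustration:

```shell
#!/usr/bin/env bash
# Print the 10-second I/O pressure averages from a PSI file.
# Each PSI line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
psi_avg10() {
    local file="${1:-/proc/pressure/io}"
    # $1 is "some" or "full", $2 is "avg10=<value>"
    awk '{ split($2, kv, "="); print $1, kv[2] }' "$file"
}

# Sample once per second while the VM is under load (stop with Ctrl+C):
#   while sleep 1; do date +%T; psi_avg10; done
```

Run it on the Proxmox host while the database is busy. The "some" value is the share of time in which at least one task was stalled on I/O, and "full" is the share in which all non-idle tasks were stalled.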
 
Hello,
The problem persists. After a while the database crashes.

Nagios support confirms that it is an I/O wait problem.

The requested output is attached.

Thanks for your support
 

Attachments

  • Nagios.txt (3.7 KB)
Okay, so to recap: the Nagios XI VM is running on a Ceph pool, and the pool's name suggests that it runs on SSDs. How did you set up your Ceph cluster, how many nodes does it have (e.g. post the output of cat /etc/pve/ceph.conf and ceph -s), and what hardware exactly is it running on? It could very well be that some latency is introduced there.
 
Also there are two other things that I have noticed:
  1. Do you take any snapshots or backups of that VM? If so, what is the backup target? It could very well be that the database gets corrupted because of the way the backups are done. You could try to enable fleecing images [0] to counter iowait issues that are caused by QEMU's copy-before-write filter on guest disks.
  2. I have noticed that you use "ide0" for the VM disk image. You could try to convert the disk to SCSI, as the QEMU driver there is better supported and maintained and usually does better in I/O benchmarks.
[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_vm_backup_fleecing
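
For point 1, a one-off backup with fleecing can be sketched roughly as below. This assumes a Proxmox VE release recent enough to support the --fleecing option of vzdump; the function name and all three arguments are placeholders for your own VM ID, backup target, and a fast local storage for the fleecing image:

```shell
#!/usr/bin/env bash
# Run a one-off snapshot-mode backup of a single VM with a fleecing image.
# Arguments (all site-specific placeholders):
#   $1 = VM ID, $2 = backup target storage, $3 = storage for the fleecing image
backup_with_fleecing() {
    local vmid="$1" target="$2" fleecing_storage="$3"
    vzdump "$vmid" --storage "$target" --mode snapshot \
        --fleecing "enabled=1,storage=${fleecing_storage}"
}

# Example call (values are made up):
#   backup_with_fleecing 100 my-backup-storage local-lvm
```

For scheduled jobs the same setting should be available in the backup job's advanced options; see the guide linked in [0] for details.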
 
The hardware is adequate.
The machine is the smallest one running on the cluster.

1) Yes, I back up the VM daily, and I always have.

2) Regarding the type of disk, perhaps you are right: the machine was recently rebuilt from scratch because we had to update the operating system.

We downloaded the VMware image from the Nagios site and adapted it to Proxmox.

Perhaps the disk configuration is not optimal.

Remind me how to convert from IDE to SCSI please?

Maybe the previous VM had no problems because the disk was SCSI.
Or perhaps the VM would not start with the disk configured as SCSI. I don't remember.

I can give it a try.

Thank you
 
Remind me how to convert from IDE to SCSI please?
You can convert a disk from IDE to SCSI by shutting down the VM, detaching the disk at ide0 and reattaching the now unused disk at scsi0. You could also change the SCSI controller to "VirtIO SCSI single". If the VM doesn't find the boot device, add scsi0 to the VM's boot order option and enable it.

In my experience a Linux guest should adapt fine to these changes, but to be safe, back up the VM beforehand.
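
On the command line, the steps above can be sketched roughly as follows. This is only a sketch under assumptions: the function name is made up, the volume ID must be copied from the ide0 line of qm config <vmid>, and the example ID shown in the comment is hypothetical:

```shell
#!/usr/bin/env bash
# Move a VM's disk from ide0 to scsi0.
# Arguments:
#   $1 = VM ID
#   $2 = volume ID from the ide0 line of `qm config`
#        (e.g. "ceph-ssd:vm-100-disk-0", a made-up value)
ide_to_scsi() {
    local vmid="$1" volume="$2"
    qm shutdown "$vmid" &&                          # clean guest shutdown
    qm set "$vmid" --delete ide0 &&                 # detach; the volume should then show up as unusedN
    qm set "$vmid" --scsihw virtio-scsi-single &&   # optional controller change
    qm set "$vmid" --scsi0 "$volume" &&             # reattach the same volume as scsi0
    qm set "$vmid" --boot order=scsi0               # boot from the new device
}
```

Check qm config <vmid> after the detach step to confirm that the volume is listed as unused before reattaching it. The same changes can be made in the GUI under the VM's Hardware and Options tabs.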
 
I remembered correctly: the VM starts in emergency mode when the disk is attached as SCSI.

Any suggestions?

Thank you
 
I confirm, that was our discussion. See the attached screenshot. How can I collect logs in emergency mode?
 

Attachments

  • Screenshot 2024-10-30 153428.png (18.7 KB)
