bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

spirit

Apr 2, 2010
www.groupe-cyllene.com
Hi,

I have a Proxmox cluster with 3 hosts and around 100 VMs.

The web interface becomes laggy when it contacts the hosts to retrieve VM information (via SOAP).

Looking at the master host, I see 2 pvedaemon processes at 100% CPU:

Code:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
33347 root      20   0 92400  27m 2068 R  100  0.0   2:30.83 pvedaemon
33557 root      20   0 91496  26m 2068 R  100  0.0   1:07.65 pvedaemon

Is it a bug, or is it because I have too many VMs for the current Proxmox cluster implementation?
 
Re: bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

It seems to appear only when I'm using the web interface, mainly when I browse some VM configs.

I rechecked just now and it seems to be back to normal. I checked the log:


/var/log/daemon.log:Mar 25 09:35:39 kvm1 pvedaemon[7097]: worker 33347 started
/var/log/daemon.log:Mar 25 09:48:59 kvm1 pvedaemon[33347]: WARNING: Cannot encode 'meminfo' element as 'hash'. Will be encoded as 'map' instead
/var/log/daemon.log:Mar 25 09:50:29 kvm1 pvedaemon[33347]: WARNING: Cannot encode 'meminfo' element as 'hash'. Will be encoded as 'map' instead
/var/log/daemon.log:Mar 25 09:50:43 kvm1 pvedaemon[7097]: worker 33347 finished


So it ran for about 15 minutes (from 09:35:39 to 09:50:43).
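A small Perl sketch for measuring this from the log instead of reading the timestamps by eye: it pairs the `worker N started` / `worker N finished` lines written by the pvedaemon parent process and returns each worker's lifetime in seconds. The line format is taken from the excerpt above; the function name `worker_lifetimes` and the fixed year are assumptions for illustration (syslog timestamps carry no year, and only the differences matter).

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
           Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: given syslog lines, return { worker_pid => seconds }
# for every pvedaemon worker that has both a "started" and a "finished" line.
# A fixed year is assumed; timegm() keeps DST out of the differences.
sub worker_lifetimes {
    my (@lines) = @_;
    my (%started, %lifetime);

    for my $line (@lines) {
        next unless $line =~ /\b([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\b.*pvedaemon\[\d+\]: worker (\d+) (started|finished)/;
        my ($mon, $day, $h, $m, $s, $pid, $event) = ($1, $2, $3, $4, $5, $6, $7);
        next unless exists $MON{$mon};

        my $t = timegm($s, $m, $h, $day, $MON{$mon}, 2010);
        if ($event eq 'started') {
            $started{$pid} = $t;
        } elsif (exists $started{$pid}) {
            $lifetime{$pid} = $t - $started{$pid};
        }
    }
    return \%lifetime;
}

# Usage (path as in the thread):
# open(my $fh, '<', '/var/log/daemon.log') or die "daemon.log: $!";
# my $lifetimes = worker_lifetimes(<$fh>);
# printf "worker %d: %d s\n", $_, $lifetimes->{$_} for sort keys %$lifetimes;
```

Only the started/finished pair matters; with the lines above, worker 33347 comes out at 904 seconds (15 min 4 s).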

Can I get detailed pvedaemon logs somewhere?

I'm using iSCSI LUNs for the disks. Maybe it tries to scan something when I list the hardware config of a VM?
 
Re: bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

/var/log/daemon.log:Mar 25 09:48:59 kvm1 pvedaemon[33347]: WARNING: Cannot encode 'meminfo' element as 'hash'. Will be encoded as 'map' instead

You can simply ignore that warning.

Can I get detailed pvedaemon logs somewhere?

All logs are in syslog.

I'm using iSCSI LUNs for the disks. Maybe it tries to scan something when I list the hardware config of a VM?

Yes (it checks the disk usage).
 
Re: bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

OK, I'll try to gather more information and reproduce the problem.

The second process also ran for about 15 minutes:

Mar 25 09:37:11 kvm1 pvedaemon[7097]: worker 33557 started
Mar 25 09:48:58 kvm1 pvedaemon[33557]: WARNING: Cannot encode 'meminfo' element as 'hash'. Will be encoded as 'map' instead
Mar 25 09:50:29 kvm1 pvedaemon[33557]: WARNING: Cannot encode 'meminfo' element as 'hash'. Will be encoded as 'map' instead
Mar 25 09:52:54 kvm1 pvedaemon[7097]: worker 33557 finished

Could you tell me where in the code the "Cannot encode 'meminfo' element as 'hash'" warning is produced?
The worker runs for about 11 minutes before these warnings, and finishes shortly after them.
So I suppose it does something else before those warnings, and that is what uses so much CPU...
 
Re: bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

Could you tell me where in the code the "Cannot encode 'meminfo' element as 'hash'" warning is produced?

No, because that is somewhere deep inside the SOAP framework.
 
Re: bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

Hi dietmar, some news concerning that bug.

I tried on a small standalone server (dual-core 1.6 GHz, no cluster) with 10 VMs shut down, 1 running, and iSCSI LUNs attached.

pvedaemon goes to 100% when I'm on the page
https://server1.xxx.com/vmlist/index.htm
while this AJAX query, which refreshes the VM list status info, is running:
https://server1.xxx.com/ws/vzlist?cid=0

I have identified where it's waiting,
in Cluster.pm:

Code:
sub vzlist_update {
    my ($cid, $ticket) = @_;
    
    my $cinfo = clusterinfo ();

    my $vzlist;

    my $cvzl;

    my $conn = PVE::ConfigClient::connect ($ticket);

    my $ni;
    if (($ni = $cinfo->{"CID_$cid"})) {
        my $rcon = PVE::ConfigClient::connect ($ticket, $cinfo, $cid);
        $vzlist = $rcon->vzlist()->result;    # <-- this line is the problem
    }

    if ($vzlist) {
        $cvzl = $conn->cluster_vzlist($cid, $vzlist)->result;
    }

    return $cvzl;
}


If I close my browser, pvedaemon seems to keep running (no timeout?).

If I open 2 sessions on the web admin, I can saturate the dual core (2 AJAX queries, 100% CPU on each core).
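One way to bound such a request, sketched in plain Perl and not taken from the pvedaemon code: wrap the expensive call in `alarm` inside an `eval`, the idiom described in perlipc. `run_with_timeout` is a hypothetical helper name. This only limits how long the work may run; it does not detect that the browser went away, and if the daemon already uses `alarm` elsewhere the two would interfere.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, but give up after $seconds via SIGALRM.
# Perl delivers the signal between opcodes ("deferred signals"), so a
# pure-Perl loop such as the one in find_stable_path can be interrupted.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my @result;

    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        @result = $code->();
        alarm(0);               # finished in time: cancel the alarm
        1;
    };
    alarm(0);                   # also cancel it if $code died on its own
    die $@ unless $ok;          # re-throw "timeout\n" or $code's own error

    return wantarray ? @result : $result[0];
}

# Usage: a request handler would catch the timeout and return an error.
my $status = eval { run_with_timeout(1, sub { sleep 5; 'done' }) };
print defined $status ? "finished: $status\n" : "gave up: $@";
```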


Do you have an idea ?
 
Re: bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

Going further:


Qemu.pm
--------------
Code:
sub vmlist {

    my $res = {};

    my $list =  PVE::QemuServer::vzlist();
    my ($uptime) = PVE::Utils::read_proc_uptime();

    foreach my $veid (keys %$list) {
        my $fn = PVE::QemuServer::config_file ($veid);
        my $conf = PVE::Config::read_file ($fn);    # ---> blocks here
        ...

So maybe it's a lock problem or something like that?
 
Re: bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

Going further:

Config.pm
---------
Code:
sub read_file {
    my ($filename, $full) = @_;

    my $parser;

    if ($filename =~ m|^/etc/qemu-server/\d+\.conf$|) {
        $parser = \&read_qmconfig;    # -> blocks here
        ...


QemuServer.pm
-------------
Code:
sub parse_config {
    ...
    my $di = load_diskinfo ($storecfg, $vmid, $res);    # <- blocks in here

    foreach my $ds (keys %$di) {
        my $size = $di->{$ds}->{disksize};
        if ($res->{bootdisk} && ($ds eq $res->{bootdisk}) && $size) {
            $disksize = $size;
            $disktype = $di->{$ds}->{interface};
        }
    }

    $res->{disksize} = $disksize;
    $res->{disktype} = $disktype;
    $res->{diskinfo} = $di;
    ...


So it seems to be related to parsing the iSCSI LUNs...
 
Re: bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

Code:
sub load_diskinfo {
    ...
    eval {
        my $dl = PVE::Storage::vdisk_list ($storecfg, undef, $vmid, $vollist);    # ----> blocks here
        ...

Storage.pm
-----------
Code:
sub vdisk_list {
    ...
    my $iscsi_devices = iscsi_device_list() if $stypes->{iscsi};    # -----> blocks here
    ...

Code:
sub iscsi_device_list {
    ...
    my $blockdev = find_dev_by_id ("/dev/$bdev");    # ----> blocks here
    ...

Code:
sub find_dev_by_id {
    my $bdev = shift;

    my ($full_path, $name) = find_stable_path ("/dev/disk/by-id", $bdev);    # ---> blocks here

    return $name;
}

Code:
sub find_stable_path {
    my ($stabledir, $bdev) = @_;

    my $dh = IO::Dir->new ($stabledir);
    if ($dh) {
        while (defined(my $tmp = $dh->read)) {    # <--- blocks in this loop
            my $path = "$stabledir/$tmp";

            if (link_points_to ($path, $bdev)) {
                return wantarray ? ($path, $tmp) : $path;
            }
        }

        $dh->close;
    }

    return wantarray ? () : undef;
}


So it's blocking here, in find_stable_path, in the while loop.

Maybe it's a concurrency problem?
 
Re: bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

It seems to loop indefinitely.

I added a syslog call on each iteration, and after 3 minutes (with the loop still not ending) I had 1,635,179 iterations...

In /dev/disk/by-id I have 1000 symlinks to iSCSI disks and their partitions.
 
Re: bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

With more syslog output, it seems that iscsi_device_list is called for each disk of each VM in the list.

Each call to iscsi_device_list takes around 20 seconds, because it parses all LUNs every time (I have 100 LUNs + multipath).

So maybe a cache of LUNs and paths somewhere could be useful...
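A minimal sketch of such a cache, assuming the expensive part can be wrapped in a sub that takes no arguments (for example, the full LUN scan): its result is reused for `$ttl` seconds. `make_cached` and the 30-second TTL are illustrative assumptions, not existing PVE code. Since every pvedaemon worker is a separate process, such a cache would live per worker, and the TTL keeps it from serving a stale device list forever after LUNs are added or removed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PVE): wrap an expensive sub that takes no
# arguments and reuse its result for $ttl seconds.
sub make_cached {
    my ($code, $ttl) = @_;
    my ($value, $fetched_at);

    return sub {
        my $now = time();
        if (!defined($fetched_at) || $now - $fetched_at >= $ttl) {
            $value      = $code->();    # the expensive scan only runs here
            $fetched_at = $now;
        }
        return $value;
    };
}

# Usage: a stand-in "scan" that counts how often it really runs.
my $scans = 0;
my $device_list = make_cached(sub { $scans++; return { sdb => 'lun0' } }, 30);

# 100 lookups (e.g. one per disk of every VM in the list) share one scan.
$device_list->() for 1 .. 100;
print "scans: $scans\n";
```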
 
Re: bug? proxmox cluster : 2 pvedaemon 100% cpu (2core) on master and web interface lag

Hi dietmar,

I wrote a patch last night ;) (couldn't sleep ^_^).

With 200 LUNs, it now goes from 3 minutes down to 1 second.

Basically, it parses /dev/disk/by-id only once, at the beginning of the iscsi_device_list() sub,
plus some optimisations (the stat() call in the link_points_to() sub was really slow).
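A sketch of the "parse /dev/disk/by-id only once" idea (not the actual patch, which isn't shown here): read the directory a single time and build a hash from each link target's name to the link name, using `readlink` instead of a `stat()` per entry. It assumes, as udev normally creates them, that the by-id entries are relative symlinks such as `../../sdb`; `build_stable_index` is a hypothetical name.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: scan a directory of stable device links once and map
# each link target's basename ("sdb", "sdc1", ...) to a link name.
# Assumes relative udev-style symlinks such as "scsi-XYZ -> ../../sdb".
sub build_stable_index {
    my ($stabledir) = @_;
    my %index;

    opendir(my $dh, $stabledir) or return \%index;   # missing dir: empty index
    while (defined(my $entry = readdir($dh))) {
        next if $entry eq '.' || $entry eq '..';

        # readlink only reads the link text; no stat() of the target device.
        my $target = readlink("$stabledir/$entry");
        next unless defined $target;                  # not a symlink

        # Keep the first link seen for each device, mirroring the early
        # return in find_stable_path (order is whatever readdir yields).
        $index{ basename($target) } //= $entry;
    }
    closedir($dh);

    return \%index;
}
```

Built once per iscsi_device_list() run, each `find_dev_by_id("/dev/$bdev")` lookup could then become a single hash access, `$index->{$bdev}`, instead of a fresh directory scan.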


I'll send it to you by email today for review.

Regards,

SPiRiT
 
