Hello everybody,
I'm new to PROXMOX and to Linux/iSCSI/multipath in general. I'm trying to learn, so I'm asking for your suggestions on how to tune my system, or on a different approach altogether if you have better ideas. It will be a long post...
I'm trying to set up PROXMOX on a server machine to run Windows Server 2003 guests. Each guest runs a video surveillance application that takes live H.264/MPEG4 unicast streams and redirects them to clients as multicast, and at the same time takes MJPEG streams and writes them to disk for recording. The "disk" is a SAN.
The server is a Dell PowerEdge R720 with 2 x six-core CPUs, 32 GB RAM, and one quad-port Gigabit Ethernet NIC (4 x 1 Gbit ports).
The SAN is an Enhance ES3160P4 with 4 x 1 Gbit ports x 2 controllers, and is accessed through iSCSI.
On the SAN, each controller's 4 x 1 Gbit ports are aggregated into a LACP link, which is attached to a Juniper Virtual Chassis with LACP enabled on the switch side.
On the server I set up 2 separate networks: the "stream network" uses two physical interfaces configured as an LACP bond (with the matching configuration on the switches), while the "SAN network" uses the remaining 2 interfaces, each configured separately with its own IP address on the "SAN network".
For the "stream network" I set up a virtual bridge on top of the bond, which is then used to connect the W2k3 VMs to the "stream network" itself; a sketch of the relevant network configuration is below.
On the "SAN network" I set up iSCSI and multipath and got that every LUN of the SAN is seen through 4 paths (2 interfaces x 2 controllers) with this default multipath configuration:
defaults {
        polling_interval        2
        path_selector           "queue-length 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        rr_min_io               1000
        failback                immediate
        no_path_retry           fail
}
I modified the timeouts in the configuration files of the iSCSI targets as suggested in the PROXMOX documentation.
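Concretely, what I changed in /etc/iscsi/iscsid.conf follows the multipath page of the PROXMOX wiki; the two settings below are the ones I remember touching (quoted from memory, so treat them as approximate):

node.startup = automatic
node.session.timeo.replacement_timeout = 15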
Using multipath aliases, the devices show up in /dev/mapper/; with pvcreate, vgcreate and lvcreate I set up one PV/VG/LV per LUN, created an EXT3 filesystem on each LV, and mounted each one on a folder under /mnt/.
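As an example, this is the pattern I followed for each LUN (the alias vol-storage-231 is one of the real ones; the VG/LV names and mount point are just placeholders):

pvcreate /dev/mapper/vol-storage-231
vgcreate vg_storage_231 /dev/mapper/vol-storage-231
lvcreate -l 100%FREE -n lv_storage_231 vg_storage_231
mkfs.ext3 /dev/vg_storage_231/lv_storage_231
mkdir -p /mnt/storage-231
mount /dev/vg_storage_231/lv_storage_231 /mnt/storage-231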
Then, in each of these mount folders (each one added as a directory storage, so it appears like "local" storage in the PROXMOX web interface) I created a virtual disk, which is attached to a single W2k3 machine as a VIRTIO disk and is used only for the recording part of the software.
That is, each VM sees two virtual disks, one for the system and one as the "recorder", and the recorder disk lives on a mount that corresponds to a LUN of the SAN. Each VM also sees one VIRTIO network card, which is connected to the "stream network" through the aforementioned virtual bridge; a sketch of the relevant VM configuration lines is below.
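For reference, the relevant lines of one VM's configuration look roughly like this (VM ID, storage names, disk format and MAC address are made up for the example):

# system disk on local storage, recorder disk on the directory storage that maps to a SAN LUN
virtio0: local:101/vm-101-disk-1.raw
virtio1: storage-231:101/vm-101-disk-2.raw
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0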
If I run only one VM (which handles 10 digital video cameras), everything goes fine. A single VM needs to write approximately 8 MByte/s to the SAN.
If I start more VMs, the IO delay of the server starts to increase (reaching 5-10% with 8 VMs) and the recording part of the software run by the VMs becomes very slow and basically unusable, while the live part keeps working fine.
Looking at the statistics for the physical links on the switch, I noticed that both links of the "SAN network" carry traffic and that it is balanced between the two links, as expected. However, the total traffic going out of the server towards the SAN never exceeds about 300 Mbit/s (about 150 Mbit/s per link). Starting the 8 VMs one after the other, the total traffic increases and then saturates at this threshold once 4-5 VMs are running. It seems to me that there is some sort of bottleneck between the server and the SAN, maybe due to a suboptimal configuration... and that is why I am here asking for your suggestions.
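A rough back-of-the-envelope check of the numbers (write load only, ignoring iSCSI/TCP overhead):

1 VM  ~  8 MByte/s ~  64 Mbit/s of writes
8 VMs ~ 64 MByte/s ~ 512 Mbit/s expected
observed ceiling ~ 300 Mbit/s, i.e. roughly what 4-5 VMs generate (4-5 x 64 Mbit/s = 256-320 Mbit/s)

So the two 1 Gbit SAN links themselves should be far from saturated, yet the throughput stops growing around 300 Mbit/s.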
I also had a look at the syslogs of the server, and I found many repeating errors regarding multipath that I can't interpret, like (as examples):
Jul 8 16:31:56 vmcluster-tvcc01 multipathd: vol-storage-236: sdq - directio checker is waiting on aio
Jul 8 16:31:56 vmcluster-tvcc01 multipathd: vol-storage-231: sdu - directio checker is waiting on aio
Jul 8 16:31:56 vmcluster-tvcc01 multipathd: vol-vm-base: sdy - directio checker is waiting on aio
Jul 8 16:31:58 vmcluster-tvcc01 multipathd: vol-storage-237: sde - directio checker is waiting on aio
.............
Jul 8 16:35:37 vmcluster-tvcc01 kernel: session20: session recovery timed out after 15 secs
Jul 8 16:35:37 vmcluster-tvcc01 multipathd: vol-storage-235: sdk - directio checker is waiting on aio
Jul 8 16:35:37 vmcluster-tvcc01 multipathd: checker failed path 8:160 in map vol-storage-235
Jul 8 16:35:37 vmcluster-tvcc01 multipathd: vol-storage-235: remaining active paths: 3
Jul 8 16:35:37 vmcluster-tvcc01 kernel: device-mapper: multipath: Failing path 8:160.
.............
Jul 8 16:35:39 vmcluster-tvcc01 kernel: sd 28:0:0:232: rejecting I/O to offline device
Jul 8 16:35:39 vmcluster-tvcc01 kernel: sd 28:0:0:232: [sdw] killing request
Jul 8 16:35:39 vmcluster-tvcc01 kernel: sd 28:0:0:232: rejecting I/O to offline device
Jul 8 16:35:39 vmcluster-tvcc01 kernel: sd 28:0:0:232: [sdw] killing request
Jul 8 16:35:39 vmcluster-tvcc01 kernel: sd 28:0:0:232: rejecting I/O to offline device
.............
Jul 8 16:35:39 vmcluster-tvcc01 kernel: end_request: I/O error, dev sdm, sector 4470628224
Jul 8 16:35:39 vmcluster-tvcc01 kernel: end_request: I/O error, dev sdm, sector 4471730304
Jul 8 16:35:39 vmcluster-tvcc01 kernel: end_request: I/O error, dev sdm, sector 4470500480
Jul 8 16:35:39 vmcluster-tvcc01 kernel: end_request: I/O error, dev sdm, sector 4472330752
I hope I've been clear enough...
Any help is welcome!
Many thanks in advance!