When a guest uses GPU passthrough (or any PCI passthrough, for that matter), main memory is allocated for the IOMMU group assigned to it, so that the virtual machine can communicate with the device directly, as if it were physically connected. But since you are using a P4-1Q vGPU profile, it seems unusual that 4 GB of main memory would be allocated for 1 GB of graphics memory. How did you set up your PVE host and your guests?
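
If it helps when describing the setup, here is a minimal Python sketch that lists which PCI devices share an IOMMU group on the host. It assumes a Linux host that exposes groups under `/sys/kernel/iommu_groups` (the usual sysfs location when the IOMMU is enabled). The function name `list_iommu_groups` is my own and is not part of any PVE tooling.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} as read from sysfs.

    `root` is a parameter only so the function can be pointed at a
    different directory; the default is the standard sysfs path.
    """
    base = Path(root)
    groups = {}
    if not base.is_dir():
        # No such directory: IOMMU disabled, or not a Linux host.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        # Each group is a numbered directory containing a "devices" folder.
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())
    # Sort by group number so the output is stable.
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    for group, devices in list_iommu_groups().items():
        print(f"IOMMU group {group}: {', '.join(devices)}")
```

Run on the PVE host, this prints one line per group. Posting that output together with the VM configuration would show which devices are handed to the guest.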