Recent content by tytanick

  1.

    walk_pgd_range crash pve9.1 on 6.18+

    Only 1G hugepages. But recently I do not see those anymore. I upgraded the kernel and changed a few settings, such as async IO: threads (previously, on io_uring, I had this issue).
  2.

    walk_pgd_range crash pve9.1 on 6.18+

    This also happens on normal 2K pages. It does not matter which kernel.
  3.

    walk_pgd_range crash pve9.1 on 6.18+

    Just had another freeze on the 6.17.4-2-pve kernel. Exactly the same issue.
  4.

    walk_pgd_range crash pve9.1 on 6.18+

    Well, I have set every server to 6.17 pve2. I had issues with passthrough + Blackwell before, but will see now, as some time has passed. Some have rebooted already and I have them on 6.17 now, so within 2-3 days I should see whether they hit this bug and reboot again or not. Also I have seen some other issue...
  5.

    walk_pgd_range crash pve9.1 on 6.18+

    Hehe, so Google does not see everything. I am using this: https://prebuiltkernels.com/ I made a small script to install kernels from there: kernel="6.18.7-pbk"; deb="${kernel%-*}"; deb="${deb/-rc/~rc}-1"; rm -f ./linux-*.deb; wget...
  6.

    walk_pgd_range crash pve9.1 on 6.18+

    Recently I reported a slab memory leak and it was fixed. I am having yet another issue and wondering where to report it. Would you be able to tell me if this is the right place, or should I send it to someone else? The issue also looks like a memory leak. It happens on multiple servers...
  7.

    Passthrough RTX 6000/5090 CPU Soft BUG lockup, D3cold to D0, after guest shutdown

    Can you share your test suite? It might be interesting. We are testing using some mining program, but I am wondering if we could use something that does many different things. Also, if you have 2 CPUs then NUMA should be 1 in host. And if you are allocating more than 300GB RAM, consider using...
  8.

    Passthrough RTX 6000/5090 CPU Soft BUG lockup, D3cold to D0, after guest shutdown

    6.17 should also be fine, but I see improvement in the newest kernels (6.17+), even in terms of boot time.
  9.

    Passthrough RTX 6000/5090 CPU Soft BUG lockup, D3cold to D0, after guest shutdown

    Can you try a 6.18 kernel? For me those work best on the 5090 and 6000: https://prebuiltkernels.com/
  10.

    Passthrough RTX 6000/5090 CPU Soft BUG lockup, D3cold to D0, after guest shutdown

    I have returned mine to Nvidia and replaced them with 600W ones. Anyway, I did not solve the Max-Q ones. Make sure you also do this: sudo echo 'SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{d3cold_allowed}="0"' | sudo tee /etc/udev/rules.d/99-nvidia-d3cold.rules sudo udevadm control...
  11.

    Passthrough RTX 6000/5090 CPU Soft BUG lockup, D3cold to D0, after guest shutdown

    The RTX PRO 6000 600W version and the RTX 5090 seem to be fixed when we add the udev rule. But I also just got the RTX PRO 6000 Max-Q 300W version, which has an even stranger issue. The crash happens instantly on a random GPU when I turn on a VM. Every time a different GPU. And without passthrough it works fine. Every...
  12.

    Passthrough RTX 6000/5090 CPU Soft BUG lockup, D3cold to D0, after guest shutdown

    My RTX 6000 Max-Q cards are disappearing on the host when I just power on a VM 2-3 times... and a random GPU is gone. This is madness. Nvidia's status is still "passed to developers" with no other info. Sitting on 48 GPUs that are broken... And yes, I confirmed that the ATTR was applied: for d in...
  13.

    Passthrough RTX 6000/5090 CPU Soft BUG lockup, D3cold to D0, after guest shutdown

    I have implemented this too and I still see the issue. But here is the thing: my RTX PRO 6000 Blackwell 96GB 600W is, I think, working fine now, but the RTX PRO 6000 Blackwell 96GB 300W Max-Q has this issue regardless. I also upgraded to the 6.17 kernel and flashed that UEFI firmware fix, but still no...
  14.

    Passthrough RTX 6000/5090 CPU Soft BUG lockup, D3cold to D0, after guest shutdown

    Actually, mortise, I think this solves the issue. I am not sure, as it does not happen often, but I think you hit the spot. Where did you hear about this? I wonder why I could not find that solution anywhere.
  15.

    Passthrough RTX 6000/5090 CPU Soft BUG lockup, D3cold to D0, after guest shutdown

    Yeah, it seems like it! I have upgraded a few servers with RTX 4090, RTX 5090 and also RTX 6000 Blackwell to that kernel, proxmox-kernel-6.14.8-2-bpo12-pve/stable, and so far it works OK, plus that crazy fast startup. So only one thing still remains: crashing GPUs when the VM guest does some strange...
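
Several of the posts above mention a udev rule that sets `d3cold_allowed` to 0 for NVIDIA PCI devices (vendor `0x10de`) and a check that the attribute was applied; the check loop itself is cut off in the preview. The following is a minimal sketch of what such a check might look like, not the poster's actual script. It assumes the standard sysfs layout under `/sys/bus/pci/devices`; the function name `list_nvidia_d3cold` and its optional directory argument are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: print the d3cold_allowed value of every NVIDIA PCI device.
# The optional argument overrides the sysfs directory (useful for testing).
list_nvidia_d3cold() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Skip entries without a readable vendor file (also covers an unmatched glob).
        [ -r "$dev/vendor" ] || continue
        # 0x10de is the NVIDIA vendor ID used in the udev rule quoted above.
        [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
        if [ -r "$dev/d3cold_allowed" ]; then
            state="$(cat "$dev/d3cold_allowed")"
        else
            state="n/a"
        fi
        printf '%s d3cold_allowed=%s\n' "$(basename "$dev")" "$state"
    done
}
```

Calling `list_nvidia_d3cold` with no argument on the host lists each NVIDIA device by PCI address; a value of 0 would suggest the rule took effect for that device.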