Turned-off VM still responds to ping and SSH

offerlam

First off - HAPPY NEW YEARS!

I thought this was because of multiple IPs, but I'm starting to think it's not.

So I have a VM, 122.

I turned it off from the Proxmox GUI.
I checked from the Proxmox node using qm list:

qm list
122 Data01DingIT stopped 1024 200.00 0


According to my Proxmox GUI, the MAC address for VM 122 is:
1A:1B:32:BE:30:E4

Now, since the VM is turned off, I should not be able to connect over PuTTY, but I can.
And when I run ifconfig I can see I'm on the right server, IP-wise. The MAC is also the same as in the Proxmox GUI. But it should be off?

eth0 Link encap:Ethernet HWaddr 1a:1b:32:be:30:e4

wtf is going on?!?!?

I have no idea how to troubleshoot this further.

pveversion -v
proxmox-ve-2.6.32: 3.4-163 (running kernel: 2.6.32-41-pve)
pve-manager: 3.4-10 (running version: 3.4-10/73ab1bcf)
pve-kernel-2.6.32-41-pve: 2.6.32-163
pve-kernel-2.6.32-28-pve: 2.6.32-124
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-11
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

The setup is a 3-node cluster
connected to two stacked Allied Telesis switches.

Hope someone has a suggestion..

Also, when I log in using PuTTY you would normally get the standard Ubuntu information: IP, load, disk space usage and so on.
But when I do it now with the server turned off (according to Proxmox), I get this dump of errors:

Last login: Fri Jan 1 09:37:53 2016 from 172.16.99.30
-bash: /usr/bin/groups: cannot execute binary file: Exec format error
Traceback (most recent call last):
File "/usr/lib/command-not-found", line 27, in <module>
from CommandNotFound.util import crash_guard
File "/usr/lib/python3/dist-packages/CommandNotFound/__init__.py", line 3, in <module>
from CommandNotFound.CommandNotFound import CommandNotFound
File "/usr/lib/python3/dist-packages/CommandNotFound/CommandNotFound.py", line 7, in <module>
import dbm.gnu as gdbm
TypeError: source code string cannot contain null bytes
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 62, in apport_excepthook
import re, traceback
TypeError: source code string cannot contain null bytes

Original exception was:
Traceback (most recent call last):
File "/usr/lib/command-not-found", line 27, in <module>
from CommandNotFound.util import crash_guard
File "/usr/lib/python3/dist-packages/CommandNotFound/__init__.py", line 3, in <module>
from CommandNotFound.CommandNotFound import CommandNotFound
File "/usr/lib/python3/dist-packages/CommandNotFound/CommandNotFound.py", line 7, in <module>
import dbm.gnu as gdbm
TypeError: source code string cannot contain null bytes
-bash: /usr/bin/dircolors: cannot execute binary file: Exec format error
Traceback (most recent call last):
File "/usr/lib/command-not-found", line 27, in <module>
from CommandNotFound.util import crash_guard
File "/usr/lib/python3/dist-packages/CommandNotFound/__init__.py", line 3, in <module>
from CommandNotFound.CommandNotFound import CommandNotFound
File "/usr/lib/python3/dist-packages/CommandNotFound/CommandNotFound.py", line 7, in <module>
import dbm.gnu as gdbm
TypeError: source code string cannot contain null bytes
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 62, in apport_excepthook
import re, traceback
TypeError: source code string cannot contain null bytes

Original exception was:
Traceback (most recent call last):
File "/usr/lib/command-not-found", line 27, in <module>
from CommandNotFound.util import crash_guard
File "/usr/lib/python3/dist-packages/CommandNotFound/__init__.py", line 3, in <module>
from CommandNotFound.CommandNotFound import CommandNotFound
File "/usr/lib/python3/dist-packages/CommandNotFound/CommandNotFound.py", line 7, in <module>
import dbm.gnu as gdbm
TypeError: source code string cannot contain null bytes

Hope this helps somehow! I'm desperate.
 
  • Is this Proxmox 3.x?
  • Are you sure there is no other VM or physical node using the same IP / MAC address on your network?
  • Are you sure the VM is shut down? --> qm stop <vmID>
  • If it's not in a remote data center but local, you could try giving it another vMac (e.g. 08:12:34:56:78:90) and restarting it. See if that "old" machine keeps responding on the old MAC/IP combo.
 
Hi Q-Wulf

1. Yes, this is 3.4.
2. I just went through all the servers and this is the only one with this IP.
3. I'm sure it's stopped because qm list says it's stopped. But I did a qm stop anyway, which didn't do anything.
4. It is a remote datacenter, but I changed the MAC anyway. And here something weird happened.

The machine I can log into with that IP does NOT change its MAC, so this hasn't changed my problem.

VM 122 comes from an image I made of another VM. Could it be the image that is responding?

Or maybe the shadow of a former VM that hasn't been totally removed?

Is there any way to get some information from the vSwitch, perhaps?
 
I see 3 options to track this down.

  1. It's a remote data center, so maybe they assigned the same vMac/IP to another customer? (I am assuming we are talking about a public IP here and an ISP like OVH that gives you vMacs, where their internal services then assign the IP based on that vMac.)
    1. If your VMs are not set to autostart, you could restart the node(s) in question during your maintenance window and not start any VMs until your test is done. Then check whether the IP is still reachable.
      1. If it is, it has been assigned to someone else. Stop looking at Proxmox --> talk to their support team directly.
      2. If it isn't, the problem is within your Proxmox cluster --> keep looking at Proxmox as the source (2. + 3.)
  2. You say you created this VM from an image of another VM.
    1. Did you change the vMac or IP afterwards on the new VM?
    2. Is the old VM still running (somewhere)? I'm not sure where you copied it from. If its origin is not in the same data center and not on the same Proxmox node, you could run a traceroute on the IP in question (all other VMs off) and see if the hops point to anything useful.
  3. Any chance you have configured the IP address statically by hand somewhere on one of your containers and/or nodes?
    1. Check ifconfig on all running nodes/VMs/containers.
Hope that helps. I'm pretty sure that covers all angles; if it doesn't, say something :)

edit: You mentioned vSwitch; do you mean openvswitch? If so, google for "VLOG", which is the openvswitch logging mechanism (never used it personally). In any case I'd check the steps above first.
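
Checking every guest's MAC by hand gets tedious. As a rough sketch, assuming the VM configs live under /etc/pve/nodes/<node>/qemu-server/ and carry the MAC on their netN: lines (the function names list_vm_macs and duplicate_macs are made up for this example, they are not Proxmox tools), something like this could flag a MAC that is configured on more than one VM:

```shell
#!/usr/bin/env bash
# Print "<mac> <config file name>" for every netN: line in the given VM configs.
# Only the current config is read; parsing stops at the first [snapshot] section.
list_vm_macs() {
    local f
    for f in "$@"; do
        [ -f "$f" ] || continue
        awk '/^\[/ { exit } /^net[0-9]+:/ { print }' "$f" \
            | grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' \
            | tr 'A-F' 'a-f' \
            | awk -v f="$(basename "$f")" '{ print $1, f }'
    done
}

# Print every MAC that appears more than once, followed by the configs using it.
duplicate_macs() {
    list_vm_macs "$@" | awk '
        { count[$1]++; files[$1] = files[$1] " " $2 }
        END { for (m in count) if (count[m] > 1) print m files[m] }'
}

# Usage on any cluster node (path is an assumption about the layout):
#   duplicate_macs /etc/pve/nodes/*/qemu-server/*.conf
```

This only compares configured MACs. It says nothing about what a running guest actually uses, and it does not look at containers.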
 
Hi Q-wulf

1. This is a colocation site where I have my own 42U rack, so both the local and external IPs are managed by me. I have checked 3 times now that this IP is not in use anywhere else. There are no servers running besides the cluster, so if the IP is in use somewhere it has to be another VM.

2. I created a server with the setup I wanted for future servers. I then "destroyed" this server by making it into an image. This I did from the Proxmox GUI. This is the image I use to create new servers.
1. No, I only changed the MAC when you asked me to further up. But I do believe that when you clone a new VM from an image, Proxmox creates a new vMac for it. The base image is set to use DHCP, with a preconfigured static setup commented out with # in the interface file. So I did change the VM from DHCP to a static IP.
2. No, the old VM is not running anywhere, since when you say "this VM should be an image" the VM gets destroyed and becomes an image.
3. I have checked all running VMs. There are 17 of those. I have also checked their MACs, none of which match the one I see when I SSH in.

By vSwitch I meant: are there any logs for the default Proxmox virtual switch? I'm not using openvswitch, unless that's the default switch Proxmox uses for its VMs?
 
I just noticed one thing though..

The MAC of the machine I connect to via SSH when the VM is turned off is
1a:1b:32:be:30:e4
I have changed the network interface in the Proxmox GUI for this VM to be
1a:1b:32:be:30:e5
I just turned on VM 122 and noticed that the MAC has not changed. It is still
1a:1b:32:be:30:e4
even though my Proxmox GUI says something else.

Maybe this can shed some light on what the problem is.

I'm thinking of deleting the network interface and creating a new one to see what happens. But I'm hesitant, because if it solves the problem all the logs for finding the issue may be gone.

Should I do it, Q-Wulf?
 
[...]1. This is a colocation site where I have my own 42U rack, so both the local and external IPs are managed by me. I have checked 3 times now that this IP is not in use anywhere else. There are no servers running besides the cluster, so if the IP is in use somewhere it has to be another VM.[...]

The IP in question, is it a public IP or a local subnet ?
If local subnet, is your network isolated from the other customers on said colocated Data Center ? Are you sure ?
Where are you testing your IP from ? From a Vm on the same Proxmox node ? From the Proxmox node itself? from another server in the rack ? From a computer outside the data center ?
How/where do your IP's get assigned via DHCP ?


[...]
2. I created a server with the setup I wanted for future servers. I then "destroyed" this server by making it into an image. This I did from the Proxmox GUI. This is the image I use to create new servers.[...]

Just for clarity: You created a VM and then via Proxmox-GUI turned that Vm into a template, right ?

[...]
2. No, the old VM is not running anywhere, since when you say "this VM should be an image" the VM gets destroyed and becomes an image.
3. I have checked all running VMs. There are 17 of those. I have also checked their MACs, none of which match the one I see when I SSH in.[...]
Just for clarity 1: when you say you have "checked" the vMacs, did you do this from inside the VM/CT, e.g. via noVNC?
Just for clarity 2: of those 17 VMs, is any of them a clone/copy/template of the original VM that you think is still "up as a ghost"?
 
Hi Q-Wulf.

  1. It's a local IP, 192.168.253.2.
  2. The colocation site only offers external IPs. The local LAN is managed by me.
  3. I'm connected directly to the LAN of the site by IPsec site-to-site.
  4. I'm testing from my PC, from Proxmox, and from other VMs on the same VLAN.
  5. IPs are assigned statically.
  6. Yes, I created a VM and made a template of it using the Proxmox GUI.
  7. I checked the VM's MAC from the VM 122 console in Proxmox.
  8. I took extra care with the 3 of the 17 VMs that are also spun off from the template; they don't have the same MAC or IP.

I tried an offline migration of VM 122 from one node to another, and now the problem is gone. Even if I move it back to the node it was on in the first place, it works: I can't connect to it anymore when it is turned off, even though it's on the node where I had the problem.

So it was definitely something Proxmox-related, but I don't know what.

I can see from the migration that VM 122 has adopted the new MAC I set in Proxmox for its network interface. So maybe that's why it's working now.

Thanks for helping out though!
 
I tried an offline migration of VM 122 from one node to another, and now the problem is gone. Even if I move it back to the node it was on in the first place, it works: I can't connect to it anymore when it is turned off, even though it's on the node where I had the problem.


That is something really strange. Especially since it is not reproducible, and did not show up on the active VM lists via qm list.

From the symptoms I'd have said it points to a machine outside your (managed) network but inside the same colo site, where another customer has assigned the same vMac and your DNS is providing an IP (as I do not know how you or your colo ISP do network isolation).

Anyway, good to hear it's fixed.
 
I ran into the same issue. Config:
2-node PVE cluster (v7.2.4)
  • The IP of the VM (on node 1) is pingable even when it's powered off (confirmed in the shell). From the switch and the ARP table I confirmed the ICMP responses come from node 1, and no other VM has the same MAC address.
  • No other VM is configured with that IP address.
  • I migrated the VM to PVE host 2, powered it on, and powered it off.
  • The IP is still pingable (packets from PVE node 1).
  • Rebooting PVE node 1 may help, but I want to diagnose it. What should I do?

thanks

vm config:
Code:
root@pve-2:~# cat /etc/pve/qemu-server/120.conf
agent: 1
boot: order=scsi0;ide2;net0
cores: 2
ide2: none,media=cdrom
memory: 2048
meta: creation-qemu=6.1.1,ctime=1648787837
name: svr.pbs
net0: virtio=2A:63:8D:A7:FA:DF,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: dsm02:200/vm-200-disk-0.qcow2,cache=writeback,iothread=1,size=12G
scsihw: virtio-scsi-single
smbios1: uuid=b7995c07-2d08-4930-9fbf-99e04f46710f
sockets: 2
vmgenid: c0ade8dc-388d-4bb4-b1f3-977617072e27
 
@k-123

Just to be sure that it's really stopped,
on the host:

# ps aux | grep "kvm -id <vmid>"

If you don't get any result (other than the grep itself), the VM is correctly stopped.
If you are still able to ping it, you have another server with the same IP somewhere in your network.

If it's still running, you can simply kill the process (but that would be a bug in Proxmox).
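
One caveat with a plain grep is that it can match its own process, and a pattern like "kvm -id 12" would also match VMID 122. A small sketch that avoids both, assuming the VM's command line contains "-id <vmid>" as in the ps line above (kvm_pids_for_vmid is a hypothetical helper name, not a Proxmox command):

```shell
#!/usr/bin/env bash
# Print the PIDs of kvm/qemu processes started with "-id <vmid>".
# Reads "PID COMMAND ARGS..." lines on stdin, so it works on live ps output
# or on a saved listing. The VMID is compared exactly, not as a substring.
kvm_pids_for_vmid() {
    awk -v id="$1" '
        $2 ~ /(kvm|qemu)/ {
            for (i = 3; i < NF; i++)
                if ($i == "-id" && $(i + 1) == id) { print $1; next }
        }'
}

# Usage on the node (122 is the VMID from the first post):
#   ps -eo pid=,args= | kvm_pids_for_vmid 122
```

No output means no matching process was found on that node. In a cluster, the same check would have to be run on each node.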
 
You could also look at the bridge FDB to see the MAC address <=> VM association.

Ping the IP:

# ping <ip>

Look at your ARP table to find the MAC address:

# ip neigh show | grep <ip>

Then look at the bridge:

# bridge fdb show | grep <mac>
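
These steps can be chained. A minimal sketch, assuming the ip neigh output has the usual "<ip> dev <if> lladdr <mac> <state>" shape (mac_for_ip is just an illustrative name, and 192.0.2.10 is a placeholder address):

```shell
#!/usr/bin/env bash
# Print the MAC that the neighbour (ARP) table holds for exactly this IP.
# Reads `ip neigh show` output on stdin. Matching the first field exactly
# avoids a search for 10.0.0.2 also hitting 10.0.0.20, as a plain grep would.
mac_for_ip() {
    awk -v ip="$1" '
        $1 == ip {
            for (i = 2; i < NF; i++)
                if ($i == "lladdr") { print $(i + 1); exit }
        }'
}

# Usage on the node:
#   ip=192.0.2.10                        # placeholder, use the IP in question
#   ping -c 3 "$ip" >/dev/null           # populate the neighbour table
#   mac=$(ip neigh show | mac_for_ip "$ip")
#   bridge fdb show | grep -i "$mac"     # which bridge port learned that MAC
```

An empty result means the neighbour table has no resolved entry for that IP, for example because the entry is in the FAILED state.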
 
here it is:
Code:
root@pve-1:~# ping 172.16.2.20
PING 172.16.2.20 (172.16.2.20) 56(84) bytes of data.
64 bytes from 172.16.2.20: icmp_seq=1 ttl=64 time=0.084 ms
64 bytes from 172.16.2.20: icmp_seq=2 ttl=64 time=0.098 ms
64 bytes from 172.16.2.20: icmp_seq=3 ttl=64 time=0.082 ms
64 bytes from 172.16.2.20: icmp_seq=4 ttl=64 time=0.084 ms
64 bytes from 172.16.2.20: icmp_seq=5 ttl=64 time=0.050 ms
^C
--- 172.16.2.20 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4081ms
rtt min/avg/max/mdev = 0.050/0.079/0.098/0.015 ms
root@pve-1:~# ip neigh show | grep 172.16.2.20
172.16.2.20 dev vmbr0 lladdr 2a:63:8d:a7:fa:df REACHABLE
root@pve-1:~# bridge fdb show | grep 2a:63:8d:a7:fa:df
2a:63:8d:a7:fa:df dev tap200i0 master fwbr200i0
2a:63:8d:a7:fa:df dev fwpr200p0 master vmbr0
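
For what it's worth, Proxmox normally names a guest's tap device tap<VMID>i<N>, so the first fdb line suggests the MAC was learned on a port belonging to VMID 200 rather than 120. That is only an inference from the interface name. A small sketch to pull the VMID out of such output (vmids_for_mac is an illustrative helper, not an existing command):

```shell
#!/usr/bin/env bash
# Print the VMIDs whose tap devices (tap<VMID>i<N>) have learned the given MAC.
# Reads `bridge fdb show` output on stdin; the MAC is compared case-insensitively.
vmids_for_mac() {
    local mac
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v mac="$mac" '
        tolower($1) == mac {
            for (i = 2; i < NF; i++)
                if ($i == "dev" && match($(i + 1), /^tap[0-9]+i[0-9]+$/)) {
                    name = $(i + 1)
                    sub(/^tap/, "", name)      # drop the "tap" prefix
                    sub(/i[0-9]+$/, "", name)  # drop the interface index
                    print name
                }
        }' | sort -u
}

# Usage on the node:
#   bridge fdb show | vmids_for_mac 2a:63:8d:a7:fa:df
```

If that prints a VMID other than the one that was stopped, running qm status <vmid> for it on that node would be the next thing to check.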
 
