Servers and VMs online but files over SMB not reachable

SAR33

Hello,
I made a mistake: I tried to run a backup into /var/lib/vz and on top of that there was not enough space left. I worked around it by taking node2 out of the backup lock and rebooting node2. After that, both nodes showed the OS partition as 100% full. SMB on node2 was still reachable at that point. I then deleted the 90+ GB *.dat file on node2 and rebooted both servers. I got some error messages about the DRBD devices on both servers, but they booted, came online, the VMs are running, and the OS partition is back to 2% in use. Everything seems back to normal except SMB on node2.
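Since DRBD complained during boot, it seems worth confirming the resources are healthy before chasing SMB. A minimal sketch; /proc/drbd is the standard status interface for the DRBD 8.x series:
Code:
# each resource should report cs:Connected and ds:UpToDate/UpToDate
cat /proc/drbd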

So now the files on node1 are reachable via SMB, but I can't get an SMB connection to node2. I pinged both IPs of the node and got a reply back, and I can also access both nodes via SSH, so at the moment I have no clue why SMB on node2 isn't reachable.
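What I plan to check first, sketched under the assumption of a Debian-style Samba setup (service name and log path may differ on this install), is whether smbd is running and listening at all:
Code:
# is smbd running and bound to the SMB ports?
ps aux | grep [s]mbd
netstat -tlnp | grep -E ':445|:139'
# restart and watch the log for startup errors
/etc/init.d/samba restart
tail -n 50 /var/log/samba/log.smbd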

Some more information:
It looks like some table information is corrupted or just wrong. I can run the VMs and everything seems to be OK, except that the filebox isn't reachable.

http://pic-hoster.net/view/51154/Bildschirmfoto2013-02-03um16.32.02.png
http://pic-hoster.net/view/51155/Bildschirmfoto2013-02-03um16.31.33.png
http://pic-hoster.net/view/51157/Bildschirmfoto2013-02-03um17.00.58.png

I know that I have to update, but I'm new at this company: we don't have a backup server for the important files, and we also don't have a NAS. Next week is production week and I don't want to risk losing the important data. Therefore I want to clone the server later and set up a third backup server once everything is working as it should. Once I have a backup server, I can work on the important server to update it and get it running. Just to clarify why I'm still on v1.9.
And I already have a big problem, given that we can't access the file server. :(

Some more information:

fs2:/etc/lvm# pveversion -v
pve-manager: 1.9-26 (pve-manager/1.9/6567)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-55+ovzfix-1
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-55+ovzfix-1
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-3pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-2
ksm-control-daemon: 1.0-6

pve-manager: 1.9-26 (pve-manager/1.9/6567)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-55+ovzfix-1
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-55+ovzfix-1
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-3pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-2
ksm-control-daemon: 1.0-6

Also, I got an email from the server telling me this:

<admin@XXX.at> (expanded from <root>): host terminal.sil.at[88.198.105.243]
said: 550-Verification failed for <root@fs1.localdomain> 550-Invalid domain
part in email address 550 Sender verify failed (in reply to RCPT TO
command)
Reporting-MTA: dns; fs1.localdomain
X-Postfix-Queue-ID: C00BB234354
X-Postfix-Sender: rfc822; root@fs1.localdomain
Arrival-Date: Sat, 2 Feb 2013 18:25:37 +0100 (CET)

Final-Recipient: rfc822; admin@XXX.at
Original-Recipient: rfc822; root
Action: failed
Status: 5.0.0
Remote-MTA: dns; terminal.sil.at
Diagnostic-Code: smtp; 550-Verification failed for <root@fs1.localdomain>
550-Invalid domain part in email address 550 Sender verify failed

From: root@fs1.localdomain (Cron Daemon)
Date: 2 February 2013 06:25:02 CET
To: root@fs1.localdomain
Subject: Cron <root@fs1> test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )


/etc/cron.daily/man-db:
/usr/bin/mandb: can't write to /var/cache/man/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/ja/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/zh_TW/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/de/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/ko/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/fr/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/it.ISO8859-1/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/ru/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/pl.ISO8859-2/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/pl.UTF-8/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/sv/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/fr.ISO8859-1/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/fi/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/es/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/zh_CN/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/pt_BR/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/id/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/pl/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/cs/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/fr.UTF-8/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/it.UTF-8/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/hu/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/gl/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/tr/750685: No space left on device
/usr/bin/mandb: can't write to /var/cache/man/it/750685: No space left on device
/usr/bin/mandb: can't create index cache /var/cache/man/oldlocal/750685: No space left on device
/usr/bin/mandb: can't create index cache /var/cache/man/local/750685: No space left on device

(I removed the company's name from the email address because I don't know if I'm allowed to post it.)
 

Hi,
are you running smbd directly on both hosts, not inside a VM/CT?

Any hints in the Samba log?
How do the other partitions look? Please post the output of
Code:
df -k
I guess your tdb files are corrupt:
Code:
rm /var/cache/netsamlogon_cache.tdb
rm /var/cache/login_cache.tdb
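After removing them, Samba has to be restarted so the caches get rebuilt. A minimal sketch, assuming the standard Debian init script:
Code:
/etc/init.d/samba restart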
Udo
 
I run Samba inside the VM.

I don't have those .tdb files in my /var/cache/.
This is everything within /var/cache/:
apache2 apt debconf dictionaries-common ldconfig man samba

And here is a picture of the partitions:
http://pic-hoster.net/view/51165/Bildschirmfoto2013-02-04um01.26.28.png

For better readability, and with -h:
fs1:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/pve-root 95G 2.1G 88G 3% /
tmpfs 3.9G 0 3.9G 0% /lib/init/rw
udev 10M 984K 9.1M 10% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/sda1 504M 49M 430M 11% /boot


fs2:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/pve-root 95G 2.0G 88G 3% /
tmpfs 2.0G 0 2.0G 0% /lib/init/rw
udev 10M 960K 9.1M 10% /dev
tmpfs 2.0G 0 2.0G 0% /dev/shm
/dev/sda1 504M 49M 430M 11% /boot

And from Proxmox:
http://pic-hoster.net/view/51166/Bildschirmfoto2013-02-04um01.25.14.png
 
Edit:
I already posted the info, but it needs to be approved.

Also, I noticed that I get some error messages on bootup, but I got those messages too when I rebooted fs2 after it was stuck in the backup lock, and everything worked fine then; the filebox was still reachable at that point.
But now it stays unreachable.

This is the error message:
device-mapper: snapshots: Snapshot is marked invalid.
device-mapper: snapshots: Snapshot is marked invalid.
Buffer I/O error on device dm-35, logical block 0
Buffer I/O error on device dm-35, logical block 1
Buffer I/O error on device dm-35, logical block 2
Buffer I/O error on device dm-35, logical block 3
Buffer I/O error on device dm-35, logical block 0
Buffer I/O error on device dm-37, logical block 0
Buffer I/O error on device dm-37, logical block 1
Buffer I/O error on device dm-37, logical block 2
Buffer I/O error on device dm-37, logical block 3
Buffer I/O error on device dm-37, logical block 0

EDIT:
I found something. I can ping the IPs of the VMs (192.168.100.1 || 192.168.100.2), but I can't get a reply back from the IPs of the machines themselves (10.0.1.1 || 10.0.1.2). The ifconfig command shows me the correct IPs for eth1, so they do match the config in the web interface. I will continue my research, and I apologize for the somewhat noob questions.
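To narrow down why the 10.0.1.x addresses don't answer, I will compare the interface and routing state next. A sketch, using the interface name from above; arping comes from the iputils-arping package:
Code:
ip addr show eth1        # is eth1 up and carrying the expected address?
ip route                 # is there a route back to the pinging client?
arping -I eth1 10.0.1.2  # does the peer answer at ARP level at all?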
 
I noticed a message on fs1 itself (on the local console, not via SSH); maybe this is why I can't ping the machines' IPs?

fs1:~# tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky (maxk@qualcomm.com)
device tap101i0d0 entered promiscuous mode
vmbr0: port 2(tap101i0d0) entering forwarding state
device tap101i0d1 entered promiscuous mode
vmbr0: port 3(tap101i0d1) entering forwarding state
device tap102i0d0 entered promiscuous mode
vmbr0: port 4(tap102i0d0) entering forwarding state
device tap102i0d1 entered promiscuous mode
vmbr0: port 5(tap102i0d1) entering forwarding state

Is that OK?

Or what else can I do to recover the IPs, or to check whether the static IPs of the machines have changed?
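On a Debian host the static addresses live in /etc/network/interfaces, so comparing that file with the running state should show whether anything changed. A sketch, assuming eth1 carries the 10.0.1.x addresses as above:
Code:
cat /etc/network/interfaces   # what is configured
ifconfig eth1                 # what is currently active
ifdown eth1 && ifup eth1      # re-apply the configuration if it was lost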
 
Any hints in the samba-log?
Nope, there is no Samba log at all. At least when I ping the eth1 static IPs from within the file servers themselves, it works and I get a reply back.
I googled a bit and used 'smbclient -L IP-of-vmbr0' and got:
Connection to 192.168.100.2 failed (Error NT_STATUS_CONNECTION_REFUSED)

So it seems something corrupted the Samba conf, and Proxmox is working as it should.
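(From what I read, NT_STATUS_CONNECTION_REFUSED usually just means nothing is listening on the SMB ports, so checking inside the VM seems worthwhile before blaming the conf. A sketch:)
Code:
netstat -tln | grep -E ':139|:445'  # anything listening on the SMB ports?
smbclient -L localhost -N           # loopback test, rules out the network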
 
OK, my brain is juice -.-
I can't find any really useful Samba file on the servers. But I found some files on the Mac I work on. I'm a bit curious, because I thought Samba has to run on the server and the files need to be there as well; now I found them on my Mac. Anyway, the only files in there are:
/var/samba/
account_policy.tdb gencache.tdb notify.tdb printing unexpected.tdb
brlock.tdb group_mapping.tdb ntdrivers.tdb registry.tdb winbindd_privileged
browse.dat locking.tdb ntforms.tdb sessionid.tdb winbindd_public config.mutex
messages.tdb ntprinters.tdb share_info.tdb connections.tdb namelist.debug perfmon shares.mutex

But I don't have these two files:
netsamlogon_cache.tdb login_cache.tdb
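(As far as I can tell, netsamlogon_cache.tdb and login_cache.tdb only appear in domain/winbind setups, so their absence may be normal. The compiled-in tdb locations can be queried from the smbd binary; a sketch, the reported names vary by Samba version:)
Code:
smbd -b | grep -Ei 'lockdir|statedir|cachedir'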

EDIT:
As I said, after 24 hours of work with no sleep my brain is juice :P
I used 'smbclient -L IP-of-vmbr0' again, but this time with the IP of the Mac I work on, where I also found the smb files, and now I got this error message:

admin:samba admin$ smbclient -L 192.168.0.77
params.c:Parameter() - Unexpected end-of-file at: ??5:q
Password:
session setup failed: NT_STATUS_LOGON_FAILURE

Let's give it another try at fixing that problem, even with a juiced brain ^^
 
I found a kind of bug in smb.conf on my Mac. At the end of the conf file there was this: ^ 5:q
I deleted those strings and now Samba spits out another error message:

admin:etc admin$ smbclient -L 192.168.0.77
Receiving SMB: Server stopped responding
protocol negotiation failed

However, you're right about the snapshots:

fs2:/# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
vm-102-disk-1 drbdvgBU -wi-ao 930.00G
vm-101-disk-1 drbdvgFS owi-ao 16.00G
vm-101-disk-2 drbdvgFS owi-ao 300.00G
vm-101-disk-3 drbdvgFS owi-ao 300.00G
vm-101-disk-4 drbdvgFS owi-ao 300.00G
vm-101-disk-5 drbdvgFS owi-ao 300.00G
vm-101-disk-6 drbdvgFS owi-ao 600.00G
vm-101-disk-7 drbdvgFS owi-ao 100.00G
vzsnap-fs1-1 drbdvgFS swi-a- 1.00G vm-101-disk-2 0.01
vzsnap-fs1-2 drbdvgFS swi-a- 1.00G vm-101-disk-3 0.81
vzsnap-fs1-3 drbdvgFS swi-a- 1.00G vm-101-disk-4 49.33
vzsnap-fs1-4 drbdvgFS swi-a- 1.00G vm-101-disk-5 8.61
vzsnap-fs1-5 drbdvgFS Swi-I- 1.00G vm-101-disk-6 100.00
vzsnap-fs1-6 drbdvgFS swi-a- 1.00G vm-101-disk-7 3.08
vzsnap-fs2-0 drbdvgFS swi-a- 1.00G vm-101-disk-1 6.97
vzsnap-fs2-1 drbdvgFS swi-a- 1.00G vm-101-disk-2 0.01
vzsnap-fs2-2 drbdvgFS swi-a- 1.00G vm-101-disk-3 0.81
vzsnap-fs2-3 drbdvgFS swi-a- 1.00G vm-101-disk-4 49.33
vzsnap-fs2-4 drbdvgFS swi-a- 1.00G vm-101-disk-5 8.61
vzsnap-fs2-5 drbdvgFS Swi-I- 1.00G vm-101-disk-6 100.00
vzsnap-fs2-6 drbdvgFS swi-a- 1.00G vm-101-disk-7 3.12
vm-102-disk-1 drbdvgSHARE -wi-ao 32.00G
root pve -wi-ao 96.00G
swap pve -wi-ao 7.00G

fs2:/# vgs
VG #PV #LV #SN Attr VSize VFree
drbdvgBU 1 1 0 wz--n- 931.48G 1.48G
drbdvgFS 3 20 13 wz--n- 2.73T 865.44G
drbdvgSHARE 1 1 0 wz--n- 827.95G 795.95G
pve 1 2 0 wz--n- 103.00G 0

fs1:/# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
vm-102-disk-1 drbdvgBU -wi-ao 930.00G
vm-101-disk-1 drbdvgFS owi-ao 16.00G
vm-101-disk-2 drbdvgFS owi-ao 300.00G
vm-101-disk-3 drbdvgFS owi-ao 300.00G
vm-101-disk-4 drbdvgFS owi-ao 300.00G
vm-101-disk-5 drbdvgFS owi-ao 300.00G
vm-101-disk-6 drbdvgFS owi-ao 600.00G
vm-101-disk-7 drbdvgFS owi-ao 100.00G
vzsnap-fs1-1 drbdvgFS swi-a- 1.00G vm-101-disk-2 0.01
vzsnap-fs1-2 drbdvgFS swi-a- 1.00G vm-101-disk-3 0.81
vzsnap-fs1-3 drbdvgFS swi-a- 1.00G vm-101-disk-4 49.33
vzsnap-fs1-4 drbdvgFS swi-a- 1.00G vm-101-disk-5 8.61
vzsnap-fs1-5 drbdvgFS Swi-I- 1.00G vm-101-disk-6 100.00
vzsnap-fs1-6 drbdvgFS swi-a- 1.00G vm-101-disk-7 3.08
vzsnap-fs2-0 drbdvgFS swi-a- 1.00G vm-101-disk-1 6.97
vzsnap-fs2-1 drbdvgFS swi-a- 1.00G vm-101-disk-2 0.01
vzsnap-fs2-2 drbdvgFS swi-a- 1.00G vm-101-disk-3 0.81
vzsnap-fs2-3 drbdvgFS swi-a- 1.00G vm-101-disk-4 49.33
vzsnap-fs2-4 drbdvgFS swi-a- 1.00G vm-101-disk-5 8.61
vzsnap-fs2-5 drbdvgFS Swi-I- 1.00G vm-101-disk-6 100.00
vzsnap-fs2-6 drbdvgFS swi-a- 1.00G vm-101-disk-7 3.12
vm-102-disk-1 drbdvgSHARE -wi-ao 32.00G
root pve -wi-ao 96.00G
swap pve -wi-ao 7.00G

fs1:/# vgs
VG #PV #LV #SN Attr VSize VFree
drbdvgBU 1 1 0 wz--n- 931.48G 1.48G
drbdvgFS 3 20 13 wz--n- 2.73T 865.44G
drbdvgSHARE 1 1 0 wz--n- 827.95G 795.95G
pve 1 2 0 wz--n- 103.00G 0

So, can I delete those snapshots without damaging anything?

Can or shall I delete those vzsnap-fs1-1--cow entries in /dev/mapper/ too?
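(If deleting is safe, I understand the stale snapshots would be dropped with lvremove, and the matching --cow entries in /dev/mapper should disappear with them. A sketch; the LV names must be double-checked against the lvs output first:)
Code:
# remove one leftover vzdump snapshot; repeat for the other vzsnap-* volumes
lvremove /dev/drbdvgFS/vzsnap-fs1-5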

What a weird problem. I edited smb.conf and removed the strange characters at the end of the file, saved it, and exited. If I now run smbclient -L 192.168.0.77, I get 'Receiving SMB: Server stopped responding / protocol negotiation failed'. If I then use testparm, I get:

admin:var admin$ testparm
Load smb config files from /private/etc/smb.conf
Loaded services file OK.
Server role: ROLE_STANDALONE
Press enter to see a dump of your service definitions

[global]
admin:var admin$

Is it me, or what...? This is unbelievable -.-
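(If testparm prints only an empty [global], the parsed config apparently contains no share definitions any more. A non-interactive dump would confirm that; a sketch:)
Code:
testparm -s /private/etc/smb.conf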
 
{solved}

Yeah, what else can I say: the server had been up for 202 days and simply needed a filesystem check (chkdsk) -.-

Now the file server is finally running again. The check ate a Samba .tdb file and I need to change the password, but that's the smaller of my troubles :P
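(For the record, on a Debian guest a full check can also be forced for the next reboot; a sketch, assuming the classic sysvinit mechanism:)
Code:
touch /forcefsck && reboot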

THX to everyone and keep up the great work. :)