Two issues - not sure if they are related:
1) I have Proxmox set up to back up to an NFS drive each night. For the past few days the NFS drive has been full, so the backup failed.
However, when the backup fails, my VM becomes inaccessible via the web. Pingdom reports the sites down from 3:09 to 3:14.
These are the syslog entries:
Code:
Apr 25 02:59:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 02:59:01 willow CRON[5319]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:00:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:00:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:00:01 willow CRON[8304]: (root) CMD (vzdump 403 --storage ns3046957.ip-********.eu --mailnotification always --mailto *****@*****.** --mode snapshot --compress lzo --quiet 1)
Apr 25 03:00:01 willow CRON[8305]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:00:02 willow vzdump[8304]: <root@pam> starting task UPID:willow:000020B8:05EBC740:5ADFE122:vzdump::root@pam:
Apr 25 03:00:02 willow vzdump[8376]: INFO: starting new backup job: vzdump 403 --compress lzo --quiet 1 --mode snapshot --mailnotification always --storage ns*******.ip-*******.eu --mailto ******@******.**
Apr 25 03:00:02 willow vzdump[8376]: INFO: Starting Backup of VM 403 (qemu)
Apr 25 03:00:02 willow qm[8389]: <root@pam> update VM 403: -lock backup
Apr 25 03:01:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:01:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:01:01 willow CRON[12916]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:02:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:02:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:02:01 willow CRON[18611]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:03:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:03:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:03:01 willow CRON[24558]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:04:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:04:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:04:01 willow CRON[30340]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:04:04 willow smartd[2970]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Temperature_Case changed from 74 to 73
Apr 25 03:04:10 willow rrdcached[4451]: flushing old values
Apr 25 03:04:10 willow rrdcached[4451]: rotating journals
Apr 25 03:04:10 willow rrdcached[4451]: started new journal /var/lib/rrdcached/journal/rrd.journal.1524621850.794202
Apr 25 03:04:10 willow rrdcached[4451]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1524614650.794246
Apr 25 03:05:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:05:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:05:02 willow CRON[3605]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:06:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:06:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:06:01 willow CRON[10216]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:07:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:07:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:07:01 willow CRON[14752]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:07:45 willow pvestatd[4781]: got timeout
Apr 25 03:07:55 willow pvestatd[4781]: got timeout
Apr 25 03:07:55 willow pvestatd[4781]: unable to activate storage 'ns*******.ip**********.eu' - directory '/mnt/pve/ns******.ip.*******.eu' does not exist or is unreachable
Apr 25 03:08:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:08:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:08:01 willow CRON[17639]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:08:05 willow pvestatd[4781]: got timeout
Apr 25 03:08:05 willow pvestatd[4781]: unable to activate storage 'ns********.ip-*****.eu' - directory '/mnt/pve/ns******.ip-******.eu' does not exist or is unreachable
Apr 25 03:08:15 willow pvestatd[4781]: got timeout
Apr 25 03:08:15 willow pvestatd[4781]: unable to activate storage 'ns*******.ip-*******.eu' - directory '/mnt/pve/ns******.ip-******.eu' does not exist or is unreachable
Apr 25 03:08:25 willow pvestatd[4781]: got timeout
Apr 25 03:08:25 willow pvestatd[4781]: unable to activate storage 'ns******.ip-******.eu' - directory '/mnt/pve/ns******.ip-******.eu' does not exist or is unreachable
Apr 25 03:08:35 willow pvestatd[4781]: got timeout
Apr 25 03:08:35 willow pvestatd[4781]: unable to activate storage 'ns******.ip-*******.eu' - directory '/mnt/pve/ns******.ip-******.eu' does not exist or is unreachable
Apr 25 03:08:45 willow pvestatd[4781]: got timeout
Apr 25 03:08:45 willow pvestatd[4781]: unable to activate storage 'ns******.ip-******.eu' - directory '/mnt/pve/ns*******.ip-******.eu' does not exist or is unreachable
Apr 25 03:08:55 willow pvestatd[4781]: got timeout
Apr 25 03:08:55 willow pvestatd[4781]: unable to activate storage 'ns******.ip-******.eu.eu' - directory '/mnt/pve/ns******.ip-******.eu' does not exist or is unreachable
Apr 25 03:09:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:09:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:09:01 willow CRON[19588]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:09:25 willow pvestatd[4781]: got timeout
Apr 25 03:09:35 willow pvestatd[4781]: got timeout
Apr 25 03:09:45 willow pvestatd[4781]: got timeout
Apr 25 03:09:55 willow pvestatd[4781]: got timeout
Apr 25 03:10:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:10:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:10:01 willow CRON[23525]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:10:06 willow pvestatd[4781]: got timeout
Apr 25 03:10:06 willow pvestatd[4781]: unable to activate storage 'ns******.ip-******.eu' - directory '/mnt/pve/ns******.ip-******.eu' does not exist or is unreachable
Apr 25 03:10:15 willow pvestatd[4781]: got timeout
Apr 25 03:10:15 willow pvestatd[4781]: unable to activate storage 'ns******.ip-******.eu' - directory '/mnt/pve/ns******.ip-******.eu' does not exist or is unreachable
Apr 25 03:10:25 willow pvestatd[4781]: got timeout
Apr 25 03:10:25 willow pvestatd[4781]: unable to activate storage 'ns******.ip-******.eu' - directory '/mnt/pve/ns******.ip******.eu' does not exist or is unreachable
Apr 25 03:10:35 willow pvestatd[4781]: got timeout
Apr 25 03:10:35 willow pvestatd[4781]: unable to activate storage 'ns******.ip-*******.eu' - directory '/mnt/pve/ns******.ip******eu' does not exist or is unreachable
Apr 25 03:10:45 willow pvestatd[4781]: got timeout
Apr 25 03:10:45 willow pvestatd[4781]: unable to activate storage 'ns******.ip-******.eu' - directory '/mnt/pve/ns******7.ip-******.eu' does not exist or is unreachable
Apr 25 03:10:59 willow vzdump[8376]: ERROR: Backup of VM 403 failed - vma_queue_write: write error - Broken pipe
Apr 25 03:11:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:11:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:11:01 willow vzdump[8376]: INFO: Backup job finished with errors
Apr 25 03:11:01 willow CRON[25244]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:11:01 willow vzdump[8376]: job errors
Apr 25 03:11:01 willow postfix/pickup[28911]: 63B1D4DA71: uid=0 from=<******@*****.**>
Apr 25 03:11:01 willow vzdump[8304]: <root@pam> end task UPID:willow:000020B8:05EBC740:5ADFE122:vzdump::root@pam: job errors
Apr 25 03:11:01 willow postfix/cleanup[25245]: 63B1D4DA71: message-id=<20180425021101.63B1D4DA71@willow>
Apr 25 03:11:01 willow postfix/qmgr[4658]: 63B1D4DA71: from=<******@******.**>, size=6072, nrcpt=1 (queue active)
Apr 25 03:11:03 willow postfix/smtp[25257]: 63B1D4DA71: to=<******@******.**>, relay=mail.ethica.io[164.132.17.220]:25, delay=1.7, delays=0.04/0.01/1.3/0.35, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as C9763813A8FD)
Apr 25 03:11:03 willow postfix/qmgr[4658]: 63B1D4DA71: removed
Apr 25 03:12:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:12:00 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:12:01 willow CRON[27943]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:13:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 03:13:01 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 03:13:01 willow CRON[30487]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 03:14:00 willow systemd[1]: Starting Proxmox VE replica
The only entries around that time in /var/log/messages are:
Code:
Apr 25 03:00:02 willow vzdump[8304]: <root@pam> starting task UPID:willow:000020B8:05EBC740:5ADFE122:vzdump::root@pam:
Apr 25 03:00:02 willow qm[8389]: <root@pam> update VM 403: -lock backup
Why would a failing backup cause the sites to go down?
Next problem: Proxmox has also started crashing and restarting every few hours, regardless of whether the backup is running, but this began at the same time as the backup above running out of space.
I have corrected the backup situation by freeing up more room, so that part should be resolved, but I can't believe a full backup target would cause a crash when no backup is running. Still, it is strange that it started at about the same time.
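Since the immediate trigger was the NFS target filling up, one option is a small pre-backup guard that refuses to start the dump when the target is low on space. This is only a sketch: the mount path and threshold below are placeholders, not values from my setup, and vzdump's hook-script mechanism (the `--script` option) would be the idiomatic place to run such a check.

```shell
#!/bin/sh
# Hypothetical pre-backup guard: skip vzdump if the backup target is
# nearly full. Mount point and threshold are example values only.
check_space() {
    mountpoint="$1"    # e.g. /mnt/pve/<storage-name>
    min_free_mb="$2"   # minimum free space required, in MB

    # df -Pm: POSIX output format, sizes in 1 MB blocks;
    # column 4 of the second line is the available space.
    free_mb=$(df -Pm "$mountpoint" | awk 'NR==2 {print $4}')

    if [ "$free_mb" -lt "$min_free_mb" ]; then
        echo "Only ${free_mb}MB free on ${mountpoint}, skipping backup" >&2
        return 1
    fi
    return 0
}
```

Called from cron as `check_space /mnt/pve/<storage-name> 20000 && vzdump 403 ...`, the backup is simply skipped (with a message) instead of dying mid-stream with a broken pipe.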
Any ideas?
This is the syslog from the latest crash:
Code:
Apr 25 12:46:00 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 12:46:01 willow CRON[32024]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 12:47:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 12:47:00 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 12:47:01 willow CRON[2142]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 12:48:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 12:48:00 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 12:48:01 willow CRON[4983]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 12:49:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 12:49:00 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 12:49:01 willow CRON[8397]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 12:50:00 willow systemd[1]: Starting Proxmox VE replication runner...
Apr 25 12:50:00 willow systemd[1]: Started Proxmox VE replication runner.
Apr 25 12:50:01 willow CRON[10742]: (root) CMD (/usr/local/rtm/bin/rtm 23 > /dev/null 2> /dev/null)
Apr 25 12:53:46 willow systemd-modules-load[1619]: Inserted module 'iscsi_tcp'
Apr 25 12:53:46 willow systemd-udevd[1657]: Network interface NamePolicy= disabled on kernel command line, ignoring.
Apr 25 12:53:46 willow systemd-modules-load[1619]: Inserted module 'ib_iser'
Apr 25 12:53:46 willow systemd-modules-load[1619]: Inserted module 'vhost_net'
Apr 25 12:53:46 willow systemd-sysctl[1690]: Couldn't write '0' to 'net/ipv6/conf/vmbr0/autoconf', ignoring: No such file or directory
Apr 25 12:53:46 willow systemd-sysctl[1690]: Couldn't write '0' to 'net/ipv6/conf/vmbr0/accept_ra', ignoring: No such file or directory
Apr 25 12:53:46 willow systemd-sysctl[1690]: Couldn't write '0' to 'net/ipv6/conf/vmbr0/accept_ra_defrtr', ignoring: No such file or directory
Apr 25 12:53:46 willow kernel: [ 0.000000] random: get_random_bytes called from start_kernel+0x42/0x4fd with crng_init=0
Apr 25 12:53:46 willow kernel: [ 0.000000] Linux version 4.13.16-2-pve (root@nora) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP PVE 4.13.16-47 (Mon, 9 Apr 2018 09:58:12 +0200) ()
Apr 25 12:53:46 willow kernel: [ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.13.16-2-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs rootdelay=15 rootdelay=15 noquiet nosplash net.ifnames=0 biosdevname=0
Apr 25 12:53:46 willow kernel: [ 0.000000] KERNEL supported cpus:
Apr 25 12:53:46 willow kernel: [ 0.000000] Intel GenuineIntel
Apr 25 12:53:46 willow kernel: [ 0.000000] AMD AuthenticAMD
Apr 25 12:53:46 willow kernel: [ 0.000000] Centaur CentaurHauls
Apr 25 12:53:46 willow kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Apr 25 12:53:46 willow kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Apr 25 12:53:46 willow kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Apr 25 12:53:46 willow kernel: [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
Apr 25 12:53:46 willow kernel: [ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Apr 25 12:53:46 willow kernel: [ 0.000000] e820: BIOS-provided physical RAM map:
Output of pveversion -v:
Code:
pveversion -v
proxmox-ve: 5.1-42 (running kernel: 4.13.16-2-pve)
pve-manager: 5.1-51 (running version: 5.1-51/96be5354)
pve-kernel-4.13: 5.1-44
pve-kernel-4.13.16-2-pve: 4.13.16-47
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-3-pve: 4.13.13-34
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.17-1-pve: 4.10.17-18
corosync: 2.4.2-pve4
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-30
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-18
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-2
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-15
pve-cluster: 5.0-25
pve-container: 2.0-21
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-2
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.7-pve1~bpo9
There are no entries in /var/log/messages in the run-up to the crash, and then just normal boot-up messages.
I really need help with this one, please.