Strange bug in 5 beta 2: pvesh disables the web UI.

I'm running Proxmox 5 beta 2 in Vagrant (libvirt/KVM) using this Packer build: https://github.com/rgl/proxmox-ve/tree/pve-5

When running any pvesh commands from the provision script, the GUI seems to get disabled. The first time I look, the node is replaced by a red "node1"; after logging out and back in, root's password is not accepted even if I change it from the shell. But pvesh itself seems to work fine, it's only the web UI that gets messed up. If you run the Vagrantfile below and uncomment that last line, you should see the error. I tried a few times with that last line commented out and uncommented, and the behavior is consistent. However, after letting vagrant up complete, SSHing in and running that command does not break the UI. I say "I" because it could be something weird about my environment. The Vagrant host is Fedora 25 with upstream Vagrant and the upstream libvirt plugin.

The last line in provision.sh is the one that does it. I'd like to file a bug report, but I want to narrow this down first. Could it be something in the Packer file? Any suggestions on debugging this?

Code:
net = "10.12.12."

Vagrant.configure('2') do |config|
  config.vm.box = 'proxmox-ve'
  config.vm.synced_folder './','/vagrant', type: 'rsync'
  config.vm.provider :libvirt do |lv|
    lv.memory = 4096
    lv.cpus = 2
    lv.cpu_mode = 'host-passthrough'
    lv.nested = true
  end

  config.vm.define "node1" do |n1|
    n1_ip = net + "21"
    n1.vm.hostname = "node1"
    n1.vm.provider :libvirt do |lv|
      lv.storage :file, :size => '50G'
    end
    n1.vm.network "private_network", ip: n1_ip, auto_config: false
    n1.vm.provision "shell", path: "provision.sh", args: n1_ip
  end

end
Code:
#!/bin/bash
set -eux

ip=$1
fqdn=$(hostname --fqdn)

# configure apt for non-interactive mode.
export DEBIAN_FRONTEND=noninteractive

# fix /etc/hosts so ceph works
# remove the 127.0.0.1 line that also contains the hostname,
# so ceph createmon finds the entry with the "real" ip address.
# as of this writing, it's the first line.
sed -i "/^127\.0\.0\.1.\+${fqdn}/d" /etc/hosts

# configure the network for NATting.
ifdown vmbr0
cat >/etc/network/interfaces <<EOF
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet manual

auto vmbr0
iface vmbr0 inet static
    address $ip
    netmask 255.255.255.0
    bridge_ports eth1
    bridge_stp off
    bridge_fd 0
    # enable IP forwarding. needed to NAT and DNAT.
    post-up   echo 1 >/proc/sys/net/ipv4/ip_forward
    # NAT through eth0.
    post-up   iptables -t nat -A POSTROUTING -s '$ip/24' ! -d '$ip/24' -o eth0 -j MASQUERADE
    post-down iptables -t nat -D POSTROUTING -s '$ip/24' ! -d '$ip/24' -o eth0 -j MASQUERADE
EOF
sed -i -E "s,^[^ ]+( .*pve.*)\$,$ip\1," /etc/hosts
sed 's,\\,\\\\,g' >/etc/issue <<'EOF'

     _ __  _ __ _____  ___ __ ___   _____  __ __   _____
    | '_ \| '__/ _ \ \/ / '_ ` _ \ / _ \ \/ / \ \ / / _ \
    | |_) | | | (_) >  <| | | | | | (_) >  <   \ V /  __/
    | .__/|_|  \___/_/\_\_| |_| |_|\___/_/\_\   \_/ \___|
    | |
    |_|

EOF
cat >>/etc/issue <<EOF
    https://$ip:8006/
    https://$fqdn:8006/

EOF
ifup vmbr0
# iptables-save # show current rules.
killall agetty || true # force the gettys to re-display the issue file.

# set up more storage
mkfs.ext4 /dev/vdb
mkdir /vmspace
echo /dev/vdb /vmspace ext4 defaults 0 0 >> /etc/fstab
mount /vmspace
# pvesh create /storage -storage vmspace -type dir -content rootdir,images -path /vmspace
 
# pvesh create /storage -storage vmspace -type dir -content rootdir,images -path /vmspace

So if this gets executed, no GUI runs? What storage is located under /vmspace? Could it be that it is already externally mounted (e.g. by fstab)? If so, you could use the "is_mountpoint" option to stop PVE from attempting the mount itself. It shouldn't really affect the GUI, but it's the only thing I can think of for now.
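
For illustration, a minimal sketch of how that could look, assuming the directory storage's "is_mountpoint" parameter can be set when the storage is created (adapted from the pvesh call in this thread, not verified here):

Code:
# hedged sketch: register /vmspace but mark it as an externally managed
# mountpoint, so PVE only treats the storage as available when something
# is actually mounted there
pvesh create /storage -storage vmspace -type dir -content rootdir,images \
    -path /vmspace -is_mountpoint yes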

Can you get the status of the respective services:

Code:
systemctl list-units --failed
systemctl status pveproxy.service pvedaemon.service
 
# pvesh create /storage -storage vmspace -type dir -content rootdir,images -path /vmspace

If that gets executed after the Proxmox VM is up, the GUI works.
If it is run as part of the provision script, the GUI will let you log in,
but node1 has the red circle with an x in it. In either case, you'll
get '200 OK'.

It seems like it somehow kills the cluster filesystem. Adding a 5-second sleep
before calling "pvesh create /storage" lets it run without killing pve-cluster.service.
Maybe it's a timing bug?
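
If it is a timing issue, waiting for pmxcfs instead of a fixed sleep might be a cleaner workaround; a sketch (assuming the script runs as root and that /etc/pve/local only appears once the cluster filesystem is mounted):

Code:
# hedged sketch: wait until pve-cluster is active and /etc/pve is populated
until systemctl is-active --quiet pve-cluster && [ -e /etc/pve/local ]; do
    sleep 1
done
pvesh create /storage -storage vmspace -type dir -content rootdir,images -path /vmspace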

Code:
# systemctl list-units --failed
  UNIT                LOAD   ACTIVE SUB    DESCRIPTION                                   
● pve-cluster.service loaded failed failed The Proxmox VE cluster filesystem             
● smartd.service      loaded failed failed Self Monitoring and Reporting Technology (SMART
● zfs-mount.service   loaded failed failed Mount ZFS filesystems                         
● zfs-share.service   loaded failed failed ZFS file system shares                         

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

4 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

Code:
● pveproxy.service - PVE API Proxy Server
   Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2017-06-06 14:10:25 WEST; 13min ago
  Process: 1457 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
 Main PID: 1532 (pveproxy)
    Tasks: 4 (limit: 4915)
   CGroup: /system.slice/pveproxy.service
           ├─1532 pveproxy
           ├─1533 pveproxy worker
           ├─1534 pveproxy worker
           └─1535 pveproxy worker

Jun 06 14:13:48 node1 pveproxy[1534]: ipcc_send_rec failed: Connection refused
Jun 06 14:13:50 node1 pveproxy[1535]: ipcc_send_rec failed: Connection refused
Jun 06 14:13:50 node1 pveproxy[1535]: ipcc_send_rec failed: Connection refused
Jun 06 14:13:50 node1 pveproxy[1535]: ipcc_send_rec failed: Connection refused
Jun 06 14:13:53 node1 pveproxy[1534]: ipcc_send_rec failed: Connection refused
Jun 06 14:13:53 node1 pveproxy[1534]: ipcc_send_rec failed: Connection refused
Jun 06 14:13:53 node1 pveproxy[1534]: ipcc_send_rec failed: Connection refused
Jun 06 14:13:53 node1 pveproxy[1533]: ipcc_send_rec failed: Transport endpoint is not conn
Jun 06 14:13:53 node1 pveproxy[1533]: ipcc_send_rec failed: Connection refused
Jun 06 14:13:53 node1 pveproxy[1533]: ipcc_send_rec failed: Connection refused

● pvedaemon.service - PVE API Daemon
   Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2017-06-06 14:10:24 WEST; 13min ago
  Process: 1280 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
 Main PID: 1453 (pvedaemon)
    Tasks: 4 (limit: 4915)
   CGroup: /system.slice/pvedaemon.service
           ├─1453 pvedaemon
           ├─1454 pvedaemon worker
           ├─1455 pvedaemon worker
           └─1456 pvedaemon worker

Jun 06 14:10:23 pve systemd[1]: Starting PVE API Daemon...
Jun 06 14:10:24 node1 pvedaemon[1453]: starting server
Jun 06 14:10:24 node1 pvedaemon[1453]: starting 3 worker(s)
Jun 06 14:10:24 node1 pvedaemon[1453]: worker 1454 started
Jun 06 14:10:24 node1 pvedaemon[1453]: worker 1455 started
Jun 06 14:10:24 node1 pvedaemon[1453]: worker 1456 started
Jun 06 14:10:24 node1 systemd[1]: Started PVE API Daemon.
Jun 06 14:11:11 node1 pvedaemon[1454]: <root@pam> successful auth for user 'root@pam'
 

Attachments

  • proxmox-gui-fail.png
What does "journalctl -u 'pve-cluster.service'" say in that state?
 
Now I can't get it back into the failed state, which kind of worries me. I hope it's because something was fixed.

Thanks for the fast responses! I'll try that and revisit this thread if it happens again.
 
If you still have a PVE node set up where this happened, you could look in the syslog around the command execution time; it would be interesting to see why pve-cluster failed.
 
I wish I could, but those were Vagrant runs, long gone now. I tried re-running Vagrant under the conditions that caused the problems, but they no longer do. If it helps: today it all worked, while yesterday I could reliably cause the failures.

I did not update the Proxmox image in that time, and none of the Vagrant provisioning steps call for an update. Even the versions of Vagrant and the Vagrant plugins are the same, so I can't think of what else to try.

A week ago, a bug with the same symptoms was also triggered by building the Vagrant box with a larger root disk, but I just tried that too, and no bug.
 
OK, that seems a bit strange, but otherwise it probably wouldn't be IT-related.
Thanks for your reproduction attempts, they are appreciated! Let's hope all stays good; otherwise, just post the log from a problematic run here.
 
It happened again while trying to get Ceph running. I'll try to make the Vagrant run reproducible, but here's the requested output:

Code:
root@node1:/home/vagrant# systemctl list-units --failed
  UNIT                LOAD   ACTIVE SUB    DESCRIPTION                                   
● pve-cluster.service loaded failed failed The Proxmox VE cluster filesystem             
● smartd.service      loaded failed failed Self Monitoring and Reporting Technology (SMART
● zfs-mount.service   loaded failed failed Mount ZFS filesystems                         
● zfs-share.service   loaded failed failed ZFS file system shares                         

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

4 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
root@node1:/home/vagrant# journalctl -u 'pve-cluster.service'
-- Logs begin at Wed 2017-06-07 15:39:44 WEST, end at Wed 2017-06-07 16:04:09 WEST. --
Jun 07 15:39:49 pve systemd[1]: Starting The Proxmox VE cluster filesystem...
Jun 07 15:39:50 pve systemd[1]: Started The Proxmox VE cluster filesystem.
Jun 07 16:01:46 node1 systemd[1]: pve-cluster.service: Main process exited, code=killed, s
Jun 07 16:01:46 node1 systemd[1]: pve-cluster.service: Unit entered failed state.
Jun 07 16:01:46 node1 systemd[1]: pve-cluster.service: Failed with result 'signal'.

To get there, I ran:
Code:
pveceph install
pveceph init --network 10.12.13.0/24
vim /etc/ceph/ceph.conf (to add "osd crush chooseleaf type = 0" to globals)
pveceph createmon
pveceph createosd /dev/vdb
pveceph createosd /dev/vdc
pveceph createosd /dev/vdd

Then I tried to log in to the GUI. It let me log in, then gave me the same page as the screenshot above.

The crush line is there to allow a single-node Ceph cluster, because this VM is meant to test code that might interact with Ceph on Proxmox; it doesn't have to be redundant for real.
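
If it helps make the run reproducible, the vim step could be replaced with a non-interactive edit; a sketch (assumes GNU sed and that pveceph init has already written the [global] section):

Code:
# hedged sketch: append the single-node crush rule right after [global]
sed -i '/^\[global\]/a osd crush chooseleaf type = 0' /etc/ceph/ceph.conf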
 
Could you check the journal around 16:01:46? Maybe the OOM killer got triggered?
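
For example (just a sketch; the timestamps are taken from your journal output above):

Code:
# everything logged around the time pve-cluster died
journalctl --since '2017-06-07 16:01:00' --until '2017-06-07 16:02:30'
# the kernel log mentions the OOM killer explicitly if it ran
journalctl -k | grep -i 'out of memory'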
 
It happened again. This time I'm keeping the Vagrant session running, so you can ask me to do more diagnostics.

After vagrant up, I ran "pvesh get /storage" and "ls /etc/pve" to make sure those worked, then tried to log in with the web UI. It let me in, but with the node error (the red circle with an x). SSHing back in, the cluster service was down. Here are the logs and output of all these steps.

The output of "vagrant up" and the full journalctl from when it happened were too long to post, so they're attached as separate files.

Code:
[pixel@hivecluster pveceph]$ vagrant ssh
Linux node1 4.10.11-1-pve #1 SMP PVE 4.10.11-9 (Mon, 22 May 2017 09:59:55 +0200) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
vagrant@node1:~$ sudo -sH
root@node1:/home/vagrant# pvesh get /storage
200 OK
[
   {
      "content" : "images,rootdir",
      "digest" : "5c6c534c39ce2c609ce370dc9306e1dfd9f32891",
      "krbd" : 1,
      "monhost" : "10.12.13.21",
      "pool" : "rbd",
      "shared" : 1,
      "storage" : "rbd",
      "type" : "rbd",
      "username" : "admin"
   },
   {
      "content" : "vztmpl,iso,backup",
      "digest" : "5c6c534c39ce2c609ce370dc9306e1dfd9f32891",
      "path" : "/var/lib/vz",
      "storage" : "local",
      "type" : "dir"
   },
   {
      "content" : "images,rootdir",
      "digest" : "5c6c534c39ce2c609ce370dc9306e1dfd9f32891",
      "storage" : "local-lvm",
      "thinpool" : "data",
      "type" : "lvmthin",
      "vgname" : "pve"
   }
]
root@node1:/home/vagrant# exit
exit
vagrant@node1:~$ sudo -sH
root@node1:/home/vagrant# ls /etc/pve
authkey.pub    local  openvz        pve-www.key  user.cfg
ceph.conf    lxc    priv        qemu-server  vzdump.cron
datacenter.cfg    nodes  pve-root-ca.pem    storage.cfg

At this point, I logged in with the browser, got the red x, and SSHed back in.

Code:
root@node1:/home/vagrant# ls /etc/pve
ls: cannot access '/etc/pve': Transport endpoint is not connected
root@node1:/home/vagrant# free -m
              total        used        free      shared  buff/cache   available
Mem:           3949         745        2463          29         740        2944
Swap:          4095           0        4095
root@node1:/home/vagrant# date
Sat Jun 10 07:41:20 WEST 2017
root@node1:/home/vagrant# journalctl -u pve-cluster.service
-- Logs begin at Sat 2017-06-10 07:35:42 WEST, end at Sat 2017-06-10 07:42:08 WE
Jun 10 07:35:47 pve systemd[1]: Starting The Proxmox VE cluster filesystem...
Jun 10 07:35:48 pve systemd[1]: Started The Proxmox VE cluster filesystem.
Jun 10 07:38:57 node1 systemd[1]: pve-cluster.service: Main process exited, code
Jun 10 07:38:57 node1 systemd[1]: pve-cluster.service: Unit entered failed state
Jun 10 07:38:57 node1 systemd[1]: pve-cluster.service: Failed with result 'signa

Code:
Jun 10 07:37:14 node1 sudo[7631]:  vagrant : TTY=pts/0 ; PWD=/home/vagrant ; USER=root ; COMMAND=/bin/bash
Jun 10 07:37:14 node1 sudo[7631]: pam_unix(sudo:session): session opened for user root by vagrant(uid=0)
Jun 10 07:37:35 node1 sudo[7631]: pam_unix(sudo:session): session closed for user root
Jun 10 07:37:38 node1 sudo[7708]:  vagrant : TTY=pts/0 ; PWD=/home/vagrant ; USER=root ; COMMAND=/bin/bash
Jun 10 07:37:38 node1 sudo[7708]: pam_unix(sudo:session): session opened for user root by vagrant(uid=0)
Jun 10 07:38:56 node1 pvedaemon[1358]: <root@pam> successful auth for user 'root@pam'
Jun 10 07:38:57 node1 kernel: server[1170]: segfault at 18 ip 000055f2e09baa55 sp 00007fb57d512a90 error 4 in pmxcfs[55f2e09ae000+2b000]
Jun 10 07:38:57 node1 systemd[1]: pve-cluster.service: Main process exited, code=killed, status=11/SEGV
Jun 10 07:38:57 node1 systemd[1]: pve-cluster.service: Unit entered failed state.
Jun 10 07:38:57 node1 systemd[1]: pve-cluster.service: Failed with result 'signal'.
Jun 10 07:38:58 node1 pvestatd[1306]: ipcc_send_rec failed: Transport endpoint is not connected
Jun 10 07:38:58 node1 pvestatd[1306]: ipcc_send_rec failed: Connection refused

Code:
root@node1:~# dpkg -l | grep -i prox
ii  libpve-access-control                5.0-4                          amd64        Proxmox VE access control library
ii  libpve-common-perl                   5.0-12                         all          Proxmox VE base library
ii  libpve-guest-common-perl             2.0-1                          all          Proxmox VE common guest-related modules
ii  libpve-http-server-perl              2.0-4                          all          Proxmox Asynchrounous HTTP Server Implementation
ii  libpve-storage-perl                  5.0-3                          all          Proxmox VE storage management library
ii  proxmox-ve                           5.0-9                          all          The Proxmox Virtual Environment
ii  pve-cluster                          5.0-7                          amd64        Cluster Infrastructure for Proxmox Virtual Environment
ii  pve-container                        2.0-6                          all          Proxmox VE Container management tool
ii  pve-docs                             5.0-1                          all          Proxmox VE Documentation
ii  pve-firewall                         3.0-1                          amd64        Proxmox VE Firewall
ii  pve-ha-manager                       2.0-1                          amd64        Proxmox VE HA Manager
ii  pve-kernel-4.10.11-1-pve             4.10.11-9                      amd64        The Proxmox PVE Kernel Image
ii  pve-manager                          5.0-10                         amd64        Proxmox Virtual Environment Management Tools

The Proxmox Vagrant box finished building on Jun 7th at 4:20 am Pacific time (Los Angeles time zone). As of now:

Code:
root@node1:/home/vagrant# apt update
Hit:1 http://security.debian.org stretch/updates InRelease
Hit:2 http://ftp.nl.debian.org/debian stretch InRelease                                                            
Hit:3 http://download.proxmox.com/debian/ceph-luminous stretch InRelease                
Hit:4 http://download.proxmox.com/debian stretch InRelease
Reading package lists... Done
Building dependency tree      
Reading state information... Done
12 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@node1:/home/vagrant# apt list --upgradeable
Listing... Done
libcephfs1/testing 10.2.5-7.2 amd64 [upgradable from: 10.2.5-7]
libnss3/testing 2:3.26.2-1.1 amd64 [upgradable from: 2:3.26.2-1]
libpam-systemd/testing 232-25 amd64 [upgradable from: 232-24]
libssl1.0.2/testing 1.0.2l-2 amd64 [upgradable from: 1.0.2k-1]
libssl1.1/testing 1.1.0f-3 amd64 [upgradable from: 1.1.0e-2]
libsystemd0/testing 232-25 amd64 [upgradable from: 232-24]
libudev1/testing 232-25 amd64 [upgradable from: 232-24]
linux-libc-dev/testing 4.9.30-1 amd64 [upgradable from: 4.9.25-1]
openssl/testing 1.1.0f-3 amd64 [upgradable from: 1.1.0e-2]
systemd/testing 232-25 amd64 [upgradable from: 232-24]
systemd-sysv/testing 232-25 amd64 [upgradable from: 232-24]
udev/testing 232-25 amd64 [upgradable from: 232-24]
 

Attachments

  • vagrantup.log
  • jounrnalctl-when.log
I'll see if I can reproduce this here..
 
Here's the Vagrant setup I have running. If you build the Proxmox box yourself, remember to check out the pve-5 branch. This one didn't fail until I tried to use the web UI; then the cluster service was killed and /etc/pve was gone.

Here is the Packer file so you don't have to scroll back to find it: https://github.com/rgl/proxmox-ve/tree/pve-5
Note that it's not the master branch.

I built it with a disk_size of 50G, not that I see how that could make a difference.

I tried looking at tasks in /var/log/pve but only saw successful, completed tasks. Is there a way to log all attempted tasks before they complete?
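
My guess (untested) is that running tasks show up in these two places before they finish, but I may be looking in the wrong spot:

Code:
# the active task index, kept separately from the per-task logs
cat /var/log/pve/tasks/active
# the API task list also includes tasks that are still running
pvesh get /nodes/$(hostname)/tasks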
 

Attachments

  • pveceph.zip
So the problem seems to be that your provisioning script / Vagrant changes the hostname, but does not correctly update the /etc/hosts file and does not restart the services after changing it:

Code:
# ls -R1 /etc/pve/nodes/
/etc/pve/nodes/:
node1
pve

/etc/pve/nodes/node1:
lrm_status
lxc
qemu-server

/etc/pve/nodes/node1/lxc:

/etc/pve/nodes/node1/qemu-server:

/etc/pve/nodes/pve:
lrm_status
lxc
openvz
priv
pve-ssl.key
pve-ssl.pem
qemu-server

/etc/pve/nodes/pve/lxc:

/etc/pve/nodes/pve/openvz:

/etc/pve/nodes/pve/priv:

/etc/pve/nodes/pve/qemu-server:

the pmxcfs thinks it is running on the node "pve":
Code:
# readlink /etc/pve/local
nodes/pve

But when you access the GUI, it requests the task list, which gets passed to the cluster filesystem; that crashes because the request makes pmxcfs think it's clustered while it isn't:
Code:
Thread 3 "server" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f87592d9700 (LWP 1328)]
0x000055e8fdd6ba55 in cfs_create_status_msg (str=0x55e8ff9d2000,
    nodename=nodename@entry=0x7f87421f0274 "node1", key=key@entry=0x7f87421f0174 "tasklist")
    at status.c:1303
1303                    if ((clnode = g_hash_table_lookup(cfs_status.clinfo->nodes_byname, nodename)))
(gdb) print cfs_status
$1 = {start_time = 1497349921, quorate = 1, clinfo = 0x0, clinfo_version = 0, vmlist = 0x55e8ff9ac000,
  vmlist_version = 1, kvstore = 0x0, kvhash = 0x55e8ff98dde0, rrdhash = 0x55e8ff98de40,
  iphash = 0x55e8ff98dea0, memdb_changes = 0x55e8ff98df00, clusterlog = 0x55e8ff99bcf0}
(gdb) print cfs
$2 = {nodename = 0x55e8ff99c130 "pve", ip = 0x55e8ff99c550 "10.0.2.15", gid = 33, debug = 0}
(gdb)

The same behaviour can also be triggered directly with "pvesh get /cluster/tasks".

While this should be handled more gracefully in pmxcfs, the root cause is the way the Packer / Vagrant scripts set up the hostname and pve-cluster. Note that if you (for example) put the correct info into /etc/hosts and restart pve-cluster (before logging in to the web interface), it seems to work (added to the end of provision.sh):
Code:
hn=$(hostname)
sed -i -e "s/^.*${hn}.*$//" /etc/hosts
sed -i -e "s/pve.example.com pve pvelocalhost$/${fqdn} ${hn} pvelocalhost/" /etc/hosts
systemctl restart pvedaemon pveproxy pve-cluster

Now this does not result in a segfault, but returns the task list (note that the oldest tasks, at the bottom, ran on node "pve" as configured by the initial Packer build, and the rest on "node1" once Vagrant provisioning takes over):
Code:
vagrant@node1:~$ sudo pvesh get /cluster/tasks
200 OK
[
   {
      "endtime" : 1497357103,
      "id" : "vdd",
      "node" : "node1",
      "saved" : "1",
      "starttime" : "1497357094",
      "status" : "OK",
      "type" : "cephcreateosd",
      "upid" : "UPID:node1:000015A0:0000286F:593FDB26:cephcreateosd:vdd:root@pam:",
      "user" : "root@pam"
   },
   {
      "endtime" : 1497357094,
      "id" : "vdc",
      "node" : "node1",
      "saved" : "1",
      "starttime" : "1497357086",
      "status" : "OK",
      "type" : "cephcreateosd",
      "upid" : "UPID:node1:00001174:00002508:593FDB1E:cephcreateosd:vdc:root@pam:",
      "user" : "root@pam"
   },
   {
      "endtime" : 1497357085,
      "id" : "vdb",
      "node" : "node1",
      "saved" : "1",
      "starttime" : "1497357077",
      "status" : "OK",
      "type" : "cephcreateosd",
      "upid" : "UPID:node1:00000EDC:000021A2:593FDB15:cephcreateosd:vdb:root@pam:",
      "user" : "root@pam"
   },
   {
      "endtime" : 1497357077,
      "id" : "mon.0",
      "node" : "node1",
      "saved" : "1",
      "starttime" : "1497357075",
      "status" : "OK",
      "type" : "cephcreatemon",
      "upid" : "UPID:node1:00000E41:000020C3:593FDB13:cephcreatemon:mon.0:root@pam:",
      "user" : "root@pam"
   },
   {
      "endtime" : 1497357003,
      "id" : "",
      "node" : "node1",
      "saved" : "1",
      "starttime" : "1497357003",
      "status" : "OK",
      "type" : "startall",
      "upid" : "UPID:node1:000005D0:000004C4:593FDACB:startall::root@pam:",
      "user" : "root@pam"
   },
   {
      "endtime" : 1497340394,
      "id" : "",
      "node" : "pve",
      "saved" : "1",
      "starttime" : "1497340394",
      "status" : "OK",
      "type" : "stopall",
      "upid" : "UPID:pve:00000A62:00000CAD:593F99EA:stopall::root@pam:",
      "user" : "root@pam"
   },
   {
      "endtime" : 1497340368,
      "id" : "",
      "node" : "pve",
      "saved" : "1",
      "starttime" : "1497340368",
      "status" : "OK",
      "type" : "startall",
      "upid" : "UPID:pve:000003D5:00000276:593F99D0:startall::root@pam:",
      "user" : "root@pam"
   },
   {
      "endtime" : 1497340346,
      "id" : "",
      "node" : "pve",
      "saved" : "1",
      "starttime" : "1497340346",
      "status" : "OK",
      "type" : "stopall",
      "upid" : "UPID:pve:00005B49:000042DB:593F99BA:stopall::root@pam:",
      "user" : "root@pam"
   },
   {
      "endtime" : 1497340182,
      "id" : "",
      "node" : "pve",
      "saved" : "1",
      "starttime" : "1497340182",
      "status" : "OK",
      "type" : "startall",
      "upid" : "UPID:pve:000003BE:000002DA:593F9916:startall::root@pam:",
      "user" : "root@pam"
   }
]

I haven't tested whether Vagrant reverts the hosts file though - leaving that up to you ;)
 
Thanks!

I noticed that in the case of this Vagrantfile, the hosts line reads:

10.12.13.21 node1 node1 pvelocalhost

Could this cause any problems later on? The Vagrantfile could be modified if so.
Regarding Vagrant recreating /etc/hosts, that's not a problem; the hostmanager plugin can be disabled.

But there is still a problem: "ls /etc/pve" and "pvesh get /cluster/tasks" work, but sometimes the web GUI gives the red node1 error. This is reliable when ceph.bash is commented out of the Vagrantfile, and it happened once out of the four times I tried with ceph.bash included.

Another problem that was reliable with ceph.bash commented out was the browser reporting a new certificate reusing an old serial number. I'm including a pic of this.

Here's the updated Vagrantfile.
 

Attachments

  • pveceph.zip
  • ff-fail.png
See https://pve.proxmox.com/wiki/Instal...d_an_.2Fetc.2Fhosts_entry_for_your_IP_address for how the hosts file should look.
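
For example, a sketch of the entry for this node (the domain is a placeholder; use whatever "hostname --fqdn" reports):

Code:
# /etc/hosts: IP, then the FQDN, then the short hostname
10.12.13.21 node1.example.com node1 pvelocalhost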

A red node usually means that some storage is either too slow or unavailable; you can check the log (journalctl -b -u pvestatd) for messages like "pvestatd[2529]: status update time (9.584 seconds)".
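
For example (just a convenience one-liner for the check described above):

Code:
# look for slow status updates from pvestatd in the current boot's journal
journalctl -b -u pvestatd | grep 'status update time'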

The certificate issue is because the initial CA certificate is generated when building the template and is thus shared between all boxes provisioned from it. It should probably be deleted there so that each newly provisioned box gets a new one (same for the SSH server keys; both should be automatically generated on boot if not available).
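
A sketch of what the end of the template build could remove so that fresh ones are generated (paths assumed from a standard PVE installation, not tested here):

Code:
# CA and node certificate created during the template build
rm -f /etc/pve/pve-root-ca.pem /etc/pve/priv/pve-root-ca.key
rm -f /etc/pve/local/pve-ssl.pem /etc/pve/local/pve-ssl.key
# SSH host keys created during the template build
rm -f /etc/ssh/ssh_host_*_key /etc/ssh/ssh_host_*_key.pub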
 
