I borked it..

J-dub

Sooo..

I added another LXC on the host (NAS). Not an issue, until I decided to give the new LXC access to an NFS share that is hosted on another LXC.

In my infinite wisdom I decided it would be easiest to just add the share to the host (bare-metal) and add a new mp0 to the new LXC.

I told the NAS (host) to reboot and went off to bed... (this is the best foreshadowing I could think of).

I woke up to no host LAN connectivity (only the BMC interface works) and my KVM shows a super helpful all black screen. Looking at the IPMI SEL I see another epic hint of the issue "CPU_CATERR sensor of type processor logged a IERR". :rolleyes:

A forced reboot and selecting PVE recovery mode gets me past the all black screen and instead hangs out forever at:
[Attached image: CaptureScreen.jpeg]

Yay.

Time for a live CD boot? Options/Comments/Commiserations?
 
I should add - I assume the issue is that the host is now waiting for an LXC-hosted NFS share (that won't be available until the host is fully booted...). I'd fix that, but I can't get the kernel to boot, so I can't get to the terminal. I don't "think" the Mellanox not starting (error in image) is the real issue.
 
> and add a new mp0 to the new LXC
What's an "mp0"?

> Time for a live CD boot?
Yeah, if you have one handy that should work. I'd personally start that up, mount the PVE filesystems then disable some of the remote connectivity pieces (nfs maybe?) such that you'll be able to reboot into a functional system and then investigate and undo whatever went wrong.
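
For anyone doing this from a live CD, here's a minimal sketch of one thing to look for once the PVE root filesystem is mounted. It assumes the share was added through /etc/fstab (the thread doesn't say how it was added), and the sample entries are made up. It just flags NFS mounts that lack `nofail`, the mount option that lets boot carry on when a mount can't be reached.

```python
def risky_nfs_mounts(fstab_text):
    """Return mount points of NFS entries that lack the 'nofail' option."""
    risky = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("nfs", "nfs4") and "nofail" not in options.split(","):
            risky.append(mount_point)
    return risky


# Hypothetical fstab contents, for illustration only.
sample = """\
UUID=abcd-1234 / ext4 errors=remount-ro 0 1
192.168.1.50:/export/media /mnt/media nfs defaults 0 0
192.168.1.50:/export/backup /mnt/backup nfs4 defaults,nofail,_netdev 0 0
"""

print(risky_nfs_mounts(sample))
```

On a real system you'd feed it the contents of the mounted fstab instead of `sample`, then comment out or add `nofail` to whatever it reports.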

If a live CD boot isn't really a go-er, you can probably just restart the PVE install and adjust the grub boot line so it includes init=/bin/bash. That'll run bash directly instead of doing a full system boot, which can let you investigate things (after remounting the / filesystem read-write generally).

At least, that's an old trick to get into Linux systems when you've inadvertently locked yourself out somehow (eg forgot password). Haven't tried it on a ZFS based Proxmox system (yet) to see if it'll work there too.
 
Update:

Using the GRUB menu "E" option, changing ro to rw and appending init=/bin/bash - very cool trick indeed!
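
To spell out the edit described above, here's a small sketch of what happens to the `linux` line in the GRUB editor. The kernel version and root device in the example are hypothetical, so the line on a real box will look different.

```python
def rescue_cmdline(linux_line):
    """Swap 'ro' for 'rw' and append init=/bin/bash to a GRUB 'linux' line."""
    words = linux_line.split()
    # Mount the root filesystem read-write instead of read-only.
    words = ["rw" if word == "ro" else word for word in words]
    # Tell the kernel to start bash instead of the normal init system.
    if "init=/bin/bash" not in words:
        words.append("init=/bin/bash")
    return " ".join(words)


# Hypothetical boot entry, for illustration only.
example = "linux /boot/vmlinuz-6.8.4-2-pve root=/dev/mapper/pve-root ro quiet"
print(rescue_cmdline(example))
```

Since no init system is running in that shell, a plain reboot may not work afterwards; `sync` followed by `reboot -f` is the usual way out.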

I was able to start networking, use apt, install doom, etc. Good times!
I also found out I was missing all kinds of files PVE needs. I think they got removed as part of a Champagne-induced Mellanox install frenzy I went on, trying to get a used NIC (that was ultimately missing too many caps) to work. I vaguely remember PVE saying something about a script keeping it from getting removed. It was all a blur with my "apt install everything-from-mellanox/unstable-sloppy-backports -y" command. I learned a valuable lesson today on the power of "Yes".

So I decided to do a fresh install and try out this fun new trend, "ZFS as the base install" (Kudos to the Proxmox devs - that was super simple!). Now I have two SSDs to mirror for a read cache instead of OS duties! Oh the joys of learning the hard way - every step of the way. There was a way to avoid doing that.. something, something RTFM, something.. I can't remember.

Anywho thanks for the new-to-me-toy (init=/bin/bash) @justinclift - it's a good tool!

In fact, I don't think I need backups anymore, just GRUB and a Bash shell.. YOLO!!!
(gross - I just got Paris Hilton 2010 vibes "That's Hawt").

Heck - I might even try pineapple on my pizza!
 
Oh and offsite backup for the old NAS files? A short 16 days estimated download time... yay me! :oops:
 
> offsite backup for the old NAS files? A short 16 days estimated download time

Yah, this is why I do NAS and multiple copies - cloud backup is next to useless unless you have gigabit fiber and no data caps.
 
This is the NAS - I also have a 3-node HCI/HA cluster that backs up to this NAS. My no-data-cap gigabit internet connection is used for the offsite backup, and that is going to take 16 days.. it's the cheap cold storage provider in Germany that is slow. I might just make all new backups.. but I kinda wanted the originals, and to test the "SHTF" scenario.
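
For a feel of where a number like 16 days comes from, here's a rough back-of-the-envelope sketch. The backup size and transfer rate below are made up (the thread gives neither); the point is only that the slowest link sets the pace, whatever the local connection can do.

```python
def transfer_days(size_tb, rate_mbit_s):
    """Estimate days needed to move size_tb terabytes at rate_mbit_s megabits/s."""
    bits = size_tb * 1e12 * 8            # decimal terabytes -> bits
    seconds = bits / (rate_mbit_s * 1e6)  # megabits/s -> bits/s
    return seconds / 86400                # seconds -> days


# Hypothetical figures: 10 TB restored at a sustained 60 Mbit/s vs. 1 Gbit/s.
print(round(transfer_days(10, 60), 1))    # 15.4
print(round(transfer_days(10, 1000), 1))  # 0.9
```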
 
@J-dub Cool that you're experimenting with the ZFS side of things. :cool:

Is this a case of being new to ZFS completely, or just new to using ZFS for the boot drives?

Asking because if you're new to ZFS completely then you might not have yet discovered the power of snapshots, then backups based upon them.

They're a super efficient way of doing backups, as it just transfers the blocks that changed between the start and end snapshots you give it (i.e. rpool/stuff@2024.06.10-snapshot1 -> rpool/stuff@2024.06.11-snapshot2).

Your remote backup provider needs to be able to accept them though. rsync.net can, and I've been meaning to try them out specifically for that purpose at some point. Maybe in a few weeks. :)
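
As a rough sketch of what that looks like in practice, the helper below just builds the command line for an incremental send between the two example snapshots. The remote host and target dataset are hypothetical placeholders, and it assumes the remote side can run `zfs receive` over SSH and already holds the older snapshot.

```python
import shlex


def incremental_send(old_snap, new_snap, remote, target):
    """Build a 'zfs send -i ... | ssh ... zfs receive ...' pipeline as a string."""
    # -i sends only the blocks that changed between old_snap and new_snap.
    send = ["zfs", "send", "-i", old_snap, new_snap]
    recv = ["ssh", remote, "zfs", "receive", target]
    return (
        " ".join(shlex.quote(arg) for arg in send)
        + " | "
        + " ".join(shlex.quote(arg) for arg in recv)
    )


print(
    incremental_send(
        "rpool/stuff@2024.06.10-snapshot1",
        "rpool/stuff@2024.06.11-snapshot2",
        "backup@backup.example.com",  # hypothetical remote host
        "backuppool/stuff",           # hypothetical target dataset
    )
)
```

It only prints the pipeline rather than running it, so it's safe to try on a machine without ZFS.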
 
I'd not used Linux until 6 months ago, and 2 months ago I decided to give this ZFS "thing" a shot. I initially had the OS (PVE) on a BTRFS mirror of Micron 7450 980GB drives and a SLOG (or Log as PVE calls it?) on an Intel Optane SSD U.2 DC P5800X 400GB.
Snapshots (incremental with deduplication even!) I've heard of before (Storagecraft (Windows) and BTRFS/Snapper (Arch desktop last month)) and I vaguely remember some post that mentioned ZFS snapshots - totally forgot about them until you mentioned them!

I now have the Optane partitioned for the "log" (SLOG/ZIL) and "cache" (L2ARC), and the mirrored Microns are for "special" (metadata) and small files up to 32k... we'll see how useful any of that really is. It does transfer faster from my old NAS than it did with the original ZFS setup. I've not tested the 100Gb/s Ceph network to it yet (still learning there too).
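
For readers who haven't set up support vdevs before, a layout like this is usually built with a handful of `zpool add` calls plus one dataset property. The sketch below only prints the commands; the pool name and device paths are hypothetical, not the ones used here. Note that `special_small_blocks` works on block size, so it catches small blocks rather than strictly small files.

```python
def support_vdev_commands(pool, log_dev, cache_dev, special_devs, small_blocks="32K"):
    """Return the commands that add log, cache and special vdevs to a pool."""
    return [
        f"zpool add {pool} log {log_dev}",      # SLOG for synchronous writes
        f"zpool add {pool} cache {cache_dev}",  # L2ARC read cache
        # Mirror the special vdev: losing it means losing the pool.
        f"zpool add {pool} special mirror {' '.join(special_devs)}",
        # Blocks at or below this size also go to the special vdev.
        f"zfs set special_small_blocks={small_blocks} {pool}",
    ]


# Hypothetical pool and device names, for illustration only.
for command in support_vdev_commands(
    "tank",
    "/dev/disk/by-id/optane-part1",
    "/dev/disk/by-id/optane-part2",
    ["/dev/disk/by-id/micron-a", "/dev/disk/by-id/micron-b"],
):
    print(command)
```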

I'm using Hetzner right now because it's cheaper than anything else I've found.

I'm just a nerd with new toys, ADHD and a Youtube subscription, so here I am, failing-until-success!
 
Cool, sounds like you're experimenting the right way then. :)

> I'm using Hetzner right now because it's cheaper than anything else I've found.

Yeah, Hetzner's good for a bunch of stuff. We have some dedicated servers with them that we've been using for probably ~2 years now.

We're trialing a new approach now though, of putting servers we own (running Proxmox) in a local data center and having everything hosted there.

It looks like it'll be a lot more cost effective than even Hetzner, as Hetzner starts to get substantially pricey when you customize the specs and start getting into large memory/ssd configurations.
 
> ... and a SLOG (or Log as PVE calls it?) on a Intel Optane SSD U.2 DC P5800X 400GB.

That sounds like you've optimised it for handling synchronous writes. That's the workload you're using it for yeah? :)

Guessing you've read through this already?

https://arstechnica.com/information...01-understanding-zfs-storage-and-performance/

It's just a basic primer for ZFS, but it explains the building blocks well. It kind of sounds like you're well past that already, but I'm mentioning it just in case. :)
 
Looks like a good read! I've dabbled a bit in many sources of info. Brain - Is - Numb o_O

I have indeed set it up for synchronous writes!

I'm now just trying to get network speed up as much as possible on the 8 HDDs in ZFS RAID 10 (mirror vdevs). All for backups, NFS shares and the ability to host a VM from a backup image if needed. Should be 90% boring inline writes... unless things crash - then 90% reads lol
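
As a quick sanity check on what that layout gives up for redundancy, here's a small sketch of the capacity arithmetic for striped mirrors. The 16 TB disk size is a made-up figure (the thread doesn't say), and the result ignores ZFS overhead and recommended free space.

```python
def striped_mirror_usable_tb(disk_count, disk_tb, mirror_width=2):
    """Rough usable capacity of a pool built from equally sized mirror vdevs."""
    if disk_count % mirror_width:
        raise ValueError("disk_count must be a multiple of mirror_width")
    vdevs = disk_count // mirror_width
    # Each mirror vdev contributes the capacity of a single disk.
    return vdevs * disk_tb


# 8 disks as 4 two-way mirrors, assuming hypothetical 16 TB drives.
print(striped_mirror_usable_tb(8, 16))  # 64
```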
 