Zimbra 5.0.16 vanishing messages on Ubuntu 8.04 standard container with software RAID

Cavelle

New Member
May 29, 2009
Hi,

I am having bizarre problems with Zimbra on Proxmox. For about a minute, a user's inbox will show, for example, 12 unread messages. During this time, all messages from a whole period (from about the time of the Proxmox conversion to about 48 hours ago) are completely missing from the system. Yet a minute or two later, there may suddenly be 150 unread messages: all the messages from the missing period. The messages disappear and reappear apparently entirely on their own, regardless of whether the Ajax or standard client is used, and regardless of the browser (Firefox or IE).

Occasionally, clicking on one message shows a different message in the reading pane. Also occasionally, when selecting a message for deletion or viewing, the error message 'this conversation thread is unknown' appears. These errors appear to occur while messages are 'disappearing', because taking the same action after they reappear does not produce the error.

The reason I am posting these symptoms in the Proxmox forum is because they arose immediately after migrating the Zimbra server to Proxmox. It had been running without any problems for about 18 months prior to the conversion.

It therefore seems possible that the Proxmox configuration/conversion may be involved. Unfortunately, Zimbra itself was also upgraded from 5.0.5 to 5.0.16 (not at exactly the same time, but over the course of a couple of days as the machine was being moved, because 5.0.5 was not guaranteed compatible with Ubuntu 8.04 LTS). The second potentially confounding factor is software RAID.

As you will likely have seen in many other posts in the forum, the developers of the excellent Proxmox VE do not recommend using software RAID.

However, for those who like to live dangerously, or cannot use hardware RAID, instructions are also posted on how to install Proxmox VE using software RAID (install on top of Debian Lenny that already has software RAID set up).

So the configuration is Zimbra 5.0.16 (community edition) on top of the standard Ubuntu 8.04 container appliance, with the real hardware server running Lenny with Proxmox 1.0.3 on top of software RAID 10.

Note that initial testing of the configuration for about 48 hours following the changeover showed no problems at all. There are backups going back one week (much easier now with the VM snapshot), and backups from the changeover, so I'm not concerned about data actually being lost. The fact that the messages do sometimes correctly appear leads me to believe that there is not (yet?) some permanent file, directory, or disk corruption.

Unfortunately the only thing I can think of doing is checking the Zimbra related logs for clues, but I suspect that this may not be a problem that is related to only Zimbra. As a new user of Proxmox, the Ubuntu container, and the dreaded software RAID to boot, I'm not quite sure how to troubleshoot beyond there.

Any ideas are welcome - even if they include an 'I told you so' on the software RAID!

Thanks!
 
Re: Zimbra 5.0.16 vanishing messages on Ubuntu 8.04 standard container with software

how much RAM/SWAP do you assign to the container? see web interface.

and check for fail counts:

Code:
cat /proc/user_beancounters
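
For anyone who would rather not read the whole table by eye, here is a minimal sketch in Python that pulls out only the rows with a nonzero fail count. It assumes the seven-column layout of `/proc/user_beancounters` (uid, resource, held, maxheld, barrier, limit, failcnt), where the uid appears only on the first row of each container. The function name `failed_counters` is made up for this example.

```python
import os

BEANCOUNTERS = "/proc/user_beancounters"


def failed_counters(text):
    """Return (uid, resource, failcnt) for every row whose failcnt is nonzero."""
    failures = []
    uid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # The first row of each container starts with "<uid>:".
        if fields[0].endswith(":"):
            uid = fields[0].rstrip(":")
            fields = fields[1:]
        # Expect: resource held maxheld barrier limit failcnt
        if len(fields) != 6:
            continue
        resource, failcnt = fields[0], int(fields[5])
        if failcnt > 0:
            failures.append((uid, resource, failcnt))
    return failures


if __name__ == "__main__" and os.path.exists(BEANCOUNTERS):
    # Reading this file normally requires root on the host.
    with open(BEANCOUNTERS) as f:
        for uid, resource, failcnt in failed_counters(f.read()):
            print(f"container {uid}: {resource} failcnt={failcnt}")
```

A nonzero failcnt only says that a limit was hit at some point since the counters were last reset; it does not say when.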
 
Re: Zimbra 5.0.16 vanishing messages on Ubuntu 8.04 standard container with software

Hi,

Thanks for the quick response and the questions.

There is 3072MB RAM plus 3072MB swap currently assigned. Current usage reported is around 3880MB. The requested counts are below.

Is it worth running any of the mount checks at either the server level or the VM container level?

There is definitely something very interesting happening, almost as though there are two different (file? cache?) systems. One system consists of everything older than date D, plus the following approximately 10 days to date E (about 3 days ago). When that system is 'active', those messages (and appointments) can be viewed, deleted, moved, etc. The other system consists of everything older than date D, plus everything from date E to current (i.e. newly arriving mail is appearing in this 'second' system).

However, while both systems have a common data root before D, changes in one system on items older than D do not (always?) affect the other system. For example, emptying the spam folder on system 'one' did not empty the spam folder on system 'two'. The two systems are still switching back and forth in a range of 30 seconds to a couple of minutes, with the switch triggered by something I have not yet identified.

Zimbra itself seems to stop and restart cleanly. We tried rebooting the VM, and even the whole server, but the behaviour remains the same. The percentage of disk space used is very low on all server partitions (less than 5%).

It's not clear yet whether this is affecting all users, but I suspect I will hear about that over the coming hours!

Thanks for any suggestions or trouble-shooting steps. There is nothing else critical on this server that cannot be stopped or reconfigured (and at the moment all other VMs on this server are stopped), so feel free to suggest even drastic steps.

I think the default plan is to configure another server with no software RAID, but as far as possible everything else identical including the VE-over-Lenny, copy the Zimbra store to there, and see what happens. (This will be just as a test environment, unless everything really goes very badly with the production server - since at this point it appears that nothing is actually being completely lost, we'll just continue making frequent backups every few hours.) However, it took several days for the original problem to show itself, and we don't have any theories on what started it, so it doesn't seem a very efficient approach.

Thanks again for any suggestions!

cat /proc/user_beancounters
Version: 2.5
uid  resource           held   maxheld             barrier               limit failcnt
101: kmemsize       13608916  17983021 9223372036854775807 9223372036854775807       0
     lockedpages           0         0              786432              786432       0
     privvmpages      993146   1573331             1572864             1585364     513
     shmpages             23        23 9223372036854775807 9223372036854775807       0
     dummy                 0         0                   0                   0       0
     numproc             134       204                1024                1024       0
     physpages        237403    250440                   0 9223372036854775807       0
     vmguarpages           0         0             1572864 9223372036854775807       0
     oomguarpages     237408    250445             1572864 9223372036854775807       0
     numtcpsock           53        74 9223372036854775807 9223372036854775807       0
     numflock            158       180 9223372036854775807 9223372036854775807       0
     numpty                1         2                 255                 255       0
     numsiginfo            0         6                1024                1024       0
     tcpsndbuf       1063168   1410816 9223372036854775807 9223372036854775807       0
     tcprcvbuf       1063424   1341952 9223372036854775807 9223372036854775807       0
     othersockbuf     176128    369664 9223372036854775807 9223372036854775807       0
     dgramrcvbuf           0      8448 9223372036854775807 9223372036854775807       0
     numothersock        119       225 9223372036854775807 9223372036854775807       0
     dcachesize       991849   1135471 9223372036854775807 9223372036854775807       0
     numfile            7233      8894 9223372036854775807 9223372036854775807       0
     dummy                 0         0                   0                   0       0
     dummy                 0         0                   0                   0       0
     dummy                 0         0                   0                   0       0
     numiptent            10        10 9223372036854775807 9223372036854775807       0
0:   kmemsize       11760276  13808690 9223372036854775807 9223372036854775807       0
     lockedpages           0         8 9223372036854775807 9223372036854775807       0
     privvmpages       38402    102700 9223372036854775807 9223372036854775807       0
     shmpages            825       841 9223372036854775807 9223372036854775807       0
     dummy                 0         0 9223372036854775807 9223372036854775807       0
     numproc             125       159 9223372036854775807 9223372036854775807       0
     physpages         28835     37151 9223372036854775807 9223372036854775807       0
     vmguarpages           0         0 9223372036854775807 9223372036854775807       0
     oomguarpages      28867     37183 9223372036854775807 9223372036854775807       0
     numtcpsock           11        17 9223372036854775807 9223372036854775807       0
     numflock              5        10 9223372036854775807 9223372036854775807       0
     numpty                1         1 9223372036854775807 9223372036854775807       0
     numsiginfo            0         4 9223372036854775807 9223372036854775807       0
     tcpsndbuf        193536    254720 9223372036854775807 9223372036854775807       0
     tcprcvbuf        180224    320768 9223372036854775807 9223372036854775807       0
     othersockbuf     195072    357888 9223372036854775807 9223372036854775807       0
     dgramrcvbuf           0      8448 9223372036854775807 9223372036854775807       0
     numothersock        143       154 9223372036854775807 9223372036854775807       0
     dcachesize      3268438   3297526 9223372036854775807 9223372036854775807       0
     numfile            2480      2658 9223372036854775807 9223372036854775807       0
     dummy                 0         0 9223372036854775807 9223372036854775807       0
     dummy                 0         0 9223372036854775807 9223372036854775807       0
     dummy                 0         0 9223372036854775807 9223372036854775807       0
     numiptent            10        10 9223372036854775807 9223372036854775807       0
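
The privvmpages row for container 101 is the only one with a nonzero failcnt, and its values are counted in memory pages rather than bytes. As a rough sketch of the arithmetic, assuming 4 KiB pages (typical for x86, but an assumption about this host), the figures can be converted to MiB and compared with the 3072MB RAM plus 3072MB swap mentioned above. The helper name `pages_to_mib` is made up for this example.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages on this host


def pages_to_mib(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page count from user_beancounters into MiB."""
    return pages * page_size / (1024 * 1024)


# privvmpages values for container 101, copied from the output above.
held, maxheld, barrier, limit = 993146, 1573331, 1572864, 1585364

for name, pages in [("held", held), ("maxheld", maxheld),
                    ("barrier", barrier), ("limit", limit)]:
    print(f"{name:<8}{pages_to_mib(pages):10.1f} MiB")

# The peak allocation went past the barrier at some point, which would be
# consistent with the nonzero failcnt on this row.
print("maxheld above barrier:", maxheld > barrier)
```

Under that assumption the barrier works out to exactly 6144 MiB (3072 + 3072), and the held value comes to roughly 3880 MiB, which matches the usage figure reported above.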
 
Re: Zimbra 5.0.16 vanishing messages on Ubuntu 8.04 standard container with software

For me, this looks like an application error - maybe you can downgrade to the older version (which is known to work)?
 
Re: Zimbra 5.0.16 vanishing messages on Ubuntu 8.04 standard container with software

I had a similar problem with Zimbra about a year ago. (I did not use ProxMox at that time.)
I cloned a running system for testing purposes and moved it onto a LAN with private addresses. After that I checked that I had an SSH connection to the new test server. I changed the server name, and deleted users... and realised that it had also done this somehow on the live server. I don't know how, but both servers were somehow still connected. After that fiasco I started using different root passwords and always recreating any SSL certs.
 
Re: Zimbra 5.0.16 vanishing messages on Ubuntu 8.04 standard container with software

Thanks for the good observations. Certainly there were multiple servers running during the testing phase, and while I believe they are all gone from the environment I will check this. It does also seem unlikely that there is a generalized problem somewhere in VE that would account for this, because there do not seem to be other reports of similar problems with other applications on VE, and because the impact on the application seems much too exact (leaving everything file-based running and functional) to be just random issues with mirroring or something.

However, as I have not been able to isolate the problem, I am preparing to move to another server (virtual or otherwise). Unfortunately downgrading the application does not necessarily seem straightforward, and I don't want to go any further off the beaten track than I already am.

In considering what kind of server to migrate Zimbra to, I came across the following post at Zimbra from a Zimbra contractor (Pheonix):

"BTW, an openVZ container isn't a supported platform for Zimbra so be aware you may have problems getting it working and/or running in a stable manner."

http://www.zimbra.com/forums/migration/30659-zimbra-migration-5-0-2-5-0-16-a.html

Maybe I'm reading too much between the lines, but I'm wondering if there is more to the warning than just the general corporate disclaimer. There are notes on this site that Zimbra has been known to run well on Proxmox, and the suggestion is that a container is used. Does anyone have a Zimbra 5.0.16 installation working on Proxmox 1.3? Has anyone seen anything unusual in that configuration?

It's maybe looking like the safest thing to do in the short term is to move Zimbra back onto its own dedicated hardware, or perhaps, in order to stay on the Proxmox server, to try running Zimbra in (low performance...) fully virtualized mode. It possibly won't be very long before Zimbra 6 is fully released anyway (I think they're at Beta 2), and we do plan to move to 6 once it has been out for a bit, so I'm just looking for a temporary solution that lowers risk (and ideally still keeps Zimbra in the Proxmox picture).

Any thoughts or suggestions?

Thank you for all the input.
 
Re: Zimbra 5.0.16 vanishing messages on Ubuntu 8.04 standard container with software

Full kudos to SamTzu! The problem was the 'human' part of the human interface.

It would have taken a long time to ever think of going through all the checks again for a duplicate server if not for his suggestion.

I couldn't believe that when the PVE container was stopped, something was still responding to a ping to that address.

It turns out that the old mail server (at the same IP) had been configured to restart on AC power restore (rather than to last power state, as most newer BIOSs can be set for). It was switched off, but the power had been cycled and lo and behold it was quietly back on. They've been intermittently competing for incoming mail and serving requests for days. How embarrassing.
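
For anyone who wants to repeat the check that exposed this, here is a minimal sketch in Python: stop the container, then see whether anything still accepts TCP connections on the mail server's address. The function names, the port list and the example address are all assumptions for illustration, not taken from the actual setup.

```python
import socket


def answers_on(host, port, timeout=2.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def open_ports(host, ports=(25, 80, 443), timeout=2.0):
    """Return the subset of ports on which the host still answers."""
    return [port for port in ports if answers_on(host, port, timeout)]


# Example (hypothetical address): run this while the container is stopped.
# Any port listed means some other machine is answering on that IP.
# print(open_ports("192.0.2.10"))
```

An empty list is not proof that the address is free, since a firewall could be dropping the connections, but a non-empty list with the container stopped is a strong hint that a second machine is using the same IP.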

It would be nice to pretend it had just all never happened. But there's now the interesting question of trying to clean it up and get all the mail and appointments onto the real server. Hopefully utter stupidity will play a lesser role in that. Definitely room for improvement to our decommissioning process of old machines as we continue to migrate servers onto Proxmox.

Thank you to all for the timely and helpful suggestions.

And thanks to the Proxmox team for creating such an outstanding product - seems the only problem is that it is now usable by idiots.
 
Re: Zimbra 5.0.16 vanishing messages on Ubuntu 8.04 standard container with software

We have been running our 3-node Zimbra system on top of Proxmox 1.1 (OpenVZ) for a couple of months now without a hiccup.
 
