Channel: Microsoft Exchange

Very strange Exchange 2010 database issue, need advice!


I recently ran into a very strange and seemingly random issue on our Exchange 2010 server, which first presented itself as a real problem on Monday. I have spent the past two days searching high and low, and my Google-fu has only turned up partial posts on it. Nothing has been a solid eureka moment, and I'm no closer to finding the root cause than I was on Monday night. I apologize in advance for the long post, but I feel I need to give as much detail as I can. Since this is one of those issues where the cause doesn't have a bull's-eye on it, I'm hoping to gain some insight and advice on how I should be troubleshooting it. I will attach the event logs I found following the dismounts and restart on Monday, but none of these events/errors have returned since then...


History:
We upgraded from Exchange 2003 to 2010 back in February. The transition went smoothly without any major issues. Since it was a transition, we needed to leave the 2003 box on so that all Outlook clients would redirect to the new server once they reconnected to the network after we migrated all mailboxes. We have many remote workers, so we planned for a month or two of coexistence to be safe. This also went off without any major issues; nothing took me more than half an hour to resolve. Being a one-man shop, I also had 40 new PCs coming in at the same time, plus a new backup server to implement. I went ahead and got to work on those projects, all of which I finished between late May and the first week of June.

I think it's ironic, as I was just planning to do the complete removal of Exchange 2003 in a few weeks. I hadn't seen any issues at all during the 2003/2010 coexistence, so I figured a few more weeks wouldn't hurt anybody, as it had been rock solid so far. Until last Thursday...

Issue:
I first noticed something was 'different' last Thursday night. I use Thursday nights to do server patching, reboots, PC moves/installs around the office, etc. I've been doing this same routine for years, and when you do something that long you tend to get used to the way things flow during the process. So, I'm moving along archiving logs and restarting boxes. I get to the old Exchange 2003 server, and as soon as I restarted it, my Outlook client promptly lost its connection to Exchange 2010 and gave the usual username/password dialog. This had never happened during reboots of that server since we migrated to 2010, but since I was midway through rebooting all servers anyway, I didn't really think much of it and kept on going. As soon as I rebooted the Exchange 2010 box, connectivity was restored and all was good... or so I thought.

Monday:
I was in an executive meeting with the CEO at around 4:30 PM when phone calls started flooding the CEO's office saying, "Where's Dave?! Do you know the system is down?!" We both exclaimed, "NO?!" Hearing that, I ran back to my office to find that Outlook was indeed refusing to connect to Exchange 2010, for everyone. Outlook's Connection Status window showed nothing at all, just a blank screen. Everything else was fine: internet was up, and file and terminal services were up. I logged into the EMC and saw that both the mailbox and public folder stores were dismounted. I tried to remount them from the EMC, which failed. I restarted the Information Store and System Attendant services, and they still would not remount. Again, after I restarted the server, everything seemed to be fine... and my week of headaches and lack of sleep began.


These are the kinds of issues that freak me out. There is no rhyme or reason as to why the stores automatically dismounted themselves (or at least I haven't put all the pieces together yet to clearly see the cause). I have found some interesting event logs about it. It seemingly started with a single VSS error at around 3:30 PM Monday, and some events hint at a hardware or "I/O problem," as the logs put it, but I'm not seeing anything regarding physical disk issues, and all lights are "green" in the IMM. No predictive failures, nothing. Some of the events mention possible JetDB corruption, and one says dirty shutdown. I haven't been able to run eseutil /mh yet, as I'm unsure if I should do that on a live DB. To be safe, I'm assuming no. Anyone out there have any experience otherwise? I also see some posts mentioning DB size, and "...if its physical size minus the logical size exceed 1024GB.. the DB will dismount on a regular basis". My DBs are nowhere near 1 TB in size, and I have plenty of disk space left on this brand new server (IBM x3650 M4).


So, my question is: could this possibly be related to firmware? What else could be causing this? I will know by tonight without a doubt if this is a persistent, patterned issue. From last Thursday night to Monday at 4:30 PM is just under 96 hours, and I've been racking my brain all week trying to get to the bottom of it before the next '96 hour mark' comes to pass. I have seen a server in the past where we could literally predict when the next crash was going to happen, to the hour, and that was a definite firmware issue/bug (but on a much older IBM BladeCenter S). Again, there were no warnings/events generated by that server before each crash. For some reason, I feel like this might happen again at any moment, and it's haunting me every minute of every day! lol.. /facepalm
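
To make the '96 hour mark' reasoning concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration: the calendar dates and the Thursday reboot time are placeholders (the post only pins down "Thursday night" and Monday at about 4:30 PM), and the helper names `hours_between` and `next_expected` are hypothetical.

```python
from datetime import datetime

def hours_between(first, second):
    """Return the elapsed time between two datetimes, in hours."""
    return (second - first).total_seconds() / 3600

def next_expected(last_outage, interval):
    """Project the next occurrence, assuming the interval simply repeats."""
    return last_outage + interval

# Placeholder timestamps (assumed, not taken from the event logs).
thursday_reboot = datetime(2013, 6, 13, 19, 0)   # "Thursday night", time assumed
monday_outage = datetime(2013, 6, 17, 16, 30)    # Monday, about 4:30 PM

interval = monday_outage - thursday_reboot
print(f"Interval: {hours_between(thursday_reboot, monday_outage)} hours")
print(f"Next expected: {next_expected(monday_outage, interval)}")
```

With real timestamps from the event logs in place of the placeholders, the projected time gives a window to watch the server. Two data points are not enough to confirm a pattern, so this is only a way to know when to look.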


I've done a few Exchange migrations/setups before, and have been able to manage and maintain them so far, but I am definitely not a 'pro' when it comes to the down-and-dirty troubleshooting of not-so-apparent problems in Exchange. This is a first for me! Any help is greatly appreciated!

I'm sure there will be questions, and I will try to check this post as much as I can today.

I really appreciate any advice/guidance on what you guys think this could be, and how I should move ahead in determining what the heck happened!

Respectfully,

Dave

