First, let me say I’m biting the hand that feeds me. I’m a longtime A2 Hosting customer for my personal things, and those are tiny sites that do little to nothing. I’ve been a member for years and I’ve referred a TON of clients. I still recommend them for their standard hosting accounts, since I’ve had only one issue and they fixed it by moving me to a new server because the one I was on was apparently overcrowded.
At ANY rate… My company was looking to move to cloud hosting because we have corporate clients who demand high availability, and we didn’t want to put them on our colocated server because it has random issues once or twice a year: a drive fails in the RAID and causes performance problems, or, once, the datacenter lost power. So after my excellent history with A2, I decided to suggest their cloud server option. The price was more than right and their support staff seemed competent to me.
We signed up in December and started actually utilizing it in January or so.. We set up a dedicated server for a very critical client who sells medical supplies and then a second server for our shared hosting environment.
Things were dandy until about March or April, when we started having connectivity problems. The first issue, they claimed, was a rogue server congesting the network. The second issue took them a full business day to solve while we had a dozen critical sites offline; their support wouldn’t really say anything other than that they would get back to me when they knew something more. Eventually they got the system up, but the drive was corrupt to the extreme for no apparent reason. They literally GAVE up on it and said I should back up as much as I could, then deploy a new instance and start over, which was unacceptable to me. Now that the system was back up, I FIXED IT MYSELF via the console. Over the next two days our other server went offline and I spent hours with the support staff, so I decided to log into our shared server and see what files they had modified to fix the other machine’s network issue.
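That kind of "what did they touch?" check can be sketched in a few lines of Python. This is only an illustration, not the exact commands used here: the function name `recently_modified`, the two-day window, and the choice of scanning a directory such as /etc are all assumptions.

```python
import os
import time


def recently_modified(root, days=2.0, now=None):
    """Return (mtime, path) pairs for files under `root` whose contents
    changed within the last `days` days, newest first."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so a dangling symlink does not raise
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime >= cutoff:
                hits.append((mtime, path))
    return sorted(hits, reverse=True)
```

Calling `recently_modified("/etc", days=2)` as root would list the configuration files changed in the last two days. Modification times can be altered, so treat the result as a hint about where to look rather than as proof.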
Guess what? They hard-coded the IP address into /etc/network/interfaces and disabled DHCP, which instantly told me their DHCP server was down. So I repeated that process on our other server and got it back up MYSELF, and then I warned A2 that their DHCP server was dead and that if they didn’t fix it they would be blasted by phone calls. Oh, and PS: on every call I made to A2 I got the same person, except once or twice when someone else answered, which makes me think their support staff is VERY small. It’s also VERY disconcerting that their DHCP server was down and their support staff fixed my server by hard-coding my IP rather than fixing the actual problem, which then took down my second server a couple of days later when its DHCP lease expired. They had no clue, and it was SUPER simple to diagnose; any rookie admin should have spotted that instantly.
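To make the difference concrete, here is a minimal sketch of what a DHCP stanza and a hard-coded stanza look like in Debian’s /etc/network/interfaces, plus a tiny Python helper that reports which method each interface uses. The interface name `eth0` and the addresses (taken from the 192.0.2.0/24 documentation range) are made up for illustration, as is the helper name `iface_methods`; the real values on these servers are not shown here.

```python
# Hypothetical file contents; not the actual configuration from these servers.
SAMPLE_DHCP = """\
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp
"""

SAMPLE_STATIC = """\
auto lo
iface lo inet loopback

# address hard-coded, DHCP no longer used
auto eth0
iface eth0 inet static
    address 192.0.2.10
    netmask 255.255.255.0
    gateway 192.0.2.1
"""


def iface_methods(text):
    """Map each interface name to its configuration method
    (for example 'dhcp', 'static' or 'loopback')."""
    methods = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Comments must sit on a line of their own in this file format.
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split()
        # Stanza header: iface <name> <address family> <method>
        if parts[0] == "iface" and len(parts) >= 4:
            methods[parts[1]] = parts[3]
    return methods


print(iface_methods(SAMPLE_DHCP))
print(iface_methods(SAMPLE_STATIC))
```

An interface that reports `static` on a machine that was provisioned with `dhcp` is the kind of clue described above: someone worked around address assignment instead of repairing it.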
After those issues were resolved, A2 sent a bulletin saying they would be doing a 4 hour maintenance, and that all running systems would stay running while any systems that were down would stay down for the duration. My system was up. Literally ON the hour that they were starting maintenance, my server was halted with “Power Button Pressed” while I was in the middle of editing a file via SSH. After the maintenance, nobody notified me it was done. I went in and “stopped” and “started” the system myself, and it came back up.
A few hours later… it halted again: “Power Button Pressed”. I contacted support and they assured me NOBODY over there did that, and I know there’s no way I could have done it. It then happened once or twice more, randomly. They eventually got me to a “guru” who suggested I disable ACPI, even though I had never heard of ACPI causing that and he couldn’t provide links showing it had. Nevertheless, I disabled ACPI exactly as he said… and it happened several more times over the course of several days, randomly, USUALLY during business hours, sometimes late.
So our clients were VERY upset, and so were we. I eventually set up a new VPS with another provider local to me, where I used to work, actually. I migrated all of our clients off the troubled server, thinking maybe they were right and there was just a glitch in that installation, but I wasn’t going to take a chance. I migrated everything off the shared server and left the dedicated server there, which was running the exact same version of Debian without fault. Here’s the kicker though…
THE SERVER THAT NEVER HALTED OR HAD ISSUES… suddenly started halting. That’s right, our dedicated server started halting right after I cancelled our shared server. So, after speaking with several of my fellow systems admins, and we’ve been in this business for a long time ( me since 1998 ), my only thought is one of two things:
1. Either something is seriously wrong with their config that causes other users who are “stopping” their VM instances to trigger a halt on mine… or ..
2. Someone at A2 is having fun at their users’ expense
So I tell you this story NOT to bash A2, but to warn you with my experience and tell you what we went through. It cost us several clients, and if I hadn’t acted it would have cost us more. We experienced two full 8-hour-day outages along with a 4-hour outage, and then the miscellaneous network issues added another couple of hours, all within just a few months. That is BEYOND unacceptable. And the funny thing is, the main reason our CEO went with it was because one of the key features is ( hahaha ) … “HIGH AVAILABILITY – Free”
You get what you pay for, folks… you’ve been warned. Their support is also VERY slow on the VPS cloud environment. They start you at level 1, which seems to know nothing beyond what is on your own control panel: they can verify you are down, and they can click stop, start, or reboot, the same as you probably already tried. But then they just push you to the upper level, which is VERY slow to reply. It’s not like their standard hosting, which lower-level support seems better equipped for. So again… BE warned.
Oh, and one last thing: they told me they found logs that said the halts were related to us editing the disk size and adding an IP, which we did do… months earlier. So I asked them how actions taken months ago would cause multiple random halts now, and they really ignored it for a while until they agreed that was probably a “red herring” ( their words ) … crazy.
Just so you know my background..
I started in 1998 working at DONet. I worked my way up to lead Linux systems administrator and ran the department until I moved primarily into software development, where I created the toolset they used to run the business ( and still use some of to this day ). I then moved on to Bitstorm in 2007, where I serve as the lead software engineer and systems administrator.
I started with computers when I was 13 and ran my first bulletin board system ( pre internet popularity )
Some of my high points ( or low depending on your perspective )
1. Helping reverse engineer the ICQ protocol ( I had a motive )
2. Developing a basic form of artificial intelligence ( my motive for #1 )
3. Developing a chat bot that talked with random people
4. Getting nearly sued by AOL for 1, 2 and 3 ..
5. Developing many automated systems for detecting and stopping hackers before they have a chance ( requires administration of the server ), at least one of which I released as a Joomla plugin
PS. AOL took my website’s domain and shut it down… which I recently got back 😉 stay tuned on that..
Ps. Ps. … This image is my ticket history. Notice the last issue at the top: that was the random server halting, which was THE WORST issue we had on the shared server. Look how long it was open! It was only closed when we cancelled that service. And forgive the typo, I’m too tired to correct it 🙂 optn = open