Dice Game – Hosting Failure

The Dice Game is hosted in an OpenVZ container on a server in Renton, WA. I should rephrase that. It *was* hosted in an OpenVZ container on a server in Renton, WA.

Last night the server hosting the Dice Game failed. At first there were some filesystem errors. Then the root filesystem was remounted in read-only mode. All this happened while I was pushing out some changes to the game.

I issued a reboot of the underlying server and waited for it to come back up. It never did. Remote console access wasn't working either; nothing showed up on the console. How long should I wait before finding a new home for the Dice Game? Even after the server was power-cycled, the console stayed blank. I think it's dead.

It just so happens that I have another OpenVZ container at a hosting provider in Seattle, but it’s running a rather old version of CentOS. It would have to do.

I removed the existing version of Ruby, then downloaded, built and installed the latest stable release from scratch. I installed RubyGems and all of the gems the Dice Game requires. I had to rebuild Apache from scratch with mod_proxy and reverse-proxy support so it could pass requests through to the Mongrel processes. I configured Apache, Mongrel Cluster and MySQL. I restored the Dice Game database from my most recent hourly backup. I installed Git so I could push my local repository out to the new server. Finally, I created a new hostname in DNS and updated the Facebook application configuration to use it.
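
For anyone setting up something similar: one common way to put Apache in front of a Mongrel cluster is mod_proxy_balancer, with each Mongrel port listed as a balancer member. The Python sketch below just renders such a VirtualHost stanza. It is not the configuration used for the Dice Game. The hostname, the cluster name `mongrel_cluster`, and the ports (three processes starting at 8000) are assumptions, and the output presumes mod_proxy, mod_proxy_http and mod_proxy_balancer are loaded.

```python
"""Hypothetical helper: render an Apache reverse-proxy stanza for a Mongrel cluster.

Assumes mod_proxy, mod_proxy_http and mod_proxy_balancer are loaded and that
the Mongrel processes listen on consecutive localhost ports.
"""


def render_balancer_config(server_name: str, first_port: int = 8000,
                           servers: int = 3,
                           cluster: str = "mongrel_cluster") -> str:
    """Return a VirtualHost block that spreads requests across Mongrel ports."""
    if servers < 1:
        raise ValueError("need at least one Mongrel process")
    # One BalancerMember line per Mongrel process: first_port, first_port + 1, ...
    members = "\n".join(
        f"    BalancerMember http://127.0.0.1:{first_port + i}"
        for i in range(servers)
    )
    return (
        "<VirtualHost *:80>\n"
        f"  ServerName {server_name}\n"
        f"  <Proxy balancer://{cluster}>\n"
        f"{members}\n"
        "  </Proxy>\n"
        # Forward every request to the balancer and rewrite redirect headers back.
        f"  ProxyPass / balancer://{cluster}/\n"
        f"  ProxyPassReverse / balancer://{cluster}/\n"
        "</VirtualHost>\n"
    )


if __name__ == "__main__":
    # dice.example.com is a placeholder, not the game's real hostname.
    print(render_balancer_config("dice.example.com"))
```

Setting `first_port` and `servers` to match the `port` and `servers` values in the mongrel_cluster YAML file keeps the Apache side in step with the cluster.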

It took about two hours to get everything built and restored on the new system, which is less time than it would have taken to order a brand new dedicated server and wait for delivery. The Dice Game seemed a bit snappier on the new system, too.

The next morning I found out that the original server was not completely dead. I'm not going to move the game back to it, though, unless the new system dies. If that happens, I know the old system is ready to take over.

The Filesystem Errors:


EXT3-fs warning (device sda2): ext3_rmdir: empty directory has nlink!=2 (-1)
EXT3-fs warning (device sda2): ext3_rmdir: empty directory has nlink!=2 (-2)
EXT3-fs warning (device sda2): empty_dir: bad directory (dir #17991246) - no `.' or `..'
EXT3-fs warning (device sda2): ext3_rmdir: empty directory has nlink!=2 (1)
EXT3-fs warning (device sda2): empty_dir: bad directory (dir #17991247) - no `.' or `..'
EXT3-fs warning (device sda2): ext3_rmdir: empty directory has nlink!=2 (8)
EXT3-fs unexpected failure: !buffer_revoked(bh);
inconsistent data on disk
ext3_forget: aborting transaction: IO failure in __ext3_journal_revoke
ext3_abort called.
EXT3-fs error (device sda2): ext3_forget: error -5 when attempting revoke
Remounting filesystem read-only
Aborting journal on device sda2.
EXT3-fs error (device sda2) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sda2) in ext3_truncate: IO failure
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sda2) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device sda2) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sda2) in ext3_delete_inode: IO failure
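
Messages like these are easy to miss until the filesystem has already gone read-only. As a rough illustration (not something that ran on this server), the Python sketch below tallies ext3 warnings and errors per device and flags the two events that mattered here: the read-only remount and the journal abort. The keyword strings are copied from the log above; real kernels emit other ext3 messages that it simply ignores.

```python
"""Sketch: summarise ext3 kernel messages like the ones in the log above.

Keyword strings come from this one log and are not an exhaustive list.
Matching is by substring, so syslog timestamps or prefixes are tolerated.
"""
import re
from collections import Counter

# Captures the device name from text such as "(device sda2)".
DEVICE_RE = re.compile(r"\(device (\w+)\)")


def summarise_ext3(lines):
    """Return (per-device counts, remounted_read_only, journal_aborted)."""
    counts = Counter()
    remounted_ro = False
    journal_aborted = False
    for line in lines:
        if "Remounting filesystem read-only" in line:
            remounted_ro = True
        if "Aborting journal" in line or "Journal has aborted" in line:
            journal_aborted = True
        match = DEVICE_RE.search(line)
        if not match:
            # e.g. "EXT3-fs unexpected failure: ..." carries no device tag.
            continue
        if "EXT3-fs warning" in line:
            counts[(match.group(1), "warning")] += 1
        elif "EXT3-fs error" in line:
            counts[(match.group(1), "error")] += 1
    return counts, remounted_ro, journal_aborted
```

Because it matches substrings, it can be fed the lines of a syslog file such as /var/log/messages (the location varies by distribution) or the output of `dmesg` without stripping timestamps first.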

This entry was posted on Monday, February 9th, 2009 at 1:45 am and is filed under Uncategorized.