Wednesday, November 10, 2004

Uptime

It seems appropriate after my recent blog about downtime to have one about uptime, especially as the lack of it recently has seen my frustration levels rise. While there is an ever-increasing number of "admins" in IT, it seems that there is an ever-decreasing number of enterprise admins. By enterprise I mean admins that know how to look after large sites and ensure uptime. That doesn't mean just making sure that the boxes are patched and functioning, but also that the design of the network and services can scale, respond to short, sharp spikes in load and generally be reliable at all times. I mean admins that know what it's like to lose thousands of pounds per second when your service is down and know how to avoid it. I say this because of two experiences in the last two days.

The first experience was with World of Warcraft. This is a new game that is set to be the next BIG thing in MMORPGs. For some time Blizzard, the company that makes WoW, have been saying that they will have an "open" beta, meaning that the general public will get a chance to test the game before Blizzard marks it as ready for general consumption and final release. This is a great opportunity for people who might be interested in the game to try it out, people like me :) Now the problem with this is that the client to connect to WoW is 2.5GB, and when you have thousands of testers wanting to play, that is a LOT of bandwidth needed to give each user a copy. Blizzard, being scared of trying to host something like that themselves, took a different route. What they did was tell FilePlanet that if they hosted the signup for the open beta, they would give FilePlanet exclusive rights over the beta by giving them a whole heap of keys (that is, the ability to actually get INTO the open beta, because while it is "open", in reality there was a limited number of positions to be filled). FilePlanet accepted and promptly started offering deals to "subscribe to fileplanet" and at the same time get "free entrance into the WoW beta". A wonderful marketing opportunity for them, and Blizzard doesn't have to host it. A win all round, yes? Well, actually NO. The problem was quite simple: despite the fact that FilePlanet had obviously sold Blizzard on the idea based on their ability to host something like this, they quite clearly didn't have the skills to do it. Let me explain.

Now to jump forward a bit, it turns out that the idea of anyone trying to serve out 2.5GB of data to a huge number of yearning gamers frightened everyone involved, so they came up with an interesting way around the problem: the program you used to download the game client was a custom-built BitTorrent client. For those of you that don't know what BitTorrent is I encourage you to go to its site and check it out, but for the purposes of this rant it's enough to say that instead of downloading the 2.5GB from one server run by a company, you end up grabbing the file from other people who are downloading it as well, thus alleviating the load on the company's servers. Now bearing this in mind we see FilePlanet only had to do a few things:
1. Provide an interface to sign up to the open beta. This was in the form of ONE web page form which took some details you put in and dumped them into a database.
2. Send the new signee a client with which to download the game (the aforementioned BitTorrent client).
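
To put some rough numbers on why the BitTorrent route takes the pressure off, here is a back-of-the-envelope sketch in Python. Only the 2.5GB client size comes from this post. The 10,000 downloaders, the share ratio and the function names are my own assumptions for illustration, and the model (peers hand back a fixed fraction of what they download) is a deliberate simplification of how a real swarm behaves.

```python
GAME_CLIENT_GB = 2.5  # client size quoted in the post


def central_upload_gb(downloaders: int, size_gb: float = GAME_CLIENT_GB) -> float:
    """Data a single company-run server must send if it serves every copy itself."""
    return downloaders * size_gb


def swarm_seed_upload_gb(
    downloaders: int, share_ratio: float, size_gb: float = GAME_CLIENT_GB
) -> float:
    """Data the original seed must send when peers re-upload to each other.

    Assumed model: every peer uploads `share_ratio` times what it downloads
    (0.0 = pure leeching, 1.0 = gives back a full copy). The seed covers the
    remainder, and always has to push out at least one complete copy.
    """
    if not 0.0 <= share_ratio <= 1.0:
        raise ValueError("share_ratio must be between 0.0 and 1.0")
    if downloaders == 0:
        return 0.0
    total_demand = downloaders * size_gb
    supplied_by_peers = total_demand * share_ratio
    return max(size_gb, total_demand - supplied_by_peers)


if __name__ == "__main__":
    testers = 10_000  # hypothetical figure, the post only says "thousands"
    print("central hosting:", central_upload_gb(testers), "GB")
    print("swarm, ratio 0.9:", swarm_seed_upload_gb(testers, 0.9), "GB")
```

Even with generous assumptions the point stands: the heavy lifting moves to the downloaders, which leaves the host with little more than a signup form and a small download to serve.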
Now I happened to be up when the open beta finally went live, and so I immediately headed over to get my key and sign up. Imagine my surprise when the company that had sold this entire idea to Blizzard because of the power and size of their site had fallen into a complete heap. For a start the web page just stopped, I mean completely halted. Then the errors started. It turned out the entire FilePlanet site is run on some stupid conglomeration of MS .NET technology that quite clearly was simply not up to the task of servicing that many requests. Over the next 30 minutes I saw errors from the .NET technology, errors from the webservers, errors from the database, errors from protocols (i.e. timeouts), and just about everything else that could break did. What's more, you could almost SEE their technicians running around randomly restarting things, and by the time they had moved on to the next broken component the first one had died under the load because it wasn't able to talk to anything else (any decent admin would take the entire site offline, do what is needed, then bring it all back at once). It would have been comical if it weren't for the fact that I was trying to get some information out of this mess. The site was still useless a solid 8 hours later when I got up (on a different note, I was able to get my key, but that was because I sat there spamming a query into their DB when I knew it had finally come up for a few seconds :). The whole incident really killed any faith I had in even large sites doing things "right".

The next incident that fired me up was the release of Firefox version 1 yesterday. This is the long-awaited release of what I personally think is the best browser around right now, and so I was keen to get my hands on the new version. Much to my chagrin, when I went to www.mozilla.org it had slowed to a crawl; it took over 3 minutes to render one page. I was quite shocked because I always tend to think better of my free software friends than of those that run MS apps, but it seemed that the problem was not specific to an architecture but rather simply a lack of good admins (I group architects in with admins, as a good admin will do both). Now to be fair to mozilla.org, it is possible that the load they experienced was simply NOT feasible to plan for, and at least their site was still working, albeit incredibly slowly. It is also possible that they knew their site would not deal with the load and that their design was near perfect BUT that they didn't have the money to set up the right infrastructure to deal with it, but I don't think so. FilePlanet certainly doesn't have that excuse, as they are a commercial organisation dedicated to doing events like the WoW open beta.

Where are all the admins?
