I have always thought of myself as relatively lucky when it comes to hardware. I have been using computers extensively for the last 20 years or so, and I have never had a failure that resulted in me losing all my data or having to completely replace a computer. Yet recently I am beginning to think that I am cursed: while I have never had a decisive failure, perhaps what I am experiencing instead is WORSE. Let me explain.
Currently I have 2 computers that I use to host various things (I have others scattered around the world for redundancy, but essentially the active services run on these 2 boxen), and just when I get something set up and others start to use / rely on it, something breaks. It is never anything major; an example might be my CPU. By "break" I mean that sometimes when compiling I get an error from GCC indicating that the compilation failed due to a hardware error. At first I suspected my memory, but I have no reason to believe that RAM which has been fine for years should suddenly fail, whereas the CPU has been stressed quite severely over that period and some of the fans on the mainboard have failed (no, not the CPU fans), so I thought it was simply an issue of overheating. After some cooling tests this doesn't seem to be the case: I can reproduce the GCC errors quite reliably even after I have left the machine off for some time or have cooled it, whereas I can still compile other things that are equally CPU intensive with no problems. This doesn't sound like a major problem until you start to combine it with other things.
My main box connecting to the net uses a USB modem. I actually think that this is a good thing these days, as using a separate device, like say an ADSL router, means that you are relying on a black box for your security at the net termination point, which is a bad thing IMO. So while I am happy that my net connection goes straight into my hardened server / router, it causes a problem because the modem is USB. Currently the drivers for my modem require that USB be built as a module in the kernel, which not only further increases security concerns but also means that there is a lot of loading and unloading of the driver when it initially connects. Now it turns out that there are a few additional problems: firstly, Unix USB code is in general shit (not specifically Linux), and secondly, the code for my USB modem is not much better. So now we have a situation where an unreliable USB module with an unreliable driver is being inserted into my kernel, on a box with other reliability issues like my CPU. The result is that the USB section of my kernel often panics and the box locks ... this is the same box that is running all my active services.
Now yes, I _could_ run all my active services on my other box, my desktop we will call it. The issue with that of course is that it is my desktop, which means that I use it as a testbed for all kinds of things (I am doing more and more Gentoo development work and end up running such alpha quality stuff that I invariably kill it). Additionally, while it is increasingly rare these days, on the odd occasion I like to play some games, which means a reboot into Windows and consequently downtime on all the services anyway. What this means is that I need to get a new box, migrate the important services off the older one onto my current desktop, and turn my current desktop into the new main server.
Of course this is complicated by the fact that I don't really NEED a new box, and I particularly don't want to buy one right now when the computer industry is going through a hardware change the likes of which we have not seen since the first Pentiums were produced. I am referring to the change from 32 bit to 64 bit on the home computer, the change from the ATX to the BTX form factor, and the change from PCI to PCI Express. I have said many times that I will only buy once all of these things are readily available and hopefully the price premium has dropped a little :) So in the interim I just keep trying to keep my old server working and try to ignore the jibes from my friends about "reliability" and me being an enterprise admin ...