I've been working with Scott on this issue for a couple of days now. I think we're making progress, but there's been a whole lot of
Quick-Silver: are you asking if UNIX is rock solid? In short, yes.
This issue is a strange one, though. When the system was unresponsive before, it was because the system load would spike suddenly and cross the threshold at which the apps were configured to respond with a "server busy" error. What was causing those load spikes is still unknown.
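For anyone following along, the "server busy" behavior works roughly like this minimal sketch. It assumes the apps compare the 1-minute load average against a fixed cutoff; the `LOAD_THRESHOLD` value and the function names are hypothetical, since we don't know exactly how the apps implement their check.

```python
import os

# Hypothetical cutoff; the apps' real configured threshold is not known here.
LOAD_THRESHOLD = 8.0


def should_refuse(load_1min, threshold=LOAD_THRESHOLD):
    """Return True when the 1-minute load average is above the threshold."""
    return load_1min > threshold


def current_status(threshold=LOAD_THRESHOLD):
    """Read the live load average (Unix only) and report what an app would answer."""
    load_1min, _load_5min, _load_15min = os.getloadavg()
    return "server busy" if should_refuse(load_1min, threshold) else "ok"
```

A sudden spike that pushes the load past the cutoff flips every request to "server busy" until the load drops back down. That is why lowering the baseline load (see the MySQL tuning below) leaves more headroom before a spike trips the threshold.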
Currently, we've tuned MySQL to use memory more efficiently (it now uses less, actually), which has reduced the running system load pretty dramatically. I don't think anyone has seen a "server busy" error since yesterday afternoon. The remaining issue is that the connection stops responding for periods of time. I've noticed errors and dropped packets on the network interface, and those counts have been growing, so replacing the cable or the network card would be the next logical step.
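To confirm the error and drop counts are still climbing (and to check again after the cable swap), something like the sketch below could sample the counters twice and report the difference. It assumes the server runs Linux, which exposes these counters in `/proc/net/dev`; on other UNIX flavors you would use `netstat -i` instead. The interface name `eth0` and the function names are hypothetical.

```python
import time


def parse_proc_net_dev(text):
    """Parse /proc/net/dev text into {iface: {counter_name: value}} for error/drop counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Receive columns: bytes packets errs drop fifo frame compressed multicast
        # Transmit columns: bytes packets errs drop fifo colls carrier compressed
        stats[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


def counter_deltas(before, after, iface):
    """Return how much each error/drop counter grew between two samples."""
    return {key: after[iface][key] - before[iface][key] for key in after[iface]}


def sample_deltas(iface="eth0", interval_seconds=60):
    """Read /proc/net/dev twice, interval_seconds apart, and return the counter growth."""
    with open("/proc/net/dev") as f:
        first = parse_proc_net_dev(f.read())
    time.sleep(interval_seconds)
    with open("/proc/net/dev") as f:
        second = parse_proc_net_dev(f.read())
    return counter_deltas(first, second, iface)
```

Running `print(sample_deltas("eth0", 60))` on the server would show whether errors are still accumulating. If the counts stay at zero after the hosting company swaps the cable and port, that points to the cable or switch port rather than the card.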
TM: can you set up a time with the hosting company for them to replace the cable with a new one AND move it to a different port on the switch? They also need to check the switch logs for errors.
All in all, I think system performance is improving. Pages seem to load a bit faster, unless that's just optimism on my part.
If we can just nail down this connectivity issue, then I think we'll be good for a while.