
Server Busy Errors

I was thinking that there have been some announced hardware or software upgrades to the system over the past several months. Sometimes just going back to an earlier ghost or drive image can help isolate problems related to changes in a network environment. I used to dread upgrades because almost invariably it meant incompatibility issues would need to be addressed.

Re-imaging a system means downtime, but it's not like you are in a position of having to guarantee uptime - right? I visit other forums and TWT is the fastest as far as response time goes. Unless there are other factors involved in your decision to have your own server, the elimination of a few little glitches we've run into lately hardly seems worth the cost of buying a new server just for this small TWT database.

This is a clean and well-run board, so much so that I don't mind going along with whatever it takes for you guys to work through these "server busy" problems. This too shall pass. ;-)

just my $.02.
:sun:
 
Bill, a good point here, "I have noticed that there are errors and dropped packets on the interface and they have been growing, so a new cable or network card would be the next logical step in this process." In the old days when we made our own network cables this was commonplace, especially when apprentice ISSS's had made some of the cables. I've experienced slow response times when controllers, routers or modems were dying.
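
For anyone who wants to watch those interface counters grow over time, here is a minimal sketch. It assumes the server runs Linux and exposes per-interface counters in /proc/net/dev (the thread does not say how Bill checked them). The function names are made up for illustration; the idea is to take two snapshots some time apart and look at the difference.

```python
from typing import Dict

# Column order Linux uses in /proc/net/dev: eight receive fields, then eight transmit fields.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]


def parse_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Extract error and drop counters per interface from /proc/net/dev text."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        try:
            values = [int(v) for v in rest.split()]
        except ValueError:
            continue  # not a counter line
        if len(values) < 16:
            continue
        rx = dict(zip(RX_FIELDS, values[:8]))
        tx = dict(zip(TX_FIELDS, values[8:16]))
        stats[name.strip()] = {
            "rx_errs": rx["errs"],
            "rx_drop": rx["drop"],
            "tx_errs": tx["errs"],
            "tx_drop": tx["drop"],
        }
    return stats


def growth(before: Dict[str, Dict[str, int]],
           after: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Counter increase per interface between two snapshots."""
    return {
        iface: {key: after[iface][key] - before[iface][key] for key in after[iface]}
        for iface in after
        if iface in before
    }
```

Feed it the contents of /proc/net/dev read a few minutes apart. Counters that keep climbing on one interface while traffic is light would point toward the cable or card rather than the software.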

I also noticed this morning I couldn't get in until the third try. My PC timed out and I got "connection failed" errors indicating the server was down.

Could there be problems between our host and their ISP (addressing issues) that could be the cause of our "busy server" or "failed connection" situation?

just my $.02.
:sun:
 
:tab QS, there have only been a few upgrades since we moved to this server back around March. The problems started though from day one on this server. The first thing we did was bump the RAM from 1GB to 2GB and that seemed to work up until recently. The other upgrade was when I upgraded the forum software to the latest version a few months back. That did not appear to have any effect on the server performance. What we had was just the occasional user getting a server busy message, similar to what was going on before that upgrade.

:tab Now recently, in the last month or so, the server performance has really gotten MUCH worse even though we have not changed anything. The outages have become more frequent and longer lasting. This is what prompted the kernel upgrade last Saturday. That made things worse and we have rolled back the upgrade to the previous version.

:tab Chris has been working at monitoring the system and looking for problems in config files. We've been trying different MySQL config settings. Those may be helping some, but we are still getting what I would describe as the server freezing. Previously, people would get the server busy error, but that means the server was still at least putting out the website content with that message. Also, I was still able to log into the server remotely and restart MySQL to drop the load. What I see happening now is the server is just freezing and not responding at all. So if you type in twtex.com, it simply never loads anything and you get the browser error saying "page not responding". When this happens, I cannot even ping the machine. Also, the server load numbers don't always show a rising load prior to a freeze or after a freeze. This is what has me thinking there may be a hardware issue. It could be anything from a chip getting too hot and shutting itself down until it cools, to network cards acting up, to network cables going in/out, etc.
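
One way to tell a true freeze from a busy server after the fact is a heartbeat log: a small job on the server writes a timestamp and the load average at a fixed interval, and any hole in the timestamps marks a window where the machine was not running anything at all. The sketch below is only an illustration under that assumption. The function names, the one-minute interval, and the log format are all made up here, not something Chris or TM described.

```python
import os
import time
from typing import List, Optional, Tuple


def heartbeat_line(now: Optional[float] = None) -> str:
    """One log line: epoch seconds and the 1-minute load average (Unix only)."""
    if now is None:
        now = time.time()
    load1 = os.getloadavg()[0]
    return f"{int(now)} {load1:.2f}"


def find_gaps(lines: List[str], interval: int = 60,
              slack: float = 2.0) -> List[Tuple[int, int, float]]:
    """Return (last_seen, next_seen, load_at_last_seen) for each hole in the log.

    A hole is any pair of consecutive samples more than interval * slack
    seconds apart, meaning at least one heartbeat was never written.
    """
    samples = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore malformed lines
        samples.append((int(parts[0]), float(parts[1])))
    gaps = []
    for (t0, load0), (t1, _) in zip(samples, samples[1:]):
        if t1 - t0 > interval * slack:
            gaps.append((t0, t1, load0))
    return gaps
```

If the load recorded just before each hole is low, that fits the observation above that freezes are not always preceded by a rising load, and it would lean toward hardware rather than MySQL.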

:tab The data center guys are nice, but not real helpful when it comes to getting them to do stuff that they cannot bill for. So I really have to push to get hardware related stuff investigated. I agree that the cost of setting up a new server and moving the site is something to be avoided if at all possible. So we are going to keep :headbang: until we can figure something out. Then if all else fails, I will look at relocating the site.
 
I know this is reaching, but I have seen instances where, after a thunderstorm or an electrical outage, a router had addressing problems and needed to be brought down and back up again to re-initialize itself, and that was all that was needed for the fix.
:sun:
 
So either we have a piece of hardware going bad on us, or there is something in the software that is off... I will probably take the site down again this coming Saturday at noon...

Just so you'll know, I tried to log-in about 20 minutes ago and got a "Server Does Not Exist" error from my browser. It cleared after about 5 minutes.

I think you may have a hardware problem. Check for bad/shorted cabling, especially if you're getting load spikes for no apparent reason.
 
Well, the site may be going down anytime now. I asked them to replace the cable/network card. As soon as the other tech heads get back from lunch, the guy I talked with will be heading down to the cage to make the changes. If you are a praying person... Now would be a good time :pray:
 
[PRAY]Lord, please bless this forum and guide the technicians in returning it to good working order. Thank you for TWT, and for the opportunities and relationships it facilitates. In Christ's holy name, Amen.[/PRAY]

:thumb:
 
random thoughts that have helped me in the past.

1. Always start with the basics. Cabling (internal and external), ports on the switch, NIC, clean power.
2. Change management. Always document every change made in some form of log. Document every hiccup reported. Sometimes you don't know you have a trend until you look back 4 weeks later and see change X made on date Y had effect Z, even though X shouldn't have resulted in Z.
3. Prayer. Isn't it amazing to think that God knows where every electron is and he knows exactly what line of code is bad or what component is hiccupping?
4. Systematic elimination. I'm sure you're all over this one.
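
Point 2 can be sketched in a few lines. This is just an illustration of the idea, assuming every change and every reported hiccup is logged as a dated entry. The `Entry` class, the `changes_before` helper, and the four-week window are made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Entry:
    when: datetime
    kind: str  # "change" or "incident"
    note: str


def changes_before(log: List[Entry], incident: Entry,
                   window: timedelta = timedelta(days=28)) -> List[Entry]:
    """Changes logged within `window` before the incident, newest first."""
    hits = [
        e for e in log
        if e.kind == "change"
        and incident.when - window <= e.when <= incident.when
    ]
    return sorted(hits, key=lambda e: e.when, reverse=True)
```

Run it for each logged hiccup and the same change showing up again and again is the trend you would otherwise only notice weeks later.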
 
With everything said, done and tried that's been listed, and judging from the amount of money you appear to be spending per month for this host, I would venture to say that perhaps Sproketdata.com may not be serving your best financial interests. At least the same IP resolves back to Sproketdata.

We hosted through theplanet in Dallas, and their facilities are nice, and significantly more affordable than what you're paying for these guys. I've been getting a significant amount of page cannot be displayed errors in the last few days.
 
Yeah... changing hosts may be in the works. We are just not quite at that point yet.
 
Theplanet has some nice deals for sure. However, I'd be paying a good bit more than I am paying now. Given the cost for dedicated hosting, I'd like to build/purchase my own machine instead of leasing. At their lease rates, I could pay for the machine pretty quick ;-) I may eventually go this route even if we figure out what is going on with the current machine just so that we have room for TWT to continue growing (hopefully without the growing pains :-P).
 
I think the spikes are the spam system sending out bursts of UK Lottery Win notices... :roll:

:mrgreen:
 
Theplanet has some nice deals for sure.

The Planet is where we were originally hosted, before I moved equipment down to Houston. They do good work, but like anything you get what you pay for. There's an advantage to leasing equipment from them ... they'll do your break/fix, but it all comes down to what you can afford.
 
I know you guys are considering all factors, but we had a server running OpenSolaris that had problems. We rebuilt it with CentOS, with no hardware changes, and the server performs perfectly. All the other apps and services were the same. It was easier than kernel debugging.
 
:tab Well, yesterday around 4:30pm or so, we got a different network card installed in the machine. I doubt it was new... Anyway, things seemed to run pretty well initially, but last night we started getting problems again :doh: So, we will keep at it...
 
Noted: We had 1/2 rack from them, and supplied our own equipment. Went from MCI/Worldcom facilities in Richardson over to ThePlanet after we got a much better deal.

No fault can be given for wanting your own equipment in there. If your vision & model is to continue to grow with a compound rate of members and/or contributions, then owning your own equipment can get 'spensive pretty quickly. If you are paying this much for service for machines and bandwidth for a site the size of TWTEX, I simply couldn't imagine what Adv Rider costs per month. :eek2:

Good luck guys. I know you'll get it worked out.
 
In the immortal words of Monty Python, "I'm getting better! No, you're not. You'll be stone dead in a moment!"

That's kinda been the experience on this so far. Where we've tuned things it appears that it has in fact made it run better. This load spike issue seems to be getting less frequent, but it's still there.

TM: does the provider do any denial of service detection? I wonder if the randomness of this is due to an outside flood of traffic in an effort to hack/crash the system?
 
...but last night we started getting problems again :doh: So, we will keep at it...

Yep, I had this last night:

[Image: notwtex.gif]

Best of luck. Keep us posted.
 
Techs say they have been monitoring for DoS attacks and have seen nothing indicating that might be a problem.
 
Just got off the phone with techs again. They will be trying to build a new box for us from comparable hardware and then mirroring the site over to the new machine. Ideally, this will confirm/deny hardware issues... Will probably be Monday before they can get the new box up and running to make the switch.
 
Ah ha!

TM mentioned to the techs that there is an excessive number of *.bin files in the mysql directory, and the tech responded that he thought they were dump/crash files. That got me thinking and researching.

They are binary logs for updates to the db (kinda like transaction logs). According to TM, there are a BUNCH of them (up to 22GB worth) in this same directory. They are also used for DB replication which doesn't apply to us. I happened to catch the system when it spiked this morning at 09:30 with vmstat and I saw lots of I/O for that timeframe. TM also mentioned that the timestamps of the files matched many of the outage windows we've been seeing.
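
To put numbers on "a BUNCH", a short script can count the files, total their size, and list their modification times for comparison against the outage windows. This is a rough sketch only: the `*.bin` pattern is taken from the description above, the function name is made up, and the directory is whatever the MySQL data directory happens to be on this server.

```python
import glob
import os
from datetime import datetime
from typing import List, Tuple


def summarize_logs(directory: str,
                   pattern: str = "*.bin") -> Tuple[int, int, List[Tuple[str, int, datetime]]]:
    """Return (file_count, total_bytes, rows) for files matching `pattern`.

    Each row is (file_name, size_in_bytes, last_modified), oldest first.
    """
    rows = []
    for path in glob.glob(os.path.join(directory, pattern)):
        st = os.stat(path)
        rows.append((os.path.basename(path), st.st_size,
                     datetime.fromtimestamp(st.st_mtime)))
    rows.sort(key=lambda row: row[2])
    total = sum(size for _, size, _ in rows)
    return len(rows), total, rows
```

Printing the rows next to a list of known outage times would show quickly whether the timestamps really line up the way TM noticed.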

I've sent a PM to him with the following suggestions:
1. Using MySQL commands, trim the binary logs down to just one. This is a normal maintenance thing that needs to take place periodically but doesn't look like it has at all simply because we didn't know to do it. This could fix the situation and preserve the point-in-time recovery of the DB if it is struggling to keep track of a LARGE number of files. Not to mention the filesystem performance can be affected if the number of files gets to be excessive. I don't know the exact number of files that exist, but if it's been running for all this time without a purge, it could be huge.
2. Turn off binary logging altogether. This would remove the ability to do a point-in-time recovery, meaning that if the database died we would only be able to recover back to the data of the last backup (which I think is being done nightly), but this could improve DB performance and stop this pesky issue.
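
Suggestion 1 could look something like the sketch below. MySQL's `SHOW BINARY LOGS` lists the log files in order and `PURGE BINARY LOGS TO 'name'` removes every log older than the named one, which itself is kept. The helper only builds the statement text so it can be reviewed before anyone runs it; the function name and the example file names are made up, and the real names would come from `SHOW BINARY LOGS` on the server.

```python
from typing import List


def purge_statement(log_names: List[str], keep: int = 1) -> str:
    """Build a PURGE BINARY LOGS statement that keeps the newest `keep` logs.

    `log_names` must be in creation order, as SHOW BINARY LOGS returns them.
    Returns an empty string when there is nothing to purge.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if len(log_names) <= keep:
        return ""
    oldest_kept = log_names[-keep]
    if "'" in oldest_kept:
        raise ValueError("unexpected quote in log name")
    # PURGE ... TO deletes every log listed before the named one; the named log stays.
    return f"PURGE BINARY LOGS TO '{oldest_kept}';"
```

For example, with three hypothetical logs named mysql-bin.000001 through mysql-bin.000003, the default call produces a statement that leaves only mysql-bin.000003. MySQL also has an `expire_logs_days` setting for automatic expiry, which may be worth a look depending on the version installed.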

All of the other tuning seems to have really improved the site at all the other times.
 
Update:

Well...:headbang: we're getting there...

Think of this like an old analog dial radio. You turn the dial one way, then back the other to find that sweet spot that gives you the best performance. The last round of tweaks from overnight had the system absolutely hammered. So... turned things back today and things are closer to normal. TM and I are working on this pretty solid, so if you get a timeout or server busy error here and there, please be patient.
 