Solve Slice or Website 'Down' Issues Quickly

When your website is down or your slice is unreachable, you can run through a handful of routine tests to identify which of the most common causes could lie behind the problem.


Looking for a culprit

The first thing to check is whether the problem is somewhere on the Internet or if it lies on the slice itself. To do this you can work through the following tests.

Use Slice Manager's Diagnostics to check your slice and your local network connections

To do this, log in to the Slice Manager at https://manage.slicehost.com/, click on 'My Slices', then click on your slice's name. Click on the 'Diagnostics' link in the top right of the information page for your slice.

Check that the slice is 'running' (up), that the host is 'up', and 'host load' is below 1.0.

Make a note of the swap IO and root IO numbers. The swap IO number should be lower than 0.1 and the closer to zero the better.

While you are in the Diagnostics page, check if the slice IP address or any installed webserver URL is responding to connection attempts from elsewhere on the internet.

Along with showing you the status of your slice, the 'Diagnostics' link includes a tool to test ping connectivity to your slice from around the world. Another tool tests the connectivity of your own local Internet connection - also from around the world. Look for the links labelled:

Ping your slice

Ping your local connection

at the bottom of the Diagnostics page, as shown in the image below.

Slice Manager's Diagnostics page shows two built-in connectivity tests

Use both tests to check connectivity and possibly narrow down the cause of the problem. Bear in mind as you do this that you may have blocked ping responses on your slice as a security measure. If you've made that change then a lack of response to pings would be normal.
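If you also want to ping the slice from your own machine, a minimal sketch along these lines may help. The helper name packet_loss and the address 203.0.113.10 are placeholders, not part of Slice Manager, and the sketch assumes your ping prints the usual "N% packet loss" summary line.

```shell
# Hypothetical helper: pull the packet-loss percentage out of ping's summary line.
# Assumes the summary contains text like "0% packet loss".
packet_loss() {
  grep -o '[0-9.]*% packet loss' | cut -d'%' -f1
}

# Usage (replace the placeholder address with your slice's IP):
#   ping -c 5 203.0.113.10 | packet_loss
```

A result of 100 means no replies came back, which is also what you would see if you have blocked ping responses on the slice.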

Check if the problem is limited to your webserver software

If you think only your webserver is down, but not your slice, check it from outside your own network at:

http://downforeveryoneorjustme.com

or enter the website URL directly using this format:

http://downforeveryoneorjustme.com/<website URL>

or use the slice's IP address:

http://downforeveryoneorjustme.com/<slice IP address>

While you are there, you can also check if you are able to connect to Slicehost's Slice Manager by entering:

http://downforeveryoneorjustme.com/manage.slicehost.com
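The same kind of check can be run from a command line with curl, if it is installed on your local machine. The sketch below is only an illustration: the function names classify_status and check_site are made up for this example, and the URL in the usage comment is a placeholder.

```shell
# Hypothetical helper: turn an HTTP status code into a short verdict.
# curl reports 000 when it could not connect at all.
classify_status() {
  case "$1" in
    000)     echo "no connection" ;;
    2??|3??) echo "responding" ;;
    *)       echo "responding with error $1" ;;
  esac
}

# Hypothetical helper: fetch only the status code for a URL (10 second limit).
check_site() {
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$1")
  classify_status "$code"
}

# Usage (placeholder URL):
#   check_site http://www.example.com/
```

A "no connection" result from your machine combined with an "up" result from the site above suggests the problem is on your local connection rather than on the slice.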

Check if you can connect to the slice via ssh

Check if you can ssh to the slice using the ssh port you usually use. The default is 22 but many people change it to 30000 or some other less obvious port.

If the slice is contactable by ssh but no longer responding to web requests that suggests the problem is limited to the webserver. Knowing this, you can log on via ssh and start to troubleshoot the webserver software. A good starting point is with a review of the webserver log files, usually found in the /var/log/ directory.

The same applies to other unresponsive software, such as mail servers or Ruby and Mongrel.
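As a rough sketch of that log review, the helper below prints recent log lines that mention errors. The name recent_errors is hypothetical, and the port, user name, address and log path in the usage comments are assumptions: log locations differ between webservers and distributions, so list /var/log first to see what your slice actually has.

```shell
# Hypothetical helper: show lines mentioning "error" among the last lines of a log.
# $1 = log file, $2 = how many trailing lines to scan (defaults to 200).
recent_errors() {
  tail -n "${2:-200}" "$1" | grep -i 'error'
}

# Usage once connected (all values below are placeholders):
#   ssh -p 30000 lee@203.0.113.10
#   ls -lt /var/log | head
#   recent_errors /var/log/apache2/error.log
```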

If the above tests confirm that everything including the slice itself is unresponsive, you can move on to eliminating the most common causes on the slice itself. To do that, first log in to Slice Manager at:

https://manage.slicehost.com/

and check the Slice Manager links for the problem slice in the following order:

• Diagnostics - if you haven't checked it as part of the first test in this how-to

• Status - check for any notes, such as a note saying the slice is 'down'

• Console - this allows non-ssh login to the slice for troubleshooting purposes. Slices in trouble often display helpful error messages to this Console terminal that are visible before you log in. These error messages disappear after login but can still be seen in the /var/log/messages file.

After following the links to bring up Console access you may see only the words: GET COLORS PASTE.

If you do, the issue is that you need to 'wake up' Xen's console, just like a normal keyboard, mouse and screen combination. To do so, position your mouse pointer approximately one inch or three centimetres beneath the words and left mouse-click a couple of times. Then hit your keyboard's 'Enter' key a couple of times.

Some users have reported that they achieve the same result by clicking the 'GET' button two or three times.

Whichever approach you try, it should wake the console. Note, however, that the console always has latency - slowness - that a direct ssh connection usually does not have. So you'll have to be patient with it.

Once you are in the slice via Console, check the available memory.

Check the RAM usage of the slice

After logging into the Console, check if the slice has enough memory. To do that, follow our how-to 'Memory management with free': http://articles.slicehost.com/2007/9/7/memory-management-with-free
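For a quick look before reading the full how-to, something like the following sketch may be enough. It assumes the usual layout of 'free -m', where the row starting with "Swap:" lists total, used and free in that order; the helper name swap_used_mb is made up for this example.

```shell
# Hypothetical helper: read 'free -m' output on stdin and print used swap in MB.
# Assumes the "Swap:" row has the columns: total, used, free.
swap_used_mb() {
  awk '/^Swap:/ { print $3 }'
}

# Usage on the slice:
#   free -m | swap_used_mb
```

A value that stays well above zero, or keeps growing, hints that the slice is short of RAM.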

Check the load on the slice

Check for resource bottlenecks. To do that, follow our how-to 'System monitoring with top': http://articles.slicehost.com/2007/9/7/system-monitoring-with-top
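As a small complement to top, the sketch below compares the slice's one-minute load average against a limit. It assumes a Linux slice where /proc/loadavg exists; the helper name load_exceeds and the limit of 1.0 in the usage comment are illustrative choices, not recommended values.

```shell
# Hypothetical helper: succeed if the 1-minute load average read from stdin
# (in /proc/loadavg format) is greater than the limit given as $1.
load_exceeds() {
  awk -v limit="$1" '{ exit !($1 > limit) }'
}

# Usage on the slice (1.0 is only an example limit):
#   load_exceeds 1.0 < /proc/loadavg && echo "load is above 1.0"
#   top -b -n 1 | head -n 15
```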

Check for intermittent load issues

If you suspect the slice is swapping intermittently, set up a cron job to run a 'batch-mode top' script once a minute. Create a script in your user directory on the slice, naming it something obvious like this:


vim /home/lee/why_swap.sh

You can use nano or another editor in place of vim, of course.

Add the following to the script, replacing my name with your chosen user name:


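#!/bin/bash
# Append the first 10 lines of a single batch-mode top run to the log,
# then a separator line so each snapshot is easy to tell apart.
top -b -n 1 | head -n 10 >> /home/lee/why_swap.txt; echo ' ' >> /home/lee/why_swap.txt
exit 0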

Make the script executable:


chmod u+x /home/lee/why_swap.sh

Then, using sudo, create an entry in /etc/cron.d to fire the script every minute:


sudo vim /etc/cron.d/why_swap

Put the following into the file, again changing my name to your chosen user name:


*/1 * * * * lee /home/lee/why_swap.sh

The script will collect load data for the slice every minute and log it to - in this case - /home/lee/why_swap.txt for later analysis.

Let that run through at least one swapping event and you'll probably have enough data to see what happened to push the slice into swap.
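One possible way to skim the collected data is to keep only the timestamp line and the swap line from each snapshot. The sketch below assumes each snapshot starts with top's usual "top - HH:MM:SS up ..." header and contains a summary line with the word "Swap"; the helper name swap_history is made up for this example.

```shell
# Hypothetical helper: print only the time header and swap summary of each snapshot.
# $1 = the log file written by the cron job.
swap_history() {
  grep -E '^top -|Swap' "$1"
}

# Usage (adjust the path to match your user name):
#   swap_history /home/lee/why_swap.txt | less
```

Once you spot the minute where swap use jumps, open the full log at that timestamp to see which processes were at the top of the list.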

Review how to resolve load issues

Slicehost's Jared has written a comprehensive guide to identifying and resolving load issues at: http://superjared.com/entry/double-edged-swap/

If it is not yet resolved, visit Slicehost Support

If none of these tests has helped you identify the problem, or if you need further help, feel free to drop by Support Chat at http://chat.slicehost.com/ to discuss what you found.
