Speed up resizes - Part 1

This guide will help you shorten slice resize times, slice moves, and slice backup times. If your slice's applications match any of the characteristics highlighted in this how-to, then read on: this article will probably make resize times shorter and more predictable for you. We'll show you how to cut resize times by taking preventative action before resizing, whether you are resizing up or resizing down.

And, as we like to give value for money, these tips will also speed things up for you if we ever have to move your slice or migrate it from failing hardware. As yet another benefit, backup times will be reduced.


Resize, backup and clone times are generally determined by the volume of used disk space on a slice. That is, the more data to copy, the longer the job takes. Typically an empty OS resize or clone takes 10-15 minutes to complete. However, certain slice 'personalities' dramatically extend resize time. They are:

  1. Slices containing lots of small files, such as Ruby session files, cache files, mail server message files, or web server image thumbnails. These resizes take longer to complete than you might expect, but the reboot - and associated downtime - that completes the resize will usually be quite short.

  2. Slices containing many files that are being updated during resize. Typical examples are MySQL MyISAM table files and the log files of web servers hosting multiple domains, each configured to log to a separate file. The need to update these actively-written files after the resize's first-pass copy tends to extend the reboot downtime that completes the resize process.

  3. Slices containing one or more large files that are updated during resize. Examples are MySQL databases using InnoDB format, mail servers with large mail logs, and web servers that log to single, large files. The need to update these actively-written files after the resize's first-pass copy greatly extends the reboot downtime that completes the resize process.

So, the secret to cutting resize and clone times is to manage data on your slice and to identify any applications that are writing to disk during the resize. Let's look at these factors more closely to see how we can mitigate each.

Beware! Small files overhead...

Individual small files don't take up much disk space, but a slice hosting many of them forces the resize process to carry out many 'file-open, file-copy, file-check' operations. These occur during the first, or 'preparation', stage of a resize, while the slice is still running. They don't affect the operation of the slice, but they do extend the time required before the final 'reboot' phase of the resize, making it less predictable for you.

As an example, a one-gigabyte data file requires only one file-open, file-copy, file-check operation. Contrast that with one gigabyte of data spread across 10,000 individual files, which requires 10,000 separate file-open, file-copy, file-check operations. That's a lot more system and network overhead.
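To see which side of that contrast a directory falls on, here is a minimal sketch that reports the file count, total size, and average file size under a path. The function name `file_stats` is our own invention for illustration, and the numbers it prints are only a rough guide.

```shell
# Print the number of regular files, the total size in KB, and the
# average size per file for everything under a directory.
# file_stats is a hypothetical helper name, not part of any tool.
file_stats() {
    dir=${1:-.}
    # Count regular files; tr strips the padding some wc versions add.
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    # du -sk prints "<kilobytes><TAB><path>"; keep the first field.
    kbytes=$(du -sk "$dir" | cut -f1)
    if [ "$count" -gt 0 ]; then
        avg=$((kbytes / count))
    else
        avg=0
    fi
    echo "files=$count total_kb=$kbytes avg_kb=$avg"
}
```

For example, `file_stats /var/www` (the path is only an example) prints one summary line. A very high file count combined with a low average size suggests the 'small files' personality described above.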

Check if your applications are among those that are more likely to create many small files and see longer resize preparation times. They are:

  • Web servers serving many small thumbnails or image files
  • Caching servers that cache on disk with small files
  • Email servers with large 'archives' of undeleted email
  • Ruby/Rails servers - which tend to create lots of small session files and not delete them
  • git repositories
  • Custom application servers that create - but do not delete - session files for each visitor

While it may take some time to copy these small files, once they are copied the 'resize down-time' during reboot will usually be quite short.

To summarise, the 'small files problem' makes resize preparation time longer and therefore less predictable. And that makes the time of the final reboot phase harder to predict. However, the final reboot time - the 'resize downtime' - will usually remain quite short in these 'small files' cases.

How to mitigate

Check your slice applications against our list above. If your applications are on the list, then do what you can to prune unwanted small files before flicking the resize switch.
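If you are not sure where the small files live, a sketch like the following may help. It lists the ten directories holding the most files below a size threshold. The function name `small_file_hotspots` and the 16 KB default are assumptions chosen for illustration, and file names containing newlines will confuse the count.

```shell
# List the ten directories that hold the most small files.
# small_file_hotspots is a hypothetical helper; the 16384-byte default
# threshold is an arbitrary assumption, so adjust it to suit.
small_file_hotspots() {
    root=${1:-.}
    max_bytes=${2:-16384}
    # -xdev stays on one filesystem; -size -Nc matches files under N bytes.
    find "$root" -xdev -type f -size -"${max_bytes}"c \
        | sed 's|/[^/]*$||' \
        | sort | uniq -c | sort -rn | head -n 10
}
```

Running `small_file_hotspots /var` (an example path) prints a count followed by a directory name on each line, busiest directory first. Those directories are the first places to look for prunable session and cache files.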

If you are running Ruby/Rails, assume the worst and search community forums for typical session and cache file locations, as well as how to identify which ones you can safely delete. Look into storing session data in MySQL with the command:

rake db:sessions:create

and truncating log files in log/ to zero bytes with the command:

rake log:clear

If you are running a caching server that caches to disk (as opposed to RAM), identify its file-storage directory and prune with vigour.

Check your filesystem for small session and cache files created by custom applications. Again, prune with vigour.
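As a rough sketch of what pruning might look like, the function below lists files older than a given number of days and only deletes them when explicitly told to. The name `prune_old_files` and its `delete` argument are our own inventions for illustration. Always review the listing before deleting, since only you know which files your application still needs.

```shell
# List, or optionally delete, regular files not modified for more than
# a given number of days. prune_old_files is a hypothetical helper.
# Usage: prune_old_files DIR DAYS [delete]
prune_old_files() {
    dir=$1
    days=$2
    mode=${3:-list}
    if [ "$mode" = "delete" ]; then
        # Destructive: removes every matching file.
        find "$dir" -xdev -type f -mtime +"$days" -exec rm -f {} +
    else
        # Default: a dry run that only prints what would be removed.
        find "$dir" -xdev -type f -mtime +"$days" -print
    fi
}
```

For instance, `prune_old_files /path/to/sessions 30` shows session files untouched for more than 30 days (the path is a placeholder), and adding `delete` as a third argument removes them.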

If resizing an email server that has a Mail Delivery Agent (MDA) such as Dovecot installed, have your email users clean their email archives of old cruft first.

Constantly-changing files

Files that change between the start of a resize and the final reboot stage of a resize have to be copied again from scratch during the reboot that completes a slice resize.

This certainly extends 'resize downtime' and therefore extends 'resize completion time'.

Database servers are the most frequent culprits that change large files of data between the start and end of a resize. These changes force the system to copy the entire database file again in the second 'update' part of the copy process that occurs in resizes. The slice is down during the second copy so time spent re-copying updated database files is time spent with the server offline.

Some combinations of database structure and type tend to exacerbate this kind of problem. For example, if you have a MySQL MyISAM database with many tables, each stored in its own files, and your application updates most of those tables in the course of normal operation, then many or all of the table files may need to be copied again during the second copy.

Given that database files can be many gigabytes in size, the implications of these updates for 'resize downtime' become clear. It also illustrates how difficult it is for the resize system to predict accurately how long a resize will take - after all, how can it predict what and how many SQL updates will occur between resize start and finish?
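One way to spot files that are likely to change during a resize is to look for files that have changed recently. The sketch below lists the largest files modified within the last few minutes. The function name `recently_written` is hypothetical, and it relies on the `-mmin` test, which GNU and BSD `find` provide but which is not in every `find`.

```shell
# Show the largest files modified within the last N minutes (default 10),
# as "<size in KB> <path>", biggest first.
# recently_written is a hypothetical helper name.
recently_written() {
    root=${1:-.}
    mins=${2:-10}
    find "$root" -xdev -type f -mmin -"$mins" -exec du -k {} + \
        | sort -rn | head -n 20
}
```

Running `recently_written /var 10` (an example path) a few times during normal operation gives a feel for which large files are being written to, and therefore which ones would probably need a second copy while the slice is down.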

How to mitigate

If your database contains a lot of obsolete data it may pay to archive that data and then prune it from the live database before resizing. MySQL, for example, allows you to archive data using the mysqldump script, after which you can delete obsolete data in the live database. The large mysqldump output file containing the obsolete data will not extend 'resize downtime' because it is not being written to during resize.
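To make the archive-then-prune idea concrete, here is a sketch that prints the two commands involved rather than running them, so you can review them first. The function name `archive_old_rows_plan`, the database `shop`, the table `orders`, and the column `created_at` are all hypothetical. The `--where` and `--no-create-info` options are standard `mysqldump` options. Check that the dump file is complete before running the DELETE.

```shell
# Print (but do not run) the commands to archive rows matching a
# condition and then delete them from the live table.
# archive_old_rows_plan is a hypothetical helper name.
# Usage: archive_old_rows_plan DATABASE TABLE "SQL CONDITION"
archive_old_rows_plan() {
    db=$1
    table=$2
    cond=$3
    out="${table}_archive.sql"
    # Step 1: dump only the matching rows, without CREATE TABLE statements.
    printf 'mysqldump --no-create-info --where="%s" %s %s > %s\n' \
        "$cond" "$db" "$table" "$out"
    # Step 2: once the dump is verified, remove the same rows.
    printf 'mysql %s -e "DELETE FROM %s WHERE %s"\n' \
        "$db" "$table" "$cond"
}
```

For example, `archive_old_rows_plan shop orders "created_at < '2012-01-01'"` prints a `mysqldump` line followed by a `mysql ... DELETE` line. Both use the same condition, so the rows deleted are the rows archived, provided no matching rows are added in between.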

Another option if you have applications writing to many files during resize is to set the application into read-only mode immediately before the resize. Databases can usually be set into temporary read-only mode. With other applications your mileage may vary.
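For MySQL, one possible way to do this is the server's global `read_only` variable, sketched below. The function name `mysql_read_only` and the `MYSQL` override variable are our own inventions, and the sketch assumes the `mysql` client can already log in with sufficient privileges. Note that `read_only` does not block accounts holding the SUPER privilege, so treat this as a starting point rather than a guarantee.

```shell
# Switch a MySQL server's global read_only variable on or off.
# mysql_read_only is a hypothetical helper; set MYSQL to override the
# client command, for example to add login options.
# Usage: mysql_read_only on|off
mysql_read_only() {
    case "$1" in
        on)  state=ON ;;
        off) state=OFF ;;
        *)   echo "usage: mysql_read_only on|off" >&2; return 1 ;;
    esac
    # Left unquoted on purpose so MYSQL can carry extra arguments.
    ${MYSQL:-mysql} -e "SET GLOBAL read_only = $state"
}
```

You would run `mysql_read_only on` immediately before starting the resize and `mysql_read_only off` once the slice is back up.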

You can also prevent writes to multiple files by turning applications off, but setting applications to read-only mode is usually the preferred option.

In the second part of this how-to we will look at the impact of slices with large constantly-updated files and how to mitigate their impact on resize times.

Lee
