Parallel disk I/O: Is it faster?

Being the developer of GNU Parallel, I often meet the question: is it faster to do disk I/O in parallel or sequentially?

The answer is a very clear and resounding: It depends.

In the simplest case, where you have a single spinning disk (“one spindle”), sequential I/O will normally be faster, because parallel access forces the disk head to seek back and forth between jobs. But see http://unix.stackexchange.com/questions/124527/speed-up-copying-1000000-small-files for an example that contradicts this.

If you instead have a RAID6 over 40 spindles, things are different: on such a system I got a 6x speedup by running 10 jobs in parallel. Fewer or more jobs gave a smaller speedup.
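A run like that can be reproduced with GNU Parallel itself. Here is a minimal sketch, assuming GNU Parallel is installed; the directory `/tmp/ptest`, the file sizes, and the job count are illustrative placeholders, not the original benchmark:

```shell
# Create a few 1 MiB test files (replace with your own data set).
mkdir -p /tmp/ptest
for i in 1 2 3 4; do
  head -c 1048576 /dev/zero > /tmp/ptest/file$i
done

# Read all files with up to 10 concurrent jobs, discarding the data,
# so only the read side of the disk I/O is measured.
time find /tmp/ptest -type f | parallel -j10 'cat {} > /dev/null'
```

On a many-spindle RAID you would expect the elapsed time to drop as `-j` rises toward the sweet spot, then climb again as contention sets in.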

Network file systems are backed by physical servers, so the above applies to them, too. On top of that, the network can add latency: opening a file is slow, but once it is open the data arrives quickly. In that case you will often see a speedup from parallelization as well. Distributed network file systems spread over several nodes will in most cases perform better if you can keep all nodes busy.

In the general case: if there is high latency that is not caused by the parallelization itself, parallelizing may be a good idea, because while one job is waiting on the latency, a different job can be receiving its data.

With SSDs, RAID, different network file systems, and caching in play, there is really only one safe answer: test with different levels of parallelization and measure.
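Such a measurement can be sketched as a sweep over job counts, again assuming GNU Parallel; `/tmp/iotest`, the file count, and the candidate `-j` values below are placeholders to substitute with your own workload:

```shell
# Build a small test data set (substitute your real files here).
mkdir -p /tmp/iotest
for i in $(seq 8); do
  head -c 1048576 /dev/zero > /tmp/iotest/f$i
done

# Time the same read workload at several parallelization levels.
# For comparable numbers on a real system, clear or warm the page
# cache consistently between runs.
for j in 1 2 5 10 20; do
  echo "jobs: $j"
  time (find /tmp/iotest -type f | parallel -j"$j" 'cat {} > /dev/null')
done
```

Whichever `-j` gives the lowest wall-clock time on your hardware is the one to use; the answer will differ between a laptop SSD, a 40-spindle RAID, and a distributed file system.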
