Excuses for not installing GNU Parallel

Over time I have seen people who could benefit from GNU Parallel give excuses for not using it. I believe most of these excuses stem from a lack of due diligence: to avoid doing a little work now, they end up doing a lot of work later.

Here are the most popular excuses I have met so far:

“GNU Parallel is not installed everywhere”

It is true that GNU Parallel is not installed everywhere. But it is designed to be extremely easy to install. We are literally talking a matter of seconds:

    (wget pi.dk/3 -qO - ||  curl pi.dk/3/) | bash

Or if you want full control over the process: http://git.savannah.gnu.org/cgit/parallel.git/tree/README

“I do not have root access”

A personal installation of GNU Parallel does not require root access. Just do:

    ./configure --prefix=$HOME && make && make install

or:

    (wget pi.dk/3 -qO - ||  curl pi.dk/3/) | bash

On a system that already has GNU Parallel installed, you can also embed it in a shell script:

    parallel --embed > newscript

Edit the end of newscript and copy newscript to the other system.

“I am not allowed to install software”

If you are allowed to run your own scripts, you can run GNU Parallel the same way. All you need to do is copy the file parallel and use it as you would any of your own scripts.
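A minimal sketch of what copying the file can look like in practice. The helper name install_personal_copy and the default destination $HOME/bin are assumptions made for illustration; nothing here needs root access or a package manager.

```shell
# install_personal_copy SRC [DEST_DIR]
# Copy a single script into a per-user directory and make it executable.
# DEST_DIR defaults to $HOME/bin (an assumption; any directory you own works).
install_personal_copy() {
    src=$1
    dest_dir=${2:-"$HOME/bin"}
    mkdir -p "$dest_dir" &&
        cp "$src" "$dest_dir/" &&
        chmod u+x "$dest_dir/$(basename "$src")"
}

# Example (assumes a file named "parallel" is in the current directory):
#   install_personal_copy ./parallel
#   PATH="$HOME/bin:$PATH"    # make the copy visible to this shell session
#   parallel --version
```

If changing PATH is not an option either, calling the copy by its full path works just as well.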

“Using GNU Parallel is overkill”

Overkill usually means that setting up the program and running it will take longer than just running the job in a different way. Setting up GNU Parallel can be done in 10 seconds, so the risk is quite limited.

“Keeping GNU Parallel up to date is too hard”

Even an old version will most likely save you time. The basic functionality has remained the same for years.

You only need to upgrade if you need some of the newer functionality or if you are being hit by a bug that is fixed in newer versions. Most of the recent bugs are related to rare race conditions and will never affect most people.

Updating to the newest version is also a matter of seconds:

    (wget pi.dk/3 -qO - ||  curl pi.dk/3/) | bash

“I do not understand Perl”

Unless you are going to change GNU Parallel you do not need to know Perl to use the tool – just like you do not need to learn kernel programming to use Linux.

“GNU Parallel is too hard to compile and install”

This excuse typically comes from users who have never even tried to install it. GNU Parallel is written in Perl and is compatible with even very old versions of Perl. So even if your system is too limited for ‘./configure && make && make install’, you should always be able to do ‘cp parallel $HOME/bin’.

If wget, gpg, and bzip2 are installed this ought to work:

    (wget pi.dk/3 -qO - ||  curl pi.dk/3/) | bash

“My software should not depend on non-standard software”

Why not distribute GNU Parallel next to your software? The beauty of free software is that you are in fact allowed to do this – even if your software is non-free.

SM is an example of software co-distributing GNU Parallel: https://github.com/sm/sm/blob/master/README.md

GNU Parallel can also be embedded in a shell script using:

    parallel --embed > newscript

Edit the end of newscript.

“GNU Parallel is not as well tested as xargs”

I have seen this argument from people who then did something like this:

    find mydir -print | grep some_stuff | tail | xargs -P 10 mycommand | grep other_stuff

So while xargs may be well tested they basically screwed themselves over by using xargs in a way that is not safe – by design.

The above has 2 major issues:

  • If file names in mydir contain a space, ' or ", xargs will interpret these characters and mycommand will not be run on those files. And it is quite common for users (especially GUI users) to create files with a space, ' or " in the name.
  • The output from the mycommand instances running in parallel is not guaranteed to stay unmixed, so you risk getting half a line from one instance of mycommand followed by the rest of a line from another instance.

These are both risks you avoid by using GNU Parallel. The scary part is that they most likely will not notice the problem on their test set, and the problem will therefore only be discovered after their script has been put into production.
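The first issue is easy to reproduce. The sketch below is only a demonstration: the scratch directory and the two file names are made up, and it assumes the path returned by mktemp contains no whitespace. The GNU Parallel part runs only if a GNU Parallel binary is found on PATH.

```shell
# Scratch directory with one ordinary and one space-containing file name
# (both names are made up for this demonstration).
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/two words.txt"

# Plain xargs splits its input on whitespace, so the two files arrive
# as three separate arguments (one echo per argument).
xargs_args=$(find "$demo_dir" -type f -print | xargs -n 1 echo | wc -l | tr -d ' ')
echo "files: 2, arguments seen via xargs: $xargs_args"

# GNU Parallel treats each input line as one argument. Only run this part
# if a GNU Parallel binary is actually on PATH.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    parallel_args=$(find "$demo_dir" -type f -print | parallel echo | wc -l | tr -d ' ')
    echo "files: 2, arguments seen via GNU Parallel: $parallel_args"
fi

# The pipeline from above with xargs -P 10 swapped for parallel -j 10
# (mydir, some_stuff, mycommand and other_stuff are the placeholders used there):
#   find mydir -print | grep some_stuff | tail | parallel -j 10 mycommand | grep other_stuff

rm -rf "$demo_dir"
```

A test set made of tidy file names would never show the difference, which is exactly why this kind of bug tends to surface only in production.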

There is also a minor issue:

  • If the file names in mydir contain \n (newline).

This will fail in GNU Parallel, too (unless using -0). But I have yet to see a file name with a newline in it that was created by normal users. This only happens with malicious users.
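To see why a newline in a file name is a problem for any tool that reads one name per line, here is a small sketch. The scratch directory and the file name are made up for illustration; it assumes a find that supports -print0.

```shell
# Scratch directory holding a single file whose name contains a newline
# (the name is made up for this demonstration).
demo_dir=$(mktemp -d)
touch "$demo_dir/$(printf 'first\nsecond').txt"

# Newline-separated listing: the one file is spread over two lines.
line_records=$(find "$demo_dir" -type f -print | wc -l | tr -d ' ')

# NUL-separated listing: count the NUL terminators, one per file.
nul_records=$(find "$demo_dir" -type f -print0 | tr -cd '\0' | wc -c | tr -d ' ')

echo "files: 1, newline records: $line_records, NUL records: $nul_records"
rm -rf "$demo_dir"
```

A NUL-separated list like the second one is what -0 expects, for example: find mydir -print0 | parallel -0 mycommand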

Dealing with the problems from xargs

Some are aware of the problems with xargs and then spend a lot of effort trying to fix them:

  • They quote ', " and space (though sometimes they forget some of them).
  • They try to un-mix the output by wrapping it in markers and then post-processing it based on these markers, but they never manage to fix the half-line issue.

So while xargs is well tested, their fixups are not, thereby defeating the purpose of only using well tested software. All in all they spend much more effort on building their own fixup around xargs than it would take to simply install GNU Parallel.

– o –

Being the developer of GNU Parallel I am of course biased: I think everyone using the command line owes it to themselves and to their command line to have GNU Parallel in their toolbox.
