Excuses for not installing GNU Parallel

Over time I have seen people who could benefit from GNU Parallel give excuses for not using it. I believe most of these excuses stem from a lack of due diligence: to avoid doing a little work now, they end up doing a lot of work later.

Here are the most popular excuses I have met so far:

“GNU Parallel is not installed everywhere”

It is true that GNU Parallel is not installed everywhere. But it is designed to be extremely easy to install. We are literally talking a matter of seconds:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
  fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 67bd7bc7dc20aff99eb8f1266574dadb
12345678 67bd7bc7 dc20aff9 9eb8f126 6574dadb
$ md5sum install.sh | grep b7a15cdbb07fb6e11b0338577bc1780f
b7a15cdb b07fb6e1 1b033857 7bc1780f
$ sha512sum install.sh | grep 186000b62b66969d7506ca4f885e0c80e02a22
6f25960b d4b90cf6 ba5b76de c1acdf39 f3d24249 72930394 a4164351
93a7668d 21ff9839 6f920be5 186000b6 2b66969d 7506ca4f 885e0c80
e02a2244 40e8a43f
$ bash install.sh

Or if you want full control over the process: http://git.savannah.gnu.org/cgit/parallel.git/tree/README
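
If you prefer to script the check-then-run steps, so that the installer only runs when the checksum matches, something like the following could be used. This is a minimal sketch and not part of GNU Parallel: the function name verify_and_run is made up for illustration, and it assumes sha1sum from GNU coreutils and bash are available. Like the grep above, it accepts a partial checksum.

```shell
#!/bin/sh
# Hypothetical helper (not part of GNU Parallel): run a downloaded
# script only if its SHA-1 checksum contains the expected hex string.
# Usage: verify_and_run FILE EXPECTED_HEX
verify_and_run() {
  file=$1
  expected=$2
  # An empty expected value would match anything, so reject it.
  [ -n "$expected" ] || return 2
  # sha1sum prints "<hex digest>  <file name>"; keep only the digest.
  actual=$(sha1sum "$file" | cut -d' ' -f1)
  case $actual in
    *"$expected"*)
      bash "$file"
      ;;
    *)
      echo "checksum mismatch for $file: refusing to run" >&2
      return 1
      ;;
  esac
}
```

It would be called as: verify_and_run install.sh 67bd7bc7dc20aff99eb8f1266574dadb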

“I do not have root access”

A personal installation of GNU Parallel does not require root access. Just do:

./configure --prefix=$HOME && make && make install

or:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
  fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 67bd7bc7dc20aff99eb8f1266574dadb
12345678 67bd7bc7 dc20aff9 9eb8f126 6574dadb
$ md5sum install.sh | grep b7a15cdbb07fb6e11b0338577bc1780f
b7a15cdb b07fb6e1 1b033857 7bc1780f
$ sha512sum install.sh | grep 186000b62b66969d7506ca4f885e0c80e02a22
6f25960b d4b90cf6 ba5b76de c1acdf39 f3d24249 72930394 a4164351
93a7668d 21ff9839 6f920be5 186000b6 2b66969d 7506ca4f 885e0c80
e02a2244 40e8a43f
$ bash install.sh

GNU Parallel can also be embedded in a shell script on a system that has GNU Parallel installed by doing this:

parallel --embed > newscript

Edit the end of newscript and copy newscript to the other system.

“I am not allowed to install software”

If you are allowed to run your own scripts, you can run GNU Parallel just like you would any of your own scripts. All you need to do is to copy the file parallel and use it the same way you would your own scripts.
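
As a rough sketch of what "copy the file" can look like in practice: the helper below copies a script into a personal bin directory and makes it executable. The function name install_personal and the default target $HOME/bin are assumptions made for illustration; any directory you own will do.

```shell
#!/bin/sh
# Hypothetical helper: copy a script into a personal bin directory and
# make it executable. No root access is needed because everything
# happens below a directory the user owns.
# Usage: install_personal SCRIPT [BINDIR]
install_personal() {
  script=$1
  bindir=${2:-$HOME/bin}     # default target is an assumption
  mkdir -p "$bindir" &&
    cp "$script" "$bindir/" &&
    chmod u+x "$bindir/$(basename "$script")"
}
```

After running install_personal parallel, the tool can be started as $HOME/bin/parallel, or simply as parallel once $HOME/bin has been added to PATH.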

“Using GNU Parallel is overkill”

Overkill usually means that setting up the program and running it will take longer than just running the job in a different way. Setting up GNU Parallel can be done in 10 seconds, so the risk is quite limited.

“Keeping GNU Parallel up to date is too hard”

Even an old version will most likely save you time. The basic functionality has remained the same for years.

You only need to upgrade if you need some of the newer functionality or if you are being hit by a bug that is fixed in newer versions. Most of the recent bugs are related to rare race conditions and will never affect most people.

Updating to the newest version is also a matter of seconds:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
  fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 67bd7bc7dc20aff99eb8f1266574dadb
12345678 67bd7bc7 dc20aff9 9eb8f126 6574dadb
$ md5sum install.sh | grep b7a15cdbb07fb6e11b0338577bc1780f
b7a15cdb b07fb6e1 1b033857 7bc1780f
$ sha512sum install.sh | grep 186000b62b66969d7506ca4f885e0c80e02a22
6f25960b d4b90cf6 ba5b76de c1acdf39 f3d24249 72930394 a4164351
93a7668d 21ff9839 6f920be5 186000b6 2b66969d 7506ca4f 885e0c80
e02a2244 40e8a43f
$ bash install.sh

“I do not understand Perl”

Unless you are going to change GNU Parallel you do not need to know Perl to use the tool – just like you do not need to learn kernel programming to use Linux.

“GNU Parallel is too hard to compile and install”

This excuse typically comes from users who have never even tried to install it. GNU Parallel is written in Perl and is compatible with even very old versions of Perl. So even if your system is too limited to do './configure && make && make install', you should always be able to do 'cp parallel $HOME/bin'.

If wget, gpg, sha1sum, and bzip2 are installed this ought to work:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
  fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 67bd7bc7dc20aff99eb8f1266574dadb
12345678 67bd7bc7 dc20aff9 9eb8f126 6574dadb
$ md5sum install.sh | grep b7a15cdbb07fb6e11b0338577bc1780f
b7a15cdb b07fb6e1 1b033857 7bc1780f
$ sha512sum install.sh | grep 186000b62b66969d7506ca4f885e0c80e02a22
6f25960b d4b90cf6 ba5b76de c1acdf39 f3d24249 72930394 a4164351
93a7668d 21ff9839 6f920be5 186000b6 2b66969d 7506ca4f 885e0c80
e02a2244 40e8a43f
$ bash install.sh

“My software should not depend on non-standard software”

Why not distribute GNU Parallel next to your software? The beauty of free software is that you are in fact allowed to do this – even if your software is non-free.

SM is an example of software co-distributing GNU Parallel: https://github.com/sm/sm/blob/master/README.md

GNU Parallel can also be embedded in a shell script using:

parallel --embed > newscript

Edit the end of newscript.

“GNU Parallel is not as well tested as xargs”

I have seen this argument from people who then did something like this:

find mydir -print |
  grep -f file_with_some_stuff |
  tail |
  xargs -n1 -P 10 mycommand |
  grep other_stuff

So while xargs may be well tested they basically screwed themselves over by using xargs in a way that is not safe – by design.

The above has 2 major issues:

  • If file names in mydir contain a space, ' or ", xargs will interpret these characters and mycommand will not be run on those files. And it is quite common for users (especially GUI users) to create files with a space, ' or " in the name.
  • The output from the mycommands running in parallel is not guaranteed to stay separate, so you risk getting half a line from one instance of mycommand followed by the rest of a line from another instance.

These are both risks you avoid by using GNU Parallel. The scary part is that they most likely will not notice the problem on their test set, and the problem will therefore only be discovered after their script has been put into production.
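
The first issue is easy to reproduce. The sketch below creates a single file with a space in its name and counts how many arguments xargs passes on. The function names are made up for illustration, and -print0 and -0 are extensions that GNU and BSD find and xargs provide, so this assumes one of those implementations.

```shell
#!/bin/sh
# Count how many arguments xargs hands to the command for the files
# found below a directory, splitting the default way (on whitespace).
count_args_plain() {
  find "$1" -type f -print | xargs -n1 echo | wc -l | tr -d ' '
}

# Same, but with NUL-separated names (find -print0 / xargs -0),
# so whitespace and quotes in names are left alone.
count_args_nul() {
  find "$1" -type f -print0 | xargs -0 -n1 echo | wc -l | tr -d ' '
}

demo=$(mktemp -d)
touch "$demo/my file.txt"       # one file, with a space in its name
echo "files created: 1"
echo "arguments seen via plain xargs: $(count_args_plain "$demo")"
echo "arguments seen via xargs -0:    $(count_args_nul "$demo")"
rm -rf "$demo"
```

The plain variant reports 2 arguments for the single file, because the name is split at the space; the -0 variant reports 1.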

There is also a minor issue:

  • File names in mydir that contain \n (newline) will be split into several arguments.

This will fail in GNU Parallel, too (unless using -0). But I have yet to see a file name with a newline in it that was created by normal users. This only happens with malicious users.

Dealing with the problems from xargs

Some are aware of the problems with xargs and then spend a lot of effort trying to fix them:

  • They quote ', " and space (though sometimes they forget some of them).
  • They try to un-mix the output by wrapping it in markers and then post-processing it based on these markers, but they never manage to fix the half-line issue.

So while xargs is well tested, their fixups are not, thereby defeating the purpose of only using well tested software. All in all they spend much more effort on building their own fixup around xargs than it would take to simply install GNU Parallel.
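
For comparison, the xargs pipeline from the example could be written with GNU Parallel roughly as follows. This is only a sketch: the wrapper function run_with_parallel is made up for illustration, and its arguments stand in for mydir, file_with_some_stuff and mycommand from the example. The option -j 10 limits GNU Parallel to 10 simultaneous jobs, corresponding to -P 10 in xargs. By default GNU Parallel treats each input line as one argument and keeps the output of each job together.

```shell
#!/bin/sh
# Hypothetical wrapper around the pipeline from the example above,
# with "xargs -n1 -P 10" swapped for "parallel -j 10".
# Usage: run_with_parallel DIR PATTERN_FILE COMMAND
run_with_parallel() {
  find "$1" -print |
    grep -f "$2" |
    tail |
    parallel -j 10 "$3"
}
```

The full pipeline then becomes: run_with_parallel mydir file_with_some_stuff mycommand | grep other_stuff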

– o –

Being the developer of GNU Parallel I am of course biased: I think everyone using the command line owes it to themselves and to their command line to have GNU Parallel in their toolbox.
