Excuses for not installing GNU Parallel

Over time I have seen people who could benefit from GNU Parallel give excuses for not using it. I believe most of these excuses come from a lack of due diligence: to avoid doing a little work now, they end up doing a lot of work later.

Here are the most popular excuses I have met so far:

“GNU Parallel is not installed everywhere”

It is true that GNU Parallel is not installed everywhere. But it is designed to be extremely easy to install. We are literally talking a matter of seconds:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
       fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 3374ec53bacb199b245af2dda86df6c9
12345678 3374ec53 bacb199b 245af2dd a86df6c9
$ md5sum install.sh | grep 029a9ac06e8b5bc6052eac57b2c3c9ca
029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
$ sha512sum install.sh | grep f517006d9897747bed8a4694b1acba1b
40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
$ bash install.sh

Or if you want full control over the process: http://git.savannah.gnu.org/cgit/parallel.git/tree/README

“I do not have root access”

A personal installation of GNU Parallel does not require root access. Just do:

./configure --prefix=$HOME && make && make install

or:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
       fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 3374ec53bacb199b245af2dda86df6c9
12345678 3374ec53 bacb199b 245af2dd a86df6c9
$ md5sum install.sh | grep 029a9ac06e8b5bc6052eac57b2c3c9ca
029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
$ sha512sum install.sh | grep f517006d9897747bed8a4694b1acba1b
40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
$ bash install.sh

On a system that already has GNU Parallel installed, you can also embed it in a shell script:

parallel --embed > newscript

Edit the end of newscript and copy newscript to the other system.

“I am not allowed to install software”

If you are allowed to run your own scripts, you can run GNU Parallel just like you would any of your own scripts. All you need to do is to copy the file parallel and use it the same way you would your own scripts.
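The copy step can be sketched as a small helper. This is only an illustration: the function name install_script and the choice of $HOME/bin as destination are assumptions, not something GNU Parallel ships or requires.

```shell
#!/bin/sh
# install_script SRC DESTDIR
# Hypothetical helper: copy a single-file script (such as the file
# "parallel") into a personal directory and make it executable.
install_script() {
  src=$1
  destdir=$2
  mkdir -p "$destdir" &&
    cp "$src" "$destdir/" &&
    chmod u+x "$destdir/$(basename "$src")"
}

# Example call (not run here; assumes the file parallel is in the
# current directory):
#   install_script ./parallel "$HOME/bin"
```

After that, either call the script by its full path or put the directory on your PATH, for example with PATH="$HOME/bin:$PATH".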

“Using GNU Parallel is overkill”

Overkill usually means that setting up the program and running it will take longer than just running the job in a different way. Setting up GNU Parallel can be done in 10 seconds, so the risk is quite limited.

“Keeping GNU Parallel up to date is too hard”

Even an old version will most likely save you time. The basic functionality has remained the same for years.

You only need to upgrade if you need some of the newer functionality or if you are being hit by a bug that is fixed in newer versions. Most of the recent bugs are related to rare race conditions and will never affect most people.

Updating to the newest version is also a matter of seconds:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
       fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 3374ec53bacb199b245af2dda86df6c9
12345678 3374ec53 bacb199b 245af2dd a86df6c9
$ md5sum install.sh | grep 029a9ac06e8b5bc6052eac57b2c3c9ca
029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
$ sha512sum install.sh | grep f517006d9897747bed8a4694b1acba1b
40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
$ bash install.sh

“I do not understand Perl”

Unless you are going to change GNU Parallel you do not need to know Perl to use the tool – just like you do not need to learn kernel programming to use Linux.

“GNU Parallel is too hard to compile and install”

This excuse typically comes from users who never even tried to install it. GNU Parallel is written in Perl and is compatible with even very old versions of Perl. So even if your system is too limited for ‘./configure && make && make install’, you should always be able to do ‘cp parallel $HOME/bin’.

If wget, gpg, sha1sum, and bzip2 are installed this ought to work:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
       fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 3374ec53bacb199b245af2dda86df6c9
12345678 3374ec53 bacb199b 245af2dd a86df6c9
$ md5sum install.sh | grep 029a9ac06e8b5bc6052eac57b2c3c9ca
029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
$ sha512sum install.sh | grep f517006d9897747bed8a4694b1acba1b
40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
$ bash install.sh

“My software should not depend on non-standard software”

Why not distribute GNU Parallel next to your software? The beauty of free software is that you are in fact allowed to do this – even if your software is non-free.

SM is an example of software that co-distributes GNU Parallel: https://github.com/sm/sm/blob/master/README.md

GNU Parallel can also be embedded in a shell script using:

parallel --embed > newscript

Edit the end of newscript.

“GNU Parallel is not as well tested as xargs”

I have seen this argument from people who then did something like this:

find mydir -print |
  grep -f file_with_some_stuff |
  tail |
  xargs -n1 -P 10 mycommand |
  grep other_stuff

So while xargs may be well tested they basically screwed themselves over by using xargs in a way that is not safe – by design.

The above has 2 major issues:

  • If file names in mydir contain a space, ' or ", xargs will interpret these characters and mycommand will not be run on those files. It is quite common for users (especially GUI users) to create files with a space, ' or " in the name.
  • The output from the mycommands running in parallel may mix, so you risk getting half a line from one instance of mycommand followed by the rest of a line from another instance.

These are both risks you avoid by using GNU Parallel. The scary part is that they most likely will not notice the problem on their test set, and the problem will therefore only be discovered after their script has been put into production.
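The first issue is easy to reproduce. The sketch below is only an illustration: the directory and file names are made up, echo stands in for mycommand, and it assumes the path returned by mktemp contains no whitespace. The GNU Parallel part is skipped if GNU Parallel is not installed.

```shell
#!/bin/sh
# Hypothetical demo directory and file names, made up for illustration.
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/two words.txt"

# Number of files actually on disk (one per line of find output).
files=$(find "$demo" -type f -print | wc -l | tr -d ' ')

# xargs splits its input on whitespace, so "two words.txt" is handed to
# echo as two separate arguments and echo runs once too often.
xargs_args=$(find "$demo" -type f -print | xargs -n1 echo | wc -l | tr -d ' ')
echo "files: $files, arguments produced by xargs: $xargs_args"

# GNU Parallel treats each input line as one argument.
# Only run this part if GNU Parallel is actually installed.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  parallel_args=$(find "$demo" -type f -print | parallel -j 10 echo | wc -l | tr -d ' ')
  echo "arguments produced by GNU Parallel: $parallel_args"
fi

rm -rf "$demo"
```

With two files on disk, xargs reports three arguments. On a real data set the extra arguments are file names that do not exist, which is exactly the kind of problem a small test set will not reveal.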

There is also a minor issue:

  • If the file names in mydir contain \n (newline).

This will fail in GNU Parallel, too (unless using -0). But I have yet to see a file name with a newline in it that was created by normal users. This only happens with malicious users.
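For completeness, here is a sketch of why a newline in a file name breaks line-based input and what NUL-delimited input changes. The file name is made up, and -print0 is assumed to be available (it is in GNU and BSD find, but it is not POSIX). The GNU Parallel part is skipped if GNU Parallel is not installed.

```shell
#!/bin/sh
# Hypothetical demo: one file whose name contains a newline.
demo=$(mktemp -d)
touch "$demo/$(printf 'first\nsecond').txt"

# Newline-delimited listing: the single file shows up as two lines.
lines=$(find "$demo" -type f -print | wc -l | tr -d ' ')

# NUL-delimited listing: one NUL terminator per file, so the count is right.
records=$(find "$demo" -type f -print0 | tr -cd '\0' | wc -c | tr -d ' ')

echo "lines: $lines, NUL-terminated records: $records"

# If GNU Parallel is installed, -0 reads each record as one argument.
# {#} is the job sequence number, so one line is printed per job.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  njobs=$(find "$demo" -type f -print0 | parallel -0 echo '{#}' | wc -l | tr -d ' ')
  echo "jobs run by parallel -0: $njobs"
fi

rm -rf "$demo"
```

Note that every tool between find and parallel in a pipeline must also handle NUL-delimited records for this to work.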

Dealing with the problems from xargs

Some are aware of the problems with xargs and then spend a lot of effort trying to fix them:

  • They quote ', " and space (though sometimes they forget some of them).
  • They try to un-mix the output by wrapping the output with markers and then post process the output based on these markers, but they never manage to fix the half-line issue.

So while xargs is well tested, their fixups are not, thereby defeating the purpose of only using well tested software. All in all they spend much more effort on building their own fixup around xargs than it would take to simply install GNU Parallel.

– o –

Being the developer of GNU Parallel, I am of course biased: I think everyone using the command line owes it to themselves and to their command line to have GNU Parallel in their toolbox.
