Concept of multithreading in a bash script

I have a bash script which executes a command over a large number of files in a folder. How can I add multithreading to this script so that it runs faster?


2 Answers

With GNU Parallel you can do this:

parallel ./myscript.sh --option1 --inputfile {} --outputfile {}.out ::: files*

By default it will run one job per CPU thread in parallel.
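The number of simultaneous jobs can be overridden with `-j`; a minimal sketch, where `echo` is a stand-in for whatever per-file command you actually run:

```shell
# Run at most 4 jobs at a time instead of one per CPU thread.
# `echo` here is a placeholder; substitute your own script.
parallel -j 4 echo "processing {}" ::: file1 file2 file3
```

`-j` also accepts percentages (e.g. `-j 50%` for half the CPU threads).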

To see what will be run, use --dry-run:

parallel --dry-run ./myscript.sh --option1 --inputfile {} --outputfile {}.out ::: files*

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

Simple scheduling

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

GNU Parallel scheduling
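This keep-the-slots-full behaviour can be approximated in plain bash with job control; a minimal sketch, assuming a hypothetical per-file command `process` (requires bash 4.3+ for `wait -n`):

```shell
#!/usr/bin/env bash
# Dynamic scheduling in plain bash: start a new job as soon as one
# finishes, keeping at most $max_jobs running at once. `process` is a
# hypothetical per-file command -- replace it with your own.
max_jobs=4
for f in *.txt; do
  # If all slots are taken, block until any one job exits.
  while (( $(jobs -rp | wc -l) >= max_jobs )); do
    wait -n
  done
  process "$f" &
done
wait   # wait for the jobs still running
```

This mirrors the scheduling shown above, though without GNU Parallel's output serialization or remote-execution features.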

Installation

For security reasons, you should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

$ (wget -O - || lynx -source || curl || fetch -o -) > install.sh
$ sha1sum install.sh | grep 67bd7bc7dc20aff99eb8f1266574dadb
12345678 67bd7bc7 dc20aff9 9eb8f126 6574dadb
$ md5sum install.sh | grep b7a15cdbb07fb6e11b0338577bc1780f
b7a15cdb b07fb6e1 1b033857 7bc1780f
$ sha512sum install.sh | grep 186000b62b66969d7506ca4f885e0c80e02a22444
6f25960b d4b90cf6 ba5b76de c1acdf39 f3d24249 72930394 a4164351 93a7668d
21ff9839 6f920be5 186000b6 2b66969d 7506ca4f 885e0c80 e02a2244 40e8a43f
$ bash install.sh

For other installation options see

Learn more

See more examples:

Watch the intro videos:

Walk through the tutorial:

Read the book (at least chapter 2):

Sign up for the email list to get support:

I would use something like parallel from the moreutils package. It expects a single command and a list of arguments. By default it will feed one argument to each instance of the command and will spin up as many parallel instances as you have CPU cores.

The important thing is that you have a workload that can be batched. Having a pile of files to do separate operations upon is a good example. But tasks whose output must be combined in order (e.g. adding multiple files to a single zip archive) are more complicated to parallelize.

A good example, adapted from the parallel man page, is processing an unknown number of files through UFRAW, one per CPU core:

parallel ufraw -o processed -- *.NEF
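If neither GNU Parallel nor moreutils parallel is installed, GNU xargs with -P gives a similar per-file fan-out; a minimal sketch (the .log files and gzip command are hypothetical stand-ins for your own workload):

```shell
# Fan out one gzip per .log file, at most 4 at a time.
# -0 pairs with printf's NUL separators, so filenames with
# spaces or newlines survive intact.
printf '%s\0' *.log | xargs -0 -P 4 -n 1 gzip
```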
