Parallel curl with file input and output filename on Linux shell

I usually download files in parallel with curl, reading the URLs from a file like this:

cat links.txt | parallel --will-cite curl --connect-timeout 5 --max-time 10 --retry-max-time 40 --retry 5 --retry-delay 0 -s -f -O -C -

Here links.txt has one URL per line.

Now I need to assign a custom filename to each file, and I can't figure out how to pass this second input to curl through parallel. Among other things, I tried adding -o filename to each line of links.txt, but it didn't work.


1 Answer

According to man parallel, you can use some placeholders to aid you. For example, you could rewrite your code to:

parallel curl "${CURL_ARGS[@]}" -o '{#}'.curl_output '{}' :::: links.txt

where ${CURL_ARGS[@]} is a Bash array holding all your arguments to curl (without -O, since -o now names the output file) and links.txt has one URL per line. This command will fetch the URLs in links.txt and store each result in a file named after the number of the job that ran curl (e.g., 10.curl_output). You would then have to match job numbers with URLs.
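To match job numbers with URLs afterwards, something like the following sketch might help. It assumes that each line of links.txt produces exactly one job, so that the job number {#} equals the line number (blank lines in the file could break this). The function name job_map is hypothetical and only used for illustration.

```shell
# job_map FILE: print "<N>.curl_output<TAB><URL>" for line N of FILE.
# Assumption: job N in parallel corresponds to line N of FILE.
job_map() {
    awk '{ printf "%d.curl_output\t%s\n", NR, $0 }' "$1"
}
```

Running job_map links.txt prints one line per URL, so the output can be saved next to the downloads as an index.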

Another approach would be to call curl with -o '{}'._curl_output. In this case, you would have to deal with special characters in URLs (/, for example).
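One possible way to deal with those special characters is to derive a safe name from each URL before handing it to curl. The sketch below is only an illustration: url_to_filename is a hypothetical helper, and the choice to strip the scheme and replace every character outside letters, digits, dot, underscore, and hyphen with an underscore is an assumption, not something parallel or curl does for you.

```shell
# url_to_filename URL: print a filename-safe version of URL.
# Strips a leading "scheme://" and replaces every character that is not
# a letter, digit, dot, underscore, or hyphen with an underscore.
url_to_filename() {
    printf '%s\n' "$1" |
        sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
            -e 's|[^A-Za-z0-9._-]|_|g'
}
```

Note that two different URLs can map to the same name under this scheme (for example, paths that differ only in / versus ?), so it is worth checking the generated names for duplicates before downloading.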

Besides those, you could also put two columns in links.txt, so that each line contains an output filename followed by a URL, and let parallel split them. This would let you do

parallel --colsep " " curl "${CURL_ARGS[@]}" -o '{1}' '{2}' :::: links.txt

The --colsep option splits each line on the delimiter provided as its argument (" " in this case), so {1} is the first column and {2} is the second.
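If the filenames and URLs start out in separate files, the two-column input can be built with paste. This is a minimal sketch under some assumptions: the helper names make_pairs and check_pairs are hypothetical, both input files have the same number of lines, and neither filenames nor URLs contain spaces (a space inside a filename would be split into an extra column).

```shell
# make_pairs NAMES URLS: join line N of NAMES with line N of URLS,
# separated by a single space (filename first, URL second).
make_pairs() {
    paste -d ' ' "$1" "$2"
}

# check_pairs FILE: succeed only if every line has exactly two fields
# when split on single spaces, as a " " column separator would split it.
check_pairs() {
    awk -F '[ ]' 'NF != 2 { bad = 1 } END { exit bad }' "$1"
}
```

For example, make_pairs names.txt urls.txt > links.txt followed by check_pairs links.txt would produce and validate the input before running the parallel command.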
