In bash, how to sort strings with numbers in them?

If I have these files in a directory

cwcch10.pdf
cwcch11.pdf
cwcch12.pdf
cwcch13.pdf
cwcch14.pdf
cwcch15.pdf
cwcch16.pdf
cwcch17.pdf
cwcch18.pdf
cwcch1.pdf
cwcch2.pdf
cwcch3.pdf
cwcch4.pdf
cwcch5.pdf
cwcch6.pdf
cwcch7.pdf
cwcch8.pdf
cwcch9.pdf

how can I list them in Bash so that they are in ascending numeric order based on the number part of the string? The resulting order should be cwcch1.pdf, cwcch2.pdf, ..., cwcch9.pdf, cwcch10.pdf, etc.

What I'm ultimately trying to do is concatenate the pdfs with pdftk with something like the following

pdftk `ls *.pdf | sort -n` cat output output.pdf

but that doesn't work as my sorting is wrong.


7 Answers

Something like this might do what you want, though it takes a slightly different approach:

pdftk $(for n in {1..18}; do echo cwcch$n.pdf; done) cat output output.pdf

Your sort may have the ability to do this for you:

sort --version-sort
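A minimal sketch of how that could look, assuming GNU sort (where -V is the short form of --version-sort) and file names without newlines. The sample names and the variable name sorted are only for illustration:

```shell
# Sample names, deliberately out of numeric order.
# Version sort compares embedded digit runs as numbers (GNU sort assumed).
sorted=$(printf '%s\n' cwcch10.pdf cwcch11.pdf cwcch1.pdf cwcch2.pdf cwcch9.pdf | sort -V)
printf '%s\n' "$sorted"

# With the real files in the current directory, the same idea in bash 4+
# might be written as (not run here):
#   mapfile -t files < <(printf '%s\n' *.pdf | sort -V)
#   pdftk "${files[@]}" cat output output.pdf
```

Feeding the names through printf rather than ls avoids parsing ls output.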

For this particular example you could also do this:

ls *.pdf | sort -k2 -th -n

That is, sort numerically (-n) on the second field (-k2) using 'h' as the field separator (-th).
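To see the field splitting on a few sample names (a small sketch, with by_field as an illustrative variable name): note that it relies on the only 'h' before the number being the one at the end of the prefix, so a prefix like "chapter" would need a different separator.

```shell
# "cwcch10.pdf" splits at 'h' into the fields "cwcc" and "10.pdf".
# With -n, sort reads the leading digits of the key and ignores the
# trailing ".pdf".
by_field=$(printf '%s\n' cwcch10.pdf cwcch2.pdf cwcch1.pdf | sort -t h -k 2 -n)
printf '%s\n' "$by_field"
```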


You can use the -v option in GNU ls: natural sort of (version) numbers within text.

ls -1v cwcch*

This does not work with BSD ls (e.g. on OS X), where the -v option has a different meaning.


Use shell expansion directly on the command line. The expansion should order them properly. If I understand pdftk's command-line syntax correctly, this will do what you want:

# shell expansion with square brackets
pdftk cwcch[1-9].pdf cwcch1[0-9].pdf cat output output.pdf
# shell expansion with curly braces
pdftk cwcch{{1..9},{10..18}}.pdf cat output output.pdf

Or you can try a different approach. When I need to do something like this, I usually try to get my numbers formatted properly ahead of time. If I'm coming into it late and the PDFs are already numbered like your example, I'll use this to renumber:

# rename is rename.pl aka prename -- perl rename script
# this adds a leading zero to single-digit numbers
rename 's/(\d)/0$1/' cwcch[1-9].pdf

Now the standard ls sorting will work properly.
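If the perl rename script is not installed, the same zero-padding can be sketched with plain shell parameter expansion and printf. This runs in a scratch directory created with mktemp so it touches no real files; the sample file set and the variable names are illustrative only:

```shell
# Work in a scratch directory so the demonstration touches no real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch cwcch1.pdf cwcch2.pdf cwcch9.pdf cwcch10.pdf cwcch11.pdf

# Pad single-digit chapter numbers to two digits: cwcch1.pdf -> cwcch01.pdf.
for f in cwcch[1-9].pdf; do
    n=${f#cwcch}      # strip the prefix -> "1.pdf"
    n=${n%.pdf}       # strip the suffix -> "1"
    mv -- "$f" "$(printf 'cwcch%02d.pdf' "$n")"
done

# The plain glob now expands in chapter order.
padded=$(printf '%s\n' cwcch*.pdf)
printf '%s\n' "$padded"
```

The glob cwcch[1-9].pdf matches only the single-digit names, so files that already have two digits are left alone.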


Here's a method just using sort:

ls | sort -k1.6n
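Here -k1.6n means "start the key at character 6 of field 1 and compare numerically", and 6 is one past the five-character prefix cwcch. A small sketch that derives the offset from the prefix instead of hard-coding it (the variable names prefix, start, and by_offset are illustrative):

```shell
prefix=cwcch
start=$(( ${#prefix} + 1 ))   # first character after the prefix -> 6

# Numeric sort on the key that begins at character $start of field 1.
by_offset=$(printf '%s\n' cwcch10.pdf cwcch2.pdf cwcch1.pdf | sort -k "1.${start}n")
printf '%s\n' "$by_offset"
```

This only works when every name shares the same fixed-length prefix.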

sort -g compares lines by general numeric value, in ascending order by default.

anthony@mtt3:~$ sort --help | egrep "\-g"
-g, --general-numeric-sort compare according to general numerical value


The one-liner below reads the PDF file names from a file, extracts just the numbers with egrep -o, and sorts them in ascending order with sort -g. It then substitutes each number back into the file names with sed and removes the resulting duplicate lines with uniq.


In place of uniq, you can also use awk:

awk '!x[$0]++'

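The above gives the same result as uniq here, because the duplicate lines are adjacent. In general it differs: awk '!x[$0]++' drops every repeated line, while uniq only collapses adjacent repeats.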
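A quick check of how the two commands behave when a duplicate is not adjacent to its first occurrence (a small sketch; the sample input and variable names are made up for illustration):

```shell
# Four lines: a, b, a, a. The third "a" repeats the first but is not adjacent to it.
input=$(printf '%s\n' a b a a)

with_uniq=$(printf '%s\n' "$input" | uniq)           # collapses adjacent repeats only
with_awk=$(printf '%s\n' "$input" | awk '!x[$0]++')  # keeps the first occurrence of each line

printf 'uniq:\n%s\n' "$with_uniq"
printf 'awk:\n%s\n' "$with_awk"
```

In the one-liner every line coming out of sed is identical, so either command leaves a single line per number.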


What you're looking for is this one liner:

for i in `cat tmp | egrep -o "[0-9]*" | sort -g`; do cat tmp | sed "s/\(^[a-z]*\)\([0-9]*\)\(\.pdf\)/\1$i\3/g" | uniq; done


Contents of tmp:

anthony@mtt3:~$ cat tmp
cwcch10.pdf
cwcch11.pdf
cwcch12.pdf
cwcch13.pdf
cwcch14.pdf
cwcch15.pdf
cwcch16.pdf
cwcch17.pdf
cwcch18.pdf
cwcch1.pdf
cwcch2.pdf
cwcch3.pdf
cwcch4.pdf
cwcch5.pdf
cwcch6.pdf
cwcch7.pdf
cwcch8.pdf
cwcch9.pdf

EDIT:

Output of command:

anthony@mtt3:~$ for i in `cat tmp | egrep -o "[0-9]*" | sort -g`; do cat tmp | sed "s/\(^[a-z]*\)\([0-9]*\)\(\.pdf\)/\1$i\3/g" | uniq; done
cwcch1.pdf
cwcch2.pdf
cwcch3.pdf
cwcch4.pdf
cwcch5.pdf
cwcch6.pdf
cwcch7.pdf
cwcch8.pdf
cwcch9.pdf
cwcch10.pdf
cwcch11.pdf
cwcch12.pdf
cwcch13.pdf
cwcch14.pdf
cwcch15.pdf
cwcch16.pdf
cwcch17.pdf
cwcch18.pdf
