rsync using regex to include only some files

I am trying to use rsync to recursively copy files down a path, selecting them by a case-insensitive file name pattern. This is the command I ran:

$ rsync -avvz --include ='*/' --include='.*[Nn][Aa][Mm][E].*' --exclude='*' ./a/ ./b/

Nothing gets copied; the debug output shows:

[sender] hiding file 1Name.txt because of pattern *
[sender] hiding file 1.txt because of pattern *
[sender] hiding file 2.txt because of pattern *
[sender] hiding file Name1.txt because of pattern *
[sender] hiding directory test1 because of pattern *
[sender] hiding file NaMe.txt because of pattern *

I have also tried --include='*[Nn][Aa][Mm][E]*' and other combinations, but it still doesn't work.

Any ideas on how to use regex to include some files?


6 Answers

rsync doesn't speak regex. You can enlist find and grep, though it gets a little arcane. To find the target files:

find a/ |
grep -i 'name'

But they're all prefixed with "a/", which makes sense. What we want to end up with, though, is a list of include patterns acceptable to rsync, and the "a/" prefix doesn't work for rsync, so I'll remove it with cut:

find a/ |
grep -i 'name' |
cut -d / -f 2-

There's still a problem: we'll still miss files in subdirectories, because rsync doesn't descend into directories that match an exclude rule. I'm going to use awk to add the parent directories of any matching files to the list of include patterns:

find a/ |
grep -i 'name' |
cut -d / -f 2- |
awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}'

All that's left is to send the list to rsync: the option --include-from=- makes rsync read a list of include patterns from standard input. So, altogether:

find a/ |
grep -i 'name' |
cut -d / -f 2- |
awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}' |
rsync -avvz --include-from=- --exclude='*' ./a/ ./b/

Note that the source directory 'a' is referred to via two different paths: "a/" and "./a/". This is subtle but important. To make things more consistent, I'm going to make one final change and always refer to the source directory as "./a/". However, this means the cut command has to change, as there will be an extra "./" on the front of the results from find:

find ./a/ |
grep -i 'name' |
cut -d / -f 3- |
awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}' |
rsync -avvz --include-from=- --exclude='*' ./a/ ./b/
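To see what the pipeline actually feeds to rsync on standard input, here is a minimal sketch that rebuilds the question's files in a scratch directory and prints only the generated include list (rsync itself is not run). The nested file test1/Name2.txt is hypothetical, added so the awk parent-directory step has something to do; test1/test.txt is taken from the debug output further down; the helper name make_include_list is also just for illustration.

```shell
#!/bin/sh
# Rebuild the question's layout in a throwaway directory.
# test1/Name2.txt is a hypothetical extra file so that the awk
# parent-directory step has a nested match to work on.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/a/test1"
touch "$tmp/a/1Name.txt" "$tmp/a/1.txt" "$tmp/a/2.txt" \
      "$tmp/a/Name1.txt" "$tmp/a/NaMe.txt" \
      "$tmp/a/test1/test.txt" "$tmp/a/test1/Name2.txt"
cd "$tmp" || exit 1

# Same stages as the final pipeline, minus the rsync call:
# list everything, keep paths containing "name" in any case,
# drop the leading "./a/", then add each match's parent directories.
make_include_list() {
  find ./a/ |
  grep -i 'name' |
  cut -d / -f 3- |
  awk -F/ '{print; while(/\//) {sub("/[^/]*$", ""); print}}'
}

make_include_list
```

The printed list should contain 1Name.txt, Name1.txt, NaMe.txt, test1/Name2.txt and test1 (in whatever order find walks the tree), and none of 1.txt, 2.txt or test1/test.txt.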

I would suggest using rsync's filter option. For your example, just type:

rsync -vam -f'+ *[Nn][Aa][Mm][Ee]*' -f'+ */' -f'- *' a b

The first filter rule tells rsync which files to include. The second rule is needed so that rsync descends into every directory during its traversal. The -m (--prune-empty-dirs) option keeps directories that end up empty out of the destination. The last filter rule tells rsync to exclude everything that hasn't matched so far.


If you use ZSH, then with the extendedglob option set (setopt extendedglob) you can use the (#i) flag to turn off case sensitivity. Example:

$ touch NAME
$ ls (#i)*name*
NAME

ZSH also supports exclusions (another extendedglob feature): a pattern followed by ~ and a second pattern matches whatever the first one matches, except what the second one matches.

$ touch aa ab ac
$ ls *~*c
aa ab

You can chain exclusions:

$ ls *~*c~*b
aa

Finally, you can specify what kind of file you want returned (directory, plain file, etc.) with glob qualifiers: (/) for directories and (.) for plain files.

$ touch file
$ mkdir dir
$ ls *(.)
file

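Based on all this, I would let the glob do the selection, recursing with **/ and using -R (--relative) so that rsync recreates each file's subdirectory under b/:

(cd a && rsync -avvzR **/(#i)*name*(.) ../b/)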

(I don't see a need for an exclusion with these selectors)

@sqweek's answer above is awesome, though the awk script for generating parent directories is easy to get wrong: without the $ anchor in the sub pattern, it removes the first component after a slash instead of the last one, e.g.:

$ echo a/b/c/d | awk -F/ '{print; while(/\//) {sub("/[^/]*", ""); print}}'
a/b/c/d
a/c/d
a/d
a

I was able to fix it by using gensub instead (note that gensub is a GNU awk extension, so this needs gawk):

$ echo a/b/c/d | awk -F/ '{print; while(/\//) { $0=gensub("(.*)/[^/]*", "\\1", "g"); print}}'
a/b/c/d
a/b/c
a/b
a
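Because gensub only exists in GNU awk, the command above fails under mawk or BSD awk. A portable alternative, sketched here, is to anchor the sub pattern at the end of the line with $, which is the form used in sqweek's pipeline above. The function names unanchored and anchored are just for illustration, and the unused -F/ is dropped since the program never looks at fields.

```shell
#!/bin/sh
# Unanchored: "/[^/]*" matches the FIRST slash and the component after it,
# so the middle of the path is removed (a/b/c/d -> a/c/d -> a/d -> a).
unanchored() {
  awk '{print; while(/\//) {sub("/[^/]*", ""); print}}'
}

# Anchored with $: only the LAST "/component" can match, so each pass
# yields the next parent directory (a/b/c/d -> a/b/c -> a/b -> a).
# sub() and the $ anchor are plain POSIX awk; no gensub needed.
anchored() {
  awk '{print; while(/\//) {sub("/[^/]*$", ""); print}}'
}

echo a/b/c/d | unanchored
echo a/b/c/d | anchored
```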

So, his full solution, with the awk bit changed, would be:

find ./a/ |
grep -i 'name' |
cut -d / -f 3- |
awk -F/ '{print; while(/\//) { $0=gensub("(.*)/[^/]*", "\\1", "g"); print}}' |
rsync -avvz --include-from=- --exclude='*' ./a/ ./b/

I tried with a C# script, since that's the language I have the most experience with. I am able to create the list of files that I want to include, but rsync still tells me to take a hike: it creates the folders, but it ignores the files. Here is what I got.

First the content of the directory:

~/mono$ ls -l
total 24
drwxr-xr-x 5 me me 4096 Jan 15 00:36 a
drwxr-xr-x 2 me me 4096 Jan 15 00:36 b
drwxr-xr-x 3 me me 4096 Jan 14 00:31 bin
-rw-r--r-- 1 me me 3566 Jan 15 00:31 test.cs
-rwxr-xr-x 1 me me 4096 Jan 15 00:31 test.exe
-rwxr--r-- 1 me me 114 Jan 14 22:40 test.sh

Then the output of the C# script:

~/mono$ mono test.exe
/a/myfile/myfileseries.pdf
/a/myfile2/testfile.pdf

And the debug output:

~/mono$ mono test.exe | rsync -avvvz --include='*/' --include-from=- --exclude='*' ./a/ ./b/
[client] add_rule(+ */)
[client] parse_filter_file(-,20,3)
[client] add_rule(+ /a/myfile/myfileseries.pdf)
[client] add_rule(+ /a/myfile2/testfile.pdf)
[client] add_rule(- *)
sending incremental file list
[sender] make_file(.,*,0)
[sender] hiding file 1Name.txt because of pattern *
[sender] showing directory myfile2 because of pattern */
[sender] make_file(myfile2,*,2)
[sender] hiding file 1.txt because of pattern *
[sender] hiding file 2.txt because of pattern *
[sender] hiding file Name1.txt because of pattern *
[sender] showing directory test1 because of pattern */
[sender] make_file(test1,*,2)
[sender] hiding file NaMe.txt because of pattern *
[sender] showing directory myfile because of pattern */
[sender] make_file(myfile,*,2)
send_file_list done
send_files starting
[sender] hiding file myfile/myfileseries.pdf because of pattern *
[sender] hiding file myfile2/testfile.pdf because of pattern *
[sender] hiding file test1/test.txt because of pattern *

[EDIT] This only works locally. For remote paths, the directory structure has to be created first.

Simpler than the accepted answer: use --files-from, which includes parent directories automatically, and have find print each file path relative to the starting point with -printf's %P:

find /tmp/source -wholename '*[Nn][Aa][Mm][Ee]*' -printf '%P\n' | rsync -vzrm --exclude='*/' --files-from=- /tmp/source/ /tmp/target/

So you only have to use find and rsync.
