I have 500 folders containing many *_1.fastq.gz and *_2.fastq.gz files per folder.
I want to:
cat *_1.fastq.gz > Combined_1.fastq.gz and cat *_2.fastq.gz > Combined_2.fastq.gz in each folder.
How do I achieve this? I would like to learn some bash; I am comfortable with Python.
3 Answers
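for i in {1..2}; do cat *_$i.fastq.gz >> Combined_$i.fastq.gz; done
This may not work well with every downstream tool: concatenated gzip files form a valid multi-member gzip file that gzip and zcat read without trouble, but some parsers stop after the first member. I would imagine you would prefer to use zcat *_$i.fastq.gz >> Combined_$i.fastq and then gzip the result; or perhaps better, simply scrap the idea of cat and do this: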
for i in {1..2}; do tar -c *_$i.fastq.gz > Combined_$i.fastq.gz.tar; done
That is per folder. Then, to recurse the folders, simply enclose the line above in a further loop, and run this from the top-level folder:
for f in */; do
  pushd . > /dev/null
  cd "$f" && for i in {1..2}; do tar -c *_$i.fastq.gz > Combined_$i.fastq.gz.tar; done
  popd > /dev/null
done
So here, the loop variable $f picks up every folder in turn; pushd saves your place; cd moves into the folder; the inner loop runs; popd returns to the original directory; and the outer loop goes round again.
The pushd / popd are there to ensure an error doesn't have you roaming the filesystem! Not strictly necessary, but useful to learn.
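On the worry about concatenated gzip files: the gzip format allows several compressed members back to back, so a plain cat of .gz files is itself a valid .gz file. If you want to convince yourself before committing 500 folders to it, a minimal sketch of a check could look like the one below. The function name combine_and_check is made up for illustration, and it assumes the output file is not one of the inputs.

```shell
# Hypothetical helper (not from the thread): concatenate gzip files into one
# multi-member gzip file, then verify the result.
# Usage: combine_and_check OUTPUT INPUT...
# Assumption: OUTPUT is not among the INPUT files.
combine_and_check() {
    _out=$1
    shift
    # Byte-wise concatenation; each input becomes one gzip member.
    cat -- "$@" > "$_out" || return 1
    # gzip -t tests the integrity of the compressed file.
    gzip -t -- "$_out" || return 1
    # Checksum of the decompressed inputs versus the decompressed output.
    _expected=$(gzip -dc -- "$@" | cksum)
    _actual=$(gzip -dc -- "$_out" | cksum)
    [ "$_expected" = "$_actual" ]
}
```

For example, combine_and_check /tmp/check_1.fastq.gz *_1.fastq.gz returns success only if the combined file decompresses to exactly the same bytes as the inputs decompressed in order. Whether your downstream tool reads past the first member is a separate question that this check cannot answer.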
This isn't a great example for learning bash, but the simplest way is:
zcat *_1.fastq.gz | gzip > Combined_1.fastq.gz &
zcat *_2.fastq.gz | gzip > Combined_2.fastq.gz
Using a loop:
for f in *_1.fastq.gz; do
  zcat "$f"
done | gzip > Combined_1.fastq.gz
Notes
- you iterate over the results of a glob pattern -- do not be tempted to parse the output of ls
- quote your "$variables" when you want the value
- you can redirect or pipe the output of an entire for or while loop.
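The three notes above can be seen together in one small sketch. It is only an illustration: the function name count_lines_per_file is invented here, and it reports decompressed line counts rather than combining anything.

```shell
# Hypothetical helper: print "line_count<TAB>file" for every *_1.fastq.gz
# in the current directory, sorted by line count.
count_lines_per_file() {
    # Iterate over the glob directly instead of parsing ls output.
    for f in *_1.fastq.gz; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$f" ] || continue
        # Quoting "$f" keeps file names containing spaces in one piece.
        n=$(gzip -dc -- "$f" | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$n" "$f"
    done | sort -n    # the output of the whole loop feeds a single sort
}
```

Run it inside one of the folders. A file called "sample A_1.fastq.gz" is handled like any other, which is exactly where parsing ls or leaving "$f" unquoted tends to go wrong.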
You need something like
for d in */; do
  # the subshell ( ... ) keeps the cd from carrying over into the next iteration
  (cd "$d" && cat *_1.fastq.gz > Combined_1.fastq.gz && cat *_2.fastq.gz > Combined_2.fastq.gz)
done
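If some folders might be missing one of the two file sets, or if you expect to run the loop more than once, a slightly more defensive version may help. This is a sketch under assumptions, not a drop-in script: the function name combine_per_folder is made up, and it assumes that any existing Combined_1.fastq.gz or Combined_2.fastq.gz is leftover output from an earlier run and safe to delete.

```shell
# Hypothetical helper: for each immediate subdirectory of the current
# directory, concatenate its *_1.fastq.gz and *_2.fastq.gz files.
combine_per_folder() {
    for d in */; do
        [ -d "$d" ] || continue
        (
            # The subshell confines the cd to this one iteration.
            cd "$d" || exit 1
            for i in 1 2; do
                # Assumption: an existing Combined file is old output. Remove
                # it first so the glob below cannot pick it up as an input.
                rm -f "Combined_$i.fastq.gz"
                set -- *_"$i".fastq.gz
                # An unmatched glob stays literal; skip if nothing matched.
                [ -e "$1" ] || continue
                cat -- "$@" > "Combined_$i.fastq.gz"
            done
        )
    done
}
```

Call combine_per_folder from the top-level folder. Folders without matching files are left untouched, and a second run replaces the combined files instead of appending to them.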