I have a directory with 78 .tsv files. I want to find the average number of lines for all files in the directory.
I use tail -n +2 $i | wc -l; done > line_numbers.txt to get the number of lines in each file, excluding the header, and write the counts to a txt file. The txt file output looks like this:
0
10
2
12
14
10
7
13
10
25
14
13
14
...
But I want to print just the average number of lines for the whole directory, rather than calculating the average from the txt file output.
What is the best way to do this?
13 Answers
A simple solution, but it won't work if there are linefeeds in the file names or too many files to count:
files=$(ls *.tsv | wc -l)
lines=$(cat *.tsv | wc -l)
average=$(( (lines-files)/files ))
A more robust solution that will handle strange file names and a large number of files:
names=(*.tsv)
files=${#names[@]}
lines=$(printf '%s\0' "${names[@]}" | xargs -0 cat | wc -l)
average=$(( (lines-files)/files ))
Use awk:
awk 'END{FNUM=ARGC-1; print (NR-FNUM)/FNUM}' *.tsv
bc -l <<< $(tail -q -n+2 *.csv | wc -l)/$(ls *.csv | wc -l)
#           ~~ Data lines ~~~~   Count   ~ Files ~   Count
If the filenames contain newlines, you need to use a different strategy to count the files. Populate an array with file names, then use parameter length expansion to get the number of elements in the array:
csv_files=(*.csv)
bc -l <<< $(tail -q -n+2 *.csv | wc -l)/${#csv_files[@]}
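For what it's worth, the same idea can be sketched as a small function that also guards against the case where no files match. This is only a rough sketch under a few assumptions: the name avg_data_lines and the two-decimal output are made up for illustration, it targets .tsv files as in the question, and it uses the positional parameters in place of a named array so that it also runs in a plain POSIX sh. It counts data lines with awk's FNR > 1, so a completely empty file contributes 0 rather than -1.

```shell
# avg_data_lines (hypothetical helper): print the mean number of
# non-header lines per .tsv file in a directory (default: current one).
# The body runs in a subshell, so "set --" and "exit" stay local to it.
avg_data_lines() (
    dir=${1:-.}
    # Positional parameters act as the file-name array; the glob
    # expansion keeps names with spaces or newlines intact.
    set -- "$dir"/*.tsv
    # An unmatched glob stays literal, so check that the first entry exists.
    if [ ! -e "$1" ]; then
        echo "no .tsv files in $dir" >&2
        exit 1
    fi
    # FNR > 1 skips the first (header) line of each file; $# is the file count.
    awk -v n="$#" 'FNR > 1 { lines++ } END { printf "%.2f\n", lines / n }' "$@"
)
```

Calling avg_data_lines with no argument works on the current directory; avg_data_lines /some/dir works on another one. Because awk does the division, the result is a decimal value rather than the truncated integer that $(( )) gives.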