Count lowest number of lines among list of files - bash

I want to print the line count of the file with the lowest number of lines among a list of files, but nothing is printed.
Here is the code:
MINcount=$(for txtfile in /home/folder/*.txt;
do
LC=$(cat $txtfile | wc -l);
min=0
(($LC < min || min == 0)) && min=$LC
done)
echo $MINcount
Thanks

give this a try:
wc -l /home/folder/*.txt | sort -n | awk '{print $1; exit}'

You can use the following pipe:
wc -l /home/folder/*.txt | sort -n | head -n1 | awk '{print $1}'
Explanation
wc -l /home/folder/*.txt | sort -n will produce output like this:
50 file2
94 file1
144 total
wc prints one line per file plus a final total line; sort -n orders all of them by line count, smallest first (the total, being the largest number, ends up last). head -n1 then selects the first line of that output, and awk '{print $1}' prints the first whitespace-separated field of that line, i.e. the count.
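For reference, the loop from the question can also be made to work. The two problems there are that min is reset to 0 on every iteration and that the command substitution captures nothing because the loop never prints anything. A minimal sketch of a fixed version:
min=0
for txtfile in /home/folder/*.txt; do
    lc=$(wc -l < "$txtfile")                  # line count without the filename
    (( lc < min || min == 0 )) && min=$lc     # keep the smallest count seen so far
done
echo "$min"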

Related

append output of each iteration of a loop to the same file in bash

I have 44 files (2 for each chromosome) divided in two types: .vcf and .filtered.vcf.
I would like to run wc -l on each of them in a loop and always append the output to the same file. However, I would like this file to have 3 columns: chr[1-22], the wc -l of the .vcf and the wc -l of the .filtered.vcf.
I've been running independent wc -l commands for each file and pasting the two outputs together column-wise for each chromosome, but this is obviously not very efficient, because I'm generating a lot of unnecessary files. This is the code I'm trying for the 22 pairs of files:
wc -l file1.vcf | cut -f 1 > out1.vcf
wc -l file1.filtered.vcf | cut -f 1 > out1.filtered.vcf
paste -d "\t" out1.vcf out1.filtered.vcf
I would like to have just one output file containing three columns:
Chromosome VCFCount FilteredVCFCount
chr1 out1 out1.filtered
chr2 out2 out2.filtered
Any help will be appreciated, thank you very much in advance :)
printf "%s\n" *.filtered.vcf |
cut -d. -f1 |
sort |
xargs -n1 sh -c 'printf "%s\t%s\t%s\n" "$1" "$(wc -l <"${1}.vcf")" "$(wc -l <"${1}.filtered.vcf")"' --
Output a newline-separated list of the .filtered.vcf files in the directory.
Remove the extension with cut (something along the lines of xargs -I{} basename {} .filtered.vcf would probably be safer).
Sort it, for nicely sorted output (something along the lines of sort -tr -k2 -n would sort numerically and would be even better).
xargs -n1 - for each file, run the sh -c script:
printf "%s\t%s\t%s\n" - print three tab-separated fields with a custom format string ...
"$1" - the filename and...
"$(wc -l <"${1}.vcf")" - the line count of the .vcf file and...
"$(wc -l <"${1}.filtered.vcf")" - the line count of the .filtered.vcf file.
Example:
> touch chr{1..3}{,.filtered}.vcf
> echo > chr1.filtered.vcf ; echo > chr2.vcf ;
> printf "%s\n" *.filtered.vcf |
> cut -d. -f1 |
> sort |
> xargs -n1 sh -c 'printf "%s\t%s\t%s\n" "$1" "$(wc -l <"${1}.vcf")" "$(wc -l <"${1}.filtered.vcf")"' --
chr1 0 1
chr2 1 0
chr3 0 0
To get a nice-looking table with headers, use column:
> .... | column -N Chromosome,VCFCount,FilteredVCFCount -t -o ' '
Chromosome VCFCount FilteredVCFCount
chr1 0 1
chr2 1 0
chr3 0 0
Maybe try this.
{
# Header line; remove this if you do not want the pesky header
printf 'Chromosome\tVCFCount\tFilteredVCFCount\n'
for chr in chr*.vcf; do
  # The glob also matches the .filtered.vcf files; skip those here
  [ "${chr%.filtered.vcf}" = "$chr" ] || continue
  base=${chr%.vcf}
  awk -v base="$base" 'BEGIN { OFS="\t" }
  FNR==1 && n { p=n }
  { n=FNR }
  END { print base, p, n }' "$chr" "$base.filtered.vcf"
done
} >counts.txt
The very simple Awk script just collects the highest line number for each file (so we basically reimplement wc -l) and prints the collected numbers in the desired format. FNR is the line number within the current input file; we keep saving it in n, and when we switch to a new file (FNR starts over at 1) we copy the saved value into p, so the line count of the previous file is kept in a separate variable.
The shell parameter substitution ${variable%pattern} retrieves the value of variable with the shortest suffix match on pattern removed. (There is also ${variable#pattern} to remove a prefix, and ## and %% to trim the longest pattern match instead of the shortest.)
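A quick illustration with a hypothetical filename:
f=chr7.filtered.vcf
echo "${f%.vcf}"     # chr7.filtered   - shortest suffix match removed
echo "${f%%.*}"      # chr7            - longest suffix match removed
echo "${f#chr}"      # 7.filtered.vcf  - shortest prefix match removed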
If efficiency is important, you could probably refactor all of the script into a single Awk script, but this way, all the pieces are simple and hopefully understandable.
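For what it's worth, a rough single-Awk sketch (illustrative only: it assumes chrN-style names, and a chromosome whose plain .vcf file happens to be empty produces no records and is silently skipped):
awk 'BEGIN { OFS="\t" }
{
  base = FILENAME
  sub(/\.filtered\.vcf$/, "", base); sub(/\.vcf$/, "", base)   # strip either extension
  if (FILENAME ~ /\.filtered\.vcf$/) filtered[base]++; else plain[base]++
}
END { for (b in plain) print b, plain[b], filtered[b] + 0 }' chr*.vcf |
sort -t r -k 2 -n > counts.txt
Whether that is actually clearer is debatable; the loop version above is easier to follow.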

extract the total count line (wc -l) number in shell

I am trying to figure out how to extract the last total count number when I use "wc -l" on multiple files under a directory. For example:
currentDir$ wc -l *.fastq
216272 a.fastq
402748 b.fastq
4789028 c.fastq
13507076 d.fastq
5818620 e.fastq
24733744 total
I would only need to extract 24733744 from the above. I tried
wc -l *.fastq | tail -1
to get
24733744 total
but not sure what to do next. If I use "cut", the annoying thing is that there are multiple spaces before the number, and I will need to use this code for other folders too, and the number of spaces may differ.
Any advice is appreciated. Thank you very much!
For this particular problem, it's probably easier to do:
cat *.fastq | wc -l
This should work with any number of spaces:
wc -l *.fastq | tail -1 | tr -s ' ' | cut -f 2 -d ' '
Example:
echo " 24733744 total" | tr -s ' ' | cut -f 2 -d ' '
24733744
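Alternatively, if awk is an option, the spacing does not matter at all, because awk splits fields on any run of blanks (a small sketch):
wc -l *.fastq | awk '{ n = $1 } END { print n }'    # n ends up as the first field of the last line, i.e. the grand total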

Creating an associative array from grep results in bash

I am using the following line in a bash script to return unique grep result strings and their counts:
hitStrings="$(eval 'find "$DIR" -type f -print0 | xargs -0 grep -roh "\w*$searchWord\w*"' | sort | uniq -c)"
For example, if I have a $searchWord of "you", I could get the following results:
   5 Kougyou
   2 Layout
  10 layouts
2330 you
 859 your
  17 yourself
My questions are:
How do I create an associative array containing the returned strings as the keys and their counts as the values?
How do I omit the initial searchWord and its count from the associative array (so there is no entry for you itself when I search for "you")?
Thanks
You have too many unnecessary layers; you can achieve the same with
$ grep -roh "\w*$key\w*" | sort | uniq -c > counts
and
$ declare -A counts; while read -r v k; do counts[$k]=$v; done < counts
$ echo ${counts["you"]}
Note that depending on the usage, you may be able to get away with just using the file itself. Again, looking up the count for "you" from the file:
$ awk -v key="you" '$2==key{print $1}' counts
If the same name (counts for both the variable and the file) confuses you, change one of them, or remove the temp file altogether with process substitution:
$ declare -A counts
$ while read -r v k; do counts[$k]=$v; done < <(grep -roh "\w*$key\w*" | sort | uniq -c)
or with evil eval you can do
$ eval declare -A counts=( $(grep -roh "\w*$key\w*" | sort | uniq -c | awk '{print "["$2"]="$1}') )
but why? The while loop is a perfectly fine solution.
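As for the second question (omitting the search word itself), the same while loop just needs a skip; a sketch, assuming $key holds the search word as above:
declare -A counts
while read -r v k; do
    [[ $k == "$key" ]] && continue    # skip the bare search word itself
    counts[$k]=$v
done < <(grep -roh "\w*$key\w*" | sort | uniq -c)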

Elegant way to check for equal values within an array or any given textfile

Hello, I'm fairly new to scripting and I'm struggling with trying to test/check whether 4 lines in a text file are equal to each other. I cannot figure this one out, since the comparison examples I find all use two variables. I've come up with this:
#!/bin/sh
#check if mxf videofiles are older than 10 minutes and parse them into tclist.txt
find . -amin +10 |sed "s/^..//" >tclist.txt
#grep timecode and cut : from the output of mxfprobe and place that into variable TC
for z in $(cat tclist.txt); do TC=$(mxfprobe -i "$z" 2>&1 |grep timecode|sed "s/[^0-9]*//"|sed "s/://"|sed "s/://"|sed "s/://")
echo $TC >>offsetcheck.txt
done;
The output of offsetcheck.txt then looks like this:
10194013
10194013
10194014
10194014
How can I test whether those 4 values are equal to each other? (In this example two files have drifted by one frame.)
I've tried to place those values into an array and check them for uniqueness...
exec 10<&0
exec < offsetcheck.txt
let count=0
while read LINE; do
ARRAY[$count]=$LINE
((count++))
done
echo ${ARRAY[@]}
exec 0<&10 10<&-
if ($ARRAY !== array_unique($ARRAY))
{
echo There were duplicate values
}
... struggling with trying to test/check whether 4 lines in a text file are equal to each other
You could use sort and wc to determine the number of distinct values in the file. The following tells you whether all the values in the file are identical:
(( $(sort -u offsetcheck.txt | wc -l) == 1 )) && echo "All values are equal" || echo "Values differ"
If you wanted to do the same for an array, you could say:
for i in "${ARRAY[#]}"; do echo "$i" ; done | sort -u | wc -l
to get the number of unique values in the array.
If the values in the array are guaranteed not to have any space, then saying:
echo "${ARRAY[#]}" | tr ' ' '\n' | sort -u | wc -l
would suffice. (But note the if above.)
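For completeness, a pure-Bash sketch that compares every element to the first one and avoids the external pipeline (assuming ARRAY is filled as above):
all_equal=1
for v in "${ARRAY[@]}"; do
    [[ $v == "${ARRAY[0]}" ]] || { all_equal=0; break; }    # stop at the first mismatch
done
(( all_equal )) && echo "All timecodes are equal" || echo "Timecodes have drifted"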
Looks to me like the whole process can be reduced to
n=$(
find . -amin +10 |
sed "s/^..//" |
xargs -I FILE mxfprobe -i "FILE" 2>&1 |
grep -h timecode |
sed 's/[^0-9]//g' |
sort -u |
wc -l
)
Then check if n == 1
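For example, a small check using the n computed above:
if (( n == 1 )); then
    echo "All timecodes are equal"
else
    echo "Timecodes have drifted"
fi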

Extracting minimum and maximum from line number grep

Currently, I have a command in a bash script that greps for a given string in a text file and prints only the line numbers, using sed ...
grep -n "<string>" file.txt | sed -n 's/^\([0-9]*\).*/\1/p'
The grep could find multiple matches, and thus, print multiple line numbers. From this command's output, I would like to extract the minimum and maximum values, and assign those to respective bash variables. How could I best modify my existing command or add new commands to accomplish this? If using awk or sed will be necessary, I have a preference of using sed. Thanks!
You can get the minimum and maximum with this:
grep -n "<string>" input | sed -n -e 's/^\([0-9]*\).*/\1/' -e '1p;$p'
You can also read them into an array:
F=($(grep -n "<string>" input | sed -n -e 's/^\([0-9]*\).*/\1/' -e '1p;$p'))
echo ${F[0]} # min
echo ${F[1]} # max
grep -n "<string>" file.txt | sed -n -e '1s/^\([0-9]*\).*/\1/p' -e '$s/^\([0-9]*\).*/\1/p'
grep .... |awk -F: '!f{print $1;f=1} END{print $1}'
Here's how I'd do it, since grep -n 'pattern' file prints output in the format line number:line contents ...
minval=$(grep -n '<string>' input | cut -d':' -f1 | sort -n | head -1)
maxval=$(grep -n '<string>' input | cut -d':' -f1 | sort -n | tail -1)
The cut -d':' -f1 command splits the grep output on the colon and keeps only the first field (the line numbers). sort -n sorts those numbers in ascending numeric order (which they would already be in, but it's good practice to ensure it). head -1 and tail -1 then pick the first and the last value of the sorted list, i.e. the minimum and the maximum, which are assigned to $minval and $maxval respectively.
Hope this helps!
Edit: Turns out you can't do it the way I had it originally, since echoing out a list of newline-separated values apparently concatenates them into one line.
It can be done with one process. Like this:
awk '/expression/{if(!n)print NR;n=NR} END {print n}' file.txt
Then you can assign the output to an array (as perreal suggested), or you can modify the script and assign to variables using eval:
eval $(awk '/expression/{if(!n)print "A="NR;n=NR} END {print "B="n}' file.txt)
echo $A
echo $B
Output (when file.txt contains three lines matching expression):
1
3
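If eval feels too risky, a hedged Bash alternative is to have awk emit both numbers on one line and read them directly:
read -r A B < <(awk '/expression/ { if (!A) A = NR; B = NR } END { print A, B }' file.txt)
echo $A
echo $B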
