Separate special rows of text file - bash

I have a text file like this:
a 20 100
b 10 150
c 30 400
I want to separate the rows where the value of column 2 is equal to or less than 20 and the value of column 3 is equal to or greater than 100. I tried this code:
s_t1=20
e_t1=100
s_t=(`awk '{printf "%15.10g", $2}' a.txt`)
e_t=(`awk '{printf "%15.10g", $3}' a.txt`)
numb=`more a.txt|wc|awk '{print $1}'`;
iii=0
while [[ $iii -lt $numb ]]; do
if [[ $s_t[$iii] -le $s_t1 ]] && [[ $e_t[$iii] -ge $e_t1 ]]; then
awk -v l=$iii 'FNR==l' a.txt >> out.txt
fi
iii=$(($iii+1))
done
but I have this error:
syntax error: invalid arithmetic operator (error token is "[0]")

You try to access an array element with $s_t[$iii] when it should be accessed with ${s_t[$iii]}. ${s_t[iii]} would actually work as well, as the subscript is treated as an arithmetic expression.
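For illustration, here is how the three forms expand (a minimal snippet, separate from your script):
arr=(10 20 30)
i=0
echo "$arr[$i]"    # $arr alone is the first element, so this prints 10[0]
echo "${arr[$i]}"  # correct element access: prints 10
echo "${arr[i]}"   # the subscript is an arithmetic context: prints 10
The literal [0] left over after expansion is exactly the "error token" in the message above.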
Once this is fixed, there is another bug in your script: array indices start at zero, but awk record numbers start at 1. When you find a matching line at array index 0, you ask awk to print line 0, which doesn't exist, so your program returns only the first line instead of the first two.
To fix that, your line
awk -v l=$iii 'FNR==l' a.txt >> out.txt
has to be replaced by
awk -v l=$(( iii + 1 )) 'FNR==l' a.txt >> out.txt
What your program does is way, way too complicated. You can do what you want with
awk '$2<=20 && $3>=100 { print }' a.txt
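For the sample file above this prints exactly the first two rows:
$ awk '$2<=20 && $3>=100 { print }' a.txt
a 20 100
b 10 150
({ print } is awk's default action, so awk '$2<=20 && $3>=100' a.txt is equivalent.)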

AWK, average columns of different length from multiple files

I need to calculate averages of columns from multiple files, but the columns have different numbers of lines. I guess awk is the best tool for this, but anything from bash will be OK. A solution for one column per file is OK; if it works for files with multiple columns, even better.
Example.
file_1:
10
20
30
40
50
file_2:
20
30
40
Expected result:
15
25
35
40
50
awk can do it easily:
awk '{a[FNR]+=$0;n[FNR]++;next}END{for(i=1;i<=length(a);i++)print a[i]/n[i]}' file1 file2
The method also works for more than two files.
Brief explanation:
FNR is the input record number (line number) in the current input file.
a[FNR] accumulates the sum of the values seen at that line number across the files.
n[FNR] counts how many files contributed a value at that line number.
The for loop in the END block prints the average a[i]/n[i] for each line number.
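Note that length(a) on an array is a GNU awk extension; if your awk lacks it, here is a sketch of a portable variant that tracks the highest line number itself:
awk '{a[FNR]+=$0; n[FNR]++; if (FNR>max) max=FNR} END{for(i=1;i<=max;i++) print a[i]/n[i]}' file_1 file_2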
I have prepared the following bash script for you. I hope this helps; let me know if you have any questions.
#!/usr/bin/env bash
# check that the files provided as parameters exist
if [ ! -f "$1" ] || [ ! -f "$2" ]; then
echo "ERROR: file> $1 or file> $2 is missing"
exit 1
fi
# save the length of both files in variables
file1_length=$(wc -l < "$1")
file2_length=$(wc -l < "$2")
# if file 1 is longer than file 2, append "0" lines to the end of file 2
# until both files are the same length
# note: padding with zeros means unpaired lines average against 0, unlike the
# awk one-liner above, which divides by the number of files that have the line
# you could improve the script by working on temp files instead of the input ones
if [ "$file1_length" -gt "$file2_length" ]; then
n_zero_to_append=$(( file1_length - file2_length ))
echo "append $n_zero_to_append zeros to file $2"
# append the zeros to the end of the file
yes 0 | head -n "$n_zero_to_append" >> "$2"
# combine both files and compute the average line by line
awk 'FNR==NR { a[FNR] = $0; next } { print (a[FNR]+$0)/2 }' "$1" "$2"
# if file 2 is longer than file 1, do the inverse operation
elif [ "$file2_length" -gt "$file1_length" ]; then
n_zero_to_append=$(( file2_length - file1_length ))
echo "append $n_zero_to_append zeros to file $1"
yes 0 | head -n "$n_zero_to_append" >> "$1"
awk 'FNR==NR { a[FNR] = $0; next } { print (a[FNR]+$0)/2 }' "$1" "$2"
# if the files have the same length we do not need to append anything
# and can directly compute the average line by line
else
echo "the files : $1 and $2 have the same size."
awk 'FNR==NR { a[FNR] = $0; next } { print (a[FNR]+$0)/2 }' "$1" "$2"
fi
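Assuming the script is saved as average.sh (a name chosen here for illustration), a sample run on the files from the question:
$ ./average.sh file_1 file_2
append 2 zeros to file file_2
15
25
35
20
25
Note the last two values differ from the expected 40 and 50, because the padded zeros are averaged in; the awk one-liner above instead divides by the number of files that actually have the line.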

How to simplify the syntax needed for 10 >= x >= 1 in BASH conditionals?

I use this BASH script to check if file.txt has between 1 and 10 lines:
if [[ `wc -l file.txt | awk '{ print $1 }'` -le "10" && `wc -l file.txt | awk '{ print $1 }'` -ge "1" ]]; then
echo "It has between 1 and 10 lines."
fi
This code is too verbose. If I make a change to one part, it is easy to forget to make a change to the repeated part.
Is there a way to simplify the syntax?
One option would be to do the whole thing using awk:
awk 'END{if(1<=NR&&NR<=10) print "It has between 1 and 10 lines."}' file.txt
As pointed out in the comments (thanks rici), you might want to prevent awk from processing the rest of your file once it has read 10 lines:
awk 'NR>10{exit}END{if(1<=NR&&NR<=10) print "It has between 1 and 10 lines."}' file.txt
The END block is still processed if exit is called, so it is still necessary to have both checks in the if.
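If you would rather not repeat the range check, a flag set before exiting works as well (a minimal sketch):
awk 'NR>10{toolong=1; exit} END{if (!toolong && NR>=1) print "It has between 1 and 10 lines."}' file.txt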
Alternatively, you could store the result of wc -l to a variable in bash:
lines=$(wc -l < file.txt)
(( 1 <= lines && lines <= 10)) && echo "It has between 1 and 10 lines."
Note that redirecting the file into wc means that you just get the number without the filename.
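For example:
$ wc -l file.txt
10 file.txt
$ wc -l < file.txt
10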
Get the line count, then check it against the range bounds:
lc=$(wc -l < file.txt)
if (( 1 <= lc && lc <= 10 )); then
echo "It has between 1 and 10 lines"
fi

adding numbers without grep -c option

I have a txt file like
Peugeot:406:1999:Silver:1
Ford:Fiesta:1995:Red:2
Peugeot:206:2000:Black:1
Ford:Fiesta:1995:Red:2
I am looking for a command that counts the number of red Ford Fiesta cars.
The last number in each line is the amount of that particular car.
The command I am looking for CANNOT use the -c option of grep.
So this command should just output the number 4.
Any help would be welcome, thank you.
A simple bit of awk would do the trick:
awk -F: '$1=="Ford" && $4=="Red" { c+=$5 } END { print c }' file
Output:
4
Explanation:
The -F: switch means that the input field separator is a colon, so the car manufacturer is $1 (the 1st field), the model is $2, etc.
If the 1st field is "Ford" and the 4th field is "Red", then add the value of the 5th (last) field to the variable c. Once the whole file has been processed, print out the value of c.
For a native bash solution:
c=0
while IFS=":" read -ra col; do
[[ ${col[0]} == Ford ]] && [[ ${col[3]} == Red ]] && (( c += col[4] ))
done < file && echo $c
Effectively applies the same logic as the awk one above, without any additional dependencies.
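If read -ra is unfamiliar: -a splits the line on IFS into an array and -r disables backslash escape processing. A quick illustration with one record from the file:
$ IFS=":" read -ra col <<< "Ford:Fiesta:1995:Red:2"
$ echo "${col[0]} ${col[3]} ${col[4]}"
Ford Red 2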
Methods:
1.) Use a scripting language for the counting, like awk or Perl. An awk solution is already posted; here is a Perl solution.
perl -F: -lane '$s+=$F[4] if m/Ford:.*:Red/}{print $s' < carfile
#or
perl -F: -lane '$s+=$F[4] if ($F[0]=~m/Ford/ && $F[3]=~/Red/)}{print $s' < carfile
Both examples print
4
2.) The second method is based on shell pipelining:
filter out the right rows
extract the column with the count
sum the numbers
Some examples:
grep 'Ford:.*:Red:' carfile | cut -d: -f5 | paste -sd+ | bc
grep filters out the right rows,
cut extracts the last column,
paste joins the numbers into a line like 2+2, and
bc evaluates the sum.
Another example:
sed -n 's/\(Ford:.*:Red\):\(.*\)/\2/p' carfile | paste -sd+ | bc
here sed does both the filtering and the extracting.
Another example, with a different way of summing:
(echo 0 ; sed -n 's/\(Ford:.*:Red\):\(.*\)/\2+/p' carfile ;echo p )| dc
The numbers are summed by the RPN calculator dc; it works like 0 2 +, i.e. the values come first and the operation last.
the first echo pushes 0 onto the stack
the sed creates a stream of numbers like 2+ 2+
the final echo p prints the top of the stack
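To see the mechanics on literal input:
$ (echo 0; echo 2+; echo 2+; echo p) | dc
4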
Many other ways exist to sum a stream of numbers, e.g. in bash:
sum=0
while read -r num
do
sum=$(( sum + num ))
done < <(sed -n 's/\(Ford:.*:Red\):\(.*\)/\2/p' carfile)
echo "$sum"
and pure bash:
sum=0
while IFS=: read -r maker model year color count
do
if [[ "$maker" == "Ford" && "$color" == "Red" ]]
then
(( sum += count ))
fi
done < carfile
echo "$sum"

Finding text files with less than 2000 rows and deleting them

I have A LOT of text files, with just one column.
Some text files have 2000 lines (consisting of numbers), and others have fewer than 2000 lines (also consisting only of numbers).
I want to delete all the text files with fewer than 2000 lines in them.
EXTRA INFO
The files that have fewer than 2000 lines are not empty: they all have line breaks up to row 2000. Also, my files have somewhat complicated names like: Nameofpop_chr1_window1.txt
I tried using awk to count the lines of each text file, but because of those trailing line breaks I get the same result, 2000, for every file.
awk 'END { print NR }' Nameofpop_chr1_window1.txt
Thanks in advance.
You can use this awk to count non-empty lines:
awk 'NF{i++} END { print i }' Nameofpop_chr1_window1.txt
or this awk to count only lines that consist solely of digits:
awk '/^[[:digit:]]+$/ {i++} END { print i }' Nameofpop_chr1_window1.txt
To delete all files with fewer than 2000 lines of numbers, use this awk in a loop (adjust the glob to match your files):
for f in *.txt; do
[[ -n $(awk '/^[[:digit:]]+$/{i++} END {if (i<2000) print FILENAME}' "$f") ]] && rm "$f"
done
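If you have GNU awk, here is a sketch that does the whole scan in one process (this assumes gawk's ENDFILE block and filenames without whitespace):
gawk '/^[[:digit:]]+$/{i++} ENDFILE{if (i<2000) print FILENAME; i=0}' *.txt | xargs -r rm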
You can use expr $(cat filename | sort | uniq | wc -l) - 1 or cat filename | grep -v '^$' | wc -l; either gives you a line count per file, and based on that you decide what to do. (Beware that sort | uniq collapses duplicate values, so the grep variant is the safer of the two for files of numbers.)
You can use Bash:
for f in $files; do
n=0
while read -r line; do
[[ -n $line ]] && ((n++))
done < "$f"
[ "$n" -lt 2000 ] && rm "$f"
done

In Bash, how to extract a word and a following number from a file?

I've got a list which has many entries of two different formats:
Generated Request {some text} easy level group X
---or---
easy level group X {some text}
where X is a number 1-6 digits long.
I'm trying to go through that file line by line and reduce down everything to just "group X" on each line (so that I can then compare it to another file).
I'll post my attempt below so you can join me in laughing at it, but I'm just picking up the basics of bash, awk and sed, so I apologize now for this assault on good scripting...
for line in $(< abc.txt);do
if [ ${line:0:2} == "Ge" ] then
awk '{print $8,$9}' $line >> allgood.txt
elif [ ${line:0:2} == "ea" ] then
awk '{print $3,$4}' $line >> allgood.txt
fi
done
The attempted logic was: if the line starts with "Ge", extract fields $8 and $9 and append them to a file; if it starts with "ea", extract fields $3 and $4 and append them to the same file. However, this doesn't work at all.
Any thoughts?
The simplest approach for this problem is to use grep:
grep -o 'group [0-9]*' file
The -o option displays only the matching part of the line.
You never have to use bash to loop over every line in a file and pass each line to awk; that is exactly how awk already works: it iterates over each line and applies whichever blocks match. Here is an approach using your logic in pure awk:
awk '/^Ge/{print $8,$9}/^ea/{print $3,$4}' file
You can do this with "while read" and avoid awk if you prefer:
while read -r a b c d e f g h i; do
if [ "${a:0:2}" == "Ge" ]; then
echo "$h $i" >> allgood.txt
elif [ "${a:0:2}" == "ea" ]; then
echo "$c $d" >> allgood.txt
fi
done < abc.txt
The variables represent the columns, so you'll need as many as your widest line has; read assigns any leftover fields to the last variable. After that you just output the ones you need.
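For instance, the trailing variable soaks up any extra fields:
$ read -r a b c <<< "one two three four"
$ echo "$c"
three four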
