Unix bash script grep loop counter (for) - bash

I am looping over a grep result. The result contains 10 lines (every line has different content), so the loop body gets executed 10 times.
I need the index (0-9) of each iteration so I can take actions based on it.
ABC=(cat test.log | grep "stuff")
counter=0
for x in $ABC
do
echo $x
((counter++))
echo "COUNTER $counter"
done
Currently the counter won't really change.
Output:
51209
120049
148480
1211441
373948
0
0
0
728304
0
COUNTER: 1

If you only need to print the counter (which is all your sample shows) and you are OK with awk, a single awk command can do both the search and the counting, with no need to store the grep output in a variable first:
awk -v counter=0 '/stuff/{print "counter=" counter++}' Input_file
Replace stuff with the actual string you are looking for, and Input_file with your actual file name.
Because counter++ is a post-increment, the output starts at 0:
counter=0
counter=1
........and so on

Your shell script contains what should be an obvious syntax error.
ABC=(cat test.log | grep "stuff")
This fails with
-bash: syntax error near unexpected token `|'
There is no need to save the output in a variable if you only want to process one at a time (and obviously no need for the useless cat).
grep "stuff" test.log | nl
gets you numbered lines, though the index will be 1-based, not zero-based.
If you absolutely need zero-based, refactoring to Awk should solve it easily:
awk '/stuff/ { print n++, $0 }' test.log
If you want to loop over this and do something more with this information,
awk '/stuff/ { print n++, $0 }' test.log |
while read -r index output; do
echo index is "$index"
echo output is "$output"
done
Because the while loop executes in a subshell the value of index will not be visible outside of the loop. (I guess that's what your real code did with the counter as well. I don't think that part of the code you posted will repro either.)
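If the index has to survive the loop, one option is to feed the loop from process substitution instead of a pipe, so that the loop runs in the current shell. A minimal sketch, assuming bash; the function name index_matches, the variable match_count and the sample log are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: number the lines of file $2 that match pattern $1,
# starting at 0, and leave the number of matches in match_count.
index_matches() {
  local pattern=$1 file=$2 line index=0
  # The loop reads from process substitution instead of a pipe, so it runs
  # in the current shell and index is still set after "done".
  while IFS= read -r line; do
    printf '%d %s\n' "$index" "$line"
    index=$((index + 1))
  done < <(grep -- "$pattern" "$file")
  match_count=$index
}

# Made-up sample log for the demonstration.
log=$(mktemp)
printf 'stuff 51209\nother 1\nstuff 120049\n' > "$log"
index_matches stuff "$log"
echo "matches: $match_count"
rm -f "$log"
```

The counter is advanced with index=$((index + 1)) rather than ((index++)), because the latter returns a non-zero status when the old value is 0, which matters under set -e.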

Do not store the result of grep in a scalar variable such as $ABC.
If a line of the log file contains whitespace, the unquoted expansion in for x in $ABC
is split on it by bash's word splitting, so $x receives one word at a time rather than one line.
(BTW, the statement ABC=(cat test.log | grep "stuff") causes a syntax error.)
Please try something like:
readarray -t abc < <(grep "stuff" test.log)
for x in "${abc[@]}"
do
echo "$x"
echo "COUNTER $((++counter))"
done
or
readarray -t abc < <(grep "stuff" test.log)
for i in "${!abc[@]}"
do
echo "${abc[i]}"
echo "COUNTER $((i + 1))"
done
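The word-splitting problem described above is easy to reproduce. A small sketch with made-up sample lines (the function names count_words and count_lines are hypothetical) that counts loop iterations both ways:

```shell
#!/usr/bin/env bash
# Made-up sample: two lines, each containing a space.
sample=$'first line\nsecond line'

# Unquoted expansion of a scalar: split on every space and newline.
count_words() {
  local data=$1 x n=0
  for x in $data; do
    n=$((n + 1))
  done
  echo "$n"
}

# readarray -t: one array element per line, spaces preserved.
count_lines() {
  local data=$1 x n=0
  local -a lines
  readarray -t lines <<< "$data"
  for x in "${lines[@]}"; do
    n=$((n + 1))
  done
  echo "$n"
}

echo "for over scalar: $(count_words "$sample") iterations"
echo "for over array:  $(count_lines "$sample") iterations"
```

With this sample the scalar loop runs four times and the array loop twice, which is why the index only lines up with the grep output in the array version.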

You can also increment the counter with plain arithmetic expansion:
counter=$((counter + 1))

Related

How to filter text data in bash more efficiently

I have a data file that I need to filter with a bash script. Example data:
name=pencils
name=apples
value=10
name=rocks
value=3
name=tables
value=6
name=beds
name=cups
value=89
I need to group name/value pairs, e.g. apples=10. If the current line starts with name and the next line also starts with name, the first line should be omitted entirely. So the result file should look like this:
apples=10
rocks=3
tables=6
cups=89
I came up with this simple solution, which works but is very slow: it takes 5 minutes to complete for a file with 2000 lines.
VALUES=$(cat input.txt)
for x in $VALUES; do
if [[ -n $(echo $x | grep 'name=') ]]; then
name=$(echo $x | sed "s/name=//")
elif [[ -n $(echo $x | grep 'value=') ]]; then
value=$(echo $x | sed "s/value=//")
echo "${name}=${value}" >> output.txt
fi
done
I'm aware that this kind of task is not very suitable for bash, but the script is already written and this is just a small part of it.
How can I optimize this task in bash?
Do not run any commands in subshells; it slows your script down a lot. You can do everything in the current shell.
#! /bin/bash
while IFS== read k v ; do
if [[ $k == name ]] ; then
name=$v
elif [[ $k == value ]] ; then
printf '%s=%s\n' "$name" "$v"
fi
done
There are three easy optimizations you can make that will greatly speed up the script without requiring a major rethink.
1. Replace for with while read
Loading input.txt into a string, and then looping over that string with for x in $VALUES is slow. It requires the whole file to be read into memory even though this task could be done in a streaming fashion, reading a line at a time.
A common replacement for for line in $(cat file) is while read line; do ... done < file. It turns out that loops are compound commands, and like the normal one-line commands we're used to, compound commands can have < and > redirections. Redirecting a file into a loop means that for the duration of the loop, stdin comes from the file. So if you call read line inside the loop then it will read one line each iteration.
while IFS= read -r x; do
if [[ -n $(echo $x | grep 'name=') ]]; then
name=$(echo $x | sed "s/name=//")
elif [[ -n $(echo $x | grep 'value=') ]]; then
value=$(echo $x | sed "s/value=//")
echo "${name}=${value}" >> output.txt
fi
done < input.txt
2. Redirect output outside loop
It's not just input that can be redirected. We can do the same thing for the >> output.txt redirection. Here's where you'll see the biggest speedup. When >> output.txt is inside the loop output.txt must be opened and closed every iteration, which is crazy slow. Moving it to the outside means it only needs to be opened once. Much, much faster.
while IFS= read -r x; do
if [[ -n $(echo $x | grep 'name=') ]]; then
name=$(echo $x | sed "s/name=//")
elif [[ -n $(echo $x | grep 'value=') ]]; then
value=$(echo $x | sed "s/value=//")
echo "${name}=${value}"
fi
done < input.txt > output.txt
3. Shell string processing
One final improvement is to use faster string processing. Calling grep and sed requires forking a subprocess every time just to do a simple string split. It'd be a lot faster if we could do the string splitting using just shell constructs. Well, as it happens that's easy now that we've switched to read. read can do more than read whole lines; it can also split on a delimiter taken from the variable $IFS (internal field separator).
while IFS='=' read -r key value; do
case "$key" in
name) name="$value";;
value) echo "$name=$value";;
esac
done < input.txt > output.txt
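To see what read does with IFS='=', here is a small sketch that splits single lines and prints both parts. The helper name split_pair and the third sample line are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split one "key=value" line and show both parts.
split_pair() {
  local key value
  # The first field goes to key; everything after the first "=" goes to
  # value, because value is the last variable named.
  IFS='=' read -r key value <<< "$1"
  printf 'key=[%s] value=[%s]\n' "$key" "$value"
}

split_pair 'name=apples'   # key=[name] value=[apples]
split_pair 'value=10'      # key=[value] value=[10]
split_pair 'name=a=b'      # key=[name] value=[a=b]
```

The last call shows that a value containing its own = is kept intact, since only the first delimiter separates key from value.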
Further reading
BashFAQ/001 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
This explains why I have IFS= read -r in the first two iterations.
BashFAQ/024 - I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?
cmd | while read; do ... done is another popular use of while read, but it has unique pitfalls.
BashFAQ/100 - How do I do string manipulations in bash?
More in-shell string processing options.
If you have performance issues do not use bash at all. Use a text processing tool like, for instance, awk:
$ awk -F= '$1 == "value" {print name "=" $2} {name = $2}' data.txt
apples=10
rocks=3
tables=6
cups=89
Explanation: -F= defines the field separator as character =. The first block is executed only if the first field of a line ($1) is equal to string value. It prints variable name followed by character = and the second field ($2). The second block is executed on each line and it stores the second field ($2) in variable name.
Normally, if your input resembles what you show, this should automatically skip the first line. Otherwise, we can exclude it explicitly with a test on the NR variable, whose value is the line number, starting at 1:
awk -F= 'NR != 1 && $1 == "value" {print name "=" $2}
NR != 1 {name = $2}' data.txt
All this works on inputs like the one you show but not on inputs where you would have other types of lines or several value=... consecutive lines. If you really want to test that the name/value pair is on two consecutive lines we need something more. For instance, test if the first field is name and use another variable n to store the line number of the last encountered name=... line. With all these tests we can now put the 2 blocks in a slightly more intuitive order (but the opposite would work the same):
awk -F= 'NR != 1 && $1 == "name" {name = $2; n = NR}
NR != 1 && NR == n+1 && $1 == "value" {print name "=" $2}' data.txt
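For comparison, the same "value must directly follow name" rule can be sketched in plain bash. This is only an illustration, not a replacement for the awk version; the function name pair_consecutive, the flag armed and the irregular sample input are all made up:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print name=value only when a value= line directly
# follows a name= line. Reads key=value lines from stdin.
pair_consecutive() {
  local key value name='' armed=0
  while IFS='=' read -r key value; do
    case $key in
      name)
        name=$value
        armed=1          # a value= on the very next line may pair up
        ;;
      value)
        if [ "$armed" -eq 1 ]; then
          printf '%s=%s\n' "$name" "$value"
        fi
        armed=0          # a second consecutive value= is ignored
        ;;
      *)
        armed=0          # any other kind of line breaks the pair
        ;;
    esac
  done
}

# Made-up input with a repeated value= line and a stray line.
pair_consecutive <<'EOF'
name=pencils
name=apples
value=10
value=11
name=rocks
comment=ignored
value=3
name=cups
value=89
EOF
```

On this input only apples=10 and cups=89 are printed: value=11 has no name directly before it, and rocks is separated from its value by another line.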
With awk there might be a more elegant solution, but you can use:
awk 'BEGIN{RS="\n?name=";FS="\nvalue="} {if($2) printf "%s=%s\n",$1,$2}' inputs.txt
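RS="\n?name=" says that the record separator is name=, optionally preceded by a newline
FS="\nvalue=" says that the field separator within each record is a newline followed by value=
if($2) says to run the printf only if the second field exists (note that a value of 0 would also be skipped)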

Print character n times based on column value

I am trying to print the character "N" multiple times, based on the value of a column in several lines of a table
For example, the output of this table:
AATTGGCC A 1.7 4
CCGGTTAA T 0.8 3
AAGGTTCC G 2.4 7
Would ideally return:
NNNN
NNN
NNNNNNN
based on the value in column 4.
Currently, I have been using:
while read line
do
a=$4
printf "$0.sN\n" {a}
done < table.txt
But this only returns three 'N's, each on new lines.
Ideally I would also like to print out the letter from column 2 on the end of each line of output.
Can anyone please help me?
With bash:
while read x x x end; do
for ((i=0; i<$end; i++)); do
echo -n "N"
done
echo
done < file
Output:
NNNN
NNN
NNNNNNN
Building off the example you gave, I would change it to this:
while read line
do
a=$(echo ${line} | awk '{print $4}')
printf "N%.0s" $(seq $a)
echo
done < table.txt
It's not clear to me how you were able to get the 3 N's from your original solution (I could not reproduce), but maybe we have a different configuration. The solution I posted above is based on BASH. Inside the while loop, the first line of code,
a=$(echo ${line} | awk '{print $4}')
echoes the line and pipes it into an awk command that prints only the fourth column; the result is saved in a variable called "a". The $(...) notation is command substitution: it runs the command inside the parentheses (in this case an echo piped into awk) and substitutes its output. On the second line,
printf "N%.0s" $(seq $a)
I print $a Ns via the printf command. Here $(seq $a) expands to the numbers 1 through $a. The format N%.0s prints each argument with zero width, and printf reuses the format once per argument, so it emits one N for each number. With this in a script, I was able to get the following result for your example table.txt file:
NNNN
NNN
NNNNNNN
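The question also asked for the letter from column 2 at the end of each line, which the loops above do not print. A possible sketch without seq or awk, assuming bash (printf -v is a bash feature); the function name repeat_n and the variable names are made up:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each row on stdin, print "N" repeated
# <column 4> times, followed by the letter from column 2.
repeat_n() {
  local seq_col letter score count pad
  while read -r seq_col letter score count; do
    printf -v pad '%*s' "$count" ''   # pad = $count spaces
    printf '%s%s\n' "${pad// /N}" "$letter"
  done
}

# Sample table from the question.
repeat_n <<'EOF'
AATTGGCC A 1.7 4
CCGGTTAA T 0.8 3
AAGGTTCC G 2.4 7
EOF
```

For the sample table this prints NNNNA, NNNT and NNNNNNNG. The padding trick builds a string of spaces of the requested width and then replaces every space with N, so no external command runs per line.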

Is there any better solution to reverse uniq count

I want to reverse the uniq -c output from:
1 hi
2 test
3 try
to:
hi
test
test
try
try
try
My solution now is to use a loop:
while read a b; do yes $b |head -n $a ;done <test.txt
I wonder if there are any simpler commands to achieve that?
another awk
awk '{while ($1--) print $2}' file
Here's another solution
echo "\
1 hi
2 test
3 try" | awk '{for(i=1;i<=$1;i++){print($2)}}'
output
hi
test
test
try
try
try
This will work the same way, with
awk '{for(i=1;i<=$1;i++){print($2)}}' uniq_counts.txt
The core of the script is, of course,
{ for (i=1;i<=$1;i++) { print $2 } }
where awk has parsed the input into 2 fields, $1 being the number (1,2,3) and $2 is the value, (hi,test,try).
The condition i<=$1 tests that the counter i has not incremented beyond the count supplied in field $1, and the print $2 prints the value each time that i<=$1 condition is true.
IHTH
You don't need awk or any other command, you can do it entirely in bash
while read n s
do
for ((i=0; i<n; i++))
do
echo $s;
done
done < test.txt
My solution uses bash brace expansion and the printf builtin.
while read a b
do
eval printf "'%.${#b}s\n'" "'$b'"{1..$a}
done <test.txt
The following simple example
printf '%s\n' test{1..2}
prints two lines which contain the string test followed by a number:
test1
test2
but we can specify the exact number of characters to print by using the precision field of the printf command:
printf '%.4s\n' test{1..2}
to display:
test
test
The number of characters to print is given by the length of the text to print (${#b}).
Finally, the eval command must be used in order to use variables in the brace expansion.
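The reason eval is needed is the order of expansions: bash performs brace expansion before it substitutes variables, so {1..$a} never becomes a sequence. A short sketch of that behaviour, plus an eval-free loop for comparison (the function name repeat_word is made up):

```shell
#!/usr/bin/env bash
a=3

# Brace expansion runs before $a is substituted, so no sequence is produced
# and the braces are left in place.
literal=$(echo {1..$a})
echo "$literal"

# A C-style loop reads the variable at run time and needs no eval.
repeat_word() {
  local word=$1 count=$2 i
  for ((i = 0; i < count; i++)); do
    printf '%s\n' "$word"
  done
}
repeat_word try "$a"
```

The first echo prints the literal text {1..3}. Avoiding eval also sidesteps the risk of executing text that comes from the input file.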

Printing number of lines with in shell with echo

I know that the simplest way to print the number of lines/bytes/words is to use wc -l < filename.sh, but when I try to use it together with the echo command, it prints the command itself and not its output.
My intended output is "this file has x lines", with x being the number of lines, but when I try things like echo "this line has" wc -l < filename.sh "lines", it prints the command itself. I've also tried this without breaking the quotation, among several other things.
Is it just that the command can't be used alongside echo, or am I missing something extremely obvious here?
echo "this line has $(wc -l < filename.sh) lines"
printf is versatile:
printf 'this file has %s lines\n' $(wc -l < filename.sh)
$(command) converts the output of command into an argument.
Try this one:
echo "this file has `wc -l < filename.sh | awk '{print $1}'` lines"
Explanation:
wc -l < filename.sh retrieves the number of lines in the file
awk '{print $1}' prints the number without any surrounding blanks
`` (backticks) means executing the command first and substituting its result
This can also be done without any command substitution or pipe: awk has a built-in variable NR, which holds the number of records read so far. The print goes inside an END block so that the total is printed once at the end; otherwise it would print the line number of every line.
awk 'END{print "This line has " NR " lines" }' file
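Some wc implementations pad the count with blanks, which is what the awk '{print $1}' step above works around. Another way to normalize the number, sketched here under the assumption that bash is used, is to pass it through arithmetic expansion; the function name report_lines and the three-line sample file are made up:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "this file has N lines" for the file in $1.
report_lines() {
  local file=$1 count
  # Arithmetic expansion evaluates the number, dropping any blanks
  # that wc may have put around it.
  count=$(( $(wc -l < "$file") ))
  printf 'this file has %d lines\n' "$count"
}

# Made-up three-line file for the demonstration.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"
report_lines "$tmp"
rm -f "$tmp"
```

For the sample file this prints "this file has 3 lines".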

Using grep in while loop

Is it possible to go through the results of grep in a shell script like this:
while read line ; do
...
done < grep ...
Can anyone explain why this doesn't work? What are the alternatives?
thanks!
Looks like you were trying to use process substitution:
lines=5
while read line ; do
let ++lines
echo "$lines $line" # Number each line
# Other operations on $line and $lines
done < <(grep ...)
echo "Total: $lines lines"
Provided grep actually returns some output lines, the result should look like this:
6 foo
7 bar
Total: 7 lines
This is slightly different from grep ... | while ...: with process substitution, grep runs in a subshell, whereas with the pipe, the while loop runs in a subshell. This usually only matters if you want to keep some state from within the loop. In that case you should use the process substitution form.
On the other hand, if you write
lines=5
grep ... | while read line ; do
let ++lines
echo "$lines $line" # Number each line
# Other operations on $line and $lines
done
echo "Total: $lines lines"
the result would be:
6 foo
7 bar
Total: 5 lines
Ouch! The counter is passed to the subshell (the second part of the pipe), but it's not returned to the parent shell.
grep is a command, but done < grep tells the shell to use a file named grep as the input. You need something like:
grep ... | while read line ; do
...
done
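If the pipe form is preferred and the loop still has to update variables, bash 4.2 and newer offer the lastpipe shell option, which runs the last command of a pipeline in the current shell. A minimal sketch, assuming a non-interactive script (lastpipe only takes effect while job control is off) and using printf as a stand-in for the real grep command:

```shell
#!/usr/bin/env bash
# lastpipe (bash 4.2 or newer) runs the last command of a pipeline in the
# current shell. It only takes effect when job control is off, which is
# the default in scripts.
shopt -s lastpipe

lines=5
# printf stands in for the real grep command here.
printf 'foo\nbar\n' | while IFS= read -r line; do
  lines=$((lines + 1))
  echo "$lines $line"
done
echo "Total: $lines lines"
```

Run as a script, this reports a total of 7 lines, because the loop's changes to lines are made in the current shell rather than in a subshell.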
