Explanation of awk function - bash

I am converting some bash-style scripts (actually run under busybox) to C for use in a custom kernel driver. Everything is going fine, but I'm dreadfully unfamiliar with awk and would really appreciate an explanation of what this one-liner is doing. The line is here:
checksum=`echo $sum | busybox awk '{$NF *= -1; print}'`
checksum and sum are standard integers that have been accounted for, and can be either positive or negative. I just have no clue what happens when sum is piped into the awk function.

The code awk '{$NF *= -1; print}' multiplies the value of the last field, $NF, by -1 on every input line and then prints the whole line with the new value assigned to that field.
This syntax is a compound (shorthand) assignment and is equivalent to $NF=$NF*-1. Similarly, there are shorthand forms for other operations such as addition, subtraction, and division:
$ echo "1 2 3" |awk '{$NF *=10;print}' #Equivalent to $NF=$NF*10
1 2 30
$ echo "1 2 3" |awk '{$NF +=10;print}' #Equivalent to $NF=$NF+10
1 2 13
$ echo "1 2 3" |awk '{$NF -=10;print}' #Equivalent to $NF=$NF-10
1 2 -7
$ echo "1 2 3" |awk '{$NF /=10;print}' #Equivalent to $NF=$NF/10
1 2 0.3
In your case:
$ echo "1 2 3" |awk '{$NF *=-1;print}'
1 2 -3
Keep in mind that awk reads its input one record at a time, and by default each line is one record (records are separated by newlines).
Each record is then split into fields on runs of spaces or tabs, numbered from $1 (first field) up to the last field, $NF.
$ echo "1 2 3" |awk '{print $1}'
1
$ echo "1 2 3" |awk '{print $2}'
2
$ echo "1 2 3" |awk '{print $3}'
3
$ echo "1 2 3" |awk '{print $NF}'
3
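To see records and fields together, here is a small sketch with two input lines (the sample data is made up for illustration). NR is the number of the current record and NF is the number of fields in that record:

```shell
# Two records (lines); each one is split into fields on whitespace.
out=$(printf '1 2 3\n4 5\n' | awk '{print "record " NR ": " NF " fields, last is " $NF}')
echo "$out"
```

The second line has only two fields, so $NF refers to $2 there rather than $3.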
The whole record in awk is called $0:
$ echo "1 2 3" |awk '{print $0}'
1 2 3
A bare print with no arguments prints the whole record, $0:
$ echo "1 2 3" |awk '{print}'
1 2 3
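Applied to the question's case, where the input is a single integer, the only field is also the last field, so the one-liner simply negates the number. A minimal sketch, assuming a made-up value for sum and integer input only; the variable checksum_sh is introduced here just for comparison:

```shell
# Hypothetical value standing in for the script's $sum.
sum=42

# The original one-liner: with one field, $NF is that field, so it is negated.
checksum=$(echo "$sum" | awk '{$NF *= -1; print}')
echo "$checksum"

# For plain integers, shell arithmetic gives the same result without awk.
checksum_sh=$(( -sum ))
echo "$checksum_sh"
```

For a C port this suggests that, as long as sum really is a single integer, the whole pipeline reduces to negating it.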

Related

How to get the line number of a string in another string in Shell

Given
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
I'd like to get the line number of the first occurrence of $str in $sourceStr, which should be 3.
I don't know how to do it.
I have tried:
awk 'match($0, v) { print NR; exit }' v=$str <<<$sourceStr
grep -n $str <<< $sourceStr | grep -Eo '^[^:]+';
grep -n $str <<< $sourceStr | cut -f1 -d: | sort -ug
grep -n $str <<< $sourceStr | awk -F: '{ print $1 }' | sort -u
All output 1, not 3.
How can I get the line number of $str in $sourceStr?
Thanks!
You may use this awk + printf in bash:
awk -v s="$str" '$0 == s {print NR; exit}' <(printf "%b\n" "$sourceStr")
3
Or even this awk without any bash support:
awk -v s="$str" -v source="$sourceStr" 'BEGIN {
split(source, a); for (i=1; i in a; ++i) if (a[i] == s) {print i; exit}}'
3
You may use this sed as well:
sed -n "/^$str$/{=;q;}" <(printf "%b\n" "$sourceStr")
3
Or this grep + cut:
printf "%b\n" "$sourceStr" | grep -nxF -m 1 "$str" | cut -d: -f1
3
It's not clear if you've just made a cut-n-paste error, but your sourceStr is not a multiline string (as demonstrated below). Also, you really need to quote your herestring (also demonstrated below). Perhaps you just want:
$ sourceStr="abc\nefg\nhij\nlmn\nhij"
$ echo "$sourceStr"
abc\nefg\nhij\nlmn\nhij
$ sourceStr=$'abc\nefg\nhij\nlmn\nhij'
$ echo "$sourceStr"
abc
efg
hij
lmn
hij
$ cat <<< $sourceStr
abc efg hij lmn hij
$ cat <<< "$sourceStr"
abc
efg
hij
lmn
hij
$ str=hij
$ awk "/${str}/ {print NR; exit}" <<< "$sourceStr"
3
Just use sed!
printf 'abc\nefg\nhij\nlmn\nhij\n' \
| sed -n '/hij/ { =; q; }'
Explanation: if sed meets a line that contains "hij" (regex /hij/), it prints the line number (the = command) and exits (the q command). Else it doesn't print anything (the -n switch) and goes on with the next line.
[update] Hmmm, sorry, I just noticed your "All output 1, not 3".
The primary reason why your commands don't output 3 is that sourceStr="abc\nefg\nhij\nlmn\nhij" doesn't automagically change your \n into new lines, so it ends up being one single line and that's why your commands always display 1.
If you want a multiline string, here are two solutions with bash:
printf -v sourceStr "abc\nefg\nhij\nlmn\nhij"
sourceStr=$'abc\nefg\nhij\nlmn\nhij'
And now that your variable contains whitespace characters (newlines), as stated by William Pursell, you must enclose $sourceStr in double quotes in order to preserve them:
grep -n "$str" <<< "$sourceStr" | ...
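The difference between the two forms of the string can be checked by counting lines. This is a sketch using only POSIX features, so it builds the multi-line value with a command substitution around printf rather than the bash-only forms; the variable names are chosen here for illustration:

```shell
# Literal backslash-n sequences: this is ONE line of text.
literal='abc\nefg\nhij\nlmn\nhij'
# Let printf expand the escapes: this is FIVE lines of text.
multiline=$(printf 'abc\nefg\nhij\nlmn\nhij')

# Count the lines in each value (printf '%s\n' does not expand escapes).
literal_lines=$(printf '%s\n' "$literal" | awk 'END {print NR}')
multi_lines=$(printf '%s\n' "$multiline" | awk 'END {print NR}')

# Line number of the first line containing hij in the multi-line value.
first_match=$(printf '%s\n' "$multiline" | grep -n 'hij' | head -n 1 | cut -d: -f1)

echo "$literal_lines $multi_lines $first_match"
```

With the literal string every tool sees a single line, which is why all of the original attempts reported 1.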
There's always a hard way to do it:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e $sourceStr | nl | grep $str | head -1 | gawk '{ print $1 }'
or, a bit more efficient:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e "$sourceStr" | gawk -v s="$str" '$0 ~ s { print NR; exit }'

How to count characters between tabs that are greater than 8000 in linux

I have a file for example file.dat.gz that is tab delimited.
For example
hi^Iapple^Itoast
Is it possible to count the characters between the tabs using wc?
The counts above would be 2, 5, 5, so the result would be 0; but if a field were longer than 8000 characters, could it report 1 or the exact length?
Doesn't need wc.
Set $IFS to a tab temporarily, on the same line as the read.
That way spaces are not treated as separators (cf. "a b c").
Read into an array, and loop each.
Test for length > 8000 and behave accordingly.
Here's a quick example you should be able to adapt.
$: IFS=$'\t' read -a lst < in
$: for x in "${lst[@]}"
> do l="${#x}"
> if (( l > 8000 ))
> then x='<too long>'
> fi
> printf "'%s' = %d\n" "$x" "$l"
> done
'hi' = 2
'a b c' = 5
'apple' = 5
'<too long>' = 10000
'toast' = 5
If you are processing a really big file, write it in awk or perl for better performance.
awk -F'\t' '{for (i=1; i<=NF;i++) if(length($i)>8000) print $i}'
Demo
$ echo -e "hi\tapple\ttoast" | awk -F'\t' '{for (i=1; i<=NF;i++) if(length($i)>2) print $i}'
apple
toast
$ echo -e "hi\tapple\ttoast" | awk -F'\t' '{print length($1) , length($2) , length($3)}'
2 5 5
$ echo -e "hi\tapple\ttoast"
hi	apple	toast
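If the exact length and position are wanted rather than the field contents, the same loop can report them. A sketch under stated assumptions: the limit is lowered to 4 so the made-up sample data triggers it, and the output format (line number, field number, length) is just one possible choice:

```shell
# Small stand-in for 8000 so that the sample input produces matches.
limit=4

out=$(printf 'hi\tapple\ttoast\nok\tbanana\n' |
  awk -F'\t' -v limit="$limit" '{
    # Check every tab-separated field on the current line.
    for (i = 1; i <= NF; i++)
      if (length($i) > limit)
        print NR, i, length($i)   # line number, field number, field length
  }')
echo "$out"
```

For the compressed file in the question, the input could be supplied with gzip -dc file.dat.gz in place of the printf, with limit set to 8000.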

awk ignores line with 0 only

I don't understand why awk prints nothing when the input is just 0 and I use a bare expression, e.g. to output the square of a number:
$ echo 4 | awk '$0=$1*$1'
16
$ echo 3 | awk '$0=$1*$1'
9
$ echo 0 | awk '$0=$1*$1'
Why do I get nothing on the last try?
PS. it works if I write $1 in a bracketed statement:
$ echo 0 | awk '{print $1*$1}'
0
No, awk does not ignore a line with 0.
However, your awk command: $0=$1*$1 does not do what you think.
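When a rule consists of a pattern with no action, awk prints $0 only if the pattern evaluates to true (a non-zero number or a non-empty string). Here the pattern is the assignment $0=$1*$1, whose value is 0 for an input of 0, so nothing is printed.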
So, this will always print $0:
awk '1'
And this will never print $0:
awk '0'
To do what you want: to always print $0 after it has been re-calculated, you need to do:
awk '{$0=$1*$1; print}'
And so:
$ echo "0" | awk '{$0=$1*$1; print}'
0
$ echo "2" | awk '{$0=$1*$1; print}'
4
Or, without changing the value of $0, do:
$ echo "2" | awk '{print $0*$0}'
Or (shorter but less readable):
$ echo "2" | awk '{$0=$0*$0}1'
4
And, even shorter:
$ echo "4" | awk '{$0*=$0}1'
16
This last awk script is actually composed of two pattern-action rules:
awk '
<default pattern> { $0*=$0 }
1 { <default action> }
'
Which becomes, once the default action is written out as a print and the default pattern as one that matches every line:
awk ' /.*/{$0*=$0}
1 {print $0}'
Both lines are applied to all input lines. For all lines $0 is changed, and for all input lines a print $0 is executed.
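The two behaviours can be compared side by side on the same input. A minimal sketch with made-up input values; the variable names are chosen here for illustration:

```shell
# Pattern only: the value of the assignment decides whether the line is printed,
# so the line containing 0 is silently dropped.
pattern_only=$(printf '4\n0\n3\n' | awk '$0 = $1 * $1')

# Action with an explicit print: every line is printed, including the 0.
always=$(printf '4\n0\n3\n' | awk '{ $0 = $1 * $1; print }')

echo "pattern only:";   echo "$pattern_only"
echo "explicit print:"; echo "$always"
```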

Sum/Average numbers in a single line - UNIX

I'm working on a small script to take 3 numbers in a single line, sum and average them, and print the result at the end of the line. I know how to use the paste command, but everything I'm finding is telling me how to average a column. I need to average a line, not a column. Any advice? Thanks!
awk to the rescue!
$ echo 1 2 3 | awk -v RS=' ' '{sum+=$1; count++} END{print sum, sum/count}'
6 2
works for any number of input fields
$ echo 1 2 3 4 | awk -v RS=' ' '{sum+=$1; count++} END{print sum, sum/count}'
10 2.5
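Since the question asks for the result at the end of each line, an alternative is to keep the default record separator and loop over the fields instead. This is a sketch, not the answer's method; the sum= and avg= labels and the two-decimal format are assumptions made for illustration:

```shell
out=$(printf '1 2 3\n4 5 8\n' | awk 'NF > 0 {
  sum = 0
  # Add up every field on the current line.
  for (i = 1; i <= NF; i++) sum += $i
  # Print the original line followed by its sum and average.
  printf "%s sum=%g avg=%.2f\n", $0, sum, sum / NF
}')
echo "$out"
```

The NF > 0 pattern skips empty lines, which would otherwise cause a division by zero.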
You can manipulate your line before giving it to bc. With bc you have additional possibilities such as setting the scale.
A simple mean from 1 2 3 would be
echo "1 2 3" | sed -e 's/\([0-9.]\) /\1+/g' -e 's/.*/(&)\/3/' | bc
You can wrap it in a function and see more possibilities:
function testit {
echo "Input $@"
echo "Integer mean"
echo "$@" | sed -e 's/\([0-9.]\) /\1+/g' -e 's/.*/(&)\/'$#'/' | bc
echo "floating decimal mean"
echo "$@" | sed -e 's/\([0-9.]\) /\1+/g' -e 's/.*/(&)\/'$#'/' | bc -l
echo "2 decimal output mean"
echo "$@" | sed -e 's/\([0-9.]\) /\1+/g' -e 's/.*/scale=2; (&)\/'$#'/' | bc
echo
}
testit 4 5 6
testit 4 5 8
testit 4.2 5.3 6.4
testit 1 2 3 4 5 6 7 8 9

Determine the number of characters in a variable

How can I determine the number of characters in a variable?
FOO="blabla.bla.blabla.bla."
--check--
echo $FOO # 4 dot
FOO="..bla.bla.bla.blabla.bla."
--check--
echo $FOO # 7 dot
You should try this:
echo ${#FOO}
${#VARIABLE_NAME} gives you the length of a string.
awk -F. '{print NF-1}' <<<$FOO
example:
kent$ FOO="blabla.bla.blabla.bla."
kent$ awk -F. '{print NF-1}' <<<$FOO
4
kent$ FOO="..bla.bla.bla.blabla.bla."
kent$ awk -F. '{print NF-1}' <<<$FOO
7
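The NF-1 trick works because, with the dot as field separator, n dots split the line into n+1 fields (possibly empty ones). A short sketch that also shows how this differs from the total string length; the variable names fields, dots and length are chosen here for illustration:

```shell
FOO="blabla.bla.blabla.bla."

# With -F. every dot separates two fields, so 4 dots give 5 fields.
fields=$(printf '%s\n' "$FOO" | awk -F. '{print NF}')
dots=$(printf '%s\n' "$FOO" | awk -F. '{print NF - 1}')

# ${#FOO} counts every character in the string, not just the dots.
length=${#FOO}

echo "fields=$fields dots=$dots length=$length"
```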
echo $FOO | tr -dc \\. | wc -c
Does that answer your question?
Strip the non-dots and count the length of the result.
$ x=..bla.bla.bla.blabla.bla.
$ _=${x//[^.]} count=${#_}; echo "$count"
7
$ printf -v _ %s%n "${x//[^.]}" count; echo "$count"
7
