How to add all values in a certain column? - shell

I want to add all the 3rd fields from each line and produce the result.
Below is the way I solved the problem
sum=0
grep '2016Feb' input.txt|awk -F\- '{print $3}'|while read LINE; do
sum = $(expr $sum + $LINE)
done
echo $sum
Is there a better way of solving the problem than my code? Possible a command that solves the problem # command line itself?
For a file like:
$ cat input.txt
Feb2016-2016-110
Feb2016-2016-20
Feb2016-2016-220
Feb2016-2016-140
Feb2016-2016-100
The output is: 590.

Just set the field separator to the dash and sum the third column:
$ awk -F- '{sum+=$3} END{print sum+0}' file
590 ^^
# in case there are no matching lines, print 0
Since it looks like you are just counting those lines that contain the text "Feb2016", you can also add a filter:
awk -F- '/Feb2016/{sum+=$3} END{print sum+0}' file
# ^^^^^^^^^
# just on lines containing the string "Feb2016"

$ cat data
Feb2016-2016-110
Feb2016-2016-20
Feb2016-2016-220
Feb2016-2016-140
Feb2016-2016-100
$ cut -d - -f 3 data | paste -s -d '+' | bc
590
$

Related

Grep - Getting the character position in the line of each occurrence

According to the manual, the option -b can give the byte offset of a given occurence, but it seems to start from the beginning of the parsed content.
I need to retrieve the position of each matching content returned by grep. I used this line, but it's quite ugly:
grep '<REGEXP>' | while read -r line ; do echo $line | grep -bo '<REGEXP>' ; done
How to get it done in a more elegant way, with a more efficient use of GNU utils?
Example:
$ echo "abcdefg abcdefg" > test.txt
$ grep 'efg' | while read -r line ; do echo $line | grep -bo 'efg' ; done < test.txt
4:efg
12:efg
(Indeed, this command line doesn't output the line number, but it's not difficult to add it.)
With any awk (GNU or otherwise) in any shell on any UNIX box:
$ awk -v re='efg' -v OFS=':' '{
end = 0
while( match(substr($0,end+1),re) ) {
print NR, end+=RSTART, substr($0,end,RLENGTH)
end+=RLENGTH-1
}
}' test.txt
1:5:efg
1:13:efg
All strings, fields, array indices in awk start at 1, not zero, hence the output not looking like yours since to awk your input string is:
123456789012345
abcdefg abcdefg
rather than:
012345678901234
abcdefg abcdefg
Feel free to change the code above to end+=RSTART-1 and end+=RLENGTH if you prefer 0-indexed strings.
Perl is not a GNU util, but can solve your problem nicely:
perl -nle 'print "$.:$-[0]" while /efg/g'

Count number of Special Character in Unix Shell

I have a delimited file that is separated by octal \036 or Hexadecimal value 1e.
I need to count the number of delimiters on each line using a bash shell script.
I was trying to use awk, not sure if this is the best way.
Sample Input (| is a representation of \036)
Example|Running|123|
Expected output:
3
awk -F'|' '{print NF-1}' file
Change | to whatever separator you like. If your file can have empty lines then you need to tweak it to:
awk -F'|' '{print (NF ? NF-1 : 0)}' file
You can try
awk '{print gsub(/\|/,"")}'
Simply try
awk -F"|" '{print substr($3,length($3))}' OFS="|" Input_file
Explanation: Making field separator -F as | and then printing the 3rd column by doing $3 only as per your need. Then setting OFS(output field separator) to |. Finally mentioning Input_file name here.
This will work as far as I know
echo "Example|Running|123|" | tr -cd '|' | wc -c
Output
3
This should work for you:
awk -F '\036' '{print NF-1}' file
3
-F '\036' sets input field delimiter as octal value 036
Awk may not be the best tool for this. Gnu grep has a cool -o option that prints each matching pattern on a separate line. You can then count how many matching lines are generated for each input line, and that's the count of your delimiters. E.g. (where ^^ in the file is actually hex 1e)
$ cat -v i
a^^b^^c
d^^e^^f^^g
$ grep -n -o $'\x1e' i | uniq -c
2 1:
3 2:
if you remove the uniq -c you can see how it's working. You'll get "1" printed twice because there are two matching patterns on the first line. Or try it with some regular ascii characters and it becomes clearer what the -o and -n options are doing.
If you want to print the line number followed by the field count for that line, I'd do something like:
$grep -n -o $'\x1e' i | tr -d ':' | uniq -c | awk '{print $2 " " $1}'
1 2
2 3
This assumes that every line in the file contains at least one delimiter. If that's not the case, here's another approach that's probably faster too:
$ tr -d -c $'\x1e\n' < i | awk '{print length}'
2
3
0
0
0
This uses tr to delete (-d) all characters that are not (-c) 1e or \n. It then pipes that stream of data to awk which just counts how many characters are left on each line. If you want the line number, add " | cat -n" to the end.

bash - how do I use 2 numbers on a line to create a sequence

I have this file content:
2450TO3450
3800
4500TO4560
And I would like to obtain something of this sort:
2450
2454
2458
...
3450
3800
4500
4504
4508
..
4560
Basically I would need a one liner in sed/awk that would read the values on both sides of the TO separator and inject those in a seq command or do the loop on its own and dump it in the same file as a value per line with an arbitrary increment, let's say 4 in the example above.
I know I can use several one temp file, go the read command and sorts, but I would like to do it in a one liner starting with cat filename | etc. as it is already part of a bigger script.
Correctness of the input is guaranteed so always left side of TOis smaller than bigger side of it.
Thanks
Like this:
awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}' file
or, if you like starting with cat:
cat file | awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}'
Something like this might work:
awk -F TO '{system("seq " $1 " 4 " ($2 ? $2 : $1))}'
This would tell awk to system (execute) the command seq 10 4 10 for lines just containing 10 (which outputs 10), and something like seq 10 4 40 for lines like 10TO40. The output seems to match your example.
Given:
txt="2450TO3450
3800
4500TO4560"
You can do:
echo "$txt" | awk -F TO '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i++) print i}'
If you want an increment greater than 1:
echo "$txt" | awk -F TO -v p=4 '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i+=p) print i}'
Give a try to this:
sed 's/TO/ /' file.txt | while read first second; do if [ ! -z "$second" ] ; then seq $first 4 $second; else printf "%s\n" $first; fi; done
sed is used to replace TO with space char.
read is used to read the line, if there are 2 numbers, seq is used to generate the sequence. Otherwise, the uniq number is printed.
This might work for you (GNU sed):
sed -r 's/(.*)TO(.*)/seq \1 4 \2/e' file
This evaluates the RHS of the substitution command if the LHS contains TO.

Print last line of text file

I have a text file like this:
1.2.3.t
1.2.4.t
complete
I need to print the last non blank line and two line to last as two variable. the output should be:
a=1.2.4.t
b=complete
I tried this for last line:
b=awk '/./{line=$0} END{print line}' myfile
but I have no idea for a.
grep . file | tail -n 2 | sed 's/^ *//;1s/^/a=/;2s/^/b=/'
Output:
a=1.2.4.t
b=complete
awk to the rescue!
$ awk 'NF{a=b;b=$0} END{print "a="a;print "b="b}' file
a=1.2.4.t
b=complete
Or, if you want to the real variable assignment
$ awk 'NF{a=b;b=$0} END{print a, b}' file
| read a b; echo "a="$a; echo "b="$b
a=1.2.4.t
b=complete
you may need -r option for read if you have backslashes in the values.

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns and I want and append them to a new file
It works but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array. Using command line arguments doesn't seem like the way to go. ANY COMMENTS ARE WELCOME.
# Takes :: separated file as 1st parameters
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch $TARGET
echo #userId,itemId > $TARGET
IFS=","
while read LINE
do
# Replaces all matches of :: with a ,
CSV_LINE=${LINE//::/,}
set -- $CSV_LINE
echo "$1,$2" >> $TARGET
done < $SOURCE
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat. Replace with
a comma-separated list of the numbered columns you do want.
awk 'BEGIN { IFS='::'; OFS=","; } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a 1 liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or to avoid a UUOC award (and prolong the life of your keyboard by 3 characters
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here, it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$

Resources