Sum Values in Shell Script - bash

I have the following values in this format in a file:
Filevalue.txt
abc
123
dev
456
hij
567
123
542
I need to add the numerical values from the character values which are below it
Output
abc 123
dev 456
hij 1232
Anyhelp will be deeply appreciated?

Here's another awk that mantains the order.
Example
$ awk '/[0-9]+/{a+=$1;next}a{print a;a=0}{printf($1FS);a=0}END{print a}' file
abc 123
dev 456
hij 1232
Explanation
/[0-9]+/{a+=$1;next}:When a number is detected as the content of the record, it's value is accumulated into a var, then next is used to stop further processing and pass the flow to the next record.
a{print a;a=0}: ONLY when the counter is not null we print a value that correspond to the previous word and initialize it.
{printf($1FS);a=0}: Print current record and the separator avoiding the carriage return. This is applied to all text records.
END{print a}: Show the final counter that will remain after the last record.

It's possible to do this by using awk and arrays:
$ awk '{if($1!~/^[0-9]+$/){cur=$1}else{sums[cur]+=$1}}END{for(el in sums){printf("%s %d\n",el,sums[el])}}' Filevalue.txt
hij 1232
dev 456
abc 123
Here is the same code but written to the file sum.awk and with comments:
# this block will be executed for every line in the file
{
# if current processing line is not a number, then it's a name of the next
# group, let's save it to cur variable
if ($1 !~ /^[0-9]+$/) {
cur = $1
} else {
# here we're summing values
sums[cur] += $1
}
}
# this block will be executed at the end of file processing, here we're
# printing array with sums
END {
for (el in sums) {
printf("%s %d\n", el, sums[el])
}
}
Use it like this:
$ awk -f sum.awk Filevalue.txt
hij 1232
dev 456
abc 123
The only downside of using awk here is that it doesn't preserve keys order.

Related

awk to get first column if the a specific number in the line is greater than a digit

I have a data file (file.txt) contains the below lines:
123 pro=tegs, ETA=12:00, team=xyz,user1=tom,dom=dby.com
345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00
456 team=efg, pro=bvy,ETA=22:00,dom=sss.co.uk,user2=lis
I'm expecting to get the first column ($1) only if the ETA= number is greater than 15, like here I will have 2nd and 3rd line first column only is expected.
345
456
I tried like cat file.txt | awk -F [,TPF=]' '{print $1}' but its print whole line which has ETA at the end.
Using awk
$ awk -F"[=, ]" '{for (i=1;i<NF;i++) if ($i=="ETA") if ($(i+1) > 15) print $1}' input_file
345
456
With your shown samples please try following GNU awk code. Using match function of GNU awk where I am using regex (^[0-9]+).*ETA=([0-9]+):[0-9]+ which creates 2 capturing groups and saves its values into array arr. Then checking condition if 2nd element of arr is greater than 15 then print 1st value of arr array as per requirement.
awk '
match($0,/(^[0-9]+).*\<ETA=([0-9]+):[0-9]+/,arr) && arr[2]+0>15{
print arr[1]
}
' Input_file
I would harness GNU AWK for this task following way, let file.txt content be
123 pro=tegs, ETA=12:00, team=xyz,user1=tom,dom=dby.com
345 pro=rbs, team=abc,user1=chan,dom=sbc.int,ETA=23:00
456 team=efg, pro=bvy,ETA=02:00,dom=sss.co.uk,user2=lis
then
awk 'substr($0,index($0,"ETA=")+4,2)+0>15{print $1}' file.txt
gives output
345
Explanation: I use String functions, index to find where is ETA= then substr to get 2 characters after ETA=, 4 is used as ETA= is 4 characters long and index gives start position, I use +0 to convert to integer then compare it with 15. Disclaimer: this solution assumes every row has ETA= followed by exactly 2 digits.
(tested in GNU Awk 5.0.1)
Whenever input contains tag=value pairs as yours does, it's best to first create an array of those mappings (v[]) below and then you can just access the values by their tags (names):
$ cat tst.awk
BEGIN {
FS = "[, =]+"
OFS = ","
}
{
delete v
for ( i=2; i<NF; i+=2 ) {
v[$i] = $(i+1)
}
}
v["ETA"]+0 > 15 {
print $1
}
$ awk -f tst.awk file
345
456
With that approach you can trivially enhance the script in future to access whatever values you like by their names, test them in whatever combinations you like, output them in whatever order you like, etc. For example:
$ cat tst.awk
BEGIN {
FS = "[, =]+"
OFS = ","
}
{
delete v
for ( i=2; i<NF; i+=2 ) {
v[$i] = $(i+1)
}
}
(v["pro"] ~ /b/) && (v["ETA"]+0 > 15) {
print $1, v["team"], v["dom"]
}
$ awk -f tst.awk file
345,abc,sbc.int
456,efg,sss.co.uk
Think about how you'd enhance any other solution to do the above or anything remotely similar.
It's unclear why you think your attempt would do anything of the sort. Your attempt uses a completely different field separator and does not compare anything against the number 15.
You'll also want to get rid of the useless use of cat.
When you specify a column separator with -F that changes what the first column $1 actually means; it is then everything before the first occurrence of the separator. Probably separately split the line to obtain the first column, space-separated.
awk -F 'ETA=' '$2 > 15 { split($0, n, /[ \t]+/); print n[1] }' file.txt
The value in $2 will be the data after the first separator (and up until the next one) but using it in a numeric comparison simply ignores any non-numeric text after the number at the beginning of the field. So for example, on the first line, we are actually literally checking if 12:00, team=xyz,user1=tom,dom=dby.com is larger than 15 but it effectively checks if 12 is larger than 15 (which is obviously false).
When the condition is true, we split the original line $0 into the array n on sequences of whitespace, and then print the first element of this array.
Using awk you could match ETA= followed by 1 or more digits. Then get the match without the ETA= part and check if the number is greater than 15 and print the first field.
awk '/^[0-9]/ && match($0, /ETA=[0-9]+/) {
if(substr($0, RSTART+4, RLENGTH-4)+0 > 15) print $1
}' file
Output
345
456
If the first field should start with a number:
awk '/^[0-9]/ && match($0, /ETA=[0-9]+/) {
if(substr($0, RSTART+4, RLENGTH-4) > 15)+0 print $1
}' file

Searching a string and replacing another string above the searched string

I have a file with the lines below
123
456
123
789
abc
efg
xyz
I need to search with abc and replace immediate above 123 with 111. This is the requirement, abc is only one occurrence in the file but 123 can be multiple occurrences and 123 can be at any position above abc.
Please help me.
I have tried with below sed command
sed -i.bak "/abc/!{x;1!p;d;};x;s/123/1111" filename
With the above command, it is only replacing 123, if 123 is just above abc, if 123 is 2 lines above abc then replace is failing.
There's more than on way to do it. Here's one:
sed -i.bak '1{h;d;};/123/{x;p;d;};/abc/{x;s/123/111/;p;d;};H;${x;p;};d' filename
ed comes in handy for complex editing of files in scripts:
ed -s file <<EOF
/^abc$/;?^123$?;.c
111
.
w
EOF
This: Sets the current line to the first one matching abc (/^abc$/;). Then changes the first line before that point that matches 123 to 111 (?XXX? searches backwards for a matching regular expression, and ?^123$?;. selects that single line for c to change) and finally saves the modified file.
This is a classic case where you keep track of your previous line and change stuff depeinding on conditions satisfying the current line. Genearlly, an awk program looks like this:
awk '(FNR==1){prev=$0; next}
(condition_on_$0) { action_on_prev }
{ print prev; prev = $0 }
END { print $0 }'
So in the case of the OP, this would read:
awk '(FNR==1){prev=$0; next}
$0 == "abc" { if (prev == "123") prev = "111" }
{ print prev; prev = $0 }
END { print $0 }'
This might work for you (GNU sed):
sed -Ez 's/(.*)(\n123.*\nabc)/\1\n111\2/' file
This slurps the file into memory and inserts 111 in front of the last occurrence of 123 before abc.
A less memory intensive solution:
sed -E '/^123$/{:a;N;/\n123$/{h;s///p;g;s/.*\n//;ba};/\nabc$/!ba;s/^/111\n/}' file
This gathers up lines following a line containing 123. If another line containing 123 is encountered it offloads all lines before it and begins gathering lines again. If it finds a line containing abc it inserts 111 at the front of the lines gathered so far.
Another alternative:
sed '/abc/{x;/./{s/^/111\n/p;z};x;b};/123/{x;/./p;x;h;$!d;b};x;/./{x;H;$!d};x' file
$ tac file | awk 'f && sub(/123/,"111"){f=0} /abc/{f=1} 1' | tac
123
456
111
789
abc
efg
xyz

BASH - Split file into several files based on conditions

I have a file (input.txt) with the following structure:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
...
I would like to split this file into multiple files (day.txt; month.txt; ...). Each new text file would contain all "header" lines (the one starting with >) and their content (lines between two header lines).
day.txt would therefore be:
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
and month.txt:
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
I cannot use split -l in this case because the amount of lines is not the same for each category (day, month, etc.). However, each sub-category has the same number of lines (=3).
EDIT: As per OP adding 1 more solution now.
awk -F'[>_]' '/^>/{file=$2".txt"} {print > file}' Input_file
Explanation:
awk -F'[>_]' ' ##Creating field separator as > or _ in current lines.
/^>/{ file=$2".txt" } ##Searching a line which starts with > if yes then creating a variable named file whose value is 2nd field".txt"
{ print > file } ##Printing current line to variable file(which will create file name of variable file's value).
' Input_file ##Mentioning Input_file name here.
Following awk may help you on same.
awk '/^>day/{file="day.txt"} /^>month/{file="month.txt"} {print > file}' Input_file
You can set the record separator to > and then just set the file name based on the category given by $1.
$ awk -v RS=">" 'NF {f=$1; sub(/_.*$/, ".txt", f); printf ">%s", $0 > f}' input.txt
$ cat day.txt
>day_1
ABC
DEF
GHI
>day_2
JKL
MNO
PQR
>day_3
STU
VWX
YZA
$ cat month.txt
>month_1
BCD
EFG
HIJ
>month_2
KLM
NOP
QRS
Here's a generic solution for >name_number format
$ awk 'match($0, /^>[^_]+_/){k = substr($0, RSTART+1, RLENGTH-2);
if(!(k in a)){close(op); a[k]; op=k".txt"}}
{print > op}' ip.txt
match($0, /^>[^_]+_/) if line matches >name_ at start of line
k = substr($0, RSTART+1, RLENGTH-2) save the name portion
if(!(k in a)) if the key is not found in array
a[k] add key to array
op=k".txt" output file name
close(op) in case there are too many files to write
print > op print input record to filename saved in op
Since each subcategory is composed of the same amount of lines, you can use grep's -A / --after flag to specify that number of lines to match after a header.
So if you know in advance the list of categories, you just have to grep the headers of their subcategories to redirect them with their content to the correct file :
lines_by_subcategory=3 # number of lines *after* a subcategory's header
for category in "month" "day"; do
grep ">$category" -A $lines_by_subcategory input.txt >> "$category.txt"
done
You can try it here.
Note that this isn't the most efficient solution as it must browse the input once for each category. Other solutions could instead browse the content and redirect each subcategory to their respective file in a single pass.

awk: if condition is met, add string to these lines

I am trying to combine these two commands:
awk -F[' '] '{if ($1=="string" || $4=="string") print $0" ""blue"}' file >file.out
awk -F[' '] '{if ($1!="string" && $4!="string") print $0}' file >>file.out
Basically I want to add a column, but only print blue in that column if either the first or the fourth column are equal to a string.
input:
string 123 452 abc
def 420 902 ghi
expected output:
string 123 452 abc blue
def 420 902 ghi
You can do it in a single pass like
awk '$1 == "string" || $4 == "string" {print $0 " blue"; next} {print}' file > file.out
which will test if either the first or forth field is string and will print the line followed by blue if so. We use next then to not process the next block (rather than doing the negated test we just did). Any line we didn't print with blue on, we just print as it was.
Another awk solution:
$ awk '$1=="string" || $4=="string"{ NF+=1; $NF="blue" }1' file
string 123 452 abc blue
def 420 902 ghi
awk '{print $0 (($1 == "string" || $4 == "string") ? OFS "blue":"")}' infile
Another similar alternative:
awk '$1=="string" || $4=="string" {$0=$0" blue"} 1' infile
Output:
string 123 452 abc blue
def 420 902 ghi
Following may also help you, where I am not mentioning the field numbers in check condition(considering that your Input_file is same as shown sample), so it looks if a line starts from string "string" and ends from string "string" then it prints blue else print normal line.
awk '/^string/||/string$/{print $0,"blue";next} 1' Input_file

Creating a mapping count

I have this data with two columns
Id Users
123 2
123 1
234 5
234 6
34 3
I want to create this count mapping from the given data like this
123 3
234 11
34 3
How can I do it in bash?
You have to use associative arrays, something like
declare -A newmap
newmap["123"]=2
newmap["123"]=$(( ${newmap["123"]} + 1))
obviously you have to iterate through your input, see if the entry exists then add to it, else initialize it
It will be easier with awk.
Solution 1: Doesn't expect the file to be sorted. Stores entire file in memory
awk '{a[$1]+=$2}END{for(x in a) print x,a[x]}' file
34 3
234 11
123 3
What we are doing here is using the first column as key and adding second column as value. In the END block we iterate over our array and print the key=value pair.
If you have the Id Users line in your input file and want to exclude it from the output, then add NR>1 condition by saying:
awk 'NR>1{a[$1]+=$2}END{for(x in a) print x,a[x]}' file
NR>1 is telling awk to skip the first line. NR contains the line number so we instruct awk to start creating our array from second line onwards.
Solution 2: Expects the file to be sorted. Does not store the file in memory.
awk '$1!=prev && NR>1{print prev,sum}{prev=$1; sum+=$2}END{print prev,sum}' file
123 3
234 14
34 17
If you have the Id Users line in your input file and want to exclude it from the output, then add NR>1 condition by saying:
awk '$1!=prev && NR>2{print prev, sum}NR>1{prev = $1; sum+=$2}END{print prev, sum}' ff
123 3
234 14
34 17
A Bash (4.0+) solution:
declare -Ai count
while read a b ; do
count[$a]+=b
done < "$infile"
for idx in ${!count[#]}; do
echo "${idx} ${count[$idx]}"
done
For a sorted output the last line should read
done | sort -n

Resources