Add Column Values with Shell

I have an input file that looks like the one below.
pmx . pmnosysrelspeechneighbr -m 1 -r
INFO: The ROP files contain suspected faulty counter values.
They have been discarded but can be kept with pmr/pmx option "k" (pmrk/pmxk) or highlighted with pmx option "s" (pmxs)
Date: 2017-11-04
Object Counter 14:45 15:00 15:15 15:30
UtranCell=UE1069XA0 pmNoSysRelSpeechNeighbr 0 1 0 0
UtranCell=UE1069XA1 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XA2 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XA3 pmNoSysRelSpeechNeighbr 0 0 2 0
UtranCell=UE1069XB0 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XB1 pmNoSysRelSpeechNeighbr 0 0 0 3
UtranCell=UE1069XB2 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XB3 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XC0 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XC1 pmNoSysRelSpeechNeighbr 0 0 0 4
UtranCell=UE1069XC2 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XC3 pmNoSysRelSpeechNeighbr 0 0 1 0
UtranCell=UE1164XA0 pmNoSysRelSpeechNeighbr 0 3 0 0
UtranCell=UE1164XA1 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1164XA2 pmNoSysRelSpeechNeighbr 1 0 0 0
Now I want the output below, which is the sum of each time column's values ($3 through $6).
Counter 14:45 15:00 15:15 15:30
pmNoSysRelSpeechNeighbr 1 4 3 7
I've been trying the command below, but it only gives the sum of a single column:
pmx . pmnosysrelspeechneighbr -m 1 -r | grep -i ^Object | awk '{sum += $4} END {print $1, sum}'

Try this; you will get both the header and a trailer line holding the sum of each column.
BEGIN {
    trail = "pmNoSysRelSpeechNeighbr";       # label for the trailer line
}
{
    # Reprint the header: counter name plus the four time columns
    if ($1 == "Object") print $2 OFS $3 OFS $4 OFS $5 OFS $6;
    # Accumulate each time column across all UtranCell rows
    else if ($1 ~ /^UtranCell/) {
        w += $3; x += $4; y += $5; z += $6;
    }
}
END {
    print trail OFS w OFS x OFS y OFS z;     # emit the trailer with the four sums
}
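Save the script as, say, sum.awk (the file name is just an example) and feed it the pmx output:
pmx . pmnosysrelspeechneighbr -m 1 -r | awk -f sum.awk
With the sample data above it should print:
Counter 14:45 15:00 15:15 15:30
pmNoSysRelSpeechNeighbr 1 4 3 7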

You need to sum each of the columns separately:
awk -v g=pmNoSysRelSpeechNeighbr '$0 ~ g { for(i=3;i<=6;i++) sum[i]+=$i }
END { printf g; for(i=3;i<=6;i++) printf OFS sum[i]; print "" }' file
but only for lines (records) containing the group (counter) of interest ($0~"pmNoSysRelSpeechNeighbr").
Note that you (almost) never need to pipe grep output into awk, because awk already supports filtering with extended regular expressions: /regex/ { action }, or var ~ /regex/ { action }. One exception would be the need for PCRE (grep -P). A grep prefilter usually collapses into the awk program itself, as in the sketch below.
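A minimal sketch against the question's header line (with file standing in for the saved pmx output):
awk '/^Object/ { print $2, $3, $4, $5, $6 }' file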
As an alternative to awk for simple "command-line statistical operations" on textual files, you could also use GNU datamash.
For example, to sum columns 3 to 6, but group by column 2:
grep 'UtranCell' file | datamash -W -g2 sum 3-6
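With the sample data from the question, this should print the group name followed by the four per-column sums (tab-separated by default):
pmNoSysRelSpeechNeighbr	1	4	3	7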

Related

Replace values of one column based on other column conditions in shell

I have the tab-separated text file below. I want to match values in column 2 and replace the values in column 5: if column 2 contains X or Y, column 5 should become 1, as in the result below.
1:935662:C:CA 1 0 935662 0
1:941119:A:G 2 0 941119 0
1:942934:G:C 3 0 942934 0
1:942951:C:T X 0 942951 0
1:943937:C:T X 0 943937 0
1:944858:A:G Y 0 944858 0
1:945010:C:A X 0 945010 0
1:946247:G:A 1 0 946247 0
result:
1:935662:C:CA 1 0 935662 0
1:941119:A:G 2 0 941119 0
1:942934:G:C 3 0 942934 0
1:942951:C:T X 0 942951 1
1:943937:C:T X 0 943937 1
1:944858:A:G Y 0 944858 1
1:945010:C:A X 0 945010 1
1:946247:G:A 1 0 946247 0
I tried awk -F'\t' '{ $5 = ($2 == X ? 1 : $2) } 1' OFS='\t' file.txt but I am not sure how to match both X and Y in one step.
With awk:
awk 'BEGIN{FS=OFS="\t"} $2=="X" || $2=="Y"{$5="1"}1' file
Output:
1:935662:C:CA 1 0 935662 0
1:941119:A:G 2 0 941119 0
1:942934:G:C 3 0 942934 0
1:942951:C:T X 0 942951 1
1:943937:C:T X 0 943937 1
1:944858:A:G Y 0 944858 1
1:945010:C:A X 0 945010 1
1:946247:G:A 1 0 946247 0
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
Assuming you want $5 to be zero (as opposed to remaining unchanged) if the condition is false: the comparison $2 ~ /^[XY]$/ evaluates to 1 when it matches and 0 when it does not, and that value is assigned straight to $5:
$ awk 'BEGIN{FS=OFS="\t"} {$5=($2 ~ /^[XY]$/)} 1' file
1:935662:C:CA 1 0 935662 0
1:941119:A:G 2 0 941119 0
1:942934:G:C 3 0 942934 0
1:942951:C:T X 0 942951 1
1:943937:C:T X 0 943937 1
1:944858:A:G Y 0 944858 1
1:945010:C:A X 0 945010 1
1:946247:G:A 1 0 946247 0

How to find column values and replace in bash

I could do this easily in R with grepl and row indexing, but wanted to try this in shell. I have a text file that looks like the sample below. I would like to find rows that match TWGX and, wherever it matches, concatenate column 1 and column 2 separated by _ and make that the value of both column 1 and column 2.
text:
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 1 1
TWGX-MAP 10064-8036056040 0 0 0 -9
TWGX-MAP 11570-8036056502 0 0 0 -9
TWGX-MAP 11680-8036055912 0 0 0 -9
This is the result I want:
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 1 1
TWGX-MAP_10064-8036056040 TWGX-MAP_10064-8036056040 0 0 0 -9
TWGX-MAP_11570-8036056502 TWGX-MAP_11570-8036056502 0 0 0 -9
TWGX-MAP_11680-8036055912 TWGX-MAP_11680-8036055912 0 0 0 -9
The regex /TWGX/ selects the lines containing that string and applies the action that follows. The 1 is an awk shorthand that prints both the modified and the unmodified lines.
$ awk 'BEGIN{FS=OFS="\t"} /TWGX/ {tmp = $1 "_" $2; $1 = $2 = tmp}1' file
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 1 1
TWGX-MAP_10064-8036056040 TWGX-MAP_10064-8036056040 0 0 0 -9
TWGX-MAP_11570-8036056502 TWGX-MAP_11570-8036056502 0 0 0 -9
TWGX-MAP_11680-8036055912 TWGX-MAP_11680-8036055912 0 0 0 -9
BEGIN { FS = OFS = "\t" }
# Just once, before processing the file, set FS (the input field separator) and OFS (the output field separator) to the tab character
/TWGX/ {tmp = $1 "_" $2; $1 = $2 = tmp}
# For every line that contains a match for TWGX, create a mashup of the first two columns and assign it to each of columns 1 and 2. (Note that in awk, string concatenation is done by simply putting expressions next to one another)
1
# This is an awk idiom that consists of the pattern 1, which is always true. By not explicitly specifying an action to go with that pattern, the default action of printing the whole line will be executed
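A quick, self-contained demonstration of that juxtaposition-based concatenation, using one of the question's own rows:
$ echo 'TWGX-MAP 10064-8036056040' | awk '{ print $1 "_" $2 }'
TWGX-MAP_10064-8036056040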

Combine count files into one file and keep zero values

I have multiple count files that look like this:
File1.tab
6 10 0
49 0 53
15 0 15
0 0 0
0 0 0
0 0 0
Other file:
File2.tab
3 1 2
29 0 29
4 0 4
0 0 0
0 0 0
0 0 0
I have over 30 files and I want to combine the second column of each file into one big file.
I know this question has already been asked, and I found a similar one here: How to combine column from multiple text files?
I used the answer from the previous question for my problem:
paste *.tab | awk '{i=2;while($i); {printf("%d ",$i);i+=3}printf("\n")}'
The problem is that zero values are not printed; I get something like this:
10 1
and I want something like this:
10 1
0 0
0 0
0 0
0 0
0 0
I checked the printf format specifiers, but none worked. How can I solve this problem?
You picked a bad "answer" to build on: in awk, while($i) treats a field whose value is 0 as false, so that loop stops at the first zero it meets, which is exactly why your zeros disappear. Try this:
paste *.tab |
awk '{for (i=2; i<=NF; i+=3) printf "%s%s", (i>2?OFS:""), $i; print ""}'
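With the two sample files above, paste lines them up side by side, so the second column of each input lands at fields 2, 5, 8, and so on, which is what the i+=3 stride walks over:
$ paste File1.tab File2.tab |
  awk '{for (i=2; i<=NF; i+=3) printf "%s%s", (i>2?OFS:""), $i; print ""}'
10 1
0 0
0 0
0 0
0 0
0 0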

How to find sum of elements in column inside of a text file (Bash)

I have a log file with lots of unnecessary information. The only important part of that file is a table that describes some statistics. My goal is to have a script that accepts a column name as an argument and returns the sum of all the elements in the specified column.
Example log file:
.........
Skipped....
........
WARNING: [AA[409]: Some bad thing happened.
--- TOOL_A: READING COMPLETED. CPU TIME = 0 REAL TIME = 2
--------------------------------------------------------------------------------
----- TOOL_A statistics -----
--------------------------------------------------------------------------------
NAME Attr1 Attr2 Attr3 Attr4 Attr5
--------------------------------------------------------------------------------
AAA 885 0 0 0 0
AAAA2 1 0 2 0 0
AAAA4 0 0 2 0 0
AAAA8 0 0 2 0 0
AAAA16 0 0 2 0 0
AAAA1 0 0 2 0 0
AAAA8 0 0 23 0 0
AAAAAAA4 0 0 18 0 0
AAAA2 0 0 14 0 0
AAAAAA2 0 0 21 0 0
AAAAA4 0 0 23 0 0
AAAAA1 0 0 47 0 0
AAAAAA1 2 0 26 0
NOTE: Some notes
......
Skipped ......
The expected usage: script.sh Attr1
Expected output:
888
I've tried to find something with sed/awk but failed to figure out a solution.
tldr;
$ cat myscript.sh
#!/bin/sh
logfile=${1}
attribute=${2}
field=$(grep -o "NAME.\+${attribute}" ${logfile} | wc -w)
sed -nre '/NAME/,/NOTE/{/NAME/d;/NOTE/d;s/\s+/\t/gp;}' ${logfile} | \
cut -f${field} | \
paste -sd+ | \
bc
$ ./myscript.sh mylog.log Attr3
182
Explanation:
assign command-line arguments ${1} and ${2} to the logfile and attribute variables, respectively.
with wc -w, count the words in the header line from NAME through ${attribute}; that word count is the field index, which is assigned to field
with sed
suppress automatic printing (-n) and enable extended regular expressions (-r)
find lines between the NAME and NOTE lines, inclusive
delete the lines that match NAME and NOTE
translate each contiguous run of whitespace to a single tab and print the result
cut out the column using that field index
paste all of its numbers into a single infix summation (joined with +)
evaluate the infix summation via bc, as demonstrated below
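For instance, here is the paste -sd+ | bc step on its own, fed the three Attr1 values from the sample table:
$ printf '885\n1\n2\n' | paste -sd+ | bc
888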
Quick and dirty (in the absence of any further spec):
awk -v CountCol=2 '/^[^[:blank:]]/ && NF == 6 { S += $(CountCol) } END { print S + 0 }' YourFile
and with the column name instead of its index:
awk -v ColName='Attr1' '
    /^[[:blank:]]/  && NF == 6 { for (i=1; i<=NF; i++) if ($i == ColName) CountCol = i }
    /^[^[:blank:]]/ && NF == 6 && CountCol { S += $(CountCol) }
    END { print S + 0 }' YourFile
You should add a header/trailer filter to avoid noisy lines (a flag would suit this perfectly), but there is not enough information about the file's structure to set such a flag, so I use a simple field count instead (assuming text fields evaluate to 0 as numbers, so they do not change the sum when they are taken into account).
$ awk -v col='Attr3' '
    /NAME/ { for (i=1; i<=NF; i++) f[$i] = i }   # map the header names to field indexes
    col in f { sum += $(f[col])                  # accumulate the requested column
               if (!NF) { print sum+0; exit } }  # a blank line ends the table: print and stop
  ' file
182

Insert new lines with missing values in an array

I have data like the below:
2016-07-25:06 5
2016-07-25:07 1
2016-07-25:08 1
2016-07-25:09 2
2016-07-25:10 1
2016-07-25:11 1
2016-07-25:13 9
2016-07-25:14 1
From the above, I need to display all hours from 00 through 23, like below:
2016-07-25:00 0
2016-07-25:01 0
2016-07-25:02 0
2016-07-25:03 0
2016-07-25:04 0
2016-07-25:05 0
2016-07-25:06 5
2016-07-25:07 1
2016-07-25:08 1
2016-07-25:09 2
2016-07-25:10 1
2016-07-25:11 1
2016-07-25:12 0
2016-07-25:13 9
2016-07-25:14 1
2016-07-25:15 0
2016-07-25:16 0
2016-07-25:17 0
2016-07-25:18 0
2016-07-25:19 0
2016-07-25:20 0
2016-07-25:21 0
2016-07-25:22 0
2016-07-25:23 0
Could you please let me know how I can achieve this using awk?
Thank you!
Using awk you can do this; for each input line it first prints a zero line for every missing hour before it, then in the END block pads the remaining hours out to 23:
awk -F '[:[:blank:]]+' '{for (;i<$2; i++) printf "%s:%02d\t0\n", $1, i; print; i++; s=$1}
END{for (;i<24; i++) printf "%s:%02d\t0\n", s, i}' file
2016-07-25:00 0
2016-07-25:01 0
2016-07-25:02 0
2016-07-25:03 0
2016-07-25:04 0
2016-07-25:05 0
2016-07-25:06 5
2016-07-25:07 1
2016-07-25:08 1
2016-07-25:09 2
2016-07-25:10 1
2016-07-25:11 1
2016-07-25:12 0
2016-07-25:13 9
2016-07-25:14 1
2016-07-25:15 0
2016-07-25:16 0
2016-07-25:17 0
2016-07-25:18 0
2016-07-25:19 0
2016-07-25:20 0
2016-07-25:21 0
2016-07-25:22 0
2016-07-25:23 0
$ cat tst.awk
BEGIN { FS = "[:[:space:]]+" }
function prt(   i) {
    if ( NR > 1 ) {
        # print all 24 hours for the date just finished; hours absent from val print as 0
        for (i=0; i<=23; i++) {
            printf "%s:%02d%s%d\n", prev, i, OFS, val[prev,i]
        }
        delete val
    }
}
$1 != prev { prt() }            # the date changed: flush the completed date
{ val[$1,$2+0]=$3; prev=$1 }    # remember this hour's count
END { prt() }                   # flush the final date
$ awk -f tst.awk file
2016-07-25:00 0
2016-07-25:01 0
2016-07-25:02 0
2016-07-25:03 0
2016-07-25:04 0
2016-07-25:05 0
2016-07-25:06 5
2016-07-25:07 1
2016-07-25:08 1
2016-07-25:09 2
2016-07-25:10 1
2016-07-25:11 1
2016-07-25:12 0
2016-07-25:13 9
2016-07-25:14 1
2016-07-25:15 0
2016-07-25:16 0
2016-07-25:17 0
2016-07-25:18 0
2016-07-25:19 0
2016-07-25:20 0
2016-07-25:21 0
2016-07-25:22 0
2016-07-25:23 0
This uses more tools than just awk, but it might be helpful:
#!/bin/bash
date="2016-07-25"   # or compute the date you are interested in
# Generate a zero line for every hour, then drop the hours that already
# appear in the data (grep -v reads each line of the substitution as a pattern)
remaining=$(for i in 0{0..9} {10..23}; do echo "$date:$i 0"; done |
            grep -v "$(awk '{print $1}' datafile)")
# Add the original data and sort the lines
echo -e "$remaining\n$(cat datafile)" | sort -n
