I have a file that looks like this:
2360 111037877 111105745 111161458
505 111128359 111026865 111006164
375 117170057 0 0
247 117086016 0 0
613 117030996 117010050 117029287
I want to change all the values in column 3 to zero so that the file looks like this:
2360 111037877 0 111161458
505 111128359 0 111006164
375 117170057 0 0
247 117086016 0 0
613 117030996 0 117029287
How can I do this? I know this is a very basic question, but I can't do it with awk. I was trying something like this:
awk '{$3 = 0}' old_file > new_file
and
awk '$3 == "*" { $3=0}' old_file > new_file
Your first try was almost right; you just forgot to print the line:
awk '{$3 = 0; print}'
A shorter version of the same thing:
awk '{$3 = 0}1'
The trailing 1 is a pattern that is always true; since it has no action block, awk applies the default action, which is to print the (modified) line.
You just need to print $0 after the assignment:
% awk '{$3=0;print $0}' inp.txt
2360 111037877 0 111161458
505 111128359 0 111006164
375 117170057 0 0
247 117086016 0 0
613 117030996 0 117029287
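To write the result to a new file instead of the terminal, redirect the output as in the original attempt (a sketch, using the old_file/new_file names from the question):
awk '{$3 = 0; print}' old_file > new_file
Note that assigning to a field makes awk rebuild the line with the output field separator (a single space by default), so any original spacing is normalized.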
I have multiple count files that look like this:
File1.tab
6 10 0
49 0 53
15 0 15
0 0 0
0 0 0
0 0 0
Other file:
File2.tab
3 1 2
29 0 29
4 0 4
0 0 0
0 0 0
0 0 0
I have over 30 files and I want to combine the second column of each file into one big file.
I know this question has already been asked, and I found a similar one here: How to combine column from multiple text files? I used the answer from that question for my problem:
paste *.tab | awk '{i=2;while($i) {printf("%d ",$i);i+=3}printf("\n")}'
The problem is that zero values are not printed; I get something like this:
10 1
and I want something like this:
10 1
0 0
0 0
0 0
0 0
0 0
I checked the printf format specifiers, but none of them works. How can I solve this problem?
You picked a bad "answer" to build on: its while($i) loop stops as soon as it reaches a field whose value is 0, which is exactly why your zeros disappear. Try this instead:
paste *.tab |
awk '{for (i=2; i<=NF; i+=3) printf "%s%s", (i>2?OFS:""), $i; print ""}'
Each .tab file contributes three columns to the paste output, so the second column of the Nth file is field 3N-1; stepping i by 3 from 2 visits exactly those fields, zeros included.
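With the two sample files above, this should print:
10 1
0 0
0 0
0 0
0 0
0 0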
I have an input file that looks like this:
pmx . pmnosysrelspeechneighbr -m 1 -r
INFO: The ROP files contain suspected faulty counter values.
They have been discarded but can be kept with pmr/pmx option "k" (pmrk/pmxk) or highlighted with pmx option "s" (pmxs)
Date: 2017-11-04
Object Counter 14:45 15:00 15:15 15:30
UtranCell=UE1069XA0 pmNoSysRelSpeechNeighbr 0 1 0 0
UtranCell=UE1069XA1 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XA2 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XA3 pmNoSysRelSpeechNeighbr 0 0 2 0
UtranCell=UE1069XB0 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XB1 pmNoSysRelSpeechNeighbr 0 0 0 3
UtranCell=UE1069XB2 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XB3 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XC0 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XC1 pmNoSysRelSpeechNeighbr 0 0 0 4
UtranCell=UE1069XC2 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1069XC3 pmNoSysRelSpeechNeighbr 0 0 1 0
UtranCell=UE1164XA0 pmNoSysRelSpeechNeighbr 0 3 0 0
UtranCell=UE1164XA1 pmNoSysRelSpeechNeighbr 0 0 0 0
UtranCell=UE1164XA2 pmNoSysRelSpeechNeighbr 1 0 0 0
Now I want the output below, which is the per-column sum of the time columns ($3 through $6):
Counter 14:45 15:00 15:15 15:30
pmNoSysRelSpeechNeighbr 1 4 3 7
I've been trying the command below, but it only gives the sum of a single column:
pmx . pmnosysrelspeechneighbr -m 1 -r | grep -i ^Object | awk '{sum += $4} END {print $1, sum}'
Try this; it prints both the header and a trailer row with the sum of each column:
BEGIN {
    trail = "pmNoSysRelSpeechNeighbr";       # label for the totals row
}
{
    if ($1 == "Object")                      # header line: echo the time columns
        print $2 OFS $3 OFS $4 OFS $5 OFS $6;
    else if ($1 ~ /^UtranCell/) {            # data line: accumulate each column
        w += $3; x += $4; y += $5; z += $6;
    }
}
END {
    print trail OFS w OFS x OFS y OFS z;     # totals row
}
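This is a standalone awk program; assuming you save it as, say, sum.awk (a hypothetical name), run it against the command output:
pmx . pmnosysrelspeechneighbr -m 1 -r | awk -f sum.awk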
You need to sum each of the columns separately:
awk -v g=pmNoSysRelSpeechNeighbr '$0 ~ g { for(i=3;i<=6;i++) sum[i]+=$i }
END { printf "%s", g; for(i=3;i<=6;i++) printf "%s%s", OFS, sum[i]; print "" }' file
but only for lines (records) containing the group (counter) of interest ($0 ~ "pmNoSysRelSpeechNeighbr").
Note that you (almost) never need to pipe grep's output to awk, because awk already supports extended-regular-expression filtering with /regex/ { action } or var ~ /regex/ { action }. One exception would be needing PCRE features (grep -P).
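For instance, the grep stage of the attempt in the question could be folded into awk itself (a sketch that, like that attempt, totals just one column):
pmx . pmnosysrelspeechneighbr -m 1 -r |
awk '/^UtranCell/ { sum += $4 } END { print "pmNoSysRelSpeechNeighbr", sum + 0 }'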
As an alternative to awk for simple "command-line statistical operations" on textual files, you could also use GNU datamash.
For example, to sum columns 3 to 6, grouping by column 2:
grep 'UtranCell' file | datamash -W -g2 sum 3-6
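For the sample above, this should print a single tab-separated row:
pmNoSysRelSpeechNeighbr 1 4 3 7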
I have a log file with lots of unnecessary information. The only important part of that file is a table with some statistics. My goal is a script that accepts a column name as an argument and returns the sum of all the elements in that column.
Example log file:
.........
Skipped....
........
WARNING: [AA[409]: Some bad thing happened.
--- TOOL_A: READING COMPLETED. CPU TIME = 0 REAL TIME = 2
--------------------------------------------------------------------------------
----- TOOL_A statistics -----
--------------------------------------------------------------------------------
NAME Attr1 Attr2 Attr3 Attr4 Attr5
--------------------------------------------------------------------------------
AAA 885 0 0 0 0
AAAA2 1 0 2 0 0
AAAA4 0 0 2 0 0
AAAA8 0 0 2 0 0
AAAA16 0 0 2 0 0
AAAA1 0 0 2 0 0
AAAA8 0 0 23 0 0
AAAAAAA4 0 0 18 0 0
AAAA2 0 0 14 0 0
AAAAAA2 0 0 21 0 0
AAAAA4 0 0 23 0 0
AAAAA1 0 0 47 0 0
AAAAAA1 2 0 26 0
NOTE: Some notes
......
Skipped ......
The expected usage: script.sh Attr1
Expected output:
888
I've tried to find something with sed/awk but failed to figure out a solution.
tldr;
$ cat myscript.sh
#!/bin/sh
logfile="${1}"
attribute="${2}"
field=$(grep -o "NAME.\+${attribute}" "${logfile}" | wc -w)
sed -nre '/NAME/,/NOTE/{/NAME/d;/NOTE/d;s/\s+/\t/gp;}' "${logfile}" | \
cut -f"${field}" | \
paste -sd+ | \
bc
$ ./myscript.sh mylog.log Attr3
182
Explanation:
assign command-line arguments ${1} and ${2} to the logfile and attribute variables, respectively
with wc -w, count the number of words in the header line up to and including ${attribute}; that count is the field index, assigned to field
with sed
suppress automatic printing (-n) and enable extended regular expressions (-r)
find lines between the NAME and NOTE lines, inclusive
delete the lines that match NAME and NOTE
translate each contiguous run of whitespace to a single tab and print the result
cut using the field index
paste all numbers as an infix summation
evaluate the infix summation via bc
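As a worked example for ./myscript.sh mylog.log Attr3: grep -o "NAME.\+Attr3" returns "NAME Attr1 Attr2 Attr3", so wc -w sets field to 4; cut -f4 extracts the Attr3 column; paste -sd+ joins it into 0+2+2+2+2+2+23+18+14+21+23+47+26; and bc evaluates that to 182.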
Quick and dirty (without any other spec):
awk -v CountCol=2 '/^[^[:blank:]]/ && NF == 6 { S += $(CountCol) } END { print S + 0 }' YourFile
With a column name:
awk -v ColName='Attr1' '
  $1 == "NAME" { for (i=1; i<=NF; i++) if ($i == ColName) CountCol = i }
  /^[^[:blank:]]/ && NF == 6 && CountCol { S += $(CountCol) }
  END { print S + 0 }
' YourFile
You should add a header/trailer filter to avoid noisy lines (a flag would suit this perfectly), but since there is no information about the structure with which to set such a flag, I use a simple field count instead (assuming text fields evaluate to 0, so they do not change the sum when counted).
Alternatively, build a map from column names to field indexes off the NAME header line, then sum the requested column until a blank line ends the table:
$ awk -v col='Attr3' '/NAME/{for (i=1;i<=NF;i++) f[$i]=i} col in f{sum+=$(f[col]); if (!NF) {print sum+0; exit} }' file
182
I am writing a bash script that uses awk with variable passing, on an AIX PowerPC machine.
The code works fine, after I read some questions & answers on this site. :)
Now I'd like to add an if statement to my bash script, but I get awk syntax errors.
My requirement (see my script below):
- if the awk pattern match succeeds, print $0
- else print "transaction: p value not found."
Content of trans.txt
bash-4.2$ cat trans.txt
10291413
8537353
8619033
8619065
8625705
Could someone help me please. Thank you.
content of getDetails.sh
#!/usr/bin/bash
sysDir="/var/syslog-ng/TEST/STS"
syslogFile="DPSTSLog2014-10-22T09.log"
pattern="Latency"
filename=$1
while read -r f; do
    concatPattern="$f.*$pattern"
    awk -v p="$concatPattern" '$0 ~ p {print $0}' "$sysDir/$syslogFile"
done < "$filename"
Running the script:
./getDetails.sh trans.txt
gives this result:
2014-10-22T09:15:53+11:00,10.16.198.50,info,latency,[info] xmlfirewall(ws-TrustValidate): trans(10291413): Latency: 0 0 0 0 14 14 0 14 0 0 0 14 0 0 0 0
2014-10-22T09:15:38+11:00,10.16.198.50,info,latency,[info] xmlfirewall(ws-TrustValidate): trans(8619033): Latency: 0 0 0 0 73 73 0 73 0 0 0 73 0 0 0 0
2014-10-22T09:18:04+11:00,10.16.198.50,info,latency,[info] xmlfirewall(ws-TrustValidate): trans(8625705): Latency: 0 0 0 0 13 13 0 13 0 0 0 13 0 0 0 0
Set a flag when the pattern matches, then test it in the END block:
awk -v p="$concatPattern" '
    $0 ~ p {print $0; found=1}
    END {if (! found) print "transaction: p value not found"}
' "$sysDir/$syslogFile"
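With the sample trans.txt, the three IDs shown in the result above keep printing their Latency lines, while 8537353 and 8619065, which match nothing, each produce the not-found message (the loop runs awk once per ID, so found is evaluated per transaction).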
I would like to add two conditions to the code I have: print '+' only if, in File2, field 5 is greater than 35 and field 7 is greater than 90.
Code:
while read -r line
do
    grep -q "$line" File2.txt && echo "$line +" || echo "$line -"
done < File1.txt
Input file 1:
HAPS_0001
HAPS_0002
HAPS_0005
HAPS_0006
HAPS_0007
HAPS_0008
HAPS_0009
HAPS_0010
Input file 2 (tab-delimited):
Query DEG_ID E-value Score %Identity %Positive %Matching_Len
HAPS_0001 protein:plasmid:149679 3.00E-67 645 45 59 91
HAPS_0002 protein:plasmid:139928 4.00E-99 924 34 50 85
HAPS_0005 protein:plasmid:134646 3.00E-98 915 38 55 91
HAPS_0006 protein:plasmid:111988 1.00E-32 345 33 54 86
HAPS_0007 - - 0 0 0 0
HAPS_0008 - - 0 0 0 0
HAPS_0009 - - 0 0 0 0
HAPS_0010 - - 0 0 0 0
Desired output (tab-delimited):
HAPS_0001 +
HAPS_0002 -
HAPS_0005 +
HAPS_0006 -
HAPS_0007 -
HAPS_0008 -
HAPS_0009 -
HAPS_0010 -
Thanks!
This should work. It reads File2 first (NR==FNR is true only while the first file named is being read), records the IDs whose field 5 exceeds 35 and field 7 exceeds 90, then appends + or - to each ID from File1:
$ awk '
BEGIN {FS = OFS = "\t"}
NR==FNR {if($5>35 && $7>90) a[$1]++; next}
{print (($1 in a) ? $0 FS "+" : $0 FS "-")}' f2 f1
HAPS_0001 +
HAPS_0002 -
HAPS_0005 +
HAPS_0006 -
HAPS_0007 -
HAPS_0008 -
HAPS_0009 -
HAPS_0010 -
join File1.txt <(tail -n +2 File2.txt) | awk -v OFS='\t' '
    { $2 = ($5 > 35 && $7 > 90) ? "+" : "-"; print $1, $2 }'
You don't care about the second field, so overwrite it with the appropriate sign before printing. tail -n +2 drops File2's header, and join expects its inputs sorted on the join field, as these sample files already are; OFS='\t' keeps the output tab-delimited as requested.