bash -- merging and manipulating 2 files

I have 2 files of which I currently manipulate each one in awk:
======================= File 1: ===================
0x0002 RUNNING EXISTS foo 253 65535
0x0003 RUNNING EXISTS foo 252 5
0x0004 RUNNING EXISTS foo 251 3
I'm interested in the first field and the last 2.
Field 1: vdisk (in hex). The last two fields are the possible cdisks for each vdisk; at least one must exist. The values are decimal.
If the number "65535" appears, it means that the 2nd cdisk is non-existent.
I use this awk to display a user friendly table:
awk 'BEGIN {print "vdisk cdisk Mr_cdisk"}
{
if ( $3 ~ /EXISTS|THIS_AGENT_ONLINE/ ) {
sub("65535", "N/A")
printf "%-11s %-6s %s\n",$1,$(NF-1),$(NF)
}
}' ${FILE}
Will produce this table:
vdisk cdisk Mr_cdisk
0x0002 253 N/A
0x0003 252 5
0x0004 1 3
======================= File 2: ===================
0x0000 Cmp cli Foo 0 SOME 0 0x0 0x0 0x0
0x0001 Cmp own Foo 1 NONE 0 0x0 0x0 0x0
0x0002 Cmp cli Foo 0 SOME 0 0x0 0x1 0x0
0x0003 Cmp own Foo 0 NONE 0 0x0 0x0 0x1
0x0004 Cmp cli Foo 0 SOME 0 0x0 0x0 0x0
0x0005 Cmp own Foo 1 NONE 0 0x1 0x0 0x0
I'm interested in the "Cmp own" lines, in which the first field is the Cdisk (in hex). The 5th field from the end (just before the SOME/NONE text), is the instance number. It's either 0 or 1.
I use this awk to display a user friendly table:
awk 'BEGIN {print "cdisk(hex) RACE_Instance"}
/Cmp own/ {
printf "%-11s %-10s\n",$1,$(NF-5)
}' ${FILE};
This will produce the following table:
cdisk(hex) Instance
0x0001 1
0x0003 0
0x0005 1
++++++++++++++++++++++++++++++++++++++
What I would like is to display a merged table, preferably built directly from the original files.
It should spread the first data set into 2 lines per vdisk (if there's more than 1 cdisk). This will be the base for the merge. Then print the instance number for each cdisk, if one exists.
vdisk(hex) cdisk(hex) Instance
0x0002 0x00fd N/A
0x0003 0x00fc N/A
0x0003 0x0005 1
0x0004 0x0001 0
0x0004 0x0003 1
I would definitely prefer a solution with awk. :)
Thanks!
EDIT: added some more info and correction to one data table.
EDIT2: Simplified input

I couldn't figure out what the mapping is from your 2 input files to your output but this should point you in the right direction:
$ cat tst.awk
NR==FNR {
v2c[$1] = sprintf("0x%04x",$5)
v2m[$1] = ( $6==65535 ? "N/A" : sprintf("0x%04x",$6) )
next
}
$1 in v2c {
print $1, v2c[$1], $5
print $1, v2m[$1], $5
}
$
$ awk -f tst.awk file1 file2
0x0002 0x00fd 0
0x0002 N/A 0
0x0003 0x00fc 0
0x0003 0x0005 0
0x0004 0x00fb 0
0x0004 0x0003 0
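One possible reading of the requested merge is: one output line per existing cdisk of each vdisk, with the instance taken from the matching "Cmp own" line of file 2, or N/A when there is none. A minimal sketch under that assumption follows. The function name merge_tables is hypothetical, and the field positions are taken from the awk commands in the question.

```shell
# merge_tables FILE1 FILE2 (hypothetical helper name)
# FILE1: vdisk table, FILE2: cdisk table, both in the formats shown in the question.
merge_tables() {
    awk '
    BEGIN { print "vdisk(hex) cdisk(hex) Instance" }
    NR == FNR {                       # first file given to awk is FILE2
        # remember the instance number of every "Cmp own" cdisk
        if ($2 == "Cmp" && $3 == "own") inst[$1] = $(NF-5)
        next
    }
    $3 ~ /EXISTS|THIS_AGENT_ONLINE/ { # FILE1: one output line per existing cdisk
        n = split($(NF-1) " " $NF, c, " ")
        for (i = 1; i <= n; i++) {
            if (c[i] == 65535) continue          # 65535 means "no such cdisk"
            hex = sprintf("0x%04x", c[i])        # decimal -> same hex form as FILE2
            printf "%-11s %-11s %s\n", $1, hex, (hex in inst ? inst[hex] : "N/A")
        }
    }' "$2" "$1"
}
```

Usage: merge_tables file1 file2. With the sample files as posted, vdisk 0x0004 is listed with cdisk 0x00fb, because file 1 gives 251 for it.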


LD: ALIGN vs SUBALIGN in linker scripts

How do they differ?
I read that SUBALIGN() somehow forces a certain alignment. Are there other differences?
When should I use ALIGN() and when should I use SUBALIGN()?
SUBALIGN is
specifically for adjusting the alignment of the input sections within an output section.
To illustrate:
$ cat one.c
char a_one __attribute__((section(".mysection"))) = 0;
char b_one __attribute__((section(".mysection"))) = 0;
$ cat two.c
char a_two __attribute__((section(".mysection"))) = 0;
char b_two __attribute__((section(".mysection"))) = 0;
$ gcc -c one.c two.c
Case 1
$ cat foo_1.lds
SECTIONS
{
. = 0x10004;
.mysection ALIGN(8) : {
*(.mysection)
}
}
$ ld -T foo_1.lds one.o two.o -o foo1.out
$ readelf -s foo1.out
Symbol table '.symtab' contains 9 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000010008 0 SECTION LOCAL DEFAULT 1
2: 0000000000000000 0 SECTION LOCAL DEFAULT 2
3: 0000000000000000 0 FILE LOCAL DEFAULT ABS one.c
4: 0000000000000000 0 FILE LOCAL DEFAULT ABS two.c
5: 000000000001000b 1 OBJECT GLOBAL DEFAULT 1 b_two
6: 0000000000010008 1 OBJECT GLOBAL DEFAULT 1 a_one
7: 0000000000010009 1 OBJECT GLOBAL DEFAULT 1 b_one
8: 000000000001000a 1 OBJECT GLOBAL DEFAULT 1 a_two
$ readelf -t foo1.out | grep -A3 mysection
[ 1] .mysection
PROGBITS PROGBITS 0000000000010008 0000000000010008 0
0000000000000004 0000000000000000 0 1
[0000000000000003]: WRITE, ALLOC
Here, ALIGN(8) aligns .mysection to the next 8-byte boundary, 0x10008,
after 0x10004.
The char symbol a_one, coming from input section one.o(.mysection), is at the start of .mysection,
followed at the next byte by b_one, also coming from input section one.o(.mysection). At the next byte
is a_two, from input section two.o(.mysection), then b_two, also from two.o(.mysection). All 4
objects from all input sections *(.mysection) are just placed end to end from the start of output section .mysection.
Case 2
$ cat foo_2.lds
SECTIONS
{
. = 0x10004;
.mysection ALIGN(8) : SUBALIGN(16) {
*(.mysection)
}
}
$ ld -T foo_2.lds one.o two.o -o foo2.out
$ readelf -s foo2.out
Symbol table '.symtab' contains 9 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000010008 0 SECTION LOCAL DEFAULT 1
2: 0000000000000000 0 SECTION LOCAL DEFAULT 2
3: 0000000000000000 0 FILE LOCAL DEFAULT ABS one.c
4: 0000000000000000 0 FILE LOCAL DEFAULT ABS two.c
5: 0000000000010021 1 OBJECT GLOBAL DEFAULT 1 b_two
6: 0000000000010010 1 OBJECT GLOBAL DEFAULT 1 a_one
7: 0000000000010011 1 OBJECT GLOBAL DEFAULT 1 b_one
8: 0000000000010020 1 OBJECT GLOBAL DEFAULT 1 a_two
$ readelf -t foo2.out | grep -A3 mysection
[ 1] .mysection
PROGBITS PROGBITS 0000000000010008 0000000000010008 0
000000000000001a 0000000000000000 0 16
[0000000000000003]: WRITE, ALLOC
This time, the 8-byte aligned address of .mysection is unchanged. But the
effect of SUBALIGN(16) is that symbol a_one, coming from input
section one.o(.mysection) is placed at the next 16-byte
boundary, 0x10010, after the start of .mysection, and symbol b_one, coming from the
same input section is at the next byte. But symbol a_two, coming from input section
two.o(.mysection) is at the next 16-byte boundary, 0x10020; and b_two, coming
also from two.o(.mysection), is 1 byte after that.
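
The addresses in both cases follow from rounding the location counter up to the next multiple of the requested alignment. A small shell sketch of that arithmetic, where align_up is a hypothetical helper written for illustration and not part of ld or binutils:

```shell
# align_up ADDR ALIGN: round ADDR up to the next multiple of ALIGN.
# ALIGN is assumed to be a power of two. Hypothetical helper, for illustration only.
align_up() {
    printf '0x%x\n' $(( ($1 + $2 - 1) & ~($2 - 1) ))
}

align_up 0x10004 8     # ALIGN(8): start of output section .mysection
align_up 0x10008 16    # SUBALIGN(16): one.o(.mysection), which starts with a_one
align_up 0x10012 16    # SUBALIGN(16): two.o(.mysection), after one.o's 2 bytes
```

The three calls print 0x10008, 0x10010 and 0x10020, matching the section address and the a_one and a_two values in the readelf output of Case 2.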

How to find sum of elements in column inside of a text file (Bash)

I have a log file with lots of unnecessary information. The only important part of that file is a table which describes some statistics. My goal is to have a script which will accept a column name as argument and return the sum of all the elements in the specified column.
Example log file:
.........
Skipped....
........
WARNING: [AA[409]: Some bad thing happened.
--- TOOL_A: READING COMPLETED. CPU TIME = 0 REAL TIME = 2
--------------------------------------------------------------------------------
----- TOOL_A statistics -----
--------------------------------------------------------------------------------
NAME Attr1 Attr2 Attr3 Attr4 Attr5
--------------------------------------------------------------------------------
AAA 885 0 0 0 0
AAAA2 1 0 2 0 0
AAAA4 0 0 2 0 0
AAAA8 0 0 2 0 0
AAAA16 0 0 2 0 0
AAAA1 0 0 2 0 0
AAAA8 0 0 23 0 0
AAAAAAA4 0 0 18 0 0
AAAA2 0 0 14 0 0
AAAAAA2 0 0 21 0 0
AAAAA4 0 0 23 0 0
AAAAA1 0 0 47 0 0
AAAAAA1 2 0 26 0
NOTE: Some notes
......
Skipped ......
Expected usage: script.sh Attr1
Expected output:
888
I've tried to find something with sed/awk but failed to figure out a solution.
tldr;
$ cat myscript.sh
#!/bin/sh
logfile=${1}
attribute=${2}
field=$(grep -o "NAME.\+${attribute}" ${logfile} | wc -w)
sed -nre '/NAME/,/NOTE/{/NAME/d;/NOTE/d;s/\s+/\t/gp;}' ${logfile} | \
cut -f${field} | \
paste -sd+ | \
bc
$ ./myscript.sh mylog.log Attr3
182
Explanation:
- assign command-line arguments ${1} and ${2} to the logfile and attribute variables, respectively.
- with wc -w, count the quantity of words within the line that contains both NAME and ${attribute} (the field index) and assign it to field
- with sed:
  - suppress automatic printing (-n) and enable extended regular expressions (-r)
  - find lines between the NAME and NOTE lines, inclusive
  - delete the lines that match NAME and NOTE
  - translate each contiguous run of whitespace to a single tab and print the result
- cut using the field index
- paste all numbers as an infix summation
- evaluate the infix summation via bc
Quick and dirty (without any other spec)
awk -v CountCol=2 '/^[^[:blank:]]/ && NF == 6 { S += $( CountCol) } END{ print S + 0 }' YourFile
with column name
awk -v ColName='Attr1' '/^[[:blank:]]/ && NF == 6 { for(i=1;i<=NF;i++){if ( $i == ColName) CountCol = i } } /^[^[:blank:]]/ && NF == 6 && CountCol{ S += $( CountCol) } END{ print S + 0 }' YourFile
You should add a header/trailer filter to avoid noisy lines (a flag suits this perfectly), but without more information about the file structure I cannot set such a flag, so I use a simple field count instead (assuming text fields evaluate to 0 and so do not change the sum when they are counted).
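A flag-based header/trailer filter of the kind mentioned above could look like the following sketch. It assumes the table starts at the line whose first field is NAME and ends at the line starting with NOTE, as in the sample log; sum_column is a hypothetical function name.

```shell
# sum_column COLUMN LOGFILE (hypothetical helper name)
# Assumes the table header starts with NAME and the table ends at a NOTE line.
sum_column() {
    awk -v col="$1" '
    $1 == "NAME" {                    # header row: find the index of the column
        for (i = 1; i <= NF; i++) if ($i == col) idx = i
        intable = 1                   # flag on: we are inside the table
        next
    }
    /^NOTE/ { intable = 0 }           # trailer: flag off
    intable && idx && !/^-+$/ { sum += $idx }   # skip the dashed separator
    END { print sum + 0 }' "$2"
}
```

Usage: sum_column Attr1 mylog.log. A column name that does not appear in the header leaves idx unset, so the result is 0.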
$ awk -v col='Attr3' '/NAME/{for (i=1;i<=NF;i++) f[$i]=i} col in f{sum+=$(f[col]); if (!NF) {print sum+0; exit} }' file
182

Divide column values of different files by a constant then output one minus the other

I have two files of the form
file1:
#fileheader1
0 123
1 456
2 789
3 999
4 112
5 131
6 415
etc.
file2:
#fileheader2
0 442
1 232
2 542
3 559
4 888
5 231
6 322
etc.
How can I take the second column of each file, divide it by a constant, subtract one result from the other, and write the new values to a third file?
I want the output file to have the form
#outputheader
0 123/c-442/k
1 456/c-232/k
2 789/c-542/k
etc.
where c and k are numbers I can plug into the script
I have seen this question: subtract columns from different files with awk
But I don't know how to use awk to do this by myself, does anyone know how to do this or could explain what is going on in the linked question so I can try to modify it?
I'd write:
awk -v c=10 -v k=20 ' ;# pass values to awk variables
/^#/ {next} ;# skip headers
FNR==NR {val[$1]=$2; next} ;# store values from file1
$1 in val {print $1, (val[$1]/c - $2/k)} ;# perform the calc and print
' file1 file2
output
0 -9.8
1 34
2 51.8
3 71.95
4 -33.2
5 1.55
6 25.4
etc. 0
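The trailing "etc. 0" line appears because the literal "etc." line exists in both files and its missing second field evaluates to 0. A variant that skips such lines and also writes the requested header might look like this sketch; diff_scaled is a hypothetical name, and the header text is taken from the question.

```shell
# diff_scaled C K FILE1 FILE2 (hypothetical helper name)
# Prints "#outputheader", then for each key: FILE1 value / C - FILE2 value / K.
diff_scaled() {
    awk -v c="$1" -v k="$2" '
    BEGIN { print "#outputheader" }
    /^#/ || NF < 2 { next }               # skip headers and lines such as "etc."
    FNR == NR { val[$1] = $2; next }      # first file: store column 2 by key
    $1 in val { print $1, val[$1] / c - $2 / k }
    ' "$3" "$4"
}
```

Usage: diff_scaled 10 20 file1 file2 > file3 writes the result to a third file.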

combining multiple column values in lines based on identical column

I need help to improve my working code for combining multiple rows based on identical values for specified columns.
Here is a sample data:
c-i1_pos-at1-v2 162a AT1G01040.1 2 3422-3443 3433 1
c-i1_pos-at1-v2 162b AT1G01040.1 2 3422-3443 3433 1
pare-i_226-v2-wt 162a AT1G01040.1 2 3422-3443 3433 0
pare-i_226-v2-wt 162b AT1G01040.1 2 3422-3443 3433 0
xrn4-pare-i_ath-227-v2-wt 827 AT1G02860.1 1 258-278 269 1
i2_lib2-v2 156a AT1G03730.1 4 242-260 252 3
i2_lib2-v2 156b AT1G03730.1 4 242-260 252 3
i2_lib2-v2 156c AT1G03730.1 4 242-260 252 3
i2_lib2-v2 156d AT1G03730.1 4 242-260 252 3
i2_lib2-v2 156e AT1G03730.1 4 242-260 252 3
Basically if values in columns $3,$5 are identical, I want to combine rows for columns $2,$6 (or more), with the unique values of the remaining columns merged like this:
AT1G01040.1 3422-3443 3433 162a,162b
AT1G02860.1 258-278 269 827
AT1G03730.1 242-260 252 156a,156b,156c,156d,156e
Right now I am trying to do this in multiple steps, based on the answers here.
awk 'BEGIN{FS=OFS="\t"} {c=$2 FS $3 FS $5; if (c in a) a[c]=a[c]","$6; else a[c]=$6}END{for (k in a) print k,a[k]}'|awk '{p=$1 FS $2 FS $4; if (p in l) l[p]=l[p]","$3;else l[p]=$3}END{for (m in l) print m,l[m]}' <input.txt
Which gives:
AT1G01040.1 3422-3443 3433,3433 162a,162b
AT1G02860.1 258-278 269 827
AT1G03730.1 242-260 252 156a,156b,156c,156d,156e
I thought I should put the values in the remaining columns as arrays to get my desired output at one step, but I am struggling to figure out the correct context.
How about something like
awk '{if ( $3 in a ) a[$3] = a[$3]","$2; else a[$3] = $3" "$5" "$6" "$2} END{for (i in a) print a[i]}' inputFile
Will produce output as
AT1G03730.1 242-260 252 156a,156b,156c,156d,156e
AT1G02860.1 258-278 269 827
AT1G01040.1 3422-3443 3433 162a,162b,162a,162b
Explanation
a[$3] = $3" "$5" "$6" "$2 creates an entry in array a indexed by the third field $3; the else part ensures that the entry is created when the row is encountered for the first time.
if ( $3 in a ) a[$3] = a[$3]","$2 if the third field $3 is already present in the array, append field two $2 to the existing entry
END{for (i in a) print a[i]} The END block is executed at the end of input. It prints the entire array, giving the output
EDIT
A simpler version would be
awk '{( $3 in a ) ? a[$3] = a[$3]","$2 : a[$3] = $3" "$5" "$6" "$2} END{for (i in a) print a[i]}' inputFile
Thank you Jotne for the suggestion.
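The output above repeats 162a,162b because every row with the same $3 is appended, including rows that differ only in column 1. A sketch that keys on $3, $5 and $6 together, appends each $2 value only once, and keeps the input order is shown below. It assumes whitespace-separated fields without embedded spaces; combine_rows is a hypothetical name.

```shell
# combine_rows INPUTFILE (hypothetical helper name)
# Groups rows on columns 3, 5 and 6 and joins the distinct column-2 values with commas.
combine_rows() {
    awk '
    {
        key = $3 " " $5 " " $6
        if (!(key in vals)) {             # first row of this group
            order[++n] = key              # remember first-seen order
            vals[key] = $2
            seen[key, $2] = 1
        } else if (!seen[key, $2]++) {    # new column-2 value for a known group
            vals[key] = vals[key] "," $2
        }
    }
    END { for (i = 1; i <= n; i++) print order[i], vals[order[i]] }' "$1"
}
```

Usage: combine_rows input.txt. Unlike for (i in a), the numbered order array makes the output order follow the input.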

bash script awk with passing variable and if statement

I am writing a bash script that passes a variable to awk, on an AIX PowerPC machine.
The code works fine after I read some questions & answers on this site. :)
Now I'd like to add an if statement to my bash script, but I get awk syntax errors.
My requirement (see my script below):
- if awk pattern matching is true, print $0
- else print "transaction: p value not found."
Content of trans.txt
bash-4.2$ cat trans.txt
10291413
8537353
8619033
8619065
8625705
Could someone help me please. Thank you.
Content of getDetails.sh
#!/usr/bin/bash
sysDir="/var/syslog-ng/TEST/STS"
syslogFile="DPSTSLog2014-10-22T09.log"
pattern="Latency"
filename=$1
while read f; do
concatPattern="$f.*$pattern"
awk -v p="$concatPattern" '$0 ~ p {print $0}' $sysDir/$syslogFile
done < $filename
run command execution below
./getDetails.sh trans.txt
result
2014-10-22T09:15:53+11:00,10.16.198.50,info,latency,[info] xmlfirewall(ws-TrustValidate): trans(10291413): Latency: 0 0 0 0 14 14 0 14 0 0 0 14 0 0 0 0
2014-10-22T09:15:38+11:00,10.16.198.50,info,latency,[info] xmlfirewall(ws-TrustValidate): trans(8619033): Latency: 0 0 0 0 73 73 0 73 0 0 0 73 0 0 0 0
2014-10-22T09:18:04+11:00,10.16.198.50,info,latency,[info] xmlfirewall(ws-TrustValidate): trans(8625705): Latency: 0 0 0 0 13 13 0 13 0 0 0 13 0 0 0 0
awk -v p="$concatPattern" '
$0 ~ p {print $0; found=1}
END {if (! found) print "transaction: p value not found"}
' $sysDir/$syslogFile
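Placed inside the loop from the question, and passing the transaction id as a second awk variable so the message can name it, the whole thing might look like the sketch below. The function name find_latency and the variable name id are assumptions made for illustration.

```shell
# find_latency IDFILE LOGFILE (hypothetical helper name)
# For each transaction id in IDFILE, print matching Latency lines from LOGFILE,
# or a "not found" message naming the id.
find_latency() {
    while IFS= read -r id; do
        awk -v id="$id" -v p="$id.*Latency" '
        $0 ~ p { print; found = 1 }
        END { if (!found) print "transaction: " id " not found." }
        ' "$2"
    done < "$1"
}
```

Usage: find_latency trans.txt "$sysDir/$syslogFile". The log file is read once per transaction id, as in the original script.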
