how to group sequence of item into square brackets - bash

How can I group a sequence of items into square brackets?
For example:
List of items:
cat item.txt
sn01
sn02
sn03
sn05
sn07
sn08
Desired output
sn[01-03,05,07-08]

If your data is the same as the shown Input_file sample, then the following may help you.
awk 'FNR==1{line=$0} {sub(/[a-z]+/,"")} $0-val>1 && val1!=val{out=out?out "," val1"-"val:line"[" val1"-"val;val1=$0} $0-val>1 && val1==val{out=out?out "," val1:out "," val1;val1=$0} {if(FNR==1){sub(/[0-9]+/,"",line);val1=$0};val=$0}END{if(val1!=val){print out "," val1"-"val"]"} else {print out "," val"]"}}' Input_file
Adding a non-one-liner form of the solution too.
awk '
FNR==1{
    line=$0
}
{
    sub(/[a-z]+/,"")
}
$0-val>1 && val1!=val{
    out=out?out "," val1"-"val:line"[" val1"-"val;
    val1=$0
}
$0-val>1 && val1==val{
    out=out?out "," val1:out "," val1;
    val1=$0
}
{
    if(FNR==1){
        sub(/[0-9]+/,"",line);
        val1=$0
    };
    val=$0
}
END{
    if(val1!=val){
        print out "," val1"-"val"]"
    }
    else{
        print out "," val"]"
    }
}
' Input_file
Output will be as follows.
sn[01-03,05,07-08]

"sn" being static here. it should pick it from the input file. when I
given list of items start with "cn". still it picks "sn"
Using awk:
$ cat infile
sn01
sn02
sn03
sn05
sn07
sn08
cn08
cn09
cn10
cn11
cn15
when search='sn'
$ awk -v search='sn' 'function pr(){if(f && l)printf("%s%s",n?",":search"[",f==l?f:f"-"l)}$0!~"^"search{next}{t=$1;sub(/[^0-9]+/,"",t)}f==""{f=l=t;next}t==l+1{l=t;next}{pr();f=l=t;n++}END{pr(); print n?"]":"Nothing matched for keyword :"search}' infile
sn[01-03,05,07-08]
when search='cn'
$ awk -v search='cn' 'function pr(){if(f && l)printf("%s%s",n?",":search"[",f==l?f:f"-"l)}$0!~"^"search{next}{t=$1;sub(/[^0-9]+/,"",t)}f==""{f=l=t;next}t==l+1{l=t;next}{pr();f=l=t;n++}END{pr(); print n?"]":"Nothing matched for keyword :"search}' infile
cn[08-11,15]
A more readable form:
awk -v search='sn' '
function pr()
{
    if(f && l)
        printf("%s%s",n?",":search"[",f==l?f:f"-"l)
}
$0!~"^"search{
    next
}
{
    t=$1;
    sub(/[^0-9]+/,"",t)
}
f==""{
    f=l=t;
    next
}
t==l+1{
    l=t;
    next
}
{
    pr();
    f=l=t;
    n++
}
END{
    pr();
    print n?"]":"Nothing matched for keyword :"search
}' infile
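If the search prefix does not occur in the input at all, running the same command with, say, -v search='xn' should print only the fallback message from the END block (illustrative):
Nothing matched for keyword :xn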

a simple awk solution
We're aiming to set LB and UB for each possible range.
Starting from LB, the last number in the sequence up to which the common difference is 1 gives us the UB.
If the difference is more than 1, print the last range and set LB again.
$ awk 'FNR==1{ $1=$1; prefix=substr($0,1,2);} {gsub(/[^0-9]/,"",$1); a[++i]=$1;} END{ printf prefix"["; LB=UB=prev=a[1]; for(i=1; i<=NR; i++){ if(int(a[i+1])==int(prev+1)) { UB=a[i+1]; prev=UB; } else { if(LB==UB) { printf LB"," } else {delim=(i==NR)? "]" :","; printf LB "-" UB delim; } prev=LB=UB=a[i+1]; }} }' file
sn[01-03,05,07-08]
gsub(/[^0-9]/,"",$1): this removes all non-digit characters, so $1 ends up with just the number.
To understand it better:
$ awk 'FNR==1{ $1=$1; prefix=substr($0,1,2); } {gsub(/[^0-9]/,"",$1); a[++i]=$1;}
END{
    printf prefix"["; LB=UB=prev=a[1];
    for(i=1; i<=NR; i++)
    {
        if(int(a[i+1])==int(prev+1))
        {
            UB=a[i+1];
            prev=UB;
        }
        else
        {
            if(LB==UB)
            {
                printf LB","
            }
            else
            {
                delim=(i==NR)? "]" :",";
                printf LB "-" UB delim;
            }
            prev=LB=UB=a[i+1];
        }
    }
}' file

Awk solution:
awk '{ v=substr($0,3) }NR==1{ pfx=substr($0,1,2); r=a=v; next }
{ diff=v-a; if(diff>1) { r=r ((a==last)? ",":"-"a",")v; last=v } a=v }
END{ if(diff==1) r=r"-"v; print pfx"["r"]" }' file
The output:
sn[01-03,05,07-08]

Related

how to find out common columns and its records from two files using awk

I have two files:
File 1:
id|name|address|country
1|abc|efg|xyz
2|asd|dfg|uio
File 2(only headers):
id|name|country
Now, I want an output like:
OUTPUT:
id|name|country
1|abc|xyz
2|asd|uio
Basically, I have a user record file (file1) and a header file (file2). Now, I want to extract from file1 only those columns that match the ones in the header file.
I want to do this using awk or bash.
I tried using:
awk 'BEGIN { OFS="..."} FNR==NR { a[(FNR"")] = $0; next } { print a[(FNR"")], $0 > "test.txt"}' header.txt file.txt
and have no idea what to do next.
Thank You
The following awk may help you.
awk -F"|" 'FNR==NR{for(i=1;i<=NF;i++){a[$i]};next} FNR==1 && FNR!=NR{for(j=1;j<=NF;j++){if($j in a){b[++p]=j}}} {for(o=1;o<=p;o++){printf("%s%s",$b[o],o==p?ORS:OFS)}}' OFS="|" File2 File1
Adding a non-one-liner form of the solution too.
awk -F"|" '
FNR==NR{
for(i=1;i<=NF;i++){
a[$i]};
next}
FNR==1 && FNR!=NR{
for(j=1;j<=NF;j++){
if($j in a){ b[++p]=j }}
}
{
for(o=1;o<=p;o++){
printf("%s%s",$b[o],o==p?ORS:OFS)}
}
' OFS="|" File2 File1
Edit by Ed Morton: FWIW here's the same script written with normal indenting/spacing and a couple more meaningful variable names:
BEGIN { FS=OFS="|" }
NR==FNR {
    for (i=1; i<=NF; i++) {
        names[$i]
    }
    next
}
FNR==1 {
    for (i=1; i<=NF; i++) {
        if ($i in names) {
            f[++numFlds] = i
        }
    }
}
{
    for (i=1; i<=numFlds; i++) {
        printf "%s%s", $(f[i]), (i<numFlds ? OFS : ORS)
    }
}
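Assuming the rewritten script is saved as, say, common_cols.awk (name is illustrative), it is run the same way as the one-liner, header file first, and for the sample files should produce:
$ awk -f common_cols.awk File2 File1
id|name|country
1|abc|xyz
2|asd|uio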
With (lots of) unix pipes, as Doug McIlroy intended...
$ function p() { sed 1q "$1" | tr '|' '\n' | cat -n | sort -k2; }
$ cut -d'|' -f"$(join -j2 <(p header) <(p file) | sort -k2n | cut -d' ' -f3 | paste -sd,)" file
id|name|country
1|abc|xyz
2|asd|uio
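In essence, p numbers the fields of a header line and sorts them by field name; join then pairs up the names common to both headers, and the sort | cut | paste tail turns the matching data-file column numbers back into a comma-separated list. For the sample files, the command substitution should evaluate to 1,2,4, so the whole pipeline effectively reduces to:
$ cut -d'|' -f1,2,4 file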
Solution using bash>4:
IFS='|' headers1=($(head -n1 $file1))
IFS='|' headers2=($(head -n1 $file2))
IFS=$'\n'
# find idxes we want to output, ie. mapping of headers1 to headers2
idx=()
for i in $(seq 0 $((${#headers2[@]}-1))); do
    for j in $(seq 0 $((${#headers1[@]}-1))); do
        if [ "${headers2[$i]}" == "${headers1[$j]}" ]; then
            idx+=($j)
            break
        fi
    done
done
# idx=(0 1 3) for example
# simple join output function from https://stackoverflow.com/questions/1527049/join-elements-of-an-array
join_by() { local IFS="$1"; shift; echo "$*"; }
# first line - output headers
join_by '|' "${headers2[@]}"
isfirst=true
while IFS='|' read -a vals; do
    # ignore the first (header) line
    if $isfirst; then
        isfirst=false
        continue
    fi
    # keep only the columns whose indices are in idx
    tmp=()
    for i in "${idx[@]}"; do
        tmp+=("${vals[$i]}")
    done
    # join output with '|'
    join_by '|' "${tmp[@]}"
done < $file1
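A minimal way to try it (illustrative; assumes the snippet is saved as pick_cols.sh, with file1 pointing at the data file and file2 at the header-only file):
$ file1=file.txt file2=header.txt bash pick_cols.sh
id|name|country
1|abc|xyz
2|asd|uio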
This one respects the order of columns in file1; here the column order was changed to demonstrate:
$ cat file1
id|country|name
The awk:
$ awk '
BEGIN { FS=OFS="|" }
NR==1 {                   # file1
    n=split($0,a)
    next
}
NR==2 {                   # file2 header
    for(i=1;i<=NF;i++)
        b[$i]=i
}
{                         # output part
    for(i=1;i<=n;i++)
        printf "%s%s", $b[a[i]], (i==n?ORS:OFS)
}' file1 file2
id|country|name
1|xyz|abc
2|uio|asd
(Another version using cut for the output can be found in this answer's revisions.)
This is similar to RavinderSingh13's solution, in that it first reads the headers from the shorter file, and then decides which columns to keep from the longer file based on the headers on the first line of it.
It however does the output differently. Instead of constructing a string, it shifts the columns to the left if it does not want to include a particular field.
BEGIN { FS = OFS = "|" }
# read headers from first file
NR == FNR { for (i = 1; i <= NF; ++i) header[$i]; next }
# mark fields in second file as "selected" if the header corresponds
# to a header in the first file
FNR == 1 {
    for (i = 1; i <= NF; ++i)
        select[i] = ($i in header)
}
{
    skip = 0
    pos = 1
    for (i = 1; i <= NF; ++i)
        if (!select[i]) {        # we don't want this field
            ++skip
            $pos = $(pos + skip) # shift fields left
        } else
            ++pos
    NF -= skip                   # adjust number of fields
    print
}
Running this:
$ mawk -f script.awk file2 file1
id|name|country
1|abc|xyz
2|asd|uio

AWK: increment a field based on values from previous line

Given the following input for AWK:
10;20;20
8;41;41
15;52;52
How could I increase/decrease the values so that:
$1 = remains unchanged
$2 = $2 of previous line + $1 of previous line + 1
$3 = $3 of previous line + $1 of previous line + 1
So the desired output would be:
10;20;20
8;31;31
15;40;40
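(Taking the second output line as an example: the previous line's $2 is 20 and its $1 is 10, so the new $2 becomes 20 + 10 + 1 = 31, and $3 is computed the same way.)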
I need to auto-increment and loop over the lines,
using associative arrays, but it's confusing for me.
Surely, this doesn't work as desired:
#!/bin/awk -f
BEGIN { FS = ";" }
{
print ln, st, of
ln=$1
st=$2 + ln + 1
of=$3 + ln + 1
}
With awk:
awk -F";" -v OFS=";" 'NR!=1{ $2=a[2]+a[1]+1; $3=a[3]+a[1]+1 } { split($0,a,FS) } 1' file
Split the (already updated) line into an array, so when processing the next line we can use the stored values.
Test:
10;20;20
8;31;31
15;40;40
The following awk may help you.
awk -F";" '
FNR==1{
val=$1;
val1=$2;
val2=$3;
print;
next
}
{
$2=val+val1+1;
$3=val+val2+1;
print;
val=$1;
val1=$2;
val2=$3;
}' OFS=";" Input_file
For your given Input_file, output will be as follows.
10;20;20
8;31;31
15;40;40
awk 'BEGIN{
    FS = OFS = ";"
}
FNR>1{
    $2 = p2 + p1 + 1
    $3 = p3 + p1 + 1
}
{
    p1=$1; p2=$2; p3=$3
}1
' infile
Input:
$ cat infile
10;20;20
8;41;41
15;52;52
Output:
awk 'BEGIN{FS=OFS=";"}FNR>1{$2=p2+p1+1; $3=p3+p1+1 }{p1=$1; p2=$2; p3=$3}1' infile
10;20;20
8;31;31
15;40;40
Or store only the fields of interest:
awk -v myfields="2,3" '
BEGIN{
    FS=OFS=";";
    split(myfields,t,/,/)
}
{
    for(i in t)
    {
        if(FNR>1)
        {
            $(t[i]) = a[t[i]] + a[1] + 1
        }
        a[t[i]] = $(t[i])
    }
    a[1] = $1
}1' infile
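Since the columns to adjust are passed in via myfields, the same program can be reused for a different set of fields; for example (illustrative), running it with -v myfields="3" instead should update only the third field:
10;20;20
8;41;31
15;52;40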

Parsing column data using sed

My input file:
SMOKE_TEST_FIMS,"['a', 'b', 'c']",2015-08-01 14:00:00+0000,100
4.AIQM-B,,2015-04-16 12:04:21+0000,102
000TEST2,['1.034820'],2015-11-19 05:00:00+0000,130
I want to parse the string in such a way that output file will look like below:
Expected output:
'SMOKE_TEST_FIMS', 'a', '2015-08-01 14:00:00+0000','100'
'SMOKE_TEST_FIMS','b', '2015-08-01 14:00:00+0000','100'
'SMOKE_TEST_FIMS','c', '2015-08-01 14:00:00+0000','100'
'4.AIQM-B','','2015-04-16 12:04:21+0000','102'
'000TEST2','1.034820','2015-11-19 05:00:00+0000','130'
I was able to parse the single column data ['a','b','c'] to
'a'
'b'
'c'
sed -i "s/ *\"/'/g;s/ *[^0-9]*\('[^']*'\)\]*'*/\1/g;s/\(.\)''/\1'\n'/g;" updatebomStatement2.cql
If you are OK with a GNU awk solution, here is such a script:
script.awk
BEGIN { FPAT = "(\"[^\"]+\")|(\\[[^\\]]+\\])|([^,]*)"
OFS = ","
}
{ if ( $2~/\[[^\]]+/ ) {
# sanitize input: strip ", [, ]:
gsub(/[\[\]\"]/, "", $2)
# split at "," into parts: and print them
split($2, parts, ",")
for( ind in parts ) {
# further normalize input
gsub(/^ ?'/, "", parts[ind])
gsub(/'$/, "", parts[ind])
tmp=sprintf("'%s','%s','%s','%s'", $1, parts[ind], $3, $4)
print tmp
}
}
else {
tmp=sprintf("'%s','%s','%s','%s'", $1, $2, $3, $4)
print tmp
}
}
Run it like this: awk -f script.awk yourfile.
Imho gnu awk with its FPAT feature and its control statements is much better suited for your requirements than sed.
The first line with the FPAT describes what makes up a field in your input. It is either
something inside double quotes "
something inside brackets [ ... ]
or something separated by commas
The if statement handles the bracket case, which has to be split into several output lines; for the first sample line the four fields would be SMOKE_TEST_FIMS, the quoted bracket list, the timestamp and 100.
sed is for simple substitutions on individual lines, that is all. For anything more interesting you should be using awk:
$ cat tst.awk
BEGIN { FS=",?\"?[][]\"?,?"; OFS="," }
{
    if (split($2,a,/\047/)) {
        for (j=2; j in a; j+=2) {
            $2 = a[j]
            prt()
        }
    }
    else {
        prt()
    }
}
function prt( out) {
    out = "\047" $0 "\047"
    gsub(OFS,"\047,\047",out)
    print out
}
$ awk -f tst.awk file
'SMOKE_TEST_FIMS','a','2015-08-01 14:00:00+0000','100'
'SMOKE_TEST_FIMS','b','2015-08-01 14:00:00+0000','100'
'SMOKE_TEST_FIMS','c','2015-08-01 14:00:00+0000','100'
'4.AIQM-B','','2015-04-16 12:04:21+0000','102'
'000TEST2','1.034820','2015-11-19 05:00:00+0000','130'
or building on karakfa's idea:
$ cat tst.awk
BEGIN { FS="([][ \"\047])*,([][ \"\047])*"; OFS="\047,\047" }
{
for(i=2; i<=(NF-2); i++) {
print "\047" $1, $i, $(NF-1), $NF "\047"
}
}
$ awk -f tst.awk file
'SMOKE_TEST_FIMS','a','2015-08-01 14:00:00+0000','100'
'SMOKE_TEST_FIMS','b','2015-08-01 14:00:00+0000','100'
'SMOKE_TEST_FIMS','c','2015-08-01 14:00:00+0000','100'
'4.AIQM-B','','2015-04-16 12:04:21+0000','102'
'000TEST2','1.034820','2015-11-19 05:00:00+0000','130'
alternative hacky awk
$ awk -F, -v OFS=, -v q="'" '{gsub(/[ "\]\[]/, "");
for(i=2;i <=NF-2; i++)
{$i=$i?$i:q q;
print q $1 q, $i, q $(NF-1) q,q $NF q}}' file
'SMOKE_TEST_FIMS','a','2015-08-0114:00:00+0000','100'
'SMOKE_TEST_FIMS','b','2015-08-0114:00:00+0000','100'
'SMOKE_TEST_FIMS','c','2015-08-0114:00:00+0000','100'
'4.AIQM-B','','2015-04-1612:04:21+0000','102'
'000TEST2','1.034820','2015-11-1905:00:00+0000','130'

How to print a pattern using AWK?

I need to find, in a file, the words that match a regex pattern.
So if a line contains:
00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}
awk -F' ' '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER/ && /GW_CHANNEL/){print $5 " " $i} } }'
It prints only:
register GW_USER=BB
I want to get:
register GW_USER=BB GW_CHANNEL=AA
How to print GW_USER and GW_CHANNEL columns?
Your if condition isn't right; you can use regex alternation:
awk '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER|GW_CHANNEL/) print $5, $i } }' file
There is no need to use -F" " or the explicit " " in print, as a space is the default field (and output) separator.
Your condition:
if($i ~ /GW_USER/ && /GW_CHANNEL/)
will match GW_USER against $i but will match GW_CHANNEL against the whole line, because a bare /regex/ in awk is shorthand for $0 ~ /regex/.
Whenever you have name=value pairs in your input, it's a good idea to create an array that maps the names to the values and then print by name:
$ cat tst.awk
match($0,/{[^}]+/) {
    str = substr($0,RSTART+1,RLENGTH-1)
    split(str,arr,/[ ,=]+/)
    delete n2v
    for (i=1; i in arr; i+=2) {
        n2v[arr[i]] = arr[i+1]
    }
    print $5, fmt("GW_USER"), fmt("GW_CHANNEL")
}
function fmt(name) { return (name "=" n2v[name]) }
$
$ awk -f tst.awk file
register GW_USER=BB GW_CHANNEL=AA
That way you can trivially print, or do anything else you want with, any other field in the future.
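For example (illustrative), to also show the ID field the print line could be extended to
print $5, fmt("GW_USER"), fmt("GW_CHANNEL"), fmt("ID")
which for the sample line should give:
register GW_USER=BB GW_CHANNEL=AA ID=777777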

awk script, need to return results on multi lines following keyword

Using this:
awk '$1 == "pool" { f=1; print $1,$2; next }
f == 1 { if ($1 == "pool") { print }
else if ($1 == "members") { print }
else if ($0 ~ /^}/) { f=0 }
}' bigip.conf
That works fine until the config has the IPs on the following lines.
How can I get it to print the IPs if they are on the following lines?
The config has both: some entries have the IPs on the same line, some on the next 1, 2 or 3 lines.
the data :
pool pl_stage_xxx_microsites_9483 {
    monitor all tcp_half_open
    members {
        11.11.11.11:9483 {}
        11.22.22.22:9483 {
            session user disabled
        }
    }
}
Try the following awk code:
awk '
$1 == "pool" {
    f=1
    print $1,$2
    next
}
f == 1 {
    if ($1 == "pool") {
        print
    }
    else if ($1 == "members") {
        print
        getline
        while ($0 ~ "[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}:[0-9]{1,5}"){
            print
            getline
        }
    }
    else if ($0 ~ /^}/) {
        f=0
    }
}'
That will print the IP lines while they exist.
It's hard to say without seeing more of your data and your expected output but I think all you need is something like this:
awk '
/^}/ { inPool=0 }
$1 == "pool" { inPool=1; inMembers=0 }
inPool {
    if ($1 == "pool") {
        print $1, $2
        print
    }
    else if ($1 == "members") {
        inMembers = 1
    }
    if (inMembers) {
        print
    }
}
' file
The above should be a good starting point at least. Regarding the other answer posted using getline: getline has some appropriate uses but this isn't one of them. Don't use getline until you fully understand and can live with all of its caveats; see http://awk.info/?tip/getline.
