awk dynamic field list to print - bash

what is the best way to do something like
awk '{ print $1, $3, $5 }'
but in a dynamic way?
The catch is that the last field to print is only known at runtime.
so it might be $1, $3, $5, it might also be $1, $3, $5, $7, $9 or even more
my first attempt is:
awk -v MAX=7 '{for (i = 2; i < MAX; i+=2) {print i,$i} }'
but it prints one field per line:
a[2]
a[4]
a[6]
instead of
a[2] a[4] a[6]
is there a better way to achieve this?
Thanks for all your suggestion. :)
One follow-up question.
at runtime, I already have the sequence available, say MyArray=(2 4 6 8)
is there a way to "pass" this array into awk and ask awk to print $2 $4 $6 $8 ?
so that I can save one for-loop inside awk
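For the follow-up, one possible sketch (not the only way) is to join the bash array into a single string, hand it to awk with -v, and split it once in BEGIN. The variable names cols and want, and the sample input line, are made up for illustration; a loop is still needed inside awk, but it runs over the list itself rather than over a computed range.

```bash
# MyArray comes from the question; the input line below is hypothetical sample data.
MyArray=(2 4 6 8)

# "${MyArray[*]}" joins the elements with spaces into one string,
# which awk splits back into its own array once, in BEGIN.
result=$(printf 'a b c d e f g h i\n' |
  awk -v cols="${MyArray[*]}" '
    BEGIN { n = split(cols, want, " ") }
    {
      line = ""
      for (k = 1; k <= n; k++)
        line = line (k > 1 ? OFS : "") $(want[k])
      print line
    }')
echo "$result"   # b d f h
```

Building the line in a variable and printing it once also avoids the trailing-space issue.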

Using printf omits the newlines but you'll need to use a format string and add a newline yourself:
awk -v MAX=7 '{
for (i = 2; i < MAX; i+=2) {
printf "a[%d]=%s ", i, $i
}
printf "\n"
}'
Note that the above example includes a trailing space after the last a[N]=VAL on each line. If you want to omit it, you can use something like:
awk -v MAX=7 '{
print_num = 0
for (i = 2; i < MAX; i+=2) {
if ( print_num++ > 0 ) { printf " " }
printf "a[%d]=%s", i, $i
}
printf "\n"
}'

With an array A you can do
for i in $(seq 0 2 $(( ${#A[@]} - 1 ))); do
printf "%s " "${A[i]}"
done | sed 's/ $/\n/'

Related

AWK - using element on next record GETLINE?

I got some problem with this basic data:
DP;DG
67;
;10
;14
;14
;18
;18
;22
;65
68;
;0
;9
;25
;25
70;
that I'd like to transform into this kind of output:
DP;DG
67;
;10
;14
;14
;18
;18
;22
;65;x
68;
;0
;9
;25
;25;x
70;
The "x" value is added if, on the next line, $1 exists or $2 is null. From my understanding, I have to use getline, but I can't work out how!
I've tried the following code:
#!/bin/bash
file2=tmp.csv
file3=fin.csv
awk 'BEGIN {FS=OFS=";"}
{
print $0;
getline;
if($2="") {print $0";x"}
else {print $0}
}' $file2 > $file3
It seemed easy, but the result (not shown) is totally different from what I expected.
Any clue? Is getline necessary for this problem?
OK, I kept testing and tried this:
#!/bin/bash
file2=tmp.csv
file3=fin.csv
awk 'BEGIN {FS=OFS=";"}
{
getline var
if (var ~ /.*;$/) {
print $0";x";
print var;
}
else {
print $0;
print var;
}
}' $file2 > $file3
It's somewhat better, but still not all the lines that should be marked are marked, and I don't see why.
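Two things go wrong in the attempts above: in the first, `$2=""` is an assignment rather than a comparison, and in both, each getline consumes an extra record, so the main rule only ever sees every other line. A minimal getline-free sketch, assuming the rule is "mark a DG row when the next row starts a new DP group", is to buffer the previous line and decide when the next one arrives. The names prev and prev_is_dg are made up for illustration, and the input is a shortened version of the sample data.

```bash
result=$(printf '%s\n' 'DP;DG' '67;' ';10' ';65' '68;' ';0' ';25' '70;' |
  awk -F';' '
    # Flush the buffered previous line, marking it when it was a DG row
    # (empty $1) and the current row starts a new DP group (non-empty $1).
    NR > 1 { print prev ((prev_is_dg && $1 != "") ? ";x" : "") }
    { prev = $0; prev_is_dg = ($1 == "") }
    END { if (NR) print prev }   # the last line has no successor
  ')
echo "$result"
```

With this input, only the ;65 and ;25 rows get the ;x suffix.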
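An alternative one-pass version (an empty field is compared as a string, so $1<0 and $2<0 are true only when that field is empty):
$ awk -F\; 'NR>1 {printf "%s\n", (f && $2<0?";x":"")}
{f=$1<0; printf "%s", $0}
END {print ""}' file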
give this one-liner a try:
awk -F';' 'NR==FNR{if($1>0||!$2)a[NR-1];next}FNR in a{$0=$0";x"}7' file file
or
awk -F';' 'NR==FNR{if($1~/\S/||$2).....

Spreading cell values into columns using UNIX

Suppose we have this file:
head file
id,name,value
1,Je,1
2,Je,1
3,Ko,1
4,Ne,1
5,Ne,1
6,Je,1
7,Ko,1
8,Ne,1
9,Ne,1
And I'd like to get this out:
id,Je,Ko,Ne
1,1,0,0
2,1,0,0
3,0,1,0
4,0,0,1
5,0,0,1
6,1,0,0
7,0,1,0
8,0,0,1
9,0,0,1
Does someone know how to get this output, using awk or sed?
Assuming that the possible values of name are only Je or Ko or Ne, you can do:
awk -F, 'BEGIN{print "id,Je,Ko,Ne"}
NR==1{ next }
{je=$2=="Je"?"1":"0";
ko=$2=="Ko"?"1":"0";
ne=$2=="Ne"?"1":"0";
print $1","je","ko","ne}' file
If you want something that will print the values in the same order they are read and not limited to your example fields, you could do:
awk -F, 'BEGIN{OFS=FS; x=1;y=1}
NR==1 { next }
!($2 in oa){ oa[$2]=1; ar[x++]=$2}
{lines[y++]=$0;}
END{
s="";
for (i=1; i<x; i++)
s=s==""?ar[i]:s OFS ar[i];
print "id" OFS s;
for (j=1; j<y; j++){
split(lines[j], a)
s=""
for (i=1; i<x; i++) {
tt=ar[i]==a[2]?"1":"0"
s=s==""?tt:s OFS tt;
}
print a[1] OFS s;
}
}
' file
Here's a "two-pass solution" (along the lines suggested by @Drakosha) implemented using a single invocation of awk. The implementation would be a little simpler if there were no requirement regarding the ordering of names.
awk -F, '
# global: n, array a
function println(ix,name,value, i,line) {
line=ix;
for (i=0;i<n;i++) {
if (a[i]==name) {line=line OFS value} else {line=line OFS 0}
}
print line;
}
BEGIN {OFS=FS; n=0}
FNR==1 {next} # skip the header each time
NR==FNR {if (!mem[$2]) {mem[$2] = a[n++] = $2}; next}
!s { s="id"; for (i=0;i<n;i++) {s=s OFS a[i]}; print s}
{println($1, $2, $3)}
' file file
I suggest 2 passes.
1st will generate all the possible values of column 2 (Je, Ko, Ne, ...).
2nd will be able to trivially generate the output you are looking for.
awk -F, 'BEGIN{s="Je,Ko,Ne";print "id,"s}
NR>1 {m=s; sub($2,1,m); gsub("[^0-9,]+","0",m); print $1","m}' file

Bash: remove words from string containing numbers

In bash, how can I rename a string by deleting all words that contain a number?
name_befor_proc="art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
result:
name_after_proc="art-of-medusa.jpg"
In sed, remove everything between - that contains a number.
sed 's/[^-]*[0-9][^-\.]*-\{0,1\}//g;s/-\././' test
art-of-medusa.jpg
I don't think there is a generic solution, but you can use the following Python script for your particular use case:
name = "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
ext = name.split(".")[1]
def has_no_digit(word):
    # True when the word contains no digit, i.e. the word should be kept
    for i in "0123456789":
        if i in word:
            return False
    return True
# Keep only the dash-separated words without digits, then restore the extension
final = '-'.join([word for word in name.split('-') if has_no_digit(word)])
if ext not in final:
    final += "." + ext
print(final)
output:
art-of-medusa.jpg
It is not trivial!
awk -F"." -v sep="-" '
{n=split($1,a,sep)
for (i=1; i<=n; i++)
{if (a[i] ~ /[0-9]/) delete a[i]}
n=length(a)
for (i in a)
printf "%s%s", a[i], (++c<n?sep:"")
printf "%s%s\n", FS, $2}'
Split the string (up to the dot) and loop through the pieces. If one contains a digit, remove it. Then, rejoin the array and print accordingly.
Test
$ awk -F"." -v sep="-" '{n=split($1,a,sep); for (i=1; i<=n; i++) {if (a[i] ~ /[0-9]/) delete a[i]}; n=length(a); for (i in a) printf "%s%s", a[i], (++c<n?sep:""); printf "%s%s\n", FS, $2}' <<< "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
art-of-medusa.jpg
Testing with "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061-a-23-b.jpg" to make sure other words are also matched:
$ awk -F"." -v sep="-" '{n=split($1,a,sep); for (i=1; i<=n; i++) {if (a[i] ~ /[0-9]/) delete a[i]}; n=length(a); for (i in a) printf "%s%s", a[i], (++c<n?sep:""); printf "%s%s\n", FS, $2}' <<< "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061-a-23-b.jpg"
art-of-medusa-a-b.jpg
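One caveat with the version above: the order in which `for (i in a)` visits elements is not specified by awk, and the counter c is never reset between input lines. A hedged variant that rebuilds the name in index order is sketched below; the function name strip_numbered_words and the array name part are made up for illustration, and a single dot before the extension is assumed.

```bash
strip_numbered_words() {
  # $1: a file name whose dash-separated words may contain digits
  printf '%s\n' "$1" | awk -F'.' -v sep='-' '
    {
      n = split($1, part, sep)
      out = ""
      for (i = 1; i <= n; i++) {
        if (part[i] ~ /[0-9]/) continue        # skip words containing a digit
        out = (out == "" ? part[i] : out sep part[i])
      }
      print out FS $2                          # re-attach the extension
    }'
}

name_after_proc=$(strip_numbered_words "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg")
echo "$name_after_proc"   # art-of-medusa.jpg
```

Because the output string is built inside the loop over 1..n, the surviving words always come out in their original order.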
You can use gnu-awk for this:
s="art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
name_after_proc=$(awk -v RS='[.-]' '!/[[:digit:]]/{printf r $1} {r=RT}' <<< "$s")
echo "$name_after_proc"
art-of-medusa.jpg
Two possible solutions:
Using Sed:
sed 's/[a-zA-Z0-9]*[0-9][a-zA-Z0-9]*/ /g' filename
Using grep:
grep -wo -E [a-zA-Z]+ foo | xargs filename

How to print a pattern using AWK?

I need to find the words in a file that match a regex pattern.
So if a line contains:
00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}
awk -F' ' '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER/ && /GW_CHANNEL/){print $5 " " $i} } }'
It prints only:
register GW_USER=BB
I want to get:
register GW_USER=BB GW_CHANNEL=AA
How to print GW_USER and GW_CHANNEL columns?
Your if condition isn't right; you can use regex alternation instead:
awk '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER|GW_CHANNEL/) print $5, $i } }' file
There is no need to use -F" " and " " in print as that is default field separator.
Your condition:
if($i ~ /GW_USER/ && /GW_CHANNEL/)
This matches GW_USER against $i, but matches GW_CHANNEL against the whole line.
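In other words, the condition parses as `($i ~ /GW_USER/) && ($0 ~ /GW_CHANNEL/)`. If both pairs should end up on one output line in a fixed order, one possible sketch is to remember each pair while scanning the fields and print once per line. The variable names user, chan and f are made up for illustration, and the braces and commas are stripped on the assumption that they are not wanted in the output.

```bash
line='00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}'

result=$(printf '%s\n' "$line" | awk '
  {
    user = chan = ""
    for (i = 1; i <= NF; i++) {
      f = $i
      gsub(/[{},]/, "", f)                 # drop braces and commas around the pair
      if (f ~ /^GW_USER=/)    user = f
      if (f ~ /^GW_CHANNEL=/) chan = f
    }
    if (user != "" && chan != "") print $5, user, chan
  }')
echo "$result"   # register GW_USER=BB GW_CHANNEL=AA
```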
Whenever you have name=value pairs in your input, it's a good idea to create an array that maps the names to the values and then print by name:
$ cat tst.awk
match($0,/{[^}]+/) {
str = substr($0,RSTART+1,RLENGTH-1)
split(str,arr,/[ ,=]+/)
delete n2v
for (i=1; i in arr; i+=2) {
n2v[arr[i]] = arr[i+1]
}
print $5, fmt("GW_USER"), fmt("GW_CHANNEL")
}
function fmt(name) { return (name "=" n2v[name]) }
$
$ awk -f tst.awk file
register GW_USER=BB GW_CHANNEL=AA
That way you can trivially print, or do anything else you want with, any other field in the future.

Transpose CSV data with awk (pivot transformation)

my CSV data looks like this:
Indicator;Country;Value
no_of_people;USA;500
no_of_people;Germany;300
no_of_people;France;200
area_in_km;USA;18
area_in_km;Germany;16
area_in_km;France;17
proportion_males;USA;5.3
proportion_males;Germany;7.9
proportion_males;France;2.4
I want my data to look like this:
Country;no_of_people;area_in_km;proportion_males
USA;500;18;5.3
Germany;300;16;7.9
France;200;17;2.4
There are more Indicators and more countries than listed here.
Pretty large files (number of rows something with 5 digits).
Looked around for some transpose threads, but nothing matched my situation (also I'm quite new to awk, so I couldn't change the code I found to fit my data).
Thanks for your help.
Regards
Ad
If the number of Ind fields is fixed, you can do:
awk 'BEGIN{FS=OFS=";"}
{a[$2,$1]=$3; count[$2]}
END {for (i in count) print i, a[i,"Ind1"], a[i, "Ind2"], a[i, "Ind3"]}' file
Explanation
BEGIN{FS=OFS=";"} set input and output field separator as semicolon.
{a[$2,$1]=$3; count[$2]} get list of countries in count[] array and values of each Ind on a["country","Ind"] array.
END {for (i in count) print i, a[i,"Ind1"], a[i, "Ind2"], a[i, "Ind3"]} print the summary of the values.
Output
$ awk 'BEGIN{FS=OFS=";"} {a[$2,$1]=$3; count[$2]} END {for (i in count) print i, a[i,"Ind1"], a[i, "Ind2"], a[i, "Ind3"]}' file
France;200;17;2.4
Germany;300;16;7.9
USA;500;18;5.3
Update
Unfortunately, the number of Indicators is not fixed. Also, they are not named like "Ind1", "Ind2" etc. but are just strings. I clarified my question.
$ awk -v FS=";" '{a[$2,$1]=$3; count[$2]; indic[$1]} END {for (j in indic) printf "%s ", j; printf "\n"; for (i in count) {printf "%s ", i; for (j in indic) printf "%s ", a[i,j]; printf "\n"}}' file
proportion_males no_of_people area_in_km
France 2.4 200 17
Germany 7.9 300 16
USA 5.3 500 18
To get ;-separated output, replace each space with ; (note that this leaves a trailing ; on every line):
$ awk -v FS=";" '{a[$2,$1]=$3; count[$2]; indic[$1]} END {for (j in indic) printf "%s ", j; printf "\n"; for (i in count) {printf "%s ", i; for (j in indic) printf "%s ", a[i,j]; printf "\n"}}' file | tr ' ' ';'
proportion_males;no_of_people;area_in_km;
France;2.4;200;17;
Germany;7.9;300;16;
USA;5.3;500;18;
Using awk and maintaining the order of output:
awk -F\; '
NR>1 {
if(!($1 in indicators)) { indicator[++types] = $1 }; indicators[$1]++
if(!($2 in countries)) { country[++num] = $2 }; countries[$2]++
map[$1,$2] = $3
}
END {
printf "%s;" ,"Country";
for(ind=1; ind<=types; ind++) {
printf "%s%s", sep, indicator[ind];
sep = ";"
}
print "";
for(coun=1; coun<=num; coun++) {
printf "%s", country[coun]
for(val=1; val<=types; val++) {
printf "%s%s", sep, map[indicator[val], country[coun]];
}
print ""
}
}' file
Country;no_of_people;area_in_km;proportion_males
USA;500;18;5.3
Germany;300;16;7.9
France;200;17;2.4

Resources