Append 2 column variables in unix - shell

I have a file as follows.
file1.csv
H,2 A:B,pq
D,34 C:B,wq
D,64 F:B,rq
D,6 R:B,tq
I want to format the 2nd column as follows
H,02 0A:0B,pq
D,34 0C:0B,wq
D,64 0F:0B,rq
D,06 0R:0B,tq
I am able to separate the column and format its parts, but I cannot merge them back together.
I use the following commands:
formated_nums =`awk -F"," '{print $2}' file1.csv | awk '{print $1}' | awk '{if(length($1)!=2){$1="0"$1}}1'`
formated_letters = `awk -F"," '{print $2}' file1.csv | awk '{print $2}' | awk -F":" '{if(length($1)!=2){$1="0"$1}; if(length($2)!=2){$2="0"$2}}1'| awk '{print $1":"$2}'`
Now I want to merge formated_nums and formated_letters line by line, with a space in between.
I tried echo "${formated_nums} ${formated_letters}", but it prints all the rows of the first variable followed by all the rows of the second instead of pairing them up row by row.

The simplest approach I found in awk is to split on an extra set of separators (blank, comma and ':') and rebuild the final layout with printf. The only slightly tricky part is the number, which sometimes needs a leading 0, but that is trivial with a format specifier because the numbers never have more than 2 digits (here).
awk -F '[[:blank:],:]' '{printf("%s,%02d 0%s:0%s,%s\n", $1, $2, $3, $4, $5)}' YourFile
This assumes your data always has the same format (the last field is no longer than shown and contains no space or other "separator").

An alternative awk solution, based on GNU awk:
awk -F"[, :]" '{sub($2,sprintf("%02d",$2));sub($3,"0" $3);sub($4,"0" $4)}1' file1
H,02 0A:0B,pq
D,34 0C:0B,wq
D,64 0F:0B,rq
D,06 0R:0B,tq

It sounds like this is what you're really looking for:
$ awk '
BEGIN { FS=OFS=","; p=2 }
{ split($2,t,/[ :]/); for (i in t) {n=length(t[i]); t[i] = (n<p ? sprintf("%0*s",p-n,0) : "") t[i]} $2=t[1]" "t[2]":"t[3] }
1
' file
H,02 0A:0B,pq
D,34 0C:0B,wq
D,64 0F:0B,rq
D,06 0R:0B,tq
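If the goal is only to join the two shell variables line by line, as the question literally asks, paste may be enough. This is a minimal sketch, assuming the variables are assigned without spaces around = and each holds one value per line. The names formatted_nums, formatted_letters and tmpdir are illustrative, and the values are typed in by hand here rather than computed from file1.csv:

```shell
# Hypothetical stand-ins for the two already-formatted columns (one value per line).
formatted_nums=$(printf '%s\n' 02 34 64 06)
formatted_letters=$(printf '%s\n' 0A:0B 0C:0B 0F:0B 0R:0B)

# paste works on files, so write each variable to a temporary file first.
tmpdir=$(mktemp -d)
printf '%s\n' "$formatted_nums" > "$tmpdir/nums"
printf '%s\n' "$formatted_letters" > "$tmpdir/letters"

# -d ' ' joins line N of the first file to line N of the second with one space.
merged=$(paste -d ' ' "$tmpdir/nums" "$tmpdir/letters")
rm -r "$tmpdir"

printf '%s\n' "$merged"
```

This only rebuilds the second column. The single-awk answers above avoid the temporary files and keep the first and third columns attached.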

Related

enclose a string where missing double quotes

I have an input file like the one below. The file is pipe delimited and its fields are optionally enclosed in double quotes. The problem is that the closing quote is missing at the end of the third field, and as far as I can see this happens whenever the field's length exceeds, say, 2.
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2|10301 # 3rd field -> closing " missed out
The output should look like
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301
I was trying with some awk commands but could not achieve it.
awk -F'|' -v q=\" '{$3=$3 q;}1' OFS=| temp
awk -F'|' -v q=\" '{if (length($3) > 2) ($3=$3;}1)}' OFS='|' temp
Using awk you can write,
awk -F'"?\\|' -vOFS='"|' '{print $1, $2, $3, $4}'
Example
awk -F'"?\\|' -vOFS='"|' '{print $1, $2, $3, $4}' input
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301
What does it do?
-F'"?\\|' Sets the input field separator to either "| or |
-vOFS='"|' Sets the output field separator to "|. This is always used on output, whether the input field separator was | or "|
Or you can also write
awk -F'"?\\|' -vOFS='"|' '{$1=$1}1' input
The assignment $1=$1 forces awk to rebuild the record using OFS; with a bare 1 alone, each line would be printed unchanged. The trailing 1 always evaluates to true, in which case awk prints the (rebuilt) line.
See @Kent's comment.
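A quick way to check whether OFS is actually applied is to compare a bare 1 with {$1=$1}1 on a tiny input. This is a minimal sketch with made-up data (a|b) and a made-up output separator (-), chosen only so the difference is visible; the variable names are illustrative:

```shell
# Made-up input and separators, chosen only to make the effect visible.
unchanged=$(printf 'a|b\n' | awk -F'|' -v OFS='-' '1')
rebuilt=$(printf 'a|b\n' | awk -F'|' -v OFS='-' '{$1=$1}1')

printf '%s\n' "$unchanged"   # a|b  (record was never rebuilt, so OFS is not used)
printf '%s\n' "$rebuilt"     # a-b  (assigning to a field rebuilds $0 with OFS)
```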
EDIT
If you want to add the quote only to the third field, based on its length, you can write something like
awk -F'|' -vOFS='|' '{print $1, $2, $3(length($3)>4 ? "\"" : ""), $4}'
This sed one-liner works for the given example:
sed 's/\([^"]\)|"/\1"|"/' file # this only works for the original example
This works for the original and current example:
sed 's/\([^"]\)|/\1"|/' file
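A field-by-field variant is also possible in awk. The following is only a sketch, assuming that a damaged field always starts with a double quote and merely lacks the closing one, and that no field contains a literal | character; the variable names input and fixed are illustrative:

```shell
# Sample input from the question, with the closing quote missing on the last line.
input='"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2|10301'

fixed=$(printf '%s\n' "$input" | awk '
BEGIN { FS = OFS = "|" }
{
    # Append a quote to any field that opens with one but does not end with one.
    for (i = 1; i <= NF; i++)
        if ($i ~ /^"/ && $i !~ /"$/)
            $i = $i "\""
}
1')

printf '%s\n' "$fixed"
```

Unlike a length test on the third field, this does not depend on which column is damaged or how long its value is.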
awk '{sub(/Asdf2/,"Asdf2\"")}1' file
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301

Multiple if statements in awk

I have a file that looks like
01/11/2015;998978000000;4890********3290;5735;ITUNES.COM/BILL;LU;Cross_border_rub;4065;17;915;INSUFF FUNDS;51;0;
There are 13 semicolon separated columns.
I'm trying to calculate a value from column 9 for every line:
awk -F ';' -vOFS=';' '{ gsub(",", ".", $9); print }' file |
awk -F ';' '$0 = NR-1";"$0' |
awk -F ';' -vOFS=';' '{bar[$1]=$1;a[$1]=$2;b[$1]=$3;c[$1]=$4;d[$1]=$5;e[$1]=$6;f[$1]=$7;g[$1]=$8;h[$1]=$9;k[$1]=$10;l[$1]=$11;l[$1]=$12;m[$1]=$13;p[$1]=$14;};
if($7="International") {income=0.0162*h[i]+0.0425*h[i]};
else if($7="Domestic") {income=0.0188*h[i]};
else if($7="Cross_border_rub") {income=0.0162*h[i]+0.025*h[i]}
END{for(i in bar) print income";"a[i],b[i],c[i],d[i],e[i],f[i],g[i],h[i],k[i],l[i],m[i],p[i]}'
How exactly do multiple if statements correctly work in awk?
awk to the rescue!
You don't need multiple awk invocations; you can consolidate them into one:
$ awk -F';' -v OFS=';' '{gsub(",", ".", $9)}
$7=="International" {income=(0.0162+0.0425)*$9}
$7=="Domestic" {income=0.0188*$9}
$7=="Cross_border_rub" {income=(0.0162+0.025)*$9}
# what happens for other values since previous income will be copied over
{print income, NR-1, $0}' file
Test it with your file, since you didn't provide enough sample data to test with.
It is perhaps better to just assign the rate:
$ awk -F';' -v OFS=';' '{gsub(",", ".", $9); rate=0}
$7=="International" {rate=0.0162+0.0425}
$7=="Domestic" {rate=0.0188}
$7=="Cross_border_rub" {rate=0.0162+0.025}
{print rate*$9, NR-1, $0}' file
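On the literal question of how if statements chain in awk: they have to sit inside an action block { ... }, and comparisons use == (a single = is an assignment, as in the question's code). Below is a minimal sketch of an if / else if chain, reusing the rates from the answer above and the single sample line from the question; the variable names line and income are illustrative:

```shell
# The one sample record shown in the question.
line='01/11/2015;998978000000;4890********3290;5735;ITUNES.COM/BILL;LU;Cross_border_rub;4065;17;915;INSUFF FUNDS;51;0;'

income=$(printf '%s\n' "$line" | awk -F';' '{
    # An if / else if chain lives inside the action block; == compares, = would assign.
    if ($7 == "International") {
        rate = 0.0162 + 0.0425
    } else if ($7 == "Domestic") {
        rate = 0.0188
    } else if ($7 == "Cross_border_rub") {
        rate = 0.0162 + 0.025
    } else {
        rate = 0    # any other category earns nothing in this sketch
    }
    print rate * $9
}')

printf '%s\n' "$income"   # 0.7004 for the sample line (0.0412 * 17)
```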

Awk and head not identifying columns properly

Here is the code I want to use to split the 3 columns of hist.txt into 2 separate files: hist1.dat with the first and second columns, and hist2.dat with the first and third columns. The columns in hist.txt may be separated by more than one space. From those, I want to save in histogram1.dat and histogram2.dat the first n lines, up to the last nonzero value.
The script creates histogram1.dat correctly, but histogram2.dat contains all the lines from hist2.dat.
hist.txt looks like this:
http://pastebin.com/JqgSKZrP
#!/bin/bash
sed 's/\t/ /g' hist.txt | awk '{print $1 " " $2;}' > hist1.dat
sed 's/\t/ /g' hist.txt | awk '{print $1 " " $3;}' > hist2.dat
head -n $( awk 'BEGIN {last=1}; {if($2!=0) last=NR};END {print last}' hist1.dat) hist1.dat > histogram1.dat
head -n $( awk 'BEGIN {last=1}; {if($2!=0) last=NR};END {print last}' hist2.dat) hist2.dat > histogram2.dat
What is the cause of this problem? Might it be due to some special restriction with head?
Thanks.
For your first histogram, try
awk '$2 ~ /000000/{exit}{print $1, $2}' hist.txt
and for your second:
awk '$3 ~ /000000/{exit}{print $1, $3}' hist.txt
Hope I understood you correctly...
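If the requirement really is "everything up to the last nonzero value" rather than "stop at the first zero", one option is to read the file twice in a single awk call. The following is only a sketch on made-up data, not the pastebin file. It also strips a trailing carriage return from each line: that is an assumption worth ruling out, since with Windows line endings the stray character would end up in the last column ($3) only and may interfere with the comparison against 0. The function name trim_hist and the variable names are hypothetical:

```shell
# Made-up three-column histogram; lines end in CRLF to mimic a file saved on Windows.
hist=$(mktemp)
printf '%s\r\n' '0  5 1' '1 0   2' '2 3 0' '3 0 0' '4 0 0' > "$hist"

# trim_hist COLUMN FILE: print column 1 and COLUMN up to the last line where COLUMN is nonzero.
trim_hist() {
    awk -v c="$1" '
        { sub(/\r$/, "") }                             # drop a trailing CR, if any
        NR == FNR { if ($c != 0) last = FNR; next }    # pass 1: remember the last nonzero line
        FNR <= last { print $1, $c }                   # pass 2: print up to that line
    ' "$2" "$2"
}

histogram1=$(trim_hist 2 "$hist")
histogram2=$(trim_hist 3 "$hist")
rm -f "$hist"

printf '%s\n' "$histogram1"
printf '%s\n' "$histogram2"
```

Passing the same file name twice lets NR == FNR distinguish the first pass from the second, so no intermediate hist1.dat or hist2.dat is needed.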

Awk multiple pipes to different parts

I have some data where each line looks more or less like this:
11/11/2013 12:10:10,5000,3000,2000
So with normal awk I would get:
$1 = 11/11/2013
$2 = 12:10:10,5000,3000,2000
Now I want to pipe these two elements to different awk commands, because $1 needs to be split on a forward slash and $2 needs to be split on a comma.
However, with
awk '{print $1}' $INPUT | awk -F/ '{print $3 "-" $2 "-" $1}'> $OUTPUT
I only get access to the date, and by then I am already "through the file". How can I pipe multiple times?
You can also use awk with multiple field separators. Consider the code below:
> s='11/11/2013 12:10:10,5000,3000,2000'
> awk -F '[,/: ]+' '{for (i=1; i<=NF; i++) print i":"$i}' <<< "$s"
1:11
2:11
3:2013
4:12
5:10
6:10
7:5000
8:3000
9:2000
You can just call the split function. No need for a whole new awk program:
s='11/11/2013 12:10:10,5000,3000,2000'
awk '{split($1,d,"/"); split($2,t,","); print d[3]"-"d[2]"-"d[1]}' <<<"$s"
You can do whatever you need with the components of t in the same way: t[1] for the first, and so on.
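Putting both splits to use in one pass could look like the following. This is a minimal sketch on the sample line from the question; the output layout (ISO-style date followed by the comma-separated values joined with spaces) is an arbitrary choice for illustration:

```shell
s='11/11/2013 12:10:10,5000,3000,2000'

out=$(printf '%s\n' "$s" | awk '{
    split($1, d, "/")          # d[1]=day, d[2]=month, d[3]=year
    n = split($2, t, ",")      # t[1]=time, t[2..n]=the numbers
    printf "%s-%s-%s", d[3], d[2], d[1]
    for (i = 1; i <= n; i++)
        printf " %s", t[i]
    print ""                   # finish the output line
}')

printf '%s\n' "$out"   # 2013-11-11 12:10:10 5000 3000 2000
```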

awk - split only by first occurrence

I have a line like:
one:two:three:four:five:six seven:eight
and I want to use awk to get $1 to be one and $2 to be two:three:four:five:six seven:eight
I know I can get this by running sed first, that is, by changing the first occurrence of : with sed and then running awk with the new delimiter.
However replacing the delimiter with a new one would not help me since I can not guarantee that the new delimiter will not already be somewhere in the text.
I want to know if there is an option to get awk to behave this way
So something like:
awk -F: '{print $1,$2}'
will print:
one two:three:four:five:six seven:eight
I will also want to do some manipulations on $1 and $2 so I don't want just to substitute the first occurrence of :.
Without any substitutions
echo "one:two:three:four:five" | awk -F: '{ st = index($0,":");print $1 " " substr($0,st+1)}'
The index function finds the first occurrence of ":" in the whole string, so in this case the variable st is set to 4. I then use the substr function to grab the rest of the string starting from position st+1; if no length is supplied, it goes to the end of the string. The output is
one two:three:four:five
If you want to do further processing, you can always store the remainder in a variable:
rem = substr($0,st+1)
Note this was tested on Solaris AWK but I can't see any reason why this shouldn't work on other flavours.
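Since the question also asks to manipulate both parts afterwards, here is a small sketch of the same index/substr idea with each part kept in its own variable. The names head and rest and the uppercase step are illustrative only, and a line without any : is treated as having an empty second part, which is an assumption:

```shell
line='one:two:three:four:five:six seven:eight'

out=$(printf '%s\n' "$line" | awk '{
    st = index($0, ":")                          # position of the first colon, 0 if none
    head = (st ? substr($0, 1, st - 1) : $0)     # text before the first colon
    rest = (st ? substr($0, st + 1) : "")        # everything after it, untouched
    print toupper(head) " | " rest               # example manipulation of the first part
}')

printf '%s\n' "$out"   # ONE | two:three:four:five:six seven:eight
```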
Something like this?
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1'
one two:three:four:five:six
This replaces the first : with a space.
You can then get the two parts into $1 and $2:
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1' | awk '{print $1,$2}'
one two:three:four:five:six
Or in same awk, so even with substitution, you get $1 and $2 the way you like
echo "one:two:three:four:five:six" | awk '{sub(/:/," ");$1=$1;print $1,$2}'
one two:three:four:five:six
EDIT:
Using a different separator, you can get the first part as field $1 and the rest in $2 like this:
echo "one:two:three:four:five:six seven:eight" | awk -F\| '{sub(/:/,"|");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
Unique separator
echo "one:two:three:four:five:six seven:eight" | awk -F"#;#." '{sub(/:/,"#;#.");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
The closest you can get is with GNU awk's FPAT:
$ awk '{print $1}' FPAT='(^[^:]+)|(:.*)' file
one
$ awk '{print $2}' FPAT='(^[^:]+)|(:.*)' file
:two:three:four:five:six seven:eight
$2 will include the leading delimiter, but you can use substr to fix that:
$ awk '{print substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
two:three:four:five:six seven:eight
So putting it all together:
$ awk '{print $1, substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
Storing the results of the substr back in $2 will allow further processing on $2 without the leading delimiter:
$ awk '{$2=substr($2,2); print $1,$2}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
A solution that should work with mawk 1.3.3:
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1}' FS='\0'
one
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $2}' FS='\0'
two:three:four five:six:seven
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1,$2}' FS='\0'
one two:three:four five:six:seven
Just throwing this on here as a solution I came up with where I wanted to split the first two columns on : but keep the rest of the line intact.
Comments inline.
echo "a:b:c:d::e" | \
awk '{
split($0,f,":"); # split $0 into array of fields `f`
sub(/^([^:]+:){2}/,"",$0); # remove first two "fields" from `$0`
print f[1],f[2],$0 # print first two elements of `f` and edited `$0`
}'
Returns:
a b c:d::e
In my input I didn't have to worry about the first two fields containing escaped : characters. If that were a requirement, this solution wouldn't work as expected.
Amended to match the original requirements:
echo "a:b:c:d::e" | \
awk '{
split($0,f,":");
sub(/^([^:]+:)/,"",$0);
print f[1],$0
}'
Returns:
a b:c:d::e

Resources