Custom field and row separator in AWK (bash)

I have data in this order:
Person:Joe
Age:24
City:PH
---
Person:Joe
Age:22
City:NY
And I want to get the data in this format:
Joe|24|PH
Joe|22|NY
I tried a custom RS and OFS, but I couldn't get it to work properly.

$ awk -v RS= -F'[:[:space:]]+' -v OFS='|' '{print $2, $4, $6}' file
Joe|24|PH
Joe|22|NY
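The command above relies on RS= (an empty record separator), which puts awk in paragraph mode: records are separated by one or more blank lines. If the records really are separated by a --- line, as in the sample data, paragraph mode treats the whole file as a single record. A minimal sketch for that layout, assuming every record line is key:value and a record ends at a line that is exactly ---; the function name records_to_pipe is hypothetical:

```shell
# Hypothetical helper: records_to_pipe [FILE...]
# Joins the value part of each "key:value" line with "|", one output line per
# record; a line consisting only of "---" ends the current record.
records_to_pipe() {
  awk -F: -v OFS='|' '
    /^---$/ { if (n) print rec; rec = ""; n = 0; next }  # separator: flush record
    { rec = (n++ ? rec OFS : "") $2 }                     # append the value after the colon
    END { if (n) print rec }                              # last record has no trailing ---
  ' "$@"
}

printf 'Person:Joe\nAge:24\nCity:PH\n---\nPerson:Joe\nAge:22\nCity:NY\n' | records_to_pipe
```

Unlike the $2, $4, $6 version, this does not depend on each record having exactly three lines.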

Related

Is it possible to change column header and filter a column in one command?

I'm using awk to filter interesting lines in a large text file before reading it into statistical software.
Here is some dummy data:
printf 'VEGETABLE_NAME,RECIPE_NAME,OBSCURE_CODE\ncarrot,cake,1\ncarrot,soup,1\npotato,cake,2\nspinach,soup,1' > dummydata.dat
I have managed to:
Change the column header:
$ awk -F, 'NR==1 {$0="vegetable,recipe,code"} 1' dummydata.dat
vegetable,recipe,code
carrot,cake,1
carrot,soup,1
potato,cake,2
spinach,soup,1
Filter for product code 1:
$ awk -F, '$3 ~ /^1/' dummydata.dat
carrot,cake,1
carrot,soup,1
spinach,soup,1
But when I try to combine both commands, the result doesn't include the column header:
$ awk -F, 'NR==1 {$0="vegetable,recipe,code"} $3 ~ /^1/' dummydata.dat
carrot,cake,1
carrot,soup,1
spinach,soup,1
In your approach, you don't get the column header because awk prints a line only when the condition
$3 ~ /^1/
evaluates to true (non-zero); for the header line it evaluates to false (0).
Here is my attempt:
awk -v FS="," 'BEGIN{print "vegetable,recipe,code"} NR>1 && $3==1' dummydata.dat
vegetable,recipe,code
carrot,cake,1
carrot,soup,1
spinach,soup,1
You are setting $0 for NR==1, but that record never gets printed anywhere.
A small change to your script fixes it:
awk -F, 'NR==1{print "vegetable,recipe,code"} $3 ~ /^1$/' dummydata.dat
vegetable,recipe,code
carrot,cake,1
carrot,soup,1
spinach,soup,1
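Both fixes come down to the same point: a rule with an action prints only if the action says so, so the rewritten header has to be printed explicitly. A minimal sketch that keeps the original idea of rewriting $0 on line 1, prints it, and uses next so the filter never sees the header; the function name relabel_and_filter is hypothetical:

```shell
# Hypothetical helper: relabel_and_filter [FILE...]
relabel_and_filter() {
  awk -F, '
    NR == 1 { $0 = "vegetable,recipe,code"; print; next }  # replace and print the header, skip the filter
    $3 == 1                                                # no action: print rows whose code is 1
  ' "$@"
}

printf 'VEGETABLE_NAME,RECIPE_NAME,OBSCURE_CODE\ncarrot,cake,1\ncarrot,soup,1\npotato,cake,2\nspinach,soup,1' |
  relabel_and_filter
```

If several files are passed, NR==1 only matches the first line of the first file; FNR==1 would relabel each file's header instead.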

Append 2 column variables in unix

I have a file as follows:
file1.csv
H,2 A:B,pq
D,34 C:B,wq
D,64 F:B,rq
D,6 R:B,tq
I want to format the 2nd column as follows:
H,02 0A:0B,pq
D,34 0C:0B,wq
D,64 0F:0B,rq
D,06 0R:0B,tq
I am able to split out the column and format it, but I cannot merge the pieces back together.
I use the following commands:
formated_nums=`awk -F"," '{print $2}' file1.csv | awk '{print $1}' | awk '{if(length($1)!=2){$1="0"$1}}1'`
formated_letters=`awk -F"," '{print $2}' file1.csv | awk '{print $2}' | awk -F":" '{if(length($1)!=2){$1="0"$1}; if(length($2)!=2){$2="0"$2}}1' | awk '{print $1":"$2}'`
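Now I want to merge formated_nums and formated_letters line by line, with a space in between.
I tried echo "${formated_nums} ${formated_letters}", but since each variable holds several lines, the result is the whole first list followed by the whole second list instead of the two lists joined line by line.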
The simplest approach I found in awk is to split on space and ':' as well as ',' and rebuild the line in the final layout. The only tricky part is the number, which sometimes needs a leading 0, but %02d handles that trivially because the numbers never have more than 2 digits (here).
awk -F '[[:blank:],:]' '{printf("%s,%02d 0%s:0%s,%s\n", $1, $2, $3, $4, $5)}' YourFile
This assumes your data always has this layout (in particular, the last field contains no space, comma, or colon).
An alternative awk solution based on GNU awk:
awk -F"[, :]" '{sub($2,sprintf("%02d",$2));sub($3,"0" $3);sub($4,"0" $4)}1' file1.csv
H,02 0A:0B,pq
D,34 0C:0B,wq
D,64 0F:0B,rq
D,06 0R:0B,tq
It sounds like this is what you're really looking for:
$ awk '
BEGIN { FS=OFS=","; p=2 }
{
    split($2,t,/[ :]/)
    for (i in t) { n=length(t[i]); t[i] = (n<p ? sprintf("%0*s",p-n,0) : "") t[i] }
    $2 = t[1] " " t[2] ":" t[3]
}
1
' file
H,02 0A:0B,pq
D,34 0C:0B,wq
D,64 0F:0B,rq
D,06 0R:0B,tq
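For the narrower question of joining the two multi-line variables line by line: echo only concatenates the strings, while paste -d' ' pairs up corresponding lines. A minimal POSIX-sh sketch, assuming mktemp is available (GNU and BSD systems provide it); join_lines is a hypothetical name:

```shell
# Hypothetical helper: join_lines FIRST SECOND
# Prints line i of FIRST, a space, then line i of SECOND, for every line.
join_lines() {
  tmp=$(mktemp) || return 1
  printf '%s\n' "$2" > "$tmp"                  # paste needs the second list as a file
  printf '%s\n' "$1" | paste -d' ' - "$tmp"    # "-" makes paste read the first list from stdin
  status=$?
  rm -f "$tmp"
  return "$status"
}

formated_nums='02
34
64
06'
formated_letters='0A:0B
0C:0B
0F:0B
0R:0B'
join_lines "$formated_nums" "$formated_letters"
```

In bash, process substitution (paste -d' ' <(...) <(...)) avoids the temporary file, but it is not available in plain sh.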

Convert time from dd/mm/yyyy hh:mm:ss to unix timestamp in bash script

I have browsed through the similar threads and they helped me come closest to what I want to do but didn't fully answer my question.
I have a date in the format dd/mm/yyyy hh:mm:ss ($mydate = 26/12/2013 09:42:42) that I want to convert to a Unix timestamp with the command:
date -d "$mydate" +%s
But the format accepted here is yyyy-mm-dd hh:mm:ss.
So I did this transformation:
echo $mydate| awk -F' ' '{printf $1}'| awk -F/ '{printf "%s-%s-%s\n",$3,$2,$1}'
And got this output:
2013-12-26
Which is good. Now I try to append the hour part before doing the conversion:
echo $mydate| awk -F' ' '{printf $1; $hour=$2}'| awk -F/ '{printf "%s-%s-%s %s\n",$3,$2,$1,$hour}'
But then I have this:
2013-12-26 26/12/2013
It seems the variable $hour is not kept.
I am new to awk; how can I do this?
In awk you can use a regex as a field separator. In your case, instead of using awk twice, you may want to do the following:
echo "$mydate" | awk -F' |/' '{printf "%s-%s-%s %s\n",$3,$2,$1,$4}'
This uses both space and / as separators: the first 3 fields are the date parts, and the 4th is the time, which follows the space.
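The problem is that you "lose" the time block when the first awk outputs only $1: each awk is a separate process, so the second one never sees it. Also, $hour in awk is not a variable named hour; it is the field whose number is the value of hour. Since hour is never set, it is 0, so in the second awk $hour is $0, the whole input line 26/12/2013, which is exactly what appeared in your output.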
Instead, you can use a single awk command like:
awk -F'[: /]' '{printf "%d-%d-%d %d:%d:%d", $3, $2, $1, $4, $5, $6}'
This splits the record on any of :, / or space and then reassembles the pieces in the desired format.
Test:
$ echo "26/12/2013 09:42:42" | awk -F'[: /]' '{printf "%d-%d-%d %d:%d:%d", $3, $2, $1, $4, $5, $6}'
2013-12-26 9:42:42
Then store it in a variable and pass it to date -d: formatted_date=$(awk '...'); date -d "$formatted_date" +%s.
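Putting the pieces together, here is a minimal sketch assuming GNU date (its -d option is not portable to BSD or macOS date) and input exactly in the form dd/mm/yyyy hh:mm:ss. It prints the fields with %s rather than %d so leading zeros survive; to_iso and to_epoch are hypothetical names:

```shell
# Hypothetical helpers; assume GNU date and input "dd/mm/yyyy hh:mm:ss".
to_iso() {
  # Split on "/" or a space: $1=dd $2=mm $3=yyyy $4=hh:mm:ss
  printf '%s\n' "$1" | awk -F'[ /]' '{ printf "%s-%s-%s %s\n", $3, $2, $1, $4 }'
}

to_epoch() {
  # GNU date interprets the string in the local time zone (TZ)
  date -d "$(to_iso "$1")" +%s
}

mydate='26/12/2013 09:42:42'
to_iso "$mydate"      # prints 2013-12-26 09:42:42
to_epoch "$mydate"    # prints the Unix timestamp for that local time
```

Because the timestamp depends on the time zone, set TZ explicitly (for example TZ=UTC) when the input is not in local time.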

Enclose a string that is missing its closing double quote

I have an input file like the one below. The file is pipe-delimited, and fields are optionally enclosed in double quotes. In the third field the closing quote is sometimes missing, and it seems to happen whenever the length exceeds, say, 2.
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2|10301 # 3rd field -> closing " missed out
The output should look like:
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301
I tried some awk commands but could not get it to work:
awk -F'|' -v q=\" '{$3=$3 q;}1' OFS=| temp
awk -F'|' -v q=\" '{if (length($3) > 2) ($3=$3;}1)}' OFS='|' temp
Using awk, you can write:
awk -F'"?\\|' -vOFS='"|' '{print $1, $2, $3, $4}'
Example
awk -F'"?\\|' -vOFS='"|' '{print $1, $2, $3, $4}' input
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301
What does it do?
-F'"?\\|' Sets the input field separator to either "| or |
-vOFS='"|' Sets the output field separator to "|. It is always used, whether the input separator was | or "|.
Or you can also write
awk -F'"?\\|' -vOFS='"|' '{$1=$1}1' input
Here 1 always evaluates to true, so awk prints the line. The $1=$1 assignment is what makes this work: awk rebuilds a record with OFS only after a field is modified, so a bare '1' would print every line unchanged. See Kent's comment.
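Kent's point can be seen directly: OFS is used only when awk rebuilds the record, which happens when a field is assigned. A minimal sketch comparing the two forms on one line of the sample input (the function names are hypothetical):

```shell
# Hypothetical helpers contrasting a plain print with a rebuilt record.
print_as_is() {
  awk -F'"?\\|' -v OFS='"|' '1'              # record never modified: OFS is not used
}
print_rebuilt() {
  awk -F'"?\\|' -v OFS='"|' '{ $1 = $1 } 1'  # assigning a field rebuilds $0 with OFS
}

line='"SER1930"|"QWE"|"Asdf2|10301'
printf '%s\n' "$line" | print_as_is      # "SER1930"|"QWE"|"Asdf2|10301
printf '%s\n' "$line" | print_rebuilt    # "SER1930"|"QWE"|"Asdf2"|10301
```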
EDIT
If you want to add the closing quote only to the third field, based on its length, you can write something like:
awk -F'|' -vOFS='|' '{print $1, $2, $3(length($3)>4 ? "\"" : ""), $4}'
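The length check works for this sample, but it depends on the field width (a properly quoted value longer than 4 characters would get an extra quote). A minimal alternative sketch keys on the quoting itself: it appends a closing quote only when the third field starts with a double quote and does not end with one. close_third_quote is a hypothetical name:

```shell
# Hypothetical helper: close_third_quote [FILE...]
close_third_quote() {
  awk -F'|' -v OFS='|' '
    $3 ~ /^"/ && $3 !~ /"$/ { $3 = $3 "\"" }  # opened but never closed: add the quote
    1                                         # print every line (rebuilt only if changed)
  ' "$@"
}

printf '%s\n' '"SER1930"|"QWE"|"A2"|10301' '"SER1930"|"QWE"|"Asdf2|10301' | close_third_quote
```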
This sed one-liner works for the given example:
sed 's/\([^"]\)|"/\1"|"/' file # this only works for the original example
This works for both the original and the current example:
sed 's/\([^"]\)|/\1"|/' file
awk '{sub(/Asdf2/,"Asdf2\"")}1' file
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301

Awk input and output file delimiter

I am trying to parse the colon-delimited password file with awk, putting the hostname at the beginning and adding some fields. I need comma-separated output. Here is what I tried:
/usr/xpg4/bin/awk -F':' MYHOST=$(hostname) 'BEGIN{OFS=",";} {print MYHOST, $1, $3, $4, $5;}' /etc/passwd
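But this command didn't produce the output I wanted. This is a Solaris box; the regular awk didn't work, so I tried /usr/xpg4/bin/awk.
This may help you. awk takes the first argument that is not an option as the program text, so in your command MYHOST=$(hostname) was treated as the program (and the real program as an input file name). Pass the value with -v instead, before the program: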
/usr/xpg4/bin/awk -F':' -v MYHOST="$(hostname)" 'BEGIN{OFS=","} {print MYHOST, $1, $3, $4, $5;}' /etc/passwd
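The same fix can be written with both values passed through -v, which removes the need for the BEGIN block. A minimal sketch using only POSIX awk features (on Solaris, call /usr/xpg4/bin/awk instead of awk); it is fed a sample passwd-style line so it can be tried anywhere, and passwd_csv, myhost, and the sample line are hypothetical:

```shell
# Hypothetical helper: passwd_csv HOST [FILE...]
# Prints HOST,user,uid,gid,gecos for each colon-separated passwd line.
passwd_csv() {
  host=$1
  shift
  awk -F: -v OFS=, -v host="$host" '{ print host, $1, $3, $4, $5 }' "$@"
}

printf '%s\n' 'root:x:0:0:Super-User:/root:/bin/sh' | passwd_csv myhost
# On a real system: passwd_csv "$(hostname)" /etc/passwd
```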
