How to convert a date with awk on macOS

My file temp.txt:
ID53,20150918,2015-09-19,,0,CENTER
ID54,20150911,2015-09-14,,0,CENTER
ID55,20150911,2015-09-14,,0,CENTER
I need to convert the 2nd field (yyyymmdd) in place into seconds since the epoch.
I tried this, but only the first line is replaced:
awk -F"," '{ ("date -j -f ""%Y%m%d"" ""20150918"" ""+%s""") | getline $2; print }' OFS="," temp.txt
and I also tried this:
awk -F"," '{system("date -j -f ""%Y%m%d"" "$2" ""+%s""") | getline $2; print }' temp.txt
the output is:
1442619474
sh: 0: command not found
ID53,20150918,2015-09-19,,0,CENTER
1442014674
ID54,20150911,2015-09-14,,0,CENTER
1442014674
ID55,20150911,2015-09-14,,0,CENTER
Using gsub did not work either:
awk -F"," '{gsub($2,"system("date -j -f ""%Y%m%d"" "$2" ""+%s""")",$2); print}' OFS="," temp.txt
awk: syntax error at source line 1
context is
{gsub($2,"system("date -j -f ""%Y%m%d"" "$2" >>> ""+% <<< s""")",$2); print}
awk: illegal statement at source line 1
extra )
I need the output to look like this. How can I do it?
ID53,1442619376,2015-09-19,,0,CENTER
ID54,1442014576,2015-09-14,,0,CENTER
ID55,1442014576,2015-09-14,,0,CENTER

This GNU awk script should do it. If it is not yet installed on your Mac, I suggest installing MacPorts and then GNU awk. You can also install decent versions of bash, date and other important utilities, for which the defaults are really disappointing on OS X.
BEGIN { FS = ","; OFS = FS; }                # read and write comma-separated fields
{
    y = substr($2, 1, 4);                    # year:  characters 1-4 of yyyymmdd
    m = substr($2, 5, 2);                    # month: characters 5-6
    d = substr($2, 7, 2);                    # day:   characters 7-8
    $2 = mktime(y " " m " " d " 00 00 00");  # midnight, local timezone
    print;
}
Put it in a file (e.g. txt2ts.awk) and process your file with:
$ awk -f txt2ts.awk data.txt
ID53,1442527200,2015-09-19,,0,CENTER
ID54,1441922400,2015-09-14,,0,CENTER
ID55,1441922400,2015-09-14,,0,CENTER
Note that we do not get exactly the same timestamps as in your expected output. I will let you work out where the difference comes from; it is another problem.
Explanations: substr(s, m, n) returns the n-character substring of s that starts at position m (counting from 1). mktime("YYYY MM DD HH MM SS") converts the date string into a timestamp (seconds since the epoch). FS and OFS are the input and output field separators, respectively. The commands between the curly braces of the BEGIN pattern are executed at the beginning only, while the others are executed on each line of the file.
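If the difference turns out to be a timezone mismatch, one way to pin mktime() to UTC midnight (an assumption on my part, not something the original answer does) is to override TZ for the run:
$ TZ=UTC0 awk -f txt2ts.awk data.txt
Also, macOS date -j -f "%Y%m%d" appears to keep the current time of day for fields the format does not set (which would explain the asker's expected values), whereas the script above forces 00:00:00, so the two approaches will rarely agree to the second.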

You could use substr:
printf "%s-%s-%s", substr($6,1,4), substr($6,5,2), substr($6,7,2)
Assuming that the 6th field was 20150914, this would produce 2015-09-14
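Adapted to this question's file, where the date lives in the 2nd field, a minimal sketch (it prints a dashed date, not the epoch seconds the question ultimately asks for) would be:
$ awk -F',' -v OFS=',' '{ $2 = substr($2,1,4) "-" substr($2,5,2) "-" substr($2,7,2) } 1' temp.txt
ID53,2015-09-18,2015-09-19,,0,CENTER
ID54,2015-09-11,2015-09-14,,0,CENTER
ID55,2015-09-11,2015-09-14,,0,CENTER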

Related

Using sed command in shell script for substring and replace position to need

I'm processing data in a text file and I can't find a way with sed to select a substring at a fixed position and replace it.
This is what I have:
X|001200000000000000000098765432|1234567890|TQ
This is what I need:
'X','00000098765432','1234567890','TQ'
The following sed code gives me the substring I need (00000098765432), but it does not put it back in the position I need:
echo " X|001200000000000000000098765432|1234567890|TQ" | sed "s/ *//g;s/|/','/g;s/^/'/;s/$/'/"
Could you help me?
Rather than sed, I would use awk for this.
echo "X|001200000000000000000098765432|1234567890|TQ" | awk 'BEGIN {FS="|";OFS=","} {print $1,substr($2,17,14),$3,$4}'
Gives output:
X,00000098765432,1234567890,TQ
Here is how it works:
FS = Field separator (in the input)
OFS = Output field separator (the way you want output to be delimited)
BEGIN -> think of it as the place where configurations are set. It runs only one time. So you are saying you want output to be comma delimited and input is pipe delimited.
substr($2,17,14) -> Take $2 (the second field; awk counts fields from 1) and apply substr to it: 17 is the starting character position and 14 is the number of characters from that position onwards.
In my opinion, this is much more readable and maintainable than the sed version you have.
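If you want to convince yourself of what substr() picks out here, a quick standalone check (illustrative only):
$ awk 'BEGIN { print substr("001200000000000000000098765432", 17, 14) }'
00000098765432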
If you want to put the quotes in, I'd still use awk.
$: awk -F'|' 'BEGIN{q="\047"} {print q $1 q","q substr($2,17,14) q","q $3 q","q $4 q"\n"}' <<< "X|001200000000000000000098765432|1234567890|TQ"
'X','00000098765432','1234567890','TQ'
If you just want to use sed, note that you say above you want to remove 16 characters, but you are actually only removing 14.
$: sed -E "s/^(.)[|].{14}([^|]+)[|]([^|]+)[|]([^|]+)/'\1','\2','\3','\4'/" <<< "X|0012000000000000000098765432|1234567890|TQ"
'X','00000098765432','1234567890','TQ'
Using sed
$ sed "s/|\(0[0-9]\{15\}\)\?/','/g;s/^\|$/'/g" input_file
'X','00000098765432','1234567890','TQ'
Using any POSIX awk:
$ echo 'X|001200000000000000000098765432|1234567890|TQ' |
awk -F'|' -v OFS="','" -v q="'" '{sub(/.{16}/,"",$2); print q $0 q}'
'X','00000098765432','1234567890','TQ'
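The compact part of that one-liner is the OFS trick: any assignment to a field forces awk to rebuild $0 with OFS between the fields, so setting OFS to ',' (quote, comma, quote) does most of the quoting. A tiny illustration of just the rebuild behaviour (my example, not from the answer):
$ echo 'a|b|c' | awk -F'|' -v OFS="','" -v q="'" '{ $1 = $1; print q $0 q }'
'a','b','c'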
Not as elegant as I hoped for, but it gets the job done:
'X','00000098765432','1234567890','TQ'
# gawk profile, created Mon May 9 21:19:17 2022
# BEGIN rule(s)
'BEGIN {
1 _ = sprintf("%*s", (__ = +2)^++__+--__*++__,__--)
1 gsub(".", "[0-9]", _)
1 sub("$", "$", _)
1 FS = "[|]"
1 OFS = "\47,\47"
}
# Rule(s)
1 (NF *= NF == __*__) * sub(_, "|&", $__) * \
sub("^.*[|]", "", $__) * sub(".+", "\47&\47") }'
Tested and confirmed working on GNU gawk 5.1.1, mawk 1.3.4, mawk 1.9.9.6, and macOS nawk.
awk -v del1="\047" \
-v del2="," \
-v start="3" \
-v len="17" \
'{
gsub(substr($0,start+1,len),"");
gsub(/[\|]/,del1 del2 del1);
print del1 $0 del1
}' input_file
'X',00000098765432','1234567890','TQ'

sed unterminated 's' command modify line of file

I'm trying to modify a groups.tsv file (I'm on repl.it so path to file is fine).
Each line in the file looks like this:
groupname \t amountofpeople \t lastadded
and I'm trying to count the occurrences of both the group name ($nomgrp) and a login ($login), and change lastadded to the login.
varcol2=$(grep "$nomgrp" groups | cut "-d " -f2- | awk -F"\t" '{print $2}' )
((varcol21=varcol2+1));
varcol3=$(awk -F"\t" '{print $3}' groups)
sed -i "s|${nomgrp}\t${varcol2}\t$varcol3|${nomgrp}\t${varcol21}\t${login}|" groups
However, I'm getting the error message:
sed: -e expression #1, char 27: unterminated `s' command
The groups file has lines such as " sudo 2 user1" (delimited with a tab): a user inputs "user" which is stored in $login, then "sudo" which is stored in $nomgrp.
What am I doing wrong?
Sorry if this has been answered/super easy to fix, I'm quite the newbie here...
The error itself comes from $varcol3: it holds the third field of every line in the file (a multi-line string), so the embedded newlines split the s command apart and sed reports it as unterminated. Rather than patching that, if I understand what you are trying to do correctly and if you have GNU awk, you could do
gawk -i inplace -F '\t' -v group="$nomgrp" -v login="$login" -v OFS='\t' '$1 == group { $2 = $2 + 1; $3 = login; } { print }' groups.tsv
Example:
$ cat groups.tsv
wheel 1000 2019-12-10
staff 1234 2019-12-11
users 9001 2019-12-12
$ gawk -i inplace -F '\t' -v group=wheel -v login=2019-12-12 -v OFS='\t' '$1 == group { $2 = $2 + 1; $3 = login; } 1' groups.tsv
$ cat groups.tsv
wheel 1001 2019-12-12
staff 1234 2019-12-11
users 9001 2019-12-12
This works as follows:
-i inplace is a GNU awk extension that allows you to change a file in place,
-F '\t' sets the input field separator to a tab so that the input is interpreted as TSV and fields with spaces in them are not split apart,
-v variable=name sets an awk variable for use in awk's code,
specifically, -v OFS='\t' sets the output field separator variable to a tab, so that the output is again a TSV
So we set variables group, login to your shell variables and ensure that awk outputs a TSV. The code then works as follows:
$1 == group { # If the first field in a line is equal to the group variable
$2 = $2 + 1; # add 1 to the second field
$3 = login; # and overwrite the third with the login variable
}
{ # in all lines:
print # print
}
{ print } could also be abbreviated as 1, as someone will surely point out, but I find this way easier to explain.
If you do not have GNU awk, you could achieve the same with a temporary file, e.g.
awk -F '\t' -v group="$nomgrp" -v login="$login" -v OFS='\t' '$1 == group { $2 = $2 + 1; $3 = login; } { print }' groups.tsv > groups.tsv.new
mv groups.tsv.new groups.tsv

Add a timestamp column to a CSV based on other columns (using bash)

I need to read a CSV file (list.csv) like this:
0;John Doe;2001;03;24
1;Jane Doe;1985;12;05
2;Mr. White;2018;06;01
3;Jake White;2017;11;20
...
and add a column (doesn't matter where I put it) with a Unix timestamp based on the year/month/day being in column 3, 4 and 5, to get this:
0;John Doe;2001;03;24;985392000
1;Jane Doe;1985;12;05;502588800
2;Mr. White;2018;06;01;1527811200
3;Jake White;2017;11;20;1511136000
...
So I wrote this script.sh:
#!/bin/sh
while read line
do
    printf "$line;"
    date -d $(awk -F\; '{print $3$4$5}' <<<$line) +%s
done
and I ran:
<list.csv ./script.sh
and it works, but it's very slow on very large CSVs.
Is there a way to do it faster in a sed/awk command line?
I mean, can I (for instance) inject a bash command into a sed/awk line?
For example (I know this won't work, it's just an example):
awk -F\; '{print $1 ";" $2 ";" $3 ";" $4 ";" $5 ";" $(date -d $3$4$5 +%s)}'
GNU awk to the rescue!
$ gawk -F';' '{$0=$0 FS mktime($3" "$4" "$5" 00 00 00")}1' file
0;John Doe;2001;03;24;985410000
1;Jane Doe;1985;12;05;502606800
2;Mr. White;2018;06;01;1527825600
3;Jake White;2017;11;20;1511154000
Not sure what hour/min/sec you want as the default; this uses midnight, local time.
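The question's expected values (985392000 for 2001-03-24) are midnight UTC, while mktime() uses the local timezone; if UTC is what you want, overriding TZ should line the outputs up (my assumption, not part of the original answer):
$ TZ=UTC0 gawk -F';' '{$0=$0 FS mktime($3" "$4" "$5" 00 00 00")}1' file
0;John Doe;2001;03;24;985392000
1;Jane Doe;1985;12;05;502588800
2;Mr. White;2018;06;01;1527811200
3;Jake White;2017;11;20;1511136000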
For other awks without builtin time functions (note that this still forks a date process per line, so don't expect it to be dramatically faster than the shell loop):
awk -F';' '{
cmd = "date -d "$3 $4 $5" +%s"
cmd | getline time
close(cmd)
$0 = $0 FS time
print
}' file
or perl
perl -MTime::Piece -F';' -lane '
print join ";", @F, Time::Piece->strptime("@F[2..4]", "%Y %m %d")->epoch
' file
# or
perl -MTime::Local -F';' -lane '
print join ";", @F, timelocal(0, 0, 0, $F[4], $F[3]-1, $F[2]-1900)
' file

creating a ":" delimited list in bash script using awk

I have the following lines:
380:<CHECKSUM_VALIDATION>
393:</CHECKSUM_VALIDATION>
437:<CHECKSUM_VALIDATION>
441:</CHECKSUM_VALIDATION>
I need to format them as below:
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441
Is it possible to achieve the above output using awk? [I'm using bash]
Thank you!
Here you go:
awk -F '[:<>/]+' '{ n = $1; getline; print $2 ":" n ":" $1 }'
Explanation:
Set the field separator with -F to be a sequence of a mix of :<>/ characters; this way the first field will be the number, and the second will be CHECKSUM_VALIDATION
Save the first field in variable n and read the next line (which would overwrite $1)
Print the line: a combination of the number from the previous line, and the fields on the current line
Another approach without using getline:
awk -F '[:<>/]+' 'NR % 2 { n = $1 } NR % 2 == 0 { print $2 ":" n ":" $1 }'
This one uses the record counter NR to determine whether it's time to print: if NR is odd, it saves the first field in n; if NR is even, it prints.
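For example, with the four sample lines in a file (call it file.txt), this variant prints (a quick check):
$ awk -F '[:<>/]+' 'NR % 2 { n = $1 } NR % 2 == 0 { print $2 ":" n ":" $1 }' file.txt
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441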
You can try this sed:
sed 'N; s/\([0-9]\+\):<\(.*\)>\n\([0-9]\+\):<\(.*\)>/\2:\1:\3/' file.txt
Test:
sat:~$ sed 'N; s/\([0-9]\+\):<\(.*\)>\n\([0-9]\+\):<\(.*\)>/\2:\1:\3/' file.txt
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441
Another way:
awk -F: '/<C/ {printf "CHECKSUM_VALIDATION:%d:",$1; next} {print $1}'
Here is one for GNU awk:
awk -F"[:\n<>]" 'NR==1{print $3,$1,$5;f=$3;next} $3{print f,$3,$7}' OFS=":" RS="</CH" file
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441
Based on Jonas' post, but avoiding getline, this awk should do:
awk -F '[:<>/]+' '/<C/ {f=$1;next} { print $2,f,$1}' OFS=\: file
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441

trouble with variable in awk match command

Apologies if below is messy or there's a cleaner way to do it, I'm still learning!
I'm using curl to grab a page with numbers/HTML in it; to get to the table with the numbers I'm using the command below:
echo $curlo | awk '/<th>00/ { match($0, /<th>00/); print substr($0, RSTART - 10, RLENGTH + 40000); }' | sed 's/d1ffce/\'$'\n/g'| sed 's/88ff7f/\'$'\n/g' | grep -o '[0-9]*'
This begins the output at th00, prints the next 40000 characters (the page varies in size but will never be that large), replaces some hex colour codes with newlines, and then prints out only the numbers.
However, th00 will change to th01, th02, etc. with the hour, so I'm trying to use a variable. For testing I set cnt=00 and replace it in the command with the variable:
echo $curlo | awk '"/<th>$cnt/" { match($0, "/<th>$cnt/"); print substr($0, RSTART - 10, RLENGTH + 40000); }' | sed 's/d1ffce/\'$'\n/g'| sed 's/88ff7f/\'$'\n/g' | grep -o '[0-9]*'
but the output is completely different. If I echo $cnt it prints 00 fine. I've also tried placing the whole th00 in the cnt variable, with the same issue.
For comparison: when I use the first command I get 382 lines; when I use the second, 896.
This is using the bash shell, btw.
Shell variables aren't expanded inside single quotes. But it's better to assign an awk variable with the -v option:
echo "$curlo" | awk -v cnt=$cnt 'match($0, "<th>" cnt "") {
str = substr($0, RSTART-10, RLENGTH+40000);
gsub("d1ffce|88ff7f", "$\n", str);
gsub(/^[^0-9]+|[^0-9]+$/, "", str);
gsub(/[^0-9]+/, "\n", str);
print str; }'
There's also no need to pipe to sed and grep -o, since awk can do the same things with gsub().
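Since the header tracks the hour, cnt could presumably be derived from the clock instead of being hard-coded; a sketch of just the variable wiring (my illustration, not from the original post):
cnt=$(date +%H)    # zero-padded current hour: 00, 01, ... 23
echo "$curlo" | awk -v cnt="$cnt" 'match($0, "<th>" cnt) { print substr($0, RSTART - 10, RLENGTH + 40000) }'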
