I am working on a script where one column is a date concatenated with a string. I want to extract the date part and compare it with today's date; if it is older than today, I want to replace it with today's date. Here is an example.
cat test.txt
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161012O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161015O00001.0000
I want the output below, with the datestamp changed in column 5 for rows 3 and 7. Please help. I am looking for a single command to do it, if that is possible.
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161020O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161020O00001.0000
Using plain Awk, and hence not assuming built-in date support:
awk -v refdate="$(date +%Y%m%d)" '{ if ($5 < refdate) $5 = refdate substr($5, 9); print }' file
Given data file and current date 2016-10-20:
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161012O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161015O00001.0000
The output is:
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161020O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161020O00001.0000
In GNU awk (see the last comment in the explanation):
$ awk -v a="$(date +%Y%m%d)" '(b=substr($5,1,8)) && sub(/^.{8}/,(b<a?a:b),$5)' file
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161021O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161021O00001.0000
Explained:
awk -v a="$(date +%Y%m%d)" # set the date to a var
' # '
(b = substr($5,1,8) ) && # read first 8 chars of 5th field to b var
sub(/^.{8}/, (b<a?a:b), $5) # replace 8 first chars with a if b is less than a
# to make it compatible with other awks, change
# /^.{8}/ to /^......../
' file
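The portable spelling mentioned in the last comment can be sketched end-to-end. The reference date is pinned to 20161020 here so the run is reproducible; in practice you would keep -v a="$(date +%Y%m%d)":

```shell
# Same logic with /^......../ instead of /^.{8}/, for awks that lack
# interval-expression support; refdate pinned for reproducibility.
printf '%s\n' \
  'aaaa RR 242 644126 20161012O00001.0000' \
  'aaaa RR 242 644126 20170129O00001.0000' |
awk -v a=20161020 '(b=substr($5,1,8)) && sub(/^......../,(b<a?a:b),$5)'
# → aaaa RR 242 644126 20161020O00001.0000
# → aaaa RR 242 644126 20170129O00001.0000
```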
I finally figured it out! Thanks James, Jonathan and others.
Here is my command on Solaris.
$ cat test.txt
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161012O00001.0000
aaaa RR 242 644126 20161013O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161014O00001.0000
$ /usr/xpg4/bin/awk -v a=`date +%Y%m%d` '(b=substr($5,1,8)) && gsub(/^.{8}/,(b<a?a:b),$5)' test.txt
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161022O00001.0000
aaaa RR 242 644126 20161022O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161022O00001.0000
If perl is okay... Note: it is already 2016-10-21 in my part of the world ;)
To get today's date using Time::Piece module:
$ perl -MTime::Piece -le '$d = localtime->ymd(""); print $d'
20161021
Sample input:
$ cat ip.txt
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161012O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161015O00001.0000
Solution:
$ perl -MTime::Piece -pe 'BEGIN{$d=localtime->ymd("")} s/.* \K\d+/$& < $d ? $d : $&/e' ip.txt
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161021O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161021O00001.0000
BEGIN{$d=localtime->ymd("")} save today's date in $d variable, do this only at start of script
s/.* \K\d+/$& < $d ? $d : $&/e extract date to be replaced, compare against $d and replace with $d if extracted date is earlier than $d
.* \K match until last space in line, by using \K, matched text up to this point is discarded
$& contains the matched string from \d+
e flag allows the use of code $& < $d ? $d : $& instead of string in replacement section
Using date command instead of Time::Piece module:
perl -pe 'BEGIN{chomp($d=`date +%Y%m%d`)} s/.* \K\d+/$& < $d ? $d : $&/e' ip.txt
Further reading:
Today's Date in Perl in MM/DD/YYYY format
Perl flags -pe, -pi, -p, -w, -d, -i, -t?
I am trying to decode base64-encoded binary content in jq using the explode function.
When I run explode and then implode, I expect to get the same string back, but I don't. Try it here: https://jqplay.org/s/Rt8H1qv8VRP
Base64 encoded string: "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/"
jq: '@base64d | explode | implode | @base64'
Output: "AQEAAAABAQAyGWRkZBXvv71bBwAAAAABAgMEBQYHCAkKUgbvv70/Ugbvv70/"
Debugging further,
@base64d | explode | .[14]
returns
65533
Running the following on Ubuntu, you can see the byte at index 14 is 315 (octal) == 205 (decimal):
$ echo "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/" | base64 -d | od -bc
0000000 001 001 000 000 000 001 001 000 062 031 144 144 144 025 315 133
001 001 \0 \0 \0 001 001 \0 2 031 d d d 025 315 [
0000020 007 000 000 000 000 001 002 003 004 005 006 007 010 011 012 122
\a \0 \0 \0 \0 001 002 003 004 005 006 \a \b \t \n R
0000040 006 236 077 122 006 236 077
006 236 ? R 006 236 ?
0000047
Why is jq returning this weird 65533 (0xFFFD) character? What am I missing?
First of all, the issue has nothing to do with explode or implode. Using just @base64d | @base64 produces the same result.
jq expects the string encoded with base64 to be text encoded as UTF-8.
If the decoded data is not valid UTF-8, the results are undefined.
Your input is not UTF-8.
U+FFFD REPLACEMENT CHARACTER is the character used to mark such input errors.
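A minimal sketch of why jq emits U+FFFD here: byte 0xCD (the octal 315 from the od dump) starts a two-byte UTF-8 sequence, but the next byte 0x5B ('[') is not a continuation byte, so a UTF-8 decoder substitutes the replacement character:

```python
# The "315 133" pair from the od -b output, decoded as UTF-8 with
# error substitution: the invalid lead byte becomes U+FFFD (65533),
# and the following '[' decodes normally.
raw = bytes([0o315, 0o133])
text = raw.decode("utf-8", errors="replace")
print([ord(c) for c in text])   # [65533, 91]
```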
I have a pandas dataframe that I would like to sort in descending order of a column.
Name count
AAAA -1.1
BBBB 0
CCCC -10
DDDD inf
EEEE 3
FFFF NaN
GGGG 30
I want to sort count in descending order and move the inf and NaN rows to the end.
df.sort('count', ascending=False, na_position='last') pushes NaN to the end. How do I deal with inf?
You can treat inf values as null:
with pd.option_context('mode.use_inf_as_null', True):
    df = df.sort_values('count', ascending=False, na_position='last')
df
Out:
Name count
6 GGGG 30.000000
4 EEEE 3.000000
1 BBBB 0.000000
0 AAAA -1.100000
2 CCCC -10.000000
3 DDDD inf
5 FFFF NaN
One of possible solutions:
In [33]: df.assign(x=df['count'].replace(np.inf, np.nan)) \
.sort_values('x', ascending=False) \
.drop('x', 1)
Out[33]:
Name count
6 GGGG 30.000000
4 EEEE 3.000000
1 BBBB 0.000000
0 AAAA -1.100000
2 CCCC -10.000000
3 DDDD inf
5 FFFF NaN
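If your pandas is new enough (>= 1.1, which adds the key argument to sort_values), a third option is to map inf to NaN inside the sort key only, leaving the column itself untouched; a sketch against the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["AAAA", "BBBB", "CCCC", "DDDD", "EEEE", "FFFF", "GGGG"],
    "count": [-1.1, 0, -10, np.inf, 3, np.nan, 30],
})

# inf is treated as NaN only for ordering; df["count"] keeps its inf value
out = df.sort_values(
    "count",
    ascending=False,
    na_position="last",
    key=lambda s: s.replace([np.inf, -np.inf], np.nan),
)
print(out["Name"].tolist())
```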
I am trying to use awk to edit files, but I can't manage to do it without creating intermediate files.
Basically, I want to search file2, file3, and so on using column 1, and keep the lines whose 1st column matches (with their replaced 2nd-column values). (Note that file2 and file3 may contain other lines.)
I have
File1.txt
aaa 111
aaa 222
bbb 333
bbb 444
File2.txt
zzz zzz
aaa 999
zzz zzz
aaa 888
File3.txt
bbb 000
bbb 001
yyy yyy
yyy yyy
Desired output
aaa 999
aaa 888
bbb 000
bbb 001
This does what you specified, though there are likely edge cases not covered:
$ awk 'NR==FNR{a[$1]; next} $1 in a' file{1..3}
aaa 999
aaa 888
bbb 000
bbb 001
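An annotated version of the same one-liner, run against throwaway copies of the sample files (written to the current directory just for the demo):

```shell
# Recreate the three sample files from the question.
printf 'aaa 111\naaa 222\nbbb 333\nbbb 444\n' > File1.txt
printf 'zzz zzz\naaa 999\nzzz zzz\naaa 888\n' > File2.txt
printf 'bbb 000\nbbb 001\nyyy yyy\nyyy yyy\n' > File3.txt

awk '
    NR==FNR { keys[$1]; next }   # first file only: remember column-1 keys
    $1 in keys                   # later files: print lines whose key matched
' File1.txt File2.txt File3.txt
# → aaa 999
# → aaa 888
# → bbb 000
# → bbb 001
```

NR==FNR is true only while reading the first file (the global and per-file record counters agree), which is what lets one pass build the lookup table and the remaining passes filter against it.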
I have a long tab-delimited file with many columns. I would like to calculate a percentage for two columns (the 3rd and 4th, each relative to the 2nd) and print that percentage next to the corresponding number in this format (45(13.93%)).
input:
file1 323 434 45 767 254235 275 2345 467
file1 294 584 43 7457 254565 345 235445 4635
file1 224 524 4343 12457 2542165 345 124445 41257
Desired output:
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
I tried:
cat test_file.txt | awk '{printf "%s (%.2f%)\n",$0,($4/$2)*100}' OFS="\t" | awk '{printf "%s (%.2f%)\n",$0,($3/$2)*100}' | awk '{print $1,$2,$3,$11,$4,$10,$5,$6,$7,$8,$9}' - | sed 's/ (/(/g' | sed 's/ /\t/g' >out.txt
It works, but I want a shorter way to do this.
I would say:
$ awk '{$3=sprintf("%d(%.2f%%)", $3, ($3/$2)*100); $4=sprintf("%d(%.2f%%)", $4, ($4/$2)*100)}1' file
file1 323 434(134.37%) 45(13.93%) 767 254235 275 2345 467
file1 294 584(198.64%) 43(14.63%) 7457 254565 345 235445 4635
file1 224 524(233.93%) 4343(1938.84%) 12457 2542165 345 124445 41257
With a function to avoid duplicities:
awk 'function print_nice (num1, num2) {
return sprintf("%d(%.2f%%)", num1, (num1/num2)*100)
}
{$3=print_nice($3,$2); $4=print_nice($4,$2)}1' file
This uses sprintf to apply a specific format and assign the result back to the field. The calculations are straightforward.
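A runnable cut-down of the function version on the first sample row; note the doubled %% inside sprintf, which is the portable way to emit a literal percent sign:

```shell
printf '%s\n' 'file1 323 434 45 767' |
awk 'function print_nice(num1, num2) {
         # num1 annotated with num1/num2 as a percentage, e.g. 434(134.37%)
         return sprintf("%d(%.2f%%)", num1, (num1 / num2) * 100)
     }
     { $3 = print_nice($3, $2); $4 = print_nice($4, $2) } 1'
# → file1 323 434(134.37%) 45(13.93%) 767
```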
I am trying to find decrements in a column and, when one is found, print the previous highest value.
For example:
From 111 to 445 there is a continuous increase in the column, but the final 333 is less than the number before it.
111 aaa
112 aaa
112 aaa
113 sdf
115 aaa
222 ddd
333 sss
333 sss
444 sss
445 sss
333 aaa <<<<<< this is less than the number above it (445)
If any such scenario is found then print 445 sss
Like this, for example:
$ awk '{if (before>$1) {print before_line}} {before=$1; before_line=$0}' a
445 sss
What is it doing? It compares the value stored in the variable before with the current first field. If the previous value is bigger, it prints the stored previous line.
It also works when there are multiple decrements:
$ cat a
111 aaa
112 aaa
112 aaa
113 sdf
115 aaa <--- this
15 aaa
222 ddd
333 sss
333 sss
444 sss
445 sss <--- this
333 aaa
$ awk '{if (before>$1) {print before_line}} {before=$1; before_line=$0}' a
115 aaa
445 sss
Store each number in a single variable called prevNumber; then, when you come to the next one, do a check, e.g. if (newNumber < prevNumber) print prevNumber;
I don't really know what language you are using.
You can say:
awk '$1 > max {max=$1; maxline=$0}; END{ print maxline}' inputfile
For your input, it'd print:
445 sss