sorting pandas dataframe with inf and NaN

I have a pandas dataframe that I would like to sort in descending order of a column.
Name count
AAAA -1.1
BBBB 0
CCCC -10
DDDD inf
EEEE 3
FFFF NaN
GGGG 30
I want to sort by count in descending order and move the inf and NaN rows to the end.
df.sort('count', ascending=False, na_position="last") pushes NaN to the end. How do I deal with inf?

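You can treat inf values as null (note that later pandas releases renamed this option to mode.use_inf_as_na and then deprecated it, so this applies to older versions):
with pd.option_context('mode.use_inf_as_null', True):
    df = df.sort_values('count', ascending=False, na_position='last')
df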
Out:
Name count
6 GGGG 30.000000
4 EEEE 3.000000
1 BBBB 0.000000
0 AAAA -1.100000
2 CCCC -10.000000
3 DDDD inf
5 FFFF NaN

One possible solution:
In [33]: df.assign(x=df['count'].replace(np.inf, np.nan)) \
           .sort_values('x', ascending=False) \
           .drop(columns='x')
Out[33]:
Name count
6 GGGG 30.000000
4 EEEE 3.000000
1 BBBB 0.000000
0 AAAA -1.100000
2 CCCC -10.000000
3 DDDD inf
5 FFFF NaN
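The ordering rule both answers rely on (finite values in descending order, then inf and NaN in their original order) can also be sketched without pandas. This is only an illustration of the rule in plain Python, not how pandas implements it; the names rows and finite_first_desc are made up for this sketch.

```python
import math

# Same data as in the question, as (Name, count) pairs.
rows = [
    ("AAAA", -1.1),
    ("BBBB", 0.0),
    ("CCCC", -10.0),
    ("DDDD", math.inf),
    ("EEEE", 3.0),
    ("FFFF", math.nan),
    ("GGGG", 30.0),
]


def finite_first_desc(row):
    """Sort key: finite counts first (descending), inf/NaN afterwards."""
    value = row[1]
    if math.isfinite(value):
        return (0, -value)
    # All non-finite values share one key, so the stable sort keeps
    # them in their original relative order.
    return (1, 0.0)


ordered = sorted(rows, key=finite_first_desc)
names = [name for name, _ in ordered]
print(names)
```

Because Python's sort is stable, DDDD (inf) stays ahead of FFFF (NaN), matching the output shown above.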

Related

JQ explode function is returning incorrect chars

I am trying to decode base64-encoded binary content in jq using the explode function.
When I run the decoded string through explode and then implode, I expect to get the same string back, but I do not. Try it here: https://jqplay.org/s/Rt8H1qv8VRP
Base64 encoded string: "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/"
JQ: '@base64d | explode | implode | @base64'
Output: "AQEAAAABAQAyGWRkZBXvv71bBwAAAAABAgMEBQYHCAkKUgbvv70/Ugbvv70/"
Debugging further,
@base64d | explode | .[14]
returns
65533
Running the following on Ubuntu shows that the byte at index 14 is 315 (octal) == 205 (decimal):
$ echo "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/" | base64 -d | od -bc
0000000 001 001 000 000 000 001 001 000 062 031 144 144 144 025 315 133
001 001 \0 \0 \0 001 001 \0 2 031 d d d 025 315 [
0000020 007 000 000 000 000 001 002 003 004 005 006 007 010 011 012 122
\a \0 \0 \0 \0 001 002 003 004 005 006 \a \b \t \n R
0000040 006 236 077 122 006 236 077
006 236 ? R 006 236 ?
0000047
Why is JQ returning this weird 65533 (0xFFFD) character? What am I missing?

First of all, the issue has nothing to do with explode or implode. Using just @base64d | @base64 produces the same result.
jq expects the base64-encoded data to be text encoded as UTF-8.
If the decoded string is not UTF-8, the results are undefined.
Your input is not UTF-8.
U+FFFD REPLACEMENT CHARACTER is a character used to mark input errors.
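The effect can be reproduced outside jq. The sketch below uses Python's errors="replace" decoding as a stand-in for what jq appears to do with invalid UTF-8; that equivalence is an assumption, not something taken from jq's source. It decodes the string from the question, substitutes U+FFFD for each invalid byte, and re-encodes.

```python
import base64

encoded = "AQEAAAABAQAyGWRkZBXNWwcAAAAAAQIDBAUGBwgJClIGnj9SBp4/"
raw = base64.b64decode(encoded)

# Byte 14 is 0xCD (octal 315): a UTF-8 lead byte that is not followed
# by a continuation byte, so the input is not valid UTF-8.
print(hex(raw[14]))

# Decode leniently: every invalid byte becomes U+FFFD (65533).
text = raw.decode("utf-8", errors="replace")
code_points = [ord(ch) for ch in text]
print(code_points[14])

# Re-encoding writes U+FFFD as the three bytes EF BF BD, so the
# base64 output no longer matches the input.
round_trip = base64.b64encode(text.encode("utf-8")).decode("ascii")
print(round_trip)
```

The re-encoded value matches the output reported in the question, which suggests the original bytes were lost at the decoding step and cannot be recovered from the jq string afterwards.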

Joining two files that both have duplicate rows

I am trying to join two files that have identical column 1 and different column 2:
File1
aaa 1
bbb 3
bbb 3
ccc 1
ccc 1
ccc 0
File2
aaa 2
bbb 2
bbb 2
ccc 1
ccc 1
ccc 0
When I try to join them with
join File1 File2 > File3
I get
aaa 1 2
bbb 3 2
bbb 3 2
bbb 3 2
bbb 3 2
ccc 1 1
ccc 1 1
ccc 1 0
ccc 1 1
ccc 1 1
ccc 1 0
ccc 0 1
ccc 0 1
ccc 0 0
join expands the duplicates into every combination, when all I want is for it to go line by line, so the output should be
aaa 1 2
bbb 3 2
bbb 3 2
ccc 1 1
ccc 1 1
ccc 0 0
How do I tell join to ignore duplicates and just combine the files line-by-line?
EDIT: This is being done in a loop with multiple files that all have the same column 1 but different column 2. I am joining the first two files into a temporary file and then looping through the other files joining with that temporary file.

Based on a suggestion from @Andre Wildberg, this worked best:
paste File1 <(cut -d " " -f 2 File2)
This allowed me to loop through a list of files:
cat File1 > tmp
for file in $files
do
    paste tmp <(cut -d " " -f 2 "$file") > tmpf
    mv tmpf tmp
done
mv tmp FinalFile

Assumptions:
all files have the same number of rows
all files have the same values in the first column for the same numbered row
the final result set can fit into memory
Sample input:
$ for f in f{1..4}
do
echo "############ $f"
cat $f
done
############ f1
aaa 1
bbb 3
bbb 3
ccc 1
ccc 1
ccc 0
############ f2
aaa 2
bbb 2
bbb 2
ccc 1
ccc 1
ccc 0
############ f3
aaa 12
bbb 12
bbb 12
ccc 11
ccc 11
ccc 10
############ f4
aaa 202
bbb 202
bbb 202
ccc 201
ccc 201
ccc 200
One awk idea:
awk '
FNR==NR { a[FNR]=$0; next }
        { a[FNR]=a[FNR] OFS $2 }
END     { for (i=1;i<=FNR;i++)
              print a[i]
        }
' f1 f2 f3 f4
This generates:
aaa 1 2 12 202
bbb 3 2 12 202
bbb 3 2 12 202
ccc 1 1 11 201
ccc 1 1 11 201
ccc 0 0 10 200
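For comparison, the same line-by-line merge can be sketched in Python under the assumptions listed above (equal row counts, matching first columns). The function name paste_second_columns is hypothetical, and the inputs are lists of lines rather than real files so the sketch stays self-contained.

```python
def paste_second_columns(files):
    """Merge files row by row: keep the first file's rows and append
    the second column of every following file to the matching row."""
    merged = [line.split() for line in files[0]]
    for lines in files[1:]:
        if len(lines) != len(merged):
            raise ValueError("files have different numbers of rows")
        for row, line in zip(merged, lines):
            key, value = line.split()[:2]
            if key != row[0]:
                raise ValueError(f"first column differs: {row[0]} vs {key}")
            row.append(value)
    return [" ".join(row) for row in merged]


f1 = ["aaa 1", "bbb 3", "bbb 3", "ccc 1", "ccc 1", "ccc 0"]
f2 = ["aaa 2", "bbb 2", "bbb 2", "ccc 1", "ccc 1", "ccc 0"]
f3 = ["aaa 12", "bbb 12", "bbb 12", "ccc 11", "ccc 11", "ccc 10"]
f4 = ["aaa 202", "bbb 202", "bbb 202", "ccc 201", "ccc 201", "ccc 200"]

result = paste_second_columns([f1, f2, f3, f4])
print("\n".join(result))
```

Unlike the awk version, this sketch raises an error when the assumptions are violated instead of silently producing misaligned rows.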

How to replace columns (matching pattern) using awk?

I am trying to use awk to edit files, but I can't manage to do it without creating intermediate files.
Basically, I want to look up column 1 of file1 in file2, file3 and so on, and replace the 2nd column with the value from the lines whose 1st column matches. (Note that file2 and file3 may contain other content.)
I have
File1.txt
aaa 111
aaa 222
bbb 333
bbb 444
File2.txt
zzz zzz
aaa 999
zzz zzz
aaa 888
File3.txt
bbb 000
bbb 001
yyy yyy
yyy yyy
Desired output
aaa 999
aaa 888
bbb 000
bbb 001

This does what you specified, but I suspect there are many edge cases it does not cover.
$ awk 'NR==FNR{a[$1]; next} $1 in a' file{1..3}
aaa 999
aaa 888
bbb 000
bbb 001
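The awk one-liner collects the first-column keys of the first file and then prints every line of the remaining files whose first column is one of those keys. A rough Python equivalent, with the hypothetical name filter_by_keys and in-memory lists standing in for the files:

```python
def filter_by_keys(reference_lines, *other_files):
    """Return the lines of other_files whose first field appears as a
    first field somewhere in reference_lines."""
    keys = {line.split()[0] for line in reference_lines if line.strip()}
    return [
        line
        for lines in other_files
        for line in lines
        if line.strip() and line.split()[0] in keys
    ]


file1 = ["aaa 111", "aaa 222", "bbb 333", "bbb 444"]
file2 = ["zzz zzz", "aaa 999", "zzz zzz", "aaa 888"]
file3 = ["bbb 000", "bbb 001", "yyy yyy", "yyy yyy"]

matched = filter_by_keys(file1, file2, file3)
print("\n".join(matched))
```

As with the awk version, the output order follows file2 and file3, not file1, and nothing checks that each key has the same number of lines in both places.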

Unix command to replace old date with current date dynamically

I am trying to work on a script where it has a column concatenated with date and some string. I want to substring the date part and compare with today's date. If it is older than today, I want to replace with today's date. Here is an example.
cat test.txt
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161012O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161015O00001.0000
I want the output below, with the date stamp in column 5 changed for rows 3 and 7. I am looking for a single command to do this, if possible.
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161020O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161020O00001.0000

Using plain Awk, and hence not assuming built-in date support:
awk -v refdate="$(date +%Y%m%d)" '{ if ($5 < refdate) $5 = refdate substr($5, 9); print}'
Given data file and current date 2016-10-20:
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161012O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161015O00001.0000
The output is:
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161020O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161020O00001.0000
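The awk command works because YYYYMMDD dates compare correctly as plain strings. The same idea as a small Python sketch; bump_old_dates is a hypothetical name, and today is passed in as a string so the example is reproducible (in real use it could come from datetime.date.today().strftime("%Y%m%d")).

```python
def bump_old_dates(lines, today):
    """Replace the leading YYYYMMDD of field 5 with today when it is
    older than today; leave every other line's date unchanged."""
    result = []
    for line in lines:
        fields = line.split()
        # Lexicographic comparison is enough for zero-padded YYYYMMDD.
        if fields[4][:8] < today:
            fields[4] = today + fields[4][8:]
        result.append(" ".join(fields))
    return result


data = [
    "aaaa RR 242 644126 20161030O00001.0000",
    "bbbb RR 242 644126 20161225O00001.0000",
    "aaaa RR 242 644126 20161012O00001.0000",
    "aaaa RR 242 644126 20170129O00001.0000",
    "aaaa RR 242 644126 20170326O00001.0000",
    "aaaa RR 242 644126 20170430O00001.0000",
    "aaaa RR 242 644126 20161015O00001.0000",
]

updated = bump_old_dates(data, "20161020")
print("\n".join(updated))
```

Like awk when it reassigns $5, this rejoins the fields with single spaces, so any original column alignment is not preserved.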

In GNU awk (see the last comment in the explanation):
$ awk -v a="$(date +%Y%m%d)" '(b=substr($5,1,8)) && sub(/^.{8}/,(b<a?a:b),$5)' file
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161021O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161021O00001.0000
Explained:
awk -v a="$(date +%Y%m%d)"       # set the date to a var
'                                # '
(b = substr($5,1,8) ) &&         # read first 8 chars of 5th field to b var
sub(/^.{8}/, (b<a?a:b), $5)      # replace 8 first chars with a if b is less than a
                                 # to make it compatible with other awks, change
                                 # /^.{8}/ to /^......../
' file

I finally figured it out! Thanks James, Jonathan and others.
Here is my command on Solaris.
$ cat test.txt
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161012O00001.0000
aaaa RR 242 644126 20161013O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161014O00001.0000
$ /usr/xpg4/bin/awk -v a=`date +%Y%m%d` '(b=substr($5,1,8)) && gsub(/^.{8}/,(b<a?a:b),$5)' test.txt
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161022O00001.0000
aaaa RR 242 644126 20161022O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161022O00001.0000

If Perl is okay... Note: it is already 2016-10-21 in my part of the world ;)
To get today's date using Time::Piece module:
$ perl -MTime::Piece -le '$d = localtime->ymd(""); print $d'
20161021
Sample input:
$ cat ip.txt
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161012O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161015O00001.0000
Solution:
$ perl -MTime::Piece -pe 'BEGIN{$d=localtime->ymd("")} s/.* \K\d+/$& < $d ? $d : $&/e' ip.txt
aaaa RR 242 644126 20161030O00001.0000
bbbb RR 242 644126 20161225O00001.0000
aaaa RR 242 644126 20161021O00001.0000
aaaa RR 242 644126 20170129O00001.0000
aaaa RR 242 644126 20170326O00001.0000
aaaa RR 242 644126 20170430O00001.0000
aaaa RR 242 644126 20161021O00001.0000
BEGIN{$d=localtime->ymd("")} save today's date in $d variable, do this only at start of script
s/.* \K\d+/$& < $d ? $d : $&/e extract date to be replaced, compare against $d and replace with $d if extracted date is earlier than $d
.* \K match until last space in line, by using \K, matched text up to this point is discarded
$& contains the matched string from \d+
e flag allows the use of code $& < $d ? $d : $& instead of string in replacement section
Using date command instead of Time::Piece module:
perl -pe 'BEGIN{chomp($d=`date +%Y%m%d`)} s/.* \K\d+/$& < $d ? $d : $&/e' ip.txt
Further reading:
Today's Date in Perl in MM/DD/YYYY format
Perl flags -pe, -pi, -p, -w, -d, -i, -t?

BCD to ASCII checksum

I have a very old device that I am connecting to over a serial line. When I send data, it wants a checksum to be sent along with it. I add up the ASCII values of all the characters in the string and convert the sum to BCD. This results in illegal BCD digits such as 1011. In the only example that is provided, they convert 1011 to ";". When I send the data from the example, the checksum is accepted. But when I use ";" for other illegal digits, it fails. Has anyone seen this use of ";" before, and if so, does anyone know what the values for the other illegal digits are?
edit : The Example I have:
STX 000 0010
1 011 0001
2 011 0010
3 011 0011
CR 000 1101
A 100 0001
B 100 0010
C 100 0011
CR 000 1101
ETX 000 0011
Total 10111 1011
Convert To BCD 1 0111 1011
Checksum 1 7 ;

Looks like they're using the next six ASCII characters:
DEC HEX1 HEX2 BIN1 BIN2 CHAR
48 3 0 0011 0000 0
49 3 1 0011 0001 1
50 3 2 0011 0010 2
51 3 3 0011 0011 3
52 3 4 0011 0100 4
53 3 5 0011 0101 5
54 3 6 0011 0110 6
55 3 7 0011 0111 7
56 3 8 0011 1000 8
57 3 9 0011 1001 9
58 3 A 0011 1010 :
59 3 B 0011 1011 ;
60 3 C 0011 1100 <
61 3 D 0011 1101 =
62 3 E 0011 1110 >
63 3 F 0011 1111 ?
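If that reading is right, each 4-bit group of the sum is simply added to 0x30 (the code for "0"), so the values 10 to 15 come out as : ; < = > ?. A minimal sketch under that assumption; the function name checksum is made up here, and the number of digits sent (here, as many as the sum needs, with no padding) is a guess that the device manual would have to confirm.

```python
def checksum(frame: bytes) -> str:
    """Sum all bytes, then map each 4-bit group of the sum to the
    ASCII character at 0x30 + value ('0'..'9', then ':' to '?')."""
    total = sum(frame)
    # format(total, "X") yields one hex digit per 4-bit group.
    return "".join(chr(0x30 + int(digit, 16)) for digit in format(total, "X"))


# The example from the question: STX 1 2 3 CR A B C CR ETX
example = b"\x02123\rABC\r\x03"
print(sum(example))       # decimal sum of the frame
print(checksum(example))  # checksum characters
```

For the example frame the sum is 379 (binary 1 0111 1011), which gives the characters 1, 7 and ;, in line with the device's documented example.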
