Using per-column compression codec in Parquet.write_table - parquet

I have pyarrow 2.0.0 installed. The docs for pyarrow.parquet.write_table state:
compression (str or dict) – Specify the compression codec, either on a general basis or per-column. Valid values: {‘NONE’, ‘SNAPPY’, ‘GZIP’, ‘LZO’, ‘BROTLI’, ‘LZ4’, ‘ZSTD’}.
It works fine if compression is a string, but when I use a dict for a per-column specification, I get the following error. What am I doing wrong? I can use a similar dict for compression_level on a per-column basis without error.
(py3) C:\tmp\python>python
Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pandas as pd
>>>
>>> df = pd.DataFrame([[1,2,3],[4,5,6]],columns=['foo','bar','baz'])
>>> t = pa.Table.from_pandas(df)
>>> pq.write_table(t,'test1.pq',compression=dict(foo='zstd',bar='snappy',baz='brotli'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\app\python\anaconda\3\envs\py3\lib\site-packages\pyarrow\parquet.py", line 1717, in write_table
with ParquetWriter(
File "c:\app\python\anaconda\3\envs\py3\lib\site-packages\pyarrow\parquet.py", line 554, in __init__
self.writer = _parquet.ParquetWriter(
File "pyarrow\_parquet.pyx", line 1390, in pyarrow._parquet.ParquetWriter.__cinit__
File "pyarrow\_parquet.pyx", line 1236, in pyarrow._parquet._create_writer_properties
File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string
TypeError: expected bytes, str found

Related

Parquet 2.4+ and PyArrow 10.0.1 - Attempting to switch pyarrow column from string to datetime

I attempted to follow the advice in "Converting string timestamp to datetime using pyarrow", however pyarrow does not seem to accept my format string:
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.compute as pc
table = pa.table({'n_legs': [2, 2, 4, 4, 5, 100],
'query_start_time': ["2022-12-30T19:02:40.466",
"2022-12-30T19:02:40.466",
"2022-12-30T19:02:40.466",
"2022-12-30T19:02:40.466",
"2022-12-30T19:02:40.466",
"2022-12-30T19:02:40.466"]})
pc.strptime(table.column("query_start_time"), format='%Y-%m-%dT%H:%M:%S.%f', unit='ms')
writer = pq.ParquetWriter('example.parquet', table.schema)
writer.write_table(table)
writer.close()
I've tried removing the T, and adding a Z at the end of both the format and the string. It seems I need to do something else instead?
Traceback (most recent call last):
File "/home/emcp/Dev/temp_pyarrow/main.py", line 16, in <module>
pc.strptime(table.column("query_start_time"), format='%Y-%m-%d %H:%M:%S.%f', unit='ms')
File "/home/emcp/Dev/temp_pyarrow/venv/lib/python3.10/site-packages/pyarrow/compute.py", line 255, in wrapper
return func.call(args, options, memory_pool)
File "pyarrow/_compute.pyx", line 355, in pyarrow._compute.Function.call
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Failed to parse string: '2022-12-30T19:02:40.466' as a scalar of type timestamp[ms]
Do I need to manually convert the datetime value into an integer column and THEN change the column again?
EDIT: When I strip the .%f and change the unit to unit='s', it no longer errors. I am looking into that now.

netmiko key authentication failure

I'm seeing an issue with netmiko key authentication on an N7K.
Python 3.8.10, netmiko 4.1.2.
I first tried it on an N9K without any issue; commands can be sent once the connection is established.
Python 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from netmiko import ConnectHandler
>>> n9k = {"device_type": "cisco_nxos", "host": "10.1.1.10", "username": "admin", "use_keys": True,"key_file":"~/.ssh/id_rsa", "passphrase": "Cisco123"}
>>> target_con = ConnectHandler(**n9k)
I hit the issue when connecting to an N7K switch.
>>> n7k = {"device_type": "cisco_nxos", "host": "10.1.1.20", "username": "admin", "use_keys": True,"key_file":"~/.ssh/id_rsa", "passphrase": "Cisco123"}
>>> target_con = ConnectHandler(**n7k)
Traceback (most recent call last):
File "/home/admin/netmiko_test/lib/python3.8/site-packages/netmiko/base_connection.py", line 1046, in establish_connection
self.remote_conn_pre.connect(**ssh_connect_params)
File "/home/admin/netmiko_test/lib/python3.8/site-packages/paramiko/client.py", line 435, in connect
self._auth(
File "/home/admin/netmiko_test/lib/python3.8/site-packages/paramiko/client.py", line 771, in _auth
raise saved_exception
File "/home/admin/netmiko_test/lib/python3.8/site-packages/paramiko/client.py", line 747, in _auth
self._transport.auth_publickey(username, key)
File "/home/admin/netmiko_test/lib/python3.8/site-packages/paramiko/transport.py", line 1635, in auth_publickey
return self.auth_handler.wait_for_response(my_event)
File "/home/admin/netmiko_test/lib/python3.8/site-packages/paramiko/auth_handler.py", line 259, in wait_for_response
raise e
paramiko.ssh_exception.AuthenticationException: Authentication failed.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/admin/netmiko_test/lib/python3.8/site-packages/netmiko/ssh_dispatcher.py", line 365, in ConnectHandler
return ConnectionClass(*args, **kwargs)
File "/home/admin/netmiko_test/lib/python3.8/site-packages/netmiko/base_connection.py", line 439, in __init__
self._open()
File "/home/admin/netmiko_test/lib/python3.8/site-packages/netmiko/base_connection.py", line 444, in _open
self.establish_connection()
File "/home/admin/netmiko_test/lib/python3.8/site-packages/netmiko/base_connection.py", line 1083, in establish_connection
raise NetmikoAuthenticationException(msg)
netmiko.exceptions.NetmikoAuthenticationException: Authentication to device failed.
Common causes of this problem are:
1. Invalid username and password
2. Incorrect SSH-key file
3. Connecting to the wrong device
Device settings: cisco_nxos 10.1.1.20:22
Authentication failed.
>>>
The username and SSH key have been validated. Everything works if I use username/password instead.
Any advice would be appreciated.
Thanks!
I found the issue while studying another, similar paramiko problem.
The N9K node I tested against uses OpenSSH 8.3, so it supports rsa-sha2-256.
The N7K node uses OpenSSH 5.9, which only supports ssh-rsa (SHA-1) signatures.
That makes a difference, as netmiko (through paramiko) does not seem to use ssh-rsa (SHA-1) by default.
Adding disabled_algorithms = {'pubkeys': ['rsa-sha2-256', 'rsa-sha2-512']} to ConnectHandler fixed the issue.
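Put together, the working N7K connection might look like the sketch below. It reuses the host, key file and passphrase from the question and adds the disabled_algorithms setting from this answer. open_connection is a hypothetical helper name; netmiko is only imported when it is called, so the parameters can be inspected without a device.

```python
# Connection parameters for the N7K (OpenSSH 5.9, ssh-rsa/SHA-1 only).
n7k = {
    "device_type": "cisco_nxos",
    "host": "10.1.1.20",
    "username": "admin",
    "use_keys": True,
    "key_file": "~/.ssh/id_rsa",
    "passphrase": "Cisco123",
    # Stop paramiko from offering the SHA-2 RSA signature algorithms for
    # public-key auth, so RSA keys are used with ssh-rsa (SHA-1) instead.
    "disabled_algorithms": {"pubkeys": ["rsa-sha2-256", "rsa-sha2-512"]},
}


def open_connection(params):
    # Hypothetical helper: imported lazily so this file loads without netmiko.
    from netmiko import ConnectHandler

    return ConnectHandler(**params)
```

Usage: target_con = open_connection(n7k). The N9K, which supports rsa-sha2-256, can keep using the original dict without disabled_algorithms.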

NP_NAT error when attempting to import pandas_market_calendars

I'm comparing Windows and Linux: I have this working on Linux with Python 3.8 and 3.9.5, but not on Windows using Anaconda.
import sys
sys.path.append("../")
from datetime import time
import pandas as pd
import pandas_market_calendars as mcal
Error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\pandas_market_calendars\__init__.py", line 19, in <module>
from .calendar_registry import get_calendar, get_calendar_names
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\pandas_market_calendars\calendar_registry.py", line 21, in <module>
from .exchange_calendars_mirror import *
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\pandas_market_calendars\exchange_calendars_mirror.py", line 9, in <module>
import exchange_calendars
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\exchange_calendars\__init__.py", line 16, in <module>
from .calendar_utils import (
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\exchange_calendars\calendar_utils.py", line 3, in <module>
from .always_open import AlwaysOpenCalendar
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\exchange_calendars\always_open.py", line 5, in <module>
from .exchange_calendar import ExchangeCalendar
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\exchange_calendars\exchange_calendar.py", line 27, in <module>
from .calendar_helpers import (
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\exchange_calendars\calendar_helpers.py", line 6, in <module>
NP_NAT = np.array([pd.NaT], dtype=np.int64)[0]
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NaTType'
I posted a GitHub bug report.
Indeed, something changed, and the culprit is pandas:
https://github.com/rsheftel/pandas_market_calendars/issues/137
To work around it, install pandas==1.2.5 and it will work.
The error is with the underlying exchange_calendars package (https://github.com/gerrymanoim/exchange_calendars). It appears the error has been fixed in that package. If you update the exchange_calendars package, everything will work fine. Nothing needs to change in this package.
Fix: gerrymanoim/exchange_calendars#41
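The failing line asks NumPy to build an int64 array from pd.NaT, which ends up calling int() on the NaT object, and that is what raises the TypeError above with the newer pandas. A minimal NumPy-only sketch of obtaining the same sentinel value (NaT is stored as the smallest int64) without that conversion follows; it is an illustration only, not necessarily the change made in gerrymanoim/exchange_calendars#41.

```python
import numpy as np

# NaT ("not a time") is stored as the smallest int64 value. Creating it as a
# datetime64[ns] array and reinterpreting the bits as int64 avoids calling
# int() on pandas' NaT object, which is what fails in the traceback.
NP_NAT = np.array([np.datetime64("NaT", "ns")]).view(np.int64)[0]

print(NP_NAT)
```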

paho-mqtt authentication error on PythonAnywhere

I am trying to publish to an MQTT topic on beebotte.com using a simple publish.single. On my Linux machine it works fine, but on PythonAnywhere I get an authentication error. There are small differences in minor version numbers; could that be the cause?
This is the code I put into the python console:
import paho.mqtt.publish as publish
mqtt_host = "beebotte.com"
mqtt_topic = "climate/set/livingroom"
auth = {'username':"token:MY_SECRET_TOKEN"}
publish.single(mqtt_topic, "python sent", hostname=mqtt_host, auth = auth)
This is the error:
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import paho.mqtt.publish as publish
>>> mqtt_host = "beebotte.com"
>>> mqtt_topic = "climate/set/livingroom"
>>> auth = {'username':"MY_SECRET_TOKEN"}
>>> publish.single(mqtt_topic, "python sent", hostname=mqtt_host, auth = auth)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/<MYUSER>/python-email/local/lib/python2.7/site-packages/paho/mqtt/publish.py", line 216, in single
protocol, transport)
File "/home/<MYUSER>/python-email/local/lib/python2.7/site-packages/paho/mqtt/publish.py", line 152, in multiple
client.connect(hostname, port, keepalive)
File "/home/<MYUSER>/python-email/local/lib/python2.7/site-packages/paho/mqtt/client.py", line 768, in connect
return self.reconnect()
File "/home/<MYUSER>/python-email/local/lib/python2.7/site-packages/paho/mqtt/client.py", line 895, in reconnect
sock = socket.create_connection((self._host, self._port), source_address=(self._bind_address, 0))
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
socket.error: [Errno 111] Connection refused
>>>
>>> exit()
And here it is working:
Python 2.7.13 (default, Nov 24 2017, 17:33:09)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import paho.mqtt.publish as publish
>>> mqtt_host = "beebotte.com"
>>> mqtt_topic = "climate/set/livingroom"
>>> auth = {'username':"MY_SECRET_TOKEN"}
>>> publish.single(mqtt_topic, "python sent", hostname=mqtt_host, auth = auth)
>>>
>>> exit()
Edit (I'm not sure if this is true):
The key is actually in the format token:token_KxDxlcmXgQBDfWRNC (not real). beebotte.com also accepts a so-called secret key in the format 2A4Gfgv0puYFBEVbBQX24szALcyDvMRh. If I use the secret key, it works from PythonAnywhere (sometimes). This leads me to believe it is some kind of formatting problem with the : in the token.
So the question now is how to format it to work properly.
I don't think MQTT will work from a free account on PythonAnywhere. Free accounts can only connect out through a proxy using HTTP(S), to a specific whitelist of sites. If there's an HTTP-to-MQTT bridge somewhere, you could possibly use that.

UnicodeDecodeError using GLib.utf8_collate_key in Windows

I'm using Python 3.3 / PyGObject 3.14 on Windows 7 and I have the following problem: calling gi.repository.GLib.utf8_collate_key on a string that is not ASCII-only always results in a UnicodeDecodeError.
Test case:
>>> from gi.repository import GLib
>>> asciiText = "a"
>>> unicodeText = "á"
>>> asciiText.encode()
b'a'
>>> unicodeText.encode()
b'\xc3\xa1'
>>> GLib.utf8_collate_key(asciiText, -1)
'Aa'
>>> GLib.utf8_collate_key(unicodeText, -1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 1: unexpected end of data
Expected result (from Linux):
>>> GLib.utf8_collate_key(asciiText, -1)
'a'
>>> GLib.utf8_collate_key(unicodeText, -1)
'á'
The Windows system's locale is set to Portuguese (Brazil).
Does anybody know how to solve this? I'm considering rolling my own collation function if I can't get this to work.
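If GLib cannot be made to work, one way to roll your own, sketched here as an alternative rather than a fix for PyGObject, is Python's standard locale module: locale.strxfrm turns a str into a key that sorts according to the current LC_COLLATE setting, so it can serve as a sort key without going through GLib at all. collation_key and sort_collated are hypothetical helper names.

```python
import locale


def collation_key(text):
    # Locale-aware sort key for a str, based on the current LC_COLLATE.
    return locale.strxfrm(text)


def sort_collated(words):
    # Sort strings by their collation keys instead of raw code points.
    return sorted(words, key=collation_key)


if __name__ == "__main__":
    try:
        # Use the user's default collation rules (e.g. Portuguese (Brazil)).
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        # Fall back to plain code-point ordering if that locale is unavailable.
        locale.setlocale(locale.LC_COLLATE, "C")
    print(sort_collated(["b", "á", "a"]))
```

The exact position of "á" depends on the active locale's collation rules, which is the point of using strxfrm rather than plain string comparison.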
