Parquet 2.4+ and PyArrow 10.0.1 - Attempting to switch a pyarrow column from string to datetime

I attempted to follow the advice of "Converting string timestamp to datetime using pyarrow", but my format string does not seem to be accepted by pyarrow:
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.compute as pc

table = pa.table({'n_legs': [2, 2, 4, 4, 5, 100],
                  'query_start_time': ["2022-12-30T19:02:40.466",
                                       "2022-12-30T19:02:40.466",
                                       "2022-12-30T19:02:40.466",
                                       "2022-12-30T19:02:40.466",
                                       "2022-12-30T19:02:40.466",
                                       "2022-12-30T19:02:40.466"]})

pc.strptime(table.column("query_start_time"), format='%Y-%m-%dT%H:%M:%S.%f', unit='ms')

writer = pq.ParquetWriter('example.parquet', table.schema)
writer.write_table(table)
writer.close()
I've tried removing the T and adding a Z at the end of both the format string and the values, but the parse still fails. What am I missing?
Traceback (most recent call last):
File "/home/emcp/Dev/temp_pyarrow/main.py", line 16, in <module>
pc.strptime(table.column("query_start_time"), format='%Y-%m-%d %H:%M:%S.%f', unit='ms')
File "/home/emcp/Dev/temp_pyarrow/venv/lib/python3.10/site-packages/pyarrow/compute.py", line 255, in wrapper
return func.call(args, options, memory_pool)
File "pyarrow/_compute.pyx", line 355, in pyarrow._compute.Function.call
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Failed to parse string: '2022-12-30T19:02:40.466' as a scalar of type timestamp[ms]
Do I need to manually convert the datetime value into an integer column and THEN change the column again?
EDIT: When I strip the .%f and change the unit to unit='s', it no longer errors. I am looking into that now.
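For what it's worth, a workaround that sidesteps strptime's handling of %f is to cast the string column directly to a timestamp type; pyarrow's cast kernel parses ISO 8601 strings with fractional seconds. A minimal sketch along those lines (reusing the column and file names from the question, so treat it as an illustration rather than the canonical fix):

import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({'n_legs': [2, 2, 4, 4, 5, 100],
                  'query_start_time': ["2022-12-30T19:02:40.466"] * 6})

# Cast the ISO 8601 strings to timestamp[ms]; the cast kernel accepts the
# fractional seconds that strptime's %f can choke on.
ts = table.column("query_start_time").cast(pa.timestamp("ms"))

# Swap the parsed column back into the table before writing it out.
idx = table.schema.get_field_index("query_start_time")
table = table.set_column(idx, "query_start_time", ts)

pq.write_table(table, "example.parquet")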

Related

NP_NAT error when attempting to import pandas_market_calendars

I'm trying this on Windows and Linux; it works on Linux under Python 3.8 and 3.9.5, but fails on Windows using Anaconda.
import sys
sys.path.append("../")
from datetime import time
import pandas as pd
import pandas_market_calendars as mcal
Error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\pandas_market_calendars\__init__.py", line 19, in <module>
from .calendar_registry import get_calendar, get_calendar_names
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\pandas_market_calendars\calendar_registry.py", line 21, in <module>
from .exchange_calendars_mirror import *
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\pandas_market_calendars\exchange_calendars_mirror.py", line 9, in <module>
import exchange_calendars
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\exchange_calendars\__init__.py", line 16, in <module>
from .calendar_utils import (
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\exchange_calendars\calendar_utils.py", line 3, in <module>
from .always_open import AlwaysOpenCalendar
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\exchange_calendars\always_open.py", line 5, in <module>
from .exchange_calendar import ExchangeCalendar
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\exchange_calendars\exchange_calendar.py", line 27, in <module>
from .calendar_helpers import (
File "C:\Users\User\Anaconda3\envs\py39\lib\site-packages\exchange_calendars\calendar_helpers.py", line 6, in <module>
NP_NAT = np.array([pd.NaT], dtype=np.int64)[0]
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NaTType'
I posted a GitHub bug; indeed something changed, and the culprit is pandas:
https://github.com/rsheftel/pandas_market_calendars/issues/137
To solve it, install pandas==1.2.5 and it will work.
The error is with the underlying exchange_calendars package (https://github.com/gerrymanoim/exchange_calendars). It appears that the error has been fixed there; if you update exchange_calendars, everything will work fine. There is nothing to do in this package.
Fix: gerrymanoim/exchange_calendars#41
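For context, the failing line coerces pd.NaT into an int64 NumPy array, which recent pandas/NumPy versions no longer allow. A small illustration of the breakage and of one way to obtain the integer NaT sentinel without that coercion (this is not the actual patch from the linked fix, just a sketch of the idea):

import numpy as np
import pandas as pd

# The line from calendar_helpers.py that breaks on recent pandas:
# NP_NAT = np.array([pd.NaT], dtype=np.int64)[0]   # raises TypeError

# One way to get the int64 sentinel for NaT without the failing coercion:
NP_NAT = pd.NaT.value
print(NP_NAT)  # -9223372036854775808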

'DataFrame' object has no attribute 'split_frame'

I'm unable to split a frame using split_frame(). The dataframe displays fine with show(), but I cannot split it. Please help.
Below is a sample of the code I have used.
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator
from __future__ import print_function
temp = spark.read.option("header","true").option("inferSchema","true").csv("hdfs://bda-ns/user/august_week2.csv")
train,test,valid = temp.split_frame(ratios=[.75, .15])
Expected: no error. Data split into test and train data frame.
Actual:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/pyspark/sql/dataframe.py", line 1182, in __getattr__
"'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'split_frame'
You could use randomSplit() on your Spark dataframe.
If you want to use the H2O-3 split_frame method, you would first have to convert your Spark frame to an H2O frame, in which case you could use hc.as_h2o_frame(spark_df), where hc is your H2OContext (note: you would also need to create the H2OContext for this to work).
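A minimal sketch of both options, assuming temp is the Spark DataFrame from the question and that Sparkling Water's pysparkling package is available for the H2O path (names and ratios are illustrative):

# Option 1: split the Spark DataFrame directly.
# randomSplit normalizes the weights, so these give ~75/15/10 train/test/valid.
train, test, valid = temp.randomSplit([0.75, 0.15, 0.10], seed=42)

# Option 2: convert to an H2OFrame first, then use H2O's split_frame.
# Requires Sparkling Water; the H2OContext creation call may differ by version.
from pysparkling import H2OContext
hc = H2OContext.getOrCreate(spark)              # attach H2O to the Spark session
h2o_frame = hc.as_h2o_frame(temp)               # Spark DataFrame -> H2OFrame
h2o_train, h2o_test, h2o_valid = h2o_frame.split_frame(ratios=[0.75, 0.15])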

SQLAlchemy with Oracle: converting column overflows integer datatype

I extract data from an Oracle database with Python 2.7 (64-bit). There is a NUMBER field with 35 digits: 1200000000000000000000000000005151
When I try to read this field with SQLAlchemy, I get the following error:
File "D:\Produkte\CoCo\Sourcen\ConsultingConnector\src\ais.py", line 253, in tabledata2table
for row in tabledata.yield_per(buffersize).enable_eagerloads(False):
File "C:\Python27\lib\site-packages\sqlalchemy\orm\loading.py", line 98, in instances
util.raise_from_cause(err)
File "C:\Python27\lib\site-packages\sqlalchemy\util\compat.py", line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "C:\Python27\lib\site-packages\sqlalchemy\orm\loading.py", line 71, in instances
fetch = cursor.fetchmany(query._yield_per)
File "C:\Python27\lib\site-packages\sqlalchemy\engine\result.py", line 1166, in fetchmany
self.cursor, self.context)
File "C:\Python27\lib\site-packages\sqlalchemy\engine\base.py", line 1413, in _handle_dbapi_exception
exc_info
File "C:\Python27\lib\site-packages\sqlalchemy\util\compat.py", line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "C:\Python27\lib\site-packages\sqlalchemy\engine\result.py", line 1159, in fetchmany
l = self.process_rows(self._fetchmany_impl(size))
File "C:\Python27\lib\site-packages\sqlalchemy\engine\result.py", line 1076, in _fetchmany_impl
return self.cursor.fetchmany(size)
sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-01455: converting column overflows integer datatype (Background on this error at: http://sqlalche.me/e/4xp6)
It seems that SQLAlchemy is trying to cast the value to an int, even though it's a long.
Do you have any idea how I can solve the problem or something like a workaround?
Thanks in advance,
Jassin
I just ran into the same problem and worked around it by converting the column to character data in the query, e.g. SELECT TO_CHAR(id).
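A minimal sketch of that workaround with SQLAlchemy, assuming a hypothetical table my_table with an oversized NUMBER column id (the connection string and names are placeholders):

from sqlalchemy import create_engine, text

engine = create_engine("oracle://user:password@host:1521/sid")  # placeholder DSN

with engine.connect() as conn:
    # TO_CHAR makes Oracle return the value as a string, avoiding the
    # 64-bit integer conversion in the driver; Python ints are unbounded,
    # so converting back on the client side is safe.
    result = conn.execute(text("SELECT TO_CHAR(id) AS id_str FROM my_table"))
    ids = [int(row.id_str) for row in result]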

FileNotFoundError: No such file: 'someones_epi.nii.gzip'

I am trying to load an MRI, but I keep getting the following error:
Traceback (most recent call last):
File "F:/Study/Projects/BTSaG/Programs/t3.py", line 2, in <module>
epi_img = nib.load('someones_epi.nii.gzip')
File "C:\Users\AnkitaShinde\AppData\Local\Programs\Python\Python35-32\lib\site-packages\nibabel\loadsave.py", line 38, in load
raise FileNotFoundError("No such file: '%s'" % filename)
FileNotFoundError: No such file: 'someones_epi.nii.gzip'
The code used is as follows:
import nibabel as nib
epi_img = nib.load('someones_epi.nii.gzip')
epi_img_data = epi_img.get_data()
epi_img_data.shape  # (53, 61, 33)

import matplotlib.pyplot as plt

def show_slices(slices):
    """ Function to display row of image slices """
    fig, axes = plt.subplots(1, len(slices))
    for i, slice in enumerate(slices):
        axes[i].imshow(slice.T, cmap="gray", origin="lower")

slice_0 = epi_img_data[26, :, :]
slice_1 = epi_img_data[:, 30, :]
slice_2 = epi_img_data[:, :, 16]
show_slices([slice_0, slice_1, slice_2])
plt.suptitle("Center slices for EPI image")
I have also updated the loadsave.py file in nibabel but it didn't work. Please help.
Edit:
The earlier error was resolved. Now another error has been encountered.
Traceback (most recent call last):
File "F:\Study\Projects\BTSaG\Programs\t3.py", line 2, in <module>
epi_img = nib.load('someones_epi.nii.gzip')
File "C:\Users\AnkitaShinde\AppData\Local\Programs\Python\Python35-32\lib\site-packages\nibabel\loadsave.py", line 47, in load
filename)
nibabel.filebasedimages.ImageFileError: Cannot work out file type of "someones_epi.nii.gzip"
This is an old question; however, I may have the solution for it.
I just figured out that nibabel.save() does not allow dots (.) or dashes (-) in folder names; these can exist in file names, however. In your case, the current path is:
C:\Users\AnkitaShinde\AppData\Local\Programs\Python\Python35-32\Lib\site-packages\nibabel\someones_epi.nii.gzip
I would change it to:
C:\Users\AnkitaShinde\AppData\Local\Programs\Python\Python35_32\Lib\site_packages\nibabel\someones_epi.nii.gzip
This is just to give an example. Of course, I don't mean that you actually change the names of these package folders as it might cause other errors.
The actual solution would be to move the file someones_epi.nii.gzip to the user structure, something like:
C:\Users\AnkitaShinde\Desktop\nibabel\someones_epi.nii.gzip
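A minimal sketch of that suggestion, assuming the file has been moved under the user profile; note that nibabel works out the format from the file extension, and gzipped NIfTI files conventionally end in .nii.gz rather than .nii.gzip, which is also what the second traceback is complaining about:

import os
import nibabel as nib

# Load from an explicit path under the user profile instead of the package folder.
# The .nii.gz extension is the conventional one for gzipped NIfTI; nibabel uses
# the extension to determine the file type.
path = os.path.join(os.path.expanduser("~"), "Desktop", "nibabel", "someones_epi.nii.gz")
epi_img = nib.load(path)
print(epi_img.get_fdata().shape)  # get_data() on older nibabel versions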

TypeError: hasattr(): attribute name must be string in pymc

I have looked at the following links, but none of them provides the solution I am looking for:
https://github.com/pymc-devs/pymc/issues/125
PyMC error : hasattr(): attribute name must be string
I have to write a function which, given the priors (and other things like data), returns a pymc model, e.g.
m = pm.Model([fittable_params.values(), rv])
return m
Then, in the calling function, when I do mcmc = pm.MCMC(model), it gives a long error:
Traceback (most recent call last):
File "model_constructor.py", line 81, in <module>
mcmc = pm.MCMC(model)
File "/usr/local/lib/python2.7/dist-packages/pymc-2.3.2-py2.7-linux-i686.egg/pymc/MCMC.py", line 81, in __init__
**kwds)
File "/usr/local/lib/python2.7/dist-packages/pymc-2.3.2-py2.7-linux-i686.egg/pymc/Model.py", line 195, in __init__
Model.__init__(self, input, name, verbose)
File "/usr/local/lib/python2.7/dist-packages/pymc-2.3.2-py2.7-linux-i686.egg/pymc/Model.py", line 98, in __init__
ObjectContainer.__init__(self, input)
File "/usr/local/lib/python2.7/dist-packages/pymc-2.3.2-py2.7-linux-i686.egg/pymc/Container.py", line 605, in __init__
conservative_update(self, input_to_file)
File "/usr/local/lib/python2.7/dist-packages/pymc-2.3.2-py2.7-linux-i686.egg/pymc/Container.py", line 548, in conservative_update
if not hasattr(obj, k):
TypeError: hasattr(): attribute name must be string
On the other hand, if in the function (which returns a model) I do
m = pm.MCMC([fittable_params.values(), rv])
it runs fine, but the function should return a model so that the user can do whatever they want with it in other parts of the code.
If the linked solutions don't work for you, then as a last resort you can delete the non-string attributes from the model, since they don't seem to be used anyway:
for key in m.__dict__.keys():
    if not isinstance(key, basestring):
        del m.__dict__[key]
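For completeness, a sketch of how that cleanup might sit inside the model-building function described in the question (Python 2 / pymc 2.x; fittable_params and rv stand in for the question's variables, so treat this as an illustration rather than a guaranteed fix):

import pymc as pm

def build_model(fittable_params, rv):
    m = pm.Model([fittable_params.values(), rv])
    # Drop any attributes whose names are not strings before the model is
    # handed to pm.MCMC, which is where hasattr() raises the TypeError.
    for key in list(m.__dict__.keys()):
        if not isinstance(key, basestring):
            del m.__dict__[key]
    return m

# In the calling code:
# mcmc = pm.MCMC(build_model(fittable_params, rv))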
