Parse a byte array into a ByteArray - kotlinx.serialization

I get an array of bytes from a server, but kotlinx.serialization doesn't seem to like it: it can't parse it into the native Kotlin type ByteArray and logs the error Unexpected JSON token at offset 902: Failed to parse 'byte':
JSON = "my_bytes": "[[22, 124, 78, etc...], [233, 89, 112, etc...], [etc...]]" and I want to deserialize that into a ByteArray so I can then write a file from it with File.writeBytes(my_byte_array).
My code, by the way:
import kotlinx.serialization.Serializable
@Serializable
data class ServerResponse(
    val audio: List<ByteArray>
)

Related

Parquet 2.4+ and PyArrow 10.0.1 - Attempting to switch pyarrow column from string to datetime

I attempted to follow the advice in "Converting string timestamp to datetime using pyarrow", but pyarrow does not seem to accept my format string:
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.compute as pc
table = pa.table({'n_legs': [2, 2, 4, 4, 5, 100],
                  'query_start_time': ["2022-12-30T19:02:40.466",
                                       "2022-12-30T19:02:40.466",
                                       "2022-12-30T19:02:40.466",
                                       "2022-12-30T19:02:40.466",
                                       "2022-12-30T19:02:40.466",
                                       "2022-12-30T19:02:40.466"]})
pc.strptime(table.column("query_start_time"), format='%Y-%m-%dT%H:%M:%S.%f', unit='ms')
writer = pq.ParquetWriter('example.parquet', table.schema)
writer.write_table(table)
writer.close()
I've tried removing the T and adding a Z to the end of both the format and the string. It seems that instead I need to ..?
Traceback (most recent call last):
File "/home/emcp/Dev/temp_pyarrow/main.py", line 16, in <module>
pc.strptime(table.column("query_start_time"), format='%Y-%m-%d %H:%M:%S.%f', unit='ms')
File "/home/emcp/Dev/temp_pyarrow/venv/lib/python3.10/site-packages/pyarrow/compute.py", line 255, in wrapper
return func.call(args, options, memory_pool)
File "pyarrow/_compute.pyx", line 355, in pyarrow._compute.Function.call
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Failed to parse string: '2022-12-30T19:02:40.466' as a scalar of type timestamp[ms]
Do I need to manually convert the datetime value into an integer column and THEN change the column again?
EDIT: When I strip the .%f and change the unit to unit='s', it no longer errors. I am looking into that now.
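For the manual route the question asks about, a minimal sketch using only Python's standard library (not pyarrow) might look like the following. It assumes the timestamps are in UTC, which the question does not state, and the helper name to_epoch_ms is made up for illustration. It shows that the same format string, including .%f, is accepted by datetime.strptime and yields integer epoch milliseconds, which could then be stored in an integer column or converted further.

```python
from datetime import datetime, timedelta, timezone

# Same format string as in the question; the stdlib parser accepts the
# fractional-seconds directive %f.
FMT = "%Y-%m-%dT%H:%M:%S.%f"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(text):
    """Hypothetical helper: parse a timestamp string into epoch milliseconds.

    Assumes the string is in UTC (an assumption, not stated in the question).
    """
    parsed = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    # Integer division by a timedelta avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


values = ["2022-12-30T19:02:40.466"] * 6
epoch_ms = [to_epoch_ms(v) for v in values]
print(epoch_ms[0])
```

Whether this is preferable to parsing inside pyarrow depends on the data size; the sketch only illustrates that the format string itself is not the problem in plain Python.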

Proper way to convert an image to TF record Format (Writing an image to TFrecord format generate Sparse Tensor)

I am reading an image from the local file system, converting it to bytes, and passing it to tf.train.Feature to write it in TFRecord format. Everything works until I read the TFRecord back and extract the image bytes, which come out as a sparse tensor. Below is my code for the complete process flow.
reading df and image file: No Error
import io
import tensorflow as tf
from PIL import Image
img_bytes_list = []
for img_path in df.filepath:
    with tf.io.gfile.GFile(img_path, "rb") as f:
        raw_img = f.read()
    img_bytes_list.append(raw_img)
defining features : No Error
write_features = {'filename': tf.train.Feature(bytes_list=tf.train.BytesList(value=df['filename'].apply(lambda x: x.encode("utf-8")))),
                  'img_arr': tf.train.Feature(bytes_list=tf.train.BytesList(value=img_bytes_list)),
                  'width': tf.train.Feature(int64_list=tf.train.Int64List(value=df['width'])),
                  'height': tf.train.Feature(int64_list=tf.train.Int64List(value=df['height'])),
                  'img_class': tf.train.Feature(bytes_list=tf.train.BytesList(value=df['class'].apply(lambda x: x.encode("utf-8")))),
                  'xmin': tf.train.Feature(int64_list=tf.train.Int64List(value=df['xmin'])),
                  'ymin': tf.train.Feature(int64_list=tf.train.Int64List(value=df['ymin'])),
                  'xmax': tf.train.Feature(int64_list=tf.train.Int64List(value=df['xmax'])),
                  'ymax': tf.train.Feature(int64_list=tf.train.Int64List(value=df['ymax']))}
create example: No Error
example = tf.train.Example(features=tf.train.Features(feature=write_features))
writing data in TfRecord Format: No Error
with tf.io.TFRecordWriter('image_data_tfr') as writer:
    writer.write(example.SerializeToString())
Read and print data: No Error
read_features = {"filename": tf.io.VarLenFeature(dtype=tf.string),
                 "img_arr": tf.io.VarLenFeature(dtype=tf.string),
                 "width": tf.io.VarLenFeature(dtype=tf.int64),
                 "height": tf.io.VarLenFeature(dtype=tf.int64),
                 "class": tf.io.VarLenFeature(dtype=tf.string),
                 "xmin": tf.io.VarLenFeature(dtype=tf.int64),
                 "ymin": tf.io.VarLenFeature(dtype=tf.int64),
                 "xmax": tf.io.VarLenFeature(dtype=tf.int64),
                 "ymax": tf.io.VarLenFeature(dtype=tf.int64)}
reading single example from tfrecords format: No Error
for serialized_example in tf.data.TFRecordDataset(["image_data_tfr"]):
    parsed_s_example = tf.io.parse_single_example(serialized=serialized_example,
                                                  features=read_features)
reading image data from tfrecords format: No Error
image_raw = parsed_s_example['img_arr']
encoded_jpg_io = io.BytesIO(image_raw)
Here it is giving error: TypeError: a bytes-like object is required, not 'SparseTensor'
image = Image.open(encoded_jpg_io)
width, height = image.size
print(width, height)
What changes are required for "img_arr" so that parsing returns the image bytes instead of a sparse tensor?
Is there anything else I can do to optimize my existing code?

Unicode Decode Error:invalid start byte

The purpose of the code is to create a graph for the decision tree model.
The code is given below.
dot_data=StringIO()
tree.export_graphviz(clf,out_file=dot_data)
graph=py.graph_from_dot_data(dot_data.getvalue())
print(graph)
Image.open(graph.create_png(),mode='r')
On execution, it gives the following error:
Traceback (most recent call last):
File "C:/Ankur/Python36/Python Files/Decision_Tree.py", line 58, in <module>
Image.open(graph.create_png(),mode='r')
File "C:\Ankur\Python36\lib\site-packages\PIL\Image.py", line 2477, in open
fp = builtins.open(filename, "rb")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
I am having a hard time resolving this error because I don't understand it.

create_png() returns a bytes object while Image.open (from PIL) expects a filename or file object.
Try
import io
Image.open(io.BytesIO(graph.create_png()))
and it should work
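To see why the original call fails and the wrapped one works, here is a small sketch using only the standard library. It does not call PIL or the graph library; png_bytes is a stand-in (just the 8-byte PNG signature) for what graph.create_png() is assumed to return. As the traceback suggests, the bytes were taken for a file name and decoded as text, which fails on the first byte, whereas io.BytesIO exposes the same bytes as a file object.

```python
import io

# Stand-in for the bytes returned by graph.create_png(): the 8-byte PNG
# signature, which starts with 0x89.
png_bytes = b"\x89PNG\r\n\x1a\n"

# Treating the bytes as text (as happens when they are taken for a file
# name) fails on the very first byte.
try:
    png_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(decode_error.reason)

# Wrapping the bytes in BytesIO gives a file-like object with read() and
# seek(), which is the kind of object an image loader can consume.
buffer = io.BytesIO(png_bytes)
header = buffer.read(8)
print(header == png_bytes)
```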

gspread update_cells always return 502 with httpsession error

I am currently trying to overwrite a Google spreadsheet with new data using the gspread API (version 0.4.1) via sheet.update_cells, but it keeps giving me a 502 with the following error message:
The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds. <ins>That’s all we know.
Judging from the stack trace, it seems to be an HTTP session problem:
File "/usr/local/lib/python2.7/dist-packages/gspread/models.py", line 476, in update_cells
self.client.post_cells(self, ElementTree.tostring(feed))
File "/usr/local/lib/python2.7/dist-packages/gspread/client.py", line 303, in post_cells
r = self.session.post(url, data, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/gspread/httpsession.py", line 81, in post
return self.request('POST', url, data=data, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/gspread/httpsession.py", line 67, in request
response = func(url, data=data, headers=request_headers)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 111, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 57, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 475, in request
I did a little investigation, but the answers vary, so I think I had better just post my version here.
The code snippet is nothing special; it looks roughly like this:
sheet = gspread.authorize(credentials).open_by_key(spreadsheet_key).worksheet(worksheet_title)
if not sheet:
    return
if not len(new_rows):
    return
sheet.resize(len(new_rows), sheet.col_count)
active_range = 'A1:{0}{1}'.format(last_col, len(new_rows))
cell_list = sheet.range(active_range)
k = 0
for row in new_rows:
    for field in row:
        cell_list[k].value = field
        k+=1
sheet.update_cells(cell_list)
where new_rows holds the new cell values I want to overwrite the sheet with. I don't think it is an authentication issue, as the same code snippet used to work, but at some point it started returning the 502 every time.

BeautifulSoup character code error

I am using BeautifulSoup for scraping website info. Specifically, I want to gather information on patents from a google search (title, inventors, abstract, etc). I have a list of URLs for each patent, but BeautifulSoup is having trouble with certain sites, giving me the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 531: invalid continuation byte
Below is the error traceback:
Traceback (most recent call last):
soup = BeautifulSoup(the_page,from_encoding='utf-8')
File "C:\Python27\lib\site-packages\bs4\__init__.py", line 172, in __init__
self._feed()
File "C:\Python27\lib\site-packages\bs4\__init__.py", line 185, in _feed
self.builder.feed(self.markup)
File "C:\Python27\lib\site-packages\bs4\builder\_lxml.py", line 195, in feed
self.parser.close()
File "parser.pxi", line 1209, in lxml.etree._FeedParser.close (src\lxml\lxml.etree.c:90597)
File "parsertarget.pxi", line 142, in lxml.etree._TargetParserContext._handleParseResult (src\lxml\lxml.etree.c:99984)
File "parsertarget.pxi", line 130, in lxml.etree._TargetParserContext._handleParseResult (src\lxml\lxml.etree.c:99807)
File "lxml.etree.pyx", line 294, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:9383)
File "saxparser.pxi", line 259, in lxml.etree._handleSaxData (src\lxml\lxml.etree.c:95945)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 531: invalid continuation byte
I checked the encoding of the site, and it claims to be 'utf-8'. I specified this as an input to BeautifulSoup as well. Below is my code:
import urllib, urllib2
from bs4 import BeautifulSoup
#url = 'https://www.google.com/patents/WO2001019016A1?cl=en' # This one works
url = 'https://www.google.com/patents/WO2006016929A2?cl=en' # This one doesn't work
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name' : 'Somebody',
          'location' : 'Somewhere',
          'language' : 'Python' }
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()
print response.headers['content-type']
print response.headers.getencoding()
soup = BeautifulSoup(the_page,from_encoding='utf-8')
I included two urls. One results in an error, the other works fine (labeled as such in the comments). In both cases, I could print the html to the terminal fine, but BeautifulSoup consistently crashed.
Any recommendations? This is my first usage of BeautifulSoup.

You should encode the string in UTF-8:
soup = BeautifulSoup(the_page.encode('UTF-8'))
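As an alternative angle, here is a hedged sketch in Python 3 (the question itself uses Python 2) of decoding the raw bytes yourself before parsing. The sample bytes are made up to reproduce the same kind of error; whether the real page contains such a stray byte is an assumption. With errors="replace", an undecodable byte becomes U+FFFD instead of raising, and the resulting text could then be handed to the parser.

```python
# Made-up page bytes: 0xcc opens a two-byte UTF-8 sequence, but the next
# byte (a space) is not a continuation byte.
the_page = b"<title>Patent \xcc abstract</title>"

# Strict decoding reproduces the kind of error from the question.
try:
    the_page.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# Lenient decoding substitutes U+FFFD for the undecodable byte.
text = the_page.decode("utf-8", errors="replace")
print(strict_error.reason)
print(text)
```

The trade-off is that the replaced character is lost, so this suits cases where a few damaged characters in the scraped text are acceptable.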
