IllegalArgumentException on any functions with GeoDataFrame - geopandas

I'm trying to do anything with my GeoDataFrame that I'm importing from the Census website, but when I try to dissolve or overlay with other data frames I get:
pygeos.GEOSException: IllegalArgumentException: Argument must be Polygonal or LinearRing
It won't every overlay itself which is weird. I've exploded it to get rid of the multi polygon and buffered the three invalid polygons, but I'm lost. New to geopandas but I can general feel my way around these things. Any help would be appreciated.
Is my installed package the issue?
Download this file
>>> import geopandas as gpd
>>> SHAPE_URL = 'https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_cd116_20m.zip'
>>> cd = gpd.read_file(SHAPE_URL)
>>> cd.dissolve()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/geopandas/geodataframe.py", line 1538, in dissolve
g = self.groupby(group_keys=False, **groupby_kwargs)[self.geometry.name].agg(
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pandas/core/groupby/generic.py", line 259, in aggregate
return self._python_agg_general(func, *args, **kwargs)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1173, in _python_agg_general
result, counts = self.grouper.agg_series(obj, f)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pandas/core/groupby/ops.py", line 691, in agg_series
return self._aggregate_series_pure_python(obj, func)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pandas/core/groupby/ops.py", line 741, in _aggregate_series_pure_python
res = func(group)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1160, in <lambda>
f = lambda x: func(x, *args, **kwargs)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/geopandas/geodataframe.py", line 1535, in merge_geometries
merged_geom = block.unary_union
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/geopandas/base.py", line 728, in unary_union
return self.geometry.values.unary_union()
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/geopandas/array.py", line 652, in unary_union
return vectorized.unary_union(self.data)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/geopandas/_vectorized.py", line 892, in unary_union
return _pygeos_to_shapely(pygeos.union_all(data))
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pygeos/decorators.py", line 80, in wrapped
return func(*args, **kwargs)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pygeos/set_operations.py", line 388, in union_all
result = lib.unary_union(collections, **kwargs)
pygeos.GEOSException: IllegalArgumentException: Argument must be Polygonal or LinearRing

Related

How to use run_mlm with unlabeled text

I want to fine-tune a pre-trained huggingface model for a particular domain. From this answer I know I can do it using run_mlm.py but I can't understan which format should I use for my text file. I tried to use a simple structure with one document per line and I get the following error:
python run_mlm.py --model_name_or_path "neuralmind/bert-base-portuguese-cased" --train_file ../data/full_corpus.csv --output models/ --do_train
Traceback (most recent call last):
File "run_mlm.py", line 449, in <module>
main()
File "run_mlm.py", line 384, in main
load_from_cache_file=not data_args.overwrite_cache,
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/dataset_dict.py", line 303, in map
for k, dataset in self.items()
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/dataset_dict.py", line 303, in <dictcomp>
for k, dataset in self.items()
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 1260, in map
update_data=update_data,
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 157, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/fingerprint.py", line 163, in wrapper
out = func(self, *args, **kwargs)
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 1529, in _map_single
writer.write_batch(batch)
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/arrow_writer.py", line 278, in write_batch
pa_table = pa.Table.from_pydict(typed_sequence_examples)
File "pyarrow/table.pxi", line 1474, in pyarrow.lib.Table.from_pydict
File "pyarrow/array.pxi", line 322, in pyarrow.lib.asarray
File "pyarrow/array.pxi", line 222, in pyarrow.lib.array
File "pyarrow/array.pxi", line 110, in pyarrow.lib._handle_arrow_array_protocol
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/arrow_writer.py", line 100, in __arrow_array__
if trying_type and out[0].as_py() != self.data[0]:
File "pyarrow/array.pxi", line 1058, in pyarrow.lib.Array.__getitem__
File "pyarrow/array.pxi", line 540, in pyarrow.lib._normalize_index
IndexError: index out of bounds

asyncio_redis and aioredis error on getting a list of keys

When I try to get values for a list of keys using asyncio_redis or aioredis, I am getting the following error. I know it is about something python socket, but unable to resolve the error. I attached both the code and error log with this issue. Here keys are a list of large byte arrays. get_params_redis is called by multiple processes. Any help would be appreciated, thanks!
async def multi_get_key_redis(keys):
redis = await aioredis.create_redis_pool(
'redis://localhost')
result =[]
for key in keys:
result.append(await redis.get(key))
# assert result == await asyncio.gather(*keys)
# return result
redis.close()
await redis.wait_closed()
print(result)
return result
def get_params_redis(shapes):
i = -1
params=[]
keys = []
for s in range(len(shapes)):
keys.append(s)
values = asyncio.get_event_loop().run_until_complete(multi_get_key_redis(keys))
for shape in shapes:
i = i + 1
param_np = pc._loads(values[i]).reshape(shape)
param_tensor = torch.nn.Parameter(torch.from_numpy(param_np))
params.append(param_tensor)
return params
Error Log:
Process Process-1:
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/Users/srujithpoondla/largescaleml_project/train_redis.py", line 33, in train_redis
train_redis_epoch(epoch, args, model, train_loader, optimizer,shapes_len, loop)
File "/Users/srujithpoondla/largescaleml_project/train_redis.py", line 43, in train_redis_epoch
params = get_params_redis(shapes_len,loop)
File "/Users/srujithpoondla/largescaleml_project/common_functions.py", line 76, in get_params_redis
params = loop.run_until_complete(multi_get_key_redis(keys))
File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 454, in run_until_complete
self.run_forever()
File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 421, in run_forever
self._run_once()
File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 1395, in _run_once
event_list = self._selector.select(timeout)
File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/selectors.py", line 577, in select
kev_list = self._kqueue.control(None, max_ev, timeout)
OSError: [Errno 9] Bad file descriptor

pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: user'

I am new to Pyspark and I am trying to do a simple count. However it is giving me this error. The text file is inside hdfs.
CODE:
>>> mydata = sc.textFile("hdfs://user/poem.txt")
>>> mydata.count()
ERROR:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/spark-2.0.1-bin-hadoop2.7/python/pyspark/rdd.py", line 1008, in count
return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
File "/usr/local/lib/spark-2.0.1-bin-hadoop2.7/python/pyspark/rdd.py", line 999, in sum
return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
File "/usr/local/lib/spark-2.0.1-bin-hadoop2.7/python/pyspark/rdd.py", line 873, in fold
vals = self.mapPartitions(func).collect()
File "/usr/local/lib/spark-2.0.1-bin-hadoop2.7/python/pyspark/rdd.py", line 776, in collect
port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "/usr/local/lib/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/local/lib/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: user'
You are missing a "/"
r = sc.textFile("hdfs://user/myFile")
r.count()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/lib/spark/python/pyspark/rdd.py", line 1004, in count
return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
File "/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/lib/spark/python/pyspark/rdd.py", line 995, in sum
return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
File "/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/lib/spark/python/pyspark/rdd.py", line 869, in fold
vals = self.mapPartitions(func).collect()
File "/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/lib/spark/python/pyspark/rdd.py", line 771, in collect
port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p1464.1349/lib/spark/python/pyspark/sql/utils.py", line 53, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: user'
However, if you do
>>> r = sc.textFile("hdfs:///user/myFile")
>>> r.count()
318199
it is because hdfs:// is the URI. And in Fullly qualified syntax, it should be hdfs:///. Hence, Spark is thinking the token "user" as NN-Host

Error when I Update Module List in OpenERP

I am trying to update my modulo list using the Update Module List menu item, but I get the follwing error:
OpenERP Server Error
Client Traceback (most recent call last):
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/http.py", line 204, in dispatch
response["result"] = method(self, **self.params)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/controllers/main.py", line 1132, in call_button
action = self._call_kw(req, model, method, args, {})
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/controllers/main.py", line 1120, in _call_kw
return getattr(req.session.model(model), method)(*args, **kwargs)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/session.py", line 42, in proxy
result = self.proxy.execute_kw(self.session._db, self.session._uid, self.session._password, self.model, method, args, kw)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/session.py", line 30, in proxy_method
result = self.session.send(self.service_name, method, *args)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/session.py", line 103, in send
raise xmlrpclib.Fault(openerp.tools.ustr(e), formatted_info)
Server Traceback (most recent call last):
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/session.py", line 89, in send
return openerp.netsvc.dispatch_rpc(service_name, method, args)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/netsvc.py", line 296, in dispatch_rpc
result = ExportService.getService(service_name).dispatch(method, params)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/service/web_services.py", line 626, in dispatch
res = fn(db, uid, *params)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/osv/osv.py", line 190, in execute_kw
return self.execute(db, uid, obj, method, *args, **kw or {})
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/osv/osv.py", line 132, in wrapper
return f(self, dbname, *args, **kwargs)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/osv/osv.py", line 199, in execute
res = self.execute_cr(cr, uid, obj, method, *args, **kw)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/osv/osv.py", line 187, in execute_cr
return getattr(object, method)(cr, uid, *args, **kw)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/base/module/wizard/base_module_update.py", line 42, in update_module
update, add = module_obj.update_list(cr, uid,)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/base/module/module.py", line 617, in update_list
handler.load_addons()
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/http.py", line 580, in load_addons
m = __import__('openerp.addons.' + module)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/modules/module.py", line 133, in load_module
mod = imp.load_module('openerp.addons.' + module_part, f, path, descr)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/magento_integration-develop/__init__.py", line 9, in <module>
import magento_
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/magento_integration-develop/magento_.py", line 17, in <module>
import magento
ImportError: No module named magento
I am trying to install a Magento OpenERP connector, but in order to to that I must locate it in the Installed Module list.
Thanks
Please check your file magento_.py, and see whether the class has been called.And if it has been called correctly check whether it has been correctly specified in the import line in your init.py file.

gdata.docs.client.DocsClient

I have the following code, reads oauth2 token form file, then try's to perform a doc's list query to find a specific spreadsheet that I want to copy, however no matter what I try the code either errors out or returns with an object containing no document data.
I am using gdata.docs.client.DocsClient which as far as I can tell is version 3 of the API
def CreateClient():
"""Create a Documents List Client."""
client = gdata.docs.client.DocsClient(source=config.APP_NAME)
client.http_client.debug = config.DEBUG
# Authenticate the user with CLientLogin, OAuth, or AuthSub.
if os.path.exists(config.CONFIG_FILE):
f = open(config.CONFIG_FILE)
tok = pickle.load(f)
f.close()
client.auth_token = tok.auth_token
return client
1st query attempt
def get_doc():
new_api_query = gdata.docs.client.DocsQuery(title='RichSheet', title_exact=True, show_collections=True)
d = client.GetResources(q = new_api_query)
this fails with the following stack trace
Traceback (most recent call last):
File "/Users/richard/PycharmProjects/reportone/make_my_report.py", line 83, in <module>
get_doc()
File "/Users/richard/PycharmProjects/reportone/make_my_report.py", line 57, in get_doc
d = client.GetResources(q = new_api_query)
File "/Users/richard/PycharmProjects/reportone/gdata/docs/client.py", line 151, in get_resources
**kwargs)
File "/Users/richard/PycharmProjects/reportone/gdata/client.py", line 640, in get_feed
**kwargs)
File "/Users/richard/PycharmProjects/reportone/gdata/docs/client.py", line 66, in request
return super(DocsClient, self).request(method=method, uri=uri, **kwargs)
File "/Users/richard/PycharmProjects/reportone/gdata/client.py", line 267, in request
uri=uri, auth_token=auth_token, http_request=http_request, **kwargs)
File "/Users/richard/PycharmProjects/reportone/atom/client.py", line 115, in request
self.auth_token.modify_request(http_request)
File "/Users/richard/PycharmProjects/reportone/gdata/gauth.py", line 1047, in modify_request
token_secret=self.token_secret, verifier=self.verifier)
File "/Users/richard/PycharmProjects/reportone/gdata/gauth.py", line 668, in generate_hmac_signature
next, token, verifier=verifier)
File "/Users/richard/PycharmProjects/reportone/gdata/gauth.py", line 629, in build_oauth_base_string
urllib.quote(params[key], safe='~')))
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1266, in quote
if not s.rstrip(safe):
AttributeError: 'bool' object has no attribute 'rstrip'
Process finished with exit code 1
then my second attempt
def get_doc():
other = gdata.docs.service.DocumentQuery(text_query='RichSheet')
d = client.GetResources(q = other)
this returns an ResourceFeed object, but has no content. I have been through the source code for these function but thing are not any obvious.
Have i missed something ? or should i go back to version 2 of the api ?

Resources