HF pre-trained model for inference on multiple GPUs - parallel-processing

I am trying to run inference with a pre-trained HF model on 4 GPUs. Do I need a PyTorch DataLoader, or can I use any Python function? I used pandas chunked reading. My idea is to do data parallelism: split the data among the 4 GPUs and run the pipeline HF_PIPELINE on each GPU. Below is my script. Could you suggest what I am doing wrong?
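    import pandas as pd
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp

    chunksize = 100
    filename = 'dataset.csv'

    # HF_PIPELINE is the Hugging Face pipeline object, created elsewhere.
    def run_inference(rank, world_size):
        # mp.spawn passes the process index (rank) as the first argument.
        dist.init_process_group("gloo", rank=rank, world_size=world_size)
        df_chunk_list = []
        with pd.read_csv(filename, chunksize=chunksize) as reader:
            for chunk in reader:
                print(chunk.head(2))
                chunk['prediction'] = list(map(HF_PIPELINE, chunk['text']))
                df_chunk_list.append(chunk['prediction'])
        print(df_chunk_list)

    def main():
        world_size = torch.cuda.device_count()
        mp.spawn(run_inference,
                 args=(world_size,),  # trailing comma: args must be a tuple
                 nprocs=world_size,
                 join=True)

    if __name__ == "__main__":
        main()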
However, I am getting this error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'run_inference' on <module '__main__' (built-in)>
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'run_inference' on <module '__main__' (built-in)>
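One thing worth noting about the script above: every spawned process opens the same CSV and iterates over all of its chunks, so the data is not actually divided between the GPUs. A minimal sketch of one way to split it, assuming a round-robin assignment of chunks to ranks, is shown here. The helper name `shard` and the toy chunk labels are hypothetical; they only stand in for the DataFrame chunks yielded by `pd.read_csv(..., chunksize=...)`.

```python
from itertools import islice

def shard(iterable, rank, world_size):
    """Return an iterator over every world_size-th item, starting at index rank."""
    return islice(iterable, rank, None, world_size)

# Toy stand-ins for the chunks a CSV reader would yield.
chunks = [f"chunk{i}" for i in range(10)]
world_size = 4

# Each rank sees a disjoint subset; together the subsets cover all chunks.
shards = [list(shard(chunks, rank, world_size)) for rank in range(world_size)]
for rank, items in enumerate(shards):
    print(rank, items)
```

Inside `run_inference`, the same idea would read `for chunk in shard(reader, rank, world_size):`, although each process would still parse the whole file in order to reach its own chunks.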

Related

how to read from h5py in multiprocessing without errors

I have code like:
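    import multiprocessing
    import h5py
    import numpy as np
    import pandas as pd

    # path, keys and n_cpus are defined elsewhere.
    def get_df(path, key):
        with h5py.File(path, "r") as hdf:
            df = pd.DataFrame(np.array(hdf[key]))
        return df

    def f(key):
        df = get_df(path, key)
        # ...transform df...
        return df

    with multiprocessing.Pool(n_cpus) as pool:
        rvals = pool.map(f, keys)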
And I'm getting this error many times:
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
and one copy of this error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/lenail/.conda/envs/py38/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/lenail/.conda/envs/py38/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/lenail/.conda/envs/py38/lib/python3.8/multiprocessing/pool.py", line 513, in _handle_workers
cls._maintain_pool(ctx, Process, processes, pool, inqueue,
File "/home/lenail/.conda/envs/py38/lib/python3.8/multiprocessing/pool.py", line 337, in _maintain_pool
Pool._repopulate_pool_static(ctx, Process, processes, pool,
File "/home/lenail/.conda/envs/py38/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/home/lenail/.conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/home/lenail/.conda/envs/py38/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/home/lenail/.conda/envs/py38/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/home/lenail/.conda/envs/py38/lib/python3.8/multiprocessing/popen_fork.py", line 70, in _launch
self.pid = os.fork()
BlockingIOError: [Errno 11] Resource temporarily unavailable
Any ideas why this might be or how to fix it? I had read that h5py supports multiprocessing.
Python 3.8.13 | packaged by conda-forge
>>> h5py.__version__
'3.6.0'
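Errno 11 is EAGAIN, and `os.fork()` raises it when the operating system refuses to create another process, typically because a per-user process or thread limit has been reached. As a rough sketch, and not a confirmed fix for this case, the pool size can be capped using the CPU count and the soft `RLIMIT_NPROC` limit. The helper name `safe_worker_count` is hypothetical, and the `resource` module is only available on Unix-like systems.

```python
import os
import resource

def safe_worker_count(requested):
    """Cap a requested pool size by the CPU count and the soft process limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    cpus = os.cpu_count() or 1
    # RLIMIT_NPROC counts all of the user's processes, so it is only an
    # upper bound here, not the number of workers that can really be started.
    limit = cpus if soft == resource.RLIM_INFINITY else min(cpus, soft)
    return max(1, min(requested, limit))

n_workers = safe_worker_count(64)
print("workers to start:", n_workers)
```

The result could then be passed to `multiprocessing.Pool(n_workers)` in place of `n_cpus`.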

IllegalArgumentException on any functions with GeoDataFrame

I'm trying to do anything with my GeoDataFrame that I'm importing from the Census website, but when I try to dissolve or overlay with other data frames I get:
pygeos.GEOSException: IllegalArgumentException: Argument must be Polygonal or LinearRing
It won't even overlay with itself, which is weird. I've exploded it to get rid of the multipolygons and buffered the three invalid polygons, but I'm lost. I'm new to geopandas, but I can generally feel my way around these things. Any help would be appreciated.
Is my installed package the issue?
Download this file
>>> import geopandas as gpd
>>> SHAPE_URL = 'https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_cd116_20m.zip'
>>> cd = gpd.read_file(SHAPE_URL)
>>> cd.dissolve()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/geopandas/geodataframe.py", line 1538, in dissolve
g = self.groupby(group_keys=False, **groupby_kwargs)[self.geometry.name].agg(
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pandas/core/groupby/generic.py", line 259, in aggregate
return self._python_agg_general(func, *args, **kwargs)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1173, in _python_agg_general
result, counts = self.grouper.agg_series(obj, f)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pandas/core/groupby/ops.py", line 691, in agg_series
return self._aggregate_series_pure_python(obj, func)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pandas/core/groupby/ops.py", line 741, in _aggregate_series_pure_python
res = func(group)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1160, in <lambda>
f = lambda x: func(x, *args, **kwargs)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/geopandas/geodataframe.py", line 1535, in merge_geometries
merged_geom = block.unary_union
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/geopandas/base.py", line 728, in unary_union
return self.geometry.values.unary_union()
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/geopandas/array.py", line 652, in unary_union
return vectorized.unary_union(self.data)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/geopandas/_vectorized.py", line 892, in unary_union
return _pygeos_to_shapely(pygeos.union_all(data))
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pygeos/decorators.py", line 80, in wrapped
return func(*args, **kwargs)
File "/Users/joefedorowicz/Development/virtualenvs/briefcase/lib/python3.9/site-packages/pygeos/set_operations.py", line 388, in union_all
result = lib.unary_union(collections, **kwargs)
pygeos.GEOSException: IllegalArgumentException: Argument must be Polygonal or LinearRing

How to use run_mlm with unlabeled text

I want to fine-tune a pre-trained Hugging Face model for a particular domain. From this answer I know I can do it using run_mlm.py, but I can't work out which format I should use for my text file. I tried a simple structure with one document per line and got the following error:
python run_mlm.py --model_name_or_path "neuralmind/bert-base-portuguese-cased" --train_file ../data/full_corpus.csv --output models/ --do_train
Traceback (most recent call last):
File "run_mlm.py", line 449, in <module>
main()
File "run_mlm.py", line 384, in main
load_from_cache_file=not data_args.overwrite_cache,
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/dataset_dict.py", line 303, in map
for k, dataset in self.items()
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/dataset_dict.py", line 303, in <dictcomp>
for k, dataset in self.items()
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 1260, in map
update_data=update_data,
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 157, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/fingerprint.py", line 163, in wrapper
out = func(self, *args, **kwargs)
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/arrow_dataset.py", line 1529, in _map_single
writer.write_batch(batch)
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/arrow_writer.py", line 278, in write_batch
pa_table = pa.Table.from_pydict(typed_sequence_examples)
File "pyarrow/table.pxi", line 1474, in pyarrow.lib.Table.from_pydict
File "pyarrow/array.pxi", line 322, in pyarrow.lib.asarray
File "pyarrow/array.pxi", line 222, in pyarrow.lib.array
File "pyarrow/array.pxi", line 110, in pyarrow.lib._handle_arrow_array_protocol
File "/mnt/sdb/data-mwon/paperChega/env2/lib/python3.6/site-packages/datasets/arrow_writer.py", line 100, in __arrow_array__
if trying_type and out[0].as_py() != self.data[0]:
File "pyarrow/array.pxi", line 1058, in pyarrow.lib.Array.__getitem__
File "pyarrow/array.pxi", line 540, in pyarrow.lib._normalize_index
IndexError: index out of bounds
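`run_mlm.py` accepts a `.csv`, `.json`, or `.txt` file for `--train_file`; a plain `.txt` file with one document per line is the simplest of these for unlabeled text. The sketch below writes such a file. It also skips empty documents, on the assumption (not confirmed by the traceback) that blank lines could contribute to the `IndexError`. The function name `write_line_per_document` and the sample documents are hypothetical.

```python
import os
import tempfile

def write_line_per_document(documents, out_path):
    """Write one non-empty document per line; return the number of lines written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        for doc in documents:
            # Collapse internal newlines and repeated whitespace into single spaces.
            line = " ".join(doc.split())
            if line:  # assumption: blank documents are dropped
                fh.write(line + "\n")
                count += 1
    return count

# Hypothetical sample corpus with a multi-line document and two blank ones.
docs = ["Primeiro documento.\ncom quebra", "", "   ", "Segundo documento."]
out_path = os.path.join(tempfile.mkdtemp(), "full_corpus.txt")
n_written = write_line_per_document(docs, out_path)
print(n_written, "documents written to", out_path)
```

The resulting file would then be passed as `--train_file` with its `.txt` extension.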

RQT error with pyside

I am trying to use rqt_graph in ROS kinetic and I am getting the following error. I am not sure what is causing it and have no idea how to fix it.
It seems to be a problem with an undefined symbol, but I am not sure how this would happen.
rosrun rqt_graph rqt_graph
Traceback (most recent call last):
File "/opt/ros/kinetic/lib/rqt_graph/rqt_graph", line 8, in <module>
sys.exit(main.main(sys.argv, standalone='rqt_graph.ros_graph.RosGraph'))
File "/opt/ros/kinetic/lib/python2.7/dist-packages/rqt_gui/main.py", line 59, in main
return super(Main, self).main(argv, standalone=standalone, plugin_argument_provider=plugin_argument_provider, plugin_manager_settings_prefix=str(hash(os.environ['ROS_PACKAGE_PATH'])))
File "/opt/ros/kinetic/lib/python2.7/dist-packages/qt_gui/main.py", line 340, in main
from python_qt_binding import QT_BINDING
File "/opt/ros/kinetic/lib/python2.7/dist-packages/python_qt_binding/__init__.py", line 55, in <module>
from .binding_helper import loadUi, QT_BINDING, QT_BINDING_MODULES, QT_BINDING_VERSION # #UnusedImport
File "/opt/ros/kinetic/lib/python2.7/dist-packages/python_qt_binding/binding_helper.py", line 252, in <module>
getattr(sys, 'SELECT_QT_BINDING_ORDER', None),
File "/opt/ros/kinetic/lib/python2.7/dist-packages/python_qt_binding/binding_helper.py", line 98, in _select_qt_binding
raise ImportError("Could not find Qt binding (looked for: %s):\n%s" % (', '.join(["'%s'" % b for b in binding_order]), '\n'.join(error_msgs)))
ImportError: Could not find Qt binding (looked for: 'pyqt', 'pyside'):
ImportError for 'pyqt': /usr/lib/python2.7/dist-packages/PyQt5/QtCore.x86_64-linux-gnu.so: undefined symbol: _ZTI13QFileSelector
Traceback (most recent call last):
File "/opt/ros/kinetic/lib/python2.7/dist-packages/python_qt_binding/binding_helper.py", line 89, in _select_qt_binding
QT_BINDING_VERSION = binding_loader(required_modules, optional_modules)
File "/opt/ros/kinetic/lib/python2.7/dist-packages/python_qt_binding/binding_helper.py", line 131, in _load_pyqt
_named_import('PyQt5.%s' % module_name)
File "/opt/ros/kinetic/lib/python2.7/dist-packages/python_qt_binding/binding_helper.py", line 111, in _named_import
module = builtins.__import__(name)
ImportError: /usr/lib/python2.7/dist-packages/PyQt5/QtCore.x86_64-linux-gnu.so: undefined symbol: _ZTI13QFileSelector
ImportError for 'pyside': /usr/lib/x86_64-linux-gnu/libQt5Network.so.5: undefined symbol: _ZN16QLoggingCategoryD1Ev
Traceback (most recent call last):
File "/opt/ros/kinetic/lib/python2.7/dist-packages/python_qt_binding/binding_helper.py", line 89, in _select_qt_binding
QT_BINDING_VERSION = binding_loader(required_modules, optional_modules)
File "/opt/ros/kinetic/lib/python2.7/dist-packages/python_qt_binding/binding_helper.py", line 163, in _load_pyside
_named_import('PySide2.%s' % module_name)
File "/opt/ros/kinetic/lib/python2.7/dist-packages/python_qt_binding/binding_helper.py", line 111, in _named_import
module = builtins.__import__(name)
ImportError: /usr/lib/x86_64-linux-gnu/libQt5Network.so.5: undefined symbol: _ZN16QLoggingCategoryD1Ev
From What I can See In The Log Your pyqt and main Qt5 libs are missing And It's Not ROS Based Error
I Suggest You Install Or Reinstall Those Libs and Make Sure That Python Can Find Them

Error when I Update Module List in OpenERP

I am trying to update my module list using the Update Module List menu item, but I get the following error:
OpenERP Server Error
Client Traceback (most recent call last):
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/http.py", line 204, in dispatch
response["result"] = method(self, **self.params)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/controllers/main.py", line 1132, in call_button
action = self._call_kw(req, model, method, args, {})
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/controllers/main.py", line 1120, in _call_kw
return getattr(req.session.model(model), method)(*args, **kwargs)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/session.py", line 42, in proxy
result = self.proxy.execute_kw(self.session._db, self.session._uid, self.session._password, self.model, method, args, kw)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/session.py", line 30, in proxy_method
result = self.session.send(self.service_name, method, *args)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/session.py", line 103, in send
raise xmlrpclib.Fault(openerp.tools.ustr(e), formatted_info)
Server Traceback (most recent call last):
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/session.py", line 89, in send
return openerp.netsvc.dispatch_rpc(service_name, method, args)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/netsvc.py", line 296, in dispatch_rpc
result = ExportService.getService(service_name).dispatch(method, params)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/service/web_services.py", line 626, in dispatch
res = fn(db, uid, *params)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/osv/osv.py", line 190, in execute_kw
return self.execute(db, uid, obj, method, *args, **kw or {})
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/osv/osv.py", line 132, in wrapper
return f(self, dbname, *args, **kwargs)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/osv/osv.py", line 199, in execute
res = self.execute_cr(cr, uid, obj, method, *args, **kw)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/osv/osv.py", line 187, in execute_cr
return getattr(object, method)(cr, uid, *args, **kw)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/base/module/wizard/base_module_update.py", line 42, in update_module
update, add = module_obj.update_list(cr, uid,)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/base/module/module.py", line 617, in update_list
handler.load_addons()
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/web/http.py", line 580, in load_addons
m = __import__('openerp.addons.' + module)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/modules/module.py", line 133, in load_module
mod = imp.load_module('openerp.addons.' + module_part, f, path, descr)
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/magento_integration-develop/__init__.py", line 9, in <module>
import magento_
File "/opt/bitnami/apps/openerp/lib/openerp-7.0_20140330_231328-py2.7.egg/openerp/addons/magento_integration-develop/magento_.py", line 17, in <module>
import magento
ImportError: No module named magento
I am trying to install a Magento OpenERP connector, but in order to do that I must locate it in the Installed Module list.
Thanks
Please check your file magento_.py and see whether the class has been called. If it has been called correctly, check whether it has been correctly specified in the import line in your __init__.py file.
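The last line of the traceback says that the interpreter running the server cannot import a top-level module called `magento`. A quick way to confirm that from the same Python environment is to ask the import system whether it can locate the module. This is a minimal sketch in Python 3 syntax; OpenERP 7 runs on Python 2.7, where `importlib.util.find_spec` is not available, so treat it as an illustration of the check rather than something to paste into that server. The helper name `is_importable` is hypothetical.

```python
import importlib.util

def is_importable(module_name):
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None

# "json" ships with Python; "magento" is the module the traceback could not find.
for name in ("json", "magento"):
    print(name, is_importable(name))
```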
