huggingface/transformers: cache directory - caching

I'm trying to use huggingface transformers.
(Windows 11, Python 3.9, Jupyter Notebook, virtual environment)
When I ran this code:
from transformers import pipeline
print(pipeline('sentiment-analysis')('I hate you'))
I got this error:
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\Users\user/.cache\huggingface'
There's no directory named '.cache' in my user folder,
so as a workaround I used cache_dir="./cache",
but I want to change the path of the cache directory permanently.
P.S.
import os
os.environ['TRANSFORMERS_CACHE'] = './cache'
also didn't work.
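A likely reason the `os.environ` approach didn't work is that transformers reads TRANSFORMERS_CACHE when it is first imported, so the variable has to be set before the import. A minimal sketch (the `D:\hf_cache` path is a placeholder; use any writable directory):

```python
import os

# Set the cache location *before* importing transformers; the library
# reads TRANSFORMERS_CACHE at import time, so setting it afterwards
# has no effect on the current session.
os.environ['TRANSFORMERS_CACHE'] = r'D:\hf_cache'  # placeholder path

# Only now import the library:
# from transformers import pipeline
# print(pipeline('sentiment-analysis')('I hate you'))
print(os.environ['TRANSFORMERS_CACHE'])
```

To make the change permanent on Windows, set TRANSFORMERS_CACHE as a user environment variable (System Properties → Environment Variables, or `setx TRANSFORMERS_CACHE D:\hf_cache` in cmd) so every new session picks it up without any code changes.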

Related

ImportError: libc10_cuda.so: cannot open shared object file: No such file or directory

I'm now trying to train my own model in a conda environment, using ABCNet with Ubuntu 16.04, CUDA 10.2. Got a complaint: "libc10_cuda.so: cannot open shared object file: No such file or directory". I reinstalled CUDA-10.2 but the problem still remains. For more details, please see the transcripts below.
OMP_NUM_THREADS=1 python tools/train_net.py --config-file configs/BAText/Pretrain/attn_R_50.yaml OUTPUT_DIR text_pretraining/attn_R_50
Traceback (most recent call last):
  File "tools/train_net.py", line 40, in <module>
    from adet.data.dataset_mapper import DatasetMapperWithBasis
  File "/home/zzr/AdelaiDet/adet/__init__.py", line 1, in <module>
    from adet import modeling
  File "/home/zzr/AdelaiDet/adet/modeling/__init__.py", line 2, in <module>
    from .fcos import FCOS
  File "/home/zzr/AdelaiDet/adet/modeling/fcos/__init__.py", line 1, in <module>
    from .fcos import FCOS
  File "/home/zzr/AdelaiDet/adet/modeling/fcos/fcos.py", line 10, in <module>
    from adet.layers import DFConv2d, NaiveGroupNorm
  File "/home/zzr/AdelaiDet/adet/layers/__init__.py", line 5, in <module>
    from .bezier_align import BezierAlign
  File "/home/zzr/AdelaiDet/adet/layers/bezier_align.py", line 7, in <module>
    from adet import _C
ImportError: libc10_cuda.so: cannot open shared object file: No such file or directory
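One point worth noting: libc10_cuda.so ships inside the installed torch package (under `torch/lib`), not with the CUDA toolkit, which is likely why reinstalling CUDA 10.2 changed nothing. A hedged diagnostic sketch that asks the dynamic loader whether it can resolve the library at all:

```python
from ctypes.util import find_library

# Ask the loader whether it can resolve the libraries the traceback
# complains about; None means they are not on the loader's search path.
# libc10.so / libc10_cuda.so normally live in .../site-packages/torch/lib.
for name in ("c10", "c10_cuda"):
    print(name, "->", find_library(name))
```

If the files exist under `torch/lib` but are not found, the usual fixes are adding that directory to LD_LIBRARY_PATH, or rebuilding AdelaiDet's `_C` extension against the currently installed torch (a CPU-only torch build, or a torch/extension version mismatch, produces the same error).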

findspark.init() failing - Cannot get SPARK_HOME environment variables set correctly

I'm new to Spark and I'm attempting to play with it on my local (Windows) machine using Jupyter Notebook.
I've been following several tutorials for setting environment variables, as well as using multiple functions to do so via Python and cmd, and I cannot get any introductory PySpark code to work.
When running (in Jupyter Notebook, using Python)
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext('lcoal', 'Spark SQL')
OR
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext('C:\spark\spark-2.4.3-bin-hadoop2.7', 'Spark SQL')
I get the error:
FileNotFoundError: [WinError 2] The system cannot find the file specified
Additionally,
I attempted using findspark and run into the issue:
findspark.init()
OR
findspark.init("C:\spark\spark-2.4.3-bin-hadoop2.7")
I get the error:
IndexError: list index out of range
From other posts on this topic, I've been led to believe that the SPARK_HOME variable could be set incorrectly.
My Environment variables are as follows:
My spark was extracted here:
C:\spark\spark-2.4.3-bin-hadoop2.7
HADOOP_HOME: C:\spark\spark-2.4.3-bin-hadoop2.7
SPARK_HOME: C:\spark\spark-2.4.3-bin-hadoop2.7
JAVA_HOME: C:\Program Files\Java\jdk1.8.0_201
All of these including %SPARK_HOME%\bin have been added to my PATH variable.
Lastly, when I run cd %SPARK_HOME% in cmd, it correctly brings me to the right directory, \spark\spark-2.4.3-bin-hadoop2.7.
As far as I can see, there are no issues with my environment variables, so I'm unsure why PySpark in Jupyter Notebook cannot find my SPARK_HOME (or maybe that's not the issue).
Would appreciate any and all help!
Thanks!
You seem to have done the rest of the process; just one step remains. In Jupyter Notebook, run the command below:
import os
os.environ['SPARK_HOME'] = 'C:\\Users\\user_name\\Desktop\\spark'
It should add this path to your environment. You can also check that it is set as expected by running either of the commands below in Jupyter Notebook:
%env
OR
for var in os.environ:
    print(var, ':', os.environ[var])
P.S. Please mind the indentation of the code.
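Putting the whole fix together (using the asker's actual paths, which are assumptions to adjust to wherever Spark and the JDK really live):

```python
import os

# Set the variables PySpark needs *before* importing it. Raw strings
# keep backslashes in Windows paths from being treated as escapes.
os.environ['SPARK_HOME'] = r'C:\spark\spark-2.4.3-bin-hadoop2.7'
os.environ['HADOOP_HOME'] = r'C:\spark\spark-2.4.3-bin-hadoop2.7'
os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jdk1.8.0_201'

# Verify they landed in this process's environment:
for var in ('SPARK_HOME', 'HADOOP_HOME', 'JAVA_HOME'):
    print(var, '=', os.environ[var])

# With the variables in place, findspark and SparkContext should work.
# Note the first SparkContext argument is the master, e.g. 'local',
# not a filesystem path (the question's 'lcoal' is a typo):
# import findspark; findspark.init()
# from pyspark import SparkContext
# sc = SparkContext('local', 'Spark SQL')
```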

umap collectstatic gives "No such file or directory" error

As I have very little knowledge about Linux, pretty much all I can do is copy and paste things from a good tutorial and in most cases simply hope nothing goes wrong. I really tried finding a solution on my own and searching the internet, but to no avail (I found a number of quite similar things, but no solution I understood well enough to adapt to fix my problem).
I've installed an osm tile server using this amazing tutorial and it works like a charm. Now I want to install umap, using this tutorial.
Everything works fine until I get to the line "umap collectstatic". The error I get is this:
(venv) $ sudo umap collectstatic
[sudo] Passwort für umap2:
You have requested to collect static files at the destination
location as specified in your settings:
/home/ybon/.virtualenvs/umap/var/static
This will overwrite existing files!
Are you sure you want to do this?
Type 'yes' to continue, or 'no' to cancel: yes
Traceback (most recent call last):
File "/usr/local/bin/umap", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/dist-packages/umap/bin/__init__.py", line 12, in main
management.execute_from_command_line()
File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
utility.execute()
File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 359, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 294, in run_from_argv
self.execute(*args, **cmd_options)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 345, in execute
output = self.handle(*args, **options)
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 193, in handle
collected = self.collect()
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 115, in collect
for path, storage in finder.list(self.ignore_patterns):
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/finders.py", line 112, in list
for path in utils.get_files(storage, ignore_patterns):
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/utils.py", line 28, in get_files
directories, files = storage.listdir(location)
File "/usr/local/lib/python2.7/dist-packages/django/core/files/storage.py", line 399, in listdir
for entry in os.listdir(path):
OSError: [Errno 2] No such file or directory: '/home/ybon/Code/js/Leaflet.Storage'
Now, I get that something might be wrong with a setting in a config file somewhere, but changing the directory in local.py
doesn't seem to do anything (I have set it to STATIC_ROOT = '/home/xxx_myusername_xxx/umap/var/static'). I have no idea where this "/home/ybon/Code/..." path even comes from! What settings?
I sure didn't specify THIS path anywhere! And the folder is indeed nowhere to be found on my machine. Maybe using virtualenv is somehow generating it, and I can't find it because it IS virtual (as in "not really there physically"), but this is just a very wild guess and I don't really know what I'm talking about.
(I tried running the command with and without sudo and it doesn't change anything).
I have always wanted to install a tile server and have tried the tutorials you have given today. So I'm a learner like you!
Installing the Tile Server with the tutorial https://www.linuxbabe.com/linux-server/openstreetmap-tile-server-ubuntu-16-04 was really straightforward. I only used the part for Rhineland Palatinate.
With Umap (https://umap-project.readthedocs.io/en/latest/ubuntu/#tutorial) I had some problems.
1. A port was used twice. I changed the port for Apache.
2. After creating the local configuration (wget https://raw.githubusercontent.com/umap-project/umap/master/umap/settings/local.py.sample -O /etc/umap/umap.conf) this file was not immediately recognized. I helped myself by changing the file before executing the command "umap migrate".
I have made the following changes:
# For static deployment
STATIC_ROOT = '/etc/umap/var/static'
# For users' statics (geojson mainly)
MEDIA_ROOT = '/etc/umap/umap/var/data'
# Umap Settings
UMAP_SETTINGS='/etc/umap/umap.conf'
I changed STATIC_ROOT and MEDIA_ROOT so that the user umap has all permissions. Then I set the environment variable UMAP_SETTINGS, because otherwise the settings file /etc/umap/umap.conf is not found.
(I also have no idea where the "/home/ybon/Code/..." path comes from. Once the configuration file is properly loaded, the path is taken from it, so it no longer matters.)
Now I could use the following commands without errors:
(venv) $ umap collectstatic
Loaded local config from /etc/umap/umap.conf
You have requested to collect static files at the destination
location as specified in your settings:
/etc/umap/var/static
This will overwrite existing files!
Are you sure you want to do this?
Type 'yes' to continue, or 'no' to cancel: yes
Copying '/srv/umap/venv/lib/python3.5/site-packages/umap/static/favicon.ico'
...
290 static files copied to '/etc/umap/var/static'.
(venv) $ umap storagei18n
Loaded local config from /etc/umap/umap.conf
Processing English
Found file /etc/umap/var/static/storage/src/locale/en.json
Exporting to /etc/umap/var/static/storage/src/locale/en.js
..
Processing Deutsch
Found file /etc/umap/var/static/storage/src/locale/de.json
..
Found file /etc/umap/var/static/storage/src/locale/sk_SK.json
Exporting to /etc/umap/var/static/storage/src/locale/sk_SK.js
(venv) $ umap createsuperuser
Loaded local config from /etc/umap/umap.conf
Username (leave blank to use 'umap'):
Email address:
Password:
Password (again):
Superuser created successfully.
(venv) $ umap runserver 0.0.0.0:8000
Loaded local config from /etc/umap/umap.conf
Loaded local config from /etc/umap/umap.conf
Performing system checks...
System check identified no issues (0 silenced).
April 09, 2018 - 14:02:15
Django version 1.10.5, using settings 'umap.settings'
Starting development server at http://0.0.0.0:8000/
And finally I was able to use umap.

GnuPG home directory

The following is the code that I have been trying.
import os
import gnupg
import pdb
pdb.set_trace()
gpg = gnupg.GPG(gnupghome='new')
input_data = gpg.gen_key_input(
    key_type="RSA", key_length=1024,
    passphrase='mounika')
key = gpg.gen_key(input_data)
with open(local.txt, 'rb') as f:
    status = gpg.encrypt_file(f)
And the following is the error message being generated.
C:\Python27\python.exe C:/SAAS/encrypt.py
Traceback (most recent call last):
File "C:/SAAS/encrypt.py", line 4, in <module>
gpg = gnupg.GPG(gnupghome='new')
File "C:\Python27\lib\site-packages\gnupg.py", line 755, in __init__
raise OSError(msg)
OSError: Unable to run gpg - it may not be available.
Process finished with exit code 1
I am fairly new to GnuPG, and after doing a bit of research I tried replacing gnupghome with homedir. But this raises another error: homedir is an unexpected keyword. Can someone please help me with this issue? Any help would be appreciated.
You need to install the gpg program and make sure it is in your PATH. Or provide the full path to the gpg binary in your constructor, like
gpg = gnupg.GPG(gnupghome='new', gpgbinary='C:\\path\\to\\GnuPG\\pub\\gpg.exe')
Check also the Deployment Requirements for the python-gnupg package for more info.
In my experience, the main reason is that your local machine or server doesn't have the gnupg package, so you need to install it with apt or whatever your package manager is. That should solve the problem.
In the latest version (2.2.0) of python-gnupg (imported as gnupg),
gnupghome is now homedir (where the keyring etc. is stored).
A few other things:
Binaries (on Windows, the exe file) are specified as 'binary'.
It's best to specify a fingerprint, and your local.txt should probably be a string ('local.txt');
also, I think you're using the saltycrane blog post, which is a little outdated at the moment.
So the below should work (OP code refactored):
import os
import gnupg
import pdb

pdb.set_trace()
gpg = gnupg.GPG(homedir='new',
                binary="C:/Progra~2/GNU/GnuPG/pub/gpg2.exe")
input_data = gpg.gen_key_input(
    key_type="RSA", key_length=1024,
    passphrase='mounika')
key = gpg.gen_key(input_data)
with open('local.txt', 'rb') as f:
    status = gpg.encrypt(f, key.fingerprint)
print status.ok
print status.status
print status.stderr
I reckon your code is just failing silently.

Using numpy on hadoop streaming: ImportError: cannot import name multiarray

For one of my projects I am basically using NLTK for POS tagging, which internally uses an 'english.pickle' file. I managed to package the nltk library with these pickle files to make them available to the mapper and reducer for a Hadoop streaming job using the -file option.
However, when the nltk library tries to load that pickle file, it raises an error for numpy, since the cluster I am running this job on does not have numpy installed. Also, I don't have root access, so I can't install numpy or any other package on the cluster. The only way is to package the Python modules to make them available to the mapper and reducer, and I successfully managed to do that. But now the problem is that when numpy is imported, it imports multiarray by default (as seen in __init__.py), and this is where I get the error:
  File "/usr/lib64/python2.6/pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "/usr/lib64/python2.6/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib64/python2.6/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib64/python2.6/pickle.py", line 1124, in find_class
    __import__(module)
  File "numpy.mod/numpy/__init__.py", line 170, in <module>
  File "numpy.mod/numpy/add_newdocs.py", line 13, in <module>
  File "numpy.mod/numpy/lib/__init__.py", line 8, in <module>
  File "numpy.mod/numpy/lib/type_check.py", line 11, in <module>
  File "numpy.mod/numpy/core/__init__.py", line 6, in <module>
ImportError: cannot import name multiarray
I tried copying the numpy directory from my local machine, which contains multiarray.pyd, to the cluster to make it available to the mapper and reducer, but this didn't help.
Any input on how to resolve this (keeping the constraint that I cannot install anything on the cluster machines)?
Thanks!
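One observation that may explain the failure: multiarray is not a pure-Python module but a compiled C extension, so Python needs a real shared object built for the cluster's platform (.so on Linux; .pyd is the Windows equivalent, which a Linux interpreter cannot load), importable from a real directory rather than a zipped package. A small illustrative sketch of the suffixes the interpreter will accept for compiled modules (shown here with Python 3's importlib for convenience; the cluster in the question runs 2.6):

```python
import importlib.machinery

# Compiled extension modules must end in one of these platform-specific
# suffixes; shipping only .py files, or a .pyd built for Windows, cannot
# satisfy an import of numpy.core.multiarray on a Linux cluster.
print(importlib.machinery.EXTENSION_SUFFIXES)
```

In practice this suggests shipping a numpy build matching the cluster nodes' OS and architecture (for example, from a virtualenv created on a machine of the same platform), rather than the multiarray.pyd from a Windows install.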