Failed to load a .bin.gz pre-trained word2vec model - gensim

I'm trying to load the pre-trained word2vec vectors which I found here (https://github.com/mmihaltz/word2vec-GoogleNews-vectors).
I used the following command:
model = gensim.models.KeyedVectors.load_word2vec_format('word2vec.bin.gz', binary=False)
And it throws this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/deeplearning/anaconda3/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 193, in load_word2vec_format
header = utils.to_unicode(fin.readline(), encoding=encoding)
File "/home/deeplearning/anaconda3/lib/python3.6/gzip.py", line 374, in readline
return self._buffer.readline(size)
File "/home/deeplearning/anaconda3/lib/python3.6/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/home/deeplearning/anaconda3/lib/python3.6/gzip.py", line 463, in read
if not self._read_gzip_header():
File "/home/deeplearning/anaconda3/lib/python3.6/gzip.py", line 411, in _read_gzip_header
raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b've')

Because the error says "Not a gzipped file", maybe the file has been inadvertently uncompressed but still has the misleading .gz extension? (Try renaming it without .gz, and loading that file.)
Because the filename includes .bin, it is likely a 'binary' style word2vec format. So the optional parameter may need to be binary=True.
The filename you're using, word2vec.bin.gz, does not match the filename at the link you provided, GoogleNews-vectors-negative300.bin.gz. This suggests other changes may have occurred that could cause problems.
The error also reports a 'magic number' (indicative prefix) from the file, b've', that looks like a bit of plain text, rather than the beginning of a real gzip file. You might want to look at the first few lines of the problem file, via something like head word2vec.bin.gz, to see if there are other indications of what it is (other than what you expect it to be).
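The magic-number check described above can also be done from Python. This is a minimal sketch; the helper name looks_gzipped is hypothetical, and it only inspects the first two bytes, which are 0x1f 0x8b for any real gzip file.

```python
GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

If looks_gzipped("word2vec.bin.gz") returns False, the file is probably not compressed at all, and the renaming suggestion above is worth trying first.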

For example, I chose the file called "GoogleNews-vectors-negative300.bin.gz":
curl -o - https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz | gunzip > ./GoogleNews-vectors-negative300.bin
Explanation:
The "-" given to -o represents the standard output. The standard output is what gets redirected by piping with |.
From man curl:
Specifying the output as '-' (a single dash) will force the output to
be done to stdout.
So in your situation, do the same for your file.
For more information, see the original solution:
issue and solution
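If gunzip is not available, the decompression step can be sketched with Python's standard library instead. This is only an illustration; the function name gunzip_file and both path arguments are hypothetical, and it assumes the archive has already been downloaded.

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path):
    """Decompress the gzip file at src_path into dst_path, streaming in chunks."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

For the file above, this would be called as gunzip_file("GoogleNews-vectors-negative300.bin.gz", "GoogleNews-vectors-negative300.bin").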


"No module named sets" ren'py

OK, I'm translating a game, a visual novel called "Katawa Shoujo"; to be precise, the beta of this game, which is not available in Spanish.
I got the beta files and moved them into Ren'Py (the engine it was created with), but when I run it inside Ren'Py I get this error:
File "C:\Users\Anthony\Desktop/game/ui_ingamemenu.rpy", line 1, in script
File "C:\Users\Anthony\Desktop/game/ui_ingamemenu.rpy", line 1, in script
File "C:\Users\Anthony\Desktop/game/ui_ingamemenu.rpy", line 2, in <module>
ImportError: No module named sets
-- Full Traceback ------------------------------------------------------------
Full traceback:
File "renpy/bootstrap.py", line 326, in bootstrap
renpy.main.main()
File "renpy/main.py", line 515, in main
renpy.game.context().run(node)
File "C:\Users\Anthony\Desktop/game/ui_ingamemenu.rpy", line 1, in script
File "C:\Users\Anthony\Desktop/game/ui_ingamemenu.rpy", line 1, in script
File "renpy/ast.py", line 923, in execute
renpy.python.py_exec_bytecode(self.code.bytecode, self.hide, store=self.store)
File "renpy/python.py", line 2235, in py_exec_bytecode
exec(bytecode, globals, locals)
File "C:\Users\Anthony\Desktop/game/ui_ingamemenu.rpy", line 2, in <module>
ImportError: No module named sets
Windows-10-10.0.19041
Ren'Py 7.4.8.1895
Sat Sep 25 20:22:04 2021
I suppose the error comes from the folder path that the game's files point to:
While running game code:
File "C:\Users\Anthony\Desktop/game/ui_ingamemenu.rpy", line 1, in script
It shows the user as "Anthony", but my user is "A" ... I am very lazy with names xd
I don't have much experience so I would like you to help me, thank you
by the way, I use Atom to edit the scripts
If you need the files, I leave a download link for these:
https://www.mediafire.com/file/10ld61gyzipn0ar/Katawa_Shoujo_Beta_%2528Proyecto_de_traduccion%2529.rar/file?fbclid=IwAR14jNyYJbyyhaS6S2XEU2re5zD6Af2plIyhV40F9hH7ammfE_qaRIfJTVs
I would open the file it mentions and remove the line that imports sets. I'm guessing this is a very old beta, right? In newer versions of Python you no longer need to import the "sets" module, since set is a built-in type (it has been built in since Python 2.4, and the sets module was removed in Python 3).
If you don't have prior programming experience: open the file named "ui_ingamemenu.rpy". On line 1 or 2 there should be a line that says "import sets". Just delete that. However, it's hard to know the exact issue without knowing the files you're working with.
Ren'Py doesn't display all errors at once, though; it stops at the first one it detects. So once you fix this, more may appear.
Good luck :)
A few things could cause this:
There could be something wrong with the name of one of your folders. Maybe a special symbol or character in the name of the folders?
Older versions of Python required you to import a module named sets, but set is now a built-in type, and importing the old module can produce this error. Browse the code and look for something like: from sets import Set as set. If you find it, delete or comment out that line. You can use Ctrl + F to open the search bar.
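To illustrate the suggestion above, here is a small sketch of what the replacement looks like. The variable names are made up for the example and are not taken from the game's scripts; the point is only that the built-in set supports the same operations without any import.

```python
# Old style (fails where the `sets` module is not available):
#     from sets import Set
#     seen = Set(["a", "b"])

# Built-in replacement, no import needed:
seen = set(["a", "b"])
seen.add("c")

# Set operations such as intersection work the same way.
common = seen & set(["b", "c", "d"])
```

If the script uses the name Set elsewhere, those uses would also need to be changed to set after the import line is removed.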

Continue fine-tuning from saved checkpoints for run_language_modeling.py

I followed the instructions to fine-tune the pretrained BERT model with a customized corpus, as shown here: https://github.com/huggingface/transformers/tree/master/examples/language-modeling. During training, checkpoints are generated under the specified output_dir. However, when I wanted to continue training from one of the checkpoints (--model_name_or_path=/path-to-ckpt/), it failed with the following traceback:
Traceback (most recent call last):
File "run_language_modeling.py", line 277, in <module>
main()
File "run_language_modeling.py", line 186, in main
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, cache_dir=model_args.cache_dir)
File "H:\Anaconda3\envs\env_name\lib\site-packages\transformers\tokenization_auto.py", line 203, in from_pretrained
return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "H:\Anaconda3\envs\env_name\lib\site-packages\transformers\tokenization_utils.py", line 902, in from_pretrained
return cls._from_pretrained(*inputs, **kwargs)
File "H:\Anaconda3\envs\env_name\lib\site-packages\transformers\tokenization_utils.py", line 1007, in _from_pretrained
list(cls.vocab_files_names.values()),
OSError: Model name 'C:\\path-to-ckpt\\checkpoint-17500' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed 'C:\\path-to-ckpt\\checkpoint-17500' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
As I understand it, there is a mismatch between the files saved in the checkpoint directory and the files needed to load the model (the tokenizer cannot find vocab.txt there). It could also be that there is no explicit argument for continuing training from a checkpoint. However, I am not sure whether I am overcomplicating things or whether there is an alternative way to do this.
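One way to confirm the suspected mismatch is to list what the checkpoint directory actually contains. This is a minimal sketch, not part of the training script: the helper name missing_files is hypothetical, and the only required file it assumes is vocab.txt, which is the name given in the error message above.

```python
import os


def missing_files(directory, required=("vocab.txt",)):
    """Return the names in `required` that are not present in `directory`."""
    present = set(os.listdir(directory))
    return [name for name in required if name not in present]
```

If missing_files(r"C:\path-to-ckpt\checkpoint-17500") returns ["vocab.txt"], the checkpoint holds no tokenizer vocabulary, which would be consistent with the OSError shown above.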

Incorrect path to addons running odoo.py in Bitnami Odoo Stack that is correct in openerp-server.conf

I'm trying to run ~/ERP/bitnami-odoo/python/bin/python ~/ERP/bitnami-odoo/apps/odoo/bin/odoo.py help and now I am getting this error:
~/ERP/bitnami-odoo/python/lib/python2.7/site-packages/setuptools-20.9.0-py2.7.egg/pkg_resources/__init__.py:1268: UserWarning: ~/ERP/bitnami-odoo/.tmp is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
Traceback (most recent call last):
File "~/ERP/bitnami-odoo/apps/odoo/bin/odoo.py", line 4, in <module>
__import__('pkg_resources').run_script('odoo==8.0.post20151126', 'odoo.py')
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 719, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1504, in run_script
File "~/ERP/bitnami-odoo/apps/odoo/lib/odoo-8.0.post20151126-py2.7.egg/EGG-INFO/scripts/odoo.py", line 160, in <module>
main()
File "~/ERP/bitnami-odoo/apps/odoo/lib/odoo-8.0.post20151126-py2.7.egg/EGG-INFO/scripts/odoo.py", line 157, in main
openerp.cli.main()
File "~/ERP/bitnami-odoo/apps/odoo/lib/odoo-8.0.post20151126-py2.7.egg/openerp/cli/__init__.py", line 58, in main
for m in module.get_modules():
File "~/ERP/bitnami-odoo/apps/odoo/lib/odoo-8.0.post20151126-py2.7.egg/openerp/modules/module.py", line 351, in get_modules
plist.extend(listdir(ad))
File "~/ERP/bitnami-odoo/apps/odoo/lib/odoo-8.0.post20151126-py2.7.egg/openerp/modules/module.py", line 346, in listdir
return map(clean, filter(is_really_module, os.listdir(dir)))
OSError: [Errno 2] No such file or directory: '~/ERP/odoo/openerp/addons'
This directory is not correct (obviously), but I can't find where it is defined. The full path should be ~/ERP/bitnami-odoo/apps/odoo/lib/odoo-8.0.post20151126-py2.7.egg/openerp/addons and this error shows a wrong path without /bitnami-odoo/apps/odoo/lib/odoo-8.0.post20151126-py2.7.egg/
I've tried passing the path to openerp-server.conf with ~/ERP/bitnami-odoo/python/bin/python ~/ERP/bitnami-odoo/apps/odoo/bin/odoo.py help -c ~/ERP/bitnami-odoo/apps/odoo/conf/openerp-server.conf, where the addons path is correct, but the error persists. Is it a Bitnami error? If not, where can I fix this?
Note: I use ~/ to not expose the whole path. ;-)
There is a default config file at ~/.openerp_serverrc.
You can set addons_path = in it to point to the addons directory.
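To see which addons_path entries in a given config file do not exist on disk, something like the following could help. It is a sketch under assumptions: the helper name missing_addons_dirs is hypothetical, the file is assumed to be INI-style with an [options] section, and it is meant to be run with a Python 3 interpreter separately from the stack's own Python 2.7.

```python
import configparser
import os


def missing_addons_dirs(rc_path):
    """Return addons_path entries from the config file that are not existing directories."""
    parser = configparser.RawConfigParser()  # raw: no % interpolation of values
    parser.read(os.path.expanduser(rc_path))
    raw = parser.get("options", "addons_path", fallback="")
    paths = [p.strip() for p in raw.split(",") if p.strip()]
    return [p for p in paths if not os.path.isdir(os.path.expanduser(p))]
```

Calling it once on the default config file in the home directory and once on openerp-server.conf should show which of the two contains the stale path.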

32bit Python on Win7-64bit, import errors

When I try to launch ipython qtconsole from anywhere except my Python install dir, I get import errors. It seems to only work if I RUN->cmd.exe; cd C:\Users\me\python\Python27; ipython qtconsole;.
I've tried batch files, but it still seems that if I don't run it from within my install dir, it just doesn't work. It is always ipython-script that seems to be complaining. Here is the output (this was from inside ipython with my Desktop as the current working directory):
In [2]: !ipython qtconsole
Traceback (most recent call last):
File "C:\Users\me\python\Python27\ipython-script.py", line 770, in <module
>
load_entry_point('ipython==0.12', 'console_scripts', 'ipython')()
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\frontend\terminal\ipapp.py", line 402, in launch_new
_instance
File "<string>", line 2, in initialize
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\config\application.py", line 84, in catch_config_err
or
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\frontend\terminal\ipapp.py", line 302, in initialize
File "<string>", line 2, in initialize
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\config\application.py", line 84, in catch_config_err
or
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\core\application.py", line 325, in initialize
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\frontend\terminal\ipapp.py", line 297, in parse_comm
and_line
File "<string>", line 2, in parse_command_line
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\config\application.py", line 84, in catch_config_err
or
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\config\application.py", line 413, in parse_command_l
ine
File "<string>", line 2, in initialize_subcommand
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\config\application.py", line 84, in catch_config_err
or
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\config\application.py", line 349, in initialize_subc
ommand
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\utils\importstring.py", line 40, in import_item
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\frontend\qt\console\qtconsoleapp.py", line 30, in <m
odule>
File "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin\lib\site-packages\ipy
thon-0.12-py2.7.egg\IPython\external\qt.py", line 42, in <module>
ImportError: Cannot import PySide >= 1.0.3 or PyQt4 >= 4.7
Like I said, it works perfectly fine from within my install dir, and I am 96% sure my environment variables PATH, PYTHONPATH and PYTHONHOME are kosher. I've gone from deleting them to completely overloading them one directory at a time, with no change in results.
And I have made sure the Wow6432Node Python registry entries exist (I tried not to pollute those).
Ipython is not the only module with errors like this (also with same solution), but this is definitely the most annoying (I can't launch qtconsole from Run Prompt!)
EDIT::I did dig up a start menu entry, and after changing the 'start in' property, I was able to launch qtconsole, but I still don't quite understand why this import is being performed relative to the current directory.
I believe it had to do with qtconsole dependencies living in locations not directly on my defined PATH, and I believe there were some conflicts in my modules search PATHS.
The easiest solution I found was to use the environment variable PYTHONPATH.
It has been a while since my original post, and several things have changed since then, but I believe the important factors were these:
-Be careful of 32bit python installed on 64bit windows.
-Utilize PYTHONPATH to include important sub-directories, such as '/install-dir/Lib' or any others.
If you still have trouble with your environment variables, use absolute paths as a failsafe (wrap the command in a script to make it easier to use):
C:\Users\johnny42\python\Python27\pythonw.exe C:\Users\johnny42\python\Python27\ipython-script.py qtconsole
The command above must be entered on one line.
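When imports behave differently depending on the working directory, it can help to print what the interpreter itself sees. The following is a small diagnostic sketch (the function name interpreter_report is hypothetical); running it from the install dir and again from the Desktop and comparing the output should show which search-path entries differ.

```python
import struct
import sys


def interpreter_report():
    """Collect the interpreter path, its bitness, and its module search path."""
    return {
        "executable": sys.executable,
        "bits": struct.calcsize("P") * 8,  # pointer size: 32 or 64
        "search_path": list(sys.path),     # directories searched by import, in order
    }


if __name__ == "__main__":
    report = interpreter_report()
    print(report["executable"], report["bits"])
    for entry in report["search_path"]:
        print(repr(entry))
```

An empty string or a relative entry in the search path is resolved against the current directory, which is one way an import can succeed in one folder and fail in another.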
I have 32-bit Python on my 64-bit Windows OS, and I get a similar error mentioning "c:\cpd-svn\branches\qt-4.8\win32-12-01\install\bin". I do not know where it is being set. Can you please share your findings as to where Python is getting this path from, and which path it should be changed to?

boost::filtering_streambuf with gzip_decompressor(), how to access line by line from file

I wrote a Logparser Application and now I want to implement decompression of .gz files. I tried it with boost::iostreams and zlib which seems to work, but I don't know how to handle the input I get from compressed files.
Here's what I do:
input.open(p.source_at(i).c_str(), ios_base::in | ios_base::binary);
boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
in.push(boost::iostreams::gzip_decompressor());
in.push(input);
boost::iostreams::copy(in, cout);
This code runs if my source file has the .gz extension. The last line correctly outputs the decompressed file stream to cout.
But how can I fetch the decompressed file line by line? My program uses getline(input, transfer) to read lines from the input stream if it's not compressed.
Now I want to read from the decompressed file the same way, but how can I get a new line from in?
The Boost documentation didn't help me much with this.
Thanks in advance!
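OK, I figured it out. I just had to create a std::istream and pass it a pointer to the buffer:
std::istream incoming(&in); // wrap the filtering_streambuf in an istream
getline(incoming, transfer); // read one decompressed line, as with the plain input stream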
