Huggingface pre-trained model - huggingface-transformers

I am trying to use the code below:
from transformers import AutoTokenizer, AutoModel
t = "ProsusAI/finbert"
tokenizer = AutoTokenizer.from_pretrained(t)
model = AutoModel.from_pretrained(t)
The error is below. I think it occurs because my old version of transformers does not include this pre-trained model; I checked, and that appears to be the case.
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
380 f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing a {CONFIG_NAME} file\n\n"
381 )
--> 382 raise EnvironmentError(msg)
383
384 except json.JSONDecodeError:
OSError: Can't load config for 'ProsusAI/finbert'. Make sure that:
- 'ProsusAI/finbert' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'ProsusAI/finbert' is the correct path to a directory containing a config.json file
My current versions:
python 3.7
transformers 3.4.0
I understand that my transformers version is old, but it is the only version compatible with Python 3.7. The reason I can't upgrade Python to 3.9 is that I am using multimodal-transformers (below), which only supports Python up to 3.7.
Reasons:
https://multimodal-toolkit.readthedocs.io/en/latest/ <- this only supports Python up to 3.7.
Python 3.7 only supports transformers up to 3.4.0.
I need to use multimodal-transformers because it makes text classification with tabular data easy. My dataset has both text and category columns, and this is the easiest approach I found for using both. (If you have any suggestions, please do share them, thank you.)
My question is: is there a way to use the latest pre-trained models despite having the old transformers?
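One possible workaround (an untested sketch, not from an accepted answer): download the checkpoint files from the Hub yourself and point from_pretrained() at the local directory, since older transformers releases can still load a local checkpoint in a format they understand. The file names below (config.json, pytorch_model.bin, vocab.txt) are an assumption based on a typical BERT checkpoint; verify them against the repository's file listing first.
# Sketch: fetch the FinBERT files manually, then load from the local folder.
# The file list is an assumption; check the model page before relying on it.
import os
import urllib.request

repo = "https://huggingface.co/ProsusAI/finbert/resolve/main"
local_dir = "finbert"
os.makedirs(local_dir, exist_ok=True)
for fname in ["config.json", "pytorch_model.bin", "vocab.txt"]:
    urllib.request.urlretrieve(f"{repo}/{fname}", os.path.join(local_dir, fname))

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModel.from_pretrained(local_dir)
Since FinBERT is architecturally a plain BERT model, a release as old as transformers 3.4.0 should still be able to instantiate it from local files.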

Related

Error visualizing the attention maps for an attention model trained in Tensorflow Keras

I am trying to reproduce the notebook at https://www.kaggle.com/kmader/cardiomegaly-pretrained-vgg16/notebook to train a VGG-16 model with attention. The model architecture and training are exactly as discussed in the notebook, and I am able to train and save the attention model's weights. The error occurs when trying to visualize the attention maps using the trained model.
import tensorflow.keras.backend as K
attn_func = K.function(inputs=[attn_model.get_input_at(0),
                               K.learning_phase()],
                       outputs=[attn_layer.get_output_at(0)])
The error is:
ValueError: Input tensors to a Functional must come from `tf.keras.Input`. Received: 0 (missing previous layer metadata).
I am using TensorFlow 2.6.2; the notebook uses TensorFlow 1.x.
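A possible TF2-era fix (a sketch reusing the question's attn_model and attn_layer names; batch_of_images is a hypothetical input batch): K.function with K.learning_phase() is a TF1 idiom, and under TensorFlow 2 the usual replacement is a small tf.keras.Model that exposes the intermediate layer's output directly.
import tensorflow as tf

# Build an extractor over the already-trained graph; attn_model and
# attn_layer are the question's variables (assumed to be connected layers).
attn_extractor = tf.keras.Model(inputs=attn_model.input,
                                outputs=attn_layer.output)

# training=False plays the role the old K.learning_phase() flag used to play
attn_maps = attn_extractor(batch_of_images, training=False).numpy()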

Can't load the pre-trained word2vec of korean language

I would like to download and load the pre-trained word2vec for analyzing Korean text.
I downloaded the pre-trained word2vec here: https://drive.google.com/file/d/0B0ZXk88koS2KbDhXdWg1Q2RydlU/view?resourcekey=0-Dq9yyzwZxAqT3J02qvnFwg
from the GitHub repository Pre-trained word vectors of 30+ languages: https://github.com/Kyubyong/wordvectors
My gensim version is 4.1.0, so I used:
KeyedVectors.load_word2vec_format('./ko.bin', binary=False)
to load the model, but I got this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
I have already tried many options from Stack Overflow and GitHub, but it still does not work. Would you mind letting me know a suitable solution?
Thanks,
While the page at https://github.com/Kyubyong/wordvectors isn't clear about the formats this author has chosen, their source code at
https://github.com/Kyubyong/wordvectors/blob/master/make_wordvectors.py#L61
shows it using the Gensim model .save() method.
Such saved models should be reloaded using the .load() class method of the same model class. For example, if a Word2Vec model was saved with...
model.save('language.bin')
...then it could be reloaded with...
loaded_model = Word2Vec.load('language.bin')
Note, though, that:
Models saved this way are often split over multiple files that should be kept together (and all start with the same root name) - but I don't see those here.
This work appears to be ~5 years old, based on a pre-1.0 version of Gensim – so there might be issues loading the models directly into the latest Gensim. If you do run into such issues, & absolutely need to make these vectors work, you might need to temporarily use a prior version of Gensim to .load() the model. Then, you could save the plain vectors out with .save_word2vec_format() for later reloading across any version. (Or, use the latest interim version that can .load() the model, re-save it with .save(), then repeat the process with the latest version that can read that model, until you reach the current Gensim.)
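A minimal sketch of that down-convert step (the file name is the question's; the exact old Gensim version needed is an assumption):
# Run under an older Gensim (e.g. pip install "gensim<4") that can still load the file:
from gensim.models import Word2Vec

model = Word2Vec.load('ko.bin')  # full model saved with Gensim's .save()
model.wv.save_word2vec_format('ko_vectors.txt', binary=False)  # plain, portable vectors

# Later, under any recent Gensim:
from gensim.models import KeyedVectors
kv = KeyedVectors.load_word2vec_format('ko_vectors.txt', binary=False)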
But, you also might want to find a more recent & better-documented set of pretrained word-vectors.
For example, Facebook makes FastText pretrained vectors available in both a 'text' format and a 'bin' format for many languages at https://fasttext.cc/docs/en/pretrained-vectors.html (trained on Wikipedia only) or https://fasttext.cc/docs/en/crawl-vectors.html (trained on Wikipedia plus web crawl data).
The 'text' format should in fact be loadable with KeyedVectors.load_word2vec_format(filename, binary=False), but will only include full-word vectors. (It will also be relatively easy to view as text, or to write simple code to massage into other formats.)
The 'bin' format is Facebook's own native FastText model format, and should be loadable with either the load_facebook_model() or load_facebook_vectors() utility methods. Then, the loaded model (or vectors) will be able to create the FastText algorithm's substring-based guesstimate vectors even for many words that weren't in the model or training data.
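For example (the Korean crawl-vectors file name below is taken from the fasttext.cc listing; treat it as an assumption):
from gensim.models.fasttext import load_facebook_vectors

# cc.ko.300.bin is Facebook's native-format Korean model from fasttext.cc
wv = load_facebook_vectors('cc.ko.300.bin')
print(wv.most_similar('김치', topn=3))  # subword n-grams also cover out-of-vocabulary words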

How to load pre-trained fastText model in gensim with .npy extension

I am new to deep learning and I am trying to play with a pretrained word embedding model from a paper. I downloaded the following files:
1) sa-d300-m2-fasttext.model
2) sa-d300-m2-fasttext.model.trainables.syn1neg.npy
3) sa-d300-m2-fasttext.model.trainables.vectors_ngrams_lockf.npy
4) sa-d300-m2-fasttext.model.wv.vectors.npy
5) sa-d300-m2-fasttext.model.wv.vectors_ngrams.npy
6) sa-d300-m2-fasttext.model.wv.vectors_vocab.npy
In case these details are needed:
sa - Sanskrit
d300 - embedding dimension of 300
fasttext - fastText
I don't have prior experience with gensim. How can I load the model into gensim or into TensorFlow?
I tried
from gensim.models.wrappers import FastText
FastText.load_fasttext_format('/content/sa/300/fasttext/sa-d300-m2-fasttext.model.wv.vectors_ngrams.npy')
FileNotFoundError: [Errno 2] No such file or directory: '/content/sa/300/fasttext/sa-d300-m2-fasttext.model.wv.vectors_ngrams.npy.bin'
That set of multiple files looks like it was saved from Gensim's FastText implementation, using Gensim's save() method - and thus is not in Facebook's original 'fasttext_format'.
So, try loading them with the following instead:
from gensim.models.fasttext import FastText
model = FastText.load('/content/sa/300/fasttext/sa-d300-m2-fasttext.model')
(Upon loading that main/root file, it will find the subsidiary related files in the same directory, as long as they're all present.)
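If the load succeeds, the vectors can be queried like any Gensim word vectors; for example (the Sanskrit query words are hypothetical placeholders):
print(model.wv.most_similar('धर्म', topn=5))  # nearest neighbours of a query word
vec = model.wv['योग']  # a 300-dimensional vector, per the d300 in the file names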
The source where you downloaded these files should have included clear instructions for loading them nearby!

Wikipedia word2vec

I followed the example in this link and ran the following script to process the latest English Wikipedia articles:
https://radimrehurek.com/gensim/wiki.html
$ python -m gensim.scripts.make_wiki
After 9 hours, the script finished and I now have .mm and .txt files. I want to train a word2vec model, but all the examples I found start from the .bz2 file.
How do I train a word2vec model using the .mm files as input instead of the raw .bz2 file? The link below shows how to train an LDA model. Can someone please share the syntax?
https://radimrehurek.com/gensim/wiki.html
Thanks!
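One caveat worth noting (not from the original thread): the .mm output is a bag-of-words matrix that discards word order, which suits LDA/LSA but not word2vec, so the usual route is to stream token lists from the original .bz2 dump instead. A sketch, assuming Gensim 4.x parameter names:
from gensim.corpora.wikicorpus import WikiCorpus
from gensim.models import Word2Vec

# dictionary={} skips the expensive vocabulary-building pass
wiki = WikiCorpus('enwiki-latest-pages-articles.xml.bz2', dictionary={})

class WikiSentences:
    """Restartable iterable - Word2Vec makes several passes over the corpus."""
    def __init__(self, corpus):
        self.corpus = corpus
    def __iter__(self):
        return self.corpus.get_texts()

model = Word2Vec(WikiSentences(wiki), vector_size=300, window=5,
                 min_count=5, workers=4)
model.save('wiki.word2vec.model')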

cacti - multi cpu util - multi line OID

I have the OID: .1.3.6.1.2.1.25.3.3.1.2
I got 24 rows (I have a 24-core server).
I want to create one graph with all the rows to see the utilization.
Please help me :)
Thanks...
I had the same problem, so I created a data input method in Perl that uses Net::SNMP.
Get the script here:
https://gist.github.com/1139477
Get the data template here:
https://gist.github.com/1237260
Put the script into $CACTI_HOME/scripts, make sure it's executable, and import the template.
Make sure you have Perl's Net::SNMP installed.
Have fun!
Alex.