How can we use our own customised embedding with Word Mover's Distance?

To use WMD we need word embeddings. In this example, the pre-trained 'word2vec-google-news-300' embedding provided by Gensim is used.
Below is a code snippet:
import gensim.downloader as api

# two tokenized documents to compare
sent1 = 'Obama speaks to the media in Illinois'.lower().split()
sent2 = 'The president greets the press in Chicago'.lower().split()

model = api.load('word2vec-google-news-300')
distance = model.wmdistance(sent1, sent2)
How can I use my own customised embedding in place of that one, and how can I load it into a model?
For example, the embedding looks like: {text: 1-D NumPy array}
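One way to do this (a sketch, assuming gensim 4.x, where a KeyedVectors object can be built by hand, and that the POT package wmdistance needs is installed) is to copy the {text: 1-D NumPy array} dict into a KeyedVectors instance:
import numpy as np
from gensim.models import KeyedVectors

# hypothetical custom embedding of the form {text: 1-D NumPy array}
custom_embedding = {
    'hello': np.random.rand(50).astype(np.float32),
    'world': np.random.rand(50).astype(np.float32),
}

dim = len(next(iter(custom_embedding.values())))
kv = KeyedVectors(dim)
kv.add_vectors(list(custom_embedding), np.array(list(custom_embedding.values())))

# wmdistance then works exactly as with the pre-trained model
distance = kv.wmdistance(['hello'], ['world'])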

Related

How to change the task of a pretrained model without jeopardizing the capacity of it?

I want to build a text2text model. Specifically, I want to transform automatically generated, scrambled text fragments into a smooth paragraph in the same language. I've already prepared the text inputs and outputs, so the corpus is not the primary problem now.
I want to use Hugging Face models like:
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")
because it has already acquired the capacity to generate the language. The model is built for masked language modelling, and there is no established task quite like mine, since it is really customized. So how could I use the Hugging Face masked language model as a base text2text model without jeopardizing its capacity? I want to fine-tune it to achieve that task/goal.
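One common approach (a sketch only, not the only option) is to warm-start an encoder-decoder model from two copies of the pretrained checkpoint, so the learned language capacity is reused, and then fine-tune it on your input/output pairs; the example sentences below are placeholders:
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

# warm-start both the encoder and the decoder from the pretrained weights
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-chinese", "bert-base-chinese")
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# one hypothetical training pair: generated fragment -> smooth paragraph
inputs = tokenizer("自动生成的文本片段", return_tensors="pt")
labels = tokenizer("通顺流畅的段落", return_tensors="pt").input_ids
loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask, labels=labels).loss
loss.backward()  # plug this into your usual fine-tuning loop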

Feature extraction using pre-trained model

I'm using a pre-trained model for feature extraction from CT images for COVID, and then a classifier on top. I need to know what features are extracted when a pre-trained model is used here.
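In a typical Keras setup, the features are the activations of the last layer you keep from the pre-trained network. A sketch (ResNet50 and the input shape are assumptions; the exact features depend on which network and which layer you cut at):
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

# drop the classification head; the "features" are then the globally
# average-pooled activations of the last convolutional block
extractor = ResNet50(weights='imagenet', include_top=False, pooling='avg')

ct_batch = np.random.rand(1, 224, 224, 3)  # stand-in for a preprocessed CT slice
features = extractor.predict(preprocess_input(ct_batch))  # shape (1, 2048)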

Where does Hugging Face's transformers library look for models?

I'm trying to make Hugging Face's transformers library use a model that I have downloaded and that is not in the Hugging Face model repository.
Where does transformers look for models? Is there an equivalent of the $PATH environment variable for transformers models?
Research
This Hugging Face issue talks about manually downloading models.
This issue suggests that you can work around the question of where Hugging Face is looking for models by passing the path as an argument to from_pretrained (model = BertModel.from_pretrained('path/to/your/directory')).
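A minimal sketch of that workaround (the directory path is hypothetical; it should contain config.json plus the weight file), together with the default cache location that transformers uses:
import os

# downloaded models are cached under ~/.cache/huggingface by default;
# TRANSFORMERS_CACHE (read when transformers is imported) overrides that
os.environ['TRANSFORMERS_CACHE'] = '/data/hf-cache'

from transformers import BertModel

# or bypass the cache entirely by pointing at a local directory
model = BertModel.from_pretrained('/path/to/your/directory')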
Related questions
Where does Hugging Face's transformers save models?

Saving gensim LDA model to ONNX

Is there a way to save a gensim LDA model to the ONNX format? We need to be able to train using Python/gensim and then operationalize it as an ONNX model to publish and use.
Currently (March 2020, gensim-3.8.1) I don't know of any built-in support for ONNX formats in gensim.
Provided the ONNX format can represent LDA models well, and here's an indication it does, it would be a plausible new feature.
You could add a feature request at the gensim issue tracker, but for the feature to be added, it would likely require a contribution from a skilled developer who needs the feature and can write the code and test cases.
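In the meantime, gensim's own native serialization is the supported way to persist and re-load an LDA model. A minimal sketch (docs is a stand-in for your tokenized training corpus):
from gensim import corpora, models

docs = [['human', 'machine', 'interface'], ['graph', 'trees', 'minors']]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary)
lda.save('lda.model')  # gensim's native format, not ONNX
loaded = models.LdaModel.load('lda.model')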

How to retrain an Image Classifier with new classes and keep old classes

I'm trying to make an image classifier that can identify how likely it is that an image shows a watermelon. To do this I followed the flower classifier example here: https://www.tensorflow.org/hub/tutorials/image_retrainin and trained the model using this command:
python retrain.py --image_dir ~/flower_photos
The problem I found when trying this classifier out is that it only classifies among the new classes, i.e. the flower classes in this case. So when I tried to classify an image of a dog (a class I know is present in the Inception module), it classified it as a rose:
python label_image.py \
--graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt \
--input_layer=Placeholder \
--output_layer=final_result \
--image=/images/dog.jpg
Result
roses 0.7626607
tulips 0.12247563
dandelion 0.071335025
sunflowers 0.028395686
daisy 0.0151329385
How could I use TensorFlow to extend the model with an additional class instead of creating a new model with only the new classes?
What you can do is join the two datasets and train on them together, or keep the original model's classes among the possible classes and add a few images of those classes to the dataset, just so the model does not forget what it has already learned.
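A sketch of the first option, joining the datasets (the directory names are hypothetical, and shutil.copytree's dirs_exist_ok needs Python 3.8+; retrain.py treats each subfolder as one class label):
import shutil
from pathlib import Path

combined = Path.home() / 'combined_photos'
for source in [Path.home() / 'flower_photos', Path.home() / 'dog_photos']:
    for class_dir in source.iterdir():
        if class_dir.is_dir():
            # merge each class folder into the combined training directory
            shutil.copytree(class_dir, combined / class_dir.name,
                            dirs_exist_ok=True)

# then retrain on the merged set:
#   python retrain.py --image_dir ~/combined_photos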
