HuggingFace 'TFEmbeddings' object has no attribute 'word_embeddings' - huggingface-transformers

While trying to run this TextualHeatmap example, we encounter a 'TFEmbeddings' object has no attribute 'word_embeddings' error in the following code snippet, which uses the HuggingFace transformers library. Any help is appreciated.
from transformers import TFDistilBertForMaskedLM, DistilBertTokenizer
dbert_model = TFDistilBertForMaskedLM.from_pretrained('distilbert-base-uncased')
dbert_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
dbert_embmat = dbert_model.distilbert.embeddings.word_embeddings

Try using '.weight' instead of '.word_embeddings', as per the latest Hugging Face implementation. It works for me.
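For example, against a recent transformers release the snippet from the question would look roughly like this (a sketch; the exact attribute layout can differ between versions):
from transformers import TFDistilBertForMaskedLM, DistilBertTokenizer
dbert_model = TFDistilBertForMaskedLM.from_pretrained('distilbert-base-uncased')
dbert_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
# newer releases expose the embedding matrix as 'weight' instead of 'word_embeddings'
dbert_embmat = dbert_model.distilbert.embeddings.weight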

Downgrading the version of transformers will work.
pip install transformers==3.1.0

Related

Google's notebook on Vertex AI throwing the following error: type name google.VertexModel is different from expected: Model

I got this error when compiling my pipeline:
type name google.VertexModel is different from expected: Model
when running the following notebook by google: automl_tabular_classification_beans
I suppose that Kubeflow v2 is not yet able to handle google.VertexModel as a type for component input. I've been browsing a bit but did not find any good clues or references to solve this issue (the KFP documentation for v2 is not up to date). Hopefully someone here can give me a good pointer; I look forward to all of your ideas.
Cheers
google.VertexModel is defined here:
https://github.com/kubeflow/pipelines/blob/286a49547cce763c502592c822296aa60f50b3e8/components/google-cloud/google_cloud_pipeline_components/types/artifact_types.py#L20
Here is an example on how to define it:
https://github.com/kubeflow/pipelines/blob/286a49547cce763c502592c822296aa60f50b3e8/components/google-cloud/tests/types/artifact_types_test.py#L22
For example,
from google_cloud_pipeline_components.types import artifact_types
model = artifact_types.VertexModel(uri='YOUR_MODEL_URI_STRING')
Can you try specifying your model using the syntax above and let us know if this works for your code?
This was a breaking change with release 0.1.9. Here are some recommendations:
Pin your release to 0.1.7 and continue to use the Model type.
Use 0.1.9 and switch the output from Output[Model] to Output[Artifact] (see the sketch after this list).
Try the 0.2.0 release; documentation here.
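For the second option, the switch looks roughly like this (a sketch of a KFP v2 component; the component name and body are illustrative, not taken from the notebook):
from kfp.v2.dsl import Artifact, Output, component

@component
def export_model(model: Output[Artifact]):  # was Output[Model] before 0.1.9
    # write the serialized model to the artifact path exactly as before;
    # only the declared artifact type changes
    with open(model.path, 'w') as f:
        f.write('model contents go here')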
Hope these suggestions work!

'Ridge' is not a CV regularization model; try ManualAlphaSelection instead

I am trying to find the best alpha for a Ridge model without CV, using the Yellowbrick ManualAlphaSelection API. My code is pretty basic and has been taken from Yellowbrick's documentation. Even so, it does not work:
from yellowbrick.regressor import ManualAlphaSelection
from sklearn.linear_model import Ridge
model = ManualAlphaSelection(Ridge(), scoring='neg_mean_squared_error')
model.fit(X_train, y_train)
model.show()
Python raises the message: 'Ridge' is not a CV regularization model; try ManualAlphaSelection instead.
But this message seems wrong, because ManualAlphaSelection is already being used.
This actually appears to be a bug in our library 😅
Would you mind opening up a bug report on GitHub so we can be sure to fix it? Thank you for checking out Yellowbrick!
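In the meantime, a possible workaround (not part of the original answer; a plain scikit-learn sketch that assumes the X_train and y_train from the question) is to score a handful of alphas against a held-out validation split:
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# hold out part of the training data for validation
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, random_state=0)

best_alpha, best_mse = None, float('inf')
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    pred = Ridge(alpha=alpha).fit(X_tr, y_tr).predict(X_val)
    mse = mean_squared_error(y_val, pred)
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse
print(best_alpha, best_mse)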

pyLDAvis with Mallet LDA implementation : LdaMallet object has no attribute 'inference'

Is it possible to plot a pyLDAvis visualization with a Mallet implementation of LDA? I have no trouble with LDA_Model, but when I use Mallet I get:
'LdaMallet' object has no attribute 'inference'
My code:
pyLDAvis.enable_notebook()
vis = pyLDAvis.gensim.prepare(mallet_model, corpus, id2word)
vis
Run this line to convert your Mallet model into an LdaModel before calling pyLDAvis.
[Edit]: edited the code to use the built-in function in gensim instead. I just tried it, but I am unable to get meaningful results with pyLDAvis on a converted Mallet model; the topics seem to contain random terms. Has anybody encountered this before?
import gensim
model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(mallet_model)
I got this from the link below; do explore it (lines 565 - 590).
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/wrappers/ldamallet.py#L359
I hope I have helped.
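Putting it together, the converted model can then be fed to pyLDAvis in the same way as in the question (a sketch; variable names follow the question's code):
import gensim
import pyLDAvis
import pyLDAvis.gensim

# convert the Mallet wrapper into a regular gensim LdaModel first
converted_model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(mallet_model)
pyLDAvis.enable_notebook()
vis = pyLDAvis.gensim.prepare(converted_model, corpus, id2word)
vis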
from gensim.models.ldamodel import LdaModel

def convertldaGenToldaMallet(mallet_model):
    # build a gensim LdaModel with the same vocabulary and topic count
    model_gensim = LdaModel(
        id2word=mallet_model.id2word, num_topics=mallet_model.num_topics,
        alpha=mallet_model.alpha, eta=0,
    )
    # copy Mallet's word-topic counts into the gensim model's state
    model_gensim.state.sstats[...] = mallet_model.wordtopics
    model_gensim.sync_state()
    return model_gensim
I found this blog post helpful; it works directly from the state file produced by MALLET, which is also produced when using Gensim's Mallet wrapper.

SparkR 1.4.0: how to include jars

I'm trying to hook SparkR 1.4.0 up to Elasticsearch using the elasticsearch-hadoop-2.1.0.rc1.jar jar file (found here). It requires a bit of hacking together, calling the SparkR:::callJMethod function. I need to get a jobj R object for a couple of Java classes. For some of the classes, this works:
SparkR:::callJStatic('java.lang.Class',
'forName',
'org.apache.hadoop.io.NullWritable')
But for others, it does not:
SparkR:::callJStatic('java.lang.Class',
'forName',
'org.elasticsearch.hadoop.mr.LinkedMapWritable')
Yielding the error:
java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.EsInputFormat
It seems like Java isn't finding the org.elasticsearch.* classes, even though I've tried including them with the command line --jars argument, and the sparkR.init(sparkJars = ...) function.
Any help would be greatly appreciated. Also, if this is a question that more appropriately belongs on the actual SparkR issue tracker, could someone please point me to it? I looked and was not able to find it. Also, if someone knows an alternative way to hook SparkR up to Elasticsearch, I'd be happy to hear that as well.
Thanks!
Ben
Here's how I've achieved it:
# environments, packages, etc ----
Sys.setenv(SPARK_HOME = "/applications/spark-1.4.1")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
# connecting Elasticsearch to Spark via ES-Hadoop-2.1 ----
spark_context <- sparkR.init(master = "local[2]", sparkPackages = "org.elasticsearch:elasticsearch-spark_2.10:2.1.0")
spark_sql_context <- sparkRSQL.init(spark_context)
spark_es <- read.df(spark_sql_context, path = "index/type", source = "org.elasticsearch.spark.sql")
printSchema(spark_es)
(Spark 1.4.1, Elasticsearch 1.5.1, ES-Hadoop 2.1 on OS X Yosemite)
The key idea is to link to the ES-Hadoop package and not the jar file, and to use it to create a Spark SQL context directly.

Hadoop new API - Set OutputFormat

I'm trying to set the OutputFormat of my job to MapFileOutputFormat using:
jobConf.setOutputFormat(MapFileOutputFormat.class);
I get this error: mapred.output.format.class is incompatible with new reduce API mode
I suppose I should use the setOutputFormatClass() method of the new Job class, but the problem is that when I try to do this:
job.setOutputFormatClass(MapFileOutputFormat.class);
it expects me to use this class: org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat.
In Hadoop 1.0.x there is no such class; it only exists in earlier versions (e.g. 0.x).
How can I solve this problem?
Thank you!
This problem has no reasonably easy solution.
I gave up and used Sequence files which fit my requirements too.
Have you tried the following?
import org.apache.hadoop.mapreduce.lib.output.*;
...
LazyOutputFormat.setOutputFormatClass(job, MapFileOutputFormat.class);
