I have a PMML file for a LightGBM (Python API) model, but I would like to apply a calibration function to its predictions. Examples of calibration functions would be sigmoid (Platt) scaling or isotonic regression. I am not sure how to add this to the existing PMML.
Details on how to make this possible are here: https://github.com/jpmml/jpmml-sklearn/issues/146
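A minimal sketch of one way to get calibrated predictions into PMML, assuming you can re-train and re-export the model from Python with sklearn2pmml (whether the converter handles CalibratedClassifierCV is exactly what the linked issue covers); the data and parameters below are placeholders:

from lightgbm import LGBMClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# Toy data standing in for the real training set
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Wrap the LGBM model in a sigmoid (Platt) calibrator; method="isotonic" is the other option
calibrated = CalibratedClassifierCV(LGBMClassifier(n_estimators=50), method="sigmoid", cv=3)

pipeline = PMMLPipeline([("classifier", calibrated)])
pipeline.fit(X, y)

# Export the model plus its calibration step as a single PMML file
sklearn2pmml(pipeline, "calibrated_lgbm.pmml")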
I want to build a text2text model. Specifically, I want to turn automatically generated, scrambled text pieces into a smooth paragraph in the same language. I have already prepared the text inputs and outputs, so the corpus is not the primary problem now.
I want to use Hugging Face models like:
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")
because it has already acquired the capacity to generate the language. The model is made for masked language modeling, and there is no mature task quite like mine, as it is really customized. So how can I use the Hugging Face masked language model as a base text2text model without jeopardizing its capabilities? I want to fine-tune it to achieve that task/goal, and I want to know how.
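One possible direction (an assumption on my part, not something the question settles) is to warm-start a sequence-to-sequence model from the pretrained BERT weights using transformers' EncoderDecoderModel and fine-tune it on the prepared input/output pairs; the sample sentences below are placeholders:

from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
# Use BERT both as the encoder and as a (cross-attention augmented) decoder
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-chinese", "bert-base-chinese")

# The decoder needs to know which token starts generation and which one pads
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# One (rough text -> smooth paragraph) training pair, tokenized
inputs = tokenizer("自动生成的零散文本", return_tensors="pt")
targets = tokenizer("改写后的通顺段落", return_tensors="pt")

# Standard seq2seq training step: the labels are the target token ids
outputs = model(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask, labels=targets.input_ids)
outputs.loss.backward()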
I want to fit a random-effects multivariate meta-regression model using the package 'meta' in R. However, when I used the 'metareg' function, the output I got was for mixed-effects models. How can I get the estimates from the random-effects model? I am very new to meta-analysis. Please help me.
I want to do Chinese textual similarity with Hugging Face:
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = TFBertForSequenceClassification.from_pretrained('bert-base-chinese')
It doesn't work; the system reports:
Some weights of the model checkpoint at bert-base-chinese were not used when initializing TFBertForSequenceClassification: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-chinese and are newly initialized: ['classifier', 'dropout_37']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
But I can use Hugging Face to do named entity recognition:

from transformers import BertTokenizer, TFBertForTokenClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = TFBertForTokenClassification.from_pretrained("bert-base-chinese")
Does that mean Hugging Face hasn't done Chinese sequence classification? If my judgment is right, how do I solve this problem on Colab with only 12 GB of memory?
The reason is simple. The model has not been fine-tuned for the sequence classification task, so when you load the 'bert-base-chinese' checkpoint into a sequence classification model, the pretraining heads ['nsp___cls', 'mlm___cls'] are discarded and the new classification layers ['classifier', 'dropout_37'] are initialized randomly.
And it is a warning, which means the model will give random results due to the randomly initialized last layer.
BTW @andy, you didn't post the output for token classification? It should also show a similar warning, but with the ['classifier'] layer as randomly initialized.
Do use a fine-tuned model; otherwise you will need to fine-tune this loaded model yourself.
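A minimal fine-tuning sketch, assuming a sentence-pair similarity setup with toy data (the pairs, labels, and hyperparameters below are placeholders, not part of the original answer):

import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = TFBertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)

# Toy sentence pairs; label 1 = similar, 0 = not similar
pairs = [("今天天气很好", "今天天气不错"), ("我喜欢猫", "股市大跌")]
labels = tf.constant([1, 0])

enc = tokenizer([p[0] for p in pairs], [p[1] for p in pairs],
                padding=True, truncation=True, max_length=64, return_tensors="tf")

# The classification head emits logits, hence from_logits=True
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(dict(enc), labels, epochs=1, batch_size=2)

On a 12 GB Colab instance, keeping max_length small and the batch size modest is the usual way to stay within memory.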
I am pretty new to using Google AutoML, and I was wondering what the best practice is in the following scenario.
My goal is to update a Google AutoML Translate model without having to change the API call to get translations, and I am not sure if this is possible.
Currently, the only way to update an AutoML Translate model is to create a new model, base it on the old one, and train it on the new examples (at least, this seems to be the case). And when you make an API request to get a translation, you must specify which model you want to use by giving that model's identifier. Because the old and new versions of the model have different identifiers, does this mean that every API call must be changed to use the new model? Is there any way around changing the API call?
First of all, indeed the only way to update an AutoML Translate model is to create a new one, base it on the old one, and train it with the new examples. This is a deliberate safeguard so you do not lose the old model in the process: although on paper training with more sentences should help the model's accuracy/performance, it might hinder the accuracy instead.
Second of all, the API call needs to be changed accordingly. You could, however, write the API call so that it always uses the most recently submitted model, so it does not need to be changed every time you update the model.
To do so, the first idea that comes to mind is a Cloud Function that is triggered once a model is trained/created and stores the model ID in a GCS bucket, from which the code performing the API calls retrieves it, as sketched below.
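A minimal sketch of the consumer side of that idea, assuming the training pipeline writes the latest model ID to a known GCS object (the bucket, object, and project names below are made up):

from google.cloud import automl
from google.cloud import storage

def current_model_id(bucket_name="my-models-bucket", blob_name="automl/current_model_id.txt"):
    # Read the model ID that the Cloud Function wrote after training
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    return blob.download_as_text().strip()

def translate(text, project_id="my-project", location="us-central1"):
    client = automl.PredictionServiceClient()
    model_path = f"projects/{project_id}/locations/{location}/models/{current_model_id()}"
    payload = {"text_snippet": {"content": text}}
    response = client.predict(name=model_path, payload=payload)
    return response.payload[0].translation.translated_content.content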
Nevertheless, the model performance should be assessed before assigning the translation calls from one model to the other, so I do not recommend simply changing it to the newest version without additional checks unless it is for testing purposes.
I have created a model in RapidMiner. It is a classification model, and I saved the model as PMML. I want to use this model in H2O.ai to predict further. Is there any way I can import this PMML model into H2O.ai and use it for further prediction?
I appreciate your suggestions.
Thanks
H2O offers no support for importing/exporting(*) PMML models.
It is hard to offer a good suggestion without knowing your motivation for wanting to use both RapidMiner and H2O. I've not used RapidMiner in about 6 or 7 years, and I know H2O well, so my first choice would just be to re-build the model in H2O.
If you are doing a lot of pre-processing steps in RapidMiner, and that is why you want to keep using it, you could still do all that data munging there, then export the prepared data to CSV, import that into H2O, and then build the model, as sketched below.
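A minimal sketch of the H2O side of that hand-off (the file name, target column, and choice of GBM are assumptions for illustration):

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()

# Import the data that was prepared and exported from RapidMiner
frame = h2o.import_file("prepared_data.csv")
frame["label"] = frame["label"].asfactor()  # mark the target as categorical for classification

predictors = [c for c in frame.columns if c != "label"]
model = H2OGradientBoostingEstimator()
model.train(x=predictors, y="label", training_frame=frame)

# Score new (identically prepared) data
predictions = model.predict(frame)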
*: Though I did just find this tool for converting H2O models to PMML: https://github.com/jpmml/jpmml-h2o. But that is the opposite direction from what you want.