How to change the task of a pretrained model without jeopardizing its capacity? - huggingface-transformers

I want to build a text2text model. Specifically, I want to transform automatically generated, scrambled pieces of text into a smooth paragraph in the same language. I've already prepared the text inputs and outputs, so the corpus is not the primary problem now.
I want to use a Hugging Face model such as:
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")
I chose it because it has already learned to generate the language. However, the model is built for masked language modeling, and there is no established task like mine because it is highly customized. So how can I use the Hugging Face masked language model as the base of a text2text model without jeopardizing its capacity? I want to fine-tune it for this task and would like to know how.

Related

GPT-3 Fine Tune a Fine Tuned Model?

The OpenAI documentation for the model attribute in the fine-tune API states a bit confusingly:
model
The name of the base model to fine-tune. You can select one of "ada", "babbage", "curie", "davinci", or a fine-tuned model created after 2022-04-21.
My question: is it better to fine-tune a base model or a fine-tuned model?
I created a fine-tuned model from ada with the file mydata1K.jsonl:
ada + mydata1K.jsonl --> ada:ft-acme-inc-2022-06-25
Now I have a bigger file of samples mydata2K.jsonl that I want to use to improve the fine-tuned model.
In this second round of fine-tuning, is it better to fine-tune ada again or to fine-tune my fine-tuned model ada:ft-acme-inc-2022-06-25? I'm assuming this is possible because my fine-tuned model was created after 2022-04-21.
ada + mydata2K.jsonl --> better-model
or
ada:ft-acme-inc-2022-06-25 + mydata2K.jsonl --> even-better-model?
If you read the Fine-tuning documentation as of Jan 4, 2023, the only part talking about "fine-tuning a fine-tuned model" is the following part under Advanced usage:
Continue fine-tuning from a fine-tuned model
If you have already fine-tuned a model for your task and now have
additional training data that you would like to incorporate, you can
continue fine-tuning from the model. This creates a model that has
learned from all of the training data without having to re-train from
scratch.
To do this, pass in the fine-tuned model name when creating a new
fine-tuning job (e.g., -m curie:ft-<org>-<date>). Other training
parameters do not have to be changed, however if your new training
data is much smaller than your previous training data, you may find it
useful to reduce learning_rate_multiplier by a factor of 2 to 4.
Which option to choose?
You're asking about two options:
Option 1: ada + bigger-training-dataset.jsonl
Option 2: ada:ft-acme-inc-2022-06-25 + additional-training-dataset.jsonl
The documentation does not say which option would yield better results.
However...
Choose Option 2
Why?
When training a fine-tuned model, the total tokens used will be billed
according to our training rates.
If you choose Option 1, you'll pay twice for some of the tokens in your training dataset: first when fine-tuning on the initial training dataset, and again when fine-tuning on the bigger training dataset (i.e., bigger-training-dataset.jsonl = initial-training-dataset.jsonl + additional-training-dataset.jsonl).
It's better to continue fine-tuning from a fine-tuned model because you'll pay only for tokens in your additional training dataset.
Read more about fine-tuning pricing calculation.
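The billing argument above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the token counts are made-up numbers, and billed_tokens is a hypothetical helper, not part of any OpenAI API. It assumes each fine-tuning run is billed in proportion to the tokens in its training file and that both options use the same number of epochs.

```python
def billed_tokens(*training_runs):
    """Sum the tokens billed across fine-tuning runs (one token count per run)."""
    return sum(training_runs)


# Hypothetical token counts, for illustration only.
initial_tokens = 100_000      # tokens in initial-training-dataset.jsonl
additional_tokens = 100_000   # tokens in additional-training-dataset.jsonl

# Option 1: the first run uses the initial data, then ada is fine-tuned again
# on the combined (bigger) file, so the initial tokens are billed a second time.
option_1 = billed_tokens(initial_tokens, initial_tokens + additional_tokens)

# Option 2: the first run uses the initial data, then the fine-tuned model is
# trained further on the additional data only.
option_2 = billed_tokens(initial_tokens, additional_tokens)

# The difference is exactly the initial dataset, paid for twice under Option 1.
extra_cost_tokens = option_1 - option_2
print(option_1, option_2, extra_cost_tokens)
```

Under these assumptions the difference between the two options is always the size of the initial dataset, whatever the actual numbers are.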

ML.NET doesn't support resuming training for ImageClassificationTrainer

I want to continue training the model.zip file with more images, without retraining from the baseline model from scratch. How do I do that?
This isn't possible at the moment. ML.NET's ImageClassificationTrainer already uses a pre-trained model, so you're using transfer learning to create your model. Any additions would have to be "from scratch" on the pre-trained model.
Also, looking at the existing trainers that can be re-trained, the ImageClassificationTrainer isn't listed among them.

How to use a RapidMiner model in H2O.ai

I have created a classification model in RapidMiner and saved it as PMML. I want to use this model in H2O.ai for further prediction. Is there any way I can import this PMML model into H2O.ai and use it for further prediction?
I appreciate your suggestions.
Thanks
H2O offers no support for importing/exporting(*) PMML models.
It is hard to offer a good suggestion without knowing your motivation for wanting to use both RapidMiner and H2O. I've not used RapidMiner in about 6 or 7 years, and I know H2O well, so my first choice would just be to re-build the model in H2O.
If you are doing a lot of pre-processing steps in RapidMiner, and that is why you want to use it, you could still do all that data munging there, then export the prepared data to csv, import that into H2O, then build the model.
*: Though I did just find this tool for converting H2O models to PMML: https://github.com/jpmml/jpmml-h2o. But that is the opposite direction from what you want.

Multiple Stanford CoreNLP model files made, which one is the correct one to use?

I made a sentiment analysis model using Stanford CoreNLP's library, so I now have a bunch of ser.gz model files.
I was wondering which model to use in my Java code. Based on a previous question, I just used the model with the highest F1 score, which in this case is model-0014-93.73.ser.gz. In my Java code, I pointed to the model I want to use with the following line:
props.put("sentiment.model", "/path/to/model-0014-93.73.ser.gz");
However, by referring to just that model, am I excluding the sentiment analysis from the other models that were made? Should I be referring to all the model files to make sure I "covered" all the bases or does the highest scoring model trump everything else?
You should point to only the single highest scoring model. The code has no way to make use of multiple models at the same time.
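Picking the highest-scoring file by hand is easy to get wrong when there are many checkpoints. The sketch below automates it under one assumption: every file follows the pattern of the name quoted in the question, model-<iteration>-<score>.ser.gz. Only model-0014-93.73.ser.gz comes from the question; the other file names and the best_model helper are hypothetical.

```python
import re

# Matches names such as "model-0014-93.73.ser.gz": iteration number, then score.
MODEL_NAME = re.compile(r"^model-(\d+)-(\d+(?:\.\d+)?)\.ser\.gz$")


def best_model(filenames):
    """Return the file name whose embedded score is the highest."""
    scored = []
    for name in filenames:
        match = MODEL_NAME.match(name)
        if match:
            # Store (score, name) so max() compares by score first.
            scored.append((float(match.group(2)), name))
    if not scored:
        raise ValueError("no file matched the expected model-<iter>-<score>.ser.gz pattern")
    return max(scored)[1]


# Hypothetical listing; only the second name appears in the question.
files = [
    "model-0003-91.20.ser.gz",
    "model-0014-93.73.ser.gz",
    "model-0020-92.85.ser.gz",
]
chosen = best_model(files)
print(chosen)
```

The chosen name can then be used as the value of the sentiment.model property.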

statsmodels - create model from params

I am trying to create an empty model from params saved from a previously trained model, but the constructor stubbornly wants me to provide both endogenous and exogenous variables, which I don't have. Is there any way to get around this?
For example, I only want to do:
logit = sm.Logit()
pred = logit.predict(params, X)
But the first line won't work.
No, this is not supported in statsmodels. Models are always associated with data.
However, for the use case of prediction, it is possible to pickle the model and, optionally, delete all full-length arrays (including the data) from the model instance and from the results instance before pickling. This doesn't work with formulas.
On the other hand, since this is Python, there may be several ways to cheat, at your own risk.
It would be helpful if you opened an issue on GitHub at https://github.com/statsmodels/statsmodels/issues with a description of your use case; it might then be possible to get the relevant features into a future version.
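For plain prediction, one workaround is to skip the model object entirely and apply the logistic function to the saved parameters. The sketch below does not use statsmodels at all; it assumes a standard logit model, where the predicted probability is 1 / (1 + exp(-x·params)), and that X contains the same columns, in the same order, as the exog used for fitting (including the constant column, if one was added). The logit_predict helper and the numbers are hypothetical.

```python
import math


def logit_predict(params, X):
    """Predicted probabilities of a logit model: 1 / (1 + exp(-x . params))."""
    probabilities = []
    for row in X:
        # Linear predictor: dot product of the coefficients and one observation.
        linear = sum(coef * value for coef, value in zip(params, row))
        probabilities.append(1.0 / (1.0 + math.exp(-linear)))
    return probabilities


# Hypothetical saved parameters: intercept and one slope.
params = [0.0, 1.0]

# Two new observations; the first column is the constant term.
X = [
    [1.0, 0.0],
    [1.0, 2.0],
]

pred = logit_predict(params, X)
print(pred)
```

This only reproduces point predictions; anything that depends on the fitted results, such as standard errors or confidence intervals, still needs the original model and data.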
