saving finetuned model locally - huggingface-transformers

I'm trying to understand how to save a fine-tuned model locally, instead of pushing it to the hub.
I've done some tutorials, and the last step of fine-tuning a model is running trainer.train(). The next instruction is usually: trainer.push_to_hub
But what if I don't want to push to the hub? I want to save the model locally, and then later load it from my own computer in a future task so I can do inference without re-tuning.
How can I do that?
E.g., initially load a model from Hugging Face:
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
trainer = Trainer(
Somehow save the new trained model locally, so that next time I can pass
model = 'some local directory where model and configs (?) got saved'

You can use the save_model method:
Or alternatively, the save_pretrained method:
Then, when reloading your model, specify the path you saved to:
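A minimal sketch of the save-and-reload round trip described above. To keep it runnable offline, a tiny randomly initialised BERT stands in for the fine-tuned model, and the temporary directory is a hypothetical save location; with a real Trainer you would call trainer.save_model(save_dir) after trainer.train() instead of building the model by hand.

```python
import os
import tempfile

from transformers import (
    AutoModelForSequenceClassification,
    BertConfig,
    BertForSequenceClassification,
)

# Assumption: this tiny, randomly initialised model is only a stand-in for
# the fine-tuned model (trainer.model) so the sketch needs no download.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=5,
)
model = BertForSequenceClassification(config)

# Hypothetical local directory; any writable path works.
save_dir = tempfile.mkdtemp()

# Writes config.json and the weights file into save_dir.
# With a Trainer, trainer.save_model(save_dir) plays the same role.
model.save_pretrained(save_dir)
print(sorted(os.listdir(save_dir)))

# Later, in a new session: pass the directory instead of a hub model id.
reloaded = AutoModelForSequenceClassification.from_pretrained(save_dir)
```

If you also need the tokenizer at inference time, tokenizer.save_pretrained(save_dir) and AutoTokenizer.from_pretrained(save_dir) follow the same pattern.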


How to combine results from multiple models in Google Vertex AI?

I have multiple models in Google Vertex AI and I want to create an endpoint to serve my predictions.
I need to run aggregation algorithms, such as a voting algorithm, on the outputs of my models.
I have not found any way of using the models together so that I can run the voting algorithm on the results.
Do I have to create a new model, curl my existing models and then run my algorithms on the results?
There is no built-in provision for implementing aggregation algorithms in Vertex AI. To curl results from the models and then aggregate them, we would need to deploy all of them to individual endpoints. Instead, I would suggest deploying the models and the meta-model (the aggregate model) to a single endpoint using custom containers for prediction. The custom container requirements can be found here.
You can load the model artifacts from GCS into a custom container. If the same set of models is always used (i.e., the input models to the meta-model do not change), you can package them inside the container to reduce load time. Custom HTTP logic can then return the aggregated output. This is sample custom server logic (the handler is written in FastAPI style):
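import os

import numpy as np
from fastapi import FastAPI, Request  # assumption: the async handler below is FastAPI-style

app = FastAPI()

def get_models_from_gcs():
    # Pull the required model artifacts from GCS and load them here.
    # model_1, model_2 and model_3 are placeholders for the loaded models.
    models = [model_1, model_2, model_3]
    return models

def aggregate_predictions(predictions):
    # Your aggregation algorithm here; it should produce aggregated_result.
    return aggregated_result

# Vertex AI supplies the prediction route in the AIP_PREDICT_ROUTE environment variable.
@app.post(os.environ['AIP_PREDICT_ROUTE'])
async def predict(request: Request):
    body = await request.json()
    instances = body["instances"]
    inputs = np.asarray(instances)
    # _preprocessor is assumed to be defined elsewhere in the container.
    preprocessed_inputs = _preprocessor.preprocess(inputs)
    models = get_models_from_gcs()
    predictions = []
    for model in models:
        # Collect each model's output for the same batch of inputs.
        predictions.append(model.predict(preprocessed_inputs))
    aggregated_result = aggregate_predictions(predictions)
    return {"aggregated_predictions": aggregated_result}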
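For the aggregation step itself, one possible choice is hard (majority) voting. The sketch below fills in aggregate_predictions under the assumption that every model returns one plain, hashable label per input instance; the sample labels and the three-model setup are made up for illustration, and soft voting over probabilities would need a different implementation.

```python
from collections import Counter


def aggregate_predictions(predictions):
    """Hard (majority) voting across models.

    `predictions` holds one sequence of labels per model, all of the same
    length, with one label per input instance. Returns one winning label per
    instance. On a tie, the label seen first (from the earliest model) wins.
    """
    aggregated_result = []
    # zip(*predictions) regroups the labels so that each tuple holds every
    # model's vote for a single instance.
    for votes in zip(*predictions):
        winner, _count = Counter(votes).most_common(1)[0]
        aggregated_result.append(winner)
    return aggregated_result


# Three hypothetical models voting on three instances.
model_outputs = [
    ["cat", "dog", "dog"],
    ["cat", "cat", "dog"],
    ["dog", "cat", "dog"],
]
voted = aggregate_predictions(model_outputs)
print(voted)
```

Because the function returns a plain list, its result can be placed directly in the JSON response of the predict handler.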

Download pre-trained sentence-transformers model locally

I am using the SentenceTransformers library for creating embeddings of sentences using the pre-trained model bert-base-nli-mean-tokens. I have an application that will be deployed to a device that does not have internet access. How to save the model has already been answered in "Download pre-trained BERT model locally". Yet I'm stuck at loading the saved model from the locally saved path.
When I try to save the model using the above-mentioned technique, these are the output files:
When I try to load it in the memory, using
tokenizer = AutoTokenizer.from_pretrained(to_save_path)
I'm getting
Can't load config for '/bert-base-nli-mean-tokens'. Make sure that:
- '/bert-base-nli-mean-tokens' is a correct model identifier listed on ''
- or '/bert-base-nli-mean-tokens' is the correct path to a directory containing a config.json
You can download and load the model like this:
from sentence_transformers import SentenceTransformer
modelPath = "local/path/to/model"
# Download the model once, on a machine with internet access
model = SentenceTransformer('bert-base-nli-stsb-mean-tokens')
# Save it to the local path
model.save(modelPath)
# Later, load it from the local path instead of the model name
model = SentenceTransformer(modelPath)
This worked for me. You can check the SBERT documentation for details of the SentenceTransformer class.
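The error in the question says the path must be a directory containing a config.json. Before loading, it can help to check where that file actually sits under the saved folder, since a saved SentenceTransformer model may keep it in a subfolder rather than at the top level. This is a small diagnostic sketch using only the standard library; the find_config_dirs helper and the "0_BERT" folder name in the demo are assumptions for illustration, not part of either library.

```python
import os
import tempfile


def find_config_dirs(model_path):
    """Return every directory under model_path that directly contains a config.json."""
    hits = []
    for root, _dirs, files in os.walk(model_path):
        if "config.json" in files:
            hits.append(root)
    return sorted(hits)


# Demo on a throwaway directory with config.json one level down
# (the "0_BERT" folder name is a made-up example layout).
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "0_BERT"))
with open(os.path.join(base, "0_BERT", "config.json"), "w") as fh:
    fh.write("{}")

config_dirs = find_config_dirs(base)
print(config_dirs)
```

An empty result means the path is wrong or nothing was saved there. Note also that a path starting with "/" is resolved from the filesystem root, not from the current working directory.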
There are many ways to solve this issue:
Assuming you have trained your BERT base model locally (Colab/notebook), using it with the Hugging Face AutoClass by name requires the model (along with the tokenizer, vocab.txt, configs, special tokens and TF/PyTorch weights) to be uploaded to Hugging Face. The steps to do this are described here. Once it is uploaded, a repository is created under your username, and the model can be accessed as follows:
from transformers import AutoTokenizer
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("<username>/<model-name>")
The second way is to use the trained model locally, which can be done with pipelines. The following is an example of how to use a model trained and saved locally for your use case (taken from my locally trained QA model):
from transformers import AutoModelForQuestionAnswering,AutoTokenizer,pipeline
'question': 'What is the fund price of Huggingface in NYSE?',
'context': 'Huggingface Co. has a total fund price of $19.6 million dollars'
The third way is to directly use Sentence Transformers from the Huggingface models repo.
There are also other ways to resolve this but these might help. Also this list of pretrained models might help.

Reusing h2o model mojo or pojo file from python

As H2O models are only reusable with the same major version of H2O they were saved with, an alternative is to save the model in MOJO/POJO format. Is there a way these saved models can be reused/loaded from Python code? Or is there any way to keep the model for further development when upgrading the H2O version?
If you want to use your model for scoring via python, you could use either h2o.mojo_predict_pandas or h2o.mojo_predict_csv. But otherwise if you want to load a binary model that you previously saved, you will need to have compatible versions.
Outside of H2O-3 you can look into pyjnius as Tom recommended:
Another alternative is to use pysparkling, if you only need it for scoring:
from pysparkling.ml import H2OMOJOModel
# Load test data to predict
df =
# Load mojo model
mojo = H2OMOJOModel.createFromMojo(mojo_path)
# Make predictions
predictions = mojo.transform(df)
# Show predictions with ground truth (y_true and y_pred)
predictions.select('your_target_column', 'prediction').show()

Assert model was not made searchable

I'm building a system to manage some articles for my company using Laravel and Laravel Scout with Algolia as the search backend.
One of the requirements states that whenever something in an article is changed, a backup is kept so we can prove that certain information was displayed at a specific time.
I've implemented that by cloning the existing article with all its relationships before updating it. Here is the method on the Article model:
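public function clone(array $relations = null, array $except = null) {
    if($relations) {
        // (body not shown)
    }
    $replica = $this->replicate($except);
    $replica->save(); // saved first so the replica has an ID before syncing
    $syncRelations = collect($this->relations)->only($relations);
    foreach($syncRelations as $relation => $models) {
        // (relationship syncing not shown)
    }
    return $replica;
}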
The problem is the $replica->save() line. I need to save the model first, in order for it to have an ID when syncing the relationships.
But: The only thing preventing scout from indexing the model is if the model has its archived_at field set to any non-null value. But since this is a clone of the original model, this field is set to null as expected, and is only changed after the cloning procedure is done.
The problem: Scout is syncing the cloned model to Algolia, so I have duplicates there. I know how to solve this: by wrapping the clone call in the withoutSyncingToSearch callback.
But since this is rather important and the bug is already out there, I want to have a unit test backing me up that it was indeed not synced to Algolia.
I don't have any idea how to test this though and searching for a way to test Scout only leads to answers that tell me not to test Scout, but rather that my model can be indexed etc.
The question: How do I create a unit test that proves that the cloned model wasn't synced to Algolia?
At the moment I'm thinking about creating a custom Scout driver for testing, but it seems to be a total overkill for testing one single function.

GAE blobstore is killing me

I followed this tutorial :
And it's killing me... I managed to make it work; however, I was not able to make it suit my needs. The problem is that I want to serialize my model using JSON.
Why do they put avatar = db.BlobProperty() in the model and not use a reference to that blob? Is there any reason whatsoever?
I could not find a decent tutorial on how to store an image in a blob and then store its location/key/reference in a model. Any suggestions?
With the code below, I am doing exactly what is in the tutorial. How do I get the reference to that pic, and how do I store it?
pic = self.request.get('img')
pic = db.Blob(pic)
What I managed to do is store the id of the entity in JSON and use that id to retrieve and display the pic. I display the pic with the following code:
class Image(webapp2.RequestHandler):
    def get(self):
        #product = db.get(self.request.get('img_id'))
        product = MenuProduct.by_id(int(self.request.get('img_id')))
        if product.small_pic:
            self.response.headers['Content-Type'] = 'image/png'
            # Write the stored image bytes to the response
            self.response.out.write(product.small_pic)
I am guessing that all efficiency goes to hell by using this approach... Right?
Sorry if my post sounds messy, but I am kind of tired of the "great" poor documentation related to this topic.
Rather than store the blob as a BlobProperty, you should use the separate Blobstore service and store the BlobKey in the model.
