How to add extra dense layer on top of BertForSequenceClassification? - text-classification

I want to add an extra layer (and dropout) before the classification layer (I'm using PyTorch Lightning). What is the best way to do it?

The class BertForSequenceClassification (which comes from the Hugging Face Transformers library) implements a fixed architecture. If you want to change it (e.g., by adding layers), you need to write your own module.
This is actually quite simple. You can copy the code of BertForSequenceClassification and modify the code between getting the pooled BERT output and getting the logits.
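For illustration, here is a minimal sketch of such a module. The class name ExtraLayerBertClassifier and the sizes are placeholders, not part of the original answer; it simply adds one dense layer and dropout between the pooled BERT output and the logits:

import torch
import torch.nn as nn
from transformers import BertModel

class ExtraLayerBertClassifier(nn.Module):
    def __init__(self, num_labels, intermediate_size=512, dropout=0.1):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden_size = self.bert.config.hidden_size
        self.extra = nn.Linear(hidden_size, intermediate_size)   # added dense layer
        self.dropout = nn.Dropout(dropout)                       # added dropout
        self.classifier = nn.Linear(intermediate_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output                  # pooled [CLS] representation
        x = self.dropout(torch.tanh(self.extra(pooled)))
        return self.classifier(x)                       # logits

A module like this can be wrapped in a LightningModule as usual, with a cross-entropy loss applied to the returned logits.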
Note however that adding a hidden layer to the classifier does not make much difference when fine-tuning BERT. The capacity of the additional hidden layer is negligible compared to the entire stack of BERT layers. Even if you cannot fine-tune the entire model, fine-tuning just the last BERT layer is probably better than adding an extra layer to the classifier.
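If fine-tuning only the last BERT layer is the route you take, a rough sketch (assuming the standard parameter naming of the Hugging Face implementation) is to freeze everything except the final encoder block, the pooler, and the classification head:

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
for name, param in model.named_parameters():
    # keep only the last encoder block, the pooler, and the classifier trainable
    param.requires_grad = name.startswith(
        ("bert.encoder.layer.11.", "bert.pooler.", "classifier.")
    )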

Related

DistilBert for self-supervision - switch heads for pre-training: MaskedLM and SequenceClassification

Say I want to train a model for sequence classification. And so I define my model to be:
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
My question is: what would be the optimal way to pre-train this model with a masked language modelling task? After pre-training I would like the model to train on the downstream task of sequence classification.
My understanding is that I can somehow switch the heads between my model and a DistilBertForMaskedLM for pre-training, and then switch back for the original downstream task. Although I haven't figured out whether this is indeed optimal or how to write it.
Does Hugging Face offer any built-in function that accepts the input ids and a percentage of tokens to mask (excluding pad tokens) and simply trains the model?
Thank you in advance
I've tried to implement this myself, and while it does seem to work, it is extremely slow. I figured there might already be implemented solutions instead of trying to optimize my own code.
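For what it's worth, one common pattern (a sketch under the assumption that the goal is exactly the head-swap described above, not necessarily the optimal approach) is to pre-train with DistilBertForMaskedLM using the built-in DataCollatorForLanguageModeling, which masks a configurable percentage of non-special (and non-pad) tokens, then reload the adapted base weights into DistilBertForSequenceClassification:

from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    DistilBertForMaskedLM,
    DistilBertForSequenceClassification,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Built-in masking: randomly masks 15% of tokens, skipping special/pad tokens.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

mlm_model = DistilBertForMaskedLM.from_pretrained("distilbert-base-uncased")
# ... run MLM training here, e.g. with Trainer(data_collator=collator) ...
mlm_model.save_pretrained("distilbert-mlm-adapted")   # path is illustrative

# The classification model reuses the adapted transformer weights; only the
# new classification head is randomly initialized.
clf_model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-mlm-adapted", num_labels=2
)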

If I train a custom tokenizer on my dataset, I would still be able to leverage the pre-trained model weights

This is a declaration, but I'm not sure it is correct. I can elaborate.
I have a considerably large dataset (23 GB). I'd like to pre-train RoBERTa-base or XLM-RoBERTa-base so the language model fits my data better and can be used in further downstream tasks.
I know I can just run training against my dataset for a few epochs and get good results. But what if I also train the tokenizer to generate a new vocab and merges file? Will the weights from the pre-trained model I started from still be usable, or will the new set of tokens demand complete training from scratch?
I'm asking this because maybe some layers can still contribute knowledge, so the final model would have the best of both worlds: a tokenizer that fits my dataset, and the weights from previous training.
Does that make sense?
In short, no.
You cannot use your own pretrained tokenizer with a pretrained model. The reason is that your tokenizer's vocabulary differs from the vocabulary of the tokenizer that was used to pretrain the model you want to start from, so the token ids it produces no longer line up with the model's embedding matrix. A word-piece token that is present in your tokenizer's vocabulary may not be present in the pretrained model's vocabulary at all.
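To make the mismatch concrete, here is an illustrative sketch (the custom tokenizer path is hypothetical): the same sentence tokenized with two different vocabularies yields different token ids, and the pretrained embedding matrix only has meaningful rows for the original ids.

from transformers import AutoTokenizer

pretrained_tok = AutoTokenizer.from_pretrained("roberta-base")
# custom_tok = AutoTokenizer.from_pretrained("path/to/my-new-tokenizer")  # hypothetical

text = "Pipelines corrode faster in saline environments."
print(pretrained_tok(text)["input_ids"])
# A tokenizer trained on your own corpus would emit a different id sequence for
# the same text, so roberta-base's embedding rows no longer match those ids.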
Detailed answers can be found here.

How to validate my YOLO model trained on custom data set?

I am doing research on object detection using YOLO, although I am from the civil engineering field and not familiar with computer science. My advisor is asking me to validate my YOLO detection model trained on a custom dataset, but my problem is that I really don't know how to do that. Could you please point me to how to validate my model?
Thanks in advance.
I think first you need to make sure that all the cases you are interested in (location of objects, their size, general view of the scene, etc.) are represented in your custom dataset - in other words, that the collected data reflects your task. You can discuss it with your advisor. The main rule: label the data in the same manner as you want to see it in the output. More information can be found here.
This is really important - garbage in, garbage out: the quality of the output of your trained model is determined by the quality of the input (the labelled data).
If this is done, it is common practice to split your data into training and test sets. During model training only the train set is used, and you can later validate the quality (generalizing ability, robustness, etc.) on data the model has not seen - the test set. It is also important that these two subsets do not overlap; if they do, the evaluation is misleading because the model is scored on data it has already seen, and overfitting goes undetected.
Then you can train a few different models (with some architectural changes, for example) on the same train set and validate them on the same test set - this is a regular validation process.
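As a minimal sketch of that split (assuming your labelled data is a flat list of image/label path pairs; the 80/20 ratio is just a common default, not a rule):

import random

def split_dataset(samples, test_fraction=0.2, seed=42):
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_test = int(len(samples) * test_fraction)
    test_set = samples[:n_test]      # held out, never used during training
    train_set = samples[n_test:]     # used only for training
    return train_set, test_set

# train_set, test_set = split_dataset(all_labelled_samples)
# Train on train_set only, then report metrics such as mAP, precision and
# recall on test_set to validate the model.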

GEKKO - General - custom reusable flowsheet object - chemical process flowsheet modelling

No problems to speak of, nor am I currently a user. I am seeking advice on the best implementation practice for flowsheet models. Is there a framework for creating custom flowsheet objects in GEKKO/chemical? Is the flowsheet module a mature, first-class feature of GEKKO?
I am dealing with a number of applications that would benefit from the ability to inherit flowsheet objects from a yet-to-be-developed custom library, if possible. One such item could be a tubular reactor as described here, where it is solved in COMSOL (http://umich.edu/~elements/5e/web_mod/radialeffects/unsteady/index1.htm). Scenarios could involve several unit operations connected in series with recycle streams, such as mixer-settlers in solvent extraction, which also involves multiple liquid phases (organic and aqueous). It is worth noting that all of the models would be of the unsteady-state type.
I appreciate the thoughts of the user group in this respect.
Gekko doesn't currently allow black-box models where the equations are not available for requesting information such as first and second derivatives in sparse form. For that reason, a model in COMSOL wouldn't be a good fit for Gekko. If you would like to try to model the same PDE in Gekko, that is a possibility. Here are some PDE applications that may help give you inspiration:
Solid Oxide Fuel Cell
Parabolic and Hyperbolic PDEs Solved with Gekko
The Chemicals library is somewhat limited, but it does have some thermodynamic data and basic reactor types. You could put many lumped-parameter reactors in series to emulate a plug flow reactor, but it may be better to write out your own PDE equations instead of relying on the Chemicals library.
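As a rough sketch of the lumped-reactors-in-series idea (all parameter values here - rate constant, residence time, number of stages - are placeholders, not from this discussion), a first-order reaction A -> B in a chain of CSTRs can be written directly in Gekko:

import numpy as np
from gekko import GEKKO

m = GEKKO(remote=False)
m.time = np.linspace(0, 10, 51)   # minutes

n = 5        # number of lumped reactors in series (placeholder)
k = 0.5      # first-order rate constant, 1/min (placeholder)
tau = 2.0    # residence time per stage, min (placeholder)
c_in = 1.0   # feed concentration of A

c = [m.Var(value=0.0) for _ in range(n)]   # concentration of A in each stage

# stage 0 is fed by the inlet, stage i by stage i-1
m.Equation(tau * c[0].dt() == c_in - c[0] - tau * k * c[0])
for i in range(1, n):
    m.Equation(tau * c[i].dt() == c[i-1] - c[i] - tau * k * c[i])

m.options.IMODE = 4   # dynamic simulation
m.solve(disp=False)

Increasing the number of stages pushes the response toward plug-flow behaviour; writing out the axial PDE explicitly, as in the linked examples, is the other option.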

gensim doc2vec train more documents from pre-trained model

I am trying to continue training a pre-trained model with new labelled documents (TaggedDocument).
The pre-trained model was trained on documents whose unique ids use label1_index, for instance Good_0, Good_1, ... up to Good_999.
The total size of that training data is about 7000 documents.
Now I want to train the pre-trained model further with new documents whose unique ids use label2_index, for instance Bad_0, Bad_1, ... up to Bad_1211.
The total size of this new training data is about 1211 documents.
The training itself finished without any error, but the problem is that whenever I try to use 'most_similar' it only suggests similar documents labelled Good_..., whereas I expect documents labelled Bad_....
If I train everything together from the beginning, it gives me the answers I expected - it infers a newly given document as similar to documents labelled either Good or Bad.
However, the incremental approach above does not behave like the model trained on everything from the beginning.
Is continued training not working properly, or did I make some mistake?
The gensim Doc2Vec class can always be fed extra examples via train(), but it only discovers the working vocabulary of both word-tokens and document-tags during an initial build_vocab() step. So unless words/tags were available during the build_vocab(), they'll be ignored as unknown later. (The words get silently dropped from the text; the tags aren't trained or remembered inside the model.)
The Word2Vec superclass from which Doc2Vec borrows a lot of functionality has a newer, more-experimental parameter on its build_vocab() called update. If set true, that call to build_vocab() will add to, rather than replace, any prior vocabulary. However, as of February 2018, this option doesn't yet work with Doc2Vec, and indeed often causes memory-fault crashes.
But even if/when that can be made to work, providing incremental training examples isn't necessarily a good idea. By only updating parts of the model – those exercised by the new examples – the overall model can get worse, or its vectors made less self-consistent with each other. (The essence of these dense-embedding models is that the optimization over all varied examples results in generally-useful vectors. Training over just some subset causes the model to drift towards being good on just that subset, at likely cost to earlier examples.)
If you need new examples to also become part of the results for most_similar(), you might want to create your own separate set-of-vectors outside of Doc2Vec. When you infer new vectors for new texts, you could add those to that outside set, and then implement your own most_similar() (using the gensim code as a model) to search over this expanding set of vectors, rather than just the fixed set that is created by initial bulk Doc2Vec training.
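A minimal sketch of that idea (the names model and new_docs are assumptions: model is an already-trained Doc2Vec instance, and new_docs maps tags such as 'Bad_0' to token lists):

import numpy as np

def build_external_index(model, new_docs):
    # new_docs: {"Bad_0": ["list", "of", "tokens"], ...}
    keys, vecs = [], []
    for tag, tokens in new_docs.items():
        keys.append(tag)
        vecs.append(model.infer_vector(tokens))   # vector for an unseen document
    return keys, np.vstack(vecs)

def my_most_similar(model, keys, vectors, query_tokens, topn=10):
    # cosine similarity between the inferred query vector and the external set
    q = model.infer_vector(query_tokens)
    sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    best = np.argsort(-sims)[:topn]
    return [(keys[i], float(sims[i])) for i in best]

New documents can be appended to this external index over time without retraining, and the same cosine-similarity search can be extended to cover the original Good_... vectors as well.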
