Updating Weights from Caffe and DIGITS - nvidia-digits

I've built a model using DIGITS by Nvidia, but when I try to run it using Caffe, I don't know where the weights are. Any idea how I'd find these? I have the architecture, because that is provided right on the output model screen.

The weights are not visible on any of the output model screens in the DIGITS UI; however, they are accessible!
I use NVIDIA's DGX, which can run Python code. To pull the weights on that platform (from the location where I route the models to be saved), I use this bit of code:
import caffe

# Load the network architecture (.prototxt) together with the trained weights (.caffemodel)
net = caffe.Net('../models/bvlc_reference_caffenet/deploy.prototxt',
                '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
                caffe.TEST)

# Each layer's parameters are stored as [weights, biases]
params = ['fc6', 'fc7', 'fc8']
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}

for fc in params:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(
        fc, fc_params[fc][0].shape, fc_params[fc][1].shape))
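If you then want to reuse those weights outside of Caffe, a minimal sketch (assuming the same net and params objects from above, plus numpy; the file names are just an example) could dump each blob to disk:

import numpy as np

# save each layer's weights and biases as .npy files
for name in params:
    np.save('{}_weights.npy'.format(name), net.params[name][0].data)
    np.save('{}_biases.npy'.format(name), net.params[name][1].data)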
Good Luck!

Related

How to include an array of weights to adjust importance of observed data in sm.tsa.UnobservedComponents?

I have used the following five lines, based on your work, to build a Kalman filter for a smoothed pricing model, and it worked great.
mod = sm.tsa.UnobservedComponents(obs, 'local level')
lm = sm.OLS(obs, xlm, missing='drop').fit()
obs_noise = abs(lm.resid).mean()
params = [obs_noise, obs_noise / obs_noise_level]
mod_filter, mod_smooth = mod.filter(params), mod.smooth(params)
However, I would now like to adjust the filtering smoothness at certain times. For example, when the unemployment rate or the interest rate makes a big surge, I would like the output (Kalman filtered/smoothed) value to be closer to the observed value, while at most other times I would keep whatever comes out of the model. So I have created an array in which a few items are greater than 1 and the others are exactly 1.
e.g.: ir_coeff = np.array([1,1,1,1,1.345,1.23,1.78,1,1,1])
What could be the best approach to achieve this? Thank you a lot in advance.
I have tried to apply it to the output with a dot product operation, but that is not a reasonable way to do this.

What's the difference between RobertaModel and RobertaForSequenceClassification? (Hugging Face)

I am trying to use the Hugging Face transformers API.
While importing the library, some questions came up. If anyone knows the answers, please share your knowledge.
The transformers library has several pretrained models. It provides not only bare models like BertModel, RobertaModel, ..., but also models with convenient heads like ...ForMultipleChoice, ...ForSequenceClassification, ...ForTokenClassification, and ...ForQuestionAnswering.
I wonder what the difference is between taking a bare model and adding a new linear transformation myself versus using ModelForSequenceClassification.
What is different between a custom model (pretrained model with a randomly initialized linear layer) and transformers' ModelForSequenceClassification?
Is ModelForSequenceClassification trained on GLUE data?
I look forward to someone's reply. Thanks.
I think it's easiest to understand if we have a look at the actual implementation, where I randomly chose RobertaModel and RobertaForSequenceClassification as an example. However, the conclusion is valid for all other models, too.
You can find the implementation for RobertaForSequenceClassification here, which looks roughly like this:
class RobertaForSequenceClassification(RobertaPreTrainedModel):
    authorized_missing_keys = [r"position_ids"]

    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.roberta = RobertaModel(config, add_pooling_layer=False)
        self.classifier = RobertaClassificationHead(config)

        self.init_weights()

    [...]

    def forward([...]):
        [...]
As we can see, there is no indication about the pretraining here, and it simply adds another linear layer on top (the implementation of the RobertaClassificationHead can be found a bit further down, namely here):
class RobertaClassificationHead(nn.Module):
    """Head for sentence-level classification tasks."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = torch.tanh(x)
        x = self.dropout(x)
        x = self.out_proj(x)
        return x
So, to answer your question: These models come without any pretrained additional layers on top, and you could easily implement them yourself*.
Now for the asterisk: While it would be easy to wrap this yourself, also note that the class inherits from RobertaPreTrainedModel. This has several advantages, the most important one being a consistent design between different implementations (sequence classification model, sequence tagging model, etc.). Further, they provide some neat functionality, like a forward call with extensive parameters (padding, masking, attention output, ...), which would cost quite some time to implement yourself.
Last but not least, there are existing trained models based on these specific implementations, which you can search for on the Hugging Face Model Hub. There, you might find models that are fine-tuned on a sequence classification task (e.g., this one), and then directly load their weights into a RobertaForSequenceClassification model. If you had your own implementation of a sequence classification model, loading and aligning these pretrained weights would be considerably more complicated.
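For illustration, loading such a fine-tuned checkpoint and running a prediction might look like the following sketch (the model id below is a placeholder, not a real checkpoint; substitute any sequence-classification model from the Hub):

import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

model_name = "some-user/roberta-finetuned-sentiment"  # placeholder model id
tokenizer = RobertaTokenizer.from_pretrained(model_name)
model = RobertaForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("This movie was great!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # index of the predicted class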
I hope this answers your main concern, but feel free to elaborate (either as comment or new question) on any points that have not been addressed!

What meta learners are used by h2o.automl() to build the ensembles?

I am wondering what meta learners are used by h2o.automl() to build the ensembles. So far all the ensembles I've seen were GLMs. Is it because h2o.automl() uses only GLM as the meta learner, or because, with the limited number of base models (25-50 with my settings), GLM is always the best choice?
Thank you.
H2OAutoML uses GLM as the default metalearner algorithm, and we're not currently trying multiple metalearners to find the best one (this may change in future releases).
For now, you can train a different ensemble yourself using the AutoML models as base models:
aml = H2OAutoML(project_name="my_aml",
                ...,
                keep_cross_validation_predictions=True)  # important if you want to stack the models later
aml.train(...)

# train another ensemble using GBM as the metalearner algorithm
lb = aml.leaderboard
base_models = [m for m in [lb[i, 0] for i in range(lb.nrows)]
               if 'StackedEnsemble' not in m]
se = h2o.estimators.H2OStackedEnsembleEstimator(
    base_models=base_models,
    metalearner_algorithm='gbm',
    ...
)
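The new ensemble still needs to be trained. A minimal sketch of that last step, assuming x (predictor names), y (the response column), and train (the training frame) are the same ones used in the aml.train(...) call above:

# x, y, and train are assumed to match what was passed to aml.train(...) above
se.train(x=x, y=y, training_frame=train)

# inspect the GBM-metalearner ensemble, e.g. to compare it against the AutoML leader
print(se.model_performance())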

Clustering using Representatives (CURE)

I need a numerical example which demonstrates the working of clustering using the CURE algorithm.
https://www.cs.ucsb.edu/~veronika/MAE/summary_CURE_01guha.pdf
The pyclustering library has a number of clustering algorithms with examples, and example code on its GitHub. Here is a link to the CURE example.
Googling "CURE algorithm example" also turns up a fair bit.
Hopefully that helps!
Using the pyclustering library you can extract information about representative points and means using the corresponding methods (link to the CURE section of the pyclustering generated documentation):
# create an instance of the algorithm
cure_instance = cure(<algorithm parameters>)

# start processing
cure_instance.process()

# get allocated clusters
clusters = cure_instance.get_clusters()

# get representative points
representative = cure_instance.get_representors()
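For a complete, runnable sketch (assuming a recent pyclustering version; SAMPLE_LSUN is one of the FCPS sample datasets bundled with the library):

from pyclustering.cluster.cure import cure
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

# load one of the bundled sample datasets
sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

# allocate 3 clusters, then extract clusters, representative points and means
cure_instance = cure(sample, 3)
cure_instance.process()

clusters = cure_instance.get_clusters()
representatives = cure_instance.get_representors()
means = cure_instance.get_means()
print(len(clusters), len(representatives), len(means))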
Also, you can modify the source code of the CURE algorithm to display changes after each step, for example printing them to the console or even visualizing them. Here is an example of how to modify the code to display changes at each clustering step (after line 219), where a star marks a representative point, small points are the data points themselves, and big points are the means:
# New cluster and updated clusters should be relocated in the queue
self.__insert_cluster(merged_cluster)
for item in cluster_relocation_requests:
    self.__relocate_cluster(item)

#
# ADD THE FOLLOWING PIECE OF CODE TO DISPLAY CHANGES ON EACH STEP
#
temp_clusters = [cure_cluster_unit.indexes for cure_cluster_unit in self.__queue]
temp_representors = [cure_cluster_unit.rep for cure_cluster_unit in self.__queue]
temp_means = [cure_cluster_unit.mean for cure_cluster_unit in self.__queue]

visualizer = cluster_visualizer()
visualizer.append_clusters(temp_clusters, self.__pointer_data)
for cluster_index in range(len(temp_clusters)):
    visualizer.append_cluster_attribute(0, cluster_index, temp_representors[cluster_index], '*', 7)
    visualizer.append_cluster_attribute(0, cluster_index, [temp_means[cluster_index]], 'o')
visualizer.show()
You will see a sequence of images showing how the clusters, their representative points, and their means evolve at each step.
Thus, you can display any information that you need.
I would also like to add that you can use the C++ implementation of the algorithm (which is also part of pyclustering) for visualization: https://github.com/annoviko/pyclustering/blob/master/ccore/src/cluster/cure.cpp

ROC on multiple test sets in h2o (python)

I had a use-case that I thought was really simple but couldn't find a way to do it with h2o. I thought you might know.
I want to train my model once, and then evaluate its ROC on a few different test sets (e.g. a validation set and a test set, though in reality I have more than 2) without having to retrain the model. The way I know to do it now requires retraining the model each time:
train, valid, test = fr.split_frame([0.2, 0.25], seed=1234)
rf_v1 = H2ORandomForestEstimator( ... )
rf_v1.train(features, var_y, training_frame=train, validation_frame=valid)
roc = rf_v1.roc(valid=1)
rf_v1.train(features, var_y, training_frame=train, validation_frame=test) # training again with the same training set - can I avoid this?
roc2 = rf_v1.roc(valid=1)
I can also use model_performance(), which gives me some metrics on an arbitrary test set without retraining, but not the ROC. Is there a way to get the ROC out of the H2OModelMetrics object?
Thanks!
You can use H2O Flow to inspect the model performance. Simply go to http://localhost:54321/flow/index.html (if you changed the default port, change it in the link), type getModel "rf_v1" in a cell, and it will show you all the measurements of the model in multiple cells in the flow. It's quite handy.
If you are using Python, you can find the performance in your IDE like this:
rf_perf1 = rf_v1.model_performance(test)
and then print the ROC like this:
print(rf_perf1.auc())
Yes, indirectly. Get the TPRs and FPRs from the H2OModelMetrics object:
out = rf_v1.model_performance(test)
fprs = out.fprs
tprs = out.tprs
roc = zip(fprs, tprs)
(By the way, my H2ORandomForestEstimator object does not seem to have an roc() method at all, so I'm not 100% sure that this output is in the exact same format. I'm using h2o version 3.10.4.7.)
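If you want to actually plot the curve from those lists, a minimal sketch using matplotlib (an extra dependency, not part of h2o) could look like this:

import matplotlib.pyplot as plt

out = rf_v1.model_performance(test)
plt.plot(out.fprs, out.tprs, label='test set (AUC = {:.3f})'.format(out.auc()))
plt.plot([0, 1], [0, 1], linestyle='--', label='chance')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()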
